Speech Recognition
Bibliography



General overviews
Textbooks
Compilations and conference proceedings
Speech recognition techniques
Statistical modelling
Connected and continuous speech
Large vocabularies
Phonetic and linguistic knowledge in speech recognition
Phonetic knowledge
Prosodic knowledge
Phonological knowledge
Recognition of emotional speech
Speech recognition products and applications
Speech recognition evaluation and assessment
Speaker recognition
Language identification
Spoken language understanding




Suggested reading

General overviews



AINSWORTH, W.A. (1997) "Some Approaches to Automatic Speech Recognition", in HARDCASTLE, W.J. - LAVER, J. (Eds.) The Handbook of Phonetic Sciences. Oxford: Blackwell Publishers (Blackwell Handbooks in Linguistics, 5). pp. 721-743.

Anusuya, M. A., & Katti, S. K. (2009). Speech recognition by machine: A review. International Journal of Computer Science and Information Security, 6(3), 181-205. Retrieved from http://arxiv.org/pdf/1001.2267

BAKER, J.M. (1987) "State-of-the-Art Speech Recognition, US Research and Business Update", in LAVER, J.- JACK, M.A. (Eds.) (1987) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp. 440-446.


BERNSTEIN, J.- FRANCO, H. (1996) "Speech recognition by computer", in LASS, N.J. (Ed.) Principles of Experimental Phonetics. St Louis: Mosby. pp. 408-434.

BRISTOW, G. (1986) "The Speech Recognition Problem", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 3-17.

CASACUBERTA, F.- VIDAL, E. (1987) "Reconocimiento automático del habla: metodologías y arquitecturas", in MOMPÍN, J. (Dir.) Inteligencia artificial. Conceptos, técnicas y aplicaciones. Barcelona: Marcombo - Boixareu Editores. pp. 167-177.

CASACUBERTA, F.- VIDAL, E. (1990) "Reconocimiento automático del habla", Estudios de Fonética Experimental 4: 169-180

CASACUBERTA NOLLA, F. (1991) "Aprendizaje automático en reconocimiento del habla", in Simposio de la Lengua Española. Ciencia y Tecnología. Pabellón de España, Barcelona 7-11 de octubre de 1991.


COLE, R.- ZUE, V. (Eds.) (1997) "Spoken Language Input", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 1-70.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node2.html#Chapter1

CHOLLET, G. (1994) "Automatic Speech and Speaker Recognition: Overview, Current Issues and Perspectives", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 129-148.


DEROO, O. (1999) A Short Introduction to Speech Recognition. TCTS Lab, Faculté Polytechnique de Mons.
http://tcts.fpms.ac.be/asr/intro.php

ELPHICK, M. (1984) "Speech Recognition" in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 114-128.

FURUI, S. (1991) "Recent advances in speech recognition", in Eurospeech 91. 2nd european conference on speech communication and technology. Genova, Italy, 24-26 September 1991. vol. 1 pp. 3-12.

GABRIANOWSKI, E. How Speech Recognition Works, HowStuffWorks
http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm

García Mateo, C. & Cardenal, A. (2008). Recoñecemento automático da fala: Ideas básicas e algúns exemplos. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade. (pp. 249-72). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega.
http://consellodacultura.org/mediateca/extras/simposio_oralidade.pdf

GAUVAIN, J.L. - LAMEL, L.F. (2002) "Systèmes de reconnaissance, de compréhension et de dialogue", in MARIANI, J. (Ed.) Reconnaissance de la parole. Traitement automatique du langage parlé 2. Paris: Hermes Science - Lavoisier. Vol. 2, pp. 47-83.

GOLDEROS, A.- MARTÍNEZ, R.- NOMBELA, J.R.- PARDO, M.- SANTOS, J.- MUÑOZ, E. (1980) "Comunicación hombre máquina por voz (y IV): El reconocimiento de la voz", Mundo electrónico 99: 131-134.

KLATT, D. H. (1983) "Human and Automatic Speech Recognition" in BROECKE, M.P.R. van den - COHEN, A. (Eds.) Proceedings of the Tenth International Congress of Phonetic Sciences. Dordrecht: Foris. pp. 183-186.


KURZWEIL, R. (1998) "When Will HAL Understand What We Are Saying? Computer Speech Recognition and Understanding", in STORK, D.G. (Ed.) Hal's Legacy: 2001's Computer as Dream and Reality. Cambridge, Mass.: The MIT Press.
http://mitpress.mit.edu/e-books/Hal/chap7/seven1.html


LAMEL, L.- GAUVAIN, J.L. (2003) "Speech recognition", in MITKOV, R. (Ed.) The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press.


La reconnaissance vocale. Délégation Générale à la Langue Française, Réseau international des observatoires francophones de l'inforoute et du traitement informatique des langues.
http://www.culture.gouv.fr/culture/dglf/riofil/recon-vocal.htm

LEA, W.A. (1974) "Computer Recognition of Speech", in SEBEOK, T.A. (Ed.) Current Trends in Linguistics, vol 12, Linguistics and Adjacent Arts and Sciences, vol 4. The Hague: Mouton. pp. 2765-2824.


LEA, W.A. (1986) "The Elements of Speech Recognition", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 49-129.


LEVINSON, S.E.- LIBERMAN, M.Y. (1981) "Speech Recognition by Computer", Scientific American 244: 64-76; Spanish translation by R. Cerdà: "Reconocimiento del habla por medio de ordenadores", Investigación y Ciencia, 1981. pp. 38-51; in AGULLÓ, J. (Ed.) (1989) Acústica musical. Barcelona: Prensa Científica (Libros de Investigación y Ciencia) pp. 106-121.


MARIÑO, J.B.- NADEU, C. (2004) "La representación de la voz para el reconocimiento del habla", in MARTÍ, M. A. - LLISTERRI, J. (Eds.) Tecnologías del texto y del habla. Barcelona. Edicions de la Universitat de Barcelona - Fundación Duques de Soria (UB, 72). pp. 187-224.

MOORE, R. (1984) "Overview of speech input", in HOLMES, J.N. (Ed.) (1984) Proceedings of the First International Conference on Speech Technology. Brighton, 23-25 October 1984. Amsterdam: North Holland. pp. 25-38.


NADEU, C. (2001) "Representación de la voz en el reconocimiento del habla", Quark. Ciencia, Medicina, Comunicación y Cultura 21: 63-71.
http://quark.prbb.org/21/021063.htm


PARDO, J.M. (1988) "Reconocimiento del habla: una introducción", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 6 : 3-16.

PECKHAM, J.B. (1984) "Speech recognition - What is it worth?", in HOLMES, J.N. (Ed.) (1984) Proceedings of the First International Conference on Speech Technology. Brighton, 23-25 October 1984. Amsterdam: North Holland. pp. 39-48.

REDDY, R.D. (1976) "Speech Recognition by Machine: A Review", Proceedings of the IEEE 64, 4: 501-531.


Renals, S. & King, S. (2010). Automatic speech recognition. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed.). (pp. 804-38). Oxford: Wiley-Blackwell.

ROACH, P.- MILLER, D.- EMSLIE, J. (1992) "Speech Analysis and Recognition", in ROACH, P. (Ed.) Computing in Linguistics and Phonetics. London: Academic Press. pp. 35-50.

RUBIO, A.J.- BENÍTEZ, M.C. Sistemas de reconocimiento del habla. Programa de Doctorado: Tratamiento Digital de la Imagen y la Voz, Universidad de Granada
http://ceres.ugr.es/~rubio/docencia/doctorado/RAH_archivos/frame.htm

SOPEÑA, L. (1993) "Conversando con el ordenador. Reconocimiento automático del habla", Investigación y Ciencia, Mayo: 76-78.


TAPIAS MERINO, D. (2002) "Interfaces de voz con lenguaje natural", in MARTÍ, M.A.- LLISTERRI, J. (Eds.) (2002) Tratamiento del lenguaje natural. Tecnología de la lengua oral y escrita. Barcelona: Edicions Universitat de Barcelona - Fundación Duques de Soria (Biblioteca de la Universitat de Barcelona, Manuales, 53). pp. 189-207.


TORRES, M. I. (2006) "El reconocimiento del habla", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 81-98.

VAISSIÈRE, J. (1985) "Speech Recognition: A Tutorial", in F. FALLSIDE - W.A. WOODS (Eds.) Computer Speech Processing. Englewood Cliffs, N.J. : Prentice Hall International. pp. 191-242.

VIGLIONE, S. (1986) "Recognition Past and Future", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 373-387.

ZUE, V.- COLE, R.- WARD, W. (1997) "Speech Recognition", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 4-10.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html


Textbooks

AINSWORTH, W.A. (1988) Speech Recognition by Machine. London: Peter Peregrinus Ltd on behalf of the IEE (IEE Computing Series, 12).

CASACUBERTA, F.- VIDAL, E. Con la colaboración de J.M. BENEDÍ y J. MARTÍ. (1987) Reconocimiento automático del habla. Barcelona: Marcombo - Boixareu Editores (Premios Mundo Electrónico).

CATER, J.P. (1984) Electronically Hearing: Computer Speech Recognition. Indianapolis: Howard W Sams & Co., Inc.

HATON, J.P.- CERISARA, C.- FOHR, D.- LAPRIE, Y.- SMAÏLI, K. (2006) Reconnaissance automatique de la parole. Du signal à son interprétation. Paris: Dunod (UniverSciences).
http://parole.loria.fr/livreParole/

1.- Introduction à la reconnaissance automatique de la parole; 2.- La communication parlée; 3.- Analyse du signal vocal; 4.- Modèles acoustiques pour la reconnaissance automatique de la parole; 5.- Techniques avancées; 6.- La modélisation statistique du langage: application à la reconnaissance de la parole; 7.- La compréhension automatique de la parole; 8.- Robustesse de la reconnaissance de la parole; 9.- Mise en oeuvre d'un système; 10.- Un cadre articulatoire pour la reconnaissance automatique de la parole; 11.- Applications de la reconnaissance automatique de la parole.

HATON, J.-P.- PIERREL, J.-M. - PERENNOU, G.- CAELEN, J.- GAUVAIN, J.-L. (1991) La reconnaissance automatique de la parole. Paris: Dunod.

HOLMES, J.N. (1988) Speech Synthesis and Recognition. Wokingham: Van Nostrand Reinhold (Aspects of Information Technology).

HOLMES, J.N.- HOLMES, W. (2001) Speech Synthesis and Recognition. London: Taylor & Francis, 2nd edition.

JELINEK, F. (1998) Statistical Methods for Speech Recognition. Cambridge: The MIT Press (Language, Speech and Communication Series).
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7447

LLAMAS BELLO, C.- CARDEÑOSO PAYO, V. (1997) Reconocimiento automático del habla. Técnicas y aplicación. Valladolid: Secretariado de Publicaciones e Intercambio Científico, Universidad de Valladolid (Ciencias, 16).

O'SHAUGHNESSY, D. (1987) Speech Communication. Human and Machine. Reading, Mass.: Addison Wesley. Second Edition: IEEE Press, 2000.

POULTON, A.S. (1983) Microcomputer Speech Synthesis and Recognition. Wilmslow: Sigma Technical Press.

RABINER, L.- JUANG, B.-H. (1993) Fundamentals of Speech Recognition. New York: Prentice Hall.


Compilations and conference proceedings

BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins.

HATON, J.P. (Ed.) (1982) Automatic Speech Analysis and Recognition. Proceedings of the NATO Advanced Study Institute held at Bonas, France, June 29 - July 10, 1981. Dordrecht: Reidel (NATO Advanced Study Institute Series, Series C, Vol 88).

HOUSE, A.S. (1988) The Recognition of Speech by Machine: A Bibliography. New York: Academic Press.

KELLER, E. (Ed.) (1994) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons.

LAFACE, P. (Ed.) (1990) Speech Recognition and Understanding: Recent Advances, Trends and Applications. Springer-Verlag. (NATO ASI Series).

LEA, W.A. (Ed.) (1980) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall (Prentice Hall Signal Processing Series).

Proceedings of the ESCA ETRW Workshop "Accessing information in spoken audio". 19th and 20th April 1999, Cambridge, UK. ESCA, European Speech Communication Association.

REDDY, R.D. (Ed.) (1975) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press.

RUBIO, A.J. - LÓPEZ SOLER, J.M. (Eds.) Speech Recognition and Coding: New Advances and Trends. Springer-Verlag (NATO ASI Series F: Computer and Systems Sciences).

SCHROEDER, M. R. (Ed.) (1985) Speech and Speaker Recognition. Basel: Karger (Bibliotheca Phonetica, 12).

SCHWAB, E.E.- NUSBAUM, H. (Eds.) (1986) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc.

SUEN, C.Y.- DE MORI, R. (Eds.) (1982) Computer Analysis and Perception. Vol II: Auditory Signals. Boca Raton, FL: CRC Press.

TORRES, L.- MASGRAU, E.- LAGUNAS, M.A. (Eds.) (1990) Signal Processing V: Theories and Applications. Elsevier Science Publishers.

WAIBEL, A.- LEE, K.F. (Eds.) (1990) Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann.

Speech technologies: conference proceedings


Speech recognition techniques

Anusuya, M., & Katti, S. (2011). Front end analysis of speech recognition: A review. International Journal of Speech Technology, Online first. doi:10.1007/s10772-010-9088-7

WOSZCYNA, M. (2001) "Técnicas de reconocimiento del habla: entre la precisión y la velocidad", Quark. Ciencia, Medicina, Comunicación y Cultura 21: 72-78.
http://quark.prbb.org/21/021062.htm

Statistical modelling

BELLEGARDA, J. (1997) "Statistical techniques for robust ASR: Review and perspectives", in KOKKINAKIS, G.- FAKOTAKIS, N.- DERMATAS, E. (Eds.) Eurospeech'97. 5th European Conference on Speech Communication and Technology. Rhodes, Greece, 22-25 September 1997. Vol. 1. pp. KN-33 - KN-36.

COX, S.J. (1990) "Hidden Markov Models for Automatic Speech Recognition: Theory and Application", in WHEDDON, C.- LINGGARD, R. (Eds.) Speech and Language Processing. London: Chapman and Hall. pp. 209-230.

HOLMES, W.- HUCKVALE, M. (1994) "Why have HMMs been so successful for automatic speech recognition and how might they be improved?", Speech, Hearing and Language, Work in Progress, 1994 (University College London, Department of Phonetics and Linguistics) 8: 207-219.

HUANG, X.- ARIKI, Y.- JACK, M. (1990) Hidden Markov Models for Speech Recognition. Edinburgh: Edinburgh University Press.

JELINEK, F. (1997) Statistical Methods for Speech Recognition. Cambridge: The MIT Press (Language, Speech and Communication Series).

JOUVET, D. (1996) "Modèles de Markov pour la reconnaissance de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 225-238.

KNILL, K.- YOUNG, S. (1997) "Hidden Markov Models in Speech and Language Processing", in YOUNG, S.- BLOOTHOOFT, G. (Eds.) Corpus-Based Methods in Language and Speech Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 2) pp. 27-68.


Connected and continuous speech

DE LA TORRE MUNILLA, C.- HERNÁNDEZ GÓMEZ, L- CAMINERO GIL, F.J.- MARTÍN del ÁLAMO, C. (1994) "Reconocimiento de números conectados", Comunicaciones de Telefónica I+D 5, 2: 55-75.

MARIÑO, J.B.- NOGUEIRAS, A.- PACHÈS-LEAL, P.- BONAFONTE, A. (2000) "The demiphone: An efficient contextual subword unit for continuous speech recognition", Speech Communication 32, 3: 187-198.

SAN-SEGUNDO, R.- COLÁS, J.- DE CÓRDOBA, R.- PARDO, J.M. (2002) "Spanish recognizer of continuously spelled names over the telephone", Speech Communication 38, 3-4: 287-304.


Large vocabularies

ÁLVAREZ CERCADILLO, J. (1994) "Reconocimiento de grandes vocabularios en habla continua basados en unidades inferiores a la palabra", Comunicaciones de Telefónica I+D 5, 2: 76-88.

LAMEL, L.- ADDA, M.- ADDA, G.- GAUVAIN, J.L. (1996) "Reconnaissance multilingue de grands vocabulaires", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 299-310.

LAMEL, L.- ADDA-DECKER, M.- GAUVAIN, J.L. (1995) "Issues in Large Vocabulary Multilingual Speech Recognition", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 185-188.

WAIBEL, A. (1986) "Suprasegmentals in very large vocabulary word recognition", in SCHWAB, E.E.- NUSBAUM, H. (Eds.) (1986) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc. pp. 159-186.

YOUNG, S.J.- ADDA-DECKER, M.- AUBERT, X.- DUGAST, C.- GAUVAIN, J.L.- KERSHAW, D.J.- LAMEL, L.- LEEUWEN, D.A.- PYE, D.- ROBINSON, A.J.- STEENEKEN, H.J.M. - WOODLAND, P.C. (1997) "Multilingual large vocabulary speech recognition: The European SQALE project", Computer Speech and Language 11, 1: 73-89.


Phonetic and linguistic knowledge in speech recognition

SCHARENBORG, O. (2007) "Reaching over the gap: A review of efforts to link human and automatic speech recognition research", Speech Communication 49, 5: 336-347.
http://dx.doi.org/10.1016/j.specom.2007.01.009

Phonetic knowledge in speech technology

Phonetic knowledge

ADDA-DECKER, M.- de MAREÜIL, P.B.- ADDA, G.- LAMEL, L. (2005) "Investigating syllabic structures and their variation in spontaneous French", Speech Communication 46, 2: 119-139.
http://dx.doi.org/10.1016/j.specom.2005.03.006

ADDA-DECKER, M.- LAMEL, L. (1998) "Pronunciation variants across systems, languages and speaking style", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 1-6.

ADDA-DECKER, M.- LAMEL, L. (2000) "The use of lexica in automatic speech recognition", in VAN EYNDE, F.- GIBBON, D. (Eds.) Lexicon Development for Speech and Language Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 12). pp. 235-266.

AINSWORTH, W.A. (2005) "Can phonetic knowledge be used to improve the performance of speech recognisers and synthesisers?", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 13-20.

Aubanel, V., & Nguyen, N. (2010). Automatic recognition of regional phonological variation in conversational interaction. Speech Communication, In Press, Accepted Manuscript. doi:10.1016/j.specom.2010.02.008

BATES, R. A. - OSTENDORF, M. - WRIGHT, R. A. (2007) "Symbolic phonetic features for modeling of pronunciation variation", Speech Communication 49, 2: 83-97.
http://dx.doi.org/10.1016/j.specom.2006.10.007

BECKER, R.W.- POZA, F. (1975) "Acoustic Phonetic Research in Speech Understanding", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23, 5: 416-426.

BENZEGHIBA, M. - DE MORI, R. - DEROO, O. - DUPONT, S. - ERBES, T. - JOUVET, D. - FISSORE, L. - LAFACE, P. - MERTINS, A. - RIS, C. - ROSE, R. - TYAGI, V. - WELLEKENS, C. (2007) "Automatic speech recognition and speech variability: A review", Speech Communication 49, 10-11: 763-786.
http://dx.doi.org/10.1016/j.specom.2007.02.006

BLADON, A. (1985) "Acoustic Phonetics, Auditory Phonetics, Speaker Sex and Speech Recognition: A Thread", in FALLSIDE, F.- WOODS, W.A. (Eds.) (1985) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 29-38.

BROAD, D.J.- SHOUP, J.E. (1975) "Concepts for Acoustic Phonetic Recognition", in REDDY, R.D. (Ed.) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press. pp. 243-274.

Caballero, M., Moreno, A., & Nogueiras, A. (2009). Multidialectal Spanish acoustic modeling for speech recognition. Speech Communication, 51(3), 217-229. doi:10.1016/j.specom.2008.08.003

CHRISTENSEN, H.- LINDGREN, B.- ANDERSEN, O. (2005) "Introducing phonetically motivated, heterogeneous information into automatic speech recognition", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 67-86.

DUSAN, S.- RABINER, L.R. (2005) "On integrating insights from human speech perception into automatic speech recognition", in EUROSPEECH 2005 - INTERSPEECH 2005. Proceedings of the 9th european conference on speech communication and technology. 4-8 September, 2005. Lisbon, Portugal. pp. 1233-1236.
http://cronos.rutgers.edu/~lrr/Reprints/353_dr_euro2005c.pdf

FERREIROS, J.- MACÍAS GUARASA, J.- PARDO, J.M.- VILLARRUBIA, L. (1998) "Introducing multiple pronunciations in Spanish speech recognition systems", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 29-34.

FOSLER-LUSSIER, E.- GREENBERG, S.- MORGAN, N. (1999) "Incorporating contextual phonetics into automatic speech recognition", in OHALA, J.J.- HASEGAWA, Y.- OHALA, M.- GRANVILLE, D.- BAILEY, A.C. (Eds.) Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco, 1-7 August 1999.
http://www.icsi.berkeley.edu/~fosler/papers/ICPhS99-invited.pdf

FOSLER-LUSSIER, E.- BYRNE, W.- JURAFSKY, D. (Eds.) (2005) Pronunciation Modeling and Lexicon Adaptation. Special Issue. Speech Communication 46, 2.

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

GRAVIER, G.- YVON, F.- JACOB, B.- BIMBOT, F. (2005) "Introducing contextual transcription rules in large vocabulary speech recognition", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 87-106.

GREENBERG, S. (1998) "Recognition in a new key - Towards a science of spoken language", in ICASSP 1998. Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing. 12 -15 May, 1998. Seattle, Washington, USA. pp. 1401-1405.
http://www.icsi.berkeley.edu/~steveng/PDF/Recognition_in_a_New_Key.pdf

HAIN, T. (2005) "Implicit modelling of pronunciation variation in automatic speech recognition", Speech Communication 46, 2: 171-188.
http://dx.doi.org/10.1016/j.specom.2005.03.008

HARRINGTON, J. (1988) "Acoustic Cues for Automatic Recognition of English Consonants", in JACK, M.- LAVER, J. (Eds.) Aspects of Speech Technology. Edinburgh: Edinburgh University Press. pp. 69-143.

KLATT, D. H. (1985) "The problem of variability in speech recognition and in models of speech perception", in J.A. PERKELL - D.H. KLATT (Eds.) Variability and Invariance in Speech Processes. Hillsdale, N.J.: Lawrence Erlbaum Ass. pp. 300-324.

KOREMAN, J.- ANDREEVA, B. (2000) "Can we use the linguistic information in the signal?", Phonus (Institute of Phonetics, University of the Saarland) 5: 47-58.
http://www.coli.uni-saarland.de/groups/WB/Phonetics/Research/PHONUS_research_reports/Phonus5/Koreman_PHONUS5.pdf

DENG, L. - YU, D.- ACERO, A. (2006) "A bidirectional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 256-265.
http://dx.doi.org/10.1109/TSA.2005.854107

NOLAN, F. (1986) "The nature of speech", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 18-48.

OSTENDORF, M. (2000) "Incorporating linguistic theories of pronunciation variation into speech recognition models", in SPARCK JONES, K.- GAZDAR, G.- NEEDHAM, R. (Eds.) Computers, language and speech: Formal theories and statistical data. Papers from a Royal Society / British Academy Discussion Meeting, September 1999. London: The Royal Society (Philosophical Transactions of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769).

PASTOR, M.- CASACUBERTA, F. (2005) "Pronunciation modeling", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 133-148.

POLS, L.C.W. (1997) "Flexible, robust, and efficient human speech recognition", Proceedings of the Institute of Phonetic Sciences, University of Amsterdam 21: 1-10.
http://www.fon.hum.uva.nl/Proceedings/Proceedings21/LouisPols/LouisPols-Contents.html

PRUTHI, T.- ESPY-WILSON, C.Y. (2004) "Acoustic parameters for the automatic detection of nasal manner", Speech Communication 43, 3: 241-266.
http://dx.doi.org/10.1016/j.specom.2004.06.001

SCHARENBORG, O. - WAN, V. - MOORE, R. K. (2007) "Towards capturing fine phonetic variation in speech using articulatory features", Speech Communication 49, 10-11: 811-826.
http://dx.doi.org/10.1016/j.specom.2007.01.005

SCHRAMM, H. - AUBERT, X.- BAKKER, B.- MEYER, C.- NEY, H. (2006) "Modeling spontaneous speech variability in professional dictation", Speech Communication 48, 5: 493-515.
http://dx.doi.org/10.1016/j.specom.2005.08.003

SROKA, J.J.- BRAIDA, L.D. (2005) "Human and machine consonant recognition", Speech Communication 45, 4: 401-423.
http://dx.doi.org/10.1016/j.specom.2004.11.009


STRIK, H.- CUCCHIARINI, C. (1998) "Modeling pronunciation variation for ASR: overview and comparison of methods", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 137-144.
http://lands.let.kun.nl/TSpublic/strik/publications/a47.html


STRIK, H.- CUCCHIARINI, C. (1999) "Modeling pronunciation variation for ASR: A survey of the literature", in STRIK, H. (Ed.) Special Issue on Modeling Pronunciation Variation for Automatic Speech Recognition. Speech Communication 29, 2-4: 225-246.

STRIK, H. (Ed.) Special Issue on Modeling Pronunciation Variation for Automatic Speech Recognition. Speech Communication 29, 2-4.

STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) (1998) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. ESCA, European Speech Communication Association; COST Action 249, Continuous Speech over the Telephone; A2RT, Automatic Acoustic Recognition Technologie.

SUOMI, K. (1987) "On spectral coarticulation in stop-vowel-stop syllables: implications for automatic speech recognition", Journal of Phonetics 15,1: 85-100.

URAGA, E.- PINEDA, L. (2002) "Automatic Generation of Pronunciation Lexicons for Spanish", in GELBUKH, A. (Ed.) Computational Linguistics and Intelligent Text Processing. Proceedings of the Third International Conference, CICLing 2002. México City, México, February 17-23, 2002. Heidelberg: Springer Verlag (Lecture Notes in Computer Science, 2276). pp. 330-338.
http://springerlink.metapress.com/content/crl4dnrde4jkhlp4/

ZOLNAY, A. - KOCHAROV, D. - SCHLÜTER, R. - NEY, H. (2007) "Using multiple acoustic feature sets for speech recognition", Speech Communication 49, 6: 514-525.
http://dx.doi.org/10.1016/j.specom.2007.04.005

ZUE, V.W. (1983) "The use of phonetic rules in automatic speech recognition", Speech Communication 2, 2/3 : 181-186.

ZUE, V.W. (1985) "The Use of Speech Knowledge in Automatic Speech Recognition", Proceedings of the IEEE 73,11: 1602-1615.

ZUE, V.W. - SCHWARTZ, R.M. (1980) "Acoustic Processing and Phonetic Analysis", in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall (Prentice Hall Signal Processing Series). pp. 101-124.

Spectrogram reading and speech recognition

Cole, R. A., Rudnicky, A. I., Zue, V., & Reddy, R. D. (1980). Speech as patterns on paper. In R. A. Cole (Ed.), Perception and production of fluent speech. (pp. 3-50). Hillsdale, NJ: Lawrence Erlbaum.

Connolly, J. H., Edmonds, E. A., Guzy, J. J., Johnson, S. R., & Woodcock, A. (1986). Automatic speech recognition based on spectrogram reading. International Journal of Man-Machine Studies, 24(6), 611-621. doi:10.1016/S0020-7373(86)80012-8

Gabrys, G. (1990). Difficulty in learning to read speech spectrograms: The role of visual segmentation (Technical Report LRDC/PITT/IMP-1. Cognitive Science Program. Office of Naval Research). Pittsburgh: Learning Research and Development Center, University of Pittsburgh. Retrieved from http://handle.dtic.mil/100.2/ADA218827

Greene, B. G., Pisoni, D. B., & Carrell, T. D. (1984). Recognition of speech spectrograms. The Journal of the Acoustical Society of America, 76(1), 32-43. doi:10.1121/1.391035

Hatazaki, K., Komori, Y., Kawabata, T., & Shikano, K. (1990). Phoneme segmentation expert system using spectrogram reading knowledge. Systems and Computers in Japan, 21(12), 90-100. doi:10.1002/scj.4690211210

Ingemann, F., & Mermelstein, P. (1975). Speech recognition through spectrogram matching. The Journal of the Acoustical Society of America, 57(1), 253-255. Retrieved from http://www.haskins.yale.edu/Reprints/HL0166.pdf

Johannsen, J., MacAllister, J., Michalek, T., & Ross, S. (1983). A speech spectrogram expert. In ICASSP 1983. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 746-9). Boston, Massachusetts, USA. April 14-16, 1983. doi:10.1109/ICASSP.1983.1172057

Katagiri, S., & Yokota, M. (1987). Phoneme recognition using visual features on speech spectrograms. In European conference on speech technology. (pp. 1365-8). Edinburgh, Scotland, UK. September 1987. Retrieved from http://www.isca-speech.org/archive/ecst_1987/e87_1365.html

Klatt, D. H., & Stevens, K. N. (1972). Sentence recognition from visual examination of spectrograms and machine-aided lexical searching. In 1972 Conference on speech communication and processing. (pp. 315-8). New York: IEEE Press.

Klatt, D. H., & Stevens, K. N. (1973). On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment. IEEE Transactions on Audio and Electroacoustics, 21(3), 210-217. doi:10.1109/TAU.1973.1162453

Lamel, L. (1988). Formalizing knowledge used in spectrogram reading: Acoustic and perceptual evidence from stops (RLE Technical Report 537). Cambridge, MA: Research Laboratory of Electronics, Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/bitstream/handle/1721.1/4955/RLE-TR-537-20137092.pdf

Lamel, L. (1993). A knowledge-based system for stop consonant identification based on spectrogram reading. Computer Speech and Language, 7(2), 169-191. Retrieved from ftp://tlp.limsi.fr/public/csl93.pdf

Leung, H., & Zue, V. (1986). Visual characterization of speech spectrograms. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 2751-4). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168558

Memmi, D., Eskenazi, M., Mariani, J., & Nguyen-Xuan, A. (1983). Un système expert pour la lecture de sonagrammes. Speech Communication, 2(2-3), 234-236. doi:10.1016/0167-6393(83)90037-7

Stern, P. E., Eskenazi, M., & Memmi, D. (1986). An expert system for speech spectrogram reading. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1193-6). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168793

Zue, V., & Cole, R. (1979). Experiments on spectrogram reading. In ICASSP 1979. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 116-9). Washington, District of Columbia, USA. April 2 - 4, 1979. doi:10.1109/ICASSP.1979.1170735

Zue, V., & Lamel, L. (1986). An expert spectrogram reader: A knowledge-based approach to speech recognition. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1197-200). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168798

Spectrographic analysis of speech

Prosodic knowledge

Prosody in speech recognition

BATLINER, A.- MÖBIUS, B. (2005) "Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground?", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 21-44.

BARTKOVA, K. (1997) "Some experiments about the use of prosodic parameters in a speech recognition system", in Proceedings of the ESCA Workshop on Intonation. Athens, 18-20 September 1997. pp. 33-36.

BARTKOVA, K.- JOUVET, D. (1999) "Selective prosodic post-processing for improving recognition of French telephone numbers", in Eurospeech'99, 6th European Conference on Speech Communication and Technology. Budapest, Hungary, 5-10 September 1999. Vol 1 pp. 267-270.

BASSI, A.- BECERRA YOMA, N.- LONCOMILLA, P. (2006) "Estimating tonal prosodic discontinuities in Spanish using HMM", Speech Communication 48, 9: 1112-1125.
http://dx.doi.org/10.1016/j.specom.2006.03.006

CAMPBELL, N. (1993) "Automatic detection of prosodic boundaries in speech", Speech Communication 13, 3-4: 343-354.

CHEN, K. - HASEGAWA-JOHNSON, M. - COHEN, A. - BORYS, S. - KIM, S.-S. - COLE, J. - CHOI, J.-Y. (2006) "Prosody dependent speech recognition on radio news corpus of American English", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 232-245.
http://dx.doi.org/10.1109/TSA.2005.853208

ESCUDERO, D.- CARDEÑOSO, V. (2002) "Una experiencia en reconocimiento automático de tipos de unidades melódicas a partir de su perfil de entonación", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Sevilla 5, 6 y 7 de marzo de 2001. Sevilla: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 161-166.

GARCÍA, C.- TAPIAS, D. (2000) "La frecuencia fundamental de la voz y sus efectos en reconocimiento de habla continua", Procesamiento del Lenguaje Natural, Revista n. 26: 163-167.

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

HASEGAWA-JOHNSON, M.- CHEN, K.- COLE, J.- BORYS, S.- KIM, S.-S.- COHEN, A.- ZHANG, T.- CHOI, J.-Y.- KIM, H.- YOON, T.- CHAVARRIA, S. (2005) "Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus", Speech Communication 46: 418-439.
http://dx.doi.org/10.1016/j.specom.2005.01.009

KOMPE, R. (1997) Prosody in Speech Understanding Systems. Berlin - New York: Springer (Lecture Notes in Artificial Intelligence, Vol. 1307. Subseries of Lecture Notes in Computer Science).

LEA, W.A. (1980) "Prosodic aids in speech recognition" in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs, N.J.: Prentice-Hall. pp. 166-205.

LONGUET-HIGGINS, C. (1985) "Tones of Voice: The Role of Intonation in Computer Speech Understanding", in FALLSIDE, F.- WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 293-302.

MÉLONI, H.- LANGLAIS, P. (1996) "Prosodie et reconnaissance de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 205-224.

PAGEL, V. (1999) De l'utilisation d'informations acoustiques suprasegmentales en reconnaissance de la parole continue. Thèse Doctorale. Université Henri Poincaré, Nancy.
http://vincent.pagel.free.fr/THESE/

RUBIO AYUSO, A.J. - MILONE, D.H. (2002) "Información prosódica y acentual para el reconocimiento automático del habla", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Sevilla 5, 6 y 7 de marzo de 2001. Sevilla: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 56-77.

SHRIBERG, E.- STOLCKE, A.- HAKKANI-TÜR, D.- TÜR, G. (2000) "Prosody-based automatic segmentation of speech into sentences and topics", Speech Communication 32, 1-2: 127-154.

Vicsi, K., & Szaszák, G. (2010). Using prosody to improve automatic speech recognition. Speech Communication, 52(5), 413-426. doi:10.1016/j.specom.2010.01.003

VICSI, K. - SZASZÁK, G. (2005) "Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features", International Journal of Speech Technology 8, 4: 363-370.
http://dx.doi.org/10.1007/s10772-006-8534-z

WAIBEL, A. (1986) "Suprasegmentals in very large vocabulary word recognition", in SCHWAB, E.E.- NUSBAUM, H. (Eds.) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc. pp. 159-186.

WAIBEL, A. (1988) Prosody and Speech Recognition. San Mateo, CA: Morgan Kaufmann.

ZEISSLER, V. - ADELHARDT, J. - BATLINER, A. - FRANK, C. - NÖTH, E. - SHI, R.P. - NIEMANN, H. (2006) "The prosody module", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 139-152.

Phonological knowledge

CARSON-BERNDSEN, J. (1998) Time Map Phonology. Finite State Models and Event Logics in Speech Recognition. Dordrecht - Boston - London: Kluwer Academic Publishers (Text, Speech and Language Technology, 5).

COHEN, P.S.- MERCER, R.L. (1975) "The Phonological Component of an Automatic Speech-Recognition System", in REDDY, D.R. (Ed.) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press. pp. 275-319.

CHURCH, K.W. (1987) Phonological parsing in speech recognition. Boston: Kluwer Academic Publishers (Kluwer International Series in Engineering and Computer Science, SECS 38).

DENG, L. (1997) "Speech recognition using autosegmental representation of phonological units with interface to the trended HMM", Speech Communication 23, 3: 211-222.

GIACHIN, E.- ROSENBERG, A.E.- LEE, C.-H. (1991) "Word juncture modeling using phonological rules for HMM-based continuous speech recognition", Computer Speech and Language 5,2: 155-168.

HOEQUIST Jr., C.- NOLAN, F. (1991) "On an application of phonological knowledge in automatic speech recognition", Computer Speech and Language 5,2: 133-153.

OSHIKA, B.- ZUE, V.W.- WEEKS, R.V. - NEU, H.- AURBACH, J. (1975) "The Role of Phonological Rules in Speech Understanding Research", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23: 104-112.

PERENNOU, G.- BRIEUSSEL-POUSSE, L. (1998) "Phonological component in automatic speech recognition", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 91-96.

SENEFF, S.- WANG, C. (2005) "Statistical modeling of phonological rules through linguistic hierarchies", Speech Communication 46, 2: 204-216.
http://dx.doi.org/10.1016/j.specom.2005.03.005

SHOUP, J.E. (1980) "Phonological Aspects of Speech Recognition", in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall. pp. 125-138.

Recognition of emotional speech

Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010). Multiple feature extraction and hierarchical classifiers for emotions recognition. In A. Esposito, N. Campbell, C. Vogel, A. Hussain, & A. Nijholt (Eds.), Development of multimodal interfaces: Active listening and synchrony. Second COST 2102 International Training School. Dublin, Ireland, March 23-27, 2009. Revised selected papers. (pp. 242-54). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-12397-9_20. Retrieved from http://fich.unl.edu.ar/sinc/publications/2010/AMR10a/sinc_AMR10a.pdf

Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010b). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, In Press, Accepted Manuscript. doi:10.1016/j.csl.2010.10.001

Ang, J., Dhillon, R., Krupski, A., Shriberg, E., & Stolcke, A. (2002). Prosody-Based automatic detection of annoyance and frustration in human-computer dialog. In ICSLP 2002 - interspeech 2002. Proceedings of the 7th international conference on spoken language processing. (pp. 2037-40). Denver, Colorado, USA, September 16-20, 2002. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.4027

Baber, C., Mellor, B., Graham, R., Noyes, J. M., & Tunley, C. (1996). Workload and the use of automatic speech recognition: The effects of time and resource demands. Speech Communication, 20(1-2), 37-54. doi:10.1016/S0167-6393(96)00043-X

Barra, R., Montero, J. M., Macías, J., D'Haro, L. F., San-Segundo, R., & Córdoba, R. (2006). Prosodic and segmental rubrics in emotion identification. In ICASSP 2006. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1085-8). Toulouse, France, 14-19 May 2006. Retrieved from http://www-gth.die.upm.es/research/documentation/AG-39Pro-06.pdf

Burkhardt, F., Ajmera, J., Englert, R., Stegmann, J., & Burleson, W. (2006). Detecting anger in automated voice portal dialogs. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. Pittsburgh, PA, USA. September 17-21, 2006. Retrieved from http://felix.syntheticspeech.de/publications/recognitionOfAnger.pdf

El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3). doi:10.1016/j.patcog.2010.09.020

Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007). Primitives-Based evaluation and estimation of emotions in speech. Speech Communication, 49(10-11), 787-800. doi:10.1016/j.specom.2007.01.010. Retrieved from http://asimov.usc.edu/~mower/Papers/GrimmSpeechComm2007.pdf

Hansen, J. H. L. (1996). Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition. Speech Communication, 20(1-2), 151-173. doi:10.1016/S0167-6393(96)00050-7

Huber, R., Batliner, A., Buckow, J., Nöth, E., Warnke, V., & Niemann, H. (2000). Recognition of emotion in a realistic dialogue scenario. In ICSLP 2000. Proceedings of the 6th international conference on spoken language processing. (pp. 665-8). Beijing, China, October 16-20, 2000. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.6965

Kessous, L., Castellano, G., & Caridakis, G. (2010). Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis. Journal on Multimodal User Interfaces, 3(1), 33-48. doi:10.1007/s12193-009-0025-5. Retrieved from http://www.image.ntua.gr/papers/638.pdf

Kotti, M., Paternò, F., & Kotropoulos, C. (2010). Speaker-Independent negative emotion recognition. In CIP 2010. 2nd International workshop on cognitive information processing. (pp. 417-22). Elba. June 14-16, 2010. doi:10.1109/CIP.2010.5604091. Retrieved from http://giove.isti.cnr.it/attachments/publications/2010-A2-041.pdf

Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., & Elenius, K. (2011). Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation. Computer Speech & Language, 25(1), 84-104. doi:10.1016/j.csl.2010.03.004

Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559-590. doi:10.1016/j.specom.2005.09.008.

López-Cózar, R., Silovsky, J., & Griol, D. (2010). Mejora del funcionamiento de sistemas de diálogo hablado mediante reconocimiento del estado emocional de usuarios. Procesamiento del Lenguaje Natural, 45, 191-198. Retrieved from http://www.sepln.org/ojs/ojs-2.2/index.php/pln/article/view/802/656

Luengo, I., & Navas, E. (2010). Feature analysis and evaluation for automatic emotion identification in speech. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 267-70). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0060.pdf

Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490-501. doi:10.1109/TMM.2010.2051872

Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Reconocimiento automático de emociones utilizando parámetros prosódicos. Procesamiento del Lenguaje Natural, 35, 13-20. Retrieved from http://www.sepln.org/revistaSEPLN/revista/35/02.pdf

Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98-112. doi:10.1016/j.specom.2006.11.004. Retrieved from http://www-ist.massey.ac.nz/rwang/publications/07SC.pdf

Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In Interspeech 2008. Proceedings of the 9th annual conference of the international speech communication association. (pp. 2755-8). Brisbane, Australia. September 22-26, 2008. Retrieved from http://www.speech.kth.se/prod/publications/files/3189.pdf

Nogueiras, A., Moreno, A., Bonafonte, A., & Mariño, J. B. (2001). Speech emotion recognition using hidden Markov models. In Eurospeech 2001 Scandinavia. Proceedings of the 7th European conference on speech communication and technology, 2nd Interspeech event. (pp. 2679-82). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://gps-tsc.upc.es/veu/research/pubs/download/nog_emo_01.pdf

Origlia, A., Galatà, V., & Ludusan, B. (2010). Automatic classification of emotions via global and local prosodic features on a multilingual emotional database. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://aune.lpl.univ-aix.fr/~sprosig/sp2010/papers/100213.pdf

Oudeyer, P. Y. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59, 157-183. doi:10.1016/S1071-5819(02)00141-6. Retrieved from http://pyoudeyer.com/emotionsIJHCS.pdf

Polzehl, T., Schmitt, A., & Metze, F. (2010). Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic/prosodic features for anger recognition. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://aune.lpl.univ-aix.fr/~sprosig/sp2010/papers/100442.pdf

Sidorova, J. (2009). Optimization techniques for speech emotion recognition. PhD Thesis, Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra. Retrieved from http://www.tesisenred.net/TDX-0113110-133822/

Sidorova, J., & Badia, T. (2008). ESEDA: Tool for enhanced speech emotion detection and analysis. Procesamiento del Lenguaje Natural, 41, 307-308. Retrieved from http://www.sepln.org/revistaSEPLN/revista/41/demo9.pdf

ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1-2), 213-225. doi:10.1016/S0167-6393(02)00083-3. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.4047&rep=rep1&type=pdf

Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181. doi:10.1016/j.specom.2006.04.003. Retrieved from http://poseidon.csd.auth.gr/papers/PUBLISHED/JOURNAL/pdf/Ververidis06a.pdf

Womack, B. D., & Hansen, J. H. L. (1996). Classification of speech under stress using target driven features. Speech Communication, 20(1-2), 131-150. doi:10.1016/S0167-6393(96)00049-0

Prosody and emotions

Synthesis of emotional speech

Emotions in spoken language systems

Speech recognition products and applications

Speech recognition products

ALIPRANDI, C. - VERRUSO, F. (2006) "Tecnologie del Linguaggio Naturale e sottotitolazione multilingue diretta. Lo stato dell'arte in Italia e l'esperienza dei Campionati Intersteno", inTRAlinea. Special issue on Respeaking.
http://www.intralinea.it/specials/respeaking/eng_more.php?id=453_0_41_0_M

Barnard, E., Schalkwyk, J., van Heerden, C., & Moreno, P. J. (2010). Voice search for development. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_0282.html

BERTON, A. - KALTENMEIER, A. - HAIBER, U. - SCHREINER, O. (2006) "Speech recognition", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 85-108.

Cardenal, A., Peso, P., Bueno, M., Espiña, A., Rodríguez Silva, D. A., Adkinson, L., & Pellitero, A. (2010). TACOMA: On-Line transcription of audiovisual material. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 239-42). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0053.pdf

CERF-DANON, H.- DeGENNARO, S.- FERRETI, M.- GONZÁLEZ, J.- KEPPEL, E. (1991) "Tangora - a large vocabulary speech recognition system for five languages", in Eurospeech'91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. Vol 1. pp. 183-192.

CHELBA, C. - SILVA, J. - ACERO, A. (2007) "Soft indexing of speech content for search in spoken documents", Computer Speech and Language 21, 3: 458-478.
http://dx.doi.org/10.1016/j.csl.2006.09.001

CÓRDOBA, R.- MACÍAS, J.- SAMA, V.- BARRA, R.- PARDO, J.M. (2005) "New advances in cross-task and speaker adaptation for air traffic control tasks", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 21-28.
http://www-gth.die.upm.es/research/documentation/AI-90New-05.pdf

Delgado, H., Serrano, J., & Carrabina, J. (2010). Automatic metadata extraction from spoken content using speech and speaker recognition techniques. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 201-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0043.pdf

DEMEDTS, A. (1993) "Un sistema de reconocimiento del español con un léxico de 30.000 unidades", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 13: 435-437.

DIÉGUEZ, F.J.- GARCÍA, C.- CARDENAL, A. (2005) "Comparación de modelos de lenguaje para la transcripción automática de noticiarios televisivos", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 269-276.

DUGAST, Ch.- AUBERT, X.- KNESER, R. (1995) "The Philips Large-Vocabulary Recognition System for American English, French and German", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 197-200.

FLETCHER, R. (1997) "First Impressions of ViaVoice, Continuous Dictation Software from IBM", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict1.htm

García Mateo, C., Diéguez, J., Docío, L., & Cardenal, A. (2004). Transcrigal: A bilingual system for automatic indexing of broadcast news. In LREC 2004. Proceedings of the 4th international conference on language resources and evaluation. Lisbon, Portugal. May 24-30, 2004. Retrieved from http://www.gts.tsc.uvigo.es/web/imaxes_user/051104101114_lrec04_transcrigal.pdf

GONZÁLEZ, J.- MACÍAS, J.- PALMA, M.A.- PALOU, F.- TROS DE ILARDUYA, M. (1992) "Tangora/E, un reconocedor del habla para el castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 12.

GRIMES, B. (1997) "Voice Recognition Software: Naturally Speaking from Dragon Systems", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict2.htm

HAEB-UMBACH, R.- GAMM, S. (1995) "Human Factors of a Voice-Controlled Car Stereo", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 2, pp. 1453-1456.

HAUPTMANN, A. (2006) "Automatic spoken document retrieval", in BROWN, K. (Ed.) Encyclopedia of Language & Linguistics. Amsterdam: Elsevier. pp. 95-103.
http://dx.doi.org/10.1016/B0-08-044854-2/00922-6

HAIN, T.- WOODLAND, P.C.- EVERMANN, G.- GALES, M.J.F.- LIU, X.- MOORE, G.L.- POVEY, D. (2005) "Automatic transcription of conversational telephone speech", IEEE Transactions on Speech and Audio Processing 13, 6: 1173-1185.
http://dx.doi.org/10.1109/TSA.2005.852999

HUANG, X.- ALLEVA, F.- HON, H.-W.- HWANG, M.-Y.- LEE, K.-F.- ROSENFELD, R. (1993) "The SPHINX-II speech recognition system: an overview", Computer Speech and Language 7,2: 137-148.

Hughes, T., Nakajima, K., Ha, L., Vasu, A., Moreno, P., & LeBeau, M. (2010). Building transcribed speech corpora quickly and cheaply for many languages. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. (pp. 1914-7). Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_1914.html

HUNT, M.J. (1998) "Practical Automatic Dictation Systems", The ELRA Newsletter 3,1: 4-7

LAMBERT, E. (1991) "La máquina de escribir con entrada vocal", in VIDAL BENEYTO, J. (Dir.) Las industrias de la lengua. Trad. de M. Alvar et al. Salamanca / Madrid: Fundación Sánchez Ruipérez / Pirámide (Biblioteca del Libro, 5). pp. 455-461.

LAMBOURNE, A.- HEWITT, J.- LYON, C.- WARREN, S. (2004) "Speech-based real-time subtitling services", International Journal of Speech Technology 7, 4: 269-279.
http://dx.doi.org/10.1023/B:IJST.0000037071.39044.cc

LEE, K.F. (1989) Automatic Speech Recognition. The Development of the SPHINX System. Dordrecht: Kluwer.

MANDEL, M.A. (1992) "A commercial large-vocabulary discrete speech recognition system: Dragon Dictate", Language and Speech 35, 1-2: 237-246.

MEISEL, W.S. (1986) "Towards the 'Talkwriter'", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 338-348.

Moreno, A. (2010). Information search engine for multilingual audiovisual content: BUCEADOR. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 259-62). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0058.pdf

NÉEL, F.- CHOLLET, G.- LAMEL, L.- MINKER, W.- CONSTANTINESCU, A. (1996) "Reconnaissance et compréhension de la parole: évaluation et applications", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones).

PEA, E. - CANNAROZZO, L. (2006) "Considerazione sull'uso del Via Voice alla RTSI", inTRAlinea. Special issue on Respeaking.
http://www.intralinea.it/specials/respeaking/eng_more.php?id=486_0_41_0_M

POZA LARA, M.J.- VILLARRUBIA GRANDE, L.- SILES SÁNCHEZ, J.A. (1991) "Teoría y aplicaciones del reconocimiento automático del habla", Comunicaciones de Telefónica I+D 3.

Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., . . . Strope, B. (2010). “Your word is my command”: Google search by voice: A case study. In A. Neustein (Ed.), Advances in speech recognition. Mobile environments, call centers and clinics. (pp. 61-90). New York: Springer. doi:10.1007/978-1-4419-5951-5_4. Retrieved from http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36340.pdf

Schmitt, A., Zaykovskiy, D., & Minker, W. (2009). Speech recognition for mobile devices. International Journal of Speech Technology, 11(2), 63-72.

Schuster, M. (2010). Speech recognition for mobile devices at Google. In B. T. Zhang & M. Orgun (Eds.), PRICAI 2010: Trends in artificial intelligence. 11th Pacific Rim international conference on artificial intelligence, Daegu, Korea, August 30–September 2, 2010. Proceedings. (pp. 8-10). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-15246-7_3. Retrieved from

STEINBISS, V.- NEY, H.- ESSEN, U.- TRAN, B.-H.- AUBERT, X.- DUGAST, C.- KNESER, R.- MEIER, H.-G.- OERDER, R.- HAEB-UMBACH, R.- GELLER, D.- HÖLLERBAUER, W.- BARTOSIK, H. (1995) "Continuous speech dictation - From theory to practice", Speech Communication 17, 1-2: 19-38.

TAPIAS MERINO, D. (1999) "Sistemas de reconocimiento de voz en las telecomunicaciones", in GÓMEZ GUINOVART, J.- LORENZO SUÁREZ, A.- PÉREZ GUERRA, J.- ÁLVAREZ LUGRÍS, A. (Eds.) Panorama de la investigación en lingüística informática. RESLA, Revista Española de Lingüística Aplicada, Volumen monográfico. pp. 83-102.

Varona, A., Rodríguez Fuentes, L. J., Penagarikano, M., Nieto, S., Diez, M., & Bordel, G. (2010). Search and access to information contained in the speech of multimedia resources. Procesamiento del Lenguaje Natural, 45, 317-318. Retrieved from http://www.sepln.org/ojs/ojs-2.2/index.php/pln/article/view/831/685

VILLARRUBIA GRANDE, L.- CORTÁZAR MÚGICA, I.- HERNÁNDEZ GÓMEZ, L.- LÓPEZ GONZALO, E. (2001) "Reconocimiento de voz en el entorno de las nuevas redes de comunicación UMTS e Internet", Comunicaciones de Telefónica I+D 23: 99-112.

VIVER, X. (2005) "Philips: Intelligent Speech Interpretation - la tecnología inteligente de reconocimiento de voz", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 459-460.

Zelenák, M., Schulz, H., & Hernando, J. (2010). Albayzín 2010 evaluation campaign: Speaker diarization. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 301-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0068.pdf

Speech recognition evaluation and assessment

Broughton, M. (2002). Measuring the accuracy of commercial automated speech recognition systems during conversational speech. In Virtual conversational characters: Applications, methods, and research challenges. Melbourne, Australia. November 29, 2002. Retrieved from http://www.vhml.org/workshops/HF2002/papers/broughton/broughton.pdf

Burger, S., Sloane, Z. A., & Yang, J. (2006). Competitive evaluation of commercially available speech recognizers in multiple languages. In LREC 2006. Proceedings of the 5th International Conference on Language Resources and Evaluation. Genoa, Italy. May 24-26, 2006. Retrieved from http://pages.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/802_pdf.pdf

Castillo Condado, O. (1999). Evaluación de un reconocedor fonético para el español hablado en México. Tesis de Licenciatura, Universidad de Las Américas, Puebla, México. Retrieved from http://catarina.udlap.mx/u_dl_a/tales/documentos/lis/castillo_c_o/

de Yzaguirre, L. (2000). Evaluación comparativa de dos sistemas comerciales de reconocimiento de voz. In I jornadas en tecnología del habla. Sevilla: Universidad de Sevilla - Universidad de Granada - Red Temática en Tecnologías del Habla. Retrieved from http://latel.upf.edu/terminotica/membres/DE_YZA/PUBLI/eval2srv.pdf

Devine, E. G., Gaehde, S. A., & Curtis, A. C. (2000). Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. Journal of the American Medical Informatics Association, 7(5), 462-468. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC79041/pdf/0070462.pdf

Furui, S. (2007). Speech and speaker recognition evaluation. In L. Dybkjaer, H. Hemsen, & W. Minker (Eds.), Evaluation of text and speech systems. (pp. 1-28). Dordrecht: Springer. doi:10.1007/978-1-4020-5817-2_1

Gibbon, D., Moore, R., & Winski, R. (Eds.). (1998). Assessment of recognition systems. In Spoken language system assessment. (pp. 67-93). Berlin - New York: Mouton de Gruyter.

Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001

Hutchinson, B. (2001). A functional approach to speech recognition evaluation. In Eurospeech 2001 Scandinavia. Proceedings of the 7th European conference on speech communication and technology, 2nd Interspeech Event. (pp. 1683-6). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://perso.telecom-paristech.fr/~chollet/Biblio/Congres/Audio/Eurospeech01/CDROM/papers/page1683.pdf

Lamel, L., Minker, W., & Paroubek, P. (2000). Toward best practice in the development and evaluation of speech recognition components of a spoken language dialogue system. Natural Language Engineering, 6(3-4), 305-322. Retrieved from http://www.limsi.fr/Individu/pap/nle99.ps

Mangold, H. (1989). Assessment of speech recognizers in public information and ordering systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 37-58). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1037.html

Moore, R. K. (1989). Assessment of speech input systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 27-32). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1027.html

Néel, F., Chollet, G., Lamel, L., Minker, W., & Constantinescu, A. (1996). Reconnaissance et compréhension de la parole: Évaluation et applications. In H. Méloni (Ed.), Fondements et perspectives en traitement automatique de la parole. (pp. 331-67). Paris: Éditions AUPELF-UREF.

Pallett, D. S. (1985). Performance assessment of automatic speech recognizers. Journal of Research of the National Bureau of Standards, 90(5), 371-387. Retrieved from http://nvl.nist.gov/pub/nistpubs/jres/090/5/V90-5.pdf#page=41

Pallett, D. S. (1986). Assessing the performance of speech recognisers. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications. (pp. 277-306). London: Collins.

Pallett, D. S. (1989). Speech input assessment using benchmark tests: Procedures, advantages and limitations. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 33-6). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1033.html

Pallett, D. S., & Fourcin, A. (1996). Speech input: Assessment and evaluation. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology. (pp. 495-9). Cambridge: Cambridge University Press. Retrieved from http://cslu.cse.ogi.edu/HLTsurvey/ch13node8.html

Paulus, E. (2000). Some guidelines for the evaluation of approaches to automatic speech recognition. In W. F. Sendlmeier (Ed.), Speech and signals. Aspects of speech synthesis and automatic speech recognition. Dedicated to Wolfgang Hess on his 60th birthday. (pp. 129-39). Frankfurt am Main: Hector.

Serrahima, L. (2009). Reconocimiento de voz de Windows Vista: ¿Mejor, igual o peor que Dragon Naturally Speaking? Panace@, 10(29), 76-79. Retrieved from http://medtrad.org/panacea/IndiceGeneral/n29_tribuna-Serrahima2.pdf

Steeneken, H. J. M., & Varga, A. (1993). Assessment for automatic speech recognition: I. Comparison of assessment methods. Speech Communication, 12(3), 241-246. doi:10.1016/0167-6393(93)90094-2

Yao, X., Bhutada, P., Georgila, K., Sagae, K., Artstein, R., & Traum, D. (2010). Practical evaluation of speech recognisers for virtual human dialogue systems. In LREC 2010. Proceedings of the 7th International Conference on Language Resources and Evaluation. Valletta, Malta. 17-23 May, 2010. Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/675_Paper.pdf

Young, S. J., & Chase, L. L. (1998). Speech recognition evaluation: A review of the U.S. CSR and LVCSR programmes. Computer Speech & Language, 12(4), 263-279. doi:10.1006/csla.1998.0101

Young, S. J., Adda-Decker, M., Aubert, X., Dugast, C., Gauvain, J. L., Kershaw, D. J., . . . Woodland, P. C. (1997). Multilingual large vocabulary speech recognition: The European SQALE project. Computer Speech & Language, 11(1), 73-89. doi:10.1006/csla.1996.0023

Speaker recognition

Speaker Recognition, Speaker Identification and Speaker Verification

ADAMI, A. G. (2007) "Modeling prosodic differences for speaker recognition", Speech Communication 49, 4: 277-291.
http://dx.doi.org/10.1016/j.specom.2007.02.005

ANDRÉ-OBRECHT, R. (Ed.) (2000) Special Issue on Speaker Recognition and its Commercial and Forensic Applications, Speech Communication 31, 2-3.

Beigi, H. (2009). Fundamentals of speaker recognition. New York: Springer.

Preface.- Basic Theory.- Introduction.- Speaker and Vocal Tract Modeling.- Signal Processing and Feature Extraction Techniques.- Data Representation and Probability Distributions.- Information Theory.- Metrics and Distortion Measures.- Bayesian Learning and Gaussian Mixture Modeling.- Parameter Estimation and Learning.- Hidden Markov Modeling (HMM).- Support Vector Machines.- Neural Networks.- Advanced Theory.- Speaker Modeling.- Language Modeling and Dynamic Analysis.- Sub-Optimal Search.- Algorithms.- Practice.- Speaker Recognition.- Overall Design.- Representation of Results.- Extensions.- Language Detection.- Glossary.

BIMBOT, F. - HUTTER, H.P. - JABOULET, C. - KOOLWAAIJ, J. - LINDBERG, J. - PIERROT, J.B. (1998) "An overview of the CAVE project research activities in Speaker Verification", in Proceedings of RLA2C, Speaker Recognition and its Commercial and Forensic Applications. Avignon, France, April 1998. pp. 215-220.

BIMBOT, F.- BLOMBERG, M.- BOVES, L.- CHOLLET, G.- JABOULET, C.- JACOB, B.- KHARROUBI, J.- KOOLWAAIJ, J.- LINDBERG, J.- MARIETHOZ, J.- MOKBEL, C.- MOKBEL, H. (1999) "An overview of the Picasso project research activities in speaker verification for telephone application", in Eurospeech'99, 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary.

BIMBOT, F.- BLOMBERG, M.- BOVES, L.- GENOUD, D.- HUTTER, H.-P. - JABOULET, C.- KOOLWAAIJ, J.- LINDBERG, J.- PIERROT, J.-B. (2000) "An overview of the CAVE project research activities in speaker verification", Speech Communication 31, 2-3: 155-180.

BIMBOT, F.- CHOLLET, G.- PAOLONI, A. (Eds.) (1995) Special Section on Automatic Speaker Recognition, Identification and Verification, Speech Communication 17, 1-2: 81-298.

BIMBOT, F.- HUTTER, H.P.- JABOULET, C. - KOOLWAAIJ, J.- LINDBERG, J. - PIERROT, J.B. (1997) "Speaker Verification in the Telephone Network: Research Activities in the CAVE project", in Eurospeech'97. Proceedings of 5th International Conference on Speech Communication and Technology. Rhodes, Greece, September 1997. pp. 971-974.

BOURLARD, H.- MORGAN, N. (1998) Speaker Verification. A Quick Overview. IDIAP Technical Report, IDIAP-RR 98-12.
ftp://ftp.idiap.ch/pub/reports/1998/98-12.ps.gz

BRICKER, P.D.- PRUZANSKY, S. (1976) "Speaker Recognition", in N.J. LASS (Ed.) Contemporary Issues in Experimental Phonetics. New York: Academic Press. pp. 295-326.

CAMPBELL, J.P.- MASON, J.- ORTEGA-GARCÍA, J. (Eds.) (2006) Odyssey 2004: The Speaker and Language Recognition Workshop. Toledo, Spain. 31 May - 3 June 2004. Computer Speech and Language 20, 2-3.

CAPPÉ, O. (1996) Speaker Recognition Bibliography, Département TSI Signal - Images, École Nationale Supérieure des Télécommunications
http://perso.telecom-paristech.fr/~cappe/docs/spkrec.html

CHOLLET, G. (1994) "Automatic Speech and Speaker Recognition: Overview, Current Issues and Perspectives", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 129-148.

COSI, P. (1982) "Speaker recognition: A survey", in HATON, J.P. (Ed.) Automatic Speech Analysis and Recognition. Dordrecht: Reidel. pp. 277-308.

COST 250 (1996) COST 250 Workshop Proceedings "Application of Speaker Recognition Techniques in Telephony". Vigo, Spain, November 1996.

COST 250 (1998) COST 250 Workshop Proceedings "Speaker Recognition by Man and Machine: Directions for Forensic Applications". Ankara, Turkey, April 1998.

COST 250 (1999) COST 250 Speaker Recognition in Telephony. Final Report 1999. Brussels: European Commission DG XIII Directorate B / Roma: Fondazione Ugo Bordoni. (CD-ROM)

COST 275 (2002) The Advent of Biometrics on the Internet. A COST 275 Workshop. Proceedings. 7-8 November, 2002. Fondazione Ugo Bordoni, Rome, Italy.

COST 275 (2004) Biometrics on the Internet: Fundamentals, Advances and Applications. 2nd COST 275 Workshop. Proceedings. University of Vigo, 25-26 March, 2004. Vigo, Spain.

DANKOVICOVÁ, J.- NOLAN, F. (1999) "Some acoustic effects of speaking style on utterances for automatic speaker verification", Journal of the International Phonetic Association 29, 1: 115-128.

DODDINGTON, G. (1985) "Speaker recognition - identifying people by their voices", Proceedings of the IEEE 73: 1651-1664.

ESCUDERO, D.- CARDEÑOSO, V.- SÁNCHEZ, J.M.- NAVAS, E.- HERNÁEZ, I. (2003) "Uso de entonación en reconocimiento automático del locutor: Resultados preliminares", in SEAF 2003. Actas del II Congreso de la Sociedad Española de Acústica Forense. Barcelona, 10 y 11 de abril de 2003. Barcelona: SEAF, Sociedad Española de Acústica Forense. pp. 167-174.
http://www.infor.uva.es/~descuder/investig/pdfs/SEAF2003.pdf

FAÚNDEZ, M. (1999) "Identificación de locutores sobre la base de datos Telvoice", XIV Simposium Nacional de la Unión Científica Internacional de Radio, URSI'99, Santiago de Compostela.

FAÚNDEZ, M.- SATUÉ, A. (1999) "Identificación de locutor sobre base de datos bilingüe", XIV Simposium Nacional de la Unión Científica Internacional de Radio, URSI'99, Santiago de Compostela.

FERNÁNDEZ POZO, R. - FOMBELLA MOURELLE, C. - TORRE TOLEDANO, D. - LÓPEZ GONZALO, E. - HERNÁNDEZ GÓMEZ, L. (2006) "Estudio del uso de información prosódica en reconocimiento de locutor en ámbito forense", in IV Jornadas en Tecnologías del Habla. Zaragoza, del 8 al 10 de noviembre de 2006. Zaragoza: Universidad de Zaragoza. pp. 343-348.
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth.pdf

FURUI, S. (1986) "Research on individuality features of the speech waves and automatic speaker recognition techniques", Speech Communication 5, 2: 183-197.

FURUI, S. (1996) "An overview of speaker recognition technology", in LEE, C.-H. - SOONG, F. K.- PALIWAL, K.K. (Eds.) Automatic Speech and Speaker Recognition. Dordrecht: Kluwer Academic Publishers. pp. 31-56.

FURUI, S. (1997) "Speaker Recognition", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 42-48.
http://speech.bme.ogi.edu/HLTsurvey/ch1node9.html#SECTION17

GARVIN, P.L.- LADEFOGED, P. (1963) "Speaker identification and message identification in speech recognition", Phonetica 9: 193-199.

HERNÁNDEZ, L.A.- CASAJÚS, F.J.- GARCÍA GÓMEZ, R. (1984) "Identificación de personas por sus voces", Mundo electrónico 146: 83-91.

HERNANDO, J.- GARCÍA, C.- RODRÍGUEZ, L.- GONZÁLEZ, J.- ORTEGA, J. (2000) "Reconocimiento del locutor en telefonía: actividades del proyecto europeo COST250", in ORTEGA GARCÍA, J. (Ed.) SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense. Universidad Politécnica de Madrid, Escuela Universitaria de Ingeniería Técnica de Telecomunicación, Madrid, 5-6 de octubre de 2000. Madrid: EUIT de Telecomunicación. pp. 145-148.

LAVER, J.- JACK, M.- GARDINER, A. (Eds.) (1990) Proceedings of the tutorial and research workshop on Speaker Characterization in Speech Technology. Edinburgh, 26-28 June 1990. Edinburgh: Centre for Speech Technology Research, University of Edinburgh - ESCA, European Speech Communication Association.

LEUNG, K.Y.- MAK, M.W.- SIU, M.H.- YUNG, S.Y. (2006) "Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification", Speech Communication 48, 1: 71-84.
http://dx.doi.org/10.1016/j.specom.2005.05.013

LINDBERG, J.- BLOMBERG, M.- MELIN, H. (1997) "CAVE - Speaker verification in bank and telecom services", Phonum 4 (Fonetik 97, Umeå University, Sweden, May 28-30, 1997): 65-68.

MINEMATSU, N. - SEKIGUCHI, M. - HIROSE, K. (2002) "Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers", in ICASSP 2002. Proceedings of the 2002 IEEE international conference on acoustics, speech and signal processing. 13 – 17 May, 2002. Orlando, Florida, USA. Vol 1, pp. 137-140.

MINEMATSU, N. - SEKIGUCHI, M. - HIROSE, K. (2002) "Performance Improvement in Estimating Subjective Ageness with Prosodic Features", in Speech Prosody 2002. An International Conference. Aix-en-Provence, France, 11-13 April 2002.
http://aune.lpl.univ-aix.fr/sp2002/pdf/minematsu-etal.pdf

NOLAN, F.- SCHERER, K. (2000) "Speaker verification with elicited speaking styles in the VeriVox project", Speech Communication 31, 2-3: 121-130.

ORTEGA GARCÍA, J.- CRUZ LLAMAS, S.- GONZÁLEZ RODRÍGUEZ, J. (1998) "Quantitative influence of speech variability factors for automatic speaker verification in forensic tasks", in ICSLP 98 Conference Proceedings CD-ROM. The 5th International Conference on Spoken Language Processing. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998. Rundle Mall: Causal Productions, 1998.

ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J. - MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "AHUMADA: A Large Speech Corpus in Spanish for Speaker Identification and Verification", in Proceedings of ICASSP-98. IEEE International Conference on Acoustics Speech and Signal Processing. May 1998. pp. 773-776.

ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J.- MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "Speaker recognition-oriented 'Ahumada' large speech corpus", in RUBIO, A.- GALLARDO, N.- CASTRO, R.- TEJADA, A. (Eds.) Proceedings of the First International Conference on Language Resources and Evaluation. May 28 - 30, 1998, Granada, Spain. European Language Resources Association. Vol. II. pp. 1101 - 1106.

ORTEGA, J.- GONZÁLEZ, J.- TAPIAS, D. (2000) "Consistencia fonética del español en sistemas de verificación de locutor sobre locuciones de corta duración tipo PIN", in ORTEGA GARCÍA, J. (Ed.) SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense. Universidad Politécnica de Madrid, Escuela Universitaria de Ingeniería Técnica de Telecomunicación, Madrid, 5-6 de octubre de 2000. Madrid: EUIT de Telecomunicación. pp. 199-206.

RODRÍGUEZ, L.- DOCÍO, L.- GARCÍA, C. (1998) "Panorámica de la tecnología en reconocimiento automático de locutores", in GÓMEZ GUINOVART, J.- PALOMAR, M. (Coords.) (1998) Monografía: Lengua y Tecnologías de la Información. Novática, Revista de la Asociación de Técnicos de Informática 133 (Mayo-Junio): 36-40.

ROSE, P. (2006) "Technical forensic speaker recognition: Evaluation, types and testing of evidence", Computer Speech and Language 20, 2-3: 159-191.
http://dx.doi.org/10.1016/j.csl.2005.07.003

ROSENBERG, A.E. (1976) "Automatic speaker verification: a review", Proceedings of the IEEE 64, 4: 475-486.

SATUÉ, A.- FAÚNDEZ, M. (1999) "On the relevance of language in speaker recognition", Eurospeech'99, 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary.
http://www.isca-speech.org/archive/eurospeech_1999/e99_1231.html

SHRIBERG, E.- FERRER, L.- KAJAREKAR, S.- VENKATARAMAN, A.- STOLCKE, A. (2005) "Modeling prosodic feature sequences for speaker recognition", Speech Communication 46: 455-472.
http://dx.doi.org/10.1016/j.specom.2005.02.018

SUTHERLAND, A.- JACK, M. (1988) "Speaker Verification", in JACK, M.- LAVER, J. (Eds.) Aspects of Speech Technology. Edinburgh: Edinburgh University Press. pp. 184-215.

Language identification

ADDA-DECKER, M.- ANTOINE, F.- BOULA DE MAREÜIL, Ph.- VASILESCU, I.- LAMEL, L.- VAISSIÈRE, J.- GEOFFROIS, E.- LIÉNARD, J.-S. (2003) "Phonetic Knowledge, Phonotactics and Perceptual Validation for Automatic Language Identification", in Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, 3-9 August 2003. pp. 747-750.
http://www.limsi.fr/Individu/mareuil/publi/PS021162.pdf

ANTOINE, F.- ZHU, D.- BOULA DE MAREÜIL, P.- ADDA-DECKER, M. (2004) "Approches segmentales multilingues pour l'identification automatique de la langue : phones et syllabes", in JEP 2004. Journées d'Etude sur la Parole 2004. 19-22 avril 2004. Fès, Maroc.
http://www.limsi.fr/Individu/mareuil/publi/Antoine-Zhu-etal.pdf

BARKAT-DEFRADAS, M.- VASILESCU, I.- PELLEGRINO, F. (2003) "Stratégies perceptuelles et identification automatique des langues: application au continuum dialectal arabe", Revue PArole (Mons) 25-26: 1-44.

BARTKOVA, K.- JOUVET, D. (2004) "Ensemble élargi de phonèmes pour la reconnaissance de parole avec accents", in MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France. pp. 77-78.
http://www.limsi.fr/MIDL/actes/session%20III/Bartkova&Jouvet_MIDL2004.pdf

GEOFFROIS, E. (2004) "Identification automatique des langues: techniques, ressources et évaluations", in MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France. pp. 43-44.
http://www.limsi.fr/MIDL/actes/conference%20invitee%20I/Geoffrois_MIDL2004.pdf

MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France.
http://www.limsi.fr/MIDL/actes/

MUTHUSAMY, Y.K.- SPITZ, L. (1997) "Automatic Language Identification", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 314-317.
http://cslu.cse.ogi.edu/HLTsurvey/ch8node9.html

MUTHUSAMY, Y.K.- BARNARD, E.- COLE, R.A. (1994) "Reviewing Automatic Language Identification", IEEE Signal Processing Magazine, October 1994: 33-41.

Rodríguez Fuentes, L. J., Penagarikano, M., Varona, A., Díez, M., & Bordel, G. (2010). Overview of the Albayzín 2010 language recognition evaluation: Database design, evaluation plan and preliminary analysis of the results. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 309-15). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0070.pdf

ZISSMAN, M.A. - BERKLING, K.M. (2001) "Automatic Language Identification", Speech Communication 35, 1-2: 115-124.

Spoken language understanding

GUPTA, N. - TUR, G. - HAKKANI-TÜR, D. - BANGALORE, S. - RICCARDI, G. - GILBERT, M. (2006) "The AT&T Spoken Language Understanding System", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 213-222.
http://dx.doi.org/10.1109/TSA.2005.854085

KOMPE, R. (1997) Prosody in Speech Understanding Systems. Berlin - New York: Springer (Lecture Notes in Artificial Intelligence, Vol. 1307. Subseries of Lecture Notes in Computer Science).

MINKER, W. (1999) Compréhension automatique de la parole spontanée. Paris: L'Harmattan.

PRICE, P. (1997) "Spoken Language Understanding", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 49-56.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node10.html


SEGARRA, E. (2006) "La interpretación semántica", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 99-118.

Tur, G., & de Mori, R. (Eds.). (2011). Spoken language understanding: Systems for extracting semantic information from speech. Oxford - New York: John Wiley & Sons.

WANG, Y.-Y.- DENG, L.- ACERO, A. (2005) "Spoken language understanding", IEEE Signal Processing Magazine 22, 5: 16-31.
http://dx.doi.org/10.1109/MSP.2005.1511821

ZUE, V.W. (1991) "From signals to symbols to meaning: On machine understanding of spoken language", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. vol 1. pp. 74-83.

Speech Recognition - Bibliography
Joaquim Llisterri, Universitat Autònoma de Barcelona
http://liceu.uab.cat/~joaquim/speech_technology/tecnol_parla/recognition/refs_reconeixement.html
Last modified: 14/11/11 18:04

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.