Speech Recognition
Bibliography
General overviews
Textbooks
Compilations and conference proceedings
Speech recognition techniques
Statistical modelling
Connected and continuous speech
Large vocabularies
Phonetic and linguistic knowledge in speech recognition
Phonetic knowledge
Prosodic knowledge
Phonological knowledge
Recognition of emotional speech
Speech recognition products and applications
Speech recognition evaluation and assessment
Speaker recognition
Language identification
Spoken language understanding
Suggested reading
Anusuya, M. A., & Katti, S. K. (2009). Speech recognition by machine: A review. International Journal of Computer Science and Information Security, 6(3), 181-205. Retrieved from http://arxiv.org/pdf/1001.2267
BAKER, J.M. (1987) "State-of-the-Art Speech Recognition, US Research and Business Update", in LAVER, J.- JACK, M.A. (Eds.) (1987) European Conference on Speech Technology. Edinburgh, September 1987. Edinburgh: CEP Consultants Ltd. pp.440-446.
BERNSTEIN, J.- FRANCO, H. (1996) "Speech recognition by computer", in LASS, N.J. (Ed.) Principles of Experimental Phonetics. St Louis: Mosby. pp. 408-434.
BRISTOW, G. (1986) "The Speech Recognition Problem" in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 3-17.
CASACUBERTA, F.- VIDAL, E. (1987) "Reconocimiento automático del habla: metodologías y arquitecturas", in MOMPÍN, J. (Dir.) Inteligencia artificial. Conceptos, técnicas y aplicaciones. Barcelona: Marcombo - Boixareu Editores. pp. 167-177.
CASACUBERTA, F.- VIDAL, E. (1990) "Reconocimiento automático del habla", Estudios de Fonética Experimental 4: 169-180.
CASACUBERTA NOLLA, F. (1991) "Aprendizaje automático en reconocimiento del habla", in Simposio de la Lengua Española. Ciencia y Tecnología. Pabellón de España, Barcelona, 7-11 October 1991.
COLE, R.- ZUE, V. (Eds.) (1997) "Spoken Language Input", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 1-70.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node2.html#Chapter1
CHOLLET, G. (1994) "Automatic Speech and Speaker Recognition: Overview, Current Issues and Perspectives", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 129-148.
DEROO, O. (1999) A Short Introduction to Speech Recognition. TCTS Lab, Faculté Polytechnique de Mons.
http://tcts.fpms.ac.be/asr/intro.php
ELPHICK, M. (1984) "Speech Recognition" in BRISTOW, G. (Ed.) Electronic Speech Synthesis. Techniques, Technology and Applications. London: Granada. pp. 114-128.
FURUI, S. (1991) "Recent advances in speech recognition", in Eurospeech 91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. Vol. 1, pp. 3-12.
GRABIANOWSKI, E. How Speech Recognition Works. HowStuffWorks.
http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
García Mateo, C. & Cardenal, A. (2008). Recoñecemento automático da fala: Ideas básicas e algúns exemplos. In E. Fernández Rei & X. L. Regueira (Eds.), Perspectivas sobre a oralidade. (pp. 249-72). Santiago de Compostela: Consello da Cultura Galega - Instituto da Lingua Galega.
http://consellodacultura.org/mediateca/extras/simposio_oralidade.pdf
GAUVAIN, J.L. - LAMEL, L.F. (2002) "Systèmes de reconnaissance, de compréhension et de dialogue", in MARIANI, J. (Ed.) Reconnaissance de la parole. Traitement automatique du langage parlé 2. Paris: Hermes Science - Lavoisier. Vol. 2, pp. 47-83.
GOLDEROS, A.- MARTÍNEZ, R.- NOMBELA, J.R.- PARDO, M.- SANTOS, J.- MUÑOZ, E. (1980) "Comunicación hombre máquina por voz (y IV): El reconocimiento de la voz", Mundo electrónico 99: 131-134.
KLATT, D. H. (1983) "Human and Automatic Speech Recognition" in BROECKE, M.P.R. van den - COHEN, A. (Eds.) Proceedings of the Tenth International Congress of Phonetic Sciences. Dordrecht: Foris. pp. 183-186.
KURZWEIL, R. (1998) "When Will HAL Understand What We Are Saying? Computer Speech Recognition and Understanding", in STORK, D.G. (Ed.) HAL's Legacy: 2001's Computer as Dream and Reality. Cambridge, Mass.: The MIT Press.
http://mitpress.mit.edu/e-books/Hal/chap7/seven1.html
LAMEL, L.- GAUVAIN, J.L. (2003) "Speech recognition", in MITKOV, R. (Ed.) The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press.
La reconnaissance vocale. Délégation Générale à la Langue Française, Réseau international des observatoires francophones de l'inforoute et du traitement informatique des langues.
http://www.culture.gouv.fr/culture/dglf/riofil/recon-vocal.htm
LEA, W.A. (1974) "Computer Recognition of Speech", in T.A. SEBEOK (Ed.) Current Trends in Linguistics, vol. 12, Linguistics and Adjacent Arts and Sciences, vol. 4. The Hague: Mouton. pp. 2765-2824.
LEA, W.A. (1986) "The Elements of Speech Recognition", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 49-129.
LEVINSON, S.E.- LIBERMAN, M.Y. (1981) "Speech Recognition by Computer", Scientific American 244: 64-76; Spanish translation by R. Cerdà: "Reconocimiento del habla por medio de ordenadores", Investigación y Ciencia, 1981. pp. 38-51; in AGULLÓ, J. (Ed.) (1989) Acústica musical. Barcelona: Prensa Científica (Libros de Investigación y Ciencia). pp. 106-121.
MARIÑO, J.B.- NADEU, C. (2004) "La representación de la voz para el reconocimiento del habla", in MARTÍ, M. A. - LLISTERRI, J. (Eds.) Tecnologías del texto y del habla. Barcelona: Edicions de la Universitat de Barcelona - Fundación Duques de Soria (UB, 72). pp. 187-224.
MOORE, R. (1984) "Overview of speech input", in HOLMES, J.N. (Ed.) (1984) Proceedings of the First International Conference on Speech Technology. Brighton, 23-25 October 1984. Amsterdam: North Holland. pp. 25-38.
NADEU, C. (2001) "Representación de la voz en el reconocimiento del habla", Quark. Ciencia, Medicina, Comunicación y Cultura 21: 63-71.
http://quark.prbb.org/21/021063.htm
PARDO, J.M. (1988) "Reconocimiento del habla: una introducción", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 6: 3-16.
PECKHAM, J.B. (1984) "Speech recognition - What is it worth?", in HOLMES, J.N. (Ed.) (1984) Proceedings of the First International Conference on Speech Technology. Brighton, 23-25 October 1984. Amsterdam: North Holland. pp. 39-48.
REDDY, R.D. (1976) "Speech Recognition by Machine: A Review", Proceedings of the IEEE 64, 4: 501-531.
Renals, S. & King, S. (2010). Automatic speech recognition. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed.). (pp. 804-38). Oxford: Wiley-Blackwell.
ROACH, P.- MILLER, D.- EMSLIE, J. (1992) "Speech Analysis and Recognition", in ROACH, P. (Ed.) Computing in Linguistics and Phonetics. London: Academic Press. pp. 35-50.
RUBIO, A.J.- BENÍTEZ, M.C. Sistemas de reconocimiento del habla. Doctoral programme: Tratamiento Digital de la Imagen y la Voz, Universidad de Granada.
http://ceres.ugr.es/~rubio/docencia/doctorado/RAH_archivos/frame.htm
SOPEÑA, L. (1993) "Conversando con el ordenador. Reconocimiento automático del habla", Investigación y Ciencia, May: 76-78.
TAPIAS MERINO, D. (2002) "Interfaces de voz con lenguaje natural", in MARTÍ, M.A.- LLISTERRI, J. (Eds.) (2002) Tratamiento del lenguaje natural. Tecnología de la lengua oral y escrita. Barcelona: Edicions Universitat de Barcelona - Fundación Duques de Soria (Biblioteca de la Universitat de Barcelona, Manuales, 53). pp. 189-207.
TORRES, M. I. (2006) "El reconocimiento del habla", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 81-98.
VAISSIÈRE, J. (1985) "Speech Recognition: A Tutorial", in FALLSIDE, F.- WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 191-242.
VIGLIONE, S. (1986) "Recognition Past and Future", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 373-387.
ZUE, V.- COLE, R.- WARD, W. (1997) "Speech Recognition", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 4-10.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html
CASACUBERTA, F.- VIDAL, E. With the collaboration of J.M. BENEDÍ and J. MARTÍ. (1987) Reconocimiento automático del habla. Barcelona: Marcombo - Boixareu Editores (Premios Mundo Electrónico).
CATER, J.P. (1984) Electronically Hearing: Computer Speech Recognition. Indianapolis: Howard W Sams & Co., Inc.
HATON, J.P.- CERISARA, C.- FOHR, D.- LAPRIE, Y.- SMAÏLI, K. (2006) Reconnaissance automatique de la parole. Du signal à son interprétation. Paris: Dunod (UniverSciences).
http://parole.loria.fr/livreParole/
1.- Introduction à la reconnaissance automatique de la parole; 2.- La communication parlée; 3.- Analyse du signal vocal; 4.- Modèles acoustiques pour la reconnaissance automatique de la parole; 5.- Techniques avancées; 6.- La modélisation statistique du langage: application à la reconnaissance de la parole; 7.- La compréhension automatique de la parole; 8.- Robustesse de la reconnaissance de la parole; 9.- Mise en oeuvre d'un système; 10.- Un cadre articulatoire pour la reconnaissance automatique de la parole; 11.- Applications de la reconnaissance automatique de la parole.
HATON, J.-P.- PIERREL, J.-M. - PERENNOU, G.- CAELEN, J.- GAUVAIN, J.-L. (1991) La reconnaissance automatique de la parole. Paris: Dunod.
HOLMES, J.N. (1988) Speech Synthesis and Recognition. Wokingham: Van Nostrand Reinhold (Aspects of Information Technology).
HOLMES, J.N.- HOLMES, W. (2001) Speech Synthesis and Recognition. London: Taylor & Francis, 2nd edition.
JELINEK, F. (1998) Statistical Methods for Speech Recognition. Cambridge: The MIT Press (Language, Speech and Communication Series).
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=7447
LLAMAS BELLO, C.- CARDEÑOSO PAYO, V. (1997) Reconocimiento automático del habla. Técnicas y aplicación. Valladolid: Secretariado de Publicaciones e Intercambio Científico, Universidad de Valladolid (Ciencias, 16).
O'SHAUGHNESSY, D. (1987) Speech Communication. Human and Machine. Reading, Mass.: Addison Wesley. Second Edition: IEEE Press, 2000.
POULTON, A.S. (1983) Microcomputer Speech Synthesis and Recognition. Wilmslow: Sigma Technical Press.
RABINER, L.- JUANG, B.-H. (1993) Fundamentals of Speech Recognition. New York: Prentice Hall.
HATON, J.P. (Ed.) (1982) Automatic Speech Analysis and Recognition. Proceedings of the NATO Advanced Study Institute held at Bonas, France, June 29 - July 10, 1981. Dordrecht: Reidel (NATO Advanced Study Institute Series, Series C, Vol 88).
HOUSE, A.S. (1988) The Recognition of Speech by Machine: A Bibliography. New York: Academic Press.
KELLER, E. (Ed.) (1994) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons.
LAFACE, P. (Ed.) (1990) Speech Recognition and Understanding: Recent Advances, Trends and Applications. Springer-Verlag. (NATO ASI Series).
LEA, W.A. (Ed.) (1980) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall (Prentice Hall Signal Processing Series).
Proceedings of the ESCA ETRW Workshop "Accessing information in spoken audio". 19th and 20th April 1999, Cambridge, UK. ESCA, European Speech Communication Association.
REDDY, R.D. (Ed.) (1975) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press.
RUBIO, A.J. - LÓPEZ SOLER, J.M. (Eds.) Speech Recognition and Coding: New Advances and Trends. Springer-Verlag (NATO ASI series F. Computer and Systems Sciences).
SCHROEDER, M. R. (Ed.) (1985) Speech and Speaker Recognition. Basel: Karger (Bibliotheca Phonetica, 12).
SCHWAB, E.E.- NUSBAUM, H. (Eds.) (1986) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc.
SUEN, C.Y.- DE MORI, R. (Eds.) (1982) Computer Analysis and Perception. Vol II: Auditory Signals. Boca Raton, FL: CRC Press.
TORRES, L.- MASGRAU, E.- LAGUNAS, M.A. (Eds.) (1990) Signal Processing V: Theories and Applications. Elsevier Science Publishers.
WAIBEL, A.- LEE, K.F. (Eds.) (1990) Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann.
Speech technologies: conference proceedings
WOSZCZYNA, M. (2001) "Técnicas de reconocimiento del habla: entre la precisión y la velocidad", Quark. Ciencia, Medicina, Comunicación y Cultura 21: 72-78.
http://quark.prbb.org/21/021062.htm
COX, S.J. (1990) "Hidden Markov Models for Automatic Speech Recognition: Theory and Application", in WHEDDON, C.- LINGGARD, R. (Eds.) Speech and Language Processing. London: Chapman and Hall. pp. 209-230.
HOLMES, W.- HUCKVALE, M. (1994) "Why have HMMs been so successful for automatic speech recognition and how might they be improved?", Speech, Hearing and Language, Work in Progress, 1994 (University College London, Department of Phonetics and Linguistics) 8: 207-219.
HUANG, X.- ARIKI, Y.- JACK, M. (1990) Hidden Markov Models for Speech Recognition. Edinburgh: Edinburgh University Press.
JELINEK, F. (1997) Statistical Methods for Speech Recognition. Cambridge: The MIT Press (Language, Speech and Communication Series).
JOUVET, D. (1996) "Modèles de Markov pour la reconnaissance de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 225-238.
KNILL, K.- YOUNG, S. (1997) "Hidden Markov Models in Speech and Language Processing", in YOUNG, S.- BLOOTHOOFT, G. (Eds.) Corpus-Based Methods in Language and Speech Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 2). pp. 27-68.
MARIÑO, J.B.- NOGUEIRAS, A.- PACHÈS-LEAL, P.- BONAFONTE, A. (2000) "The demiphone: An efficient contextual subword unit for continuous speech recognition", Speech Communication 32, 3: 187-198.
SAN-SEGUNDO, R.- COLÁS, J.- DE CÓRDOBA, R.- PARDO, J.M. (2002) "Spanish recognizer of continuously spelled names over the telephone", Speech Communication 38, 3-4: 287-304.
LAMEL, L.- ADDA, M.- ADDA, G.- GAUVAIN, J.L. (1996) "Reconnaissance multilingue de grands vocabulaires", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 299-310.
LAMEL, L.- ADDA-DECKER, M.- GAUVAIN, J.L. (1995) "Issues in Large Vocabulary Multilingual Speech Recognition", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 185-188.
WAIBEL, A. (1986) "Suprasegmentals in very large vocabulary word recognition", in SCHWAB, E.E.- NUSBAUM, H. (Eds.) (1986) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc. pp. 159-186.
YOUNG, S.J.- ADDA-DECKER, M.- AUBERT, X.- DUGAST, C.- GAUVAIN, J.L.- KERSHAW, D.J.- LAMEL, L.- LEEUWEN, D.A.- PYE, D.- ROBINSON, A.J.- STEENEKEN, H.J.M. - WOODLAND, P.C. (1997) "Multilingual large vocabulary speech recognition: The European SQALE project", Computer Speech and Language 11, 1: 73-89.
Phonetic knowledge in speech technology
ADDA-DECKER, M.- LAMEL, L. (1998) "Pronunciation variants across systems, languages and speaking style", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 1-6.
ADDA-DECKER, M.- LAMEL, L. (2000) "The use of lexica in automatic speech recognition", in VAN EYNDE, F.- GIBBON, D. (Eds.) Lexicon Development for Speech and Language Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 12). pp. 235-266.
AINSWORTH, W.A. (2005) "Can phonetic knowledge be used to improve the performance of speech recognisers and synthesisers?", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 13-20.
Aubanel, V., & Nguyen, N. (2010). Automatic recognition of regional phonological variation in conversational interaction. Speech Communication, In Press, Accepted Manuscript. doi:10.1016/j.specom.2010.02.008
BATES, R. A. - OSTENDORF, M. - WRIGHT, R. A. (2007) "Symbolic phonetic features for modeling of pronunciation variation", Speech Communication 49, 2: 83-97.
http://dx.doi.org/10.1016/j.specom.2006.10.007
BECKER, R.W.- POZA, F. (1975) "Acoustic Phonetic Research in Speech Understanding", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23, 5: 416-426.
BENZEGHIBA, M. - DE MORI, R. - DEROO, O. - DUPONT, S. - ERBES, T. - JOUVET, D. - FISSORE, L. - LAFACE, P. - MERTINS, A. - RIS, C. - ROSE, R. - TYAGI, V. - WELLEKENS, C. (2007) "Automatic speech recognition and speech variability: A review", Speech Communication 49, 10-11: 763-786.
http://dx.doi.org/10.1016/j.specom.2007.02.006
BLADON, A. (1985) "Acoustic Phonetics, Auditory Phonetics, Speaker Sex and Speech Recognition: A Thread", in FALLSIDE, F.- WOODS, W.A. (Eds.) (1985) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 29-38.
BROAD, D.J.- SHOUP, J.E. (1975) "Concepts for Acoustic Phonetic Recognition", in REDDY, R.D. (Ed.) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press. pp. 243-274.
Caballero, M., Moreno, A., & Nogueiras, A. (2009). Multidialectal Spanish acoustic modeling for speech recognition. Speech Communication, 51(3), 217-229. doi:10.1016/j.specom.2008.08.003
CHRISTENSEN, H.- LINDGREN, B.- ANDERSEN, O. (2005) "Introducing phonetically motivated, heterogeneous information into automatic speech recognition", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 67-86.
DUSAN, S.- RABINER, L.R. (2005) "On integrating insights from human speech perception into automatic speech recognition", in EUROSPEECH 2005 - INTERSPEECH 2005. Proceedings of the 9th European Conference on Speech Communication and Technology. 4-8 September, 2005. Lisbon, Portugal. pp. 1233-1236.
http://cronos.rutgers.edu/~lrr/Reprints/353_dr_euro2005c.pdf
FERREIROS, J.- MACÍAS GUARASA, J.- PARDO, J.M.- VILLARRUBIA, L. (1998) "Introducing multiple pronunciations in Spanish speech recognition systems", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 29-34.
FOSLER-LUSSIER, E.- GREENBERG, S.- MORGAN, N. (1999) "Incorporating contextual phonetics into automatic speech recognition", in OHALA, J.J.- HASEGAWA, Y.- OHALA, M.- GRANVILLE, D.- BAILEY, A.C. (Eds.) Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco, 1-7 August 1999.
http://www.icsi.berkeley.edu/~fosler/papers/ICPhS99-invited.pdf
FOSLER-LUSSIER, E.- BYRNE, W.- JURAFSKY, D. (Eds.) (2005) Pronunciation Modeling and Lexicon Adaptation. Special Issue. Speech Communication 46, 2.
Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001
GRAVIER, G.- YVON, F.- JACOB, B.- BIMBOT, F. (2005) "Introducing contextual transcription rules in large vocabulary speech recognition", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 87-106.
GREENBERG, S. (1998) "Recognition in a new key - Towards a science of spoken language", in ICASSP 1998. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. 12-15 May, 1998. Seattle, Washington, USA. pp. 1401-1405.
http://www.icsi.berkeley.edu/~steveng/PDF/Recognition_in_a_New_Key.pdf
HAIN, T. (2005) "Implicit modelling of pronunciation variation in automatic speech recognition", Speech Communication 46, 2: 171-188.
http://dx.doi.org/10.1016/j.specom.2005.03.008
HARRINGTON, J. (1988) "Acoustic Cues for Automatic Recognition of English Consonants", in JACK, M.- LAVER, J. (Eds.) Aspects of Speech Technology. Edinburgh: Edinburgh University Press. pp. 69-143.
KLATT, D. H. (1985) "The problem of variability in speech recognition and in models of speech perception", in J.A. PERKELL - D.H. KLATT (Eds.) Variability and Invariance in Speech Processes. Hillsdale, N.J.: Lawrence Erlbaum Ass. pp. 300-324.
KOREMAN, J.- ANDREEVA, B. (2000) "Can we use the linguistic information in the signal?", Phonus (Institute of Phonetics, University of the Saarland) 5: 47-58.
http://www.coli.uni-saarland.de/groups/WB/Phonetics/Research/PHONUS_research_reports/Phonus5/Koreman_PHONUS5.pdf
DENG, L.- YU, D.- ACERO, A. (2006) "A bidirectional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 256-265.
http://dx.doi.org/10.1109/TSA.2005.854107
NOLAN, F. (1986) "The nature of speech", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 18-48.
OSTENDORF, M. (2000) "Incorporating linguistic theories of pronunciation variation into speech recognition models", in SPARCK JONES, K.- GAZDAR, G.- NEEDHAM, R. (Eds.) Computers, language and speech: Formal theories and statistical Data. Papers from a Royal Society / British Academy Discussion Meeting, September 1999. London: The Royal Society (Philosophical Transactions of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769).
PASTOR, M.- CASACUBERTA, F. (2005) "Pronunciation modeling", in BARRY, W.J.- van DOMMELEN, W.A. (Eds.) The Integration of Phonetic Knowledge in Speech Technology. Dordrecht: Springer (Text, Speech and Language Technology, 25). pp. 133-148.
POLS, L.C.W. (1997) "Flexible, robust, and efficient human speech recognition", Proceedings of the Institute of Phonetic Sciences, University of Amsterdam 21: 1-10.
http://www.fon.hum.uva.nl/Proceedings/Proceedings21/LouisPols/LouisPols-Contents.html
PRUTHI, T.- ESPY-WILSON, C.Y. (2004) "Acoustic parameters for the automatic detection of nasal manner", Speech Communication 43, 3: 241-266.
http://dx.doi.org/10.1016/j.specom.2004.06.001
SCHARENBORG, O. - WAN, V. - MOORE, R. K. (2007) "Towards capturing fine phonetic variation in speech using articulatory features", Speech Communication 49, 10-11: 811-826.
http://dx.doi.org/10.1016/j.specom.2007.01.005
SCHRAMM, H. - AUBERT, X.- BAKKER, B.- MEYER, C.- NEY, H. (2006) "Modeling spontaneous speech variability in professional dictation", Speech Communication 48, 5: 493-515.
http://dx.doi.org/10.1016/j.specom.2005.08.003
SROKA, J.J.- BRAIDA, L.D. (2005) "Human and machine consonant recognition", Speech Communication 45, 4: 401-423.
http://dx.doi.org/10.1016/j.specom.2004.11.009
STRIK, H.- CUCCHIARINI, C. (1998) "Modeling pronunciation variation for ASR: overview and comparison of methods", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 137-144.
http://lands.let.kun.nl/TSpublic/strik/publications/a47.html
STRIK, H.- CUCCHIARINI, C. (1999) "Modeling pronunciation variation for ASR: A survey of the literature", in STRIK, H. (Ed.) Special Issue on Modeling Pronunciation Variation for Automatic Speech Recognition. Speech Communication 29, 2-4: 225-246.
STRIK, H. (Ed.) Special Issue on Modeling Pronunciation Variation for Automatic Speech Recognition. Speech Communication 29, 2-4.
STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) (1998) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. ESCA, European Speech Communication Association; COST Action 249, Continuous Speech over the Telephone; A2RT, Automatic Acoustic Recognition Technologie.
SUOMI, K. (1987) "On spectral coarticulation in stop-vowel-stop syllables: implications for automatic speech recognition", Journal of Phonetics 15, 1: 85-100.
URAGA, E.- PINEDA, L. (2002) "Automatic Generation of Pronunciation Lexicons for Spanish", in GELBUKH, A. (Ed.) Computational Linguistics and Intelligent Text Processing. Proceedings of the Third International Conference, CICLing 2002. Mexico City, México, February 17-23, 2002. Heidelberg: Springer Verlag (Lecture Notes in Computer Science, 2276). pp. 330-338.
http://springerlink.metapress.com/content/crl4dnrde4jkhlp4/
ZOLNAY, A. - KOCHAROV, D. - SCHLÜTER, R. - NEY, H. (2007) "Using multiple acoustic feature sets for speech recognition", Speech Communication 49, 6: 514-525.
http://dx.doi.org/10.1016/j.specom.2007.04.005
ZUE, V.W. (1983) "The use of phonetic rules in automatic speech recognition", Speech Communication 2, 2-3: 181-186.
ZUE, V.W. (1985) "The Use of Speech Knowledge in Automatic Speech Recognition", Proceedings of the IEEE 73, 11: 1602-1615.
ZUE, V.W. - SCHWARTZ, R.M. (1980) "Acoustic Processing and Phonetic Analysis", in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall (Prentice Hall Signal Processing Series). pp. 101-124.
Connolly, J. H., Edmonds, E. A., Guzy, J. J., Johnson, S. R., & Woodcock, A. (1986). Automatic speech recognition based on spectrogram reading. International Journal of Man-Machine Studies, 24(6), 611-621. doi:10.1016/S0020-7373(86)80012-8
Gabrys, G. (1990). Difficulty in learning to read speech spectrograms: The role of visual segmentation (Technical Report LRDC/PITT/IMP-1. Cognitive Science Program. Office of Naval Research). Pittsburgh: Learning Research and Development Center, University of Pittsburgh. Retrieved from http://handle.dtic.mil/100.2/ADA218827
Greene, B. G., Pisoni, D. B., & Carrell, T. D. (1984). Recognition of speech spectrograms. The Journal of the Acoustical Society of America, 76(1), 32-43. doi:10.1121/1.391035
Hatazaki, K., Komori, Y., Kawabata, T., & Shikano, K. (1990). Phoneme segmentation expert system using spectrogram reading knowledge. Systems and Computers in Japan, 21(12), 90-100. doi:10.1002/scj.4690211210
Ingemann, F., & Mermelstein, P. (1975). Speech recognition through spectrogram matching. The Journal of the Acoustical Society of America, 57(1), 253-255. Retrieved from http://www.haskins.yale.edu/Reprints/HL0166.pdf
Johannsen, J., MacAllister, J., Michalek, T., & Ross, S. (1983). A speech spectrogram expert. In ICASSP 1983. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 746-9). Boston, Massachusetts, USA. April 14-16, 1983. doi:10.1109/ICASSP.1983.1172057
Katagiri, S., & Yokota, M. (1987). Phoneme recognition using visual features on speech spectrograms. In European conference on speech technology. (pp. 1365-8). Edinburgh, Scotland, UK. September 1987. Retrieved from http://www.isca-speech.org/archive/ecst_1987/e87_1365.html
Klatt, D. H., & Stevens, K. N. (1972). Sentence recognition from visual examination of spectrograms and machine-aided lexical searching. In 1972 Conference on speech communication and processing. (pp. 315-8). New York: IEEE Press.
Klatt, D. H., & Stevens, K. N. (1973). On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment. IEEE Transactions on Audio and Electroacoustics, 21(3), 210-217. doi:10.1109/TAU.1973.1162453
Lamel, L. (1988). Formalizing knowledge used in spectrogram reading: Acoustic and perceptual evidence from stops (RLE Technical Report 537). Cambridge, MA: Research Laboratory of Electronics, Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/bitstream/handle/1721.1/4955/RLE-TR-537-20137092.pdf
Lamel, L. (1993). A knowledge-based system for stop consonant identification based on spectrogram reading. Computer Speech and Language, 7(2), 169-191. Retrieved from ftp://tlp.limsi.fr/public/csl93.pdf
Leung, H., & Zue, V. (1986). Visual characterization of speech spectrograms. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 2751-4). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168558
Memmi, D., Eskenazi, M., Mariani, J., & Nguyen-Xuan, A. (1983). Un système expert pour la lecture de sonagrammes. Speech Communication, 2(2-3), 234-236. doi:10.1016/0167-6393(83)90037-7
Stern, P. E., Eskenazi, M., & Memmi, D. (1986). An expert system for speech spectrogram reading. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1193-6). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168793
Zue, V., & Cole, R. (1979). Experiments on spectrogram reading. In ICASSP 1979. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 116-9). Washington, District of Columbia, USA. April 2 - 4, 1979. doi:10.1109/ICASSP.1979.1170735
Zue, V., & Lamel, L. (1986). An expert spectrogram reader: A knowledge-based approach to speech recognition. In ICASSP 1986. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1197-200). Tokyo, Japan. April 8 - 11, 1986. doi:10.1109/ICASSP.1986.1168798
Spectrographic analysis of speech
BARTKOVA, K. (1997) "Some experiments about the use of prosodic parameters in a speech recognition system", in Proceedings of the ESCA Workshop on Intonation. Athens, 18-20 September 1997. pp. 33-36.
BARTKOVA, K.- JOUVET, D. (1999) "Selective prosodic post-processing for improving recognition of French telephone numbers", in Eurospeech'99, 6th European Conference on Speech Communication and Technology. Budapest, Hungary, 5-10 September 1999. Vol. 1, pp. 267-270.
BASSI, A.- BECERRA YOMA, N.- LONCOMILLA, P. (2006) "Estimating tonal prosodic discontinuities in Spanish using HMM", Speech Communication 48, 9: 1112-1125.
http://dx.doi.org/10.1016/j.specom.2006.03.006
CAMPBELL, N. (1993) "Automatic detection of prosodic boundaries in speech", Speech Communication 13, 3-4: 343-354.
CHEN, K. - HASEGAWA-JOHNSON, M. - COHEN, A. - BORYS, S. - KIM, S.-S. - COLE, J. - CHOI, J.-Y. (2006) "Prosody dependent speech recognition on radio news corpus of American English", IEEE Transactions on Audio, Speech and Language Processing 14, 1: 232-245.
http://dx.doi.org/10.1109/TSA.2005.853208
ESCUDERO, D.- CARDEÑOSO, V. (2002) "Una experiencia en reconocimiento automático de tipos de unidades melódicas a partir de su perfil de entonación", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Sevilla, 5-7 March 2001. Sevilla: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 161-166.
GARCÍA, C.- TAPIAS, D. (2000) "La frecuencia fundamental de la voz y sus efectos en reconocimiento de habla continua", Procesamiento del Lenguaje Natural 26: 163-167.
Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001
HASEGAWA-JOHNSON, M.- CHEN, K.- COLE, J.- BORYS, S.- KIM, S.-S.- COHEN, A.- ZHANG, T.- CHOI, J.-Y.- KIM, H.- YOON, T.- CHAVARRIA, S. (2005) "Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus", Speech Communication 46: 418-439.
http://dx.doi.org/10.1016/j.specom.2005.01.009
KOMPE, R. (1997) Prosody in Speech Understanding Systems. Berlin - New York: Springer (Lecture Notes in Artificial Intelligence, Vol. 1307, subseries of Lecture Notes in Computer Science).
LEA, W.A. (1980) "Prosodic aids in speech recognition" in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs, N.J.: Prentice-Hall. pp. 166-205.
LONGUET-HIGGINS, C. (1985) "Tones of Voice: The Role of Intonation in Computer Speech Understanding", in FALLSIDE, F.- WOODS, W.A. (Eds.) Computer Speech Processing. Englewood Cliffs, N.J.: Prentice Hall International. pp. 293-302.
MÉLONI, H.- LANGLAIS, P. (1996) "Prosodie et reconnaissance de la parole", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones). pp. 205-224.
PAGEL, V. (1999) De l'utilisation d'informations acoustiques suprasegmentales en reconnaissance de la parole continue. Thèse Doctorale. Université Henri Poincaré, Nancy.
http://vincent.pagel.free.fr/THESE/
RUBIO AYUSO, A.J. - MILONE, D.H. (2002) "Información prosódica y acentual para el reconocimiento automático del habla", in DÍAZ GARCÍA, J. (Ed.) Actas del II Congreso de Fonética Experimental. Sevilla 5, 6 y 7 de marzo de 2001. Sevilla: Laboratorio de Fonética, Facultad de Filología, Universidad de Sevilla. pp. 56-77.
SHRIBERG, E.- STOLCKE, A.- HAKKANI-TÜR, D.- TÜR, G. (2000) "Prosody-based automatic segmentation of speech into sentences and topics", Speech Communication 32, 1-2: 127-154.
Vicsi, K., & Szaszák, G. (2010). Using prosody to improve automatic speech recognition. Speech Communication, 52(5), 413-426. doi:10.1016/j.specom.2010.01.003
VICSI, K. - SZASZÁK, G. (2005) "Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features", International Journal of Speech Technology 8, 4: 363-370.
http://dx.doi.org/10.1007/s10772-006-8534-z
WAIBEL, A. (1986) "Suprasegmentals in very large vocabulary word recognition", in SCHWAB, E.E.- NUSBAUM, H. (Eds.) Pattern Recognition by Humans and Machines. Volume 1: Speech Perception. Orlando: Academic Press, Inc. pp. 159-186.
WAIBEL, A. (1988) Prosody and Speech Recognition. San Mateo, CA: Morgan Kaufmann.
ZEISSLER, V. - ADELHARDT, J. - BATLINER, A. - FRANK, C. - NÖTH, E. - SHI, R. P. - NIEMANN, H. (2006) "The prosody module", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 139-152.
Phonological knowledge
COHEN, P.S.- MERCER, R.L. (1975) "The Phonological Component of an Automatic Speech-Recognition System", in REDDY, D.R. (Ed.) Speech Recognition. Invited Papers Presented at the 1974 IEEE Symposium. New York: Academic Press. pp. 275-319.
CHURCH, K.W. (1987) Phonological parsing in speech recognition. Boston: Kluwer Academic Publishers (Kluwer International Series in Engineering and Computer Science, SECS 38).
DENG, L. (1997) "Speech recognition using autosegmental representation of phonological units with interface to the trended HMM", Speech Communication 23, 3: 211-222.
GIACHIN, E.- ROSENBERG, A.E.- LEE, C.-H. (1991) "Word juncture modeling using phonological rules for HMM-based continuous speech recognition", Computer Speech and Language 5,2: 155-168.
HOEQUIST Jr., C.- NOLAN, F. (1991) "On an application of phonological knowledge in automatic speech recognition", Computer Speech and Language 5,2: 133-153.
OSHIKA, B.- ZUE, V.W.- WEEKS, R.V. - NEU, H.- AURBACH, J. (1975) "The Role of Phonological Rules in Speech Understanding Research", IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23: 104-112.
PERENNOU, G.- BRIEUSSEL-POUSSE, L. (1998) "Phonological component in automatic speech recognition", in STRIK, H.- KESSENS, J.- WESTER, M. (Eds.) Proceedings of the Workshop Modeling Pronunciation Variation for Automatic Speech Recognition. Rolduc, 4-6 May 1998. pp. 91-96.
SENEFF, S.- WANG, C. (2005) "Statistical modeling of phonological rules through linguistic hierarchies", Speech Communication 46, 2: 204-216.
http://dx.doi.org/10.1016/j.specom.2005.03.005
SHOUP, J. E. (1980) "Phonological Aspects of Speech Recognition", in LEA, W.A. (Ed.) Trends in Speech Recognition. Englewood Cliffs: Prentice Hall. pp. 125-138.
Recognition of emotional speech
Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010a). Multiple feature extraction and hierarchical classifiers for emotions recognition. In A. Esposito, N. Campbell, C. Vogel, A. Hussain, & A. Nijholt (Eds.), Development of multimodal interfaces: Active listening and synchrony. Second COST 2102 International Training School. Dublin, Ireland, March 23-27, 2009. Revised selected papers. (pp. 242-54). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-12397-9_20. Retrieved from http://fich.unl.edu.ar/sinc/publications/2010/AMR10a/sinc_AMR10a.pdf
Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2010b). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, In Press, Accepted Manuscript. doi:10.1016/j.csl.2010.10.001
Ang, J., Dhillon, R., Krupski, A., Shriberg, E., & Stolcke, A. (2002). Prosody-Based automatic detection of annoyance and frustration in human-computer dialog. In ICSLP 2002 - interspeech 2002. Proceedings of the 7th international conference on spoken language processing. (pp. 2037-40). Denver, Colorado, USA, September 16-20, 2002. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.4027
Baber, C., Mellor, B., Graham, R., Noyes, J. M., & Tunley, C. (1996). Workload and the use of automatic speech recognition: The effects of time and resource demands. Speech Communication, 20(1-2), 37-54. doi:10.1016/S0167-6393(96)00043-X
Barra, R., Montero, J. M., Macías, J., D'Haro, L. F., San-Segundo, R., & Córdoba, R. (2006). Prosodic and segmental rubrics in emotion identification. In ICASSP 2006. Proceedings of the IEEE international conference on acoustics, speech and signal processing. (pp. 1085-8). Toulouse, France, 14-19 May 2006. Retrieved from http://www-gth.die.upm.es/research/documentation/AG-39Pro-06.pdf
Burkhardt, F., Ajmera, J., Englert, R., Stegmann, J., & Burleson, W. (2006). Detecting anger in automated voice portal dialogs. In Interspeech 2006 - ICSLP. Proceedings of the 9th international conference on spoken language processing. Pittsburgh, PA, USA. September 17-21, 2006. Retrieved from http://felix.syntheticspeech.de/publications/recognitionOfAnger.pdf
El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3). doi:10.1016/j.patcog.2010.09.020
Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007). Primitives-Based evaluation and estimation of emotions in speech. Speech Communication, 49(10-11), 787-800. doi:10.1016/j.specom.2007.01.010. Retrieved from http://asimov.usc.edu/~mower/Papers/GrimmSpeechComm2007.pdf
Hansen, J. H. L. (1996). Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition. Speech Communication, 20(1-2), 151-173. doi:10.1016/S0167-6393(96)00050-7
Huber, R., Batliner, A., Buckow, J., Nöth, E., Warnke, V., & Niemann, H. (2000). Recognition of emotion in a realistic dialogue scenario. In ICSLP 2000. Proceedings of the 6th international conference on spoken language processing. (pp. 665-8). Beijing, China, October 16-20, 2000. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.6965
Kessous, L., Castellano, G., & Caridakis, G. (2010). Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis. Journal on Multimodal User Interfaces, 3(1), 33-48. doi:10.1007/s12193-009-0025-5. Retrieved from http://www.image.ntua.gr/papers/638.pdf
Kotti, M., Paternò, F., & Kotropoulos, C. (2010). Speaker-Independent negative emotion recognition. In CIP 2010. 2nd International workshop on cognitive information processing. (pp. 417-22). Elba. June-14-16, 2010. doi:10.1109/CIP.2010.5604091. Retrieved from http://giove.isti.cnr.it/attachments/publications/2010-A2-041.pdf
Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., & Elenius, K. (2011). Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation. Computer Speech & Language, 25(1), 84-104. doi:10.1016/j.csl.2010.03.004
Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559-590. doi:10.1016/j.specom.2005.09.008
López-Cózar, R., Silovsky, J., & Griol, D. (2010). Mejora del funcionamiento de sistemas de diálogo hablado mediante reconocimiento del estado emocional de usuarios. Procesamiento del Lenguaje Natural, 45, 191-198. Retrieved from http://www.sepln.org/ojs/ojs-2.2/index.php/pln/article/view/802/656
Luengo, I., & Navas, E. (2010). Feature analysis and evaluation for automatic emotion identification in speech. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 267-70). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0060.pdf
Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490-501. doi:10.1109/TMM.2010.2051872
Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Reconocimiento automático de emociones utilizando parámetros prosódicos. Procesamiento del Lenguaje Natural, 35, 13-20. Retrieved from http://www.sepln.org/revistaSEPLN/revista/35/02.pdf
Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98-112. doi:10.1016/j.specom.2006.11.004. Retrieved from http://www-ist.massey.ac.nz/rwang/publications/07SC.pdf
Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In Interspeech 2008. Proceedings of the 9th annual conference of the international speech communication association. (pp. 2755-8). Brisbane, Australia. September 22-26, 2008. Retrieved from http://www.speech.kth.se/prod/publications/files/3189.pdf
Nogueiras, A., Moreno, A., Bonafonte, A., & Mariño, J. B. (2001). Speech emotion recognition using hidden Markov models. In Eurospeech 2001 Scandinavia. Proceedings of the 7th European conference on speech communication and technology, 2nd Interspeech event. (pp. 2679-82). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://gps-tsc.upc.es/veu/research/pubs/download/nog_emo_01.pdf
Origlia, A., Galatà, V., & Ludusan, B. (2010). Automatic classification of emotions via global and local prosodic features on a multilingual emotional database. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://aune.lpl.univ-aix.fr/~sprosig/sp2010/papers/100213.pdf
Oudeyer, P. Y. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59, 157-183. doi:10.1016/S1071-5819(02)00141-6. Retrieved from http://pyoudeyer.com/emotionsIJHCS.pdf
Polzehl, T., Schmitt, A., & Metze, F. (2010). Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic/prosodic features for anger recognition. In Speech prosody 2010. Fifth international conference on speech prosody. Chicago, Illinois, USA. May 11-14, 2010. Retrieved from http://aune.lpl.univ-aix.fr/~sprosig/sp2010/papers/100442.pdf
Sidorova, J. (2009). Optimization techniques for speech emotion recognition. PhD Thesis, Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra. Retrieved from http://www.tesisenred.net/TDX-0113110-133822/
Sidorova, J., & Badia, T. (2008). ESEDA: Tool for enhanced speech emotion detection and analysis. Procesamiento del Lenguaje Natural, 41, 307-308. Retrieved from http://www.sepln.org/revistaSEPLN/revista/41/demo9.pdf
ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1-2), 213-225. doi:10.1016/S0167-6393(02)00083-3. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.4047&rep=rep1&type=pdf
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181. doi:10.1016/j.specom.2006.04.003. Retrieved from http://poseidon.csd.auth.gr/papers/PUBLISHED/JOURNAL/pdf/Ververidis06a.pdf
Womack, B. D., & Hansen, J. H. L. (1996). Classification of speech under stress using target driven features. Speech Communication, 20(1-2), 131-150. doi:10.1016/S0167-6393(96)00049-0
Speech recognition products and applications
Barnard, E., Schalkwyk, J., van Heerden, C., & Moreno, P. J. (2010). Voice search for development. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_0282.html
BERTON, A. - KALTENMEIER, A. - HAIBER, U. - SCHREINER, O. (2006) "Speech recognition", in WAHLSTER, W. (Ed.) SmartKom: Foundations of Multimodal Dialogue Systems. New York: Springer. pp. 85-108.
Cardenal, A., Peso, P., Bueno, M., Espiña, A., Rodríguez Silva, D. A., Adkinson, L., & Pellitero, A. (2010). TACOMA: On-Line transcription of audiovisual material. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 239-42). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0053.pdf
CERF-DANON, H.- DeGENNARO, S.- FERRETI, M.- GONZÁLEZ, J.- KEPPEL, E. (1991) "Tangora - a large vocabulary speech recognition system for five languages", in Eurospeech'91. 2nd European Conference on Speech Communication and Technology. Genova, Italy, 24-26 September 1991. Vol 1. pp. 183-192.
CHELBA, C. - SILVA, J. - ACERO, A. (2007) "Soft indexing of speech content for search in spoken documents", Computer Speech and Language 21, 3: 458-478.
http://dx.doi.org/10.1016/j.csl.2006.09.001
CÓRDOBA, R.- MACÍAS, J.- SAMA, V.- BARRA, R.- PARDO, J.M. (2005) "New advances in cross-task and speaker adaptation for air traffic control tasks", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 21-28.
http://www-gth.die.upm.es/research/documentation/AI-90New-05.pdf
Delgado, H., Serrano, J., & Carrabina, J. (2010). Automatic metadata extraction from spoken content using speech and speaker recognition techniques. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 201-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0043.pdf
DEMEDTS, A. (1993) "Un sistema de reconocimiento del español con un léxico de 30.000 unidades", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 13: 435-437.
DIÉGUEZ, F.J.- GARCÍA, C.- CARDENAL, A. (2005) "Comparación de modelos de lenguaje para la transcripción automática de noticiarios televisivos", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 269-276.
DUGAST, Ch.- AUBERT, X.- KNESER, R. (1995) "The Philips Large-Vocabulary Recognition System for American English, French and German", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1, pp. 197-200.
FLETCHER, R. (1997) "First Impressions of ViaVoice, Continuous Dictation Software from IBM", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict1.htm
García Mateo, C., Diéguez, J., Docío, L., & Cardenal, A. (2004). Transcrigal: A bilingual system for automatic indexing of broadcast news. In LREC 2004. Proceedings of the 4th international conference on language resources and evaluation. Lisbon, Portugal. May 24-30, 2004. Retrieved from http://www.gts.tsc.uvigo.es/web/imaxes_user/051104101114_lrec04_transcrigal.pdf
GONZÁLEZ, J.- MACÍAS, J.- PALMA, M.A.- PALOU, F.- TROS DE ILARDUYA, M. (1992) "Tangora/E, un reconocedor del habla para el castellano", Boletín de la Sociedad Española para el Procesamiento del Lenguaje Natural 12.
GRIMES, B. (1997) "Voice Recognition Software: Naturally Speaking from Dragon Systems", Translation Journal 2, 1.
http://translationjournal.net/journal//02dict2.htm
HAEB-UMBACH, R.- GAMM, S. (1995) "Human Factors of a Voice-Controlled Car Stereo", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 2, pp. 1453-1456.
HAUPTMANN, A. (2006) "Automatic spoken document retrieval", in BROWN, K. (Ed.) Encyclopedia of Language & Linguistics. Amsterdam: Elsevier. pp. 95-103.
http://dx.doi.org/10.1016/B0-08-044854-2/00922-6
HAIN, T.- WOODLAND, P.C.- EVERMANN, G.- GALES, M.J.F.- LIU, X.- MOORE, G.L.- POVEY, D. (2005) "Automatic transcription of conversational telephone speech", IEEE Transactions on Speech and Audio Processing 13, 6: 1173-1185.
http://dx.doi.org/10.1109/TSA.2005.852999
HUANG, X.- ALLEVA, F.- HON, H.-W.- HWANG, M.-Y.- LEE, K.-F.- ROSENFELD, R. (1993) "The SPHINX-II speech recognition system: an overview", Computer Speech and Language 7,2: 137-148.
Hughes, T., Nakajima, K., Ha, L., Vasu, A., Moreno, P., & LeBeau, M. (2010). Building transcribed speech corpora quickly and cheaply for many languages. In Interspeech 2010. Proceedings of the 11th annual conference of the international speech communication association. (pp. 1914-7). Makuhari, Chiba, Japan. September 26-30, 2010. Retrieved from http://www.isca-speech.org/archive/interspeech_2010/i10_1914.html
HUNT, M.J. (1998) "Practical Automatic Dictation Systems", The ELRA Newsletter 3,1: 4-7
LAMBERT, E. (1991) "La máquina de escribir con entrada vocal", in VIDAL BENEYTO, J. (Dir.) Las industrias de la lengua. Trad. de M. Alvar et al. Salamanca / Madrid: Fundación Sánchez Ruipérez / Pirámide (Biblioteca del Libro, 5). pp. 455-461.
LAMBOURNE, A.- HEWITT, J.- LYON, C.- WARREN, S. (2004) "Speech-based real-time subtitling services", International Journal of Speech Technology 7, 4: 269-279.
http://dx.doi.org/10.1023/B:IJST.0000037071.39044.cc
LEE, K.F. (1989) Automatic Speech Recognition. The Development of the SPHINX System. Dordrecht: Kluwer.
MANDEL, M.A. (1992) "A commercial large-vocabulary discrete speech recognition system: Dragon Dictate", Language and Speech 35, 1-2: 237-246.
MEISEL, W.S. (1986) "Towards the 'Talkwriter'", in BRISTOW, G. (Ed.) (1986) Electronic Speech Recognition. Techniques, Technology and Applications. London: Collins. pp. 338-348.
Moreno, A. (2010). Information search engine for multilingual audiovisual content: BUCEADOR. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 259-62). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0058.pdf
NÉEL, F.- CHOLLET, G.- LAMEL, L.- MINKER, W.- CONSTANTINESCU, A. (1996) "Reconnaissance et compréhension de la parole: évaluation et applications", in MÉLONI, H. (Coord.) Fondements et Perspectives en Traitement Automatique de la Parole. Paris: Éditions AUPELF-UREF (Collection Universités Francophones).
PEA, E. - CANNAROZZO, L. (2006) "Considerazione sull'uso del Via Voice alla RTSI", inTRAlinea. Special issue on Respeaking.
http://www.intralinea.it/specials/respeaking/eng_more.php?id=486_0_41_0_M
POZA LARA, M.J.- VILLARRUBIA GRANDE, L.- SILES SÁNCHEZ, J.A. (1991) "Teoría y aplicaciones del reconocimiento automático del habla", Comunicaciones de Telefónica I+D 3.
Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., . . . Strope, B. (2010). “Your word is my command”: Google search by voice: A case study. In A. Neustein (Ed.), Advances in speech recognition. Mobile environments, call centers and clinics. (pp. 61-90). New York: Springer. doi:10.1007/978-1-4419-5951-5_4. Retrieved from http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36340.pdf
Schmitt, A., Zaykovskiy, D., & Minker, W. (2009). Speech recognition for mobile devices. International Journal of Speech Technology, 11(2), 63-72.
Schuster, M. (2010). Speech recognition for mobile devices at Google. In B. T. Zhang & M. Orgun (Eds.), PRICAI 2010: Trends in artificial intelligence. 11th Pacific Rim international conference on artificial intelligence, Daegu, Korea, August 30 - September 2, 2010. Proceedings. (pp. 8-10). Berlin - Heidelberg: Springer. doi:10.1007/978-3-642-15246-7_3. Retrieved from
STEINBISS, V.- NEY, H.- ESSEN, U.- TRAN, B.-H.- AUBERT, X.- DUGAST, C.- KNESER, R.- MEIER, H.-G.- OERDER, R.- HAEB-UMBACH, R.- GELLER, D.- HÖLLERBAUER, W.- BARTOSIK, H. (1995) "Continuous speech dictation - From theory to practice", Speech Communication 17, 1-2: 19-38.
TAPIAS MERINO, D. (1999) "Sistemas de reconocimiento de voz en las telecomunicaciones", in GÓMEZ GUINOVART, J.- LORENZO SUÁREZ, A.- PÉREZ GUERRA, J.- ÁLVAREZ LUGRÍS, A. (Eds.) Panorama de la investigación en lingüística informática. RESLA, Revista Española de Lingüística Aplicada, Volumen monográfico. pp. 83-102.
Varona, A., Rodríguez Fuentes, L. J., Penagarikano, M., Nieto, S., Diez, M., & Bordel, G. (2010). Search and access to information contained in the speech of multimedia resources. Procesamiento del Lenguaje Natural, 45, 317-318. Retrieved from http://www.sepln.org/ojs/ojs-2.2/index.php/pln/article/view/831/685
VILLARRUBIA GRANDE, L.- CORTÁZAR MÚGICA, I.- HERNÁNDEZ GÓMEZ, L.- LÓPEZ GONZALO, E. (2001) "Reconocimiento de voz en el entorno de las nuevas redes de comunicación UMTS e Internet", Comunicaciones de Telefónica I+D 23: 99-112.
VIVER, X. (2005) "Philips: Intelligent Speech Interpretation - la tecnología inteligente de reconocimiento de voz", Procesamiento del Lenguaje Natural (Actas del XXI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Granada, 14-16 de septiembre de 2005), Revista nº 35: 459-460.
Zelenák, M., Schulz, H., & Hernando, J. (2010). Albayzín 2010 evaluation campaign: Speaker diarization. In FALA 2010. VI jornadas en tecnología del habla - II Iberian SLTech workshop. (pp. 301-4). Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010. Retrieved from http://fala2010.uvigo.es/images/proceedings/pdfs/0068.pdf
Speech recognition evaluation and assessment
Burger, S., Sloane, Z. A., & Yang, J. (2006). Competitive evaluation of commercially available speech recognizers in multiple languages. In LREC 2006. Proceedings of the 5th International Conference on Language Resources and Evaluation. Genoa, Italy. May 24-26, 2006. Retrieved from http://pages.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/802_pdf.pdf
Castillo Condado, O. (1999). Evaluación de un reconocedor fonético para el español hablado en México. Tesis de Licenciatura, Universidad de Las Américas, Puebla, México. Retrieved from http://catarina.udlap.mx/u_dl_a/tales/documentos/lis/castillo_c_o/
de Yzaguirre, L. (2000). Evaluación comparativa de dos sistemas comerciales de reconocimiento de voz. In I jornadas en tecnología del habla. Sevilla: Universidad de Sevilla - Universidad de Granada - Red Temática en Tecnologías del Habla. Retrieved from http://latel.upf.edu/terminotica/membres/DE_YZA/PUBLI/eval2srv.pdf
Devine, E. G., Gaehde, S. A., & Curtis, A. C. (2000). Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. Journal of the American Medical Informatics Association, 7(5), 462-468. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC79041/pdf/0070462.pdf
Furui, S. (2007). Speech and speaker recognition evaluation. In L. Dybkjaer, H. Hemsen, & W. Minker (Eds.), Evaluation of text and speech systems. (pp. 1-28). Dordrecht: Springer. doi:10.1007/978-1-4020-5817-2_1
Gibbon, D., Moore, R., & Winski, R. (Eds.). (1998). Assessment of recognition systems. In Spoken language system assessment. (pp. 67-93). Berlin - New York: Mouton de Gruyter.
Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181-200. doi:10.1016/j.specom.2009.10.001
Hutchinson, B. (2001). A functional approach to speech recognition evaluation. In Eurospeech 2001 Scandinavia. Proceedings of the 7th european conference on speech communication and technology, 2nd Interspeech Event. (pp. 1683-6). Aalborg, Denmark, September 3-7, 2001. Retrieved from http://perso.telecom-paristech.fr/~chollet/Biblio/Congres/Audio/Eurospeech01/CDROM/papers/page1683.pdf
Lamel, L., Minker, W., & Paroubek, P. (2000). Toward best practice in the development and evaluation of speech recognition components of a spoken language dialogue system. Natural Language Engineering, 6(3-4), 305-322. Retrieved from http://www.limsi.fr/Individu/pap/nle99.ps
Mangold, H. (1989). Assessment of speech recognizers in public information and ordering systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 37-58). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1037.html
Moore, R. K. (1989). Assessment of speech input systems. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 27-32). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1027.html
Néel, F., Chollet, G., Lamel, L., Minker, W., & Constantinescu, A. (1996). Reconnaissance et compréhension de la parole: Évaluation et applications. In H. Méloni (Ed.), Fondements et perspectives en traitement automatique de la parole. (pp. 331-67). Paris: Éditions AUPELF-UREF.
Pallett, D. S. (1985). Performance assessment of automatic speech recognizers. Journal of Research of the National Bureau of Standards, 90(5), 371-387. Retrieved from http://nvl.nist.gov/pub/nistpubs/jres/090/5/V90-5.pdf#page=41
Pallett, D. S. (1986). Assessing the performance of speech recognisers. In G. Bristow (Ed.), Electronic speech recognition. Techniques, technology and applications. (pp. 277-306). London: Collins.
Pallett, D. S. (1989). Speech input assessment using benchmark tests: Procedures, advantages and limitations. In Proceedings of the ESCA tutorial and research workshop on Speech Input / Output Assessment and Speech Databases. (pp. 33-6). Noordwijkerhout, The Netherlands. September 20-23, 1989. Retrieved from http://www.isca-speech.org/archive_open/sioa_89/sia_1033.html
Pallett, D. S., & Fourcin, A. (1996). Speech input: Assessment and evaluation. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the state of the art in human language technology. (pp. 495-9). Cambridge: Cambridge University Press. Retrieved from http://cslu.cse.ogi.edu/HLTsurvey/ch13node8.html
Paulus, E. (2000). Some guidelines for the evaluation of approaches to automatic speech recognition. In W. F. Sendlmeier (Ed.), Speech and signals. Aspects of speech synthesis and automatic speech recognition. Dedicated to Wolfgang Hess on his 60th birthday. (pp. 129-39). Frankfurt am Main: Hector.
Serrahima, L. (2009). Reconocimiento de voz de Windows Vista: ¿Mejor, igual o peor que Dragon Naturally Speaking? Panace@, 10(29), 76-79. Retrieved from http://medtrad.org/panacea/IndiceGeneral/n29_tribuna-Serrahima2.pdf
Steeneken, H. J. M., & Varga, A. (1993). Assessment for automatic speech recognition: I. Comparison of assessment methods. Speech Communication, 12(3), 241-246. doi:10.1016/0167-6393(93)90094-2
Yao, X., Bhutada, P., Georgila, K., Sagae, K., Artstein, R., & Traum, D. (2010). Practical evaluation of speech recognisers for virtual human dialogue systems. In LREC 2010. Proceedings of the 7th International Conference on Language Resources and Evaluation. Valletta, Malta. 17-23 May, 2010. Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/675_Paper.pdf
Young, S. J., & Chase, L. L. (1998). Speech recognition evaluation: A review of the U.S. CSR and LVCSR programmes. Computer Speech & Language, 12(4), 263-279. doi:10.1006/csla.1998.0101
Young, S. J., Adda-Dekker, M., Aubert, X., Dugast, C., Gauvain, J. L., Kershaw, D. J., . . . Woodland, P. C. (1997). Multilingual large vocabulary speech recognition: The European SQALE project. Computer Speech & Language, 11(1), 73-89. doi:10.1006/csla.1996.0023
Speaker Recognition, Speaker Identification and Speaker Verification
ADAMI, A. G. (2007) "Modeling prosodic differences for speaker recognition", Speech Communication 49, 4: 277-291.
ANDRÉ-OBRECHT, R. (Ed.) (2000) Special Issue on Speaker Recognition and its Commercial and Forensic Applications, Speech Communication 31, 2-3.
Beigi, H. (2009). Fundamentals of speaker recognition. New York: Springer.
Preface.- Basic Theory.- Introduction.- Speaker and Vocal Tract Modeling.- Signal Processing and Feature Extraction Techniques.- Data Representation and Probability Distributions.- Information Theory.- Metrics and Distortion Measures.- Bayesian Learning and Gaussian Mixture Modeling.- Parameter Estimation and Learning.- Hidden Markov Modeling (HMM).- Support Vector Machines.- Neural Networks.- Advanced Theory.- Speaker Modeling.- Language Modeling and Dynamic Analysis.- Sub-Optimal Search.- Algorithms.- Practice.- Speaker Recognition.- Overall Design.- Representation of Results.- Extensions.- Language Detection.- Glossary.
BIMBOT, F.- HUTTER, H.P.- JABOULET, C.- KOOLWAAIJ, J.- LINDBERG, J.- PIERROT, J.B. (1998) "An overview of the CAVE project research activities in Speaker Verification", in Proceedings of RLA2C, Speaker Recognition and its Commercial and Forensic Applications. Avignon, France, April 1998. pp. 215-220.
BIMBOT, F.- BLOMBERG, M.- BOVES, L.- CHOLLET, G.- JABOULET, C.- JACOB, B.- KHARROUBI, J.- KOOLWAAIJ, J.- LINDBERG, J.- MARIETHOZ, J.- MOKBEL, C.- MOKBEL, H. (1999) "An overview of the Picasso project research activities in speaker verification for telephone application", in Eurospeech'99, 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary.
BIMBOT, F.- BLOMBERG, M.- BOVES, L.- GENOUD, D.- HUTTER, H.-P.- JABOULET, C.- KOOLWAAIJ, J.- LINDBERG, J.- PIERROT, J.-B. (2000) "An overview of the CAVE project research activities in speaker verification", Speech Communication 31, 2-3: 155-180.
BIMBOT, F.- CHOLLET, G.- PAOLONI, A. (Eds.) (1995) Special Section on Automatic Speaker Recognition, Identification and Verification, Speech Communication 17, 1-2: 81-298.
BIMBOT, F.- HUTTER, H.P.- JABOULET, C.- KOOLWAAIJ, J.- LINDBERG, J.- PIERROT, J.B. (1997) "Speaker Verification in the Telephone Network: Research Activities in the CAVE project", in Eurospeech'97. Proceedings of 5th International Conference on Speech Communication and Technology. Rhodes, Greece, September 1997. pp. 971-974.
BOURLARD, H.- MORGAN, N. (1998) Speaker Verification. A Quick Overview. IDIAP Technical Report, IDIAP-RR 98-12.
ftp://ftp.idiap.ch/pub/reports/1998/98-12.ps.gz
BRICKER, P.D.- PRUZANSKY, S. (1976) "Speaker Recognition", in N.J. LASS (Ed.) Contemporary Issues in Experimental Phonetics. New York: Academic Press. pp. 295-326.
CAMPBELL, J.P.- MASON, J.- ORTEGA-GARCÍA, J. (Eds.) (2006) Odyssey 2004: The Speaker and Language Recognition Workshop. Toledo, Spain. 31 May - 3 June 2004. Computer Speech and Language 20, 2-3.
CAPPÉ, O. (1996) Speaker Recognition Bibliography, Département TSI Signal - Images, École Nationale Supérieure des Télécommunications
http://perso.telecom-paristech.fr/~cappe/docs/spkrec.html
CHOLLET, G. (1994) "Automatic Speech and Speaker Recognition: Overview, Current Issues and Perspectives", in KELLER, E. (Ed.) Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Challenges. Chichester: John Wiley & Sons. pp. 129-148.
COSI, P. (1982) "Speaker recognition: A survey", in HATON, J.P. (Ed.) Automatic Speech Analysis and Recognition. Dordrecht: Reidel. pp. 277-308.
COST 250 (1996) COST 250 Workshop Proceedings "Application of Speaker Recognition Techniques in Telephony". Vigo, Spain, November 1996.
COST 250 (1998) COST 250 Workshop Proceedings "Speaker Recognition by Man and Machine: Directions for Forensic Applications". Ankara, Turkey, April 1998.
COST 250 (1999) COST 250 Speaker Recognition in Telephony. Final Report 1999. Brussels: European Commission DG XIII Directorate B / Roma: Fondazione Ugo Bordoni. (CD-ROM)
COST 275 (2002) The Advent of Biometrics on the Internet. A COST 275 Workshop. Proceedings. 7-8 November, 2002. Fondazione Ugo Bordoni, Rome, Italy.
COST 275 (2004) Biometrics on the Internet: Fundamentals, Advances and Applications. 2nd COST 275 Workshop. Proceedings. University of Vigo, 25-26 March 2004. Vigo, Spain.
DANKOVICOVÁ, J.- NOLAN, F. (1999) "Some acoustic effects of speaking style on utterances for automatic speaker verification", Journal of the International Phonetic Association 29, 1: 115-128.
DODDINGTON, G. (1985) "Speaker recognition - identifying people by their voices", Proceedings of the IEEE 73: 1651-1664.
ESCUDERO, D.- CARDEÑOSO, V.- SÁNCHEZ, J.M.- NAVAS, E.- HERNÁEZ, I. (2003) "Uso de entonación en reconocimiento automático del locutor: Resultados preliminares", in SEAF 2003. Actas del II Congreso de la Sociedad Española de Acústica Forense. Barcelona, 10 y 11 de abril de 2003. Barcelona: SEAF, Sociedad Española de Acústica Forense. pp. 167-174.
http://www.infor.uva.es/~descuder/investig/pdfs/SEAF2003.pdf
FAÚNDEZ, M. (1999) "Identificación de locutores sobre la base de datos Telvoice", XIV Simposium Nacional de la Unión Científica Internacional de Radio, URSI'99, Santiago de Compostela.
FAÚNDEZ, M.- SATUÉ, A. (1999) "Identificación de locutor sobre base de datos bilingüe", XIV Simposium Nacional de la Unión Científica Internacional de Radio, URSI'99, Santiago de Compostela.
FERNÁNDEZ POZO, R.- FOMBELLA MOURELLE, C.- TORRE TOLEDANO, D.- LÓPEZ GONZALO, E.- HERNÁNDEZ GÓMEZ, L. (2006) "Estudio del uso de información prosódica en reconocimiento de locutor en ámbito forense", in IV Jornadas en Tecnologías del Habla. Zaragoza, del 8 al 10 de noviembre de 2006. Zaragoza: Universidad de Zaragoza. pp. 343-348.
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/IV/4jth.pdf
FURUI, S. (1986) "Research on individuality features of the speech waves and automatic speaker recognition techniques", Speech Communication 5, 2: 183-197.
FURUI, S. (1996) "An overview of speaker recognition technology", in LEE, C.-H.- SOONG, F.K.- PALIWAL, K.K. (Eds.) Automatic Speech and Speaker Recognition. Dordrecht: Kluwer Academic Publishers. pp. 31-56.
FURUI, S. (1997) "Speaker Recognition", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 42-48.
http://speech.bme.ogi.edu/HLTsurvey/ch1node9.html#SECTION17
GARVIN, P.L.- LADEFOGED, P. (1963) "Speaker identification and message identification in speech recognition", Phonetica 9: 193-199.
HERNÁNDEZ, L.A.- CASAJÚS, F.J.- GARCÍA GÓMEZ, R. (1984) "Identificación de personas por sus voces", Mundo electrónico 146: 83-91.
HERNANDO, J.- GARCÍA, C.- RODRÍGUEZ, L.- GONZÁLEZ, J.- ORTEGA, J. (2000) "Reconocimiento del locutor en telefonía: actividades del proyecto europeo COST250", in ORTEGA GARCÍA, J. (Ed.) SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense. Universidad Politécnica de Madrid, Escuela Universitaria de Ingeniería Técnica de Telecomunicación, Madrid, 5-6 de octubre de 2000. Madrid: EUIT de Telecomunicación. pp. 145-148.
LAVER, J.- JACK, M.- GARDINER, A. (Eds.) (1990) Proceedings of the tutorial and research workshop on Speaker Characterization in Speech Technology. Edinburgh, 26-28 June 1990. Edinburgh: Centre for Speech Technology Research, University of Edinburgh - ESCA, European Speech Communication Association.
LEUNG, K.Y.- MAK, M.W.- SIU, M.H.- YUNG, S.Y. (2006) "Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification", Speech Communication 48, 1: 71-84.
http://dx.doi.org/10.1016/j.specom.2005.05.013
LINDBERG, J.- BLOMBERG, M.- MELIN, H. (1997) "CAVE - Speaker verification in bank and telecom services", Phonum 4 (Fonetik 97, Umeå University, Sweden, May 28-30, 1997): 65-68.
MINEMATSU, N. - SEKIGUCHI, M. - HIROSE, K. (2002) "Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers", in ICASSP 2002. Proceedings of the 2002 IEEE international conference on acoustics, speech and signal processing. 13 – 17 May, 2002. Orlando, Florida, USA. Vol 1, pp. 137-140.
MINEMATSU, N.- SEKIGUCHI, M.- HIROSE, K. (2002) "Performance Improvement in Estimating Subjective Ageness with Prosodic Features", in Speech Prosody 2002. An International Conference. Aix-en-Provence, France, 11-13 April 2002.
http://aune.lpl.univ-aix.fr/sp2002/pdf/minematsu-etal.pdf
NOLAN, F.- SCHERER, K. (2000) "Speaker verification with elicited speaking styles in the VeriVox project", Speech Communication 31, 2-3: 121-130.
ORTEGA GARCÍA, J.- CRUZ LLAMAS, S.- GONZÁLEZ RODRÍGUEZ, J. (1998) "Quantitative influence of speech variability factors for automatic speaker verification in forensic tasks", in ICSLP 98 Conference Proceedings CD-ROM. The 5th International Conference on Spoken Language Processing. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998. Rundle Mall: Causal Productions, 1998.
ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J.- MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "AHUMADA: A Large Speech Corpus in Spanish for Speaker Identification and Verification", in Proceedings of ICASSP-98. IEEE International Conference on Acoustics, Speech and Signal Processing. May 1998. pp. 773-776.
ORTEGA GARCÍA, J.- GONZÁLEZ RODRÍGUEZ, J.- MARRERO AGUIAR, V.- DÍAZ GÓMEZ, J.J.- GARCÍA JIMÉNEZ, R.- LUCENA MOLINA, J.- SÁNCHEZ MOLERO, J.A.G. (1998) "Speaker recognition-oriented 'Ahumada' large speech corpus", in RUBIO, A.- GALLARDO, N.- CASTRO, R.- TEJADA, A. (Eds.) Proceedings of the First International Conference on Language Resources and Evaluation. May 28-30, 1998, Granada, Spain. European Language Resources Association. Vol. II. pp. 1101-1106.
ORTEGA, J.- GONZÁLEZ, J.- TAPIAS, D. (2000) "Consistencia fonética del español en sistemas de verificación de locutor sobre locuciones de corta duración tipo PIN", in ORTEGA GARCÍA, J. (Ed.) SEAF 2000. Actas del I Congreso de la Sociedad Española de Acústica Forense. Universidad Politécnica de Madrid, Escuela Universitaria de Ingeniería Técnica de Telecomunicación, Madrid, 5-6 de octubre de 2000. Madrid: EUIT de Telecomunicación. pp. 199-206.
RODRÍGUEZ, L.- DOCÍO, L.- GARCÍA, C. (1998) "Panorámica de la tecnología en reconocimiento automático de locutores", in GÓMEZ GUINOVART, J.- PALOMAR, M. (Coords.) (1998) Monografía: Lengua y Tecnologías de la Información. Novática, Revista de la Asociación de Técnicos de Informática 133 (Mayo-Junio): 36-40.
ROSE, P. (2006) "Technical forensic speaker recognition: Evaluation, types and testing of evidence", Computer Speech and Language 20, 2-3: 159-191.
http://dx.doi.org/10.1016/j.csl.2005.07.003
ROSENBERG, A.E. (1976) "Automatic speaker verification: a review", Proceedings of the IEEE 64, 4: 475-486.
SATUÉ, A.- FAÚNDEZ, M. (1999) "On the relevance of language in speaker recognition", in Eurospeech'99, 6th European Conference on Speech Communication and Technology. September 5-9, 1999, Budapest, Hungary.
http://www.isca-speech.org/archive/eurospeech_1999/e99_1231.html
SHRIBERG, E.- FERRER, L.- KAJAREKAR, S.- VENKATARAMAN, A.- STOLCKE, A. (2005) "Modeling prosodic feature sequences for speaker recognition", Speech Communication 46: 455-472.
http://dx.doi.org/10.1016/j.specom.2005.02.018
SUTHERLAND, A.- JACK, M. (1988) "Speaker Verification", in JACK, M.- LAVER, J. (Eds.) Aspects of Speech Technology. Edinburgh: Edinburgh University Press. pp. 184-215.
Language identification
ANTOINE, F.- ZHU, D.- BOULA DE MAREÜIL, P.- ADDA-DECKER, M. (2004) "Approches segmentales multilingues pour l'identification automatique de la langue : phones et syllabes", in JEP 2004. Journées d'Etude sur la Parole 2004. 19-22 avril 2004. Fès, Maroc.
http://www.limsi.fr/Individu/mareuil/publi/Antoine-Zhu-etal.pdf
BARKAT-DEFRADAS, M.- VASILESCU, I.- PELLEGRINO, F. (2003) "Stratégies perceptuelles et identification automatique des langues: application au continuum dialectal arabe", Revue Parole (Mons) 25-26: 1-44.
BARTKOVA, K.- JOUVET, D. (2004) "Ensemble élargi de phonèmes pour la reconnaissance de parole avec accents", in MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France. pp. 77-78.
http://www.limsi.fr/MIDL/actes/session%20III/Bartkova&Jouvet_MIDL2004.pdf
GEOFFROIS, E. (2004) "Identification automatique des langues: techniques, ressources et évaluations", in MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France. pp. 43-44.
http://www.limsi.fr/MIDL/actes/conference%20invitee%20I/Geoffrois_MIDL2004.pdf
MIDL 2004. Modélisations pour l'identification des langues et des variétés dialectales. 29-30 Novembre, 2004. Paris, France.
http://www.limsi.fr/MIDL/actes/
MUTHUSAMY, Y.K.- SPITZ, L. (1997) "Automatic Language Identification", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 314-317.
http://cslu.cse.ogi.edu/HLTsurvey/ch8node9.html
MUTHUSAMY, Y.K.- BARNARD, E.- COLE, R.A. (1994) "Reviewing Automatic Language Identification", IEEE Signal Processing Magazine, October 1994: 33-41.
RODRÍGUEZ FUENTES, L.J.- PENAGARIKANO, M.- VARONA, A.- DÍEZ, M.- BORDEL, G. (2010) "Overview of the Albayzín 2010 language recognition evaluation: Database design, evaluation plan and preliminary analysis of the results", in FALA 2010. VI Jornadas en Tecnología del Habla - II Iberian SLTech Workshop. Centro Social Caixanova, Vigo, Spain, 10-12 November 2010. pp. 309-315.
http://fala2010.uvigo.es/images/proceedings/pdfs/0070.pdf
ZISSMAN, M.A. - BERKLING, K.M. (2001) "Automatic Language Identification", Speech Communication 35, 1-2: 115-124.
Spoken language understanding
KOMPE, R. (1997) Prosody in Speech Understanding Systems. Berlin - New York: Springer (Lecture Notes in Artificial Intelligence, Vol. 1307; subseries of Lecture Notes in Computer Science).
MINKER, W. (1999) Compréhension automatique de la parole spontanée. Paris: L'Harmattan.
PRICE, P. (1997) "Spoken Language Understanding", in COLE, R.A.- MARIANI, J.- USZKOREIT, H.- ZAENEN, A.- ZUE, V. (Eds.) Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press. pp. 49-56.
http://cslu.cse.ogi.edu/HLTsurvey/ch1node10.html
SEGARRA, E. (2006) "La interpretación semántica", in LLISTERRI, J.- MACHUCA, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). pp. 99-118.
TUR, G.- DE MORI, R. (Eds.) (2011) Spoken Language Understanding: Systems for Extracting Semantic Information from Speech. Oxford - New York: John Wiley & Sons.
WANG, Y.-Y.- DENG, L.- ACERO, A. (2005) "Spoken language understanding", IEEE Signal Processing Magazine 22, 5: 16-31.
http://dx.doi.org/10.1109/MSP.2005.1511821
ZUE, V.W. (1991) "From signals to symbols to meaning: On machine understanding of spoken language", in Actes du XIIème Congrès International des Sciences Phonétiques. 19-24 août 1991, Aix-en-Provence, France. Aix-en-Provence: Université de Provence, Service des Publications. Vol. 1, pp. 74-83.
Speech Recognition - Bibliography
Joaquim Llisterri, Universitat Autònoma de Barcelona
http://liceu.uab.cat/~joaquim/speech_technology/tecnol_parla/recognition/refs_reconeixement.html
Last modified: 14/11/11 18:04
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.