Speech analysis and transcription software


Tools for the acoustic analysis of speech

Aneto, Universitat Politècnica de Catalunya
Anvil, DFKI, German Research Center for Artificial Intelligence
Audiamus
CECIL, CCS Software Development
CSL, Computerized Speech Lab, Kay Elemetrics
CSLU Toolkit, Center for Spoken Language Understanding, Oregon Graduate Institute
ELAN, Max Planck Institute for Psycholinguistics
GIPOS, Institute for Perception Research, Eindhoven
ISA, Intelligent Speech Analyser, Oy Pitchsystems Ltd.
lingWaves, LingCom
Macquirer / PCquirer, Scicon R&D, Inc.
MATLAB Signal Processing Toolbox, The MathWorks, Inc.
MES Signaix, Laboratoire Parole et Langage, Aix-en-Provence
ONZE Miner, Linguistics Department, University of Canterbury, New Zealand
Phonédit, Multimedia Signal Editor and Analyser, Laboratoire Parole et Langage, Université de Provence
PitchWorks, Scicon R&D, Inc.
Praat, Institute of Phonetic Sciences, University of Amsterdam
Prosogram, P. Mertens, Department of Linguistics, KU Leuven
SFS/RTGRAM, Department of Phonetics and Linguistics, University College London
SFS, Speech Filing System, Department of Phonetics and Linguistics, University College London
Signalyze, InfoSignal Inc.
Sona, IKP, Institute for Communications Research and Phonetics, University of Bonn
Sonogram Visible Speech, C. Lauer
SoundIndex, M. Jacobson
SoundScope, GW Instruments
Speech Analyzer, CCS Software Development
Speech Studio, Laryngograph Ltd.
Transana, University of Wisconsin
Transcriber, C. Barras, LIMSI, CNRS - E. Geoffrois, DGA, CTA, GIP
WaveSurfer, Centre for Speech Technology, KTH
WEDW Edit Waveform Program, Speech Research Lab, University of Delaware - A.I. duPont Hospital for Children
Winpitch, Pitch Instruments Inc.
WinSnoori, BaBel Technologies
xassp, Christian-Albrechts Universität

Aneto, Grup de Tractament de la Veu, Universitat Politècnica de Catalunya

http://gps-tsc.upc.es/veu/soft/soft/aneto.php3

“Aneto is a software application developed in the context of Text-to-Speech research at the UPC that runs on Windows NT, Windows 95 and Windows 98.

Aneto can:

Open, visualise, record and play speech files
Analyse voice fundamental frequency and derive a pitch contour
Modify the stylised contour and synthesise a signal with the new prosody
Label relevant points in the speech signal”

Anvil, M. Kipp, DFKI, German Research Center for Artificial Intelligence

http://www.anvil-software.org

“Anvil is a free video annotation tool, used at research institutes world-wide. It offers frame-accurate, hierarchical multi-layered annotation driven by user-defined annotation schemes. The intuitive annotation board shows color-coded elements on multiple tracks in time-alignment. Special features are cross-level links, non-temporal objects and a project tool for managing multiple annotations. Originally developed for Gesture Research, Anvil has also proved suitable for research in Human-Computer Interaction, Linguistics, Ethology, Anthropology, Psychotherapy, Embodied Agents, Computer Animation and many other fields.

Anvil can import data from the widely used, public domain phonetic tools PRAAT and XWaves which allow precise and comfortable speech transcription. Anvil can display waveform and pitch contour. Anvil's data files are XML-based. Special ASCII output can be used for import in statistical toolkits like SPSS. The Anvil system is written in Java and should run on Windows, Macintosh and Unix (Solaris/Linux) computers.”

Audiamus, N. Thieberger

http://languages-linguistics.unimelb.edu.au/thieberger/audiamus.htm

“A tool for building corpora of linked transcripts and digitised media.

Audiamus instantiates the links to digitised media. It requires no segmentation of the sound/video file. Currently there is no limit to the size of the media file or the number of transcripts. Each 'card' of the current model represents a single transcript (typically a complete side of a cassette). Time-aligned transcripts, as produced for example by SoundIndex or Transcriber, are the input for Audiamus.

The transcripts in Audiamus are plain text and can be edited, as can the timecodes. Thus the data in Audiamus is the master copy of the transcript that is improved incrementally with use. To avoid the problem of data being locked up in proprietary formats there is a mass export function that dumps all linked text and timecodes to plain text files, or to whatever format the user selects.”

CECIL, CCS Software Development

WinCECIL
http://www.sil.org/computing/catalog/show_software.asp?id=65

MacCECIL
http://www.sil.org/computing/speechtools/softdev2/Cecil2/CECdownloads2.htm

“WinCECIL is a speech analysis tool based on the DOS CECIL version 2.1 program. WinCECIL provides support for recording, analyzing, and saving 3-second sections of speech. WinCECIL requires a 20 MHz 80386 computer or better running Microsoft Windows 3.1 or higher. It also requires a Windows Multimedia-compatible sound card.

Use this program to view speech recordings, automatic pitch contours, and spectrograms. Recording limit is 3 seconds.

Most of the functions of the WinCECIL program have been superseded by the Speech Analyzer program.”

“MacCECIL is a speech analysis tool based on the Windows WinCECIL version 2.1 program. MacCECIL is designed for use on Mac computers.

MacCECIL provides support for recording, analyzing, and saving of speech. Use this program to view speech recordings, automatic pitch contours, and spectrograms. Recording limit is 3 seconds.”

CSL, Computerized Speech Lab, Kay Elemetrics

http://www.kayelemetrics.com/index.php?option=com_product&Itemid=3&controller=product&task=learn_more&cid[]=73

“CSL is the most comprehensive PC-based system available for speech acquisition, analysis, editing, and playback. An integrated hardware/software system, the versatile platform is recognized internationally by both clinicians and researchers for its unique combination of sophistication, flexibility, and ease-of-use.

The system's robust hardware meets the rigorous specifications required by speech professionals and researchers. It contains an external module for high-fidelity data acquisition (>86 dB dynamic range), DSP circuitry for real-time processing/display of speech parameters needed for therapy applications, and CD-quality playback for critical listening tasks. The core software is fully integrated with the hardware. It contains a rich set of easily applied analysis and editing features and is complemented by 15 application-specific (e.g., clinical, linguistic, etc.) software modules and databases.

Built on Kay's decades of experience in speech analysis, the CSL accommodates the many and varied needs of speech/voice clinicians, phoneticians, speech scientists, phoniatricians, and otolaryngologists. CSL was developed jointly with Speech Technology Research (STR) of Victoria, B.C., Canada.

Current CSL options include:

Synthesis Program
Multi-Dimensional Voice Program (MDVP)
Voice Range Profile
Sona-Match
Delayed Auditory Feedback
Real-Time Spectrogram
CSL-Pitch1
Phonetic Database
Palatometer Database
IPA Transcription Tutorial
Disordered Voice Database
EGG Processing
Motor Speech Profile (MSP)
Signal Enhancement in Noise
Auditory Perception Program and Database (APP)
Condenser Microphone
DAT Interface and Four-Channel Input
Direct-to-Disk
Programmer’s Kit”

CSLU Toolkit, Center for Spoken Language Understanding, Oregon Graduate Institute

http://www.cslu.ogi.edu/toolkit/index.html

“The CSLU Toolkit has been supporting research, development and learning activities for spoken language systems since January 1996. It is designed to support a wide range of research activities, including data capture and analysis, corpus development, research in multilingual recognition and understanding, dialogue design, speech synthesis, speaker recognition and language recognition, among others. In addition, the Toolkit provides easy-to-use graphical authoring tools (CSLUrp) for rapid prototyping of spoken language systems for useful applications. Finally, the toolkit is designed to provide a good environment for learning about spoken language technology. The Toolkit has been used to teach short courses, and students taking these courses have produced novel and useful spoken language systems, as described on our short-course page.

The Toolkit currently runs on Unix platforms which have Tcl/Tk (freely available).

The Toolkit is available free of charge for non-commercial use. The license agreement that you accept before downloading the Toolkit says basically that you won't give the Toolkit away or profit from it financially.”

ELAN, EUDICO Linguistic Annotator, Max Planck Institute for Psycholinguistics

ELAN - Linguistic annotator. Language archiving technology portal [Computer Software]. Nijmegen: Max Planck Institute for Psycholinguistics. Retrieved from http://tla.mpi.nl/tools/tla-tools/elan/

“ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video and audio data. It was developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, with the aim to provide a sound technological basis for the annotation and exploitation of multi-media recordings. ELAN is specifically designed for the analysis of language, sign language, and gesture, but it can be used by everybody who works with media corpora, i.e., with video and/or audio data, for purposes of annotation, analysis and documentation.
ELAN supports:
display of speech and/or video signals, together with their annotations
time linking of annotations to media streams
linking of annotations to other annotations
unlimited number of annotation tiers as defined by the users
different character sets
export as tab-delimited text files
import and export between ELAN and Shoebox
search options.”

GIPOS, Institute for Perception Research, Eindhoven

http://www.hum.uu.nl/uilots/lab/resources.php

“GIPOS stands for Graphical Interactive Processing of Speech. It is an integrated speech processing program. It provides the tools you need to create, view, play and manipulate waveforms, spectrograms and other forms of speech data. You'll find:

fast spectrogram displays
real-time spectrum displays
on-line pitch measurements (SHS, PDT)
capabilities for labeling of waveform files
modification of pitch and duration contours using PIOLA, PSOLA or LPC
LPC-parametric file manipulations
waveform editing (cut, copy, paste, add, scale, fade, reverse, etc.)
filtering (Low-pass, high-pass, bandpass, band stop, etc.)
waveform recording with real-time sample-rate conversion option
sound compression (PCM, ADPCM, CELP, VSELP, LPC, MPEG)
simultaneous processing of multiple files (up to 15)
synchronous zoom & scroll in all charts
high level of interaction
different tasks can be performed in parallel, e.g. while making a recording, turn on the spectrogram display, zoom in on a specific region, and play and save a segment of the recorded part, all at the same time
iconic help cues, context-dependent menus and other graphical user-interface features
command shell for more complex operations and for running in batch mode
undo/redo for the last 1000 operations
user macro definitions for function keys
all key combinations are user configurable
users can write their own programs to extend the standard functionality
runs on different platforms: Silicon Graphics, Sun and PC (Windows 95 and Linux only, for the time being).”

ISA, Intelligent Speech Analyser, Oy Pitchsystems Ltd.

“The main scopes of application include:
Phonetics
Phoniatrics
Logopedics
Audiology
Speech Analysis
Sound Analysis
Singing Analysis
Music Analysis
Music Instrument Analysis
Research on Children's Crying
Research on Lung Sounds and Heart Sounds
Good Radio Voice Analysis
Sound Editing
All the analysis programs have been written in machine language, which makes ISA many times faster than it would be in a high-level language. ISA is unique software. It is very simple to use: every analysis has its own window, all functions are controlled by the mouse, and all displays can be listened to. ISA runs on Apple Macintosh computers.”

lingWaves, LingCom

“lingWAVES is a modern tool for analyzing technical signals on the PC, mainly speech and video recordings. The program offers numerous long-time and short-time analyses (time signal, FFT, fundamental frequency, spectrogram...) and is easy to handle.

Modules:

Processing: WAV-Import, AVI-Import, Record, WAV-Export, Play
Long time analysis: Spectrogram (Standard); Fundamental frequency (Standard); Energy (Standard); Jitter; Shimmer
Short time analysis: FFT (Standard), AMDF, Cepstrum, LPC, Autocorrelation
Tools: Label, Calibration
Real time: Phonetogram”
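
lingWAVES, like SoundScope, Phonédit and Praat elsewhere in this list, reports jitter and shimmer, but the vendor text does not say which formula it uses. As a minimal sketch, assuming the common "local" definitions (as documented for Praat), jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and shimmer is the same ratio computed on per-period peak amplitudes. The function names are illustrative, and the periods and amplitudes are assumed to have been measured already by a pitch tracker.

```python
from statistics import mean


def local_jitter(periods):
    """Local jitter: mean |T[i+1] - T[i]| divided by the mean period.

    Assumption: `periods` are consecutive period durations (seconds)
    already extracted by a pitch tracker; extraction is not shown here.
    """
    if len(periods) < 2:
        raise ValueError("need at least two consecutive periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """Local shimmer: mean |A[i+1] - A[i]| divided by the mean peak amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two consecutive amplitudes")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


# A perfectly regular voice has zero jitter; small period changes do not.
print(local_jitter([0.005, 0.005, 0.005, 0.005]))  # 0.0
print(f"{local_jitter([0.0050, 0.0051, 0.0050]) * 100:.2f} %")
```

Tools often report these values as percentages; other variants (e.g. averaging over three or five periods) exist, so figures from different programs are not necessarily comparable.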

MacquirerX / PCquirerX, Scicon R&D, Inc.

http://www.sciconrd.com/macquirerx.aspx

http://www.sciconrd.com/pcquirerx.aspx

“PCquirer & Macquirer features include:

The same “LOOK-N-FEEL” between PCquirer & Macquirer with complete file interchangeability.
Complete waveform editing for single- and multi-channel data (data captured by the X16 series).
PCquirer reads CSL, WAVES file formats directly.
Macquirer reads CSL, AIFF file formats directly.
Unmatched, high quality spectrograms.
FFT/LPC, Intensity.
Pitch records of a quality otherwise reproduced only by workstation-powered systems.
Complete labeling systems on main, spectrogram and pitch views.
Automatic Log Entry system with full online editing capability for addition of comments and other experiment related notes.
Direct printing onto high resolution laser printers as high as 2400 DPI.
Ability to save each window as bitmap (PC) and PICT (Mac) files for direct entry into word processors.
Fully complies with the Windows (Win95/98/NT) and Mac (Power PC) operating system environments.
Full online help files for both PCquirer and Macquirer.”

MATLAB Signal Processing Toolbox, The MathWorks, Inc.

http://www.mathworks.es/products/signal/index.html

“The Signal Processing Toolbox provides a rich, customizable framework for digital signal processing (DSP). Built on a solid foundation of filter design and spectral analysis techniques, the toolbox contains powerful tools for algorithm development, signal and linear system analysis, and time-series data modeling. The toolbox is useful in applications such as speech and audio processing, communications, geophysics, real-time control, finance, radar, and medicine.

Signal and linear system models:

Digital and analog filter design, analysis, and implementation
FFT, DCT, and other transforms
Spectrum estimation and statistical signal processing
Parametric time-series modeling
Waveform generation
Windowing”

MES Signaix, Laboratoire Parole et Langage, Aix-en-Provence

http://aune.lpl-aix.fr/ext/projects/mes_signaix.htm/

“Mes Signaix is a package for speech processing for Solaris 2.5.

The tool mes is for observing the signal; signaix is a set of speech processing tools and other utilities.

The mes/signaix package was developed to display, label and process speech signals, according to the "Software Lego" principles. That is, it was designed as a toolbox composed of a series of subtools, each devoted to solving a unique, specific problem. This package has grown out of a number of Unix-based speech analysis tools that have been developed over a period of several years to assist in phonetic research.

mes is the signal display/label tool; signaix is a set of speech processing tools and graphical utilities. Each tool is a shell-level command, which may be run independently. As much as possible, non-graphical tools are designed as 'Unix filters'.

Main tools:

graphic tools: 2-D and 3-D ('sona' like) display, electropalatogram display
signal processing: RMS energy, zero-crossing rate, spectral analysis (FFT, LPC spectrum, sonagram), filtering, pitch detection (several methods) and modelling, pitch and/or speech-rate modification.
A free version of mes_signaix can be downloaded to your system for non-commercial, non-military purposes (see our user agreement). This version has been developed on Sun Workstations under Solaris 2.4 and has been tested under Solaris 2.4, 2.5 on Sun Workstations and PC 486s.”
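
Two of the signaix measures listed above, RMS energy and zero-crossing rate, are simple frame-based computations. The following is a generic sketch, not the mes/signaix code: the frame length, hop size, the dropping of a final partial frame, and the convention that a zero sample counts as non-negative are all assumptions chosen for illustration. A 'Unix filter' version would read samples from standard input; here they are passed as a Python list.

```python
import math


def frames(samples, frame_len, hop):
    """Split samples into frames of frame_len, advancing by hop samples.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (0 counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


# 10 ms frames with a 5 ms hop at 16 kHz (illustrative analysis settings).
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 10)]
for f in frames(tone, frame_len=160, hop=80)[:3]:
    print(f"rms={rms_energy(f):.3f}  zcr={zero_crossing_rate(f):.3f}")
```

High zero-crossing rate with low RMS energy is the classic cue for unvoiced frication, while voiced vowels show the opposite pattern.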

ONZE Miner, R. Fromont - J. Hay, Linguistics Department, University of Canterbury, New Zealand.

http://onzeminer.sourceforge.net/

“ONZE Miner is essentially a database for time-aligned transcripts of audio recordings. Time-aligned transcripts are produced using Transcriber, which creates an XML document lining up the transcript text with the corresponding location in the audio recording. The transcript is then uploaded to ONZE Miner, which allows additional information about the speakers and the transcripts to be stored.

When the speakers have been selected, their utterances can be searched for text or regular expressions.

This returns a list of all of the utterances from the selected transcripts which match the query.

Alternatively, clicking on an utterance returned by the search produces the full transcript for the speaker involved, positioned with the relevant utterance at the top of the screen. Any part of the transcript can be clicked on, and listened to, if the audio media are available.

Clicking on the Praat icon to the left of any given utterance opens that utterance in Praat acoustic analysis software, so that its acoustic properties can be inspected. In addition, a Praat TextGrid for the transcript can be generated.”
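
ONZE Miner can generate a Praat TextGrid from a time-aligned transcript. The sketch below is not ONZE Miner's code; it assumes Praat's long ("ooTextFile") TextGrid text format with a single interval tier, and the function name and the input layout (a list of (start, end, text) tuples that tile the recording without gaps) are hypothetical.

```python
def _praat_string(text):
    # Praat writes a double quote inside a string by doubling it.
    return '"' + text.replace('"', '""') + '"'


def interval_tier_textgrid(xmin, xmax, tier_name, intervals):
    """Return a one-tier TextGrid in Praat's long text format.

    `intervals` is a list of (start, end, text) tuples that must tile
    [xmin, xmax] without gaps or overlaps, as Praat interval tiers require.
    """
    edges = [xmin] + [end for _, end, _ in intervals]
    starts = [start for start, _, _ in intervals]
    if not intervals or starts != edges[:-1] or edges[-1] != xmax:
        raise ValueError("intervals must cover [xmin, xmax] contiguously")

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {_praat_string(tier_name)}",
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f"            text = {_praat_string(text)}",
        ]
    return "\n".join(lines) + "\n"


grid = interval_tier_textgrid(
    0.0, 1.8, "utterance",
    [(0.0, 0.4, ""), (0.4, 1.5, 'she said "yes"'), (1.5, 1.8, "")],
)
print(grid)
```

Silent stretches are represented as intervals with empty text, since an interval tier cannot have gaps. When saving the result to disk, an explicit text encoding (for example UTF-8) avoids surprises with phonetic symbols.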

Phonédit Multimedia Signal Editor and Analyser, Laboratoire Parole et Langage, Université de Provence

http://aune.lpl-aix.fr/~lpldev/phonedit/

“PHONÉDIT is a signal editor for recording, editing, labelling and analysing various types of signals. This software is dedicated to speech analysis. However, it can also analyse aerodynamic parameters, electropalatographic frames and kinesiographic movements.

It reads and writes the most common file formats, such as MS-WAVE, CSL, Signalyze, ASCII, or raw binaries.

Many functions are applicable on the edited signals:

FFT : fast Fourier transform analysis, synchronous with the cursor movements.
Spectrogram : 3D spectral representation of a signal. Supports narrow and wide band analysis.
F0 detection : two methods are proposed for extracting the fundamental frequency from a speech signal. The comb algorithm based on a spectral analysis, and the AMDF algorithm based on a temporal analysis.
LPC : Linear Predictive Coding, for formant studies.
RMS intensity : dB or linear, selection of the integration time.
Statistics : mean, standard deviation, Jitter/Shimmer extraction
PHONÉDIT also integrates a limited spreadsheet for data storage that can communicate with other applications such as Microsoft Excel, Word, Access or PowerPoint.

PHONÉDIT is a standard multi-document application for 16-bit Windows which supports: drag and drop, printing, copy/paste, context menus, data security, data recording/listening.

PHONÉDIT runs on all PC-compatible computers.”
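
Phonédit's AMDF method is named but not specified. As a generic illustration, not Phonédit's implementation, the average magnitude difference function of a voiced frame, D(τ) = (1/(N − τ)) Σ |x[n] − x[n + τ]|, dips close to zero at the pitch period and its multiples. The sketch below searches an assumed 75 to 500 Hz range and returns the first local minimum that is nearly as deep as the global one; the range, the 10 % threshold and the function names are assumptions.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0, threshold=0.1):
    """Estimate F0 (Hz) of a voiced frame from the first deep AMDF minimum.

    A local minimum counts as "deep" if it lies within `threshold` times the
    AMDF's range above the global minimum; taking the first such dip avoids
    returning a multiple of the true period (an octave error).
    """
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 2)
    if lag_max <= lag_min:
        raise ValueError("frame too short for the requested F0 range")
    lags = list(range(lag_min, lag_max + 1))
    values = [amdf(frame, lag) for lag in lags]
    lowest, highest = min(values), max(values)
    limit = lowest + threshold * (highest - lowest)
    for k in range(1, len(values) - 1):
        if values[k] <= limit and values[k] <= values[k - 1] and values[k] <= values[k + 1]:
            return sample_rate / lags[k]
    # No interior minimum qualified: fall back to the global minimum.
    return sample_rate / lags[values.index(lowest)]


sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(amdf_f0(frame, sr))  # 200.0
```

Because the lag is an integer number of samples, the estimate is quantized (at 8 kHz, lags 39, 40 and 41 correspond to about 205, 200 and 195 Hz); practical trackers interpolate around the minimum and add a voicing decision.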

PitchWorks, Scicon R&D

http://www.sciconrd.com/pitchworks.aspx

“PitchWorks is the next standard for pitch, labeling, and other intonation related studies.

PitchWorks uses the most sophisticated cepstrum-based pitch-tracking engine along with the best spectrographic display to produce a series of displays for inspection.

The cursor for the main window of audio, labels, and pitch is fully linked with the spectrogram window. The selection in one translates into the other, for more accurate measurement.

The log file keeps track of all the label information entries in the background. Because it is a plain-text file, it can be imported into other database programs such as Excel.

PitchWorks reads a wide variety of file formats, including: Xwaves and ESPS, NIST - LDC, Waves, CSL(nsp), Generic, ASCII, AIFF...

PitchWorks runs on both PC and Mac, with all files fully interchangeable.”

Praat, P. Boersma - D. Weenink, Institute of Phonetic Sciences, University of Amsterdam

Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer (version 5.3). [Computer Software] Amsterdam: Department of Language and Literature, University of Amsterdam. Retrieved from http://www.praat.org/

“The computer program Praat is a research, publication, and productivity tool for phoneticians.

This comprehensive speech analysis, synthesis, and manipulation package includes general numerical and statistical stuff, is built on a general-purpose GUI shell for handling objects, and produces publication-quality graphics.

Speech analysis:

spectral analysis (spectrograms)
pitch analysis
formant analysis
intensity analysis
jitter, shimmer, voice breaks
cochleagram
excitation pattern

Speech synthesis:

from pitch, formant, and intensity
articulatory synthesis

Listening experiments:

identification and discrimination tests

Labelling and segmentation:

label intervals and time points on multiple tiers
use phonetic alphabet
use sound files up to 2 gigabytes (3 hours)

Speech manipulation:

change pitch and duration contours
filtering

Learning algorithms:

feedforward neural networks
discrete and stochastic Optimality Theory

Statistics:

multidimensional scaling
principal component analysis
discriminant analysis

Graphics:

high quality for your articles and thesis
produce Encapsulated PostScript files
integrated mathematical and phonetic symbols

Programmability:

easily programmable scripting language
communicate with other programs
(the sendpraat source code)
create hypertext manuals with sound I/O

Portability:

machine-independent binary files
read and write many sound and other file types

Configurability:

grow or shrink menus
save prefs for fonts, views, sound devices

Versions for Macintosh, Windows, Linux, FreeBSD, SGI, Solaris, HPUX”

Boersma, P., & Weenink, D. (2011). Praat: Doing phonetics by computer. [Computer Software] Amsterdam: Department of Language and Literature, University of Amsterdam. Retrieved from http://www.praat.org/

Scripts for Praat

Praat script resources, UCLA Phonetics Lab, University of California, Los Angeles.

C. Auran, Momel and Intsint, Praat scripts, Laboratoire "Savoirs, Textes, Langage", Université Charles de Gaulle - Lille3.

K. Crosswhite, Praat Scripts and Other Materials, Center for the Sciences of Language, University of Rochester.

C. Darwin, Praat scripts, Laboratory of Experimental Psychology, University of Sussex.

C. De Looze, Praat, Speech Communication Lab, Trinity College Dublin.

V. Dellwo, Praat Script Page, Phonetisches Laboratorium, Universität Zürich.

S. Grawunder, Praat scripts, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig.

S. Kawahara, Praat scripts, The Institute of Cultural and Linguistic Studies, Keio University.

M. Lennes, SpeCT - The Speech Corpus Toolkit for Praat, Department of Modern Languages, University of Helsinki.

H. Loevenbruck, Quelques scripts sous Praat, Laboratoire de Psychologie et Neurocognition, Université Pierre Mendès-France.

B. Remijsen, Praat scripts, Linguistics and English Language, University of Edinburgh.

S. Sadowsky, Recursos de Praat, Universidad de la Frontera, Temuco.

J. Toscano, Praat Script Archives, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign.

P. Welby, Praat Info, Laboratoire Parole et Langage CNRS - Université de Provence.

Y. Xu, Praat scripts for f0 analysis, Department of Phonetics and Linguistics, University College London.

D. Gibbon, Format conversion tools for Praat, Transcriber and TASX Annotator, Computational Linguistics and Spoken Language Group, University of Bielefeld.

B. Plichta, AKUSTYK for Praat, Saint Paul, Minnesota.

Praat manuals, tutorials and support

Praat manual, P. Boersma & D. Weenink, University of Amsterdam.

Introductory tutorial to Praat, P. Boersma & D. Weenink, University of Amsterdam.

Praat tutorial and resources, J.-P. Goldman, Université de Genève.

Manual para el análisis acústico, D. Román, Pontificia Universidad Católica de Chile.

Tutoriales sobre Praat, S. Sadowsky, Universidad de la Frontera, Temuco.

Manual del Praat, Laboratori de Fonètica, Universitat de Girona.

Praat tutorial, P. Welby & K. Ito, Ohio State University.

Tutorial for self study: basics of phonetics and how to use Praat, C. Welker & H. Quené, Utrecht University.

Praat for beginners, swphonetics, S. Wood.

Praat tutorial, K. Yoon, Ohio State University.

Praat User's Group

Prosogram, P. Mertens, Department of Linguistics, KU Leuven

http://bach.arts.kuleuven.be/pmertens/prosogram/

“Transcription of prosody using pitch contour stylization based on a tonal perception model and automatic segmentation

Processing steps:

Calculate acoustic parameters: F0, intensity, voicing (V/UV).
Obtain a segmentation into units of the types indicated above. Select the relevant units (e.g. vowels, syllables). Select the voiced portion of these units, that has sufficient intensity/loudness (using difference thresholds relative to the local peak).
Stylize the F0 of the selected time intervals.
Determine pitch range used in speech fragment. Plot stylized pitch and some annotation tiers (text, phonetic transcription). Use a musical (semitone) scale and add calibration lines at every 2 ST for easy interpretation of pitch intervals.

The system is implemented as a Praat script.”
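
Prosogram plots pitch on a musical (semitone) scale with calibration lines every 2 ST. The underlying conversion is ST = 12 · log2(f / f_ref), so a fixed interval in semitones corresponds to a fixed frequency ratio. The sketch below shows only that arithmetic; it is not Prosogram code (Prosogram is a Praat script), and the 1 Hz default reference and the function names are assumptions.

```python
import math


def hz_to_semitones(f_hz, ref_hz=1.0):
    """Pitch in semitones relative to ref_hz: 12 * log2(f / ref)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz=1.0):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (st / 12.0)


def calibration_lines(f_low, f_high, step_st=2.0, ref_hz=1.0):
    """Frequencies (Hz) of guide lines every step_st semitones within [f_low, f_high]."""
    k = math.ceil(hz_to_semitones(f_low, ref_hz) / step_st)
    lines = []
    while True:
        f = semitones_to_hz(k * step_st, ref_hz)
        if f > f_high:
            break
        lines.append(f)
        k += 1
    return lines


# Guide lines for a speaker whose F0 spans roughly 80 to 300 Hz (illustrative values).
print([round(f, 1) for f in calibration_lines(80.0, 300.0)])
```

The interval between two pitches in semitones does not depend on the reference, so intervals read off the plot are speaker-independent even though absolute ST values are not.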

SFS, Speech Filing System, University College London

http://www.phon.ucl.ac.uk/resource/sfs/

“SFS provides a computing environment for conducting research into the nature of speech. It comprises software tools, file and data formats, subroutine libraries, graphics, standards and special programming languages. It performs standard operations such as acquisition, replay, display and labelling, spectrographic and formant analysis and fundamental frequency estimation.

Analysis programs:

Acquisition and replay
Waveform processing
Laryngographic processing
Fundamental frequency estimation (from SP or from LX)
Formant frequency estimation
Formant synthesis
Spectrographic analysis
Filterbank analysis/synthesis
Resampling
Speed/pitch changing
Annotation
Spectral cross-sections
Waveform envelope
Filtering
Signal editing
Signal alignment
SFS is copyright University College London, but is currently supplied free of charge to research establishments for non-profit use. SFS is supplied as is with no warranty or support.

Operating environments:
WIN32: Microsoft Visual C, WIN32 API. Windows 95/98/NT/2000.
Unix: GNU gcc compiler and X-Windows. SunOs, Solaris, Linux, etc.
MSDOS: Protected mode 32-bit with GNU compiler DJGPP. ”

SFS/RTGRAM, Department of Phonetics and Linguistics, University College London

http://www.phon.ucl.ac.uk/resource/sfs/rtgram/

“RTGRAM is a free program for displaying a real-time scrolling spectrographic display of an audio signal. With RTGRAM you can monitor the spectro-temporal characteristics of sounds being played into the computer's microphone or line input ports. RTGRAM is optimised for speech signals and has options for different sampling rates, analysis bandwidths, temporal resolution and colour maps.”
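
Each column of a scrolling spectrogram such as RTGRAM's is a short-time magnitude spectrum of one windowed frame of the signal. The sketch below computes one such column in dB; it uses a naive DFT for readability (real tools use an FFT), and the Hamming window and the dB floor are assumptions, not RTGRAM's documented settings.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrum_column_db(frame, floor_db=-120.0):
    """Magnitude spectrum in dB of one Hamming-windowed frame, bins 0..N/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    column = []
    for k in range(n // 2 + 1):
        # Naive DFT bin k; an FFT gives the same values much faster.
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mag = abs(acc)
        column.append(max(20 * math.log10(mag), floor_db) if mag > 0 else floor_db)
    return column


sr, n = 8000, 256
frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
col = spectrum_column_db(frame)
peak_bin = col.index(max(col))
print(peak_bin, peak_bin * sr / n)  # 32 1000.0
```

Bins are spaced sample_rate / frame_len apart, so a longer frame gives a narrower analysis bandwidth (finer frequency detail, as in narrow-band spectrograms) at the cost of temporal resolution, and a shorter frame gives the wide-band view that shows individual glottal pulses.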

Signalyze, LinguistList Plus Inc.

http://www.meijigakuin.ac.jp/~varden/other/signalyze.html

“Signalyze 3.0 is an interactive program for the analysis of speech and other acoustic material. It contains a large set of signal editing, signal analysis and signal manipulation tools.

Signalyze 3.0 runs (only) on Macintosh computers. Signalyze 3.0 runs on almost all versions of the Macintosh. With a Macintosh AV or a Power Macintosh, you have all you need to record, analyze and reproduce professional 16-bit sound.

Signalyze can be used to:

Make stimuli for perception experiments
Splice two signals together
Mix noise into a speech signal
Align dichotic stimuli

Perform interactive speech analysis

Make superb 256-grayscale or color spectrograms
Measure duration, frequency and amplitude
Perform pitch extractions
Slow down or speed up speech
Filter signals
Manipulate signals
Change the sampling frequency of a signal

Teach foreign languages

Record a sentence,
Replay the recording,
Show its fundamental frequency (intonation pattern)”

Sona, IKP, Institute for Communications Research and Phonetics, University of Bonn

“The program SONA is a versatile experimental tool for finding and visualizing relevant information in both the time and the frequency domain of a speech signal.

In the time domain, the program allows:

digital recording of speech of nearly unlimited length with 16-bit quantization
oscillographic representation with freely scaleable time and amplitude resolution
all kinds of signal manipulation (waveform editing)
reproduction of single or concatenated speech segments
measurement of their duration and intensity

Furthermore, the segments can be marked and transcribed phonetically (Labeling).

In the frequency domain (lower half of the screen), the program generates a digital spectral analysis of the speech signal in 2D or 3D. The 3D representation of the time-dependent power spectrum is known as Visible Speech or sonagram and is one of the most important practical tools of linguistics and phonetics. Sonagrams are represented in gray scale or colour coding in one of five frequency sections (0.5 to 8 kHz) with variable breadth. One mouse click enables the user to listen to a selected segment or measure frequency and intensity of its spectrum.

The program runs on normal PCs (486 and higher) equipped with a 16-bit Sound Blaster, extended memory and an ET 4000 graphics card.”

Sonogram, C. Lauer

http://www.christoph-lauer.de/Homepage/Sonogram.html

“Sonogram has been programmed at the German Research Center for Artificial Intelligence (DFKI), and is a tool to analyze speech and sound signals.”

Available for Windows, Mac OS X and Linux/Unix.

SoundIndex, M. Jacobson

http://michel.jacobson.free.fr/soundIndex/index.html

“SoundIndex is a tool that combines an XML structured-text editor with a sound editor. It lets you write <audio> tags at any level of the tree of an XML file, using the values read in the sound editor for the start and end attributes. The <audio> tags are interpreted by means of style sheets written in XSL.”
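
SoundIndex links XML text to audio through <audio> elements carrying start and end attributes, which can sit at any level of the tree. A minimal sketch with Python's standard xml.etree.ElementTree follows; placing <audio> as an empty child of the annotated element and writing times in seconds with millisecond precision are assumptions (the description does not specify them), the helper names are hypothetical, and the XSL rendering step is not shown.

```python
import xml.etree.ElementTree as ET


def attach_audio(element, start_s, end_s):
    """Add an <audio start=... end=...> child marking the element's time span.

    Assumption: times are seconds, written with millisecond precision.
    """
    if end_s <= start_s:
        raise ValueError("end must be after start")
    return ET.SubElement(element, "audio", start=f"{start_s:.3f}", end=f"{end_s:.3f}")


def audio_spans(root):
    """Collect (start, end) pairs of every <audio> element, in document order."""
    return [(float(a.get("start")), float(a.get("end"))) for a in root.iter("audio")]


doc = ET.fromstring("<text><p><s>first sentence</s><s>second sentence</s></p></text>")
sentences = doc.findall(".//s")
attach_audio(sentences[0], 0.0, 1.25)
attach_audio(sentences[1], 1.25, 2.5)
attach_audio(doc.find("p"), 0.0, 2.5)  # tags can sit at any level of the tree
print(ET.tostring(doc, encoding="unicode"))
print(audio_spans(doc))
```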


SoundScope, GW Instruments

http://www.gwinst.com/macsftwr/html/sos_summary.html

“SoundScope software digitizes, analyzes, presents and databases speech and sound waveforms on Macintosh computers.

SoundScope is a third-generation speech and sound analysis product line that represents a breakthrough in ease of use and advanced features. Record a sound, perform analysis, extract key values, and compute statistics, all with a few clicks of the mouse. Scroll through data, adjust the scale or display range, and even change the parameters for sound analysis computations.

Record, view, analyze, play, store & print sound waveforms.
See spectrograms in full color.
View fundamental frequency (Fo), jitter (pitch perturbation), shimmer (amplitude perturbation), frequency spectra (FFT), linear predictive coding (LPC), and much more.
Compute statistics such as percent voiced, percent unvoiced and percent silent.
Design your own instrument screen, no programming required.
Customize menus and displays.
Record and playback up to maximum CPU memory (e.g. record for 100 seconds at 22kSamples/sec with 4.5 MB of free memory).
Enter notes and observations into the integrated text editor.”
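Jitter and shimmer are both measures of cycle-to-cycle perturbation. A minimal sketch of one common definition, the "local" variant (mean absolute difference between consecutive values, divided by the mean value), follows. SoundScope's exact formulas are not given in the description above, and the period and amplitude values are hypothetical; in practice they would come from a pitch-period detector.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements for five consecutive voice cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(amplitudes)
print(f"jitter (local):  {jitter:.2%}")
print(f"shimmer (local): {shimmer:.2%}")
```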

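The recording limit quoted in the SoundScope description can be checked with a quick calculation, assuming 16-bit (2-byte) mono samples, which the description does not state:

```latex
\[
100\,\text{s} \times 22\,000\,\frac{\text{samples}}{\text{s}} \times 2\,\frac{\text{bytes}}{\text{sample}}
= 4\,400\,000\ \text{bytes} \approx 4.4\ \text{MB} < 4.5\ \text{MB}
\]
```

With 8-bit samples the same recording would need only about 2.2 MB.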

Speech Analyzer, CCS Software Development

http://www.sil.org/computing/sa/index.htm

“Speech Analyzer has been designed primarily for anyone doing investigative research into the phonetics of a language. It is a component of the Acoustic Speech Analysis Project, is one of the 5 Speech Analysis Tools, and currently runs on Microsoft Windows computers. It does not need any special hardware, but it does make use of any standard Windows-compatible sound card for the playback of recordings.

When used as a stand-alone program, Speech Analyzer is able to read and write standard Windows .WAV files. Additionally, it keeps user-supplied data, including the IPA phonetic transcription, inside a custom-defined region within the .WAV file. During normal operation Speech Analyzer provides the user with a digital waveform view of recorded speech signals. It can also present several other views of the speech signal, including Magnitude, Pitch, Spectra and color Spectrogram. The 5 levels of transcription (Phonetic, Pitch, Phonemic, Orthographic and Gloss) are time-aligned to the waveform and provide segmentation of the recording.

Waveform, Magnitude, Zero Crossing
CECIL Raw Pitch, Smooth Pitch Change, Automatic Pitch
Cepstral based Color Spectrogram, Spectrum, Formants
Vowel charts using F1, F2 and F3”
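The Zero Crossing view listed above counts how often the waveform changes sign. A minimal sketch of a per-frame zero-crossing rate follows; how Speech Analyzer computes and smooths its display is not described here, and treating a sample of exactly 0 as positive is an assumed convention. As a rough rule, voiced speech tends to give low rates and fricative noise high ones.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    A sample equal to 0 is counted as positive (an assumed convention;
    tools differ on this).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Usage: a low-frequency "voiced-like" sine versus a rapidly alternating
# "hiss-like" signal, both 400 samples long (hypothetical test signals).
voiced_like = [math.sin(2 * math.pi * 150 * n / 8000) for n in range(400)]
hiss_like = [0.1 * (-1) ** n for n in range(400)]
print(zero_crossing_rate(voiced_like), zero_crossing_rate(hiss_like))
```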


Speech Studio, Laryngograph Ltd.

http://www.laryngograph.com/pr_studio.htm

“Speech Studio is a software and hardware package which has been specially designed for phoneticians, speech scientists and quantitative work by ENT clinicians and SLTs. It supports data recording direct to hard disk, real-time displays, instantaneous quantitative analysis, and a pattern target mode for speech training.

Speech Studio software is Windows-based, user friendly, and feature rich.

Speech Studio also includes a very powerful program which can perform an extensive range of quantitative analyses on connected speech. It is seamlessly integrated with the data recording and display program. It can work on different kinds of speech pattern elements and produce powerful graph families. The speech elements include fundamental frequency, speech amplitude, contact quotient, nasality and friction.”


Transana, Wisconsin University

http://www.transana.org/

“Transana is designed to facilitate the transcription and qualitative analysis of video and audio data. It provides a way to view video or play audio recordings, create a transcript, and link places in the transcript to frames in the video. It provides tools for identifying and organizing analytically interesting portions of video or audio files, as well as for attaching keywords to those video or audio clips. It also features database and file manipulation tools that facilitate the organization and storage of large collections of digitized video.”


TranscriberAG

TranscriberAG. A tool for segmenting, labeling and transcribing speech. [Computer Software] Paris: DGA. Retrieved from http://transag.sourceforge.net/

“TranscriberAG is designed for assisting the manual annotation of speech signals. It provides a user-friendly graphical user interface (GUI) for segmenting long duration speech recordings, transcribing them, labeling speech turns, topic changes and acoustic conditions.

TranscriberAG is geared toward the needs of the speech research community, but its features might be found useful for other applications. It uses the Annotation Graph format as native format but can read a number of other annotation formats.

TranscriberAG is distributed as free software under the GNU General Public License GPLv3.”
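An annotation graph stores annotations as labelled arcs between nodes, where nodes may be anchored to time offsets in the signal; this lets several layers (turns, words, topics) share the same time points. The Python sketch below illustrates that general model only. The class and method names are hypothetical, and TranscriberAG's actual file format and internal API are not shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    id: str
    offset: Optional[float] = None  # time in seconds; None means unanchored


@dataclass
class Arc:
    start: str  # id of the start node
    end: str    # id of the end node
    type: str   # annotation layer, e.g. "turn" or "word"
    label: str


@dataclass
class AnnotationGraph:
    nodes: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)

    def add_node(self, node_id, offset=None):
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, start, end, type_, label):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("arc endpoints must be existing nodes")
        self.arcs.append(Arc(start, end, type_, label))

    def layer(self, type_):
        """Return (start_time, end_time, label) for all arcs of one layer."""
        return [(self.nodes[a.start].offset, self.nodes[a.end].offset, a.label)
                for a in self.arcs if a.type == type_]


# Usage: one speaker turn spanning two words; the turn and the words share nodes.
ag = AnnotationGraph()
ag.add_node("n0", 0.0)
ag.add_node("n1", 0.4)
ag.add_node("n2", 1.1)
ag.add_arc("n0", "n2", "turn", "speaker A")
ag.add_arc("n0", "n1", "word", "hello")
ag.add_arc("n1", "n2", "word", "world")
print(ag.layer("word"))
```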


WaveSurfer, Centre for Speech Technology, KTH

http://www.speech.kth.se/wavesurfer/

“WaveSurfer is an Open Source tool for sound visualization and manipulation. It has been designed to suit both novice and advanced users. WaveSurfer has a simple and logical user interface that provides functionality in an intuitive way and which can be adapted to different tasks. It can be used as a stand-alone tool for a wide range of tasks in speech research and education. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer can also serve as a platform for more advanced/specialized applications. This is accomplished either through extending the WaveSurfer application with new custom plug-ins or by embedding WaveSurfer visualization components in other applications.

Multi-platform - Linux, Windows 95/98/NT/2K/XP, Macintosh, Sun Solaris, HP-UX, FreeBSD, and SGI IRIX
Flexible interface - handles multiple sounds
Common sound file formats - reads and writes WAV, AU, AIFF, MP3, CSL, SD, Ogg/Vorbis, and NIST/Sphere
Transcription file formats - reads and writes HTK (and MLF), TIMIT, ESPS/Waves+ and Phondat. Support for encodings and Unicode
Unlimited file size - playback and recording directly from/to disk
Sound analysis - e.g. spectrogram and pitch analysis
Customizable - users can create their own configurations. Localization support
Extensible - new functionality can be added through a plugin architecture
Embeddable - WaveSurfer can be used as a widget in custom applications
Scriptable - hosts a built-in script interpreter”
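Among the transcription formats listed, HTK label files store one segment per line as start time, end time and label, with times expressed in units of 100 ns. A minimal Python sketch that converts such lines to seconds follows. It handles only the simple three-field form (extra fields such as scores are ignored) and does not cover MLF headers or label-only lines; it is an illustration of the format, not WaveSurfer's own reader.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_htk_labels(text):
    """Parse simple HTK label lines 'start end label' into (sec, sec, label)."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 3:
            raise ValueError(f"unexpected label line: {line!r}")
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        segments.append((start / HTK_UNITS_PER_SECOND,
                         end / HTK_UNITS_PER_SECOND, label))
    return segments


# Hypothetical label file: silence followed by two phone segments.
LAB = """0 2500000 sil
2500000 4100000 h
4100000 6300000 e
"""
print(parse_htk_labels(LAB))
```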


WEDW Edit Waveform Program, Speech Research Lab, University of Delaware - A.I. duPont Hospital for Children

http://www.asel.udel.edu/speech/Spch_proc/wedw.htm

“Windows EDW (WEDW) is a fundamentally new program which attempts to provide similar functionality to the Unix/DOS version (EDW), but with a very different user interface.

WEDW retains some of the appearance of EDW in that a waveform display region is always present while spectrogram and pitch marking windows can be toggled on and off as desired. Both EDW and WEDW read and write waveforms in an extended RIFF (Microsoft .WAV) format that includes waveform segment definitions and both are also able to read an older .WAV format that was the original format used by EDW.

Waveform
Labels
Spectrogram
Pitch Tracking
WEDW provides a way to display special symbols such as IPA phonetic symbols when a font for the symbols is available.

Prosodic features of duration, F0, and amplitude can be changed.”
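Both WEDW and Speech Analyzer keep extra information (segment definitions, transcriptions) inside the .WAV file itself. In the RIFF container, data is organised as chunks, each with a four-character ID and a byte size, so a reader can step over chunks it does not recognise. The Python sketch below lists the top-level chunks of a RIFF/WAVE file. The custom chunk ID "segs" and its contents are hypothetical; the actual chunk names used by WEDW or Speech Analyzer are not given here.

```python
import struct


def list_riff_chunks(data):
    """Return (chunk_id, size) for each top-level chunk of a RIFF/WAVE byte string."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Chunk bodies are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return chunks


def make_chunk(chunk_id, payload):
    """Build one chunk: ID, little-endian size, payload, optional pad byte."""
    pad = b"\0" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Usage: a tiny 16-bit mono file with a hypothetical custom "segs" chunk.
fmt = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)  # PCM, mono, 8 kHz, 16-bit
audio = struct.pack("<4h", 0, 1000, -1000, 0)
segs = b"seg1 0 2\n"
body = b"WAVE" + make_chunk(b"fmt ", fmt) + make_chunk(b"data", audio) + make_chunk(b"segs", segs)
wav = b"RIFF" + struct.pack("<I", len(body)) + body
print(list_riff_chunks(wav))
```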


Winpitch, Pitch Instruments Inc.

http://www.winpitch.com

“Operates under Windows 3.1, Windows 95 and Windows NT 4.0
Uses Sound Blaster (TM) Sound Card or compatible
All functions with MDI (multi windows processing) capabilities
Extended sound Cut Copy and Paste functions
Color and black and white speech spectrograms, real time operation with Pentium processors
Real time fundamental frequency analysis and display of prosodic parameters (Fo, Intensity, waveform)
Time markers for easy signal segmentation
Display and storage of prosodic parameters of long signals
Visual display of intonation Fo model and imitation curves on the same windows or on separate windows
Block and segmentation functions, with listening, reanalysis and synthesis of any portion of the signal, not limited in number
Real time display of Fo, intensity and time values of both original and synthetic speech
Pitch marker editor
Audio Morphing of Pitch, Intensity and Duration
Synthesis capability, with easy definition of modified prosodic parameters (Fo, intensity and duration)
MIDI musical synthesis function playing tunes according to the original or synthetic Fo variations
Full statistical analysis of fundamental frequency, jitter, shimmer and pauses of any portion of the signal
Visual lens displaying waveforms and spectrum
Batch Printing capability (allowing printing of long signals without operator intervention)
Change of file format: 8 / 16 bits, upsampling and downsampling
Bookmark with text for speech labelling
Master / Student Mode for second language teaching
Graphic and text prosodic parameters editing
Duplex mode: record and playback at the same time
Variable rate speech playback”
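The MIDI function that plays tunes following the Fo variations implies mapping each Fo value to a note. A minimal sketch using the standard equal-temperament MIDI formula, note = 69 + 12 log2(f / 440), follows. How WinPitch actually performs this mapping is not described above, and the contour values are hypothetical.

```python
import math


def f0_to_midi(f0_hz):
    """Nearest MIDI note number for a frequency, assuming A4 = 440 Hz = note 69."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive (unvoiced frames have no F0)")
    return round(69 + 12 * math.log2(f0_hz / 440.0))


# Hypothetical Fo contour in Hz, e.g. one value per voiced frame.
contour = [220.0, 233.1, 246.9, 261.6]
print([f0_to_midi(f) for f in contour])
```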


WinSnoori, BaBel Technologies

http://www.loria.fr/~laprie/WinSnoori/GuidedTourW/index.html

“For several years we have been developing the software WinSnoori, which is intended both for speech scientists, as a research tool, and for phonetics teachers, as an illustration tool. It consists of five types of tools:

to edit speech signals
to annotate speech signals phonetically or orthographically. WinSnoori offers tools to explore annotated corpora automatically
to analyse speech with several spectral analyses and monitor spectral peaks along time
to study prosody. Besides pitch calculation it is possible to synthesise new signals by modifying the F0 curve and/or the speech rate
to generate parameters for the Klatt synthesiser (in the Motif version). A user-friendly graphic interface and copy synthesis tools allow the user to generate files for the Klatt synthesiser easily.”


xassp, IPDS Institut für Phonetik und digitale Sprachverarbeitung, Christian-Albrechts-Universität, Kiel

http://www.ipds.uni-kiel.de/forschung/xassp.en.html

“xassp is an application for displaying, analysing and processing speech signals. It is intended for segmental and prosodic labelling, but can be used for different purposes, because of its numerous configuration possibilities.

User-definable configurations allow the user to open several associated files together and to automatically perform certain analyses of the speech signal. The configuration Segmental, e.g., is intended for segmental labelling. The windows that are opened when choosing this configuration are:

a speech signal that can be selected in the main dialog
a sonagram that is computed by means of spectral analysis of the speech signal
the labels that are associated with the speech signal
The configuration Prosodic is used for prosodic labelling. When choosing this configuration the following windows are opened:
the selected speech signal
the fundamental frequency computed from the speech signal
the labels that are associated with the speech signal
Although xassp is mainly intended for segmental and prosodic labelling, it provides several additional possibilities for analysing speech signals:
Fundamental frequency (F0), which can be displayed in different ways (range, linear or logarithmic scale)

Energy
Sonagram (FFT and LPC)
Section (FFT and LPC)”
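The Energy display can be understood as a short-time energy contour. A minimal sketch follows: the mean-square value of each frame, expressed in dB relative to a full-scale amplitude of 1.0. The frame and hop sizes (160 and 80 samples, i.e. 10 ms and 5 ms at an assumed 16 kHz sampling rate) and the -100 dB floor for silent frames are assumptions; xassp's own definition is not given above.

```python
import math


def short_time_energy_db(signal, frame_len=160, hop=80, floor_db=-100.0):
    """Mean-square energy of each frame in dB (relative to amplitude 1.0)."""
    contour = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Silent frames get an assumed floor value instead of log10(0).
        contour.append(10 * math.log10(energy) if energy > 0 else floor_db)
    return contour


# Usage: a full-scale segment followed by silence (hypothetical test signal).
signal = [1.0] * 160 + [0.0] * 160
print(short_time_energy_db(signal))
```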


Speech analysis and transcription software
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona
http://liceu.uab.cat/~joaquim/phonetics/fon_anal_acus/herram_anal_acus.html
Last updated: 20/3/14 13:21

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.