Vocal Tract Data Collection

Using MRI, electromagnetic articulography, and ultrasound

Capturing the shape and movements of the vocal tract requires medical imaging techniques that can see inside the head and throat. At York, we have access to three different systems, each capturing different aspects of vocal tract shape and movement: two MRI scanners based at York Neuroimaging Centre, a Carstens AG501 Electromagnetic Articulograph, and an Articulate Instruments micro ultrasound system and lip imaging camera. In my work I make use of all three systems, particularly MRI.

Participant in MRI scanner
MRI scan image including vocal tract

Magnetic resonance imaging (MRI)

MRI offers the richest information currently available about the vocal tract and surrounding structures. It is possible to obtain detailed 3D information about the shape of the vocal tract at sub-millimetre resolution; however, with present techniques several seconds are required to capture such data, so vocal tract postures must be held for an unnaturally long time. Alternatively, real-time 2D MRI (usually in the midsagittal plane) can capture the movements of the vocal tract during running speech, but only in a single plane, which excludes a great deal of information about the 3D shape of the vocal tract. Recent advances in dynamic 3D MRI suggest that real-time 3D MRI may become possible in future, but at present these techniques are available only on specialist research scanners, not on the more widely available clinical scanners.

In my work I use both static 3D MRI capture and real-time midsagittal 2D capture. Subjects in my studies are asked to perform different speech tasks in the scanner, and their speech is recorded during the scans using an optical microphone. Subjects then repeat the protocol in an anechoic chamber to obtain contemporaneous, clean audio recordings. We have shown that speech produced while lying supine and listening to MRI scanner noise exhibits the Lombard effect (the involuntary increase in vocal effort that occurs when speaking in noise), so it is important that these additional recordings are made in conditions as close as possible to those of the MRI scan.

Once we have the MRI and audio data, we can use medical image segmentation procedures to identify the boundaries of the vocal tract airway, and analyse the airway shapes and corresponding speech recordings in order to compare vocal tract anatomy, articulation and acoustics between individuals.
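As a minimal illustration of one such analysis, the sketch below assumes the segmentation has produced a binary 3D mask (True inside the airway) with known voxel dimensions, and computes the total airway volume and the cross-sectional area of each slice. The function name and input format are hypothetical, and slicing along a fixed image axis is a simplifying assumption: area functions used in acoustic work are normally measured on planes perpendicular to the curved vocal tract midline.

```python
import numpy as np

def airway_measures(mask, voxel_mm):
    """Return (volume in mm^3, per-slice areas in mm^2) for a binary airway mask.

    mask     -- 3D array, True for voxels labelled as airway (hypothetical input)
    voxel_mm -- voxel size along axes (0, 1, 2) in millimetres
    Slices are taken along axis 0; a simplification, since real area functions
    are usually measured perpendicular to the vocal tract midline.
    """
    mask = np.asarray(mask, dtype=bool)
    dz, dy, dx = voxel_mm
    volume = mask.sum() * (dz * dy * dx)
    # Area of each slice = number of airway voxels x in-plane voxel area
    areas = mask.sum(axis=(1, 2)) * (dy * dx)
    return float(volume), areas

# Toy "tube": 10 slices, each with a 4 x 5 voxel airway cross-section
toy = np.zeros((10, 20, 20), dtype=bool)
toy[:, 8:12, 5:10] = True
vol, areas = airway_measures(toy, voxel_mm=(2.0, 0.5, 0.5))
print(vol)       # 200 voxels * 0.5 mm^3 = 100.0
print(areas[0])  # 20 voxels * 0.25 mm^2 = 5.0
```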

Electromagnetic articulograph (EMA)

Although real-time MRI is improving all the time, its frame rate is still too low to capture some faster articulations, and some people cannot be scanned with MRI at all. One alternative for capturing vocal tract movement, at sampling rates of up to 1250 Hz, is EMA. Small sensor coils are glued onto the subject's articulators, commonly the tongue, lips and jaw, and the subject sits within an electromagnetic field generated by the machine. The position of each sensor is recorded in real time while the subject speaks.
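Because EMA returns sensor positions at a fixed sampling rate, kinematic measures such as articulator speed follow directly from the position traces. The sketch below assumes the positions of one sensor are already available as an N × 3 array in millimetres sampled at 1250 Hz (a hypothetical layout), and uses central differences via `numpy.gradient`. Real pipelines usually low-pass filter the positions first; that step is omitted here.

```python
import numpy as np

def sensor_speed(positions_mm, fs_hz=1250.0):
    """Tangential speed (mm/s) of one EMA sensor.

    positions_mm -- array of shape (N, 3): x, y, z position per sample (assumed layout)
    fs_hz        -- sampling rate; 1250 Hz is the upper rate quoted above
    """
    dt = 1.0 / fs_hz
    # Central-difference velocity along each axis (one-sided at the ends)
    velocity = np.gradient(np.asarray(positions_mm, dtype=float), dt, axis=0)
    # Speed is the Euclidean norm of the 3D velocity vector
    return np.linalg.norm(velocity, axis=1)

# Toy trajectory: a sensor moving at a constant 100 mm/s along x for 0.1 s
fs = 1250.0
t = np.arange(125) / fs
traj = np.column_stack([100.0 * t, np.zeros_like(t), np.zeros_like(t)])
speed = sensor_speed(traj, fs)
print(np.allclose(speed, 100.0))  # True
```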

EMA offers several advantages over MRI: subjects sit upright rather than lying down, so their articulation is likely to be more natural; the machine is not claustrophobic and avoids the medical contraindications of an MRI scanner; and it is silent, so clean audio can be recorded simultaneously. EMA data require careful processing, because sensor positions are specific not only to each subject but also to each recording session. The resulting data are also sparse, since only the positions of the sensors are recorded; however, EMA can be combined with ultrasound tongue imaging (UTI) to provide a more complete picture of tongue articulation.
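Part of the careful processing mentioned above is commonly head-movement correction: articulator sensors are expressed relative to reference sensors that move with the head (for example, on the bridge of the nose and behind the ears). This is an assumption about a typical pipeline, not a description of the processing used at York. The sketch estimates, for each frame, the rigid rotation and translation that best map the reference sensors onto a fixed head-fixed template (the Kabsch algorithm, via SVD) and applies that transform to the articulator sensors. All function names and array layouts are hypothetical.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def head_correct(ref_frames, art_frames, ref_template):
    """Map articulator sensors into a head-fixed frame, frame by frame.

    ref_frames   -- (T, K, 3) reference-sensor positions (hypothetical layout, K >= 3)
    art_frames   -- (T, M, 3) articulator-sensor positions (e.g. tongue, lips, jaw)
    ref_template -- (K, 3) reference-sensor positions defining the head-fixed frame
    """
    out = np.empty_like(art_frames, dtype=float)
    for i, (ref, art) in enumerate(zip(ref_frames, art_frames)):
        R, t = rigid_fit(ref, ref_template)
        out[i] = art @ R.T + t                    # apply R @ p + t to every sensor
    return out

# Toy check: the head rotates 30 degrees about z and shifts; correction should undo it
template = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [0.0, 40.0, 10.0]])
tongue = np.array([[20.0, 10.0, -30.0]])
a = np.deg2rad(30.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
shift = np.array([5.0, -3.0, 2.0])
moved_ref = template @ Rz.T + shift
moved_tongue = tongue @ Rz.T + shift
corrected = head_correct(moved_ref[None], moved_tongue[None], template)
print(np.allclose(corrected[0], tongue))  # True
```

With noise-free data and at least three non-collinear reference sensors the fit is exact; with real data it is a least-squares compromise, which is why reference-sensor placement matters.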

Participant in electromagnetic articulograph

Ultrasound tongue imaging (UTI)

The surface of the tongue forms a tissue/air boundary that can be detected with an ultrasound probe positioned under the chin. As with EMA, the data are captured in real time in a much more natural speaking environment than an MRI scanner; in fact, EMA and UTI can be captured simultaneously (although we have only just begun to experiment with this at York). UTI provides spatially rich (but only 2D) information about the most critical articulator, the tongue, along much of its length. It is also possible to capture the shape of the hard palate, allowing tongue positions to be measured relative to it. By rotating the probe, it is also possible to obtain information across the width, rather than along the length, of the tongue.
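Because the tongue surface and the hard palate can be traced in the same ultrasound coordinate system, one simple derived measure is the distance from each tongue point to the palate. The sketch assumes both contours have already been extracted as 2D point arrays in millimetres (a hypothetical format) and takes the minimum over palate points, which approximates the true point-to-curve distance when the palate trace is densely sampled.

```python
import numpy as np

def tongue_palate_distance(tongue_xy, palate_xy):
    """For each tongue point, distance (mm) to the nearest palate point.

    tongue_xy -- (N, 2) tongue-surface contour points (assumed already traced)
    palate_xy -- (M, 2) hard-palate trace in the same coordinates
    """
    tongue_xy = np.asarray(tongue_xy, dtype=float)
    palate_xy = np.asarray(palate_xy, dtype=float)
    # Pairwise differences via broadcasting: shape (N, M, 2)
    diff = tongue_xy[:, None, :] - palate_xy[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1)

# Toy frame: flat palate at y = 20 mm, tongue rising from y = 5 to y = 18 mm
palate = np.column_stack([np.linspace(0, 60, 61), np.full(61, 20.0)])
tongue = np.array([[10.0, 5.0], [30.0, 12.0], [50.0, 18.0]])
gap = tongue_palate_distance(tongue, palate)
print(gap.tolist())  # [15.0, 8.0, 2.0]
```

The smallest value in each frame gives a rough index of how close the tongue comes to the palate, under the stated assumptions about the traces.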

Other modalities

In addition to capturing data about articulator movement during speech, we are also interested in the speech signal itself and in the behaviour of the vocal folds. We use optical microphones to record audio in the MRI scanner, and we partner with AudioLab at York to use their anechoic chamber, where we can capture high-quality, clean audio with calibrated measurement microphones. We also aim to collect data in lower-quality, forensically realistic conditions, such as via a telephone intercept. Finally, we use a Laryngograph Ltd Electrolaryngograph to monitor vocal fold activity during recordings.
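Real intercept recordings cannot be reproduced exactly, but a crude approximation of narrowband telephone audio can be made by restricting clean audio to roughly 300 to 3400 Hz and resampling to 8 kHz. The sketch below is an illustration under stated assumptions, not the procedure used at York: it applies an ideal (brick-wall) FFT filter with NumPy and then decimates by an integer factor, so the input rate must be a multiple of 8000 Hz. Codec compression and channel noise, which real intercepts also involve, are ignored, and the function name is hypothetical.

```python
import numpy as np

def telephone_band(x, fs_in, lo_hz=300.0, hi_hz=3400.0, fs_out=8000):
    """Crude narrowband-telephone simulation: band-limit x, then decimate to fs_out.

    Assumes fs_in is an integer multiple of fs_out (e.g. 48000 -> 8000) and that
    hi_hz lies below fs_out / 2. The ideal FFT filter and the default band are
    simplifying assumptions; codec compression and channel noise are ignored.
    """
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    if hi_hz >= fs_out / 2:
        raise ValueError("hi_hz must be below the output Nyquist frequency")
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_in)
    # Ideal band-pass: zero every frequency bin outside [lo_hz, hi_hz]
    spectrum[(freqs < lo_hz) | (freqs > hi_hz)] = 0.0
    band_limited = np.fft.irfft(spectrum, n=len(x))
    # Nothing remains above fs_out / 2, so keeping every k-th sample does not alias
    return band_limited[:: fs_in // fs_out]

# Toy signal at 48 kHz: a 1 kHz tone (kept) plus a 6 kHz tone (removed)
fs = 48000
t = np.arange(fs) / fs  # 1 second
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
y = telephone_band(x, fs)
print(len(y))  # 8000
```

A brick-wall filter keeps the sketch dependency-free; a practical simulation would more likely use a designed filter and a proper resampler.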

See also...

Anatomy, acoustics and the individual

Using MRI to study the variation in vocal tract shape among the population.

Speech Synthesis

Using numerical acoustic modelling with MRI data to synthesise speech sounds.

Voice and Identity

Using MRI to study the voice qualities of the vocal profile analysis scheme for forensic applications.