How does the shape of your vocal tract contribute to the uniqueness of your voice?
Your voice is unique and complex, and carries many kinds of information: not just what you are saying, but also where you come from, how you are feeling, and even who you are. This last aspect allows you to be identified by your voice in certain circumstances, which is the idea underlying technologies such as voice-as-biometric ("my voice is my password") and fields such as forensic speech science.
Identifying information in your voice falls under two broad categories: behavioural factors (such as where you were brought up, and with whom), and physiological factors (such as the size and shape of your vocal apparatus). What's difficult about using the voice for identification is that it keeps changing - for example, depending on your mood, or whether you have a cold, or who you're talking to - and even if you said the same thing 100 times, no two repetitions would be truly identical. It is therefore very important to determine which features of the voice are likely to remain consistent across different environments and utterances. Since the shape of some parts of your vocal tract - like your hard palate and teeth - remains consistent no matter what you are saying, it is of considerable interest to study how these physiological features affect the sound of your voice.
This project, funded by the British Academy (PF19\100024), investigates the contributions of vocal tract shape to the sound of speech. It does this by asking three main research questions:
- Which parts of the vocal anatomy have the most impact on speaker-discriminatory features in the speech signal?
- What degree of anatomical variation can be expected among the population?
- To what extent are these differences detectable by naive listeners, forensic speech experts, and automatic speaker recognition systems?
Let's look at each of these in a little more detail:
Which parts of the vocal anatomy have the most impact on speaker-discriminatory features in the speech signal?
This question will be addressed using morphoacoustic methods to determine which parts of the vocal tract have the largest impact on the speech signal, and how this impact may be characterised. Detailed, volumetric 3D vocal tract shape data, collected via MRI, will be used in conjunction with numerical acoustic simulation methods to systematically vary the vocal tract shape in predictable ways and determine the effect on the resulting speech.
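As a rough illustration of why vocal tract geometry matters acoustically, the sketch below uses the textbook idealisation of the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wavelength frequency. This is a deliberately simplified stand-in, not the project's simulation method, which works from 3D MRI-derived shapes. The function name `tube_resonances` and the assumed speed of sound (350 m/s) are illustrative choices only.

```python
# Assumed speed of sound in warm, humid air (m/s); an illustrative value.
SPEED_OF_SOUND = 350.0


def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


# Systematically vary one shape parameter (tube length) and observe the effect.
for length_cm in (14.0, 17.5):
    freqs = tube_resonances(length_cm / 100.0)
    print(f"{length_cm} cm:", ", ".join(f"{f:.0f} Hz" for f in freqs))
```

Under these assumptions a 17.5 cm tube resonates near 500, 1500 and 2500 Hz, and shortening the tube raises every resonance. Real vocal tracts are far from uniform, which is why detailed shape data and numerical simulation are needed to study finer anatomical differences.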
What degree of anatomical variation can be expected among the population?
This question will be addressed using geometric morphometric techniques to capture vocal tract shapes and develop a statistical map of vocal tract shapes for all available MRI data. Vocal tracts vary a great deal in size and shape between individuals. Because everybody's vocal tract is unique, it is not possible to map every possible shape, but this project aims to make a start on quantifying the variation using all the data currently available.
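One common geometric morphometric workflow is to align landmark sets so that position, size and orientation are removed, and then summarise the remaining shape variation with principal component analysis. The sketch below shows that idea on synthetic 2D landmarks. It is an assumption about the general approach, not the project's actual pipeline: the function names are hypothetical, the data are random, and the alignment is a single pass against the first shape (reflections are not excluded) rather than a full generalised Procrustes analysis.

```python
import numpy as np


def align_to_first(shapes):
    """Centre, scale to unit size, and rotate each landmark set onto the first.

    shapes: array-like of shape (n_shapes, n_landmarks, n_dims).
    """
    shapes = np.asarray(shapes, dtype=float)
    centred = shapes - shapes.mean(axis=1, keepdims=True)
    size = np.sqrt((centred ** 2).sum(axis=(1, 2), keepdims=True))
    scaled = centred / size
    reference = scaled[0]
    aligned = np.empty_like(scaled)
    for i, shape in enumerate(scaled):
        # Orthogonal Procrustes: best orthogonal map taking shape onto reference.
        u, _, vt = np.linalg.svd(shape.T @ reference)
        aligned[i] = shape @ (u @ vt)
    return aligned


def shape_variation(aligned):
    """PCA of aligned shapes: returns components and explained-variance ratios."""
    flat = aligned.reshape(len(aligned), -1)
    deviations = flat - flat.mean(axis=0)
    _, singular_values, components = np.linalg.svd(deviations, full_matrices=False)
    variance = singular_values ** 2
    total = variance.sum()
    explained = variance / total if total > 0 else variance
    return components, explained


# Synthetic example: 10 noisy copies of one hypothetical 6-landmark outline.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 2))
shapes = base + 0.05 * rng.normal(size=(10, 6, 2))
components, explained = shape_variation(align_to_first(shapes))
print("Share of variance in first three components:", explained[:3])
```

With real data, each row of `components` would describe one mode of anatomical variation, and the explained-variance ratios would indicate how much of the population's variation each mode captures.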
To what extent are these differences detectable by naive listeners, forensic speech experts, and automatic speaker recognition systems?
Determining relationships between the vocal tract shape and the speech signal is of little practical use if these differences are not detectable. This part of the project will use carefully controlled stimuli for perceptual evaluation by naive listeners, detailed auditory-acoustic evaluation by forensic speech experts, and automatic evaluation by state-of-the-art automatic speaker recognition systems, in order to determine how viable any identified features are for speaker identification purposes.
This project is currently underway, and findings will be shared here as they become available.
Collecting vocal tract shape and articulation information using MRI, EMA and ultrasound.
Voice and Identity
Using MRI to study the voice qualities of the vocal profile analysis scheme for forensic applications.
Using numerical acoustic modelling with MRI data to synthesise speech sounds.