Speech Sounds


How do listeners extract the linguistic features of speech sounds from the acoustic signal?

Speech sounds can be defined as those that belong to a language and convey meaning. Although listeners distinguish such sounds from other auditory stimuli, such as the slamming of a door, with ease, it is not immediately clear why this should be so. It was initially thought that speech was processed in a phoneme-by-phoneme fashion; however, this theory was discredited following the development of technology that produces spectrograms of speech. Research using spectrograms to identify an invariant formant frequency pattern for each phoneme has revealed several problems with this theory, including a lack of invariance in phoneme …

Each speech signal contains information across multiple frequencies, which, when charted on a spectrogram, tend to cluster into bands known as formants. Initial attempts to understand speech perception assumed that each phoneme we perceive would have an invariant formant pattern. It was recognised that an extended, steady-state formant results in the perception of a vowel, while formant transitions result in the perception of consonants, but the search for invariant patterns went no further than this. It was discovered that phonemes are not produced one after the other but overlap in time, a phenomenon known as coarticulation: we begin to articulate a phoneme before we have finished articulating the preceding one, which allows speech to be produced more rapidly. Consequently, there is no single, consistent acoustic signal each time a given phoneme is produced. The exact acoustic signal is modified by the preceding and subsequent phonemes that make up a word, a process called …

While the range of possible formant transitions is continuous, the perception of consonants relies on identifying each formant transition as belonging to a category. This mode of perception is known as categorical perception and was first identified by Liberman et al. in the early 1950s. They found that when participants were presented with a range of synthetic phonemes varying only in voice onset time along a continuous scale, they tended to recognise each stimulus as one of two phonemes (or categories) rather than as a range of slightly different phonemes. It was suggested that this might not reflect an inability to discriminate speech sounds within a category but rather a tendency to group such sounds into pre-existing categories. To test this, Liberman et al. (1957) carried out follow-up studies in which participants were presented with two synthetic phonemes and asked to identify which was identical to a sample stimulus. They found that participants had a high success rate when the two phonemes lay on opposite sides of a category boundary, but performed at chance level when the phonemes differed by the same voice-onset-time step yet both lay within the same category. This grouping of physically different stimuli into perceptual categories is unique to the perception of speech sounds; this phenomenon is not observed when a listener is
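The link between labelling and discrimination described above can be made concrete with a small numerical sketch. It assumes, hypothetically, that listeners discriminate only by covertly labelling each stimulus and comparing the labels (a label-only model of the kind often associated with this line of research). The logistic identification curve, the 25 ms boundary, the slope and the /b/-/p/ labels are illustrative assumptions, not values from the studies cited. Under these assumptions, the predicted proportion correct in a two-choice matching trial is 0.5 + 0.5(p_A - p_B)^2, where p_A and p_B are the probabilities of labelling each stimulus as /p/.

```python
import math


def identification(vot_ms: float, boundary_ms: float = 25.0, slope: float = 0.5) -> float:
    """Hypothetical probability of labelling a stimulus /p/ rather than /b/.

    A logistic curve over voice onset time (ms); boundary and slope are
    illustrative assumptions only.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def predicted_matching_accuracy(p_a: float, p_b: float) -> float:
    """Predicted proportion correct if listeners compare only covert labels.

    When the covert labels of the two comparison stimuli differ, the listener
    answers from the label of the sample; when they match, the listener guesses.
    Averaged over trials, this gives 0.5 + 0.5 * (p_a - p_b) ** 2.
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2


def pair_accuracy(vot_a: float, vot_b: float) -> float:
    """Predicted accuracy for a pair of stimuli on the hypothetical continuum."""
    return predicted_matching_accuracy(identification(vot_a), identification(vot_b))


if __name__ == "__main__":
    # Pairs separated by the same 20 ms step along the continuum.
    for a, b in [(0, 20), (15, 35), (40, 60)]:
        print(f"{a:>2}-{b:>2} ms: predicted accuracy {pair_accuracy(a, b):.3f}")
```

With these assumed parameters, the pair straddling the boundary (15 ms vs 35 ms) is predicted to be discriminated almost perfectly, while an equally spaced pair inside one category (40 ms vs 60 ms) is predicted to sit at chance, mirroring the pattern the essay describes.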
