Phonotactic Constraints in the McGurk Effect

Shubham Atreja, Enayat Ullah


Supervisor: Prof. Amitabha Mukerjee




Introduction

The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. The illusion occurs when the auditory component of one sound is paired with the visual component of another, leading to the perception of a third sound. The visual information a listener gathers from watching a person speak changes the way the sound is heard. This shows that speech perception is not a purely auditory phenomenon but an audio-visual one.

Audio stimulus    Visual stimulus    Fused percept
ba-ba             ga-ga              da-da
pa-pa             ka-ka              ta-ta


In this project we study speech perception further by using phonotactic constraints to create a bias that affects the likelihood of McGurk fusion.

Hypothesis

We aim to establish the importance of phonotactic constraints in speech perception across languages, and we hypothesise the following:

McGurk fusion is stronger when the fused word is phonotactically licensed, that is, permitted by the phonotactic constraints of the listener's language. Conversely, the fusion is weaker when the word carried by the auditory or visual channel is licensed but the fused word is illegal under those constraints.

Ultimately, we propose that speech perception is not autonomous, since phonotactic constraints can significantly influence the McGurk effect.

Motivation

The McGurk effect was first reported by McGurk and MacDonald in 1976 [1]. Since then, a number of studies have examined how a bias affects McGurk fusion. Windmann [2] showed that sentence context and expectation affect the McGurk illusion in German. The results showed that sentence context influenced the strength of the McGurk effect: the effect was stronger when the word produced by fusion was congruent with the sentence context, and significantly weaker when it was out of context. Ali [3] conducted a similar study on English sentences with the same results. Ali further quantified the strength of the effect on a more evidential basis by using N-gram probabilistic grammars.

In our work, we use phonotactic constraints to influence the strength of the McGurk effect and to establish the importance of these constraints across languages.

Every word is built from a sequence of phonemes. Every language restricts the ways in which phonemes can be arranged to form syllables. These restrictions are the phonotactic constraints. They sharply limit the number of syllables a language actually permits compared with the number that would be theoretically possible, and they differ from language to language [4].

For example, English allows the cluster /sl/ at the start of a word (as in "slow") but does not allow /ls/.
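To make the idea of a phonotactic constraint concrete, here is a minimal Python sketch that checks whether a word begins with a permitted consonant cluster. The onset inventory `ENGLISH_ONSETS` is a tiny, hypothetical sample written in ordinary letters rather than phonemic transcription. The helpers `initial_onset` and `is_onset_licensed` are illustrative names and do not model English phonotactics in full.

```python
# Hypothetical, deliberately incomplete sample of word-initial consonant
# clusters that English permits. A real inventory would be phonemic, not spelled.
ENGLISH_ONSETS = {"sl", "sp", "st", "pl", "bl", "tr", "dr", "kl", "fl", "str", "spl"}

VOWELS = set("aeiou")


def initial_onset(word: str) -> str:
    """Return the consonant letters that come before the first vowel."""
    onset = ""
    for ch in word.lower():
        if ch in VOWELS:
            break
        onset += ch
    return onset


def is_onset_licensed(word: str, onsets: set[str]) -> bool:
    """Simplification: treat zero or one initial consonant as always licensed,
    and check multi-consonant clusters against the inventory."""
    onset = initial_onset(word)
    return len(onset) <= 1 or onset in onsets


print(is_onset_licensed("slow", ENGLISH_ONSETS))  # True: /sl/ is in the sample
print(is_onset_licensed("lsow", ENGLISH_ONSETS))  # False: /ls/ is not
```

The same check run with a different onset inventory would model a different language. This is the sense in which one stimulus can be licensed for one group of listeners and restricted for another.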

Methodology

The experiment will be carried out on two groups of participants, A and B. Group A consists of speakers of a language LA (say, Hindi), while group B consists of speakers of a language LB (say, English). Both groups will be presented with five different scenarios consisting of audio and audio-visual clips.

Although the tasks for the two groups are identical, the same stimuli produce different conditions for each group, because the two languages impose different phonotactic constraints.

Case  Stimulus      Group A                                             Group B
1     Audio only    Spoken word licensed                                Spoken word licensed
2     Audio only    Spoken word restricted                              Spoken word restricted
3     Audio-visual  Fused word licensed; spoken words restricted        Fused and spoken words all licensed
4     Audio-visual  Fused word restricted; either spoken word licensed  Fused and spoken words all licensed
5     Audio-visual  Fused and spoken words all licensed                 Fused and spoken words all licensed

"Licensed" and "restricted" refer to the phonotactic constraints of each group's own language. "Spoken words" are the words carried by the audio and visual channels.

The words (whether carried by the audio or video channel, or produced by fusion) are chosen such that:
1) they produce the situations described above for the two groups, and
2) they are semantically meaningless to both groups, so that the only bias acting on the listener comes from the phonotactic constraints.

Experiment
For each of the cases above, participants in each group are shown an audio or audio-visual clip of a person enunciating a word. In the audio-visual clips, the auditory component of one spoken word is paired with the visual component of another, so that the McGurk effect produces a new word (the fused word). These words begin with a syllable that may or may not be phonotactically licensed. Participants are asked to report the word as they perceive it, and their responses are recorded.

Finally, a probability-based approach will be used to quantify the strength of the McGurk effect in each case for both groups.
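The proposal does not specify which probability-based measure will be used. One simple option, assumed here for illustration only, is to estimate the strength of the effect as the proportion of trials on which participants report the target word: P(target reported | group, case) = target reports / total reports. In the audio-visual cases the target is the fused word. The Python sketch below computes that relative frequency from response records. The record layout, the function name `fusion_rates`, the case label `"case_4"`, and the sample nonsense words are all hypothetical placeholders, not experimental data.

```python
from collections import Counter

# Hypothetical record layout: (group, case label, reported word, target word).
# In the audio-visual cases the target word is the fused word.
Response = tuple[str, str, str, str]


def fusion_rates(responses: list[Response]) -> dict[tuple[str, str], float]:
    """Estimate P(target reported | group, case) as a simple relative frequency."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for group, case, reported, target in responses:
        key = (group, case)
        totals[key] += 1
        # Count a trial as fused only when the report matches the target word.
        if reported.strip().lower() == target.lower():
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up responses for the fourth case of the table; not real results.
sample = [
    ("A", "case_4", "dafa", "dafa"),
    ("A", "case_4", "bafa", "dafa"),
    ("B", "case_4", "dafa", "dafa"),
    ("B", "case_4", "dafa", "dafa"),
]
print(fusion_rates(sample))  # {('A', 'case_4'): 0.5, ('B', 'case_4'): 1.0}
```

Comparing these per-group, per-case proportions across the licensed and restricted conditions is one way to express the hypothesis numerically. A fuller analysis would also need a statistical test of the differences, which this sketch does not attempt.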



References and Readings:

    [1] McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
    [2] Windmann, S. (2004). Effects of sentence context and expectation on the McGurk illusion. Journal of Memory and Language, 50.
    [3] Ali, A. N. (2007). Exploring semantic cueing effects using McGurk fusion. In Auditory-Visual Speech Processing (AVSP 2007), Hilvarenbeek: Kasteel Groenendaal.
    [4] http://clas.mq.edu.au/speech/phonetics/phonology/syllable/syll_phonotactic.html