Learning Grammatical Gender in an Artificial Language Based on Hindi

Shourya Sonkar Roy Burman


Project Proposal



Background

Grammatical gender has traditionally been thought to be a semantically arbitrary categorisation in most languages. This raises the question: how does a person associate a gender with a noun while learning the language? Recent studies on European languages have shown that distributional cues, such as co-occurrence with gender-marked articles (Dahan, Swingley, Tanenhaus, & Magnuson, 2000), and phonological similarities (Brooks, Braine, Catalano, & Brody, 1993) help in gender acquisition. An example of a distributional cue is that French speakers remember most nouns together with the preceding definite article, which marks gender; for instance, they remember the noun femme (woman) as la femme, where la is the gender-marked definite article. An example of a phonological cue is the German diminutive suffix -lein, which makes a noun grammatically neuter even when this overrides natural gender.

A study supporting this view was carried out by Mirkovic, Forrest & Gaskell (2011). Using an artificial language consisting of pronounceable English pseudo-words, they tried to teach the grammatical gender of the pseudo-words implicitly while teaching the language itself. To implement the distributional cues, they placed a determiner before each word, which acted as the gender-marked article. Phonological similarities were implemented using single-syllable endings: feminine words ended in -eem or -esh, while masculine words ended in -ool or -aaf. They successfully taught the participants this artificial language using a cross-modal learning technique developed by Breitenstein et al. (2007). After training, the participants matched words with their determiners with good accuracy. However, in a set where the endings and the determiners were randomly combined, not only did the participants falter in determiner selection, they were also unable to learn the meanings of the words correctly. This showed that distributional and phonological cues together are important for correct gender acquisition.

Proposed Idea

The proposed study follows a methodology similar to the one adopted by Mirkovic, Forrest & Gaskell to study gender acquisition by native Hindi speakers, using an artificial language constructed from pronounceable Hindi pseudo-words. Hindi differs from European languages in a few easily noticeable ways. For example, Hindi nouns are not naturally accompanied by articles, so it is unclear whether the determiner should precede or follow the noun. Distributional cues are present in Hindi, but they appear as modifications of the verbs, adjectives and postpositions referring to the noun. Verbs and adjectives usually end in -aa for masculine nouns and in -ee for feminine nouns. We will construct an artificial language with nouns consisting of three syllables drawn from the Hindi alphabet. Two gender-like classes will be used. One class will comprise nouns ending in -oo and the other nouns ending in -uh. These two vowels have been chosen because they carry no gender or plurality bias. Each class will also have its own determiners, one a pre-determiner (occurring before the word) and the other a post-determiner (occurring after the word). In some cases pre-determiners will be used and in others, post-determiners. This division has been made because, unlike European languages, which have articles immediately preceding the noun, Hindi does not seem to follow a definite determiner structure. The study may also reveal a preference for one type of determiner.
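The construction described above can be sketched in code. The following is only an illustration under stated assumptions: the romanised consonant and vowel inventory, the simple consonant-vowel syllable shape, the class labels A and B, and the determiner forms are all hypothetical placeholders, since the actual items have not yet been fixed. Only the two class endings (-oo and -uh) and the pre/post determiner split come from the proposal.

```python
import random

# Hypothetical romanised inventory (assumption); the real syllables are not yet chosen.
CONSONANTS = ["k", "g", "t", "d", "n", "p", "b", "m", "r", "l", "s"]
# Vowels for the first two syllables; the class endings are kept out of these
# positions so that only the final syllable carries the class cue (assumption).
MEDIAL_VOWELS = ["a", "i", "e", "o"]

# Class endings as given in the proposal; the labels "A" and "B" are arbitrary.
CLASS_ENDINGS = {"A": "oo", "B": "uh"}

# Placeholder determiner forms (assumption): one pre- and one post-determiner per class.
DETERMINERS = {
    ("A", "pre"): "DET_A_PRE",
    ("A", "post"): "DET_A_POST",
    ("B", "pre"): "DET_B_PRE",
    ("B", "post"): "DET_B_POST",
}


def make_noun(noun_class, rng):
    """Build one three-syllable pseudo-word whose final vowel marks its class."""
    first = rng.choice(CONSONANTS) + rng.choice(MEDIAL_VOWELS)
    second = rng.choice(CONSONANTS) + rng.choice(MEDIAL_VOWELS)
    third = rng.choice(CONSONANTS) + CLASS_ENDINGS[noun_class]
    return first + second + third


def build_lexicon(per_cell, seed=0):
    """Create `per_cell` unique nouns for each class x determiner-position cell."""
    rng = random.Random(seed)
    seen = set()
    lexicon = []
    for noun_class in ("A", "B"):
        for position in ("pre", "post"):
            made = 0
            while made < per_cell:
                noun = make_noun(noun_class, rng)
                if noun in seen:
                    continue  # skip duplicates so every noun is unique
                seen.add(noun)
                lexicon.append(
                    {"noun": noun, "noun_class": noun_class, "position": position}
                )
                made += 1
    return lexicon


def phrase(item):
    """Attach the class-consistent determiner before or after the noun."""
    det = DETERMINERS[(item["noun_class"], item["position"])]
    if item["position"] == "pre":
        return f"{det} {item['noun']}"
    return f"{item['noun']} {det}"


if __name__ == "__main__":
    for item in build_lexicon(per_cell=2):
        print(phrase(item))
```

Calling build_lexicon(per_cell=20) would give 20 nouns in each of the four class-by-position cells. A real item list would additionally need to be screened for resemblance to existing Hindi words, which this sketch does not attempt.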

Proposed Methodology

A group of native Hindi speakers will be required for this experiment (preferably five or more participants). A set of 80 such three-syllable pseudo-words will be constructed, 40 in each class, of which 20 will take pre-determiners and 20 post-determiners. Each word will be associated with a picture and recorded in 2 different voices. The objective is to teach this artificial language by making participants match the auditory stimulus (the new word) with the shown picture. Over a training period of 4 days, the participants will be exposed to each new word 30 times a day, 20 times with the correct picture and 10 times with an incorrect one, and asked to say whether the word and the picture match (Word-Picture Matching). No feedback will be given. To evaluate determiner matching, the participants will be asked to match the auditory stimulus (the word without its determiner) to one of the four determiners. To further assess whether the phonological cues are being used for determiner selection, we will construct a generalisation set of 12 new words, 6 from each class, of which 3 will take pre-determiners and 3 post-determiners. Of these 12, 6 will have an inconsistent determiner-ending pairing. This generalisation set will only be shown after training. Using a determiner selection task, we will evaluate whether the consistent pairing produces better determiner selection results. If so, the hypothesis that phonological cues are used for determiner selection will be supported.
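The daily Word-Picture Matching schedule described above can be sketched as follows. This is a minimal illustration, not a finalised procedure: the function name, the trial record layout, the use of the word itself as the picture identifier, and the random choice of the incorrect picture are all assumptions. The two recording voices are left out because the proposal does not specify how exposures are divided between them.

```python
import random


def build_daily_trials(words, n_match=20, n_mismatch=10, seed=0):
    """Return a shuffled list of one day's word-picture matching trials.

    Each word is paired n_match times with its own picture and n_mismatch
    times with the picture of a randomly chosen other word (assumption).
    """
    rng = random.Random(seed)
    trials = []
    for word in words:
        other_words = [w for w in words if w != word]
        for _ in range(n_match):
            trials.append({"word": word, "picture": word, "match": True})
        for _ in range(n_mismatch):
            foil = rng.choice(other_words)  # picture belonging to a different word
            trials.append({"word": word, "picture": foil, "match": False})
    rng.shuffle(trials)  # randomise presentation order
    return trials


if __name__ == "__main__":
    # Placeholder labels standing in for the 80 pseudo-words.
    words = [f"word_{i:02d}" for i in range(80)]
    trials = build_daily_trials(words)
    print(len(trials), "trials per day")
    print(sum(t["match"] for t in trials), "matching trials")
```

With 80 words and 30 exposures each, this amounts to 80 × 30 = 2400 trials per participant per day, which may be worth considering when planning session length.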

The following training routine will be used:

Figure 1: Tasks, Schedule and the Purpose

References

  • Breitenstein, C., Zwitserlood, P., Vries, M. de, Feldhues, C., Knecht, S., & Dobel, C. (2007). Five days versus a lifetime: Intense associative vocabulary training generates lexically integrated words. Restorative Neurology and Neuroscience, 25, 493-500.

  • Brooks, P. J., Braine, M. D. S., Catalano, L., & Brody, R. E. (1993). Acquisition of gender-like noun subclasses in an artificial language: The contribution of phonological markers to learning. Journal of Memory and Language, 32, 76-95.

  • Dahan, D., Swingley, D., Tanenhaus, M. K., & Magnuson, J. S. (2000). Linguistic gender and spoken-word recognition in French. Journal of Memory and Language, 42, 465-480.

  • Mirkovic, J., Forrest, S., & Gaskell, M. G. (2011). Semantic regularities in grammatical categories: Learning grammatical gender in an artificial language. Proceedings of the 33rd Annual Conference of the Cognitive Science Society.