Prabhat Pandey (prabhatp[at]iitk[dot]ac[dot]in)
Rahul Arora (arorar[at]iitk[dot]ac[dot]in)
Advisor: Prof. Amitabha Mukerjee (amit[at]cse[dot]iitk[dot]ac[dot]in)
Abstract

Word Sense Disambiguation (referred to as WSD henceforth) is the task of identifying the appropriate sense of a word in a given sentence when the word has multiple senses. For example, consider the following two sentences:
Mary walked along the bank of the river.
HarborBank is the richest bank in the city.
The word bank refers to 'river-side' in the first sentence and to 'financial institution' in the second. Similarly, consider the following Hindi sentences:
रमेश को सोना पसंद है । (Ramesh likes to sleep.)
सोना एक कीमती पदार्थ है । (Gold is a precious substance.)
The Hindi word सोना refers to 'sleep' in the first sentence and to 'gold' in the second.
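To make the task concrete, the following is a minimal sketch of a simplified Lesk-style (knowledge-based) disambiguator applied to the bank example: it picks the sense whose gloss shares the most content words with the sentence. The sense inventory, the glosses, the stop-word list, and the function names are hypothetical and were hand-written so that the toy example works; this illustrates the WSD task itself and is not the approach proposed in this paper.

```python
import re

# Hypothetical toy sense inventory; glosses are hand-written for illustration,
# NOT taken from WordNet.
SENSES = {
    "bank": {
        "river-side": "sloping land beside a body of water such as a river",
        "financial institution": "an institution in a town or city that accepts deposits of money",
    }
}

# Small hypothetical stop-word list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "and", "or", "such", "as", "that", "along", "beside"}


def content_words(text):
    """Lower-case the text, split it into alphabetic tokens and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, inventory=SENSES):
    """Return the sense whose gloss shares the most content words with the sentence.

    Ties are resolved in favour of the sense listed first in the inventory.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "Mary walked along the bank of the river."))      # river-side
    print(simplified_lesk("bank", "HarborBank is the richest bank in the city."))  # financial institution
```

With real dictionary glosses, such as WordNet's, a short sentence often shares few or no words with any gloss, which is one reason extended-overlap variants also compare the glosses of related senses.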
There are four conventional approaches to WSD: knowledge-based, supervised, semi-supervised, and unsupervised. More recently, cross-lingual approaches have shown promising results for languages with scarce resources. In this paper, we propose a cross-lingual approach for Hindi. The approach makes use of Wikipedia articles that are available in both English and Hindi, together with WordNet and Hindi WordNet.
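The abstract does not spell out how these resources are combined, so the following is only one assumed way a cross-lingual signal could be used, not the method proposed in this paper. It assumes a hypothetical table linking each Hindi WordNet sense of सोना to English lemmas, and scores each sense by how often those lemmas occur in the English counterpart of the Hindi Wikipedia article in which the word appears. The link table, the article snippets, and the function name are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical links from Hindi WordNet senses of सोना to English lemmas.
# A real system would need an actual Hindi-English WordNet mapping here.
HINDI_TO_ENGLISH = {
    "सोना": {
        "gold": {"gold", "metal", "precious"},
        "sleep": {"sleep", "slumber", "rest"},
    }
}


def pick_sense_cross_lingual(hindi_word, english_article, links=HINDI_TO_ENGLISH):
    """Choose the Hindi sense whose linked English lemmas occur most often in the
    English counterpart of the Hindi article containing the word.

    Ties are resolved in favour of the sense listed first in the link table.
    """
    counts = Counter(re.findall(r"[a-z]+", english_article.lower()))
    scores = {
        sense: sum(counts[lemma] for lemma in lemmas)
        for sense, lemmas in links[hindi_word].items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Invented English snippets standing in for the English counterpart articles.
    gold_article = "Gold is a chemical element. It is a dense, soft, precious metal."
    sleep_article = "Sleep is a state of rest in which most animals spend part of each day."
    print(pick_sense_cross_lingual("सोना", gold_article))   # gold
    print(pick_sense_cross_lingual("सोना", sleep_article))  # sleep
```

Counting raw lemma occurrences is the simplest possible scoring choice; weighting lemmas or restricting the count to sentences aligned with the Hindi sentence are natural refinements under the same assumptions.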