CS365 Project: Cross-Lingual WSD

Cross-Lingual Word Sense Disambiguation using Wordnets and Context based Mapping

Prabhat Pandey (prabhatp[at]iitk[dot]ac[dot]in)
Rahul Arora (arorar[at]iitk[dot]ac[dot]in)
Advisor: Prof. Amitabha Mukerjee (amit[at]cse[dot]iitk[dot]ac[dot]in)

Abstract

Word Sense Disambiguation (WSD) is the task of identifying the intended sense of a word in a given sentence when the word has multiple senses. For example, consider these two sentences:
Mary walked along the bank of the river.
HarborBank is the richest bank in the city.
The word bank refers to ‘river-side’ in the first sentence and to ‘financial institution’ in the second. Similarly, consider the following Hindi sentences:
रमेश को सोना पसंद है । (Ramesh likes to sleep.)
सोना एक कीमती पदार्थ है । (Gold is a precious substance.)
The Hindi word सोना refers to ‘sleep’ in the first sentence while it points to ‘gold’ in the second sentence.
There are four conventional approaches to WSD: knowledge-based, supervised, semi-supervised and unsupervised. Recently, cross-lingual approaches have shown promising results for languages with scarce resources. In this paper, we propose a cross-lingual approach for Hindi. The approach makes use of Wikipedia articles that are available in both English and Hindi, together with WordNet and Hindi Wordnet.
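
To make the task concrete, the sketch below disambiguates the two English "bank" sentences with a simplified gloss-overlap heuristic, in the spirit of the knowledge-based family mentioned above. It is an illustration only, not the cross-lingual method proposed in this project. The sense inventory `TOY_SENSES`, its glosses, the stopword list and the function names are all hypothetical and hand-written for this example; the glosses are not taken from WordNet or Hindi Wordnet and were chosen so that each example sentence shares a word with the right gloss.

```python
# Toy sense inventory: glosses are hand-written for illustration only,
# NOT taken from WordNet or Hindi Wordnet.
TOY_SENSES = {
    "bank": {
        "river-side": "sloping land beside a river or lake",
        "financial institution": (
            "an institution that holds deposits and lends money "
            "to people in a town or city"
        ),
    }
}

# Small hand-picked stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "or", "and", "that", "to", "along"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?'\"").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word, sentence, senses=TOY_SENSES):
    """Return the sense whose gloss shares the most words with the sentence.

    Ties are broken in favour of the sense listed first in the inventory.
    """
    context = tokenize(sentence) - {word}
    scores = {
        sense: len(context & tokenize(gloss))
        for sense, gloss in senses[word].items()
    }
    return max(scores, key=scores.get)


print(simplified_lesk("bank", "Mary walked along the bank of the river."))
print(simplified_lesk("bank", "HarborBank is the richest bank in the city."))
```

Real dictionary glosses are short and often share no words with the sentence at all, so plain gloss overlap frequently cannot pick a sense. That sparsity is one reason to bring in additional resources for a language such as Hindi.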

Links

Area
Project Proposal
Presentation
Final Report
Download Source Code (gzipped tarball)
Download Results (gzipped tarball)
