Unsupervised Morphological Analysis of Hindi Text

Group ID: N3

Advisor: Prof. Amitabha Mukherjee
Aditi Krishn
Rabi Shanker Guha

Abstract

Morphological analysis has been an active area of research in Natural Language Processing. It aims to derive the structure of a language by examining the internal structure of its words. A major breakthrough in this field was Goldsmith's 2000 paper 'Linguistica: An Automatic Morphological Analyser', which takes an unsupervised approach to learning morphology and is therefore widely applicable, since it is language independent. Since that paper was published, research in this area has remained active, and attempts have been made to incorporate phonological rules into the learned language structure. In this project we explored Linguistica and extended it to handle cases where the root morpheme itself changes across morphological variants. The program takes raw linguistic data as input and produces as output an analysis of the data, or a grammar, in the form {stems} X {signatures}.
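
To make the {stems} X {signatures} form concrete, the following is a minimal sketch of the idea, not Linguistica's actual procedure and not this project's code. It treats a signature as the set of suffixes that a group of stems shares, tries every split point of every word, and keeps only the suffix sets supported by at least two stems. The function name find_signatures, its parameters, and the romanized Hindi word list are all assumptions made for illustration.

```python
from collections import defaultdict


def find_signatures(words, min_stem_len=2, max_suffix_len=3, min_stems=2):
    """Group candidate stems by the set of suffixes they occur with.

    Hypothetical helper for illustration only: it enumerates splits
    exhaustively and does no scoring of competing analyses.
    """
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        first_cut = max(min_stem_len, len(word) - max_suffix_len)
        # Try every allowed split point; an empty suffix is recorded as "NULL".
        for cut in range(first_cut, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            stem_to_suffixes[stem].add(suffix or "NULL")

    signature_to_stems = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        # A signature needs at least two suffixes to say anything about morphology.
        if len(suffixes) >= 2:
            signature_to_stems[frozenset(suffixes)].add(stem)

    # Keep only signatures shared by enough stems to count as a pattern.
    return {
        sig: stems
        for sig, stems in signature_to_stems.items()
        if len(stems) >= min_stems
    }


if __name__ == "__main__":
    # Romanized forms of "boy" and "room" (assumed toy data).
    corpus = ["ladka", "ladke", "ladkon", "kamra", "kamre", "kamron"]
    for sig, stems in find_signatures(corpus).items():
        print(sorted(sig), sorted(stems))
        # prints: ['a', 'e', 'on'] ['kamr', 'ladk']
```

On this toy input the only surviving signature is {a, e, on} with the stems {ladk, kamr}, so the grammar generates all six input words as stem plus suffix. On real text such an exhaustive split produces many competing analyses, and a sketch like this has no way to choose among them or to handle variants in which the root itself changes.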

Links

Proposal
Slides
Report
Linguistica
Code

References

[1] GOLDSMITH, J. 2000. Linguistica: An Automatic Morphological Analyser.

[2] GOLDSMITH, J. 2001. Unsupervised Learning Of The Morphology Of A Natural Language. Computat. Linguis. 27, 2, 153–198.

[3] GOLDSMITH, J. 2004. An Algorithm For The Unsupervised Learning Of Morphology.

[4] GOLDWATER, S. AND JOHNSON, M. 2004. Priors in Bayesian Learning of Phonological Rules. http://linguistica.uchicago.edu/

[5] GOLDSMITH, J. 2005. An Algorithm For The Unsupervised Learning Of Morphology. Tech. rep. TR-2005-06, Department of Computer Science, University of Chicago. http://humfs1.uchicago.edu/~jagoldsm/Papers/Algorithm.pdf.

[6] SINGH, R. AND AGNIHOTRI, R. K. 1997. Hindi Morphology. Motilal Banarsidass Publ.

[7] http://www.uknow.gse.harvard.edu/teaching/TC102-407.html