CS671: Natural Language Processing
Assignment 1

Personal Details

Name: ASHISH DWIVEDI
Roll Number: 15111011
1st year, M.Tech (Computer Science and Engineering)
IIT Kanpur


Hindi

Hindi Corpus

Syllable Frequency Plot
List of Top Syllables and Bigrams
Letter Frequency Plot
List of Top Letters and Bigrams
Word Frequency Plot
List of Top Words and Bigrams

Bengali

Bengali Corpus

Syllable Frequency Plot
List of Top Syllables and Bigrams
Letter Frequency Plot
List of Top Letters and Bigrams
Word Frequency Plot
List of Top Words and Bigrams

The code is provided below. To run the program correctly, keep all the files from
the given zip file in the same folder. Hindi is the default language, so the -hindi
command-line argument is optional for the Hindi corpus. For example:

java Assignment1
java Assignment1 -hindi

For the Bengali corpus, you must pass -bengali as the command-line argument:

java Assignment1 -bengali
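The argument handling described above could look roughly like the following. This is only a minimal sketch, not the actual Assignment1 code: the class name CorpusSelector and the method selectLanguage are hypothetical names chosen for illustration, and rejecting unknown flags with an exception is an assumption about the desired behaviour.

```java
// Hypothetical helper for illustration only; not part of the submitted Assignment1 code.
final class CorpusSelector {

    // Returns "hindi" or "bengali" depending on the first command-line argument.
    static String selectLanguage(String[] args) {
        if (args.length == 0) {
            return "hindi"; // Hindi is the default when no flag is given
        }
        switch (args[0]) {
            case "-hindi":
                return "hindi";
            case "-bengali":
                return "bengali";
            default:
                // Assumption: unknown flags are rejected rather than silently ignored
                throw new IllegalArgumentException(
                        "Unknown option: " + args[0] + " (expected -hindi or -bengali)");
        }
    }

    public static void main(String[] args) {
        System.out.println("Selected corpus: " + selectLanguage(args));
    }
}
```

Keeping the selection in a small method of its own makes the default case (no argument means Hindi) easy to check separately from the corpus processing.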

Code