CSE - IIT Kanpur

CS 698O: Special Topics in Natural Language Processing

Units: 3-0-0-0-9

Pre-requisites:

Must: Instructor's consent; Introduction to Machine Learning (CS771); Proficiency in Linear Algebra, Probability and Statistics; Proficiency in Python Programming

Desirable:

Probabilistic Machine Learning (CS772), Topics in Probabilistic Modeling and Inference (CS775), Deep Learning for Computer Vision (CS776)

Departments which may be interested:

CSE, EE, MTH, IME, ECO

Level of the course:

Senior UG and PG (6xx level)

Course Description:

Natural language (NL) refers to the language spoken or written by humans, and it is the primary mode of communication between people. With the growth of the World Wide Web, the amount of natural language text available has grown exponentially. This calls for the development of algorithms and techniques for processing natural language, both for automation and for building intelligent machines. This course will primarily focus on understanding and developing techniques, learning algorithms, and models for processing text. We will take a statistical approach to Natural Language Processing (NLP), in which we learn how natural language understanding models can be built from regularities observed in large corpora of natural language text.

Tentative Topics:

Introduction to Natural Language (NL): why NL is hard to process, linguistics fundamentals, etc.
Language Models: n-grams, smoothing, class-based models, Brown clustering
Sequence Labeling: HMM, MaxEnt, CRFs, and applications of these models, e.g., Part-of-Speech tagging
Parsing: CFG, Lexicalized CFG, PCFGs, Dependency parsing
Applications: Named Entity Recognition, Coreference Resolution, text classification, toolkits, e.g., spaCy
Distributional Semantics: distributional hypothesis, vector space models, etc.
Distributed Representations: Neural Networks (NN), Backpropagation, Softmax, Hierarchical Softmax
Word Vectors: Feedforward NN, Word2Vec, GloVe, Contextualization (ELMo, etc.), Subword information (FastText, etc.)
Deep Models: RNNs, LSTMs, Attention, CNNs, applications in language, etc.
Sequence to Sequence models: machine translation and other applications
Transformers: BERT, transfer learning and applications
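To illustrate the kind of material covered under "Language Models: n-grams, smoothing", here is a minimal sketch (not official course material) of a bigram language model with add-one (Laplace) smoothing. The class name `BigramLM`, the `<s>`/`</s>` sentence-boundary tokens, and the toy corpus are illustrative assumptions. Add-one is the simplest smoothing scheme; more refined ones such as Kneser-Ney exist.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.bigrams = Counter()   # c(prev, word)
        self.contexts = Counter()  # c(prev) counted as the left side of a bigram
        vocab = {"</s>"}           # possible next tokens; <s> is never predicted
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            vocab.update(tokens)
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        self.vocab_size = len(vocab)

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V),
        # so unseen bigrams still receive non-zero probability mass.
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + self.vocab_size)

    def log_prob(self, tokens):
        # Sum of log bigram probabilities over the padded sentence.
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(padded[:-1], padded[1:]))


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramLM(corpus)
print(lm.prob("<s>", "the"))  # 3/7: "the" starts both sentences
print(lm.prob("the", "sat"))  # 1/7: unseen bigram still gets some mass
```

Because the smoothed probabilities for any context sum to one over the vocabulary, the model assigns a finite log-probability to every sentence, including word orders never seen in training.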

References:

The course is based on the following textbooks:

Introduction to Natural Language Processing, Jacob Eisenstein
Speech and Language Processing, Daniel Jurafsky, James H. Martin
Foundations of Statistical Natural Language Processing, Christopher D. Manning, Hinrich Schütze
Natural Language Understanding, James Allen

Beyond the textbooks mentioned above, this course draws on a variety of sources such as other books, research papers, and other courses. Relevant references will be suggested in the lectures.
