CS 657: Information Retrieval

Search engines have become ubiquitous in meeting modern information needs. What is it that enables finding a document in the blink of an eye from a seemingly unending catalogue of documents, namely, the Internet? The answer is "information retrieval", essentially defined as the efficient retrieval of information (mostly text) from a large collection of objects (mostly documents). In recent years, information retrieval needs have expanded to music, images, videos, graphs, and so on.

This course will cover the basic methods of information retrieval. In particular, it will cover the entire pipeline of building an information retrieval system, from the basic Boolean retrieval model to the design of web-scale engines. Emphasis will also be placed on recent trends in the field.

The tentative topics to be discussed are:

  1. Motivation for information retrieval
  2. Basic document retrieval
    1. Inverted index
    2. Querying using inverted index
  3. Tokenization
    1. Word segmentation
    2. Stopwords
    3. Stemming
  4. Document scoring
    1. Zone scoring
    2. Term Frequency
    3. Inverse Document Frequency
    4. Tf-idf
  5. Document as a vector
    1. Vector model
    2. Document similarity
    3. Document vector models
      1. LDA
      2. GloVe
      3. Word2Vec
  6. Scalability
    1. Skip list
    2. Champion list
    3. Tiered index
  7. IR as a system
  8. IR for non-documents
    1. Images
    2. Graphs
    3. Audio
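
To give a flavour of the early topics (inverted index, Boolean querying, tf-idf), here is a minimal sketch in Python. It is an illustration only, not course material: the function names, the whitespace tokenizer, the three toy documents, and the unsmoothed tf * log(N / df) weighting are all assumptions chosen for brevity.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Assumed naive tokenizer: lowercase, split on whitespace.
    # No stopword removal or stemming is applied.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the sorted list of IDs of documents containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    # Linear-time merge of two sorted postings lists.
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    # Conjunctive query: documents that contain every query term.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


def tf_idf(term, doc_id, docs, index):
    # Raw term frequency times log inverse document frequency.
    tf = tokenize(docs[doc_id]).count(term)
    df = len(index.get(term, []))
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)


# Hypothetical toy collection.
docs = [
    "information retrieval with an inverted index",
    "boolean retrieval model",
    "web scale search engines",
]
index = build_inverted_index(docs)
```

With this toy collection, boolean_and(index, ["information", "retrieval"]) returns only document 0, since it is the only document containing both terms. Processing the shortest postings list first keeps the intermediate results small, which is the usual heuristic for conjunctive queries.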