CS 698D: Topics in Data Compression

Course Contents

This  course  covers  the  introductory  basics  and  some  recent  topics  in  the  area  of  lossless  data  compression. We will cover the following topics. First,  we  introduce  the  notion  of  a  lossless compressor. We will  cover the  fundamental  techniques  like Shannon-Fano  coding,  Huffman  coding  and  Arithmetic  codes.  We will  cover Kolmogorov complexity  as the "limit notion" of a lossless code, where the compressor can be any arbitrary program.

Special emphasis will be given to the class of "universal codes", including the Lempel-Ziv algorithm and the Burrows-Wheeler transform. We then study the optimality and rate of convergence results of the Lempel-Ziv algorithm, in the weak sense, and in the strong sense.

We  then cover some current topics in the topic of universal codes.
Specifically,the class of universal codes has  been  shown  to  be "non-rob ust"  with  respect  to  small  perturbations.  We will  study  the  tools  used  to establish these results, and investigate some open problems in this area.

Finally,we finish  with  some  applications  of  the  Lempel-Ziv a lgorithm  to  areas  like Statistical  inference, and algorithms.
Mathematical  techniques  used  in  these  results  include  recurrence theorems  from  ergodic  theory, and "Rokhlin-Kakutani tower constructions" from Ergodic theory (also known as "cutting and stacking".

If time permits, we will consider the recent class of optimal compressors called "Asymmetric Numeral Systems", which were introduced in 2014.


Books and References

  1. "Elements of Information Theory" by Cover and Thomas [5]
  2. "Information  Theory"  by  Csizar  and  Korner  [6]
  3. "Text Compression"  by  Bell,  Cleary  and  W itten[7]