CS 719: Introduction to Data Streams

Course Contents:

Motivating applications: network monitoring, sensor networks, need for highly efficient processing of high speed and high volume data streams, Space and time efficient randomized algorithms as a candidate solution, models of data streams. Basics of randomization: elementary probability theory, expectation, linearity of expectation, variance, Markov and Chebychev's inequality, Chernoff and Hoeffding (CH) tail inequalities, hash functions, limited independence, CH-bounds for limited independence.

Finding frequent items in data streams, Estimating distinct item queries, Estimating frequency moments, estimating join sizes, Approximate histograms over data streams, Transforms over data streams, wavelets, fourier and DCT clustering over data streams, Applications to graphs.