Homework 16

Due Friday, April 10 at 5:00 PM. Please refer to the homework policy here.

Problems

  1. The count-min-sketch data structure allows us to estimate the relative frequency of each element up to an -additive factor with probability of error with space. The original motivation, however was to also obtain a list of -heavy hitters. Explain how to adjust the count-min-sketch to also maintain a collection of all of the -heavy hitters while holding onto at most items from the stream at any point in time, with probability of error . Here your algorithm may also collect a few extra items with relative frequency in the range .

  2. In this exercise, we develop a refined analysis that can reduce the additive error substantially in many real settings. Let denote the sum of frequence counts of all elements that are not -heavy hitters:

    Note that , and might be much less than when the stream is dominated by heavy hitters. Show that, by adjusting count-min-sketch(,) slightly by increasing to , the additive error for every element is at most with probability of error .