Continuous Parallel Approximate Frequent Elements Queries on Data Streams
Abstract
The frequent elements problem involves processing a stream of elements and finding all elements that occur more than a given fraction of the time. A relaxed version of this problem is the ε-approximate frequent elements problem, which allows some false positives. This thesis aims to solve this problem in a parallel context, where multiple threads work together to speed up computation. Previous research has been successful in producing algorithms that can process large streams of data very quickly; however, these algorithms divide the input stream equally among the threads in the system, which results in excessive memory usage. The algorithm presented in this thesis, the Delegation Space-Saving algorithm, logically assigns ownership of certain elements to certain threads. This decreases space consumption and increases accuracy.
The Delegation Space-Saving algorithm was evaluated on throughput, accuracy, and memory consumption, using both synthetic data with varying skew and real-world network packet data from a backbone router. It uses almost as little memory as the single-threaded version, while achieving several times higher query and update throughput and equivalent accuracy.
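The ownership idea in the abstract can be illustrated with a small sketch. The Python code below is not the thesis's Delegation Space-Saving implementation: it is a sequential simulation that assumes a basic Space-Saving summary per owner and a hash-based assignment of elements to owners. The names `SpaceSaving`, `owner_of`, `process_partitioned`, and `frequent_elements`, as well as the parameters `capacity`, `num_owners`, and `phi`, are hypothetical and chosen for illustration only.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SpaceSaving:
    """Basic Space-Saving summary holding at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.total = 0  # number of elements this summary has seen

    def update(self, item: Hashable) -> None:
        self.total += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the element with the smallest counter. The newcomer
            # inherits that count plus one, so counts never underestimate.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1


def owner_of(item: Hashable, num_owners: int) -> int:
    # Assumption: ownership is decided by hashing the element.
    return hash(item) % num_owners


def process_partitioned(
    stream: Iterable[Hashable], num_owners: int, capacity: int
) -> List[SpaceSaving]:
    # Each owner stands in for one thread; here they run sequentially.
    summaries = [SpaceSaving(capacity) for _ in range(num_owners)]
    for item in stream:
        summaries[owner_of(item, num_owners)].update(item)
    return summaries


def frequent_elements(
    summaries: List[SpaceSaving], phi: float
) -> List[Tuple[Hashable, int]]:
    # Each element lives in exactly one summary, so no counters are merged.
    threshold = phi * sum(s.total for s in summaries)
    hits = [
        (item, count)
        for s in summaries
        for item, count in s.counts.items()
        if count > threshold
    ]
    return sorted(hits, key=lambda pair: -pair[1])


stream = [1] * 60 + [2] * 30 + list(range(100, 110))
summaries = process_partitioned(stream, num_owners=4, capacity=5)
print(frequent_elements(summaries, phi=0.2))  # [(1, 60), (2, 30)]
```

Because every element is counted in exactly one summary, a query only has to compare each owner's counters against the global threshold. Under the assumptions above, this is one way to see why assigning ownership can save space compared with letting every thread keep counters for the whole element domain.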
Degree
Student essay
Date
2021-10-06
Author
Jarlow, Victor
Keywords
computer science
big data
Space-Saving
Misra-Gries summary
frequent items
frequent elements
concurrent programming
Delegation Sketch
domain splitting
Count-Min Sketch
Majority algorithm
approximate frequent-elements algorithm
approximate top-k elements algorithm
Language
eng