Compressed Machine Learning on Time Series Data
Efficient compression through clustering using candidate selection and the application of machine learning on compressed data
Abstract
The volume of time-related data across many fields has led to substantial interest
in the analysis of time series. This interest meets growing challenges in storing and
processing data. While data is collected at an exponential rate, advancements in
processing units are slowing down. There is therefore active research into
more efficient means of storing and processing data. This can be especially difficult
for time series due to their various shapes and scales.
In this thesis, we present two variants for optimising a Greedy Clustering algorithm
used for lossy time series compression. The study investigates whether the efficient
but lossy compression sufficiently preserves the characteristics of the time series
to allow time series prediction and anomaly detection. The two variants proposed
for performance optimisation are Greedy SF and Greedy SAX. These algorithms are
based on novel lookup methods for cluster candidate selection that use statistical
features of time series and extracted SAX substrings. Furthermore, we extended
the clustering to handle time series with different value ranges, which
allows the compression of time series with various scales. To validate the
end-to-end pipeline, including compression and prediction, a performance evaluation is
carried out. To further analyse the applicability, a comprehensive benchmark against a
pipeline with an autoencoder for compression and a stacked LSTM for prediction is
performed.
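For readers unfamiliar with SAX (Symbolic Aggregate approXimation), the following is a minimal sketch of how a time series can be turned into a short symbolic word that could serve as a lookup key. It shows the standard SAX steps only (z-normalisation, piecewise aggregate approximation, discretisation against Gaussian breakpoints) and is not the Greedy SAX algorithm of the thesis. The function name `sax_word` and its parameters `n_segments` and `alphabet` are hypothetical and chosen for illustration.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Return a SAX word for a numeric series (illustrative helper, not from the thesis)."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series must have at least n_segments values")

    # Step 1: z-normalise so that series with different value ranges are comparable.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: Piecewise Aggregate Approximation, the mean of each segment.
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(mean(z[start:end]))

    # Step 3: breakpoints that split the standard normal distribution
    # into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Step 4: map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8]))
```

With the steadily rising example series, each of the four segments falls into a different region, so the resulting word is `abcd`. Series with similar shapes map to the same or similar words, which is what makes such words plausible as keys for narrowing down cluster candidates.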
Degree
Student essay
Date
2020-07-08
Author
Finger, Felix
Gocht, Nathalie
Keywords
time series clustering
large scale data
machine learning
prediction
anomaly detection
compression
Series/Report no.
CSE 20-13
Language
eng