Data streaming provenance in advanced metering infrastructures
Abstract
Increasing volumes of data in digital systems have made the traditional approach of
gathering and storing all data, then analyzing it in bulk at periodic intervals,
challenging and costly. One affected field is the electric grid market, which has started
modernizing its aging grids into smart grids where Advanced Metering Infrastructures
(AMIs) play a vital role. Within AMIs, old meters are replaced with smart
meters that are able to collect data more often and measure more properties than the
old meters. However, they also produce a higher volume of data. Stream processing,
in which data is analyzed continuously before being stored, can therefore be of interest,
as the data can be heavily reduced before storage. The downside of this approach is
that the traceability of data is lost. Stream provenance is a technique that addresses
this by retrieving the source data that contributed to each output of a stream
processing application. However, stream provenance is an
understudied problem, and using it can decrease performance. The purpose of
this thesis is to study stream provenance by developing a streaming application that
makes use of provenance. The application is evaluated by measuring several metrics
to determine how performance is affected. The project is conducted at Göteborg
Energi (GE), one of Sweden’s biggest energy utility companies. The objective is to
develop a prototype extension to GE’s current stream-processing application that
can detect faulty meters and use stream provenance to report them. The development
process and evaluation of the application are covered in this report. The
application is built with Apache Flink, a Stream Processing Engine (SPE),
and Ananke, a stream provenance framework. Two versions are created,
one with provenance and one without. Performance metrics such as CPU utilization,
memory consumption, latency, and throughput are measured. The results show
that provenance decreases throughput by 10.4% and increases memory consumption
by 8.8%, latency by 10.4%, and CPU utilization by 238.1%. Several reasons behind
these results are discussed in the report, along with their implications for an
application. Although provenance adds overhead, it can still be
beneficial for some types of applications, for example one where latency is
not critical and ample resources are available, like the one developed in this
thesis.
Degree
Student essay
Date
2023-11-24
Author
Mohamed, Zozk
Keywords
Advanced Metering Infrastructure
Ananke
Apache Flink
Göteborg Energi
Provenance
Stream processing
Stream Processing Engine
Language
eng