dc.contributor.author: Hammargren, Lina
dc.contributor.author: Wu, Wei
dc.date.accessioned: 2021-06-14T17:44:44Z
dc.date.available: 2021-06-14T17:44:44Z
dc.date.issued: 2021-06-14
dc.identifier.uri: http://hdl.handle.net/2077/68598
dc.description.abstract: In software development with continuous integration, changes need frequent testing to be assessed. Analyzing the test output manually is time-consuming, and automating this process could benefit an organization. The goal of this thesis project is to perform automated anomaly detection on software test output files provided by Volvo Group Trucks Technology. To achieve this, we evaluated four neural network architectures: two recurrent neural networks with long short-term memory (LSTM), one unidirectional and one bidirectional, and two autoencoders (an LSTM-based sequence-to-sequence model and a Transformer) that aim to reconstruct a sequence from the files. Two datasets were used to evaluate the architectures. The first is the publicly available Hadoop Distributed File System (HDFS) dataset, in which every log is labelled as either anomalous or non-anomalous. The second consists of unlabelled log files produced by software testing at Volvo Group Trucks Technology. The networks trained on the HDFS data were evaluated in two settings. In the first setting, the logs labelled as anomalous were filtered out of the training data, making it a semi-supervised approach; in the second, those logs were kept, making it an unsupervised approach. Lastly, the networks were trained on the unlabelled Volvo Group Trucks Technology data to evaluate how they perform in an unsupervised setting. In addition, an analysis of how the size of the training datasets affects performance was performed.
The results show that, for the Volvo Group Trucks Technology data, the size of the training dataset influenced anomaly-detection performance: networks trained on a smaller dataset performed better than those trained on a larger one. On the HDFS dataset in the unsupervised setting, a smaller training dataset was also better than a larger one. However, on the HDFS data the semi-supervised approach outperformed the unsupervised setting regardless of the size of the training dataset. [sv]
dc.language.iso: eng [sv]
dc.subject: anomaly detection, recurrent neural network, long short-term memory, semi-supervised learning, seq2seq, transformer, unsupervised learning, log analysis [sv]
dc.title: Sequential Anomaly Detection for Log Data Using Deep Learning [sv]
dc.type: text
dc.setspec.uppsok: PhysicsChemistryMaths
dc.type.uppsok: H2
dc.contributor.department: University of Gothenburg/Department of Mathematical Science [eng]
dc.contributor.department: Göteborgs universitet/Institutionen för matematiska vetenskaper [swe]
dc.type.degree: Student essay
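The abstract contrasts a semi-supervised setting (sequences labelled anomalous are removed from the training data) with an unsupervised one (all sequences are kept), but it does not say how the networks turn predictions into anomaly decisions. The Python below is a minimal sketch under stated assumptions, not the thesis implementation. It assumes a DeepLog-style top-k next-key criterion, as is common for LSTM log models, and swaps in a simple bigram counter for the LSTM. The function names, the log keys ("open", "read", ...) and k = 2 are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# A session is represented as a sequence of log keys (parsed log templates).
LogSeq = List[str]


def training_set(data: Sequence[Tuple[LogSeq, bool]], semi_supervised: bool) -> List[LogSeq]:
    """Semi-supervised: drop sequences labelled anomalous. Unsupervised: keep everything."""
    return [seq for seq, anomalous in data if not (semi_supervised and anomalous)]


def fit_bigram(sequences: List[LogSeq]) -> Dict[str, Counter]:
    """Count next-key frequencies per key; a hypothetical stand-in for a trained LSTM."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            model[prev][nxt] += 1
    return model


def is_anomalous(seq: LogSeq, model: Dict[str, Counter], k: int = 2) -> bool:
    """Flag the sequence if any observed next key is outside the model's top-k predictions."""
    for prev, nxt in zip(seq, seq[1:]):
        # .get avoids inserting unseen keys into the defaultdict; unseen keys predict nothing.
        top_k = {key for key, _ in model.get(prev, Counter()).most_common(k)}
        if nxt not in top_k:
            return True
    return False


# Toy labelled data (hypothetical), loosely in the spirit of a labelled dataset such as HDFS.
data = [
    (["open", "read", "close"], False),
    (["open", "read", "close"], False),
    (["open", "write", "close"], False),
    (["open", "delete", "close"], True),
    (["open", "delete", "close"], True),
]

semi = fit_bigram(training_set(data, semi_supervised=True))
unsup = fit_bigram(training_set(data, semi_supervised=False))

print(is_anomalous(["open", "delete", "close"], semi))   # True
print(is_anomalous(["open", "delete", "close"], unsup))  # False
```

In this toy case, the repeated anomaly in the unsupervised training set is learned as a normal transition, so it is no longer flagged. This gives one illustration of why removing labelled anomalies before training can help, without implying that this is the mechanism behind the thesis results.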

