An Investigation on Online Machine Learning for Anomaly Detection in Time Series Data

Sebastian Niklas Wette

Abstract

Concept drift in time series data poses a problem for many machine learning algorithms. Underlying shifts in the statistical properties of the data degrade the performance of batch-trained models. Anomaly detection algorithms that rely on forecasts of a system's future behavior suffer from these effects. Adaptation to concept drift is therefore a fundamental challenge for such systems, especially in quickly evolving environments. Instead of being retrained from scratch at regular intervals, models can continuously learn and update themselves as new data arrives, a strategy known as online learning. This thesis investigates the efficacy of online machine learning in prediction-based anomaly detection for time series data under concept drift, focusing on accuracy and computational efficiency compared to batch-trained methods. The work presents a proof of concept for prediction-based anomaly detection using online learning. It then compares the presented approach with the well-known batch-trained models SARIMA and Prophet, using real-world data from Deutsche Telekom’s IP Backbone. The measurements indicate that the online learning approach detects anomalies more accurately when concept drift is present in the data. It adapts better to concept drift, whereas the batch-trained models fail to produce adequate forecasts after a changepoint. However, batch-trained models perform better in static data environments. Lower CPU and memory usage and faster runtimes indicate the superior computational efficiency of the online learning method. Overall, this study confirms the superiority of online learning for prediction-based anomaly detection under concept drift and suggests potential applications in real-time systems and dynamic data environments. The approach also has limitations that motivate future work. Future research should explore more diverse online learning algorithms for different use cases and address the challenges of online MLOps, namely hyperparameter tuning. Additionally, distinguishing between anomalies and concept drift remains a critical challenge, suggesting avenues for further exploration in adaptive learning strategies.
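
The general idea of prediction-based anomaly detection with online learning can be illustrated with a minimal sketch. This is not the model used in the thesis: it assumes a simple autoregressive forecaster trained with normalized least-mean-squares updates as a stand-in, and the class name `OnlineARDetector` and the parameters `n_lags`, `learning_rate`, `threshold`, and `warmup` are hypothetical. Each new observation is first scored against the forecast, and the model then learns from it, so no batch retraining is needed.

```python
import math


class OnlineARDetector:
    """Hypothetical sketch: AR(n_lags) forecaster updated one sample at a time."""

    def __init__(self, n_lags=3, learning_rate=0.5, threshold=4.0, warmup=30):
        self.n_lags = n_lags
        self.learning_rate = learning_rate
        self.threshold = threshold      # allowed deviation in residual std units
        self.warmup = warmup            # residuals to collect before flagging
        self.weights = [0.0] * n_lags
        self.bias = 0.0
        self.history = []               # last n_lags observations
        self.n = 0                      # residuals seen (Welford statistics)
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, value):
        """Score `value` against the forecast, then learn from it.

        Returns (forecast, is_anomaly); forecast is None until n_lags
        observations have been seen.
        """
        if len(self.history) < self.n_lags:
            self.history.append(value)
            return None, False

        x = self.history
        forecast = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        error = value - forecast

        # Flag the point if its residual is far from the typical residual.
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        is_anomaly = (
            self.n >= self.warmup
            and std > 0.0
            and abs(error - self.mean) > self.threshold * std
        )

        # Flagged residuals are kept out of the running statistics
        # (Welford's algorithm) so they do not inflate the threshold.
        if not is_anomaly:
            self.n += 1
            delta = error - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (error - self.mean)

        # Normalized LMS step: the model always learns from the new sample,
        # which is what lets it follow a drifting series.
        norm = 1.0 + sum(xi * xi for xi in x)
        step = self.learning_rate * error / norm
        self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
        self.bias += step

        self.history = x[1:] + [value]
        return forecast, is_anomaly


def flag_anomalies(series, detector=None):
    """Return the indices of `series` that the detector flags."""
    detector = detector or OnlineARDetector()
    return [t for t, v in enumerate(series) if detector.update(v)[1]]
```

In this sketch the model keeps learning even from flagged points. That choice lets it recover after a changepoint, but it also means a persistent anomaly is eventually absorbed as the new normal, which reflects the difficulty of separating anomalies from concept drift noted in the abstract.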

Topics
Online ML, Anomaly Detection, Time Series, Concept Drift
Research Methods
systematic literature review, benchmarking

Publication Data

Author: Sebastian Niklas Wette
Thesis Type: Bachelor's Thesis
Pages: 60
Language: English
DOI:
About the Author:
Major / Study Program: Artificial Intelligence and Machine Learning
Primary Field of Study: Computer Science
Additional Study Interests: Web Development
License: CC BY-NC 4.0
Date of Publication: 09/16/24
Status: Available
Date of Grading: 03/31/24
Institution: Darmstadt University of Applied Sciences, Germany

Endorsements

# Name Details Endorsement
1 Dr. Florian Heinrichs Examiner, Professor in Data Science and Statistics 09/15/24 01:00:00 AM

Thesis Documents and Supplemental Materials

10/08/24 01:27:40 PM
# Description Type Upload Date Location
1 Thesis Document PDF (21.83 MB) 09/09/24 01:00:00 AM IPFS