Abstract
Concept drift in time series data poses a problem for many machine learning algorithms. Underlying shifts in the statistical properties of the data lead to a decline in the performance of batch-trained models. Anomaly detection algorithms that rely on forecasts of a system's future behavior suffer from these effects. Adaptation to concept drift is therefore a fundamental challenge for such anomaly detection systems, especially in quickly evolving environments. Instead of retraining models from scratch at regular intervals, models can continuously learn and update themselves as new data arrives, a strategy known as online learning. This thesis investigates the efficacy of online machine learning for prediction-based anomaly detection in time series data under concept drift, focusing on accuracy and computational efficiency compared to batch-trained methods. The work presents a proof of concept for prediction-based anomaly detection using online learning. Furthermore, the research compares the performance of the presented approach with the well-known batch-trained models SARIMA and Prophet on real-world data from Deutsche Telekom’s IP Backbone, focusing on accuracy and efficiency. The resulting measurements indicate that the online learning approach detects anomalies more accurately when concept drift is present in the data. It exhibits superior adaptability to concept drift, whereas the batch-trained models fail to produce adequate forecasts after a changepoint. However, the batch-trained models perform better in static data environments. Lower CPU and memory usage and faster runtimes indicate the superior computational efficiency of the online learning method. Finally, this study confirms the superiority of online learning for prediction-based anomaly detection under concept drift and suggests potential applications in real-time systems and dynamic data environments. Some limitations of the approach motivate future work: subsequent research should explore a more diverse set of online learning algorithms for different use cases and address the challenges of online MLOps, in particular hyperparameter tuning. Additionally, distinguishing between anomalies and concept drift remains a critical challenge, suggesting avenues for further exploration in adaptive learning strategies.
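To give a concrete flavor of the prediction-based approach summarized above, the following minimal sketch illustrates the general idea only; it is not the model evaluated in this thesis. The class name and parameters (`alpha`, `error_decay`, `k`) are hypothetical. An exponentially weighted moving average serves as a simple online one-step forecaster, and an observation is flagged as anomalous when its prediction error exceeds a multiple of the running error standard deviation; both the forecast and the error statistics are updated with every point, so the detector keeps adapting as the data drifts.

```python
# Minimal sketch of prediction-based anomaly detection with an online model.
# An EWMA stands in for the online forecaster; error statistics are updated
# incrementally so the detector adapts to concept drift. Parameters are
# hypothetical defaults, not the configuration used in the thesis.

class OnlinePredictionAnomalyDetector:
    def __init__(self, alpha=0.1, error_decay=0.05, k=3.0):
        self.alpha = alpha              # learning rate of the EWMA forecaster
        self.error_decay = error_decay  # decay for the running error variance
        self.k = k                      # anomaly threshold in standard deviations
        self.forecast = None            # current one-step-ahead forecast
        self.error_var = 0.0            # exponentially weighted error variance

    def update(self, y):
        """Process one observation; return (is_anomaly, forecast)."""
        if self.forecast is None:
            self.forecast = y
            return False, y
        error = y - self.forecast
        std = self.error_var ** 0.5
        is_anomaly = std > 0 and abs(error) > self.k * std
        # Online updates: forecaster and error statistics learn from every
        # point, so they track drifting behavior without retraining.
        self.error_var = (1 - self.error_decay) * self.error_var \
            + self.error_decay * error ** 2
        self.forecast = (1 - self.alpha) * self.forecast + self.alpha * y
        return is_anomaly, self.forecast


if __name__ == "__main__":
    detector = OnlinePredictionAnomalyDetector()
    stream = [10, 11, 10, 12, 11, 30, 12, 11]  # 30 is an obvious outlier
    for t, y in enumerate(stream):
        flagged, forecast = detector.update(y)
        print(f"t={t} y={y} forecast={forecast:.1f} anomaly={flagged}")
```

Because every statistic is updated per observation, memory use stays constant and no periodic retraining is needed, which is the property the abstract contrasts with the batch-trained SARIMA and Prophet models.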
Publication Data
Endorsements
# | Name | Details | Endorsement |
---|---|---|---|
1 | Dr. Florian Heinrichs (Examiner) | Professor in Data Science and Statistics | 09/15/24 01:00:00 AM |