2018 IEEE 11th International Conference on Cloud Computing (CLOUD)
Download PDF

Abstract

Reliable deployment of services is especially challenging in virtualized infrastructures, where the deep tech-nological stack and the multitude of components necessitate automatic anomaly detection and remediation mechanisms. Traditional monitoring solutions observe the system and generate alarms when the collected metrics exceed predefined thresholds. The fixed thresholds rely on expert knowledge and can lead to numerous false alarms, while abnormal behavior that spans over multiple metrics, components, or system layers, may not be detected. We propose to use an unsupervised online clustering algorithm to create a model of the normal behavior of each monitored component with minimal human interaction and no impact on the monitored system. When an anomaly is detected, a human administrator or automatic remediation system can subsequently revert the component into a normal state. An experimental evaluation resulted in a high accuracy of our approach, indicating that it is suitable for anomaly detection in productive systems.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles