Abstract
Network traffic classification is a key component of network security systems. In recent years, several machine learning algorithms that classify traffic using packet-level and flow-level features have been proposed, yet their results are frequently disappointing. On the one hand, obtaining a large, representative, fully labeled training data set is difficult, time-consuming, and expensive. On the other hand, classification performance suffers when new protocols and applications produce unknown traffic that existing classification systems cannot identify. To achieve effective and inexpensive classification, we propose a framework based on unsupervised methods and tri-training. Using two independent clusterings, the proposed method can precisely detect unknown applications and expand the set of labeled flows, starting from a few labeled flows and many unlabeled ones. Meanwhile, tri-training effectively exploits the unlabeled flows to further improve the performance of the proposed method. We implement our approach and evaluate it on two real-world Internet traffic traces. The experimental results demonstrate that the proposed method achieves higher Precision and Recall than state-of-the-art approaches and handles different data sets better.