Abstract
Imbalanced traffic classification has recently attracted growing attention because most Internet traffic exhibits imbalanced behavior. However, only a few works have considered real-time imbalanced traffic classification. In this project, we present a comparative study of several machine learning algorithms across nine different scenarios. We vary the dataset and flow sizes following an under-sampling approach in order to objectively evaluate the best parameters for classification. The results show that: 1) combined with packet length, inter-arrival time, and maximum segment size, features related to TCP session signaling enhance imbalanced traffic classification performance; 2) ensemble approaches, especially Bagged Random Forest, achieve the best results for real-time imbalanced traffic classification; 3) increasing flow sizes while reducing training set sizes (to a certain level) enhances classification performance, as we learn more about each individual instance. The best classification scenario uses 500 samples per class with 8-packet flows.
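
As an illustration only, a scenario of the kind described above (a fixed number of samples per class and a fixed number of packets per flow) could be built roughly as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function names `truncate_flow`, `undersample`, and `build_scenario`, the representation of a flow as a list of packets, and the use of uniform random under-sampling are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def truncate_flow(packets, flow_size=8):
    """Keep only the first `flow_size` packets of a flow.

    Flows shorter than `flow_size` are returned unchanged (assumption).
    """
    return packets[:flow_size]


def undersample(flows, labels, per_class=500, seed=0):
    """Randomly keep at most `per_class` flows for each class label."""
    rng = random.Random(seed)

    # Group flows by their class label.
    by_class = defaultdict(list)
    for flow, label in zip(flows, labels):
        by_class[label].append(flow)

    sampled_flows, sampled_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        k = min(per_class, len(members))
        # Sample without replacement so no flow is duplicated.
        for flow in rng.sample(members, k):
            sampled_flows.append(flow)
            sampled_labels.append(label)
    return sampled_flows, sampled_labels


def build_scenario(flows, labels, per_class=500, flow_size=8, seed=0):
    """Under-sample each class, then truncate every kept flow."""
    kept, kept_labels = undersample(flows, labels, per_class, seed)
    return [truncate_flow(f, flow_size) for f in kept], kept_labels
```

Under these assumptions, calling `build_scenario(flows, labels, per_class=500, flow_size=8)` would correspond to the scenario with 500 samples per class and 8-packet flows; other scenarios would follow by varying the two parameters.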

