Abstract
Given the increasing volume and complexity of network traffic nowadays, network operators often leverage application-layer protocols to differentiate network traffic, so as to improve quality-of-service control, security protection, and resource profiling. We present ProGraph, a tool that accurately infers protocol message formats at both byte-level and bit-level granularities. Unlike existing approaches that mainly exploit statistical features across packets, ProGraph exploits intra-packet dependency among the values of different portions of a packet payload. It systematically constructs a graphical model that captures intra-packet dependency, using various techniques in graph theory and information theory. It also achieves several important design properties for real deployment, including fine-grained inference, protocol independence, simple parameterization, robustness to noisy training sets, and fast execution. We show via trace-driven evaluations that ProGraph achieves more accurate inference than existing approaches. We further show how ProGraph can be used for classifying traffic.