Abstract
For enterprise storage systems, users' directory/file and I/O access traces are critical for tuning existing systems and guiding new designs. However, once these systems are deployed, only small trace features may be sent back to vendors. It is therefore crucial to develop effective techniques for highly compressed feature extraction and feature-based, high-fidelity trace regeneration. Existing work primarily focuses on I/O trace modeling and regeneration without considering directory/file access information. In this paper, we propose a new technique, called Sketcher, that sketches massive traces into highly compressed “joint features” capturing both directory/file and I/O characteristics, and then regenerates high-fidelity traces from these features with a learning-based approach. For trace feature extraction, one key idea is to divide traces into multiple distance-associated segments, where each segment contains all file and I/O accesses operating under the same directory, and the differences between segments are represented as displacements of segments within the directory tree. A dynamic weight scaling technique is proposed to further compress features according to feature criticality and the size quota, thereby achieving high compression ratios while preserving critical characteristics (e.g., abnormal I/O access patterns). For trace regeneration, a new learning-based RNN model is proposed to regenerate high-fidelity traces from the extracted features based on sampling directory trees. We have implemented a fully functional prototype based on typical enterprise storage systems and evaluated Sketcher with real applications and benchmarks on a Huawei OceanStor Dorado storage server. Results show that Sketcher can effectively extract features with marginal runtime overhead while achieving compression ratios of up to 15.2K and regenerating high-fidelity traces.
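To make the segmentation idea concrete, the following is a minimal sketch, not Sketcher's actual implementation. It assumes a hypothetical trace format of (path, op, offset, size) tuples, treats a segment as a run of consecutive accesses under the same directory, and encodes the displacement between two segments as the number of levels to climb to their common ancestor plus the path to descend. The function names `segment_by_directory` and `displacement` are illustrative only.

```python
from itertools import groupby
from pathlib import PurePosixPath


def segment_by_directory(trace):
    """Group consecutive accesses that share a parent directory.

    `trace` is a list of (path, op, offset, size) tuples; this record
    layout is an assumption made for illustration.
    Returns a list of (directory, accesses) pairs, where each access
    keeps only the file name relative to the segment's directory.
    """
    segments = []
    for directory, group in groupby(
        trace, key=lambda rec: PurePosixPath(rec[0]).parent
    ):
        accesses = [
            (PurePosixPath(path).name, op, offset, size)
            for path, op, offset, size in group
        ]
        segments.append((directory, accesses))
    return segments


def displacement(src, dst):
    """Describe how to move from directory `src` to directory `dst`.

    Returns (levels_up, path_down): climb `levels_up` levels to the
    deepest common ancestor, then descend through `path_down`.
    This encoding is an assumed stand-in for the paper's displacement.
    """
    src_parts, dst_parts = src.parts, dst.parts
    common = 0
    for a, b in zip(src_parts, dst_parts):
        if a != b:
            break
        common += 1
    return len(src_parts) - common, dst_parts[common:]


# Hypothetical three-record trace touching two directories.
example_trace = [
    ("/a/b/f1", "read", 0, 4096),
    ("/a/b/f2", "write", 0, 8192),
    ("/a/c/d/g", "read", 4096, 4096),
]
example_segments = segment_by_directory(example_trace)
example_move = displacement(example_segments[0][0], example_segments[1][0])
```

Under these assumptions, the example trace yields two segments (/a/b and /a/c/d), and the move between them is one level up followed by the descent c/d. Storing such relative moves instead of full paths is one plausible reason a displacement-based encoding stays compact when consecutive segments are close in the directory tree.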