2024 Eighth IEEE International Conference on Robotic Computing (IRC)

Abstract

The rapid development of unmanned aerial vehicles (UAVs) has intensified the need for advanced classification techniques. This paper presents a novel approach that leverages audio data transformed into visual representations through Mel-Frequency Cepstral Coefficients (MFCCs) for drone classification. Our dataset consists of 28 drone types, each with 100 five-second audio recordings, from which 30 MFCCs are extracted per file. We investigate the effectiveness of this dataset by applying various vision models to the MFCC visualizations. Our results reveal that EfficientNet achieved the highest accuracy at 96.31%, followed by ResNet50 at 94.22%, and Vision Transformer at 73.69%. These findings highlight the potential of using audio-derived visual features for robust drone classification and demonstrate the varying performance of different vision models. This study provides a comprehensive examination of the methodology, experimental setup, and results, offering valuable insights into future research directions for enhancing classification accuracy with transformed audio data.
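The MFCC pipeline the abstract describes (5-second recordings, 30 coefficients per file) can be sketched from first principles. The following is a minimal, self-contained illustration of MFCC extraction, not the paper's actual implementation: frame sizes, hop length, and mel-band count are assumptions, and production work would typically use a library routine such as `librosa.feature.mfcc`, which additionally handles padding, pre-emphasis, and liftering.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_mfcc=30, n_fft=512, hop=256, n_mels=40):
    """Compute MFCCs: frame -> power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal and apply a Hann window to each frame
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_fft//2 + 1)
    # Triangular mel filterbank spanning 0 Hz .. Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    # Log mel energies, then a DCT-II to decorrelate into cepstral coefficients
    log_mel = np.log(power @ fb.T + 1e-10)                   # (n_frames, n_mels)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return (log_mel @ dct.T).T                               # (n_mfcc, n_frames)

# Hypothetical input matching the dataset description: a 5-second clip at 16 kHz
sr = 16000
t = np.linspace(0, 5, 5 * sr, endpoint=False)
audio = np.sin(2 * np.pi * 440.0 * t)   # synthetic stand-in for a drone recording
feats = mfcc(audio, sr)
print(feats.shape)   # 30 coefficients x number of frames
```

The resulting 30-row coefficient matrix can then be rendered as an image (e.g., a heatmap) and fed to the vision models the paper evaluates.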