2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)

Abstract

Gender classification from voice is an essential task in speech and audio processing, with applications such as speech recognition systems, voice assistants, and call center analytics. It also plays a vital role in speech synthesis, human-computer interaction, and speaker identification. Although the topic has been researched extensively in various languages, studies on gender classification in the Bangla language can hardly be found. Our research aims to recognize gender in the Bangla language using deep learning approaches and voice analysis. The proposed strategy consists of three stages: i) pre-processing of the data; ii) feature extraction using the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC); iii) classification using Convolutional Neural Network (CNN) models such as ResNet50, EfficientNetB0, InceptionV3, and DenseNet-121. Notably, 12 distinct feature combinations are used for model training and testing, with the MFCC and STFT features used either singly or in combination. After thorough training and testing, the InceptionV3 and EfficientNetB0 models with MFCC features as input achieved the highest accuracy of 92%, which demonstrates the system's potential for use in practical settings.
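
As an illustration of the feature extraction stage only, the following is a minimal sketch of how STFT power spectra and MFCCs can be computed from a waveform. It is not the authors' implementation: the frame length, hop size, filter count, coefficient count, sample rate, and all function names are assumptions chosen for readability, and the naive DFT is far too slow for real recordings, where a numerical library would normally be used instead.

```python
import cmath
import math


def stft_power(signal, frame_len=64, hop=32):
    """Power spectrogram: one list of frame_len // 2 + 1 bins per frame."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT of one bin (illustrative, O(N^2) per frame).
            s = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                    for i in range(frame_len))
            spectrum.append(abs(s) ** 2)
        frames.append(spectrum)
    return frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = frame_len // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * frame_len / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=10, n_coeffs=5):
    """MFCCs per frame: log mel-band energies followed by a DCT-II."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for spectrum in stft_power(signal, frame_len, hop):
        # Small constant avoids log(0) for empty bands.
        log_mel = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                   for row in bank]
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(n_coeffs)
        ])
    return features


# Synthetic stand-in for a speech clip: a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spectrogram = stft_power(tone)
coefficients = mfcc(tone, SAMPLE_RATE)
print(len(spectrogram), len(spectrogram[0]), len(coefficients[0]))
```

Either output (the spectrogram, the MFCC matrix, or both stacked) can be treated as a two-dimensional, image-like input, which is what makes such features usable with CNN architectures.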