2024 5th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI)

Abstract

This study explores techniques for extracting features from unstructured text and evaluates their effectiveness for text classification, the natural language processing (NLP) task of assigning text to appropriate categories. TF-IDF and related NLP techniques are among the most widely used feature extraction methods for this task. The research uses a dataset of 3071 Amazon customer reviews scraped from Amazon's website, comprising review text, star ratings, review dates, product variants, and feedback for various Amazon Alexa products. To improve classification accuracy, the study proposes a stacking ensemble that combines the predictions of a Random Forest classifier and a Gradient Boosting classifier over TF-IDF feature weighting. The experiment further compares the proposed method against alternative pipelines, evaluating feature extraction techniques such as N-grams, Bag-of-Words (BoW), TF-IDF, Word2vec, and GloVe alongside several classifiers, including Logistic Regression (LR), Decision Tree (DT), Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), k-nearest neighbors (KNN), and Random Forest (RF). The stacking ensemble (RF+GBM-Stacking) outperformed all other methods, achieving an accuracy of 0.85, precision of 0.82, recall of 0.87, and F1 score of 0.84. These findings suggest that the technique can significantly improve text classification performance.
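The pipeline the abstract describes can be sketched with scikit-learn's `TfidfVectorizer` and `StackingClassifier`. This is a minimal illustration, not the authors' exact implementation: the review texts, labels, and hyperparameters below are hypothetical stand-ins, and the final estimator (logistic regression) is an assumed default rather than a detail given in the abstract.

```python
# Sketch of a TF-IDF + stacking (Random Forest + Gradient Boosting) text
# classifier, as described in the abstract. Toy data only -- the real study
# used 3071 Amazon Alexa reviews.
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical reviews with 1 = positive feedback, 0 = negative feedback.
reviews = ["love this speaker", "sound quality is great",
           "stopped working after a week", "terrible product, returned it",
           "works perfectly every day", "worst purchase ever"]
labels = [1, 1, 0, 0, 1, 0]

# Base learners' out-of-fold predictions feed a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("gbm", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
    cv=2,  # small cv only because this toy dataset is tiny
)

# TF-IDF weighting over unigrams and bigrams, then the stacked ensemble.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), stack)
model.fit(reviews, labels)
print(model.predict(["great sound, works perfectly"]))
```

In a real evaluation the fitted pipeline would be scored on a held-out split with accuracy, precision, recall, and F1, mirroring the metrics reported above.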