2022 International Conference on Data Analytics, Computing and Artificial Intelligence (ICDACAI)

Abstract

News on the Internet is voluminous and disorganized, which makes it difficult to classify and manage accurately. This paper applies a text classification method based on BiLSTM (bi-directional long short-term memory) and an attention mechanism. First, each word segment of the Chinese news content is embedded as a word vector using word2vec. Next, a BiLSTM layer extracts features and learns long-term dependencies in both directions, and the attention mechanism then computes attention weights over its outputs. Finally, after ReLU and fully connected layers, a classifier predicts the news category labels. In the experiments, the THUCNews data set is used to verify the effectiveness of the method. On a test set of 10,000 samples, the accuracy is 97.46%, the recall is 97.47%, and the F1 score is 97.45%. All three metrics are higher than those of the traditional CNN, BiLSTM, and BiLSTM+pooling classification models. The experimental results show that the BiLSTM+attention fusion model is effective for Chinese long news text classification.
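
The attention step described above can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the form used here (a dot product between a learned vector and tanh of each hidden state, followed by softmax) is an assumption, as are the names `attention_pool`, `hidden_states`, and `query` and the toy numbers. It is not the authors' implementation; it only shows how per-token BiLSTM outputs could be collapsed into one weighted feature vector before the ReLU and fully connected layers.

```python
import math


def attention_pool(hidden_states, query):
    """Collapse per-token hidden states into one vector with softmax attention.

    hidden_states: list of T vectors (each a list of D floats), standing in
        for the concatenated forward/backward BiLSTM outputs.
    query: list of D floats, a hypothetical learned scoring vector.
    Returns (weights, context): T attention weights and a D-dim context vector.
    """
    # Score each time step: dot product of the query with tanh(h_t).
    # This scoring form is an assumption; the abstract does not specify it.
    scores = [
        sum(q * math.tanh(h) for q, h in zip(query, h_t))
        for h_t in hidden_states
    ]

    # Numerically stable softmax over the time steps.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h_t[d] for w, h_t in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy input: 3 tokens, 4-dimensional hidden states (made-up values).
hidden = [
    [0.1, -0.2, 0.4, 0.0],
    [0.9, 0.8, -0.1, 0.3],
    [0.0, 0.1, 0.0, -0.2],
]
query = [1.0, 1.0, 0.5, 0.5]
weights, context = attention_pool(hidden, query)
```

In this toy input the second token receives the largest weight because its hidden state aligns best with the scoring vector. A pooling baseline such as BiLSTM+pooling would instead take a fixed max or mean over the time steps, with no learned weighting.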