Education Technology and Computer Science, International Workshop on

Abstract

Text documents are stored as high-dimensional, sparse feature vectors, and existing clustering methods have known shortcomings with such data. To address these problems, an effective text clustering approach (TGSOM-FS-FKM) based on tree-structured growing self-organizing maps (TGSOM) and Fuzzy K-Means (FKM) is proposed. The approach first preprocesses the texts and filters out most noisy words with an unsupervised feature selection method. It then applies TGSOM for a first clustering pass, which yields a rough classification of the texts together with the initial number of clusters and each text's category. Next, latent semantic analysis (LSA) is introduced to improve clustering precision and reduce the dimension of the feature vectors. TGSOM is then applied a second time to obtain a more precise clustering result, and a supervised feature selection method is used to select feature items. Finally, FKM clusters the resulting set. The same number of feature items was retained in the experiments. Experimental results indicate that TGSOM-FS-FKM outperforms other clustering methods such as DSOM-FS-FCM, and that its precision is higher than that of DSOM-FCM, DFKCN and FDMFC clustering.
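The abstract does not give the FKM formulation, so the following is only a minimal sketch of the final clustering step, assuming the standard fuzzy c-means updates (fuzzifier m, Euclidean distance). It illustrates how the cluster count and rough per-text categories produced by the TGSOM stage could seed FKM. TGSOM, LSA and the two feature-selection steps are not reproduced here. The function name fuzzy_k_means, the init_labels parameter, and the toy vectors are hypothetical and stand in for the reduced text vectors and TGSOM output.

```python
import math
import random


def fuzzy_k_means(points, k, init_labels=None, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy k-means (fuzzy c-means) on dense feature vectors.

    points      -- list of equal-length float vectors (e.g. reduced text vectors)
    k           -- number of clusters (assumed here to come from a TGSOM pass)
    init_labels -- optional hard labels used to seed memberships (hypothetical
                   hook standing in for TGSOM categories; random seeding otherwise)
    m           -- fuzzifier, must be > 1
    Returns (centroids, memberships); memberships[i][j] is point i's degree in cluster j.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    eps = 1e-3  # small smoothing so no cluster starts with zero total weight

    # Initial membership matrix; every row is normalized to sum to 1.
    u = []
    for i in range(n):
        if init_labels is not None:
            row = [eps] * k
            row[init_labels[i]] += 1.0
        else:
            row = [rng.random() + eps for _ in range(k)]
        s = sum(row)
        u.append([x / s for x in row])

    exp = 2.0 / (m - 1.0)
    centroids = []
    for _ in range(max_iter):
        # Centroid update: mean of all points weighted by u_ij ** m.
        centroids = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append([sum(w[i] * points[i][d] for i in range(n)) / total
                              for d in range(dim)])

        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d = [math.dist(points[i], c) for c in centroids]
            zeros = [j for j in range(k) if d[j] == 0.0]
            if zeros:
                # Point sits exactly on one or more centroids: share membership among them.
                new = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(k)]
            else:
                new = [1.0 / sum((d[j] / d[l]) ** exp for l in range(k)) for j in range(k)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # stop once memberships have stabilized
            break
    return centroids, u


# Toy data: two well-separated groups of 2-D "document" vectors.
docs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
rough = [0, 0, 1, 1, 1, 1]  # hypothetical rough first-pass labels; doc 2 is misassigned
centroids, u = fuzzy_k_means(docs, k=2, init_labels=rough)
hard = [max(range(2), key=lambda j: row[j]) for row in u]
print(hard)  # [0, 0, 0, 1, 1, 1]
```

Seeding from the rough labels lets FKM start near a sensible partition and still correct a misassigned text, which matches the abstract's idea of refining an initial TGSOM classification. Unlike the hard first pass, FKM also keeps a graded membership for every text in every cluster.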