2008 IEEE International Conference on Data Mining Workshops

Abstract

This paper presents a new keyword extraction algorithm for Chinese news web pages that uses lexical chains and word co-occurrence, combining frequency features, cohesion features, and correlation features. A lexical chain is the outward expression of cohesion among semantically related words in a text, and it represents the semantic content of a portion of that text. Word co-occurrence distribution is an important statistical model, widely used in natural language processing, that reflects the correlation between words. Our proposed algorithm, KELCC, combines lexical chains and word co-occurrence to extract keywords from Chinese news web pages. The algorithm is not domain-specific and can be applied to a single web page without a corpus. Experiments on randomly selected web pages demonstrate the quality of the keywords extracted by the proposed algorithm.
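
To make the notion of word co-occurrence concrete, the following minimal sketch counts how often word pairs appear in the same sentence of a single document. It is only an illustration and not the KELCC algorithm itself: the function names `cooccurrence_counts` and `cooccurrence_degree`, the sentence-level window, and the toy English tokens are all assumptions, and a Chinese page would first need word segmentation, which is omitted here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair shares a sentence.

    `sentences` is a list of token lists (assumed already segmented).
    """
    pair_counts = Counter()
    for tokens in sentences:
        # Distinct words only, so a pair is counted once per sentence;
        # sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cooccurrence_degree(pair_counts):
    """Sum, for each word, the counts of all pairs it takes part in."""
    degree = Counter()
    for (a, b), n in pair_counts.items():
        degree[a] += n
        degree[b] += n
    return degree


# Hypothetical toy document with three pre-tokenized sentences.
sentences = [
    ["market", "stock", "price"],
    ["stock", "price", "rise"],
    ["market", "rise"],
]
pairs = cooccurrence_counts(sentences)
degree = cooccurrence_degree(pairs)
```

A word with a high co-occurrence degree is associated with many other words in the page, which is one simple way a correlation feature could be derived from a single document without an external corpus.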