2014 IEEE 17th International Conference on Computational Science and Engineering (CSE)

Abstract

This paper proposes a method for extractive multi-document summarization based on the combined features of n-gram co-occurrences and dependency word pair co-occurrences. The unigram is the basic text unit; bigrams and skip-bigrams reflect the sequential relationships between words in a sentence; and dependency word pairs describe the syntactic relationships between words. The co-occurrences of each feature reflect the common topics of multiple documents from different perspectives. The score of a sentence is the weighted sum of the features it contains. The summary is generated by extracting salient sentences based on the maximum significance score model. This approach obtains higher ROUGE scores than several well-known methods on the TAC summarization dataset.
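
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the feature weights nor the details of the maximum significance score model, so the sketch makes several assumptions. Co-occurrence is approximated by document frequency (the number of documents containing a feature), only features shared by at least two documents contribute, sentences are selected greedily under a word budget, and dependency word pairs are left out because they require a syntactic parser. The function names, the `max_skip` parameter, and the weights are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def features(tokens, max_skip=2):
    """Return unigram, bigram, and skip-bigram feature sets for one sentence."""
    unigrams = {("uni", t) for t in tokens}
    bigrams = {("bi", a, b) for a, b in zip(tokens, tokens[1:])}
    # Skip-bigrams: ordered word pairs with 1..max_skip words between them.
    skips = {
        ("skip", tokens[i], tokens[j])
        for i, j in combinations(range(len(tokens)), 2)
        if 1 < j - i <= max_skip + 1
    }
    return {"uni": unigrams, "bi": bigrams, "skip": skips}


def document_frequency(docs):
    """Count, for every feature, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        seen = set()
        for sentence in doc:
            for feature_set in features(sentence).values():
                seen |= feature_set
        df.update(seen)
    return df


def score(sentence, df, weights):
    """Weighted sum of the co-occurring features a sentence contains.

    Assumption: a feature counts only if it appears in more than one document.
    """
    return sum(
        weights[kind] * sum(df[f] for f in feature_set if df[f] > 1)
        for kind, feature_set in features(sentence).items()
    )


def summarize(docs, weights, max_words=100):
    """Greedily pick the highest-scoring sentences within a word budget.

    Assumption: greedy selection stands in for the paper's selection model.
    """
    df = document_frequency(docs)
    candidates = [sentence for doc in docs for sentence in doc]
    ranked = sorted(candidates, key=lambda s: score(s, df, weights), reverse=True)
    summary, length = [], 0
    for sentence in ranked:
        if sentence not in summary and length + len(sentence) <= max_words:
            summary.append(sentence)
            length += len(sentence)
    return summary


if __name__ == "__main__":
    # Each document is a list of sentences; each sentence is a list of tokens.
    docs = [
        [["storm", "hit", "coast"], ["people", "fled"]],
        [["storm", "hit", "city"], ["weather", "cleared"]],
    ]
    weights = {"uni": 1.0, "bi": 2.0, "skip": 1.5}  # hypothetical weights
    print(summarize(docs, weights, max_words=3))
```

In this toy input, the sentences that share "storm hit" across both documents outrank those whose words appear in only one document, which is the sense in which co-occurring features are taken to reflect common topics.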