2015 International Conference on Network and Information Systems for Computers (ICNISC)

Abstract

This paper proposes a DOM-based method for constructing Web information extraction wrappers. By combining XPath with pattern matching, the wrapper can handle two types of information at the same time under the guidance of source and target knowledge libraries, which also help to extract more information that is useful to users. The paper describes in detail the process of building the wrapper and the corresponding algorithms, including DOM-based information judgment, determination of key extraction blocks using ideas from hierarchical clustering, and determination of extraction expressions using inductive learning and natural language processing.
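As a rough illustration of what combining XPath with pattern matching can look like, the sketch below locates candidate blocks in a DOM tree with a path expression and then applies a regular expression to the free text inside each block. It is not the paper's algorithm: the sample page, the `item` class name, the `PRICE` pattern, and the `extract_items` function are all hypothetical, and it assumes the two kinds of information are fields addressable by path and values embedded in running text. It uses the limited XPath subset in Python's standard library and requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (assumption, not from the paper).
SAMPLE_PAGE = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <div class="item">
    <h2>Widget A</h2>
    <p>Price: $19.99, in stock since 2015-01-23</p>
  </div>
  <div class="item">
    <h2>Widget B</h2>
    <p>Price: $5.00, in stock since 2014-11-02</p>
  </div>
</body></html>
"""

# Hypothetical pattern for a value embedded in free text.
PRICE = re.compile(r"\$(\d+\.\d{2})")


def extract_items(page):
    """Return one record per block matched by the path expression."""
    root = ET.fromstring(page)
    records = []
    # Step 1: a path expression selects the candidate extraction blocks.
    for block in root.findall(".//div[@class='item']"):
        # Step 2a: a field addressable directly by its position in the tree.
        name = block.findtext("h2")
        # Step 2b: a value that must be pulled out of text by pattern matching.
        text = block.findtext("p") or ""
        match = PRICE.search(text)
        price = float(match.group(1)) if match else None
        records.append({"name": name, "price": price})
    return records


if __name__ == "__main__":
    for record in extract_items(SAMPLE_PAGE):
        print(record)
```

In this sketch the path expression and the pattern are written by hand; the abstract indicates that the paper instead determines the extraction blocks and expressions through clustering and inductive learning, which is not reproduced here.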