2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE)

Abstract

Extracting protein-protein interactions (PPIs) from articles is important for understanding the underlying biological processes. With advances in natural language processing, many automatic methods for extracting PPIs from articles have been developed, including machine learning-based methods of both the feature-based and the kernel-based kind. However, the performance of these methods still leaves considerable room for improvement. We propose a novel method to extract PPIs from articles. We use many diverse features, including lexical features obtained from sentences and features obtained from parse trees. We also devise new features extracted from the shortest dependency paths in dependency trees. In our method, the training data and the test data are first partitioned into subsets based on the basic structures of the sentences, and feature selection (FS) is performed. We then decrease the values of the features of each instance by multiplying the features in each group of similar features by the corresponding shrink coefficients. These shrink coefficients are determined automatically. Experimental results on five corpora show the usefulness of the proposed method.
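The shrinking step described in the abstract can be illustrated with a minimal sketch. It assumes one shrink coefficient per feature group and uses hand-picked coefficient values; the abstract says only that the coefficients are determined automatically and does not say how, so that part is not shown. The function name `shrink_features`, the group names, and the example values are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def shrink_features(
    instance: Sequence[float],
    groups: Dict[str, List[int]],
    coefficients: Dict[str, float],
) -> List[float]:
    """Return a copy of `instance` with each grouped feature scaled down.

    `groups` maps a (hypothetical) group name to the indices of the features
    in that group; `coefficients` maps the same names to a shrink coefficient.
    Features that belong to no group are left unchanged.
    """
    scaled = list(instance)  # copy, so the input is not modified
    for name, indices in groups.items():
        coefficient = coefficients[name]
        for i in indices:
            scaled[i] = scaled[i] * coefficient
    return scaled


# Assumed grouping of five features into three groups of similar features.
groups = {"lexical": [0, 1], "parse_tree": [2], "dependency_path": [3, 4]}
# Assumed coefficients, chosen by hand for illustration only.
coefficients = {"lexical": 0.5, "parse_tree": 1.0, "dependency_path": 0.25}

instance = [1.0, 2.0, 3.0, 4.0, 8.0]
print(shrink_features(instance, groups, coefficients))
# prints [0.5, 1.0, 3.0, 1.0, 2.0]
```

In this sketch a coefficient of 1.0 leaves a group untouched, while smaller values reduce that group's influence on a classifier trained on the scaled instances.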