2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Predicting protein function from sequences through machine learning can improve the understanding of novel proteins and biological mechanisms. Existing methods mainly rely on one-dimensional convolution or natural language processing (NLP) techniques to extract features from sequences, but their predictive performance remains limited. To address this challenge, we propose MulAxialGO, a new method that leverages multi-modal feature fusion to improve prediction accuracy. MulAxialGO integrates the prior features of a large-scale pre-trained protein language model with the posterior features of dynamic embedding coding and sequence homology. In addition, MulAxialGO employs a comprehensive image feature encoder to extract features from sequences, providing a novel perspective for protein function prediction. MulAxialGO is evaluated on two benchmark datasets and achieves state-of-the-art results. On the 2016 dataset, MulAxialGO significantly outperforms DeepGOPlus, improving AUPR by 4.5 points for molecular function, 2.4 points for biological process, and 1.6 points for cellular component. Similarly, on the NetGO dataset, MulAxialGO outperforms the state-of-the-art NetGO2.0, improving Fmax by 1.1 points for biological process and 2.3 points for cellular component.
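For readers unfamiliar with the Fmax metric reported above, the following is a minimal sketch of the protein-centric Fmax as it is commonly defined in CAFA-style evaluations. It assumes that definition applies here; the abstract does not state the exact evaluation protocol. The function name `fmax`, the data layout (one set of true terms and one term-to-score dictionary per protein), and the threshold grid are all illustrative assumptions.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (assumed CAFA-style definition).

    y_true  : list of sets, the annotated GO terms of each protein
    y_score : list of dicts, predicted score in [0, 1] per GO term
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [t / 100 for t in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for true_terms, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at t
            if predicted:
                precisions.append(tp / len(predicted))
            # Recall is averaged over all proteins that have annotations
            if true_terms:
                recalls.append(tp / len(true_terms))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy usage with hypothetical term names (not real GO identifiers)
truth = [{"T1", "T2"}, {"T3"}]
preds = [{"T1": 0.9, "T2": 0.4, "T4": 0.2}, {"T3": 0.8, "T1": 0.3}]
score = fmax(truth, preds)
```

Because Fmax takes the best F-score over all thresholds, it summarizes a single operating point, whereas AUPR summarizes the whole precision-recall curve. That is one reason evaluations often report both.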