2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Post-database searching is a key procedure for peptide spectrum matches (PSMs) in mass spectrometry-based protein identification. Although many machine learning-based approaches have been developed to improve the accuracy of peptide identification, there is still room for improvement because of the poor quality of data samples. CRanker has shown its effectiveness and efficiency in terms of the number of identified PSMs compared with benchmark algorithms. However, it has two weaknesses: overfitting and instability on small datasets. In this paper, we incorporate two new strategies into CRanker to tackle these weaknesses. First, we modify the CRanker model by using different weight parameters for the learning losses of decoy and target PSMs. Second, we employ self-paced learning in the training process to help the classifier avoid incorrect PSMs. Experimental studies show that the modified CRanker is more stable than the original and outperforms benchmark methods in terms of the number of identified PSMs at the same false discovery rates (FDRs).
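As a rough illustration of the two strategies named in the abstract, the sketch below combines class-specific loss weights with a hard self-paced sample selection. It is not the CRanker model: a plain linear logistic classifier stands in for it, and the names `train_self_paced`, `c_target`, `c_decoy`, `lam`, and `growth`, as well as all default values, are assumptions made for this example only.

```python
import numpy as np


def per_sample_loss(w, X, y):
    """Logistic loss log(1 + exp(-y * score)) for labels y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (X @ w))


def train_self_paced(X, y, c_target=1.0, c_decoy=2.0,
                     lam=0.7, growth=1.3, rounds=5, steps=200, lr=0.1):
    """Linear classifier with class-weighted loss and self-paced selection.

    X: (n, d) feature matrix; y: +1 for target PSMs, -1 for decoy PSMs.
    c_target / c_decoy: hypothetical loss weights for the two classes.
    lam: loss threshold below which a sample counts as "easy";
         it is multiplied by `growth` after every round.
    """
    n, d = X.shape
    w = np.zeros(d)
    # Different weight parameters for the losses of target and decoy PSMs.
    class_weight = np.where(y > 0, c_target, c_decoy)
    v = np.ones(n)
    for _ in range(rounds):
        # Self-paced step: keep only samples whose current loss is below lam.
        v = (per_sample_loss(w, X, y) < lam).astype(float)
        if v.sum() == 0:
            v = np.ones(n)  # fall back to all samples if none qualifies
        # Model step: gradient descent on the weighted loss of selected samples.
        for _ in range(steps):
            margin = y * (X @ w)
            sig = 0.5 * (1.0 - np.tanh(0.5 * margin))  # stable sigmoid(-margin)
            grad = -(X.T @ (v * class_weight * y * sig)) / n
            w -= lr * grad
        lam *= growth  # admit harder samples in the next round
    return w, v
```

In this sketch, target PSMs that keep a high loss (for example, incorrect matches that resemble decoys) are given selection weight 0 and do not influence the update until the threshold grows enough to admit them.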