2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Transcription factors are pivotal elements of transcriptional regulation: they regulate and control gene expression. Predicting transcription factor binding sites (TFBSs) is an important but difficult task in biology. In this study, we propose WMS_TF, a new algorithm that applies a weighted multi-granularity scanning strategy within the deep forest method. We assign a unique weight vector to each scan window in multi-granularity scanning so that the algorithm pays more attention to important features. In addition to single DNA base features, we present a multi-base feature encoding method for feature representation. WMS_TF is trained on DNA sequences, directly implements sequence-to-function prediction, and reduces the impact of noisy data on the results. Experiments show that WMS_TF can effectively predict TFBSs from DNA sequences. On small data sets in particular, its scores on the corresponding metrics are higher than those of similar algorithms such as AdaBoost, Deep Forest, and KNN. WMS_TF reaches an accuracy of 89.43%, an F1-measure of 89.20%, and an AUC of 92.19%. Weighted multi-grained scanning and combined feature representations provide new ideas for predicting TFBSs.
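The abstract does not give the exact encoding or weighting scheme, so the following is only a minimal sketch of the two ideas it names: single-base and multi-base feature encoding, and a sliding-window scan whose windows are scaled by a weight vector. The function names (`one_hot`, `kmer_codes`, `weighted_scan`), the choice of one-hot and integer k-mer codes, and the elementwise application of the weights are all assumptions for illustration, not the method used in WMS_TF.

```python
import numpy as np

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq):
    """Single-base features: one 4-dimensional indicator vector per position."""
    out = np.zeros((len(seq), len(BASES)))
    for pos, base in enumerate(seq):
        out[pos, BASE_INDEX[base]] = 1.0
    return out


def kmer_codes(seq, k=2):
    """Multi-base features (assumed form): base-4 integer code of each overlapping k-mer."""
    codes = []
    for start in range(len(seq) - k + 1):
        code = 0
        for base in seq[start:start + k]:
            code = code * len(BASES) + BASE_INDEX[base]
        codes.append(code)
    return np.array(codes)


def weighted_scan(features, window, weights):
    """Slide a window over the positions, flatten each window, and scale it
    elementwise by a weight vector (hypothetical stand-in for the paper's weights)."""
    n_positions = features.shape[0]
    windows = [
        features[start:start + window].ravel()
        for start in range(n_positions - window + 1)
    ]
    return np.stack(windows) * weights


# Usage: scan a short sequence with a window of 3 bases and uniform weights.
encoded = one_hot("ACGTAC")
scanned = weighted_scan(encoded, window=3, weights=np.ones(3 * len(BASES)))
```

In a deep forest pipeline, each row of `scanned` would typically be passed to the forests of the scanning stage. Non-uniform weights would let some positions or bases within a window contribute more than others, which is the general idea behind attending to important features.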