Abstract
Gene expression is a fundamental biological process, and predicting gene expression levels has attracted growing attention in recent years because of its potential in clinical applications. The prediction is difficult because gene expression is influenced by many factors, including gene sequence, epigenetic modifications, transcription factor binding, and microenvironmental conditions. This paper proposes Multimodal Expression, a Transformer-based model that integrates multiple data types. The model extracts features from gene promoter sequences and combines them with pre-transcriptional and post-transcriptional regulatory information to predict gene expression levels. Experimental results show that the model extracts more informative features from promoter sequences, and that the Transformer's attention mechanism can integrate multiple data types to jointly predict gene expression levels. Compared with previous methods, our model’s R² values improved by 7.05%, 8.9%, and 1.91% when using gene sequence data alone, gene sequence data combined with mRNA half-life data, and gene sequence data combined with mRNA half-life and transcription factor data, respectively.