Abstract
The large volume of microarray data now available for many organisms provides a rich opportunity for computational analysis of gene products, and integrating these data can help infer gene function effectively. Nevertheless, combining heterogeneous, incomplete and noisy microarray datasets remains challenging. To address this challenge, we have developed a new statistical model that combines multiple microarray datasets for gene function prediction. We first evaluate the statistical significance of the Pearson correlation coefficient between two gene expression profiles in a single dataset, using a p-value based on the standard t-statistic. We then use the joint meta-analysis p-value across multiple microarray datasets to quantify the posterior probability that two genes have the same function. The function of a gene is predicted from the posterior probabilities of its co-expressed genes with known functions in the multiple microarray datasets. To assess the sensitivity and specificity of our model, we predicted gene functions from yeast and human microarray data. Our results show that combining multiple datasets significantly improves accuracy over the best function prediction obtained from any single dataset. We have implemented the method as a software tool written in the C programming language. Executables for Linux and Windows are available upon request. Supplementary data along with prediction results are available at http://digbio.missouri.edu/meta_analyses.
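The first two steps described above (a per-dataset p-value for a Pearson correlation, then a joint p-value across datasets) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C implementation. The abstract does not name the combination rule, so Fisher's method is assumed here purely for illustration. All function names are hypothetical, the expression values in the usage block are made-up toy numbers, and the later step that maps a joint p-value to a posterior probability is not covered.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    guard = lambda v: v if abs(v) > tiny else tiny
    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def correlation_p_value(r, n):
    """Two-sided p-value for Pearson r from n samples, via the t-statistic."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 samples")
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(df / (1.0 - r * r))
    # For Student's t: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def fisher_combine(p_values):
    """Joint p-value by Fisher's method (an assumed combination rule)."""
    k = len(p_values)
    # half = X/2 where X = -2 * sum(ln p) follows chi-square with 2k df.
    half = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half) * total

if __name__ == "__main__":
    # Toy profiles for one gene pair in two hypothetical datasets.
    datasets = [
        ([1.0, 2.1, 2.9, 4.2, 5.1], [0.8, 1.9, 3.2, 3.9, 5.3]),
        ([0.5, 0.1, 0.9, 0.4], [0.6, 0.2, 0.7, 0.5]),
    ]
    p_values = [correlation_p_value(pearson_r(a, b), len(a)) for a, b in datasets]
    print("per-dataset p-values:", p_values)
    print("joint p-value:", fisher_combine(p_values))
```

The incomplete beta function is evaluated with a continued fraction so that the sketch needs only the standard library. Where SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.combine_pvalues` provide comparable functionality.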