Proceedings of the Fifth Mexican International Conference in Computer Science, 2004. ENC 2004.

Abstract

Recent advances in information technologies and molecular biology have led to an exponential growth in genome data. DNA microarray experiments are an important tool for monitoring and analyzing the expression profiles of thousands of genes simultaneously. In particular, we are interested in identifying similar expression patterns among the genes of the bacterium E. coli, which will help improve the understanding of its regulation pathways. We applied the KDD (Knowledge Discovery in Databases) methodology, specifically a clustering algorithm, to gene expression data from microarray experiments on E. coli under different conditions. Using AutoClass on a database of more than 1000 E. coli genes, we identified about 70 clusters of genes that exhibit similar patterns of expression level, and compared them to the groups of regulated genes identified by biologists. The results show many coincidences, but also important differences. These differences provide important clues for future research on the regulation process in E. coli. The contributions of this paper are threefold. First, we illustrate the application of the KDD methodology to a difficult problem in molecular biology, including the preprocessing steps needed before the clustering techniques can be applied. Second, we make an objective comparison of the clusters obtained from the data with the groups of regulated genes considered by the experts, using two different methodologies: one based on the Jaccard index, and one proposed by us for comparing two different clusterings. Third, we identify possible groups of co-regulated genes in E. coli that merit further research toward understanding the gene regulation pathways.
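
As an illustration of the Jaccard-based comparison mentioned above, the following minimal sketch scores the overlap between a data-derived cluster and an expert-defined group as |A ∩ B| / |A ∪ B|, then finds the best-matching cluster for each group. It assumes the set-based form of the index; the abstract does not say which variant the paper uses. The gene identifiers, cluster names, group names, and the `best_matches` helper are hypothetical placeholders, not data or methods from the paper.

```python
def jaccard(a, b):
    """Jaccard index of two collections of gene identifiers: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # By convention here, two empty sets score 0.0 rather than dividing by zero.
    return len(a & b) / len(union) if union else 0.0


def best_matches(clusters, groups):
    """For each expert group, return (best-matching cluster name, Jaccard score).

    Hypothetical helper for illustration; not the comparison method of the paper.
    """
    result = {}
    for group_name, genes in groups.items():
        best = max(clusters, key=lambda c: jaccard(clusters[c], genes))
        result[group_name] = (best, jaccard(clusters[best], genes))
    return result


# Toy data (assumed for illustration only): clusters found by an algorithm
# and groups of regulated genes defined by experts.
clusters = {"c1": {"g1", "g2", "g3"}, "c2": {"g4", "g5"}}
groups = {"r1": {"g1", "g2"}, "r2": {"g4", "g5", "g6"}}

matches = best_matches(clusters, groups)
for group_name, (cluster_name, score) in matches.items():
    print(f"{group_name} -> {cluster_name} (Jaccard = {score:.2f})")
```

A score near 1 indicates that a cluster and a group contain nearly the same genes, while a low best score flags a group with no close counterpart among the clusters, which is the kind of difference the abstract singles out for further study.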