2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)

Abstract

One of the main goals of genome-wide association studies (GWAS) has been to detect gene-gene interactions, also known as epistasis in a broad sense, underlying complex diseases. The ability of decision trees and their ensembles to capture interactions among input variables has attracted the attention of computational biologists for this purpose. However, individual decision trees suffer from limitations, including data fragmentation and the representational problem, that can degrade the epistasis detection performance of their ensembles when not taken into account. Here we take a closer look at the feature selection capability of AdaBoost in the realm of epistasis detection and at the effect of tuning the weak classifiers on its performance. We also explore the efficacy of applying different statistical and information-theoretic strategies in tandem with AdaBoost in order to improve its performance. The results show that the performance of AdaBoost is more sensitive to the parameter settings of the weak learner when risk allele frequencies are low, which can be explained by the data fragmentation phenomenon. Also, depending on the model of interaction between the risk SNPs, different criteria might excel in the second stage.