Abstract
Gene expression microarray data are high-dimensional and contain a high level of noise. Most of these data involve multiple heterogeneous dynamic patterns that depend on the disease under study. In addition, errors may be introduced along the data collection path when multiple sites and methods are used. In this paper, a combined data mining method, i.e., a neural network with K-means clustering via principal component analysis (PCA), is proposed to address these data complexity issues in gene expression profile mining. The proposed method was tested on gene expression profiles in lung adenocarcinoma, collected from multiple cancer research centers, for survival prediction and risk assessment. The results from the proposed method are analyzed, and further studies for future improvement of the method are recommended.
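
As an illustration only, the following sketch shows one plausible way the three components named above could be chained: PCA reduces the gene dimension, K-means groups the samples in the reduced space, and a small neural network is trained on the PCA scores together with the cluster membership. The abstract does not specify how the components are actually combined, so this arrangement, the synthetic data, the binary risk labels, and all function names (`pca_scores`, `kmeans`, `train_mlp`, `forward`) are assumptions made for the example, not the method evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (NOT real data):
# 40 samples x 50 genes, where the second group is shifted on 10 genes.
n_per_group, n_genes = 20, 50
X = rng.normal(0.0, 1.0, size=(2 * n_per_group, n_genes))
X[n_per_group:, :10] += 3.0
y = np.repeat([0.0, 1.0], n_per_group)  # hypothetical low/high risk labels


def pca_scores(X, n_components):
    """Project centered data onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_components].T
    return Z / Z.std(axis=0)  # rescale so each component has unit spread


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's algorithm, initialised from k distinct samples."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def forward(F, W1, b1, W2, b2):
    """One tanh hidden layer followed by a sigmoid output unit."""
    H = np.tanh(F @ W1 + b1)
    return H, 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))


def train_mlp(F, y, hidden=4, lr=0.5, epochs=1000):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    W1 = rng.normal(0.0, 0.5, size=(F.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H, p = forward(F, W1, b1, W2, b2)
        g = (p - y) / len(y)                    # gradient w.r.t. output logits
        gH = np.outer(g, W2) * (1.0 - H ** 2)   # back-propagate through tanh
        W2 = W2 - lr * (H.T @ g)
        b2 = b2 - lr * g.sum()
        W1 = W1 - lr * (F.T @ gH)
        b1 = b1 - lr * gH.sum(axis=0)
    return W1, b1, W2, b2


scores = pca_scores(X, n_components=2)               # step 1: PCA
clusters = kmeans(scores, k=2)                       # step 2: K-means
features = np.hstack([scores, np.eye(2)[clusters]])  # scores + one-hot cluster
params = train_mlp(features, y)                      # step 3: neural network
_, probs = forward(features, *params)
accuracy = float(((probs > 0.5) == (y == 1.0)).mean())
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here is measured on the same well-separated synthetic samples used for training, so it says nothing about performance on real multi-site microarray data; a realistic evaluation would need held-out samples and survival-aware metrics.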