Abstract
Cancer classification from gene expression data is challenging because the data are complex and high-dimensional. Current classification methods typically rely on samples collected from a single tissue type and require gene feature selection as a preprocessing step to avoid processing the full set of genes. As a result, they fail to take advantage of genome-wide Next Generation Sequencing technologies, which provide a snapshot of the whole transcriptome rather than a predetermined subset of genes. We propose a deep learning framework for cancer diagnosis: a multi-tissue cancer classifier based on whole-transcriptome gene expression data collected from multiple tumor types. We introduce a new Convolutional Neural Network architecture designed specifically for the complex nature of whole-transcriptome gene expression data. It is capable of detecting genetic alterations driving cancer progression by learning genomic signatures across multiple tissue types, without requiring prior gene feature selection. Our model achieves 98.9% classification accuracy on human samples representing 33 tumor types across 26 organ sites.