Abstract
It is increasingly appreciated that the tumor stroma plays an integral role in cancer initiation, growth, and progression. It has recently been shown that the stromal elements of tumors carry prognostic as well as response-predictive information. This work proposes a multi-scale image analysis and machine learning pipeline for identifying epithelial versus stromal tissue in images of H&E-stained breast cancer specimens. Unlike many studies that perform pixel- or block-based epithelium-stroma classification, this pipeline includes an explicit image segmentation module. We first partition the H&E-stained images into coherent regions (superpixels), then extract a number of regional color and texture features from these regions, and finally use support vector machine classifiers to assign each region to the epithelium or stroma class. For segmentation, we propose a multi-scale hierarchical fuzzy c-means (HFCM) approach. We also investigate multi-scale feature extraction and descriptors. Our experimental results on the Stanford Tissue Microarray Database show that multi-scale regional feature descriptors outperform single-scale feature descriptors. The results also show that, when the same set of regional features is used, classification of HFCM-based partitions outperforms classification of both regular blocks and SLIC superpixels.
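As a rough illustration of the kind of clustering that underlies the segmentation module, the following is a minimal sketch of standard, single-scale fuzzy c-means applied to a handful of toy color vectors. It is not the multi-scale hierarchical HFCM procedure proposed in this work, whose details are not given in the abstract. The function name `fuzzy_c_means`, its parameters, and the sample color values are assumptions made purely for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Standard (flat, single-scale) fuzzy c-means; hypothetical helper.

    points: list of equal-length feature vectors (e.g. RGB triples).
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(points))]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(len(points))) / total
                for d in range(dim)
            ])

        # Membership update: inverse-distance weights, exponent 2 / (m - 1).
        for i, p in enumerate(points):
            dist = [
                max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
            total = sum(inv)
            u[i] = [v / total for v in inv]

    return centers, u


# Toy data: two made-up groups of RGB values in [0, 1] (illustrative only).
pixels = [
    (0.90, 0.60, 0.70), (0.88, 0.62, 0.72), (0.92, 0.58, 0.69),
    (0.40, 0.20, 0.60), (0.42, 0.22, 0.58), (0.38, 0.18, 0.62),
]
centers, u = fuzzy_c_means(pixels, n_clusters=2)

# Hard labels: pick the cluster with the highest membership for each pixel.
labels = [max(range(2), key=lambda k: row[k]) for row in u]
print(labels)
```

In a full pipeline of the kind described above, soft memberships like these would be turned into spatially coherent regions, regional color and texture features would be computed per region, and a classifier such as a support vector machine would label each region as epithelium or stroma.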