Abstract
The advent of droplet-based transcriptomics platforms has enabled parallel screening of thousands to millions of cells. A key challenge is identifying rare cells in such ultra-large single-cell RNA sequencing (scRNA-seq) datasets. Existing algorithms for finding rare cells are either time-consuming or memory-intensive. We propose an efficient and accurate method, Discovery of Rare Cells (DoRC). The rareness scores generated by DoRC allow biologists to focus downstream analyses on a small fraction of the expression profiles in ultra-large scRNA-seq data. We demonstrate the efficacy of DoRC by delineating human blood dendritic cell subtypes using ~68k single-cell expression profiles of human blood cells. DoRC also recovers artificially planted rare cells and is sensitive to cell type identity.