Abstract
In recent years, high-throughput sequencing techniques have generated millions of biological sequences in genome projects. This enormous volume of data must be stored and processed to support biological research. In this work, we describe a model to represent, organize, and structure the data generated by a computational pipeline that supports a transcriptome project. As a case study, we propose a conceptual model for a pipeline that manages the original (untreated) and produced (processed) data of a transcriptome project whose objective is to identify differentially expressed genes between liver and kidney RNA samples.