Abstract
K-mer mapping, an internal step of many de novo genome fragment assembly methods, is computationally challenging because of its high main-memory consumption. In this paper we study indexing methods that address this problem in the context of plant genome assembly. We propose an ad hoc I/O cost model to analyze the performance of B+-tree and hashing index structures, and we use these indexes to detect duplicate k-mers and reduce execution time. Experiments with an actual RDBMS implementation on a sugarcane data set show that considerable performance gains can be obtained while reducing RAM requirements.