Abstract
Abstract: The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have uncovered a number of serious hurdles connected with the ground-truthing of tables. This problem may, in fact, be much more difficult than it appears. We present a detailed analysis of why table ground-truthing is so hard, including the notions that there may exist more than one acceptable "truth" and/or incomplete or partial "truths."