Abstract
We propose new algorithms, collectively termed codelet parsing, for lossy compression. The algorithms sequentially parse a given source sequence into phrases, called sourcelets, and map each sourcelet to a distorted phrase, called a codelet, such that the per-letter distortion between the two phrases does not exceed the desired distortion. The algorithms adaptively maintain a codebook (a set of codewords) and do not require any a priori knowledge of the source statistics. The algorithms use approximate string matching and, as a key new idea, at each epoch carefully select one of the many approximately matching codewords to balance the code rate in the current epoch against the code rate from the resulting codebooks in future epochs. The algorithms run in time quadratic in the length of the source sequence and output a distorted sequence that can be naturally losslessly compressed using the Lempel-Ziv (LZ78) algorithm.
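
As a rough illustration of the parsing loop described above, the following minimal Python sketch parses a string into (sourcelet, codelet) pairs under a per-letter Hamming distortion constraint. It is not the codelet parsing algorithm itself: the selection rule used here (take the longest admissible codeword, breaking ties by lower distortion) and the LZ78-style codebook update are simplifying assumptions, and the names `hamming`, `greedy_parse`, `alphabet`, and `max_distortion` are hypothetical. It also assumes every source letter appears in `alphabet`, so that a single-letter exact match always exists.

```python
def hamming(a, b):
    """Number of positions where the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_parse(source, alphabet, max_distortion):
    """Parse `source` into a list of (sourcelet, codelet) pairs.

    Assumed simplification (not the selection rule of the paper): at each
    epoch, pick the longest codeword whose per-letter Hamming distortion
    from the upcoming source phrase is at most `max_distortion`.
    """
    # Start with all single letters so an exact match is always available.
    codebook = set(alphabet)
    pairs = []
    i = 0
    while i < len(source):
        # Collect every codeword that approximately matches at position i.
        candidates = []
        for word in codebook:
            phrase = source[i:i + len(word)]
            if len(phrase) < len(word):
                continue  # codeword would run past the end of the source
            d = hamming(phrase, word)
            if d <= max_distortion * len(word):
                candidates.append((-len(word), d, word))
        # Longest first, then lowest distortion, then lexicographic order.
        _, _, codelet = min(candidates)
        sourcelet = source[i:i + len(codelet)]
        pairs.append((sourcelet, codelet))
        i += len(codelet)
        # LZ78-style update (an assumption): extend the chosen codeword
        # by the next source letter, if there is one.
        if i < len(source):
            codebook.add(codelet + source[i])
    return pairs


if __name__ == "__main__":
    for sourcelet, codelet in greedy_parse("abababbbab", "ab", 0.5):
        print(sourcelet, "->", codelet)
```

With `max_distortion=0` the sketch reduces to exact matching, so the concatenated codelets reproduce the source. The interesting design question, which this sketch deliberately sidesteps, is which of the admissible codewords to choose at each epoch.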