Abstract
The repetition-free longest common subsequence (RFLCS) problem, which asks for a longest common subsequence of two sequences in which no symbol appears more than once, is an important model in bioinformatics for sequence comparison and the analysis of conserved genes. Conserved genes are highly similar across species and play a crucial role in understanding species evolution and gene function. However, existing algorithms are limited both in runtime efficiency and in the number of conserved genes they identify, which hinders the practical analysis of these essential genetic elements. To address these challenges, this paper proposes a novel hybrid heuristic algorithm that partitions gene sequences based on known conserved genes. By formulating a linear programming model and solving it with an exact linear programming solver, the algorithm quickly identifies the longest subsequence that satisfies the model's constraints. Experiments on simulated and real genomic data verify the algorithm's superior performance both in solving the RFLCS problem and in identifying conserved genes.