Proceedings of IEEE Computer Society Bioinformatics Conference

Abstract

Given a set of n sequences, the multiple sequence alignment problem is to align these n sequences, inserting gaps where needed, so that what the sequences have in common is exposed. Let m be the total length of the input sequences, A the alphabet size, and P the final number of unique patterns (fixed by the user) that induce the alignment between sequences. Our algorithm then runs in O(m(A + P)) worst-case time, which is linear in m for fixed A and P. The algorithm works for both small and large alphabets. It builds the alignment by first discovering patterns, so it is also a pattern discovery solution. It performs a direct n-wise alignment and uses a constant amount of memory regardless of the value of m. We support our theoretical conclusions with experimental results from running the algorithm on GenPept sequences and on human genome sequences from the GenBank public-domain database. What differentiates this algorithm from most others is that it is deterministic: it is guaranteed, and theoretically proved, that the algorithm finds every pattern of arbitrary length that occurs in at least k sequences and is responsible for the multiple sequence alignment, where k is specified by the user.
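To make the "occurs in at least k sequences" criterion concrete, here is a minimal brute-force sketch of the pattern-discovery *problem*. It is not the authors' O(m(A + P)) algorithm. The sketch assumes a pattern is an exact contiguous substring with a user-chosen minimum length, and that its support is the number of distinct sequences containing it. The function name `common_patterns` and the `min_len` parameter are hypothetical and only serve as illustration.

```python
from collections import Counter


def common_patterns(seqs, k, min_len=2):
    """Return every substring of length >= min_len that occurs in at least
    k of the input sequences, mapped to its support (number of sequences).

    Hypothetical brute-force reference for the problem definition only: it
    enumerates all substrings, so it is quadratic in each sequence length,
    unlike the linear-time algorithm described in the abstract.
    """
    support = Counter()
    for s in seqs:
        # Collect each distinct substring once per sequence, so the count
        # measures how many sequences contain it, not how often it repeats.
        distinct = {
            s[i:j]
            for i in range(len(s))
            for j in range(i + min_len, len(s) + 1)
        }
        support.update(distinct)
    # Keep only patterns supported by at least k sequences.
    return {p: c for p, c in support.items() if c >= k}


seqs = ["ACGTAC", "TACGTT", "GGACGT"]
print(common_patterns(seqs, k=3, min_len=3))
```

With k = 3, only the patterns shared by all three toy sequences survive (`ACG`, `CGT`, `ACGT`). Lowering k to 2 also admits `TAC`, which appears in the first two sequences. This illustrates why k controls which patterns can anchor the alignment.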