Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
Download PDF

Abstract

Sequential patterns are used to discover knowledge in a wide range of applications. However, in many scenar- ios pattern quality can be low, due to short lengths or low supports. Furthermore, for dense datasets such as proteins, most of the sequential pattern mining algorithms return a tremendously large number of patterns, which are difficult to process and analyze. However, by relaxing the defini- tion of frequency and allowing some mismatches, it is pos- sible to discover higher quality patterns. We call these pat- terns Frequent Approximate Substrings or FAS-patterns and we introduce an algorithm called FAS-Miner, to handle the mining task efficiently. The experiments on real-world pro- tein and DNA datasets show that FAS-Miner can discover patterns of much longer lengths and higher supports than standard sequential mining approaches.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles