2015 IEEE 18th International Conference on Computational Science and Engineering (CSE)

Abstract

The unsupervised analysis of data sets that are large both in dimension and in number of objects is one of the most challenging tasks in the data-intensive sciences. In astronomy especially, dedicated survey telescopes generate an enormous amount of complex data. For example, the database of the Sloan Digital Sky Survey (SDSS DR10) contains 3 million spectra with ca. 5,000 values each. Analyzing those spectra is computationally demanding when applying standard techniques and standard similarity measures. In addition to the big-data aspects, one has to deal with the uncertainties of the measurements. We present a generic and noise-tolerant similarity measure which is based on box-counting methods and comparable to calculating fractal dimensions. Besides the theoretical aspects of the proposed method, this paper discusses the implementation details and the evaluation results. Our implementation exploits current affordable computing architectures with large memory resources. The Fractal Similarity Measure enables scientists to perform clustering, classification, and outlier detection in today's databases. Even though this is a generic method, the experiments shown in this paper demonstrate its performance for clustering only.
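
For readers unfamiliar with box counting, the following is a minimal sketch of the standard box-counting dimension estimate that the abstract compares the measure to. It is not the Fractal Similarity Measure itself, whose definition is given in the paper. The function name, the choice of box sizes, and the test data are illustrative assumptions.

```python
import numpy as np


def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting dimension of a point set.

    Hypothetical helper for illustration only: `points` is an (n, d) array,
    `box_sizes` is a sequence of grid edge lengths.
    """
    points = np.asarray(points, dtype=float)
    box_sizes = np.asarray(box_sizes, dtype=float)
    counts = []
    for size in box_sizes:
        # Map every point to the integer index of the grid cell containing it.
        cells = np.floor(points / size).astype(np.int64)
        # The number of distinct occupied cells is N(size).
        counts.append(len(np.unique(cells, axis=0)))
    # The dimension estimate is the slope of log N(size) against log(1 / size).
    slope, _intercept = np.polyfit(np.log(1.0 / box_sizes), np.log(counts), 1)
    return slope, counts


# Assumed test data: points on the diagonal of the unit square (a 1-D object).
x = np.linspace(0.0, 1.0, 10000, endpoint=False)
line = np.column_stack([x, x])
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64]
dimension, occupied = box_counting_dimension(line, sizes)
```

For the diagonal line the estimate should come out close to 1, while a densely filled square would give a value close to 2. A similarity measure built on the same idea would compare how two objects occupy grid cells across scales; how the paper does this is not specified in the abstract.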
