Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)

Abstract

Data preprocessing is important in machine learning, data mining, and pattern recognition. In particular, selecting relevant features in high-dimensional data is often necessary to efficiently construct models that accurately describe the data. For example, many lazy learning algorithms (like k-Nearest Neighbor) rely on feature-based distance metrics to compare input patterns for the purpose of classification or retrieval from a database. In previous work, we introduced Slider, a distance metric learning method that optimizes the weights of features in a protein model-building application (where features are used to describe patterns of electron density around protein macromolecules). In this work, we demonstrate the usefulness of Slider as a general method for classification, ranking and retrieval, with results on several benchmark datasets. We also compare it to other well-known feature selection or weighting methods.