Abstract
Data preprocessing is an important step in machine learning, data mining, and pattern recognition. In particular, selecting relevant features in high-dimensional data is often necessary to efficiently construct models that accurately describe the data. For example, many lazy learning algorithms (such as k-Nearest Neighbor) rely on feature-based distance metrics to compare input patterns for classification or for retrieval from a database. In previous work, we introduced Slider, a distance metric learning method that optimizes the weights of features in a protein model-building application (where features describe patterns of electron density around protein macromolecules). In this work, we demonstrate the usefulness of Slider as a general method for classification, ranking, and retrieval, with results on several benchmark datasets. We also compare it to other well-known feature selection and feature weighting methods.
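To illustrate why feature weights matter to a lazy learner, the following is a minimal sketch of k-Nearest Neighbor classification with a feature-weighted Euclidean distance. It is not Slider's weight-learning procedure. The weights are set by hand, and the names `weighted_distance` and `knn_classify` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def weighted_distance(x, y, w):
    """Feature-weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2).

    Illustrative only; the weights w are assumed to be given (e.g. by some
    weight-learning method), not learned here.
    """
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def knn_classify(query, patterns, labels, weights, k=1):
    """Classify `query` by majority vote among its k nearest stored patterns."""
    # Rank stored patterns by weighted distance to the query.
    ranked = sorted(
        range(len(patterns)),
        key=lambda i: weighted_distance(query, patterns[i], weights),
    )
    # Majority vote over the labels of the k closest patterns.
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): feature 0 separates the classes,
# feature 1 is noise.
patterns = [(0.0, 5.0), (0.1, -5.0), (1.0, 0.0), (0.9, 0.1)]
labels = ["a", "a", "b", "b"]
query = (0.05, 0.2)

print(knn_classify(query, patterns, labels, (1.0, 1.0)))  # unweighted: "b"
print(knn_classify(query, patterns, labels, (1.0, 0.0)))  # noise feature ignored: "a"
```

With equal weights, the noisy second feature dominates the distance and the query is matched to class "b"; setting that feature's weight to zero lets the informative feature decide, giving class "a". Choosing such weights automatically is the problem a metric learning method like Slider addresses.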