2015 IEEE Fifth International Conference on Big Data and Cloud Computing (BDCloud)

Abstract

Finding the k-Nearest Neighbors (kNN) of a query object in a given dataset S is a primitive operation in many application domains. kNN search is very costly, especially as many applications see rapid growth in both the volume and the dimensionality of the data to be processed. Locality sensitive hashing (LSH) has become a very popular method for this problem. However, most LSH-based methods cannot achieve good search quality, search efficiency, and space cost at the same time. RankReduce, for example, gains search efficiency at the expense of search quality. Motivated by these limitations, we propose a novel LSH-based inverted index scheme and design an efficient search algorithm, called H-c2kNN, which enables fast high-dimensional kNN search with excellent quality and low space cost. For efficiency and scalability, we implemented the proposed approach on MapReduce, a well-known framework for data-intensive applications, and conducted extensive experiments on both synthetic and real datasets to evaluate it. The results show that the proposed approach outperforms baseline methods in high-dimensional space.
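
The abstract does not describe the index layout or search procedure of H-c2kNN, so the following is only a generic, single-machine sketch of how an LSH-based inverted index can answer approximate kNN queries. It is not the authors' algorithm and does not use MapReduce. It assumes Euclidean distance and Gaussian (p-stable) hash functions, and the names `make_hash`, `build_index`, `knn_query`, and the parameters `num_tables`, `hashes_per_table`, and `width` are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """Return one p-stable LSH function h(v) = floor((a.v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian projection
    b = rng.uniform(0.0, width)                    # random offset in [0, width]

    def h(v):
        return math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)

    return h


def build_index(points, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
    """Build num_tables inverted indexes mapping a hash key to point ids."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        funcs = [make_hash(dim, width, rng) for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            # The concatenated hash values form the inverted-index key.
            buckets[tuple(f(p) for f in funcs)].append(idx)
        tables.append((funcs, buckets))
    return tables


def knn_query(tables, points, q, k):
    """Collect points colliding with q in any table, then rank by distance."""
    candidates = set()
    for funcs, buckets in tables:
        candidates.update(buckets.get(tuple(f(q) for f in funcs), ()))
    ranked = sorted(candidates, key=lambda i: math.dist(points[i], q))
    return ranked[:k]  # may hold fewer than k ids if few candidates collide


# Small synthetic example: 200 random points in 16 dimensions.
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
tables = build_index(points)
print(knn_query(tables, points, points[5], k=3))
```

In a sketch like this, more tables tend to raise recall at the cost of space, while more hash functions per table shrink the candidate set at the cost of recall. That is the same tension among quality, efficiency, and space that the abstract describes.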