Abstract
Agglomerative clustering is widely used in data mining, image processing, bioinformatics and pattern recognition, and it has attracted great interest from both the academic and industrial communities. However, existing studies overlook the factor that decides the efficiency of agglomerative clustering on large complex networks, and they usually adopt criterion functions that lead to inefficiency. In this paper, we propose three effective criterion functions for improving the performance of the agglomerative clustering algorithm. We observe that clustering efficiency is determined by two factors: a) the number of neighbors of the two clusters merged in each merge step; b) the number of neighbors shared by these two clusters. Based on these observations, we propose a framework for designing criterion functions that efficiently find clusters in very large networks. Within this framework, we devise three criterion functions that effectively control the number of neighbors of clusters and efficiently produce high-quality clusters. We have implemented our method and compared it with existing approaches on real networks; our method significantly outperforms state-of-the-art approaches on large networks.