Abstract
As the volume of information grows exponentially, narrowing the search scope efficiently and accurately has become a basic requirement for information querying. This paper proposes a semantic-distance-based clustering algorithm for XML documents. The algorithm proceeds in two steps. First, it groups all heterogeneous DTD documents into DTD clusters by using a global semantic dictionary. Second, it computes the semantic distance between the XML documents that correspond to a given DTD cluster, and then builds the final XML clusters according to a threshold value given beforehand. Users can locate the relevant document cluster and query within it, rather than searching across all XML documents, so that results satisfying their requirements are returned rapidly. Experiments show that the algorithm categorizes documents well and facilitates information querying.
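The second step described above (threshold-based grouping of XML documents by pairwise distance) can be illustrated with a minimal sketch. The abstract does not define the semantic distance or the clustering procedure in detail, so everything here is an assumption for illustration only: `jaccard_distance` over element names is a simple stand-in for the paper's semantic distance, and `threshold_cluster` is a hypothetical greedy single-pass scheme, not the authors' algorithm.

```python
import xml.etree.ElementTree as ET


def tag_set(xml_text):
    """Return the set of element names used in an XML document."""
    return {elem.tag for elem in ET.fromstring(xml_text).iter()}


def jaccard_distance(a, b):
    """Stand-in distance (assumption): 1 - |a & b| / |a | b|.

    This is NOT the paper's semantic distance, only a placeholder.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def threshold_cluster(docs, threshold):
    """Greedy single-pass clustering (hypothetical scheme).

    A document joins the first cluster whose representative (its first
    member) lies within `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    features = [tag_set(d) for d in docs]
    clusters = []
    for i, feat in enumerate(features):
        for cluster in clusters:
            if jaccard_distance(feat, features[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item/><price/></order>",
]
clusters = threshold_cluster(docs, threshold=0.5)
print(clusters)  # [[0, 1], [2]]
```

In this toy input the two `book` documents share three of four element names (distance 0.25) and fall into one cluster, while the `order` document shares none and forms its own. A query could then be restricted to the matching cluster instead of scanning every document.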