Abstract
A topic digital library is a domain-specific digital library organized around topic or concept features, and it is an important application of knowledge services. This paper introduces a new approach to building topic navigation in a topic digital library using topic extraction and clustering. First, the documents belonging to a specific domain are selected automatically from a large-scale collection by a document classification approach that integrates rule-based and statistical methods. Then, the keywords of each document are extracted using a machine learning approach, and these keywords are used to cluster the resulting document subset. The clustering result serves as the taxonomy of the subset. Finally, the taxonomy is manually adjusted into a hierarchical structure for user navigation. The topic digital library is constructed by combining full-text retrieval with this hierarchical navigation function. The construction method is of practical significance for building topic databases and special resource databases.
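The keyword extraction and clustering steps can be illustrated with a minimal sketch. It is not the method used in this paper: TF-IDF scoring stands in for the machine learning keyword extractor, and single-link grouping by Jaccard overlap stands in for the clustering algorithm. The function names, the `top_k` and `threshold` parameters, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def extract_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms of each tokenized document.

    Assumption: TF-IDF is a stand-in for the paper's machine learning extractor.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(set(ranked[:top_k]))
    return keywords


def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(keywords, threshold=0.2):
    """Group documents whose keyword sets overlap by at least `threshold`.

    Assumption: single-link grouping via union-find, chosen for brevity.
    """
    parent = list(range(len(keywords)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            if jaccard(keywords[i], keywords[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(keywords)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical, already tokenized documents from a domain subset.
docs = [
    "rice yield rice irrigation".split(),
    "rice irrigation water rice".split(),
    "cattle feed cattle disease".split(),
    "cattle disease vaccine cattle".split(),
]
keywords = extract_keywords(docs)
clusters = cluster_by_keywords(keywords)
print(clusters)
```

Each resulting cluster is a list of document indices. In the workflow described above, such flat clusters would only be a starting point that is then adjusted manually into a navigation hierarchy.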