Abstract
Topic identification is a common problem in many NLP applications. Traditional topic identification methods generate a single-layer topic structure, which often divides topics inaccurately, even when it is produced manually by human experts. This paper first proposes the concept of a hierarchical topic, which represents a text or text set as a multi-layer topic tree. Second, it proposes an iterative text-unit clustering method that automatically recognizes the hierarchical topics of a text set. In this method, clustering pauses whenever each topic in the text set has been correctly divided into multiple sub-topics, and then resumes, until a complete hierarchical topic tree has been built. The main difficulty is determining the multiple pause thresholds automatically; this paper resolves it with a minimized clustering entropy method. Experimental results demonstrate the effectiveness of the method.
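
To make the idea of threshold-driven hierarchical clustering concrete, the following is a minimal sketch and not the method of this paper. It assumes a precomputed pairwise similarity matrix over text units, uses single-link clustering, and selects each pause threshold by minimizing a stand-in score. The abstract does not define clustering entropy, so the function `clustering_entropy` below (a cross-entropy between the partition and similarities rescaled within the current topic) is purely an illustrative assumption, as are the names `components` and `build_topic_tree`.

```python
import math
from itertools import combinations


def components(ids, sim, threshold):
    """Single-link clusters: connected components of the graph sim >= threshold."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(ids, 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def clustering_entropy(clusters, sim, lo, hi, eps=1e-6):
    """ASSUMED stand-in score, not the paper's definition.

    Similarities are rescaled to [0, 1] within the current topic; pairs placed
    in the same cluster are penalized for low similarity, and pairs placed in
    different clusters are penalized for high similarity.
    """
    label = {i: k for k, cluster in enumerate(clusters) for i in cluster}
    total = 0.0
    for i, j in combinations(sorted(label), 2):
        p = (sim[i][j] - lo) / (hi - lo)
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p) if label[i] == label[j] else math.log(1 - p)
    return total


def build_topic_tree(ids, sim):
    """Recursively split a set of text units into a topic tree."""
    ids = sorted(ids)
    node = {"units": ids, "threshold": None, "children": []}
    values = sorted({sim[i][j] for i, j in combinations(ids, 2)})
    if len(values) < 2:
        return node  # all pairs equally similar: nothing left to separate
    lo, hi = values[0], values[-1]
    # Threshold lo keeps one cluster (no split); math.inf yields all singletons.
    candidates = [(t, components(ids, sim, t)) for t in values + [math.inf]]
    threshold, best = min(
        candidates, key=lambda c: clustering_entropy(c[1], sim, lo, hi)
    )
    if len(best) > 1:  # "pause" here, then continue inside each sub-topic
        node["threshold"] = threshold
        node["children"] = [build_topic_tree(c, sim) for c in best]
    return node
```

Each recursive call chooses its own threshold from the similarities inside the current topic, which is one simple way to obtain a different pause threshold per level. How similarities between text units are computed is left open here.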