Abstract
Chinese is a complex language, creating a demand for text simplification in various contexts. Traditional text simplification approaches have primarily relied on pre-trained language models. With the advancement of large language models (LLMs), researchers have begun leveraging their strong text comprehension and generation capabilities for text simplification tasks. However, these implementations are often simplistic, merely instructing the LLM to simplify text without addressing inherent challenges. One notable issue is hallucination, in which LLMs introduce inaccuracies, especially regarding fine-grained knowledge. This paper proposes an approach to interacting with LLMs that embeds the text's underlying knowledge through carefully designed prompts. These prompts provide clearer guidance to the LLMs, thereby improving the quality of text simplification. In addition, LLMs can be further improved on specific tasks through instruction fine-tuning. Thus, we constructed a text simplification training dataset enriched with semantic knowledge from the Chinese Parataxis Graph, and fine-tuned existing large language models specifically for the text simplification task. Experimental results demonstrate that fine-tuning LLMs can significantly enhance the quality of text simplification. Finally, we provide the proposed methods as a service via a web interface and a browser extension.