Abstract
With the rise of intelligent medical assistance, the Dialogue System for Medical Diagnosis (DSMD) guided by reinforcement learning (RL) has gained much attention. However, currently available medical dialogue datasets contain only sparse symptoms and therefore provide insufficient diagnostic evidence, making it difficult to reproduce the evidence-based process that doctors follow in differential diagnosis and disease confirmation. Moreover, purely data-driven RL often relies on extensive, blind trial and error, so the system spends its limited dialogue turns inquiring about symptoms irrelevant to the patient’s chief complaint, which further exacerbates the shortage of diagnostic evidence. To increase the quantity and effectiveness of the symptoms collected in DSMD, we first construct CMD, a more comprehensive medical dialogue dataset based on electronic medical records. The diversity of diseases and symptoms mentioned in the dialogue context of CMD surpasses that of existing public datasets. Furthermore, to improve the efficiency of diagnostic evidence collection, and inspired by the logic of symptom inquiry in doctor-patient interactions, we combine experiential diagnostic knowledge with a specialized medical knowledge graph to constrain RL-driven symptom inquiry, avoiding inquiries about symptoms unrelated to the patient. Experimental results demonstrate that our model significantly outperforms competitive benchmark methods in terms of diagnostic accuracy and the efficiency of symptom inquiries. Our code and the CMD dataset are available at https://github.com/YanPioneer/EBAD.
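
As a rough illustration of what constraining symptom inquiry with a knowledge graph could look like, the sketch below restricts an RL agent's candidate actions to symptoms linked to those the patient has already reported. It is only a minimal sketch under stated assumptions, not the method released in the repository: the toy graph `KG`, the helper names `candidate_symptoms` and `select_inquiry`, and the dictionary of Q-values are all hypothetical.

```python
# Hypothetical toy knowledge graph: each symptom maps to related symptoms.
# A real system would derive these links from a medical knowledge graph.
KG = {
    "cough": {"fever", "sore throat", "chest pain"},
    "fever": {"cough", "headache"},
    "rash": {"itching"},
}


def candidate_symptoms(known, asked):
    """Return symptoms linked to any known symptom that are neither known nor already asked."""
    related = set()
    for symptom in known:
        related |= KG.get(symptom, set())
    return related - set(known) - set(asked)


def select_inquiry(q_values, known, asked):
    """Pick the highest-valued symptom among graph-related candidates only.

    q_values is an assumed dict mapping symptom name to the agent's value
    estimate. Symptoms outside the candidate set are never selected, however
    high their estimate. Returns None when no candidate remains.
    """
    allowed = candidate_symptoms(known, asked)
    if not allowed:
        return None
    # Sort first so the choice is deterministic for equal values.
    return max(sorted(allowed), key=lambda s: q_values.get(s, float("-inf")))


if __name__ == "__main__":
    q = {"fever": 0.2, "sore throat": 0.3, "chest pain": 0.4,
         "headache": 0.5, "rash": 0.9, "itching": 0.8}
    # "rash" has the highest value but is unrelated to "cough", so it is skipped.
    print(select_inquiry(q, known=["cough"], asked=[]))
```

In this sketch the constraint acts as a hard mask on the action set; a softer variant could instead reweight value estimates by graph relatedness, and which of the two better matches the approach described here is left open.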