Abstract
Non-small cell lung cancer (NSCLC) is the most common form of lung cancer. It is a complex disease that is typically diagnosed at an advanced stage, mainly using image data from PET/SPECT scans. The main focus of the present work is to develop a computer-aided classification model, relying exclusively on clinical data, that classifies Solitary Pulmonary Nodules (SPNs) related to NSCLC as benign or malignant. For this purpose, a dataset was created using biometric and clinical data from 243 patients (54% malignant cases, 70% male, mean age 67 years), together with the physician's diagnostic assessment (the doctor's yield). Four well-documented Machine Learning (ML) classification algorithms were employed to build prediction models for this scenario. Stratified ten-fold cross-validation and common metrics were used to assess each model's performance. Furthermore, the prediction process of the best-performing model was analyzed in order to provide explainability for its results. The significance of this study is twofold. First, it demonstrates the efficacy of ML-assisted prediction for characterizing SPNs. Second, it adds a layer of explainability to a black-box ML prediction model. Consequently, this approach can enhance trust and confidence in the model's results and enable users to better understand the decision-making process. The AdaBoost algorithm provided the most accurate prediction model, with an accuracy of 94.33% and a True Positive Rate (TPR) of 95.71%. This work therefore demonstrates the potential of an ML approach to improve the diagnosis of NSCLC while also providing explainable classification.
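The evaluation protocol summarized above (stratified ten-fold splitting, then accuracy and TPR) can be illustrated with a minimal sketch. It is not the study's implementation: the function names `stratified_folds` and `accuracy_and_tpr` are hypothetical, the labels are synthetic placeholders that only mirror the reported cohort size and approximate class balance (131 of 243 malignant, about 54%), and no classifier is trained here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def accuracy_and_tpr(y_true, y_pred, positive=1):
    """Return (accuracy, true positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, tpr


# Synthetic placeholder labels (1 = malignant, 0 = benign); NOT the study's data.
labels = [1] * 131 + [0] * 112
folds = stratified_folds(labels, k=10)
```

In a full pipeline, each fold would serve once as the test set while a classifier is fitted on the remaining nine, and `accuracy_and_tpr` would be applied to the pooled or per-fold predictions.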