Abstract
App stores serve as platforms where users discover and download applications (apps), and the reviews posted there are a valuable resource for developers monitoring app performance. These reviews encompass diverse information, including user experiences, bug reports, feature requests, and overall app ratings. Analyzing app reviews has proved useful in many areas of software engineering (e.g., requirements engineering and testing). While deep learning methods show promise for app review classification, their black-box nature limits transparency, hindering developers from fully trusting and acting on their outputs. To address this, we introduce a novel approach called Interpretable App Review Classification with Transformers (IARCT) for enhancing transparency in app review classification. IARCT quantifies the influence of individual words in a review on the model's prediction using Shapley additive values, generating contextual justifications that succinctly explain the reasoning behind each prediction. We apply the technique to both BERT and GPT-2 models, demonstrating its effectiveness in classifying publicly available app reviews.
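To illustrate the word-attribution idea behind the approach, the following is a minimal sketch of exact Shapley values computed over the words of a review. The `score` function here is a hypothetical stand-in for a transformer's class score (the paper's actual models are BERT and GPT-2); the Shapley computation itself is the standard definition, averaging each word's marginal contribution over all subsets of the other words.

```python
from itertools import combinations
from math import factorial

def score(words):
    # Toy stand-in for a model's class score (hypothetical, not the
    # paper's classifier): counts positive cue words minus negative ones.
    positive = {"great", "love"}
    negative = {"crash", "bug"}
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def shapley_values(words, score_fn):
    """Exact Shapley value of each word's contribution to score_fn.

    Each word's value is the weighted average of its marginal
    contribution score_fn(S + [w]) - score_fn(S) over all subsets S
    of the remaining words. Exponential cost: fine for short reviews.
    """
    n = len(words)
    values = {}
    for i, w in enumerate(words):
        others = words[:i] + words[i + 1:]
        phi = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                phi += weight * (score_fn(list(subset) + [w])
                                 - score_fn(list(subset)))
        values[w] = phi
    return values

review = ["great", "app", "but", "crash"]
print(shapley_values(review, score))
```

Because the toy score is additive, each cue word receives exactly its own contribution ("great" +1, "crash" -1, the rest 0); with a real transformer score, the same machinery surfaces context-dependent word importances, which is what produces the contextual justifications described above.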