Abstract
Recent advancements in Natural Language Processing (NLP) have spurred significant interest in analyzing social media text data for identifying linguistic features indicative of mental health issues. However, the domain of Expressive Narrative Stories (ENS)—deeply personal and emotionally charged narratives that offer rich psychological insights—remains underex-plored. This study bridges this gap by utilizing a dataset sourced from Reddit, focusing on ENS from individuals with and without self-declared depression. Our research evaluates the utility of advanced language models, BERT and MentalBERT, against traditional models such as SVM, Naive Bayes, and Logistic Regression. We find that traditional models are notably sensitive to the absence of explicit topic-related words, which could risk their potential to extend applications to emotional expressive narratives that lack clear mental health terminology. Despite MentalBERT’s design to better handle psychiatric contexts, it demonstrated a dependency on specific topic words for classification accuracy, raising concerns about its application in scenarios where explicit mental health terms are sparse (P-value < 0.05). In contrast, BERT(128) exhibited minimal sensitivity to the absence of topic words in ENS, suggesting its superior capability to understand deeper linguistic features, making it more effective for real-world applications that require nuanced text analysis. Both BERT and MentalBERT excel at recognizing linguistic nuances and maintaining classification accuracy even when narrative order is disrupted—a crucial capability in mental health narratives. This resilience is statistically significant, with sentence shuffling showing substantial impacts on model performance (P-value <0.05), especially evident in ENS comparisons between individuals with and without mental health declarations. These findings underscore the importance of exploring ENS for deeper insights into mental health-related narratives, advocating for a nuanced approach to mental health text analysis that moves beyond mere keyword detection. This study paves the way for more sophisticated, context-aware analyses in NLP applications, aiming to enhance the understanding of linguistic patterns associated with mental health conditions.