Explainable Bangla Linguistic Style Classification into Saint and Common Forms: A Deep Neural Network and Transformer-Based Approach with LIME-Based Interpretability
Abstract
This study presents a deep neural network and transformer-based framework for classifying Bengali texts into the saint (Sadhu) and common (Cholito) forms. We evaluated six architectures: BiLSTM with attention, GRU-CNN, BanglaBERT, BanglaBERT-Enhanced CNN, XLM-RoBERTa Large, and SahajBERT, all trained and evaluated on the BanglaBlend dataset. The transformer-based models, particularly SahajBERT and XLM-RoBERTa Large, consistently achieved high performance across all evaluation metrics. SahajBERT delivered the best overall results, scoring 0.95 ± 0.01 across the reported metrics and outperforming BiLSTM, GRU-CNN, and BanglaBERT by a clear margin in both predictive accuracy and robustness. To enhance interpretability, we incorporated LIME, a widely used explainable AI (XAI) technique that provides token-level attributions for individual predictions, enabling transparent validation of stylistic cues against linguistic expectations. We further examined the stability of these explanations across random seeds, assessed lexical overlap between dataset splits to ensure fair evaluation, and benchmarked inference efficiency for the transformer models. Our findings demonstrate the strength of transformer-based models in capturing stylistic and lexical distinctions in Bangla, setting a benchmark for future research in literary style detection, text normalization, and digital language preservation.
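The token-level attribution idea behind LIME can be illustrated with a minimal, self-contained sketch. The classifier below is a toy stand-in (a marker-word lexicon and the function names are hypothetical, not from the paper); the attribution routine follows the LIME recipe of perturbing the input by masking tokens and measuring each token's average effect on the model's prediction:

```python
import random

# Hypothetical Sadhu-form marker words (illustrative only, transliterated)
SADHU_MARKERS = {"tahar", "khaiya", "koriyachhe"}

def predict_sadhu(tokens):
    """Toy classifier: probability the text is Sadhu, as marker fraction."""
    if not tokens:
        return 0.5
    return sum(t in SADHU_MARKERS for t in tokens) / len(tokens)

def lime_style_attribution(tokens, predict, n_samples=500, seed=0):
    """LIME-style local attribution: randomly mask tokens and score each
    token by the mean prediction when present minus when absent."""
    rng = random.Random(seed)
    observations = {t: [] for t in tokens}
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in tokens]
        kept = [t for t, m in zip(tokens, mask) if m]
        p = predict(kept)
        for t, m in zip(tokens, mask):
            observations[t].append((p, m))
    scores = {}
    for t, obs in observations.items():
        present = [p for p, m in obs if m]
        absent = [p for p, m in obs if not m]
        mean_present = sum(present) / len(present) if present else 0.0
        mean_absent = sum(absent) / len(absent) if absent else 0.0
        scores[t] = mean_present - mean_absent
    return scores

tokens = ["tahar", "kotha", "khaiya", "gelo"]
scores = lime_style_attribution(tokens, predict_sadhu)
```

Under this setup, Sadhu marker tokens receive positive attribution (their presence raises the predicted Sadhu probability) while neutral tokens score near zero or negative, which is the kind of per-token evidence the paper inspects when validating stylistic cues. The actual study uses the `lime` library's text explainer against the trained transformer models.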
Type
Publication
2025 3rd International Conference on Big Data, IoT and Machine Learning