Explainable Bangla Linguistic Style Classification into Saint and Common Forms

October 28, 2025
Gazi Maliha Raisa Noor, Afia Fahmida, Raihan Tanvir, Faisal Muhammad Shah
Abstract
This study explores a deep neural network and transformer-based framework for classifying Bengali texts into saint (Sadhu) and common (Cholito) forms. We evaluated six architectures on the BanglaBlend dataset: BiLSTM with attention, GRU-CNN, BanglaBERT, BanglaBERT-Enhanced CNN, XLM-RoBERTa Large, and SahajBERT. Among these, the transformer-based models, particularly SahajBERT and XLM-RoBERTa Large, consistently achieved high performance across the evaluation metrics. SahajBERT achieved the best overall results, with performance metrics of 0.95 ± 0.01, outperforming BiLSTM, GRU-CNN, and BanglaBERT by a significant margin in predictive accuracy and robustness. To enhance interpretability, we incorporated LIME, a widely used explainable AI (XAI) technique that provides token-level attribution for individual predictions, which enables transparent validation of stylistic cues against linguistic expectations. We further examined the robustness of these explanations across random seeds, assessed lexical overlap between splits to ensure fair evaluation, and benchmarked inference efficiency for the transformer models. Our findings demonstrate the strength of transformer-based models in capturing stylistic and lexical distinctions in Bangla, setting a benchmark for future research in literary style detection, text normalization, and digital language preservation.
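The abstract does not say how lexical overlap between splits was measured. As a rough illustration only, the sketch below assumes whitespace tokenization and reports two simple quantities: the Jaccard similarity of the train and test vocabularies, and the fraction of the test vocabulary that also appears in training. The function names and the romanized toy sentences are hypothetical and are not taken from the paper or from BanglaBlend.

```python
def vocabulary(texts):
    """Collect the set of whitespace-separated tokens across all texts."""
    return {token for text in texts for token in text.split()}


def lexical_overlap(train_texts, test_texts):
    """Return (Jaccard similarity of vocabularies,
    fraction of the test vocabulary also seen in training)."""
    train_vocab = vocabulary(train_texts)
    test_vocab = vocabulary(test_texts)
    shared = train_vocab & test_vocab
    union = train_vocab | test_vocab
    jaccard = len(shared) / len(union) if union else 0.0
    coverage = len(shared) / len(test_vocab) if test_vocab else 0.0
    return jaccard, coverage


# Romanized toy sentences for illustration only (assumed, not real data):
# one Sadhu-style and one Cholito-style sentence per split.
train = ["tini bari jaitechhen", "se bari jachchhe"]
test = ["tini boi poritechhen", "se boi porchhe"]

jaccard, coverage = lexical_overlap(train, test)
print(f"Jaccard: {jaccard:.2f}, test vocabulary seen in training: {coverage:.2f}")
```

High values would suggest that a classifier could score well by memorizing surface vocabulary shared between splits, which is the kind of leakage an overlap check is meant to surface.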
Publication
Proceedings of BIM 2025
Authors
Raihan Tanvir
Senior Lecturer
I am Raihan Tanvir, currently serving as a Senior Lecturer in the Department of Computer Science and Engineering at Ahsanullah University of Science and Technology (AUST) in Dhaka, Bangladesh. My research spans Computer Vision, Natural Language Processing (NLP), Large Language Models (LLMs), Vision-Language Models (VLMs), and multimodal deep learning.