Explainable Bangla Linguistic Style Classification into Saint and Common Forms
October 28, 2025
Gazi Maliha Raisa Noor
Afia Fahmida
Raihan Tanvir
Faisal Muhammad Shah
Abstract
This study explores a deep neural network and transformer-based framework for
classifying Bengali texts into saint (Sadhu) and common (Cholito) forms. We
evaluated six architectures: BiLSTM with attention, GRU-CNN, BanglaBERT,
BanglaBERT-Enhanced CNN, XLM-RoBERTa Large, and SahajBERT, all trained and
evaluated on the BanglaBlend dataset. Among these, transformer-based models, particularly
SahajBERT and XLM-RoBERTa Large, consistently achieved high performance across
the evaluation metrics. SahajBERT achieved the best overall results, scoring
0.95 ± 0.01 on these metrics and outperforming BiLSTM, GRU-CNN, and
BanglaBERT by a significant margin in both predictive accuracy and robustness. To
enhance interpretability, we incorporated LIME, a widely used explainable AI
(XAI) technique that provides token-level attribution for individual predictions.
We further examined the robustness of these explanations across random seeds,
assessed lexical overlap between splits to ensure fair evaluation, and benchmarked
inference efficiency for the transformer models. Together, these analyses enable
transparent validation that the stylistic cues the models rely on align with
linguistic expectations. Our findings
demonstrate the strength of transformer-based models in capturing stylistic and
lexical distinctions in Bangla, setting a benchmark for future research in
literary style detection, text normalization, and digital language preservation.
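The abstract does not state how lexical overlap between splits or the stability of explanations across seeds was measured. As a minimal sketch of one plausible approach, assuming whitespace tokenization and Jaccard similarity (neither is confirmed by the paper), both checks might look like the following. The function names and the toy Sadhu/Cholito sentences are hypothetical and only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def vocabulary(texts):
    """Vocabulary under whitespace tokenization (an assumed, simple tokenizer)."""
    return {token for text in texts for token in text.split()}


def split_overlap(train_texts, test_texts):
    """Return (share of test vocabulary seen in train, Jaccard of the two vocabularies)."""
    train_vocab = vocabulary(train_texts)
    test_vocab = vocabulary(test_texts)
    coverage = len(train_vocab & test_vocab) / len(test_vocab) if test_vocab else 0.0
    return coverage, jaccard(train_vocab, test_vocab)


def explanation_stability(top_tokens_per_seed):
    """Mean pairwise Jaccard of the top-attributed tokens from each seed's explanation."""
    pairs = list(combinations(top_tokens_per_seed, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data: Sadhu forms (তাহারা, যাইতেছে) versus Cholito forms (তারা, যাচ্ছে).
    train = ["তাহারা যাইতেছে", "তারা যাচ্ছে"]
    test = ["তাহারা যাচ্ছে", "আমি যাইব"]
    print(split_overlap(train, test))

    # Hypothetical top-3 tokens from LIME explanations of one sentence under three seeds.
    seeds = [["তাহারা", "যাইতেছে", "এবং"], ["তাহারা", "যাইতেছে", "না"], ["তাহারা", "যাইতেছে", "এবং"]]
    print(explanation_stability(seeds))
```

A high share of test vocabulary already seen in training would suggest that reported scores partly reflect memorized word forms, while a stability value near 1 would indicate that the same tokens are highlighted regardless of the random seed.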
Publication
Proceedings of BIM 2025
Natural Language Processing
Deep Learning
Transformer Models
Explainable AI
Bengali Language Processing

Authors
Senior Lecturer
I am Raihan Tanvir, currently serving as a Senior Lecturer in the Department of Computer Science and Engineering at Ahsanullah University of Science and Technology (AUST) in Dhaka, Bangladesh. My research spans Computer Vision, Natural Language Processing (NLP), Large Language Models (LLMs), Vision-Language Models (VLMs), and multimodal deep learning.