Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)

Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models

Authors
Nafia Islam Naina¹, Khondoker Sabit Uz Zaman¹, Mahedi Hasan Emon¹, Md. Sadekur Rahman¹,*
¹Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1216, Bangladesh
*Corresponding author. Email: sadekur.cse@daffodilvarsity.edu.bd
Corresponding Author
Md. Sadekur Rahman
Available Online 8 June 2026.
DOI
10.2991/978-94-6239-664-7_37
Keywords
Bangla Document Classification; Natural Language Processing; Deep Learning Models; Bangla BERT; Bi-LSTM; Hybrid Ensemble Model
Abstract

With the rapid growth of digital content in Bangla, the demand for automated document classification systems has been increasing. The complex morphology of the language, the scarcity of labeled documents, and the variety of writing styles make Bangla document classification difficult. This research addresses these problems with a new hybrid model that combines Bi-LSTM and Bangla BERT so that the two complement each other: Bi-LSTM captures sequential dependencies, and Bangla BERT supplies contextual embeddings. To build a robust Bangla document classification system, this work uses the Potrika dataset, which consists of hundreds of thousands of multi-category Bangla news articles. The data went through an extensive preprocessing pipeline that includes data cleaning, text normalization, tokenization, stop word removal, stemming, and label encoding, among other steps. Of the entire dataset, 80% was set aside for training and 20% for testing. Both models were trained individually, and a soft voting ensemble was then applied to their outputs to obtain the final predictions. Experimental results show that the hybrid ensemble model performs better than the individual models, reaching an accuracy of 97.16%. Moreover, the proposed method outperforms all previously published Bangla document classification techniques in terms of accuracy and robustness. The results demonstrate the usefulness of ensemble methods in enhancing classification performance and open further research directions for Bangla NLP.
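
The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two models' outputs are weighted, so equal weights are assumed here; the function name `soft_vote`, the `weight_a` parameter, and the probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def soft_vote(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    weight_a: float = 0.5,  # assumption: equal weighting unless stated otherwise
) -> Tuple[List[List[float]], List[int]]:
    """Combine two models' class-probability rows by weighted averaging.

    Returns the averaged probabilities and, for each document, the index
    of the class with the highest averaged probability.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of documents")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    weight_b = 1.0 - weight_a
    averaged: List[List[float]] = []
    labels: List[int] = []
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same set of classes")
        # Weighted mean of the two probability vectors for this document.
        row = [weight_a * pa + weight_b * pb for pa, pb in zip(row_a, row_b)]
        averaged.append(row)
        # Predicted class = argmax of the averaged probabilities.
        labels.append(max(range(len(row)), key=row.__getitem__))
    return averaged, labels


# Hypothetical softmax outputs for two documents over three classes.
bert_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
lstm_probs = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

averaged, predicted = soft_vote(bert_probs, lstm_probs)
print(predicted)
```

In this made-up example the first model alone would assign the first document to class 0, but averaging shifts the decision to class 1. Soft voting uses each model's confidence in this way, rather than only its top prediction.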

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
8 June 2026
ISBN
978-94-6239-664-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Nafia Islam Naina
AU  - Khondoker Sabit Uz Zaman
AU  - Mahedi Hasan Emon
AU  - Md. Sadekur Rahman
PY  - 2026
DA  - 2026/06/08
TI  - Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 533
EP  - 548
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_37
DO  - 10.2991/978-94-6239-664-7_37
ID  - Naina2026
ER  -