Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models

Nafia Islam Naina; Khondoker Sabit Uz Zaman; Mahedi Hasan Emon; Md. Sadekur Rahman

doi:10.2991/978-94-6239-664-7_37

Authors

Nafia Islam Naina¹, Khondoker Sabit Uz Zaman¹, Mahedi Hasan Emon¹, Md. Sadekur Rahman¹,*

¹Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1216, Bangladesh

*Corresponding author. Email: sadekur.cse@daffodilvarsity.edu.bd

Corresponding Author

Md. Sadekur Rahman

Available Online 8 June 2026.

DOI: 10.2991/978-94-6239-664-7_37
Keywords: Bangla Document Classification; Natural Language Processing; Deep Learning Models; Bangla BERT; Bi-LSTM; Hybrid Ensemble Model
Abstract: With the rapid growth of digital content in Bangla, demand for automated document classification systems has been increasing. The language's complex morphology, the scarcity of labeled documents, and the variety of writing styles make Bangla document classification difficult. This research addresses these problems with a new hybrid model in which Bi-LSTM and Bangla BERT complement each other, the former capturing sequential dependencies and the latter providing contextual embeddings. To build a robust Bangla document classification system, this work uses the Potrika dataset, which consists of hundreds of thousands of multi-category Bangla news articles. The data went through an extensive preprocessing pipeline that includes data cleaning, text normalization, tokenization, stop word removal, stemming, and label encoding, among other steps. Of the entire dataset, 80% was set aside for training and 20% for testing. Both models were trained individually, and a soft voting ensemble was then applied to their outputs to obtain the final predictions. Experimental results show that the hybrid ensemble outperforms the individual models, reaching an accuracy of 97.16%. Moreover, the proposed method outperforms all previously published Bangla document classification techniques in terms of accuracy and robustness. The results demonstrate the usefulness of ensemble methods for improving classification performance and open further research directions for Bangla NLP.
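The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each model emits a per-class probability vector for every document and that the ensemble takes a (by default equal-weight) average before picking the most probable class. The function name `soft_vote`, the equal weighting, and the probability values are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Average class probabilities across models and return the argmax label per document.

    model_probs[m][d][c] is the probability model m assigns to class c for document d.
    Equal weights are an assumption; the abstract does not state the weighting used.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = float(sum(weights))

    predictions = []
    for d in range(len(model_probs[0])):
        n_classes = len(model_probs[0][d])
        # Weighted mean of the probability each model gives to class c.
        avg = [
            sum(w * probs[d][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical outputs for three documents over three classes (made-up numbers).
bert_probs = [[0.70, 0.20, 0.10], [0.40, 0.35, 0.25], [0.10, 0.30, 0.60]]
lstm_probs = [[0.60, 0.30, 0.10], [0.20, 0.70, 0.10], [0.20, 0.45, 0.35]]

labels = soft_vote([bert_probs, lstm_probs])
print(labels)  # [0, 1, 2]
```

In this toy input the two models disagree on the second and third documents, and the averaged probabilities settle each case in favor of the more confident model, which is the behavior that distinguishes soft voting from a simple majority vote over hard labels.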
Copyright: © 2026 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 8 June 2026
ISBN: 978-94-6239-664-7
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Nafia Islam Naina
AU  - Khondoker Sabit Uz Zaman
AU  - Mahedi Hasan Emon
AU  - Md. Sadekur Rahman
PY  - 2026
DA  - 2026/06/08
TI  - Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 533
EP  - 548
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_37
DO  - 10.2991/978-94-6239-664-7_37
ID  - Naina2026
ER  -
