Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models
- DOI
- 10.2991/978-94-6239-664-7_37
- Keywords
- Bangla Document Classification; Natural Language Processing; Deep Learning Models; Bangla BERT; Bi-LSTM; Hybrid Ensemble Model
- Abstract
As the volume of digital content in Bangla grows rapidly, so does the demand for automated document classification systems. The language's complex morphology, the scarcity of labeled documents, and the variety of writing styles make Bangla document classification difficult. This research addresses these problems with a new hybrid model that combines Bi-LSTM and Bangla BERT so that the two complement each other: Bi-LSTM captures sequential dependencies, and Bangla BERT provides contextual embeddings. To build a robust Bangla document classification system, this work uses the Potrika dataset, which consists of hundreds of thousands of multi-category Bangla news articles. The data went through an extensive preprocessing pipeline that includes data cleaning, text normalization, tokenization, stop word removal, stemming, and label encoding, among other steps. Of the entire dataset, 80% was set aside for training and 20% for testing. The two models were trained individually, and a soft voting ensemble was then applied to their outputs to obtain the final predictions. Experimental results show that the hybrid ensemble outperforms the individual models, reaching an accuracy of 97.16%. Moreover, the proposed method outperforms all previously published Bangla document classification techniques in terms of accuracy and robustness. The results demonstrate the usefulness of ensemble methods in enhancing classification performance and open further research directions for Bangla NLP.
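The soft voting step described in the abstract can be illustrated with a minimal sketch. It assumes each model outputs a row of class probabilities per document and that the two models are weighted equally; the abstract does not state the weights, so `weight_a=0.5` is an assumption, and the function name `soft_vote` and the toy probabilities are hypothetical, not taken from the paper.

```python
import numpy as np

def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Average two models' class probabilities and pick the top class.

    prob_a, prob_b: arrays of shape (n_documents, n_classes).
    weight_a: weight given to the first model (assumed equal weighting).
    """
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("both models must score the same documents and classes")
    # Weighted average of the per-class probabilities.
    avg = weight_a * prob_a + (1.0 - weight_a) * prob_b
    # The ensemble prediction is the class with the highest averaged probability.
    return avg.argmax(axis=1), avg

# Toy probabilities for 3 documents and 3 classes (illustrative values only).
bert_probs = np.array([[0.7, 0.2, 0.1],
                       [0.4, 0.5, 0.1],
                       [0.2, 0.3, 0.5]])
lstm_probs = np.array([[0.6, 0.3, 0.1],
                       [0.7, 0.2, 0.1],
                       [0.1, 0.2, 0.7]])

labels, avg = soft_vote(bert_probs, lstm_probs)
print(labels)
```

In the second toy document the two models disagree, and the averaged probabilities favor the class that one model scores with higher confidence. This is the sense in which soft voting uses probabilities rather than hard labels.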
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Nafia Islam Naina
AU  - Khondoker Sabit Uz Zaman
AU  - Mahedi Hasan Emon
AU  - Md. Sadekur Rahman
PY  - 2026
DA  - 2026/06/08
TI  - Enhancing Bangla Document Classification Using a Hybrid Ensemble of Bangla-BERT and Bi-LSTM Models
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 533
EP  - 548
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_37
DO  - 10.2991/978-94-6239-664-7_37
ID  - Naina2026
ER  -