Secure Deepfake Audio Detection with a Soft-Voting Ensemble of PGD-Hardened Heterogeneous Models

Aisha Tasnim Aishy; Abdur Rahman Wahid; Rafshia Mahbuba Ayshe; M. Shahriar Mahmud Rafi; Mohammed Maruf Hossen; Fairuz Nowshin Tohfa

doi:10.2991/978-94-6239-664-7_64

Authors

Aisha Tasnim Aishy¹, Abdur Rahman Wahid¹, Rafshia Mahbuba Ayshe¹, M. Shahriar Mahmud Rafi¹,*, Mohammed Maruf Hossen¹, Fairuz Nowshin Tohfa¹

¹East Delta University, Chittagong, Bangladesh

*Corresponding author. Email: shahriarrafi30@gmail.com

Available Online 8 June 2026.

DOI: 10.2991/978-94-6239-664-7_64
Keywords: Deepfake Audio Detection; Ensemble Learning; Adversarial Robustness; Projected Gradient Descent (PGD); Soft Voting; Mel-Spectrograms; Audio Classification
Abstract: This study presents a deepfake audio detector that combines several deep learning models into a single ensemble. The ensemble integrates two ResNet models and one CNN model and merges their predictions by soft voting (averaging their predicted class probabilities) to improve overall accuracy and stability. To defend against adversarial attacks, that is, small input perturbations crafted to fool the detector, we apply adversarial training with Projected Gradient Descent (PGD). This training helps the models learn features that are robust to such perturbations, making the system considerably harder to evade. In our experiments, the ensemble achieved an accuracy of 89.00% and an F1-score of 90.04%, a 3.83% improvement over the strongest individual model. Under PGD attack, the attack success rate against the system was only 0.16%. By combining diverse model architectures with proactive adversarial defenses, this work offers a practical and trustworthy approach to deepfake audio detection, contributing to greater security and authenticity in digital communications.
Copyright: © 2026 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
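
The abstract combines three standard techniques: soft voting, PGD adversarial training, and evaluation by PGD attack success rate. As a minimal, non-authoritative sketch (not the authors' code), the Python below shows all three on a toy problem. Everything here is an assumption for illustration: each ensemble member is a hypothetical logistic-regression `LogisticModel` standing in for a ResNet or CNN (so the toy members are not truly heterogeneous), inputs are short feature vectors standing in for mel-spectrograms, input gradients are derived in closed form rather than by automatic differentiation, the values of `eps`, `alpha`, `steps`, `epochs`, and `lr` are illustrative rather than the paper's, and `attack_success_rate` uses one common definition that the abstract does not confirm.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v: float) -> int:
    # sign(0) = 0, as in the usual PGD update rule.
    return (v > 0) - (v < 0)


class LogisticModel:
    """Hypothetical stand-in for one ensemble member (a ResNet or CNN in the paper).

    Maps a feature vector x to P(fake | x) = sigmoid(w . x + b).
    """

    def __init__(self, w: Sequence[float], b: float):
        self.w, self.b = list(w), b

    def prob_fake(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)


def soft_vote(models: Sequence[LogisticModel], x: Sequence[float]) -> float:
    # Soft voting: average the members' predicted probabilities, not their hard labels.
    return sum(m.prob_fake(x) for m in models) / len(models)


def predict(models: Sequence[LogisticModel], x: Sequence[float]) -> int:
    # 1 = fake, 0 = real. With two classes, the argmax of the averaged
    # probabilities is the same as thresholding P(fake) at 0.5.
    return int(soft_vote(models, x) >= 0.5)


def input_grad(models: Sequence[LogisticModel], x: Sequence[float], y: int) -> Vector:
    # Gradient w.r.t. x of the binary cross-entropy of the ensemble probability:
    # dL/dx = dL/dp * (1/K) * sum_k p_k (1 - p_k) w_k.
    p = min(max(soft_vote(models, x), 1e-12), 1 - 1e-12)
    dloss_dp = -y / p + (1 - y) / (1 - p)
    grad = [0.0] * len(x)
    for m in models:
        pk = m.prob_fake(x)
        scale = dloss_dp * pk * (1 - pk) / len(models)
        for i, wi in enumerate(m.w):
            grad[i] += scale * wi
    return grad


def pgd_attack(models, x, y, eps=0.3, alpha=0.05, steps=20) -> Vector:
    # L-infinity PGD: take signed-gradient ascent steps on the loss, then project
    # back into the eps-ball around the clean input after every step.
    x_adv = list(x)
    for _ in range(steps):
        g = input_grad(models, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


def attack_success_rate(models, data, **pgd_kwargs) -> float:
    # Assumed definition: the share of clean inputs the ensemble gets right
    # whose prediction PGD manages to flip.
    correct = [(x, y) for x, y in data if predict(models, x) == y]
    if not correct:
        return 0.0
    flipped = sum(predict(models, pgd_attack(models, x, y, **pgd_kwargs)) != y
                  for x, y in correct)
    return flipped / len(correct)


def adversarial_train(model, data, epochs=50, lr=0.1, **pgd_kwargs) -> None:
    # Min-max sketch: craft a PGD example against the current model, then take
    # an SGD step on the loss at that worst-case input instead of the clean one.
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack([model], x, y, **pgd_kwargs)
            err = model.prob_fake(x_adv) - y  # dBCE/dlogit for a logistic model
            model.w = [wi - lr * err * xi for wi, xi in zip(model.w, x_adv)]
            model.b -= lr * err


if __name__ == "__main__":
    # Toy 2-D "features" (hypothetical); the paper's inputs are mel-spectrograms.
    data = [([1.0, 0.8], 1), ([0.9, 1.2], 1), ([-1.0, -0.7], 0), ([-0.8, -1.1], 0)]
    ensemble = [LogisticModel([0.5, 0.5], 0.0), LogisticModel([1.0, 0.2], 0.0),
                LogisticModel([0.2, 1.0], 0.0)]
    for m in ensemble:
        adversarial_train(m, data, eps=0.3, alpha=0.05, steps=10)
    acc = sum(predict(ensemble, x) == y for x, y in data) / len(data)
    print("clean accuracy:", acc)
    print("PGD attack success rate:", attack_success_rate(ensemble, data, eps=0.3))
```

Averaging probabilities rather than counting hard votes lets one confident member outweigh two uncertain ones. In a real pipeline the members would be neural networks, and `input_grad` would be replaced by automatic differentiation, for example in PyTorch.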

Volume Title: Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 8 June 2026
ISBN: 978-94-6239-664-7
ISSN: 1951-6851
DOI: 10.2991/978-94-6239-664-7_64

Cite this article

TY  - CONF
AU  - Aisha Tasnim Aishy
AU  - Abdur Rahman Wahid
AU  - Rafshia Mahbuba Ayshe
AU  - M. Shahriar Mahmud Rafi
AU  - Mohammed Maruf Hossen
AU  - Fairuz Nowshin Tohfa
PY  - 2026
DA  - 2026/06/08
TI  - Secure Deepfake Audio Detection with a Soft-Voting Ensemble of PGD-Hardened Heterogeneous Models
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 932
EP  - 946
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_64
DO  - 10.2991/978-94-6239-664-7_64
ID  - Aishy2026
ER  -