Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)

Jaipur, India, 23-24 March 2026

Low-Latency Sentiment and Emotion Mining from Streaming Voice Transcriptions

Authors
Mopuri Rishitha¹,*, K. Madhumita¹, M. Chandraleka¹, M. Rahul Raj¹
¹ Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
*Corresponding author. Email: rishimopuri@gmail.com
Corresponding Author
Mopuri Rishitha
Available Online 25 June 2026.
DOI
10.2991/978-94-6239-713-2_44
Keywords
ASR; Multimodal Fusion; BERT; Latency-Accuracy Tradeoff; Word Error Rate (WER); Prosodic Feature Extraction
Abstract

Real-time speech emotion analysis is crucial in domains such as call center analytics and human-computer interaction. Many existing emotion recognition systems achieve high accuracy, but they typically operate offline and overlook the effect of transcription errors and processing delays on real-time use. This work presents a low-latency, multimodal framework that combines Whisper-based Automatic Speech Recognition (ASR), a fine-tuned BERT-based emotion classifier, and real-time prosodic feature extraction to detect emotion from streaming voice calls. The system processes audio in chunks for real-time prediction and is evaluated using Word Error Rate (WER), emotion classification accuracy, and overall pipeline latency. A key contribution of this study is an analysis of the correlation between ASR accuracy and emotion prediction confidence, along with the role of three acoustic features, namely pitch (estimated with the YIN algorithm), energy, and zero-crossing rate (ZCR), in improving classification robustness against transcription noise. The proposed multimodal framework reported an emotion classification accuracy of 32.00% and a weighted F1-score of 0.2280 on the CREMA-D dataset. The optimized pipeline using Whisper Tiny and DistilBERT reported an average end-to-end latency of 9.78 ms, well below the 200 ms figure taken as the standard perception time in human conversation.
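
The evaluation metric and two of the acoustic features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the non-overlapping chunking scheme, and the chunk size are assumptions made for illustration, and pitch estimation with YIN is left out for brevity.

```python
from typing import List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Row-by-row Levenshtein distance computed over words, not characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def chunk_audio(samples: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Split a signal into non-overlapping chunks, dropping a short tail (assumed scheme)."""
    last_start = len(samples) - chunk_size
    return [samples[i:i + chunk_size] for i in range(0, last_start + 1, chunk_size)]


def short_time_energy(chunk: Sequence[float]) -> float:
    """Mean squared amplitude of one chunk."""
    if not chunk:
        raise ValueError("chunk must not be empty")
    return sum(x * x for x in chunk) / len(chunk)


def zero_crossing_rate(chunk: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(chunk) < 2:
        raise ValueError("chunk must contain at least two samples")
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(chunk) - 1)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
    for chunk in chunk_audio(signal, chunk_size=4):  # chunk_size is illustrative
        print(short_time_energy(chunk), zero_crossing_rate(chunk))
```

In a streaming setting, features like these would be computed per chunk alongside the ASR output, so that the classifier still receives an acoustic signal when the transcript contains errors.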

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)
Series
Atlantis Highlights in Intelligent Systems
Publication Date
25 June 2026
ISBN
978-94-6239-713-2
ISSN
2589-4919

Cite this article

TY  - CONF
AU  - Mopuri Rishitha
AU  - K. Madhumita
AU  - M. Chandraleka
AU  - M. Rahul Raj
PY  - 2026
DA  - 2026/06/25
TI  - Low-Latency Sentiment and Emotion Mining from Streaming Voice Transcriptions
BT  - Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)
PB  - Atlantis Press
SP  - 591
EP  - 606
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6239-713-2_44
DO  - 10.2991/978-94-6239-713-2_44
ID  - Rishitha2026
ER  -