Low-Latency Sentiment and Emotion Mining from Streaming Voice Transcriptions

Mopuri Rishitha; K. Madhumita; M. Chandraleka; M. Rahul Raj

doi:10.2991/978-94-6239-713-2_44


Authors

Mopuri Rishitha¹*, K. Madhumita¹, M. Chandraleka¹, M. Rahul Raj¹

¹Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India

*Corresponding author. Email: rishimopuri@gmail.com

Corresponding Author

Mopuri Rishitha

Available Online 25 June 2026.

DOI: 10.2991/978-94-6239-713-2_44
Keywords: ASR; Multimodal Fusion; BERT; Latency-Accuracy Tradeoff; Word Error Rate (WER); Prosodic Feature Extraction
Abstract: Real-time speech emotion analysis is crucial in domains such as call center analytics and human-computer interaction. Many existing emotion recognition systems achieve high accuracy, but they often operate offline and overlook the impact of transcription errors and processing delays in real-time systems. This work presents a low-latency, multimodal framework that combines Whisper-based Automatic Speech Recognition (ASR), a fine-tuned BERT-based emotion classifier, and real-time prosodic feature extraction to detect emotion from streaming voice calls. The system processes audio in chunks for real-time prediction and is evaluated on Word Error Rate (WER), emotion accuracy, and overall pipeline latency. A key contribution of this study is an analysis of the correlation between ASR accuracy and emotion prediction confidence, along with the role of specific acoustic features, namely pitch (YIN algorithm), energy, and zero-crossing rate (ZCR), in improving classification robustness against transcription noise. The proposed multimodal framework achieved an emotion classification accuracy of 32.00% and a weighted F1-score of 0.2280 on the CREMA-D dataset. The optimized pipeline using Whisper Tiny and DistilBERT reported an average end-to-end latency of 9.78 ms, substantially lower than the 200 ms the authors cite as the standard human conversation perception time.
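The abstract names WER, energy, and zero-crossing rate without giving formulas. The sketch below shows the textbook definitions of these three quantities in plain Python. It is not the authors' code: the function names, the lowercasing and whitespace tokenization, and the assumption that samples lie in [-1, 1] are all illustrative choices, and pitch estimation with YIN is left out for brevity.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace; real pipelines often
    # normalize punctuation and numerals as well.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def rms_energy(chunk) -> float:
    """Root-mean-square amplitude of one audio chunk."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def zero_crossing_rate(chunk) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(chunk) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(chunk) - 1)


if __name__ == "__main__":
    # Hypothetical transcript pair and a toy four-sample chunk.
    print(word_error_rate("i am very happy today", "i am happy to day"))
    print(rms_energy([0.5, -0.5, 0.5, -0.5]))
    print(zero_crossing_rate([0.5, -0.5, 0.5, -0.5]))
```

In a chunked pipeline of the kind described, the two acoustic functions would be called once per audio chunk, while WER would be computed offline against reference transcripts.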
Copyright: © 2026 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title: Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)
Series: Atlantis Highlights in Intelligent Systems
Publication Date: 25 June 2026
ISBN: 978-94-6239-713-2
ISSN: 2589-4919

Cite this article


TY  - CONF
AU  - Mopuri Rishitha
AU  - K. Madhumita
AU  - M. Chandraleka
AU  - M. Rahul Raj
PY  - 2026
DA  - 2026/06/25
TI  - Low-Latency Sentiment and Emotion Mining from Streaming Voice Transcriptions
BT  - Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)
PB  - Atlantis Press
SP  - 591
EP  - 606
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6239-713-2_44
DO  - 10.2991/978-94-6239-713-2_44
ID  - Rishitha2026
ER  -
