Speakers Identification Using Diarization Techniques

Vinod K. Pande; Vijay K. Kale

doi:10.2991/978-94-6463-136-4_80

Authors

Vinod K. Pande¹,*, Vijay K. Kale¹

¹Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Aurangabad, Maharashtra, India

*Corresponding author. Email: vinodkpande2014@gmail.com

Available Online 1 May 2023.

DOI: 10.2991/978-94-6463-136-4_80
Keywords: Speaker Diarization; End-to-End Neural Diarization (EEND); Mel Frequency Cepstrum Coefficients (MFCC); Generative Adversarial Networks (GANs); Hidden Markov Model (HMM)
Abstract: This work analyses methodologies for speaker voice identification and voice separation and presents an overview of the findings. Several speech recognition methods, such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ), Hidden Markov Models (HMM), Long Short-Term Memory (LSTM), End-to-End Neural Diarization (EEND), Generative Adversarial Networks (GANs), Convolutional Neural Networks, and audio embeddings, can be used for adaptive processing and for identifying multiple speakers in audio data. We also address the uses of speaker diarization, the potential for future development, and the databases used to evaluate diarization systems.

The speaker diarization method consists of seven steps: input, front-end processing, speech activity detection, segmentation, speaker embedding, clustering with post-processing, and output.
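
The seven steps can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the thresholds, and the use of the zero-crossing rate as a one-dimensional stand-in for a speaker embedding are all hypothetical choices made for illustration.

```python
def frames(samples, size):
    """Split samples into non-overlapping frames of `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def energy(frame):
    """Mean squared amplitude of a frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_cross_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (needs len > 1)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def diarize(samples, rate, frame_size=400, vad_threshold=0.01,
            cluster_dist=0.1, max_gap=2):
    # Step 1 (input): `samples` is a list of mono float samples.
    # Step 2 (front-end processing): peak-normalise the signal.
    peak = max((abs(x) for x in samples), default=0.0) or 1.0
    fr = frames([x / peak for x in samples], frame_size)

    # Step 3 (speech activity detection): energy threshold per frame.
    active = [energy(f) > vad_threshold for f in fr]

    # Step 4 (segmentation): group consecutive active frames.
    segments, start = [], None
    for i, is_speech in enumerate(active + [False]):  # sentinel flushes the last run
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None

    # Step 5 (speaker embedding): ASSUMPTION, mean zero-crossing rate as a toy
    # stand-in for MFCC-based or neural embeddings.
    embeddings = [sum(zero_cross_rate(f) for f in fr[s:e]) / (e - s)
                  for s, e in segments]

    # Step 6a (clustering): greedy assignment to the nearest existing centroid.
    centroids, labels = [], []
    for emb in embeddings:
        dists = [abs(emb - c) for c in centroids]
        if dists and min(dists) <= cluster_dist:
            labels.append(dists.index(min(dists)))
        else:
            centroids.append(emb)
            labels.append(len(centroids) - 1)

    # Step 6b (post-processing): merge same-speaker segments split by a short gap.
    merged = []
    for (s, e), lab in zip(segments, labels):
        if merged and merged[-1][2] == lab and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e, lab)
        else:
            merged.append((s, e, lab))

    # Step 7 (output): (start_seconds, end_seconds, speaker_label) tuples.
    return [(s * frame_size / rate, e * frame_size / rate, f"speaker_{lab}")
            for s, e, lab in merged]
```

Calling `diarize(samples, rate)` on a mono recording returns a list of "who spoke when" tuples. A practical system would replace the toy embedding and the greedy clustering with the techniques surveyed in the paper, such as MFCC features or neural embeddings.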

Speaker identification, a kind of speech recognition, recognizes the speakers in an audio conversation. Speaker diarization is the procedure of identifying who talks when in a multi-speaker audio recording. Such recordings include conferences, broadcast news, and other public gatherings with many speakers.
Copyright: © 2023 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022)
Series: Advances in Computer Science Research
Publication Date: 1 May 2023
ISBN: 978-94-6463-136-4
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Vinod K. Pande
AU  - Vijay K. Kale
PY  - 2023
DA  - 2023/05/01
TI  - Speakers Identification Using Diarization Techniques
BT  - Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022)
PB  - Atlantis Press
SP  - 905
EP  - 915
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-136-4_80
DO  - 10.2991/978-94-6463-136-4_80
ID  - Pande2023
ER  -
