AI-Powered Texts Extracted from Images To Speech Synthesis

Amal Boumedjout; Fatima Kabli; Nawel Bendimrad; Souhila Kebdani; Bouchra Saidi

doi:10.2991/978-94-6463-805-9_25

Authors

Amal Boumedjout¹,²,*, Fatima Kabli¹,³, Nawel Bendimrad¹, Souhila Kebdani¹, Bouchra Saidi¹

¹Polytechnic National School of Oran (ENPO-MA), Oran, 31000, Algeria

²Signal Image Speech Laboratory (SIMPA), USTO-MB, Oran, Algeria

³Mechanical Manufacturing Technology Research Laboratory (LaRTFM), Oran, Algeria

*Corresponding author. Email: amal.boumedjout@enp-oran.dz

Available Online 5 August 2025.

DOI: 10.2991/978-94-6463-805-9_25
Keywords: Image processing; text extracting; voice generating; neural network; OCR tools
Abstract: Image processing has expanded widely with the spread of AI tools. AI-based image processing has been applied in many fields, including medicine, education, intelligent surveillance and document processing. In this context, we developed a deep learning system designed to analyse any type of image, extract the text it contains and convert that text into an audio message. The pipeline first scans and cleans the image, then uses neural networks for text recognition, and finally generates the voice. For experimentation, we trained the system on well-known datasets such as ICDAR2013, Total-Text, Street View Text and LJSpeech, and we also performed text recognition with pre-trained models such as PaddleOCR and EasyOCR. Text-to-speech conversion was performed with Google TTS as well as the Tacotron and HiFi-GAN architectures. The results allowed us to optimize text acquisition and the quality of its conversion to audio while making text-to-speech conversion more flexible and responsive.
Copyright: © 2025 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
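
The three stages named in the abstract (image cleaning, text recognition, voice generation) can be sketched as a small pipeline with pluggable stages. This is only an illustrative sketch, not the authors' implementation: the class `ImageToSpeech`, the stage signatures and the helper names are hypothetical. The two optional helpers assume the third-party `easyocr` and `gTTS` packages are installed and that English text is being read; the abstract does not state how the authors configured these tools.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures (assumptions, not taken from the paper).
Preprocess = Callable[[str], str]       # image path -> cleaned image path
Recognize = Callable[[str], List[str]]  # image path -> recognized text lines
Synthesize = Callable[[str, str], str]  # (text, output path) -> audio path


@dataclass
class ImageToSpeech:
    """Chains the three stages: clean the image, read the text, speak it."""
    preprocess: Preprocess
    recognize: Recognize
    synthesize: Synthesize

    def run(self, image_path: str, audio_path: str) -> str:
        cleaned = self.preprocess(image_path)
        lines = self.recognize(cleaned)
        # Drop empty fragments and join the rest into one utterance.
        text = " ".join(line.strip() for line in lines if line.strip())
        if not text:
            raise ValueError("no text recognized in image")
        return self.synthesize(text, audio_path)


def easyocr_recognize(image_path: str) -> List[str]:
    """Optional recognizer backed by a pre-trained EasyOCR model (English assumed)."""
    import easyocr  # imported lazily so the sketch loads without the package
    reader = easyocr.Reader(["en"])
    # detail=0 returns only the recognized strings, without boxes or scores.
    return reader.readtext(image_path, detail=0)


def gtts_synthesize(text: str, audio_path: str) -> str:
    """Optional synthesizer backed by Google TTS through the gTTS package."""
    from gtts import gTTS  # imported lazily; needs network access at call time
    gTTS(text=text, lang="en").save(audio_path)
    return audio_path
```

With both packages installed, `ImageToSpeech(lambda p: p, easyocr_recognize, gtts_synthesize).run("sign.png", "sign.mp3")` would skip cleaning and write the spoken text to `sign.mp3`; the file names here are placeholders. Keeping each stage as a plain callable makes it straightforward to swap in another recognizer or synthesizer, such as the PaddleOCR or Tacotron and HiFi-GAN options the abstract mentions.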

Volume Title: Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 5 August 2025
ISBN: 978-94-6463-805-9
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Amal Boumedjout
AU  - Fatima Kabli
AU  - Nawel Bendimrad
AU  - Souhila Kebdani
AU  - Bouchra Saidi
PY  - 2025
DA  - 2025/08/05
TI  - AI-Powered Texts Extracted from Images To Speech Synthesis
BT  - Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)
PB  - Atlantis Press
SP  - 220
EP  - 227
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-805-9_25
DO  - 10.2991/978-94-6463-805-9_25
ID  - Boumedjout2025
ER  -