Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)

AI-Powered Texts Extracted from Images To Speech Synthesis

Authors
Amal Boumedjout (1, 2, *), Fatima Kabli (1, 3), Nawel Bendimrad (1), Souhila Kebdani (1), Bouchra Saidi (1)
(1) Polytechnic National School of Oran (ENPO-MA), Oran, 31000, Algeria
(2) Signal Image Speech Laboratory (SIMPA), USTO-MB, Oran, Algeria
(3) Mechanical Manufacturing Technology Research Laboratory (LaRTFM), Oran, Algeria
*Corresponding author. Email: amal.boumedjout@enp-oran.dz
Available Online 5 August 2025.
DOI
10.2991/978-94-6463-805-9_25
Keywords
Image processing; text extracting; voice generating; neural network; OCR tools
Abstract

Image processing has expanded widely with the popularization of AI tools, and images processed with AI algorithms are now used in many fields, such as medicine, education, intelligent surveillance and document processing. In this context, we developed a deep-learning-based system that analyses any type of image, extracts its text and converts that text into an audio message. The system first scans and cleans the image, then applies neural networks for text recognition, and finally generates the voice. For experimentation, we trained our system on well-known datasets such as ICDAR2013, Total Text, Street View Text and LJSpeech, and we performed text recognition with pre-trained models such as PaddleOCR and EasyOCR. Text-to-speech conversion was performed using Google TTS as well as the more advanced Tacotron and HiFi-GAN architectures. The results of our study allowed us to optimize text acquisition and the quality of its audio conversion while making text-to-speech conversion more flexible and responsive.
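The three-stage pipeline described in the abstract (clean the image, recognize the text, synthesize speech) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: grayscale conversion plus Otsu binarization stands in for the unspecified "cleaning" step, EasyOCR is used for recognition, and the gTTS package (a client for Google Translate's text-to-speech service) is assumed to correspond to "Google TTS". All function names (`clean`, `recognize`, `speak`, `image_to_speech`) are hypothetical.

```python
# Hypothetical sketch of the image -> text -> speech pipeline from the abstract.
# Assumptions (not from the paper): Otsu binarization as the "cleaning" step,
# EasyOCR for recognition, gTTS for synthesis. OCR/TTS libraries are imported
# lazily, so the preprocessing helpers run without them installed.
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to H x W grayscale (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    luma = (rgb[..., :3].astype(np.float64) @ weights).round()
    return np.clip(luma, 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t maximising between-class variance (class 0: < t, class 1: >= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = hist[:t].sum()
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = (hist[:t] * levels[:t]).sum() / w0
        mu1 = (hist[t:] * levels[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def clean(rgb: np.ndarray) -> np.ndarray:
    """Grayscale + Otsu binarization: pixels >= threshold become 255, others 0."""
    gray = to_grayscale(rgb)
    t = otsu_threshold(gray)
    return np.where(gray >= t, 255, 0).astype(np.uint8)


def recognize(image: np.ndarray, langs=("en",)) -> str:
    """Run EasyOCR on the cleaned image and join the recognized strings."""
    import easyocr  # assumed installed (pip install easyocr)
    reader = easyocr.Reader(list(langs))
    return " ".join(reader.readtext(image, detail=0))


def speak(text: str, path: str = "speech.mp3", lang: str = "en") -> str:
    """Synthesize speech with gTTS (requires network access) and save it as MP3."""
    from gtts import gTTS  # assumed installed (pip install gTTS)
    gTTS(text=text, lang=lang).save(path)
    return path


def image_to_speech(rgb: np.ndarray, path: str = "speech.mp3") -> str:
    """Full hypothetical pipeline: clean -> recognize -> speak."""
    return speak(recognize(clean(rgb)), path)
```

For example, with Pillow installed, `image_to_speech(np.asarray(Image.open("sign.jpg").convert("RGB")))` would write `speech.mp3`. Keeping each stage a separate function makes it straightforward to swap EasyOCR for PaddleOCR, or `speak` for a Tacotron + HiFi-GAN model for offline synthesis, as the abstract's comparison of engines suggests.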

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Series
Advances in Intelligent Systems Research
Publication Date
5 August 2025
ISBN
978-94-6463-805-9
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Amal Boumedjout
AU  - Fatima Kabli
AU  - Nawel Bendimrad
AU  - Souhila Kebdani
AU  - Bouchra Saidi
PY  - 2025
DA  - 2025/08/05
TI  - AI-Powered Texts Extracted from Images To Speech Synthesis
BT  - Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)
PB  - Atlantis Press
SP  - 220
EP  - 227
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-805-9_25
DO  - 10.2991/978-94-6463-805-9_25
ID  - Boumedjout2025
ER  -