AI-Powered Texts Extracted from Images To Speech Synthesis
- DOI
- 10.2991/978-94-6463-805-9_25
- Keywords
- Image processing; text extracting; voice generating; neural network; OCR tools
- Abstract
Image processing has expanded rapidly with the spread of AI tools. Images processed by AI algorithms now support many fields, including medicine, education, intelligent surveillance and document processing. In this context, we developed a deep-learning-based system that analyses any type of image, extracts the text it contains and converts that text into an audio message. The pipeline has three stages: scanning and cleaning the image, recognizing the text with neural networks, and generating the voice. For experimentation, we trained the system on well-known datasets such as ICDAR2013, Total Text, Street View Text and LJSpeech, and we also performed text recognition with pre-trained models such as PaddleOCR and EasyOCR. Text-to-speech conversion was performed with Google TTS as well as the Tacotron and HiFi-GAN architectures. The results allowed us to optimize text acquisition and the quality of its conversion to audio, while making text-to-speech conversion more flexible and responsive.
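The recognition and voice-generation stages described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the function names (`merge_detections`, `image_to_speech`, `easyocr_backend`, `gtts_backend`), the confidence threshold of 0.5 and the English language setting are all assumptions, the image-cleaning stage is omitted, and EasyOCR and gTTS stand in for whichever recognition and synthesis models are actually used. The two backends import their libraries lazily, so the core logic can be tried without installing them.

```python
from typing import Callable, List, Optional, Tuple

# One recognized text fragment with its confidence score in [0, 1].
Detection = Tuple[str, float]


def easyocr_backend(image_path: str) -> List[Detection]:
    """Hypothetical OCR backend built on EasyOCR (assumes `pip install easyocr`)."""
    import easyocr  # imported lazily so the sketch loads without the package

    reader = easyocr.Reader(["en"])  # language list is an assumption
    # readtext returns (bounding_box, text, confidence) triples
    return [(text, float(conf)) for _bbox, text, conf in reader.readtext(image_path)]


def gtts_backend(text: str, out_path: str) -> str:
    """Hypothetical TTS backend built on gTTS (assumes `pip install gTTS`)."""
    from gtts import gTTS  # imported lazily, as above

    gTTS(text=text, lang="en").save(out_path)
    return out_path


def merge_detections(detections: List[Detection], min_confidence: float = 0.5) -> str:
    """Drop low-confidence or empty fragments and join the rest in reading order."""
    kept = [text.strip() for text, conf in detections
            if conf >= min_confidence and text.strip()]
    return " ".join(kept)


def image_to_speech(
    image_path: str,
    out_path: str,
    ocr: Callable[[str], List[Detection]],
    tts: Callable[[str, str], str],
    min_confidence: float = 0.5,
) -> Optional[str]:
    """Run OCR, filter the result, then synthesize audio.

    Returns the audio path, or None when no usable text was found.
    """
    text = merge_detections(ocr(image_path), min_confidence)
    if not text:
        return None
    return tts(text, out_path)
```

With both libraries installed, a call such as `image_to_speech("sign.png", "sign.mp3", easyocr_backend, gtts_backend)` would write the spoken text to `sign.mp3`; the file names are placeholders. Passing the backends as arguments keeps the recognition and synthesis models interchangeable, which is one way to compare several OCR or TTS options on the same images.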
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Amal Boumedjout
AU  - Fatima Kabli
AU  - Nawel Bendimrad
AU  - Souhila Kebdani
AU  - Bouchra Saidi
PY  - 2025
DA  - 2025/08/05
TI  - AI-Powered Texts Extracted from Images To Speech Synthesis
BT  - Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)
PB  - Atlantis Press
SP  - 220
EP  - 227
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-805-9_25
DO  - 10.2991/978-94-6463-805-9_25
ID  - Boumedjout2025
ER  -