Image-based News Aggregator Using OCR and NLP for Summarization
- DOI
- 10.2991/978-94-6239-674-6_40
- Keywords
- OCR; NLP; Summarization; Image Processing; News Aggregator; Text Extraction; Tesseract; Transformers
- Abstract
The growth of digital information increases the demand for extracting insight from large volumes of data as quickly as possible. Users struggle to stay up to date because online news content keeps growing while their time is limited. This solution therefore combines text extraction from newspaper images using OCR (Tesseract), real-time news retrieval through NewsAPI.org, and abstractive summarization. The approach condenses articles into a compact, easily understandable form. Summaries can be converted to audio using TTS tools such as gTTS or pyttsx3 to increase accessibility. Together, these technologies deliver news that is faster to consume, personalized, and easy to digest, without requiring users to read full articles.
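The image branch of the pipeline described in the abstract (OCR, then abstractive summarization, then optional speech) might be wired together roughly as follows. This is a minimal sketch, not the authors' implementation: the function names (`ocr_image`, `summarize`, `speak`, `normalize_ocr_text`, `image_to_summary`), the default summarization model, the length limits, and the output file name are all assumptions made for illustration, and the NewsAPI.org retrieval step is omitted. It assumes `pytesseract`, `Pillow`, `transformers`, and `gTTS` are installed when the real stages are used.

```python
import re
from typing import Callable, Optional


def ocr_image(image_path: str) -> str:
    """Extract raw text from a newspaper image with Tesseract."""
    # Third-party imports stay local so the module loads without them.
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))


def summarize(text: str, max_length: int = 130, min_length: int = 30) -> str:
    """Abstractive summary via a Hugging Face summarization pipeline."""
    from transformers import pipeline
    # Using the library's default model is an assumption; the source
    # does not say which transformer model was used.
    summarizer = pipeline("summarization")
    result = summarizer(text, max_length=max_length,
                        min_length=min_length, do_sample=False)
    return result[0]["summary_text"]


def speak(text: str, out_path: str) -> str:
    """Save the summary as an MP3 file using gTTS and return its path."""
    from gtts import gTTS
    gTTS(text=text, lang="en").save(out_path)
    return out_path


def normalize_ocr_text(raw: str) -> str:
    """Rejoin words split by a hyphen at a line break, collapse whitespace.

    Limitation: a genuine compound broken at its hyphen (e.g. "real-" /
    "time") is also joined, so this is only a rough cleanup step.
    """
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    return " ".join(joined.split())


def image_to_summary(
    image_path: str,
    ocr: Callable[[str], str] = ocr_image,
    summarizer: Callable[[str], str] = summarize,
    tts: Optional[Callable[[str, str], str]] = None,
    audio_path: str = "summary.mp3",  # hypothetical output name
) -> dict:
    """Run OCR, cleanup, summarization, and (optionally) text-to-speech."""
    text = normalize_ocr_text(ocr(image_path))
    if not text:
        raise ValueError("OCR produced no text")
    summary = summarizer(text)
    audio = tts(summary, audio_path) if tts is not None else None
    return {"text": text, "summary": summary, "audio": audio}
```

Passing the stages in as callables keeps each one replaceable, for example swapping `speak` for a `pyttsx3`-based function for offline use, or substituting stubs to exercise the flow without loading a model. A call such as `image_to_summary("front_page.png", tts=speak)` would run all three stages on a hypothetical image file.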
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - P. Santosh Reddy
AU - S. S. Sanjan
AU - Spandhana K. Devadiga
AU - Vinati Thakkar
PY - 2026
DA - 2026/05/28
TI - Image-based News Aggregator Using OCR and NLP for Summarization
BT - Proceedings of the International Conference on Sustainable Computing and Artificial Intelligence (ICSCAI 2025)
PB - Atlantis Press
SP - 489
EP - 498
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6239-674-6_40
DO - 10.2991/978-94-6239-674-6_40
ID - Reddy2026
ER -