A Survey on Autonomous Mobile Navigation Using Multimodal Agents and Natural Language Commands

L. Durgadevi; C. Dinesh Kumar; G. Nithish; R. S. Shankar

doi:10.2991/978-94-6239-616-6_15

Authors

L. Durgadevi¹,*, C. Dinesh Kumar², G. Nithish³, R. S. Shankar⁴

¹Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India

²Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India

³Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India

⁴Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India

*Corresponding author. Email: durgadevime.ap@gmail.com

Available Online 31 March 2026.

DOI: 10.2991/978-94-6239-616-6_15
Keywords: Accessibility; Android Accessibility Service; Speech-to-Text (STT); Voice-controlled navigation; Large Language Models (LLM); CrewAI; LangGraph; Vision-Language Models (VLM); Multimodal AI; Hands-free smartphone control
Abstract: Smartphones are controlled mostly by touch, and current voice assistants offer only limited navigation, especially in complex or changing interfaces. This limitation motivates research into multimodal systems that combine speech, vision, and language understanding. Prior work in this area falls into three categories: rule-based, OCR-based, and large language model (LLM)-driven methods. This survey reviews and compares these approaches, focusing on their methods, strengths, and weaknesses. It identifies major research gaps, such as adapting to changing user interfaces, improving multimodal perception, and the absence of unified benchmarks. The survey also highlights ongoing challenges such as latency, privacy, and real-world scalability, particularly in hands-free and accessibility settings. By summarizing recent advances and comparing key systems, this work offers a reference for researchers and practitioners seeking to improve autonomous mobile navigation.
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 31 March 2026
ISBN: 978-94-6239-616-6
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - L. Durgadevi
AU  - C. Dinesh Kumar
AU  - G. Nithish
AU  - R. S. Shankar
PY  - 2026
DA  - 2026/03/31
TI  - A Survey on Autonomous Mobile Navigation Using Multimodal Agents and Natural Language Commands
BT  - Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
PB  - Atlantis Press
SP  - 174
EP  - 190
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-616-6_15
DO  - 10.2991/978-94-6239-616-6_15
ID  - Durgadevi2026
ER  -
