Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)

A Survey on Autonomous Mobile Navigation Using Multimodal Agents and Natural Language Commands

Authors
L. Durgadevi1, *, C. Dinesh Kumar2, G. Nithish3, R. S. Shankar4
1Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India
2Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India
3Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India
4Bachelor of Technology, Department of Information Technology, Sri Manakula Vinayagar Engineering College, Puducherry, India
*Corresponding author. Email: durgadevime.ap@gmail.com
Corresponding Author
L. Durgadevi
Available Online 31 March 2026.
DOI
10.2991/978-94-6239-616-6_15
Keywords
Accessibility; Android Accessibility Service; Speech-to-Text (STT); Voice-controlled navigation; Large Language Models (LLM); Crew AI; Lang Graph; Vision-Language Models (VLM); Multimodal AI; Hands-free smartphone control
Abstract

Smartphones are controlled mostly by touch, and current voice assistants offer only limited navigation, especially in complex or changing interfaces. This limitation drives research into multimodal systems that combine speech, vision, and language understanding. Prior work in this area falls into three categories: rule-based, OCR-based, and large language model (LLM)-driven methods. This survey reviews and compares these approaches, focusing on their methods, strengths, and weaknesses. It identifies major research gaps, such as adapting to changing user interfaces, improving multimodal perception, and establishing unified benchmarks. It also highlights ongoing challenges in latency, privacy, and real-world scalability, particularly in hands-free and accessibility settings. By summarizing recent advances and comparing key systems, this work offers a reference for researchers and practitioners working to improve autonomous mobile navigation.

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
31 March 2026
ISBN
978-94-6239-616-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - L. Durgadevi
AU  - C. Dinesh Kumar
AU  - G. Nithish
AU  - R. S. Shankar
PY  - 2026
DA  - 2026/03/31
TI  - A Survey on Autonomous Mobile Navigation Using Multimodal Agents and Natural Language Commands
BT  - Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
PB  - Atlantis Press
SP  - 174
EP  - 190
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-616-6_15
DO  - 10.2991/978-94-6239-616-6_15
ID  - Durgadevi2026
ER  -