OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use (ACL 2025)

OS Agents: A Survey on MLLM-based Agents
for Computer, Phone and Browser Use (ACL 2025 Oral)

¹Zhejiang University ²Fudan University ³OPPO AI Center
⁴University of Chinese Academy of Sciences
⁵Institute of Automation, Chinese Academy of Sciences
⁶The Chinese University of Hong Kong ⁷Tsinghua University ⁸Shanghai Jiao Tong University
⁹01.AI ¹⁰The Hong Kong Polytechnic University

†Project Lead ‡Core Contributor *Corresponding Author
{huxueyu, sy_zhang}@zju.edu.cn

Overview of Survey

This project presents a comprehensive survey of OS Agents: (M)LLM-based agents that use computers, phones, and browsers by operating within the environments and interfaces (e.g., Graphical User Interface (GUI) and Command Line Interface (CLI)) provided by operating systems (OS) to automate tasks. The survey begins by elucidating the fundamentals of OS Agents, exploring their key components (the environment, observation space, and action space) and outlining essential capabilities such as understanding, planning, and grounding. It then examines methodologies for constructing OS Agents, with a focus on domain-specific foundation models and agent frameworks. A detailed review of evaluation protocols and benchmarks highlights how OS Agents are assessed across diverse tasks. Finally, it discusses current challenges and promising future research directions, including safety and privacy as well as personalization and self-evolution. Ultimately, we hope this study will serve as a catalyst for innovation, driving meaningful progress in both academia and industry.

Citation

If you find our work valuable for your research or applications, we would greatly appreciate a star ⭐ and a citation using the BibTeX entry provided below.

@misc{hu2024agents,
  title={OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use},
  author={Hu, Xueyu and Xiong, Tao and Yi, Biao and Wei, Zishu and Xiao, Ruixuan and Chen, Yurun and Ye, Jiasheng and Tao, Meiling and Zhou, Xiangxin and Zhao, Ziyu and others},
  year={2024},
  publisher={OpenReview}
}
