OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

1Zhejiang University 2Fudan University 3OPPO AI Center
4University of Chinese Academy of Sciences
5Institute of Automation, Chinese Academy of Sciences
6The Chinese University of Hong Kong 7Tsinghua University 801.AI
9The Hong Kong Polytechnic University 10Shanghai Jiao Tong University

Project Lead Core Contributor *Corresponding Author
{huxueyu, sy_zhang}@zju.edu.cn

❗Why is there no arXiv link for this paper?
This paper was rejected by arXiv with the justification: "Our moderators determined that your submission does not contain sufficient original or substantive scholarly research and is not of interest to arXiv." We believe this reasoning is inconsistent with the content and contribution of the paper. Our appeal was unsuccessful, and no further explanation was provided; a resubmission did not resolve the issue either. As a result, the ONLY way to access the paper at the moment is through our GitHub repository or the OpenReview archive. We are disappointed by the lack of transparency in arXiv’s moderation process.

Overview of Survey

This project conducts a comprehensive survey of OS Agents, which are (M)LLM-based agents that use computing devices (e.g., computers and mobile phones) to automate tasks by operating within the environments and interfaces (e.g., Graphical User Interfaces (GUIs)) provided by operating systems (OS). The survey begins by elucidating the fundamentals of OS Agents, exploring their key components, including the environment, observation space, and action space, and outlining essential capabilities such as understanding, planning, and grounding. It then examines methodologies for constructing OS Agents, with a focus on domain-specific foundation models and agent frameworks. A detailed review of evaluation protocols and benchmarks highlights how OS Agents are assessed across diverse tasks. Finally, the survey discusses current challenges and promising future research directions, such as safety and privacy, as well as personalization and self-evolution. Ultimately, we hope this study will serve as a catalyst for innovation, driving meaningful progress in both academia and industry.

Contact

If you find a mistake or are interested in contributing, please let us know by e-mail: huxueyu.zju@gmail.com.

Citation

❗Caution: The current BibTeX entry points to our repository; we will update it to point to the paper as soon as the paper is available on a preprint server. Please stay tuned for updates. In the meantime, if you find our repository helpful, we would appreciate it if you could cite:

@misc{hu2024osagents,
  title        = {OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use},
  author       = {Xueyu Hu and Tao Xiong and Biao Yi and Zishu Wei and Ruixuan Xiao and Yurun Chen and Jiasheng Ye and Meiling Tao and Xiangxin Zhou and Ziyu Zhao and Yuhuai Li and Shengze Xu and Shawn Wang and Xinchen Xu and Shuofei Qiao and Kun Kuang and Tieyong Zeng and Liang Wang and Jiwei Li and Yuchen Eleanor Jiang and Wangchunshu Zhou and Guoyin Wang and Keting Yin and Zhou Zhao and Hongxia Yang and Fan Wu and Shengyu Zhang and Fei Wu},
  year         = {2024},
  howpublished = {\url{https://github.com/OS-Agent-Survey/OS-Agent-Survey/}}
}