
Mobile-Agent

Mobile-Agent is a cross-platform GUI agent family for multimodal perception, planning and execution.

Detailed Introduction

Mobile-Agent is developed by Tongyi Lab (Alibaba). It targets GUI automation on desktop and mobile, combining multimodal perception, planning and execution, with an emphasis on robust GUI understanding and end-to-end operation.

Main Features

  • Vision-based GUI perception and element localization.
  • Multimodal policies combining text and vision for planning and action.
  • Cross-platform demos and research components for PC and mobile.
  • A series of subprojects and academic papers supporting the framework.

Use Cases

Use cases include GUI automation testing, desktop/mobile operation automation, repeatable demo tasks (e.g., form filling, scripted interactions), and academic evaluation of interactive agents.

Technical Features

The project integrates perception, planning and execution, focusing on robust element recognition, multi-step recovery, and engineering practices that bridge research prototypes and deployable systems.
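The perception, planning and execution cycle with recovery can be illustrated with a small, self-contained sketch. This is not Mobile-Agent's code or API: every name here (Element, Action, FakeScreen, locate, run_agent) is hypothetical, the "screen" is an in-memory stand-in for a real device, and the plan is a fixed list rather than the output of a multimodal model. It only shows the general shape of re-perceiving the screen and retrying a step when an action fails.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A GUI element as a perception step might report it (hypothetical)."""
    label: str
    x: int
    y: int


@dataclass
class Action:
    """One planned step: 'type' text into, or 'tap', a labeled element."""
    kind: str
    target: str
    text: str = ""


class FakeScreen:
    """In-memory stand-in for a device; it drops the very first action."""

    def __init__(self):
        self.elements = [Element("name", 10, 20), Element("submit", 10, 60)]
        self.values = {}
        self.submitted = False
        self._drop_next = True  # simulates one transient failure

    def perceive(self):
        # A real agent would run vision-based detection on a screenshot here.
        return list(self.elements)

    def execute(self, action, element):
        if self._drop_next:
            self._drop_next = False
            return False
        if action.kind == "type":
            self.values[element.label] = action.text
        elif action.kind == "tap" and element.label == "submit":
            self.submitted = True
        return True


def locate(elements, label):
    """Element localization: find the element matching the planned target."""
    for element in elements:
        if element.label == label:
            return element
    return None


def run_agent(screen, plan, max_retries=2):
    """Run each planned action, re-perceiving and retrying on failure."""
    log = []
    for action in plan:
        for attempt in range(1, max_retries + 2):
            elements = screen.perceive()
            element = locate(elements, action.target)
            ok = element is not None and screen.execute(action, element)
            log.append((action.kind, action.target, attempt, ok))
            if ok:
                break
        else:
            # All attempts for this step failed; give up on the task.
            return False, log
    return True, log


if __name__ == "__main__":
    screen = FakeScreen()
    plan = [Action("type", "name", "Ada"), Action("tap", "submit")]
    success, log = run_agent(screen, plan)
    for entry in log:
        print(entry)
    print("success:", success)
```

In this toy run the first attempt at typing fails, the loop perceives the screen again and retries, and the form is then filled and submitted. A real system would replace the fixed plan and the label lookup with model-driven planning and visual grounding.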

Resource Info
🦾 Agents 🎨 Multimodal 📱 Application 🌱 Open Source