Mind2Web: Towards a Generalist Agent for the Web

Deng, Xiang; Gu, Yu; Zheng, Boyuan; Chen, Shijie; Stevens, Samuel; Wang, Boshi; Sun, Huan; Su, Yu

arXiv:2306.06070 (cs)

[Submitted on 9 Jun 2023 (v1), last revised 9 Dec 2023 (this version, v3)]

Abstract: We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or cover only a limited set of websites and tasks, and are thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains, together with crowdsourced action sequences for those tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks; 2) real-world websites instead of simulated and simplified ones; and 3) a broad spectrum of user interaction patterns. Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) to build generalist web agents. While the raw HTML of real-world websites is often too large to be fed to LLMs, we show that first filtering it with a small LM significantly improves the effectiveness and efficiency of LLMs. Our solution demonstrates a decent level of performance, even on websites or entire domains the model has never seen before, but there is still substantial room for improvement toward truly generalizable agents. We open-source our dataset, model implementation, and trained models (this https URL) to facilitate further research on building a generalist agent for the web.
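The abstract's central technical idea, shrinking raw HTML with a cheap small model before prompting a larger LLM, can be illustrated with a toy two-stage sketch. Several assumptions here are not taken from the text above: the page is assumed to be already reduced to a list of candidate elements (`Element`), the small LM is replaced by a hypothetical word-overlap scorer (`overlap_score`), and the multiple-choice prompt layout (`build_prompt`) is likewise hypothetical. Only the overall shape (rank everything cheaply, then show the LLM just the top candidates) reflects the description above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """Hypothetical candidate DOM element, reduced to an id, a tag name, and its visible text."""
    element_id: str
    tag: str
    text: str


# Signature shared by any scorer: (instruction, element) -> relevance score.
Scorer = Callable[[str, Element], float]


def overlap_score(instruction: str, element: Element) -> float:
    """Hypothetical stand-in for the small LM: the fraction of instruction
    words that also appear in the element's text (case-insensitive)."""
    task_words = set(instruction.lower().split())
    element_words = set(element.text.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & element_words) / len(task_words)


def filter_candidates(instruction: str, elements: List[Element],
                      scorer: Scorer, top_k: int) -> List[Element]:
    """Stage 1: score every element with the cheap scorer and keep the top_k.
    sorted() is stable even with reverse=True, so ties keep page order."""
    ranked = sorted(elements, key=lambda el: scorer(instruction, el), reverse=True)
    return ranked[:top_k]


def build_prompt(instruction: str, candidates: List[Element]) -> str:
    """Stage 2 input: a short multiple-choice prompt (hypothetical format)
    listing only the filtered candidates instead of the full page HTML."""
    lines = [f"Task: {instruction}", "Which element should be acted on next?"]
    for i, el in enumerate(candidates):
        letter = chr(ord("A") + i)  # A, B, C, ... (adequate for a small top_k)
        lines.append(f"{letter}. <{el.tag} id={el.element_id}> {el.text}")
    return "\n".join(lines)


# Usage on a tiny made-up page.
page = [
    Element("e1", "a", "Sign in"),
    Element("e2", "button", "Search flights"),
    Element("e3", "input", "From city"),
    Element("e4", "div", "Footer links and legal notice"),
]
task = "search flights from Columbus"
top = filter_candidates(task, page, overlap_score, top_k=2)
print([el.element_id for el in top])  # → ['e2', 'e3']
print(build_prompt(task, top))
```

In a real system the scorer would be a trained model and the prompt would be sent to an LLM; the design point is that the LLM only ever sees `top_k` short snippets rather than the entire page, which is what keeps the input within its context window and cost budget.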

Comments:	Website: this https URL. Updated with supplementary material. NeurIPS'23 Spotlight
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.06070 [cs.CL]
	(or arXiv:2306.06070v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.06070

Submission history

From: Xiang Deng
[v1] Fri, 9 Jun 2023 17:44:31 UTC (9,798 KB)
[v2] Thu, 15 Jun 2023 03:50:30 UTC (9,791 KB)
[v3] Sat, 9 Dec 2023 05:57:46 UTC (9,772 KB)
