VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Zhao, Zixiao; Sun, Jing; Wei, Zhiyuan; Cai, Cheng-Hao; Hou, Zhe; Dong, Jin Song

Computer Science > Software Engineering

arXiv:2410.19245 (cs)

[Submitted on 25 Oct 2024]

Abstract: In the field of automated programming, large language models (LLMs) have demonstrated foundational generative capabilities when given detailed task descriptions. However, their current functionalities are primarily limited to function-level development, restricting their effectiveness in complex project environments and specific application scenarios, such as complicated image-processing tasks. This paper presents a multi-agent framework that utilises a hybrid set of LLMs, including GPT-4o and locally deployed open-source models, which collaboratively complete auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation that works together to produce software products. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution for code generation. We evaluated our approach using benchmark datasets, and the experimental results demonstrate that VisionCoder significantly outperforms existing methods in image processing auto-programming tasks.
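
The abstract describes a tree-structured distribution of work across project, module, and function levels, handled by a hybrid set of LLMs. The following is a minimal sketch of that general idea, not the authors' implementation. Everything named here is hypothetical: the `Node` class, the `ROUTING` table, and the stub backends are illustrative assumptions. In particular, the choice to send project- and module-level work to a hosted model and function-level work to a local model is an assumption made for the example; the abstract does not state which model serves which level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The three levels named in the abstract, ordered from root to leaf.
LEVELS = ("project", "module", "function")


@dataclass
class Node:
    """One task in the decomposition tree (hypothetical structure)."""
    name: str
    level: str
    children: List["Node"] = field(default_factory=list)


def add_child(parent: Node, name: str) -> Node:
    """Attach a child exactly one level below its parent."""
    idx = LEVELS.index(parent.level)
    if idx + 1 >= len(LEVELS):
        raise ValueError("function-level nodes are leaves")
    child = Node(name, LEVELS[idx + 1])
    parent.children.append(child)
    return child


# Assumed routing of levels to backends; not taken from the paper.
ROUTING: Dict[str, str] = {
    "project": "hosted",
    "module": "hosted",
    "function": "local",
}


def run_tree(root: Node, backends: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Walk the tree depth-first; each node goes to the backend for its level."""
    results: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        backend = backends[ROUTING[node.level]]
        results[node.name] = backend(f"{node.level}: {node.name}")
        # Reverse so children are visited in the order they were added.
        stack.extend(reversed(node.children))
    return results


def demo() -> Dict[str, str]:
    """Build a tiny tree and run it with stub backends (no real LLM calls)."""
    root = Node("edge_detector", "project")
    io_module = add_child(root, "io")
    add_child(io_module, "load_image")
    filters_module = add_child(root, "filters")
    add_child(filters_module, "sobel")
    backends = {
        "hosted": lambda prompt: f"[hosted] {prompt}",
        "local": lambda prompt: f"[local] {prompt}",
    }
    return run_tree(root, backends)
```

Calling `demo()` returns a dictionary mapping each task name to the stub backend's reply, in depth-first order. Replacing the two lambdas with real model clients would be the natural next step, and the routing table is the single place where the cost trade-off between hosted and local models would be tuned.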

Subjects:	Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
Cite as:	arXiv:2410.19245 [cs.SE]
	(or arXiv:2410.19245v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2410.19245

Submission history

From: Zixiao Zhao
[v1] Fri, 25 Oct 2024 01:52:15 UTC (18,415 KB)
