A Toolkit for Evaluating Large Vision-Language Models.
🏆 OC Leaderboard • 🏗️ Quickstart • 📊 Datasets & Models • 🛠️ Development
🤗 HF Leaderboard • 🤗 Evaluation Records • 🤗 HF Video Leaderboard
🔊 Discord • 📝 Report • 🎯 Goal • 🖊️ Citation
VLMEvalKit (the Python package name is vlmeval) is an open-source evaluation toolkit for large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks, without the heavy workload of preparing data across multiple repositories. In VLMEvalKit, we adopt generation-based evaluation for all LVLMs and provide evaluation results obtained with both exact matching and LLM-based answer extraction.
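As a rough illustration of how the two answer-extraction modes can work together on a multiple-choice question, the sketch below first tries exact matching (an option letter or the full text of exactly one option) and only falls back to an LLM-based extractor when the result is ambiguous. This is not VLMEvalKit's implementation: `extract_choice`, `score`, the regex, and the `llm_extract` callback (which would wrap a call to a judge LLM) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical pattern: a capital option letter that stands alone, e.g. "B", "(B)", "B.", "B:".
LETTER_PATTERN = re.compile(r"(?:^|[\s(])([A-Z])(?=[).:]|$)")


def extract_choice(prediction: str, choices: Dict[str, str]) -> Optional[str]:
    """Exact-matching step: return an option key only if the prediction names exactly one."""
    letter_hits = set(LETTER_PATTERN.findall(prediction)) & set(choices)
    if len(letter_hits) == 1:
        return letter_hits.pop()
    # Otherwise, look for the full text of exactly one option (case-insensitive).
    text_hits = [key for key, text in choices.items() if text.lower() in prediction.lower()]
    if len(text_hits) == 1:
        return text_hits[0]
    return None  # Ambiguous or no match: leave it to the LLM-based step.


def score(
    prediction: str,
    answer: str,
    choices: Dict[str, str],
    llm_extract: Optional[Callable[[str, Dict[str, str]], Optional[str]]] = None,
) -> bool:
    """Score one sample, calling the hypothetical LLM extractor only when exact matching fails."""
    choice = extract_choice(prediction, choices)
    if choice is None and llm_extract is not None:
        choice = llm_extract(prediction, choices)
    return choice == answer
```

For example, `extract_choice("The answer is B.", {"A": "an apple", "B": "a banana"})` returns `"B"` without any LLM call, while a hedging reply such as `"Either A. or C."` matches two letters and would be handed to `llm_extract`. Keeping the cheap, deterministic pass first means the judge LLM is only consulted for the hard cases.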
See [QuickStart | QuickStart (Chinese)] for a quick start guide.
```python
# Demo
from vlmeval.config import supported_VLM
model = supported_VLM['NEO1_0-2B-SFT']()
# Forward Single Image
ret = model.generate(['assets/apple.jpg', 'What is in this image?'])
print(ret)  # The image features a red apple with a leaf on it.
# Forward Multiple Images
ret = model.generate(['assets/apple.jpg', 'assets/apple.jpg', 'How many apples are there in the provided images? '])
print(ret)  # There are two apples in the provided images.
```
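The demo passes `generate` a single list that interleaves image paths with a text prompt. Assuming every wrapped model accepts that same list format, a generation-based evaluation loop over a handful of samples can be sketched as follows. `StubModel`, `evaluate`, and the sample data are hypothetical; the stub only exists so the loop runs without downloading weights, and swapping in a real `supported_VLM[...]()` model would also require the actual image files.

```python
from typing import List, Tuple


class StubModel:
    """Hypothetical stand-in exposing the same generate(list) call as the demo."""

    def generate(self, message: List[str]) -> str:
        # Treat entries ending in an image extension as images and "answer" with their count.
        n_images = sum(item.lower().endswith((".jpg", ".png")) for item in message)
        return f"There are {n_images} apples in the provided images."


def evaluate(model, samples: List[Tuple[List[str], str, str]]) -> float:
    """Accuracy over (image_paths, question, expected) samples using a simple substring match."""
    correct = 0
    for image_paths, question, expected in samples:
        # Images first, then the question, mirroring the demo's message layout.
        prediction = model.generate(image_paths + [question])
        correct += expected.lower() in prediction.lower()
    return correct / len(samples)


samples = [
    (["assets/apple.jpg"], "How many apples are there?", "1 apple"),
    (["assets/apple.jpg", "assets/apple.jpg"], "How many apples are there?", "2 apples"),
    (["assets/apple.jpg"], "How many apples are there?", "3 apples"),
]
print(evaluate(StubModel(), samples))  # 0.6666666666666666
```

A substring match is the crudest form of exact matching; free-form answers like "two apples" versus "2 apples" are exactly where an LLM-based extractor earns its cost.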