Powerful visual AI. Tiny footprint.
Moondream is an open-source visual language model that understands images using simple text prompts. It's fast, wildly capable — and just 1GB in size.
Vision AI at Warp Speed
Forget everything you thought you needed to know about computer vision. With Moondream, there's no training, no ground truth data, and no heavy infrastructure. Just a model, a prompt, and a whole world of visual understanding.
Ridiculously lightweight
Under 2B parameters. Quantized to 4-bit. Just 1GB. Moondream runs anywhere — from edge devices to your laptop.
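
As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic behind "under 2B parameters at 4-bit is about 1GB". The helper name `quantized_weight_bytes` is hypothetical, and the estimate assumes every weight is stored at exactly 4 bits, ignoring quantization scales, any layers kept at higher precision, and file metadata.

```python
def quantized_weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at bits_per_weight bits each."""
    return num_params * bits_per_weight // 8


# 2 billion is the upper bound quoted above ("under 2B"), not the exact count.
upper_bound_bytes = quantized_weight_bytes(2_000_000_000, 4)
print(upper_bound_bytes / 1e9)  # 1.0, in decimal gigabytes

# The same weights at 16 bits each, for comparison.
print(quantized_weight_bytes(2_000_000_000, 16) / 1e9)  # 4.0
```

Under these assumptions, the 4-bit figure is a quarter of the 16-bit one, which is what lets a model of this size fit on laptops and edge devices.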
Actually affordable
Run it locally for free. Or use our cloud API to process a high volume of images quickly and cheaply. Free tier included.
Simple by design
Choose a capability. Write a prompt. Get results. That's it. Moondream is designed for developers who don't want to babysit models.
Versatile as hell
Go beyond basic visual Q&A. Moondream can caption, detect objects, locate things, read documents, follow gaze, and more.
Tried, tested, trusted
6M+ downloads. 8K+ GitHub stars. Used across industries — from healthcare to robotics to mobile apps.
One Model, Many Capabilities
Moondream supports a growing set of visual capabilities — all accessible through natural-language prompts.
Image Captioning
The image shows a man in a blue jumpsuit and yellow hard hat standing in a large industrial setting. He is wearing safety glasses and ear protection, and is holding a clipboard and a pen, appearing to be taking notes...

Visual Question Answering
Yes, there is an unsecured truck parked in the area. The truck is filled with boxes, and it appears to be a delivery truck. The presence of the unsecured truck and the boxes suggests that it might be a delivery service or a delivery truck parked in a public area.

Object Detection
(x=0.431, y=0.713, x2=0.569, y2=0.921)
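
The values above all fall between 0 and 1, so they look like coordinates normalized to the image width and height. Assuming that, and assuming (x, y) is the top-left corner and (x2, y2) the bottom-right, a box like this could be mapped to pixels as sketched below. `PixelBox` and `to_pixel_box` are hypothetical helpers, not part of any Moondream client, and the 1000x1000 image size is made up for illustration.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(x: float, y: float, x2: float, y2: float,
                 width: int, height: int) -> PixelBox:
    """Scale a normalized (0 to 1) box to integer pixel coordinates."""
    return PixelBox(
        left=round(x * width),
        top=round(y * height),
        right=round(x2 * width),
        bottom=round(y2 * height),
    )


# The sample detection above, applied to a hypothetical 1000x1000 image.
box = to_pixel_box(0.431, 0.713, 0.569, 0.921, width=1000, height=1000)
print(box)
```

The resulting tuple is ordered left, top, right, bottom, which is the order most drawing and cropping routines expect.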

Pointing (x, y)
(x=0.431, y=0.505)

Gaze Detection
The operator is looking at the bottom-right section of the control panel, near the red warning light.

OCR & Document Understanding
"Preface, The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities..."

Blazingly Fast and Cost-Effective.
Moondream is the most efficient VLM ever built. At only 1GB in size and packed with architectural optimizations, it runs fast on commodity hardware and scales beautifully in the cloud.
Fast
Blazing fast even on laptops and mobile
Low Memory
Low memory usage and power consumption
Cost-Effective
Lower cloud costs at any volume
Runs everywhere
No GPU rentals or server tuning required
Get Running in Minutes.
Moondream is open source and you can install and run it anywhere, for free. You can have it running on your computer or in our cloud in a matter of minutes.
On your computer:
- Moondream Server is free
- Works with our Python and Node clients
- Works offline, fully under your control
- CPU or GPU compatible

In our cloud:
- No downloads required
- Free tier: 5,000 requests per day
- Works with the same Python and Node clients
- Scales to production
Trusted by Developers Everywhere.
Used in real-world applications across retail, logistics, healthcare, defense, and more.