
flexaihq/tt-inference-server

 
 


TT-Inference-Server

Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.

Official Repository

https://github.com/tenstorrent/tt-inference-server

Getting Started

Follow the setup instructions for the model you want to serve. Each Model Name in the tables below links to the corresponding implementation.

Note: models with Status [🔍 preview] are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.

LLMs

For an automated, pre-configured vLLM inference server using Docker, see the Model Readiness Workflows User Guide.
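Once a vLLM server is running, it can usually be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch of such a client, not the workflow described in the User Guide. The base URL, port, and model identifier are assumptions chosen for illustration, and a given deployment may also require an authorization header that is not shown here.

```python
import json
import urllib.request

# Assumed values for illustration only; they are not taken from this README.
BASE_URL = "http://localhost:8000"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server listening at the assumed address, `complete("What is Tenstorrent?")` would return the generated text. Building the request separately from sending it makes the payload easy to inspect before any network call is made.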

| Model Name | Model URL | Hardware | Status | tt-metal commit | vLLM commit | Docker Image |
|---|---|---|---|---|---|---|
| QwQ-32B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc51 | e2e0002a | 0.0.4-v0.56.0-rc51-e2e0002ac7dc |
| DeepSeek-R1-Distill-Llama-70B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Qwen2.5-72B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-72B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-7B | HF Repo | n150 | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-7B-Instruct | HF Repo | n150 | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Llama-3.3-70B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-11B-Vision | HF Repo | n150 | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-11B-Vision-Instruct | HF Repo | n150 | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-1B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-1B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-3B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-3B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-70B | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-70B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-8B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-8B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
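
The Docker Image values in the LLM table appear to follow the pattern `<release>-<tt-metal commit>-<vLLM commit>`, where the last part is a longer prefix of the same vLLM commit hash. That layout is inferred from the table rows and is not a documented guarantee. Under that assumption, a tag can be split into its parts as sketched below; `ImageTag` and `parse_image_tag` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    """Components of a Docker image tag (hypothetical helper type)."""
    release: str   # e.g. "0.0.4"
    tt_metal: str  # e.g. "v0.56.0-rc47"
    vllm: str      # e.g. "e2e0002ac7dc"


# Assumed layout, inferred from the table: <release>-<tt-metal commit>-<vLLM commit>
TAG_PATTERN = re.compile(
    r"^(?P<release>\d+\.\d+\.\d+)"
    r"-(?P<tt_metal>v\d+\.\d+\.\d+(?:-rc\d+)?)"
    r"-(?P<vllm>[0-9a-f]+)$"
)


def parse_image_tag(tag):
    """Split a tag into release, tt-metal commit, and vLLM commit."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag!r}")
    return ImageTag(**match.groupdict())


tag = parse_image_tag("0.0.4-v0.56.0-rc47-e2e0002ac7dc")
print(tag.tt_metal)                     # prints "v0.56.0-rc47"
print(tag.vllm.startswith("e2e0002a"))  # prints "True"
```

A check like this can confirm that the tt-metal and vLLM commits of an image match the row for the model being served.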

CNNs

| Model Name | Model URL | Hardware | Status | Minimum Release Version |
|---|---|---|---|---|
| YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |
