Tenstorrent Inference Server (tt-inference-server) is the repository of available model APIs for deploying models on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Please follow the setup instructions for the model you want to serve; the Model Name entries in the tables below link to the corresponding implementations.
Note: models with Status [🔍 preview] are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.
For an automated, pre-configured vLLM inference server using Docker, please see the Model Readiness Workflows User Guide.
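
As a quick illustration, once a vLLM-based server is up you can query it over its OpenAI-compatible HTTP API. This is a minimal sketch, assuming default settings; the host, port, model id, and prompt below are placeholders, so substitute the values from your own deployment.

```python
# Minimal sketch: send a completion request to a running vLLM inference
# server via its OpenAI-compatible HTTP API. BASE_URL and MODEL are
# placeholders; replace them with the values from your deployment.
import requests

BASE_URL = "http://localhost:8000"  # assumed default vLLM serving port
MODEL = "your-model-id"             # placeholder; use the model you deployed

response = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": MODEL,
        "prompt": "What is Tenstorrent hardware?",
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```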
| Model Name | Model URL | Hardware | Status | Minimum Release Version |
|---|---|---|---|---|
| YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |