Tenstorrent Inference Server (tt-inference-server) is the repository of available model APIs for deploying models on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Please follow the setup instructions for the model you want to serve; the Model Name entries in the tables below link to the corresponding implementations.
Note: models with Status [🔍 preview] are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.
For an automated, pre-configured vLLM inference server using Docker, please see the Model Readiness Workflows User Guide.
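
As a quick illustration, once a vLLM-based server is up you can query it over its OpenAI-compatible HTTP API. This is a minimal sketch, assuming default settings; the host, port, model id, and prompt below are placeholders, so substitute the values from your own deployment.

```python
# Minimal sketch: send a completion request to a running vLLM inference
# server via its OpenAI-compatible HTTP API. BASE_URL and MODEL are
# placeholders; replace them with the values from your deployment.
import requests

BASE_URL = "http://localhost:8000"  # assumed default vLLM serving port
MODEL = "your-model-id"             # placeholder; use the model you deployed

response = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": MODEL,
        "prompt": "What is Tenstorrent hardware?",
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```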
| Model Name | Model URL | Hardware | Status | Minimum Release Version |
|---|---|---|---|---|
| YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |