
Triton Inference Server

LiteLLM supports Embedding Models on Triton Inference Servers

Usage

Example Call

Use the triton/ prefix to route requests to your Triton server.

import asyncio
import litellm

async def main():
    response = await litellm.aembedding(
        model="triton/<your-triton-model>",
        api_base="https://your-triton-api-base/triton/embeddings",  # /embeddings endpoint you want litellm to call on your server
        input=["good morning from litellm"],
    )
    print(response)

asyncio.run(main())
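
If you are not in an async context, the same call can be made synchronously with litellm.embedding. A minimal sketch, assuming the same placeholder model name and api_base as above:

import litellm

# Synchronous variant of the async call above
response = litellm.embedding(
    model="triton/<your-triton-model>",
    api_base="https://your-triton-api-base/triton/embeddings",  # /embeddings endpoint on your Triton server
    input=["good morning from litellm"],
)
print(response)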