The LLM Layer in Tavus enables your persona to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM.

Tavus-Hosted Models

1. model

Select one of the available models. tavus-gpt-oss is recommended as a good starting point; the table below helps you choose based on your priorities.
| Model | Speed | Intelligence | Naturalness | Best For |
|---|---|---|---|---|
| tavus-gpt-oss | ⚡⚡⚡ | 🧠 | 💬 | Snappy, low-latency |
| tavus-gpt-4.1 (deprecated) | ⚡⚡ | 🧠🧠🧠 | 💬💬💬 | Long-context reasoning |
| tavus-gpt-4o (deprecated) | ⚡⚡ | 🧠🧠 | 💬💬 | Legacy option |
| tavus-gemini-2.5-flash | ⚡⚡ | 🧠🧠 | 💬💬💬 | Latency + logical deduction |
| tavus-claude-haiku-4.5 | ⚡⚡ | 🧠🧠 | 💬💬 | Grounded, fewer hallucinations |
| tavus-gpt-5.2 | ⚡⚡ | 🧠🧠 | 💬💬 | General use, latency less critical |
| tavus-gpt-4o-mini (deprecated) | ⚡⚡ | 🧠 | 💬💬 | Legacy option |
| tavus-gemini-3-flash | ⚡ | 🧠🧠🧠 | 💬💬💬 | Highest intelligence, lower speed |
Context Window Limit
  • Performance and intelligence are best when prompts are limited to 5,000 tokens. You may see degradations in speed and instruction following in the 15,000–20,000 token range.
  • All Tavus-hosted models support up to 32,000 tokens; staying within 5k is recommended for optimal behavior.
Tip: 1 token ≈ 4 characters, so 5,000 tokens ≈ 20,000 characters (including spaces and punctuation).
"model": "tavus-gpt-oss"
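The 4-characters-per-token heuristic above can serve as a rough pre-flight check on prompt length before creating a persona; a minimal sketch (for exact counts, use your model's actual tokenizer):

```python
RECOMMENDED_LIMIT = 5_000  # tokens; Tavus-hosted models accept up to 32,000

def estimate_tokens(text: str) -> int:
    """Rough estimate using ~1 token per 4 characters (spaces and punctuation included)."""
    return len(text) // 4

prompt = "x" * 20_000  # a 20,000-character prompt
if estimate_tokens(prompt) > RECOMMENDED_LIMIT:
    print("Prompt exceeds the recommended 5,000-token limit")
```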

2. tools

Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
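Tool definitions follow the OpenAI-style function-calling schema. A minimal sketch of what a tools array might look like, built here as a Python structure (get_weather is a hypothetical example function, not part of the Tavus API):

```python
# Hypothetical tool definition in the OpenAI-style function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # example name; define your own functions
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
```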

3. speculative_inference

When set to true (the default), the LLM begins processing speech transcriptions before the user finishes speaking, improving responsiveness. Set it to false to disable.
"speculative_inference": true
This field is optional.

4. extra_body

Add parameters to customize the LLM request. For Tavus-hosted models, you can pass temperature and top_p:
"extra_body": {
  "temperature": 0.7,
  "top_p": 0.9
}
This field is optional.

Example Configuration

{
  "persona_name": "Health Coach",
  "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "tavus-gpt-oss",
      "speculative_inference": true,
      "extra_body": {
        "temperature": 0.7,
        "top_p": 0.9
      }
    }
  }
}
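The configuration above can be sent to the Create Persona API. A stdlib-only sketch: the endpoint URL and x-api-key header shown here are assumptions, so confirm them against the Create Persona API reference before use.

```python
import json
import urllib.request

# The persona payload from the example above.
payload = {
    "persona_name": "Health Coach",
    "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
    "pipeline_mode": "full",
    "default_replica_id": "rf4e9d9790f0",
    "layers": {
        "llm": {
            "model": "tavus-gpt-oss",
            "speculative_inference": True,
            "extra_body": {"temperature": 0.7, "top_p": 0.9},
        }
    },
}

def create_persona(api_key: str) -> dict:
    """POST the payload to the Create Persona endpoint (URL and header assumed)."""
    req = urllib.request.Request(
        "https://tavusapi.com/v2/personas",  # assumed endpoint; verify in the API docs
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Call `create_persona("your-api-key")` to create the persona and receive its ID in the response.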

Custom LLMs

Prerequisites

To use your own OpenAI-compatible LLM, you’ll need:
  • Model name
  • Base URL
  • API key
Ensure your LLM:
  • Streams responses (i.e., via server-sent events)
  • Uses the /chat/completions endpoint

1. model

Name of the custom model you want to use.
"model": "gpt-3.5-turbo"

2. base_url

Base URL of your LLM endpoint.
Do not include route paths (such as /chat/completions) in the base_url; the route is appended when requests are made.
"base_url": "https://your-llm.com/api/v1"
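Because requests go to the /chat/completions route on your base URL, the effective request URL is the two joined. A quick sketch of what that works out to:

```python
base_url = "https://your-llm.com/api/v1"

# The /chat/completions route is appended to your base URL,
# so base_url itself must not already contain it.
request_url = f"{base_url.rstrip('/')}/chat/completions"
print(request_url)  # https://your-llm.com/api/v1/chat/completions
```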

3. api_key

API key to authenticate with your LLM provider.
"api_key": "your-api-key"
base_url and api_key are required only when using a custom model.

4. tools

Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.

5. speculative_inference

When set to true (the default), the LLM begins processing speech transcriptions before the user finishes speaking, improving responsiveness. Set it to false to disable.
"speculative_inference": true
This field is optional.

6. headers

Optional additional headers to include when making requests to your LLM. Use this for any extra headers your provider requires beyond the API key (which should be set via the api_key field).
"headers": {
  "X-Organization-ID": "your-org-id",
  "X-Request-Source": "tavus-cvi"
}
This field is optional, depending on your LLM provider’s requirements.

7. extra_body

Add parameters to customize the LLM request. You can pass any parameters that your LLM provider supports:
"extra_body": {
  "temperature": 0.5,
  "top_p": 0.9,
  "frequency_penalty": 0.5
}
This field is optional.

8. default_query

Add default query parameters that get appended to the base URL when making requests to the /chat/completions endpoint.
"default_query": {
  "api-version": "2024-02-15-preview"
}
This field is optional. Useful for LLM providers that require query parameters for authentication or versioning.
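A sketch of how a default_query ends up in the request URL, using the Azure-style api-version parameter from the example:

```python
from urllib.parse import urlencode

base_url = "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o"
default_query = {"api-version": "2024-02-15-preview"}

# Query parameters are appended to the /chat/completions request URL.
request_url = f"{base_url}/chat/completions?{urlencode(default_query)}"
print(request_url)
```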

Example Configuration

{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "gpt-4o",
      "base_url": "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o",
      "api_key": "your-api-key",
      "speculative_inference": true,
      "default_query": {
        "api-version": "2024-02-15-preview"
      }
    }
  }
}
Refer to the Create Persona API for a full list of supported fields.

Perception

When using the raven-1 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user’s video input.
{
  "role": "system",
  "content": "<user_appearance>...</user_appearance> <user_emotions>...</user_emotions> <user_screenshare>...</user_screenshare>"
}
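If your custom LLM backend needs this visual context as structured data, the tagged sections can be pulled out of the system message content. A sketch assuming the tag format shown above:

```python
import re

def extract_visual_context(content: str) -> dict:
    """Pull <user_appearance>/<user_emotions>/<user_screenshare> sections
    out of a raven-1 system message into a dict keyed by tag name."""
    tags = ("user_appearance", "user_emotions", "user_screenshare")
    out = {}
    for tag in tags:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", content, re.DOTALL)
        if match:
            out[tag] = match.group(1).strip()
    return out

msg = "<user_appearance>wearing glasses</user_appearance> <user_emotions>calm</user_emotions>"
print(extract_visual_context(msg))
```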

Disabled Perception Model

If you disable the perception model, your LLM will not receive any special messages.