Custom STT Onboarding
You can choose between tavus-turbo and tavus-advanced as your STT engine, and adjust the VAD sensitivity to your needs.
Create Persona
To get started, you’ll need to create a Persona that specifies your STT engine and VAD sensitivity. Here’s an example Persona:
<persona created>, id: p234324a
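As a rough sketch of what such a Persona definition might contain, the Python snippet below assembles the STT settings described on this page into a JSON body. The key names (`layers`, `stt`, `stt_engine`, `participant_pause_sensitivity`, `participant_interrupt_sensitivity`, `smart_turn_detection`) and the helper `build_persona_payload` are assumptions made for illustration; check the API reference for the exact schema and endpoint before sending anything.

```python
import json


def build_persona_payload(
    stt_engine: str = "tavus-advanced",
    pause_sensitivity: str = "medium",
    interrupt_sensitivity: str = "medium",
    smart_turn_detection: bool = True,
) -> dict:
    """Assemble a Persona body carrying only the STT settings.

    All key names below are assumed for illustration and may not
    match the real schema.
    """
    return {
        "layers": {
            "stt": {
                "stt_engine": stt_engine,
                "participant_pause_sensitivity": pause_sensitivity,
                "participant_interrupt_sensitivity": interrupt_sensitivity,
                "smart_turn_detection": smart_turn_detection,
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating the Persona.
    print(json.dumps(build_persona_payload(), indent=2))
```

Calling `build_persona_payload(stt_engine="tavus-turbo", pause_sensitivity="low")` would produce the same structure with those two values overridden and the remaining defaults left in place.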
STT Engine
The STT engine parameter controls the transcription engine that will be used. The default is tavus-advanced, but you can adjust this to tavus-turbo for a tiny latency improvement. However, tavus-advanced provides much higher transcription accuracy and supports non-English languages, so we highly recommend using it for almost all use cases.
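The recommendation above can be expressed as a small decision helper. This is only a sketch: the function `choose_stt_engine`, its parameters, and the use of language codes such as `en` or `en-US` are hypothetical and not part of the product.

```python
def choose_stt_engine(language: str = "en", latency_critical: bool = False) -> str:
    """Pick an engine name following the guidance in this section.

    tavus-turbo is returned only for English conversations where
    latency matters most; everything else gets tavus-advanced.
    """
    # Treat "en", "EN", "en-US", etc. as English (assumed code format).
    is_english = language.strip().lower().split("-")[0] == "en"
    if is_english and latency_critical:
        return "tavus-turbo"
    return "tavus-advanced"
```

For example, `choose_stt_engine("es", latency_critical=True)` still returns `tavus-advanced`, because only that engine is described as supporting non-English languages.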
Speech Sensitivity
These sensitivity parameters control the sensitivity of the Voice Activity Detection (VAD) engine. The default for each is medium, but you can raise it to high or lower it to low, verylow, or superlow depending on your needs. You can use the guidelines below to choose the right sensitivity for your use case:
Participant Pause Sensitivity
Controls how long a pause the user can take before the replica responds. You can think of this as the replica’s “pause” tolerance.
- high: The replica replies quickly after short pauses. Good for fast and casual conversations.
- medium (default): Balanced timing. Allows natural pauses without feeling rushed or delayed.
- low: The replica waits a bit longer before replying. Useful for slower or more thoughtful discussions.
- verylow: The replica allows even longer pauses before responding.
- superlow: The replica has the longest response delay, making it suitable for conversations where participants often pause.
Participant Interrupt Sensitivity
Controls how long the user can speak before the replica will be interrupted. You can think of this as the replica’s “interrupt” tolerance.
- high: The replica stops speaking immediately when the participant starts talking. Ideal for quick and back-and-forth exchanges.
- medium (default): Balanced behavior. Allows short interruptions without breaking the flow.
- low: The participant needs to speak more clearly or for a bit longer to interrupt.
- verylow: The replica usually keeps talking unless the interruption is strong.
- superlow: The replica rarely stops mid-sentence. It will usually finish speaking before responding.
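Both sensitivity parameters accept the same five levels, so a client can check a value before sending it. The sketch below is a hypothetical client-side helper, not part of any SDK; the names `SENSITIVITY_LEVELS`, `validate_sensitivity`, and `is_more_sensitive` are made up for this example.

```python
# Levels named in this section, ordered from least to most sensitive.
SENSITIVITY_LEVELS = ("superlow", "verylow", "low", "medium", "high")


def validate_sensitivity(value: str) -> str:
    """Return the normalized level, or raise ValueError if it is unknown."""
    normalized = value.strip().lower()
    if normalized not in SENSITIVITY_LEVELS:
        allowed = ", ".join(SENSITIVITY_LEVELS)
        raise ValueError(f"unknown sensitivity {value!r}; expected one of: {allowed}")
    return normalized


def is_more_sensitive(a: str, b: str) -> bool:
    """True if level a reacts sooner than level b (e.g. high vs. low)."""
    rank = SENSITIVITY_LEVELS.index
    return rank(validate_sensitivity(a)) > rank(validate_sensitivity(b))
```

With this ordering, `is_more_sensitive("high", "superlow")` is true: at high the replica responds or yields sooner than at superlow.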
Smart Turn Detection
When enabled, Sparrow-0 makes interactions more natural by evaluating semantic and lexical conversation cues in real time. It:
- Continuously assesses speech patterns and conversation content
- Seamlessly integrates heuristic strategies and machine learning to refine turn-taking
- Keeps latency overhead minimal, adding only 10 ms and enabling response times as fast as 600 ms when needed
Key Benefits:
- Enhanced naturalness: Conversations feel more human-like and fluid.
- Low latency overhead: Adds only 10 ms of latency, supporting rapid conversational interactions.
- Continuous improvement: Gets smarter and more nuanced over time using adaptive learning.