User utterances (role: user) are sent when the user finishes speaking and contain the transcribed text.
Replica utterances (role: replica) are sent immediately when the replica begins speaking and contain the full LLM response text — including words the replica may not have actually spoken if it was interrupted. This makes them useful for quickly displaying the replica’s intended response.
If the replica is interrupted, the conversation.utterance event (role=replica) will still contain the full intended response. To track only the words the replica actually spoke, use streaming utterance events, which progressively report spoken text and indicate interruptions.

The properties object may include user_audio_analysis (tone and delivery) and/or user_visual_analysis (appearance and demeanor). These fields are only present when there is relevant analysis for that utterance.
This event includes a seq field for global ordering and a turn_idx field to identify which conversational turn the utterance belongs to. See Event Ordering and Turn Tracking for details.

The message_type field indicates what product this event is used for. In this case, the message_type will be conversation.
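Putting the fields below together, a conversation.utterance payload might look like the following. The values are illustrative, and the field names for the event type and utterance identifier (event_type, utterance_id) are assumptions for this sketch rather than confirmed names:

```json
{
  "message_type": "conversation",
  "event_type": "conversation.utterance",
  "conversation_id": "c123456",
  "utterance_id": "83294d9f-8306-491b-a284-791f56c8383f",
  "seq": 42,
  "turn_idx": 3,
  "properties": {
    "role": "replica",
    "speech": "Sure, I can help with that."
  }
}
```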
"conversation"
This is the type of event that is being sent back. This field will be present on all events and can be used to distinguish between different event types.
"conversation.utterance"
A globally monotonic sequence number assigned to each event. Use this to determine the ordering of events — a higher seq means the event was sent later. This is useful for reconciling events that may arrive out of order.
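Because events may arrive out of order, a receiver can buffer them and process them in seq order. A minimal sketch, assuming each event is a dict carrying a numeric seq field:

```python
def reorder_events(events):
    """Return events sorted by their global sequence number.

    Events may arrive out of order over the wire; sorting by `seq`
    restores the order in which they were originally sent.
    """
    return sorted(events, key=lambda e: e["seq"])

# Events received out of order are restored to send order.
received = [{"seq": 44}, {"seq": 42}, {"seq": 43}]
ordered = reorder_events(received)
assert [e["seq"] for e in ordered] == [42, 43, 44]
```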
42
The unique identifier for the conversation.
"c123456"
A unique identifier for a given utterance. In this case, it identifies the utterance the replica is speaking.
"83294d9f-8306-491b-a284-791f56c8383f"
The conversation turn index. This value increments each time a conversation.respond interaction is received, and groups all events that belong to the same conversational turn. Use this to correlate events (utterances, tool calls, speaking state changes, etc.) that are part of the same turn.
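To correlate everything that happened within a single turn, incoming events can be bucketed by turn_idx. A minimal sketch (the event_type values here are illustrative, not confirmed event names):

```python
from collections import defaultdict

def group_by_turn(events):
    """Bucket events by the conversational turn they belong to."""
    turns = defaultdict(list)
    for event in events:
        turns[event["turn_idx"]].append(event)
    return dict(turns)

events = [
    {"turn_idx": 3, "event_type": "conversation.utterance"},
    {"turn_idx": 3, "event_type": "conversation.tool_call"},
    {"turn_idx": 4, "event_type": "conversation.utterance"},
]
turns = group_by_turn(events)
assert len(turns[3]) == 2 and len(turns[4]) == 1
```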
3
This object contains the speech property (the contents of the utterance). When the speaker is the user and the persona uses Raven-1, it may also include user_audio_analysis and/or user_visual_analysis when relevant analysis is available.
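A handler can read speech unconditionally but should treat the analysis fields as optional, since they only appear when relevant analysis is available. A minimal sketch, assuming properties is a plain dict shaped as described above:

```python
def summarize_utterance(properties):
    """Extract speech plus any optional analysis from an utterance's properties."""
    summary = {"speech": properties["speech"]}
    # Analysis fields are only present when relevant analysis exists,
    # so fall back to None rather than assuming they are there.
    summary["audio_analysis"] = properties.get("user_audio_analysis")
    summary["visual_analysis"] = properties.get("user_visual_analysis")
    return summary

props = {"speech": "Hello there!", "user_audio_analysis": "calm, measured tone"}
result = summarize_utterance(props)
assert result["speech"] == "Hello there!"
assert result["audio_analysis"] == "calm, measured tone"
assert result["visual_analysis"] is None
```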