This is an event broadcast by Tavus. A perception tool call event is broadcast when a perception tool is triggered by Raven based on visual or audio input. The event always includes the eventType conversation.perception_tool_call, a modality in data.properties ("vision" or "audio"), the tool name, and the tool arguments. The payload is modality-specific:
  • modality: "audio" — Triggered by audio tools (audio_tool_prompt / audio_tools). arguments is a JSON string (e.g. "{\"reason\":\"The user said …\"}"). There is no frames array.
  • modality: "vision" — Triggered by visual tools (visual_tool_prompt / visual_tools). arguments is an object with tool-defined fields. Includes a frames array of objects with data (base64-encoded JPEG) and mime_type (e.g. "image/jpeg") for the images that triggered the call.
Perception tool calls can be used to trigger automated actions in response to visual or audio cues detected by the Raven perception system. For more on configuring perception tool calls, see Tool Calling for Perception and Perception. This event includes a seq field for global ordering and a turn_idx field to identify which conversational turn the perception tool call belongs to. See Event Ordering and Turn Tracking for details.
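The two payload shapes above can be normalized in a single handler. A minimal sketch in Python (the function name is illustrative, not part of the Tavus API):

```python
import json

def parse_perception_tool_call(event: dict):
    """Normalize a conversation.perception_tool_call event.

    Returns (modality, tool_name, arguments, frames). Audio events
    deliver arguments as a JSON-encoded string, which is decoded here;
    vision events deliver arguments as an object plus a frames list of
    base64-encoded JPEGs that triggered the call.
    """
    props = event["data"]["properties"]
    modality = props["modality"]
    if modality == "audio":
        args = json.loads(props["arguments"])  # string -> dict
        frames = []  # audio events carry no frames
    else:
        args = props["arguments"]  # already an object
        frames = props.get("frames", [])
    return modality, props["name"], args, frames
```

From here the caller can dispatch on the tool name to run whatever automated action the tool represents.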

Example: audio tool call

When an audio tool is triggered (e.g. sarcasm detection), the event looks like:
{
  "timestamp": "2026-03-02T21:51:47.194Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "seq": 17,
    "turn_idx": 2,
    "properties": {
      "arguments": "{\"reason\":\"The user said \\\"well, yeah\\\"\"}",
      "modality": "audio",
      "name": "notify_sarcasm_detected"
    }
  }
}
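Note that arguments here is a JSON-encoded string, not an object, so it needs a second decode after the event body itself is parsed. Reproducing the value from the example above:

```python
import json

# The arguments field as received after parsing the event body:
# a JSON string, not an object.
raw = '{"reason":"The user said \\"well, yeah\\""}'
args = json.loads(raw)
print(args["reason"])  # The user said "well, yeah"
```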

Example: vision tool call

When a visual tool is triggered (e.g. hat detection), the event includes frames with base64-encoded images. The data values in the example are shortened for readability.
{
  "timestamp": "2026-03-02T21:51:49.730Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "seq": 18,
    "turn_idx": 2,
    "properties": {
      "arguments": {
        "hat_type": "baseball cap"
      },
      "frames": [
        { "data": "<base64-encoded-jpeg>", "mime_type": "image/jpeg" },
        { "data": "<base64-encoded-jpeg>", "mime_type": "image/jpeg" }
      ],
      "modality": "vision",
      "name": "notify_hat_detected"
    }
  }
}
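Each frame's data field is a base64-encoded JPEG, so the triggering images can be recovered with a standard base64 decode. A minimal sketch (the file-naming scheme is illustrative):

```python
import base64

def save_frames(frames: list, prefix: str = "frame") -> list:
    """Write each base64-encoded JPEG in `frames` to disk.

    `frames` is the frames array from a vision perception tool call.
    Returns the list of file paths written.
    """
    paths = []
    for i, frame in enumerate(frames):
        path = f"{prefix}_{i}.jpg"
        with open(path, "wb") as f:
            f.write(base64.b64decode(frame["data"]))
        paths.append(path)
    return paths
```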
message_type
string

Message type indicates what product this event is used for. In this case, message_type will be "conversation".

Example:

"conversation"

event_type
string

The type of event being sent. This field is present on all events and can be used to distinguish between different event types.

Example:

"conversation.perception_tool_call"

seq
integer

A globally monotonic sequence number assigned to each event. Use this to determine the ordering of events — a higher seq means the event was sent later. This is useful for reconciling events that may arrive out of order.
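For example, a client that buffers incoming events can restore the global order with a simple sort on seq:

```python
# Events may arrive out of order; sorting on seq restores the send order.
events = [
    {"data": {"seq": 18, "event_type": "conversation.perception_tool_call"}},
    {"data": {"seq": 17, "event_type": "conversation.perception_tool_call"}},
]
ordered = sorted(events, key=lambda e: e["data"]["seq"])
print([e["data"]["seq"] for e in ordered])  # [17, 18]
```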

Example:

42

conversation_id
string

The unique identifier for the conversation.

Example:

"c123456"

turn_idx
integer

The conversation turn index. This value increments each time a conversation.respond interaction is received, and groups all events that belong to the same conversational turn. Use this to correlate events (utterances, tool calls, speaking state changes, etc.) that are part of the same turn.

Example:

3
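Correlating events that belong to the same turn can be done by bucketing on turn_idx. A minimal sketch:

```python
from collections import defaultdict

def group_by_turn(events: list) -> dict:
    """Bucket events by the conversational turn they belong to."""
    turns = defaultdict(list)
    for event in events:
        turns[event["data"]["turn_idx"]].append(event)
    return dict(turns)
```

Within each bucket, events can then be ordered by seq to reconstruct the turn.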

properties
object

Contains the tool call payload: modality ("vision" or "audio"), name, arguments, and, for vision calls, frames.