Merge branch 'main' into ben/add-changelog

fixie-ai · Dec 3, 2024 · 25960b7
2 parents 4bb8146 + 2d01082
Showing 8 changed files with 346 additions and 126 deletions.
diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs
@@ -37,9 +37,9 @@ export default defineConfig({
         label: 'Guides',
         collapsed: false,
         items: [
+          'guides/connectionoptions',
           'tools',
           'guides/stages',
-          'guides/telephony',
           'guides/clienttoolstutorial',
           'guides/callstagestutorial',
         ]
@@ -52,7 +52,11 @@ export default defineConfig({
       },
       {
         label: 'SDK',
-        link: 'sdk'
+        collapsed: false,
+        items: [
+          'sdk',
+          'datamessages'
+        ]
       },
       {
         label: 'Updates & Roadmap',

diff --git a/docs/src/content/docs/api/calls.mdx b/docs/src/content/docs/api/calls.mdx
@@ -150,6 +150,11 @@ An optional query parameter called `priorCallId` can be provided to continue a p
         <td class="font-mono">firstSpeaker</td>
         <td>string</td>
         <td>Who should talk first when the call starts. Typically set to FIRST_SPEAKER_USER for outgoing calls and left as the default (FIRST_SPEAKER_AGENT) otherwise.</td>
+      </tr>
+      <tr>
+        <td class="font-mono">initialMessages</td>
+        <td>array</td>
+        <td>The conversation history to start from for this call. See [below](#more-info) for more information.</td>
       </tr>
        <tr>
         <td class="font-mono">initialOutputMedium</td>
@@ -465,15 +470,10 @@ Lists all messages generated during the given call.
         <td>array</td>
         <td>Array of message objects. Each message object contains:</td>
       </tr>
-      <tr>
-        <td class="font-mono text-sm">ordinal</td>
-        <td>number</td>
-        <td>Ordinal position of the message. Used to determine sequence.</td>
-      </tr>
       <tr>
         <td class="font-mono text-sm">role</td>
         <td>string</td>
-        <td>Role that generated the message. Corresponds to one of the following: `USER` or `AGENT`.</td>
+        <td>Role that generated the message. Corresponds to one of the following: `MESSAGE_ROLE_USER` or `MESSAGE_ROLE_AGENT`.</td>
       </tr>
       <tr>
         <td class="font-mono text-sm">text</td>
@@ -531,4 +531,53 @@ Call recordings are only generated if you add `"recordingEnabled": true` to the
       </tr>
     </table>
   </TabItem>
-</Tabs>
+</Tabs>
+
+## More Info
+
+This section contains additional details for some properties.
+
+### initialMessages
+When creating a new call or a new call stage, you can provide messages to the agent via `initialMessages`. By default, new calls don't have initial messages and call stages inherit the prior stage's messages. New calls will inherit messages if `priorCallId` is set.
+
+These messages can give the agent call history or provide few-shot examples (e.g., if you want the agent to learn how to respond in a specific way to user input).
+
+#### Message Format
+`initialMessages` must be an array of message objects where each message contains a `role` and `text`. See "Response" under [List Call Messages](#list-call-messages) above for more details.
+
+Here's an example:
+
+```js
+[
+  {
+    "role": "MESSAGE_ROLE_USER",
+    "text": "My name is Steve"
+  },
+  {
+    "role": "MESSAGE_ROLE_AGENT",
+    "text": "Great to meet you, Steve! How can I help?"
+  }
+]
+```
+
+#### Using Mistral
+If you are using `fixie-ai/ultravox-mistral-nemo-12B` as your model, you need to do the following when creating the call:
+1. **Empty System Prompt** → Set `systemPrompt` to an empty string.
+1. **Prompt in Initial Messages** → Add a single user message to `initialMessages` that contains the system prompt.
+1. **Proper Turns** → Make sure that you follow Mistral's strict, alternating, user > agent > user message ordering.
+
+Here's an example of what the request body for creating the call might look like:
+
+```js
+{
+  "systemPrompt": "",
+  "model": "fixie-ai/ultravox-mistral-nemo-12B",
+  "initialMessages": [
+    {
+      "role": "MESSAGE_ROLE_USER",
+      "text": "You are a helpful assistant."
+    }
+  ],
+  "temperature": 0.4
+}
+```
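
The three Mistral steps documented above can be sketched as a small helper. This is a minimal illustration, not part of the Ultravox API: the function name `buildMistralCallBody` and its parameters are hypothetical, and it assumes the prompt-carrying user message counts as the first turn, so any further history must begin with an agent message.

```js
// Hypothetical helper; the name and signature are illustrative only.
const USER = "MESSAGE_ROLE_USER";
const AGENT = "MESSAGE_ROLE_AGENT";

function buildMistralCallBody(promptText, history = []) {
  // Step 2: the prompt goes in a single user message at the start.
  const initialMessages = [{ role: USER, text: promptText }, ...history];

  // Step 3: enforce strict user > agent > user alternation.
  // Assumption: the prompt message is turn 0, so even indexes are user turns.
  initialMessages.forEach((message, index) => {
    const expected = index % 2 === 0 ? USER : AGENT;
    if (message.role !== expected) {
      throw new Error(
        `Message ${index} should have role ${expected}, got ${message.role}`
      );
    }
  });

  return {
    systemPrompt: "", // Step 1: empty system prompt.
    model: "fixie-ai/ultravox-mistral-nemo-12B",
    initialMessages,
  };
}
```

Calling `buildMistralCallBody("You are a helpful assistant.")` yields the same `systemPrompt`, `model`, and `initialMessages` as the request body in the diff above; other fields such as `temperature` would be added by the caller.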
diff --git a/docs/src/content/docs/api/playground.md b/docs/src/content/docs/api/playground.md
@@ -4,9 +4,9 @@ sidebar:
   order: 99
 ---
 ## API Playground
-There is a hosted, interactive API playground available at https://api.ultravox.ai/api/schema/swagger-ui. This enables you to visually explore the Ultravox API and to make calls to the API directly in your browser.
+There is a hosted, interactive API playground available at https://app.ultravox.ai/api/schema/swagger-ui. This enables you to visually explore the Ultravox API and to make calls to the API directly in your browser.
 
-[![Ultravox OpenAPI Playground](../../../assets/apiplayground.png)](https://api.ultravox.ai/api/schema/swagger-ui)
+[![Ultravox OpenAPI Playground](../../../assets/apiplayground.png)](https://app.ultravox.ai/api/schema/swagger-ui)
 
 ## OpenAPI Specification
-We have an OpenAPI spec (OAS) file available for [download](https://api.ultravox.ai/api/schema/).
+We have an OpenAPI spec (OAS) file available for [download](https://app.ultravox.ai/api/schema/).
diff --git a/docs/src/content/docs/availablemodels.mdx b/docs/src/content/docs/availablemodels.mdx
@@ -15,12 +15,16 @@ The Ultravox API currently provides the following models.
     </tr>
     <tr>
         <td class="font-mono">fixie-ai/ultravox-70B</td>
-        <td>70B variant of Ultravox. Supports tools.</td>
+        <td>Based on Llama 3.1 70B. Supports tools.</td>
     </tr>
     <tr>
         <td class="font-mono">fixie-ai/ultravox-8B</td>
-        <td>8B variant of Ultravox. Not recommended for most use cases. Tools are unlikely to work.</td>
+        <td>Based on Llama 3.1 8B. Not recommended for most use cases. Tools are unlikely to work.</td>
     </tr>
+    <tr>
+        <td class="font-mono">fixie-ai/ultravox-mistral-nemo-12B</td>
+        <td>Based on Mistral Nemo 12B. Mistral handles the system prompt differently. See [`initialMessages`](/api/calls/#more-info).</td>
+    </tr>
 </table>
 
 ## Using Models

diff --git a/docs/src/content/docs/datamessages.mdx b/docs/src/content/docs/datamessages.mdx
@@ -0,0 +1,88 @@
+---
+title: "Data Messages"
+description: Protocol documentation for messages exchanged between client and server during Ultravox calls.
+---
+
+Data messages are used to communicate non-audio information between your client and an Ultravox server during calls. These messages work across WebRTC data channels and WebSocket connections.
+
+All messages are JSON objects with camelCase keys containing:
+- A required `type` field identifying the message type
+- Additional fields specific to each message type
+
+## Messages at a Glance
+This table lists all messages at a glance. Details on each message type appear below. The Sender column indicates who sends the message: client messages are sent from the client to the server, and server messages are sent from the server to the client.
+| Message | Sender | Description |
+| --------------------------------------------- | ------ | ---------------------------------------------------- |
+| [Ping](#ping)                                 | Client | Measures round-trip data latency.                    |
+| [Pong](#pong)                                 | Server | Server reply to a ping message.                      |
+| [State](#state)                               | Server | Indicates the server's current state.                |
+| [Transcript](#transcript)                     | Server | Contains text for an utterance made during the call. |
+| [InputTextMessage](#inputtextmessage)         | Client | Used to send a user message to the agent via text.   |
+| [SetOutputMedium](#setoutputmedium)           | Client | Sets server's output medium to text or voice.        |
+| [ClientToolInvocation](#clienttoolinvocation) | Server | Asks the client to invoke a client tool.             |
+| [ClientToolResult](#clienttoolresult)         | Client | Contains the result of a client tool invocation.     |
+| [Debug](#debug)                               | Server | Useful for application debugging.                    |
+| [PlaybackClearBuffer](#playbackclearbuffer)   | Server | Used to clear buffered output audio. WebSocket only. |
+
+
+## Ping
+A message sent by the client to measure round-trip data message latency.
+- `type: "ping"` 
+- `timestamp`: Float. Client timestamp for latency measurement.
+
+## Pong
+A message sent by the server in response to a Ping message. The timestamp is copied from the Ping message.
+- `type: "pong"`
+- `timestamp`: Float. Matching ping timestamp.
+
+## State
+A message sent by the server to indicate its current state.
+- `type: "state"`
+- `state`: Current session state
+
+## Transcript
+A message containing text transcripts of user and agent utterances.
+- `type: "transcript"`
+- `role`: "user" or "agent". Who emitted the utterance.
+- `medium`: "text" or "voice". The medium through which the utterance was emitted.
+- `text`: String. The full text of the transcript so far. Mutually exclusive with `delta`: either this or `delta` will be set.
+- `delta`: String. The additional transcript text added since the last agent transcript message. Mutually exclusive with `text`.
+- `final`: Boolean. True when no more updates are expected for this utterance.
+- `ordinal`: Integer. Used for ordering transcripts within a call.
+
+## InputTextMessage
+A user message sent via text.
+- `type: "input_text_message"`
+- `text`: String. The content of the user message.
+
+## SetOutputMedium
+Message sent by the client to set the server's output medium.
+- `type: "set_output_medium"`
+- `medium`: Either "voice" or "text".
+
+## ClientToolInvocation
+Sent by the server to ask the client to invoke a client-implemented tool with the given parameters. The client is expected to send back a ClientToolResult message with a matching `invocation_id`.
+- `type: "client_tool_invocation"`
+- `tool_name`: String. Tool to invoke.
+- `invocation_id`: String. Unique invocation ID.
+- `parameters`: Dict[String, Any]. Tool parameters.
+
+## ClientToolResult
+Contains the result of a client-implemented tool invocation.
+- `type: "client_tool_result"`
+- `invocation_id`: String. Matches corresponding invocation.
+- `result`: String. Tool execution result. Often a JSON string. May be omitted for errors.
+- `response_type`: String. Defaults to "tool-response".
+- `error_type`: Optional string. Should be omitted when result is set. Otherwise, should be "undefined" if a tool with the given name does not exist or "implementation-error" otherwise.
+- `error_message`: Optional string. Error details if the invocation failed.
+
+## Debug
+A message sent by the server to communicate debug information.
+- `type: "debug"`
+- `message`: String. Debug information
+- Disabled by default
+
+## PlaybackClearBuffer
+Message sent by the server to clear buffered output audio. Integrators should drop as much unplayed output audio as possible in order for interruptions to function properly.
+- `type: "playback_clear_buffer"`
+- WebSocket connections only
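
To make the message shapes documented above concrete, here is a minimal client-side sketch covering Ping/Pong latency, Transcript accumulation, and two client-sent messages. All function names are hypothetical and not part of any Ultravox SDK. Because the server copies the ping `timestamp` back unchanged, the unit is the client's choice; milliseconds are assumed here.

```js
// Illustrative helpers only; names are hypothetical.

// Build a Ping message. Unit of `timestamp` is an assumption (milliseconds).
function makePing(nowMs = Date.now()) {
  return { type: "ping", timestamp: nowMs };
}

// Round-trip latency from a Pong, in the same unit as the ping timestamp.
function latencyFromPong(pong, nowMs = Date.now()) {
  return nowMs - pong.timestamp;
}

// Fold one Transcript message into the text accumulated so far for an
// utterance: `text` replaces the running transcript, `delta` appends to it.
function applyTranscript(soFar, message) {
  if (typeof message.text === "string") return message.text;
  if (typeof message.delta === "string") return soFar + message.delta;
  return soFar;
}

// Build an InputTextMessage carrying a user message as text.
function makeInputText(text) {
  return { type: "input_text_message", text };
}

// Build a SetOutputMedium message; only "voice" and "text" are documented.
function makeSetOutputMedium(medium) {
  if (medium !== "voice" && medium !== "text") {
    throw new Error(`Unsupported medium: ${medium}`);
  }
  return { type: "set_output_medium", medium };
}
```

Since all data messages are JSON objects, each returned object would typically be serialized with `JSON.stringify` before being sent over the data channel or WebSocket; how the connection itself is opened is outside the scope of this sketch.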