Prepare 2.2.0-beta.1 release (Part 1) (#337)
### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatMessageContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` will automatically handle this, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.
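The predicted output and reasoning features above can be sketched as follows. This is an illustrative sketch rather than a verified sample: the exact parameter shape of `CreateStaticContentPrediction()`, the `ChatReasoningEffortLevel` value name, and the surrounding setup are assumptions based on the notes above.

```csharp
// Illustrative sketch only; the overload shapes noted below are assumptions.
using OpenAI.Chat;

// 1) Predicted output: when most of the response is known ahead of time, such as a
//    small edit to an existing file, supplying the expected text can substantially
//    speed up the request.
string existingCode = "public record Invoice(decimal Total);";
ChatCompletionOptions predictionOptions = new()
{
    OutputPrediction = ChatOutputPrediction.CreateStaticContentPrediction(
    [
        ChatMessageContentPart.CreateTextPart(existingCode), // assumed parameter shape
    ]),
};

// 2) Reasoning models (o3-mini, o1, ...): instructions go in a DeveloperChatMessage
//    instead of a SystemChatMessage, and ReasoningEffortLevel tunes how many
//    reasoning tokens the model may spend.
List<ChatMessage> reasoningMessages =
[
    new DeveloperChatMessage("Answer in one short paragraph."),
    new UserChatMessage("Why is the sky blue?"),
];
ChatCompletionOptions reasoningOptions = new()
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low, // assumed value name
};
```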

### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` property in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with other request use of collections. The dictionaries remain writeable and use both initializer syntax and range copies to produce the same effect.
joseharriaga authored Feb 7, 2025
1 parent 4cd8529 commit 0e0c460
Showing 271 changed files with 7,717 additions and 2,200 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.md
# Release History

## 2.2.0-beta.1 (Unreleased)

### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatMessageContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` will automatically handle this, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.

### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` property in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with other request use of collections. The dictionaries remain writeable and use both initializer syntax and range copies to produce the same effect.

## 2.1.0 (2024-12-04)

### Features added
70 changes: 70 additions & 0 deletions README.md
- [How to use chat completions with streaming](#how-to-use-chat-completions-with-streaming)
- [How to use chat completions with tools and function calling](#how-to-use-chat-completions-with-tools-and-function-calling)
- [How to use chat completions with structured outputs](#how-to-use-chat-completions-with-structured-outputs)
- [How to use chat completions with audio](#how-to-use-chat-completions-with-audio)
- [How to generate text embeddings](#how-to-generate-text-embeddings)
- [How to generate images](#how-to-generate-images)
- [How to transcribe audio](#how-to-transcribe-audio)

## How to use chat completions with audio

Starting with the `gpt-4o-audio-preview` model, chat completions can process audio input and output.

This example demonstrates:
1. Configuring the client with the supported `gpt-4o-audio-preview` model
1. Supplying user audio input on a chat completion request
1. Requesting model audio output from the chat completion operation
1. Retrieving audio output from a `ChatCompletion` instance
1. Using past audio output as `ChatMessage` conversation history

```csharp
// Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Input audio is provided to a request by adding an audio content part to a user message
string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav");
byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath);
BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes);
List<ChatMessage> messages =
[
    new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)),
];

// Output audio is requested by configuring ChatCompletionOptions to include the appropriate
// ResponseModalities values and corresponding AudioOptions.
ChatCompletionOptions options = new()
{
    ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
    AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3),
};

ChatCompletion completion = client.CompleteChat(messages, options);

void PrintAudioContent()
{
    if (completion.OutputAudio is ChatOutputAudio outputAudio)
    {
        Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}");
        string outputFilePath = $"{outputAudio.Id}.mp3";
        using (FileStream outputFileStream = File.OpenWrite(outputFilePath))
        {
            outputFileStream.Write(outputAudio.AudioBytes);
        }
        Console.WriteLine($"Response audio written to file: {outputFilePath}");
        Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}");
    }
}

PrintAudioContent();

// To refer to past audio output, create an assistant message from the earlier ChatCompletion, use the earlier
// response content part, or use ChatMessageContentPart.CreateAudioPart(string) to manually instantiate a part.
messages.Add(new AssistantChatMessage(completion));
messages.Add("Can you say that like a pirate?");

completion = client.CompleteChat(messages, options);

PrintAudioContent();
```

Streaming closely parallels the non-streaming case: `StreamingChatCompletionUpdate` instances can include an `OutputAudioUpdate` that may contain any of:

- The `Id` of the streamed audio content, which can be referenced by subsequent `AssistantChatMessage` instances via `ChatAudioReference` once the streaming response is complete; this may appear across multiple `StreamingChatCompletionUpdate` instances but will always be the same value when present
- The `ExpiresAt` value that describes when the `Id` will no longer be valid for use with `ChatAudioReference` in subsequent requests; this typically appears once and only once, in the final `StreamingOutputAudioUpdate`
- Incremental `TranscriptUpdate` and/or `AudioBytesUpdate` values, which can be incrementally consumed and, when concatenated, form the complete audio transcript and audio output for the overall response; many of these typically appear
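A consumption loop for these updates might look like the following sketch, assuming the streamed audio update exposes the members described above (`Id`, `ExpiresAt`, `TranscriptUpdate`, `AudioBytesUpdate`) and that `client`, `messages`, and `options` are set up as in the earlier example:

```csharp
using System.Text;

// Sketch: accumulate streamed audio by concatenating the incremental values.
StringBuilder transcript = new();
using MemoryStream audioStream = new();
string audioId = null;

foreach (StreamingChatCompletionUpdate update in client.CompleteChatStreaming(messages, options))
{
    if (update.OutputAudioUpdate is StreamingOutputAudioUpdate audioUpdate)
    {
        // The Id may appear across multiple updates but always carries the same value.
        audioId ??= audioUpdate.Id;

        // Concatenating the deltas yields the full transcript and audio output.
        transcript.Append(audioUpdate.TranscriptUpdate);
        if (audioUpdate.AudioBytesUpdate is not null)
        {
            audioStream.Write(audioUpdate.AudioBytesUpdate);
        }
    }
}

Console.WriteLine($"Transcript: {transcript}");
```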

## How to generate text embeddings

In this example, you want to create a trip-planning website that allows customers to write a prompt describing the kind of hotel that they are looking for and then offers hotel recommendations that closely match this description. To achieve this, it is possible to use text embeddings to measure the relatedness of text strings. In summary, you can get embeddings of the hotel descriptions, store them in a vector database, and use them to build a search index that you can query using the embedding of a given customer's prompt.
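The approach described above can be sketched with the library's `EmbeddingClient`; the model choice, sample data, and similarity helper below are illustrative assumptions, and a production system would store the precomputed vectors in a vector database rather than comparing them in memory:

```csharp
using OpenAI.Embeddings;

EmbeddingClient client = new("text-embedding-3-small", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

string[] hotelDescriptions =
[
    "Quiet beachfront resort with an on-site spa and ocean-view rooms.",
    "Budget hostel in the city center, steps from the train station.",
];
string customerPrompt = "A relaxing seaside hotel with wellness facilities";

// Embed the hotel descriptions (in practice precomputed and stored) and the prompt.
float[][] hotelVectors = hotelDescriptions
    .Select(d => client.GenerateEmbedding(d).Value.ToFloats().ToArray())
    .ToArray();
float[] promptVector = client.GenerateEmbedding(customerPrompt).Value.ToFloats().ToArray();

// Cosine similarity measures the relatedness of two embedding vectors.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Recommend the description most similar to the customer's prompt.
string bestMatch = hotelDescriptions
    .Zip(hotelVectors, (description, vector) => (description, score: CosineSimilarity(promptVector, vector)))
    .OrderByDescending(pair => pair.score)
    .First().description;
Console.WriteLine($"Best match: {bestMatch}");
```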