Prepare 2.2.0-beta.1 release (Part 1) (#337)
### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatMessageContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` will automatically handle this, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.
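The predicted output and reasoning features above can be sketched as follows. This is an illustrative sketch rather than a verified sample: the exact parameter shape of `CreateStaticContentPrediction()`, the `ChatReasoningEffortLevel` value name, and the surrounding setup are assumptions based on the notes above.

```csharp
// Illustrative sketch only; the overload shapes noted below are assumptions.
using OpenAI.Chat;

// 1) Predicted output: when most of the response is known ahead of time, such as a
//    small edit to an existing file, supplying the expected text can substantially
//    speed up the request.
string existingCode = "public record Invoice(decimal Total);";
ChatCompletionOptions predictionOptions = new()
{
    OutputPrediction = ChatOutputPrediction.CreateStaticContentPrediction(
    [
        ChatMessageContentPart.CreateTextPart(existingCode), // assumed parameter shape
    ]),
};

// 2) Reasoning models (o3-mini, o1, ...): instructions go in a DeveloperChatMessage
//    instead of a SystemChatMessage, and ReasoningEffortLevel tunes how many
//    reasoning tokens the model may spend.
List<ChatMessage> reasoningMessages =
[
    new DeveloperChatMessage("Answer in one short paragraph."),
    new UserChatMessage("Why is the sky blue?"),
];
ChatCompletionOptions reasoningOptions = new()
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low, // assumed value name
};
```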

### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` property in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with other request use of collections. The dictionaries remain writeable and use both initializer syntax and range copies to produce the same effect.
joseharriaga authored Feb 7, 2025
1 parent 4cd8529 commit 0e0c460
Showing 271 changed files with 7,717 additions and 2,200 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.md
# Release History

## 2.2.0-beta.1 (Unreleased)

### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatMessageContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` will automatically handle this, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.

### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` property in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with other request use of collections. The dictionaries remain writeable and use both initializer syntax and range copies to produce the same effect.

## 2.1.0 (2024-12-04)

### Features added
70 changes: 70 additions & 0 deletions README.md
- [How to use chat completions with streaming](#how-to-use-chat-completions-with-streaming)
- [How to use chat completions with tools and function calling](#how-to-use-chat-completions-with-tools-and-function-calling)
- [How to use chat completions with structured outputs](#how-to-use-chat-completions-with-structured-outputs)
- [How to use chat completions with audio](#how-to-use-chat-completions-with-audio)
- [How to generate text embeddings](#how-to-generate-text-embeddings)
- [How to generate images](#how-to-generate-images)
- [How to transcribe audio](#how-to-transcribe-audio)

## How to use chat completions with audio

Starting with the `gpt-4o-audio-preview` model, chat completions can process audio input and output.

This example demonstrates:
1. Configuring the client with the supported `gpt-4o-audio-preview` model
1. Supplying user audio input on a chat completion request
1. Requesting model audio output from the chat completion operation
1. Retrieving audio output from a `ChatCompletion` instance
1. Using past audio output as `ChatMessage` conversation history

```csharp
// Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Input audio is provided to a request by adding an audio content part to a user message
string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav");
byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath);
BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes);
List<ChatMessage> messages =
[
    new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)),
];

// Output audio is requested by configuring ChatCompletionOptions to include the appropriate
// ResponseModalities values and corresponding AudioOptions.
ChatCompletionOptions options = new()
{
    ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
    AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3),
};

ChatCompletion completion = client.CompleteChat(messages, options);

void PrintAudioContent()
{
    if (completion.OutputAudio is ChatOutputAudio outputAudio)
    {
        Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}");
        string outputFilePath = $"{outputAudio.Id}.mp3";
        using (FileStream outputFileStream = File.OpenWrite(outputFilePath))
        {
            outputFileStream.Write(outputAudio.AudioBytes);
        }
        Console.WriteLine($"Response audio written to file: {outputFilePath}");
        Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}");
    }
}

PrintAudioContent();

// To refer to past audio output, create an assistant message from the earlier ChatCompletion, use the earlier
// response content part, or use ChatMessageContentPart.CreateAudioPart(string) to manually instantiate a part.
messages.Add(new AssistantChatMessage(completion));
messages.Add("Can you say that like a pirate?");

completion = client.CompleteChat(messages, options);

PrintAudioContent();
```

Streaming closely parallels the non-streaming case: `StreamingChatCompletionUpdate` instances can include an `OutputAudioUpdate` that may contain any of:

- The `Id` of the streamed audio content, which can be referenced by subsequent `AssistantChatMessage` instances via `ChatAudioReference` once the streaming response is complete; this may appear across multiple `StreamingChatCompletionUpdate` instances but will always be the same value when present
- The `ExpiresAt` value that describes when the `Id` will no longer be valid for use with `ChatAudioReference` in subsequent requests; this typically appears once and only once, in the final `StreamingOutputAudioUpdate`
- Incremental `TranscriptUpdate` and/or `AudioBytesUpdate` values, which can be incrementally consumed and, when concatenated, form the complete audio transcript and audio output for the overall response; many of these typically appear
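A consumption loop for these updates might look like the following sketch, assuming the streamed audio update exposes the members described above (`Id`, `ExpiresAt`, `TranscriptUpdate`, `AudioBytesUpdate`) and that `client`, `messages`, and `options` are set up as in the earlier example:

```csharp
using System.Text;

// Sketch: accumulate streamed audio by concatenating the incremental values.
StringBuilder transcript = new();
using MemoryStream audioStream = new();
string audioId = null;

foreach (StreamingChatCompletionUpdate update in client.CompleteChatStreaming(messages, options))
{
    if (update.OutputAudioUpdate is StreamingOutputAudioUpdate audioUpdate)
    {
        // The Id may appear across multiple updates but always carries the same value.
        audioId ??= audioUpdate.Id;

        // Concatenating the deltas yields the full transcript and audio output.
        transcript.Append(audioUpdate.TranscriptUpdate);
        if (audioUpdate.AudioBytesUpdate is not null)
        {
            audioStream.Write(audioUpdate.AudioBytesUpdate);
        }
    }
}

Console.WriteLine($"Transcript: {transcript}");
```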

## How to generate text embeddings

In this example, you want to create a trip-planning website that allows customers to write a prompt describing the kind of hotel that they are looking for and then offers hotel recommendations that closely match this description. To achieve this, it is possible to use text embeddings to measure the relatedness of text strings. In summary, you can get embeddings of the hotel descriptions, store them in a vector database, and use them to build a search index that you can query using the embedding of a given customer's prompt.
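The approach described above can be sketched with the library's `EmbeddingClient`; the model choice, sample data, and similarity helper below are illustrative assumptions, and a production system would store the precomputed vectors in a vector database rather than comparing them in memory:

```csharp
using OpenAI.Embeddings;

EmbeddingClient client = new("text-embedding-3-small", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

string[] hotelDescriptions =
[
    "Quiet beachfront resort with an on-site spa and ocean-view rooms.",
    "Budget hostel in the city center, steps from the train station.",
];
string customerPrompt = "A relaxing seaside hotel with wellness facilities";

// Embed the hotel descriptions (in practice precomputed and stored) and the prompt.
float[][] hotelVectors = hotelDescriptions
    .Select(d => client.GenerateEmbedding(d).Value.ToFloats().ToArray())
    .ToArray();
float[] promptVector = client.GenerateEmbedding(customerPrompt).Value.ToFloats().ToArray();

// Cosine similarity measures the relatedness of two embedding vectors.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Recommend the description most similar to the customer's prompt.
string bestMatch = hotelDescriptions
    .Zip(hotelVectors, (description, vector) => (description, score: CosineSimilarity(promptVector, vector)))
    .OrderByDescending(pair => pair.score)
    .First().description;
Console.WriteLine($"Best match: {bestMatch}");
```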