When will CosyVoice be supported? #29
Comments
The documentation doesn't seem to include an HTTP API for this; I'll have to dig into the Java SDK source code.
OK, it turns out to be a WebSocket API; it isn't even HTTP.
I happened to implement this in Java once, for a hobby project of my own. Let me share the rough protocol format dashscope uses over WebSocket; I hope it helps with building the .NET version of the SDK.

Overview

The dashscope WSS endpoint is: wss://dashscope.aliyuncs.com/api-ws/v1/inference/

Once the WebSocket channel is opened, the server starts a task bound to that channel. The task-id is generated by the client; it only has to be unique within a reasonable time window, so a UUID is recommended.

Taking the full-duplex mode of cosyvoice-v1 as an example: after the channel is opened, the client issues three kinds of requests to the server in sequence: start the task, send data (multiple times), and finish the task (a Java sketch of the whole sequence follows the finish-task example below).

Start task:
{
"header": {
"action": "run-task",
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"streaming": "duplex"
},
"payload": {
"model": "cosyvoice-v1",
"task_group": "audio",
"task": "tts",
"function": "SpeechSynthesizer",
"input": {},
"parameters": {
"voice": "longxiaochun",
"volume": 50,
"text_type": "PlainText",
"sample_rate": 0,
"rate": 1.0,
"phoneme_timestamp_enabled": false,
"format": "Default",
"pitch": 1.0,
"word_timestamp_enabled": false
}
}
}

Send data (multiple times):
{
"header": {
"action": "continue-task",
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"streaming": "duplex"
},
"payload": {
"model": "cosyvoice-v1",
"task_group": "audio",
"task": "tts",
"function": "SpeechSynthesizer",
"input": {
"text": "今天天气怎么样?"
},
"parameters": {
"voice": "longxiaochun",
"volume": 50,
"text_type": "PlainText",
"sample_rate": 0,
"rate": 1.0,
"phoneme_timestamp_enabled": false,
"format": "Default",
"pitch": 1.0,
"word_timestamp_enabled": false
}
}
}

Finish task:
{
"header": {
"action": "finish-task",
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"streaming": "duplex"
},
"payload": {
"input": {}
}
}
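To make the client side concrete, here is a minimal Java sketch of that run-task / continue-task / finish-task sequence using java.net.http.WebSocket (JDK 11+; the text blocks need JDK 15+). This is not from any official SDK: the DASHSCOPE_API_KEY environment variable and the "bearer" Authorization header are my assumptions about authentication, and I have trimmed the repeated parameters block for brevity; the full payloads are exactly the JSON shown above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.UUID;

public class CosyVoiceClientSketch {
    public static void main(String[] args) {
        // Assumption: API key comes from an environment variable and is sent as a "bearer" header.
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String taskId = UUID.randomUUID().toString();   // client-generated, unique per task

        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .header("Authorization", "bearer " + apiKey)
                .buildAsync(URI.create("wss://dashscope.aliyuncs.com/api-ws/v1/inference/"),
                        new WebSocket.Listener() { })   // swap in a real listener (sketched below)
                .join();

        // 1) run-task: opens the synthesis task for this channel
        ws.sendText("""
                {"header":{"action":"run-task","task_id":"%s","streaming":"duplex"},
                 "payload":{"model":"cosyvoice-v1","task_group":"audio","task":"tts",
                  "function":"SpeechSynthesizer","input":{},
                  "parameters":{"voice":"longxiaochun","volume":50,"text_type":"PlainText",
                   "sample_rate":0,"rate":1.0,"format":"Default","pitch":1.0}}}
                """.formatted(taskId), true).join();

        // 2) continue-task: one frame per text chunk, repeat as often as needed
        ws.sendText("""
                {"header":{"action":"continue-task","task_id":"%s","streaming":"duplex"},
                 "payload":{"model":"cosyvoice-v1","task_group":"audio","task":"tts",
                  "function":"SpeechSynthesizer","input":{"text":"今天天气怎么样?"}}}
                """.formatted(taskId), true).join();

        // 3) finish-task: no more text; audio keeps streaming back until task-finished arrives
        ws.sendText("""
                {"header":{"action":"finish-task","task_id":"%s","streaming":"duplex"},
                 "payload":{"input":{}}}
                """.formatted(taskId), true).join();
    }
}

A real client would keep the connection open and wait for the task-finished event before closing; the listener sketches in the server-response section below show one way to do that.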
"header": {
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"event": "task-started",
"attributes": {}
},
"payload": {}
}

Result generated (multiple times):

The incoming data comes in two kinds: result-generated responses and binary audio chunks, with no strict ordering between them.

First kind: result-generated response (JSON). For the cosyvoice-v1 model it carries nothing of practical use, but other models return sentence- and word-level timing, pitch, phoneme and similar information here, which is handy for building subtitles and the like.

{
"header": {
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"event": "result-generated",
"attributes": {}
},
"payload": {
"output": {
"sentence": {
"words": []
}
},
"usage": null
}
}

Second kind: audio chunk (ByteBuffer). This is a raw audio chunk; you can save it to disk or feed it directly into an audio output stream to play it.
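Not from any SDK, just an illustration: a listener along these lines can collect the binary frames. The file name cosyvoice-output.audio is made up, and whether the bytes are raw PCM or a container format depends on the format parameter you requested.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletionStage;

// Appends every binary frame to an output stream (here, a file on disk).
public class AudioCollectingListener implements WebSocket.Listener {
    private final OutputStream out;

    public AudioCollectingListener() throws IOException {
        this.out = new FileOutputStream("cosyvoice-output.audio");
    }

    @Override
    public CompletionStage<?> onBinary(WebSocket webSocket, ByteBuffer data, boolean last) {
        try {
            byte[] chunk = new byte[data.remaining()];
            data.get(chunk);
            out.write(chunk);        // or feed the bytes straight into an audio output line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        webSocket.request(1);        // ask for the next frame
        return null;
    }
}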
"header": {
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"event": "task-finished",
"attributes": {}
},
"payload": {
"output": null,
"usage": {
"characters": 15
}
}
}

Task failed:
{
"header": {
"task_id": "439e0616-2f5b-44e0-8872-0002a066a49c",
"event": "task-failed",
"error_code": "InvalidParameter",
"error_message": "request timeout after 23 seconds.",
"attributes": {}
},
"payload": {}
}
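For completeness, a hypothetical listener for the text frames could watch the event field and signal the sender once the task ends. The contains() checks are a stand-in for a real JSON parser (Jackson, Gson, etc.), purely to keep the sketch dependency-free.

import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Completes taskDone once the server reports task-finished or task-failed.
public class TaskEventListener implements WebSocket.Listener {
    private final StringBuilder buffer = new StringBuilder();
    public final CompletableFuture<String> taskDone = new CompletableFuture<>();

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        buffer.append(data);                          // a message may arrive split across frames
        if (last) {
            String message = buffer.toString();
            buffer.setLength(0);
            if (message.contains("\"task-finished\"")) {
                taskDone.complete(message);           // success; usage info sits in the payload
            } else if (message.contains("\"task-failed\"")) {
                taskDone.completeExceptionally(new RuntimeException("task failed: " + message));
            }
            // "task-started" and "result-generated" need no handling for cosyvoice-v1
        }
        webSocket.request(1);
        return null;
    }
}

The sending side can then call listener.taskDone.join() after finish-task and close the socket once it completes.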
Thanks, that's very helpful. I'll set aside some time over the weekend to look into it.
It looks like a recently added speech-synthesis model. It works really well; hoping it gets supported.