Model Type | Multimodal, Audio understanding |
|
Use Cases |
Areas: | Audio analysis, Sound understanding, Music appreciation, Speech editing |
|
Primary Use Cases: | Multimodal understanding, Sound analysis, Multi-turn dialogues, Multi-language support |
|
|
Additional Notes | The model is designed to handle diverse audio and text inputs efficiently and aims to serve as a universal audio understanding model, built on Alibaba Cloud's large model series. |
|
Supported Languages | Chinese (zh), English (en) |
|
Training Details |
Methodology: | Multimodal pre-training within a multi-task training framework. |
|
|
Input Output |
Input Format: | Audio (FLAC format) and text inputs |
|
Accepted Modalities: | |
Output Format: | |
|
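
Since the card lists FLAC as the audio input format, a caller may want to check a file before passing it to the model. The following is a minimal sketch, not part of the model's own tooling: the helper names `looks_like_flac` and `is_flac_file` are hypothetical, and the check relies only on the four-byte `fLaC` marker that begins a native FLAC stream. It does not decode the audio, and it assumes no ID3v2 tag has been prepended to the file.

```python
# Minimal sketch (hypothetical helpers): detect native FLAC input by its stream marker.

FLAC_MAGIC = b"fLaC"  # four-byte marker at the start of a native FLAC stream


def looks_like_flac(data: bytes) -> bool:
    """Return True if the given bytes begin with the FLAC stream marker."""
    return data[:4] == FLAC_MAGIC


def is_flac_file(path: str) -> bool:
    """Read the first four bytes of a file and check them against the marker."""
    with open(path, "rb") as f:
        return looks_like_flac(f.read(4))
```

A file that fails this check would need to be converted to FLAC by an external tool before use; the card does not say whether other audio formats are accepted.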