| Model Type | | Multimodal, Audio understanding |
|
| Use Cases |
| Areas: | | Audio analysis, Sound understanding, Music appreciation, Speech editing |
|
| Primary Use Cases: | | Multimodal understanding, Sound analysis, Multi-turn dialogues, Multi-language support |
|
|
| Additional Notes | | The model is designed to handle diverse audio and text inputs efficiently, and aims to serve as a universal audio understanding model built on Alibaba Cloud's large model series. |
|
| Supported Languages | | Chinese (zh), English (en) |
|
| Training Details |
| Methodology: | | Multimodal pre-training within a multi-task training framework. |
|
|
| Input Output |
| Input Format: | | Audio (FLAC format) and text inputs |
|
| Accepted Modalities: | | Audio, text |
| Output Format: | |
|
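
Since the input format listed above is FLAC audio paired with text, a caller may want to check a file before passing it to the model. The following is a minimal sketch, not part of the model's own tooling: `is_flac` and `build_request` are hypothetical helper names, and the dictionary layout is an assumption made for illustration only. It relies on one fact about the format: a native FLAC stream begins with the four bytes `fLaC`.

```python
FLAC_MAGIC = b"fLaC"  # stream marker at the start of a native FLAC file


def is_flac(path):
    """Return True if the file at `path` starts with the FLAC stream marker."""
    with open(path, "rb") as f:
        return f.read(4) == FLAC_MAGIC


def build_request(audio_path, prompt):
    """Hypothetical helper: pair a FLAC file with a text prompt.

    The returned dict layout is an assumption for illustration,
    not the model's actual input schema.
    """
    if not is_flac(audio_path):
        raise ValueError(f"{audio_path} does not look like a FLAC file")
    return {"audio": str(audio_path), "text": prompt}
```

A file that carries extra data before the stream marker (for example a prepended tag block) would fail this check, so the test is deliberately conservative.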