| Model Type | | audio-language, multimodal |
|
| Use Cases |
| Areas: | | research, multimodal applications |
|
| Limitations: | | May not follow human instructions accurately. Prone to hallucination. Lacks moderation mechanisms, so it may produce harmful or inappropriate responses. |
|
| Considerations: | | Developers should assess risks based on specific applications. |
|
|
| Supported Languages | | Thai, English (native-level proficiency) |
|
| Training Details |
| Methodology: | | Incorporates Whisper's encoder and the BEATs audio encoder |
|
| Model Architecture: | | Based on the Typhoon-1.5-8b-instruct architecture |
|
|
| Input Output |
| Accepted Modalities: | |
| Output Format: | |
|