| Model Type | |
| Use Cases |
| Areas: | | research, commercial applications |
|
| Primary Use Cases: | | text generation tasks, conversational interfaces |
|
| Limitations: | | potential bias, inaccurate or harmful generation |
|
| Considerations: | | Ensure legality and security when deploying. |
|
|
| Additional Notes | | TeleChat supports DeepSpeed fine-tuning with ZeRO memory optimization and has been enhanced for multi-turn dialogue and long-text generation. |
|
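
As a rough illustration of the DeepSpeed ZeRO fine-tuning mentioned above, the sketch below builds a minimal DeepSpeed configuration as a Python dictionary and serializes it to JSON. The helper name `build_zero_config` and every value (ZeRO stage, batch size, accumulation steps, bf16) are assumptions chosen for illustration, not TeleChat's published settings. Only the configuration keys themselves are standard DeepSpeed options.

```python
import json


def build_zero_config(stage=2, micro_batch_size=1, grad_accum_steps=8):
    """Return a minimal DeepSpeed config dict (hypothetical helper, placeholder values)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # Per-GPU batch size for a single forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Number of micro-batches accumulated before each optimizer step.
        "gradient_accumulation_steps": grad_accum_steps,
        # ZeRO stage 1 partitions optimizer states, stage 2 also partitions
        # gradients, and stage 3 also partitions the parameters.
        "zero_optimization": {"stage": stage},
        # Assumed mixed-precision setting; adjust to the available hardware.
        "bf16": {"enabled": True},
    }


# Serialize to the JSON form that a DeepSpeed config file would contain.
config_json = json.dumps(build_zero_config(), indent=2)
```

Higher ZeRO stages trade extra communication for lower per-GPU memory use, so the stage is usually the first setting to adjust when a model does not fit in memory.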
| Training Details |
| Data Sources: | |
| Data Volume: | |
| Methodology: | | Standard decoder-only architecture using Rotary Embedding and the SwiGLU activation function |
|
| Hardware Used: | |
| Model Architecture: | | Standard decoder-only architecture using Rotary Embedding and the SwiGLU activation function, with the word embedding layer decoupled from the output layer |
|
|
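
To make the two architectural components named above concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block and of rotary position embedding applied to a single vector. It is not TeleChat's implementation: the function names, the interleaved pairing of dimensions, and the base of 10000 are assumptions that follow the commonly published formulations, and real implementations operate on batched tensors.

```python
import math


def silu(x):
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down @ (SiLU(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def rotary_embed(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle.

    Pair k (dimensions 2k and 2k+1) is rotated by position * base**(-2k/dim).
    The interleaved pairing is an assumption; some implementations pair
    dimension i with dimension i + dim/2 instead.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary embedding needs an even dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t])
    return out
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on the difference of their positions, which is what lets attention scores encode relative position.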
| Release Notes |
| Version: | |
| Date: | |
| Notes: | | Released 7B version chat model and quantized versions. |
|
| Version: | |
| Date: | |
| Notes: | | Released 12B version chat model and quantized versions. |
|
| Date: | |
| Notes: | | Released 1TB Chinese dataset. |
|
|
|