| Model Type | text-to-code, text-to-text, code completion, code generation, instruction following, chat |
| Use Cases | |
| Areas: | Research, Development, Education |
| Applications: | Code Completion, Code Generation, Code Conversation, Code Education |
| Primary Use Cases: | IDE code completion, Code chat applications, Interactive learning |
| Limitations: | Training data limitations, Inherent limitations of LLMs |
| Additional Notes | Model implements rigorous safety filtering. |
| Supported Languages | Primary Language (English), Code Languages (C++, C#, Go, Java, JavaScript, Kotlin, Python, Rust) |
| Training Details | |
| Data Sources: | Public code repositories, Open source mathematics datasets, Synthetically generated code |
| Data Volume: | 500 to 1000 billion tokens |
| Methodology: | |
| Hardware Used: | |
| Safety Evaluation | |
| Methodologies: | structured evaluations, internal red-teaming |
| Risk Categories: | cyber-offence capabilities, representational harms, large-scale harms |
| Input Output | |
| Input Format: | Natural language text or prompt, code prefix/suffix |
| Accepted Modalities: | |
| Output Format: | Code and natural language |
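
The Input Format entry lists "code prefix/suffix" alongside natural language prompts, which suggests fill-in-the-middle style completion. The sketch below shows one way such an input might be assembled for an IDE completion request. The marker strings `<PREFIX>`, `<SUFFIX>`, and `<MIDDLE>`, the `CompletionRequest` class, and `build_prompt` are all hypothetical names used for illustration; the table does not specify the model's actual special tokens or prompt layout.

```python
from dataclasses import dataclass

# Hypothetical sentinel markers (assumption): a real model defines its own
# special tokens, which are not given in the table above.
PREFIX_MARK = "<PREFIX>"
SUFFIX_MARK = "<SUFFIX>"
MIDDLE_MARK = "<MIDDLE>"


@dataclass
class CompletionRequest:
    """Code before the cursor (prefix) and, optionally, after it (suffix)."""
    prefix: str
    suffix: str = ""


def build_prompt(request: CompletionRequest) -> str:
    """Assemble a single prompt string from a prefix/suffix pair.

    With no suffix, the prefix is returned unchanged (plain left-to-right
    completion). With a suffix, the pieces are joined with the hypothetical
    markers so the model is asked to produce the missing middle.
    """
    if not request.suffix:
        return request.prefix
    return (
        f"{PREFIX_MARK}{request.prefix}"
        f"{SUFFIX_MARK}{request.suffix}"
        f"{MIDDLE_MARK}"
    )


if __name__ == "__main__":
    req = CompletionRequest(
        prefix="def area(r):\n    return ",
        suffix="\n\nprint(area(2.0))\n",
    )
    print(build_prompt(req))
```

Keeping the prefix-only case as a pass-through means the same helper can serve both plain completion and infilling; only the marker strings would need to change to match a specific model's documented format.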