| Model Type | text-to-text, text-to-code, decoder-only |

**Use Cases**

| Areas | code completion, code generation, code conversation, code education |
| Limitations | limitations inherited from the training data; ethical concerns |
| Additional Notes | Supports Responsible AI development |
| Supported Languages | |

**Training Details**

| Data Sources | publicly available code repositories, open-source mathematics datasets, synthetically generated code |
| Data Volume | |
| Methodology | FIM pretraining, dependency-graph-based packing, unit-test-based lexical packing |
| Hardware Used | |

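One way to read "dependency-graph-based packing" — a sketch under assumptions, not the card's confirmed implementation — is to order a repository's source files so that dependencies precede their dependents, then concatenate them into a single training sequence. The file names and contents below are invented for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical repository: maps file name -> file contents.
files = {
    "utils.py": "def helper(): ...",
    "model.py": "from utils import helper",
    "train.py": "from model import Net",
}

# Dependency edges: file -> set of files it imports (its predecessors).
deps = {
    "utils.py": set(),
    "model.py": {"utils.py"},
    "train.py": {"model.py"},
}

# static_order() yields each file only after all of its dependencies,
# so imported code appears earlier in the packed training sequence.
order = list(TopologicalSorter(deps).static_order())
packed = "\n".join(files[name] for name in order)
```

Packing related files in dependency order gives the model realistic cross-file context during pretraining, rather than seeing files in an arbitrary shuffle.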
**Safety Evaluation**

| Findings | Evaluations within acceptable thresholds for content safety and representational harms |
| Risk Categories | child safety, content safety, representational harms, memorization, large-scale harms |
| Ethical Considerations | Tested against autonomous hacking capabilities and potential harms |

**Responsible AI Considerations**

| Mitigation Strategies | Implemented safety filtering |

**Input/Output**

| Input Format | code prefix/suffix or natural-language text prompt |
| Output Format | generated code, code completions, code-related conversation |
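The prefix/suffix input format corresponds to fill-in-the-middle (FIM) prompting: the model receives the code before and after the cursor and generates the missing middle. A minimal sketch of assembling such a prompt; the sentinel token strings below are assumptions (they differ between models), not values stated in this card:

```python
# Hypothetical FIM sentinel tokens -- check the target model's
# documentation for the actual strings it was trained with.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before/after the cursor; the model is expected to
    generate the missing middle after the final sentinel."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))",
)
```

The completion the model emits after the final sentinel is then spliced back between the prefix and suffix in the editor.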