| Model Type | text-to-text, text-to-code |
|
| Use Cases |
| Areas: | research, commercial applications |

| Applications: | code completion, code generation, code conversation, code education |

| Primary Use Cases: | interactive code learning, syntax correction, coding practice |

| Limitations: | Large Language Models (LLMs) have limitations based on their training data. |
|
|
| Additional Notes | TPU hardware was used for training. Training focused on fitness for real-world applications, using structured examples and heuristic techniques. |
|
| Supported Languages | |

| Training Details |
| Data Sources: | publicly available code repositories, open source mathematics datasets, synthetically generated code |

| Data Volume: | 500 to 1000 billion tokens |

| Methodology: | |
| Hardware Used: | TPU |
|
| Safety Evaluation |
| Methodologies: | red-teaming, structured evaluations |

| Findings: | within acceptable thresholds for meeting internal policies |

| Risk Categories: | content safety, representational harms, child safety |
|
|
| Responsible AI Considerations |
| Fairness: | Evaluated with structured evaluations and internal red-teaming |
|
| Accountability: | |
|
| Input/Output |
| Input Format: | code prefix and/or suffix, or natural language text or prompt |

| Output Format: | fill-in-the-middle code completion, code and natural language |

| Performance Tips: | Provide a list of terminators to the `generate` function to ensure generation stops at the first delimiter (see the sketch below). |
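
As a hedged illustration of the tip above, here is a minimal sketch using a Hugging Face `transformers`-style API. The checkpoint id and the FIM sentinel tokens (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`, `<|file_separator|>`) are assumptions for illustration, not names confirmed by this card; substitute the identifiers documented for the model you actually use.

```python
# Minimal sketch: fill-in-the-middle completion with a terminator list.
# Assumes a transformers-compatible causal LM. MODEL_ID and the FIM
# sentinel tokens below are placeholders, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-code-model"  # hypothetical checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Code prefix and suffix surround the hole the model should fill.
prompt = "<|fim_prefix|>def add(a, b):\n    return <|fim_suffix|>\n<|fim_middle|>"
inputs = tokenizer(prompt, return_tensors="pt")

# Terminator list: generation halts at the first delimiter emitted,
# rather than running on until max_new_tokens is exhausted.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|file_separator|>"),
]

output = model.generate(
    **inputs,
    max_new_tokens=64,
    eos_token_id=terminators,  # list form: stop on the first matching id
)

# Decode only the newly generated middle segment.
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Without the terminator list, a fill-in-the-middle model may keep generating past the intended hole; stopping at the first delimiter keeps the completion scoped to the gap between prefix and suffix.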
|
|
| Release Notes |
| Version: | |
| Date: | |
| Notes: | Performance metrics and comparisons provided. |
|
|
|