| Model Type | text-to-text, text-to-code, decoder-only |

| Use Cases |
| Areas: | code completion, code generation, code conversation, code education |
| Applications: | IDE integration, conversational interfaces |
| Primary Use Cases: | code fragment question answering, natural language to code generation |
| Limitations: | |
| Considerations: | Users should follow the safety and ethical guidelines outlined by Google AI. |

| Additional Notes | CodeGemma models build on the successful Gemma model with specialized training for text-to-code tasks. |

| Supported Languages | |

| Training Details |
| Data Sources: | publicly available code repositories, open-source mathematics datasets, synthetically generated code |
| Data Volume: | 500 to 1000 billion tokens |
| Methodology: | Gemma models further trained on code, using techniques including FIM (fill-in-the-middle) |
| Hardware Used: | |
| Model Architecture: | CodeGemma models are built on top of Gemma |
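The FIM training noted under Methodology lets the model complete code between a given prefix and suffix rather than only continuing left-to-right. A minimal sketch of how such a prompt can be assembled; the sentinel token strings below follow CodeGemma's published prompt format, but they are stated here as an assumption and should be verified against the tokenizer of the checkpoint in use:

```python
# Sketch: assembling a fill-in-the-middle (FIM) prompt.
# Sentinel token names assumed from CodeGemma's documented format.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor so the model
    is asked to generate the missing middle span."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Example: ask the model to fill in the body of `add`.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
print(prompt)
```

The model's completion is then the text it generates after the final `<|fim_middle|>` sentinel.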

| Safety Evaluation |
| Methodologies: | human evaluation, cyber-offense testing, red-teaming |
| Findings: | within acceptable thresholds for meeting internal policies |
| Risk Categories: | child safety, content safety, representational harms, memorization, large-scale harms |
| Ethical Considerations: | evaluated in line with the Google AI Principles |

| Responsible AI Considerations |
| Mitigation Strategies: | Implemented rigorous safety filtering and evaluation processes. |

| Input/Output |
| Input Format: | code prefix/suffix or natural language text |
| Accepted Modalities: | |
| Output Format: | code and natural language text |