| Model Type | |
| Use Cases |
| Areas: | Research, commercial applications |
| Applications: | Image captioning, object detection, segmentation, vision-language tasks |
| Primary Use Cases: | Captioning, object detection, segmentation, vision-language tasks |
| Additional Notes | Used in research and commercial applications without explicit censoring. |
| Training Details |
| Data Sources: | |
| Data Volume: | 5.4 billion annotations across 126 million images |
| Methodology: | Prompt-based approach, sequence-to-sequence architecture |
| Model Architecture: | Sequence-to-sequence architecture |
| Input / Output |
| Input Format: | Vision tasks are specified through prompts. |
| Accepted Modalities: | |
| Output Format: | Text-based descriptions and symbols for image annotations. |
| Performance Tips: | Switch prompts to trigger different tasks. |
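
The performance tip about switching prompts can be illustrated with a small dispatcher. This is only a sketch under stated assumptions: the task names and prompt strings below are hypothetical placeholders, not the model's documented prompts, and `build_request` is an illustrative helper rather than part of any library.

```python
# Hypothetical task-to-prompt table; the real prompt strings must be
# taken from the model's own documentation.
TASK_PROMPTS = {
    "captioning": "<TASK_CAPTION>",
    "object_detection": "<TASK_DETECT>",
    "segmentation": "<TASK_SEGMENT>",
}


def build_request(task: str, image_path: str) -> dict:
    """Pair an image with the prompt that selects the requested task."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return {"image": image_path, "prompt": TASK_PROMPTS[task]}


# Same image, different prompt: only the prompt changes between tasks.
caption_request = build_request("captioning", "photo.jpg")
detection_request = build_request("object_detection", "photo.jpg")
```

Keeping the prompts in one table means a new task can be added by registering one more entry, without touching the code that calls the model.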