Phi 3 Vision 128K Instruct is an open-source language model by inventbot. Features: 4.1b LLM, VRAM: 8.3GB, Context: 128K, License: mit, Instruction-Based, LLM Explorer Score: 0.14.
Phi 3 Vision 128K Instruct Parameters and Internals
Model Type
text generation, multimodal
Use Cases
Areas:
research, commercial applications
Applications:
general AI systems, visual and text input capabilities, memory/compute constrained environments, OCR, image understanding
Primary Use Cases:
general image understanding, text generation, language understanding
Limitations:
not evaluated for all downstream purposes, inappropriate for high-risk scenarios without additional safeguards
Considerations:
Developers should ensure accuracy, safety, and fairness for their use cases.
Additional Notes
Phi-3-Vision-128K-Instruct is designed for use in latency-constrained scenarios and comes with rights for commercial use. Developers should apply responsible AI best practices.
Supported Languages
multilingual (high quality, reasoning dense)
Training Details
Data Sources:
publicly available documents, high-quality educational data and code, selected high-quality image-text interleave, synthetic data for teaching math, coding, and reasoning, newly created image data (charts, tables, diagrams), high-quality chat format supervised data
Data Volume:
500B vision and text tokens
Methodology:
supervised fine-tuning and direct preference optimization for instruction adherence
Context Length:
128000
Training Time:
1.5 days
Hardware Used:
512 H100-80G GPUs
Model Architecture:
Includes image encoder, connector, projector, and Phi-3 Mini language model
Safety Evaluation
Risk Categories:
misinformation, offensive content, bias
Ethical Considerations:
Models may produce inappropriate or offensive content. Developers should implement necessary safeguards.
Responsible Ai Considerations
Fairness:
Models can over- or under-represent groups of people and reinforce stereotypes.
Transparency:
Developers should inform users they are interacting with AI.
Accountability:
Developers are responsible for ensuring use case compliance with laws.
Mitigation Strategies:
Additional debiasing techniques and RAG for misinformation.
Input Output
Input Format:
Text and image as inputs using chat template format
Note: green Score (e.g. "73.2") means that the model is better than inventbot/Phi-3-vision-128k-instruct.
Rank the Phi 3 Vision 128K Instruct Capabilities
๐ Have you tried this model? Rate its performance. This feedback would greatly assist ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! ๐
Instruction Following and Task Automation
Factuality and Completeness of Knowledge
Censorship and Alignment
Data Analysis and Insight Generation
Text Generation
Text Summarization and Feature Extraction
Code Generation
Multi-Language Support and Translation
What open-source LLMs or SLMs are you in search of? 52721 in total.