Research on large language models, in particular the impact of training models on filtered web data.
Limitations:
Production use without adequate risk assessment and mitigation may be irresponsible or harmful.
Not suitable for non-English data, as the model was trained on English data only.
Considerations:
Finetuning and guardrail setups for production contexts.
Additional Notes
Falcon is released under the Apache 2.0 license and is intended as a research artifact.
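For context, a minimal loading sketch with the Hugging Face transformers library is shown below. It assumes a recent transformers release with built-in Falcon support and the accelerate package for device_map="auto"; the sampling settings are illustrative placeholders, not tuned recommendations.

```python
# Minimal sketch: loading tiiuae/falcon-rw-7b with Hugging Face transformers.
# Assumes a transformers version with built-in Falcon support and the
# `accelerate` package for device_map="auto"; sampling settings below are
# illustrative placeholders, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-rw-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~14GB of weights in bf16; fits a 40GB A100
    device_map="auto",
)

inputs = tokenizer("The RefinedWeb dataset is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```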
Supported Languages
English (native)
Training Details
Data Sources:
RefinedWeb
Data Volume:
350B tokens
Methodology:
Architecture and training adapted from the GPT-3 paper, with ALiBi positional biases and FlashAttention (a sketch follows this section)
Context Length:
2048 tokens
Training Time:
approximately five days
Hardware Used:
256 A100 40GB GPUs
Model Architecture:
36 layers, d_model=4096, head_dim=64
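The methodology above references ALiBi. As a minimal sketch (not Falcon's actual implementation), the function below builds the per-head linear attention biases that ALiBi adds in place of positional embeddings; the head count of 64 is inferred from the card's d_model=4096 and head_dim=64 (4096 / 64 = 64).

```python
# Minimal sketch of ALiBi attention biases (Press et al., 2021), not
# Falcon's actual implementation. ALiBi replaces positional embeddings
# with a per-head linear penalty on attention scores by key-query distance.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric slope sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads
    # (exact for head counts that are powers of two, as here).
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i: zero on the diagonal, negative for past keys.
    distance = pos[None, :] - pos[:, None]
    # bias[h, i, j] = slope_h * (j - i); positive entries (future keys)
    # are discarded by the causal mask before softmax in practice.
    return slopes[:, None, None] * distance[None, :, :]

# Tiny sequence length for illustration; falcon-rw-7b was trained with 2048.
bias = alibi_bias(num_heads=64, seq_len=4)
print(bias.shape)  # torch.Size([64, 4, 4])
print(bias[0])     # first head's bias matrix
```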
Responsible AI Considerations
Fairness:
Training on large-scale web data may cause the model to reproduce stereotypes and biases commonly found online.
Mitigation Strategies:
Finetuning the model for specific tasks and setting appropriate guardrails for production use.
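As one hypothetical illustration of what a lightweight guardrail might look like, the sketch below filters generated text against a blocklist before returning it. BLOCKED_TERMS and the refusal message are placeholders, not part of the model or its tooling; production systems would typically rely on dedicated moderation models or services instead.

```python
# Minimal sketch of a post-generation guardrail: a hypothetical blocklist
# filter applied to model output before it is returned to a user.
# BLOCKED_TERMS and the refusal message are illustrative placeholders;
# real deployments would use dedicated moderation models or services.
BLOCKED_TERMS = {"example_slur", "example_banned_phrase"}  # hypothetical

def guarded_output(text: str) -> str:
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[output withheld by content filter]"
    return text

print(guarded_output("A harmless completion."))
```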