An exploration into comparing RLHF tuning vs. 'guided'/task-specific fine-tuning on 'quality' datasets.
Limitations:
- Not trained/fine-tuned with RLHF.
- Will not be as helpful, generalizable, or safe as ChatGPT.
- The model is ~30x smaller than ChatGPT.
Additional Notes
The model was not trained with Reinforcement Learning from Human Feedback (RLHF); it instead relies on fine-tuning to generate the kind of high-quality answers humans would want.
Training Details
Data Sources:
pszemraj/HC3-textgen-qa
Methodology:
fine-tuning
Input Output
Accepted Modalities:
text
Output Format:
text
Performance Tips:
Contrastive search with top_k=4 and penalty_alpha=0.6 is used for generation.
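A minimal usage sketch of those settings with the Hugging Face `transformers` API. The repo id below (`pszemraj/pythia-6.9b-hc3-qa-assistant`) and the prompt are assumptions for illustration, not confirmed by this card:

```python
# Sketch: contrastive-search generation with the settings recommended above.
# Repo id and prompt are assumptions, not confirmed by this card.
GEN_KWARGS = {
    "top_k": 4,             # contrastive search: candidate pool size
    "penalty_alpha": 0.6,   # contrastive search: degeneration penalty
    "max_new_tokens": 256,  # illustrative generation budget
}

def generate_answer(prompt: str,
                    model_id: str = "pszemraj/pythia-6.9b-hc3-qa-assistant") -> str:
    """Lazily load the model and run contrastive-search generation."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import kept local
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GEN_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_answer("What is contrastive search in text generation?"))
```

Setting both `top_k` and `penalty_alpha` is what switches `generate()` into contrastive-search mode; omitting `penalty_alpha` would fall back to ordinary sampling/greedy behavior.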
Release Notes
Version:
pythia-6.9b-hc3-qa-assistant
Notes:
Fine-tuned for causal language modeling; final training loss 1.2598, with a validation loss of 1.2372 and a validation token accuracy of 0.6769.
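If the reported validation loss is mean token-level cross-entropy in nats (an assumption; the card does not say), it implies a validation perplexity of roughly exp(1.2372) ≈ 3.45:

```python
import math

val_loss = 1.2372  # validation loss from the release notes
# Perplexity = exp(loss), valid only if the loss is mean cross-entropy in nats.
perplexity = math.exp(val_loss)
print(f"implied validation perplexity: {perplexity:.2f}")
```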