The model requires transformers version 4.31.0 or later to load and run correctly. Decoding hyper-parameters also deserve attention for optimal performance (see Performance Tips below).
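A minimal loading sketch is shown below. It assumes the rinna/bilingual-gpt-neox-4b-8k checkpoint on the Hugging Face Hub and transformers >= 4.31.0; the use_fast=False and dtype/device settings are illustrative choices, not requirements stated here, so check the official model card for your setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rinna/bilingual-gpt-neox-4b-8k"

# use_fast=False is commonly recommended for rinna tokenizers; verify against
# the official model card for your transformers version.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

# float16 + device_map="auto" (requires accelerate) is just one reasonable
# single-GPU configuration, not the only supported one.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
```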
Supported Languages
English (proficient), Japanese (proficient)
Training Details
Data Sources:
Japanese CC-100, Japanese C4, The Pile, RedPajama, Wikipedia
Data Volume:
1.5 billion tokens
Methodology:
fine-tuning using RoPE positional interpolation (see the sketch after this list)
Context Length:
8192 tokens
Model Architecture:
A 36-layer, 2816-hidden-size transformer-based language model
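The RoPE positional-interpolation step listed above amounts to compressing position indices so that the extended context window maps back onto the position range seen during pretraining. The sketch below assumes a pretrained window of 2048 tokens extended to 8192 (a scaling factor of 4) and a generic rotary head dimension; these values are illustrative, not taken from this page.

```python
import torch

def interpolated_rope_angles(positions: torch.Tensor,
                             dim: int = 128,
                             base: float = 10000.0,
                             pretrained_len: int = 2048,
                             extended_len: int = 8192) -> torch.Tensor:
    """Rotary angles with linear position interpolation.

    Positions in [0, extended_len) are scaled by pretrained_len / extended_len
    so they never exceed the range the base model saw during pretraining.
    """
    scale = pretrained_len / extended_len                      # 0.25 for 2048 -> 8192
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    scaled_pos = positions.float() * scale                     # position interpolation
    return torch.outer(scaled_pos, inv_freq)                   # (seq_len, dim / 2)

# Angles for an 8192-token sequence; without interpolation the later positions
# would fall outside the pretrained range.
angles = interpolated_rope_angles(torch.arange(8192))
```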
Performance Tips:
The model is sensitive to decoding hyper-parameters (e.g., temperature, top_p, top_k, repetition_penalty), so explore the settings that work best for your task.
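For example, a sampling call exposing these knobs might look like the sketch below (continuing from the loading example above). The values are generic starting points for a hyper-parameter sweep, not settings recommended on this page.

```python
prompt = "The future of bilingual language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Starting-point values only; sweep these for your task.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```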