| Model Type | Long-context language model, instruction-tuned model |

| Use Cases |
| Areas: | |
| Applications: | Long-document processing, chat interactions, QA tasks |
| Primary Use Cases: | Recall tasks, retrieval-augmented generation, in-context learning, reranking, long-document QA/summarization |

| Additional Notes | The ProLong LLM series aims to extend the context length of Llama-3 models with minimal performance degradation. |

| Training Details |
| Data Sources: | Books, Textbooks, The Stack V1, StackExchange, Tulu-v2, Wikipedia, Arxiv, OpenWebMath, FineWeb, FineWeb-EDU |
| Data Volume: | 20B tokens for the 64K version, plus an additional 5B tokens for the 512K version |
| Methodology: | Efficient training techniques such as FlashAttention-2's variable-length attention and smart batching, combined with a carefully curated data mixture of short and long documents (see the sketch below this table). |
| Context Length: | 64K tokens (64K version), extended to 512K tokens (512K version) |
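
The Methodology row mentions FlashAttention-2's variable-length attention and smart batching. Below is a minimal sketch of how variable-length (packed) attention is typically invoked through the `flash_attn` library's `flash_attn_varlen_func`: several documents of different lengths are concatenated into one padding-free batch, and cumulative sequence lengths keep attention confined to each document. The document lengths, head counts, and tensor shapes are illustrative assumptions, not ProLong's actual training configuration.

```python
# Illustrative only: pack three documents of assumed lengths into one
# padding-free batch and run FlashAttention-2's variable-length kernel.
import torch
from flash_attn import flash_attn_varlen_func  # flash-attn >= 2.x, CUDA GPU required

# Assumed per-document lengths; in practice these come from the data loader.
seqlens = torch.tensor([2048, 512, 1536], dtype=torch.int32, device="cuda")
total_tokens = int(seqlens.sum())    # 4096 packed tokens, no padding
n_heads, head_dim = 32, 128          # assumed model dimensions

# Cumulative boundaries [0, 2048, 2560, 4096] mark where each document starts.
cu_seqlens = torch.zeros(seqlens.numel() + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)

# Projections for all packed tokens, shaped (total_tokens, n_heads, head_dim).
q = torch.randn(total_tokens, n_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal attention restricted to each document: tokens never attend across
# the packed-document boundaries encoded in cu_seqlens.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()),
    max_seqlen_k=int(seqlens.max()),
    causal=True,
)
print(out.shape)  # torch.Size([4096, 32, 128])
```

In this packed layout, "smart batching" typically means grouping or packing documents so that each batch fills a fixed token budget with little wasted compute; the `cu_seqlens` boundaries above are what keep causal attention from leaking across the packed documents.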