Feb 15, 2024 · We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
This is the official repository for our ICML'24 paper QuRating: Selecting High-Quality Data for Training Language Models and contains code for (1) collecting ...
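The snippets above describe selecting pre-training data according to quality ratings. As a minimal sketch of one way such ratings could drive selection (not the repository's actual implementation), the example below samples documents with probability proportional to exp(score / temperature); the function name, toy ratings, and temperature value are illustrative assumptions.

```python
import numpy as np

def sample_by_quality(scores, k, temperature=2.0, seed=0):
    """Sample k document indices with probability proportional to
    exp(score / temperature). Higher temperature -> closer to uniform."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=np.float64)
    logits = scores / temperature
    logits -= logits.max()        # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

# Toy usage: pick 3 of 6 documents given per-document quality ratings.
ratings = [0.2, 1.5, -0.3, 2.1, 0.8, 1.0]
print(sample_by_quality(ratings, k=3))
```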
Selecting high-quality pre-training data is important for creating capable language models, but existing methods rely on simple heuristics.
Jun 13, 2024 · Our best model is based on educational value and performs similarly to a model trained with uniform sampling for 50% more steps. Beyond data ...
Jun 16, 2024 · Overview · This paper introduces a method called QuRating for selecting high-quality pre-training data for large language models (LLMs).
Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models. QuRating: Selecting High-Quality Data for Training Language Models.
Feb 22, 2024 · QuRating: Selecting High-Quality Data for Training Language Models We all know that a high-quality dataset is very important in ML, ...
Mar 29, 2024 · QuRater model fine-tuned from the 1.3B Sheared-LLaMA model. From the paper: QuRating: Selecting High-Quality Data for Training Language Models.
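The snippet above refers to a QuRater model fine-tuned from a 1.3B Sheared-LLaMA backbone. Below is a rough sketch of how one might score a passage with such a fine-tuned rater using Hugging Face transformers; the model identifier, the sequence-classification head, and the single-scalar output are assumptions, and the official repository's loading code may differ.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical model identifier -- consult the official repository /
# Hugging Face page for the actual QuRater checkpoint name and usage.
MODEL_NAME = "your-org/quality-rater-1.3b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def quality_score(text: str) -> float:
    """Return a scalar quality score for a passage.
    Assumes the rater exposes a single regression-style logit;
    the released model's output format may differ."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, 0].item()

print(quality_score("The mitochondrion is the powerhouse of the cell."))
```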
Feb 16, 2024 · QuRating: Selecting High-Quality Data for Training Language Models. Alexander Wettig, Aatmik Gupta, Saumya Malik, Danqi Chen.