1 day ago · Hyperparameters: Optimizer: AdamW; Learning Rate: 0.0002 with a cosine learning rate scheduler; Epochs: 3; Batch Size: 4; Gradient Accumulation Steps: 4 ...
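The snippet above maps naturally onto a trainer configuration. A minimal sketch, assuming a Hugging Face Trainer setup (the output directory is a placeholder, and the snippet does not say which trainer was actually used):

```python
from transformers import TrainingArguments

# Hedged sketch: the quoted hyperparameters expressed as Hugging Face
# TrainingArguments. "mistral-7b-finetune" is a placeholder output path.
args = TrainingArguments(
    output_dir="mistral-7b-finetune",
    optim="adamw_torch",              # Optimizer: AdamW
    learning_rate=2e-4,               # Learning Rate: 0.0002
    lr_scheduler_type="cosine",       # cosine learning rate scheduler
    num_train_epochs=3,               # Epochs: 3
    per_device_train_batch_size=4,    # Batch Size: 4
    gradient_accumulation_steps=4,    # Gradient Accumulation Steps: 4
)
```

With a batch size of 4 and 4 gradient accumulation steps, the effective batch size per device works out to 16.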
3 days ago · The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all ...
5 days ago · The bash script contains paths to the .nemo checkpoint and the dataset, as well as other PEFT hyperparameters such as batch-size, learning-rate, etc. These ...
6 days ago · We train the Mistral-7B with a learning rate of 5e-6. For other models, the learning rates are set to 2e-5, 1e-5, and 6e-6 for the 7B/13B, 34B, and 67B ...
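Reading the truncated sentence as one rate per size group (7B/13B, 34B, 67B), the quoted values could be kept in a simple lookup; that grouping is an assumption, and the dictionary name is made up for illustration:

```python
# Assumed grouping of the truncated sentence: 7B/13B -> 2e-5, 34B -> 1e-5, 67B -> 6e-6.
FINETUNE_LEARNING_RATES = {
    "Mistral-7B": 5e-6,
    "7B": 2e-5,
    "13B": 2e-5,
    "34B": 1e-5,
    "67B": 6e-6,
}
```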
1 day ago · In this section, we presented the results for fine-tuning the MedQA training set (11.4k data) using the Mistral 7B model, our best-performing model, as a ...
6 days ago · After being fine-tuned on the WSD training data, the instruction-tuned Mistral achieves an F1 score of 65.2 (versus 78.0 without instruction tuning), while the ...
5 days ago · We used a constant learning rate for most of the training, followed by a relatively short learning rate decay stage. ... Mistral-7B-v0.1, 23.86, 22.02, 2.49 ...
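One way to express "constant learning rate for most of training, then a short decay stage" is PyTorch's SequentialLR; the step counts and the final decay factor below are illustrative assumptions, not values from the snippet:

```python
import torch
from torch.optim.lr_scheduler import ConstantLR, LinearLR, SequentialLR

total_steps = 10_000         # illustrative only
decay_steps = 1_000          # "relatively short" decay stage (assumption)

model = torch.nn.Linear(16, 16)                     # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

scheduler = SequentialLR(
    optimizer,
    schedulers=[
        # Hold the base learning rate for most of training.
        ConstantLR(optimizer, factor=1.0, total_iters=total_steps - decay_steps),
        # Then decay linearly to 10% of the base rate (final factor is an assumption).
        LinearLR(optimizer, start_factor=1.0, end_factor=0.1, total_iters=decay_steps),
    ],
    milestones=[total_steps - decay_steps],
)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
```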
4 days ago · Adam and AdamW belong to the family of adaptive learning rate optimization algorithms. They leverage the concept of momentum, which accelerates gradient descent ...
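To make "momentum plus adaptive learning rate" concrete, here is a textbook-style sketch of one AdamW update on a single tensor (simplified for illustration, not PyTorch's actual implementation; the decoupled weight-decay line is what separates AdamW from Adam):

```python
import torch

def adamw_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One simplified AdamW update on tensor p (illustration only)."""
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])            # momentum: 1st-moment EMA
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])  # adaptive scale: 2nd-moment EMA
    m_hat = m / (1 - betas[0] ** t)                            # bias correction
    v_hat = v / (1 - betas[1] ** t)
    p.mul_(1 - lr * weight_decay)                              # decoupled weight decay (the "W" in AdamW)
    p.addcdiv_(m_hat, v_hat.sqrt() + eps, value=-lr)           # per-coordinate adaptive step
    return p

# Usage: state tensors m and v start at zero; t counts update steps from 1.
p = torch.zeros(3)
m, v = torch.zeros_like(p), torch.zeros_like(p)
adamw_step(p, torch.tensor([0.1, -0.2, 0.3]), m, v, t=1)
```

In plain Adam, the weight-decay term is instead folded into the gradient before the moment updates; decoupling it from the adaptive scaling is the change AdamW introduces.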
3 days ago · Each pre-training checkpoint is fully fine-tuned for 3 epochs with a batch size of 8 and learning rates resulting from minimal hyperparameter tuning. Each task ...
3 days ago · Mistral AI's models Mistral 7B and Mixtral 8x7B have the more permissive Apache License. ... Note that training cost is much higher than inference cost ...