Changing Answer Order Can Decrease MMLU Accuracy
Authors:
Vipul Gupta,
David Pantoja,
Candace Ross,
Adina Williams,
Megan Ung
Abstract:
As large language models (LLMs) have grown in prevalence, particular benchmarks have become essential for the evaluation of these models and for understanding model capabilities. Most commonly, we use test accuracy averaged across multiple subtasks in order to rank models on leaderboards, to determine which model is best for our purposes. In this paper, we investigate the robustness of the accuracy measurement on a widely used multiple choice question answering dataset, MMLU. When shuffling the answer label contents, we find that all explored models decrease in accuracy on MMLU, but not every model is equally sensitive. These findings suggest a possible adjustment to the standard practice of leaderboard testing, where we additionally consider the percentage of examples each model answers correctly by random chance.
Submitted 27 June, 2024;
originally announced June 2024.
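The answer-shuffling perturbation described in the abstract above can be illustrated with a minimal sketch. It assumes each question is a list of answer texts plus the index of the correct one, that the labels (A, B, C, D) stay in place while the contents move, and that a model is any function from a list of choices to a predicted index. The names `shuffle_answer_contents`, `accuracy`, and `always_first` are hypothetical and are not taken from the paper, whose exact procedure may differ.

```python
import random


def shuffle_answer_contents(choices, answer_idx, rng):
    """Permute the answer texts under fixed labels.

    Returns the reordered choices and the new index of the correct answer.
    """
    order = list(range(len(choices)))
    rng.shuffle(order)  # order[k] is the original index now shown at position k
    new_choices = [choices[i] for i in order]
    new_answer_idx = order.index(answer_idx)
    return new_choices, new_answer_idx


def accuracy(examples, predict):
    """Fraction of (choices, answer_idx) pairs that predict() gets right."""
    correct = sum(1 for choices, answer_idx in examples if predict(choices) == answer_idx)
    return correct / len(examples)


def always_first(choices):
    """Hypothetical stand-in for a model with a position bias: always picks label A."""
    return 0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is reproducible
    original = [
        (["Paris", "London", "Rome", "Berlin"], 0),
        (["4", "3", "5", "22"], 0),
        (["Oxygen", "Gold", "Iron", "Neon"], 0),
    ]
    shuffled = [shuffle_answer_contents(c, a, rng) for c, a in original]
    print("original accuracy:", accuracy(original, always_first))
    print("shuffled accuracy:", accuracy(shuffled, always_first))
```

Comparing the two accuracies for the same predictor gives a rough sense of how much a score depends on where the correct answer sits rather than on its content.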
Bohmian Quantization of the Big Rip
Authors:
Nelson Pinto-Neto,
Diego Moraes Pantoja
Abstract:
It is shown in this paper that minisuperspace quantization of homogeneous and isotropic geometries with phantom scalar fields, when examined in the light of the Bohm-de Broglie interpretation of quantum mechanics, does not eliminate, in general, the classical big rip singularity present in the classical model. For some values of the Hamilton-Jacobi separation constant present in a class of quantum state solutions of the Wheeler-DeWitt equation, the big rip can be either completely eliminated or may still constitute a future attractor for all expanding solutions. This is contrary to the conclusion presented in Ref.[1], using a different interpretation of the wave function, where the big rip singularity is completely eliminated ("smoothed out") through quantization, independently of such separation constant and for all members of the above mentioned class of solutions. This is an example of the very peculiar situation where different interpretations of the same quantum state of a system are predicting different physical facts, instead of just giving different descriptions of the same observable facts: in fact, there is nothing more observable than the fate of the whole Universe.
Submitted 16 November, 2009; v1 submitted 16 November, 2009;
originally announced November 2009.