Bugs in evaluation scripts #44

xxxbrem · 2025-01-07T00:34:27Z

sf_bq085: condition_cols should be [1, 2] rather than [2, 3]. There are only 3 columns in sf_bq085.csv.
When invoking the compare_multi_pandas_table function, the order of csv_files is the reverse of condition_cols. For example, in sf_bq088, the csv_files are arranged as follows:
```
0 ='sf_bq088_c.csv'
1 ='sf_bq088_b.csv'
2 ='sf_bq088_a.csv'
```
However, the condition_cols are ordered as a, b, c. Similar reversals are observed in other cases requiring the compare_multi_pandas_table function.
sf_bq341: The top two records have the same total value, so the ignore_order parameter should be set to True.
sf_local199: The ignore_order parameter should be set to True.
sf_local193: When comparing values containing "%", the comparison is performed as strings rather than as floats.
sf_bq121: The provided result is inconsistent with the result obtained from executing the gold SQL.
sf_bq406: Another sf_bq406b could be included in a column, following the format used in other examples with multiple answers.
sf_bq375: The file type of the answer should be Java (.java) as specified in the task description. Alternatively, a less strict approach would be to ignore this column.
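
To illustrate the csv_files ordering problem described for sf_bq088, here is a minimal sketch of one possible fix: sort the gold file names so that the `_a`, `_b`, `_c` suffixes line up with condition_cols before comparing. The helper name `pair_files_with_cols` and the example column lists are hypothetical; this is not the repository's actual code, and it assumes the suffixes sort alphabetically.

```python
def pair_files_with_cols(csv_files, condition_cols):
    """Pair each gold CSV with its column list (hypothetical helper).

    Assumes file names end in _a, _b, _c, ... so that plain alphabetical
    sorting puts them in the same order as condition_cols.
    """
    ordered = sorted(csv_files)
    if len(ordered) != len(condition_cols):
        raise ValueError("expected one condition_cols entry per CSV file")
    return list(zip(ordered, condition_cols))


# File order as reported in the issue (reversed relative to a, b, c).
csv_files = ["sf_bq088_c.csv", "sf_bq088_b.csv", "sf_bq088_a.csv"]
# Illustrative column lists only; the real values live in the eval config.
condition_cols = [[0], [1], [2]]

pairs = pair_files_with_cols(csv_files, condition_cols)
for name, cols in pairs:
    print(name, cols)
```

Sorting explicitly avoids depending on whatever order the directory listing happens to return.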

lfy79001 · 2025-01-07T02:55:07Z

Hi!
Thank you for your suggestion! We have pushed the updated answers and eval config. Additionally, we have decided to maintain a document (https://docs.google.com/document/d/1a69mxO7m1nMndXp8H_-aggvYDbcbiS3rV9GPXEw-DeM/edit?usp=sharing) to track data updates!

xxxbrem · 2025-01-08T06:12:34Z

Thank you for the update!
However, for instances with multiple answer tables, the condition_cols in spider2snow_eval.jsonl should also be updated.
For example:

{"instance_id": "sf_bq406", "condition_cols": [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0]], "ignore_order": false, "toks": "99"}
{"instance_id": "sf_local193", "condition_cols": [[0, 1, 2], [0, 1, 2]], "ignore_order": false, "toks": "126"}

Additionally, for sf_local193, I recommend keeping the original answer with the "%" symbol and comparing the values numerically as percentages rather than as bare floats. This matches the task requirement, which states, "The percentage should be shown with %".
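
The percentage comparison suggested above could look roughly like the sketch below. The function names and the tolerance of 0.01 are assumptions for illustration, not part of the existing evaluation scripts. A value only matches if it carries the "%" symbol and its numeric part is close to the gold value.

```python
import math


def parse_percent(value):
    """Return the numeric part of a string like '12.5%', else None."""
    text = str(value).strip()
    if not text.endswith("%"):
        return None
    try:
        return float(text[:-1])
    except ValueError:
        return None


def percent_equal(pred, gold, abs_tol=1e-2):
    """Compare two percentage strings numerically (tolerance is an assumption)."""
    p, g = parse_percent(pred), parse_percent(gold)
    if p is None or g is None:
        # A missing "%" fails the task's formatting requirement.
        return False
    return math.isclose(p, g, abs_tol=abs_tol)


print(percent_equal("12.50%", "12.5%"))  # same value, different formatting
print(percent_equal("12.5", "12.5%"))    # prediction lacks the "%" symbol
```

This avoids the string-comparison problem (where "12.50%" and "12.5%" differ) while still enforcing the "%" format.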

lfy79001 · 2025-01-08T06:37:33Z

Thanks! We have fixed them.

For answer types such as sf_local193, where two CSVs share the same format, we use "condition_cols": [0, 1, 2]. The evaluation scripts can then expand it accordingly ([0, 1, 2] -> [[0, 1, 2], [0, 1, 2]]).
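
A minimal sketch of the expansion described above, assuming a hypothetical helper `expand_condition_cols` (the actual evaluation script may do this differently): a flat list is repeated once per gold CSV, while an already nested list is passed through unchanged.

```python
def expand_condition_cols(condition_cols, num_tables):
    """Return one column list per gold table (hypothetical helper).

    A flat list such as [0, 1, 2] is copied for every table; a nested
    list such as [[0, 1], [0]] must already have one entry per table.
    """
    is_nested = bool(condition_cols) and all(
        isinstance(cols, list) for cols in condition_cols
    )
    if is_nested:
        if len(condition_cols) != num_tables:
            raise ValueError("nested condition_cols must match the table count")
        return condition_cols
    return [list(condition_cols) for _ in range(num_tables)]


flat = expand_condition_cols([0, 1, 2], 2)
nested = expand_condition_cols([[0, 1, 2, 3, 4, 5, 6, 7, 8], [0]], 2)
print(flat)
print(nested)
```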

xxxbrem closed this as completed Jan 17, 2025
