
Bugs in evaluation scripts #44

Closed
xxxbrem opened this issue Jan 7, 2025 · 3 comments

Comments

xxxbrem commented Jan 7, 2025

  • sf_bq085: condition_cols should be [1, 2] rather than [2, 3]. sf_bq085.csv has only 3 columns, so index 3 is out of range.

  • When compare_multi_pandas_table is invoked, csv_files is passed in the reverse of the order used by condition_cols. For example, in sf_bq088, csv_files is arranged as follows:

    0 ='sf_bq088_c.csv'
    1 ='sf_bq088_b.csv'
    2 ='sf_bq088_a.csv'
    

    However, condition_cols is ordered a, b, c, so the gold files are not paired with their intended column lists. Similar reversals occur in the other instances that use compare_multi_pandas_table.

  • sf_bq341: The top two records have the same total value, so the ignore_order parameter should be set to True.

  • sf_local199: The ignore_order parameter should be set to True.

  • sf_local193: When comparing values containing "%", the comparison is performed as strings rather than as floats.

  • sf_bq121: The provided result is inconsistent with the result obtained from executing the gold SQL.

  • sf_bq406: An additional answer, sf_bq406b, could be included for that column, following the format used in other examples with multiple answers.

  • sf_bq375: The file type of the answer should be Java (.java) as specified in the task description. Alternatively, a less strict approach would be to ignore this column.
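For the sf_bq088-style reversal, one way to make the pairing independent of how the file list was produced (a directory listing, for instance, has no guaranteed order) is to sort gold files by their suffix letter before zipping them with condition_cols. This is a minimal sketch, not the Spider2 evaluation code: `pair_gold_files` is a hypothetical helper, and it assumes gold files follow the `<instance_id>_<letter>.csv` naming seen in the sf_bq088 listing, with condition_cols[0] belonging to `_a`, condition_cols[1] to `_b`, and so on.

```python
import re
from typing import List, Tuple


def pair_gold_files(csv_files: List[str], condition_cols: List[List[int]]) -> List[Tuple[str, List[int]]]:
    """Hypothetical helper: pair each gold CSV with its condition_cols entry.

    Assumes gold files are named '<instance_id>_<letter>.csv' and that
    condition_cols[0] belongs to '_a', condition_cols[1] to '_b', and so on.
    """
    def suffix(name: str) -> str:
        match = re.search(r"_([a-z])\.csv$", name)
        if match is None:
            raise ValueError(f"unexpected gold file name: {name}")
        return match.group(1)

    # Sort by suffix letter instead of trusting the order the files were listed in.
    ordered = sorted(csv_files, key=suffix)
    if len(ordered) != len(condition_cols):
        raise ValueError(f"{len(ordered)} gold CSVs but {len(condition_cols)} condition_cols entries")
    return list(zip(ordered, condition_cols))


files = ["sf_bq088_c.csv", "sf_bq088_b.csv", "sf_bq088_a.csv"]
for name, cols in pair_gold_files(files, [[0], [1], [2]]):  # column lists are illustrative
    print(name, cols)
```

Sorting by suffix keeps the fix local to the pairing step, so callers can pass files in any order.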

lfy79001 (Collaborator) commented Jan 7, 2025

Hi!
Thank you for your suggestions! We have pushed the updated answers and eval config. We have also decided to maintain a document (https://docs.google.com/document/d/1a69mxO7m1nMndXp8H_-aggvYDbcbiS3rV9GPXEw-DeM/edit?usp=sharing) to track data updates.

xxxbrem (Author) commented Jan 8, 2025

Thanks for the update!
However, for instances that now have multiple answer tables, condition_cols in spider2snow_eval.jsonl should also be updated.
For example:

{"instance_id": "sf_bq406", "condition_cols": [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0]], "ignore_order": false, "toks": "99"}
{"instance_id": "sf_local193", "condition_cols": [[0, 1, 2], [0, 1, 2]], "ignore_order": false, "toks": "126"}
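Mismatches like these (and the out-of-range [2, 3] for sf_bq085) can be caught mechanically by checking each eval entry against the shapes of its gold CSVs. The sketch below is hypothetical: `check_condition_cols` is not part of the repository, it assumes condition_cols is already in the nested form shown above (one list per gold CSV, zero-based indices), and the column counts passed in are illustrative rather than the real gold shapes.

```python
import json
from typing import Any, Dict, List


def check_condition_cols(entry: Dict[str, Any], gold_column_counts: List[int]) -> List[str]:
    """Hypothetical sanity check for one spider2snow_eval.jsonl entry.

    Assumes condition_cols is nested (one list per gold CSV) and that
    gold_column_counts[i] is the column count of the i-th gold CSV.
    Returns a list of problems; an empty list means the entry looks consistent.
    """
    problems: List[str] = []
    nested = entry["condition_cols"]
    if len(nested) != len(gold_column_counts):
        problems.append(
            f"{entry['instance_id']}: {len(nested)} condition_cols lists "
            f"for {len(gold_column_counts)} gold CSVs"
        )
    for i, (idxs, ncols) in enumerate(zip(nested, gold_column_counts)):
        # Indices are zero-based, so valid values are 0 .. ncols - 1.
        bad = [c for c in idxs if not 0 <= c < ncols]
        if bad:
            problems.append(f"{entry['instance_id']} table {i}: {bad} out of range for {ncols} columns")
    return problems


line = '{"instance_id": "sf_bq406", "condition_cols": [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0]], "ignore_order": false, "toks": "99"}'
print(check_condition_cols(json.loads(line), [9, 1]))  # column counts are illustrative
```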

Additionally, for sf_local193, I recommend keeping the original answer with the "%" symbol and comparing the values numerically in percentage form, rather than converting them to plain floats first. This matches the task requirement that "The percentage should be shown with %".
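A percentage-aware comparison along these lines would avoid the string mismatch (for example, "12.5%" and "12.50%" differ as strings) while still requiring the "%" the task asks for. This is a minimal sketch with hypothetical names (`parse_percent`, `percent_equal`); the absolute tolerance of 0.01 percentage points is an assumption, not the tolerance the evaluation scripts actually use.

```python
import math
from typing import Optional


def parse_percent(value: str) -> Optional[float]:
    """Return the numeric part of a string like '12.50%', or None if it is not a percentage."""
    s = value.strip()
    if not s.endswith("%"):
        return None
    try:
        return float(s[:-1].strip())
    except ValueError:
        return None


def percent_equal(pred: str, gold: str, abs_tol: float = 1e-2) -> bool:
    """Hypothetical check: both values must carry '%' and agree numerically within abs_tol."""
    p, g = parse_percent(pred), parse_percent(gold)
    if p is None or g is None:
        # A missing '%' fails, matching "The percentage should be shown with %".
        return False
    return math.isclose(p, g, rel_tol=0.0, abs_tol=abs_tol)


print(percent_equal("12.5%", "12.50%"))
print(percent_equal("12.5", "12.50%"))
```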

lfy79001 (Collaborator) commented Jan 8, 2025

Thanks! We have fixed them.

For answer types such as sf_local193, where two CSVs share the same format, we use a single list, "condition_cols": [0, 1, 2], and the evaluation scripts can be extended to expand it to one list per CSV ([0, 1, 2] -> [[0, 1, 2], [0, 1, 2]]).
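That expansion step could look roughly like the sketch below. `expand_condition_cols` is a hypothetical name, not a function from the Spider2 repository; it assumes a flat list means every gold CSV shares the same layout, passes nested lists through after a length check, and rejects a mix of the two forms.

```python
from typing import List, Union

ConditionCols = Union[List[int], List[List[int]]]


def expand_condition_cols(condition_cols: ConditionCols, num_gold_csvs: int) -> List[List[int]]:
    """Sketch: normalize condition_cols to one list per gold CSV."""
    nested = [isinstance(c, list) for c in condition_cols]
    if any(nested) and not all(nested):
        raise ValueError("condition_cols mixes flat indices and nested lists")
    if condition_cols and all(nested):
        # Already one list per CSV: only check that the counts agree.
        if len(condition_cols) != num_gold_csvs:
            raise ValueError(f"{len(condition_cols)} lists for {num_gold_csvs} gold CSVs")
        return [list(c) for c in condition_cols]
    # Flat list: repeat it for every CSV, copying so the lists stay independent.
    return [list(condition_cols) for _ in range(num_gold_csvs)]


print(expand_condition_cols([0, 1, 2], 2))
```

Copying each list keeps later per-table edits from silently changing the other tables' conditions.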

xxxbrem closed this as completed Jan 17, 2025