Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: do not check repeated join keys #18109

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wangxiaoying
Copy link
Contributor

@wangxiaoying wangxiaoying commented Aug 8, 2024

fix #18007

This PR attempts to fix #18007 by removing the check. It will make three of the existing tests fail (need to remove them in order to pass all tests):

FAILED tests/unit/operations/test_join.py::test_join_raise_on_repeated_expression_key_names[False] - Failed: DID NOT RAISE <class 'polars.exceptions.InvalidOperationError'>
FAILED tests/unit/operations/test_join.py::test_join_raise_on_redundant_keys - Failed: DID NOT RAISE <class 'polars.exceptions.InvalidOperationError'>
FAILED tests/unit/operations/test_join.py::test_join_raise_on_repeated_expression_key_names[True] - Failed: DID NOT RAISE <class 'polars.exceptions.InvalidOperationError'>

Can anyone check whether these checking on repeated key names are necessary (since it prevents from supporting valid SQL queries)?

@ritchie46
Copy link
Member

I don't think removing that line is correct. On that location you only join two tables, and it doesn't make sense to join on two conditions twice.

@wangxiaoying
Copy link
Contributor Author

I don't think removing that line is correct. On that location you only join two tables, and it doesn't make sense to join on two conditions twice.

Thanks for the quick reply. Indeed, if we only join two base tables it doesn't make sense to join on the same key columns. However, each input of the join may come from multiple base tables that may share the same column names. Like the examples shown in #18007 and the test cases here. In these cases, other DB engines like postgres or duckdb can handle the same query.

@ritchie46
Copy link
Member

Yes, but at that point there will be suffixed added during the previous join. Polars tables have the invariant that names are unique. Is there an example in Polars python that shows your case?

@wangxiaoying
Copy link
Contributor Author

Yes, but at that point there will be suffixed added during the previous join. Polars tables have the invariant that names are unique. Is there an example in Polars python that shows your case?

Do you mean the key columns will be renamed with suffixed in previous join? If that's the case I think this check is fine. But it seems like in my example here it throws the error.

@ritchie46
Copy link
Member

Yes, they are renamed (because of the unique names invariant). So it is up to our SQL front end to ensure that the proper names are passed to each individual join.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when joining on the same pair of key names (but from different tables)
2 participants