perf: Join op discards child ordering in unordered mode #923
TrevorBergeron merged 8 commits into main
            join=node.join,
        )
    else:
        left_unordered = self.compile_unordered_ir(node.left_child)
I'm curious whether we need to do anything to handle the sort=True behavior of the pandas join. I'm guessing "no", since I suspect it is implemented as a sort after the join.
Also, let's add a comment explaining this optimization:
-        left_unordered = self.compile_unordered_ir(node.left_child)
+        # In general, joins are an ordering destroying operation.
+        # With ordering_mode = "partial", make this explicit. In
+        # this case, we don't need to provide a deterministic ordering.
+        left_unordered = self.compile_unordered_ir(node.left_child)
added comment
And yes, sort=True is applied as an additional sort operation after the join, so it still takes effect: https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/core/blocks.py#L2073-L2080
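To illustrate the point of this thread, here is a minimal pure-Python sketch, not the bigframes implementation. It assumes a toy `Table` type plus hypothetical helpers `hash_join`, `sort_by_key`, and `join`. It shows a join that makes no ordering promise about its output, and a `sort=True` option layered on top as a separate sort step, so the sort still applies even when the join itself ignores child ordering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Table:
    """Toy stand-in for a compiled relation (hypothetical, not a bigframes type)."""
    rows: Tuple[tuple, ...]
    order_by: Optional[str] = None  # None means "no ordering guarantee"


def hash_join(left: Table, right: Table) -> Table:
    """Inner join on the first field; the result carries no ordering guarantee."""
    index = {}
    for key, value in right.rows:
        index.setdefault(key, []).append(value)
    rows = tuple(
        (key, left_value, right_value)
        for key, left_value in left.rows
        for right_value in index.get(key, [])
    )
    # The child ordering is deliberately not propagated to the output.
    return Table(rows, order_by=None)


def sort_by_key(table: Table) -> Table:
    """Separate sort step that establishes an ordering on the join key."""
    return Table(tuple(sorted(table.rows, key=lambda row: row[0])), order_by="key")


def join(left: Table, right: Table, sort: bool = False) -> Table:
    """Join first, then optionally sort as an additional operation."""
    joined = hash_join(left, right)
    return sort_by_key(joined) if sort else joined


left = Table(((2, "b"), (1, "a")))
right = Table(((1, "x"), (2, "y")))
unsorted_result = join(left, right)
sorted_result = join(left, right, sort=True)
```

In this sketch `unsorted_result.order_by` is `None`, while `sorted_result` is ordered by key regardless of how the children were ordered going into the join.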
bigframes/core/compile/compiler.py
Outdated
        if self.strict:
            compiled_ordered = [
                self.compile_ordered_ir(node) for node in node.children
            ]
            return concat_impl.concat_ordered(compiled_ordered)
        else:
            compiled_unordered = [
                self.compile_unordered_ir(node) for node in node.children
            ]
            return concat_impl.concat_unordered(compiled_unordered).as_ordered_ir()
This optimization worries me a little more than the join optimization. Could we update some documentation for concat to call out this fact? I don't find this intuitive.
Hmm, it could be confusing if concat doesn't preserve ordering. Reverting this part of the change: only join will drop child ordering for now.
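The concern about concat can be made concrete with a small pure-Python sketch. The functions below are toy stand-ins that only borrow the names `concat_ordered` and `concat_unordered` from the diff; their bodies are assumptions for illustration, not the `concat_impl` code. The ordered variant keeps each row's position (child index, then offset within the child), while the unordered variant only promises the same multiset of rows.

```python
from collections import Counter
from typing import List, Sequence


def concat_ordered(children: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate while preserving child order and row order within each child."""
    tagged = [
        (child_index, offset, row)
        for child_index, child in enumerate(children)
        for offset, row in enumerate(child)
    ]
    # Explicit ordering key: (child index, offset). The list is already in this
    # order here; the sort stands in for the ordering a strict compile must keep.
    tagged.sort(key=lambda item: (item[0], item[1]))
    return [row for _, _, row in tagged]


def concat_unordered(children: Sequence[Sequence[str]]) -> Counter:
    """Concatenate as a bag of rows: same contents, no ordering promise."""
    bag = Counter()
    for child in children:
        bag.update(child)
    return bag


ordered = concat_ordered([["a", "b"], ["c"]])
bag_one = concat_unordered([["a", "b"], ["c"]])
bag_two = concat_unordered([["c"], ["a", "b"]])
```

Here `bag_one == bag_two` even though the children were supplied in a different order, which is exactly the property a user expecting pandas-style concat would not anticipate.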