fix(rust): follow function params & aliases for left-most name #13819

mcrumiller · 2024-01-18T21:37:37Z

Resolves #13817 and resolves #13820.

Not completely sure this is the proper resolution but it fixes the issue. This does cause one test to fail so @reswqa I may need your help here. The failing test is this:

import polars as pl
df = (
    pl.DataFrame(
        {
            "col1": pl.int_range(0, 7, eager=True),
            "col2": pl.int_range(0, 7, eager=True).reverse(),
        }
    )
).with_columns(
    [
        pl.when(pl.int_range(0, pl.len(), eager=False) < 2)
        .then(None)
        .otherwise(pl.all())
        .name.suffix("_nulls")
    ]
)

polars.exceptions.ComputeError: The name: 'literal_nulls' passed to `LazyFrame.with_columns` is duplicate

Edit: ok, the reason is that pl.when() now dives into the left-most arg, whereas before it looks like the pl.all() from the otherwise branch populated the names by default. What is supposed to happen?

reswqa · 2024-01-19T06:46:57Z

When deciding the schema for a Ternary, we let the name of the truthy branch be the name of the Ternary expression. We should follow the same rule for RenameAlias, so I think it is indeed a bug to bypass the truthy branch and take the falsy branch's name, and this PR fixes it.

In addition, without this suffix, the following query would raise now. I don't think adding a suffix should prevent it from giving a duplicate-column error.

with_columns(
    pl.when(pl.int_range(0, pl.len(), eager=False) < 2)
    .then(None)
    .otherwise(pl.all())
)

mcrumiller · 2024-01-19T13:59:36Z

> When deciding the schema for a Ternary, we let the name of the truthy branch be the name of the Ternary expression. We should follow the same rule for RenameAlias, so I think it is indeed a bug to bypass the truthy branch and take the falsy branch's name, and this PR fixes it.

Question: when you say "truthy" branch, do you mean the first input (in the when case)? I don't understand otherwise, what if the result of the when clause is sometimes True and sometimes False? Which branch is truthy in that case?

With that aside....doesn't this mean that we can never do the following pattern?

pl.when(pl.some_function()).then(something).otherwise(pl.all())

In other words, pl.all() is invalid in the otherwise case, which precludes us from using a single column to affect multiple other columns. Is there a way around this? This seems like it could break a lot of stuff.

reswqa · 2024-01-19T14:08:25Z

That's the purpose of name.keep, right?

pl.when(pl.col.c).then(None).otherwise(pl.all()).name.keep()

mcrumiller · 2024-01-19T14:09:28Z

> That's the purpose of name.keep, right?

Indeed! Thanks. Let me update the failing test.

mcrumiller · 2024-01-19T14:15:46Z

Hmm, it appears we can't do .name.keep().name.suffix():

>>> pl.DataFrame({"a": [1, 2, 3]}).with_columns(pl.all().name.keep().name.suffix("_2"))
pyo3_runtime.PanicException: no `name.keep` expected at this point

mcrumiller · 2024-01-19T14:17:10Z

Also, how does .name.keep() know which expressions' names we want to keep?

mcrumiller · 2024-01-19T14:31:32Z

@reswqa I've added a workaround to the renaming in the unit test, along with a note. It's a bit messier (now have to create separate dfs and concat them) but it works.

I still don't quite understand how .name.keep() is supposed to work when there are multiple expressions present. I assume that it, like everything else, looks for the left-most expression, which means that .name.keep() should be failing as well, just like .name.suffix() now does too.

mcrumiller · 2024-01-19T14:33:41Z

A theoretical workaround: when using a pl.col in an expression, allow an extra tag that says "this is the column whose name I want to keep". Something like this:

from polars import col
df.with_columns(
    col("a").expression_chain().random_function(col("x", "y", keep_name=True)).more_functions()
)
# output: df with output columns `x` and `y`

reswqa · 2024-01-19T16:39:03Z

> I assume that it, like everything else, looks for the left-most expression, which means that .name.keep() should be failing as well, just like .name.suffix() now does too.

The key is that name.keep only looks for the leaf Expr::Column, whereas name.suffix is expected to suffix the real output name, so it has to consider more things, like Alias.
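A toy model of that distinction, written in plain Python rather than taken from the Rust planner code. The classes `Column`, `Alias`, and `BinaryOp` and the helpers `leaf_column_name` and `output_name` are hypothetical names invented for illustration. They only mimic the idea that a keep-style lookup walks down to a leaf column, while a suffix-style lookup has to start from the name the expression would actually produce.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Alias:
    inner: "Expr"
    name: str


@dataclass(frozen=True)
class BinaryOp:
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Alias, BinaryOp]


def leaf_column_name(expr: Expr) -> str:
    """keep-style lookup (assumed): skip aliases and return the left-most leaf column."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Alias):
        return leaf_column_name(expr.inner)
    return leaf_column_name(expr.left)


def output_name(expr: Expr) -> str:
    """suffix-style lookup (assumed): an alias on the left-most path wins over the column."""
    if isinstance(expr, Alias):
        return expr.name
    if isinstance(expr, Column):
        return expr.name
    return output_name(expr.left)


# Roughly: col("a").alias("renamed") combined with col("b")
expr = BinaryOp(Alias(Column("a"), "renamed"), Column("b"))

keep_result = leaf_column_name(expr)        # name of the leaf column
suffix_result = output_name(expr) + "_2"    # suffix applied to the aliased name
```

In this model the two lookups disagree as soon as an alias sits on the left-most path, which is the case the suffix path has to handle and the keep path can ignore.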

mcrumiller · 2024-01-19T16:58:04Z

Alright. So should we just proceed as if this is now correct, and wait until someone runs into an issue in the future with the when/then case?

reswqa · 2024-01-19T17:17:51Z

> until someone runs into an issue

What do you mean by issue? The test case you had to work around?

I can open a PR (it might be a little late because I have some things to do this weekend :)) that lets us support the following usage. Is this enough to address your concerns?

pl.when(pl.int_range(0, pl.len(), eager=False) < 2)
    .then(None)
    .otherwise(pl.all())
    .name.keep()
    .name.suffix("_nulls")

And its result should be:

┌──────┬──────┬────────────┬────────────┐
│ col1 ┆ col2 ┆ col1_nulls ┆ col2_nulls │
│ ---  ┆ ---  ┆ ---        ┆ ---        │
│ i64  ┆ i64  ┆ i64        ┆ i64        │
╞══════╪══════╪════════════╪════════════╡
│ 0    ┆ 6    ┆ null       ┆ null       │
│ 1    ┆ 5    ┆ null       ┆ null       │
│ 2    ┆ 4    ┆ 2          ┆ 4          │
│ 3    ┆ 3    ┆ 3          ┆ 3          │
│ 4    ┆ 2    ┆ 4          ┆ 2          │
│ 5    ┆ 1    ┆ 5          ┆ 1          │
│ 6    ┆ 0    ┆ 6          ┆ 0          │
└──────┴──────┴────────────┴────────────┘

mcrumiller · 2024-01-19T17:30:14Z

Hmm, yes I think that works. I'm still struggling a bit to follow the expected behavior though. I understand that name.keep() under the hood finds the first Column in the expression chain and keeps that, but the documentation says:

Keep the original root name of the expression

The "root name" (I think) should be what's in the pl.when expression (as of this PR), which is now "literal". So .keep_name after a when/then chain should keep whatever is in the when clause, which now is literal. So, it appears there is a bit of a disconnect between the documentation of .name.keep and the implementation. Do you agree with that assessment? Perhaps the documentation should be amended to say:

Keep the name of the first named column(s) used in the expression

reswqa · 2024-01-19T17:37:11Z

> The "root name" (I think) should be what's in the pl.when expression (as of this PR), which is now "literal"

It depends on how you define the root expr for Ternary; we actually define it as the expr inside the then() branch. After all, the final result will only come from the then/otherwise expression, not the when() expression, right? Note: It is a separate Expr, not Expr::Function.

If you accept that the first child-expr traversed is the root expr of this expression, we actually traverse when first for Expr::Ternary. Of course you can argue that the order should not be defined in this way.

mcrumiller · 2024-01-19T17:46:42Z

> After all, the final result will only come from the then/otherwise expression, not the when() expression, right?

Duh, you are absolutely right.

> Note: It is a separate Expr, not Expr::Function.

Why, then, does my PR appear to break this? when().then().otherwise(pl.all()).name.suffix() worked before, now it does not.

reswqa · 2024-01-19T17:51:50Z

> Why, then, does my PR appear to break this? when().then().otherwise(pl.all()).name.suffix() worked before, now it does not.

The traversal order is then -> otherwise -> when. Here then is a literal: before your change it did not match anything in get_single_leaf, so we moved on to otherwise, which matches Expr::Column. After this PR, then matches Expr::Literal directly, so we end up suffixing literal.
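To make the before/after behaviour concrete, here is a rough Python model of that lookup. It is not the real get_single_leaf (which lives in the Rust code and is not shown in this thread): `Column`, `Literal`, `Ternary`, `single_leaf_name`, and `suffixed_names` are stand-in names, and the `match_literal` flag is an assumption used to mimic the state before and after this PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Ternary:
    predicate: "Expr"  # the when(...) part
    truthy: "Expr"     # the then(...) part
    falsy: "Expr"      # the otherwise(...) part


Expr = Union[Column, Literal, Ternary]


def single_leaf_name(expr: Expr, match_literal: bool) -> Optional[str]:
    """Return the first leaf name found, visiting then -> otherwise -> when."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Literal):
        # Assumed model: only after the PR does a literal count as a match.
        return "literal" if match_literal else None
    for child in (expr.truthy, expr.falsy, expr.predicate):
        name = single_leaf_name(child, match_literal)
        if name is not None:
            return name
    return None


def suffixed_names(columns: List[str], match_literal: bool) -> List[str]:
    """Model otherwise(pl.all()) as one Ternary per column, then apply the suffix."""
    names = []
    for col in columns:
        expr = Ternary(predicate=Literal(True), truthy=Literal(None), falsy=Column(col))
        names.append(f"{single_leaf_name(expr, match_literal)}_nulls")
    return names


before = suffixed_names(["col1", "col2"], match_literal=False)
after = suffixed_names(["col1", "col2"], match_literal=True)
```

In this model `before` yields one distinct name per column, while `after` yields the same literal-based name twice, which is consistent with the duplicate-name error reported at the top of the thread.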

reswqa · 2024-01-19T17:54:51Z

But I think your change is correct. suffix should take into account both Literal and Alias IMO.

mcrumiller · 2024-01-19T17:55:27Z

Thanks Weijie, that makes perfect sense. I think my PR title then covers that case properly as well. I think your keep PR fix for the suffix after keep_name would be very appropriate after this. Make sure to update the docs since it says:

Notes

Due to implementation constraints, this method can only be called as the last expression in a chain.

mcrumiller · 2024-01-19T17:56:51Z

An alternative thought would be to add a suffix parameter to keep_name, which may reduce complexity with regards to the positioning of keep_name.

reswqa · 2024-01-19T18:01:48Z

> An alternative thought would be to add a suffix parameter to keep_name, which may reduce complexity with regards to the positioning of keep_name.

Well, this would really make things easier, mind setting up an issue to discuss this? I thought it was worth discussing before I started making changes.

Update: Oh, to do so, it should probably take a function: Callable[[str], str] argument, because we need to support all name mappings instead of only suffix. If people don't like this, I can support .name.keep().name.map() then.
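A small sketch of why a callable is more general than a dedicated suffix parameter: any renaming can be expressed as a `str -> str` function applied to the kept root name. The helper `keep_and_map` below is purely illustrative and is not a Polars API; it only shows the shape of the proposed argument.

```python
from typing import Callable


def keep_and_map(root_name: str, function: Callable[[str], str] = lambda s: s) -> str:
    """Hypothetical helper: keep the root name, then apply an arbitrary mapping to it."""
    return function(root_name)


unchanged = keep_and_map("col1")                          # plain keep
suffixed = keep_and_map("col1", lambda s: s + "_nulls")   # suffix as a special case
prefixed = keep_and_map("col1", lambda s: "raw_" + s)     # prefix as a special case
uppercased = keep_and_map("col1", str.upper)              # any other mapping
```

With the identity function as the default, calling the helper with no mapping behaves like a plain keep, so a single parameter would cover suffix, prefix, and custom mappings.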

Follow function params for left-most name

1828a38

mcrumiller requested review from ritchie46, stinodego, c-peters, alexander-beedie, MarcoGorelli and orlp as code owners January 18, 2024 21:37

github-actions bot added the fix (Bug fix) and rust (Related to Rust Polars) labels Jan 18, 2024

Add Alias to leaf exprs

fb78332

mcrumiller changed the title from "fix(rust): follow function params for left-most name" to "fix(rust): follow function params & aliases for left-most name" Jan 18, 2024

mcrumiller added 2 commits January 18, 2024 17:08

Format

8b51181

Add test with alias

eed0802

Fix unit test renaming

412cc32

mcrumiller mentioned this pull request Jan 19, 2024

Add function: Callable[[str], str] argument to name.keep #13858

Open

stinodego requested a review from reswqa as a code owner June 28, 2024 17:53

ritchie46 force-pushed the main branch from 0a696ff to 9c29683 Compare July 28, 2024 08:11
