fix: read_csv with both index_col and use_cols inconsistent with pandas#1785
032f193 to 8d6d9ee
0785cf8 to 344d6c9
bigframes/session/__init__.py
Outdated
    index_col=index_col,
    columns=columns,
    names=names,
    is_index_in_columns=True,
I'm a bit confused by this parameter name. Wouldn't the read_gbq_table function be able to figure out that the index columns are present already?
Renamed to index_col_in_columns and added a docstring.
bigframes/session/loader.py
Outdated
-def _check_column_duplicates(index_cols: Iterable[str], columns: Iterable[str]):
+def _check_column_duplicates(
+    index_cols: Iterable[str], columns: Iterable[str], is_index_in_columns: bool
After looking at the logic, I still don't understand the is_index_in_columns name. If there isn't a better name, could we at least add some docstrings with more information?
Renamed to index_col_in_columns and added a docstring.
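For readers following this thread, here is a rough sketch of what a flag like index_col_in_columns could control. It is not the code from this PR: the function name `check_column_duplicates`, the error messages, and the exact semantics are assumptions made for illustration. The assumed idea is that in the read_csv path the column selection is expected to list the index column too (as with pandas `usecols`), so overlap between index and columns is expected there and an error elsewhere.

```python
from collections import Counter
from typing import Iterable


def check_column_duplicates(
    index_cols: Iterable[str],
    columns: Iterable[str],
    index_col_in_columns: bool,
) -> None:
    """Hypothetical validation helper; the semantics are assumed, not the PR's.

    index_col_in_columns=True: every index column must also appear in
    `columns` (assumed read_csv convention). False: the two must not overlap.
    """
    index_cols = list(index_cols)
    columns = list(columns)

    # Repeated names inside either argument are always rejected.
    for label, names in (("index_col", index_cols), ("columns", columns)):
        dupes = sorted(name for name, count in Counter(names).items() if count > 1)
        if dupes:
            raise ValueError(f"Duplicate names in {label}: {dupes}")

    if index_col_in_columns:
        missing = [name for name in index_cols if name not in columns]
        if missing:
            raise ValueError(f"index_col {missing} not found in columns")
    else:
        overlap = [name for name in index_cols if name in columns]
        if overlap:
            raise ValueError(f"{overlap} used as both index_col and columns")
```

Under these assumptions, `check_column_duplicates(["id"], ["id", "name"], index_col_in_columns=True)` passes, while the same call with `index_col_in_columns=False` raises a `ValueError`.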
tests/system/small/test_session.py
Outdated
    # BigFrames requires `sort_index()` because BigQuery doesn't preserve row IDs
    # (b/280889935) or guarantee row ordering.
    bf_df = bf_df.sort_index()
Don't we sort by the index already if we determine it's unique?
Good catch! Removed it from all similar tests.
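To illustrate the point raised in this thread, here is a toy model of "sort by the index when it is unique". It uses plain Python and is not BigFrames code: the function name `order_rows` and the row representation (a list of dicts) are assumptions made for the example. If the loader already applies an ordering like this, an extra `sort_index()` in the test adds nothing.

```python
from typing import Any, Dict, List, Sequence


def order_rows(rows: Sequence[Dict[str, Any]], index_col: str) -> List[Dict[str, Any]]:
    """Sort rows by index_col when its values are unique.

    Hypothetical helper: when the index has repeated values it does not
    define a total order, so the rows are returned in the order given.
    """
    keys = [row[index_col] for row in rows]
    if len(set(keys)) == len(keys):
        return sorted(rows, key=lambda row: row[index_col])
    return list(rows)
```

With a unique index the output is deterministic regardless of the order the backend returned the rows in, which is the property the removed `sort_index()` call was meant to provide.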
344d6c9 to 7e59b20
Fixes internal issue 408499371 🦕