BUG: Pandas 1.1.5 location-based indexing error with quantized pivot table #38367

tgaddair · 2020-12-08T16:08:38Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

input_df = pd.DataFrame(**{
    'index': [0, 1], 
    'columns': ['loss', 'category_64973.fc_size', 'category_64973.num_fc_layers', 'training.learning_rate'], 
    'data': [[1.0549572706222534, 240, 2, 0.0014908184659929895], [1.225046157836914, 160, 2, 0.0013734204727201226]]
})

input_df['training.learning_rate'] = pd.qcut(
    input_df['training.learning_rate'],
    q=10,
    precision=3,
    duplicates='drop',
)

data = input_df.pivot_table(
    index='category_64973.fc_size',
    columns='training.learning_rate',
    values='loss',
    aggfunc='mean'
)

# Seaborn code starts here
mask = np.zeros(data.shape, bool)
mask = pd.DataFrame(mask,
                    index=data.index,
                    columns=data.columns,
                    dtype=bool)

mask | pd.isnull(data)

Problem description

An error occurs when attempting to plot a quantized pivot table using Seaborn with the latest version of Pandas (1.1.5).

The code above is a self-contained example of what Seaborn does when heatmap() is called on the input pivot table (data). See this usage in the Ludwig framework: https://github.com/uber/ludwig/blob/master/ludwig/utils/visualization_utils.py#L1392. Prior to v1.1.5, this code worked and was used to generate plots in Ludwig.

The stack trace is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    701             try:
--> 702                 self._validate_key(k, i)
    703             except ValueError as err:

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   1368         else:
-> 1369             raise ValueError(f"Can only index by location with a [{self._valid_types}]")
   1370 

ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-1-e654830c5b85> in <module>
     32                     dtype=bool)
     33 
---> 34 mask | pd.isnull(data)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level, fill_value)
    638             self, other, op, axis, default_axis, fill_value, level
    639         ):
--> 640             return _frame_arith_method_with_reindex(self, other, op)
    641 
    642         if isinstance(other, ABCSeries) and fill_value is not None:

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _frame_arith_method_with_reindex(left, right, op)
    572     )
    573 
--> 574     new_left = left.iloc[:, lcols]
    575     new_right = right.iloc[:, rcols]
    576     result = op(new_left, new_right)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    871                     # AttributeError for IntervalTree get_value
    872                     pass
--> 873             return self._getitem_tuple(key)
    874         else:
    875             # we by definition only have the 0th axis

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1441     def _getitem_tuple(self, tup: Tuple):
   1442 
-> 1443         self._has_valid_tuple(tup)
   1444         try:
   1445             return self._getitem_lowerdim(tup)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    705                     "Location based indexing can only have "
    706                     f"[{self._valid_types}] types"
--> 707                 ) from err
    708 
    709     def _is_nested_tuple_indexer(self, tup: Tuple) -> bool:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Note that this last mask | pd.isnull(data) operation succeeds with Pandas 1.1.4, with all other dependencies left the same.

Expected Output

The mask | pd.isnull(data) call should succeed.
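
Until a fix is released, one possible way to sidestep the error is to drop the unused quantile bins before pivoting, so that the pivot table's columns only declare categories that actually occur. The sketch below is an assumption rather than something confirmed in this thread, and it has not been checked against pandas 1.1.5 here. The column names `fc_size` and `learning_rate` and the rounded values are simplified stand-ins for the data in the report.

```python
import numpy as np
import pandas as pd

# Simplified stand-in for the frame in the report (names and values are illustrative).
df = pd.DataFrame({
    "loss": [1.05, 1.23],
    "fc_size": [240, 160],
    "learning_rate": [0.0015, 0.0014],
})

# qcut declares one category per quantile bin, even for bins that stay empty.
binned = pd.qcut(df["learning_rate"], q=10, precision=3, duplicates="drop")

# Assumed workaround: keep only the categories that are actually used.
df["learning_rate"] = binned.cat.remove_unused_categories()

data = df.pivot_table(
    index="fc_size",
    columns="learning_rate",
    values="loss",
    aggfunc="mean",
    observed=False,  # the default at the time of this report, made explicit
)

# Same mask construction as in the code sample above.
mask = pd.DataFrame(
    np.zeros(data.shape, dtype=bool),
    index=data.index,
    columns=data.columns,
)
combined = mask | data.isnull()
print(combined)
```

Pinning pandas below 1.1.5, as done in ludwig-ai/ludwig#1046, remains the workaround that was actually used downstream.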

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : b5958ee
python : 3.7.8.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : 0.29.21
pytest : 6.1.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.2
bottleneck : None
fsspec : 0.8.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.52.0

simonjayhawkins · 2020-12-10T14:25:07Z

Thanks @tgaddair for the report

> Note that this last mask | pd.isnull(data) operations succeeds with Pandas 1.1.4 and all other dependencies being left the same.

first bad commit: [e99e5ab] BUG: Fix duplicates in intersection of multiindexes (#36927) cc @phofl

phofl · 2020-12-10T17:35:14Z

Yikes,
unique() on a categorical index deletes unused categories, while intersection does not.

@jbrockmendel Is this expected? If yes, any suggestions on how to handle this case? The problem lies in pandas/core/ops/__init__.py (line 310 in ec8240a):

if fill_value is None and level is None and axis is default_axis:
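
To see where the unused categories come from, the following minimal sketch bins two values into deciles, as the report does, and counts declared versus used categories. The values and the helper `n_categories` are illustrative assumptions. Whether `unique()` and `intersection()` keep the unused categories differs between pandas versions, so the sketch prints those counts instead of assuming them.

```python
import pandas as pd


def n_categories(index):
    """Hypothetical helper: declared category count, or None if not categorical."""
    if isinstance(index, pd.CategoricalIndex):
        return len(index.categories)
    return None


# Two values split into deciles: qcut declares ten interval categories,
# but only the lowest and the highest bin receive a value.
rates = pd.Series([0.0015, 0.0014], name="rate")
binned = pd.qcut(rates, q=10, precision=3, duplicates="drop")
columns = pd.CategoricalIndex(binned)

declared = n_categories(columns)
used = n_categories(columns.remove_unused_categories())
print(f"declared categories: {declared}, categories in use: {used}")

# Version-dependent: compare what unique() and intersection() report.
print("after unique():      ", n_categories(columns.unique()))
print("after intersection():", n_categories(columns.intersection(columns)))
```

On a pandas version affected by this issue, the last two lines would be expected to print different counts, which is the mismatch described above.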

jbrockmendel · 2020-12-10T17:52:11Z

Would #38140 fix this?

phofl · 2020-12-11T17:35:35Z

Yep, this would fix this

simonjayhawkins · 2020-12-11T17:42:29Z

#38140 is milestoned for 1.3

we will probably want a fix in place for 1.2 since this is a regression.

phofl · 2020-12-11T17:46:01Z

@simonjayhawkins

We could apply unique to the intersection if it is categorical. This is an ugly fix, which could be removed once #38140 is merged.
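
The idea can be sketched roughly as follows. This is only an illustration of the suggestion, not the patch that was eventually merged, and `common_columns` is a hypothetical helper name that does not exist in pandas.

```python
import pandas as pd


def common_columns(left: pd.Index, right: pd.Index) -> pd.Index:
    """Hypothetical helper: intersect two column indexes and, for categorical
    results, pass the intersection through unique() as well."""
    cols = left.intersection(right)
    if isinstance(cols, pd.CategoricalIndex):
        # Give the intersection the same treatment as the unique() columns
        # it is later compared against.
        cols = cols.unique()
    return cols


# "c" is declared as a category but never used.
cats = ["a", "b", "c"]
left = pd.CategoricalIndex(["a", "b"], categories=cats)
right = pd.CategoricalIndex(["a", "b"], categories=cats)

cols = common_columns(left, right)
print(list(cols))
print(cols.equals(left.unique()))
```

Because both sides of the comparison go through `unique()`, they should agree on their categories regardless of how a given pandas version treats unused ones.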

simonjayhawkins · 2020-12-16T16:10:17Z

@phofl if you could put together a PR (suitable for merging before the 1.2 release) for review, that would be great.

tgaddair added the Bug and Needs Triage labels Dec 8, 2020

tgaddair mentioned this issue Dec 8, 2020

Pin pandas<1.1.5 until pivot table issue is resolved ludwig-ai/ludwig#1046

Merged

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 10, 2020

code sample for pandas-dev#38367

979b5f2

simonjayhawkins added the Indexing and Regression labels and removed the Needs Triage label Dec 10, 2020

phofl mentioned this issue Dec 16, 2020

BUG: Regression in logical ops raising ValueError with Categorical columns with unused categories #38532

Merged

simonjayhawkins added this to the 1.2 milestone Dec 18, 2020

jreback closed this as completed in #38532 Dec 21, 2020
