pandas 2.2.x incompatible: pivot on ArrowDtype dictionary columns crashes ('Series' object has no attribute '_pa_array') #52

@Haigutus

Symptom

On pandas 2.2.x with the arrow parser engines (the default in published wheels), any pivot over the dictionary-encoded columns crashes:

AttributeError: 'Series' object has no attribute '_pa_array'

Hit via type_tableview / tableview_by_type, key_tableview, id_tableview — and equally in user code that pivots/groups the KEY/INSTANCE_ID columns of a triplets DataFrame.

Root cause

The arrow engines return KEY/INSTANCE_ID as ArrowDtype dictionary columns (dictionary<values=string, indices=int32>[pyarrow]). pandas 2.2.x has an upstream bug: in the call chain pivot() → MultiIndex.from_arrays → Categorical(...), pandas accesses _pa_array on the wrong object for such columns. Fixed upstream in pandas 2.3.

Verified matrix (pyarrow 24): pandas 2.0.3 OK · 2.1.4 OK · 2.2.3 crash · 3.0.3 OK.

Resolution (#51, on main — not yet released)

The buggy version is excluded in the dependency metadata: pandas>=2.0,!=2.2.*. A code workaround (#49, converting dictionary columns to plain category) was tried first and reverted — the exclusion keeps the code simpler and also protects users from hitting the same pandas bug in their own code on the frames triplets hands them, which a code-side fix could not.

Verified: installing triplets into a pandas 2.2.3 environment now forces the resolver onto a non-buggy pandas.

Workaround for ≤ 0.1.0rc4 (published metadata is immutable)

  • upgrade pandas to ≥ 2.3 (or stay on 2.0.x/2.1.x), or
  • parse with categorical_columns=None (no dictionary columns)
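To check whether an existing environment falls in the excluded range, a small standard-library sketch such as the following may help. The function names are hypothetical, and it treats every 2.2.x release as affected, mirroring the `!=2.2.*` exclusion rather than testing for the bug itself:

```python
from importlib.metadata import PackageNotFoundError, version


def pandas_is_affected(version_string):
    """True for any pandas 2.2.x version string (the range excluded by !=2.2.*)."""
    return version_string.split(".")[:2] == ["2", "2"]


def installed_pandas_is_affected():
    """Check the installed pandas; returns None if pandas is not installed."""
    try:
        return pandas_is_affected(version("pandas"))
    except PackageNotFoundError:
        return None


print("installed pandas affected:", installed_pandas_is_affected())
```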

Keeping open until a release ships the exclusion.
