bed_to_regions silently skips strand mapping when strand is a polars Categorical

## Summary

`_dataset/_utils.py::bed_to_regions` checks `bed.schema.get('strand', None) == pl.Utf8` before mapping `+`/`-` to `1`/`-1`. When `strand` is a Categorical column (which is what `gvl.write` stores when the input BED has a repeated strand vocabulary, a very common case), the check returns False, and the fallback branch `cols.append(pl.col('strand'))` appends the column unmapped, still holding `+`/`-` strings.

`.to_numpy()` on the resulting polars frame then produces `dtype=object` (int columns + string strand). Downstream, the njit-compiled `get_diffs_sparse` and `reconstruct_haplotypes_from_sparse` fail with:

```
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, A)
```

## Reproduction

On GVL 0.21.4 + polars 1.40 + numba 0.65:

```python
import polars as pl
import genvarloader as gvl

bed = pl.DataFrame({
    'chrom': ['chr17'] * 3,
    'chromStart': [100, 200, 300],
    'chromEnd': [150, 250, 350],
    'strand': pl.Series(['+', '+', '-'], dtype=pl.Categorical),
    'transcript_id': ['t1', 't1', 't2'],
    'exon_number': [1, 2, 1],
})
gvl.write('/tmp/test_gvl', bed, '<some_pgen>')
ds = gvl.Dataset.open('/tmp/test_gvl', reference='...').with_seqs('haplotypes')
ds[0, 0]  # -> numba TypingError
```

## Fix

Two small changes in `_dataset/_utils.py::bed_to_regions`:

```diff
-    if bed.schema.get('strand', None) == pl.Utf8:
+    if bed.schema.get('strand', None) in (pl.Utf8, pl.String, pl.Categorical, pl.Enum):
         cols.append(
-            pl.col('strand').replace_strict({'+': 1, '-': -1}, return_dtype=pl.Int32)
+            pl.col('strand').cast(pl.String).replace_strict({'+': 1, '-': -1}, return_dtype=pl.Int32)
         )
```

(`.cast(pl.String)` is a no-op on String columns and unwraps Categorical/Enum cleanly.)

Verified locally: `ds._full_regions.dtype` goes from `object` to `int32`, and `ds[i, j]` returns the expected `Ragged` without tripping numba.

Happy to open a PR if helpful.
