Fix ValueError by nonuniform grid initialization (src/GriTS.py)#208

Open
Mennaa-Ayman wants to merge 3 commits into microsoft:main from Mennaa-Ayman:GriTS/Updates
@Mennaa-Ayman
When GriTS evaluates model predictions, the predicted cells are often inconsistent and may not perfectly tile the grid. This leaves holes in the table structure (grid coordinates with no assigned cell).

Currently, cells_to_grid and cells_to_relspan_grid initialize cell_grid as a 2D matrix of scalar floats (0.0). When a cell is found, the code overwrites that 0.0 with a list (bbox/relspan) or a string (text). If a cell is missing, which is common in model predictions, the 0.0 remains.

The result: np.array() fails with a ValueError because it cannot cast a mixture of floats and lists/strings into a fixed-dimension numerical array.
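
The failure can be reproduced in isolation. The sketch below is not taken from src/GriTS.py: the 2x2 grid, the bbox values, and the helper name converts_cleanly are made up for illustration, and it assumes NumPy is installed. Recent NumPy releases raise a ValueError for this kind of ragged input, while older releases return an object array with a warning, so the helper treats both outcomes as a failed conversion.

```python
import warnings

import numpy as np


def converts_cleanly(grid):
    """Return True if np.array(grid) yields a numeric (rows, cols, 4) array."""
    try:
        with warnings.catch_warnings():
            # Older NumPy only warns about ragged input; silence that here.
            warnings.simplefilter("ignore")
            arr = np.array(grid)
    except ValueError:
        # Newer NumPy refuses ragged nested sequences outright.
        return False
    return arr.ndim == 3 and arr.shape[2] == 4 and arr.dtype != object


# Old-style initialization: every slot starts as the scalar 0.0.
holey_grid = np.zeros((2, 2)).tolist()
holey_grid[0][0] = [10, 10, 50, 20]  # only one cell was predicted

# Uniform initialization: every slot starts as a 4-item list.
uniform_grid = [[[0, 0, 0, 0] for _ in range(2)] for _ in range(2)]
uniform_grid[0][0] = [10, 10, 50, 20]

print(converts_cleanly(holey_grid))
print(converts_cleanly(uniform_grid))
```

Only the bbox/relspan case is shown here; the text grid is not covered by this sketch.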

Changes
1. Uniform Grid Initialization
Replaced np.zeros(...).tolist() with nested list comprehensions so that every cell in the grid starts with a placeholder value that matches the expected data type.

  • Relspan/Bbox: Now initialized with a 4-item list of zeros.
    cell_grid = [[[0, 0, 0, 0] for _ in range(num_columns)] for _ in range(num_rows)]

  • Text: Now initialized with empty strings.
    cell_grid = [["" for _ in range(num_columns)] for _ in range(num_rows)]
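
The two initializations can be combined into one fill routine. What follows is a minimal sketch under stated assumptions, not the code from this PR: the helper names make_grid and fill_grid are hypothetical, and the cell dictionary keys (row_nums, column_nums, bbox, cell_text) are assumed for illustration. It needs nothing beyond the standard library.

```python
def make_grid(num_rows, num_columns, key="bbox"):
    """Build a grid whose placeholder matches the type stored under `key`."""
    if key == "cell_text":  # assumed key name for text cells
        return [["" for _ in range(num_columns)] for _ in range(num_rows)]
    # bbox/relspan values are 4-item lists; create a fresh list per slot.
    return [[[0, 0, 0, 0] for _ in range(num_columns)] for _ in range(num_rows)]


def fill_grid(cells, key="bbox"):
    """Place each cell's value at every (row, column) position it covers."""
    if not cells:
        return [[]]
    num_rows = max(max(c["row_nums"]) for c in cells) + 1
    num_columns = max(max(c["column_nums"]) for c in cells) + 1
    grid = make_grid(num_rows, num_columns, key)
    for c in cells:
        for r in c["row_nums"]:
            for col in c["column_nums"]:
                grid[r][col] = c[key]
    return grid


# Two predicted cells on the diagonal; the other two positions are holes.
cells = [
    {"row_nums": [0], "column_nums": [0], "bbox": [0, 0, 10, 5], "cell_text": "a"},
    {"row_nums": [1], "column_nums": [1], "bbox": [10, 5, 20, 10], "cell_text": "d"},
]
bbox_grid = fill_grid(cells, "bbox")
text_grid = fill_grid(cells, "cell_text")
print(bbox_grid)
print(text_grid)
```

The nested comprehension matters: a shortcut such as [[[0, 0, 0, 0]] * num_columns] * num_rows would make every slot share one list object, so an in-place change to one placeholder would show up in all of them.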

2. PyMuPDF Compatibility
Bounding boxes are now cast to lists when creating Rect objects, to ensure compatibility with modern PyMuPDF versions:

intersection = Rect(list(bbox1)).intersect(list(bbox2))
union = Rect(list(bbox1)).include_rect(list(bbox2)) 
