Implement phase 4 critical table imports

2026-03-14 03:27:14 +01:00
parent a391a1421a
commit b2f61c3d73
17 changed files with 1280 additions and 474 deletions

@@ -30,8 +30,10 @@ The current implementation supports:
 - explicit CLI commands for reset, extraction, and import
 - manifest-driven source selection
 - `standard` critical tables with columns `A-E`
+- `variant_column` critical tables with non-severity columns
+- `grouped_variant` critical tables with a group axis plus variant columns
 - XML-based extraction using `pdftohtml -xml`
-- geometry-based parsing across the currently enabled phase-3 tables:
+- geometry-based parsing across the currently enabled table set:
   - `arcane-aether`
   - `arcane-nether`
   - `ballistic-shrapnel`
@@ -42,22 +44,24 @@ The current implementation supports:
   - `heat`
   - `impact`
   - `krush`
+  - `large_creature_magic`
+  - `large_creature_weapon`
   - `ma-strikes`
   - `ma-sweeps`
   - `mana`
   - `puncture`
   - `slash`
   - `subdual`
+  - `super_large_creature_weapon`
   - `tiny`
   - `unbalance`
 - row-boundary repair for trailing affix leakage
+- split row-label reconstruction for tables that render labels such as `99-` / `100` as two fragments
 - footer/page-number filtering during body parsing
 - transactional loading into SQLite
 The current implementation does not yet support:
-- variant-column critical tables
-- grouped variant tables
 - OCR/image-based PDFs such as `Void.pdf`
 - normalized `critical_branch` population
 - normalized `critical_effect` population
@@ -246,9 +250,28 @@ Current phase-3 notes:
 ### Phase 4: Variant and Grouped Tables
-- support `variant_column` tables such as `Large Creature - Weapon.pdf`
-- support `grouped_variant` tables such as `Large Creature - Magic.pdf`
-- add parser strategies for additional table families
+Phase 4 extended the importer beyond `A-E` tables.
+The currently enabled phase-4 table set is:
+- `large_creature_weapon`
+  - `family`: `variant_column`
+  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
+- `super_large_creature_weapon`
+  - `family`: `variant_column`
+  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
+- `large_creature_magic`
+  - `family`: `grouped_variant`
+  - groups: `large`, `super_large`
+  - columns: `NORMAL`, `SLAYING`
+Phase-4 notes:
+- grouped results now populate `critical_group` during SQLite load
+- parser dispatch is family-based instead of standard-table only
+- left-margin row labels can be reconstructed from split fragments such as `151-` / `175`
+- the grouped magic PDF is imported once as `large_creature_magic`
+- `sources/Large Creature - Magic.pdf` and `sources/Super Large Creature - Magic.pdf` are duplicate files
 ### Phase 5: Conditional Branch Extraction
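Review note on the hunk above: the phase-4 notes mention family-based parser dispatch and the reconstruction of split row labels such as `151-` / `175`. A minimal Python sketch of both ideas follows. It is not the importer's code. `merge_split_labels`, `make_parser`, `PARSERS`, and `parse_table` are hypothetical names, the per-family parsers are placeholders, and the merge rule (a fragment ending in `-` followed by an all-digit fragment) is an assumption drawn from the examples in the README text.

```python
from typing import Callable, Dict, List


def merge_split_labels(fragments: List[str]) -> List[str]:
    """Join label fragments such as '151-' and '175' into '151-175'.

    Assumed rule: a fragment ending in '-' followed by an all-digit
    fragment forms one row label. Everything else passes through.
    """
    merged: List[str] = []
    i = 0
    while i < len(fragments):
        current = fragments[i].strip()
        nxt = fragments[i + 1].strip() if i + 1 < len(fragments) else None
        if current.endswith("-") and nxt is not None and nxt.isdigit():
            merged.append(current + nxt)
            i += 2  # both fragments were consumed
        else:
            merged.append(current)
            i += 1
    return merged


def make_parser(family: str) -> Callable[[List[str]], Dict[str, object]]:
    """Build a placeholder parser; a real one would also read table cells."""
    def parse(fragments: List[str]) -> Dict[str, object]:
        return {"family": family, "row_labels": merge_split_labels(fragments)}
    return parse


# Dispatch table keyed by the manifest's `family` value.
PARSERS: Dict[str, Callable[[List[str]], Dict[str, object]]] = {
    family: make_parser(family)
    for family in ("standard", "variant_column", "grouped_variant")
}


def parse_table(family: str, fragments: List[str]) -> Dict[str, object]:
    """Pick the parser for a table family, rejecting unknown families."""
    if family not in PARSERS:
        raise ValueError(f"unsupported table family: {family}")
    return PARSERS[family](fragments)
```

Keeping the dispatch in a single mapping means a new table family would need one new entry rather than another conditional branch, which seems to match the intent of "family-based instead of standard-table only".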
@@ -335,10 +358,12 @@ Each entry declares:
 The manifest is intentionally the control point for enabling importer support one table at a time.
-For the currently enabled phase-3 entries:
+For the currently enabled entries:
-- `family` is `standard`
-- `extractionMethod` is `xml`
+- standard tables use `family: standard`
+- creature weapon tables use `family: variant_column`
+- grouped creature magic uses `family: grouped_variant`
+- all enabled entries currently use `extractionMethod: xml`
 ## Artifact Layout
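Review note on the hunk above: the manifest rules it lists (three allowed `family` values, `extractionMethod: xml` for every enabled entry) could be checked with a small validator. The Python sketch below rests on assumptions. Only the `family` and `extractionMethod` keys come from the README text. The `slug` and `groups` keys, the `ocr` value, and the `validate_entry` helper are hypothetical, and the real manifest format may differ.

```python
from typing import Dict, List

ALLOWED_FAMILIES = {"standard", "variant_column", "grouped_variant"}


def validate_entry(entry: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks enabled."""
    problems: List[str] = []
    family = entry.get("family")
    if family not in ALLOWED_FAMILIES:
        problems.append(f"unknown family: {family!r}")
    if entry.get("extractionMethod") != "xml":
        problems.append("only extractionMethod 'xml' is currently enabled")
    # Assumption: grouped tables declare their groups explicitly.
    if family == "grouped_variant" and not entry.get("groups"):
        problems.append("grouped_variant entries need a non-empty 'groups' list")
    return problems


# Hypothetical entries for illustration only.
MANIFEST = [
    {"slug": "slash", "family": "standard", "extractionMethod": "xml"},
    {"slug": "large_creature_weapon", "family": "variant_column",
     "extractionMethod": "xml"},
    {"slug": "large_creature_magic", "family": "grouped_variant",
     "extractionMethod": "xml", "groups": ["large", "super_large"]},
    # An image-only source stays disabled until OCR support exists.
    {"slug": "void", "family": "standard", "extractionMethod": "ocr"},
]

enabled = [entry["slug"] for entry in MANIFEST if not validate_entry(entry)]
```

With these sample entries, `enabled` holds the three XML-backed slugs and leaves out `void`.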