Implement phase 4 critical table imports
@@ -30,8 +30,10 @@ The current implementation supports:
 - explicit CLI commands for reset, extraction, and import
 - manifest-driven source selection
 - `standard` critical tables with columns `A-E`
+- `variant_column` critical tables with non-severity columns
+- `grouped_variant` critical tables with a group axis plus variant columns
 - XML-based extraction using `pdftohtml -xml`
-- geometry-based parsing across the currently enabled phase-3 tables:
+- geometry-based parsing across the currently enabled table set:
   - `arcane-aether`
   - `arcane-nether`
   - `ballistic-shrapnel`
@@ -42,22 +44,24 @@ The current implementation supports:
   - `heat`
   - `impact`
   - `krush`
+  - `large_creature_magic`
+  - `large_creature_weapon`
   - `ma-strikes`
   - `ma-sweeps`
   - `mana`
   - `puncture`
   - `slash`
   - `subdual`
+  - `super_large_creature_weapon`
   - `tiny`
   - `unbalance`
 - row-boundary repair for trailing affix leakage
+- split row-label reconstruction for tables that render labels such as `99-` / `100` as two fragments
 - footer/page-number filtering during body parsing
 - transactional loading into SQLite
 
 The current implementation does not yet support:
 
-- variant-column critical tables
-- grouped variant tables
 - OCR/image-based PDFs such as `Void.pdf`
 - normalized `critical_branch` population
 - normalized `critical_effect` population
@@ -246,9 +250,28 @@ Current phase-3 notes:
 
 ### Phase 4: Variant and Grouped Tables
 
-- support `variant_column` tables such as `Large Creature - Weapon.pdf`
-- support `grouped_variant` tables such as `Large Creature - Magic.pdf`
-- add parser strategies for additional table families
+Phase 4 extended the importer beyond `A-E` tables.
+
+The currently enabled phase-4 table set is:
+
+- `large_creature_weapon`
+  - `family`: `variant_column`
+  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
+- `super_large_creature_weapon`
+  - `family`: `variant_column`
+  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
+- `large_creature_magic`
+  - `family`: `grouped_variant`
+  - groups: `large`, `super_large`
+  - columns: `NORMAL`, `SLAYING`
+
+Phase-4 notes:
+
+- grouped results now populate `critical_group` during SQLite load
+- parser dispatch is family-based instead of standard-table only
+- left-margin row labels can be reconstructed from split fragments such as `151-` / `175`
+- the grouped magic PDF is imported once as `large_creature_magic`
+- `sources/Large Creature - Magic.pdf` and `sources/Super Large Creature - Magic.pdf` are duplicate files
 
 ### Phase 5: Conditional Branch Extraction
 
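Review note on the hunk above: the phase-4 notes mention rebuilding left-margin row labels from split fragments such as `151-` / `175`. The commit's actual parser is not shown here, so the following is only a minimal sketch of one way such a merge could work. The function name `merge_row_labels` and both regular expressions are hypothetical and do not come from the repository.

```python
import re

# Hypothetical patterns (assumptions, not taken from the commit):
# an "open" fragment is digits followed by a trailing hyphen, e.g. "151-";
# a "closing" fragment is a bare number, e.g. "175".
_OPEN_RANGE = re.compile(r"^\d+-$")
_NUMBER = re.compile(r"^\d+$")


def merge_row_labels(fragments):
    """Join each trailing-hyphen fragment with the numeric fragment after it."""
    merged = []
    i = 0
    while i < len(fragments):
        current = fragments[i].strip()
        nxt = fragments[i + 1].strip() if i + 1 < len(fragments) else None
        if _OPEN_RANGE.match(current) and nxt is not None and _NUMBER.match(nxt):
            # "151-" followed by "175" becomes the single label "151-175".
            merged.append(current + nxt)
            i += 2
        else:
            # Already-complete labels such as "01-05" pass through unchanged.
            merged.append(current)
            i += 1
    return merged
```

Calling `merge_row_labels(["01-05", "99-", "100"])` in this sketch leaves the complete label alone and joins the split pair. A real implementation would presumably also consult the fragments' geometry (page position), which this sketch ignores.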
@@ -335,10 +358,12 @@ Each entry declares:
 
 The manifest is intentionally the control point for enabling importer support one table at a time.
 
-For the currently enabled phase-3 entries:
+For the currently enabled entries:
 
-- `family` is `standard`
-- `extractionMethod` is `xml`
+- standard tables use `family: standard`
+- creature weapon tables use `family: variant_column`
+- grouped creature magic uses `family: grouped_variant`
+- all enabled entries currently use `extractionMethod: xml`
 
 ## Artifact Layout
 
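Review note: the README changes describe parser dispatch as family-based, driven by each manifest entry's `family` and `extractionMethod`. As a rough illustration of that idea, and not the commit's real code, a dispatch table keyed by family might look like the sketch below. The `slug` field, the parser function names, and `select_parser` are all hypothetical. Only the three family names and the `xml` extraction method come from the diff.

```python
from typing import Callable, Dict

# Placeholder parsers (hypothetical); real ones would parse extracted XML.
def parse_standard(entry: dict) -> str:
    return f"standard:{entry['slug']}"


def parse_variant_column(entry: dict) -> str:
    return f"variant_column:{entry['slug']}"


def parse_grouped_variant(entry: dict) -> str:
    return f"grouped_variant:{entry['slug']}"


# One parser per table family named in the README.
PARSERS: Dict[str, Callable[[dict], str]] = {
    "standard": parse_standard,
    "variant_column": parse_variant_column,
    "grouped_variant": parse_grouped_variant,
}


def select_parser(entry: dict) -> Callable[[dict], str]:
    """Pick a parser from the entry's family; reject unsupported entries."""
    if entry.get("extractionMethod") != "xml":
        # All enabled entries currently use xml, per the README.
        raise ValueError(f"unsupported extractionMethod: {entry.get('extractionMethod')!r}")
    family = entry.get("family")
    if family not in PARSERS:
        raise ValueError(f"unsupported family: {family!r}")
    return PARSERS[family]


# Example entry; "slug" is an assumed field name for the table identifier.
entry = {"slug": "large_creature_magic", "family": "grouped_variant", "extractionMethod": "xml"}
result = select_parser(entry)(entry)
```

Keeping the mapping in one dictionary means enabling a new family is a one-line change, which matches the README's description of the manifest as the control point for enabling tables one at a time.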