# Critical Tables DB Model

## What the PDFs look like

The PDFs are not one uniform table shape. I found three families:

1. Standard tables
   - Columns are severity-like keys such as `A` through `E`.
   - Rows are roll bands such as `01-05`, `66`, `96-99`, or `100`.
   - Examples: `Slash.pdf`, `Puncture.pdf`, `Arcane Aether.pdf`.

2. Variant-column tables
   - Columns are not severity letters; they are variant keys such as `normal`, `magic`, `mithril`, `holy arms`, `slaying`.
   - Rows are still roll bands.
   - Example: `Large Creature - Weapon.pdf`.

3. Grouped variant tables
   - There is an extra grouping axis above the column axis.
   - Example: `Large Creature - Magic.pdf` has:
     - group: `large`, `super_large`
     - column: `normal`, `slaying`
     - row: roll band
   - In the current importer manifest, the grouped magic PDF is loaded once as `large_creature_magic` because the `Large Creature - Magic.pdf` and `Super Large Creature - Magic.pdf` source files are duplicates.

There are also extraction constraints:

- Most PDFs are text extractable with `pdftohtml -xml`.
- `Void.pdf` appears image-based and will need OCR bootstrap, with the existing curation flow handling cleanup.
- A single cell can contain:
  - base description text
  - symbolic affixes such as `+5H - 2S - 3B`
  - conditional branches such as `with helmet`, `w/o leg greaves`, `if foe has shield`

Because of that, the safest model is hybrid:

- relational tables for lookup axes and indexed effects
- raw text storage for fidelity
- structured JSON for irregular branches that are hard to normalize perfectly on first pass

## Recommended logical model

### 1. `critical_table`

One record per PDF/table, which is the primary "critical type" for lookup.

Examples:

- `slash`
- `puncture`
- `arcane_aether`
- `large_creature_weapon`
- `large_creature_magic`

### 2. `critical_group`

Optional extra axis for tables that need more than type + column + roll.

Examples:

- `large`
- `super_large`

Most tables will have no group rows.

### 3. `critical_column`

Generalized "severity/column" axis.

Examples:

- `A`, `B`, `C`, `D`, `E`
- `normal`, `magic`, `mithril`, `holy_arms`, `slaying`

Do not hardcode this as a single severity enum. Treat it as a table-defined dimension.

### 4. `critical_roll_band`

Stores row bands and supports exact row lookup by roll.

Examples:

- `01-05`
- `66`
- `96-99`
- `251+`

Recommended fields:

- `min_roll`
- `max_roll`, nullable for open-ended rows like `251+`
- display label
- sort order

### 5. `critical_result`

One record per lookup cell:

- table
- optional group
- column
- roll band

This stores:

- `is_curated`
- `raw_cell_text`
- `description_text`
- `raw_affix_text`
- `parsed_json`
- `parse_status`
- `source_page_number`
- `source_image_path`
- `source_image_crop`

`is_curated` is an explicit workflow flag. Once a result is curated in the web editor, later importer runs must preserve curator-owned content instead of replacing the row wholesale.

The source-image fields keep importer provenance separate from the editor snapshot stored in `parsed_json`:

- `source_page_number` points to the rendered PDF page used for review
- `source_image_path` stores the importer-managed relative PNG path for the cell crop
- `source_image_crop` stores the crop geometry that produced the PNG and can be used for debugging alignment problems

### 6. `critical_branch`

Optional conditional branches inside a result cell.

Examples:

- `with helmet`
- `without helmet`
- `with leg greaves`
- `if foe has shield`

Each branch can carry:

- `condition_text`
- optional structured `condition_json`
- branch description text
- branch raw affix text
- parsed JSON

Current implementation note:

- `critical_branch` is now populated by the importer and returned by the web critical lookup
- condition keys are normalized for lookup/API use, while the original condition text remains available for display

### 7. `critical_effect`

Normalized machine-readable effects parsed from the symbol line and, over time, from prose.

Recommended canonical `effect_code` values:

- `direct_hits`
- `must_parry_rounds`
- `no_parry_rounds`
- `stunned_rounds`
- `bleed_per_round`
- `foe_penalty`
- `attacker_bonus_next_round`
- `power_point_modifier`
- `initiative_gain`
- `initiative_loss`
- `drop_item`
- `item_breakage_check`
- `limb_useless`
- `knockdown`
- `prone`
- `coma`
- `paralyzed`
- `blind`
- `deaf`
- `mute`
- `dies_in_rounds`
- `instant_death`
- `armor_destroyed`
- `weapon_stuck`

Each effect should point to either:

- the base `critical_result`, or
- a `critical_branch`

This lets you keep the raw text but still filter/query on effects.

Current implementation note:

- symbol-driven affixes are now normalized for both base results and conditional branch affixes
- `value_expression` is used when the affix contains a formula instead of a flat integer, which is currently needed for `Mana` power-point adjustments such as `+(2d10-18)P`

## Why this works for your lookup

Your lookup target is mostly:

- `critical type`
- `severity(column)`
- `roll`

That maps cleanly to:

- `critical_table.slug`
- `critical_column.column_key`
- numeric roll matched against `critical_roll_band`

For the outlier tables, add an optional `group_key`.
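
The mapping above can be sketched in a few lines of Python. This is only an illustrative in-memory sketch, not the project's implementation: `RollBand`, `find_band`, `lookup`, and the sample data are hypothetical names introduced here, and a real version would presumably query the relational tables instead of dictionaries. It shows how a nullable `max_roll` covers open-ended bands such as `251+` and how the optional group slots into the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RollBand:
    label: str
    min_roll: int
    max_roll: Optional[int]  # None models an open-ended band such as "251+"

    def contains(self, roll: int) -> bool:
        if roll < self.min_roll:
            return False
        return self.max_roll is None or roll <= self.max_roll


def find_band(bands, roll):
    """Return the first band that contains the roll, or None if none matches."""
    for band in bands:
        if band.contains(roll):
            return band
    return None


def lookup(results, bands, critical_type, column, roll, group=None):
    """Resolve (type, optional group, column, roll) to a stored cell, or None."""
    band = find_band(bands.get(critical_type, []), roll)
    if band is None:
        return None
    return results.get((critical_type, group, column, band.label))


# Hypothetical sample data for illustration; not a full table.
BANDS = {
    "slash": [
        RollBand("01-05", 1, 5),
        RollBand("36-45", 36, 45),
        RollBand("251+", 251, None),
    ]
}
RESULTS = {("slash", None, "B", "36-45"): "Strike foe in shin."}

print(lookup(RESULTS, BANDS, "slash", "B", 38))
```
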

That means the API can still stay simple:

```json
{
  "critical_type": "slash",
  "column": "C",
  "roll": 38,
  "group": null
}
```

or:

```json
{
  "critical_type": "large_creature_magic",
  "group": "super_large",
  "column": "slaying",
  "roll": 88
}
```

## Example return object

This is close to the current lookup shape, while still leaving room for future `critical_effect` normalization:

```json
{
  "critical_type": "slash",
  "table_name": "Slash Critical Strike Table",
  "group": null,
  "column": "B",
  "column_label": "B",
  "column_role": "severity",
  "roll": 38,
  "roll_band": "36-45",
  "roll_band_min": 36,
  "roll_band_max": 45,
  "description": "Strike foe in shin.",
  "raw_affix_text": null,
  "branches": [
    {
      "branch_kind": "conditional",
      "condition_key": "with_leg_greaves",
      "condition_text": "with leg greaves",
      "description": "",
      "raw_affix_text": "+2H - must_parry",
      "sort_order": 1
    },
    {
      "branch_kind": "conditional",
      "condition_key": "without_leg_greaves",
      "condition_text": "w/o leg greaves",
      "description": "You slash open foe's shin.",
      "raw_affix_text": "+2H - bleed",
      "sort_order": 2
    }
  ],
  "raw_cell_text": "Original full cell text as extracted from the PDF",
  "source": {
    "pdf": "Slash.pdf",
    "extraction_method": "xml"
  }
}
```

## Ingestion notes

Current import flow:

1. Create `critical_table`, `critical_group`, `critical_column`, and `critical_roll_band` from each PDF's visible axes.
2. Store each base cell in `critical_result` with base raw/description/affix text.
3. Split explicit conditional branches into `critical_branch`.
4. Parse symbolic affixes for both the base result and any branch affix payloads into `critical_effect`.
5. Return the base result plus ordered branches and parsed affix effects through the web critical lookup.
6. Gradually enrich prose-derived effects such as death, blindness, paralysis, limb loss, initiative changes, and item breakage.
7. Route image PDFs like `Void.pdf` through OCR bootstrap before the same downstream parser and curation flow.
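
Step 4 of the flow above, parsing symbolic affixes, might look roughly like the following sketch. It is not the importer's actual parser. The symbol mapping for `H`, `S`, and `B` is an assumption made for illustration (only `P` for power points is stated in this document), `value_integer` is a hypothetical field name, and `parse_affixes` is a hypothetical helper. Tokens it cannot parse are returned separately so the raw text is never lost.

```python
import re

# Assumed symbol-to-code mapping. Only "P" (power points) is confirmed by
# this document; "H", "S", and "B" are illustrative guesses.
SYMBOL_TO_EFFECT = {
    "H": "direct_hits",
    "S": "stunned_rounds",
    "B": "bleed_per_round",
    "P": "power_point_modifier",
}

FLAT = re.compile(r"^\+?(\d+)([A-Z])$")        # e.g. "+5H" or "2S"
FORMULA = re.compile(r"^\+?\((.+)\)([A-Z])$")  # e.g. "+(2d10-18)P"


def parse_affixes(raw_affix_text):
    """Split an affix line into (effects, unparsed_tokens)."""
    effects, unparsed = [], []
    # Split only on hyphens surrounded by whitespace so "2d10-18" stays whole.
    for token in re.split(r"\s+-\s+", raw_affix_text.strip()):
        if not token:
            continue
        flat = FLAT.match(token)
        formula = FORMULA.match(token)
        if flat and flat.group(2) in SYMBOL_TO_EFFECT:
            effects.append({
                "effect_code": SYMBOL_TO_EFFECT[flat.group(2)],
                "value_integer": int(flat.group(1)),  # hypothetical field name
                "value_expression": None,
                "source_text": token,
            })
        elif formula and formula.group(2) in SYMBOL_TO_EFFECT:
            effects.append({
                "effect_code": SYMBOL_TO_EFFECT[formula.group(2)],
                "value_integer": None,
                "value_expression": formula.group(1),
                "source_text": token,
            })
        else:
            # Keep anything unrecognized for manual curation.
            unparsed.append(token)
    return effects, unparsed


effects, unparsed = parse_affixes("+5H - 2S - 3B")
print([(e["effect_code"], e["value_integer"]) for e in effects], unparsed)
```

Keeping `source_text` on every parsed effect and returning the unparsed tokens leaves a curator enough context to repair a cell without going back to the PDF.
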

The important design decision is: never throw away the original text. The prose is too irregular to rely on normalized fields alone.

## Manual curation workflow

Because the import path depends on OCR, PDF XML extraction, and heuristics, the web app now treats manual repair as a first-class capability instead of an out-of-band database operation.

Current curation flow:

1. Browse a table on the `/tables` page.
2. Hover a populated cell to identify editable entries.
3. Open the popup editor for that cell.
4. Edit the entire `critical_result` graph:
   - base raw cell text
   - curated prose / description
   - raw affix text
   - curated state
   - parse status
   - parsed JSON
   - nested `critical_branch` rows
   - nested `critical_effect` rows for both the base result and branches
5. Save the result back through the API.

The corresponding API endpoints are:

- `GET /api/tables/critical/{slug}/cells/{resultId}`
- `GET /api/tables/critical/{slug}/cells/{resultId}/source-image`
- `PUT /api/tables/critical/{slug}/cells/{resultId}`

The save operation replaces the stored branches and effects for that cell with the submitted payload and updates the explicit curated flag. Importer-managed source provenance can still be refreshed on later imports without overwriting curated content.
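
The re-import rule described above could be expressed as a small merge function. This is a hedged sketch under assumptions, not the importer's real code: `merge_import` is a hypothetical helper, rows are modeled as plain dictionaries, and treating only the three `source_*` fields as importer-managed on curated rows is one reading of this document.

```python
# Fields the importer may always refresh, per the critical_result section.
PROVENANCE_FIELDS = ("source_page_number", "source_image_path", "source_image_crop")


def merge_import(existing, incoming):
    """Combine a freshly imported row with whatever is already stored.

    existing: stored row dict, or None if the cell is new
    incoming: row dict produced by the current importer run
    """
    if existing is None:
        # New cell: take the import as-is and mark it uncurated.
        return {**incoming, "is_curated": False}
    if not existing.get("is_curated"):
        # Never curated: the importer still owns the whole row.
        return {**existing, **incoming, "is_curated": False}
    # Curated: keep curator-owned content, refresh provenance only.
    merged = dict(existing)
    for field in PROVENANCE_FIELDS:
        if field in incoming:
            merged[field] = incoming[field]
    return merged


curated = {
    "is_curated": True,
    "description_text": "Strike foe in shin.",
    "source_image_path": "old/path.png",
}
fresh = {
    "description_text": "Str1ke f0e in sh1n.",
    "source_image_path": "new/path.png",
}
print(merge_import(curated, fresh))
```

In this sketch the curated description survives the re-import while the image path is updated, which matches the stated intent that provenance can be refreshed without overwriting curated content.
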