RolemasterDB/docs/critical_tables_db_model.md

# Critical Tables DB Model

## What the PDFs look like

The PDFs do not share one uniform table shape. I found three families:

1. Standard tables
   - Columns are severity-like keys such as `A` through `E`.
   - Rows are roll bands such as `01-05`, `66`, `96-99`, or `100`.
   - Examples: `Slash.pdf`, `Puncture.pdf`, `Arcane Aether.pdf`.

2. Variant-column tables
   - Columns are not severity letters; they are variant keys such as `normal`, `magic`, `mithril`, `holy arms`, `slaying`.
   - Rows are still roll bands.
   - Example: `Large Creature - Weapon.pdf`.

3. Grouped variant tables
   - There is an extra grouping axis above the column axis.
   - Example: `Large Creature - Magic.pdf` has:
     - group: `large`, `super_large`
     - column: `normal`, `slaying`
     - row: roll band
   - In the current importer manifest, the grouped magic PDF is loaded once as `large_creature_magic` because the `Large Creature - Magic.pdf` and `Super Large Creature - Magic.pdf` source files are duplicates.

There are also extraction constraints:

- Most PDFs are text extractable with `pdftohtml -xml`.
- `Void.pdf` appears image-based and will need OCR or manual transcription.
- A single cell can contain:
  - base description text
  - symbolic affixes such as `+5H - 2S - 3B`
  - conditional branches such as `with helmet`, `w/o leg greaves`, `if foe has shield`

Because of that, the safest model is hybrid:

- relational tables for lookup axes and indexed effects
- raw text storage for fidelity
- structured JSON for irregular branches that are hard to normalize perfectly on first pass

## Recommended logical model

### 1. `critical_table`

One record per PDF/table. Its slug is the primary "critical type" used for lookup.

Examples:

- `slash`
- `puncture`
- `arcane_aether`
- `large_creature_weapon`
- `large_creature_magic`

### 2. `critical_group`

Optional extra axis for tables that need more than type + column + roll.

Examples:

- `large`
- `super_large`

Most tables will have no group rows.

### 3. `critical_column`

Generalized "severity/column" axis.

Examples:

- `A`, `B`, `C`, `D`, `E`
- `normal`, `magic`, `mithril`, `holy_arms`, `slaying`

Do not hardcode this as a single severity enum. Treat it as a table-defined dimension.

### 4. `critical_roll_band`

Stores row bands and supports exact row lookup by roll.

Examples:

- `01-05`
- `66`
- `96-99`
- `251+`

Recommended fields:

- `min_roll`
- `max_roll` nullable for open-ended rows like `251+`
- display label
- sort order
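
As a minimal sketch of how a display label could be turned into `min_roll` / `max_roll`, assuming labels only take the three shapes shown above (range, single value, open-ended). The `RollBand` type and the function names are hypothetical, not the importer's actual code:

```python
import re
from typing import NamedTuple, Optional


class RollBand(NamedTuple):
    label: str
    min_roll: int
    max_roll: Optional[int]  # None means open-ended, e.g. "251+"


def parse_roll_band(label: str) -> RollBand:
    """Parse a display label such as '01-05', '66', or '251+'."""
    text = label.strip()
    m = re.fullmatch(r"(\d+)\s*-\s*(\d+)", text)
    if m:
        return RollBand(text, int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"(\d+)\+", text)
    if m:
        return RollBand(text, int(m.group(1)), None)
    if re.fullmatch(r"\d+", text):
        # A single-value row is stored as a band whose min and max are equal.
        return RollBand(text, int(text), int(text))
    raise ValueError(f"unrecognized roll band label: {label!r}")


def band_contains(band: RollBand, roll: int) -> bool:
    """True when the roll falls inside the band (open-ended bands have no upper limit)."""
    return roll >= band.min_roll and (band.max_roll is None or roll <= band.max_roll)
```

Storing single-value rows with `min_roll == max_roll` is an assumption here; it keeps the lookup predicate identical for every row shape.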

### 5. `critical_result`

One record per lookup cell:

- table
- optional group
- column
- roll band

This stores:

- `raw_cell_text`
- `description_text`
- `raw_affix_text`
- `parsed_json`
- parse status / source metadata

### 6. `critical_branch`

Optional conditional branches inside a result cell.

Examples:

- `with helmet`
- `without helmet`
- `with leg greaves`
- `if foe has shield`

Each branch can carry:

- `condition_text`
- optional structured `condition_json`
- branch description text
- branch raw affix text
- parsed JSON

Current implementation note:

- `critical_branch` is now populated by the importer and returned by the web critical lookup
- condition keys are normalized for lookup/API use, while the original condition text remains available for display
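
The key normalization could look roughly like the sketch below. It is illustrative only: the function name and the abbreviation table are assumptions, and the importer's real rules may differ.

```python
import re

# ASSUMPTION: only "w/o" is expanded, because it is the only abbreviation
# that appears in the examples in this document.
_ABBREVIATIONS = {"w/o": "without"}


def normalize_condition_key(condition_text: str) -> str:
    """Turn display text such as 'w/o leg greaves' into a stable lookup key."""
    words = condition_text.strip().lower().split()
    words = [_ABBREVIATIONS.get(word, word) for word in words]
    # Collapse anything that is not a letter or digit into single underscores.
    key = re.sub(r"[^a-z0-9]+", "_", " ".join(words))
    return key.strip("_")
```

The original `condition_text` is kept alongside the key, so the display string never has to be reconstructed from the normalized form.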

### 7. `critical_effect`

Normalized machine-readable effects parsed from the symbol line and, over time, from prose.

Recommended canonical `effect_code` values:

- `direct_hits`
- `must_parry_rounds`
- `no_parry_rounds`
- `stunned_rounds`
- `bleed_per_round`
- `foe_penalty`
- `attacker_bonus_next_round`
- `power_point_modifier`
- `initiative_gain`
- `initiative_loss`
- `drop_item`
- `item_breakage_check`
- `limb_useless`
- `knockdown`
- `prone`
- `coma`
- `paralyzed`
- `blind`
- `deaf`
- `mute`
- `dies_in_rounds`
- `instant_death`
- `armor_destroyed`
- `weapon_stuck`

Each effect should point to either:

- the base `critical_result`, or
- a `critical_branch`

This lets you keep the raw text but still filter/query on effects.

Current implementation note:

- symbol-driven affixes are now normalized for both base results and conditional branch affixes
- `value_expression` is used when the affix contains a formula instead of a flat integer, which is currently needed for `Mana` power-point adjustments such as `+(2d10-18)P`
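
A rough sketch of affix normalization follows, under several loud assumptions: the symbol-to-code mapping (`H`, `S`, `B`, `P`) is a guess made for illustration, the ` - ` sequence is treated as a separator rather than a minus sign, and `ParsedEffect` with its `value_integer` field is a hypothetical shape. Only `effect_code` and `value_expression` come from this document.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: illustrative mapping only; the real symbol set lives in the importer.
SYMBOL_TO_EFFECT = {
    "H": "direct_hits",
    "S": "stunned_rounds",
    "B": "bleed_per_round",
    "P": "power_point_modifier",
}

# Optional "+", then either digits or a parenthesized formula, then the symbol.
_TOKEN = re.compile(r"\+?(\d+|\([^)]*\))([A-Za-z]+)")


@dataclass
class ParsedEffect:
    effect_code: Optional[str]       # None when the token is not recognized
    value_integer: Optional[int]     # flat value, e.g. 5 for "+5H"
    value_expression: Optional[str]  # formula, e.g. "2d10-18" for "+(2d10-18)P"
    source_text: str                 # original token, kept for fidelity


def parse_affix_text(raw_affix_text: str) -> List[ParsedEffect]:
    effects: List[ParsedEffect] = []
    # Split on " - " (with spaces) so the "-" inside "(2d10-18)" is left alone.
    for token in raw_affix_text.split(" - "):
        token = token.strip()
        if not token:
            continue
        m = _TOKEN.fullmatch(token)
        code = SYMBOL_TO_EFFECT.get(m.group(2)) if m else None
        if code is None:
            # Unknown tokens are preserved rather than dropped.
            effects.append(ParsedEffect(None, None, None, token))
        elif m.group(1).startswith("("):
            effects.append(ParsedEffect(code, None, m.group(1)[1:-1], token))
        else:
            effects.append(ParsedEffect(code, int(m.group(1)), None, token))
    return effects
```

Unrecognized tokens are returned with `effect_code=None` so a curator can see what the parser skipped instead of losing it silently.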

## Why this works for your lookup

Your lookup target is mostly:

- `critical type`
- `severity` (the column)
- `roll`

That maps cleanly to:

- `critical_table.slug`
- `critical_column.column_key`
- numeric roll matched against `critical_roll_band`

For the outlier tables, add an optional `group_key`.

That means the API can still stay simple:

```json
{
  "critical_type": "slash",
  "column": "C",
  "roll": 38,
  "group": null
}
```

or:

```json
{
  "critical_type": "large_creature_magic",
  "group": "super_large",
  "column": "slaying",
  "roll": 88
}
```
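
To make the mapping concrete, here is a minimal sketch of the lookup using Python's built-in `sqlite3`. The table names and the `slug`, `column_key`, `group_key`, `min_roll`, and `max_roll` fields follow this document; every other column (`id`, `table_id`, `label`, and so on) and the `lookup` function itself are assumptions made for illustration, not the actual schema.

```python
import sqlite3
from typing import Optional, Tuple

# Trimmed-down schema: only the columns needed to resolve one cell.
SCHEMA = """
CREATE TABLE critical_table (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
CREATE TABLE critical_group (id INTEGER PRIMARY KEY, table_id INTEGER NOT NULL,
    group_key TEXT NOT NULL);
CREATE TABLE critical_column (id INTEGER PRIMARY KEY, table_id INTEGER NOT NULL,
    column_key TEXT NOT NULL);
CREATE TABLE critical_roll_band (id INTEGER PRIMARY KEY, table_id INTEGER NOT NULL,
    label TEXT NOT NULL, min_roll INTEGER NOT NULL, max_roll INTEGER);
CREATE TABLE critical_result (id INTEGER PRIMARY KEY, table_id INTEGER NOT NULL,
    group_id INTEGER, column_id INTEGER NOT NULL, roll_band_id INTEGER NOT NULL,
    raw_cell_text TEXT NOT NULL, description_text TEXT);
"""

LOOKUP_SQL = """
SELECT r.id, b.label, r.description_text, r.raw_cell_text
FROM critical_result r
JOIN critical_table t ON t.id = r.table_id
JOIN critical_column c ON c.id = r.column_id
JOIN critical_roll_band b ON b.id = r.roll_band_id
LEFT JOIN critical_group g ON g.id = r.group_id
WHERE t.slug = :slug
  AND c.column_key = :column_key
  AND :roll >= b.min_roll
  AND (b.max_roll IS NULL OR :roll <= b.max_roll)
  AND ((:group_key IS NULL AND r.group_id IS NULL) OR g.group_key = :group_key)
"""


def lookup(conn: sqlite3.Connection, critical_type: str, column: str,
           roll: int, group: Optional[str] = None) -> Optional[Tuple]:
    """Return (result_id, band_label, description, raw_text), or None if no cell matches."""
    params = {"slug": critical_type, "column_key": column,
              "roll": roll, "group_key": group}
    return conn.execute(LOOKUP_SQL, params).fetchone()
```

With the schema loaded via `conn.executescript(SCHEMA)` and rows inserted, `lookup(conn, "slash", "C", 38)` corresponds to the first request above, and passing `group="super_large"` corresponds to the second. The `max_roll IS NULL` branch is what lets an open-ended band such as `251+` match any roll at or above its minimum.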

## Example return object

This is close to the current lookup shape, while still leaving room for further `critical_effect` normalization, such as prose-derived effects:

```json
{
  "critical_type": "slash",
  "table_name": "Slash Critical Strike Table",
  "group": null,
  "column": "B",
  "column_label": "B",
  "column_role": "severity",
  "roll": 38,
  "roll_band": "36-45",
  "roll_band_min": 36,
  "roll_band_max": 45,
  "description": "Strike foe in shin.",
  "raw_affix_text": null,
  "branches": [
    {
      "branch_kind": "conditional",
      "condition_key": "with_leg_greaves",
      "condition_text": "with leg greaves",
      "description": "",
      "raw_affix_text": "+2H - must_parry",
      "sort_order": 1
    },
    {
      "branch_kind": "conditional",
      "condition_key": "without_leg_greaves",
      "condition_text": "w/o leg greaves",
      "description": "You slash open foe's shin.",
      "raw_affix_text": "+2H - bleed",
      "sort_order": 2
    }
  ],
  "raw_cell_text": "Original full cell text as extracted from the PDF",
  "source": {
    "pdf": "Slash.pdf",
    "extraction_method": "xml"
  }
}
```

## Ingestion notes

Current import flow:

1. Create `critical_table`, `critical_group`, `critical_column`, and `critical_roll_band` from each PDF's visible axes.
2. Store each base cell in `critical_result` with base raw/description/affix text.
3. Split explicit conditional branches into `critical_branch`.
4. Parse symbolic affixes for both the base result and any branch affix payloads into `critical_effect`.
5. Return the base result plus ordered branches and parsed affix effects through the web critical lookup.
6. Gradually enrich prose-derived effects such as death, blindness, paralysis, limb loss, initiative changes, and item breakage.
7. Route image PDFs like `Void.pdf` through OCR before the same parser.

The important design decision is: never throw away the original text. The prose is too irregular to rely on normalized fields alone.

## Manual curation workflow

Because the import path depends on OCR, PDF XML extraction, and heuristics, the web app now treats manual repair as a first-class capability instead of an out-of-band database operation.

Current curation flow:

1. Browse a table on the `/tables` page.
2. Hover over a populated cell to identify editable entries.
3. Open the popup editor for that cell.
4. Edit the entire `critical_result` graph:
   - base raw cell text
   - curated prose / description
   - raw affix text
   - parse status
   - parsed JSON
   - nested `critical_branch` rows
   - nested `critical_effect` rows for both the base result and branches
5. Save the result back through the API.

The corresponding API endpoints are:

- `GET /api/tables/critical/{slug}/cells/{resultId}`
- `PUT /api/tables/critical/{slug}/cells/{resultId}`

The save operation replaces the stored branches and effects for that cell with the submitted payload. That keeps manual edits deterministic and avoids trying to reconcile partial child-row diffs against importer-generated data.
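
The replace-on-save behaviour can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `store` dictionary stands in for the database, `save_cell` is not the real endpoint handler, and treating an omitted child list as "no children" is an assumption about the payload contract.

```python
from copy import deepcopy
from typing import Any, Dict

# Scalar fields of the cell that a save may overwrite.
_EDITABLE_FIELDS = (
    "raw_cell_text",
    "description_text",
    "raw_affix_text",
    "parse_status",
    "parsed_json",
)


def save_cell(store: Dict[int, Dict[str, Any]], result_id: int,
              payload: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a curated payload: scalar fields are updated, children are replaced."""
    if result_id not in store:
        raise KeyError(result_id)
    updated = dict(store[result_id])
    for field in _EDITABLE_FIELDS:
        if field in payload:
            updated[field] = payload[field]
    # Children are never merged with what the importer wrote earlier:
    # the submitted lists become the complete new state.
    updated["branches"] = deepcopy(payload.get("branches", []))
    updated["effects"] = deepcopy(payload.get("effects", []))
    store[result_id] = updated
    return updated
```

Because nothing is merged, submitting the same payload twice leaves the cell in the same state, which is the determinism the paragraph above describes.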