Initial commit

2026-03-14 00:32:26 +01:00
commit 70a35f3985
109 changed files with 62554 additions and 0 deletions

@@ -0,0 +1,285 @@
# Critical Tables DB Model
## What the PDFs look like
The PDFs are not one uniform table shape. I found three families:
1. Standard tables
   - Columns are severity-like keys such as `A` through `E`.
   - Rows are roll bands such as `01-05`, `66`, `96-99`, or `100`.
   - Examples: `Slash.pdf`, `Puncture.pdf`, `Arcane Aether.pdf`.
2. Variant-column tables
   - Columns are not severity letters; they are variant keys such as `normal`, `magic`, `mithril`, `holy arms`, `slaying`.
   - Rows are still roll bands.
   - Example: `Large Creature - Weapon.pdf`.
3. Grouped variant tables
   - There is an extra grouping axis above the column axis.
   - Example: `Large Creature - Magic.pdf` has:
     - group: `large`, `super_large`
     - column: `normal`, `slaying`
     - row: roll band

There are also extraction constraints:
- Most PDFs are text extractable with `pdftotext -layout`.
- `Void.pdf` appears image-based and will need OCR or manual transcription.
- A single cell can contain:
  - base description text
  - symbolic affixes such as `+5H - 2S - 3B`
  - conditional branches such as `with helmet`, `w/o leg greaves`, `if foe has shield`

Because of that, the safest model is hybrid:
- relational tables for lookup axes and indexed effects
- raw text storage for fidelity
- structured JSON for irregular branches that are hard to normalize perfectly on first pass
## Recommended logical model
### 1. `critical_table`
One record per PDF, that is, per table. Its slug is the primary "critical type" used for lookup.
Examples:
- `slash`
- `puncture`
- `arcane_aether`
- `large_creature_weapon`
- `large_creature_magic`
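
The slugs above look like they can be derived mechanically from the PDF file names. A minimal sketch of that derivation, assuming the file name is the only input; `slugify` and `table_slug` are hypothetical helper names, not part of the schema:

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lower-case the text and collapse runs of non-alphanumerics into "_"."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def table_slug(pdf_name: str) -> str:
    """Derive a critical_table slug from a PDF file name (hypothetical helper)."""
    return slugify(Path(pdf_name).stem)


for name in ("Slash.pdf", "Arcane Aether.pdf", "Large Creature - Weapon.pdf"):
    print(name, "->", table_slug(name))
```

Keeping the original file name in `source_pdf` alongside the slug means the derivation can be changed later without losing the link back to the source.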
### 2. `critical_group`
Optional extra axis for tables that need more than type + column + roll.
Examples:
- `large`
- `super_large`

Most tables will have no group rows.
### 3. `critical_column`
Generalized "severity/column" axis.
Examples:
- `A`, `B`, `C`, `D`, `E`
- `normal`, `magic`, `mithril`, `holy_arms`, `slaying`

Do not hardcode this as a single severity enum. Treat it as a table-defined dimension.
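
One way to treat the column axis as table-defined is to build the column records from each table's own header labels. The sketch below is an assumption-laden illustration: `build_columns` is a hypothetical helper, and the rule "a single capital letter means severity, anything else means variant" is a guess at a reasonable default, not something the PDFs guarantee.

```python
import re
from typing import Dict, List, Union

# Assumption: severity columns are labelled with one capital letter (A, B, ...).
_SEVERITY_RE = re.compile(r"[A-Z]")


def build_columns(header_labels: List[str]) -> List[Dict[str, Union[str, int]]]:
    """Turn one table's header labels into critical_column-like records."""
    columns = []
    for order, label in enumerate(header_labels, start=1):
        text = label.strip()
        if _SEVERITY_RE.fullmatch(text):
            key, role = text, "severity"
        else:
            # Variant labels such as "holy arms" become keys such as "holy_arms".
            key = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
            role = "variant"
        columns.append(
            {"column_key": key, "label": text, "role": role, "sort_order": order}
        )
    return columns


print(build_columns(["A", "B", "C", "D", "E"]))
print(build_columns(["normal", "magic", "mithril", "holy arms", "slaying"]))
```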
### 4. `critical_roll_band`
Stores row bands and supports exact row lookup by roll.
Examples:
- `01-05`
- `66`
- `96-99`
- `251+`

Recommended fields:
- `min_roll`
- `max_roll`, nullable for open-ended rows like `251+`
- `label` (display label)
- `sort_order`
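
The band labels above map onto `min_roll` and `max_roll` in a regular way. A minimal sketch of that mapping, assuming labels only take the three shapes shown (`NN-MM`, `NN`, `NN+`); `RollBand`, `parse_roll_band`, and `find_band` are hypothetical names:

```python
import re
from typing import Iterable, NamedTuple, Optional


class RollBand(NamedTuple):
    label: str
    min_roll: int
    max_roll: Optional[int]  # None marks an open-ended band such as "251+"


# Accepts "01-05", "66", and "251+"; a hyphen or an en dash separates the bounds.
_BAND_RE = re.compile(r"^(\d+)(?:\s*[-\u2013]\s*(\d+)|\s*(\+))?$")


def parse_roll_band(label: str) -> RollBand:
    text = label.strip()
    m = _BAND_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised roll band: {label!r}")
    low = int(m.group(1))
    if m.group(3):
        return RollBand(text, low, None)
    high = int(m.group(2)) if m.group(2) else low
    if high < low:
        raise ValueError(f"band upper bound below lower bound: {label!r}")
    return RollBand(text, low, high)


def find_band(bands: Iterable[RollBand], roll: int) -> Optional[RollBand]:
    """Return the first band containing the roll, or None if there is none."""
    for band in bands:
        if roll >= band.min_roll and (band.max_roll is None or roll <= band.max_roll):
            return band
    return None


bands = [parse_roll_band(s) for s in ("01-05", "66", "96-99", "251+")]
print(find_band(bands, 3))
print(find_band(bands, 300))
```

Storing a single-value row such as `66` with `min_roll` equal to `max_roll` keeps the lookup predicate identical for every band shape.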
### 5. `critical_result`
One record per lookup cell, keyed by:
- table
- optional group
- column
- roll band

This stores:
- `raw_cell_text`
- `description_text`
- `raw_affix_text`
- `parsed_json`
- `parse_status` and source metadata (`source_bbox`)
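
A first-pass split of a cell into the three text fields could look like the sketch below. It rests on an assumption that will not hold for every cell: that the affix line is the last line and opens with a signed number and a letter, as in `+5H - 2S - 3B`. `SplitCell` and `split_cell` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class SplitCell(NamedTuple):
    raw_cell_text: str
    description_text: Optional[str]
    raw_affix_text: Optional[str]


# Assumption: an affix line opens with a signed number and a letter, e.g. "+5H".
_AFFIX_START = re.compile(r"^[+-]\d+[A-Za-z]")


def split_cell(raw: str) -> SplitCell:
    """Separate prose from a trailing affix line; the raw text is kept untouched."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    affix = None
    if lines and _AFFIX_START.match(lines[-1]):
        affix = lines.pop()
    # Re-join prose that pdftotext wrapped across several lines.
    description = " ".join(lines) or None
    return SplitCell(raw, description, affix)


print(split_cell("Strike foe\nin shin.\n+5H - 2S - 3B"))
```

Cells that do not fit the heuristic simply come back with `raw_affix_text` set to `None`, which pairs naturally with a `parse_status` of `raw` or `partial`.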
### 6. `critical_branch`
Optional conditional branches inside a result cell.
Examples:
- `with helmet`
- `without helmet`
- `with leg greaves`
- `if foe has shield`

Each branch can carry:
- `condition_text`
- optional structured `condition_json`
- `description_text` for the branch
- `raw_affix_text` for the branch
- `parsed_json`
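
The condition examples above could be normalized into a stable key (the schema has a `condition_key` column for this). The following is only a sketch: it assumes every branch label opens with `with`, `without`, `w/o`, or `if`, and `normalize_condition` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: branch labels open with one of these markers, as in the examples.
_CONDITION_RE = re.compile(r"^(with|without|w/o|if)\s+(.+)$", re.IGNORECASE)


def normalize_condition(condition_text: str) -> Optional[str]:
    """Map "w/o leg greaves" to "without_leg_greaves"; return None if unrecognised."""
    m = _CONDITION_RE.match(condition_text.strip())
    if m is None:
        return None
    marker = m.group(1).lower()
    if marker == "w/o":
        marker = "without"
    subject = re.sub(r"[^a-z0-9]+", "_", m.group(2).lower()).strip("_")
    return f"{marker}_{subject}"


for text in ("with helmet", "w/o leg greaves", "if foe has shield"):
    print(text, "->", normalize_condition(text))
```

Returning `None` for anything unrecognised leaves `condition_text` as the only source of truth for that branch, so nothing is lost when the heuristic misses.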
### 7. `critical_effect`
Normalized, machine-readable effects parsed from the symbolic affix line (for example `+5H - 2S - 3B`) and, over time, from prose.
Recommended canonical `effect_code` values:
- `direct_hits`
- `must_parry_rounds`
- `no_parry_rounds`
- `stunned_rounds`
- `bleed_per_round`
- `foe_penalty`
- `attacker_bonus_next_round`
- `initiative_gain`
- `initiative_loss`
- `drop_item`
- `item_breakage_check`
- `limb_useless`
- `knockdown`
- `prone`
- `coma`
- `paralyzed`
- `blind`
- `deaf`
- `mute`
- `dies_in_rounds`
- `instant_death`
- `armor_destroyed`
- `weapon_stuck`

Each effect should point to exactly one of:
- the base `critical_result`, or
- a `critical_branch`

This lets you keep the raw text but still filter/query on effects.
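
Parsing an affix line into effect rows might start out like the sketch below. The letter-to-effect mapping is an assumption made purely for illustration: this document does not define what `H`, `S`, and `B` stand for, so `SYMBOL_TO_EFFECT` must be replaced with the real legend from the PDFs. `parse_affix_line` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple, Union

# ASSUMPTION: illustrative mapping only; confirm against the tables' own legend.
SYMBOL_TO_EFFECT = {
    "H": "direct_hits",
    "S": "stunned_rounds",
    "B": "bleed_per_round",
}

_TOKEN_RE = re.compile(r"^[+-]?(\d+)([A-Za-z]+)$")

Effect = Dict[str, Union[str, int]]


def parse_affix_line(raw_affix_text: str) -> Tuple[List[Effect], List[str]]:
    """Return (parsed effects, tokens that could not be parsed)."""
    effects: List[Effect] = []
    unparsed: List[str] = []
    for token in re.split(r"\s+-\s+", raw_affix_text.strip()):
        if not token:
            continue
        m = _TOKEN_RE.match(token)
        code = SYMBOL_TO_EFFECT.get(m.group(2).upper()) if m else None
        if code is None:
            # Keep unknown tokens visible instead of silently dropping them.
            unparsed.append(token)
            continue
        effects.append(
            {"effect_code": code, "value": int(m.group(1)), "source_text": token}
        )
    return effects, unparsed


print(parse_affix_line("+5H - 2S - 3B"))
```

Returning the unparsed tokens separately makes it easy to set `parse_status` to `partial` whenever the second list is non-empty.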
## Why this works for your lookup
Your lookup target is mostly:
- `critical type`
- `severity` (column)
- `roll`

That maps cleanly to:
- `critical_table.slug`
- `critical_column.column_key`
- numeric roll matched against `critical_roll_band`

For the outlier tables, add an optional `group_key`.
That means the API can still stay simple:
```json
{
  "critical_type": "slash",
  "column": "C",
  "roll": 38,
  "group": null
}
```
or:
```json
{
  "critical_type": "large_creature_magic",
  "group": "super_large",
  "column": "slaying",
  "roll": 88
}
```
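
To show how such a request could resolve to a single cell, here is a sketch that uses an in-memory SQLite database as a stand-in. It is not the PostgreSQL schema: the tables are trimmed to the columns the lookup needs, the single sample row is a placeholder, and `build_demo_db` and `lookup` are hypothetical names.

```python
import sqlite3
from typing import Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a trimmed-down SQLite stand-in with one sample cell."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table critical_table (id integer primary key, slug text unique);
        create table critical_group (
            id integer primary key, critical_table_id integer, group_key text);
        create table critical_column (
            id integer primary key, critical_table_id integer, column_key text);
        create table critical_roll_band (
            id integer primary key, critical_table_id integer,
            label text, min_roll integer, max_roll integer);
        create table critical_result (
            id integer primary key, critical_table_id integer,
            critical_group_id integer, critical_column_id integer,
            critical_roll_band_id integer, raw_cell_text text);
        insert into critical_table values (1, 'slash');
        insert into critical_column values (1, 1, 'B');
        insert into critical_roll_band values (1, 1, '36-45', 36, 45);
        insert into critical_result values (1, 1, null, 1, 1, 'Strike foe in shin.');
        """
    )
    return conn


LOOKUP_SQL = """
    select rb.label, r.raw_cell_text
    from critical_result r
    join critical_table t on t.id = r.critical_table_id
    left join critical_group g on g.id = r.critical_group_id
    join critical_column c on c.id = r.critical_column_id
    join critical_roll_band rb on rb.id = r.critical_roll_band_id
    where t.slug = :critical_type
      and c.column_key = :column_key
      and :roll >= rb.min_roll
      and (rb.max_roll is null or :roll <= rb.max_roll)
      and ((:group_key is null and r.critical_group_id is null)
           or g.group_key = :group_key)
"""


def lookup(conn: sqlite3.Connection, critical_type: str, column: str,
           roll: int, group: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (band label, raw cell text) for the request, or None if no cell matches."""
    params = {"critical_type": critical_type, "column_key": column,
              "roll": roll, "group_key": group}
    return conn.execute(LOOKUP_SQL, params).fetchone()


conn = build_demo_db()
print(lookup(conn, "slash", "B", 38))
```

Treating a missing `group` as "match only ungrouped cells" keeps grouped tables from returning several rows for one request.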
## Example return object
This is the shape I would return from a lookup:
```json
{
  "critical_type": "slash",
  "table_name": "Slash Critical Strike Table",
  "group": null,
  "column": {
    "key": "B",
    "label": "B",
    "role": "severity"
  },
  "roll": {
    "input": 38,
    "band": "36-45",
    "min": 36,
    "max": 45
  },
  "description": "Strike foe in shin.",
  "raw_affix_text": "+2H - must_parry",
  "affixes": [
    {
      "effect_code": "direct_hits",
      "value": 2
    }
  ],
  "conditions": [
    {
      "when": "with leg greaves",
      "description": null,
      "raw_affix_text": "+2H - must_parry",
      "affixes": [
        {
          "effect_code": "direct_hits",
          "value": 2
        },
        {
          "effect_code": "must_parry_rounds",
          "value": 1
        }
      ]
    },
    {
      "when": "without leg greaves",
      "description": "You slash open foe's shin.",
      "raw_affix_text": "+2H - bleed",
      "affixes": [
        {
          "effect_code": "direct_hits",
          "value": 2
        },
        {
          "effect_code": "bleed_per_round",
          "value": 1
        }
      ]
    }
  ],
  "raw_text": "Original full cell text as extracted from the PDF",
  "source": {
    "pdf": "Slash.pdf",
    "page": 1,
    "extraction_method": "text"
  }
}
```
## Ingestion notes
Recommended import flow:
1. Create `critical_table`, `critical_group`, `critical_column`, and `critical_roll_band` from each PDF's visible axes.
2. Store each cell in `critical_result.raw_cell_text` exactly as extracted.
3. Parse the symbolic affix line into `critical_effect`.
4. Split explicit conditional branches into `critical_branch`.
5. Gradually enrich prose-derived effects such as death, blindness, paralysis, limb loss, initiative changes, and item breakage.
6. Route image PDFs like `Void.pdf` through OCR before the same parser.

The important design decision is: never throw away the original text. The prose is too irregular to rely on normalized fields alone.
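
The first extraction step of the flow above could be sketched as follows. It assumes `pdftotext` from Poppler is on the `PATH`; the 50-character threshold for spotting image-based PDFs is an arbitrary guess, and `pdftotext_command`, `choose_extraction_method`, and `extract_pdf` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple, Union


def pdftotext_command(pdf_path: Union[str, Path]) -> List[str]:
    """Build the pdftotext call; "-" as output file sends the text to stdout."""
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def choose_extraction_method(extracted_text: str, min_chars: int = 50) -> str:
    """Hypothetical heuristic: almost no text suggests an image-based PDF."""
    visible = "".join(extracted_text.split())
    return "text" if len(visible) >= min_chars else "ocr"


def extract_pdf(pdf_path: Union[str, Path]) -> Tuple[str, str]:
    """Run pdftotext and report which extraction_method the PDF should get."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, text=True, check=True
    )
    return result.stdout, choose_extraction_method(result.stdout)


print(pdftotext_command("Slash.pdf"))
print(choose_extraction_method(""))
```

A PDF flagged as `ocr` would go through OCR (or manual transcription) first and then re-enter the same parser, so `extraction_method` records how the raw text was obtained.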

@@ -0,0 +1,142 @@
-- PostgreSQL-oriented schema for Rolemaster critical tables.
-- It is intentionally hybrid: relational axes + raw text + parsed JSON.
create table critical_table (
    id bigint generated always as identity primary key,
    slug text not null unique,
    display_name text not null,
    family text not null check (family in ('standard', 'variant_column', 'grouped_variant')),
    source_pdf text not null,
    source_page integer not null default 1,
    extraction_method text not null check (extraction_method in ('text', 'ocr', 'manual')),
    notes text,
    created_at timestamptz not null default now()
);

create table critical_group (
    id bigint generated always as identity primary key,
    critical_table_id bigint not null references critical_table(id) on delete cascade,
    group_key text not null,
    label text not null,
    sort_order integer not null,
    unique (critical_table_id, group_key)
);

create table critical_column (
    id bigint generated always as identity primary key,
    critical_table_id bigint not null references critical_table(id) on delete cascade,
    column_key text not null,
    label text not null,
    role text not null default 'severity' check (role in ('severity', 'variant', 'other')),
    sort_order integer not null,
    unique (critical_table_id, column_key)
);

create table critical_roll_band (
    id bigint generated always as identity primary key,
    critical_table_id bigint not null references critical_table(id) on delete cascade,
    label text not null,
    min_roll integer not null,
    max_roll integer,
    sort_order integer not null,
    check (max_roll is null or max_roll >= min_roll),
    unique (critical_table_id, label)
);

create index critical_roll_band_lookup_idx
    on critical_roll_band (critical_table_id, min_roll, max_roll);

create table critical_result (
    id bigint generated always as identity primary key,
    critical_table_id bigint not null references critical_table(id) on delete cascade,
    critical_group_id bigint references critical_group(id) on delete cascade,
    critical_column_id bigint not null references critical_column(id) on delete cascade,
    critical_roll_band_id bigint not null references critical_roll_band(id) on delete cascade,
    raw_cell_text text not null,
    description_text text,
    raw_affix_text text,
    parsed_json jsonb not null default '{}'::jsonb,
    parse_status text not null default 'raw' check (parse_status in ('raw', 'partial', 'parsed', 'verified')),
    source_bbox jsonb,
    created_at timestamptz not null default now()
);

create unique index critical_result_lookup_uidx
    on critical_result (
        critical_table_id,
        coalesce(critical_group_id, 0),
        critical_column_id,
        critical_roll_band_id
    );

create table critical_branch (
    id bigint generated always as identity primary key,
    critical_result_id bigint not null references critical_result(id) on delete cascade,
    branch_kind text not null default 'conditional' check (branch_kind in ('conditional', 'note', 'override')),
    condition_key text,
    condition_text text not null,
    condition_json jsonb not null default '{}'::jsonb,
    raw_text text not null,
    description_text text,
    raw_affix_text text,
    parsed_json jsonb not null default '{}'::jsonb,
    sort_order integer not null default 1
);

create table critical_effect (
    id bigint generated always as identity primary key,
    critical_result_id bigint references critical_result(id) on delete cascade,
    critical_branch_id bigint references critical_branch(id) on delete cascade,
    effect_code text not null,
    target text,
    value_integer integer,
    value_decimal numeric(10, 2),
    duration_rounds integer,
    per_round integer,
    modifier integer,
    body_part text,
    is_permanent boolean not null default false,
    source_type text not null default 'symbol' check (source_type in ('symbol', 'prose', 'manual')),
    source_text text,
    check ((critical_result_id is not null) <> (critical_branch_id is not null))
);

create index critical_effect_lookup_idx
    on critical_effect (effect_code, target);

create index critical_effect_result_idx
    on critical_effect (critical_result_id);

create index critical_effect_branch_idx
    on critical_effect (critical_branch_id);

create index critical_result_parsed_json_gin
    on critical_result using gin (parsed_json);

create index critical_branch_parsed_json_gin
    on critical_branch using gin (parsed_json);

-- Example lookup pattern:
--
-- select
--     t.slug as critical_type,
--     t.display_name as table_name,
--     g.group_key,
--     c.column_key,
--     rb.label as roll_band,
--     rb.min_roll,
--     rb.max_roll,
--     r.description_text,
--     r.raw_affix_text,
--     r.raw_cell_text,
--     r.parsed_json
-- from critical_result r
-- join critical_table t on t.id = r.critical_table_id
-- left join critical_group g on g.id = r.critical_group_id
-- join critical_column c on c.id = r.critical_column_id
-- join critical_roll_band rb on rb.id = r.critical_roll_band_id
-- where t.slug = 'slash'
--   and c.column_key = 'C'
--   and 38 >= rb.min_roll
--   and (rb.max_roll is null or 38 <= rb.max_roll);
--
-- For grouped tables, also filter on the group so that only one row matches,
-- for example:
--   and g.group_key = 'super_large'