# Critical Import Tool

## Purpose

The critical import tool exists to migrate Rolemaster critical-table source PDFs into the SQLite database used by the web app.

The tool is intentionally separate from the web application startup path. Critical data needs to be re-imported repeatedly while the extraction and parsing logic evolves, so the import workflow must be:

- explicit
- repeatable
- debuggable
- able to rebuild importer-managed data without resetting the entire application

The tool currently lives in `src/RolemasterDb.ImportTool` and operates against the same SQLite schema used by the web app.

## Goals

The importer is designed around the following requirements:

- reset and reload critical data without touching unrelated tables
- preserve source fidelity while still producing structured lookup data
- make parsing failures visible before bad data reaches SQLite
- keep intermediate artifacts on disk for inspection
- support iterative parser development one table at a time

## Current Scope

The current implementation supports:

- explicit CLI commands for reset, extraction, and import
- manifest-driven source selection
- `standard` critical tables with columns `A-E`
- `variant_column` critical tables with non-severity columns
- `grouped_variant` critical tables with a group axis plus variant columns
- XML-based extraction using `pdftohtml -xml`
- geometry-based parsing across the currently enabled table set:
  - `arcane-aether`
  - `arcane-nether`
  - `ballistic-shrapnel`
  - `brawling`
  - `cold`
  - `electricity`
  - `grapple`
  - `heat`
  - `impact`
  - `krush`
  - `large_creature_magic`
  - `large_creature_weapon`
  - `ma-strikes`
  - `ma-sweeps`
  - `mana`
  - `puncture`
  - `slash`
  - `subdual`
  - `super_large_creature_weapon`
  - `tiny`
  - `unbalance`
- row-boundary repair for trailing affix leakage
- split row-label reconstruction for tables that render labels such as `99-` / `100` as two fragments
- conditional branch extraction into `critical_branch`
- footer/page-number filtering during body parsing
- transactional loading into SQLite
- conditional branch display through the web critical lookup

The current implementation does not yet support:

- OCR/image-based PDFs such as `Void.pdf`
- automatic confidence scoring beyond validation errors

## High-Level Architecture

The importer workflow is:

1. Resolve a table entry from the manifest.
2. Extract the source PDF into an artifact format.
3. Parse the extracted artifact into an in-memory table model.
4. Write debug artifacts to disk.
5. Validate the parsed result.
6. If validation succeeds, load the parsed data into SQLite in a transaction.

The importer uses the same EF Core context and domain model as the web app, but it owns the critical-data population flow.

## Implementation Phases

## Phase 1: Initial Importer and Text Extraction

Phase 1 established the first end-to-end workflow:

- a dedicated console project
- `CommandLineParser`-based verbs
- a table manifest
- transactional reset/load commands
- a first parser for `Slash.pdf`

### Phase 1 command surface

Phase 1 introduced these verbs:

- `reset criticals`
- `extract <table>`
- `load <table>`
- `import <table>`

### Phase 1 extraction approach

The initial version used `pdftotext -layout` to create a flattened text artifact. The parser then tried to reconstruct:

- column boundaries from the `A-E` header line
- roll-band rows from labels such as `71-75`
- cell contents by slicing monospaced text blocks

### Phase 1 outcome

Phase 1 proved that the import loop and database load path worked, but it also exposed a critical reliability problem: flattened text was not a safe source format for these PDFs.

### Phase 1 failure mode

The first serious regression was seen in `Slash.pdf`:

- lookup target: `slash`, severity `A`, roll `72`
- expected band: `71-75`
- broken result from the text-based parser: content from `76-80` mixed with stray characters from severity `B`

That failure showed the core problem with `pdftotext -layout`: it discards the original page geometry and forces the importer to guess row and column structure from a lossy text layout.

Because of that, phase 1 is important historically, but it is not the recommended foundation for further parser development.

## Phase 2: XML Geometry-Based Parsing

Phase 2 replaced the flattened-text pipeline with a geometry-aware pipeline based on `pdftohtml -xml`.

### Why Phase 2 was necessary

The PDFs are still text-based, but the text needs to be parsed with positional information intact. The XML output produced by `pdftohtml` preserves:

- page number
- `top`
- `left`
- `width`
- `height`
- text content

That positional data makes it possible to assign fragments to rows and columns based on geometry instead of guessing from flattened text lines.
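
The `<text>` elements in `pdftohtml -xml` output carry those attributes directly. A minimal sketch of flattening them into a positioned fragment model (the `TextFragment` name is illustrative, not the importer's actual type):

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class TextFragment:
    page: int
    top: int
    left: int
    width: int
    height: int
    text: str

def load_fragments(xml_source: str) -> list[TextFragment]:
    """Flatten pdftohtml -xml output into positioned text fragments."""
    fragments = []
    root = ET.fromstring(xml_source)
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for node in page.iter("text"):
            # itertext() rejoins text split across inline <b>/<i> children
            content = "".join(node.itertext()).strip()
            if not content:
                continue
            fragments.append(TextFragment(
                page=page_number,
                top=int(node.get("top")),
                left=int(node.get("left")),
                width=int(node.get("width")),
                height=int(node.get("height")),
                text=content,
            ))
    return fragments
```

Everything downstream of extraction works on this flat fragment list rather than on the raw XML tree.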

### Phase 2 extraction format

The importer now extracts to XML instead of plain text:

- extraction tool: `pdftohtml -xml -i -noframes`
- artifact file: `source.xml`

### Phase 2 parser model

The parser now works in these stages:

1. Load all `<text>` fragments from the XML.
2. Detect the standard `A-E` header row.
3. Detect roll-band labels on the left margin.
4. Build row bands from the vertical positions of those roll labels.
5. Build column boundaries from the horizontal centers of the `A-E` header fragments.
6. Assign each text fragment to a row by `top`.
7. Assign each text fragment to a column by horizontal position.
8. Reconstruct each cell from ordered fragments.
9. Split cell content into description lines and affix-like lines.
10. Validate the result before touching SQLite.

### Phase 2 reliability improvement

This phase fixed the original `Slash / A / 72` corruption. The same lookup now resolves to:

- band `71-75`
- description `Blow falls on lower leg. Slash tendons. Poor sucker.`

The important change is not only that the current output is correct, but that the importer now fails fast on structural ambiguity instead of silently loading corrupted rows.

## Phase 2.1: Boundary Hardening After Manual Validation

After phase 2, a manual validation pass compared:

- the rendered `Slash.pdf`
- the extracted `source.xml`
- the imported SQLite rows

That review found a remaining defect around the `51-55` / `56-60` boundary:

- `51-55` lost several affix lines
- `56-60` gained leading affix lines from the previous row

The root cause was the original row segmentation rule:

- rows were assigned strictly by the midpoint between adjacent roll-label `top` values

That rule was too naive for rows whose affix block sits visually near the next row label.

### Phase 2.1 fix

The parser was hardened in two ways:

1. Leading affix leakage repair
   - after the initial row assignment, if a cell in the next row starts with affix-like lines and then continues with prose, those leading affix lines are moved back to the previous row
2. Better affix classification
   - generic digit-starting lines are no longer assumed to be affixes
   - this prevents prose such as `25% chance your weapon is stuck...` from being misclassified

### Phase 2.1 validation rules

The importer now explicitly rejects cells that still look structurally wrong after repair:

- prose and affix segments may not alternate more than once inside a cell

This keeps the phase-2.1 safety goal in place while allowing broader standard-table layouts that render a single affix block either before or after the prose block.

## Phase 3: Broader Table Coverage

Phase 3 expands the manifest and validates the shared `standard` parser across a broader set of `A-E` tables.

The currently enabled phase-3 table set is:

- `arcane-aether`
- `arcane-nether`
- `ballistic-shrapnel`
- `brawling`
- `cold`
- `electricity`
- `grapple`
- `heat`
- `impact`
- `krush`
- `ma-strikes`
- `ma-sweeps`
- `mana`
- `puncture`
- `slash`
- `subdual`
- `tiny`
- `unbalance`

Current phase-3 notes:

- header detection now tolerates minor `top` misalignment across the `A-E` header glyphs
- row boundaries can snap to the last affix-to-prose transition between adjacent roll labels when midpoint slicing would leak into the next row
- affix symbols are learned from the footer legend before body parsing, so symbol-only affix fragments are classified correctly
- affix fragments that cross a column boundary in the XML can be split on hard internal spacing before column assignment, which is required for `Mana.pdf`
- footer page numbers are filtered out before body parsing
- validation allows a single contiguous affix block either before or after prose

## Phase 4: Variant and Grouped Tables

Phase 4 extended the importer beyond `A-E` tables.

The currently enabled phase-4 table set is:

- `large_creature_weapon`
  - `family`: `variant_column`
  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
- `super_large_creature_weapon`
  - `family`: `variant_column`
  - columns: `NORMAL`, `MAGIC`, `MITHRIL`, `HOLY_ARMS`, `SLAYING`
- `large_creature_magic`
  - `family`: `grouped_variant`
  - groups: `large`, `super_large`
  - columns: `NORMAL`, `SLAYING`

Phase-4 notes:

- grouped results now populate `critical_group` during SQLite load
- parser dispatch is family-based instead of standard-table only
- left-margin row labels can be reconstructed from split fragments such as `151-` / `175`
- the grouped magic PDF is imported once as `large_creature_magic`, because `sources/Large Creature - Magic.pdf` and `sources/Super Large Creature - Magic.pdf` are duplicate files

## Phase 5: Conditional Branch Extraction

Phase 5 is complete.

Phase-5 notes:

- branch-heavy cells are split into base result content plus ordered `critical_branch` rows
- branch parsing is shared across `standard`, `variant_column`, and `grouped_variant` table families
- branch conditions are preserved as display text and normalized into condition keys such as `with_leg_greaves`
- branch payloads can contain prose, affix notation, or both
- the importer now upgrades older SQLite files to add the `CriticalBranches` table before load
- the web critical lookup now returns and renders conditional branches alongside the base result
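
Condition keys such as `with_leg_greaves` are derived from the branch's display text. A plausible normalization sketch (the actual rules in `CriticalCellTextParser` may differ; this is illustrative only):

```python
import re

def normalize_condition_key(display_text: str) -> str:
    """Turn branch condition display text into a stable snake_case key."""
    key = display_text.strip().lower()
    key = re.sub(r"[^a-z0-9]+", "_", key)  # runs of non-alphanumerics become single underscores
    return key.strip("_")                  # drop leading/trailing separators, e.g. from a trailing colon
```

For example, `normalize_condition_key("With leg greaves:")` yields `with_leg_greaves`, so the same condition always maps to the same key regardless of punctuation or casing in the source PDF.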

## Phase 6: Effect Normalization

Phase 6 is complete for symbol-driven affixes.

Phase-6 notes:

- footer legends are parsed into table-specific affix metadata before effect normalization
- symbolic affix lines are normalized into `critical_effect` rows for both base results and conditional branches
- the normalized pass currently covers direct hits, must-parry, no-parry, stun, bleed, foe penalties, attacker bonuses, and `Mana` power-point modifiers
- result and branch `parsed_json` payloads now store the normalized symbol effects
- the web critical lookup now returns and renders parsed affix effects alongside the raw affix text
- prose-derived effects remain future work

## Phase 7: OCR and Manual Fallback

Phase 7 is planned future work:

- support image-based PDFs such as `Void.pdf`
- route image-based sources through OCR or curated manual input
- keep the same post-extraction parsing contract where possible

## Current CLI

The tool uses `CommandLineParser` and currently exposes these verbs:

### `reset criticals`

Deletes importer-managed critical data from SQLite.

Use this when:

- you want to clear imported critical data
- you want to rerun a fresh import
- you need to verify the rebuild path from an empty critical-table state

Example:

```powershell
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- reset criticals
```

### `extract <table>`

Resolves a table from the manifest and writes the extraction artifact to disk.

Example:

```powershell
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- extract slash
```

### `load <table>`

Reads the extraction artifact, parses it, writes debug artifacts, validates the result, and loads SQLite if validation succeeds.

Example:

```powershell
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- load slash
```

### `import <table>`

Runs extraction followed by load.

Example:

```powershell
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- import slash
```

## Manifest

The importer manifest is stored at:

- `sources/critical-import-manifest.json`

Each entry declares:

- `slug`
- `displayName`
- `family`
- `extractionMethod`
- `pdfPath`
- `enabled`

The manifest is intentionally the control point for enabling importer support one table at a time.
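
An entry with those fields might look like the following (the field names come from the list above; the values shown are illustrative, not copied from the real manifest):

```json
{
  "slug": "slash",
  "displayName": "Slash",
  "family": "standard",
  "extractionMethod": "xml",
  "pdfPath": "sources/Slash.pdf",
  "enabled": true
}
```

Flipping `enabled` is how a table moves in and out of the active import set without deleting its entry.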

For the currently enabled entries:

- standard tables use `family: standard`
- creature weapon tables use `family: variant_column`
- grouped creature magic uses `family: grouped_variant`
- all enabled entries currently use `extractionMethod: xml`

## Artifact Layout

Artifacts are written under:

- `artifacts/import/critical/<slug>/`

The current artifact set is:

### `source.xml`

The raw XML extraction output from `pdftohtml`.

Use this when:

- checking whether text is present in the PDF
- inspecting original `top` and `left` coordinates
- diagnosing row/column misassignment

### `fragments.json`

A normalized list of parsed text fragments with page and position metadata.

Use this when:

- comparing raw XML to the importer’s internal fragment model
- confirming that specific fragments were loaded correctly
- debugging Unicode or whitespace normalization issues

### `parsed-cells.json`

The reconstructed cells after geometry-based row/column assignment.

Use this when:

- validating a specific row and column
- checking whether a fragment was assigned to the correct cell
- confirming description and affix splitting

### `validation-report.json`

The validation result for the parsed table.

This includes:

- overall validity
- validation errors
- validation warnings
- row count
- cell count

Use this when:

- a `load` command fails
- a parser change introduces ambiguity
- you need to confirm that the importer refused to write SQLite data

## Standard Table Parsing Strategy

The current `standard` parser is designed for tables shaped like `Slash.pdf`:

- columns: `A-E`
- rows: roll bands such as `01-05`, `71-75`, `100`
- cell contents: prose, symbolic affixes, and sometimes conditional branch lines

### Header Detection

The parser searches the XML fragments for a row containing exactly:

- `A`
- `B`
- `C`
- `D`
- `E`

Those positions define the standard-table column anchors.

### Row Detection

The parser searches the left margin below the header for roll-band labels, for example:

- `01-05`
- `66`
- `251+`

Those vertical positions define the row anchors.

### Row Bands

The parser derives each row’s vertical range from the midpoint between adjacent roll-band anchors.

That prevents one row from drifting into the next when text wraps over multiple visual lines.
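
The midpoint rule can be sketched as follows, assuming the roll labels arrive as `(label, top)` pairs sorted by `top` (illustrative code, not the parser's actual implementation):

```python
def build_row_bands(label_tops: list[tuple[str, int]], page_bottom: int) -> list[tuple[str, int, int]]:
    """Each row's vertical range runs from the midpoint with the previous
    label to the midpoint with the next label."""
    bands = []
    for i, (label, top) in enumerate(label_tops):
        # upper edge: midpoint with the previous label (first row keeps its own top)
        upper = top if i == 0 else (label_tops[i - 1][1] + top) // 2
        # lower edge: midpoint with the next label (last row extends to the page bottom)
        lower = page_bottom if i == len(label_tops) - 1 else (top + label_tops[i + 1][1]) // 2
        bands.append((label, upper, lower))
    return bands
```

A fragment whose `top` falls inside `[upper, lower)` belongs to that band; this is exactly the rule that phase 2.1 later had to supplement with boundary repair.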

### Column Assignment

Each text fragment is assigned to the nearest column band based on horizontal center position.

This is the core reliability improvement over the phase-1 text slicing approach.
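
The nearest-center rule amounts to a one-liner once the header anchor centers are known (a sketch; the anchor values here are invented for illustration):

```python
def assign_column(fragment_left: int, fragment_width: int, column_centers: dict[str, int]) -> str:
    """Pick the column whose header center is horizontally closest to the fragment's center."""
    center = fragment_left + fragment_width / 2
    return min(column_centers, key=lambda col: abs(column_centers[col] - center))
```

With centers like `{"A": 100, "B": 200, "C": 300, "D": 400, "E": 500}`, a fragment at `left=180, width=30` has center 195 and lands in `B`. Because the decision uses the fragment's own geometry, a wrapped line never drifts into a neighboring severity column the way sliced text did in phase 1.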

### Line Reconstruction

Fragments inside a cell are grouped into lines by close `top` values and then ordered by `left`.

This produces a stable line list even when PDF text is broken into multiple fragments.
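
The grouping can be sketched like this, assuming a cell's fragments are `(top, left, text)` tuples and a small tolerance decides when two `top` values sit on the same visual line (the tolerance value is illustrative):

```python
def reconstruct_lines(fragments: list[tuple[int, int, str]], tolerance: int = 3) -> list[str]:
    """Group fragments into visual lines by close `top` values, then order each line by `left`."""
    lines: list[list[tuple[int, int, str]]] = []
    for frag in sorted(fragments):  # tuple sort orders by top, then left
        if lines and abs(lines[-1][0][0] - frag[0]) <= tolerance:
            lines[-1].append(frag)  # close enough in `top`: same visual line
        else:
            lines.append([frag])    # otherwise start a new line group
    return [" ".join(text for _, left, text in sorted(group, key=lambda f: f[1]))
            for group in lines]
```

Two fragments rendered one point apart vertically still merge into one line, while a genuinely new line (a larger `top` jump) starts a new group.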

### Boundary Repair

After the initial midpoint-based row assignment, the parser performs a repair step across adjacent rows in the same column.

If the next row begins with affix-like lines and then continues with prose, those leading affix lines are treated as leaked trailing affixes from the previous row and moved back.

This repair exists because some tables place affix lines close enough to the next row label that midpoint-only segmentation is not reliable.
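
The repair can be sketched as a pure function over two adjacent cells' line lists, parameterized by an affix classifier (illustrative only; the real parser works on its own cell model):

```python
from typing import Callable

def repair_boundary(prev_lines: list[str], next_lines: list[str],
                    is_affix_line: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Move leading affix lines of the next cell back to the previous cell,
    but only when prose follows them (an all-affix cell did not leak)."""
    leaked = 0
    while leaked < len(next_lines) and is_affix_line(next_lines[leaked]):
        leaked += 1
    # require prose after the affix run; otherwise leave both cells untouched
    if leaked == 0 or leaked == len(next_lines):
        return prev_lines, next_lines
    return prev_lines + next_lines[:leaked], next_lines[leaked:]
```

The "prose must follow" guard is what keeps the repair from eating a legitimately affix-only cell.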

### Description vs Affix Splitting

The parser classifies lines as:

- description-like prose
- affix-like notation

Affix-like lines include:

- `+...`
- symbolic lines using the critical glyphs
- branch-like affix lines such as `with leg greaves: +2H - ...`

Affix-like classification is intentionally conservative. Numeric prose lines such as `25% chance...` are not treated as affixes unless they match a known affix-like notation pattern.
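
A conservative classifier along these lines might look as follows. The glyph set would come from the parsed footer legend; the symbols and patterns here are illustrative, not the importer's actual rules:

```python
import re

def is_affix_line(line: str, legend_symbols: set[str]) -> bool:
    """Conservative affix classification: require explicit affix notation,
    never just a leading digit."""
    stripped = line.strip()
    if stripped.startswith("+"):
        return True
    if any(symbol in stripped for symbol in legend_symbols):
        return True
    # branch-like affix lines such as "with leg greaves: +2H - ..."
    if re.match(r"^with .+?:\s*\+", stripped, re.IGNORECASE):
        return True
    # everything else, including "25% chance your weapon is stuck...", stays prose
    return False
```

The bias is deliberate: a misclassified affix line triggers a visible validation failure, while prose silently swallowed into affix data would corrupt lookups.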

The current implementation stores:

- base `RawCellText`
- base `DescriptionText`
- base `RawAffixText`
- normalized base affix effects in `critical_effect`
- parsed conditional branches with condition text, branch prose, branch affix text, and normalized branch affix effects
- parsed conditional branches in debug artifacts and persisted SQLite rows

## Validation Rules

The current validation pass is intentionally strict.

At minimum, a valid `standard` table must satisfy:

- a detectable `A-E` header row exists
- roll-band labels are found
- each detected row produces content for all five columns
- total parsed cell count matches `row_count * 5`
- prose and affix segments do not alternate more than once inside a cell: a single contiguous affix block may sit before or after the prose block, but never interleave with it
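
The prose/affix structure check introduced in phase 2.1 can be sketched over a cell's classified line types (illustrative, assuming each line is already tagged `"prose"` or `"affix"`):

```python
def alternates_too_often(line_types: list[str]) -> bool:
    """True when prose/affix segments alternate more than once inside a cell.
    A valid cell has at most two contiguous segments, e.g. prose then affix."""
    segments = []
    for kind in line_types:  # each entry is "prose" or "affix"
        if not segments or segments[-1] != kind:
            segments.append(kind)  # start of a new contiguous segment
    return len(segments) > 2
```

`["prose", "affix"]` and `["affix", "prose"]` pass; `["prose", "affix", "prose"]` fails, which is exactly the shape produced by a leaked row boundary.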

If validation fails:

- artifacts are still written
- SQLite load is aborted
- the command returns an error

If validation succeeds with warnings:

- artifacts still record the warnings
- SQLite load continues
- the CLI prints each warning before reporting the successful load

This design is deliberate. It is safer to reject ambiguous extraction than to load a nearly-correct but wrong lookup table.

## Database Load Behavior

The loader is transactional.

The current load path:

1. ensures the SQLite database exists
2. upgrades older SQLite files to the current importer-owned critical schema where needed
3. deletes the existing subtree for the targeted critical table
4. inserts:
   - `critical_table`
   - `critical_column`
   - `critical_roll_band`
   - `critical_result`
   - `critical_branch`
   - `critical_effect`
5. commits only after the full table is saved

This means importer iterations can target one table without resetting unrelated database content.
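
The real loader goes through EF Core, but the delete-then-insert-then-commit shape is the same as this minimal `sqlite3` sketch (the schema here is abbreviated to a single table for illustration):

```python
import sqlite3

def load_table(db_path: str, slug: str, rows: list[tuple[str, str]]) -> None:
    """Replace one critical table's subtree inside a single transaction."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("CREATE TABLE IF NOT EXISTS critical_result (slug TEXT, band TEXT, text TEXT)")
            # delete only the targeted table's subtree, leaving other slugs untouched
            conn.execute("DELETE FROM critical_result WHERE slug = ?", (slug,))
            conn.executemany(
                "INSERT INTO critical_result (slug, band, text) VALUES (?, ?, ?)",
                [(slug, band, text) for band, text in rows],
            )
    finally:
        conn.close()
```

Because nothing commits until the full insert set succeeds, a mid-load failure leaves the previously imported data for that table intact.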

## Interaction With Web App Startup

The web application no longer auto-seeds critical starter data on startup.

Startup still ensures the database exists and seeds attack starter data, but critical-table population is now owned by the importer.

This separation is important because:

- importer iterations are frequent
- parser logic is still evolving
- startup should not silently repopulate critical data behind the tool’s back

## Current Code Map

Important files in the current implementation:

- `src/RolemasterDb.ImportTool/Program.cs`
  - CLI entry point
- `src/RolemasterDb.ImportTool/CriticalImportCommandRunner.cs`
  - command orchestration
- `src/RolemasterDb.ImportTool/CriticalImportLoader.cs`
  - transactional SQLite load/reset behavior
- `src/RolemasterDb.ImportTool/Parsing/CriticalCellTextParser.cs`
  - shared base-vs-branch parsing for cell content and affix extraction
- `src/RolemasterDb.ImportTool/Parsing/AffixEffectParser.cs`
  - footer-legend-aware symbol effect normalization
- `src/RolemasterDb.ImportTool/Parsing/AffixLegend.cs`
  - parsed footer legend model used for affix classification and effect mapping
- `src/RolemasterDb.ImportTool/CriticalImportManifestLoader.cs`
  - manifest loading
- `src/RolemasterDb.ImportTool/PdfXmlExtractor.cs`
  - XML extraction via `pdftohtml`
- `src/RolemasterDb.ImportTool/ImportArtifactWriter.cs`
  - artifact output
- `src/RolemasterDb.ImportTool/Parsing/StandardCriticalTableParser.cs`
  - standard table geometry parser
- `src/RolemasterDb.ImportTool/Parsing/XmlTextFragment.cs`
  - positioned text fragment model
- `src/RolemasterDb.ImportTool/Parsing/ParsedCriticalCellArtifact.cs`
  - debug cell artifact model
- `src/RolemasterDb.ImportTool/Parsing/ParsedCriticalBranch.cs`
  - parsed branch artifact model with normalized effects
- `src/RolemasterDb.ImportTool/Parsing/ParsedCriticalEffect.cs`
  - parsed effect artifact model
- `src/RolemasterDb.ImportTool/Parsing/ImportValidationReport.cs`
  - validation output model
- `src/RolemasterDb.App/Data/RolemasterDbSchemaUpgrader.cs`
  - SQLite upgrade hook for branch/effect-table rollout
- `src/RolemasterDb.App/Components/Shared/CriticalLookupResultCard.razor`
  - web rendering of base results, conditional branches, and parsed affix effects

## Adding a New Table

The recommended process for onboarding a new table is:

1. Add a manifest entry.
2. Run `extract <slug>`.
3. Inspect `source.xml`.
4. Run `load <slug>`.
5. Inspect `validation-report.json` and `parsed-cells.json`.
6. If validation succeeds, spot-check SQLite output.
7. If validation fails, adjust the parser or add a family-specific parser strategy before retrying.

## Debugging Guidance

If a table imports incorrectly, inspect artifacts in this order:

1. `validation-report.json`
2. `parsed-cells.json`
3. `fragments.json`
4. `source.xml`

That order usually answers the key questions fastest:

- did validation fail?
- which row/column is wrong?
- were fragments assigned incorrectly?
- or was the extraction itself already malformed?

## Reliability Position

The current importer should be understood as:

- reliable enough for geometry-based `standard` table iteration
- much safer than the old flattened-text approach
- still evolving toward broader family coverage and deeper normalization

The key design rule going forward is:

- do not silently load ambiguous data

The importer should always prefer:

- preserving source fidelity
- writing review artifacts
- failing validation

over:

- guessing
- auto-correcting without evidence
- loading nearly-correct but structurally wrong critical results