Add high-res critical image refresh import

2026-03-18 00:44:58 +01:00
parent 30fd257ea5
commit 8cbcf66695
10 changed files with 183 additions and 18 deletions


@@ -33,7 +33,7 @@ The current implementation supports:
- `variant_column` critical tables with non-severity columns
- `grouped_variant` critical tables with a group axis plus variant columns
- XML-based extraction using `pdftohtml -xml`
- XML-aligned page rendering and per-cell PNG crops using `pdftoppm -png -r 432`
- geometry-based parsing across the currently enabled table set:
  - `arcane-aether`
  - `arcane-nether`
@@ -359,6 +359,22 @@ Example:
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- import slash
```
### `reimport-images <table>`
Reuses `source.xml`, regenerates page PNGs and cell PNGs, rewrites the JSON artifacts, and refreshes only source-image metadata in SQLite.
Use this when:
- crop resolution or render settings changed
- you want better source images without reloading result text
- you want to keep curated and uncurated content untouched while refreshing artifacts
Example:
```powershell
dotnet run --project .\src\RolemasterDb.ImportTool\RolemasterDb.ImportTool.csproj -- reimport-images slash
```
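The key property of `reimport-images` is that only image metadata changes in SQLite. Below is a minimal Python sketch of that idea using the standard `sqlite3` module. The `critical_result` table, its columns and the `refresh_image_metadata` helper are hypothetical, because the importer's real schema is not described in this README:

```python
import json
import sqlite3

# Hypothetical schema for illustration only; the importer's real tables are not shown here.
SCHEMA = """
CREATE TABLE critical_result (
    table_slug  TEXT NOT NULL,
    cell_key    TEXT NOT NULL,
    result_text TEXT NOT NULL,
    is_curated  INTEGER NOT NULL DEFAULT 0,
    image_path  TEXT,
    image_crop  TEXT,
    PRIMARY KEY (table_slug, cell_key)
);
"""


def refresh_image_metadata(conn: sqlite3.Connection, table_slug: str, crops: dict) -> int:
    """Update only the image columns of existing rows for one table.

    `crops` maps a cell key to (png_path, (x, y, width, height)) in the scaled
    PNG coordinate space. result_text and is_curated are never written, and
    cells without an existing row are skipped rather than inserted.
    Returns the number of rows updated.
    """
    updated = 0
    with conn:  # commit all updates together, or roll back on error
        for cell_key, (png_path, box) in crops.items():
            cur = conn.execute(
                "UPDATE critical_result SET image_path = ?, image_crop = ? "
                "WHERE table_slug = ? AND cell_key = ?",
                (png_path, json.dumps(list(box)), table_slug, cell_key),
            )
            updated += cur.rowcount
    return updated
```

Because the sketch only issues `UPDATE` statements, curated result text and curation flags keep their stored values, and a crop for a cell with no existing row is ignored.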
## Manifest
The importer manifest is stored at:
@@ -433,7 +449,7 @@ Each parsed cell now includes:
### `pages/page-001.png`
Rendered PDF page images at `432 DPI`. The XML coordinate space emitted by `pdftohtml -xml` corresponds to `108 DPI`, and the importer applies a central render scale factor of `4` on top of it (`108 × 4 = 432`).
Use this when:
@@ -607,10 +623,14 @@ The importer now uses two Poppler tools:
- `pdftohtml -xml -i -noframes`
  - extracts geometry-aware XML text
- `pdftoppm -png -r 432`
  - renders page PNGs and per-cell crop PNGs
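The Poppler calls can be assembled as plain argument lists. The sketch below is in Python, not the importer's .NET code. The helper names, the use of `-f`/`-l` to select a single page and the use of `-singlefile` to drop the page-number suffix are all assumptions. Those three flags are standard `pdftoppm` options, but this README does not say the importer uses them:

```python
import subprocess
from pathlib import Path

RENDER_DPI = 432  # matches `pdftoppm -png -r 432`


def pdftohtml_xml_args(pdf: Path, out_name: Path) -> list[str]:
    """Arguments for geometry-aware XML extraction with the flags listed above."""
    return ["pdftohtml", "-xml", "-i", "-noframes", str(pdf), str(out_name)]


def pdftoppm_page_args(pdf: Path, page: int, out_root: Path, dpi: int = RENDER_DPI) -> list[str]:
    """Arguments that render one page to `<out_root>.png` at `dpi`."""
    return [
        "pdftoppm", "-png", "-r", str(dpi),
        "-f", str(page), "-l", str(page),  # first and last page: only this page
        "-singlefile",                      # no page-number suffix on the output name
        str(pdf), str(out_root),
    ]


def run_tool(args: list[str]) -> None:
    """Run a Poppler command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)
```

Keeping the DPI as a parameter means a future change to the render scale only touches one constant.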
The XML coordinate space corresponds to a `108 DPI` render. For the current PDFs and Poppler output, page images rendered at `108 DPI` have pixel dimensions that match the XML `page width` and `page height`. The importer keeps a central render scale factor of `4` on top of that. The XML still defines bounds in its original coordinate space, but rendered PNGs and stored crop metadata use the scaled coordinate space and a `432 DPI` render setting. In practice:
- XML coordinates are multiplied by `4` before crop extraction
- page and crop metadata stored with each result reflect the scaled PNG coordinate space
- crop alignment remains deterministic without changing the parsing pipeline
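The coordinate mapping can be sketched in Python as follows. The `108 DPI` base and the factor of `4` come from this section. The floor/ceil rounding of fractional bounds and the helper names are assumptions, not necessarily what the importer does. The attributes read from each `<page>` element (`number`, `width`, `height`) are the ones `pdftohtml -xml` emits:

```python
import math
import xml.etree.ElementTree as ET

RENDER_SCALE = 4                            # central render scale factor
XML_SPACE_DPI = 108                         # DPI whose PNG size matches the XML page width/height
RENDER_DPI = XML_SPACE_DPI * RENDER_SCALE   # 432


def scale_box(left, top, width, height, scale=RENDER_SCALE):
    """Map an XML-space box to an integer (x, y, w, h) crop in the scaled PNG space.

    Left/top edges are floored and right/bottom edges ceiled (an assumed policy),
    so fractional XML bounds never clip the cell.
    """
    x0 = math.floor(left * scale)
    y0 = math.floor(top * scale)
    x1 = math.ceil((left + width) * scale)
    y1 = math.ceil((top + height) * scale)
    return x0, y0, x1 - x0, y1 - y0


def expected_page_sizes(xml_text, scale=RENDER_SCALE):
    """Return {page number: (png width, png height)} for each <page> in pdftohtml XML."""
    root = ET.fromstring(xml_text)
    return {
        int(page.get("number")): (
            round(float(page.get("width")) * scale),
            round(float(page.get("height")) * scale),
        )
        for page in root.iter("page")
    }
```

For example, an XML cell at `left=57, top=120, width=90.5, height=14` maps to the pixel crop `(228, 480, 362, 56)`. A page declared as `918 × 1188` in the XML should render to a `3672 × 4752` PNG at `432 DPI`.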
## Interaction With Web App Startup