removed stale doc
419
POSTMORTEM.md
@@ -1,419 +0,0 @@
# POSTMORTEM

## Executive Summary

RpgRoller failed in Firefox with RoboForm enabled because the authenticated workspace was built as a highly reactive Blazor Server surface that performs several structural rerenders immediately after login while assuming stable ownership of the rendered DOM.

RoboForm was the trigger, not the root cause.

The root cause was architectural:

- the app mixed static HTML auth, interactive Blazor Server UI, browser-managed session state, JavaScript fetch calls, and SSE live updates into one startup path
- the authenticated workspace bootstrap was driven from `OnAfterRenderAsync` and intentionally caused follow-up renders
- the root shell in [App.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/App.razor) still branches on request-time `HttpContext` and session-cookie state
- the workspace render tree is large, form-heavy, and sensitive to DOM mutation during early batches

That combination made the app fragile under browser extensions that legitimately modify login and form-related DOM.

## Incident Summary

### User-visible symptoms

- Firefox in a normal profile crashed the Blazor circuit immediately after or around login
- the browser console reported:
  - `Error: There was an error applying batch ...`
  - `TypeError: can't access property "insertBefore", n.parentNode is null`
- the server logged `RemoteRenderer[100]` and terminated the circuit
- the UI often degraded into `Loading user...`, `Offline fallback`, or a partially rendered play screen before failing

### Scope

- The failure reproduced only in a normal Firefox profile with RoboForm enabled
- The failure did not reproduce in a private Firefox window
- The failure did not reproduce after disabling RoboForm
- The failure was not tied to a specific username or database row

### Trigger vs. Root Cause

Trigger:

- RoboForm mutated form-related DOM in the page

Root cause:

- the app architecture depended on Blazor Server retaining stable ownership of a DOM subtree that was undergoing immediate, multi-batch structural changes during startup

## Current Architecture

### Root shell

The application entry point is [Program.cs](/home/frank/Code/RpgRoller/RpgRoller/Program.cs):

- `AddRazorComponents().AddInteractiveServerComponents()`
- `MapRazorComponents<App>().AddInteractiveServerRenderMode()`
- `AddScoped<RpgRollerApiClient>()`
- `AddScoped<WorkspaceQueryService>()`

The root component is [App.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/App.razor). It does two different things at `/`:

- if the incoming request has no valid session cookie, it renders [StaticAuthPage.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/HomeControls/StaticAuthPage.razor) as plain HTML
- otherwise it boots the Blazor app with `<Routes @rendermode="InteractiveServer(prerender: false)" />`

This decision is made from request-time state:

- `HttpContext`
- request path
- session cookie
- `IGameService.GetUserBySession`

### Auth flow

The auth page is no longer a Blazor form.

[StaticAuthPage.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/HomeControls/StaticAuthPage.razor) renders plain HTML forms, and [rpgroller-api.js](/home/frank/Code/RpgRoller/RpgRoller/wwwroot/js/rpgroller-api.js) binds submit handlers that:

- validate in JS
- call `/api/auth/register` or `/api/auth/login` via `fetch`
- on login, force a full `window.location.assign("/")`

This means the app has one browser experience before login and a different ownership model after login.
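
The login half of that handler flow can be sketched roughly as follows. This is a minimal illustration, not the code in `rpgroller-api.js`: the function names `validateCredentials` and `submitLogin`, the validation rules, the JSON request body, and the injected `fetchImpl` and `navigate` parameters are all assumptions, chosen so the sketch can run outside a browser. Only the `/api/auth/login` endpoint and the full navigation to `/` come from the description above.

```javascript
// Hypothetical validation rules; the real checks in rpgroller-api.js may differ.
function validateCredentials({ username, password }) {
  const errors = [];
  if (typeof username !== "string" || username.trim() === "") {
    errors.push("Username is required.");
  }
  if (typeof password !== "string" || password === "") {
    errors.push("Password is required.");
  }
  return errors;
}

// fetchImpl and navigate are injected so the flow can be exercised without a browser.
async function submitLogin(credentials, { fetchImpl, navigate }) {
  const errors = validateCredentials(credentials);
  if (errors.length > 0) {
    return { ok: false, errors }; // invalid input never reaches the network
  }

  const response = await fetchImpl("/api/auth/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed body format
    credentials: "same-origin", // assumed: the session cookie is set by this response
    body: JSON.stringify(credentials),
  });

  if (!response.ok) {
    return { ok: false, errors: [`Login failed with status ${response.status}.`] };
  }

  // Full page load: the next request to "/" is answered by the interactive app.
  navigate("/");
  return { ok: true, errors: [] };
}
```

In a browser, the dependencies would be wired as `submitLogin(creds, { fetchImpl: (url, init) => fetch(url, init), navigate: (url) => window.location.assign(url) })`. The full navigation is what hands the page from the static auth HTML to the interactive app.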

### Authenticated workspace

The authenticated `/` route is [Home.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/Home.razor), which only renders `<Workspace LoggedOut="OnLoggedOutAsync" />`.

The real composition root is [Workspace.razor.cs](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/Workspace.razor.cs). It wires:

- `WorkspaceSessionCoordinator`
- `WorkspaceCampaignScopeCoordinator`
- `WorkspaceCampaignCoordinator`
- `WorkspacePlayCoordinator`
- `WorkspaceAdminCoordinator`
- `WorkspaceLiveStateController`
- `WorkspaceFeedbackService`

The current startup path is driven by `OnAfterRenderAsync`:

1. first interactive render occurs
2. `Session.InitializeAsync()` runs
3. `StateHasChanged()` is invoked
4. later renders enable more controls such as the character controls and custom roll composer

### State channels

The workspace state is not owned by one subsystem. It is spread across four channels:

1. Blazor component state in [WorkspaceState.cs](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/WorkspaceState.cs)
2. browser `sessionStorage` via `rpgroller-api.js`
3. browser `fetch` requests through [RpgRollerApiClient.cs](/home/frank/Code/RpgRoller/RpgRoller/Components/RpgRollerApiClient.cs) and [WorkspaceQueryService.cs](/home/frank/Code/RpgRoller/RpgRoller/Components/WorkspaceQueryService.cs)
4. SSE live updates from `/api/events/state` via [StateEventEndpoints.cs](/home/frank/Code/RpgRoller/RpgRoller/Api/StateEventEndpoints.cs)

### Backend state model

The backend is not stateless HTTP over a database. [GameService.cs](/home/frank/Code/RpgRoller/RpgRoller/Services/GameService.cs) builds an in-memory runtime state from SQLite at startup using [GameStateStore.cs](/home/frank/Code/RpgRoller/RpgRoller/Services/GameStateStore.cs), then serves reads and writes against that state.

This matters because the frontend already has multiple live state concepts:

- the Blazor circuit
- the JS app state
- the SSE stream
- the server-side runtime store

That complexity is manageable only if the UI ownership boundaries stay clean. They did not.

## Architectural Timeline

### February 25, 2026

Blazor was introduced as the frontend host:

- `a8ee637` `Scaffold Blazor frontend host and root components`
- `35c60c4` `Replace frontend with Blazor UX implementation`

This established Blazor Server as the UI owner.

### February 26 to April 5, 2026

The workspace became denser and more interactive:

- `c3aa0d4` `Overhaul workspace UX for denser play workflow`
- `bf3a6fa` `Persist roll visibility preference across workspace reloads`
- `54aabc6` `Unify play management and admin screens in workspace`
- `6ea91ee` `Add targeted workspace live refresh`
- `e42c0fb` `Load campaign logs incrementally`
- `9e6e6fe` `Add custom campaign roll composer`
- `4af1c87` through `b291d05` extracted coordinators and simplified the composition root

This refactor improved code organization, but it also increased the number of reactive moving parts:

- more persistent UI state in `sessionStorage`
- more conditional screen branching inside one workspace root
- more input-heavy controls on the default play screen
- a live SSE side channel that can trigger refreshes after startup

### May 2 to May 4, 2026

This period contains direct evidence of mitigation attempts after the Firefox failure surfaced:

- `2d2ed56` `Isolate anonymous auth page from Blazor`
- `1f19bf7` `Restore workspace prerender and auth errors`
- `231b0ac` `Remove workspace session-token coupling`
- `da81358` `Delay workspace render until session init completes`
- `e0b7d27` `Stage workspace controls after bootstrap`

These commits are valuable evidence because they show the app was being repaired at the symptom boundary:

- first by removing auth from Blazor ownership
- then by changing prerender behavior
- then by removing `HttpContext`-captured session coupling from workspace queries
- then by staging workspace startup

None of those changes removed the underlying architectural fragility: a Blazor Server workspace that still reshapes a large, extension-visible DOM over several early render batches.

## Root Causes

### 1. DOM ownership was not treated as a hard architectural boundary

The app used Blazor Server for a form-heavy authenticated workspace, while also operating in a browser environment where password managers are expected to inject or wrap form controls.

In principle that can work, but only when the rendered DOM is stable enough that third-party mutation does not race against structural rerenders.

RpgRoller violated that assumption:

- immediate post-login renders reshaped the workspace
- input-bearing controls were mounted during startup
- later state syncs continued changing the same subtree

That made the DOM ownership contract weak.

### 2. Startup was centered on `OnAfterRenderAsync` instead of a stable initial model

[Workspace.razor.cs](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/Workspace.razor.cs) drives initialization from `OnAfterRenderAsync`, then explicitly schedules more rerenders.

That has two consequences:

- the first visible authenticated frame is not the final intended frame
- the renderer must apply several batches while the browser is already free to run extensions against the DOM

This is a poor fit for DOM-mutating extensions.

### 3. The root shell still depends on request-time `HttpContext`

[App.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/App.razor) still uses `HttpContext` and the session cookie to decide whether to render:

- static auth HTML
- or the interactive app

Even after removing the old workspace session-token accessor, the root shell still relies on request-only state to choose the subtree for `/`.

This is a fragile architectural seam because the app is half request-rendered and half interactive, with the split encoded in the component tree itself.

### 4. Too many reactive state channels were active during the same startup window

During or shortly after login, the workspace may react to:

- `sessionStorage` reads
- API reads through `fetch`
- Blazor rerenders from `StateHasChanged`
- SSE connection state transitions
- SSE state events

That is too much coordination for a large render tree if DOM stability is required.

### 5. The workspace root remained too large and too structurally dynamic

[Workspace.razor](/home/frank/Code/RpgRoller/RpgRoller/Components/Pages/Workspace.razor) controls:

- header
- play screen
- campaign management
- admin screen
- toasts
- character modals
- Rolemaster modal

The play screen itself contains multiple conditional branches and input surfaces.

Even though code-behind files and coordinators improved organization, the rendered root still has a large rerender blast radius.

### 6. Documentation drift hid the architecture change

The current [README.md](/home/frank/Code/RpgRoller/README.md) still describes:

- `Home.razor` as a gateway that switches between loading, auth, and authenticated workspace views
- `WorkspaceQueryService` as “server-side read model access”

Neither description matches the current code:

- `Home.razor` now just renders `Workspace`
- `App.razor` became the real gateway
- `WorkspaceQueryService` now calls browser `fetch` through `RpgRollerApiClient`

This kind of drift usually means the architecture has changed faster than its design has been reevaluated.

## Why the Failure Was Hard to Fix Incrementally

The failure was not caused by one broken line. It emerged from several acceptable local decisions that interacted badly:

- static auth was added to avoid Blazor auth-page failures
- workspace prerender behavior was changed to satisfy session bootstrap
- direct server-side workspace reads were removed to avoid `HttpContext` coupling
- render staging was added to reduce early DOM churn

Each change improved one seam while leaving the overall architecture intact.

That is why the issue kept moving:

- first the auth page crashed
- then login worked but workspace bootstrap stalled
- then the play view partially rendered but later crashed

The architecture allowed the failure to migrate between phases instead of disappearing.

## Evidence From Recent Fix Attempts

### `2d2ed56` on May 2, 2026

`Isolate anonymous auth page from Blazor`

What it changed:

- added `StaticAuthPage.razor`
- moved login/register handling into `rpgroller-api.js`
- changed `App.razor` to serve static auth HTML when unauthenticated

What it revealed:

- removing Blazor from the auth page improved the anonymous path
- the underlying crash still existed in the authenticated workspace path

### `1f19bf7` on May 2, 2026

`Restore workspace prerender and auth errors`

What it changed:

- adjusted render mode behavior again
- kept better auth error reporting

What it revealed:

- workspace startup was still entangled with earlier render-mode decisions
- the app was using render-mode changes as a corrective mechanism rather than as a stable architecture choice

### `231b0ac` on May 3, 2026

`Remove workspace session-token coupling`

What it changed:

- deleted `WorkspaceSessionTokenAccessor`
- changed `WorkspaceQueryService` to use API calls instead of direct service access

What it revealed:

- the previous architecture had leaked request-time session access into interactive workspace startup
- removing that coupling was necessary, but not sufficient, because the DOM ownership problem remained

### `da81358` on May 4, 2026

`Delay workspace render until session init completes`

What it changed:

- replaced the early workspace UI with a loading shell

What it revealed:

- broad render suppression was too blunt
- it masked, rather than removed, the actual failing rerender path

### `e0b7d27` on May 4, 2026

`Stage workspace controls after bootstrap`

What it changed:

- restored the base workspace
- deferred some input-heavy controls to later batches

What it revealed:

- the crash moved later in startup
- the base play view could survive, but later structural updates still failed

Taken together, these commits are evidence that the app was being pushed toward compatibility through localized mitigations, while the larger architecture still tolerated unstable startup ownership.

## What Actually Failed

The practical failure mode was:

1. login succeeded
2. the authenticated workspace circuit started
3. early render batches built or reshaped a large DOM subtree
4. RoboForm touched form-related DOM inside that subtree
5. Blazor attempted to apply a later batch using DOM assumptions that were no longer true
6. the browser-side batch apply failed with `insertBefore ... parentNode is null`
7. the server terminated the circuit
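
Steps 3 through 6 can be modeled with a small stand-in for the DOM. This is an illustrative sketch only: `FakeNode` is a hypothetical class rather than a browser or Blazor API, and this postmortem does not record exactly what RoboForm did to the page. The sketch assumes the extension detached a node that the renderer had kept a reference to, which is consistent with the `parentNode is null` error.

```javascript
// Minimal stand-in for DOM nodes: parent/child bookkeeping only, no browser APIs.
class FakeNode {
  constructor(name) {
    this.name = name;
    this.parentNode = null;
    this.children = [];
  }

  appendChild(child) {
    if (child.parentNode) child.parentNode.removeChild(child);
    child.parentNode = this;
    this.children.push(child);
    return child;
  }

  removeChild(child) {
    const index = this.children.indexOf(child);
    if (index === -1) throw new Error("not a child of this node");
    this.children.splice(index, 1);
    child.parentNode = null;
    return child;
  }

  insertBefore(newChild, referenceChild) {
    if (newChild.parentNode) newChild.parentNode.removeChild(newChild);
    const index = this.children.indexOf(referenceChild);
    if (index === -1) throw new Error("reference node is not a child of this node");
    newChild.parentNode = this;
    this.children.splice(index, 0, newChild);
    return newChild;
  }
}

// Early batch: the renderer mounts an input and keeps a reference to it.
const form = new FakeNode("form");
const cachedAnchor = form.appendChild(new FakeNode("input"));

// Assumed third-party behaviour: the extension detaches the input, for example
// while swapping in its own decorated copy.
form.removeChild(cachedAnchor);

// Later batch: the renderer inserts a sibling relative to its cached reference.
function applyLaterBatch(anchor) {
  const hint = new FakeNode("span");
  return anchor.parentNode.insertBefore(hint, anchor); // parentNode is now null
}

let batchError = null;
try {
  applyLaterBatch(cachedAnchor);
} catch (error) {
  batchError = error;
}

console.log(batchError instanceof TypeError); // true: the batch fails on the null parentNode
```

The point of the model is that neither party does anything unusual in isolation. The failure only appears when a cached reference from one batch is used after another actor has changed the tree.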

The `Offline fallback` label was mostly a consequence:

- once the circuit failed, live-state coordination could not complete cleanly
- the connection-state UI then reflected that degraded state

## Findings

### Primary finding

The authenticated workspace should not have been architected as a multi-batch, structurally dynamic, form-heavy startup surface if compatibility with password managers and other DOM-mutating extensions is a requirement.

### Secondary findings

- `App.razor` became a hidden architecture boundary without being treated as one
- the workspace composition root is still too structurally broad
- frontend ownership is split between Blazor and handwritten JS in a way that complicates startup reasoning
- live updates were added as another reactivity source before the UI ownership model was made robust
- documentation no longer described the actual architecture, making corrective design work harder

## Remediation Directions

These are architectural directions, not the implementation plan.

### 1. Choose a single ownership model for the authenticated shell

The authenticated shell should not be partly “request-decided” and partly “interactive-decided” in a component tree that still relies on request-time state.

### 2. Stop using `OnAfterRenderAsync` as the main workspace bootstrap orchestrator

The authenticated workspace needs a stable initial render contract with fewer structural follow-up diffs.
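
The difference between the two startup shapes can be shown with a small model. This is a language-neutral sketch written in JavaScript, not Blazor code: `createRecorder`, `stagedStartup`, `stableStartup`, and the tree fields are hypothetical names, and the session lookup is reduced to a synchronous callback for brevity. It only counts how many structurally different frames each approach emits.

```javascript
// Records every structural render so the two startup shapes can be compared.
function createRecorder() {
  const batches = [];
  return { render: (tree) => batches.push(JSON.stringify(tree)), batches };
}

const ALL_CONTROLS = ["characterControls", "customRollComposer"];

// Shape described in this postmortem: render first, initialize afterwards,
// then reveal more controls in follow-up renders.
function stagedStartup(render, initializeSession) {
  render({ shell: true, user: null, controls: [] });
  const user = initializeSession();
  render({ shell: true, user, controls: [] });
  render({ shell: true, user, controls: ALL_CONTROLS });
}

// Alternative: resolve the model first, then emit one structurally complete frame.
function stableStartup(render, initializeSession) {
  const user = initializeSession();
  render({ shell: true, user, controls: ALL_CONTROLS });
}

const staged = createRecorder();
stagedStartup(staged.render, () => "demo-user");

const stable = createRecorder();
stableStartup(stable.render, () => "demo-user");

console.log(staged.batches.length, stable.batches.length); // 3 1
```

Both shapes end on the same final frame. The staged shape simply exposes two intermediate frames, and each of those is a window in which third-party DOM mutation can race the next batch.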

### 3. Reduce startup-state multiplicity

Until the UI is stable, the startup path should not require simultaneous coordination across:

- Blazor state
- `sessionStorage`
- `fetch`
- SSE
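
One way to reduce that multiplicity is to merge the startup inputs into a single snapshot and hold live events back until the first stable frame exists. The sketch below is an assumption-laden illustration, not a description of RpgRoller code: `LiveUpdateGate`, `buildStartupSnapshot`, the event type strings, and the snapshot fields are all hypothetical.

```javascript
// Buffers live events until the first stable frame has been rendered.
class LiveUpdateGate {
  constructor(applyEvent) {
    this.applyEvent = applyEvent;
    this.isOpen = false;
    this.pending = [];
  }

  push(event) {
    if (this.isOpen) {
      this.applyEvent(event);
    } else {
      this.pending.push(event); // held back during startup
    }
  }

  open() {
    this.isOpen = true;
    // Replay buffered events in arrival order, then forget the buffer.
    for (const event of this.pending.splice(0)) {
      this.applyEvent(event);
    }
  }
}

// Merges the startup inputs into one frozen snapshot for the first render.
function buildStartupSnapshot({ storedPreferences, apiState }) {
  return Object.freeze({ ...apiState, preferences: { ...storedPreferences } });
}

const applied = [];
const gate = new LiveUpdateGate((event) => applied.push(event.type));

gate.push({ type: "campaign-updated" }); // arrives before the UI is stable

const snapshot = buildStartupSnapshot({
  storedPreferences: { rollVisibility: "public" }, // e.g. read from sessionStorage
  apiState: { user: "demo-user", campaigns: [] }, // e.g. loaded via fetch
});
// The first stable render from `snapshot` would happen here.

gate.open();
gate.push({ type: "roll-added" }); // applied immediately once the gate is open

console.log(applied); // [ 'campaign-updated', 'roll-added' ]
```

With this shape, the first render depends on one value, and the SSE channel cannot trigger a structural change until the application explicitly opens the gate.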

### 4. Shrink the render blast radius

The workspace root should own less structural branching. The more isolated the screen and control subtrees are, the less likely a third-party DOM mutation is to invalidate a broad diff.

### 5. Treat extension compatibility as a design requirement

Password managers are not an edge case for login and form-driven applications. The UI architecture must assume that form controls can be wrapped, annotated, or moved by browser software.

### 6. Realign documentation with the code

The design notes and README need to describe the actual architecture before a stable fix plan is made. Otherwise future changes will continue to optimize around an outdated mental model.

## Conclusion

RpgRoller did not fail because RoboForm existed. It failed because the app’s frontend architecture evolved into a shape where the authenticated workspace depended on fragile early render batches, mixed ownership boundaries, and multiple overlapping state channels.

RoboForm exposed that weakness reliably.

The correct next step is not another isolated workaround. The correct next step is to redesign the authenticated shell and workspace startup path around stable DOM ownership and simpler state flow.