Files
GameList/REVIEW.md

166 lines
7.5 KiB
Markdown

# Review - Pick'n'Play (State-of-the-Art Assessment)
Date: 2026-02-18
## Scope
- Evaluated backend (`Program.cs`, `Endpoints/*`, `Infrastructure/*`, `Data/*`), frontend (`wwwroot/*`), and CI/deployment scripts.
- Focused on risks in maintainability, extensibility, scalability, and security.
## Executive summary
The app is solid for a small private group, but it still has several architectural bottlenecks compared to current best practice for long-term product growth. The largest risks are read amplification from client polling, hard-coded workflow/permission modeling, and security hardening gaps (CSRF posture and crypto modernization).
## Findings
### 1) High - Scalability - Single-node SQLite bottleneck
Evidence:
- SQLite is the primary DB (`Program.cs:42`).
Risk:
- SQLite is excellent for small single-node deployments, but write concurrency and horizontal scale are limited for larger or bursty usage.
Alternative:
- Keep SQLite for local/dev and migrate production to PostgreSQL/SQL Server with provider-specific migrations and connection pooling.
### 2) High - Scalability - Polling causes read amplification
Evidence:
- Frontend polling runs continuously with 3s-20s cadence (`wwwroot/app.js:30`, `wwwroot/app.js:58`).
- Each refresh can hit multiple endpoints (`wwwroot/js/data.js:21`, `wwwroot/js/data.js:109`).
- `/api/state` also executes multiple aggregate counts each time (`Endpoints/StateWorkflowService.cs:14`).
Risk:
- As concurrent users increase, backend read load grows quickly and mostly serves unchanged data.
Alternative:
- Move to event-driven updates (SSE/WebSocket) plus conditional GET (`ETag`/`If-None-Match`) and/or a consolidated bootstrap endpoint.
### 3) High - Security - CSRF protection is implicit, not explicit
Evidence:
- Cookie authentication is used for API auth (`Program.cs:100`, `Program.cs:104`).
- Many state-changing endpoints rely on cookie auth (`Endpoints/SuggestEndpoints.cs:24`, `Endpoints/VoteEndpoints.cs:24`, `Endpoints/AdminEndpoints.cs:14`).
- No antiforgery middleware/tokens are configured in startup.
Risk:
- `SameSite=Strict` helps (`Program.cs:104`) but is not a full long-term CSRF strategy across browser/proxy edge cases.
Alternative:
- Add explicit anti-forgery tokens for mutating requests (or move to bearer tokens for API calls) and verify origin headers server-side.
### 4) High - Extensibility - Workflow is hard-coded across backend and frontend
Evidence:
- Phase behavior is encoded via enum and many explicit checks/switches (`Domain/Phase.cs:3`, `Endpoints/StateWorkflowService.cs:70`, `Endpoints/EndpointHelpers.cs:97`).
- Frontend also hard-codes phase logic in many files (`wwwroot/app.js:99`, `wwwroot/js/data.js:47`, `wwwroot/js/votes-ui.js:167`).
Risk:
- Adding a new phase or changing transitions requires touching many scattered branches, increasing regression risk.
Alternative:
- Introduce a shared workflow/state-machine model (transition table) and consume it in both backend and frontend.
### 5) High - Extensibility - Role model is fixed to booleans
Evidence:
- Role/state flags are booleans on player (`Domain/Player.cs:22`, `Domain/Player.cs:23`).
- Admin checks are tightly coupled to that model (`Infrastructure/AdminOnlyFilter.cs:13`, `Endpoints/AdminWorkflowService.cs:93`).
Risk:
- Future needs (moderator, read-only admin, per-feature permissions) require schema and logic rewrites instead of additive changes.
Alternative:
- Move to role/permission tables (or claims-based capability model) and policy-based authorization.
### 6) Medium - Maintainability - Frontend is string-template heavy with global mutable state
Evidence:
- Single global state object (`wwwroot/js/state.js:1`).
- Heavy `innerHTML` rendering across modules (`wwwroot/js/suggestions-ui.js:115`, `wwwroot/js/votes-ui.js:35`, `wwwroot/js/results-ui.js:72`).
Risk:
- Harder refactoring, weaker static guarantees, and easy XSS regressions when new contributors add templates.
Alternative:
- Incrementally move to TypeScript + componentized rendering (or at minimum typed JSDoc + stricter lint rules + centralized safe render helpers).
### 7) Medium - Scalability/Security - In-memory dictionaries are unbounded
Evidence:
- Auth attempt monitor stores failures in unbounded `ConcurrentDictionary` (`Infrastructure/AuthAttemptMonitor.cs:14`).
- Image reachability cache is a static dictionary without size limits (`Endpoints/SuggestionValidator.cs:7`, `Endpoints/SuggestionValidator.cs:42`).
Risk:
- High-cardinality traffic can grow memory and become a denial-of-service vector.
Alternative:
- Replace with bounded `MemoryCache` (size limits + eviction) or distributed cache (Redis) with TTL and cardinality controls.
### 8) Medium - Scalability - Linking/results workflows load full sets into memory
Evidence:
- Link/unlink loads all suggestions then computes roots in memory (`Endpoints/AdminWorkflowService.cs:136`, `Endpoints/AdminWorkflowService.cs:188`).
- Results projection includes full vote lists per suggestion (`Endpoints/ResultsWorkflowService.cs:23`, `Endpoints/ResultsWorkflowService.cs:34`, `Endpoints/ResultsWorkflowService.cs:50`).
Risk:
- Memory and query cost rise non-linearly with larger datasets.
Alternative:
- Introduce persisted link-group IDs and push aggregation to SQL; add pagination/windowing for large result sets.
### 9) Medium - Security - Crypto is good but not state-of-the-art
Evidence:
- Password hashing uses PBKDF2-SHA256 (`Infrastructure/PasswordHasher.cs:33`) with fixed iteration count (`Infrastructure/PasswordHasher.cs:10`).
Risk:
- PBKDF2 remains acceptable, but modern guidance favors memory-hard KDFs (Argon2id/scrypt) against GPU/ASIC attacks.
Alternative:
- Add versioned password hashes and migrate to Argon2id on login/re-hash.
### 10) Medium - Security - CSP remains permissive for inline style and mixed image origins
Evidence:
- CSP allows `style-src 'unsafe-inline'` and `img-src ... https: http:` (`Program.cs:138`).
Risk:
- Wider policy surface than needed, especially for long-term hardening.
Alternative:
- Remove inline style dependence (e.g., CSS classes or nonce/hash), and restrict image sources to `https` and/or a media proxy allowlist.
### 11) Medium - Maintainability/Extensibility - API contract sync is manual
Evidence:
- Frontend endpoints are hard-coded in JS (`wwwroot/js/api.js:37`, `wwwroot/js/api.js:70`).
- Contract is also maintained manually in markdown (`API.md`).
Risk:
- Contract drift between backend DTOs and frontend consumers over time.
Alternative:
- Generate OpenAPI from endpoints and produce typed client code for frontend consumption.
### 12) Medium - Scalability/Resilience - External image validation is synchronous on write path
Evidence:
- Suggestion validation calls network reachability checks (`Endpoints/SuggestionValidator.cs:40`).
- Validation does outbound DNS/connect/HEAD/GET in request path (`Endpoints/EndpointHelpers.cs:196`, `Endpoints/EndpointHelpers.cs:240`, `Endpoints/EndpointHelpers.cs:270`).
Risk:
- User write latency is coupled to third-party host responsiveness.
Alternative:
- Accept URL quickly, then validate asynchronously (background job + status flag), optionally with trusted media proxying.
## Recommended remediation order
1. Reduce read amplification: event-driven updates + state endpoint optimization.
2. Harden security baseline: explicit CSRF, CSP tightening, and password-hash migration plan.
3. Decouple growth hotspots: workflow state machine + role/permission model.
4. Improve operational scale: replace unbounded in-memory structures and large in-memory link/result operations.
5. Introduce contract tooling: OpenAPI + generated frontend client to reduce drift.