Skip to content

Close remaining known-risks gaps (A2, D4, SPOF test coverage)#184

Merged
128na merged 3 commits into
masterfrom
claude/known-risks-followups
Jun 22, 2026
Merged

Close remaining known-risks gaps (A2, D4, SPOF test coverage)#184
128na merged 3 commits into
masterfrom
claude/known-risks-followups

Conversation

@128na

@128na 128na commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Summary

Follow-up to #183, closing out the remaining open items in docs/known-risks.md.

Two items needed a behavior decision before implementing:

  • A2 — non-2xx HTTP responses are now treated as failures. FetchHtml calls Http::get()->throw(), so a 404/500 response is handled the same as a connection exception: it's caught by the Handler's existing try/catch and never written to RawPage or cached. (Decision: treat all non-2xx the same way, no 4xx/5xx distinction.)
  • D4 — extracted UpdateOrCreatePageWithPaks, wrapping the Page update and Pak sync in a single DB::transaction(). Previously a SyncPak failure could leave a Page with new title/text but stale/missing pak associations visible to search. All three site Handlers (Japan/Twitrans/Portal) now use this combined action instead of calling UpdateOrCreatePage + SyncPak separately.

B3 (Notion re-sync throttling) stays unimplemented per decision — recorded in the ledger only, to revisit if API quota becomes an actual problem.

The rest is closing every remaining 🟡 SPOF row (single test) to 🟢 OK by adding a second test, per the ledger's own stated correction conditions:

  • A1: scrape-side isolation test (two URLs, one fails, other still scraped) — previously only had an extract-side test.
  • A3: idempotency test across separate action instances (simulates separate job runs, not just reusing one instance).
  • B1: duplicate-prevention test for a page that already exists in Notion (no delete/update mixed in).
  • B2: delete-side fault isolation test + a direct test for the null-URL delete guard (neither existed despite being implemented in Add known-risks ledger and fix Assurance Audit findings #183).
  • C1: a real web-route (non-JSON) debug-leak test — the existing test only covered the JSON/API path.
  • C2: a second test embedding a real secret (DB password) in the exception message, not just a generic string.
  • D1: FeedActionTest proving the RSS feed path also doesn't eager-load rawPage.
  • D3: a test on Page::toFeedItem() itself, not just PageResource.

docs/known-risks.md is now 🟢 OK across the board except B3 (🔴 Missing, by decision) and D2 (N/A — no draft/publish concept exists on Page).

Test plan

  • vendor/bin/pint --dirty — passed
  • composer stan (PHPStan level 9) — no errors
  • vendor/bin/rector process --dry-run — no changes needed
  • php artisan test — 55 passed, 124 assertions, against an isolated cs_test database

🤖 Generated with Claude Code

Two items needed a behavior decision and got one:
- A2: FetchHtml now calls Http::get()->throw(), treating any non-2xx
  response the same as a connection failure — neither writes to
  RawPage nor gets cached. Previously only connection exceptions were
  caught; a 404/500 response body would silently get written/cached.
- D4: extracted Page update and Pak sync into UpdateOrCreatePageWithPaks,
  wrapping both in DB::transaction() so a SyncPak failure can no longer
  leave a Page with new title/text but stale pak associations visible
  to search. All three site Handlers now use this combined action.

B3 (Notion re-sync throttling) stays unimplemented per decision —
recorded in the ledger only, revisit if API quota becomes a problem.

The rest is closing out every remaining SPOF row (single test) to OK
(2+ tests) per the ledger's own correction conditions: A1/A3, B1/B2,
C1/C2, D1/D3. Notably C1 had no direct web-route (non-JSON) test, and
B2's null-URL delete guard and delete-side fault isolation had no test
at all despite being implemented in the prior PR.

known-risks.md updated to 🟢 OK across the board except B3 (still
🔴 Missing, by decision) and D2 (N/A, no draft/publish concept exists).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces atomic page updates and pak synchronization using a new UpdateOrCreatePageWithPaks action wrapped in a database transaction, resolving a potential data consistency issue. It also updates FetchHtml to throw exceptions on non-2xx responses to prevent writing error pages to RawPage, and adds comprehensive feature tests. The review feedback suggests using DB::partialMock() instead of DB::shouldReceive() in tests to avoid breaking subsequent database queries, and recommends restricting HTTP retries to connection issues or 5xx server errors rather than retrying on permanent 4xx client errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/Feature/Actions/Extract/UpdateOrCreatePageWithPaksTest.php Outdated
Comment thread app/Actions/Scrape/FetchHtml.php Outdated
128na and others added 2 commits June 22, 2026 23:22
- UpdateOrCreatePageWithPaksTest: DB::shouldReceive('transaction')
  replaces the whole database manager in the container, breaking the
  real queries UpdateOrCreatePage/SyncPak issue inside the mocked
  transaction. Switch to DB::partialMock() so only transaction() is
  mocked and everything else still hits the real connection.
- FetchHtml: retry() previously retried on any throwable, including
  4xx client errors that retrying can never fix (404/403 won't
  resolve on attempt 2). Added a $when callback so only connection
  exceptions and 5xx server errors are retried; 4xx fails immediately.
  Added tests asserting the call count for both branches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@128na

128na commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces transactional atomicity to the page extraction process by wrapping page updates and pak synchronization inside a single database transaction via the new UpdateOrCreatePageWithPaks action. Additionally, it improves HTTP scraping robustness in FetchHtml by throwing exceptions on non-2xx responses and only retrying on 5xx server errors or connection failures. A comprehensive suite of feature tests has been added to cover these scenarios, along with updates to the risk documentation. There are no review comments to address, and the changes look solid.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@128na 128na merged commit d463961 into master Jun 22, 2026
5 checks passed
@128na 128na deleted the claude/known-risks-followups branch June 22, 2026 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants