Add known-risks ledger and fix Assurance Audit findings#183
Conversation
No threat model or known-risks ledger existed in this repo; an Assurance Audit was run across scrape/extract, Notion sync, secret handling, and public search/feed to establish one and close the highest-priority gaps. - docs/known-risks.md: continuing ledger of protected behaviors (A1-D4) with control strength, test count, and Status per the audit rubric. - docs/assurance-audit-2026-06-22.md: frozen snapshot of the initial audit. Fixes applied in priority order: - Extract handlers no longer delete RawPage on extraction failure (destroyed the only scraped copy on a transient bug); failures are now isolated via an extract_failed_at flag, cleared on the next success. - .env.example now defaults APP_DEBUG=false to close the debug-page secret-leak path. - Added SecretScrubber + a Monolog tap + Discord converter scrubbing so known secrets (Notion token, Discord webhook, DB passwords) are masked before reaching log files or the Discord webhook. - Notion sync now isolates per-item failures in SyncAction instead of aborting the whole batch on one API error. - Added regression tests for previously-untested-but-sound controls (per-site fault isolation, extract idempotency, URL uniqueness, PageResource field whitelist, API error response shape). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request enhances the application's resilience and security by replacing raw page deletion on extraction failure with a failure-marking mechanism, preventing complete batch failures during Notion synchronization when individual items fail, and introducing a SecretScrubber to redact sensitive credentials from logs and Discord notifications. It also defaults APP_DEBUG to false and adds comprehensive feature tests. The review feedback highlights a scoping issue in SyncAction where a failed loop iteration could log the previous iteration's URL, and suggests a length threshold in SecretScrubber to prevent over-redaction of short development passwords.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
- SecretScrubber: use Config::get() instead of Config::string() — the latter throws when a config key is present but explicitly null (e.g. LOG_DISCORD_WEBHOOK_URL unset in CI), which broke RedactSecretsProcessorTest in the "test (Feature)" CI job. Also drop secrets shorter than 4 chars from the redaction list so a short dev password (e.g. "root") can't mass-redact unrelated log content. - SyncAction::deleteOldNotionPages: initialize $url before the try block so a failure on getUrlProp() doesn't log the previous iteration's URL instead of the one that actually failed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces robust error handling and security enhancements across the scraping and synchronization pipelines. Key changes include isolating extraction failures on RawPage using an extract_failed_at timestamp instead of deleting records, implementing a SecretScrubber to redact sensitive credentials from logs and Discord payloads, and updating the Notion sync to process items individually so a single API failure does not halt the batch. Feedback from the review highlights three important improvements: first, in ConvertDiscord, the stacktrace must be extracted before scrubbing the context to prevent mutating Throwable objects; second, the resolved secrets in SecretScrubber should be cached to avoid redundant configuration lookups; and third, the lookup in SyncAction should be optimized to O(N + M) complexity by keying the collection by URL.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
- ConvertDiscord: extract the stacktrace before scrubbing context (scrubArray() stringifies any Throwable, so getStacktrace() would return null if called after). More importantly, the inherited addMessageStacktrace() was still running after our override and overwriting $message->file with the *unscrubbed* raw trace, which would have defeated the whole redaction path for error-level Discord messages. Override it as a no-op since the scrubbed trace is already attached in addMessageContent(). Added ConvertDiscordTest covering both behaviors directly. - SecretScrubber: cache the resolved secret list per instance (not static — a static cache would leak Config::set() changes across PHPUnit test methods in the same process) to avoid recomputing Config::get() + array filtering on every log line. - SyncAction::deleteOldNotionPages: key $pages by url once before the loop instead of calling Collection::first() per Notion page, cutting the lookup from O(N*M) to O(N+M). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances application reliability and security by introducing fault isolation in the extraction and Notion sync pipelines, implementing a 'SecretScrubber' to redact sensitive credentials from logs and Discord messages, and ensuring production error responses do not leak stack traces. Feedback from the code review highlights several critical improvements: optimizing the Notion sync loop to prevent exception propagation and reduce lookup complexity to O(1), avoiding accidental deletion of Notion pages with null URLs, increasing the minimum secret scrubbing length to 5 to prevent redacting common words like 'root', warning about the side effects of converting Throwable objects to strings in log contexts, and adding a database index on the new 'extract_failed_at' column to maintain query performance.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
- SyncAction::deleteOldNotionPages: a Notion page with a null URL (e.g. a manually created draft) no longer gets deleted — only pages with a non-null URL absent from our DB are removed. - SyncAction::addNewNotionPages: pre-key Notion pages by URL once (keyNotionPagesByUrl()) instead of calling Collection::first() with a closure that calls getUrlProp() per page inside the loop. That closure could throw on a single malformed Notion page and take down the entire $pages loop, defeating the per-item fault isolation this PR introduced. The new helper isolates URL-parsing failures per Notion page and also cuts the lookup from O(N*M) to O(N+M). - SecretScrubber: raise the minimum scrub length from 4 to 5 so a 4-char dev password like "root" doesn't redact unrelated words (chroot, uproot, root user, etc). Added a regression test pinning this. Documented the Throwable->string side effect in walk() for future Monolog processor ordering. - Migration: index extract_failed_at since it will be queried via whereNotNull/whereNull as the table grows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A4/B2/B4/C3 entries were left describing the pre-review-feedback state. Update them to match what actually merged: extract_failed_at index, addNewNotionPages per-item URL-parse isolation + null-URL delete guard, the 5-char secret-scrub threshold, and ConvertDiscord's addMessageStacktrace no-op fix — plus corrected test counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
This repo had no threat model or known-risks ledger. I ran an Assurance
Audit across four areas (scrape/extract pipeline, Notion sync, secret
handling, public search/feed) and used the findings to (1) establish a
continuing ledger and (2) fix the highest-priority gaps in priority order.
control strength / test count / Status, to be revisited on future audits.
(New Candidate Risks + full matrix).
Fixes (priority order)
The previous code called
$rawPage->delete()on any extraction error,permanently destroying the only scraped copy on a transient bug.
Failures are now isolated via a new
extract_failed_atflag(
MarkExtractFailed), cleared automatically on the next successful run..env.examplenow defaultsAPP_DEBUG=false, closing thedebug-page secret-leak path (
.env.ci/.env.deploywere alreadycorrect).
SecretScrubberplus a Monolog tap (daily/apichannels)and a
ConvertDiscordscrub step, so known secrets (Notion token,Discord webhook URL, DB passwords) are masked before reaching log
files or the Discord webhook.
SyncAction) now isolates per-item failures insteadof aborting the whole batch on one Notion API error; failure count is
reported.
but had zero test coverage: per-site/per-URL fault isolation, extract
idempotency, DB URL-uniqueness constraints,
PageResourcefieldwhitelist (no raw HTML/internal fields leak to the API), and API error
response shape in production mode.
Known residual gap (recorded, not fixed)
synced_at/status flag, sounchanged items get re-pushed every run. Recorded in the ledger with a
revisit condition rather than implemented now (API-quota waste, not data
corruption).
failures, but a non-2xx HTTP response that doesn't throw could still
write its body. Noted in the ledger's 是正条件 for
A2.Test plan
vendor/bin/pint --dirty— passedcomposer stan(PHPStan level 9,app/) — no errorsphp artisan test— 39 passed, 94 assertions, against an isolatedcs_testdatabase (not the devcsdatabase)extract_failed_atflag is set/cleared correctlyand that a second RawPage still processes after the first one fails
(
HandlerIsolationTest)(
SyncActionTest::test_continues_syncing_when_one_item_fails)Config::setdo not appear inscrubbed log/Discord output (
SecretScrubberTest,RedactSecretsProcessorTest)🤖 Generated with Claude Code