Add known-risks ledger and fix Assurance Audit findings by 128na · Pull Request #183 · 128na/SimutransCrossSearch

128na · 2026-06-22T10:15:09Z

Summary

This repo had no threat model or known-risks ledger. I ran an Assurance
Audit across four areas (scrape/extract pipeline, Notion sync, secret
handling, public search/feed) and used the findings to (1) establish a
continuing ledger and (2) fix the highest-priority gaps in priority order.

docs/known-risks.md — continuing ledger of protected behaviors (A1–D4) with
control strength / test count / Status, to be revisited on future audits.
docs/assurance-audit-2026-06-22.md — frozen snapshot of the initial audit
(New Candidate Risks + full matrix).

Fixes (priority order)

Extract handlers no longer delete RawPage on extraction failure.
The previous code called $rawPage->delete() on any extraction error,
permanently destroying the only scraped copy on a transient bug.
Failures are now isolated via a new extract_failed_at flag
(MarkExtractFailed), cleared automatically on the next successful run.
.env.example now defaults APP_DEBUG=false, closing the
debug-page secret-leak path (.env.ci/.env.deploy were already
correct).
Added SecretScrubber plus a Monolog tap (daily/api channels)
and a ConvertDiscord scrub step, so known secrets (Notion token,
Discord webhook URL, DB passwords) are masked before reaching log
files or the Discord webhook.
Notion sync (SyncAction) now isolates per-item failures instead
of aborting the whole batch on one Notion API error; failure count is
reported.
Added regression tests for controls that were structurally sound
but had zero test coverage: per-site/per-URL fault isolation, extract
idempotency, DB URL-uniqueness constraints, PageResource field
whitelist (no raw HTML/internal fields leak to the API), and API error
response shape in production mode.

Known residual gap (recorded, not fixed)

B3 (Notion re-sync throttling): no synced_at/status flag, so
unchanged items get re-pushed every run. Recorded in the ledger with a
revisit condition rather than implemented now (API-quota waste, not data
corruption).
A2: the upsert-only-on-success path protects against connection
failures, but a non-2xx HTTP response that doesn't throw could still
write its body. Noted in the ledger's 是正条件 for A2.

Test plan

vendor/bin/pint --dirty — passed
composer stan (PHPStan level 9, app/) — no errors
php artisan test — 39 passed, 94 assertions, against an isolated
cs_test database (not the dev cs database)
Verified the new extract_failed_at flag is set/cleared correctly
and that a second RawPage still processes after the first one fails
(HandlerIsolationTest)
Verified Notion sync continues past a single item's API failure
(SyncActionTest::test_continues_syncing_when_one_item_fails)
Verified secrets configured via Config::set do not appear in
scrubbed log/Discord output (SecretScrubberTest,
RedactSecretsProcessorTest)

🤖 Generated with Claude Code

No threat model or known-risks ledger existed in this repo; an Assurance Audit was run across scrape/extract, Notion sync, secret handling, and public search/feed to establish one and close the highest-priority gaps. - docs/known-risks.md: continuing ledger of protected behaviors (A1-D4) with control strength, test count, and Status per the audit rubric. - docs/assurance-audit-2026-06-22.md: frozen snapshot of the initial audit. Fixes applied in priority order: - Extract handlers no longer delete RawPage on extraction failure (destroyed the only scraped copy on a transient bug); failures are now isolated via an extract_failed_at flag, cleared on the next success. - .env.example now defaults APP_DEBUG=false to close the debug-page secret-leak path. - Added SecretScrubber + a Monolog tap + Discord converter scrubbing so known secrets (Notion token, Discord webhook, DB passwords) are masked before reaching log files or the Discord webhook. - Notion sync now isolates per-item failures in SyncAction instead of aborting the whole batch on one API error. - Added regression tests for previously-untested-but-sound controls (per-site fault isolation, extract idempotency, URL uniqueness, PageResource field whitelist, API error response shape). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-22T10:15:13Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

gemini-code-assist

Code Review

This pull request enhances the application's resilience and security by replacing raw page deletion on extraction failure with a failure-marking mechanism, preventing complete batch failures during Notion synchronization when individual items fail, and introducing a SecretScrubber to redact sensitive credentials from logs and Discord notifications. It also defaults APP_DEBUG to false and adds comprehensive feature tests. The review feedback highlights a scoping issue in SyncAction where a failed loop iteration could log the previous iteration's URL, and suggests a length threshold in SecretScrubber to prevent over-redaction of short development passwords.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

- SecretScrubber: use Config::get() instead of Config::string() — the latter throws when a config key is present but explicitly null (e.g. LOG_DISCORD_WEBHOOK_URL unset in CI), which broke RedactSecretsProcessorTest in the "test (Feature)" CI job. Also drop secrets shorter than 4 chars from the redaction list so a short dev password (e.g. "root") can't mass-redact unrelated log content. - SyncAction::deleteOldNotionPages: initialize $url before the try block so a failure on getUrlProp() doesn't log the previous iteration's URL instead of the one that actually failed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

128na · 2026-06-22T10:49:58Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces robust error handling and security enhancements across the scraping and synchronization pipelines. Key changes include isolating extraction failures on RawPage using an extract_failed_at timestamp instead of deleting records, implementing a SecretScrubber to redact sensitive credentials from logs and Discord payloads, and updating the Notion sync to process items individually so a single API failure does not halt the batch. Feedback from the review highlights three important improvements: first, in ConvertDiscord, the stacktrace must be extracted before scrubbing the context to prevent mutating Throwable objects; second, the resolved secrets in SecretScrubber should be cached to avoid redundant configuration lookups; and third, the lookup in SyncAction should be optimized to O(N + M) complexity by keying the collection by URL.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

- ConvertDiscord: extract the stacktrace before scrubbing context (scrubArray() stringifies any Throwable, so getStacktrace() would return null if called after). More importantly, the inherited addMessageStacktrace() was still running after our override and overwriting $message->file with the *unscrubbed* raw trace, which would have defeated the whole redaction path for error-level Discord messages. Override it as a no-op since the scrubbed trace is already attached in addMessageContent(). Added ConvertDiscordTest covering both behaviors directly. - SecretScrubber: cache the resolved secret list per instance (not static — a static cache would leak Config::set() changes across PHPUnit test methods in the same process) to avoid recomputing Config::get() + array filtering on every log line. - SyncAction::deleteOldNotionPages: key $pages by url once before the loop instead of calling Collection::first() per Notion page, cutting the lookup from O(N*M) to O(N+M). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

128na · 2026-06-22T11:02:00Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances application reliability and security by introducing fault isolation in the extraction and Notion sync pipelines, implementing a 'SecretScrubber' to redact sensitive credentials from logs and Discord messages, and ensuring production error responses do not leak stack traces. Feedback from the code review highlights several critical improvements: optimizing the Notion sync loop to prevent exception propagation and reduce lookup complexity to O(1), avoiding accidental deletion of Notion pages with null URLs, increasing the minimum secret scrubbing length to 5 to prevent redacting common words like 'root', warning about the side effects of converting Throwable objects to strings in log contexts, and adding a database index on the new 'extract_failed_at' column to maintain query performance.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

- SyncAction::deleteOldNotionPages: a Notion page with a null URL (e.g. a manually created draft) no longer gets deleted — only pages with a non-null URL absent from our DB are removed. - SyncAction::addNewNotionPages: pre-key Notion pages by URL once (keyNotionPagesByUrl()) instead of calling Collection::first() with a closure that calls getUrlProp() per page inside the loop. That closure could throw on a single malformed Notion page and take down the entire $pages loop, defeating the per-item fault isolation this PR introduced. The new helper isolates URL-parsing failures per Notion page and also cuts the lookup from O(N*M) to O(N+M). - SecretScrubber: raise the minimum scrub length from 4 to 5 so a 4-char dev password like "root" doesn't redact unrelated words (chroot, uproot, root user, etc). Added a regression test pinning this. Documented the Throwable->string side effect in walk() for future Monolog processor ordering. - Migration: index extract_failed_at since it will be queried via whereNotNull/whereNull as the table grows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A4/B2/B4/C3 entries were left describing the pre-review-feedback state. Update them to match what actually merged: extract_failed_at index, addNewNotionPages per-item URL-parse isolation + null-URL delete guard, the 5-char secret-scrub threshold, and ConvertDiscord's addMessageStacktrace no-op fix — plus corrected test counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread app/Actions/SyncNotion/SyncAction.php

Comment thread app/Actions/Logging/SecretScrubber.php Outdated

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread app/Actions/Logging/ConvertDiscord.php

Comment thread app/Actions/Logging/SecretScrubber.php

Comment thread app/Actions/SyncNotion/SyncAction.php

128na and others added 2 commits June 22, 2026 19:58

[rector] Rector fixes

7e951fe

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

128na merged commit 33cebdb into master Jun 22, 2026
5 checks passed

128na deleted the claude/thirsty-ride-a762f8 branch June 22, 2026 12:27

128na mentioned this pull request Jun 22, 2026

Close remaining known-risks gaps (A2, D4, SPOF test coverage) #184

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add known-risks ledger and fix Assurance Audit findings#183

Add known-risks ledger and fix Assurance Audit findings#183
128na merged 5 commits into
masterfrom
claude/thirsty-ride-a762f8

128na commented Jun 22, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

128na commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

128na commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

128na commented Jun 22, 2026

Summary

Fixes (priority order)

Known residual gap (recorded, not fixed)

Test plan

Uh oh!

chatgpt-codex-connector Bot commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

128na commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

128na commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants