Skip to content

Add known-risks ledger and fix Assurance Audit findings#183

Merged
128na merged 5 commits into
masterfrom
claude/thirsty-ride-a762f8
Jun 22, 2026
Merged

Add known-risks ledger and fix Assurance Audit findings#183
128na merged 5 commits into
masterfrom
claude/thirsty-ride-a762f8

Conversation

@128na

@128na 128na commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Summary

This repo had no threat model or known-risks ledger. I ran an Assurance
Audit across four areas (scrape/extract pipeline, Notion sync, secret
handling, public search/feed) and used the findings to (1) establish a
continuing ledger and (2) fix the highest-priority gaps in priority order.

  • docs/known-risks.md — continuing ledger of protected behaviors (A1–D4) with
    control strength / test count / Status, to be revisited on future audits.
  • docs/assurance-audit-2026-06-22.md — frozen snapshot of the initial audit
    (New Candidate Risks + full matrix).

Fixes (priority order)

  1. Extract handlers no longer delete RawPage on extraction failure.
    The previous code called $rawPage->delete() on any extraction error,
    permanently destroying the only scraped copy on a transient bug.
    Failures are now isolated via a new extract_failed_at flag
    (MarkExtractFailed), cleared automatically on the next successful run.
  2. .env.example now defaults APP_DEBUG=false, closing the
    debug-page secret-leak path (.env.ci/.env.deploy were already
    correct).
  3. Added SecretScrubber plus a Monolog tap (daily/api channels)
    and a ConvertDiscord scrub step, so known secrets (Notion token,
    Discord webhook URL, DB passwords) are masked before reaching log
    files or the Discord webhook.
  4. Notion sync (SyncAction) now isolates per-item failures instead
    of aborting the whole batch on one Notion API error; failure count is
    reported.
  5. Added regression tests for controls that were structurally sound
    but had zero test coverage: per-site/per-URL fault isolation, extract
    idempotency, DB URL-uniqueness constraints, PageResource field
    whitelist (no raw HTML/internal fields leak to the API), and API error
    response shape in production mode.

Known residual gap (recorded, not fixed)

  • B3 (Notion re-sync throttling): no synced_at/status flag, so
    unchanged items get re-pushed every run. Recorded in the ledger with a
    revisit condition rather than implemented now (API-quota waste, not data
    corruption).
  • A2: the upsert-only-on-success path protects against connection
    failures, but a non-2xx HTTP response that doesn't throw could still
    write its body. Noted in the ledger's 是正条件 for A2.

Test plan

  • vendor/bin/pint --dirty — passed
  • composer stan (PHPStan level 9, app/) — no errors
  • php artisan test — 39 passed, 94 assertions, against an isolated
    cs_test database (not the dev cs database)
  • Verified the new extract_failed_at flag is set/cleared correctly
    and that a second RawPage still processes after the first one fails
    (HandlerIsolationTest)
  • Verified Notion sync continues past a single item's API failure
    (SyncActionTest::test_continues_syncing_when_one_item_fails)
  • Verified secrets configured via Config::set do not appear in
    scrubbed log/Discord output (SecretScrubberTest,
    RedactSecretsProcessorTest)

🤖 Generated with Claude Code

No threat model or known-risks ledger existed in this repo; an Assurance
Audit was run across scrape/extract, Notion sync, secret handling, and
public search/feed to establish one and close the highest-priority gaps.

- docs/known-risks.md: continuing ledger of protected behaviors (A1-D4)
  with control strength, test count, and Status per the audit rubric.
- docs/assurance-audit-2026-06-22.md: frozen snapshot of the initial audit.

Fixes applied in priority order:
- Extract handlers no longer delete RawPage on extraction failure
  (destroyed the only scraped copy on a transient bug); failures are now
  isolated via an extract_failed_at flag, cleared on the next success.
- .env.example now defaults APP_DEBUG=false to close the debug-page
  secret-leak path.
- Added SecretScrubber + a Monolog tap + Discord converter scrubbing so
  known secrets (Notion token, Discord webhook, DB passwords) are masked
  before reaching log files or the Discord webhook.
- Notion sync now isolates per-item failures in SyncAction instead of
  aborting the whole batch on one API error.
- Added regression tests for previously-untested-but-sound controls
  (per-site fault isolation, extract idempotency, URL uniqueness,
  PageResource field whitelist, API error response shape).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the application's resilience and security by replacing raw page deletion on extraction failure with a failure-marking mechanism, preventing complete batch failures during Notion synchronization when individual items fail, and introducing a SecretScrubber to redact sensitive credentials from logs and Discord notifications. It also defaults APP_DEBUG to false and adds comprehensive feature tests. The review feedback highlights a scoping issue in SyncAction where a failed loop iteration could log the previous iteration's URL, and suggests a length threshold in SecretScrubber to prevent over-redaction of short development passwords.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread app/Actions/SyncNotion/SyncAction.php
Comment thread app/Actions/Logging/SecretScrubber.php Outdated
- SecretScrubber: use Config::get() instead of Config::string() —
  the latter throws when a config key is present but explicitly null
  (e.g. LOG_DISCORD_WEBHOOK_URL unset in CI), which broke
  RedactSecretsProcessorTest in the "test (Feature)" CI job. Also drop
  secrets shorter than 4 chars from the redaction list so a short dev
  password (e.g. "root") can't mass-redact unrelated log content.
- SyncAction::deleteOldNotionPages: initialize $url before the try
  block so a failure on getUrlProp() doesn't log the previous
  iteration's URL instead of the one that actually failed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@128na

128na commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust error handling and security enhancements across the scraping and synchronization pipelines. Key changes include isolating extraction failures on RawPage using an extract_failed_at timestamp instead of deleting records, implementing a SecretScrubber to redact sensitive credentials from logs and Discord payloads, and updating the Notion sync to process items individually so a single API failure does not halt the batch. Feedback from the review highlights three important improvements: first, in ConvertDiscord, the stacktrace must be extracted before scrubbing the context to prevent mutating Throwable objects; second, the resolved secrets in SecretScrubber should be cached to avoid redundant configuration lookups; and third, the lookup in SyncAction should be optimized to O(N + M) complexity by keying the collection by URL.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread app/Actions/Logging/ConvertDiscord.php
Comment thread app/Actions/Logging/SecretScrubber.php
Comment thread app/Actions/SyncNotion/SyncAction.php
128na and others added 2 commits June 22, 2026 19:58
- ConvertDiscord: extract the stacktrace before scrubbing context
  (scrubArray() stringifies any Throwable, so getStacktrace() would
  return null if called after). More importantly, the inherited
  addMessageStacktrace() was still running after our override and
  overwriting $message->file with the *unscrubbed* raw trace, which
  would have defeated the whole redaction path for error-level Discord
  messages. Override it as a no-op since the scrubbed trace is already
  attached in addMessageContent(). Added ConvertDiscordTest covering
  both behaviors directly.
- SecretScrubber: cache the resolved secret list per instance (not
  static — a static cache would leak Config::set() changes across
  PHPUnit test methods in the same process) to avoid recomputing
  Config::get() + array filtering on every log line.
- SyncAction::deleteOldNotionPages: key $pages by url once before the
  loop instead of calling Collection::first() per Notion page, cutting
  the lookup from O(N*M) to O(N+M).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@128na

128na commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances application reliability and security by introducing fault isolation in the extraction and Notion sync pipelines, implementing a 'SecretScrubber' to redact sensitive credentials from logs and Discord messages, and ensuring production error responses do not leak stack traces. Feedback from the code review highlights several critical improvements: optimizing the Notion sync loop to prevent exception propagation and reduce lookup complexity to O(1), avoiding accidental deletion of Notion pages with null URLs, increasing the minimum secret scrubbing length to 5 to prevent redacting common words like 'root', warning about the side effects of converting Throwable objects to strings in log contexts, and adding a database index on the new 'extract_failed_at' column to maintain query performance.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread app/Actions/SyncNotion/SyncAction.php
Comment thread app/Actions/SyncNotion/SyncAction.php Outdated
Comment thread app/Actions/Logging/SecretScrubber.php Outdated
Comment thread app/Actions/Logging/SecretScrubber.php
- SyncAction::deleteOldNotionPages: a Notion page with a null URL
  (e.g. a manually created draft) no longer gets deleted — only pages
  with a non-null URL absent from our DB are removed.
- SyncAction::addNewNotionPages: pre-key Notion pages by URL once
  (keyNotionPagesByUrl()) instead of calling Collection::first() with
  a closure that calls getUrlProp() per page inside the loop. That
  closure could throw on a single malformed Notion page and take down
  the entire $pages loop, defeating the per-item fault isolation this
  PR introduced. The new helper isolates URL-parsing failures per
  Notion page and also cuts the lookup from O(N*M) to O(N+M).
- SecretScrubber: raise the minimum scrub length from 4 to 5 so a
  4-char dev password like "root" doesn't redact unrelated words
  (chroot, uproot, root user, etc). Added a regression test pinning
  this. Documented the Throwable->string side effect in walk() for
  future Monolog processor ordering.
- Migration: index extract_failed_at since it will be queried via
  whereNotNull/whereNull as the table grows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@128na 128na merged commit 33cebdb into master Jun 22, 2026
5 checks passed
@128na 128na deleted the claude/thirsty-ride-a762f8 branch June 22, 2026 12:27
128na added a commit that referenced this pull request Jun 22, 2026
A4/B2/B4/C3 entries were left describing the pre-review-feedback state.
Update them to match what actually merged: extract_failed_at index,
addNewNotionPages per-item URL-parse isolation + null-URL delete guard,
the 5-char secret-scrub threshold, and ConvertDiscord's
addMessageStacktrace no-op fix — plus corrected test counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants