Add initial backfill package#779
Conversation
Define the initial @fedify/backfill async generator API around a typed BackfillContext, note seed object, traversal options, and BackfillItem wrappers. The generator remains a stub so tests and traversal logic can be added in follow-up commits. Assisted-by: Codex:gpt-5
Add the initial context-posts traversal for @fedify/backfill. The implementation dereferences the seed object's context, accepts direct ActivityStreams collections and collection pages, yields post-like objects, and enforces request, item, interval, abort, and duplicate-id handling. Add tests for the PR 1 behavior across Deno, Node.js, and Bun. Assisted-by: Codex:gpt-5
Replace the scaffold status text with a short description of the initial context collection backfill behavior and a minimal usage example. Assisted-by: Codex:gpt-5
Remove unrelated lockfile churn from the backfill branch, keeping only the new package importer required for @fedify/backfill. Assisted-by: Codex:gpt-5
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds the Changes
Sequence DiagramsequenceDiagram
participant Caller
participant backfill as backfill()
participant loadObject
participant documentLoader
participant collection
Caller->>backfill: backfill(context, note, options)
backfill->>backfill: Validate maxItems/contextId
backfill->>loadObject: Load collection from contextId
loadObject->>documentLoader: documentLoader(collectionUrl)
documentLoader-->>loadObject: Collection object
loadObject-->>backfill: Loaded collection
backfill->>collection: collection.getItems()
loop For each item in collection
alt Item is URL reference
backfill->>loadObject: Load referenced object
loadObject->>waitForInterval: Check interval delay
loadObject->>documentLoader: documentLoader(itemUrl, {signal})
documentLoader-->>loadObject: Object or null
loadObject-->>backfill: Loaded object
else Item is embedded object
backfill->>backfill: Use embedded object directly
end
alt Is context post (not Activity, not collection)
backfill->>backfill: Check deduplication set
alt Not seen before and within maxItems
backfill-->>Caller: yield BackfillItem
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new package @fedify/backfill to provide ActivityPub conversation backfill support for the Fedify ecosystem. It implements a backfill() async generator that retrieves post-like objects from a seed object's context collection. The review feedback highlights three main areas for improvement: handling load failures of individual collection items gracefully by returning a dummy Activity document instead of terminating the entire backfill process, using optional chaining when accessing note.contextIds to prevent runtime errors, and removing the abort event listener on normal timeout completion to prevent memory leaks while prioritizing aborted signal checks.
Codecov Report❌ Patch coverage is
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/backfill/deno.json`:
- Around line 1-22: Add an imports entry for the `@fedify/vocab` package in this
package's deno.json by adding an "`@fedify/vocab`" key to the "imports" object
that points to the local Deno entrypoint for the vocab package (e.g., the
relative path to vocab's src/mod.ts or its published Deno-compatible entry).
Update the "imports" field (create it if missing) so imports["`@fedify/vocab`"]
resolves to the repo-local vocab module, ensuring the deno.json and package.json
dependency declarations stay in sync.
In `@packages/backfill/src/backfill.test.ts`:
- Line 2: The tests import test/describe from node:test; replace that with
importing test from "`@fedify/fixture`" and remove or refactor any describe(...)
blocks to follow the fixture-based pattern used across the repo. Locate the
import line referencing test/describe and change it to import only test from
"`@fedify/fixture`", then update test files that call describe(...) or use
node:test semantics to instead organize cases using the fixture's test API
(single test functions, fixtures or table-driven subtests) and adjust any
setup/teardown accordingly (e.g., move before/after logic into fixture setup).
Ensure all references to describe and node:test-specific hooks are removed or
converted so the file only relies on the test symbol from "`@fedify/fixture`".
In `@packages/backfill/src/backfill.ts`:
- Around line 141-147: The abort listener is left attached when the timeout
wins, causing listener buildup; in the Promise where you create timeout and call
budget.signal?.addEventListener("abort", ...), register the abort handler in a
variable (e.g., const onAbort = () => { ... }) and pass that to
addEventListener, and when the timer fires (in the setTimeout callback or
immediately before resolve) call budget.signal?.removeEventListener("abort",
onAbort) after clearing the timeout so the handler is removed; ensure the same
handler reference is used for both addEventListener and removeEventListener and
keep rejecting with budget.signal?.reason inside the onAbort so behavior is
unchanged.
- Around line 30-36: The generator currently unsafely casts yielded "object:
object as TObject" in backfill<TObject>, so either require/accept a runtime type
guard and use it to narrow before yielding (e.g., add an isTObject?: (obj:
APObject) => obj is TObject to BackfillOptions and call it to assert object is
TObject before yielding BackfillItem<TObject>), or decouple the generic by
changing the function signature and yields to
AsyncGenerator<BackfillItem<APObject>> (remove the unchecked TObject cast and
keep TObject only for callers who perform their own narrowing). Update the
backfill implementation to use the provided type predicate (or the non-generic
APObject yield) instead of casting to ensure sound typing for
BackfillItem<TObject>.
In `@packages/backfill/tsdown.config.ts`:
- Line 20: The mapping of globbed test entry paths uses f.replace(sep, "/")
which only replaces the first occurrence; update the transformation used in the
.map callback that currently references f.replace(sep, "/") so it replaces all
path separators (e.g., use a global replace or split/join approach such as
splitting on sep and joining with "/" or using replaceAll) to fully normalize
nested Windows paths for the entry generation in tsdown.config.ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f8bcaa34-4cbd-46a7-9994-8d90a9d0b4a5
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (11)
deno.jsonpackages/backfill/README.mdpackages/backfill/deno.jsonpackages/backfill/package.jsonpackages/backfill/src/backfill.test.tspackages/backfill/src/backfill.tspackages/backfill/src/mod.tspackages/backfill/src/types.tspackages/backfill/tsdown.config.tspackages/fedify/README.mdpnpm-workspace.yaml
Skip collection URL items that fail to dereference instead of terminating the whole traversal, while still stopping when the request budget is exhausted. Also clean up interval abort listeners after successful waits. Assisted-by: Codex:gpt-5
Normalize every platform path separator in globbed test entries before passing those paths to tsdown. Assisted-by: Codex:gpt-5
Summary
@fedify/backfillpackage setup and exports.backfill()async generator API for direct context collection backfill.contextwhen it resolves to an ActivityStreams collection or collection page, yielding post-likeBackfillItemobjects.Verification
deno task -f @fedify/backfill checkdeno task -f @fedify/backfill testpnpm --filter @fedify/backfill testmise exec -- pnpm --filter @fedify/backfill test:bunAI usage
Assisted by Codex (GPT-5).