Add initial backfill package #779

Open
sij411 wants to merge 7 commits into fedify-dev:feat/backfill from sij411:feat/backfill

Conversation

@sij411
Contributor

@sij411 sij411 commented May 27, 2026

Summary

  • Add the initial @fedify/backfill package setup and exports.
  • Add the backfill() async generator API for direct context collection backfill.
  • Traverse seed object context when it resolves to an ActivityStreams collection or collection page, yielding post-like BackfillItem objects.
  • Add tests covering missing/non-collection contexts, embedded and URL collection items, deduplication, request/item budgets, abort handling, and interval behavior.

Verification

  • deno task -f @fedify/backfill check
  • deno task -f @fedify/backfill test
  • pnpm --filter @fedify/backfill test
  • mise exec -- pnpm --filter @fedify/backfill test:bun

AI usage

Assisted by Codex (GPT-5).

sij411 added 5 commits May 23, 2026 11:28
Define the initial @fedify/backfill async generator API around a typed
BackfillContext, note seed object, traversal options, and BackfillItem
wrappers.  The generator remains a stub so tests and traversal logic can be
added in follow-up commits.

Assisted-by: Codex:gpt-5
Add the initial context-posts traversal for @fedify/backfill.  The
implementation dereferences the seed object's context, accepts direct
ActivityStreams collections and collection pages, yields post-like objects,
and enforces request, item, interval, abort, and duplicate-id handling.

Add tests for the PR 1 behavior across Deno, Node.js, and Bun.

Assisted-by: Codex:gpt-5
Replace the scaffold status text with a short description of the initial
context collection backfill behavior and a minimal usage example.

Assisted-by: Codex:gpt-5
Remove unrelated lockfile churn from the backfill branch, keeping only the
new package importer required for @fedify/backfill.

Assisted-by: Codex:gpt-5
@coderabbitai

coderabbitai Bot commented May 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aa44c238-e359-4a5b-9514-00a8c03fc0d5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds the @fedify/backfill package to the Fedify monorepo. The package provides an async generator that traverses ActivityPub collections referenced from a seed object's context metadata, with support for request budgeting, rate limiting, deduplication, and cancellation.

Changes

@fedify/backfill Package

| Layer / File(s) | Summary |
|---|---|
| Backfill Types and Interface (`packages/backfill/src/types.ts`) | Defines BackfillStrategy and BackfillOrigin literals, plus public interfaces for configurable traversal: BackfillDocumentLoader, BackfillContext, BackfillOptions (with maxItems, maxDepth, maxRequests, and interval controls), and BackfillItem for discovered objects. |
| Backfill Implementation and Helpers (`packages/backfill/src/backfill.ts`) | Core backfill() async generator that loads a collection from the seed note's context ID, iterates items with deduplication and filtering for context post objects, and yields non-duplicate, non-Activity results. Includes loadObject() with request budgeting (enforced via MaxRequestsExceeded), waitForInterval() for rate limiting, getCollectionItems() for collection traversal, and type guards isCollection() and isContextPostObject(). |
| Module Exports (`packages/backfill/src/mod.ts`) | Package entrypoint re-exporting backfill() and all public types from types.ts. |
| Test Suite (`packages/backfill/src/backfill.test.ts`) | Comprehensive test coverage for context/collection validation, item deduplication, request limiting (maxRequests), cancellation (AbortSignal), rate limiting (interval callback), and correct handling of embedded vs. URL-referenced objects. |
| Package Configuration (`packages/backfill/{deno.json,package.json,tsdown.config.ts}`, `deno.json`, `pnpm-workspace.yaml`) | Build and workspace configuration including version 2.3.0, ESM/CJS dual outputs via tsdown, Deno tasks for check and test, and workspace integration (added to deno.json, removed from pnpm). |
| Documentation (`packages/backfill/README.md`, `packages/fedify/README.md`) | Package documentation with usage example and behavioral notes; monorepo README updated with @fedify/backfill entry and JSR/npm links. |

Sequence Diagram

sequenceDiagram
  participant Caller
  participant backfill as backfill()
  participant loadObject
  participant documentLoader
  participant collection
  
  Caller->>backfill: backfill(context, note, options)
  backfill->>backfill: Validate maxItems/contextId
  backfill->>loadObject: Load collection from contextId
  loadObject->>documentLoader: documentLoader(collectionUrl)
  documentLoader-->>loadObject: Collection object
  loadObject-->>backfill: Loaded collection
  backfill->>collection: collection.getItems()
  loop For each item in collection
    alt Item is URL reference
      backfill->>loadObject: Load referenced object
      loadObject->>waitForInterval: Check interval delay
      loadObject->>documentLoader: documentLoader(itemUrl, {signal})
      documentLoader-->>loadObject: Object or null
      loadObject-->>backfill: Loaded object
    else Item is embedded object
      backfill->>backfill: Use embedded object directly
    end
    alt Is context post (not Activity, not collection)
      backfill->>backfill: Check deduplication set
      alt Not seen before and within maxItems
        backfill-->>Caller: yield BackfillItem
      end
    end
  end
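
The flow in the diagram above can be approximated in a few lines. This is a minimal, self-contained sketch rather than the package's code: the `Post`, `Item`, and `TraverseOptions` shapes, the `traverse()` name, and the list of Activity types are assumptions made for illustration. It also assumes that exhausting the request budget simply ends the iteration, which may differ from how the PR reports it.

```typescript
// Hypothetical shapes for illustration; not the @fedify/backfill types.
interface Post {
  id: string;
  type: string; // e.g. "Note", "Article", "Create"
}

// A collection item is either an embedded object or a URL reference.
type Item = Post | string;

interface TraverseOptions {
  maxItems: number;
  maxRequests: number;
  // Stand-in for a document loader; resolves to null when nothing is found.
  load: (url: string) => Promise<Post | null>;
}

// Assumed subset of Activity types that are never yielded as posts.
const ACTIVITY_TYPES = new Set(["Create", "Announce", "Like"]);

async function* traverse(
  items: Iterable<Item>,
  options: TraverseOptions,
): AsyncGenerator<Post> {
  const seen = new Set<string>();
  let requests = 0;
  let yielded = 0;
  for (const item of items) {
    if (yielded >= options.maxItems) return;
    let object: Post | null;
    if (typeof item === "string") {
      // A URL reference costs one request from the budget.
      if (requests >= options.maxRequests) return;
      requests++;
      object = await options.load(item);
    } else {
      // An embedded object is used directly, with no request.
      object = item;
    }
    // Skip missing objects, Activities, and ids that were already yielded.
    if (object == null || ACTIVITY_TYPES.has(object.type)) continue;
    if (seen.has(object.id)) continue;
    seen.add(object.id);
    yielded++;
    yield object;
  }
}
```

A caller would consume it with `for await (const post of traverse(items, options))`, stopping early with `break` if needed.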

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #275: Implements the backfill functionality and async generator API with FEP-f228/context-based collection traversal, request budgeting, deduplication, and comprehensive test coverage as described in the issue.

Suggested labels

component/federation, type/feature

Suggested reviewers

  • dahlia
  • 2chanhaeng
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding a new backfill package to the monorepo with initial setup and exports. |
| Description check | ✅ Passed | The description is directly related to the changeset, documenting the new @fedify/backfill package setup, the backfill() async generator API, traversal logic, test coverage, and verification commands. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@sij411 sij411 requested a review from dahlia May 27, 2026 08:04
@sij411
Contributor Author

sij411 commented May 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new package, @fedify/backfill, which provides ActivityPub conversation backfill support for the Fedify ecosystem. It implements a backfill() async generator that retrieves post-like objects from a seed object's context collection. The review feedback highlights three areas for improvement:

  • Handle load failures of individual collection items gracefully by returning a dummy Activity document instead of terminating the entire backfill process.
  • Use optional chaining when accessing note.contextIds to prevent runtime errors.
  • Remove the abort event listener when the timeout completes normally to prevent memory leaks, and check for an already-aborted signal first.

Comment thread packages/backfill/src/backfill.ts Outdated
Comment thread packages/backfill/src/backfill.ts
Comment thread packages/backfill/src/backfill.ts
@sij411 sij411 removed the request for review from dahlia May 27, 2026 08:05
@codecov

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 81.87500% with 29 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/backfill/src/backfill.ts | 81.76% | 20 Missing and 9 partials ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| packages/backfill/src/mod.ts | 100.00% <100.00%> (ø) |
| packages/backfill/src/backfill.ts | 81.76% <81.76%> (ø) |

... and 1 file with indirect coverage changes

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/backfill/deno.json`:
- Around line 1-22: Add an imports entry for the `@fedify/vocab` package in this
package's deno.json by adding an "`@fedify/vocab`" key to the "imports" object
that points to the local Deno entrypoint for the vocab package (e.g., the
relative path to vocab's src/mod.ts or its published Deno-compatible entry).
Update the "imports" field (create it if missing) so imports["`@fedify/vocab`"]
resolves to the repo-local vocab module, ensuring the deno.json and package.json
dependency declarations stay in sync.

In `@packages/backfill/src/backfill.test.ts`:
- Line 2: The tests import test/describe from node:test; replace that with
importing test from "`@fedify/fixture`" and remove or refactor any describe(...)
blocks to follow the fixture-based pattern used across the repo. Locate the
import line referencing test/describe and change it to import only test from
"`@fedify/fixture`", then update test files that call describe(...) or use
node:test semantics to instead organize cases using the fixture's test API
(single test functions, fixtures or table-driven subtests) and adjust any
setup/teardown accordingly (e.g., move before/after logic into fixture setup).
Ensure all references to describe and node:test-specific hooks are removed or
converted so the file only relies on the test symbol from "`@fedify/fixture`".

In `@packages/backfill/src/backfill.ts`:
- Around line 141-147: The abort listener is left attached when the timeout
wins, causing listener buildup; in the Promise where you create timeout and call
budget.signal?.addEventListener("abort", ...), register the abort handler in a
variable (e.g., const onAbort = () => { ... }) and pass that to
addEventListener, and when the timer fires (in the setTimeout callback or
immediately before resolve) call budget.signal?.removeEventListener("abort",
onAbort) after clearing the timeout so the handler is removed; ensure the same
handler reference is used for both addEventListener and removeEventListener and
keep rejecting with budget.signal?.reason inside the onAbort so behavior is
unchanged.
- Around line 30-36: The generator currently unsafely casts yielded "object:
object as TObject" in backfill<TObject>, so either require/accept a runtime type
guard and use it to narrow before yielding (e.g., add an isTObject?: (obj:
APObject) => obj is TObject to BackfillOptions and call it to assert object is
TObject before yielding BackfillItem<TObject>), or decouple the generic by
changing the function signature and yields to
AsyncGenerator<BackfillItem<APObject>> (remove the unchecked TObject cast and
keep TObject only for callers who perform their own narrowing). Update the
backfill implementation to use the provided type predicate (or the non-generic
APObject yield) instead of casting to ensure sound typing for
BackfillItem<TObject>.

In `@packages/backfill/tsdown.config.ts`:
- Line 20: The mapping of globbed test entry paths uses f.replace(sep, "/")
which only replaces the first occurrence; update the transformation used in the
.map callback that currently references f.replace(sep, "/") so it replaces all
path separators (e.g., use a global replace or split/join approach such as
splitting on sep and joining with "/" or using replaceAll) to fully normalize
nested Windows paths for the entry generation in tsdown.config.ts.
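
The abort-listener finding above describes a common pattern: name the handler so the same reference can be passed to both `addEventListener` and `removeEventListener`. A minimal sketch under assumptions follows. The `wait()` name and its signature are hypothetical and are not taken from the PR's `waitForInterval()`.

```typescript
// Hypothetical helper; the PR's own waitForInterval() may be shaped differently.
function wait(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Check for an already-aborted signal first so no timer is started.
    if (signal?.aborted) {
      reject(signal.reason);
      return;
    }
    // Named handler: the same reference is used to add and to remove.
    const onAbort = () => {
      clearTimeout(timeout);
      reject(signal?.reason);
    };
    const timeout = setTimeout(() => {
      // The timer won: detach the listener so handlers do not accumulate
      // on a long-lived signal across many waits.
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

The `{ once: true }` option only detaches the handler after an abort fires, so the explicit `removeEventListener` call is still what covers the normal-completion path.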

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f8bcaa34-4cbd-46a7-9994-8d90a9d0b4a5

📥 Commits

Reviewing files that changed from the base of the PR and between 8db5848 and ff8f75f.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • deno.json
  • packages/backfill/README.md
  • packages/backfill/deno.json
  • packages/backfill/package.json
  • packages/backfill/src/backfill.test.ts
  • packages/backfill/src/backfill.ts
  • packages/backfill/src/mod.ts
  • packages/backfill/src/types.ts
  • packages/backfill/tsdown.config.ts
  • packages/fedify/README.md
  • pnpm-workspace.yaml

Comment thread packages/backfill/deno.json
Comment thread packages/backfill/src/backfill.test.ts
Comment thread packages/backfill/src/backfill.ts
Comment thread packages/backfill/src/backfill.ts
Comment thread packages/backfill/tsdown.config.ts Outdated
sij411 added 2 commits May 27, 2026 17:50
Skip collection URL items that fail to dereference instead of terminating the
whole traversal, while still stopping when the request budget is exhausted.
Also clean up interval abort listeners after successful waits.

Assisted-by: Codex:gpt-5
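
The behavior this commit describes, skipping items that fail to load while still stopping on budget exhaustion, can be illustrated as follows. This is a sketch under assumptions, not the PR's code: `loadAll()` and its parameters are hypothetical, and it assumes a failed request still counts against the budget.

```typescript
// Hypothetical helper for illustration only.
async function loadAll<T>(
  urls: string[],
  load: (url: string) => Promise<T>,
  maxRequests: number,
): Promise<T[]> {
  const loaded: T[] = [];
  let requests = 0;
  for (const url of urls) {
    // Budget exhausted: stop the whole traversal, keeping earlier results.
    if (requests >= maxRequests) break;
    requests++; // assumption: failed requests also consume budget
    try {
      loaded.push(await load(url));
    } catch {
      // A failure affects only this item, so move on to the next one.
      continue;
    }
  }
  return loaded;
}
```

Keeping the budget check outside the `try` block is what separates the two outcomes: a dereference error never ends the loop, and the budget limit is never swallowed as an ordinary failure.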
Normalize every platform path separator in globbed test entries before passing
those paths to tsdown.

Assisted-by: Codex:gpt-5
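
The underlying issue is that `String.prototype.replace` with a string pattern replaces only the first match. A small illustration follows. The `toPosixPath()` helper is hypothetical, and `sep` is passed in explicitly here rather than imported from `node:path` so the example behaves the same on every platform.

```typescript
// Hypothetical helper: normalize every separator, not just the first one.
function toPosixPath(file: string, sep: string): string {
  return file.split(sep).join("/");
}

const windowsPath = "src\\nested\\backfill.test.ts";

// A string pattern replaces only the first occurrence.
const firstOnly = windowsPath.replace("\\", "/");
// firstOnly === "src/nested\\backfill.test.ts"

// split/join replaces all occurrences.
const normalized = toPosixPath(windowsPath, "\\");
// normalized === "src/nested/backfill.test.ts"
```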
@sij411 sij411 requested a review from dahlia May 27, 2026 11:31
2 participants