Extract R analysis into modular backend abstraction #153

Open
sanghoonio wants to merge 3 commits into main from modular-backend-r

@sanghoonio sanghoonio commented Mar 27, 2026

Member

Summary

Refactor bedboss bedstat into a modular backend system. Analysis behavior is unchanged: all existing R-based analysis works identically. The bedstat() call signature does change (see "Refactored orchestration" below).

What changed

New package: bedboss/bedstat/backends/

  • base.py: StatBackend ABC with compute() and cleanup() methods, context manager protocol
  • r_backend.py: RStatBackend — extracts current R logic from bedstat.py (move, not rewrite)
  • __init__.py: create_backend(name) low-level factory, build_backend(name) high-level constructor that handles backend-specific prerequisites (e.g. starts RServiceManager for R)
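
A minimal sketch of how these three pieces could fit together, for orientation only. The `compute()` argument and return type, the `DummyStatBackend` class, and the `_BACKENDS` registry are assumptions made for illustration and are not taken from the PR's code.

```python
from abc import ABC, abstractmethod


class StatBackend(ABC):
    """Sketch of the base class; compute()'s signature is an assumption."""

    @abstractmethod
    def compute(self, bedfile: str) -> dict:
        """Compute statistics for one BED file."""

    def cleanup(self) -> None:
        """Release resources held by the backend (no-op by default here)."""

    def __enter__(self) -> "StatBackend":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.cleanup()
        return False  # never suppress exceptions raised inside the with block


class DummyStatBackend(StatBackend):
    """Hypothetical backend used only to exercise the sketch."""

    def __init__(self) -> None:
        self.cleaned_up = False

    def compute(self, bedfile: str) -> dict:
        return {"bedfile": bedfile}

    def cleanup(self) -> None:
        self.cleaned_up = True


_BACKENDS = {"dummy": DummyStatBackend}  # hypothetical registry


def create_backend(name: str, **kwargs) -> StatBackend:
    """Low-level factory: map a name to a backend class and instantiate it."""
    if name not in _BACKENDS:
        raise ValueError(f"Unknown backend: {name!r}")
    return _BACKENDS[name](**kwargs)


def build_backend(name: str) -> StatBackend:
    """High-level constructor: prepare prerequisites, then call the factory."""
    # A real implementation would start backend-specific services here
    # (e.g. an RServiceManager for the R backend) and pass them along.
    return create_backend(name)


# The context manager protocol guarantees cleanup() runs on exit.
with build_backend("dummy") as backend:
    stats = backend.compute("example.bed")
```

Keeping `create_backend` free of side effects and putting service startup in `build_backend` means tests can instantiate a backend directly without starting external processes.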

Refactored orchestration:

  • bedstat.py: signature changed from bedstat(backend="r", r_service=..., ...) to bedstat(backend: StatBackend, ...) — takes an instance, not a string
  • bedboss.py, insert_pep, reprocess_all, upload_all: build one backend at batch-orchestrator level via build_backend(), pass it through to bedstat() for each file, and clean up when done
  • cli.py: standalone bedstat command uses a with block for automatic cleanup

Why this matters:
Backend instances hold resources across a batch (R keeps its subprocess alive, future backends can cache reference data). The old pattern reconstructed backends per file, which prevented cross-file amortization.
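
To make the amortization argument concrete, here is a hedged sketch that counts how often an expensive resource is loaded under each pattern. `CachingBackend`, its `reference_loads` counter, and the simplified `bedstat(bedfile, backend)` signature are hypothetical stand-ins, not code from this PR.

```python
class CachingBackend:
    """Hypothetical backend that lazily loads and caches reference data."""

    def __init__(self) -> None:
        self.reference_loads = 0
        self._reference = None

    def compute(self, bedfile: str) -> dict:
        if self._reference is None:
            # Stands in for an expensive step (starting R, reading references).
            self.reference_loads += 1
            self._reference = {"genome": "placeholder"}
        return {"bedfile": bedfile}

    def cleanup(self) -> None:
        self._reference = None


def bedstat(bedfile: str, backend: CachingBackend) -> dict:
    """Simplified stand-in: takes a backend instance, not a name."""
    return backend.compute(bedfile)


def process_batch_shared(bedfiles: list) -> int:
    """New pattern: one backend for the whole batch."""
    backend = CachingBackend()
    try:
        for bedfile in bedfiles:
            bedstat(bedfile, backend)
    finally:
        backend.cleanup()
    return backend.reference_loads


def process_batch_per_file(bedfiles: list) -> int:
    """Old pattern: a fresh backend per file, so the cache never helps."""
    total_loads = 0
    for bedfile in bedfiles:
        backend = CachingBackend()
        try:
            bedstat(bedfile, backend)
        finally:
            backend.cleanup()
        total_loads += backend.reference_loads
    return total_loads


files = ["a.bed", "b.bed", "c.bed"]
shared_loads = process_batch_shared(files)
per_file_loads = process_batch_per_file(files)
```

In this toy setup the shared backend loads its reference once for three files, while the per-file pattern loads it once per file.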

Constants

Test plan

  • All existing behavior preserved — R backend is a pure extraction
  • RServiceManager lifecycle unchanged (created in build_backend, cleaned up by caller)
  • Black formatted

🤖 Generated with Claude Code

Create bedboss/bedstat/backends/ package with StatBackend ABC and
RStatBackend implementation. Move R logic from bedstat.py into
r_backend.py (no rewrite, just extraction). bedstat() becomes a thin
dispatcher that delegates to the configured backend via factory.

bedboss.py reads backend from bbagent.config.config.analysis.backend
and passes it through. All R scripts and r_service.py untouched.
Identical behavior when backend="r" (the default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move magic strings "r" and "gtars" to BACKEND_R / BACKEND_GTARS
constants in const.py. Also fixes black formatting in __init__.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@sanghoonio sanghoonio mentioned this pull request Apr 3, 2026
5 tasks

Architecture change: batch orchestrators build a stats backend once and
pass it through the call chain as a StatBackend instance, instead of
creating a new backend per file inside bedstat().

Before: bedstat(backend="r", r_service=..., ...) reconstructed the
backend on every call. That was fine for R, because r_service was
passed in and the RStatBackend constructor was cheap, but it breaks
future backends that need per-instance caches (GtarsPyStatBackend's
reference cache would be wiped between files).

After: callers build one backend for their batch, pass it to bedstat()
for each file, and call cleanup() at the end. Backend-internal resources
(R service, cached references) are held on the backend instance and
amortized across the whole batch.

Changes:
- bedstat(): signature takes `backend: StatBackend` instead of
  `backend: str + r_service`; no longer calls create_backend internally
- Add build_backend(name) helper to backends/__init__.py that hides
  backend-specific prerequisites (e.g. starts RServiceManager for "r")
- StatBackend base: add __enter__/__exit__ so backends work as context
  managers (with build_backend("r") as backend: ...)
- run_all / insert_pep / reprocess_all / upload_all / _upload_gse:
  build backend at batch-orchestrator level, pass it through, and
  clean up when done
- cli.py `bedstat` standalone command: build+use via `with` block
- run_all accepts optional backend=None for single-file standalone use;
  builds + cleans up locally if caller didn't provide one
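
The optional-backend behavior described for run_all could look roughly like the sketch below. `EchoBackend`, the `backend_name` parameter, and the stand-in `build_backend` are hypothetical; the real run_all takes many more arguments.

```python
from typing import Optional


class EchoBackend:
    """Hypothetical backend that records whether cleanup() was called."""

    def __init__(self) -> None:
        self.cleaned_up = False

    def compute(self, bedfile: str) -> dict:
        return {"bedfile": bedfile}

    def cleanup(self) -> None:
        self.cleaned_up = True


def build_backend(name: str) -> EchoBackend:
    """Stand-in for the PR's build_backend; ignores the name in this sketch."""
    return EchoBackend()


def run_all(
    bedfile: str,
    backend: Optional[EchoBackend] = None,
    backend_name: str = "r",
) -> dict:
    # Only clean up a backend that this call created itself; a backend
    # passed in by a batch orchestrator stays alive for the next file.
    owns_backend = backend is None
    if owns_backend:
        backend = build_backend(backend_name)
    try:
        return backend.compute(bedfile)
    finally:
        if owns_backend:
            backend.cleanup()


shared = EchoBackend()
first = run_all("a.bed", backend=shared)   # caller-owned: not cleaned up
standalone = run_all("b.bed")              # built and cleaned up locally
```

Tracking ownership with a flag keeps the cleanup responsibility with whoever constructed the backend.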

This is a breaking change — no backward compat shim. The backend
abstraction is pre-merge (PR #153).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>