Extract R analysis into modular backend abstraction #153
Open
sanghoonio wants to merge 3 commits into
Create bedboss/bedstat/backends/ package with StatBackend ABC and RStatBackend implementation. Move R logic from bedstat.py into r_backend.py (no rewrite, just extraction). bedstat() becomes a thin dispatcher that delegates to the configured backend via factory. bedboss.py reads backend from bbagent.config.config.analysis.backend and passes it through. All R scripts and r_service.py untouched. Identical behavior when backend="r" (the default). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move magic strings "r" and "gtars" to BACKEND_R / BACKEND_GTARS constants in const.py. Also fixes black formatting in __init__.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
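A minimal sketch of how the constants and the low-level factory from these two commits could fit together. The `compute()` signature, the `_REGISTRY` dict, and the error message are assumptions made for illustration; the real `RStatBackend` wraps the existing R logic and is not reproduced here.

```python
from abc import ABC, abstractmethod

# Constants as described for const.py
BACKEND_R = "r"
BACKEND_GTARS = "gtars"


class StatBackend(ABC):
    """Common interface; the compute() signature here is assumed."""

    @abstractmethod
    def compute(self, bedfile: str) -> dict:
        ...

    def cleanup(self) -> None:
        """Release backend-held resources; no-op by default."""


class RStatBackend(StatBackend):
    """Placeholder standing in for the extracted R logic."""

    def compute(self, bedfile: str) -> dict:
        return {"backend": BACKEND_R, "bedfile": bedfile}


# Hypothetical registry; only "r" is registered in this sketch
_REGISTRY = {BACKEND_R: RStatBackend}


def create_backend(name: str = BACKEND_R) -> StatBackend:
    """Low-level factory: map a backend name to an instance."""
    try:
        backend_cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown stats backend: {name!r}") from None
    return backend_cls()
```

Using the constants rather than bare strings means a typo in a backend name fails at import time instead of at dispatch time.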
Architecture change: batch orchestrators build a stats backend once and
pass it through the call chain as a StatBackend instance, instead of
creating a new backend per file inside bedstat().
Before: bedstat(backend="r", r_service=..., ...) reconstructed the
backend on every call. Fine for R because r_service was passed IN and
the RStatBackend constructor was cheap, but broken for future backends
that need per-instance caches (GtarsPyStatBackend's reference cache
would be wiped between files).
After: callers build one backend for their batch, pass it to bedstat()
for each file, and cleanup() at the end. Backend-internal resources
(R service, cached references) are held on the backend instance and
amortized across the whole batch.
Changes:
- bedstat(): signature takes `backend: StatBackend` instead of
`backend: str + r_service`; no longer calls create_backend internally
- Add build_backend(name) helper to backends/__init__.py that hides
backend-specific prerequisites (e.g. starts RServiceManager for "r")
- StatBackend base: add __enter__/__exit__ so backends work as context
managers (with build_backend("r") as backend: ...)
- run_all / insert_pep / reprocess_all / upload_all / _upload_gse:
build backend at batch-orchestrator level, pass through, cleanup
when done
- cli.py `bedstat` standalone command: build+use via `with` block
- run_all accepts optional backend=None for single-file standalone use;
builds + cleans up locally if caller didn't provide one
This is a breaking change — no backward compat shim. The backend
abstraction is pre-merge (PR #153).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
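The build-once, pass-through pattern described in this commit can be sketched roughly as follows. `DemoBackend`, the simplified `bedstat()` and `run_all()` signatures, and the body of `build_backend()` are hypothetical stand-ins; only the overall shape (context manager on the base class, optional `backend=None` with local build and cleanup) follows the commit message.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StatBackend(ABC):
    @abstractmethod
    def compute(self, bedfile: str) -> dict:
        ...

    def cleanup(self) -> None:
        pass

    # Context manager protocol so callers can write `with build_backend(...)`
    def __enter__(self) -> "StatBackend":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup()
        return False  # do not swallow exceptions


class DemoBackend(StatBackend):
    """Hypothetical backend that records calls instead of running R."""

    def __init__(self) -> None:
        self.calls = 0
        self.cleaned_up = False

    def compute(self, bedfile: str) -> dict:
        self.calls += 1
        return {"bedfile": bedfile}

    def cleanup(self) -> None:
        self.cleaned_up = True


def build_backend(name: str = "r") -> StatBackend:
    # Stand-in: the real helper would start backend prerequisites here
    return DemoBackend()


def bedstat(bedfile: str, backend: StatBackend) -> dict:
    # Takes an instance, not a name; no backend construction inside
    return backend.compute(bedfile)


def run_all(bedfile: str, backend: Optional[StatBackend] = None) -> dict:
    owns_backend = backend is None
    if owns_backend:
        backend = build_backend()
    try:
        return bedstat(bedfile, backend)
    finally:
        if owns_backend:  # only clean up what this call created
            backend.cleanup()


# Batch use: one backend for every file, cleaned up when the block exits
with build_backend("r") as shared:
    results = [run_all(path, backend=shared) for path in ["a.bed", "b.bed"]]
```

The `owns_backend` flag keeps ownership explicit: a batch orchestrator that passes a backend in remains responsible for cleaning it up, while a standalone `run_all` call cleans up after itself.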
Summary
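Refactor bedboss bedstat into a modular backend system. No change in analysis behavior: all existing R-based analysis works identically. The only interface change is the `bedstat()` signature, which now takes a backend instance.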
What changed
New package: `bedboss/bedstat/backends/`
- `base.py`: `StatBackend` ABC with `compute()` and `cleanup()` methods, plus the context manager protocol
- `r_backend.py`: `RStatBackend`, which extracts the current R logic from `bedstat.py` (move, not rewrite)
- `__init__.py`: `create_backend(name)` low-level factory, and `build_backend(name)` high-level constructor that handles backend-specific prerequisites (e.g. starts `RServiceManager` for R)
Refactored orchestration:
- `bedstat.py`: signature changed from `bedstat(backend="r", r_service=..., ...)` to `bedstat(backend: StatBackend, ...)`; it takes an instance, not a string
- `bedboss.py`, `insert_pep`, `reprocess_all`, `upload_all`: build one backend at batch-orchestrator level via `build_backend()`, pass it through to `bedstat()` for each file, and clean up when done
- `cli.py`: standalone `bedstat` command uses a `with` block for automatic cleanup
Why this matters:
Backend instances hold resources across a batch (R keeps its subprocess alive, future backends can cache reference data). The old pattern reconstructed backends per file, which prevented cross-file amortization.
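To make the amortization argument concrete, here is a toy illustration. `CachingBackend` and its reference loader are hypothetical and do not reflect how any real backend loads data; the point is only that a per-instance cache is useless if the instance is rebuilt for every file.

```python
class CachingBackend:
    """Hypothetical backend with a per-instance reference cache."""

    def __init__(self) -> None:
        self._references = {}
        self.loads = 0  # counts expensive reference loads

    def _reference(self, genome: str) -> str:
        if genome not in self._references:
            self.loads += 1  # stands in for an expensive load
            self._references[genome] = f"reference:{genome}"
        return self._references[genome]

    def compute(self, bedfile: str, genome: str) -> tuple:
        return (bedfile, self._reference(genome))


files = ["a.bed", "b.bed", "c.bed"]

# Old pattern: a fresh backend per file, so the cache never gets reused
per_file_loads = 0
for path in files:
    backend = CachingBackend()
    backend.compute(path, "hg38")
    per_file_loads += backend.loads

# New pattern: one backend for the whole batch
shared = CachingBackend()
for path in files:
    shared.compute(path, "hg38")
per_batch_loads = shared.loads

print(per_file_loads, per_batch_loads)  # prints "3 1"
```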
Constants
`BACKEND_R = "r"` and `BACKEND_GTARS = "gtars"` in `const.py` (the gtars backend comes in PR #155, "Add gtars genomicdist backend")
Test plan
- `RServiceManager` lifecycle unchanged (created in `build_backend`, cleaned up by caller)
🤖 Generated with Claude Code