
refactor(backends): move generate_from_raw hook firing into Backend base class #1264

Open
ajbozarth wants to merge 1 commit into generative-computing:main from ajbozarth:refactor/1183-move-generate-from-raw-hook

Conversation

@ajbozarth

Contributor

Pull Request

Issue

Fixes #1183. Folds the additive half of #1218 Part 1.

Description

Brings the raw (batch) generation path in line with the chat path's
existing wrapper pattern: Backend.generate_from_raw becomes a @final
wrapper that owns hook firing, and backends implement only the new
_generate_from_raw abstract. The three generation_batch_* hooks now
fire from one place instead of being duplicated inline in all five
backends.
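
The wrapper pattern described above can be sketched roughly as follows. This is a minimal, synchronous illustration under stated assumptions, not the PR's code: SketchBackend, EchoBackend, the fired list, and the _fire helper are hypothetical names, plain strings stand in for ModelOutputThunk results, and the post-call hook name is assumed from the generation_batch_* naming pattern.

```python
from __future__ import annotations

import abc
import time
from typing import Any, final


class SketchBackend(abc.ABC):
    """Hypothetical stand-in for the Backend base class (not the real API)."""

    def __init__(self, model_id: str, provider: str) -> None:
        # Every backend sets these two attributes so the wrapper can build
        # hook payloads even before any result exists.
        self._model_id = model_id
        self._provider = provider
        # Hypothetical hook sink: records (hook_name, payload) pairs instead
        # of dispatching to real plugins.
        self.fired: list[tuple[str, dict[str, Any]]] = []

    def _fire(self, hook: str, **payload: Any) -> None:
        base = {"model": self._model_id, "provider": self._provider}
        self.fired.append((hook, {**base, **payload}))

    @final
    def generate_from_raw(self, prompts: list[str]) -> list[str]:
        """Public entry point: owns hook firing and is never overridden."""
        self._fire("generation_batch_pre_call", prompts=list(prompts))
        start = time.monotonic()
        try:
            results, usage = self._generate_from_raw(prompts)
        except BaseException as exc:
            # BaseException (not Exception) so interrupts and cancellation
            # also reach the error hook before propagating.
            self._fire("generation_batch_error", error=exc)
            raise
        latency = time.monotonic() - start
        # Hook name assumed from the generation_batch_* naming pattern.
        self._fire("generation_batch_post_call", usage=usage, latency_s=latency)
        return results  # callers still see only the results

    @abc.abstractmethod
    def _generate_from_raw(
        self, prompts: list[str]
    ) -> tuple[list[str], dict[str, int] | None]:
        """Backend-specific work: return (results, aggregate usage or None)."""


class EchoBackend(SketchBackend):
    """Toy backend that upper-cases prompts and reports whole-batch usage."""

    def _generate_from_raw(self, prompts):
        return [p.upper() for p in prompts], {"total_tokens": len(prompts)}
```

A subclass in this sketch supplies only _generate_from_raw; the hook ordering, the error path, and the unpacking of the (results, usage) tuple live in one place.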

Reviewer call-outs:

  1. Tuple return on _generate_from_raw. Backends return
    tuple[list[ModelOutputThunk], dict | None] as (results, usage). The
    wrapper unpacks the tuple, fires post_call with the aggregate, and
    returns just results to callers (public signature unchanged).
    Considered moving usage aggregation into the wrapper, but openai and
    litellm only report whole-batch usage at the response root, so the
    impl is the only place that knows the right shape. Matches the
    existing _generate_from_context -> tuple[MOT, Context] precedent.

  2. Standardized self._model_id and self._provider instead of new
    abstracts. The wrapper needs model and provider for hook
    payloads (pre_call has no MOT yet; error may have no MOT). Considered
    adding _provider_name / _resolved_model_id() abstract members.
    Instead, finished the partial convention 3 of 5 backends already
    used: every backend now sets self._model_id: str and
    self._provider: str in __init__. Inlines model-id resolution into
    ollama and huggingface init bodies and deletes the now-unused
    _get_*_model_id helpers. Renames LocalHFBackend._hf_model_id to
    _model_id for consistency. ~10 chat-path setter sites switch to
    reading these attributes — no value changes, just one canonical
    source per backend.

  3. BaseException on the raw-path error wrapper, matching chat. The
    chat-path wrapper's BaseException was added in #1181 (refactor(telemetry)!: move backend tracing onto plugin/hook pattern); the raw path
    was tentatively left at except Exception. The same gap exists:
    synchronous KeyboardInterrupt / asyncio.CancelledError /
    SystemExit inside the impl currently bypass the error hook
    silently. This PR aligns to BaseException. Behavior change: raw
    cancellation/interrupts now fire generation_batch_error before
    propagating.

  4. Batch pre_call payload mutations now propagate. When the batch
    hooks were added in #1218 (fix(backends): unify raw-path token usage on mot.generation.usage; guard eval_count=None), they were wired for telemetry-only
    observation: a plugin could mutate model_options / format /
    tool_calls on the pre_call payload, and the chat path would honor
    those mutations, but the batch path silently dropped them. Same
    plugin, same intent, different result depending on which API the
    user reached. The wrapper now captures the post-hook payload and
    reassigns the locals before calling _generate_from_raw, identical
    to the chat-path idiom. Adds model_options to
    GenerationBatchPreCallPayload (it didn't exist on the batch
    payload at all). Test inverted from _are_not_propagated to
    _propagates, mirroring the chat-path test.

  5. Folded the additive half of #1218 Part 1. Backends with per-MOT
    token counts (ollama, huggingface, watsonx) now also populate
    mot.generation.usage per MOT. Backends with whole-batch-only usage
    (openai, litellm) leave per-MOT mot.generation.usage = None and
    surface the aggregate via the tuple return. Null-token policy:
    all-or-nothing — if any of prompt_tokens / completion_tokens /
    total_tokens cannot be determined for a MOT, that MOT's
    generation.usage stays None (matches dict | None typing; ollama
    docs document Optional[int] as "not yet available" rather than
    "zero"). Watsonx's previous, undocumented "or 0" coercion switches to
    the same policy.

    The existing mot._meta["usage"] writes are preserved everywhere —
    budget_forcing_alg.py still reads from them. #1218's remaining
    scope (consumer-side reads in budget_forcing_alg.py and the
    mot._meta["usage"] deprecation path) stays open for follow-up.

    Originally folded in to support generic usage aggregation in the
    wrapper; kept after the openai/litellm pivot to call-out 1's tuple
    return because the per-MOT writes are still the right thing for the
    3 backends whose APIs expose them.

  6. Custom-backend doc updated.
    docs/docs/community/building-extensions.md was teaching users to
    override the public generate_from_context / generate_from_raw
    directly; those are @final now. The example was updated to
    implement _generate_from_context / _generate_from_raw and set
    _model_id / _provider.

  7. Test coverage. Adds TestGenerationBatchHookCallSites in
    test/plugins/test_hook_call_sites.py mirroring the chat-path
    firing-site tests (9 tests). _MockBackend extended to cover both
    paths with hardcoded behavior; error case via inline subclass per
    the existing RecordingBackend(_MockBackend) pattern. Mocks in
    test/stdlib/test_streaming.py, test/core/test_logger_plugin_hooks.py,
    and test/stdlib/frameworks/test_react_framework.py updated to the
    new abstract contract.
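
The payload-mutation propagation in call-out 4 might look roughly like the sketch below. It assumes a simplified model in which each pre_call hook receives a payload and returns a (possibly modified) copy; PreCallPayload, generate_from_raw_sketch, and the impl callable are hypothetical names, not the project's actual types or signatures.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass(frozen=True)
class PreCallPayload:
    """Hypothetical stand-in for GenerationBatchPreCallPayload."""

    model_options: dict[str, Any] = field(default_factory=dict)
    format: Any = None
    tool_calls: bool = False


# A hook takes the payload and returns the payload the next stage should see.
Hook = Callable[[PreCallPayload], PreCallPayload]


def generate_from_raw_sketch(
    prompts: list[str],
    model_options: dict[str, Any],
    hooks: list[Hook],
    impl: Callable[[list[str], dict[str, Any]], list[str]],
) -> list[str]:
    payload = PreCallPayload(model_options=dict(model_options))
    for hook in hooks:
        payload = hook(payload)
    # Reassign the local from the post-hook payload. Skipping this step is
    # what makes hook mutations silently disappear before the backend runs.
    model_options = payload.model_options
    return impl(prompts, model_options)
```

With this shape, a plugin that rewrites model_options affects the options the backend implementation receives, which is the behavior the chat path is described as already having.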

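The all-or-nothing null-token policy from call-out 5 can be illustrated with a small helper. This is a sketch only: build_usage is a hypothetical name, and deriving total_tokens as the sum of the other two counts is an assumption made for the example rather than something the PR states.

```python
from __future__ import annotations


def build_usage(
    prompt_tokens: int | None, completion_tokens: int | None
) -> dict[str, int] | None:
    """Return a usage dict only when every count is known, else None."""
    if prompt_tokens is None or completion_tokens is None:
        # A missing count means "not yet available", not zero, so no
        # partial dict is reported and nothing is coerced with `or 0`.
        return None
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        # Assumption for this sketch: total is the sum of the two counts.
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

Checking against None explicitly keeps a genuine count of 0 distinct from an unknown count, which a truthiness test such as "x or 0" would blur.
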
Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

…ase class

Closes generative-computing#1183. Folds the additive half of generative-computing#1218 Part 1.

- `generate_from_raw` becomes a `@final` wrapper on `Backend` that owns
  pre/post/error hook firing; backends implement `_generate_from_raw`
  returning `(results, usage)`.
- All five backends drop the inline gen_id, latency timing, and three
  hook-fire blocks; wrapper catches `BaseException` (matches chat-path
  generative-computing#1181).
- `generation_batch_pre_call` payload mutations now propagate to the
  backend impl (model_options/format/tool_calls), matching the chat
  path. Adds `model_options` field to `GenerationBatchPreCallPayload`.
  Closes a gap from generative-computing#1218 where the batch hook was wired for telemetry
  observation only.
- Standardizes `self._model_id` and `self._provider` on every backend.
  Inlines model-id resolution into ollama/huggingface `__init__`s and
  deletes the `_get_*_model_id` helpers; renames `_hf_model_id` to
  `_model_id`.
- Backends with per-MOT token counts (ollama, hf, watsonx) now populate
  `mot.generation.usage` per MOT; openai/litellm leave it `None` since
  their APIs only report whole-batch usage. `mot._meta["usage"]` writes
  preserved for generative-computing#1218's follow-up.
- Adds `TestGenerationBatchHookCallSites` mirroring the chat-path tests;
  updates the custom-backend doc snippet.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth ajbozarth requested review from a team, jakelorocco and nrfulton as code owners June 12, 2026 20:50
@ajbozarth ajbozarth self-assigned this Jun 12, 2026
@ajbozarth ajbozarth requested a review from planetf1 June 12, 2026 20:50
@github-actions github-actions Bot added the enhancement New feature or request label Jun 12, 2026


Development

Successfully merging this pull request may close these issues.

refactor: move generate_from_raw hook firing into Backend base class
