Skip to content

pyk: per-request haskell-backend RPC log capture, written per request under the proof dir#4928

Open
ehildenb wants to merge 8 commits into
developfrom
kore-rpc-improvements
Open

pyk: per-request haskell-backend RPC log capture, written per request under the proof dir#4928
ehildenb wants to merge 8 commits into
developfrom
kore-rpc-improvements

Conversation

@ehildenb
Copy link
Copy Markdown
Member

@ehildenb ehildenb commented Jun 3, 2026

Blocked on: runtimeverification/haskell-backend#4151
Blocked on: #4929

Lets pyk request the haskell-backend's per-request log bundle and write it to disk, so downstream can diagnose fallback behaviour per RPC. The request carries the list of log entries to capture and the response returns them in-band; pyk owns the default set and writes each request's bundle to its own file. This pairs with the haskell-backend per-request-rpc branch, which adds the haskell-logging request field ([String]) and the haskell-log-entries response field to the five JSON-RPC endpoints and captures the named entries independently of the server's -l/--log-file configuration.

Changes:

  • BoosterServer now emits the booster-form haskell-log CLI (--log-file FILE, repeated -l ENTRY) via a new overridable _haskell_log_cli_args hook; it previously sent the kore-rpc form (--log, --log-entries A,B), which made kore-rpc-booster exit on a usage error before opening its RPC port.
  • KoreClient.{execute,implies,simplify,get_model,add_module} take haskell_logging: Iterable[str] | None — the entries to capture for that request — and send it as the haskell-logging wire field (omitted when None); the optional haskell-log-entries bundle is parsed onto every result type, and simplify returns a SimplifyResult (state, logs, haskell_log_entries).
  • KoreClient.last_request_id exposes the JSON-RPC id of the most recent request on the calling thread, used to name each per-request log file.
  • CTermSymbolic owns the canonical default entry set (HASKELL_LOGGING_ENTRIES), overridable per instance so downstream can tailor it; the on/off flag is threaded through execute/simplify/kast_simplify/implies and KCFGExplore.extend_cterm.
  • CTermSymbolic takes a haskell_log_dir: when set, every RPC requests the bundle and the entries returned on the response are written to <haskell_log_dir>/<request_id>.jsonl (one JSON value per line, named by request id so concurrent proof workers never collide).
  • pyk prove --haskell-logging points that sink at <save-directory>/haskell-logs/ (requires --save-directory).

Validation:

  • make -C pyk check clean; unit suites for cterm/kore/kcfg green, including new coverage for the per-request entry list, result-type bundle parsing, the entry-set override, and the jsonl writer.
  • Confirmed against the per-request-rpc backend implementation: it accepts haskell-logging: [String], routes the flat list to both engines (skipping names each does not recognise), tag-matches id-carrying contexts like Rewrite, and returns the captured entries in-band — pyk's default set is fully recognised, no skips.
  • Integration and regression-new not run locally (the machine's K install is pinned for a benchmark) — relying on CI.

ehildenb and others added 5 commits June 3, 2026 20:53
…rgs hook

Pull the `--log` / `--log-format` / `--log-entries` construction out of
KoreServer._extra_args into a new _haskell_log_cli_args method so subclasses
can supply their own flag form.  No behavior change at this commit; the
default still emits the kore-rpc form.  Mirrors the existing _smt_cli_args
override pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…og-file / -l in BoosterServer

kore-rpc-booster diverged from kore-rpc on the haskell-log CLI: it accepts
`--log-file FILE` (not `--log FILE`) and repeated `-l ENTRY` (not the
comma-separated `--log-entries A,B,C`).  See booster's CLOptions.hs
(`long "log-file"` and `long "log-level" <> short 'l'` inside a `many`).
Pyk was emitting the kore-rpc form to both binaries, causing kore-rpc-booster
to exit on usage error before opening its RPC port — surfacing as a
psutil.AccessDenied in KoreServer._get_host_and_port.

Override _haskell_log_cli_args on BoosterServer to produce the booster form.
KoreServer is unchanged.  Regression test pins both flag forms by calling
_haskell_log_cli_args directly on bypass-__init__ instances.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-local accessor on KoreClient

`JsonRpcClient.request` now records the deterministic `f'{client_label}-{NNN}'`
id it issues into a thread-local; `KoreClient.last_request_id` exposes it.
A diagnostic caller snapshots this immediately after each RPC to use the id as a
join key onto the per-request captured logs and handoff metadata.  Thread-local
so parallel proof workers never clobber each other's ids.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion/{kore/test_kore_client,test_division_hooks}: per-request haskell-logging entry list + haskell-log-entries on result types

Add `haskell_logging: Iterable[str] | None = None` to KoreClient.{execute,implies,
simplify,get_model,add_module}: the value is the list of haskell-backend log entries
to capture for that request, sent verbatim as the `haskell-logging` wire field (None
omits it).  Putting the entry set on the request — rather than a fixed backend bundle
or a server-startup `-l` set — lets pyk own the default and downstream override it,
and keeps the backend from committing to a log set ahead of time.  Parse the optional
`haskell-log-entries` bundle onto every ExecuteResult subtype, ImpliesResult, and
GetModelResult; `simplify` now returns a `SimplifyResult` `(state, logs,
haskell_log_entries)`.  Absent entries / unset list preserve today's behaviour.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…haskell_logging through execute, simplify, kast_simplify, implies, extend_cterm

Add an optional `haskell_logging` per-call parameter to
CTermSymbolic.{execute,simplify,kast_simplify,implies} (including the recursive
failure-reason implies) and forward it through KCFGExplore.extend_cterm to
CTermSymbolic.execute.  Additive and default-preserving: omitting it reproduces
today's calls exactly (None leaves backend logging untouched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@deosa-arch deosa-arch changed the base branch from master to develop June 3, 2026 23:39
@ehildenb ehildenb changed the title Kore rpc improvements kevm prove: add --haskell-log-file/format/entries Jun 3, 2026
@ehildenb ehildenb changed the title kevm prove: add --haskell-log-file/format/entries pyk: per-request haskell-backend RPC log capture, written per request under the proof dir Jun 3, 2026
ehildenb and others added 2 commits June 4, 2026 03:27
…kell-logging bundle to <dir>/<request_id>.jsonl

Add an instance-level `haskell_log_dir` to CTermSymbolic (threaded through the
`cterm_symbolic` factory).  When set, every execute/simplify/implies requests the
per-request `haskell-logging` bundle (unless a per-call flag overrides) and writes
the captured entries to `<haskell_log_dir>/<request_id>.jsonl`, one JSON value per
line, named by request id so concurrent proof workers never collide.  Mirrors the
`booster_only_simplify` instance-default pattern; no-op when unset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…est bundles under the save dir

Add a `--haskell-logging` boolean to `pyk prove` (ProveOptions).  When set,
exec_prove points `cterm_symbolic` at `<save-directory>/haskell-logs`, so every
RPC's `haskell-logging` bundle is captured and written one file per request.
Requires `--save-directory`; errors clearly otherwise.

The single per-request flag is self-sufficient on the backend side: the
haskell-backend captures its bundle (kore equation attempt/apply plus term
resolution, and the booster proxy/detail/abort/simplify context events)
independently of the server's -l/--log-entries configuration.  So pyk only sets
the one flag and does not configure which entries to enable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
juliankuners
juliankuners previously approved these changes Jun 4, 2026
@ehildenb ehildenb dismissed juliankuners’s stale review June 4, 2026 14:03

Approved wrong PR.

@ehildenb ehildenb marked this pull request as ready for review June 4, 2026 14:04
#: registry) and booster context tags. Sent verbatim on the ``haskell-logging`` request field;
#: the backend skips any name it does not recognise, so this can evolve without a lockstep backend
#: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set).
HASKELL_LOGGING_ENTRIES: Final[tuple[str, ...]] = (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HASKELL_LOGGING_ENTRIES: Final[tuple[str, ...]] = (
HASKELL_LOGGING_ENTRIES: Final = (

Copy link
Copy Markdown
Member Author

@ehildenb ehildenb Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. abc9754

#: diagnosis. Spans both engines: kore entry types (resolved by the backend against its log
#: registry) and booster context tags. Sent verbatim on the ``haskell-logging`` request field;
#: the backend skips any name it does not recognise, so this can evolve without a lockstep backend
#: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set).
#: release. Override per ``CTermSymbolic`` (e.g. downstream semantics needing a different set).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (abc9754)

log_succ_rewrites: bool = True,
log_fail_rewrites: bool = False,
booster_only_simplify: bool = False,
haskell_log_entries: Iterable[str] = HASKELL_LOGGING_ENTRIES,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So logging is on by default?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No — it's off by default. The populated default is which entries to request if logging is enabled, not a switch: logging only happens once haskell_log_dir is set (or a per-call flag turns it on). Reworded the comment to make that explicit. (abc9754)

cterm: CTerm,
module_name: str | None = None,
booster_only_simplify: bool | None = None,
haskell_logging: bool | None = None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the class owns the list of log entry types to ask for, and requests just turn this on or off, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. And following up on that ownership: the set is now also overridable through the cterm_symbolic factory (it previously only reached the legacy server -l path), so downstream legacy_explore/create_kcfg_explore callers can tailor it, not just directly-constructed CTermSymbolics. (f4aa11e)

Comment thread pyk/src/pyk/kore/rpc.py
Comment on lines +51 to +57
# Records the JSON-RPC request id of the most recent wire request issued on the
# calling thread (see `JsonRpcClient.request`). A diagnostic caller snapshots
# `KoreClient.last_request_id` immediately after each RPC to use the deterministic
# `f'{client_label}-{NNN}'` id as a join key onto the captured per-request logs.
# Thread-local so parallel proof workers never clobber each other's ids.
_last_request_id: local = local()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for tracking this per-thread and not per-client?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked _ProverPool and KoreServerPool — both key instances by current_thread().name, so each worker thread creates and owns its own KoreClient; no client is ever shared across threads. The thread-local was guarding a case that doesn't occur, so I switched last_request_id to a plain per-client attribute (recorded on JsonRpcClient, surfaced through the facade and KoreClient). (9a98324)

def worker(name: str) -> None:
_last_request_id.value = name
barrier.wait() # both threads have set their value before either reads
seen[name] = kore_client.last_request_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also looks like a KoreClient test.

return server


def _make_booster_server(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be turned into fixtures.

haskell_log_format: KoreExecLogFormat = KoreExecLogFormat.JSON,
haskell_log_entries: list[str] | None = None,
) -> BoosterServer:
server = object.__new__(BoosterServer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using mocks of the actual classes instead that simply patch start into a no-op. This would let us test all other private methods as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can refactor the classes to call start only on __enter__. Then the real classes can be tested directly.

return server


class TestKoreServerHaskellLogCliArgs:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests can be made top-level functions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also consider turning them into parameterized tests.

def worker(name: str) -> None:
_last_request_id.value = name
barrier.wait() # both threads have set their value before either reads
seen[name] = kore_client.last_request_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks like a KoreClient test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants