pyk: per-request haskell-backend RPC log capture, written per request under the proof dir#4928
pyk: per-request haskell-backend RPC log capture, written per request under the proof dir#4928ehildenb wants to merge 8 commits into
Conversation
…rgs hook Pull the `--log` / `--log-format` / `--log-entries` construction out of KoreServer._extra_args into a new _haskell_log_cli_args method so subclasses can supply their own flag form. No behavior change at this commit; the default still emits the kore-rpc form. Mirrors the existing _smt_cli_args override pattern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…og-file / -l in BoosterServer kore-rpc-booster diverged from kore-rpc on the haskell-log CLI: it accepts `--log-file FILE` (not `--log FILE`) and repeated `-l ENTRY` (not the comma-separated `--log-entries A,B,C`). See booster's CLOptions.hs (`long "log-file"` and `long "log-level" <> short 'l'` inside a `many`). Pyk was emitting the kore-rpc form to both binaries, causing kore-rpc-booster to exit on usage error before opening its RPC port — surfacing as a psutil.AccessDenied in KoreServer._get_host_and_port. Override _haskell_log_cli_args on BoosterServer to produce the booster form. KoreServer is unchanged. Regression test pins both flag forms by calling _haskell_log_cli_args directly on bypass-__init__ instances. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-local accessor on KoreClient
`JsonRpcClient.request` now records the deterministic `f'{client_label}-{NNN}'`
id it issues into a thread-local; `KoreClient.last_request_id` exposes it.
A diagnostic caller snapshots this immediately after each RPC to use the id as a
join key onto the per-request captured logs and handoff metadata. Thread-local
so parallel proof workers never clobber each other's ids.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion/{kore/test_kore_client,test_division_hooks}: per-request haskell-logging entry list + haskell-log-entries on result types
Add `haskell_logging: Iterable[str] | None = None` to KoreClient.{execute,implies,
simplify,get_model,add_module}: the value is the list of haskell-backend log entries
to capture for that request, sent verbatim as the `haskell-logging` wire field (None
omits it). Putting the entry set on the request — rather than a fixed backend bundle
or a server-startup `-l` set — lets pyk own the default and downstream override it,
and keeps the backend from committing to a log set ahead of time. Parse the optional
`haskell-log-entries` bundle onto every ExecuteResult subtype, ImpliesResult, and
GetModelResult; `simplify` now returns a `SimplifyResult` `(state, logs,
haskell_log_entries)`. Absent entries / unset list preserve today's behaviour.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…haskell_logging through execute, simplify, kast_simplify, implies, extend_cterm
Add an optional `haskell_logging` per-call parameter to
CTermSymbolic.{execute,simplify,kast_simplify,implies} (including the recursive
failure-reason implies) and forward it through KCFGExplore.extend_cterm to
CTermSymbolic.execute. Additive and default-preserving: omitting it reproduces
today's calls exactly (None leaves backend logging untouched).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kell-logging bundle to <dir>/<request_id>.jsonl Add an instance-level `haskell_log_dir` to CTermSymbolic (threaded through the `cterm_symbolic` factory). When set, every execute/simplify/implies requests the per-request `haskell-logging` bundle (unless a per-call flag overrides) and writes the captured entries to `<haskell_log_dir>/<request_id>.jsonl`, one JSON value per line, named by request id so concurrent proof workers never collide. Mirrors the `booster_only_simplify` instance-default pattern; no-op when unset. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…est bundles under the save dir Add a `--haskell-logging` boolean to `pyk prove` (ProveOptions). When set, exec_prove points `cterm_symbolic` at `<save-directory>/haskell-logs`, so every RPC's `haskell-logging` bundle is captured and written one file per request. Requires `--save-directory`; errors clearly otherwise. The single per-request flag is self-sufficient on the backend side: the haskell-backend captures its bundle (kore equation attempt/apply plus term resolution, and the booster proxy/detail/abort/simplify context events) independently of the server's -l/--log-entries configuration. So pyk only sets the one flag and does not configure which entries to enable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a165263 to
13d2a6c
Compare
| #: registry) and booster context tags. Sent verbatim on the ``haskell-logging`` request field; | ||
| #: the backend skips any name it does not recognise, so this can evolve without a lockstep backend | ||
| #: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set). | ||
| HASKELL_LOGGING_ENTRIES: Final[tuple[str, ...]] = ( |
There was a problem hiding this comment.
| HASKELL_LOGGING_ENTRIES: Final[tuple[str, ...]] = ( | |
| HASKELL_LOGGING_ENTRIES: Final = ( |
| #: diagnosis. Spans both engines: kore entry types (resolved by the backend against its log | ||
| #: registry) and booster context tags. Sent verbatim on the ``haskell-logging`` request field; | ||
| #: the backend skips any name it does not recognise, so this can evolve without a lockstep backend | ||
| #: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set). |
There was a problem hiding this comment.
| #: release. Override per `CTermSymbolic` (e.g. downstream semantics needing a different set). | |
| #: release. Override per ``CTermSymbolic`` (e.g. downstream semantics needing a different set). |
| log_succ_rewrites: bool = True, | ||
| log_fail_rewrites: bool = False, | ||
| booster_only_simplify: bool = False, | ||
| haskell_log_entries: Iterable[str] = HASKELL_LOGGING_ENTRIES, |
There was a problem hiding this comment.
So logging is on by default?
There was a problem hiding this comment.
No — it's off by default. The populated default is which entries to request if logging is enabled, not a switch: logging only happens once haskell_log_dir is set (or a per-call flag turns it on). Reworded the comment to make that explicit. (abc9754)
| cterm: CTerm, | ||
| module_name: str | None = None, | ||
| booster_only_simplify: bool | None = None, | ||
| haskell_logging: bool | None = None, |
There was a problem hiding this comment.
So the class owns the list of log entry types to ask for, and requests just turn this on or off, right?
There was a problem hiding this comment.
Yes, exactly. And following up on that ownership: the set is now also overridable through the cterm_symbolic factory (it previously only reached the legacy server -l path), so downstream legacy_explore/create_kcfg_explore callers can tailor it, not just directly-constructed CTermSymbolics. (f4aa11e)
| # Records the JSON-RPC request id of the most recent wire request issued on the | ||
| # calling thread (see `JsonRpcClient.request`). A diagnostic caller snapshots | ||
| # `KoreClient.last_request_id` immediately after each RPC to use the deterministic | ||
| # `f'{client_label}-{NNN}'` id as a join key onto the captured per-request logs. | ||
| # Thread-local so parallel proof workers never clobber each other's ids. | ||
| _last_request_id: local = local() | ||
|
|
There was a problem hiding this comment.
What's the rationale for tracking this per-thread and not per-client?
There was a problem hiding this comment.
I checked _ProverPool and KoreServerPool — both key instances by current_thread().name, so each worker thread creates and owns its own KoreClient; no client is ever shared across threads. The thread-local was guarding a case that doesn't occur, so I switched last_request_id to a plain per-client attribute (recorded on JsonRpcClient, surfaced through the facade and KoreClient). (9a98324)
| def worker(name: str) -> None: | ||
| _last_request_id.value = name | ||
| barrier.wait() # both threads have set their value before either reads | ||
| seen[name] = kore_client.last_request_id |
There was a problem hiding this comment.
This is also looks like a KoreClient test.
| return server | ||
|
|
||
|
|
||
| def _make_booster_server( |
There was a problem hiding this comment.
These can be turned into fixtures.
| haskell_log_format: KoreExecLogFormat = KoreExecLogFormat.JSON, | ||
| haskell_log_entries: list[str] | None = None, | ||
| ) -> BoosterServer: | ||
| server = object.__new__(BoosterServer) |
There was a problem hiding this comment.
Consider using mocks of the actual classes instead that simply patch start into a no-op. This would let us test all other private methods as well.
There was a problem hiding this comment.
Alternatively, you can refactor the classes to call start only on __enter__. Then the real classes can be tested directly.
| return server | ||
|
|
||
|
|
||
| class TestKoreServerHaskellLogCliArgs: |
There was a problem hiding this comment.
These tests can be made top-level functions.
There was a problem hiding this comment.
Also consider turning them into parameterized tests.
| def worker(name: str) -> None: | ||
| _last_request_id.value = name | ||
| barrier.wait() # both threads have set their value before either reads | ||
| seen[name] = kore_client.last_request_id |
There was a problem hiding this comment.
This also looks like a KoreClient test.
Blocked on: runtimeverification/haskell-backend#4151Blocked on: #4929
Lets pyk request the haskell-backend's per-request log bundle and write it to disk, so downstream can diagnose fallback behaviour per RPC. The request carries the list of log entries to capture and the response returns them in-band; pyk owns the default set and writes each request's bundle to its own file. This pairs with the haskell-backend
per-request-rpcbranch, which adds thehaskell-loggingrequest field ([String]) and thehaskell-log-entriesresponse field to the five JSON-RPC endpoints and captures the named entries independently of the server's-l/--log-fileconfiguration.Changes:
BoosterServernow emits the booster-form haskell-log CLI (--log-file FILE, repeated-l ENTRY) via a new overridable_haskell_log_cli_argshook; it previously sent the kore-rpc form (--log,--log-entries A,B), which made kore-rpc-booster exit on a usage error before opening its RPC port.KoreClient.{execute,implies,simplify,get_model,add_module}takehaskell_logging: Iterable[str] | None— the entries to capture for that request — and send it as thehaskell-loggingwire field (omitted whenNone); the optionalhaskell-log-entriesbundle is parsed onto every result type, andsimplifyreturns aSimplifyResult (state, logs, haskell_log_entries).KoreClient.last_request_idexposes the JSON-RPC id of the most recent request on the calling thread, used to name each per-request log file.CTermSymbolicowns the canonical default entry set (HASKELL_LOGGING_ENTRIES), overridable per instance so downstream can tailor it; the on/off flag is threaded throughexecute/simplify/kast_simplify/impliesandKCFGExplore.extend_cterm.CTermSymbolictakes ahaskell_log_dir: when set, every RPC requests the bundle and the entries returned on the response are written to<haskell_log_dir>/<request_id>.jsonl(one JSON value per line, named by request id so concurrent proof workers never collide).pyk prove --haskell-loggingpoints that sink at<save-directory>/haskell-logs/(requires--save-directory).Validation:
make -C pyk checkclean; unit suites forcterm/kore/kcfggreen, including new coverage for the per-request entry list, result-type bundle parsing, the entry-set override, and the jsonl writer.per-request-rpcbackend implementation: it acceptshaskell-logging: [String], routes the flat list to both engines (skipping names each does not recognise), tag-matches id-carrying contexts likeRewrite, and returns the captured entries in-band — pyk's default set is fully recognised, no skips.regression-newnot run locally (the machine's K install is pinned for a benchmark) — relying on CI.