Skip to content

Fix production loop readiness gaps#22

Open
Svaag wants to merge 12 commits into
mainfrom
fix/loop-production-readiness
Open

Fix production loop readiness gaps#22
Svaag wants to merge 12 commits into
mainfrom
fix/loop-production-readiness

Conversation

@Svaag

@Svaag Svaag commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #13.
Closes #11.
Closes #12.
Closes #6.

This hardens the production loop path by fixing the current open readiness gaps:

  • run Python gates through target repo dev dependencies (uv run --group dev / --extra dev) when present
  • fail closed on missing/unstartable gate binaries and expose gate stdout/stderr in compact traces
  • derive planner acceptance criteria from issue-body ## Acceptance / ## Acceptance criteria sections with deterministic fallback
  • switch PiBackend and production policy to pi --print --mode json
  • parse Pi NDJSON usage/cost into the loop cost ledger shape
  • add per-issue daemon budget labels: loop:budget-large and loop:budget-xl

Validation

  • uv run --group dev python -m pytest -q → 200 passed
  • uvx ruff check src tests → passed
  • uv run --group dev mypy --strict src → passed
  • uv run hyrule-engineering-loop models validate --model-policy configs/loop/model-policy.production.yml --json → ok

Notes

The daily daemon caps remain enforced. Budget labels only raise the per-run cap for explicitly triaged feature-sized issues.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9489723bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +134 to +138
return [*argv[:2], *dev_args, *argv[2:]]
return argv

if _is_python_gate_payload(argv):
return ["uv", "run", *dev_args, *argv]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent uv gates from mutating the worktree

When this wrapper runs in a target repo with a missing or stale uv.lock, uv run can resolve/sync and write uv.lock before policy_node inspects the worktree; uv run --help documents --locked as asserting the lockfile remains unchanged and --frozen as running without updating uv.lock. That means an otherwise allowed Python change can fail policy because the gate, not the backend, added or modified uv.lock outside allowed_paths. Add --locked/--frozen or otherwise prevent/clean gate side effects when synthesizing these uv commands.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: synthesized and rewritten uv run gates now add --locked unless --locked or --frozen is already present. Regression coverage is in tests/test_gate_runner.py.

@Svaag

Svaag commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the Codex review: synthesized/rewritten uv run gates now include --locked unless an explicit lock guard (--locked/--frozen) is already present, so gates fail closed instead of mutating uv.lock. Added regression coverage and re-ran locally: pytest 201 passed, ruff, mypy.

@Svaag Svaag force-pushed the fix/loop-production-readiness branch from ae08c02 to 986b750 Compare June 21, 2026 19:20

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 986b750712

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/hyrule_engineering_loop/nodes.py Outdated
update["gate_commands"] = select_gate_commands_for_mutations(
changed_paths or list(mutations)
changed_paths or list(mutations),
cwd=gate_worktree,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Select Python gates per worktree

When a run has multiple worktrees, this passes only the first worktree into gate selection, but gate_execution_node later runs the resulting global gate_commands in every worktree. If the Python repo with a dev dependency group is not first, the run falls back to compileall and skips the intended pytest/ruff/mypy gates for that repo; if only the first repo has the dev group, those uv gates get run in unrelated worktrees. Derive gates per worktree/repo instead of using a single cwd for the whole multi-repo run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: auto-selected gates are now stored as gate_commands_by_repo and selected/executed per worktree, using each repo own pyproject.toml. Added multi-worktree regression coverage in test_auto_gate_selection_is_per_worktree.

Comment thread src/hyrule_engineering_loop/daemon.py Outdated
"max_wall_clock_minutes": config.max_wall_clock_minutes_per_run,
"max_cost_usd": config.max_cost_usd_per_run,
},
backend_budget=backend_budget_for_issue(item, config),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clamp issue budgets to remaining daily cost

When the ledger already has spend earlier in the UTC day, an issue labeled loop:budget-xl is still handed a $10 run budget here; the daily guard above only checks whether the ledger is already at the cap before starting. For example, with $5 already spent under the default $10 daily cap, the next XL run can be budgeted for another $10 and update the ledger to $15, so the new label bypasses the promised daily cap. Clamp max_cost_usd to the remaining daily budget or skip the run when none remains.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: issue-label run budgets are clamped to the remaining daily cost budget before the feature runner starts. Added unit and daemon-path regression coverage.

@Svaag

Svaag commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the two follow-up Codex findings:

  • Auto-selected gates are now derived per worktree/repo via gate_commands_by_repo, so Python repos get their own pytest/ruff/mypy selection based on that repo's pyproject.toml, and those commands are not run in unrelated worktrees.
  • Issue budget labels are now clamped to the remaining daily cost budget before the feature runner starts, so loop:budget-xl cannot exceed the daemon's daily cost cap.

Re-ran locally:

  • uv run --group dev python -m pytest -q → 204 passed
  • uvx ruff check src tests → passed
  • uv run --group dev mypy --strict src → passed

@Svaag Svaag force-pushed the fix/loop-production-readiness branch from 4524339 to aa56212 Compare June 21, 2026 19:41

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa56212438

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 664 to 668
try:
decoded = json.loads(stdout)
except (json.JSONDecodeError, ValueError):
return {}
return _parse_pi_json_events(stdout)
return decoded if isinstance(decoded, dict) else {}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize single-event Pi JSON before returning

When pi --mode json emits only one JSON event, such as an immediate agent_error before any turns, json.loads(stdout) succeeds and the new Pi event mapper is skipped. That single event is returned without the normalized is_error, usage, or total_cost_usd fields, so a zero-exit error event can be treated as a completed backend run and the daemon ledger records no reported cost. Detect Pi event-shaped dicts here and pass them through the same mapper before returning generic harness JSON.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit a0ab44f: single Pi JSON events whose type is a Pi event are now routed through the same Pi event normalizer instead of being returned raw. Added test_pi_backend_parses_single_json_error_event.

Comment on lines +139 to +141
dev_args = _uv_dev_args(cwd_path)
if not dev_args:
return argv

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard explicit uv gates without dev deps

Fresh evidence beyond the earlier lockfile thread: this early return still bypasses _with_uv_lock_guard for an explicit uv run ... gate whenever the target pyproject.toml has no dev group/extra. I checked uv run --help: --locked asserts uv.lock remains unchanged and --frozen runs without updating it; without either flag, uv run in a repo with a missing or stale lockfile can create/modify uv.lock, and the subsequent policy diff check will blame the backend for a gate-side effect outside allowed_paths.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit a0ab44f: explicit uv run ... gates now receive --locked even when the target repo has no dev group/extra, while explicit --locked/--frozen remains preserved. Added test_explicit_uv_gate_without_dev_env_still_gets_lock_guard.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0ab44f7ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

argv = list(command)
if not argv:
raise ValueError("gate command cannot be empty")
prepared = prepare_gate_command(argv, cwd=cwd)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate rewritten gate commands

When a target repo has a dev dependency set, this rewrites an already policy-approved gate such as ruff check . into uv run --locked --group dev ruff check .; however _validate_gate_commands only checks the original command[0]. In environments where allowed_gate_commands permits ruff/pytest/mypy but intentionally omits uv, the loop will still execute uv, bypassing the allowlist. Validate the prepared argv or persist prepared commands before the policy node runs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: gate policy now validates the prepared argv for the cwd(s) where gates execute, in addition to the declared command. That catches rewrites such as ruff check . -> uv run --locked --group dev ruff check . when uv is not allowlisted. Added test_policy_validates_prepared_gate_command.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16b6e1bbc7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +75 to +76
def _uv_run_has_lock_guard(argv: Sequence[str]) -> bool:
return any(arg in UV_LOCK_GUARDS for arg in argv)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Distinguish uv lock flags from payload args

When an explicit gate passes --locked or --frozen to the command being run, for example uv run python tools/check.py --locked or anything after --, this scan treats that payload flag as a uv lock guard and _with_uv_lock_guard leaves uv run unguarded. I checked uv run --help: its usage is uv run [OPTIONS] [COMMAND], with --locked/--frozen listed as uv options, so only flags before the command protect the lockfile; in the payload case a stale or missing uv.lock can still be updated by the gate and then be blamed by the policy diff check.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: _uv_run_has_lock_guard now only considers --locked/--frozen when they appear in the uv run option segment before the payload command or --. Payload args like uv run python tools/check.py --locked and args after -- no longer suppress insertion of the real uv lock guard. Added regression coverage for both cases.

@Svaag Svaag force-pushed the fix/loop-production-readiness branch from 07d452e to c84e9ca Compare June 21, 2026 20:17

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c84e9ca5a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"--index-url",
"--isolated",
"--keyring-provider",
"--managed-python",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify uv run flags correctly

When an explicit gate uses a no-argument uv flag such as uv run --managed-python pytest -q in a repo that declares a dev group, this entry makes _uv_run_payload consume pytest as if it were the value for --managed-python, so prepare_gate_command no longer recognizes the payload as a Python gate and never inserts --group dev/--extra dev. I checked uv run --help: --managed-python is listed as a flag, while --group <GROUP> is the dependency selector this code is trying to add, so these gates can still run outside the target repo's dev dependencies.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: corrected the uv option classifier so no-argument flags like --managed-python and --isolated are not treated as consuming the payload command. uv run --managed-python pytest -q now still gets the target repo dev selector plus --locked. Added test_uv_no_argument_flag_does_not_hide_python_gate_payload.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc08975a88

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

PYTHON_GATE_TOOLS = frozenset({"pytest", "ruff", "mypy"})
UV_DEV_SELECTORS = frozenset({"--group", "--extra", "--all-groups", "--all-extras"})
UV_LOCK_GUARDS = frozenset({"--locked", "--frozen"})
UV_OPTIONS_WITH_VALUE = frozenset(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse uv options that take values before the payload

For explicit workspace gates such as uv run --package api pytest -q, this option-value table omits --package, so _uv_run_payload treats api as the command and never recognizes the pytest payload that should get the repo's dev selector. I checked uv run --help, which lists --package <PACKAGE> before [COMMAND]; add the missing value-taking uv options or use a more complete parser so flags before the payload do not hide Python gates.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: expanded the uv value-taking option table to include workspace/package and other uv run options such as --package, so option values before the payload no longer hide Python gates. Added test_uv_value_option_does_not_hide_python_gate_payload.

Comment on lines +164 to +165
if _uv_run_has_dependency_selector(locked):
return locked

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add dev selectors alongside non-dev uv extras

When an explicit Python gate already selects another uv dependency set, for example uv run --extra docs pytest -q in a repo whose pyproject.toml exposes [project.optional-dependencies].dev, this early return leaves the command without --extra dev. I checked uv run --help: --extra <EXTRA> includes dependencies from the specified extra name, so a non-dev extra does not make the dev tools available; these gates can still run outside the target repo's dev dependencies or fail even though the dev extra exists.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: existing non-dev selectors no longer suppress the required dev selector. We now check specifically for the required dev group/extra (or all-groups/all-extras) before deciding not to add it. Added test_uv_non_dev_extra_does_not_suppress_dev_selector.

budget["max_wall_clock_minutes"] = max(
int(budget["max_wall_clock_minutes"]), int(override["max_wall_clock_minutes"])
)
budget["max_cost_usd"] = max(float(budget["max_cost_usd"]), float(override["max_cost_usd"]))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce the raised cost budget before trusting labels

Fresh evidence beyond the earlier clamp thread: when loop:budget-large/xl reaches this assignment, the only downstream use is copying max_cost_usd into BackendConstraints; assemble_backend_prompt does not mention the dollar cap and SubprocessBackend.execute never compares reported Pi cost to it. The new label therefore neither enforces nor meaningfully raises a run cost cap, and the daemon can still add an over-cap cost to the daily ledger after the run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: reported harness cost is now enforced against BackendConstraints.max_cost_usd; an over-budget completed harness run becomes budget_exhausted. The daemon also blocks draft PR publication if the final reported run cost exceeds the effective per-run budget. Added backend and daemon regression coverage.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccae6f7d31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +636 to +640
if isinstance(message, dict) and message.get("role") == "assistant":
usage = _usage_from_pi_message(message)
if usage is not None:
parsed.update(usage)
text = _content_text(message)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor Pi message stopReason errors

When Pi encodes a provider/runtime failure on an assistant message, e.g. stopReason: "error" or "aborted" with errorMessage (as its message types document), this branch still treats the message like a normal assistant response and never sets is_error. If that JSON-mode run exits 0, execute() reports a completed backend run instead of a failure, so the loop can proceed past a failed Pi invocation; check assistant stopReason/errorMessage while normalizing these events.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: Pi assistant messages with error-like stopReason values (error, aborted, canceled variants) or errorMessage/error now normalize to is_error=True. Added test_pi_backend_treats_assistant_error_stop_reason_as_error, preserving usage/cost extraction while making zero-exit provider/runtime failures fail the backend run.

@Svaag Svaag force-pushed the fix/loop-production-readiness branch from c72e1d0 to 8ad7a8a Compare June 21, 2026 21:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ad7a8ad12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/hyrule_engineering_loop/backend.py Outdated
Comment on lines +641 to +642
if event.get("willRetry") is True:
parsed["is_error"] = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't fail successful Pi retry streams

When Pi recovers from context overflow, its JSON stream can contain retry bookkeeping before the later successful agent_end. The Pi RPC docs describe compaction_end with willRetry: true as meaning Pi will automatically retry the prompt, but this line makes that transient marker set is_error permanently. In that scenario a zero-exit, ultimately successful pi --mode json run is returned as failed, so the loop goes to triage despite valid backend output.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: transient retry bookkeeping (willRetry: true, e.g. compaction retry events) no longer marks the whole Pi stream as failed. Error events and assistant error stop reasons still set is_error. Added test_pi_backend_retry_bookkeeping_does_not_fail_later_success.

Comment on lines +139 to +140
if selector == "--group":
return "--all-groups" in options or value in _uv_option_value(options, "--group")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat uv only-dev selectors as satisfying dev

When an explicit gate already uses uv run --only-group dev pytest -q or uv run --only-dev pytest -q in a repo with [dependency-groups].dev, this check returns false because it only accepts --group dev/--all-groups. I checked uv run --help: --only-group <ONLY_GROUP> and --only-dev are uv dependency selectors, and uv rejects combining them with an added --group, so prepare_gate_command rewrites a valid explicit gate into one that fails before running tests.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: --only-group dev and --only-dev now satisfy the dev dependency requirement for dependency-group repos, so we do not add incompatible --group dev. Added regressions for both selectors.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a7157d3fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

selector, value = dev_args
if selector == "--group":
return (
"--all-groups" in options

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat dev exclusions as unsatisfied

When an explicit gate in a dependency-group repo uses uv run --all-groups --no-group dev pytest -q, this branch returns true as soon as --all-groups is present. I checked uv run --help: --no-group <NO_GROUP> disables the specified dependency group, so the command still excludes the dev group that normally provides pytest/ruff/mypy; prepare_gate_command then leaves the gate outside the target dev deps or failing because the tools are missing. Treat dev exclusions as unsatisfied or reject them.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: uv Python gates that explicitly exclude the required dev dependency set now fail closed instead of being treated as satisfied. This catches cases like uv run --all-groups --no-group dev pytest -q for dependency-group repos before execution and in gate policy validation. Added runner and policy regressions.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7c416f26e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +412 to +413
["uv", "run", "ruff", "check", "."],
["uv", "run", "mypy", *_mypy_targets(normalized)],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep auto-selected Python gates from dirtying worktrees

When a Python repo has a dev dependency set and does not already ignore these cache directories, the newly auto-selected ruff check . and mypy ... gates can create untracked .ruff_cache/.mypy_cache in the target worktree; Ruff documents its default cache dir as .ruff_cache and mypy documents its default cache dir as .mypy_cache. Because _validate_worktree_diffs later reads git status for the same worktree, an otherwise valid backend diff can fail policy on gate-created files outside the allowed paths. Run these gates with cache dirs outside the worktree or clean/disable their caches before the diff policy step.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the PR branch: auto-selected Python gates are now cache-neutral inside target worktrees. Pytest disables the cache provider, Ruff runs with --no-cache, and mypy runs with --no-incremental, preventing .pytest_cache, .ruff_cache, and .mypy_cache side effects from polluting later policy diff checks. Added regression coverage for the selected gate argv.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4cec956fa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cwd_path = Path(cwd).expanduser().resolve() if cwd is not None else None
if _uv_dev_args(cwd_path):
return [
["uv", "run", "python", "-m", "pytest", "-q", "-p", "no:cacheprovider"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid forcing pytest on testless dev-env repos

When any Python file changes in a repo that merely declares a dev group/extra, this auto-selects pytest even if the target repo has no tests or does not use pytest. Pytest documents exit code 5 as “No tests were collected” (https://docs.pytest.org/en/stable/reference/exit-codes.html), so a valid Python-only change in a new/small package can fail the loop solely because the synthesized gate was added; consider only selecting pytest when tests/config are present or falling back to compileall in that case.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant