Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- `review_depth: lightweight` annotation for review blocks in `job.yml` step outputs and `.deepreview` rules — when set, the workflow's `common_job_info` preamble is omitted from review instruction files, reducing token overhead for trivial or reversible intermediate steps (closes #86)
- Hung-reviewer retry policy in quality gate guidance: reviewers completing with 0 tool uses are retried up to `REVIEWER_MAX_RETRIES` (default: 1) times before being skipped with a "manual review recommended" note; fast-fails (elapsed < `REVIEWER_FAST_FAIL_SECONDS`, default: 30s) are retried immediately (closes #408)

### Changed

Expand Down
10 changes: 10 additions & 0 deletions doc/specs/deepwork/jobs/JOBS-REQ-004-quality-review-system.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,3 +73,13 @@ The quality review system evaluates step outputs against defined quality criteri
4. The synthetic task's instructions MUST be prefixed with the same preamble used for file-based reviews (workflow `common_job_info` and input context).
5. String outputs with `None` values MUST be skipped.
6. The synthetic task's `review_id` MUST incorporate the string value into its content hash so that distinct string values produce distinct cache keys (per REVIEW-REQ-009.1.7).

### JOBS-REQ-004.9: Hung Reviewer Retry Policy

1. The review guidance MUST include a clearly labelled hung-reviewer retry policy section.
2. The guidance MUST define a reviewer as "hung" when it completes with 0 tool uses and produces no substantive output.
3. The guidance MUST instruct the agent to retry a hung reviewer up to `REVIEWER_MAX_RETRIES` times (default: 1) before skipping.
4. After exhausting retries the guidance MUST instruct the agent to: (a) call `mark_review_as_passed` with the review ID only (the tool accepts no comment argument), and (b) separately inform the user with a skip message — not silently pass the review without any attempt.
5. The guidance MUST distinguish a fast-fail (elapsed time under `REVIEWER_FAST_FAIL_SECONDS`, default: 30) from a slow-hang (elapsed time at or above that threshold), advising an immediate retry for fast-fails and a brief-pause retry for slow-hangs.
6. `REVIEWER_MAX_RETRIES` and `REVIEWER_FAST_FAIL_SECONDS` MUST be named module-level constants (not inline magic numbers) so they can be adjusted without editing the guidance prose.
7. `_build_review_guidance()` MUST accept `max_retries` and `fast_fail_seconds` keyword arguments so callers can override the defaults in tests or alternative configurations.
50 changes: 47 additions & 3 deletions src/deepwork/jobs/mcp/quality_gate.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,19 @@

logger = logging.getLogger("deepwork.jobs.mcp.quality_gate")

# Hung-reviewer retry policy (configurable constants).
#
# A reviewer is considered "hung" when it completes with 0 tool uses —
# a strong signal of API overload or a dropped connection. The dispatch
# guidance instructs Claude to retry hung reviewers up to
# REVIEWER_MAX_RETRIES times before skipping and logging a warning.
#
# REVIEWER_FAST_FAIL_SECONDS: if a reviewer returns 0 tool uses *and*
# elapsed time is below this threshold it almost certainly never received
# an API response at all — retry immediately without any backoff.
REVIEWER_MAX_RETRIES: int = 1
REVIEWER_FAST_FAIL_SECONDS: int = 30


def validate_json_schemas(
outputs: dict[str, ArgumentValue],
Expand Down Expand Up @@ -480,8 +493,27 @@ def run_quality_gate(
return guidance


def _build_review_guidance(review_output: str) -> str:
"""Build the complete review guidance including /review skill instructions."""
def _build_review_guidance(
review_output: str,
max_retries: int = REVIEWER_MAX_RETRIES,
fast_fail_seconds: int = REVIEWER_FAST_FAIL_SECONDS,
) -> str:
"""Build the complete review guidance including hung-reviewer retry policy.

Args:
review_output: Formatted list of review tasks from format_for_claude.
max_retries: How many times to retry a hung reviewer before skipping.
fast_fail_seconds: Elapsed-time threshold below which a 0-tool-use
result is treated as a fast-fail and retried immediately.
"""
retry_word = "retry" if max_retries == 1 else "retries"
retry_instruction = (
"Retry it once" if max_retries == 1 else f"Retry it up to {max_retries} times"
)
exhaust_condition = (
"If the retry also returns" if max_retries == 1 else f"If all {max_retries} retries return"
)
total_attempts = max_retries + 1
return f"""Quality reviews are required before this step can advance.

{review_output}
Expand All @@ -490,6 +522,18 @@ def _build_review_guidance(review_output: str) -> str:

For each review task listed above, launch it as a parallel Agent. The task's prompt field points to an instruction file — read it and follow the review instructions.

## Handling Hung Reviewers

A reviewer has **hung** when it completes with **0 tool uses** — a signal of API overload or a dropped connection. Hung reviewers must be retried; do **not** silently pass them.

**Retry policy (max {max_retries} {retry_word} per reviewer)**:

1. After each reviewer completes, check whether it made 0 tool uses (shown as `0 tool uses` in the agent result) **and** produced no substantive output.
2. If both are true the reviewer hung. {retry_instruction} by re-launching the same agent with the same prompt.
- If elapsed time was under {fast_fail_seconds}s the reviewer fast-failed (never got an API response) — retry immediately.
- If elapsed time was {fast_fail_seconds}s or more (slow-hang), the reviewer started but stalled — retry after a brief pause.
3. {exhaust_condition} 0 tool uses: call `mark_review_as_passed` with the review ID. Then tell the user: "Review skipped after {total_attempts} failed attempts — manual review recommended for this step." Do **not** proceed without informing the user.

## After Reviews

For any failing reviews, if you believe the issue is invalid, then you can call `mark_review_as_passed` on it. Otherwise, you should act on any feedback from the review to fix the issues. Once done, call `finished_step` again to see if you will pass now."""
For any failing reviews where the reviewer produced actual findings: if you believe the issue is invalid, call `mark_review_as_passed` on it. Otherwise, act on the feedback, fix the issues, and call `finished_step` again."""
114 changes: 113 additions & 1 deletion tests/unit/jobs/mcp/test_quality_gate.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
"""Tests for MCP quality gate (reviews-based implementation).

Validates requirements: JOBS-REQ-004, JOBS-REQ-004.1, JOBS-REQ-004.2, JOBS-REQ-004.3,
JOBS-REQ-004.4, JOBS-REQ-004.5, JOBS-REQ-004.6, JOBS-REQ-004.7.
JOBS-REQ-004.4, JOBS-REQ-004.5, JOBS-REQ-004.6, JOBS-REQ-004.7, JOBS-REQ-004.8,
JOBS-REQ-004.9.

Note: JOBS-REQ-009 is DEPRECATED (superseded by JOBS-REQ-004). No tests required.
"""
Expand All @@ -13,6 +14,9 @@
from unittest.mock import patch

from deepwork.jobs.mcp.quality_gate import (
REVIEWER_FAST_FAIL_SECONDS,
REVIEWER_MAX_RETRIES,
_build_review_guidance,
build_dynamic_review_rules,
build_string_output_review_tasks,
run_quality_gate,
Expand Down Expand Up @@ -1897,3 +1901,111 @@ def test_reruns_review_when_file_content_changes_after_pass(self, tmp_path: Path
# Review MUST run again because content changed
assert result is not None
assert "Quality reviews are required" in result


class TestHungReviewerRetryPolicy:
"""Tests for the hung-reviewer retry policy — validates JOBS-REQ-004.9."""

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.1).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_includes_hung_reviewer_section(self) -> None:
"""Review guidance MUST include a hung-reviewer retry policy section."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
assert "Handling Hung Reviewers" in result

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.2).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_defines_hung_as_zero_tool_uses(self) -> None:
"""Guidance MUST define a hung reviewer as one that returns 0 tool uses."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
assert "0 tool uses" in result

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.3).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_specifies_max_retries(self) -> None:
"""Guidance MUST specify the max retry count from REVIEWER_MAX_RETRIES."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
# Default is 1 retry — heading must say "max 1 retry"
assert (
f"max {REVIEWER_MAX_RETRIES} retry" in result
or f"max {REVIEWER_MAX_RETRIES} retries" in result
)

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.4).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_instructs_skip_with_note_after_retries_exhausted(self) -> None:
"""After retries exhausted, guidance MUST instruct skip + log, not silent pass."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
assert "mark_review_as_passed" in result
assert "manual review recommended" in result

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.5).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_mentions_fast_fail_threshold(self) -> None:
"""Guidance MUST mention the fast-fail elapsed-time threshold."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
# The threshold in seconds should appear in the guidance
assert str(REVIEWER_FAST_FAIL_SECONDS) in result

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.6).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_reviewer_max_retries_is_named_constant(self) -> None:
"""REVIEWER_MAX_RETRIES MUST be a module-level integer constant."""
assert isinstance(REVIEWER_MAX_RETRIES, int)
assert REVIEWER_MAX_RETRIES >= 1

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.6).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_reviewer_fast_fail_seconds_is_named_constant(self) -> None:
"""REVIEWER_FAST_FAIL_SECONDS MUST be a module-level integer constant."""
assert isinstance(REVIEWER_FAST_FAIL_SECONDS, int)
assert REVIEWER_FAST_FAIL_SECONDS > 0

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.7).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_build_review_guidance_accepts_override_params(self) -> None:
"""_build_review_guidance MUST accept max_retries and fast_fail_seconds kwargs."""
result = _build_review_guidance(
"## Review Tasks\n\n- some task",
max_retries=3,
fast_fail_seconds=60,
)
# max_retries=3 → heading says "max 3 retries", exhaust text says "4 failed attempts"
assert "max 3 retries" in result
assert "4 failed attempts" in result
# fast_fail_seconds=60 → threshold appears in fast-fail bullet
assert "60" in result

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.7).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_build_review_guidance_retry_text_reflects_max_retries(self) -> None:
"""Guidance retry count text MUST reflect the max_retries parameter."""
result_1 = _build_review_guidance("tasks", max_retries=1)
result_2 = _build_review_guidance("tasks", max_retries=2)
# After exhausting retries: message mentions total attempts (max_retries + 1)
assert "2 failed attempts" in result_1
assert "3 failed attempts" in result_2

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.5).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_describes_slow_hang_path(self) -> None:
"""Guidance MUST describe the slow-hang retry path (elapsed >= threshold)."""
result = _build_review_guidance("## Review Tasks\n\n- some task")
# Must mention that slow-hangs are also retried (not only fast-fails)
assert "slow-hang" in result or "slow" in result.lower()

# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.9.4).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
def test_guidance_instructs_separate_user_message_not_tool_comment(self) -> None:
"""Skip note MUST be communicated to the user separately, not as a tool arg.

mark_review_as_passed only accepts review_id — the guidance must not
imply passing a comment/note to the tool itself.
"""
result = _build_review_guidance("## Review Tasks\n\n- some task")
# Guidance should tell Claude to call mark_review_as_passed AND then
# separately tell the user — not pass a note argument to the tool.
assert "tell the user" in result
# Must NOT suggest an impossible "append comment" or "note=" argument
assert "append the comment" not in result
assert "note=" not in result
Loading