LocalEvalService crashes with len(None) when inference_result.inferences is None #5876

@llamallamaredpajama

Summary

LocalEvalService._evaluate_single_inference_result() crashes with TypeError: object of type 'NoneType' has no len() when inference fails and InferenceResult.inferences is left as None.

The failing path is visible in google-adk==1.34.0:

if eval_case.conversation_scenario is None and len(
    inference_result.inferences
) != len(eval_case.conversation):
    ...

InferenceResult.inferences is optional, and _perform_inference_single_eval_item() can produce a failure result with status=InferenceStatus.FAILURE, error_message=..., and inferences=None (for example when auth refresh, network, or model calls fail). The later len(inference_result.inferences) call then raises TypeError, which aborts the entire eval run instead of returning a structured per-case failure.
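To make the failing path concrete without a live model call, here is a minimal sketch that uses stand-in classes rather than the real ADK types. The `InferenceStatus` and `InferenceResult` definitions below are assumptions that carry only the fields named in this issue, and `unguarded_check` is a hypothetical helper that mirrors the shape of the comparison, not the ADK source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InferenceStatus(Enum):  # stand-in, not the ADK enum
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class InferenceResult:  # stand-in with only the fields named in this issue
    status: InferenceStatus
    inferences: Optional[list] = None
    error_message: Optional[str] = None


def unguarded_check(inference_result: InferenceResult, conversation: list) -> bool:
    # Hypothetical helper: applies len() to inferences without a None check,
    # which is the shape of the comparison quoted above.
    return len(inference_result.inferences) != len(conversation)


# A failure result as described above: status and message set, inferences left as None.
failed = InferenceResult(
    status=InferenceStatus.FAILURE,
    error_message="Reauthentication is needed.",
)

try:
    unguarded_check(failed, conversation=["turn 1"])
    raised = None
except TypeError as exc:
    raised = str(exc)

print(raised)
```

The original error message in `failed.error_message` is never surfaced here; the caller only sees the TypeError.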

Reproduction

Run any eval case where inference fails before invocations are generated. One local repro from an ADK project using Vertex/ADC with expired ADC credentials:

cd idc-runtime
GOOGLE_GENAI_USE_VERTEXAI=True \
GOOGLE_CLOUD_PROJECT=idc-runtime-prod \
GOOGLE_CLOUD_LOCATION=global \
uv run agents-cli eval run --evalset tests/eval/evalsets/build_role.evalset.json

The underlying inference error is:

google.auth.exceptions.RefreshError: Reauthentication is needed. Please run `gcloud auth application-default login` to reauthenticate.

Then ADK crashes in the eval framework:

File ".../google/adk/evaluation/local_eval_service.py", line 273, in _evaluate_single_inference_result
    if eval_case.conversation_scenario is None and len(
                                                   ^^^^
TypeError: object of type 'NoneType' has no len()

The same issue can surface during multi-case eval runs; several tasks can fail inference and then all hit len(None) while asyncio.as_completed(...) is collecting results.
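The effect on a multi-case run can be sketched with plain asyncio. This is not the ADK collection loop; `evaluate` and `collect` are hypothetical stand-ins, and the delays only exist to make the completion order deterministic. It shows that one task raising inside an `asyncio.as_completed(...)` loop ends the loop before the healthy case is collected.

```python
import asyncio


async def evaluate(case_id, inferences, delay):
    # Hypothetical per-case evaluation; len(None) raises TypeError.
    await asyncio.sleep(delay)
    return case_id, len(inferences)


async def collect(cases):
    tasks = [asyncio.ensure_future(evaluate(*case)) for case in cases]
    collected, error = [], None
    try:
        for next_done in asyncio.as_completed(tasks):
            # The first failed task re-raises here and ends the loop.
            collected.append(await next_done)
    except TypeError as exc:
        error = str(exc)
    finally:
        # Clean up whatever was still running when the loop ended.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return collected, error


# The failed case finishes first, so the healthy case is never collected.
cases = [("healthy-case", ["turn 1"], 0.05), ("failed-case", None, 0.0)]
collected, error = asyncio.run(collect(cases))
print(collected, error)
```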

Expected behavior

If inference failed and inference_result.inferences is None, ADK should not call len(None). It should surface a structured failed EvalCaseResult using the existing inference_result.error_message, then allow the rest of the eval run to continue.
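A rough sketch of that behavior, again with stand-in types: the classes below are not the ADK models, the `reason` field is a hypothetical place to carry `error_message`, and the handling of a turn-count mismatch is an assumption because the issue elides that branch. The point is only that the None check comes before `len()` and that the remaining cases are still evaluated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvalStatus(Enum):  # stand-in, not the ADK enum
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class InferenceResult:  # stand-in
    eval_case_id: str
    inferences: Optional[list] = None
    error_message: Optional[str] = None


@dataclass
class EvalCaseResult:  # stand-in; `reason` is a hypothetical field
    eval_id: str
    final_eval_status: EvalStatus
    reason: Optional[str] = None


def evaluate_one(result: InferenceResult, expected_turns: int) -> EvalCaseResult:
    # Guard first: a failed inference becomes a structured per-case failure.
    if result.inferences is None:
        return EvalCaseResult(
            result.eval_case_id,
            EvalStatus.FAILED,
            result.error_message or "inference produced no invocations",
        )
    # Assumption: a mismatch is reported as a failure in this sketch.
    if len(result.inferences) != expected_turns:
        return EvalCaseResult(result.eval_case_id, EvalStatus.FAILED, "turn count mismatch")
    return EvalCaseResult(result.eval_case_id, EvalStatus.PASSED)


inference_results = [
    InferenceResult("case-a", inferences=None, error_message="Reauthentication is needed."),
    InferenceResult("case-b", inferences=["turn 1"]),
]
# Every case yields a result, so one failed inference does not stop the run.
eval_results = [evaluate_one(r, expected_turns=1) for r in inference_results]
for r in eval_results:
    print(r.eval_id, r.final_eval_status.name, r.reason)
```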

Local workaround

We are carrying a narrow in-process guard that checks inference_result.inferences is None before delegating to the original method and returns an EvalCaseResult(final_eval_status=EvalStatus.FAILED, ...) for that case. We also serialize CI eval cases while this failure path is unresolved.
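The shape of that guard looks roughly like the sketch below. It patches a stand-in class, not `LocalEvalService` itself: `FakeEvalService`, `install_none_guard`, the dict-based results, and the single-argument call are all illustrative assumptions, and the real method's signature and return type may differ. The wrapper forwards `*args` and `**kwargs` so it does not depend on the exact signature.

```python
import asyncio
import functools


class FakeEvalService:  # hypothetical stand-in for the patched service class
    async def _evaluate_single_inference_result(self, inference_result, *args, **kwargs):
        return {"status": "EVALUATED", "turns": len(inference_result["inferences"])}


def install_none_guard(cls):
    """Wrap the method so a None `inferences` never reaches the original."""
    original = cls._evaluate_single_inference_result

    @functools.wraps(original)
    async def guarded(self, inference_result, *args, **kwargs):
        if inference_result.get("inferences") is None:
            # Return a structured failure instead of delegating.
            return {"status": "FAILED", "reason": inference_result.get("error_message")}
        return await original(self, inference_result, *args, **kwargs)

    cls._evaluate_single_inference_result = guarded
    return original  # kept so the patch can be reverted later


install_none_guard(FakeEvalService)
service = FakeEvalService()

failed = asyncio.run(
    service._evaluate_single_inference_result(
        {"inferences": None, "error_message": "Reauthentication is needed."}
    )
)
ok = asyncio.run(service._evaluate_single_inference_result({"inferences": ["a", "b"]}))
print(failed, ok)
```

Returning the original method from `install_none_guard` makes it easy to remove the patch once the upstream fix lands.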

Environment

  • google-adk==1.34.0
  • Python 3.12
  • macOS local repro; also observed from GitHub Actions runners when live Vertex evals reach an inference failure

Metadata

Labels

eval: [Component] This issue is related to evaluation
request clarification: [Status] The maintainer need clarification or more information from the author
