Clarification on OpenDriveVLA collision evaluation metrics #40

Thank you for releasing OpenDriveVLA and the official evaluation code. We have been studying the model, and the work has been very helpful for our VLM-AD planning research.

In our local evaluation of the official OpenDriveVLA-0.5B checkpoint on nuScenes val, the log reports "Processed total 6019 samples, gt collision: 36" together with the following metrics:

- UniAD-style: L2 = 0.21/0.61/1.24/0.68, Collision = 0.00/0.17/0.63/0.27%
- ST-P3-style: L2 = 0.15/0.32/0.57/0.35, Collision = 0.01/0.07/0.22/0.10%

We would like to make sure we interpret these numbers correctly. Could you clarify the official collision evaluation protocol used here? Specifically, should the reported collision rate be understood as endpoint collision, averaged per-step collision, or any-collision-within-horizon? We also noticed that the evaluation appears to use the precomputed planing_gt_segmentation_val; could you clarify the BEV grid resolution, the ego-box handling, and whether this protocol is intended to be directly comparable with OmniDrive-style planning collision metrics?
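To make the three interpretations concrete, here is a minimal sketch of how we currently understand them. It is not taken from the OpenDriveVLA evaluation code: the function `collision_rates`, the boolean array `collided` (one row per sample, one column per planning step), and the toy data are all hypothetical and only illustrate how differently the three definitions can score the same trajectories.

```python
import numpy as np


def collision_rates(collided: np.ndarray, horizon_steps: int) -> dict:
    """Aggregate per-step collision flags in three different ways.

    collided: bool array of shape (num_samples, num_steps); True where the
        ego box overlaps an occupied cell at that step (hypothetical input).
    horizon_steps: number of leading steps that make up the horizon.
    """
    window = collided[:, :horizon_steps]
    return {
        # Only the last step of the horizon counts.
        "endpoint": float(window[:, -1].mean()),
        # Every (sample, step) pair counts equally.
        "per_step_mean": float(window.mean()),
        # A sample counts once if it collides at any step in the horizon.
        "any_within_horizon": float(window.any(axis=1).mean()),
    }


# Toy data (assumption): 4 samples, 6 planning steps.
collided = np.zeros((4, 6), dtype=bool)
collided[0, 2] = True    # collides mid-horizon only
collided[1, 5] = True    # collides at the final step only
collided[2, 3:] = True   # collides from step 3 onward

rates = collision_rates(collided, horizon_steps=6)
print(rates)
```

On this toy input the three definitions give clearly different rates, which is why we would like to confirm which one (if any) matches the reported numbers.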
