Thank you for releasing OpenDriveVLA and the official evaluation code. We have been studying the model, and the work has been very helpful for our VLM-AD planning research.
In our local evaluation of the official OpenDriveVLA-0.5B checkpoint on nuScenes val, the log reports "Processed total 6019 samples, gt collision: 36". The reported metrics (which we read as 1s/2s/3s/avg) are:
UniAD-style: L2 = 0.21/0.61/1.24/0.68, Collision = 0.00/0.17/0.63/0.27%
STP-3-style: L2 = 0.15/0.32/0.57/0.35, Collision = 0.01/0.07/0.22/0.10%
We would like to make sure we interpret these numbers correctly. In particular, could you clarify the official collision evaluation protocol used here? For example, should the reported collision rate be understood as endpoint collision, averaged per-step collision, or any-collision-within-horizon? We also noticed that the evaluation appears to use the precomputed planing_gt_segmentation_val. Could you clarify the BEV grid resolution, the ego-box handling, and whether this protocol is intended to be directly comparable with OmniDrive-style planning collision metrics?
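To make the three interpretations concrete, here is a minimal sketch of what we mean by each. It is not taken from the OpenDriveVLA evaluation code. The function name `collision_rates` and the input layout (one list of per-step boolean collision flags per sample) are our own assumptions for illustration only.

```python
from typing import Dict, List


def collision_rates(collided: List[List[bool]]) -> Dict[str, float]:
    """Aggregate per-step collision flags in three different ways.

    collided[i][t] is True if the planned ego box of sample i overlaps an
    obstacle at future step t (hypothetical layout, assumed for this sketch).
    """
    n = len(collided)
    horizon = len(collided[0])

    # Endpoint: only the last step of the horizon counts.
    endpoint = sum(row[-1] for row in collided) / n

    # Averaged per-step: mean of the flags over all steps and all samples.
    per_step_avg = sum(sum(row) for row in collided) / (n * horizon)

    # Any-within-horizon: a sample counts once if any step collides.
    any_within = sum(any(row) for row in collided) / n

    return {
        "endpoint": endpoint,
        "per_step_avg": per_step_avg,
        "any_within_horizon": any_within,
    }


# Toy example: 4 samples, 4 future steps each.
flags = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, True, True],
    [False, False, False, True],
]
rates = collision_rates(flags)
print(rates)
```

On this toy input the three definitions give 0.5, 0.25, and 0.75 respectively, which is why we would like to know which one the reported numbers correspond to.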