
feat: planner incremental builds and mapper frame registration#2565

Open
aclauer wants to merge 10 commits into main from andrew/feat/ray-tracer-region-bounds

Conversation

@aclauer (Collaborator) commented Jun 23, 2026

Problem

Improvements to MLS planner and ray tracing

Closes DIM-XXX

Solution

ray tracer:

  • register point clouds with odometry (breaks old fastlio recordings!)
  • outputs region bounds for robust updates

planner:

  • incremental builds of planner artifacts in separate task so ingest isn't blocked
  • path replan on each frame
  • string-pulled paths with tunable cost metrics
  • levers for path safety
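
For readers unfamiliar with the term, string pulling can be sketched as greedy line-of-sight shortcutting over an existing waypoint list. This is a minimal illustration only, assuming a 2D occupancy grid and a sampled visibility test; the helper names and the grid representation are hypothetical, and the PR's `string_pull` in `planner.rs` operates on MLS surfaces with its own cost metrics and may work differently.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def line_of_sight(a: Cell, b: Cell, blocked: Set[Cell]) -> bool:
    """Sample the segment a -> b and reject it if any sample lands on a blocked cell.

    Hypothetical visibility test: two samples per grid step, rounded to cells.
    """
    steps = 2 * max(abs(b[0] - a[0]), abs(b[1] - a[1]), 1)
    for i in range(steps + 1):
        t = i / steps
        x = round(a[0] + t * (b[0] - a[0]))
        y = round(a[1] + t * (b[1] - a[1]))
        if (x, y) in blocked:
            return False
    return True


def string_pull(path: List[Cell], blocked: Set[Cell]) -> List[Cell]:
    """Drop intermediate waypoints that are not needed to keep the path collision-free."""
    if len(path) <= 2:
        return list(path)
    pulled = [path[0]]
    anchor = 0
    while anchor < len(path) - 1:
        nxt = anchor + 1  # the adjacent waypoint is always kept as a fallback
        # Try the farthest waypoint first and walk back until one is visible.
        for j in range(len(path) - 1, anchor + 1, -1):
            if line_of_sight(path[anchor], path[j], blocked):
                nxt = j
                break
        pulled.append(path[nxt])
        anchor = nxt
    return pulled


raw_path = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
free_result = string_pull(raw_path, set())
blocked_result = string_pull(raw_path, {(1, 1), (2, 2)})
```

With no obstacles the L-shaped path collapses to its two endpoints; with cells on the diagonal blocked, one corner waypoint survives. A tunable cost metric would replace the boolean visibility test with a score per candidate segment.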

How to Test

Contributor License Agreement

  • I have read and approved the CLA.

codecov Bot commented Jun 23, 2026

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2275 | 2 | 2273 | 74 |
View the full list of 2 ❄️ flaky test(s)
dimos.e2e_tests.test_dimsim_spatial_memory::test_go_to_the_bed

Flake rate in main: 22.22% (Passed 28 times, Failed 8 times)

Stack Traces | 572s run time
lcm_spy = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7037625d7380>
human_input = <function human_input.<locals>.send_human_input at 0x7037625d7920>
dim_sim = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x70376210d550>
explore_house = <function explore_house.<locals>.explore at 0x7037625c8360>

    @pytest.mark.self_hosted_large
    def test_go_to_the_bed(lcm_spy, start_blueprint, human_input, dim_sim, explore_house) -> None:
        start_blueprint(
            "run",
            "unitree-go2-agentic",
            simulator="dimsim",
        )
        lcm_spy.save_topic(".../McpClient/on_system_modules/res")
        lcm_spy.wait_for_saved_topic(".../McpClient/on_system_modules/res", timeout=1200.0)
    
        explore_house()
    
        human_input("go to the bed")
    
>       lcm_spy.wait_until_odom_position(-3.567, -1.332, threshold=2, timeout=180)

dim_sim    = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x70376210d550>
explore_house = <function explore_house.<locals>.explore at 0x7037625c8360>
human_input = <function human_input.<locals>.send_human_input at 0x7037625d7920>
lcm_spy    = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7037625d7380>

dimos/e2e_tests/test_dimsim_spatial_memory.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dimos/e2e_tests/lcm_spy.py:182: in wait_until_odom_position
    self.wait_for_message_result(
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7037625d7ce0>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>
        threshold  = 2
        timeout    = 180
        x          = -3.567
        y          = -1.332
dimos/e2e_tests/lcm_spy.py:168: in wait_for_message_result
    self.wait_until(
        event      = <threading.Event at 0x70376210ec30: unset>
        fail_message = 'Failed to get to position x=-3.567, y=-1.332'
        listener   = <function LcmSpy.wait_for_message_result.<locals>.listener at 0x7037625c8540>
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7037625d7ce0>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>
        timeout    = 180
        topic      = '/odom#geometry_msgs.PoseStamped'
        type       = <class 'dimos.msgs.geometry_msgs.PoseStamped.PoseStamped'>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>

    def wait_until(
        self,
        *,
        condition: Callable[[], bool],
        timeout: float,
        error_message: str,
        poll_interval: float = 0.1,
    ) -> None:
        start_time = time.time()
        while time.time() - start_time < timeout:
            if condition():
                return
            time.sleep(poll_interval)
>       raise TimeoutError(error_message)
E       TimeoutError: Failed to get to position x=-3.567, y=-1.332

condition  = <bound method Event.is_set of <threading.Event at 0x70376210ec30: unset>>
error_message = 'Failed to get to position x=-3.567, y=-1.332'
poll_interval = 0.1
self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376209af60>
start_time = 1782186193.6966882
timeout    = 180

dimos/e2e_tests/lcm_spy.py:105: TimeoutError
dimos.e2e_tests.test_dimsim_walk_forward::test_walk_forward

Flake rate in main: 28.12% (Passed 23 times, Failed 9 times)

Stack Traces | 206s run time
lcm_spy = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7037625c8cc0>
human_input = <function human_input.<locals>.send_human_input at 0x7037625c8e00>
dim_sim = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7037621105c0>

    @pytest.mark.self_hosted_large
    def test_walk_forward(lcm_spy, start_blueprint, human_input, dim_sim) -> None:
        start_blueprint(
            "run",
            "--disable",
            "spatial-memory",
            "--disable",
            "security-module",
            "unitree-go2-agentic",
            simulator="dimsim",
        )
        lcm_spy.save_topic(".../McpClient/on_system_modules/res")
        lcm_spy.wait_for_saved_topic(".../McpClient/on_system_modules/res", timeout=1200.0)
    
        origin_x, origin_y = 1, 2
        dim_sim.set_agent_position(origin_x, origin_y)
    
        human_input("move forward 3 meter")
    
>       lcm_spy.wait_until_odom_position(origin_x + 3, origin_y, threshold=0.4, timeout=120)

dim_sim    = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7037621105c0>
human_input = <function human_input.<locals>.send_human_input at 0x7037625c8e00>
lcm_spy    = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>
origin_x   = 1
origin_y   = 2
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7037625c8cc0>

dimos/e2e_tests/test_dimsim_walk_forward.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dimos/e2e_tests/lcm_spy.py:182: in wait_until_odom_position
    self.wait_for_message_result(
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7037625d7ec0>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>
        threshold  = 0.4
        timeout    = 120
        x          = 4
        y          = 2
dimos/e2e_tests/lcm_spy.py:168: in wait_for_message_result
    self.wait_until(
        event      = <threading.Event at 0x703762110bc0: unset>
        fail_message = 'Failed to get to position x=4, y=2'
        listener   = <function LcmSpy.wait_for_message_result.<locals>.listener at 0x7037625c8900>
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7037625d7ec0>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>
        timeout    = 120
        topic      = '/odom#geometry_msgs.PoseStamped'
        type       = <class 'dimos.msgs.geometry_msgs.PoseStamped.PoseStamped'>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>

    def wait_until(
        self,
        *,
        condition: Callable[[], bool],
        timeout: float,
        error_message: str,
        poll_interval: float = 0.1,
    ) -> None:
        start_time = time.time()
        while time.time() - start_time < timeout:
            if condition():
                return
            time.sleep(poll_interval)
>       raise TimeoutError(error_message)
E       TimeoutError: Failed to get to position x=4, y=2

condition  = <bound method Event.is_set of <threading.Event at 0x703762110bc0: unset>>
error_message = 'Failed to get to position x=4, y=2'
poll_interval = 0.1
self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x70376210ff80>
start_time = 1782186459.3957152
timeout    = 120

dimos/e2e_tests/lcm_spy.py:105: TimeoutError


@aclauer changed the title from "Andrew/feat/ray tracer region bounds" to "feat: planner incremental builds and frame registration" Jun 23, 2026
@aclauer aclauer marked this pull request as ready for review June 23, 2026 00:41
greptile-apps Bot (Contributor) commented Jun 23, 2026

Greptile Summary

This PR introduces incremental graph updates to the MLS planner (replacing full rebuilds with windowed update_region passes) and adds sensor-frame→world cloud registration to the ray-tracing mapper via odometry quaternion. The result is a complete end-to-end pipeline where the mapper emits stamp-paired local_map + region_bounds messages that the planner consumes asynchronously through a new worker/handle-loop split.

  • Incremental planner: update_region replaces only the voxels inside a sensor-radius cylinder, re-extracts surfaces with extract_surfaces_region, and patches the Voronoi node graph via rebuild_region_graph; a full rebuild is still triggered on global map messages.
  • Frame registration: each lidar cloud is rotated and translated into world frame using the odometry (qx, qy, qz, qw) before being fed to VoxelRayMapper; per-batch region bounds are computed via a percentile radius over the accumulated world-frame points.
  • Async pairing: try_pair matches local_map and region_bounds by stamp equality and hands the pair to a background worker via Arc<Mutex<Option<T>>> + Arc<Notify>, with block_in_place keeping the CPU-heavy planner off the async executor thread.
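
The frame-registration and region-bounds steps described above can be sketched in plain Python. This is an illustrative sketch, not the code in `transformer.py`: the function names are hypothetical, the quaternion is assumed to be a unit quaternion in `(qx, qy, qz, qw)` order, and the 95th percentile, the nearest-rank rule, and the use of horizontal distance are all assumptions, since the summary only says "a percentile radius".

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_by_quaternion(p: Vec3, q: Sequence[float]) -> Vec3:
    """Rotate point p by the unit quaternion q = (qx, qy, qz, qw)."""
    qx, qy, qz, qw = q
    px, py, pz = p
    # t = 2 * (q_vec x p)
    tx = 2.0 * (qy * pz - qz * py)
    ty = 2.0 * (qz * px - qx * pz)
    tz = 2.0 * (qx * py - qy * px)
    # p' = p + qw * t + (q_vec x t)
    return (
        px + qw * tx + (qy * tz - qz * ty),
        py + qw * ty + (qz * tx - qx * tz),
        pz + qw * tz + (qx * ty - qy * tx),
    )


def register_cloud(points: List[Vec3], position: Vec3, orientation: Sequence[float]) -> List[Vec3]:
    """Move sensor-frame points into the world frame: rotate first, then translate."""
    world = []
    for p in points:
        rx, ry, rz = rotate_by_quaternion(p, orientation)
        world.append((rx + position[0], ry + position[1], rz + position[2]))
    return world


def percentile_radius(points: List[Vec3], center_xy: Tuple[float, float], pct: float = 95.0) -> float:
    """Nearest-rank percentile of horizontal distance from center_xy (assumed definition)."""
    if not points:
        return 0.0
    cx, cy = center_xy
    dists = sorted(math.hypot(x - cx, y - cy) for x, y, _ in points)
    rank = max(1, math.ceil(pct * len(dists) / 100.0))
    return dists[rank - 1]


# A 90-degree yaw about z, with the sensor located at (1, 2, 3) in the world frame.
half = math.sqrt(0.5)
world_pts = register_cloud([(1.0, 0.0, 0.0)], (1.0, 2.0, 3.0), (0.0, 0.0, half, half))
ring = [(float(r), 0.0, 0.0) for r in range(1, 11)]
radius_90 = percentile_radius(ring, (0.0, 0.0), pct=90.0)
```

A percentile radius, rather than the maximum distance, keeps a handful of long-range returns from inflating the region that the planner has to rebuild.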

Confidence Score: 4/5

Safe to merge with two low-severity hardening items addressed first.

The core logic — incremental surface extraction, Dijkstra region relabeling, frame registration, and stamp-paired async ingestion — is well-structured and covered by tests. Two P2 gaps prevent a 5: follow_preds in planner.rs lacks the cycle guard that walk_preds gained in this same PR (a thread-hang risk under any future graph corruption), and silent overwrite of unmatched pending messages in try_pair makes stamp-sync regressions invisible in logs.

dimos/navigation/nav_3d/mls_planner/rust/src/planner.rs (cycle guard) and dimos/navigation/nav_3d/mls_planner/rust/src/main.rs (pending-message logging).
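
The kind of cycle guard being requested can be illustrated with a short Python sketch. It is not the Rust code from `planner.rs` or `dijkstra.rs`; the function name, the dictionary-based predecessor map, and the choice to return `None` on a cycle are assumptions made for the example.

```python
from typing import Dict, List, Optional


def follow_preds_guarded(preds: Dict[int, Optional[int]], goal: int) -> Optional[List[int]]:
    """Walk predecessor links from goal back to the source.

    Returns the path in source-to-goal order, or None if a node repeats,
    which means the predecessor map is corrupted and contains a cycle.
    """
    path: List[int] = []
    seen = set()
    node: Optional[int] = goal
    while node is not None:
        if node in seen:
            return None  # cycle detected: stop instead of looping forever
        seen.add(node)
        path.append(node)
        node = preds.get(node)  # a missing key is treated as "no predecessor"
    path.reverse()
    return path


good = follow_preds_guarded({3: 2, 2: 1, 1: None}, goal=3)
corrupt = follow_preds_guarded({1: 2, 2: 1}, goal=1)
```

An equivalent guard is to cap the number of steps at the node count, which avoids the extra set at the cost of a less precise failure signal.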

Important Files Changed

| Filename | Overview |
|---|---|
| dimos/mapping/ray_tracing/transformer.py | Adds sensor-frame→world registration via odometry quaternion and per-batch region-bounds computation; both batch_points and batch_origins are updated atomically inside the if pts.size: guard so the _local_bounds empty-batch path is correctly handled. |
| dimos/navigation/nav_3d/mls_planner/rust/src/planner.rs | New path-planning module with snap_candidates, node_dijkstra, string_pull, and segment_metrics; follow_preds lacks the cycle guard that walk_preds gained in this PR, leaving a potential infinite loop under corrupted predecessor maps. |
| dimos/navigation/nav_3d/mls_planner/rust/src/main.rs | Adds worker/handle-loop split with Arc+Notify for async map ingestion and stamp-based local_map/region_bounds pairing; unmatched pending messages are silently overwritten with no debug logging. |
| dimos/navigation/nav_3d/mls_planner/rust/src/surfaces.rs | Adds extract_surfaces_region for windowed extraction and add_to_by_col / remove_from_by_col for incremental ColumnIz maintenance; morphological close correctly pads the read window by 2*closing_passes. |
| dimos/navigation/nav_3d/mls_planner/rust/src/dijkstra.rs | Adds dijkstra_region for partial-window relabeling and walk_preds cycle guard; source field reset to 0 on region entry is benign since infinite-dist cells are excluded from edge relaxation. |
| dimos/navigation/nav_3d/mls_planner/rust/src/mls_planner.rs | Adds update_region for incremental voxel replacement, surface re-extraction, and graph patching; replace_region_voxels uses par_iter for stale removal and rebuild_region_graph correctly chains edge/node/boundary rebuilds. |
| dimos/mapping/ray_tracing/rust/src/main.rs | Adds last_pose for sensor-frame→world registration, build_global_and_local cloud split, and region_bounds encoding in PoseStamped orientation fields; emit_due correctly guards is_multiple_of with an every!=0 check. |
| dimos/navigation/nav_3d/mls_planner/transformer.py | MLSPlan transformer switches to update_region via obs.tags["region_bounds"] and exposes voxel_map, surface_clearance, nodes, node_edges, and timings in output tags. |
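
The stamp-based pairing noted for `main.rs`, together with the logging that the review asks for, can be sketched as follows. This is a Python illustration of the idea rather than the Rust implementation: the `Stamped` and `StampPairer` classes, the integer stamp, and the `overwritten` counter are hypothetical names introduced for the example.

```python
import logging
from dataclasses import dataclass
from typing import Any, Optional, Tuple

log = logging.getLogger("pairing_sketch")


@dataclass
class Stamped:
    stamp: int
    payload: Any


class StampPairer:
    """Holds at most one pending message per topic and pairs them by equal stamp."""

    def __init__(self) -> None:
        self.pending_local: Optional[Stamped] = None
        self.pending_bounds: Optional[Stamped] = None
        self.overwritten = 0  # unmatched messages that were replaced before pairing

    def on_local_map(self, msg: Stamped) -> Optional[Tuple[Stamped, Stamped]]:
        self.pending_local = self._store("local_map", self.pending_local, msg)
        return self.try_pair()

    def on_region_bounds(self, msg: Stamped) -> Optional[Tuple[Stamped, Stamped]]:
        self.pending_bounds = self._store("region_bounds", self.pending_bounds, msg)
        return self.try_pair()

    def _store(self, name: str, old: Optional[Stamped], new: Stamped) -> Stamped:
        if old is not None:
            # Make the drop visible instead of overwriting silently.
            self.overwritten += 1
            log.debug("dropping unmatched %s stamp=%s for stamp=%s", name, old.stamp, new.stamp)
        return new

    def try_pair(self) -> Optional[Tuple[Stamped, Stamped]]:
        local, bounds = self.pending_local, self.pending_bounds
        if local is None or bounds is None or local.stamp != bounds.stamp:
            return None
        self.pending_local = None
        self.pending_bounds = None
        return local, bounds


pairer = StampPairer()
first = pairer.on_local_map(Stamped(1, "map-1"))
second = pairer.on_local_map(Stamped(2, "map-2"))  # replaces the unmatched stamp 1
pair = pairer.on_region_bounds(Stamped(2, "bounds-2"))
```

Counting and logging the overwrites means a stamp-sync regression shows up as a rising drop count rather than as a planner that quietly stops updating.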

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Lidar as Lidar
    participant RTMapper as RayTraceMap
    participant RustMapper as VoxelRayMapper
    participant MLSHandle as MlsPlanner Handle
    participant Worker as MlsPlanner Worker

    Lidar->>RTMapper: Observation[PointCloud2] + pose(qx,qy,qz,qw)
    RTMapper->>RTMapper: rotate+translate pts to world frame
    RTMapper->>RustMapper: add_frame(world_pts, origin)
    RTMapper->>RustMapper: local_map(cx, cy, radius, z_min, z_max)
    RTMapper-->>MLSHandle: local_cloud + region_bounds tag

    MLSHandle->>MLSHandle: on_local_map sets pending_local, calls try_pair
    MLSHandle->>MLSHandle: on_region_bounds sets pending_bounds, calls try_pair
    MLSHandle->>MLSHandle: stamps_paired check
    MLSHandle->>Worker: Arc-Mutex gets MapUpdate::Region, Notify fires

    Worker->>Worker: replace_region_voxels (parallel remove + insert)
    Worker->>Worker: extract_surfaces_region (windowed morphological close)
    Worker->>Worker: rebuild_region_graph (edges, nodes, boundaries)
    Worker->>Worker: dijkstra_region (wall + node Dijkstra states)
    Worker->>Worker: snap_candidates, node_dijkstra, string_pull
    Worker-->>MLSHandle: publish Path + viz

Reviews (3): Last reviewed commit: "Minor fixes"

Comment thread dimos/navigation/nav_3d/mls_planner/transformer.py Outdated
Comment thread dimos/mapping/ray_tracing/rust/src/main.rs
@aclauer changed the title from "feat: planner incremental builds and frame registration" to "feat: planner incremental builds and mapper frame registration" Jun 23, 2026
@github-actions Bot added the ready-to-merge label (Required CI checks have passed on this PR) Jun 23, 2026