CodeRobo Preview

CodeRobo is a research preview of a language-model-driven visual manipulation stack built on top of the LIBERO benchmark. The goal of this preview is to make the system architecture, reusable method components, evaluation gates, and current evidence boundaries visible to the community without publishing private run artifacts, full prompt-generation material, credentials, or machine-local experiment state.

This is not a formal release. APIs, scripts, and results may still change.

What Is Included

  • libero/: the LIBERO benchmark package, assets, BDDL task definitions, lifelong-learning baselines, and Hydra configs.
  • libero_sdk/: a small Python SDK around LIBERO tasks, environments, robot control, perception wrappers, and skills.
  • lmvs/: the CodeRobo / LMVS policy stack, including LLM planning, action primitives, perception fusion, structured memory, evidence tracking, skill registries, and audit utilities.
  • scripts/: static checks, sweep runners, gate checkers, evidence summaries, and focused smoke tests.
  • docs/: curated architecture and evaluation notes for the preview.

Generated datasets, checkpoints, model caches, local run directories, Codex transcripts, complete prompt-generation files, and presentation build artifacts are intentionally excluded from the public preview.

Method Overview

CodeRobo separates robot manipulation into auditable modules:

  1. Task and environment layer: LIBERO provides benchmark suites, BDDL task files, initial states, observations, and sparse success signals.
  2. Observation and perception layer: RGB-D observations are converted into object candidates, relation evidence, placement regions, grasp proposals, and current-trial world evidence.
  3. LLM planning layer: CodexBrain produces one structured high-level decision at a time from task language, visible evidence, action history, prompt-safe memory, and available tools.
  4. Action layer: ActionAPI exposes generic manipulation primitives such as visual pick, servo-assisted grasp, placement, relation-aware placement, contact scan, recovery, and terminal decisions.
  5. Memory and skill layer: semantic memory, skill performance records, and failure summaries are stored as typed records. Reusable memory is restricted to human-observable strategy and must not contain hidden simulator state, expert replay actions, or demo-derived absolute coordinates.
  6. Audit layer: scripts check that runs use no runtime ground truth and no replay actions, and verify transcript completeness, Codex health, metadata consistency, same-task variation, and the evidence behind broader task-family claims.
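
The memory restriction in item 5 can be illustrated with a small typed record and a prompt-safety check. This is a minimal sketch under stated assumptions: StrategyRecord, FORBIDDEN_SOURCES, and is_prompt_safe are hypothetical names chosen for illustration and are not taken from lmvs/; the real record types and checks may look different.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical provenance tags for the three kinds of content that
# reusable memory must not contain, per the method overview.
FORBIDDEN_SOURCES = frozenset({
    "hidden_simulator_state",
    "expert_replay_action",
    "demo_absolute_coordinate",
})


@dataclass(frozen=True)
class StrategyRecord:
    """A reusable memory entry limited to human-observable strategy."""
    task_family: str
    strategy: str                      # plain-language description of what worked
    sources: Tuple[str, ...] = ()      # provenance tags for how it was derived


def is_prompt_safe(record: StrategyRecord) -> bool:
    """Return True if no provenance tag is on the forbidden list."""
    return not any(tag in FORBIDDEN_SOURCES for tag in record.sources)


ok = StrategyRecord("pick_place", "approach the bowl from above", ("rgbd_observation",))
bad = StrategyRecord("pick_place", "move to stored pose", ("demo_absolute_coordinate",))
print(is_prompt_safe(ok), is_prompt_safe(bad))
```

Tagging provenance on the record itself lets a check like this run before any memory entry is placed into a planner prompt.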

The preview emphasizes transparent evidence over a single headline number. Successful demonstrations should include the exact sweep command, row metadata, model-call transcripts when applicable, no-replay checks, and gate outputs.

Installation

The base LIBERO environment targets Python 3.8.

conda create -n coderobo python=3.8.13
conda activate coderobo
pip install -r requirements.txt
pip install -e .

For GPU simulation, set MuJoCo/EGL device variables before running rollouts:

export CUDA_VISIBLE_DEVICES=0
export MUJOCO_EGL_DEVICE_ID=0

Download LIBERO datasets only when you need demonstrations or benchmark rollouts:

python benchmark_scripts/download_libero_datasets.py --datasets libero_spatial --use-huggingface

Datasets are ignored by Git and should stay outside the preview source release.

Optional LLM Configuration

The LLM planner can use a local Codex CLI transport, a file-backed subagent bridge, or an OpenAI-compatible API endpoint depending on the script.

For API transport:

export CODEX_API_BASE_URL="https://your-compatible-endpoint.example/v1"
export CODEX_API_KEY="..."
export CODEX_API_MODEL="your-model-name"

Never commit API keys, endpoint credentials, private transcripts, or prompt archives. The preview repository documents the interfaces, not private service configuration.
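
One way to read these variables in Python without leaking the key into logs is sketched below. Only the three CODEX_API_* variable names come from this README; ApiTransportConfig and load_api_config are hypothetical helpers, not part of the preview code.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

ENV_NAMES = ("CODEX_API_BASE_URL", "CODEX_API_KEY", "CODEX_API_MODEL")


@dataclass(frozen=True)
class ApiTransportConfig:
    """Hypothetical container for the API transport settings."""
    base_url: str
    model: str
    # repr=False keeps the key out of the default dataclass repr.
    api_key: str = field(repr=False)


def load_api_config(env: Optional[Mapping[str, str]] = None) -> ApiTransportConfig:
    """Build a config from environment variables, failing early if any is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ApiTransportConfig(
        base_url=env["CODEX_API_BASE_URL"],
        model=env["CODEX_API_MODEL"],
        api_key=env["CODEX_API_KEY"],
    )
```

Calling load_api_config() with no argument reads os.environ; passing a dict makes the function easy to test without touching the real environment.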

Quick Checks

Run fast checks that do not require MuJoCo simulation:

python scripts/run_lmvs_static_checks.py
python scripts/check_no_runtime_gt.py lmvs

The first command compiles the LMVS/SDK/script code and runs focused script tests. The second scans the runtime policy code for hidden simulator ground truth usage.

Dry-Run a Matrix

To inspect commands without launching long robot evaluations:

python scripts/run_codex_40_task_experiment.py --dry-run \
  --suites libero_spatial,libero_object,libero_goal,libero_10 \
  --tasks 0,1,2,3,4,5,6,7,8,9 \
  --init-states 0,1,2

Run a Small Sweep

A minimal simulation sweep writes a JSON summary under lmvs_runs/:

python scripts/run_libero_object_sweep.py \
  --suite libero_spatial \
  --tasks 0 \
  --init-states 0 \
  --max-turns 6 \
  --policy-mode codex-brain \
  --codex-tool-profile generic \
  --out lmvs_runs/spatial_t0_i0_preview.json

Depending on the selected transport, you may also need to log in to the codex CLI or set the CODEX_API_* variables above.

Evidence and Results Summary

Current preview evidence supports the following conservative conclusions:

  • The architecture for LLM-guided visual closed-loop manipulation is in place: task discovery, visual evidence, generic action tools, structured memory, model-call transports, and gate scripts are implemented.
  • Strict generic-tool runs show meaningful local progress on LIBERO Spatial and selected Object/Goal cases, especially when tasks can be solved through visible object selection, servo grasping, and relation-aware placement.
  • The project should not yet be described as solving the full LIBERO benchmark or as generalizing across broad task families. Hard cases remain around thin-object contact, object-container insertion, post-release verification, long-horizon recovery, and robustness under held-out initial states.
  • Evidence quality is treated as part of the method: a claimed result should pass no-runtime-ground-truth, no-replay, transcript, health, metadata, and variation gates before being reported.
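
The reporting rule in the last point amounts to a conjunction over gates. A minimal sketch follows; the gate keys and function names are hypothetical labels for the six gates listed above, and the real gate scripts under scripts/ may use a different output format.

```python
from typing import Dict, List

# Hypothetical keys, one per gate named in the summary above.
REQUIRED_GATES = (
    "no_runtime_ground_truth",
    "no_replay",
    "transcript",
    "health",
    "metadata",
    "variation",
)


def failed_gates(gate_results: Dict[str, bool]) -> List[str]:
    """Return required gates that are missing or did not pass."""
    return [gate for gate in REQUIRED_GATES if not gate_results.get(gate, False)]


def is_reportable(gate_results: Dict[str, bool]) -> bool:
    """A result is reportable only when every required gate passed."""
    return not failed_gates(gate_results)


results = {gate: True for gate in REQUIRED_GATES}
results["variation"] = False
print(is_reportable(results), failed_gates(results))
```

Treating a missing gate as a failure keeps an incomplete audit from being mistaken for a clean one.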

See docs/PREVIEW_RELEASE.md for the open-source boundary and docs/RESULTS_SUMMARY.md for a concise result statement.

Repository Hygiene for Preview

Before pushing to https://github.com/IRMVLab/CodeRobo, verify that the staged files do not include:

  • lmvs_runs/, lmvs_memory/, lmvs_demo_bundles/, sdk_output/, agent_io/
  • libero/datasets/, bert/, experiments/, hf_cache/, external_repos/
  • presentations/*/prompts/, generated PPTX files, generated slide images
  • API keys, private endpoint URLs, local absolute paths, or full model prompts

Recommended pre-push checks:

git status --short
rg -n --hidden -g '!.git/**' -g '!libero/datasets/**' -g '!bert/**' \
  -g '!experiments/**' -g '!presentations/**' \
  'sk-[A-Za-z0-9_-]+|API_KEY|SECRET|TOKEN|/(home|root)/|<local-path>' .
python scripts/run_lmvs_static_checks.py
python scripts/check_no_runtime_gt.py lmvs
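
The directory part of the checklist above can also be checked programmatically. The sketch below is a hypothetical helper, not a script shipped in the preview: it flags staged paths under the excluded directories and does not cover the presentations/*/prompts/ pattern, generated slide files, or secret scanning.

```python
from typing import Iterable, List

# Directory prefixes copied from the hygiene list above.
EXCLUDED_PREFIXES = (
    "lmvs_runs/", "lmvs_memory/", "lmvs_demo_bundles/", "sdk_output/", "agent_io/",
    "libero/datasets/", "bert/", "experiments/", "hf_cache/", "external_repos/",
)


def excluded_staged_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that fall under an excluded directory."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [path for path in paths if path.startswith(EXCLUDED_PREFIXES)]


staged = ["lmvs/planner.py", "lmvs_runs/spatial_t0_i0_preview.json", "docs/PREVIEW_RELEASE.md"]
print(excluded_staged_paths(staged))
```

The input list can be taken from the output of git diff --cached --name-only, which prints one repository-relative staged path per line.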

Relation to LIBERO

This preview builds on LIBERO:

@article{liu2023libero,
  title={LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
  author={Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
  journal={arXiv preprint arXiv:2306.03310},
  year={2023}
}

License

The inherited LIBERO code is distributed under the MIT License. Dataset assets and third-party model assets may have separate licenses; keep downloaded data, checkpoints, and external repositories outside the preview source tree unless their redistribution terms have been reviewed.
