CodeRobo is a research preview of a language-model-driven visual manipulation stack built on top of the LIBERO benchmark. The goal of this preview is to make the system architecture, reusable method components, evaluation gates, and current evidence boundaries visible to the community without publishing private run artifacts, full prompt-generation material, credentials, or machine-local experiment state.
This is not a formal release. APIs, scripts, and results may still change.
- libero/: the LIBERO benchmark package, assets, BDDL task definitions, lifelong-learning baselines, and Hydra configs.
- libero_sdk/: a small Python SDK around LIBERO tasks, environments, robot control, perception wrappers, and skills.
- lmvs/: the CodeRobo / LMVS policy stack, including LLM planning, action primitives, perception fusion, structured memory, evidence tracking, skill registries, and audit utilities.
- scripts/: static checks, sweep runners, gate checkers, evidence summaries, and focused smoke tests.
- docs/: curated architecture and evaluation notes for the preview.
Generated datasets, checkpoints, model caches, local run directories, Codex transcripts, complete prompt-generation files, and presentation build artifacts are intentionally excluded from the public preview.
CodeRobo separates robot manipulation into auditable modules:
- Task and environment layer: LIBERO provides benchmark suites, BDDL task files, initial states, observations, and sparse success signals.
- Observation and perception layer: RGB-D observations are converted into object candidates, relation evidence, placement regions, grasp proposals, and current-trial world evidence.
- LLM planning layer: CodexBrain produces one structured high-level decision at a time from task language, visible evidence, action history, prompt-safe memory, and available tools.
- Action layer: ActionAPI exposes generic manipulation primitives such as visual pick, servo-assisted grasp, placement, relation-aware placement, contact scan, recovery, and terminal decisions.
- Memory and skill layer: semantic memory, skill performance records, and failure summaries are stored as typed records. Reusable memory is restricted to human-observable strategy and must not contain hidden simulator state, expert replay actions, or demo-derived absolute coordinates.
- Audit layer: scripts check that runs use no runtime ground truth and no replay actions, and they verify transcript completeness, Codex health, metadata consistency, same-task variation, and broader task-family claims.
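The memory restriction in the list above can be made concrete with a typed record that rejects prompt-unsafe content when it is constructed. The following is a minimal sketch, not the LMVS implementation: `StrategyMemory`, its fields, and the `FORBIDDEN_KEYS` list are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical keys that would indicate hidden simulator state, expert replay
# actions, or demo-derived absolute coordinates. The real project may use a
# different list or a different mechanism entirely.
FORBIDDEN_KEYS: Tuple[str, ...] = ("sim_state", "replay_actions", "absolute_xyz")


@dataclass(frozen=True)
class StrategyMemory:
    """A reusable, human-observable strategy note (illustrative only)."""

    task_family: str
    strategy: str
    extras: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Refuse to build a record that carries a forbidden field.
        bad = sorted(k for k in self.extras if k in FORBIDDEN_KEYS)
        if bad:
            raise ValueError(f"prompt-unsafe memory fields: {bad}")


note = StrategyMemory(
    task_family="pick_and_place",
    strategy="Approach the bowl from above and re-check the grasp before lifting.",
    extras={"source": "trial_summary"},
)
print(note.task_family)
```

Validating at construction time means an unsafe record cannot be stored in the first place, which is simpler to audit than filtering records when they are read back into a prompt.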
The preview emphasizes transparent evidence over a single headline number. Successful demonstrations should include the exact sweep command, row metadata, model-call transcripts when applicable, no-replay checks, and gate outputs.
The base LIBERO environment targets Python 3.8.
conda create -n coderobo python=3.8.13
conda activate coderobo
pip install -r requirements.txt
pip install -e .
For GPU simulation, set MuJoCo/EGL device variables before running rollouts:
export CUDA_VISIBLE_DEVICES=0
export MUJOCO_EGL_DEVICE_ID=0
Download LIBERO datasets only when you need demonstrations or benchmark rollouts:
python benchmark_scripts/download_libero_datasets.py --datasets libero_spatial --use-huggingface
Datasets are ignored by Git and should stay outside the preview source release.
The LLM planner can use a local Codex CLI transport, a file-backed subagent bridge, or an OpenAI-compatible API endpoint depending on the script.
For API transport:
export CODEX_API_BASE_URL="https://your-compatible-endpoint.example/v1"
export CODEX_API_KEY="..."
export CODEX_API_MODEL="your-model-name"
Never commit API keys, endpoint credentials, private transcripts, or prompt archives. The preview repository documents the interfaces, not private service configuration.
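A script that uses the API transport may want to fail early when one of the three variables is unset, and to avoid echoing the key into logs. The sketch below shows one way to do that with the standard library only; `load_api_config` and `redact` are hypothetical helpers and are not part of the preview code.

```python
import os
from typing import Dict, Mapping

# The three variable names come from the export lines above.
REQUIRED_VARS = ("CODEX_API_BASE_URL", "CODEX_API_KEY", "CODEX_API_MODEL")


def load_api_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the CODEX_API_* values, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def redact(secret: str) -> str:
    """Describe a secret without revealing any of its characters."""
    return f"<redacted, {len(secret)} chars>"


# Example with an explicit mapping instead of the real environment.
example_env = {
    "CODEX_API_BASE_URL": "https://your-compatible-endpoint.example/v1",
    "CODEX_API_KEY": "placeholder-key",
    "CODEX_API_MODEL": "your-model-name",
}
config = load_api_config(example_env)
print(config["CODEX_API_MODEL"], redact(config["CODEX_API_KEY"]))
```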
Run fast checks that do not require MuJoCo simulation:
python scripts/run_lmvs_static_checks.py
python scripts/check_no_runtime_gt.py lmvs
The first command compiles the LMVS/SDK/script code and runs focused script tests. The second scans the runtime policy code for hidden simulator ground truth usage.
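One general way to implement a scan of this kind is to walk the Python syntax tree and flag identifiers that appear on a deny-list. The sketch below illustrates that technique only; it is not the logic of `scripts/check_no_runtime_gt.py`, and the deny-list entries are hypothetical.

```python
import ast
from typing import List, Set, Tuple

# Hypothetical identifiers that would expose simulator ground truth.
DENY: Set[str] = {"get_sim_state", "object_gt_pose"}


def find_violations(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return sorted (line, identifier) pairs for deny-listed names in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Attribute) and node.attr in DENY:
            hits.append((node.lineno, node.attr))
        elif isinstance(node, ast.Name) and node.id in DENY:
            hits.append((node.lineno, node.id))
    return sorted(hits)


sample = "obs = env.step(a)\npose = env.object_gt_pose('bowl')\n"
print(find_violations(sample))  # [(2, 'object_gt_pose')]
```

Working on the syntax tree rather than raw text avoids false positives from comments and string literals, at the cost of missing names that are built dynamically.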
To inspect commands without launching long robot evaluations:
python scripts/run_codex_40_task_experiment.py --dry-run \
--suites libero_spatial,libero_object,libero_goal,libero_10 \
--tasks 0,1,2,3,4,5,6,7,8,9 \
--init-states 0,1,2
A minimal simulation sweep writes a JSON summary under lmvs_runs/:
python scripts/run_libero_object_sweep.py \
--suite libero_spatial \
--tasks 0 \
--init-states 0 \
--max-turns 6 \
--policy-mode codex-brain \
--codex-tool-profile generic \
--out lmvs_runs/spatial_t0_i0_preview.json
Depending on the selected transport, you may also need codex CLI login or the CODEX_API_* variables above.
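To gauge how large the dry-run invocation above is before launching it for real, the flag values can be expanded by hand. The sketch assumes the script evaluates the full Cartesian product of suites, tasks, and init states; that is an assumption about its behavior, not something the preview documents.

```python
from itertools import product

# Values copied from the --suites, --tasks, and --init-states flags above.
suites = "libero_spatial,libero_object,libero_goal,libero_10".split(",")
tasks = [int(t) for t in "0,1,2,3,4,5,6,7,8,9".split(",")]
init_states = [int(s) for s in "0,1,2".split(",")]

# Assumed expansion: every suite with every task and every init state.
grid = list(product(suites, tasks, init_states))

print(len(suites) * len(tasks))  # 40 suite/task pairs
print(len(grid))                 # 120 rollouts in total
```

By the same arithmetic, the minimal sweep with one suite, one task, and one init state is a single rollout capped at 6 turns.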
Current preview evidence supports the following conservative conclusions:
- The architecture for LLM-guided visual closed-loop manipulation is in place: task discovery, visual evidence, generic action tools, structured memory, model-call transports, and gate scripts are implemented.
- Strict generic-tool runs show meaningful local progress on LIBERO Spatial and selected Object/Goal cases, especially when tasks can be solved through visible object selection, servo grasping, and relation-aware placement.
- The project should not yet be described as solving the full LIBERO benchmark or as a system that generalizes broadly across task families. Hard cases remain around thin-object contact, object-container insertion, post-release verification, long-horizon recovery, and robustness under held-out initial states.
- Evidence quality is treated as part of the method: a claimed result should pass no-runtime-ground-truth, no-replay, transcript, health, metadata, and variation gates before being reported.
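The gating rule in the last bullet amounts to a conjunction over named checks. As a rough sketch, assuming each gate script's outcome has already been reduced to a boolean (the dictionary format and both function names are hypothetical):

```python
from typing import Dict, List

# Gate names follow the bullet above; the result format is an assumption.
REQUIRED_GATES = (
    "no_runtime_ground_truth",
    "no_replay",
    "transcript",
    "health",
    "metadata",
    "variation",
)


def failed_gates(results: Dict[str, bool]) -> List[str]:
    """Return required gates that failed or were never run."""
    return [gate for gate in REQUIRED_GATES if not results.get(gate, False)]


def is_reportable(results: Dict[str, bool]) -> bool:
    """A result is reportable only when every required gate passed."""
    return not failed_gates(results)


run = {gate: True for gate in REQUIRED_GATES}
run["variation"] = False
print(is_reportable(run), failed_gates(run))  # False ['variation']
```

Treating a missing gate as a failure keeps a partially checked run from being reported by accident.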
See docs/PREVIEW_RELEASE.md for the open-source boundary and
docs/RESULTS_SUMMARY.md for a concise result statement.
Before pushing to https://github.com/IRMVLab/CodeRobo, verify that the staged
files do not include:
- lmvs_runs/, lmvs_memory/, lmvs_demo_bundles/, sdk_output/, agent_io/
- libero/datasets/, bert/, experiments/, hf_cache/, external_repos/
- presentations/*/prompts/, generated PPTX files, generated slide images
- API keys, private endpoint URLs, local absolute paths, or full model prompts
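The directory exclusions above can also be checked mechanically against a list of staged paths, for example the output of `git diff --cached --name-only`. The helper below is a hypothetical sketch, not a script shipped with the preview. It covers only the path-based entries and the PPTX files; slide images and secrets need separate checks.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Directory prefixes copied from the exclusion list above.
EXCLUDED_PREFIXES = (
    "lmvs_runs/", "lmvs_memory/", "lmvs_demo_bundles/", "sdk_output/", "agent_io/",
    "libero/datasets/", "bert/", "experiments/", "hf_cache/", "external_repos/",
)
# Glob patterns for entries that are not plain prefixes.
EXCLUDED_GLOBS = ("presentations/*/prompts/*", "*.pptx")


def blocked_paths(staged: Iterable[str]) -> List[str]:
    """Return staged paths that fall under an excluded prefix or glob."""
    return [
        path
        for path in staged
        if path.startswith(EXCLUDED_PREFIXES)
        or any(fnmatchcase(path, pattern) for pattern in EXCLUDED_GLOBS)
    ]


staged = [
    "lmvs/brain.py",
    "lmvs_runs/spatial_t0_i0_preview.json",
    "presentations/demo/prompts/plan.md",
    "docs/deck.pptx",
]
print(blocked_paths(staged))
```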
Recommended pre-push checks:
git status --short
rg -n --hidden -g '!.git/**' -g '!libero/datasets/**' -g '!bert/**' \
-g '!experiments/**' -g '!presentations/**' \
'sk-[A-Za-z0-9_-]+|API_KEY|SECRET|TOKEN|/(home|root)/|<local-path>' .
python scripts/run_lmvs_static_checks.py
python scripts/check_no_runtime_gt.py lmvs
This preview builds on LIBERO:
@article{liu2023libero,
title={LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
author={Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
journal={arXiv preprint arXiv:2306.03310},
year={2023}
}
The inherited LIBERO code is distributed under the MIT License. Dataset assets and third-party model assets may have separate licenses; keep downloaded data, checkpoints, and external repositories outside the preview source tree unless their redistribution terms have been reviewed.