## Benchmarks

### Running benchmarks
```python
# `model` is a policy you have already loaded; this snippet assumes it exists.
from vlarobot.benchmark import BenchmarkSuite

suite = BenchmarkSuite(["simpler_env", "calvin"], num_episodes=20)
results = suite.evaluate(model)
```
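
The structure of the returned `results` object is not spelled out above. A minimal sketch, assuming `evaluate` returns a mapping from benchmark name to a metrics dict with a `success_rate` entry (both names are assumptions, not guaranteed API):

```python
# Assumption: `results` maps benchmark name -> metrics dict containing "success_rate".
for benchmark_name, metrics in results.items():
    print(f"{benchmark_name}: success rate = {metrics['success_rate']:.1%}")
```
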
### Available benchmarks

- simpler_env — SIMPLER evaluation environments (drawer open/close, push, pick-and-place)
- calvin — CALVIN long-horizon manipulation benchmark
- meta_world — Meta-World manipulation tasks
- custom — Inference-only latency/throughput testing (see the sketch after this list)
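
Because the `custom` benchmark only times inference, it needs no simulator. A sketch using only the constructor shown above; the exact metric keys it reports depend on your installed version and are not guaranteed here:

```python
from vlarobot.benchmark import BenchmarkSuite

# Inference-only timing: no simulator rollouts are involved.
latency_suite = BenchmarkSuite(["custom"], num_episodes=20)
latency_results = latency_suite.evaluate(model)

# Inspect whatever latency/throughput keys your version reports.
print(latency_results)
```
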
### Compare models

```python
comparison = suite.compare({
    "openvla": openvla_model,
    "smolvla": smolvla_model,
})
```
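
What `compare` returns is not documented here. A sketch assuming it maps model names to per-model metric dicts, printed side by side for a quick comparison:

```python
# Assumption: `comparison` maps model name -> metrics dict.
for model_name, metrics in comparison.items():
    row = ", ".join(f"{key}={value}" for key, value in metrics.items())
    print(f"{model_name:>10}: {row}")
```
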
### CLI evaluation

```bash
vlarobot evaluate --model openvla-7b --benchmark simpler_env calvin
```
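
For scripted runs (for example in CI), the same command can be launched from Python. This sketch only wraps the invocation shown above and assumes the `vlarobot` executable is on your PATH:

```python
import subprocess

# Run the CLI evaluation and raise if it exits with a non-zero status.
subprocess.run(
    [
        "vlarobot", "evaluate",
        "--model", "openvla-7b",
        "--benchmark", "simpler_env", "calvin",
    ],
    check=True,
)
```
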