## Benchmarks

### Running benchmarks
```python
# `model` is a policy you have already loaded; this snippet assumes it exists.
from vlarobot.benchmark import BenchmarkSuite

suite = BenchmarkSuite(["simpler_env", "calvin"], num_episodes=20)
results = suite.evaluate(model)
```
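
The structure of the returned `results` object is not spelled out above. A minimal sketch, assuming `evaluate` returns a mapping from benchmark name to a metrics dict with a `success_rate` entry (both names are assumptions, not guaranteed API):

```python
# Assumption: `results` maps benchmark name -> metrics dict containing "success_rate".
for benchmark_name, metrics in results.items():
    print(f"{benchmark_name}: success rate = {metrics['success_rate']:.1%}")
```
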
### Available benchmarks

- simpler_env — SIMPLER evaluation environments (drawer open/close, push, pick-and-place)
- calvin — CALVIN long-horizon manipulation benchmark
- meta_world — Meta-World manipulation tasks
- custom — Inference-only latency/throughput testing (see the sketch after this list)
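
Because the `custom` benchmark only times inference, it needs no simulator. A sketch using only the constructor shown above; the exact metric keys it reports depend on your installed version and are not guaranteed here:

```python
from vlarobot.benchmark import BenchmarkSuite

# Inference-only timing: no simulator rollouts are involved.
latency_suite = BenchmarkSuite(["custom"], num_episodes=20)
latency_results = latency_suite.evaluate(model)

# Inspect whatever latency/throughput keys your version reports.
print(latency_results)
```
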
### Compare models

```python
comparison = suite.compare({
    "openvla": openvla_model,
    "smolvla": smolvla_model,
})
```
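
What `compare` returns is not documented here. A sketch assuming it maps model names to per-model metric dicts, printed side by side for a quick comparison:

```python
# Assumption: `comparison` maps model name -> metrics dict.
for model_name, metrics in comparison.items():
    row = ", ".join(f"{key}={value}" for key, value in metrics.items())
    print(f"{model_name:>10}: {row}")
```
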
### CLI evaluation

```bash
vlarobot evaluate --model openvla-7b --benchmark simpler_env calvin
```
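
For scripted runs (for example in CI), the same command can be launched from Python. This sketch only wraps the invocation shown above and assumes the `vlarobot` executable is on your PATH:

```python
import subprocess

# Run the CLI evaluation and raise if it exits with a non-zero status.
subprocess.run(
    [
        "vlarobot", "evaluate",
        "--model", "openvla-7b",
        "--benchmark", "simpler_env", "calvin",
    ],
    check=True,
)
```
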