Quickstart
This guide walks you through running your first VLA inference, fine-tuning on custom data, and deploying to a robot.
Load a model and run inference
Load a pre-trained VLA model and predict actions from an image and natural language instruction:
from vlarobot import VLAModel
from PIL import Image
# Load a pre-trained model (downloads weights on first run)
model = VLAModel.from_preset("openvla-7b")
# Open a camera image and predict action
image = Image.open("robot_view.jpg")
action = model.predict(image, "pick up the red block")
print(action) # Action(x=0.12, y=-0.05, z=0.03, roll=0.01, pitch=-0.02, yaw=0.0, gripper=1.0)
The Action object holds a 7-DoF end-effector command: x, y, z position deltas; roll, pitch, yaw orientation deltas; and a gripper state.
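If you need the action as a flat vector (for logging, or to feed a lower-level robot API), you can read the fields off the Action object directly. Continuing from the snippet above, here is a minimal sketch that assumes the seven fields listed are exposed as plain float attributes:
import numpy as np
# Assumes Action exposes x, y, z, roll, pitch, yaw, gripper as float attributes
delta = np.array([
    action.x, action.y, action.z,           # position deltas
    action.roll, action.pitch, action.yaw,  # orientation deltas
    action.gripper,                          # gripper state
])
print(delta)  # 7-element vector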
Fine-tune on your data
Fine-tune a pre-trained model on your own demonstration data using LoRA:
from vlarobot.training import VLATrainer, TrainingConfig
config = TrainingConfig(
model="openvla-7b",
dataset="./my_demonstrations.hdf5",
method="lora",
num_epochs=10,
learning_rate=2e-5,
)
trainer = VLATrainer(config)
results = trainer.train()
print(f"Final loss: {results['loss']:.4f}")
Your data should be in HDF5 format with image and action datasets. See the Fine-tuning Guide for details.
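As a rough sketch of what such a file might look like, here is one way to write it with h5py; the exact dataset names ("image", "action") and array shapes below are assumptions, so check the Fine-tuning Guide for the expected schema:
import h5py
import numpy as np
# Hypothetical layout: N frames of RGB images paired with 7-DoF actions
images = np.zeros((100, 224, 224, 3), dtype=np.uint8)   # camera frames
actions = np.zeros((100, 7), dtype=np.float32)          # per-frame actions
with h5py.File("my_demonstrations.hdf5", "w") as f:
    f.create_dataset("image", data=images)
    f.create_dataset("action", data=actions)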
Deploy to a robot
Deploy a trained model to a physical robot with a simple controller interface:
from vlarobot.deploy import RobotController
# Connect the trained policy to a WidowX arm running at 10 Hz
controller = RobotController(model=model, robot="widowx", control_hz=10.0)
controller.start()
# Run the instruction for at most 50 control steps
controller.execute("pick up the red block", max_steps=50)
controller.stop()
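If a rollout can fail partway through, it is worth making sure the robot is always released; wrapping the same calls in try/finally guarantees stop() runs even when execute() raises:
controller.start()
try:
    controller.execute("pick up the red block", max_steps=50)
finally:
    # Always stop the controller, even on errors or Ctrl+C
    controller.stop()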
See Real Robots for supported hardware and setup instructions.
CLI usage
vlarobot includes a CLI for common operations:
# List available model presets
vlarobot presets
# Train a model
vlarobot train --model openvla-7b --dataset ./demos.hdf5 --method lora
# Evaluate on a benchmark
vlarobot evaluate --model openvla-7b --benchmark simpler_env
# Run single inference
vlarobot predict --model openvla-7b --image robot.jpg --instruction "pick up block"