
Quickstart

This guide walks you through running your first VLA inference, fine-tuning on custom data, and deploying to a robot.

Load a model and run inference

Load a pre-trained VLA model and predict actions from an image and natural language instruction:

from vlarobot import VLAModel
from PIL import Image

# Load a pre-trained model (downloads weights on first run)
model = VLAModel.from_preset("openvla-7b")

# Open a camera image and predict action
image = Image.open("robot_view.jpg")
action = model.predict(image, "pick up the red block")
print(action)  # Action(x=0.12, y=-0.05, z=0.03, roll=0.01, pitch=-0.02, yaw=0.0, gripper=1.0)

The returned Action object is a 7-DoF end-effector command: x, y, z position deltas; roll, pitch, yaw orientation deltas; and the gripper state.
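As an illustration of that layout, here is a minimal sketch of what such an Action could look like as a plain dataclass. The field names match the printed example above, but the `as_vector` helper and the gripper convention (1.0 = open) are assumptions, not vlarobot's actual API:

```python
from dataclasses import dataclass, astuple

@dataclass
class Action:
    x: float        # position delta along x
    y: float        # position delta along y
    z: float        # position delta along z
    roll: float     # orientation delta (roll)
    pitch: float    # orientation delta (pitch)
    yaw: float      # orientation delta (yaw)
    gripper: float  # gripper state; 1.0 = open is an assumed convention

    def as_vector(self):
        """Flatten to a 7-element list, e.g. for logging or a robot API."""
        return list(astuple(self))

action = Action(x=0.12, y=-0.05, z=0.03, roll=0.01, pitch=-0.02, yaw=0.0, gripper=1.0)
print(action.as_vector())  # [0.12, -0.05, 0.03, 0.01, -0.02, 0.0, 1.0]
```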

Fine-tune on your data

Fine-tune a pre-trained model on your own demonstration data using LoRA:

from vlarobot.training import VLATrainer, TrainingConfig

config = TrainingConfig(
    model="openvla-7b",
    dataset="./my_demonstrations.hdf5",
    method="lora",
    num_epochs=10,
    learning_rate=2e-5,
)

trainer = VLATrainer(config)
results = trainer.train()
print(f"Final loss: {results['loss']:.4f}")
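For intuition about what `method="lora"` does, here is a conceptual sketch of a LoRA-adapted linear layer in NumPy: the pretrained weight stays frozen and only two small low-rank factors are trained. The names, shapes, and scaling here are illustrative of the general technique, not vlarobot internals:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8.0

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(d_out, r)) * 0.01  # trainable low-rank factor, small init
B = np.zeros((r, d_in))                 # trainable low-rank factor, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * A @ B; only A and B get gradients.
    return x @ (W + (alpha / r) * (A @ B)).T

x = rng.normal(size=(2, d_in))
# With B initialized to zero, the adapted layer exactly matches the base layer,
# so fine-tuning starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```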

Your data should be in HDF5 format with image and action datasets. See the Fine-tuning Guide for details.
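As a sketch of that file layout, the snippet below writes an HDF5 file with one image dataset and one action dataset, one row per timestep. The dataset names (`images`, `actions`) and shapes are assumptions for illustration; consult the Fine-tuning Guide for the exact schema the trainer expects:

```python
import os
import tempfile

import h5py
import numpy as np

n_steps, h, w = 5, 64, 64
images = np.zeros((n_steps, h, w, 3), dtype=np.uint8)  # camera frames
actions = np.zeros((n_steps, 7), dtype=np.float32)     # 7-DoF action per step

path = os.path.join(tempfile.mkdtemp(), "my_demonstrations.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=images)
    f.create_dataset("actions", data=actions)

# Read it back to confirm the layout.
with h5py.File(path, "r") as f:
    print(f["images"].shape, f["actions"].shape)  # (5, 64, 64, 3) (5, 7)
```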

Deploy to robot

Deploy a trained model to a physical robot with a simple controller interface:

from vlarobot.deploy import RobotController

controller = RobotController(model=model, robot="widowx", control_hz=10.0)
controller.start()
controller.execute("pick up the red block", max_steps=50)
controller.stop()
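Under the hood, `execute` is described as running a closed control loop: query the policy at `control_hz`, apply each action delta, and stop after `max_steps`. Here is a minimal sketch of that pattern with stand-in model and robot objects (none of these names are vlarobot APIs, and the sleep between steps is omitted):

```python
def execute(model, robot, instruction, max_steps=50, control_hz=10.0):
    dt = 1.0 / control_hz  # seconds per control step (sleep omitted in this sketch)
    for _ in range(max_steps):
        obs = robot["camera"]()          # grab the latest camera frame
        delta = model(obs, instruction)  # 7-DoF action from the policy
        robot["apply"](delta)            # command the end-effector delta
    return max_steps

# Stand-in model and robot state for illustration: the "robot" just
# integrates x/y/z deltas into a pose, and the "model" is a constant policy.
pose = [0.0, 0.0, 0.0]
robot = {
    "camera": lambda: None,
    "apply": lambda d: [pose.__setitem__(i, pose[i] + d[i]) for i in range(3)],
}
model = lambda obs, instr: [0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

steps = execute(model, robot, "pick up the red block", max_steps=10)
print(steps, round(pose[0], 3))  # 10 0.1
```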

See Real Robots for supported hardware and setup instructions.

CLI usage

vlarobot includes a CLI for common operations:

# List available model presets
vlarobot presets

# Train a model
vlarobot train --model openvla-7b --dataset ./demos.hdf5 --method lora

# Evaluate on a benchmark
vlarobot evaluate --model openvla-7b --benchmark simpler_env

# Run single inference
vlarobot predict --model openvla-7b --image robot.jpg --instruction "pick up block"