
OpenVLA-7B


Description

OpenVLA is a 7B-parameter vision-language-action model that pairs a fused SigLIP + DINOv2 visual encoder with a Llama 2 language-model backbone. It predicts discrete action tokens that are decoded into 7-DoF end-effector delta actions, and was trained on 970k robot trajectories from the Open X-Embodiment dataset.
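A minimal inference sketch is shown below, following the loading pattern documented by the OpenVLA project. The Hub identifier `openvla/openvla-7b`, the prompt template, the `predict_action` method, and the `unnorm_key` value are taken from the project's published usage rather than this page; treat them as assumptions and check the upstream model card before relying on them.

```python
# Sketch: load OpenVLA-7B in BF16 and predict one action from a single
# camera frame. Assumes the openvla/openvla-7b Hub repo and its
# trust_remote_code action head (predict_action, unnorm_key) as
# documented by the OpenVLA project.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained(
    "openvla/openvla-7b", trust_remote_code=True
)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,  # ~28 GB VRAM in BF16, per the table below
    trust_remote_code=True,
).to("cuda:0")

# Single RGB observation from the robot's camera (path is illustrative).
image = Image.open("observation.png")
prompt = "In: What action should the robot take to pick up the red block?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# Decodes the discrete action tokens into a 7-DoF end-effector delta
# (x, y, z, roll, pitch, yaw, gripper). unnorm_key selects which training
# dataset's statistics are used to un-normalize the action.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # numpy array of shape (7,)
```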

Hardware Requirements

Min GPU: A100 40GB
VRAM: ~28 GB (BF16)
Framework: PyTorch + Transformers
License: Apache 2.0