OpenVLA-7B
Description
OpenVLA is a 7B-parameter vision-language-action (VLA) model that pairs a fused SigLIP + DINOv2 visual encoder with a Llama 2 7B language-model backbone. It predicts discrete action tokens (256 bins per action dimension) that are decoded and un-normalized into 7-DoF end-effector delta actions. The model was trained on 970k robot manipulation trajectories from the Open X-Embodiment dataset.
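A minimal inference sketch following the usage pattern documented in the official OpenVLA model card; `predict_action` and the `unnorm_key` argument come from the model's remote code (`trust_remote_code=True`) and may differ across checkpoint versions, and the image path and instruction are illustrative:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# trust_remote_code pulls in OpenVLA's custom action
# de-tokenization logic alongside the base architecture.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,  # ~28 GB runtime footprint, per the table below
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# Single RGB observation from the robot's camera (path is hypothetical).
image = Image.open("observation.png")
prompt = "In: What action should the robot take to pick up the red block?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# predict_action decodes the generated action tokens into a 7-DoF
# end-effector delta (x, y, z, roll, pitch, yaw, gripper), un-normalized
# with the statistics of the named training data mixture.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # numpy array of shape (7,)
```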
Hardware Requirements
| Requirement | Value |
| --- | --- |
| Minimum GPU | A100 40GB |
| VRAM | ~28 GB (BF16) |
| Framework | PyTorch + Transformers |
| License | Apache 2.0 |
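For GPUs below the 40 GB recommendation, a sketch of 4-bit loading via the standard Transformers `BitsAndBytesConfig` path; whether this checkpoint tolerates 4-bit quantization without a meaningful accuracy drop is an assumption worth validating on your task:

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# 4-bit NF4 quantization shrinks the weight footprint from ~14 GB (BF16)
# to roughly 4 GB; activations and the KV cache add overhead on top.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    quantization_config=quant_config,
    trust_remote_code=True,
)
```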