This paper introduces GET-Zero, a model architecture and training procedure for learning an embodiment-aware control policy that can immediately adapt to hardware changes without retraining. To do so, we present Graph Embodiment Transformer (GET), a transformer model that leverages the embodiment graph connectivity as a learned structural bias in the attention mechanism. We use behavior cloning to distill demonstration data from embodiment-specific expert policies into an embodiment-aware GET model that conditions on the hardware configuration of the robot to make control decisions. We conduct a case study on a dexterous in-hand object rotation task using different configurations of a four-fingered robot hand with joints removed and with link length extensions. Using the GET model along with a self-modeling loss enables GET-Zero to generalize zero-shot to unseen variations in graph structure and link length, yielding a 20% improvement over baseline methods.
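As a rough illustration of how a behavior cloning objective and a self-modeling loss might be combined, the sketch below adds a forward kinematics error term to an action imitation term. The use of mean squared error for both terms, the function names, and the `fk_weight` value are assumptions made for illustration and are not taken from the paper.

```python
def mean_squared_error(predicted, target):
    """Mean squared error over two equally shaped nested lists (one row per joint)."""
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def combined_loss(pred_actions, expert_actions, pred_fk, true_fk, fk_weight=0.1):
    """Behavior cloning term plus a weighted self-modeling (forward kinematics) term.

    fk_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    bc_term = mean_squared_error(pred_actions, expert_actions)
    fk_term = mean_squared_error(pred_fk, true_fk)
    return bc_term + fk_weight * fk_term, bc_term, fk_term


# Toy example with two joints: one action value and one 3D position per joint.
pred_actions = [[0.1], [0.2]]
expert_actions = [[0.0], [0.2]]
pred_fk = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
true_fk = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]

total, bc_term, fk_term = combined_loss(pred_actions, expert_actions, pred_fk, true_fk)
```

Because both terms are computed per joint, the same objective applies unchanged to hands with different numbers of joints.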
GET is an embodiment-aware model architecture based on the transformer encoder that conditions on the robot hardware structure to zero-shot control new robot designs. The main idea is to leverage the embodiment graph (left), with joints as nodes and links as edges, as a structural bias in the attention mechanism (right). By operating on per-joint tokens containing local hardware properties and observations, GET flexibly adapts to robots with varying numbers of joints and graph structures. GET also performs per-joint forward kinematics prediction as a self-modeling meta-task, which empirically improves cross-embodiment transfer.
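The idea of a graph-derived attention bias can be sketched as follows. This minimal example assumes the bias is a learned scalar looked up by the shortest-path distance between two joints in the embodiment graph, which is one common way to inject graph structure into attention; the exact encoding used by GET is not specified here, and all names (`shortest_path_distances`, `graph_biased_attention`, `bias_table`) are hypothetical.

```python
import math
from collections import deque


def shortest_path_distances(num_joints, links):
    """All-pairs hop distance between joints, via BFS over the embodiment graph."""
    adjacency = [[] for _ in range(num_joints)]
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[-1] * num_joints for _ in range(num_joints)]
    for start in range(num_joints):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if dist[start][nxt] < 0:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_biased_attention(queries, keys, values, dist, bias_table):
    """Single-head attention whose logits get a bias indexed by graph distance."""
    dim = len(queries[0])
    last = len(bias_table) - 1
    outputs, weights = [], []
    for i, q in enumerate(queries):
        logits = []
        for j, k in enumerate(keys):
            score = sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(dim)
            d = dist[i][j]
            if d < 0 or d > last:  # unreachable or far joints share the last entry
                d = last
            logits.append(score + bias_table[d])
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy embodiment: joint 0 is a base with two 2-joint fingers (0-1-2 and 0-3-4).
links = [(0, 1), (1, 2), (0, 3), (3, 4)]
dist = shortest_path_distances(5, links)

# Zero queries make the content scores vanish, so only the bias shapes attention.
queries = [[0.0, 0.0] for _ in range(5)]
keys = [[1.0, 0.0] for _ in range(5)]
values = [[float(j)] for j in range(5)]
bias_table = [1.0, 0.5, 0.0, -0.5, -1.0]  # stand-in for learned parameters

outputs, weights = graph_biased_attention(queries, keys, values, dist, bias_table)
```

In this toy setup each joint attends most strongly to itself and its graph neighbors. Since the bias depends only on pairwise distances, the same table applies to any number of joints, which is what lets a single set of weights handle different graph structures.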
With only a single set of network weights, GET-Zero controls many different hand designs, even when we remove joints or fingers or add link length extensions. We compare to a baseline that has neither the graph encoding nor the self-modeling features and find a substantial drop in performance.
We evaluate zero-shot transfer to (a) new graph variations, (b) link length extensions (orange), and (c) both link and graph variations. For each category we evaluate ten embodiments in simulation (pictured) and a subset of them in the real world. Hands in the third row (link and graph variations) match those in the first row (connectivity variations), but with extensions applied to some links. We train on 44 hands with connectivity variations (not pictured). Link length extensions never appear in the training embodiments, so testing on them ensures that we also evaluate settings where the testing embodiments differ significantly from the training embodiments.
In simulation, we evaluate zero-shot on these hand designs, which were not seen during training and have joints removed and link length extensions (orange) applied, and observe that GET-Zero can control a broad range of designs.
The sim-to-real zero-shot control of a previously unseen hand design is challenging for contact-rich tasks, especially with both graph and link length variations present. We observe a few failure cases with GET-Zero when testing on embodiments not seen during training. [Left] This hand with missing index finger and link length extensions has a gap where the cube can fall through. [Right] This hand struggles to rotate the cube due to a pinky finger that only has a single joint, making it difficult to start the rotation cycle.
Cube is dropped
Cube fails to rotate
Below are the hyperparameter choices for our GET architecture, the behavior cloning training, and the dataset.
@misc{patel2024getzero,
title={{GET-Zero}: Graph Embodiment Transformer for Zero-shot Embodiment Generalization},
author={Austin Patel and Shuran Song},
year={2024},
eprint={2407.15002},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2407.15002},
}
We thank Kenneth Shaw, Prof. Pathak's lab at CMU, and Prof. Liu's Movement Lab at Stanford for sharing LEAP Hand hardware. We also thank Huy Ha, Xiaomeng Xu, Mengda Xu and Haochen Shi for their helpful feedback and fruitful discussions. This work was supported in part by a Sloan Fellowship and NSF #2132519. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.