GET-Zero

Graph Embodiment Transformer for
Zero-shot Embodiment Generalization

Stanford University

GET-Zero controls new hand designs without a new policy

Abstract

This paper introduces GET-Zero, a model architecture and training procedure for learning an embodiment-aware control policy that can immediately adapt to hardware changes without retraining. To do so, we present the Graph Embodiment Transformer (GET), a transformer model that leverages the embodiment graph connectivity as a learned structural bias in the attention mechanism. We use behavior cloning to distill demonstration data from embodiment-specific expert policies into an embodiment-aware GET model that conditions on the hardware configuration of the robot to make control decisions. We conduct a case study on a dexterous in-hand object rotation task using different configurations of a four-fingered robot hand with joints removed and with link length extensions. Using the GET model along with a self-modeling loss enables GET-Zero to zero-shot generalize to unseen variations in graph structure and link length, yielding a 20% improvement over baseline methods.
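As a rough illustration of how a behavior cloning objective and a self-modeling loss can be combined, the sketch below sums an action-imitation term and a forward kinematics prediction term. It is a minimal sketch under stated assumptions, not the training code: the mean-squared-error form of both terms, the function name `bc_with_self_modeling_loss`, and the `fk_weight` coefficient are all hypothetical.

```python
import numpy as np

def bc_with_self_modeling_loss(pred_actions, expert_actions,
                               pred_fk, true_fk, fk_weight=1.0):
    """Combine an imitation term with a self-modeling term.

    pred_actions, expert_actions: (num_joints,) per-joint action targets.
    pred_fk, true_fk: (num_joints, 3) per-joint positions.
    fk_weight: hypothetical coefficient balancing the two terms.
    """
    # Behavior cloning term: match the embodiment-specific expert's actions.
    # (Mean squared error is an assumption made for this sketch.)
    action_loss = np.mean((pred_actions - expert_actions) ** 2)
    # Self-modeling term: predict where each joint is in space.
    fk_loss = np.mean((pred_fk - true_fk) ** 2)
    return action_loss + fk_weight * fk_loss, action_loss, fk_loss
```

Both terms are averaged over joints, so the same function applies to hands with different numbers of joints.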



Graph Embodiment Transformer (GET)

GET is an embodiment-aware model architecture based on the transformer encoder that conditions on the robot hardware structure to zero-shot control new robot designs. The main idea is to leverage the embodiment graph (left), with joints as nodes and links as edges, as a structural bias in the attention mechanism (right). By operating on per-joint tokens containing local hardware properties and observations, GET flexibly adapts to robots with varying numbers of joints and different graph structures. GET also performs per-joint forward kinematics prediction as a self-modeling meta-task, which empirically improves cross-embodiment transfer.
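One way to picture the structural bias is the sketch below. It is a minimal single-head illustration, not the authors' implementation: it assumes the bias is a learned scalar looked up by shortest-path (hop) distance between joints, in the style of Graphormer. The names `shortest_path_distances`, `graph_biased_attention`, and `bias_table`, and the toy five-joint graph, are hypothetical.

```python
from collections import deque

import numpy as np

def shortest_path_distances(num_joints, edges):
    """All-pairs hop distances on an undirected joint graph (BFS per node).

    Unreachable pairs keep the value -1.
    """
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((num_joints, num_joints), -1, dtype=int)
    for src in range(num_joints):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src, v] < 0:
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist

def graph_biased_attention(tokens, dist, w_q, w_k, w_v, bias_table):
    """Single-head self-attention over per-joint tokens with a graph bias.

    tokens: (num_joints, d) per-joint features.
    bias_table: (num_buckets,) scalars, one per hop distance (learned in a
    real model; passed in here). Distances beyond the table, and
    unreachable pairs, share the last bucket.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    max_bucket = len(bias_table) - 1
    buckets = np.where(dist < 0, max_bucket, np.minimum(dist, max_bucket))
    scores = scores + bias_table[buckets]  # structural bias added to the logits
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy embodiment: joint 0 branches into two chains, 0-1-2 and 0-3-4.
rng = np.random.default_rng(0)
dist = shortest_path_distances(5, [(0, 1), (1, 2), (0, 3), (3, 4)])
tokens = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = graph_biased_attention(tokens, dist, w_q, w_k, w_v,
                                   bias_table=rng.normal(size=4))
print(out.shape)  # (5, 8)
```

Because each joint is its own token, removing a joint only shortens the token sequence and changes the distance matrix; the projection weights and the bias table are reused unchanged.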

GET-Zero Controls Many Designs

With only a single set of network weights, GET-Zero controls many different hand designs, even when we remove joints or fingers or add link length extensions. We compare to a baseline that has neither the graph encoding nor the self-modeling features and find a substantial drop in performance.

All joints present (Training)

GET-Zero (Ours) - 21 degrees/second

Baseline - 12 degrees/second

Shorten thumb and remove middle finger (Zero-shot)

GET-Zero (Ours) - 22 degrees/second

Baseline - 9 degrees/second

Extend finger lengths and remove joint from index finger (Zero-shot)

GET-Zero (Ours) - 19 degrees/second

Baseline - 9 degrees/second

Many Design Variations

In simulation, we explore many more hand designs not seen during training, with joints removed and link lengths extended (orange), and observe that GET-Zero can control a broad range of designs.

Failure Cases

Sim-to-real zero-shot control of a previously unseen hand design is challenging for contact-rich tasks, especially with both graph and link length variations present. We observe a few failure cases with GET-Zero when testing on embodiments not seen during training. [Left] This hand, with a missing index finger and link length extensions, has a gap where the cube can fall through. [Right] This hand has difficulty rotating the cube because its pinky finger has only a single joint, making it difficult to start the rotation cycle.

Cube is dropped

Cube fails to rotate


Citation

@misc{patel2024getzero,
      title={{GET-Zero}: Graph Embodiment Transformer for Zero-shot Embodiment Generalization}, 
      author={Austin Patel and Shuran Song},
      year={2024},
      eprint={2407.15002},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2407.15002}, 
}

Acknowledgment

We would like to thank Kenneth Shaw, Prof. Pathak's lab at CMU, and Prof. Liu's Movement Lab at Stanford for sharing the LEAP Hand hardware. We would also like to thank Huy Ha, Xiaomeng Xu, Mengda Xu and Haochen Shi for their helpful feedback and fruitful discussions. This work was supported in part by a Sloan Fellowship and NSF #2132519. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.