FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

Zhixuan Xu1,2, Yichen Li1,2, Xuanye Wu2,3, Tianyu Qiu2,4, Lin Shao1,2,†
1National University of Singapore  2RoboScience  3Huazhong University of Science and Technology  4South China University of Technology  †Corresponding author

Summary Video

Narration included.

Design of FingerEye

Hardware System Overview


Exploded View

Robustness of Deformation Measurement

Eccentric Force Sensing

Lateral Force Sensing

Light Robustness (50% Cover)

Light Robustness (100% Cover)

Learning Dexterous Manipulation Skills with FingerEye

Coin Standing

Chip Picking

Letter Retrieving

Syringe Manipulation

Rollout Results

Teaser Image

Simulation-Augmented Representation Learning

Method Overview

Simulation-augmented representation learning overview

Method Result

Simulation-augmented representation learning result

Rollout videos (playback speed): Yellow ×2.5, Green ×3, Orange ×2, Red ×2, Purple ×4

Fabrication

Abstract

Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation.
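The abstract describes recovering the pose of a deformed compliant ring from tracked markers and using that deformation as a proxy for the contact wrench. A minimal sketch of that idea, assuming 3D marker positions are already available (e.g. from stereo triangulation) and a linearly calibrated 6×6 stiffness matrix; the function names and the linear small-deformation model are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Kabsch algorithm: find rotation R and translation t that best
    map rest-state marker positions P (N x 3) onto deformed positions
    Q (N x 3) in the least-squares sense."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def wrench_from_deformation(R, t, K):
    """Small-deformation model (assumed): stack the translation and a
    small-angle rotation vector into a 6-vector and map it through a
    calibrated 6x6 stiffness matrix K to get a force/torque estimate."""
    # rotation vector from the skew-symmetric part of R (small angles)
    w = 0.5 * np.array([R[2, 1] - R[1, 2],
                        R[0, 2] - R[2, 0],
                        R[1, 0] - R[0, 1]])
    x = np.concatenate([t, w])             # 6-DoF deformation
    return K @ x                           # estimated wrench [F; tau]
```

With noise-free markers the Kabsch step recovers the ring pose exactly; in practice the markers would come from per-frame detection, and K would be fit by pressing the sensor against a reference force/torque sensor.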

BibTeX

      
        TODO