FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

Zhixuan Xu1,2, Yichen Li1,2, Xuanye Wu2,3, Tianyu Qiu2,4, Lin Shao1,2,†
1National University of Singapore  2RoboScience  3Huazhong University of Science and Technology  4South China University of Technology  †Corresponding author

Summary Video

Narration included.

Design of FingerEye

Hardware System Overview


Exploded View

Robustness of Deformation Measurement

Eccentric Force Sensing

Lateral Force Sensing

Light Robustness (50% Cover)

Light Robustness (100% Cover)

Learning Dexterous Manipulation Skills with FingerEye

Coin Standing

Chip Picking

Letter Retrieving

Syringe Manipulation

Rollout Results

Teaser Image

Simulation-Augmented Representation Learning

Method Overview

Simulation-augmented representation learning overview

Method Result

Simulation-augmented representation learning result

Rollout videos (playback speed): Yellow ×2.5, Green ×3, Orange ×2, Red ×2, Purple ×4

Fabrication

Abstract

Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation.
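The abstract describes recovering the pose of a deformed compliant ring from tracked markers and using that deformation as a proxy for the contact wrench. A minimal sketch of that idea, assuming 3D marker positions are already available (e.g. from stereo triangulation) and a linearly calibrated 6×6 stiffness matrix; the function names and the linear small-deformation model are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Kabsch algorithm: find rotation R and translation t that best
    map rest-state marker positions P (N x 3) onto deformed positions
    Q (N x 3) in the least-squares sense."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def wrench_from_deformation(R, t, K):
    """Small-deformation model (assumed): stack the translation and a
    small-angle rotation vector into a 6-vector and map it through a
    calibrated 6x6 stiffness matrix K to get a force/torque estimate."""
    # rotation vector from the skew-symmetric part of R (small angles)
    w = 0.5 * np.array([R[2, 1] - R[1, 2],
                        R[0, 2] - R[2, 0],
                        R[1, 0] - R[0, 1]])
    x = np.concatenate([t, w])             # 6-DoF deformation
    return K @ x                           # estimated wrench [F; tau]
```

With noise-free markers the Kabsch step recovers the ring pose exactly; in practice the markers would come from per-frame detection, and K would be fit by pressing the sensor against a reference force/torque sensor.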

BibTeX

      
        TODO