FingerEye: Learning Dexterous Manipulation with Continuous Vision-Tactile Sensing

Zhixuan Xu1,2*, Yichen Li1,2*, Xuanye Wu2,3, Tianyu Qiu2,4, Lin Shao1,2†
1National University of Singapore 2RoboScience 3Huazhong University of Science and Technology 4South China University of Technology *Equal contribution †Corresponding author

Summary Video

Narration included.

FingerEye Sensing Interface

Hardware System Overview


Exploded View

Sensing Capability: Robust Pose Estimation

Eccentric Force Sensing

Lateral Force Sensing

Light Robustness (50% Cover)

Light Robustness (100% Cover)

FingerEye Learning Interface Evaluation

Coin Standing

Chip Picking

Letter Retrieving

Syringe Manipulation

Key Observations

O1

FingerEye beats wrist-only

Adding full binocular FingerEye to wrist vision raises mean success from 26.7% to 65.9% in simulation and from 37.5% to 71.3% in the real world.

O2

Binocular beats monocular

Full binocular FingerEye outperforms the monocular variant in mean success: 65.9% vs. 59.1% in simulation and 71.3% vs. 56.3% in the real world.

O3

Continuous beats post-contact

Post-contact tactile maps stay near wrist-only performance in simulation mean success, 24.1% vs. 26.7%, while wrist + FingerEye reaches 65.9%.

O4

Group fusion is strongest

GEnc+GDec reaches 65.9% simulation mean success, above GEnc+FDec at 59.8% and NoEnc+FDec, the strongest non-grouped baseline, at 52.0%.
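The gains behind O1 and O2 can be checked with a few lines of arithmetic. The sketch below uses only the mean success rates quoted above; the dictionary layout and the helper name `gain_pp` are illustrative choices, not part of the FingerEye codebase.

```python
# Mean success rates (%) as quoted in observations O1 and O2.
# The variant keys are illustrative labels, not names from the paper's code.
MEAN_SUCCESS = {
    "sim": {"wrist_only": 26.7, "monocular": 59.1, "binocular": 65.9},
    "real": {"wrist_only": 37.5, "monocular": 56.3, "binocular": 71.3},
}


def gain_pp(domain: str, variant: str, baseline: str) -> float:
    """Return the gain of `variant` over `baseline` in percentage points."""
    rates = MEAN_SUCCESS[domain]
    return round(rates[variant] - rates[baseline], 1)


if __name__ == "__main__":
    for domain in MEAN_SUCCESS:
        over_wrist = gain_pp(domain, "binocular", "wrist_only")
        over_mono = gain_pp(domain, "binocular", "monocular")
        print(f"{domain}: +{over_wrist} pp over wrist-only, "
              f"+{over_mono} pp over monocular")
```

Both wrist-only gains come out above 30 percentage points, and the binocular advantage over the monocular variant is larger in the real world than in simulation.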

Fabrication Details

Abstract

Dexterous robotic manipulation requires perception that remains informative from pre-contact approach to contact initiation and post-contact control. We introduce FingerEye, a sensing and learning framework that strengthens robotic dexterity through continuous vision-tactile feedback throughout interaction. On the sensing side, FingerEye integrates binocular RGB cameras with a compliant contact interface to support perception both before and after contact. Before contact, the fingertip cameras provide close-range visual cues and implicit stereo for precise approach and object localization. After contact, marker-tracked deformation of the compliant ring provides a proxy for contact wrench sensing. On the learning side, we build real-and-sim infrastructure for data collection and evaluation, systematically study policy-interface designs for learning with multiple FingerEye sensors, and develop FingerEye Policy, which applies group-structured modality fusion to reduce modality shortcuts and better exploit distributed fingertip feedback. Across seven contact-sensitive task settings, FingerEye improves on a wrist-only policy by over 30 percentage points in mean success rate in both simulation and the real world.
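To make the idea of marker-tracked deformation concrete, here is a minimal sketch of one generic way to summarize marker motion in the image plane: a centroid shift plus a least-squares in-plane rotation. This is an assumption for illustration only and is not taken from the paper; the function name `ring_motion`, the 2D marker coordinates, and the interpretation of the outputs are all hypothetical. Mapping these quantities to an actual wrench would additionally require a calibration that is not shown here.

```python
import math


def ring_motion(rest, moved):
    """Summarize 2D marker motion as (dx, dy, theta).

    rest, moved: equal-length lists of (x, y) marker positions, matched by
    index, before and after deformation. Hypothetical interface for
    illustration only.
    """
    n = len(rest)
    # Centroids of the two marker sets.
    cx0 = sum(x for x, _ in rest) / n
    cy0 = sum(y for _, y in rest) / n
    cx1 = sum(x for x, _ in moved) / n
    cy1 = sum(y for _, y in moved) / n

    # Accumulate cross and dot products of the centered point pairs.
    # atan2(cross, dot) is the least-squares rotation angle in 2D.
    cross = 0.0
    dot = 0.0
    for (x0, y0), (x1, y1) in zip(rest, moved):
        ax, ay = x0 - cx0, y0 - cy0
        bx, by = x1 - cx1, y1 - cy1
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by

    theta = math.atan2(cross, dot)
    return cx1 - cx0, cy1 - cy0, theta


if __name__ == "__main__":
    # Four synthetic markers, rotated by 0.1 rad and shifted by (0.5, -0.2).
    rest = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    c, s = math.cos(0.1), math.sin(0.1)
    moved = [(c * x - s * y + 0.5, s * x + c * y - 0.2) for x, y in rest]
    print(ring_motion(rest, moved))
```

Under this reading, the centroid shift would loosely track lateral loading and the rotation would loosely track torque about the camera axis, but those correspondences are assumptions rather than claims about the FingerEye sensor.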

BibTeX

      
@misc{xu2026fingereyecontinuousunifiedvisiontactile,
      title={FingerEye: Learning Dexterous Manipulation with Continuous Vision-Tactile Sensing},
      author={Zhixuan Xu and Yichen Li and Xuanye Wu and Tianyu Qiu and Lin Shao},
      year={2026},
      eprint={2604.20689},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.20689}, 
}