Asr Is All You Need: Cross-Modal Distillation For Lip Reading

The goal of this work is to train strong models for visual speech recognition without requiring human annotated ground truth data. We achieve this by distilling from an Automatic Speech Recognition (ASR) model that has been trained on a large-scale audio-
  • IEEE MemberUS $11.00
  • Society MemberUS $0.00
  • IEEE Student MemberUS $11.00
  • Non-IEEE MemberUS $15.00
Purchase

Videos in this product