Disentangled Speech Embeddings Using Cross-Modal Self-Supervision

The objective of this paper is to learn representations of speaker identity without access to manually annotated data. To do so, we develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in vid
  • IEEE MemberUS $11.00
  • Society MemberUS $0.00
  • IEEE Student MemberUS $11.00
  • Non-IEEE MemberUS $15.00
Purchase

Videos in this product