An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-Talker Single Channel Audio-Visual ASR

In this paper, we analyzed how audio-visual speech enhancement can help automatic speech recognition (ASR) in a cocktail party scenario. To this end, we considered two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and ph
