Generating body movements based on given music audio recordings is an emerging research topic. This problem remains challenging particularly for string instruments, considering the fact that the…

In this highlight reel of her keynote, Ms Hoffman discusses the impact of technology on an evolving grid, and the safety perspective that is needed when looking at the many components involved.

Fusion of multi-channel representations has played a crucial role in the success of correlation filter (CF) based trackers. However, not all channels contain useful information for target localization…
Time-based sampling of continuous-time signals is an alternative to Shannon's sampling paradigm in which the signal is encoded using a sequence of nonuniform time instants. The standard methods for…
In this paper, we describe a method to collect dialectal speech from YouTube videos to create a large-scale Dialect Identification (DID) dataset. Using this method, we collected dialectal Arabic from…
In recent years, the interest in kernel methods has increased exponentially, mainly due to applications including phenomena that cannot be well modeled by linear systems. Furthermore, the demand for…
A recent publication introduced the Directional Feedback Delay Network, a parametric artificial reverberation algorithm capable of producing direction-dependent energy decay. This method extends the…
Speaker extraction aims to separate a target speaker from multiple voices, which is useful for applications such as teleconferencing. In many practical cases, there is an opportunity to obtain a piece of voice of…
Weighted canonical polyadic (CP) tensor decomposition appears in a wide range of applications. A typical situation where the weighted decomposition is needed is when some tensor elements are unknown…
Recently, the Transformer has gained success in the automatic speech recognition (ASR) field. However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online speech recognition. In…
Audio-Visual Speech Recognition (AVSR) faces the difficult task of exploiting acoustic and visual cues simultaneously. Augmenting speech with the visual channel creates its own challenges, e.g. every…
GNSS is widely used to provide positions in an absolute reference frame in Unmanned Aerial Vehicles (UAV) and Unmanned Ground Vehicles (UGV), where GNSS is merged with the information provided by…
Polyphonic sound event detection and direction-of-arrival estimation require different input features from audio signals. While sound event detection mainly relies on time-frequency patterns,…
Locating perceptually similar sound events within a continuous recording is a common task for various audio applications. However, current tools require users to manually listen to and label all the…
The bandwidth of a bandlimited signal is a key quantity that is relevant in numerous applications. For example, it determines the minimum sampling rate that is necessary to reconstruct a bandlimited…
Image segmentation is a ubiquitous step in almost any medical image study. Deep learning-based approaches achieve state-of-the-art results on the majority of image segmentation benchmarks. However, end-to-…
Spatio-temporal event data are becoming increasingly commonplace in a wide variety of applications, such as electronic transaction records, social network data, and crime data. How to efficiently…
We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used to estimate a time-frequency (T-F) mask that enhances a speech…