We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used to estimate a time-frequency (T-F) mask which enhances a speech…
Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models. This work studies the use of submodular functions to design a rank…
There has been growing interest in developing neural network based automatic target recognition systems for synthetic aperture radar applications. However, these networks are typically complex in…
This study proposes a bi-directional recurrent neural network (Bi-RNN) post-processing method for speech enhancement (SE) at low signal-to-noise ratios (SNR). Current speech enhancement solutions…
In recent years, Siamese trackers have achieved great success in visual tracking. Siamese networks can achieve competitive performance in both accuracy and speed. However, they may suffer from the…
Long-distance neuronal communication in the brain is enabled by the interactions across various oscillatory frequencies. One interaction that is gaining importance during cognitive brain functions is…
Acoustic cues are not the only component in speech communication; if the visual counterpart is present, it is shown to benefit speech comprehension. In this work, we propose an end-to-end (no pre- or…
Discriminative models for source separation have recently been shown to produce impressive results. However, when operating on sources outside of the training set, these models cannot perform as…
Due to its ability to visualize and measure the dynamics of vocal tract shaping during speech production, real-time magnetic resonance imaging (rtMRI) has emerged as one of the prominent research…
The gradual adaptation and possibility of divergence have been the two main obstacles in the efficient implementation of conventional adaptive active noise control (ANC) to a wider range of…
In this paper, we study the realization of any given fully-digital precoder (FDP) by hybrid analog/digital precoding (HADP) in wide-band mmWave systems. We first formulate the massive-MIMO OFDM-based…

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant, Asynchronous Decentralized Parallel SGD (AD-PSGD), form a family of distributed learning algorithms that have been demonstrated to perform well for…
Class imbalance in the training data hinders the generalization ability of machine listening systems. In the context of bioacoustics, this issue may be circumvented by aggregating species labels into…
Accented speech poses significant challenges for state-of-the-art automatic speech recognition (ASR) systems. Accent is a property of speech that lasts throughout an utterance in varying degrees of…
Free-form Jacobian of Reversible Dynamics (FFJORD) is a flow-based invertible generative model defined by ordinary differential equations (ODEs). Inspired by WaveGlow, in this paper, we propose…
Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined…
Singing voice synthesis is a generative task that involves not only multidimensional controls of a singer model such as phonetic modulation by lyrics and pitch control by music score but also…
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-…
In manufacturing, the monitoring of the fabrication process is crucial in order to be sure that objects are compliant. For nano-objects, most of this monitoring is done manually. In this paper, we…

Convolutional neural networks (CNNs) have recently been receiving increasing attention for the artificial bandwidth extension (ABE) task. However, these methods use the flipped low-frequency phase to…
IEEE SSCI 2013 Day 3 Prof Wlodzislaw Duch