- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
Multi-Speaker And Multi-Domain Emotional Voice Conversion Using Factorized Hierarchical Variational Autoencoder
Due to the complexity of emotional features, there has been limited success in emotional voice conversion. One major challenge is that conversion between more than two kinds of emotions often accompanies distortion of voice signal. The factorized hierarch
Real-Time Binaural Speech Separation With Preserved Spatial Cues
Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of separated speech from mixed audio. Most previous methods focused on generating a single-channel output for each of the target speakers,
What Makes The Sound?: A Dual-Modality Interacting Network For Audio-Visual Event Localization
The presence of auditory and visual senses enables humans to obtain a profound understanding of the real-world scenes. While audio and visual signals are capable of providing scene knowledge individually, the combination of both offers a better insight ab
Exploiting Periodicity Features For Joint Detection And DOA Estimation Of Speech Sources Using Convolutional Neural Networks
While many algorithms deal with direction of arrival (DOA) estimation and voice activity detection (VAD) as two separate tasks, only a small number of data-driven methods have addressed these two tasks jointly. In this paper, a multi-input single-output c
Multi-Patch Aggregation Models For Resampling Detection
Images captured nowadays are of varying dimensions with smartphones and DSLRs allowing users to choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well for images of
A Partial Relaxation DOA Estimator Based On Orthogonal Matching Pursuit
A family of computationally efficient DOA estimators under the partial relaxation framework has recently been proposed. In this framework, the manifold structure of the "interfering" signals is relaxed, and only the manifold structure of one desired sign
EEG Feature Selection Using Orthogonal Regression: Application To Emotion Recognition
A common drawback of EEG applications is that the volume conduction of the human head leads to a lot of redundant information in EEG recordings. To reduce the redundancy and choose informative EEG features, in this paper, we propose an EEG feature selectio
Joint Source-Channel Coding And Bayesian Message Passing Detection For Grant-Free Radio Access In IoT
Consider an Internet-of-Things (IoT) system that monitors a number of multi-valued events through multiple sensors sharing the same bandwidth. Each sensor measures data correlated to one or more events, and communicates to the fusion center at a base stat
Generating And Protecting Against Adversarial Attacks For Deep Speech-Based Emotion Recognition Models
The development of deep learning models for speech emotion recognition has become a popular area of research. Adversarially generated data can cause false predictions, and in an endeavor to ensure model robustness, defense methods against such attacks sho
Generalized Coherence-Based Signal Enhancement
This contribution presents a novel approach for coherence-based signal enhancement. An estimator for the coherent-to-diffuse ratio (CDR) is devised, which exploits the concept of generalized magnitude coherence and thus, unlike common state-of-the-art sch
Self-Supervised Denoising Autoencoder With Linear Regression Decoder For Speech Enhancement
Nonlinear spectral mapping-based models based on supervised learning have been successfully applied for speech enhancement. However, as supervised learning approaches, a large amount of labelled data (noisy-clean speech pairs) should be provided to train those
Dysarthric Speech Recognition With Lattice-Free MMI
Recognising dysarthric speech is a challenging problem as it differs in many aspects from typical speech, such as speaking rate and pronunciation. In the literature the focus so far has largely been on handling these variabilities in the framework of HMM/
Towards Unsupervised Speech Recognition And Synthesis With Quantized Speech Representation Learning
In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances. This is achieved by proper
Error Analysis Applied To End-To-End Spoken Language Understanding
This paper presents a qualitative study of errors produced by an end-to-end spoken language understanding (SLU) system (speech signal to concepts) that reaches state-of-the-art performance. Different studies are proposed to better understand the weaknesses
BERT Is Not All You Need For Commonsense Inference
This paper studies the task of commonsense inference, especially natural language inference (NLI) and causal inference (CI), requiring knowledge beyond what is stated in the input sentences. State-of-the-art approaches have been neural models powered with knowledge
A Recursive Edge Detector For Color Filter Array Image
Most embedded cameras use a single sensor to capture images through a color filter. They produce special images with only one color component per pixel. Missing data are usually estimated through a demosaicking process, but this takes undesirable compu
Learning Fractional Orthogonal Latent Consistent Features For Face Hallucination And Recognition
Face hallucination (FH) is a powerful technique to reconstruct high-resolution (HR) faces from low-resolution (LR) faces. Most conventional FH techniques ignore the influence of small training data, which may lead to the bias of variance and covariance
Track-Before-Detect For Sub-Nyquist Radar
Sub-Nyquist radars require fewer measurements, facilitating low-cost design, flexible resource allocation, etc. By applying compressed sensing (CS) methods, such radars achieve performance close to that of traditional Nyquist radars. However, in low signal-to-noise
Performance Comparison Of Lossless Compression Strategies For Dynamic Vision Sensor Data
Dynamic Vision Sensors (DVS) are emerging neuromorphic visual capturing devices, with great advantages in terms of low power consumption, wide dynamic range, and high temporal resolution in diverse applications. The capturing method results in lower data
Independent-Variation Matrix Factorization With Application To Energy Disaggregation
Matrix factorization techniques have recently been applied to Non-Intrusive Load Monitoring (NILM), the process of breaking down the total electric consumption of a building into consumptions of individual appliances. While several studies addressed the N
Non-Iterative Subspace-Based DOA Estimation In The Presence Of Nonuniform Noise
The uniform white noise assumption is one of the basic assumptions in most of the existing direction-of-arrival (DOA) estimation methods. In many applications, however, the nonuniform white noise model is more adequate. Then, the noise variances at differ
Demo Proposal For Data-Driven Symbol Detection And Online Channel Tracking Via Model-Based Machine Learning
Recent years have witnessed dramatically growing interest in machine learning (ML). These data-driven methods have demonstrated unprecedented success in various applications. The benefits of ML over traditional model-based approaches are twofold: Fir
Is Your Hearing Aid Algorithm Really Working?
When developing new hearing aid algorithms, it is difficult to systematically evaluate their performance in realistic everyday situations. Laboratory setups and computer simulations are usually limited to very controlled and simple acoustic situatio
An Improved Solution To The Frequency-Invariant Beamforming With Concentric Circular Microphone Arrays
Frequency-invariant beamforming with circular microphone arrays (CMAs) has drawn a significant amount of attention for its steering flexibility and high directivity. However, frequency-invariant beamforming with CMAs often suffers from the so-called null
Bridging Mixture Density Networks With Meta-Learning For Automatic Speaker Identification
Speaker identification answers the fundamental question "Who is speaking?" The identification technology enables downstream applications to provide a personalized experience. Both the prevalent i-vector based solutions and deep learning solutions usually
Age Of Information With Finite Horizon And Partial Updates
A resource-constrained system monitors a source of information by requesting a finite number of updates subject to random transmission delays. An a priori fixed update request policy is shown to minimize a polynomial penalty function of the age of informa
Complexity Reduction Methods For Index Modulation Based Dual-Function Radar Communication Systems
Dual-function radar communication (DFRC) systems implement both sensing and communication using the same hardware. An emerging DFRC strategy embeds transmission of digital messages into agility-based radar schemes in the form of index modulation (IM). Thi
Deep Monocular Video Depth Estimation Using Temporal Attention
Monocular video depth estimation (MVDE) plays a crucial role in 3D computer vision. In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. Our network starts with a motion compensation module where the
Fully Learnable Front-End For Multi-Channel Acoustic Modeling Using Semi-Supervised Learning
In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a sim
T-GSA: Transformer With Gaussian-Weighted Self-Attention For Speech Enhancement
Transformer neural networks (TNN) demonstrated state-of-the-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs), such as LSTMs or GRUs. However, TNNs did not perform well in speech enhancement, whose con
Adaptation And Learning In Multi-Task Decision Systems
Adaptation and learning over multi-agent networks is a topic of great relevance with important implications. Elaborating on previous works on single-task networks engaged in decision problems, here we consider the multi-task version in the challenging sce
Colour Compression Of Plenoptic Point Clouds Using RAHT-KLT With Prior Colour Clustering And Specular/Diffuse Component Separation
The recently introduced plenoptic point cloud representation marries a 3D point cloud with a light field. Instead of each point being associated with a single colour value, there can be multiple values to represent the colour at that point as perceived fr
Self-Supervised Learning For ECG-Based Emotion Recognition
We present an electrocardiogram (ECG)-based emotion recognition system using self-supervised learning. Our proposed architecture consists of two main networks, a signal transformation recognition network and an emotion recognition network. First, unlabel
Deep Learning Abilities To Classify Intricate Variations In Temporal Dynamics Of Multivariate Time Series
The aim of this work is to investigate the ability of deep learning (DL) architectures to learn temporal dynamics in multivariate time series. The methodology consists of using well-known synthetic stochastic processes for which changes in joint temporal
An Attention Enhanced Multi-Task Model For Objective Speech Assessment In Real-World Environments
Computational objective metrics that use reference signals have been shown to be effective forms of speech assessment in simulated environments, since they are correlated with subjective listening studies. Recent efforts have been dedicated towards effect
An Early Termination Scheme For Successive Cancellation List Decoding Of Polar Codes
In order to minimize the decoding period and the response time for Polar Codes, an early termination (ET) scheme based on additional check points (ACPs) is proposed in this work. For conventional ET schemes based on distributed parity-check (PC) bits, ET
CLCNet: Deep Learning-Based Noise Reduction For Hearing Aids Using Complex Linear Coding
Noise reduction is an important part of modern hearing aids and is included in most commercially available devices. Deep learning-based state-of-the-art algorithms, however, either do not consider real-time and frequency resolution constraints or result in
Translation Of A Higher Order Ambisonics Sound Scene Based On Parametric Decomposition
This paper presents a novel 3DoF+ system that allows users to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound capturing at a s
Leveraging GANs To Improve Continuous Path Keyboard Input Models
Continuous path keyboard input has higher inherent ambiguity than standard tapping, because the path trace may exhibit not only local overshoots/undershoots (as in tapping) but also, depending on the user, substantial mid-path excursions. Deploying a robu
Scalable Learning-Based Sampling Optimization For Compressive Dynamic MRI
Compressed sensing applied to magnetic resonance imaging (MRI) reduces the scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reco
Design Of A Convergence-Aware Based Expectation Propagation Algorithm For Uplink MIMO SCMA Systems
Sparse code multiple access (SCMA) uses multi-dimensional sparse codewords to transmit user data. The expectation propagation algorithm (EPA) exploiting the sparse property shows linear complexity growth and thus is preferred for multi-user detection. To
Enhancement Of Coded Speech Using A Mask-Based Post-Filter
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven postfilter relying on masking in the time-frequency doma
Diagonalizable Shift And Filters For Directed Graphs Based On The Jordan-Chevalley Decomposition
Graph signal processing on directed graphs poses theoretical challenges since an eigendecomposition of filters is in general not available. Instead, Fourier analysis requires a Jordan decomposition and the frequency response is given by the Jordan normal
A-CRNN: A Domain Adaptation Model For Sound Event Detection
This paper presents a domain adaptation model for sound event detection. A common challenge for sound event detection is how to deal with the mismatch among different datasets. Typically, the performance of a model will decrease if it is tested on a datas
Efficient Multichannel Nonlinear Acoustic Echo Cancellation Based On A Cooperative Strategy
While a common approach to addressing nonlinear distortions emitted by multiple loudspeakers and observed by multiple microphones is to use post-filtering techniques, this paper proposes a cooperative strategy that instead models and then cancels such distortio
Using Speech Synthesis To Train End-To-End Spoken Language Understanding Models
End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio, without employing the standard pipeline composed of a separately trained speech recognize
Efficient Deep Learning-Based Lossy Image Compression Via Asymmetric Autoencoder And Pruning
Recently, deep learning-based lossy image compression methods have been proposed. However, their efficiency in terms of storage and computational costs has not been addressed adequately. In this paper, we propose efficient lossy image compression methods
Time-Scale Synthesis For Locally Stationary Signals
We develop a time-scale synthesis-based probabilistic approach for the modeling of locally stationary signals. Inspired by our previous work, the model involves zero-mean, complex Gaussian wavelet coefficients, whose distribution varies as a function of t