- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution modeling of prosody by conditioning finer level representations on coarser leve
Real-Time, Universal, And Robust Adversarial Attacks Against Speaker Recognition Systems
As the popularity of voice user interfaces (VUIs) has exploded in recent years, speaker recognition systems have emerged as an important means of identifying a speaker in many security-critical applications and services. In this paper, we propose the first real-
A Simple But Effective BERT Model For Dialog State Tracking On Resource-Limited Systems
In a task-oriented dialog system, the goal of dialog state tracking (DST) is to monitor the state of the conversation from the dialog history. Recently, many deep learning based methods have been proposed for the task. Despite their impressive performance
Speaker-Aware Target Speaker Enhancement By Jointly Learning With Speaker Embedding Extraction
Deep learning based speech separation approaches have received great interest, among which the recent speaker-aware speech enhancement methods are promising for solving difficulties such as arbitrary source permutation and unknown number of sources. In th
WAWEnets: A No-Reference Convolutional Waveform-Based Approach To Estimating Narrowband And Wideband Speech Quality
Building on prior work we have developed a no-reference (NR) waveform-based convolutional neural network (CNN) architecture that can accurately estimate speech quality or intelligibility of narrowband and wideband speech segments. These Wideband Audio Wav
Automatic Fluency Evaluation Of Spontaneous Speech Using Disfluency-Based Features
This paper describes an automatic fluency evaluation of spontaneous speech. Although we regularly observe a variety of different disfluencies in spontaneous speech, we focus on two types of phenomena, i.e., filled pauses and word fragments. This paper aim
A Time-Frequency Network With Channel Attention And Non-Local Modules For Artificial Bandwidth Extension
Convolutional neural networks (CNNs) have recently been receiving increasing attention for the artificial bandwidth extension (ABE) task. However, these methods use the flipped low-frequency phase to reconstruct speech signals, which may lead to the well-kn
Interpretable Self-Attention Temporal Reasoning For Driving Behavior Understanding
Performing driving behaviors based on causal reasoning is essential to ensure driving safety. In this work, we investigated how state-of-the-art 3D Convolutional Neural Networks (CNNs) perform on classifying driving behaviors based on causal reasoning. We
Polarizing Front Ends For Robust CNNs
The vulnerability of deep neural networks to small, adversarially designed perturbations can be attributed to their "excessive linearity." In this paper, we propose a bottom-up strategy for attenuating adversarial perturbations using a nonlinear front end
A Novel Rank Selection Scheme In Tensor Ring Decomposition Based On Reinforcement Learning For Deep Neural Networks
Tensor decomposition has proved effective for solving many problems in signal processing and machine learning. Recently, tensor decomposition has shown advantages for compressing deep neural networks. In many applications of deep neural networks
Voice Based Classification Of Patients With Amyotrophic Lateral Sclerosis, Parkinson's Disease And Healthy Controls With CNN-LSTM Using Transfer Learning
In this paper, we consider 2-class and 3-class classification problems for classifying patients with Amyotrophic Lateral Sclerosis (ALS), Parkinson's Disease (PD), and Healthy Controls (HC) using a CNN-LSTM network. Classification performance is examined
Automatic Event Detection Of REM Sleep Without Atonia From Polysomnography Signals Using Deep Neural Networks
Rapid eye movement (REM) sleep behavior disorder (RBD) is a sleep disorder that features loss of atonia, or REM sleep without atonia (RSWA). RBD and RSWA are early manifestations of degenerative neurological diseases such as Parkinson's and Lewy Body Deme
Mahalanobis Distance Based Adversarial Network For Anomaly Detection
Anomaly detection techniques are crucial in multiple business applications, such as cyber security, manufacturing and finance. However, developing anomaly detection methods for high-dimensional data with high speed and good performance is still a cha
Time-Domain Neural Network Approach For Speech Bandwidth Extension
In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-freq
Exploiting Two-Dimensional Symmetry And Unimodality For Model-Free Source Localization In Harsh Environment
Knowing the location of a transceiver may enable advanced radio resource management strategies in sensing and communication networks. However, there are many scenarios where users operate in a non-cooperative mode with no localization-dedicated signaling
Multi-Head Attention For Speech Emotion Recognition With Auxiliary Learning Of Gender Recognition
The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) using Log mel-Filter Bank Energies (LFBE) spectral features as the input. The multi-head attention along with the position embedding jointly attends to in
Generalized Graph Spectral Sampling With Stochastic Priors
We consider generalized sampling for stochastic graph signals. The generalized graph sampling framework allows recovery of graph signals beyond the bandlimited setting by placing a correction filter between the sampling and reconstruction operators and as
View-Angle Invariant Object Monitoring Without Image Registration
Object monitoring can be performed by change detection algorithms. However, for the image pair with a large perspective difference, the change detection performance is usually impacted by inaccurate image registration. To address the above difficulties, a
Spoken Language Acquisition Based On Reinforcement Learning And Word Unit Segmentation
The process of spoken language acquisition has been one of the topics that has attracted the greatest interest from linguists for decades. By utilizing modern machine learning techniques, we simulated this process on computers, which helps to understand the
Investigation Of Methods To Improve The Recognition Performance Of Tamil-English Code-Switched Data In Transformer Framework
Code-switching (CS) refers to (inter/intra-word) switching between multiple languages in a single conversation. In multilingual countries like India, CS occurs very often in everyday speech, resulting in a new breed of languages in urban regions like Hing
Video Deblurring Via 3D CNN And Fourier Accumulation Learning
Camera shake and target movement often lead to undesirable image blurring in videos. How to exploit spatial-temporal information of adjacent frames and reduce the processing time of deblurring are two major issues in video deblurring. In this paper, we p
HydraNet: A Real-Time Waveform Separation Network
Real-time source separation has become increasingly important, as more and more applications, such as voice recognition and voice commands, require clean audio input in noisy environments. Recent developments in deep learning have allowed models to direct
Computability Of The Peak Value Of Bandlimited Signals
In this paper we study the peak value problem, i.e., the task of computing the peak value of a bandlimited signal from its samples. The peak value problem is important, for example, in communications, where the peak value of the transmit signal has to be
Meta-Learning Extractors For Music Source Separation
We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. This enables efficient parameter-sharing, while still allowing for i
Unsupervised Variational Bayesian Kalman Filtering For Large-Dimensional Gaussian Systems
This paper considers the unsupervised filtering problem for large-dimensional linear and Gaussian systems, a setup in which the optimal Kalman filter (KF) might not be usable due to the exorbitant computational cost and storage requirements. For this prob
Cross-Stained Segmentation From Renal Biopsy Images Using Multi-Level Adversarial Learning
Segmentation from renal pathological images is a key step in automatically analyzing renal histological characteristics. However, the performance of models varies significantly across differently stained datasets due to appearance variations. In th
Multi-Conditioning And Data Augmentation Using Generative Noise Model For Speech Emotion Recognition In Noisy Conditions
Degradation due to additive noise is a significant roadblock in the real-life deployment of Speech Emotion Recognition (SER) systems. Most of the previous work in this field dealt with the noise degradation either at the signal or at the feature level. I
3D Unknown View Tomography Via Rotation Invariants
In this paper, we study the problem of reconstructing a 3D point source model from a set of 2D projections at unknown view angles. Our method obviates the need to recover the projection angles by extracting a set of rotation-invariant features from the no
Hierarchical Sequence Representation With Graph Network
Video classification is a challenging task in computer vision. Performance on this task relies heavily on the scale of the training data and the effectiveness of video embedding via a robust embedding network. Unsupervised solutions such as feat
High-Resolution Attention Network With Acoustic Segment Model For Acoustic Scene Classification
The spectral information of acoustic scenes is diverse and complex, which poses challenges for acoustic scene tasks. To improve the classification performance, a variety of convolutional neural networks (CNNs) are proposed to extract richer semantic infor
The FifthNet Chroma Extractor
Deep Learning (DL) is now commonly used in music processing such as Automatic Chord Recognition (ACR), with Convolutional Neural Networks (CNN) being popular in such tasks. Compression of CNNs has become a research topic of interest, focussed on post-prun
Projected Weight Regularization To Improve Neural Network Generalization
Generalization of a deep neural network (DNN) is one major concern when employing the deep learning approach for solving practical problems. In this paper we propose a new technique, named projected weight regularization (PWR), to improve the generalizati
An Acoustic Modelling Based Remote Error Sensing Approach For Quiet Zone Generation In A Noisy Environment
Remote error sensing is required in active noise control systems when they are used to create a quiet zone in a noisy environment with the constraint that the error microphones cannot be inside the zone. The challenge in remote error sensing is to estimat
Performance Analysis And Constellation Optimization Of Star-QAM-Aided Differential Faster-Than-Nyquist Signaling
In this letter, motivated by the recent differential faster-than-Nyquist (DFTN) signaling concept, we propose an improved 16-point double-ring star quadrature amplitude modulation (QAM)-aided DFTN signaling transmission, which allows us to attain a higher
Direction Of Arrival Estimation For Reverberant Speech Based On Enhanced Decomposition Of The Direct Sound
Direction of arrival (DOA) estimation for speech sources is an important task in audio signal processing. This task becomes a challenge in reverberant environments, which are typical to real scenarios. Several DOA estimation methods for speech sources hav
Realistic Real-Time Voice Swapping From Single Unpaired Sentences
We demonstrate a system that allows two speakers to swap their voices from any two unpaired sentences such that the result is indistinguishable from real voices and performed in real-time on a laptop. Each of the two speakers takes turns pronouncing any u
Machine Learning-Based Adaptive Receive Filtering: Proof-Of-Concept On An SDR Platform
The constant demand for low latency and high data rates in a modern mobile communications network creates new scientific challenges in each new generation. An accurate reconstruction of transmission data of as many users as possible at the base station is
An Empirical Study On Acoustic Feedback Path Across Hearing Aid Users
Acoustic feedback is one of the major problems in hearing aid applications. During a fitting session of a modern hearing aid, typically a feedback path prediction or an in situ measurement of feedback path is used as part of the gain and earpiece prescrip
Fast Block-Sparse Estimation For Vector Networks
While there is now a significant literature on sparse inverse covariance estimation, all that literature, with only a couple of exceptions, has dealt only with univariate (or scalar) networks where each node carries a univariate signal. However in many, p
On Modeling ASR Word Confidence
We present a new method for computing ASR word confidences that effectively mitigates the effect of ASR errors for diverse downstream applications, improves the word error rate of the 1-best result, and allows better comparison of scores across different
Improving Reverberant Speech Training Using Diffuse Acoustic Simulation
We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically-based acoustic simulation method is capable of modeling occlusion, specular a
Single-Shot Real-Time Multiple-Path Time-Of-Flight Depth Imaging For Multi-Aperture And Macro-Pixel Sensors
Multiple-Path Interference (MPI) is a major drawback of Time-of-Flight (ToF) sensors. MPI occurs when a ToF pixel receives more than a single light bounce from the scene. Current methods resolving more than a single return per pixel rely on the sequential
Design-GAN: Cross-Category Fashion Translation Driven By Landmark Attention
The rise of generative adversarial networks has sparked great interest in the field of fashion image-to-image translation. However, previous methods do not perform well in cross-category translation tasks, e.g., translating jeans to skirts in fashion ima
Near-Optimal Interference Exploitation 1-Bit Massive MIMO Precoding Via Partial Branch-And-Bound
In this paper, we focus on 1-bit precoding for large-scale antenna systems in the downlink based on the concept of constructive interference (CI). By formulating the optimization problem that aims to maximize the CI effect subject to the 1-bit constraint
Theoretical Analysis Of Multi-Carrier Agile Phased Array Radar
Modern radar systems are expected to operate reliably in congested environments under cost and power constraints. A recent technology for realizing such systems is frequency agile radar (FAR), which transmits narrowband pulses in a frequency hopping manne
A Bin Encoding Training Of A Spiking Neural Network Based Voice Activity Detection
Advances in deep learning for Artificial Neural Networks (ANNs) have led to significant improvements in the performance of digital signal processing systems implemented on digital chips. Although recent progress in low-power chips is remarkable, neuromorph