- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
An Empirical Study Of Transformer-Based Neural Language Model Adaptation
We explore two adaptation approaches of deep Transformer based neural language models (LMs) for automatic speech recognition. The first approach is a pretrain-finetune framework, where we first pretrain a Transformer LM on a large-scale text corpus from s
One-Shot Parametric Audio Production Style Transfer With Application To Frequency Equalization
Audio production is a difficult process for many people, and properly manipulating sound to achieve a certain effect is non-trivial. In this paper, we present a method that facilitates this process by inferring appropriate audio effect parameters in order
Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss
In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences i
Decentralized Min-Max Optimization: Formulations, Algorithms And Applications In Network Poisoning Attack
This paper discusses formulations and algorithms which allow a number of agents to collectively solve problems involving both (non-convex) minimization and (concave) maximization operations. These problems have a number of interesting applications in info
Cross-VAE: Towards Disentangling Expression From Identity For Human Faces
Facial expression and identity are two independent yet intertwined components for representing a face. For facial expression recognition, identity can contaminate the training procedure by providing tangled but irrelevant information. In this paper, we pr
TOSO: Student's-T Distribution Aided One-Stage Orientation Target Detection In Remote Sensing Images
In this paper, a robust Student's-T distribution aided One-Stage Orientation detector, namely TOSO, is proposed to address orientation target detection in remote sensing images. A one-stage keypoint based network architecture is used to avoid the complica
Adversarial Example Detection By Classification For Deep Speech Recognition
Machine Learning systems are vulnerable to adversarial attacks and will highly likely produce incorrect outputs under these attacks. There are white-box and black-box attacks regarding to adversary's access level to the victim learning algorithm. To defen
AlignTTS: Efficient Feed-Forward Text-To-Speech System Without Explicit Alignment
Targeting at both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates mel-spectrum from a sequence of characters, and the duration of each character
Weakly Supervised Segmentation Guided Hand Pose Estimation During Interaction With Unknown Objects
Hand pose estimation is important for human computer interaction, but the performance is not satisfying when the hand is interacting with objects. To alleviate the influence of unknown objects, we propose a novel weakly supervised segmentation guided sche
Deep Geometric Knowledge Distillation With Graphs
In most cases deep learning architectures are trained disregarding the amount of operations and energy consumption. However, some applications, like embedded systems, can be resource-constrained during inference. A popular approach to reduce the size of a
Audio-Assisted Image Inpainting For Talking Faces
The goal of our work is to complete missing areas of images of talking faces, exploiting information from both the visual and audio modalities. Existing image inpainting methods rely solely on visual content that doesn't always provide sufficient informat
Improving Spoken Question Answering Using Contextualized Word Representation
While question answering (QA) systems have witnessed great breakthroughs in reading comprehension (RC) tasks, spoken question answering (SQA) is still a much less investigated area. Previous work shows that existing SQA systems are limited by catastrophic
Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR
The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that attempts to learn the prediction of speech and noise using a paired corpus of clean and noisy recordings (teacher model). In this paper, we att
Learning Noise Invariant Features Through Transfer Learning For Robust End-To-End Speech Recognition
End-to-end models yield impressive speech recognition results on clean datasets while having inferior performance on noisy datasets. To address this, we propose transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME-4) for connectionist te
Prediction Of Vessel Trajectories From AIS Data Via Sequence-To-Sequence Recurrent Neural Networks
In this paper, we address the problem of predicting vessel trajectories based on Automatic Identification System (AIS) data. The goal is to learn the predictive distribution of maritime traffic patterns using historical data during the training phase, in
ConFirmNet: Convolutional FirmNet And Application To Image Denoising And Inpainting
We address the problem of efficient convolutional sparse coding (CSC) and develop a non-convex-penalty-regularized CSC formulation, namely, minimax-concave CSC (MC2SC). MC2SC leads to an optimal sparse representation than the standard ell_1-penalty based
On The Stability Of Polynomial Spectral Graph Filters
Spectral graph filters are a key component in state-of-the-art machine learning models used for graph-based learning, such as graph neural networks. For certain tasks stability of the spectral graph filters is important for learning suitable representatio
A Method For Millimeter-Wave Imaging Of Concealed Objects Via De-Aliasing
We consider the problem of millimeter-wave (MMW) imaging for concealed objects using a transceiver antenna array. In practical implementations, larger array element spacing leads to aliasing in the spectrum of the received echo signals. In this paper, we
Key Action And Joint CTC-Attention Based Sign Language Recognition
Sign Language Recognition (SLR) translates sign language video into natural language. In practice, sign language video contains a large number of redundant frames, so the essential frames must be selected. However, unlike common video that describes actio
Stochastic ML Estimation For Hyperspectral Unmixing Under Endmember Variability And Nonlinear Models
Hyperspectral unmixing (HU) is a problem of blindly identifying the underlying materials, in form of spectral signatures, in the captured hyperspectral image. HU has received tremendous interest in remote sensing, and fundamentally the problem can be rega
Uncertainty Quantification For Remaining Useful Lifetime Prediction With Multi-Channel Sensory Data
For remaining useful lifetime (RUL) prediction with multi-channel sensory data, long-term prediction has more uncertainty than short-term prediction. In this paper, the ratio of mean to variance was considered to measure the uncertainty propagation rate (
Limitations Of Weak Labels For Embedding And Tagging
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains
GraphEM: EM Algorithm For Blind Kalman Filtering Under Graphical Sparsity Constraints
Modeling and inference with multivariate sequences is central in a number of signal processing applications such as acoustics, social network analysis, biomedical, and finance, to name a few. The linear-Gaussian state-space model is a common way to descri
A Deep Neural Network-Driven Feature Learning Method For Polyphonic Acoustic Event Detection From Real-Life Recordings
In this paper, a Deep Neural Network (DNN)-driven feature learning method for polyphonic Acoustic Event Detection (AED) is proposed. The proposed DNN is a combination of different layers used to characterize multiple overlapped acoustic events in the mixt
Swift-Link: A Compressive Beam Alignment Algorithm For Practical mmWave Radios
Millimeter wave (mmWave) bands offer a large amount of spectrum that can support many high data rate applications. To efficiently use the spectrum at mmWave, the wireless link between the transmitting and receiving radios must be configured properly. Comp
SylNet: An Adaptable End-To-End Syllable Count Estimator For Speech
Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learn
Curriculum Learning For Speech Emotion Recognition From Crowdsourced Labels
This study introduces a method to design a curriculum for machine-learning to maximize the efficiency during the training process of deep neural networks (DNNs) for speech emotion recognition. Previous studies in other machine-learning problems have shown
Cognitive Joint MIMO Radar MIMO Communications (C-MRMC) Prototype
Cognitive radars optimize both transmit and receive processing to adjust to the dynamic target environment while aiming to enhance their behavioral agility by learning through experience in sensing and processing. In classical radar systems, the adaptabil
Secure Face Recognition In Edge And Cloud Networks: From The Ensemble Learning Perspective
Offloading the computationally intensive workloads to the edge and cloud not only improves the quality of computation, but also creates an extra degree of diversity by collecting information from devices in service, which, in turn, has raised significant
Auditory Model Based Subsetting Of Head-Related Transfer Function Datasets
The rising availability of public head-related transfer function (HRTF) data, measured on hundreds of different individuals, offers a user the possibility to select the best matching non-individual HRTF from a wide catalogue. To this end, reducing the num
XceptionTime: Independent Time-Window XceptionTime Architecture For Hand Gesture Classification
Capitalizing on the need for addressing the existing challenges associated with gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, the paper proposes a novel deep learning model, referred to as the XceptionTime architectu
Color And Angular Reconstruction Of Light Fields From Incomplete-Color Coded Projections
We present a simple variational approach for reconstructing color light fields (LFs) in the compressed sensing (CS) framework with very low sampling ratio, using both coded masks and color filter arrays (CFAs). A coded mask is placed in front of the camer
Low-Complexity 5G SLAM With CKF-PHD Filter
In 5G mmWave, simultaneous localization and mapping (SLAM) allows devices to exploit map information to improve their position estimate. Even the most basic SLAM filter based on a Rao-Blackwellized particle filter (RBPF) combined with a probability hypoth
Risk Convergence Of Centered Kernel Ridge Regression With Large Dimensional Data
This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call centered kernel ridge regression (CKRR), also known in the literature as kernel ridge regression with offset. This modified technique is obtained by
Deblurring And Super-Resolution Using Deep Gated Fusion Attention Networks For Face Images
Image deblurring and super-resolution are very important in image processing such as face verification. However, when in the outdoors, we often get blurry and low resolution images. To solve the problem, we propose a deep gated fusion attention network (D
Libri-Light: A Benchmark For ASR With Limited Or No Supervision
This paper introduces a new corpus of English speech suitable for training speech recognition systems under limited or no supervision. It is derived from open source audio books in the LibriVox project and governmental speech recordings and contains over 7
DGAN: Disentangled Representation Learning For Anisotropic BRDF Reconstruction
Accurate reconstruction of real-world materials' appearance from a very limited number of samples is still a huge challenge in computer vision and graphics. In this paper, we present a novel deep architecture, Disentangled Generative Adversarial Network (
Hybrid Deep-Semantic Matrix Factorization For Tag-Aware Personalized Recommendation
Matrix factorization has now become a dominant solution for personalized recommendation on the Social Web. To alleviate the cold start problem, previous approaches have incorporated various additional sources of information into traditional matrix factori
Primary Path Estimator Based On Individual Secondary Path For ANC Headphones
Active noise cancellation (ANC) technology is a valuable asset for hearables. For a well performing and robust ANC system precise knowledge of the relevant acoustic paths is vital. It is feasible to individually measure the user's secondary path by using
Particle Filtering On The Complex Stiefel Manifold With Application To Subspace Tracking
In this paper, we extend previous particle filtering methods whose states were constrained to the (real) Stiefel manifold to the complex case. The method is then applied to a Bayesian formulation of the subspace tracking problem. To implement the proposed
A Constrained Maximum Likelihood Estimator Of Speech And Noise Spectra With Application To Multi-Microphone Noise Reduction
One of the challenges with the implementation of multi-microphone noise reduction systems in practical applications lies in the need for the knowledge of the speech and noise covariance matrices. Recently, a method based on Maximum Likelihood (ML) estimat
Self-Paced Probabilistic Principal Component Analysis For Data With Outliers
Principal Component Analysis (PCA) is a popular tool for dimension reduction and feature extraction in data analysis. Probabilistic PCA (PPCA) extends the standard PCA by using a probabilistic model. However, both standard PCA and PPCA are not robust, as
PACO And PACO-DCT: Patch Consensus And Its Application To Inpainting
Many signal processing methods break the target signal into overlapping patches, process them separately, and then stitch them back to produce an output. How to merge the resulting patches at the overlaps is central to such methods. We propose a novel fra
Embedded Large-Scale Handwritten Chinese Character Recognition
As handwriting input becomes more prevalent, the large symbol inventory required to support Chinese handwriting recognition poses unique challenges. This paper describes how the Apple deep learning recognition system can accurately handle up to 30,000 Chi
DNN-Supported Mask-Based Convolutional Beamforming For Simultaneous Denoising, Dereverberation, And Source Separation
In this article, we investigate an integrated mask-based convolutional beamforming method for performing simultaneous denoising, dereverberation, and source separation. Conventionally, it is difficult for neural network-supported mask-based source separati
Hijacking Tracker: A Powerful Adversarial Attack On Visual Tracking
Visual object tracking has made important breakthroughs with the assistance of deep learning models. Unfortunately, recent research has clearly proved that deep learning models are vulnerable to malicious adversarial attacks, which mislead the models maki
Mockingjay: Unsupervised Speech Representation Learning With Deep Bidirectional Transformer Encoders
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. Previous speech representation methods learn through conditioning on past frames and pr