- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
Using Personalized Speech Synthesis And Neural Language Generator For Rapid Speaker Adaptation
We propose to use the personalized speech synthesis and the neural language generator to synthesize content relevant personalized speech for rapid speaker adaptation. It has two distinct aspects: First, it relieves the general data sparsity issue in rapid
Signal Sensing And Reconstruction Paradigms For A Novel Multi-Source Static Computed Tomography System
Conventional Computed Tomography (CT) systems use a single X-ray source and an arc of detectors mounted on a rotating gantry to acquire a set of projection data. Novel CT systems are now being pioneered in which a complete ring of distributed X-ray source
Computation Of "Best" Interpolants In The Lp Sense
We study a variant of the interpolation problem where the continuously defined solution is regularized by minimizing the Lp-norm of its second-order derivative. For this continuous-domain problem, we propose an exact discretization scheme that restricts t
Fast Intent Classification For Spoken Language Understanding Systems
Spoken Language Understanding (SLU) systems consist of several machine learning components operating together (e.g. intent classification, named entity resolution and recognition). Deep learning models have obtained state of the art results on several of
Exploiting Channel Locality For Adaptive Massive MIMO Signal Detection
We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. MMNet's design builds on the theory of iterative soft-thresholding algorithms
A Hybrid Model For Bipolar Disorder Classification From Visual Information
Bipolar Disorder (BD) is one of the most prevalent mental illnesses in the world. It has a negative impact on people's social and personal functions. The principal indicator of BD is the extreme swing in the mood ranging from manic to depressive states. T
Densely Connected Neural Network With Dilated Convolutions For Real-Time Speech Enhancement In The Time Domain
In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder based architecture with skip connections. The layers in the encoder and the decoder are followed
Person Identification Using Deep Convolutional Neural Networks On Short-Term Signals From Wearable Sensors
In this work, we explore the discriminating ability of short-term signal patterns (e.g. few minutes long) with respect to the person identification task. We focus on signals recorded by simple wearable devices, such as smartwatches, which can measure move
Multimodal Learning For Classroom Activity Detection
Classroom activity detection (CAD) focuses on accurately classifying whether the teacher or student is speaking and recording both the length of individual utterances during a class. A CAD solution helps teachers get instant feedback on their pedagogical
Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
The objective of this paper is to learn representations of speaker identity without access to manually annotated data. To do so, we develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in vid
An Unsupervised Retinal Vessel Extraction And Segmentation Method Based On A Tube Marked Point Process Model
Retinal vessel extraction and segmentation is essential for supporting diagnosis of eye-related diseases. In recent years, deep learning has been applied to vessel segmentation and achieved excellent performance. However, these supervised methods require
Super-Resolution Of 3D Color Point Clouds Via Fast Graph Total Variation
3D point clouds acquired by low-cost sensors are often in lower spatial resolutions than desired for rendering images on high-resolution displays. In this paper, we propose a fast super-resolution (SR) algorithm for color 3D point clouds. We first populat
Concentration-Based Polynomial Calculations On Nicked DNA
In this paper, we introduce a novel scheme for computing polynomial functions on a substrate of nicked DNA. We first discuss a fractional encoding of data, based on the concentration of nicked double DNA strands. Then we show how to perform multiplication
Soft-Output Finite Alphabet Equalization For mmWave Massive MIMO
Next-generation wireless systems are expected to combine millimeter-wave (mmWave) and massive multi-user multiple-input multiple-output (MU-MIMO) technologies to deliver high data-rates. These technologies require the basestations (BSs) to process high-di
Age-Based Scheduling Policy For Federated Learning In Mobile Edge Networks
Federated learning (FL) is a machine learning model that preserves data privacy in the training process. Specifically, FL brings the model directly to the user equipments (UEs) for local training, where an edge server periodically collects the trained par
Prototypical Networks For Small Footprint Text-Independent Speaker Verification
Speaker verification aims to recognize target speakers with very few enrollment utterances. Conventional approaches learn a representation model to extract the speaker embeddings for verification. Recently, there have been several new approaches in meta-learnin
2D-To-2D Mask Estimation For Speech Enhancement Based On Fully Convolutional Neural Network
In recent years, deep learning-based approaches have become popular in the field of single-channel speech enhancement. Convolutional neural networks (CNNs) are a standard component of many current speech enhancement systems. In this study, we design a new Fully
Gated Mechanism For Attention Based Multimodal Sentiment Analysis
Multimodal sentiment analysis has recently gained popularity because of its relevance to social media posts, customer service calls and video blogs. In this paper, we address three aspects of multimodal sentiment analysis: 1. Cross modal interaction learn
An Empirical Study Of Transformer-Based Neural Language Model Adaptation
We explore two adaptation approaches of deep Transformer based neural language models (LMs) for automatic speech recognition. The first approach is a pretrain-finetune framework, where we first pretrain a Transformer LM on a large-scale text corpus from s
One-Shot Parametric Audio Production Style Transfer With Application To Frequency Equalization
Audio production is a difficult process for many people, and properly manipulating sound to achieve a certain effect is non-trivial. In this paper, we present a method that facilitates this process by inferring appropriate audio effect parameters in order
Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss
In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences i
Decentralized Min-Max Optimization: Formulations, Algorithms And Applications In Network Poisoning Attack
This paper discusses formulations and algorithms which allow a number of agents to collectively solve problems involving both (non-convex) minimization and (concave) maximization operations. These problems have a number of interesting applications in info
Cross-VAE: Towards Disentangling Expression From Identity For Human Faces
Facial expression and identity are two independent yet intertwined components for representing a face. For facial expression recognition, identity can contaminate the training procedure by providing tangled but irrelevant information. In this paper, we pr
TOSO: Student's-T Distribution Aided One-Stage Orientation Target Detection In Remote Sensing Images
In this paper, a robust Student's-T distribution aided One-Stage Orientation detector, namely TOSO, is proposed to address orientation target detection in remote sensing images. A one-stage keypoint based network architecture is used to avoid the complica
Adversarial Example Detection By Classification For Deep Speech Recognition
Machine learning systems are vulnerable to adversarial attacks and are highly likely to produce incorrect outputs under these attacks. Attacks are classed as white-box or black-box according to the adversary's level of access to the victim learning algorithm. To defen
AlignTTS: Efficient Feed-Forward Text-To-Speech System Without Explicit Alignment
Targeting both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates mel-spectrum from a sequence of characters, and the duration of each character
Weakly Supervised Segmentation Guided Hand Pose Estimation During Interaction With Unknown Objects
Hand pose estimation is important for human-computer interaction, but its performance is unsatisfactory when the hand is interacting with objects. To alleviate the influence of unknown objects, we propose a novel weakly supervised segmentation guided sche
Deep Geometric Knowledge Distillation With Graphs
In most cases deep learning architectures are trained disregarding the amount of operations and energy consumption. However, some applications, like embedded systems, can be resource-constrained during inference. A popular approach to reduce the size of a
Audio-Assisted Image Inpainting For Talking Faces
The goal of our work is to complete missing areas of images of talking faces, exploiting information from both the visual and audio modalities. Existing image inpainting methods rely solely on visual content that doesn't always provide sufficient informat
Improving Spoken Question Answering Using Contextualized Word Representation
While question answering (QA) systems have witnessed great breakthroughs in reading comprehension (RC) tasks, spoken question answering (SQA) is still a much less investigated area. Previous work shows that existing SQA systems are limited by catastrophic
Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR
State-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that attempts to learn the prediction of speech and noise using a paired corpus of clean and noisy recordings (teacher model). In this paper, we att
Learning Noise Invariant Features Through Transfer Learning For Robust End-To-End Speech Recognition
End-to-end models yield impressive speech recognition results on clean datasets while having inferior performance on noisy datasets. To address this, we propose transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME-4) for connectionist te
Prediction Of Vessel Trajectories From AIS Data Via Sequence-To-Sequence Recurrent Neural Networks
In this paper, we address the problem of predicting vessel trajectories based on Automatic Identification System (AIS) data. The goal is to learn the predictive distribution of maritime traffic patterns using historical data during the training phase, in
ConFirmNet: Convolutional FirmNet And Application To Image Denoising And Inpainting
We address the problem of efficient convolutional sparse coding (CSC) and develop a non-convex-penalty-regularized CSC formulation, namely, minimax-concave CSC (MC2SC). MC2SC leads to a better sparse representation than the standard ℓ1-penalty based
On The Stability Of Polynomial Spectral Graph Filters
Spectral graph filters are a key component in state-of-the-art machine learning models used for graph-based learning, such as graph neural networks. For certain tasks stability of the spectral graph filters is important for learning suitable representatio
A Method For Millimeter-Wave Imaging Of Concealed Objects Via De-Aliasing
We consider the problem of millimeter-wave (MMW) imaging for concealed objects using a transceiver antenna array. In practical implementations, larger array element spacing leads to aliasing in the spectrum of the received echo signals. In this paper, we
Key Action And Joint CTC-Attention Based Sign Language Recognition
Sign Language Recognition (SLR) translates sign language video into natural language. In practice, sign language video contains a large number of redundant frames, so the essential frames need to be selected. However, unlike common video that describes actio
Stochastic ML Estimation For Hyperspectral Unmixing Under Endmember Variability And Nonlinear Models
Hyperspectral unmixing (HU) is a problem of blindly identifying the underlying materials, in form of spectral signatures, in the captured hyperspectral image. HU has received tremendous interest in remote sensing, and fundamentally the problem can be rega
Uncertainty Quantification For Remaining Useful Lifetime Prediction With Multi-Channel Sensory Data
For remaining useful lifetime (RUL) prediction with multi-channel sensory data, long-term prediction has more uncertainty than short-term prediction. In this paper, the ratio of mean to variance was considered to measure the uncertainty propagation rate (
Limitations Of Weak Labels For Embedding And Tagging
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains
GraphEM: EM Algorithm For Blind Kalman Filtering Under Graphical Sparsity Constraints
Modeling and inference with multivariate sequences is central in a number of signal processing applications such as acoustics, social network analysis, biomedical, and finance, to name a few. The linear-Gaussian state-space model is a common way to descri
A Deep Neural Network-Driven Feature Learning Method For Polyphonic Acoustic Event Detection From Real-Life Recordings
In this paper, a Deep Neural Network (DNN)-driven feature learning method for polyphonic Acoustic Event Detection (AED) is proposed. The proposed DNN is a combination of different layers used to characterize multiple overlapped acoustic events in the mixt
Swift-Link: A Compressive Beam Alignment Algorithm For Practical mmWave Radios
Millimeter wave (mmWave) bands offer a large amount of spectrum that can support many high data rate applications. To efficiently use the spectrum at mmWave, the wireless link between the transmitting and receiving radios must be configured properly. Comp
SylNet: An Adaptable End-To-End Syllable Count Estimator For Speech
Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learn
Curriculum Learning For Speech Emotion Recognition From Crowdsourced Labels
This study introduces a method to design a curriculum for machine-learning to maximize the efficiency during the training process of deep neural networks (DNNs) for speech emotion recognition. Previous studies in other machine-learning problems have shown