- IEEE MemberUS $11.00
- Society MemberUS $0.00
- IEEE Student MemberUS $11.00
- Non-IEEE MemberUS $15.00
Accurate Localization Of AUV In Motion By Explicit Solution Using Time Delays
Accurate localization of an autonomous underwater vehicle (AUV) is essential in many applications. The motion of an AUV during the measurement acquisition period can be significant, and the localization performance can suffer considerably if it is neglected…
Algorithmic Exploration Of American English Dialects
In this paper, we use a novel algorithmic approach to explore dialectal variation in American English speech. Without the need for human annotations, we are able to use a corpus transcribed in text form only. Our results show that, in general, American English…
Improving Deep Learning Classification Of JPEG2000 Images Over Bandlimited Networks
JPEG2000 (j2k) is a highly popular format for image and video compression. It plays a major role in the rapidly growing applications of cloud-based image classification. Considering limited network bandwidth, we propose an end-to-end deep learning framework…
A Multi-View Approach For Mandarin Non-Native Mispronunciation Verification
Traditionally, the performance of non-native mispronunciation verification systems relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach is proposed to incorporate discriminative feature representations which…
K-Autoencoders Deep Clustering
In this study, we propose a deep clustering algorithm that extends the k-means algorithm. Each cluster is represented by an autoencoder instead of a single centroid vector. Each data point is associated with the autoencoder which yields the minimal reconstruction…
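The assignment rule above, paired with a k-means-style refit step, can be sketched as follows. This is a minimal illustration under a loud assumption: each "autoencoder" is a linear, bias-free, rank-1 encoder/decoder on 2-D points, whose optimal fit is the top eigenvector of the points' scatter matrix, so the loop reduces to k-subspaces clustering rather than the authors' deep model. All function names are hypothetical.

```python
import math
import random

def reconstruct(theta, p):
    """Assumed linear rank-1 autoencoder: encode p onto unit direction u, decode back."""
    ux, uy = math.cos(theta), math.sin(theta)
    code = p[0] * ux + p[1] * uy        # encoder: 2-D point -> 1-D code
    return (code * ux, code * uy)       # decoder: 1-D code -> 2-D point

def recon_error(theta, p):
    """Squared reconstruction error of point p under the autoencoder with angle theta."""
    rx, ry = reconstruct(theta, p)
    return (p[0] - rx) ** 2 + (p[1] - ry) ** 2

def fit_direction(points):
    """Optimal bias-free rank-1 linear autoencoder: angle of the top eigenvector of sum p p^T."""
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def k_autoencoders(points, k, iters=20, seed=0):
    """Alternate assignment (minimal reconstruction error) and per-cluster refitting."""
    rng = random.Random(seed)
    thetas = [rng.uniform(0, math.pi) for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point goes to the autoencoder that reconstructs it best.
        labels = [min(range(k), key=lambda j: recon_error(thetas[j], p)) for p in points]
        # Update: refit each autoencoder on the points currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                thetas[j] = fit_direction(members)
    return labels, thetas
```

For points lying on the x-axis and on the y-axis, the two fitted directions settle on the two axes, and each point is reconstructed with essentially zero error by its own cluster's autoencoder.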
Adversarial Multi-Task Learning For Speaker Normalization In Replay Detection
Spoofing detection algorithms in voice biometrics are adversely affected by differences in the speech characteristics of the various target users. In this paper, we propose a novel speaker normalisation technique that employs adversarial multi-task learning…
The Matched Reassigned Cross-Spectrogram For Phase Estimation
In this paper, the matched reassigned spectrogram is expanded into a novel matched phase reassignment (MPR) method based on the reassigned cross-spectrogram. It is shown that for two phase-synchronized oscillating transient signals, the method gives perfect…
Interpretable Machine Learning In Sustainable Edge Computing: A Case Study Of Short-Term Photovoltaic Power Output Prediction
With the Internet of Things continuously penetrating all spheres of daily life, the increasing use of smart devices has enabled the emergence of the edge computing paradigm. To meet the need to save energy and reduce electricity bills for each h…
Sparse Convolutional Beamforming For Wireless Ultrasound
Wireless ultrasound systems can make the imaging process much more efficient, affordable and accessible for users. The standard technique to create B-mode images is to rely on delay and sum (DAS) beamforming, in which the signals at each transducer element…
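The delay-and-sum (DAS) baseline named in this abstract can be sketched as below. This is only the standard DAS step, not the proposed sparse convolutional beamformer; it assumes the per-element delays are already known and rounded to whole samples, and it uses uniform weights by default. Function and parameter names are hypothetical.

```python
def delay_and_sum(channels, delays, weights=None):
    """Delay-and-sum beamforming with integer-sample delays.

    channels: list of per-element sample lists x_m[n]
    delays:   per-element delay d_m in samples (assumed known and non-negative)
    weights:  optional apodization weights w_m; uniform 1/M if omitted
    Returns y[n] = sum_m w_m * x_m[n + d_m] for every n where all channels have data.
    """
    m = len(channels)
    if weights is None:
        weights = [1.0 / m] * m
    # Keep only output samples for which every delayed channel is still in range.
    n_out = min(len(x) - d for x, d in zip(channels, delays))
    return [
        sum(w * x[n + d] for x, d, w in zip(channels, delays, weights))
        for n in range(n_out)
    ]
```

If an echo reaches element m exactly d_m samples late, shifting each channel by d_m aligns the copies so they add coherently, while noise that is uncorrelated across elements is partly averaged out.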
Divergence-Based Adaptive Extreme Video Completion
Extreme image or video completion, where, for instance, we only retain 1% of pixels in random locations, allows for very cheap sampling in terms of the required pre-processing. The consequence is, however, a reconstruction that is challenging for humans a…
Fast Acoustic Scattering Using Convolutional Neural Networks
Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation. We propose training a convolutional neural network to map from a convex scat…
Adversarial Video Compression Guided By Soft Edge Detection
We propose a video compression framework using conditional Generative Adversarial Networks (GANs). We rely on two encoders: one that deploys a standard video codec and another one which generates low-level soft edge maps. For decoding, we use a standard v…
Transformer VAE: A Hierarchical Model For Structure-Aware And Interpretable Music Representation Learning
Structure awareness and interpretability are two of the most desired properties of music generation algorithms. Structure-aware models generate more natural and coherent music with long-term dependencies, while interpretable models are more friendly for h…
Geometrically Constrained Independent Vector Analysis For Directional Speech Enhancement
This paper addresses the multichannel directional speech enhancement problem with geometrically constrained independent vector analysis (GCIVA), where we aim to combine the high separation performance from blind source separation and the capability of dir…
ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-To-End Text-To-Speech Toolkit
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit supports state-of-the-art E2E-TTS models, including Tacotron 2, Transformer TTS…
The Processing Of Mandarin Chinese Tonal Alternations In Contexts: An Eye-Tracking Study
This study investigated the perception of Mandarin tonal alternations in disyllabic words. In Mandarin, a low-dipping Tone3 is converted to a high-rising Tone2 when followed by another Tone3, known as third tone sandhi. Although previous studies showed st…
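The disyllabic third tone sandhi rule stated above can be written as a small mapping over tone numbers. This is a minimal sketch restricted to two-syllable words, as in the study; longer Tone3 sequences depend on prosodic grouping and are deliberately not handled. The function name is hypothetical.

```python
def apply_third_tone_sandhi(tones):
    """Disyllabic third tone sandhi: Tone3 + Tone3 -> Tone2 + Tone3.

    tones: pair of citation tones (1-4) for a two-syllable word.
    Returns the surface tones; all other combinations are unchanged.
    """
    if len(tones) != 2:
        raise ValueError("this sketch only covers disyllabic words")
    first, second = tones
    if first == 3 and second == 3:
        return [2, 3]
    return [first, second]
```

For example, nǐ hǎo, with citation tones 3-3, is pronounced with tones 2-3, while a 3-4 word keeps its low-dipping first syllable.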
Adversarial Attacks On GMM i-Vector Based Speaker Verification Systems
This work investigates the vulnerability of Gaussian Mixture Model (GMM) i-vector based speaker verification systems to adversarial attacks, and the transferability of adversarial samples crafted from GMM i-vector based systems to x-vector based systems.
Detecting Mismatch Between Text Script And Voice-Over Using Utterance Verification Based On Phoneme Recognition Ranking
The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script…
Regularized Beamformer For The Spherical Microphone Array To Cope With The White Noise Amplification
Spherical microphone arrays with a compact aperture and maximum directivity factor have been a popular research topic, but they usually suffer from white noise amplification, which hinders their practical application. This paper p…
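White noise amplification is usually quantified with the white noise gain (WNG), the array gain against spatially uncorrelated sensor noise: WNG = |w^H d|^2 / (w^H w), with beamformer weights w and steering vector d. The sketch below evaluates it for two hypothetical weight sets that share the same distortionless response (w^H d = 1); it illustrates the problem only and is not the regularized beamformer proposed in this paper. The steering phases and weights are made up for illustration.

```python
import cmath
import math

def white_noise_gain(w, d):
    """White noise gain |w^H d|^2 / (w^H w) for complex weights w and steering vector d."""
    response = sum(wi.conjugate() * di for wi, di in zip(w, d))  # w^H d
    power = sum(abs(wi) ** 2 for wi in w)                         # w^H w
    return abs(response) ** 2 / power

def steering_vector(phases):
    """Unit-modulus steering vector built from per-element phase shifts in radians."""
    return [cmath.exp(1j * p) for p in phases]

# Hypothetical 4-element example.
d = steering_vector([0.0, 0.4, 0.8, 1.2])

# Delay-and-sum weights: distortionless and WNG = M = 4.
w_das = [di / len(d) for di in d]

# Same distortionless response (5 - 4 - 4 + 4 = 1) but large alternating weights:
# WNG = 1 / 73, i.e. uncorrelated sensor noise is strongly amplified.
w_loud = [di * s for di, s in zip(d, [5.0, -4.0, -4.0, 4.0])]

print(10 * math.log10(white_noise_gain(w_das, d)))   # about 6.02 dB
print(10 * math.log10(white_noise_gain(w_loud, d)))  # about -18.6 dB
```

For M elements with unit-modulus steering entries, the WNG cannot exceed M (by Cauchy-Schwarz), and delay-and-sum weights attain that bound; designs that push the directivity factor up tend to need large, sign-alternating weights like `w_loud`, which is where the amplification comes from.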
A Neural Network Based On First Principles
In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (Max-Ent) to find the posterior distribution o…
AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement
The goal of audio-visual speech enhancement (AVSE) is to supplement audio-only information with visual information, such as the target speaker's lip movements, to improve the intelligibility and overall perceptual quality of noisy speech signals. We propose a…
Unseen Face Presentation Attack Detection With Hypersphere Loss
Presentation attacks are one of the main threats to face verification systems and have attracted great attention from the research community. Recent methods achieve great success in intra-database tests. However, the problem is more complex in practical scenarios, as the…
L-Vector: Neural Label Embedding For Domain Adaptation
We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. With the NLE method, we distill the knowledge from a powerful source-domain…
Attention-Guided Deraining Network Via Stage-Wise Learning
Due to diverse rain shapes, directions, and densities, as well as different distances to the camera, rain streaks in the air are interwoven and overlapping. However, most existing deraining methods are inherently oblivious to this phenomenon and tend to learn a single…
Joint Enhancement And Denoising Of Low Light Images Via JND Transform
Low light images suffer from low dynamic range and severe noise due to low signal-to-noise ratio (SNR). In this paper, we propose joint enhancement and denoising of low light images via a just-noticeable-difference (JND) transform. We achieve contrast enhancement…
Signal Clustering With Class-Independent Segmentation
Radar signals have been dramatically increasing in complexity, limiting the source separation ability of traditional approaches. In this paper, we propose a deep learning-based clustering method, which encodes concurrent signals into images, and, for the f…
Model Order Selection In DOA Scenarios Via Cross-Entropy Based Machine Learning Techniques
In this paper, we present a machine learning approach for estimating the number of incident wavefronts in a direction of arrival scenario. In contrast to previous works, a multilayer neural network with a cross-entropy objective is trained. Furthermore, w…
Instant Adaptive Learning: An Adaptive Filter Based Fast Learning Model Construction For Sensor Signal Time Series Classification On Edge Devices
Constructing a learning model under computational and energy constraints, particularly with a highly limited training-time budget, is a critical and distinctive requirement of many practical IoT applications that use time-series sensor signal analytics f…
Spectrograms Fusion With Minimum Difference Masks Estimation For Monaural Speech Dereverberation
Spectrogram fusion is an effective method for incorporating complementary speech dereverberation systems. Previous linear spectrogram fusion by averaging multiple spectrograms shows very good performance. However, this simple method can't be applied to…
Sequential Deep Unrolling With Flow Priors For Robust Video Deraining
Video deraining has attracted wide attention in recent years owing to the urgent demand for high-quality video. Indistinct details and non-ideal deraining effects are the most common defects in existing techniques, whose cause lies in the insufficient usage…
Linear Speedup In Saddle-Point Escape For Decentralized Non-Convex Optimization
Under appropriate cooperation protocols and parameter choices, fully decentralized solutions for stochastic optimization have been shown to match the performance of centralized solutions and result in linear speedup (in the number of agents) relative to n…
Weighted Krylov-Levenberg-Marquardt Method For Canonical Polyadic Tensor Decomposition
Weighted canonical polyadic (CP) tensor decomposition appears in a wide range of applications. A typical situation where the weighted decomposition is needed is when some tensor elements are unknown, and the task is to fill in the missing elements under t…
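The weighted objective behind this kind of completion can be written down directly: with a 0/1 weight tensor W marking the known entries, a rank-R CP model is fitted by minimizing sum_ijk W_ijk (T_ijk - sum_r A_ir B_jr C_kr)^2, so unknown entries do not influence the fit. The sketch below only evaluates this loss in plain Python; it does not implement the Krylov-Levenberg-Marquardt solver proposed in the paper, and the function names are hypothetical.

```python
def cp_reconstruct(A, B, C):
    """Rank-R CP model: T_hat[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

def weighted_cp_loss(T, W, A, B, C):
    """Weighted squared error; W[i][j][k] = 0 marks a missing entry of T."""
    T_hat = cp_reconstruct(A, B, C)
    return sum(
        W[i][j][k] * (T[i][j][k] - T_hat[i][j][k]) ** 2
        for i in range(len(T))
        for j in range(len(T[0]))
        for k in range(len(T[0][0]))
    )
```

Because a zero weight removes an entry from the sum, whatever placeholder value is stored at a missing position has no effect on the loss; the fitted factors then fill in that entry through `cp_reconstruct`.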
Streaming Automatic Speech Recognition With The Transformer Model
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR). Recently, the transformer architecture, which uses self-attention to model temporal context information, has been…
Source Coding Of Audio Signals With A Generative Model
We consider source coding of audio signals with the help of a generative model. We use a construction where a waveform is first quantized, yielding a finite bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned…
Lifter Training And Sub-Band Modeling For Computationally Efficient And High-Quality Voice Conversion Using Spectral Differentials
In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. The conventional method with a minimum-phase filter achieves high-quality…
Speaker Adaptation Of A Multilingual Acoustic Model For Cross-Language Synthesis
Several studies have shown promising results in adapting DNN-based acoustic models as a mechanism to transfer characteristics from pre-trained models. One such example is speaker adaptation using a small amount of data, where fine-tuning has helped train…
One-Shot Voice Conversion By Vector Quantization
In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker labels. We model the content embedding as a series of discrete codes and take the difference between quantize-before and q…
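The quantization step in this abstract can be sketched as nearest-codeword assignment per frame, with the quantization residual (quantize-before minus quantize-after) kept as a separate signal. Because the abstract is truncated, how the residual is used is an assumption here: it is averaged over time into a single utterance-level speaker vector. The codebook values and function names are hypothetical.

```python
def nearest_code(h, codebook):
    """Index of the codeword closest (squared Euclidean distance) to frame vector h."""
    return min(range(len(codebook)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(h, codebook[c])))

def vq_split(frames, codebook):
    """Split frames into discrete content codes and a time-averaged residual.

    Returns (codes, speaker_vec), where speaker_vec is the mean over frames of
    h_t - q(h_t); using this mean as a speaker vector is an assumption.
    """
    codes = [nearest_code(h, codebook) for h in frames]
    residuals = [[a - b for a, b in zip(h, codebook[c])]
                 for h, c in zip(frames, codes)]
    dim = len(frames[0])
    speaker_vec = [sum(r[d] for r in residuals) / len(residuals) for d in range(dim)]
    return codes, speaker_vec
```

The discrete codes carry what the codebook can represent, while the residual captures the systematic offset that quantization throws away, which is the part this reading treats as speaker information.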
Reconstruction Of FRI Signals Using Deep Neural Network Approaches
Finite Rate of Innovation (FRI) theory considers sampling and reconstruction of classes of non-bandlimited continuous signals that have a small number of free parameters, such as a stream of Diracs. The task of reconstructing FRI signals from discrete samples…
Meta-Learning For Robust Child-Adult Classification From Speech
Computational modeling of naturalistic conversations in clinical applications has seen growing interest in the past decade. An important use-case involves child-adult interactions within the autism diagnosis and intervention domain. In this paper, we address…
Residual Attention Network For Wavelet Domain Super-Resolution
Single-image super-resolution plays an important role in computer vision. However, previous works using convolutional neural networks perform poorly when reconstructing high-frequency details, resulting in over-smoothed outputs that lack textural information…
Supervised Graph Representation Learning For Modeling The Relationship Between Structural And Functional Brain Connectivity
In this paper, we propose a supervised graph representation learning method to model the relationship between brain functional connectivity (FC) and structural connectivity (SC) through a graph encoder-decoder system. The graph convolutional network (GCN)…
Online Tensor Completion And Free Submodule Tracking With The t-SVD
We propose a new online algorithm, called TOUCAN, for the tensor completion problem of imputing missing entries of a low tubal-rank tensor using the tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) algebraic framework. We…
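The t-product that this framework builds on multiplies third-order tensors slice-wise with circular convolution along the third mode: for A of size n1 x n2 x n3 and B of size n2 x n4 x n3, frontal slice k of C = A * B is sum_j A(:,:,(k - j) mod n3) B(:,:,j). The plain-Python sketch below follows that block-circulant definition directly (practical implementations usually take an FFT along the third mode instead); it is not the TOUCAN algorithm itself, and the function names are hypothetical.

```python
def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def t_product(A, B):
    """t-product of tensors stored as lists of frontal slices.

    A: n3 slices of shape n1 x n2;  B: n3 slices of shape n2 x n4.
    C[k] = sum_j A[(k - j) % n3] @ B[j]   (block-circulant product).
    """
    n3 = len(A)
    if len(B) != n3:
        raise ValueError("A and B need the same number of frontal slices")
    n1, n4 = len(A[0]), len(B[0][0])
    C = []
    for k in range(n3):
        acc = [[0.0] * n4 for _ in range(n1)]
        for j in range(n3):
            P = matmul(A[(k - j) % n3], B[j])
            acc = [[acc[i][c] + P[i][c] for c in range(n4)] for i in range(n1)]
        C.append(acc)
    return C
```

With n3 = 1 the t-product reduces to the ordinary matrix product, and the identity tensor (identity first frontal slice, zeros elsewhere) leaves any conformable tensor unchanged.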
Feature Enhancement With Deep Feature Losses For Speaker Verification
Speaker verification still suffers from the challenge of generalization to novel adverse environments. We leverage recent advances in deep learning-based speech enhancement and propose a feature-domain supervised denoising-based solution.
Multi-Stage Residual Hiding For Image-Into-Audio Steganography
The widespread application of audio communication technologies has accelerated the flow of audio data across the Internet, making it a popular carrier for covert communication. In this paper, we present a cross-modal steganography method for hiding image c…
Improving Sequence-To-Sequence Speech Recognition Training With On-The-Fly Data Augmentation
Sequence-to-Sequence (S2S) models have recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing performance improvements that can be obtained…
Riemannian Geometry And Cramér-Rao Bound For Blind Separation Of Gaussian Sources
We consider the optimal performance of blind separation of Gaussian sources. In practice, this estimation problem is solved by a two-step procedure: estimation of a set of covariance matrices from the observed data and approximate joint diagonalization of…
HumBug Zooniverse: A Crowd-Sourced Acoustic Mosquito Dataset
Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. I…
Receiver Design And AGC Optimization With Self Interference Induced Saturation
In-band Full Duplex (FD) is a wireless communication technology which has the potential to transmit and receive simultaneously in the same frequency band. Self-interference cancellation (SIC) is the key enabler to achieve FD operation. As the SI is severe…
Spatial Attention For Far-Field Speech Recognition With Deep Beamforming Neural Networks
In this paper, we introduce spatial attention for refining the information in a multi-direction neural beamformer for far-field automatic speech recognition. Previous neural beamformer approaches with multiple look directions, such as the factored compl…