
- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
Enhanced Action Tubelet Detector For Spatio-Temporal Video Action Detection
Current spatio-temporal action detection methods usually employ a two-stream architecture: an RGB stream for raw images and an auxiliary motion stream for optical flow. Training is required individually for each stream and more efforts are necessary to imp
MDR-SURV: A Multi-Scale Deep Learning-Based Radiomics For Survival Prediction In Pulmonary Malignancies
Predicting death in lung cancer patients before initiating treatment is of paramount importance as this may guide decision-making towards more aggressive or combination of different types of treatment. In this work, we propose a Multi-scale Deep learning-
DOA Tracking Via Signal-Subspace Projector Update
We develop a novel direction-of-arrival (DOA) tracking method in which we directly operate the signal-subspace projector instead of tracking the subspace eigenbasis. In each time frame, we employ a multidimensional subspace fitting approach to track the s
TDMF: Task-Driven Multilevel Framework For End-To-End Speaker Verification
In this paper, a task-driven multilevel framework (TDMF) is proposed for end-to-end speaker verification. The TDMF has four layers, and each layer has different effects on speaker models or representations to implement the functions of universal backgroun
HDMFH: Hypergraph Based Discrete Matrix Factorization Hashing For Multimodal Retrieval
In recent years, hashing based cross-modal retrieval methods have attracted considerable attention for the high retrieval efficiency and low storage cost. However, most of the existing methods neglect the high-order relationship among data samples. In add
HKA: A Hierarchical Knowledge Attention Mechanism For Multi-Turn Dialogue System
Generating informative responses by incorporating external knowledge into dialogue system attracts more and more attention. Most previous works facilitate single-turn dialogue system on generating such responses. However, few works focus on incorporating
Uncertainties In Short Commercial Microwave Links Fading Due To Rain
A Power-Law relation between attenuation and rain rate has proven to be a useful tool in wireless network design at microwave and mmWave frequencies. In the last decade this relation has also been used for estimating rain from signal level measurements in
G2G: TTS-Driven Pronunciation Learning For Graphemic Hybrid ASR
Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still has problems with r
Assessing The Scope Of Generalized Countermeasures For Anti-Spoofing
Most of the research on anti-spoofing countermeasures is specific to one type of spoofing attack, where models are trained on data of a particular nature, either synthetic or replay. However, one does not have such leverage as there is no prior knowledge
A Moment-Based Approach For Guaranteed Tensor Decomposition
This paper presents a new scheme to perform the canonical polyadic decomposition (CPD) of a symmetric tensor. We first formulate the CPD problem as a truncated moment problem, where a measure has to be recovered knowing some of its moments. The support of
Neural Attentive Multiview Machines
An important problem in multiview representation learning is finding the optimal combination of views with respect to the specific task at hand. To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations
Text-To-Image Synthesis Method Evaluation Based On Visual Patterns
A commonly used evaluation metric for text-to-image synthesis is the Inception score (IS), which has been shown to be a quality metric that correlates well with human judgment. However, IS does not reveal properties of the generated
An Analytical Solution To Jacobsen Estimator For Windowed Signals
Interpolated discrete Fourier transform (DFT) is a well-known method for frequency estimation of complex sinusoids. For signals without windowing (or with rectangular-windowing), this has been well investigated and a large number of estimators have been d
Training Spoken Language Understanding Systems With Non-Parallel Speech And Text
End-to-end spoken language understanding (SLU) systems are typically trained on large amounts of data. In many practical scenarios, the amount of labeled speech is often limited as opposed to text. In this study, we investigate the use of non-parallel spe
Approaching Optimal Embedding In Audio Steganography With GAN
Audio steganography is a technology that embeds messages into audio without raising any suspicion from hearing it. Current steganography methods are based on heuristic cost designs. In this work, we proposed a framework based on Generative Adversarial Net
Weakly Supervised Crowd-Wise Attention For Robust Crowd Counting
Due to the wide range of application scenes, robust crowd counting is still quite difficult and the performance is far from satisfactory. In this paper, we propose a novel robust crowd counting method by introducing a weakly supervised crowd-wise
Principle-Inspired Multi-Scale Aggregation Network For Extremely Low-Light Image Enhancement
Under-exposure and low-light environments commonly degrade image quality and leave information invisible. To ameliorate this, numerous low-light image enhancement methods have been developed. However, these existing methods struggle to handle ext
Greedy Sparse Array Design For Optimal Localization Under Spatially Prioritized Source Distribution
A common approach for acoustic source localization is based on finding the maximum of a spatial cost function, such as the steered response power (SRP) function. The shape of the SRP highly depends on the constellation of sensors within the array layout,
Real-Time Task Offloading For Large-Scale Mobile Edge Computing
Mobile-edge computing (MEC) is a promising technology to support computation-intensive and delay-sensitive applications at smart devices by offloading their local tasks to the network edge. In this paper, we propose a novel index based real-time task offl
Simultaneous Separation And Transcription Of Mixtures With Multiple Polyphonic And Percussive Instruments
We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared m
Attentional Fused Temporal Transformation Network For Video Action Recognition
Effective spatiotemporal feature representation is crucial to the video-based action recognition task. Focusing on discriminative spatiotemporal feature learning, we propose Attentional Fused Temporal Transformation Network (AttnTTN) for action recognition
Identification Of Essential Proteins Using A Novel Multi-Objective Optimization Method
Using graph theory to identify essential proteins is a hot topic at present. These methods are called network-based methods. However, the generalization ability of most network-based methods is not satisfactory. Hence, in this paper, we consider the ident
All You Need Is A Second Look: Towards Tighter Arbitrary Shape Text Detection
Scene text detection methods have progressed substantially over the past years. However, there remain several problems to be solved. Generally, long curve text instances tend to be fragmented because of the limited receptive field size of CNN. Besides, si
Weakly Labelled Audio Tagging Via Convolutional Networks With Spatial And Channel-Wise Attention
Multiple instance learning (MIL) with convolutional neural networks (CNNs) has been proposed recently for weakly labelled audio tagging. However, features from the various filtering channels and spatial regions are often treated equally, which may limit i

Sound Event Detection By Multitask Learning Of Sound Events And Scenes With Soft Scene Labels
Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed joint analyses of sound events and a
Cumulant Slice Reconstruction From Compressive Measurements And Its Application To Line Spectrum Estimation
Higher-order statistics (HOS) estimation hinges on the availability of a huge amount of data records, which causes exceedingly high sampling rates and overwhelming energy consumption for the sampling devices, especially when dealing with wideband signals.
Depthwise-STFT Based Separable Convolutional Neural Networks
In this paper, we propose a new convolutional layer called Depthwise-STFT Separable layer that can serve as an alternative to the standard Depthwise Separable convolutional layer. The construction of the proposed layer is inspired by the fact that the Fou
Indoor Heading Direction Estimation Using RF Signals
Heading direction information is crucial to many ubiquitous computing applications. The mainstream approach has been resorting to inertial sensors, such as accelerometer, gyroscope and magnetometer, which suffer from severe accumulative errors or large degradat
Evaluation Of Sensor Self-Noise In Binaural Rendering Of Spherical Microphone Array Signals
Spherical microphone arrays are used to capture spatial sound fields, which can then be rendered via headphones. We use the Real-Time Spherical Array Renderer (ReTiSAR) to analyze and auralize the propagation of sensor self-noise through the processing pi
Geometry Constrained Progressive Learning For LSTM-Based Speech Enhancement
In our previous work, a progressive learning framework for long short-term memory (LSTM)-based speech enhancement was proposed to improve the performance in low SNR environment, where each LSTM layer is guided to learn an intermediate target with a specif
Semi-Supervised Speaker Adaptation For End-To-End Speech Synthesis With Pretrained Models
Recently, end-to-end text-to-speech (TTS) models have achieved remarkable performance; however, they require a large amount of paired text and speech data for training. On the other hand, we can easily collect unpaired dozen minutes of speech recordings fo
A WiFi-Based Passive Fall Detection System
Fall detection systems based on WiFi signals are gaining popularity recently. However, most of the existing works relying on training are environment-dependent. In this paper, we propose DeFall, a novel WiFi-based environment-independent fall detection sy
A Real-Time Deep Network For Crowd Counting
Automatic analysis of highly crowded people has attracted extensive attention from computer vision research. Previous approaches for crowd counting have already achieved promising performance across various benchmarks. However, to deal with the real situa
End-To-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection
This paper integrates a voice activity detection (VAD) function with end-to-end automatic speech recognition toward an online speech interface and transcribing very long audio recordings. We focus on connectionist temporal classification (CTC) and its ext
Classification Of High-Dimensional Motor Imagery Tasks Based On An End-To-End Role Assigned Convolutional Neural Network
A brain-computer interface (BCI) provides a direct communication pathway between user and external devices. The EEG-based motor imagery paradigm is widely used in non-invasive BCI to obtain encoded signals containing the user's intention of movement execution. Howev
Pseudo Labeling And Negative Feedback Learning For Large-Scale Multi-Label Domain Classification
In large-scale domain classification, an utterance can be handled by multiple domains with overlapped capabilities. However, only a limited number of ground-truth domains are provided for each training utterance in practice while knowing as many as correc
Comparison Of User Models Based On GMM-UBM And I-Vectors For Speech, Handwriting, And Gait Assessment Of Parkinson's Disease Patients
Parkinson's disease is a neurodegenerative disorder characterized by the presence of different motor impairments. Information from speech, handwriting, and gait signals have been considered to evaluate the neurological state of the patients. On the other
Multi-Level Deep Neural Network Adaptation For Speaker Verification Using MMD And Consistency Regularization
Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we pres
A Recurrent Variational Autoencoder For Speech Enhancement
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization no
Exploiting Sparsity For Robust Sensor Network Localization In Mixed LOS/NLOS Environments
We address the problem of robust network localization in realistic mixed LOS/NLOS environments. We make use of the fact that the bias of range measurement errors is not only non-negative but also sparse when LOS dominates, which has been long overlooked i
Federated Neuromorphic Learning Of Spiking Neural Networks For Low-Power Edge Intelligence
Spiking Neural Networks (SNNs) offer a promising alternative to conventional Artificial Neural Networks (ANNs) for the implementation of on-device low-power online learning and inference. On-device training is, however, constrained by the limited amount o
Learning To Estimate Driver Drowsiness From Car Acceleration Sensors Using Weakly Labeled Data
This paper addresses the learning task of estimating driver drowsiness from the signals of car acceleration sensors. Since even drivers themselves cannot perceive their own drowsiness in a timely manner unless they use burdensome invasive sensors, obtaini
The Effect Of Power Allocation On Visible Light Communication Using Commercial Phosphor-Converted LED Lamp For Indirect Illumination
Visible light communication (VLC) systems should be designed to provide illumination and wireless data services simultaneously. To achieve this goal at a reasonable cost, the use of Phosphor-Converted (PC) LEDs for indirect illumination should be favored
Multi-Scale Residual Network For Image Classification
Multi-scale approach representing image objects at various levels-of-details has been applied to various computer vision tasks. Existing image classification approaches place more emphasis on multi-scale convolution kernels, and overlook multi-scale featu
Sparse Modeling On Distributed Encryption Data
Big-data analysis by edge/cloud systems is becoming more important. However, when information may lead to personal identification, such information tends to be encrypted and restricted to its owners to ensure privacy protection. The resulting data is ofte
D2NA: Day-To-Night Adaptation For Vision Based Parking Management System
Recently, smart parking management systems built on deep learning frameworks have achieved promising performance. However, most of them are designed for the day-time. To help these systems work at night also, extra labor-intensive efforts and extra traini
End-To-End Accent Conversion Without Using Native Utterances
Techniques for accent conversion (AC) aim to convert non-native to native accented speech. Conventional AC methods try to convert only the speaker identity of a native speaker's voice to that of the non-native accented target speaker, leaving the underlyi
ADI17: A Fine-Grained Arabic Dialect Identification Dataset
In this paper, we describe a method to collect dialectal speech from YouTube videos to create a large-scale Dialect Identification (DID) dataset. Using this method, we collected dialectal Arabic from known YouTube channels from 17 Arabic speaking countrie
Improving Automated Segmentation Of Radio Shows With Audio Embeddings
Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three dif