Results
-
Conference paperNeo VW, Evers C, Naylor PA, 2020,
PEVD-based speech enhancement in reverberant environments
, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 186-190. The enhancement of noisy speech is important for applications involving human-to-human interactions, such as telecommunications and hearing aids, as well as human-to-machine interactions, such as voice-controlled systems and robot audition. In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. This is verified using simulations involving actual acoustic impulse responses and noise from the ACE corpus. The simulations show that even without using a noise estimator, our proposed method simultaneously achieves noise reduction and enhancement of speech quality and intelligibility in reverberant environments over a wide range of SNRs. Furthermore, informal listening examples highlight that our approach does not introduce any significant processing artefacts such as musical noise.
-
Journal articleEvers C, Lollmann HW, Mellmann H, et al., 2020,
The LOCATA challenge: acoustic source localization and tracking
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 28, Pages: 1620-1643, ISSN: 2329-9290. The ability to localize and track acoustic events is a fundamental prerequisite for equipping machines with the ability to be aware of and engage with humans in their surrounding environment. However, in realistic scenarios, audio signals are adversely affected by reverberation, noise, interference, and periods of speech inactivity. In dynamic scenarios, where the sources and microphone platforms may be moving, the signals are additionally affected by variations in the source-sensor geometries. In practice, approaches to sound source localization and tracking are often impeded by missing estimates of active sources, estimation errors, as well as false estimates. The aim of the LOCAlization and TrAcking (LOCATA) Challenge is to provide an open-access framework for the objective evaluation and benchmarking of broad classes of algorithms for sound source localization and tracking. This paper provides a review of relevant localization and tracking algorithms and, within the context of the existing literature, a detailed evaluation and dissemination of the LOCATA submissions. The evaluation highlights achievements in the field and open challenges, and identifies potential future directions.
-
Journal articleAntonello N, De Sena E, Moonen M, et al., 2019,
Joint Acoustic Localization and Dereverberation Through Plane Wave Decomposition and Sparse Regularization
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 1893-1905, ISSN: 2329-9290 -
Journal articleHafezi S, Moore AH, Naylor PA, 2019,
Spatial consistency for multiple source direction-of-arrival estimation and source counting.
, Journal of the Acoustical Society of America, Vol: 146, Pages: 4592-4603, ISSN: 0001-4966. A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single-source (SS) DOA estimation in time-frequency (TF) bins for which a SS assumption is valid. Typical SS-validity confidence metrics analyse the validity of the SS assumption over a fixed-size TF region local to the TF bin. The performance of such methods degrades as the number of simultaneously active sources increases, due to the associated decrease in the size of the TF regions where the SS assumption is valid. A SS-validity confidence metric is proposed that exploits a dynamic MS assumption over relatively larger TF regions. The proposed metric first clusters the initial DOA estimates (one per TF bin) and then weights each TF bin using the spatial consistency of its cluster's members as well as the cluster's spread. Distance-based and density-based clustering are employed as two alternative approaches for clustering DOAs. A noise-robust density-based clustering is also used in an evolutionary framework to propose a method for source counting and source direction estimation. Evaluation results based on simulations, and also on real recordings, show that the proposed weighting strategy significantly improves the accuracy of source counting and MS DOA estimation compared to the state-of-the-art.
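The spatial-consistency weighting can be illustrated with a toy sketch: per-bin azimuth estimates are scored by how densely they cluster with other bins' estimates, and low-density bins (where the single-source assumption likely failed) are discarded. The data, neighbourhood radius and density threshold below are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-bin azimuth estimates (degrees): two simulated sources at 40 and 120,
# plus 10 outlier bins where the single-source assumption was violated.
doas = np.concatenate([40 + rng.normal(0, 2, 50),
                       120 + rng.normal(0, 2, 50),
                       rng.uniform(0, 180, 10)])

eps = 5.0                                   # neighbourhood radius in degrees (assumed)
dist = np.abs(doas[:, None] - doas[None, :])
density = (dist < eps).sum(axis=1) - 1      # neighbour count, excluding the bin itself

# Keep only bins whose estimates are spatially consistent with many others.
reliable = doas[density >= 10]
```

Bins retained by the density test lie close to one of the true source directions, while isolated outliers are rejected; in the paper the surviving weights feed the source counting and MS DOA stages.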
-
Conference paperNeo V, Evers C, Naylor P, 2019,
Speech enhancement using polynomial eigenvalue decomposition
, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Publisher: IEEE. Speech enhancement is important for applications such as telecommunications, hearing aids, automatic speech recognition and voice-controlled systems. Enhancement algorithms aim to reduce interfering noise while minimizing any speech distortion. In this work, we propose to use polynomial matrices for speech enhancement in order to exploit the spatial, spectral and temporal correlations between the speech signals received by the microphone array. Polynomial matrices provide the necessary mathematical framework to exploit constructively the spatial correlations within and between sensor pairs, as well as the spectral-temporal correlations of broadband signals such as speech. Specifically, the polynomial eigenvalue decomposition (PEVD) decorrelates simultaneously in space, time and frequency. We then propose a PEVD-based speech enhancement algorithm. Simulations and informal listening examples have shown that our method achieves noise reduction without introducing artefacts into the enhanced signal for white, babble and factory noise conditions from -10 dB to 30 dB SNR.
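To illustrate the kind of quantity a PEVD operates on, the following sketch estimates a space-time covariance matrix R(tau) = E[x(n) x(n-tau)^T] from a toy multichannel signal; its z-transform is the para-Hermitian polynomial matrix that a PEVD algorithm would diagonalise. The data and dimensions here are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_samples, max_lag = 3, 4096, 8

# Toy multichannel signal: white and uncorrelated across channels.
x = rng.standard_normal((n_mics, n_samples))

def space_time_covariance(x, max_lag):
    """Return R[tau] for tau = 0..max_lag, each an (n_mics, n_mics) matrix."""
    n_mics, n = x.shape
    R = np.empty((max_lag + 1, n_mics, n_mics))
    for tau in range(max_lag + 1):
        R[tau] = x[:, tau:] @ x[:, : n - tau].T / (n - tau)
    return R

R = space_time_covariance(x, max_lag)
```

For white, uncorrelated channels the lag-zero matrix is close to identity and higher lags are near zero; correlated broadband signals such as speech populate the off-diagonals and non-zero lags, which is the structure the PEVD exploits.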
-
Conference paperHogg AOT, Evers C, Naylor PA, 2019,
Multiple Hypothesis Tracking for Overlapping Speaker Segmentation
, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE, Pages: 195-199 -
Conference paperSharma D, Hogg AOT, Wang Y, et al., 2019,
Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks
, 2019 27th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1-5 -
Conference paperHogg AOT, Evers C, Naylor PA, 2019,
Speaker Change Detection Using Fundamental Frequency with Application to Multi-talker Segmentation
, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5826-5830 -
Conference paperNeo V, Naylor PA, 2019,
Second order sequential best rotation algorithm with householder reduction for polynomial matrix eigenvalue decomposition
, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 8043-8047, ISSN: 0736-7791. The Second-order Sequential Best Rotation (SBR2) algorithm, used for eigenvalue decomposition (EVD) of the para-Hermitian polynomial matrices typically encountered in wideband signal processing applications such as multichannel Wiener filtering and channel coding, involves a series of delay and rotation operations to achieve diagonalisation. In this paper, we propose the use of Householder transformations to reduce polynomial matrices to tridiagonal form before zeroing the dominant element with a rotation. Similar to performing Householder reduction on conventional matrices, our method enables SBR2 to converge in fewer iterations with smaller-order polynomial matrix factors, because more off-diagonal Frobenius norm (F-norm) can be transferred to the main diagonal at every iteration. A reduction in the number of iterations of 12.35% and a 0.1% improvement in reconstruction error are achievable.
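The conventional-matrix analogue of the proposed reduction can be sketched as follows: Householder reflections drive a real symmetric matrix to tridiagonal form via similarity transforms that preserve its eigenvalues, after which rotations need only zero the subdiagonal. This is standard Householder tridiagonalization on ordinary matrices, not the polynomial-matrix version developed in the paper.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a real symmetric matrix to tridiagonal form by similarity transforms."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                                  # column already reduced
        v = x.copy()
        v[0] += (1.0 if v[0] >= 0 else -1.0) * norm_x  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)     # Householder reflector
        A = H @ A @ H                                 # similarity: eigenvalues preserved
    return A

A = np.array([[4., 1., 2., 2.],
              [1., 3., 0., 1.],
              [2., 0., 5., 1.],
              [2., 1., 1., 2.]])
T = householder_tridiagonalize(A)
```

Each reflector annihilates a whole column below the subdiagonal at once, which is why the subsequent rotation stage has less off-diagonal energy left to transfer.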
-
Journal articleMoore AH, de Haan JM, Pedersen MS, et al., 2019,
Personalized signal-independent beamforming for binaural hearing aids
, Journal of the Acoustical Society of America, Vol: 145, Pages: 2971-2981, ISSN: 0001-4966. The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females, and 4 mannequins. Bilateral and binaural beamformers are designed using each participant's hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically significant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB.
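As a minimal illustration of signal-independent beamformer design from a measured transfer function, the sketch below forms MVDR weights w = R^{-1} d / (d^H R^{-1} d) for a stand-in steering vector d (in the paper this would be derived from an individual's HAHRIRs) and an assumed noise covariance R. Both quantities here are hypothetical toy values, not measured data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_mics = 4

# Stand-in steering vector for the look direction (would come from the HAHRIR).
d = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)

# Assumed noise covariance: identity plus a small diffuse component (toy model).
R = np.eye(n_mics) + 0.1 * np.ones((n_mics, n_mics))

# MVDR weights: minimize output noise power subject to w^H d = 1.
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d.conj() @ Rinv_d)

response = w.conj() @ d   # distortionless constraint: unit gain in look direction
```

A mismatched design corresponds to building w from someone else's d while evaluating against the individual's true d, which breaks the unit-gain constraint and degrades intelligibility.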
-
Journal articleMoore A, Xue W, Naylor P, et al., 2019,
Noise covariance matrix estimation for rotating microphone arrays
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290. The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.
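The core idea, that a parametric model lets the covariance be predicted as a function of array orientation, can be sketched for the simplest case of a free-field point noise source and a uniform linear array: rotating the array by phi is equivalent to evaluating the array manifold at the relative angle theta - phi. The geometry and values below are hypothetical, far simpler than the model in the paper.

```python
import numpy as np

c, f, spacing, n_mics = 343.0, 1000.0, 0.05, 4   # toy free-field ULA at 1 kHz

def steering(angle_rad):
    """Narrowband steering vector of the ULA for a far-field source."""
    delays = np.arange(n_mics) * spacing * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * f * delays)

def noise_cov(source_angle, array_orientation, power=1.0):
    """Predicted noise covariance for a point source at a given array orientation."""
    d = steering(source_angle - array_orientation)
    return power * np.outer(d, d.conj())

R0 = noise_cov(np.deg2rad(60), 0.0)                # estimated before rotation
R_rot = noise_cov(np.deg2rad(60), np.deg2rad(20))  # predicted after a 20 deg rotation
```

Because the orientation enters only through the relative angle, parameters estimated during noise-only segments at one orientation can be reused to update the covariance after the head turns, which is the mechanism the paper exploits.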
-
Journal articleGannot S, Naylor PA, 2019,
Highlights from the Audio and Acoustic Signal Processing Technical Committee [In the Spotlight]
, IEEE Signal Processing Magazine, Vol: 36, ISSN: 1053-5888. The IEEE Audio and Acoustic Signal Processing Technical Committee (AASP TC) is one of 13 TCs in the IEEE Signal Processing Society. Its mission is to support, nourish, and lead scientific and technological development in all areas of AASP. These areas are currently seeing increased levels of interest and significant growth, providing a fertile ground for a broad range of specific and interdisciplinary research and development. Ranging from array processing for microphones and loudspeakers to music genre classification, from psychoacoustics to machine learning (ML), from consumer electronics devices to blue-sky research, this scope encompasses countless technical challenges and many hot topics. The TC has roughly 30 elected volunteer members drawn equally from leading academic and industrial organizations around the world, unified by the common aim of offering their expertise in the service of the scientific community.
-
Conference paperBrookes D, Lightburn L, Moore A, et al., 2019,
Mask-assisted speech enhancement for binaural hearing aids
, ELOBES2019 -
Conference paperMoore A, de Haan JM, Pedersen MS, et al., 2019,
Personalized HRTFs for hearing aids
, ELOBES2019 -
Conference paperXue W, Moore AH, Brookes M, et al., 2018,
Modulation-domain parametric multichannel kalman filtering for speech enhancement
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2509-2513, ISSN: 2076-1465. The goal of speech enhancement is to reduce the noise signal while keeping the speech signal undistorted. Recently we developed multichannel Kalman filtering (MKF) for speech enhancement, in which the temporal evolution of the speech signal and the spatial correlation between multichannel observations are jointly exploited to estimate the clean signal. In this paper, we extend the previous work to derive a parametric MKF (PMKF), which incorporates a controlling factor to achieve a trade-off between speech distortion and noise reduction. The controlling factor weights the speech distortion and noise reduction related terms in the cost function of PMKF, and the optimal PMKF gain is derived based on the minimum mean squared error (MMSE) criterion. We analyse the performance of the proposed PMKF and show how it differs from the speech distortion weighted multichannel Wiener filter (SDW-MWF). We conduct experiments in different noisy conditions to evaluate the impact of the controlling factor on the noise reduction performance, and the results demonstrate the effectiveness of the proposed method.
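The trade-off governed by such a controlling factor can be illustrated with the related SDW-MWF mentioned in the abstract, where w = (Rs + mu*Rn)^{-1} Rs e1 and mu = 1 recovers the standard multichannel Wiener filter; larger mu buys more noise reduction at the cost of more speech distortion. The covariances below are toy values, not speech statistics.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n)); Rs = A @ A.T              # toy speech covariance
B = rng.standard_normal((n, n)); Rn = B @ B.T + np.eye(n)  # toy noise covariance
e1 = np.eye(n)[:, 0]                                       # reference microphone selector

def sdw_mwf(mu):
    """Speech-distortion-weighted MWF: w = (Rs + mu*Rn)^{-1} Rs e1."""
    return np.linalg.solve(Rs + mu * Rn, Rs @ e1)

w_mwf = sdw_mwf(1.0)    # standard multichannel Wiener filter
w_aggr = sdw_mwf(5.0)   # heavier weighting of the noise term

noise_mwf = w_mwf @ Rn @ w_mwf      # residual noise power at the output
noise_aggr = w_aggr @ Rn @ w_aggr
```

Sweeping mu traces out the distortion-versus-noise-reduction curve that the PMKF's controlling factor analogously navigates within the Kalman framework.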
-
Conference paperMoore AH, Lightburn L, Xue W, et al., 2018,
Binaural mask-informed speech enhancement for hearing aids with head tracking
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465. An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this, a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.
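One plausible reading of using a mask as a speech presence probability prior is to blend a Wiener-style gain with a small spectral floor according to the mask value. The sketch below uses hypothetical per-bin numbers and is an illustrative gain rule, not the system's actual enhancer.

```python
import numpy as np

# Hypothetical a priori SNRs and mask values for four TF bins.
snr_prior = np.array([0.1, 1.0, 10.0, 100.0])   # a priori SNR per TF bin
mask = np.array([0.05, 0.4, 0.9, 0.99])         # mask read as speech presence probability

wiener = snr_prior / (1.0 + snr_prior)          # classical Wiener gain
gain_floor = 0.01                               # assumed floor when speech is absent

# Blend: apply the Wiener gain where speech is likely, the floor where it is not.
gain = mask * wiener + (1.0 - mask) * gain_floor
```

Bins the mask marks as speech-dominated keep a gain near the Wiener value, while noise-dominated bins are attenuated towards the floor, which is the qualitative behaviour a speech presence prior induces in an MMSE enhancer.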
-
Conference paperSharma D, Nour-Eldin A, Harding P, et al., 2018,
Robust Feature Extraction from Ad-Hoc Microphones for Meeting Diarization
, 16th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, Pages: 296-300, ISSN: 2639-4316 -
Conference paperEvers C, Loellmann H, Mellmann H, et al., 2018,
LOCATA challenge - evaluation tasks and measures
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE. Sound source localization and tracking algorithms provide estimates of the positional information about active sound sources in acoustic environments. Despite substantial advances and significant interest in the research community, a comprehensive benchmarking campaign of the various approaches using a common database of audio recordings has, to date, not been performed. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) is to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks range from the localization of a single source with a static microphone array to tracking of multiple moving sources with a moving microphone array. This paper provides an overview of the challenge tasks, describes the performance measures used for evaluation of the LOCATA Challenge, and presents baseline results for the development dataset.
-
Conference paperMoore AH, Xue W, Naylor PA, et al., 2018,
Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays
, 52nd Asilomar Conference on Signals, Systems, and Computers, Publisher: IEEE, Pages: 1936-1941, ISSN: 1058-6393 -
Journal articleXue W, Moore A, Brookes DM, et al., 2018,
Modulation-domain multichannel Kalman filtering for speech enhancement
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290. Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain and, by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response (MVDR) beamformer and a single-channel modulation-domain KF, and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.
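The role of the LP model as a Kalman state transition can be seen in a single-channel toy analogue: a first-order LP coefficient predicts the next (modulation-domain) value, and the Kalman gain fuses that prediction with the noisy observation. All parameters below are hypothetical, and this scalar sketch omits the multichannel and STFT-domain machinery of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
a, q, r = 0.9, 0.1, 0.5   # LP coefficient, process noise var, observation noise var
n = 200

# Simulate a true AR(1) "speech" trajectory and its noisy observations.
s = np.zeros(n)
for t in range(1, n):
    s[t] = a * s[t - 1] + rng.normal(0, np.sqrt(q))
y = s + rng.normal(0, np.sqrt(r), n)

# Scalar Kalman filter using the LP model as the state transition.
s_hat, p = np.zeros(n), 1.0
for t in range(1, n):
    s_pred, p_pred = a * s_hat[t - 1], a * a * p + q   # predict with the LP model
    k = p_pred / (p_pred + r)                          # Kalman gain
    s_hat[t] = s_pred + k * (y[t] - s_pred)            # fuse prediction and observation
    p = (1 - k) * p_pred

mse_filter = np.mean((s_hat - s) ** 2)
mse_raw = np.mean((y - s) ** 2)
```

The filtered estimate tracks the clean trajectory more closely than the raw observations, which is the single-channel essence of what the MKF achieves jointly across channels.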
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
Contact us
Address
Speech and Audio Processing Lab
CSP Group, EEE Department
Imperial College London
Exhibition Road, London, SW7 2AZ, United Kingdom