Results
-
Conference paperNeo VW, Evers C, Naylor PA, 2020,
PEVD-based speech enhancement in reverberant environments
, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 186-190. The enhancement of noisy speech is important for applications involving human-to-human interactions, such as telecommunications and hearing aids, as well as human-to-machine interactions, such as voice-controlled systems and robot audition. In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. This is verified using simulations involving actual acoustic impulse responses and noise from the ACE corpus. The simulations show that even without using a noise estimator, our proposed method simultaneously achieves noise reduction and enhancement of speech quality and intelligibility in reverberant environments over a wide range of SNRs. Furthermore, informal listening examples highlight that our approach does not introduce any significant processing artefacts such as musical noise.
-
Journal articleEvers C, Lollmann HW, Mellmann H, et al., 2020,
The LOCATA challenge: acoustic source localization and tracking
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 28, Pages: 1620-1643, ISSN: 2329-9290. The ability to localize and track acoustic events is a fundamental prerequisite for equipping machines with the ability to be aware of and engage with humans in their surrounding environment. However, in realistic scenarios, audio signals are adversely affected by reverberation, noise, interference, and periods of speech inactivity. In dynamic scenarios, where the sources and microphone platforms may be moving, the signals are additionally affected by variations in the source-sensor geometries. In practice, approaches to sound source localization and tracking are often impeded by missing estimates of active sources, estimation errors, as well as false estimates. The aim of the LOCAlization and TrAcking (LOCATA) Challenge is to provide an open-access framework for the objective evaluation and benchmarking of broad classes of algorithms for sound source localization and tracking. This paper provides a review of relevant localization and tracking algorithms and, within the context of the existing literature, a detailed evaluation and dissemination of the LOCATA submissions. The evaluation highlights achievements in the field and open challenges, and identifies potential future directions.
-
Journal articleAntonello N, De Sena E, Moonen M, et al., 2019,
Joint Acoustic Localization and Dereverberation Through Plane Wave Decomposition and Sparse Regularization
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 1893-1905, ISSN: 2329-9290 -
Journal articleHafezi S, Moore AH, Naylor PA, 2019,
Spatial consistency for multiple source direction-of-arrival estimation and source counting.
, Journal of the Acoustical Society of America, Vol: 146, Pages: 4592-4603, ISSN: 0001-4966. A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single-source (SS) DOA estimation in time-frequency (TF) bins for which a SS assumption is valid. Typical SS-validity confidence metrics analyse the validity of the SS assumption over a fixed-size TF region local to the TF bin. The performance of such methods degrades as the number of simultaneously active sources increases, due to the associated decrease in the size of the TF regions where the SS assumption is valid. A SS-validity confidence metric is proposed that exploits a dynamic MS assumption over relatively larger TF regions. The proposed metric first clusters the initial DOA estimates (one per TF bin) and then weights each TF bin using the spatial consistency of its cluster's members as well as the cluster's spread. Distance-based and density-based clustering are employed as two alternative approaches for clustering DOAs. A noise-robust density-based clustering is also used in an evolutionary framework to propose a method for source counting and source direction estimation. Evaluation results based on simulations, and also on real recordings, show that the proposed weighting strategy significantly improves the accuracy of source counting and MS DOA estimation compared to the state-of-the-art.
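The spatial-consistency weighting can be illustrated with a toy sketch: per-bin azimuth estimates are scored by how densely they cluster with other bins' estimates, and low-density bins (where the single-source assumption likely failed) are discarded. The data, neighbourhood radius and density threshold below are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-bin azimuth estimates (degrees): two simulated sources at 40 and 120,
# plus 10 outlier bins where the single-source assumption was violated.
doas = np.concatenate([40 + rng.normal(0, 2, 50),
                       120 + rng.normal(0, 2, 50),
                       rng.uniform(0, 180, 10)])

eps = 5.0                                   # neighbourhood radius in degrees (assumed)
dist = np.abs(doas[:, None] - doas[None, :])
density = (dist < eps).sum(axis=1) - 1      # neighbour count, excluding the bin itself

# Keep only bins whose estimates are spatially consistent with many others.
reliable = doas[density >= 10]
```

Bins retained by the density test lie close to one of the true source directions, while isolated outliers are rejected; in the paper the surviving weights feed the source counting and MS DOA stages.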
-
Conference paperNeo V, Evers C, Naylor P, 2019,
Speech enhancement using polynomial eigenvalue decomposition
, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Publisher: IEEE. Speech enhancement is important for applications such as telecommunications, hearing aids, automatic speech recognition and voice-controlled systems. Enhancement algorithms aim to reduce interfering noise while minimizing any speech distortion. In this work, we propose to use polynomial matrices for speech enhancement in order to exploit the spatial, spectral and temporal correlations between the speech signals received by the microphone array. Polynomial matrices provide the necessary mathematical framework to exploit constructively the spatial correlations within and between sensor pairs, as well as the spectral-temporal correlations of broadband signals such as speech. Specifically, the polynomial eigenvalue decomposition (PEVD) decorrelates simultaneously in space, time and frequency. We then propose a PEVD-based speech enhancement algorithm. Simulations and informal listening examples have shown that our method achieves noise reduction without introducing artefacts into the enhanced signal for white, babble and factory noise conditions from -10 dB to 30 dB SNR.
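To illustrate the kind of quantity a PEVD operates on, the following sketch estimates a space-time covariance matrix R(tau) = E[x(n) x(n-tau)^T] from a toy multichannel signal; its z-transform is the para-Hermitian polynomial matrix that a PEVD algorithm would diagonalise. The data and dimensions here are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_samples, max_lag = 3, 4096, 8

# Toy multichannel signal: white and uncorrelated across channels.
x = rng.standard_normal((n_mics, n_samples))

def space_time_covariance(x, max_lag):
    """Return R[tau] for tau = 0..max_lag, each an (n_mics, n_mics) matrix."""
    n_mics, n = x.shape
    R = np.empty((max_lag + 1, n_mics, n_mics))
    for tau in range(max_lag + 1):
        R[tau] = x[:, tau:] @ x[:, : n - tau].T / (n - tau)
    return R

R = space_time_covariance(x, max_lag)
```

For white, uncorrelated channels the lag-zero matrix is close to identity and higher lags are near zero; correlated broadband signals such as speech populate the off-diagonals and non-zero lags, which is the structure the PEVD exploits.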
-
Conference paperHogg AOT, Evers C, Naylor PA, 2019,
Multiple Hypothesis Tracking for Overlapping Speaker Segmentation
, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE, Pages: 195-199 -
Conference paperSharma D, Hogg AOT, Wang Y, et al., 2019,
Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks
, 2019 27th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1-5 -
Conference paperHogg AOT, Evers C, Naylor PA, 2019,
Speaker Change Detection Using Fundamental Frequency with Application to Multi-talker Segmentation
, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5826-5830 -
Conference paperNeo V, Naylor PA, 2019,
Second order sequential best rotation algorithm with householder reduction for polynomial matrix eigenvalue decomposition
, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 8043-8047, ISSN: 0736-7791. The Second-order Sequential Best Rotation (SBR2) algorithm, used for eigenvalue decomposition (EVD) of the para-Hermitian polynomial matrices typically encountered in wideband signal processing applications such as multichannel Wiener filtering and channel coding, involves a series of delay and rotation operations to achieve diagonalisation. In this paper, we propose the use of Householder transformations to reduce polynomial matrices to tridiagonal form before zeroing the dominant element with a rotation. Similar to performing Householder reduction on conventional matrices, our method enables SBR2 to converge in fewer iterations with smaller-order polynomial matrix factors, because more off-diagonal Frobenius norm (F-norm) can be transferred to the main diagonal at every iteration. A reduction in the number of iterations of 12.35% and a 0.1% improvement in reconstruction error are achievable.
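The conventional-matrix analogue of the proposed reduction can be sketched as follows: Householder reflections drive a real symmetric matrix to tridiagonal form via similarity transforms that preserve its eigenvalues, after which rotations need only zero the subdiagonal. This is standard Householder tridiagonalization on ordinary matrices, not the polynomial-matrix version developed in the paper.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a real symmetric matrix to tridiagonal form by similarity transforms."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                                  # column already reduced
        v = x.copy()
        v[0] += (1.0 if v[0] >= 0 else -1.0) * norm_x  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)     # Householder reflector
        A = H @ A @ H                                 # similarity: eigenvalues preserved
    return A

A = np.array([[4., 1., 2., 2.],
              [1., 3., 0., 1.],
              [2., 0., 5., 1.],
              [2., 1., 1., 2.]])
T = householder_tridiagonalize(A)
```

Each reflector annihilates a whole column below the subdiagonal at once, which is why the subsequent rotation stage has less off-diagonal energy left to transfer.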
-
Journal articleMoore AH, de Haan JM, Pedersen MS, et al., 2019,
Personalized signal-independent beamforming for binaural hearing aids
, Journal of the Acoustical Society of America, Vol: 145, Pages: 2971-2981, ISSN: 0001-4966. The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females, and 4 mannequins. Bilateral and binaural beamformers are designed using each participant's hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically significant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB.
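As a minimal illustration of signal-independent beamformer design from a measured transfer function, the sketch below forms MVDR weights w = R^{-1} d / (d^H R^{-1} d) for a stand-in steering vector d (in the paper this would be derived from an individual's HAHRIRs) and an assumed noise covariance R. Both quantities here are hypothetical toy values, not measured data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_mics = 4

# Stand-in steering vector for the look direction (would come from the HAHRIR).
d = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)

# Assumed noise covariance: identity plus a small diffuse component (toy model).
R = np.eye(n_mics) + 0.1 * np.ones((n_mics, n_mics))

# MVDR weights: minimize output noise power subject to w^H d = 1.
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d.conj() @ Rinv_d)

response = w.conj() @ d   # distortionless constraint: unit gain in look direction
```

A mismatched design corresponds to building w from someone else's d while evaluating against the individual's true d, which breaks the unit-gain constraint and degrades intelligibility.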
-
Journal articleMoore A, Xue W, Naylor P, et al., 2019,
Noise covariance matrix estimation for rotating microphone arrays
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290. The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.
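The core idea, that a parametric model lets the covariance be predicted as a function of array orientation, can be sketched for the simplest case of a free-field point noise source and a uniform linear array: rotating the array by phi is equivalent to evaluating the array manifold at the relative angle theta - phi. The geometry and values below are hypothetical, far simpler than the model in the paper.

```python
import numpy as np

c, f, spacing, n_mics = 343.0, 1000.0, 0.05, 4   # toy free-field ULA at 1 kHz

def steering(angle_rad):
    """Narrowband steering vector of the ULA for a far-field source."""
    delays = np.arange(n_mics) * spacing * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * f * delays)

def noise_cov(source_angle, array_orientation, power=1.0):
    """Predicted noise covariance for a point source at a given array orientation."""
    d = steering(source_angle - array_orientation)
    return power * np.outer(d, d.conj())

R0 = noise_cov(np.deg2rad(60), 0.0)                # estimated before rotation
R_rot = noise_cov(np.deg2rad(60), np.deg2rad(20))  # predicted after a 20 deg rotation
```

Because the orientation enters only through the relative angle, parameters estimated during noise-only segments at one orientation can be reused to update the covariance after the head turns, which is the mechanism the paper exploits.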
-
Journal articleGannot S, Naylor PA, 2019,
Highlights from the Audio and Acoustic Signal Processing Technical Committee [In the Spotlight]
, IEEE Signal Processing Magazine, Vol: 36, ISSN: 1053-5888. The IEEE Audio and Acoustic Signal Processing Technical Committee (AASP TC) is one of 13 TCs in the IEEE Signal Processing Society. Its mission is to support, nourish, and lead scientific and technological development in all areas of AASP. These areas are currently seeing increased levels of interest and significant growth, providing a fertile ground for a broad range of specific and interdisciplinary research and development. Ranging from array processing for microphones and loudspeakers to music genre classification, from psychoacoustics to machine learning (ML), from consumer electronics devices to blue-sky research, this scope encompasses countless technical challenges and many hot topics. The TC has roughly 30 elected volunteer members drawn equally from leading academic and industrial organizations around the world, unified by the common aim of offering their expertise in the service of the scientific community.
-
Conference paperBrookes D, Lightburn L, Moore A, et al., 2019,
Mask-assisted speech enhancement for binaural hearing aids
, ELOBES2019 -
Conference paperMoore A, de Haan JM, Pedersen MS, et al., 2019,
Personalized HRTFs for hearing aids
, ELOBES2019 -
Conference paperXue W, Moore AH, Brookes M, et al., 2018,
Modulation-domain parametric multichannel kalman filtering for speech enhancement
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2509-2513, ISSN: 2076-1465. The goal of speech enhancement is to reduce the noise signal while keeping the speech signal undistorted. Recently we developed multichannel Kalman filtering (MKF) for speech enhancement, in which the temporal evolution of the speech signal and the spatial correlation between multichannel observations are jointly exploited to estimate the clean signal. In this paper, we extend the previous work to derive a parametric MKF (PMKF), which incorporates a controlling factor to achieve a trade-off between speech distortion and noise reduction. The controlling factor weights the speech distortion and noise reduction related terms in the cost function of PMKF, and the optimal PMKF gain is derived based on the minimum mean squared error (MMSE) criterion. We analyse the performance of the proposed PMKF and show how it differs from the speech distortion weighted multichannel Wiener filter (SDW-MWF). We conduct experiments in different noisy conditions to evaluate the impact of the controlling factor on the noise reduction performance, and the results demonstrate the effectiveness of the proposed method.
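The trade-off governed by such a controlling factor can be illustrated with the related SDW-MWF mentioned in the abstract, where w = (Rs + mu*Rn)^{-1} Rs e1 and mu = 1 recovers the standard multichannel Wiener filter; larger mu buys more noise reduction at the cost of more speech distortion. The covariances below are toy values, not speech statistics.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n)); Rs = A @ A.T              # toy speech covariance
B = rng.standard_normal((n, n)); Rn = B @ B.T + np.eye(n)  # toy noise covariance
e1 = np.eye(n)[:, 0]                                       # reference microphone selector

def sdw_mwf(mu):
    """Speech-distortion-weighted MWF: w = (Rs + mu*Rn)^{-1} Rs e1."""
    return np.linalg.solve(Rs + mu * Rn, Rs @ e1)

w_mwf = sdw_mwf(1.0)    # standard multichannel Wiener filter
w_aggr = sdw_mwf(5.0)   # heavier weighting of the noise term

noise_mwf = w_mwf @ Rn @ w_mwf      # residual noise power at the output
noise_aggr = w_aggr @ Rn @ w_aggr
```

Sweeping mu traces out the distortion-versus-noise-reduction curve that the PMKF's controlling factor analogously navigates within the Kalman framework.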
-
Conference paperMoore AH, Lightburn L, Xue W, et al., 2018,
Binaural mask-informed speech enhancement for hearing aids with head tracking
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465. An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this, a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.
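One plausible reading of using a mask as a speech presence probability prior is to blend a Wiener-style gain with a small spectral floor according to the mask value. The sketch below uses hypothetical per-bin numbers and is an illustrative gain rule, not the system's actual enhancer.

```python
import numpy as np

# Hypothetical a priori SNRs and mask values for four TF bins.
snr_prior = np.array([0.1, 1.0, 10.0, 100.0])   # a priori SNR per TF bin
mask = np.array([0.05, 0.4, 0.9, 0.99])         # mask read as speech presence probability

wiener = snr_prior / (1.0 + snr_prior)          # classical Wiener gain
gain_floor = 0.01                               # assumed floor when speech is absent

# Blend: apply the Wiener gain where speech is likely, the floor where it is not.
gain = mask * wiener + (1.0 - mask) * gain_floor
```

Bins the mask marks as speech-dominated keep a gain near the Wiener value, while noise-dominated bins are attenuated towards the floor, which is the qualitative behaviour a speech presence prior induces in an MMSE enhancer.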
-
Conference paperSharma D, Nour-Eldin A, Harding P, et al., 2018,
Robust Feature Extraction from Ad-Hoc Microphones for Meeting Diarization
, 16th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, Pages: 296-300, ISSN: 2639-4316 -
Conference paperEvers C, Loellmann H, Mellmann H, et al., 2018,
LOCATA challenge - evaluation tasks and measures
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE. Sound source localization and tracking algorithms provide estimates of the positional information about active sound sources in acoustic environments. Despite substantial advances and significant interest in the research community, a comprehensive benchmarking campaign of the various approaches using a common database of audio recordings has, to date, not been performed. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) is to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks range from the localization of a single source with a static microphone array to tracking of multiple moving sources with a moving microphone array. This paper provides an overview of the challenge tasks, describes the performance measures used for evaluation of the LOCATA Challenge, and presents baseline results for the development dataset.
-
Conference paperMoore AH, Xue W, Naylor PA, et al., 2018,
Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays
, 52nd Asilomar Conference on Signals, Systems, and Computers, Publisher: IEEE, Pages: 1936-1941, ISSN: 1058-6393 -
Journal articleXue W, Moore A, Brookes DM, et al., 2018,
Modulation-domain multichannel Kalman filtering for speech enhancement
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290. Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain and, by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response (MVDR) beamformer and a single-channel modulation-domain KF, and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.
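The role of the LP model as a Kalman state transition can be seen in a single-channel toy analogue: a first-order LP coefficient predicts the next (modulation-domain) value, and the Kalman gain fuses that prediction with the noisy observation. All parameters below are hypothetical, and this scalar sketch omits the multichannel and STFT-domain machinery of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
a, q, r = 0.9, 0.1, 0.5   # LP coefficient, process noise var, observation noise var
n = 200

# Simulate a true AR(1) "speech" trajectory and its noisy observations.
s = np.zeros(n)
for t in range(1, n):
    s[t] = a * s[t - 1] + rng.normal(0, np.sqrt(q))
y = s + rng.normal(0, np.sqrt(r), n)

# Scalar Kalman filter using the LP model as the state transition.
s_hat, p = np.zeros(n), 1.0
for t in range(1, n):
    s_pred, p_pred = a * s_hat[t - 1], a * a * p + q   # predict with the LP model
    k = p_pred / (p_pred + r)                          # Kalman gain
    s_hat[t] = s_pred + k * (y[t] - s_pred)            # fuse prediction and observation
    p = (1 - k) * p_pred

mse_filter = np.mean((s_hat - s) ** 2)
mse_raw = np.mean((y - s) ** 2)
```

The filtered estimate tracks the clean trajectory more closely than the raw observations, which is the single-channel essence of what the MKF achieves jointly across channels.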
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
Contact us
Address
Speech and Audio Processing Lab
CSP Group, EEE Department
Imperial College London
Exhibition Road, London, SW7 2AZ, United Kingdom