Publications

Conference paper

Yiallourides C, Moore AH, Auvinet E, Van der Straeten C, Naylor PA et al., 2018,

Acoustic Analysis and Assessment of the Knee in Osteoarthritis During Walking

, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 281-285

We examine the relation between the sounds emitted by the knee joint during walking and its condition, with particular focus on osteoarthritis, and investigate their potential for noninvasive detection of knee pathology. We present a comparative analysis of several features and evaluate their discriminant power for the task of normal-abnormal signal classification. We statistically evaluate the feature distributions using the two-sample Kolmogorov-Smirnov test and the Bhattacharyya distance. We propose the use of 11 statistics to describe the distributions and test with several classifiers. In our experiments with 249 normal and 297 abnormal acoustic signals from 40 knees, a Support Vector Machine with linear kernel gave the best results with an error rate of 13.9%.
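As a rough illustration of the two separability measures named in this abstract, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and a Bhattacharyya distance in plain Python. The function names and the histogram-based form of the Bhattacharyya distance are assumptions made for illustration; the paper's own feature pipeline is not described here.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    # The gap can only change at observed sample values, so check those.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions
    defined over the same bins (each assumed to sum to 1)."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # no overlap at all between the distributions
    return -math.log(coefficient)


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3], [4, 5, 6]))
    print(bhattacharyya_distance([0.5, 0.5], [0.9, 0.1]))
```

Both measures grow as the normal and abnormal feature distributions separate, which is why they are natural candidates for ranking features by discriminant power.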

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018,

Multichannel Kalman filtering for speech enhancement

, IEEE Intl Conf on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 2379-190X

The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.
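The claim that the filter reduces to a Wiener filter when temporal information is discarded can be seen in a scalar, single-channel analogue. The sketch below is not the paper's multichannel derivation; it is a minimal example with hypothetical function names, assuming a zero-mean speech coefficient in additive noise.

```python
def kalman_update(pred_mean, pred_var, obs, noise_var):
    """One scalar Kalman measurement update for obs = state + noise."""
    gain = pred_var / (pred_var + noise_var)
    post_mean = pred_mean + gain * (obs - pred_mean)
    post_var = (1.0 - gain) * pred_var
    return post_mean, post_var, gain


def wiener_gain(speech_var, noise_var):
    """Classical scalar Wiener gain for a zero-mean signal in noise."""
    return speech_var / (speech_var + noise_var)


if __name__ == "__main__":
    speech_var, noise_var, obs = 4.0, 1.0, 2.0
    # With no temporal prediction, the prior is just the long-term speech
    # statistics: zero mean and variance speech_var.
    mean, var, gain = kalman_update(0.0, speech_var, obs, noise_var)
    print(gain, wiener_gain(speech_var, noise_var), mean)
```

When a temporal model supplies a sharper prediction (a non-zero `pred_mean` and a smaller `pred_var`), the update leans more on the prediction, which is where the temporal correlation of speech enters.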

Conference paper

Moore AH, Naylor P, Brookes DM, 2018,

Room identification using frequency dependence of spectral decay statistics

, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers Inc., Pages: 6902-6906, ISSN: 0736-7791

A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79% correct identification in the most demanding condition compared to 40% using unenhanced signals.

Conference paper

Antonello N, De Sena E, Moonen M, Naylor PA, van Waterschoot M et al., 2018,

JOINT SOURCE LOCALIZATION AND DEREVERBERATION BY SOUND FIELD INTERPOLATION USING SPARSE REGULARIZATION

, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6892-6896

In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem consists of a large-scale optimization problem that is solved using a first order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.

Journal article

Evers C, Naylor PA, 2018,

Acoustic SLAM

, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1484-1498, ISSN: 2329-9290

An algorithm is presented that enables devices equipped with microphones, such as robots, to move within their environment in order to explore, adapt to and interact with sound sources of interest. Acoustic scene mapping creates a 3D representation of the positional information of sound sources across time and space. In practice, positional source information is only provided by Direction-of-Arrival (DoA) estimates of the source directions; the source-sensor range is typically difficult to obtain. DoA estimates are also adversely affected by reverberation, noise, and interference, leading to errors in source location estimation and consequent false DoA estimates. Moreover, many acoustic sources, such as human talkers, are not continuously active, such that periods of inactivity lead to missing DoA estimates. Withal, the DoA estimates are specified relative to the observer's sensor location and orientation. Accurate positional information about the observer therefore is crucial. This paper proposes Acoustic Simultaneous Localization and Mapping (aSLAM), which uses acoustic signals to simultaneously map the 3D positions of multiple sound sources whilst passively localizing the observer within the scene map. The performance of aSLAM is analyzed and evaluated using a series of realistic simulations. Results are presented to show the impact of the observer motion and sound source localization accuracy.

Journal article

Evers C, Habets EAP, Gannot S, Naylor PA et al., 2018,

DoA reliability for distributed acoustic tracking

, IEEE Signal Processing Letters, Vol: 25, Pages: 1320-1324, ISSN: 1070-9908

Distributed acoustic tracking estimates the trajectories of source positions using an acoustic sensor network. As it is often difficult to estimate the source-sensor range from individual nodes, the source positions have to be inferred from Direction of Arrival (DoA) estimates. Due to reverberation and noise, the sound field becomes increasingly diffuse with increasing source-sensor distance, leading to decreased DoA estimation accuracy. To distinguish between accurate and uncertain DoA estimates, this paper proposes to incorporate the Coherent-to-Diffuse Ratio as a measure of DoA reliability for single-source tracking. It is shown that the source positions therefore can be probabilistically triangulated by exploiting the spatial diversity of all nodes.
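To make the idea of reliability-weighted triangulation concrete, the sketch below fuses 2D bearings from several nodes by weighted least squares. It is a deterministic stand-in, not the probabilistic tracker the paper proposes; the function name and the use of a per-node weight in [0, 1] (for example derived from a Coherent-to-Diffuse Ratio) are assumptions for illustration.

```python
import math


def triangulate(nodes, bearings, weights):
    """Weighted least-squares intersection of 2D bearing lines.

    nodes:    list of (x, y) sensor positions
    bearings: DoA angle at each node, in radians from the x-axis
    weights:  hypothetical reliability weight for each node
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta, w in zip(nodes, bearings, weights):
        # Unit normal to the bearing line; the squared distance of a point
        # to the line is (n . (x - p))^2.
        nx, ny = -math.sin(theta), math.cos(theta)
        a11 += w * nx * nx
        a12 += w * nx * ny
        a22 += w * ny * ny
        proj = nx * px + ny * py
        b1 += w * nx * proj
        b2 += w * ny * proj
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position not observable")
    # Solve the 2x2 normal equations by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (2.0, 0.0)]
    bearings = [math.pi / 4, 3 * math.pi / 4]
    print(triangulate(nodes, bearings, [1.0, 1.0]))
```

Lowering a node's weight reduces its pull on the estimate, which mirrors the intent of down-weighting DoA estimates taken in a diffuse sound field. Note that this line-based formulation does not check that the solution lies in front of each sensor.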

Conference paper

Dawson PJ, De Sena E, Naylor PA, 2018,

An acoustic image-source characterisation of surface profiles

, 2018 26th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2130-2134, ISSN: 2076-1465

The image-source method models the specular reflection from a plane by means of a secondary source positioned at the source's reflected image. The method has been widely used in acoustics to model the reverberant field of rectangular rooms, but can also be used for general-shaped rooms and non-flat reflectors. This paper explores the relationship between the physical properties of a non-flat reflector and the statistical properties of the associated cloud of image-sources. It is shown here that the standard deviation of the image-sources is strongly correlated with the ratio between depth and width of the reflector's spatial features.
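The basic geometric step of the image-source method, mirroring a source across a reflecting plane, can be sketched as follows. The function name and the point-plus-normal plane representation are choices made for this illustration; for a non-flat reflector one would apply the same step to each planar facet to obtain the cloud of image-sources.

```python
def image_source(source, plane_point, plane_normal):
    """Mirror a 3D point source across a plane.

    The plane is given by any point on it and a (not necessarily unit) normal.
    """
    norm = sum(n * n for n in plane_normal) ** 0.5
    unit = [n / norm for n in plane_normal]
    # Signed distance from the source to the plane along the unit normal.
    dist = sum((s - p) * u for s, p, u in zip(source, plane_point, unit))
    # Move the source twice that distance back through the plane.
    return tuple(s - 2.0 * dist * u for s, u in zip(source, unit))


if __name__ == "__main__":
    # A source 3 m above a floor at z = 0 has its image 3 m below the floor.
    print(image_source((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```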

Conference paper

Löllmann HW, Evers C, Schmidt A, Mellmann H, Barfuss H, Naylor PA, Kellermann W et al., 2018,

The LOCATA challenge data corpus for acoustic source localization and tracking

, IEEE Sensor Array and Multichannel Signal Processing Workshop 2018, Publisher: IEEE, ISSN: 2151-870X

Algorithms for acoustic source localization and tracking are essential for a wide range of applications such as personal assistants, smart homes, tele-conferencing systems, hearing aids, or autonomous systems. Numerous algorithms have been proposed for this purpose which, however, are not evaluated and compared against each other by using a common database so far. The IEEE-AASP Challenge on sound source localization and tracking (LOCATA) provides a novel, comprehensive data corpus for the objective benchmarking of state-of-the-art algorithms on sound source localization and tracking. The data corpus comprises six tasks ranging from the localization of a single static sound source with a static microphone array to the tracking of multiple moving speakers with a moving microphone array. It contains real-world multichannel audio recordings, obtained by hearing aids, microphones integrated in a robot head, a planar and a spherical microphone array in an enclosed acoustic environment as well as positional information about the involved arrays and sound sources represented by moving human talkers or static loudspeakers.

Conference paper

Hafezi S, Moore AH, Naylor PA, 2018,

ROBUST SOURCE COUNTING AND ACOUSTIC DOA ESTIMATION USING DENSITY-BASED CLUSTERING

, 10th IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Publisher: IEEE, Pages: 395-399, ISSN: 1551-2282

Journal article

Evers C, Naylor PA, 2018,

Optimized self-localization for SLAM in dynamic scenes using probability hypothesis density filters

, IEEE Transactions on Signal Processing, Vol: 66, Pages: 863-878, ISSN: 1053-587X

In many applications, sensors that map the positions of objects in unknown environments are installed on dynamic platforms. As measurements are relative to the observer's sensors, scene mapping requires accurate knowledge of the observer state. However, in practice, observer reports are subject to positioning errors. Simultaneous localization and mapping addresses the joint estimation problem of observer localization and scene mapping. State-of-the-art approaches typically use visual or optical sensors and therefore rely on static beacons in the environment to anchor the observer estimate. However, many applications involving sensors that are not conventionally used for Simultaneous Localization and Mapping (SLAM) are affected by highly dynamic scenes, such that the static world assumption is invalid. This paper proposes a novel approach for dynamic scenes, called GEneralized Motion (GEM) SLAM. Based on probability hypothesis density filters, the proposed approach probabilistically anchors the observer state by fusing observer information inferred from the scene with reports of the observer motion. This paper derives the general, theoretical framework for GEM-SLAM, and shows that it generalizes existing Probability Hypothesis Density (PHD)-based SLAM algorithms. Simulations for a model-specific realization using range-bearing sensors and multiple moving objects highlight that GEM-SLAM achieves significant improvements over three benchmark algorithms.

Conference paper

Galand M, 2018,

Foreword

, Publisher: REVUE BELGE PHILOLOGIE HISTOIRE, Pages: 5-+, ISSN: 0035-0818

Journal article

De Sena E, Brookes DM, Naylor PA, van Waterschoot T et al., 2017,

Localization Experiments with Reporting by Head Orientation: Statistical Framework and Case Study

, Journal of the Audio Engineering Society, Vol: 65, Pages: 982-996, ISSN: 0004-7554

This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.

Conference paper

Weiss S, Goddard NJ, Somasundaram S, Proudler IK, Naylor PA et al., 2017,

Identification of Broadband Source-Array Responses from Sensor Second Order Statistics

, Sensor Signal Processing for Defence Conference (SSPD), Publisher: IEEE, Pages: 35-39

This paper addresses the identification of source-sensor transfer functions from the measured space-time covariance matrix in the absence of any further side information about the source or the propagation environment. Using polynomial matrix decomposition techniques, the responses can be narrowed down to an indeterminacy of a common polynomial factor. If at least two different measurements for a source with constant power spectral density are available, this indeterminacy can be reduced to an ambiguity in the phase response of the source-sensor paths.

Conference paper

Papayiannis C, Evers C, Naylor PA, 2017,

Sparse parametric modeling of the early part of acoustic impulse responses

, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 678-682, ISSN: 2076-1465

Acoustic channels are typically described by their Acoustic Impulse Response (AIR) as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, describing discrete reflections and the diffuse reverberation tail respectively. We propose an approach for constructing a sparse parametric model for the early part. The model aims at reducing the number of parameters needed to represent it and subsequently reconstruct from the representation the MA coefficients that describe it. It consists of a representation of the reflections arriving at the receiver as delayed copies of an excitation signal. The Time-Of-Arrivals of reflections are not restricted to integer sample instances and a dynamically estimated model for the excitation sound is used. We also present a corresponding parameter estimation method, which is based on regularized-regression and nonlinear optimization. The proposed method also serves as an analysis tool, since estimated parameters can be used for the estimation of room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the AIR coefficient reconstruction-error energy does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reduction figures exceeding 90% when compared to a MA process representation.

Conference paper

Hafezi S, Moore AH, Naylor PA, 2017,

Multiple DOA estimation based on estimation consistency and spherical harmonic multiple signal classification

, European Signal Processing Conference, EUSIPCO 2017, Pages: 1240-1244

A common approach to multiple Direction-of-Arrival (DOA) estimation of speech sources is to identify Time-Frequency (TF) bins with dominant Single Source (SS) and apply DOA estimation such as Multiple Signal Classification (MUSIC) only on those TF bins. In the state-of-the-art Direct Path Dominance (DPD)-MUSIC, the covariance matrix, used as the input to MUSIC, is calculated using only the TF bins over a local TF region where only a SS is dominant. In this work, we propose an alternative approach to MUSIC in which all the SS-dominant TF bins for each speaker across TF domain are globally used to improve the quality of covariance matrix for MUSIC. Our recently proposed Multi-Source Estimation Consistency (MSEC) technique, which exploits the consistency of initial DOA estimates within a time frame based on adaptive clustering, is used to estimate the SS-dominant TF bins for each speaker. The simulation using spherical microphone array shows that our proposed MSEC-MUSIC significantly outperforms the state-of-the-art DPD-MUSIC with less than 6.5° mean estimation error and strong robustness to widely varying source separation for up to 5 sources in the presence of realistic reverberation and sensor noise.

Journal article

Antonello N, De Sena E, Moonen M, Naylor PA, Van Waterschoot T et al., 2017,

Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field

, IEEE/ACM Transactions on Audio Speech and Language Processing, Vol: 25, Pages: 1929-1941, ISSN: 2329-9290

Room Impulse Responses (RIRs) are typically measured using a set of microphones and a loudspeaker. When RIRs spanning a large volume are needed, many microphone measurements must be used to spatially sample the sound field. In order to reduce the number of microphone measurements, RIRs can be spatially interpolated. In the present study, RIR interpolation is formulated as an inverse problem. This inverse problem relies on a particular acoustic model capable of representing the measurements. Two different acoustic models are compared: the plane wave decomposition model and a novel time-domain model, which consists of a collection of equivalent sources creating spherical waves. These acoustic models can both approximate any reverberant sound field created by a far-field sound source. In order to produce an accurate RIR interpolation, sparsity regularization is employed when solving the inverse problem. In particular, by combining different acoustic models with different sparsity promoting regularizations, spatial sparsity, spatio-spectral sparsity, and spatio-temporal sparsity are compared. The inverse problem is solved using a matrix-free large-scale optimization algorithm. Simulations show that the best RIR interpolation is obtained when combining the novel time-domain acoustic model with the spatio-temporal sparsity regularization, outperforming the results of the plane wave decomposition model even when far fewer microphone measurements are available.

Conference paper

Parada PP, Sharma D, van Waterschoot T, Naylor PA et al., 2017,

Robust Statistical Processing of TDOA Estimates for Distant Speaker Diarization

, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 86-90, ISSN: 2076-1465

Conference paper

Sharma D, Jost U, Naylor PA, 2017,

Non-Intrusive Bit-Rate Detection of Coded Speech

, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1799-1803, ISSN: 2076-1465

Journal article

Hafezi S, Moore AH, Naylor PA, 2017,

Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain

, IEEE Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 1956-1968, ISSN: 1558-7916

Pseudointensity vectors (PIVs) provide a means of direction of arrival (DOA) estimation for spherical microphone arrays using only the zeroth and the first-order spherical harmonics. An augmented intensity vector (AIV) is proposed which improves the accuracy of PIVs by exploiting higher order spherical harmonics. We compared DOA estimation using our proposed AIVs against PIVs, steered response power (SRP) and subspace methods where the number of sources, their angular separation, the reverberation time of the room and the sensor noise level are varied. The results show that the proposed approach outperforms the baseline methods and performs at least as accurately as the state-of-the-art method with strong robustness to reverberation, sensor noise, and number of sources. In the single and multiple source scenarios tested, which include realistic levels of reverberation and noise, the proposed method had average error of 1.5° and 2°, respectively.

Report

Eaton DJ, Gaubitch ND, Moore AH, Naylor PA et al., 2017,

Acoustic Characterization of Environments (ACE) Challenge Results Technical Report

, Publisher: arXiv

This document provides supplementary information, and the results of the tests of acoustic parameter estimation algorithms on the Acoustic Characterization of Environments (ACE) Challenge Evaluation dataset which were subsequently submitted and written up into papers for the Proceedings of the ACE Challenge [2]. This document is supporting material for a forthcoming journal paper on the ACE Challenge which will provide further analysis of the results.
