Citation

BibTeX format

@inproceedings{McKnight:2022:10.23919/APSIPAASC55919.2022.9979811,
author = {McKnight, S and Hogg, AOT and Neo, VW and Naylor, PA},
doi = {10.23919/APSIPAASC55919.2022.9979811},
pages = {394--401},
publisher = {IEEE},
title = {Studying human-based speaker diarization and comparing to state-of-the-art systems},
url = {http://dx.doi.org/10.23919/APSIPAASC55919.2022.9979811},
year = {2022}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Human-based speaker diarization experiments were carried out on a five-minute extract of a typical AMI corpus meeting to see how much variance there is in human reviews based on hearing only and to compare with state-of-the-art diarization systems on the same extract. There are three distinct experiments: (a) one with no prior information; (b) one with the ground truth speech activity detection (GT-SAD); and (c) one with the blank ground truth labels (GT-labels). The results show that most human reviews tend to be quite similar, albeit with some outliers, but the choice of GT-labels can make a dramatic difference to scored performance. Using the GT-SAD provides a big advantage and improves human review scores substantially, though small differences in the GT-SAD used can have a dramatic effect on results. The use of forgiveness collars is shown to be unhelpful. The results show that state-of-the-art systems can outperform the best human reviews when no prior information is provided. However, the best human reviews still outperform state-of-the-art systems when starting from the GT-SAD.
AU  - McKnight,S
AU  - Hogg,AOT
AU  - Neo,VW
AU  - Naylor,PA
DO  - 10.23919/APSIPAASC55919.2022.9979811
EP  - 401
PB  - IEEE
PY  - 2022///
SP  - 394
TI  - Studying human-based speaker diarization and comparing to state-of-the-art systems
UR  - http://dx.doi.org/10.23919/APSIPAASC55919.2022.9979811
UR  - http://hdl.handle.net/10044/1/103195
ER  - 
