Comparison of forced-alignment speech recognition and humans for generating reference VAD
Date
2015
Author
Kraljevski, I.
Tan, Z-H
Bissiri, Maria Paola
Citation
Kraljevski, I., Tan, Z. & Bissiri, M. (2015) Comparison of forced-alignment speech recognition and humans for generating reference VAD, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 2937-2941.
Abstract
This paper asks whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between the automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.