Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut

Wrench, Alan A.; Balch-Tomes, Jonathan

dc.contributor.author	Wrench, Alan A.	en
dc.contributor.author	Balch-Tomes, Jonathan	en
dc.date.accessioned	2022-02-02T13:50:12Z
dc.date.available	2022-02-02T13:50:12Z
dc.date.issued	2022-02-02
dc.description.abstract	Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints; models were trained on them using DeepLabCut and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation among three physical sensor positions and the corresponding estimated keypoints; this result requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for tongue, hyoid, jaw, and lips.	en
dc.description.ispublished	pub
dc.description.number	3	en
dc.description.status	pub
dc.description.uri	https://doi.org/10.3390/s22031133	en
dc.description.volume	22	en
dc.identifier.citation	Wrench, A. and Balch-Tomes, J. (2022) ‘Beyond the edge: markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut’, Sensors, 22(3), p. 1133. Available at: https://doi.org/10.3390/s22031133.	en
dc.identifier.issn	1424-8220	en
dc.identifier.uri	https://doi.org/10.3390/s22031133
dc.identifier.uri	https://eresearch.qmu.ac.uk/handle/20.500.12289/11795
dc.language.iso	en	en
dc.publisher	MDPI	en
dc.relation.ispartof	Sensors	en
dc.rights.license	Creative Commons Attribution License
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Multimodal Speech	en
dc.subject	Lip Reading	en
dc.subject	Ultrasound Tongue Imaging	en
dc.subject	Pose Estimation	en
dc.subject	Speech Kinematics	en
dc.subject	Keypoints	en
dc.subject	Landmarks	en
dc.title	Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut	en
dc.type	Article	en
dcterms.accessRights	public
dcterms.dateAccepted	2022-01-28
qmu.author	Wrench, Alan A.	en
qmu.centre	CASL	en
refterms.accessException	NA	en
refterms.dateDeposit	2022-02-02
refterms.dateFCD	2022-02-02
refterms.depositException	publishedGoldOA	en
refterms.panel	Unspecified	en
refterms.technicalException	NA	en
refterms.version	VoR	en
rioxxterms.publicationdate	2022-02-02
rioxxterms.type	Journal Article/Review	en

Files

Original bundle

Name: 11795.pdf
Size: 4.78 MB
Format: Adobe Portable Document Format
Description: Published Version

Collections

CASL