Show simple item record

dc.rights.license: Creative Commons Attribution License
dc.contributor.author: Wrench, Alan A.
dc.contributor.author: Balch-Tomes, Jonathan
dc.date.accessioned: 2022-02-02T13:50:12Z
dc.date.available: 2022-02-02T13:50:12Z
dc.date.issued: 2022-02-02
dc.identifier.citation: Wrench, A. and Balch-Tomes, J. (2022) 'Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut', Sensors, 22(3), article no. 1133.
dc.identifier.issn: 1424-8220
dc.identifier.uri: https://doi.org/10.3390/s22031133
dc.identifier.uri: https://eresearch.qmu.ac.uk/handle/20.500.12289/11795
dc.description.abstract: Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut, and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best-performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation between three physical sensor positions and the corresponding estimated keypoints; this requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for tongue, hyoid, jaw, and lips.
dc.description.uri: https://doi.org/10.3390/s22031133
dc.language.iso: en
dc.publisher: MDPI
dc.relation.ispartof: Sensors
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Multimodal Speech
dc.subject: Lip Reading
dc.subject: Ultrasound Tongue Imaging
dc.subject: Pose Estimation
dc.subject: Speech Kinematics
dc.subject: Keypoints
dc.subject: Landmarks
dc.title: Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut
dc.type: Article
dcterms.accessRights: public
dcterms.dateAccepted: 2022-01-28
dc.description.volume: 22
dc.description.ispublished: pub
rioxxterms.type: Journal Article/Review
rioxxterms.publicationdate: 2022-02-02
refterms.dateFCD: 2022-02-02
refterms.depositException: publishedGoldOA
refterms.accessException: NA
refterms.technicalException: NA
refterms.panel: Unspecified
qmu.author: Wrench, Alan A.
qmu.centre: CASL
dc.description.status: pub
dc.description.number: 3
refterms.version: VoR
refterms.dateDeposit: 2022-02-02
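
The abstract reports contour accuracy as a mean sum of distances (MSD) between estimated and hand-labelled tongue contours. As a minimal illustrative sketch, the following assumes the symmetric nearest-neighbour formulation commonly used for tongue-contour comparison; the paper's exact definition may differ:

```python
import numpy as np

def mean_sum_of_distances(a: np.ndarray, b: np.ndarray) -> float:
    """MSD between two contours a (N,2) and b (M,2), in the same units (e.g. mm).

    For each point on one contour, take the distance to the nearest point on
    the other contour; MSD averages these nearest-neighbour distances over
    both directions (a->b and b->a).
    """
    # Pairwise Euclidean distance matrix, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a) + len(b)))
```

For identical contours the MSD is 0; for a contour shifted vertically by 1 mm relative to an otherwise identical one, the MSD is 1 mm.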

