CASL

Permanent URI for this collection: https://eresearch.qmu.ac.uk/handle/20.500.12289/22

  • Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut
    (MDPI, 2022-02-02) Wrench, Alan A.; Balch-Tomes, Jonathan
    Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, used to train DeepLabCut models, and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation among three physical sensor positions and the corresponding estimated keypoints; this result requires further investigation. The accuracy of estimating lip aperture from a camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for tongue, hyoid, jaw, and lips.
  • Coarticulation across morpheme boundaries: An ultrasound study of past-tense inflection in Scottish English
    (Elsevier, 2021-09-15) Mousikou, Petroula; Strycharczuk, Patrycja; Turk, Alice; Scobbie, James M.
    It has been hypothesized that morphologically-complex words are mentally stored in a decomposed form, often requiring online composition during processing. Morphologically-simple words can only be stored as a whole. The way a word is stored and retrieved is thought to influence its realization during speech production, so that when retrieval requires less time, the articulatory plan is executed faster. Faster articulatory execution could result in more coarticulation. Accordingly, we hypothesized that morphologically-simple words might be produced with more coarticulation than apparently homophonous morphologically-complex words, because the retrieval of monomorphemic forms is direct, in contrast to morphologically-complex ones, which might need to be composed online into full word forms. Using Ultrasound Tongue Imaging, we tested this hypothesis with nine speakers of Scottish English. Over two days of training, participants learned phonemically identical monomorphemic and morphologically-complex nonce words, while on the third consecutive testing day, they produced them in two prosodic contexts. Two types of articulatory analyses revealed no systematic differences in coarticulation between monomorphemic and morphologically-complex items, yet a few speakers did idiosyncratically produce some morphological effects on articulation. Our work contributes to our understanding of how morphologically complex words are stored and processed during speech production.
  • Quantifying changes in ultrasound tongue-shape pre- and post-intervention in speakers with submucous cleft palate: An illustrative case study
    (Taylor & Francis, 2021-09-08) Roxburgh, Zoe; Cleland, Joanne; Scobbie, James M.; Wood, Sara
    Ultrasound Tongue Imaging is increasingly used during assessment and treatment of speech sound disorders. Recent literature has shown that ultrasound is also useful for the quantitative analysis of a wide range of speech errors. So far, the compensatory articulations of speakers with cleft palate have only been analysed qualitatively. This study provides a pilot quantitative ultrasound analysis, drawing on longitudinal intervention data from a child with submucous cleft palate. Two key ultrasound metrics were used: (1) articulatory t-tests, to compare tongue shapes for perceptually collapsed phonemes on a radial measurement grid, and (2) the Mean Radial Difference, to quantify the overall extent to which the two tongue shapes differ. This articulatory analysis supplemented impressionistic phonetic transcriptions and identified covert contrasts. Articulatory errors identified in this study using ultrasound were in line with errors identified in the speech of children with cleft palate in previous literature. While compensatory error patterns commonly found in speakers with cleft palate have been argued to facilitate functional phonological development, the nature of our findings suggests that the compensatory articulations uncovered are articulatory in nature.
  • Lenition and fortition of /r/ in utterance-final position, an ultrasound tongue imaging study of lingual gesture timing in spontaneous speech
    (Elsevier, 2021-04-16) Lawson, Eleanor; Stuart-Smith, Jane; Cho, Taehong
    The most fundamental division in English dialects is the rhotic/non-rhotic division. The mechanisms of historical /r/-loss sound change are not well understood, but studying a contemporary /r/-loss sound change in a rhotic variety of English can provide new insights. We know that /r/ weakening in contemporary Scottish English is a gesture-timing based phenomenon and that it is socially indexical, but we have no phonetic explanation for the predominance of weak /r/ variants in utterance-final position. Using a socially-stratified conversational ultrasound tongue imaging speech corpus, this study investigates the effects of boundary context, along with other linguistic and social factors such as syllable stress, following-consonant place and social class, on lingual gesture timing in /r/ and strength of rhoticity. Mixed-effects modelling identified that utterance-final context conditions greater anterior lingual gesture delay in /r/ and weaker-sounding /r/s, but only in working-class speech. Middle-class speech shows no anterior lingual gesture delay for /r/ in utterance-final position and /r/ is audibly strengthened in this position. It is unclear whether this divergence is due to variation in underlying tongue shape for /r/ in these social-class communities, or whether utterance-final position provides a key location for the performance of social class using salient variants of /r/.