Speech and Hearing Sciences

Permanent URI for this collection: https://eresearch.qmu.ac.uk/handle/20.500.12289/7192

Search Results

Now showing 1 - 10 of 29

RELIABILITY AND VALIDITY OF ACOUSTIC VOICE ANALYSIS USING SMARTPHONE RECORDINGS FOR CLINICAL AND REMOTE ASSESSMENT
(Queen Margaret University, Edinburgh, 2025-05) Jannetts, Stephen
This thesis addresses the critical need for reliable and accessible methods to assess and monitor vocal health, particularly among occupational voice users and patients accessing speech and language therapy remotely. Traditional clinical methods, including patient self-reports and acoustic analyses conducted during isolated visits, provide only limited snapshots of vocal health, often missing daily fluctuations essential for long-term well-being. Leveraging advancements in smartphone technology, which offers sophisticated audio processing and widespread accessibility, this research explores the feasibility and clinical utility of smartphone-based acoustic voice analysis. The research comprises four comprehensive studies. Studies 1, 1b and 1c evaluate the reliability of acoustic measurements obtained from four smartphone models compared to a professional studio microphone in a controlled environment. Results indicate that measures such as cepstral peak prominence (CPPS), harmonics-to-noise ratio (HNR), long-term average spectrum (LTAS) slope, and glottal noise excitation ratio (GNE) demonstrate acceptable random error, suggesting that smartphones can reliably capture these parameters under controlled conditions. Study 2 validates the use of loudspeakers to transmit pre-recorded voice signals for acoustic analysis. The study finds minimal systematic bias and acceptable random errors, particularly for reading passages, affirming the reliability of loudspeaker-transmitted recordings for standardised voice assessments. Study 3 assesses the validity and reliability of smartphones in field environments, comparing their performance to a studio-grade reference microphone. Higher-end smartphones, such as the iPhone 6s, reliably capture fundamental frequency (F0), CPPS, LTAS slope, and GNE, although shimmer and jitter measures exhibit significant variability.
Study 4 investigates the impact of ambient noise on smartphone recordings, revealing that spectral measures remain stable, while parameters like shimmer and jitter are adversely affected by background noise. This underscores the necessity for controlled recording environments or robust noise mitigation strategies in real-world applications. Overall, this thesis demonstrates that smartphones hold significant potential for remote and real-time vocal health monitoring, particularly when focused on specific acoustic measures and controlled recording conditions. The findings contribute to the development of standardised protocols, enhancing the integration of smartphone technology into clinical voice assessment and telehealth services.
INVESTIGATING HEARING CARE IN CARE HOMES FOR OLDER PEOPLE IN SCOTLAND THROUGH THE TRANSFORMATIVE WORLDVIEW
(Queen Margaret University, Edinburgh, 2023-12) White, Amy
Communication is a human right and a critical enabler of other human rights. The 2021 World Report on Hearing states that age-related hearing loss should be recognised as a public health priority, owing to its impact on brain health and communication. Prevalence of deafness in care homes for older people is around 90%, yet is largely undetected and untreated. Recommendations to address these issues include hearing screening and staff training. However, solutions are not presented in the context of any legislative health and social care improvement framework. The Public Services Reform (Scotland) Act 2010 transformed the care home sector to enhance safeguarding practices for older people. In addition, the Scottish Government’s See Hear Strategy pledges support for older people living with deafness to experience equality of access to health and social care services. This thesis investigated hearing care in care homes for older people through the lens of the Scottish Government’s framework for health and social care, using a two-stage mixed-methods design, underpinned by the Transformative worldview. Stage 1 explored the landscape of hearing care in care homes using documentary analysis. Sources of evidence centred on the regulatory organisations involved in the care sector. Online focus groups with the Care Inspectorate were also conducted. It was established hearing care training is not mandatory for care home staff and no regulatory framework for hearing care scrutiny exists in which to safeguard the sensory needs of older people in Scotland’s care homes. Stage 2 formed an instrumental case study of a single care home for older people in which the real-life context of hearing care was explored through documentary analysis, questionnaire, focus groups with staff and hearing assessments with residents. Results revealed there was no policy for identifying residents living with deafness nor any core workforce learning structure related to hearing care. 
Staff identified knowledge gaps and welcomed more opportunities for training. The prevalence of deafness across 21 residents was between 76% and 90%. Integrating the results of Stages 1 and 2 suggests Transformative reform is required at the level of both the care home workforce and the wider organisations involved in service improvement and regulation, to meet the recommendations of the See Hear Strategy and achieve equality for older people. The See Hear Strategy will be refreshed in 2025 and the Scottish Government is preparing major reforms through the launch of a National Care Service by 2026. This thesis is therefore timely, highlighting the need for hearing care to be recognised as a priority in care homes, and embedded in any new framework for social care to further social justice and reinforce the human right to communication.
A Fragmented Profession within the System of Professions: The Experience of the Audiology Professional in the United Kingdom
(Queen Margaret University, Edinburgh, 2023-06-28) Steenkamp, Lizanne
The main purpose of this study was to explore the lived experience of audiology professionals in the United Kingdom. For the purposes of this study an audiology professional is defined as someone who completed a United Kingdom or International course/training pathway in audiology and who is working in the UK. The definition can include audiologists, hear(ing) care assistants, hearing aid dispensers, hearing therapists and clinical scientists. Audiology professionals working in Higher Education were also included. Working in two different contexts with similar and dissimilar aspects of role descriptions, as well as boundaries of practice, led to the research question: What is the experience of audiology professionals in becoming and being an audiology professional in the United Kingdom? The following strands narrowed the focus of the study and helped to identify the appropriate methodological approach: (1) the experience of becoming an audiology professional; (2) the experience of being an audiology professional; and (3) the impact of change in education pathways and service delivery on the audiology professional. The research question was explored through an Exploratory Sequential Mixed Methods approach starting with interviews of eight participants followed by a survey circulated to the wider profession with 329 respondents. Data analysis consists of interpretive phenomenological analysis of the interviews and descriptive statistics for the surveys. The results from both stages will be discussed in relation to the sociology of professions, specifically Abbott’s (1988) system of professions with elements of Bourdieu’s social world theory (1985). The results sketch a fragmented profession divided by titles, professional organisations, and regulatory bodies as well as many education pathways across the private sector and the NHS.
ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON’S DISEASE
(Queen Margaret University, Edinburgh, 2023-06-28) Murali, Mridhula
Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson’s Disease and found it manifests differently between individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive to tracking speech changes. Previous longitudinal studies have had limited sample sizes or variable lengths between data collection. This study focused on using acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson’s Disease (PwPD) over six months. The thesis presents how this study has addressed limitations of previous studies to make a novel contribution to current knowledge. Speech data was collected from 63 PwPD and 47 control speakers using an online podcast software at two time points, six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists for T1 and T2, and acoustic parameters of voice, articulation and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional attempt was made to identify if these parameters showed group differences for differential diagnosis between PwPD and control speakers at T1 and T2. Results showed that specific acoustic parameters in voice quality, articulation and prosody could differentiate between PwPD and controls, or detect speech changes between T1 and T2, but not both factors. However, specific acoustic parameters within articulation could detect significant group and speech change differences across T1 and T2. 
The thesis discusses these results, their implications, and the potential for future studies.
VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH
(Queen Margaret University, Edinburgh, 2021) Miley Wilson, Erin Victoria
Speech output has long been considered a sensitive marker of a person’s mental state. It has been previously examined as a possible biomarker for diagnosis and treatment response for certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods. Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review provided context from the existing body of knowledge on the effects of depression on speech, and provided context for experimental setup within this body of work. Examinations of vowel space, monophthong, and diphthong productions as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures, etc.) are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area was not different between depressed and healthy speakers, but on closer inspection, this was due to more specific deficits seen in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce centralised vowels along F1, as compared to F2—and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for production. This pattern was seen in both monophthong and diphthong productions. 
Other articulatory and phonatory measures were inspected in a factor analysis as well, suggesting additional vocal biomarkers for consideration in diagnosis and treatment assessment of depression—including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression. Findings support the clinical utility of combining Ellgring and Scherer’s (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech, which suggest observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech is able to be modelled along a continuum of hypo- to hyper-speech, where depressed individuals are able to assess communicative situations, assess speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers fluctuate with depressive symptoms throughout the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is impacted accordingly. Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. Results contribute towards cross-disciplinary research into speech analysis between the fields of psychiatry, computer science, and speech science.
The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals
(Queen Margaret University, Edinburgh, 2019) Dokovova, Marie
This dissertation explores the potential benefit of listening to and with one’s first-language accent, as suggested by the Interspeech Intelligibility Benefit Hypothesis (ISIB). Previous studies have not consistently supported this hypothesis. According to major second language learning theories, the listener’s second language proficiency determines the extent to which the listener relies on their first language phonetics. Hence, this thesis provides a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English for Bulgarian-English bilinguals. The first experiment investigated whether evoking the listeners’ L1 Bulgarian phonetics would improve the speed of processing Bulgarian-accented English words, compared to Standard British English words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high proficiency listeners tended to have an advantage with SBE over Bulgarian accent. The second experiment measured the accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded slower and less accurately to Bulgarian-accented speech compared to L1 English speech and compared to lower proficiency listeners. These accent preferences were also supported by the listener’s RT adaptation across the first experimental block. A follow-up investigation compared the results of L1 UK English listeners to the bilingual listeners with the highest proficiency in English. The L1 English listeners and the bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals. These studies support existing models of second language phonetics. Higher proficiency in L2 is associated with lesser reliance on L1 phonetics during speech processing. 
In addition, the listeners with the highest English proficiency had no advantage when understanding Bulgarian-accented English compared to L1 English listeners, contrary to ISIB.
Keywords: Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficiency
MEASURING PRE-SPEECH ARTICULATION
(Queen Margaret University, Edinburgh, 2019) Palo, Pertti
What do speakers do when they start to talk? This thesis concentrates on the articulatory aspects of this problem, and offers methodological and experimental results on tongue movement, captured using Ultrasound Tongue Imaging (UTI). Speech initiation occurs at the start of every utterance. An understanding of the timing relationship between articulatory initiation (which occurs first) and acoustic initiation (that is, the start of audible speech) has implications for speech production theories, the methodological design and interpretation of speech production experiments, and clinical studies of speech production. Two novel automated techniques for detecting articulatory onsets in UTI data were developed based on Euclidean distance. The methods are verified against manually annotated data. The latter technique is based on a novel way of identifying the region of the tongue that is first to initiate movement. Data from three speech production experiments are analysed in this thesis. The first experiment is picture naming recorded with UTI and is used to explore behavioural variation at the beginning of an utterance, and to test and develop analysis tools for articulatory data. The second experiment also uses UTI recordings, but it is specifically designed to exclude any pre-speech movements of the articulators which are not directly related to the linguistic content of the utterance itself (that is, which are not expected to be present in every full repetition of the utterance), in order to study undisturbed speech initiation. The materials systematically varied the phonetic onsets of the monosyllabic target words, and the vowel nucleus. They also provided an acoustic measure of the duration of the syllable rhyme. Statistical models analysed the timing relationships of articulatory onset, and acoustic durations of the sound segments, and the acoustic duration of the rhyme.
Finally, to test a discrepancy between the results of the second UTI experiment and findings in the literature, based on data recorded with Electromagnetic Articulography (EMA), a third experiment measured a single speaker using both methods and matched materials. Using the global Pixel Difference and Scanline-based Pixel Difference analysis methods developed and verified in the first half of the thesis, the main experimental findings were as follows. First, pre-utterance silent articulation is timed in inverse correlation with the acoustic duration of the onset consonant and in positive correlation with the acoustic rhyme of the first word. Because of the latter correlation, it should be considered part of the first word. Second, comparison of UTI and EMA failed to replicate the discrepancy. Instead, EMA was found to produce longer reaction times independent of utterance type.
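The abstract above states only that the onset-detection techniques are based on Euclidean distance. As a rough illustration of that idea, and not the thesis's actual implementation, the sketch below computes the Euclidean distance between consecutive image frames and reports the first frame transition whose distance exceeds a threshold. The function names, the flat-list frame representation, the toy data and the threshold value are all hypothetical; the thesis's real onset criterion is not given in the abstract.

```python
import math

def pixel_difference(frames):
    """Euclidean distance between each pair of consecutive frames.

    `frames` is a list of equally sized frames, each a flat list of pixel
    intensities (an assumed representation). Returns one distance per
    consecutive pair, so the result has len(frames) - 1 entries.
    """
    distances = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same number of pixels")
        squared = sum((c - p) ** 2 for p, c in zip(prev, curr))
        distances.append(math.sqrt(squared))
    return distances

def first_onset(distances, threshold):
    """Index of the first transition whose distance exceeds `threshold`.

    The fixed threshold is a hypothetical stand-in for an onset criterion.
    Returns None when no transition exceeds it.
    """
    for i, d in enumerate(distances):
        if d > threshold:
            return i
    return None

# Toy data: three still frames, then movement begins.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [3, 4, 0, 0],
    [6, 8, 0, 0],
]
pd = pixel_difference(frames)
onset = first_onset(pd, threshold=1.0)
```

In this toy case the distance stays at zero while the frames are identical and rises once the pixel values change, so the detected onset is the transition into the first changed frame. A scanline-based variant could apply the same distance to each scanline separately rather than to the whole frame.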
THE AUDITORY BRAINSTEM RESPONSE IN HEALTHY ADULTS AND ADULTS WITH ALCOHOL DEPENDENCE SYNDROME
(Queen Margaret University, Edinburgh, 2018) Johnson, Christine
The Auditory Brainstem Response (ABR) assesses brainstem function. This thesis explores the click and speech ABR in both healthy adults and adults with alcohol dependence syndrome (ADS). Experiment One undertook auditory-cognitive assessment, including ABRs, of 60 healthy adults (30 women), aged 18-30 years. For waves III and V of the click ABR, women’s responses were earlier than men’s by 0.14ms and 0.19ms. For the speech ABR, onset and offset measures were earlier in women by at least 0.43ms. No effect for left vs. right ear was found in either case. Inter-rater reliability was found to be high (ICC(2,1) ≥ 0.89) for the click ABR and good (ICC(2,1) ≥ 0.75) for six of the seven peaks of the speech ABR. A comparison of ABRs to those from an older group of 12 adults aged 31-49 years (six women, matched control group for Experiment Two) found the stimulus-to-response lag for the speech ABR was earlier (0.78ms) in the older women but within the expected range. Click and speech ABRs were repeated after 12 weeks and the representation of F0 for women was greater by 4.8 μV at the second recording. Experiment Two assessed the auditory-cognitive profile and ABRs of 16 adults (six women) aged 29-49 years, undergoing a treatment and rehabilitation programme for people with ADS. All participants had hearing thresholds within normal limits, but exhibited deficits in auditory-cognitive profiles compared to matched, healthy adults, including their click and speech ABRs. For the click ABR, men had significant delays in wave III (0.18ms) and wave V (0.22ms). For women there were significant delays for wave I (0.11ms) and wave V (0.22ms). For the speech ABR, men had significant delays in the onset measures of waves V (0.40ms) and A (0.36ms). Women had significant delays in waves V (0.45ms), A (0.48ms), E (0.66ms) and O (0.42ms). Testing was repeated after 12 weeks of abstinence and significant improvements in the click and speech ABR were observed.
For men, average click ABR latencies improved for wave III (0.12ms) and wave V (0.22ms) and for women, wave V (0.08ms) improved. Significant improvements were also found for discrete peak and onset measures of the speech ABRs for both men and women. For men, average speech ABR latencies improved for wave A (0.23ms) and the duration of the VA complex (0.15ms). For women there were improvements in wave V (0.10ms), A (0.12ms) and E (0.33ms). These results add to the body of knowledge about the ABR and support its value as a clinical tool. They also provide new information about auditory-cognitive function in adults with ADS, for whom beneficial effects of abstinence are demonstrated. The ABR has a potential role in identifying people most at risk of alcohol-related brain damage and in monitoring recovery with abstinence.
Keywords: Auditory Brainstem Response, Frequency Following Response, Speech ABR, Reliability, Alcohol Dependence Syndrome, Abstinence.
AN ARTICULATORY-ACOUSTIC INVESTIGATION OF TIMING AND COORDINATION IN THE FLUENT SPEECH OF PEOPLE WHO STAMMER
(Queen Margaret University, Edinburgh, 2019) Heyde, Cornelia
This thesis investigates Wingate’s Fault-Line hypothesis (1988) which suggests that disfluencies in people who stammer (PWS) result from a deficit in transition from consonant to vowel (CV) thereby implying that stammering as a motor-control disorder would affect transitions even when not perceptually salient. To test this proposal, we explored the perceptually fluent speech of PWS using instrumental analysis (ultrasound and acoustic) to determine the underlying pervasiveness of disfluencies in this group as compared to people who do not stammer (PNS). Following fluency screening of recorded utterances, we applied acoustic and articulatory analysis techniques to perceptually fluent utterances of 9 PWS and 9 typical speakers in order to identify indicators of disfluency in the transition from syllable onsets to the following vowel. Measures of acoustic duration, locus equation and formant slope offer insights into timing and degree of coarticulation. The articulatory ultrasound tongue imaging technique moreover provides kinematic information of the tongue. A novel technique was applied to dynamically analyse and quantify the tongue kinematics in transition. This allowed us to treat the perceptually fluent speech of PWS as an ongoing time-situated process. Both acoustic and articulatory findings indicate by-group differences in timing, whereby PWS are overall slower and more variable in the execution of CV transitions when compared to typical speakers (PNS). The findings from both instrumental approaches also indicate differences in coordination, suggesting that PWS coarticulate to a lesser extent than PNS. Overall, these findings suggest that PWS exhibit a global deficit in CV transition that can be observed in perceptually fluent as well as stammered speech. This is in keeping with the predictions of Wingate’s Fault-Line hypothesis. 
The fact that the conclusions from the acoustic and articulatory measures are coherent shows that acoustic measures may be sufficient to act as a proxy for articulatory measures.
PAUSING MID-SENTENCE: YOUNG OFFENDER PERSPECTIVES ON THEIR LANGUAGE AND COMMUNICATION NEEDS
(Queen Margaret University, Edinburgh, 2019) Fitzsimons, Dermot
The study investigated participants’ perceptions of their own language and communication; their interactions with peers in prison; and their experiences with professionals in the welfare and justice systems. The prevalence of language disorder in the sample was also established. International research evidence has firmly established a high prevalence of language disorder in young offender populations. Less is known about young offenders’ perspectives on their own language abilities. The study recruited an opportunity sample of ten young men in custody at Polmont HMYOI who had recent experience of removal from association, or ‘segregation’. The research investigated participants’ language and communication abilities in order to inform future support and intervention. It focused on their communication with professionals and peers in justice, education and welfare settings. Results of standardised language assessment indicated the presence of language disorder in 44% (n=4) of the sample (n=9). Informal justice vocabulary assessment results showed an unexpectedly high mean score of 85%. Thematic analysis of interview data led to formulation of three main themes. These were categorised as: Valuing Communication, Literacy and Learning; Exerting Control; and Seeking Support. The themes are discussed with reference to Bronfenbrenner’s Bioecological Model. Participants offered reflective and rich views on their lived experience. They described their perspectives on: the antecedents of communication breakdown in prison; features of successful interaction with peers and authority figures; and a need for support in all justice environments, particularly in the court setting. Thus, this study makes a contribution to knowledge through adding to an emerging qualitative evidence base within Speech and Language Therapy.
