RELIABILITY AND VALIDITY OF ACOUSTIC VOICE ANALYSIS USING SMARTPHONE RECORDINGS FOR CLINICAL AND REMOTE ASSESSMENT
Abstract
This thesis addresses the critical need for reliable and accessible methods to assess and monitor vocal health, particularly among occupational voice users and patients accessing speech and language therapy remotely. Traditional clinical methods, including patient self-reports and acoustic analyses conducted during isolated visits, provide only limited snapshots of vocal health and often miss the daily fluctuations that matter for long-term vocal well-being. Leveraging advancements in smartphone technology, which offers sophisticated audio processing and widespread accessibility, this research explores the feasibility and clinical utility of smartphone-based acoustic voice analysis. The research comprises four comprehensive studies. Studies 1, 1b and 1c evaluate the reliability of acoustic measurements obtained from four smartphone models compared with a professional studio microphone in a controlled environment. Results indicate that measures such as smoothed cepstral peak prominence (CPPS), harmonics-to-noise ratio (HNR), long-term average spectrum (LTAS) slope, and glottal-to-noise excitation ratio (GNE) demonstrate acceptable random error, suggesting that smartphones can reliably capture these parameters under controlled conditions. Study 2 validates the use of loudspeakers to transmit pre-recorded voice signals for acoustic analysis. The study finds minimal systematic bias and acceptable random errors, particularly for reading passages, affirming the reliability of loudspeaker-transmitted recordings for standardised voice assessments. Study 3 assesses the validity and reliability of smartphones in field environments, comparing their performance to a studio-grade reference microphone. Higher-end smartphones, such as the iPhone 6s, reliably capture fundamental frequency (F0), CPPS, LTAS slope, and GNE, although shimmer and jitter measures exhibit significant variability.
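The distinction between systematic bias and random error used above can be illustrated with a minimal sketch. It assumes a Bland-Altman-style summary of paired measurements, which is a common way to compare a device against a reference but is not confirmed here as the thesis's exact procedure. The function name `agreement` and the CPPS values are hypothetical and serve only as illustration.

```python
import statistics


def agreement(reference, device):
    """Summarise agreement between paired measurements.

    Returns (bias, random_error, limits), where bias is the mean of the
    device-minus-reference differences, random_error is their sample
    standard deviation, and limits are bias +/- 1.96 * random_error.
    """
    if len(reference) != len(device) or len(reference) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [d - r for r, d in zip(reference, device)]
    bias = statistics.mean(diffs)          # systematic offset of the device
    random_error = statistics.stdev(diffs)  # spread around that offset
    limits = (bias - 1.96 * random_error, bias + 1.96 * random_error)
    return bias, random_error, limits


# Hypothetical CPPS values in dB for the same five utterances
# (illustrative numbers, not data from the thesis).
reference_cpps = [14.2, 15.1, 13.8, 16.0, 14.9]
phone_cpps = [14.0, 15.3, 13.5, 15.8, 15.0]

bias, random_error, limits = agreement(reference_cpps, phone_cpps)
print(f"bias={bias:.2f} dB, random error={random_error:.2f} dB, "
      f"limits=({limits[0]:.2f}, {limits[1]:.2f}) dB")
```

Under this reading, a constant bias could in principle be corrected for each device, whereas a large random error limits how far a single recording can be trusted.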
Study 4 investigates the impact of ambient noise on smartphone recordings, revealing that spectral measures remain stable, while parameters like shimmer and jitter are adversely affected by background noise. This underscores the necessity for controlled recording environments or robust noise mitigation strategies in real-world applications.
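Jitter and shimmer are cycle-to-cycle perturbation measures, so small errors in the detected periods or peak amplitudes feed directly into them. The sketch below uses the widely used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean, in percent). It assumes the glottal periods and peak amplitudes have already been extracted; the function name `local_perturbation` and all numbers are hypothetical, and the thesis may use other variants of these measures.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, as a
    percentage of the mean value.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical glottal periods in seconds (roughly 200 Hz) and peak
# amplitudes in arbitrary units; illustrative only.
periods = [0.00500, 0.00502, 0.00499, 0.00501, 0.00500]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]

print(f"local jitter:  {local_perturbation(periods):.2f} %")
print(f"local shimmer: {local_perturbation(amplitudes):.2f} %")
```

Because both measures depend on locating individual cycles, anything that disturbs cycle detection can plausibly inflate them, whereas spectral measures average over many cycles.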
Overall, this thesis demonstrates that smartphones hold significant potential for remote and real-time vocal health monitoring, particularly when focused on specific acoustic measures and controlled recording conditions. The findings contribute to the development of standardised protocols, enhancing the integration of smartphone technology into clinical voice assessment and telehealth services.