Browsing by Person "Palo, Pertti"
Now showing 1 - 11 of 11
Item: A Short Term Study of Hungarians Learning Finnish Vowels (Fonetiikan Päivät, 2014)
Peltola, T.; Palo, Pertti; Aaltonen, O.
A group of Hungarian students (n = 10) participated in a Finnish phonetics and conversation course during the first 3 months of their language studies. During the course, the students trained in the allophonic variation of Finnish speech sounds, comparing them to Hungarian sounds and taking part in group conversation exercises. We call the method used on the course conscious phonetic training of foreign-language speech sounds. Additionally, the students took part in one-on-one imitation exercises, which were recorded for the current study. We followed the participants' pronunciation development for foreign sounds during the first semester of their studies and compared it to that of their peers (n = 4). The results suggest that participating in the course affected the students' pronunciation skills towards the end of the three-month course, whereas at the beginning the two groups' pronunciation was more similar.

Item: Comparing pitch distributions using Praat and R (Journal of ISPhS/International Society of Phonetic Sciences, 2016-02)
Lennes, Mietta; Stevanovic, Melisa; Aalto, Daniel; Palo, Pertti
Pitch analysis tools are widely used to measure and visualize the melodic aspects of speech. The resulting pitch contours can serve various research interests linked with speech prosody, such as intonational phonology, interaction in conversation, emotion analysis, language learning and singing. Due to physiological differences and individual habits, speakers tend to differ in their typical pitch ranges. As a consequence, pitch analysis results are not always easy to interpret and to compare between speakers. In this study, we use the Praat program (Boersma & Weenink, 2015) to analyse pitch in samples of conversational Finnish speech, and the R statistical programming environment (R Core Team, 2014) for further analysis and visualization. We first describe the general shapes of the speaker-specific pitch distributions and examine whether and how the distributions vary between individuals. A bootstrapping method is applied to discover the minimal amount of speech that is necessary to reliably determine the pitch mean, median and mode for an individual speaker. The scripts and code written for Praat and R are made available under an open license for experimenting with other speech samples. The datasets produced with the Praat script will also be made available for further studies.
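The bootstrapping step described in the abstract above lends itself to a short illustration. The sketch below is not the authors' released Praat/R code; it is a minimal Python analogue, assuming a vector of voiced-frame F0 values for one speaker and using the width of the bootstrapped median's 95% interval as an illustrative stability criterion.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_pitch_stats(pitch_hz, sample_sizes, n_boot=1000):
    """Bootstrap the pitch median for increasing amounts of speech.

    pitch_hz: 1-D array of voiced-frame F0 values (Hz) for one speaker,
    e.g. exported from a Praat pitch listing.
    Returns the width of the 2.5th-97.5th percentile interval of the
    bootstrapped median at each sample size.
    """
    spreads = {}
    for n in sample_sizes:
        medians = [np.median(rng.choice(pitch_hz, size=n, replace=True))
                   for _ in range(n_boot)]
        lo, hi = np.percentile(medians, [2.5, 97.5])
        spreads[n] = hi - lo
    return spreads

# Illustrative data only: log-normally distributed F0 values around ~110 Hz.
f0 = rng.lognormal(mean=np.log(110), sigma=0.15, size=5000)
for n, width in bootstrap_pitch_stats(f0, [50, 200, 1000]).items():
    print(f"{n:>5} frames: 95% interval width of median = {width:.1f} Hz")
```

Under these assumptions, the interval narrows as the sample grows; the smallest sample size at which it falls below a chosen tolerance is one way to read off "the minimal amount of speech" per speaker.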
Item: Effect of phonetic onset on acoustic and articulatory speech reaction times studied with tongue ultrasound (International Phonetic Association, 2015-08-15)
Palo, Pertti; Schaeffler, Sonja; Scobbie, James M.
We study the effect that phonetic onset has on acoustic and articulatory reaction times (RTs). An acoustic study by Rastle et al. (2005) shows that the place and manner of the first consonant in a target affects acoustic RT. An articulatory study by Kawamoto et al. (2008) shows that the same effect is not present in the articulatory reaction time of the lips. We hypothesise, therefore, that a replication with articulatory instrumentation for the tongue should find the same acoustic effect, but no effect in the articulatory reaction time. As a proof of concept of articulatory measurement from ultrasound images, we report results from a pilot experiment which also extends the dataset to include onset-less syllables. Statistical analysis essentially confirms the hypothesis, and we explore and discuss the effect of different vowels and onset types (including null onsets) on articulatory and acoustic RT and speech production.

Item: Emotions in freely varying and mono-pitched vowels: acoustic and EGG analyses (2015-10)
Waaramaa, Teija; Palo, Pertti; Kankare, Elina
Vocal emotions are expressed either by speech or by singing. The difference is that in singing the pitch is predetermined, while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and by singing. Electroglottographic (EGG) and acoustic analyses were conducted on emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples). The contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels, in order to evaluate the effect of the threshold level on the emotion results. The genders were studied separately. The results suggested significant gender differences at the CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used a more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
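The threshold-based contact quotient mentioned above can be stated compactly. The following is a minimal sketch, assuming the EGG signal has already been segmented into single glottal cycles and oriented so that larger values mean greater vocal fold contact; the raised-cosine test cycle is invented for illustration and is not taken from the study.

```python
import numpy as np

def contact_quotient(egg_cycle, threshold_level):
    """Contact quotient for one glottal cycle of an EGG signal.

    egg_cycle: 1-D array covering exactly one glottal period, with larger
    values indicating greater vocal fold contact.
    threshold_level: fraction of the cycle's peak-to-peak amplitude
    (e.g. 0.35, 0.55, 0.80) above which the folds count as 'in contact'.
    Returns the fraction of the period spent above the threshold.
    """
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + threshold_level * (hi - lo)
    return float(np.mean(egg_cycle > threshold))

# Illustrative cycle: a raised cosine standing in for one EGG period.
t = np.linspace(0, 1, 200, endpoint=False)
cycle = 0.5 * (1 + np.cos(2 * np.pi * t))
for level in (0.35, 0.55, 0.80):
    print(f"CQ at {level:.0%} threshold: {contact_quotient(cycle, level):.2f}")
```

As the sketch makes visible, higher threshold levels yield smaller contact quotients for the same cycle, which is why the choice of criterion level matters when comparing results.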
Item: Glottal squeaks in VC sequences (ISCA, 2016)
Hejna, Misa; Palo, Pertti; Moisik, Scott
This paper reports results related to the phenomenon referred to as a "glottal squeak" (a term coined by [1]). At present, nothing is known about the conditioning and the articulation of this feature of speech. Our qualitative acoustic analyses of the conditioning of squeaks (their frequency of occurrence, duration, and f0) found in Aberystwyth English and Manchester English suggest that squeaking may result from the intrinsically tense vocal fold state associated with the thyroarytenoid (TA) muscle recruitment [2] required for epilaryngeal constriction and the vocal-ventricular fold contact (VVFC) needed to produce glottalisation [3]. In this interpretation, we hypothesise that squeaks occasionally occur during constriction disengagement: at the point when VVFC suddenly releases but the TAs have not yet fully relaxed. The extralinguistic conditioning identified in this study corroborates the findings reported by [1]: females are more prone to squeaking, and the phenomenon is individual-dependent.

Item: How young adults with autism spectrum disorder watch and interpret pragmatically complex scenes (Taylor & Francis, 2017-11-01)
Lönnqvist, Linda; Loukusa, Soile; Hurtig, Tuula; Mäkinen, Leena; Siipo, Antti; Väyrynen, Eero; Palo, Pertti; Laukka, Seppo; Mämmelä, Laura; Mattila, Marja-Leena; Ebeling, Hanna
The aim of the current study was to investigate subtle characteristics of social perception and interpretation in high-functioning individuals with autism spectrum disorders (ASDs), and to study the relation between watching and interpreting. As a novelty, we used an approach that combined moment-by-moment eye tracking with verbal assessment. Sixteen young adults with ASD and 16 neurotypical control participants watched a video depicting a complex communication situation while their eye movements were tracked. The participants also completed a verbal task with questions related to the pragmatic content of the video. We compared verbal task scores and eye movements between the groups, and assessed correlations between task performance and eye movements. Individuals with ASD had more difficulty than the controls in interpreting the video, and during two short moments there were significant group differences in eye movements. Additionally, we found significant correlations between verbal task scores and moment-level eye movements in the ASD group, but not among the controls. We concluded that participants with ASD had slight difficulties in understanding the pragmatic content of the video stimulus and in attending to social cues, and that the connection between pragmatic understanding and eye movements was more pronounced for participants with ASD than for neurotypical participants.

Item: The impact of real-time articulatory information on phonetic transcription: Ultrasound-aided transcription in cleft lip and palate speech (Karger, 2019-05-24)
Cleland, Joanne; Lloyd, Susan; Campbell, Linsay; Crampin, Lisa; Palo, Pertti; Sugden, Eleanor; Wrench, Alan A.; Zharkova, Natalia
Objective: This study investigated whether adding an additional modality, namely ultrasound tongue imaging, to perception-based phonetic transcription affected the identification of compensatory articulations and interrater reliability. Patients and Methods: Thirty-nine English-speaking children aged 3 to 12 with cleft lip and palate (CLP) were recorded producing repetitions of /aCa/ for all places of articulation, with simultaneous audio and probe-stabilised ultrasound. Three types of transcription were performed: (1) descriptive observations from the live ultrasound by the clinician recording the data; (2) ultrasound-aided transcription by two ultrasound-trained clinicians; and (3) traditional phonetic transcription from the audio recordings by two CLP specialists. We compared the number of consonants identified as in error by each transcriber and then classified the errors into eight subcategories. Results: The ultrasound-aided and traditional transcriptions yielded similar error-detection rates; however, both were significantly higher than the observations recorded live in the clinic. Interrater reliability for the ultrasound transcribers was substantial (κ = 0.65), compared to moderate (κ = 0.47) for the traditional transcribers. Ultrasound-aided transcribers were more likely than the audio-only transcribers to identify covert errors such as double articulations and retroflexion. Conclusion: Ultrasound tongue imaging is a useful complement to traditional phonetic transcription for CLP speech.
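For readers unfamiliar with the kappa values quoted above, interrater reliability over categorical transcription labels is commonly computed as Cohen's kappa. A minimal sketch using scikit-learn follows; the two transcribers' label sequences below are hypothetical and serve only to show the computation.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-consonant error categories assigned by two transcribers
# (e.g. 'ok', 'double', 'retroflex', 'pharyngeal'); invented for illustration.
rater_a = ["ok", "double", "ok", "retroflex", "pharyngeal", "ok", "double"]
rater_b = ["ok", "double", "ok", "ok",        "pharyngeal", "ok", "retroflex"]

kappa = cohen_kappa_score(rater_a, rater_b)
# On the commonly cited Landis & Koch scale, 0.41-0.60 is 'moderate'
# and 0.61-0.80 is 'substantial' agreement.
print(f"Cohen's kappa = {kappa:.2f}")
```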
Item: MEASURING PRE-SPEECH ARTICULATION (Queen Margaret University, Edinburgh, 2019)
Palo, Pertti
Abstract: What do speakers do when they start to talk? This thesis concentrates on the articulatory aspects of this question, and offers methodological and experimental results on tongue movement, captured using Ultrasound Tongue Imaging (UTI). Speech initiation occurs at the start of every utterance. An understanding of the timing relationship between articulatory initiation (which occurs first) and acoustic initiation (that is, the start of audible speech) has implications for speech production theories, for the methodological design and interpretation of speech production experiments, and for clinical studies of speech production. Two novel automated techniques for detecting articulatory onsets in UTI data, both based on Euclidean distance, were developed and verified against manually annotated data; the latter technique rests on a novel way of identifying the region of the tongue that is first to initiate movement. Data from three speech production experiments are analysed in the thesis. The first experiment is a picture-naming task recorded with UTI; it is used to explore behavioural variation at the beginning of an utterance and to test and develop analysis tools for articulatory data. The second experiment also uses UTI recordings, but it is specifically designed to exclude any pre-speech movements of the articulators which are not directly related to the linguistic content of the utterance itself (that is, which are not expected to be present in every full repetition of the utterance), in order to study undisturbed speech initiation. The materials systematically varied the phonetic onsets of the monosyllabic target words and the vowel nucleus, and also provided an acoustic measure of the duration of the syllable rhyme. Statistical models analysed the timing relationships of articulatory onset and the acoustic durations of the sound segments and of the rhyme. Finally, to test a discrepancy between the results of the second UTI experiment and findings in the literature based on data recorded with Electromagnetic Articulography (EMA), a third experiment measured a single speaker using both methods and matched materials. Using the global Pixel Difference and Scanline-based Pixel Difference analysis methods developed and verified in the first half of the thesis, the main experimental findings were as follows. First, pre-utterance silent articulation is timed in inverse correlation with the acoustic duration of the onset consonant and in positive correlation with the acoustic duration of the rhyme of the first word; because of the latter correlation, it should be considered part of the first word. Second, the comparison of UTI and EMA failed to replicate the discrepancy. Instead, EMA was found to produce longer reaction times independent of utterance type.

Item: Pre-speech tongue movements recorded with ultrasound (2014-05-09)
Palo, Pertti; Schaeffler, Sonja; Scobbie, James M.; Fuchs, S.; Grice, M.; Hermes, A.; Lancia, L.; Mücke, D.
We analyse Ultrasound Tongue Imaging (UTI) data from five speakers whose native languages (L1) are English (3 speakers), German (1 speaker), and Finnish (1 speaker). The data consist of single words spoken in the subjects' respective native tongues as responses to a picture-naming task. The focus of this study is on automating the analysis of ultrasound recordings of the tongue movements that take place after the subject is presented with a stimulus. We analyse these movements with a pixel difference method (McMillan and Corley, 2010; Drake, Schaeffler, and Corley, 2013a; Drake, Schaeffler, and Corley, 2013b), which yields an estimate of the rate of change on a frame-by-frame basis. We describe typical time-dependent pixel difference contours and report grand average contours for each of the speakers.
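The pixel difference measure used in the two items above reduces, per frame pair, to a Euclidean distance between consecutive ultrasound images. The sketch below is a minimal reading of that idea in Python, not the verified detectors from the thesis; the synthetic frames, baseline length, and threshold are illustrative assumptions.

```python
import numpy as np

def pixel_difference(frames):
    """Global pixel difference: Euclidean distance between consecutive
    ultrasound frames, giving one rate-of-change value per frame pair.

    frames: array of shape (n_frames, height, width) of grey-scale values.
    """
    diffs = np.diff(frames.astype(float), axis=0)
    return np.sqrt((diffs ** 2).sum(axis=(1, 2)))

def articulatory_onset(pd_curve, baseline_frames=10, n_sd=3.0):
    """First frame pair where the pixel-difference curve exceeds the mean
    of a pre-stimulus baseline by n_sd standard deviations. The baseline
    length and threshold are illustrative choices; the thesis instead
    verifies its detectors against manual annotation.
    """
    mu = pd_curve[:baseline_frames].mean()
    sd = pd_curve[:baseline_frames].std()
    above = np.nonzero(pd_curve > mu + n_sd * sd)[0]
    return int(above[0]) if above.size else None

# Illustrative recording: a still tongue for 30 frames, then a ramping change.
rng = np.random.default_rng(0)
frames = rng.normal(100, 1, size=(60, 64, 128))
frames[30:] += np.linspace(0, 20, 30)[:, None, None]
pd = pixel_difference(frames)
print("Detected onset near frame:", articulatory_onset(pd))
```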
Item: Visualising speech: Identification of atypical tongue-shape patterns in the speech of children with cleft lip and palate using ultrasound technology (NHS Greater Glasgow & Clyde and University of Strathclyde, Glasgow, 2018)
Lloyd, Susan; Cleland, Joanne; Crampin, Lisa; Campbell, Linsay; Zharkova, Natalia; Palo, Pertti
Previous research by Gibbon (2004) shows that at least eight distinct error types can be identified in the speech of people with cleft lip and palate (CLP) using electropalatography (EPG), a technique which measures tongue-palate contact. However, EPG is expensive and logistically difficult. In contrast, ultrasound is cheaper and arguably better equipped to image the posterior articulations (such as pharyngeals) which are common in CLP speech. A key aim of this project is to determine whether the eight error types made visible with EPG in CLP speech, as described by Gibbon (2004), can also be identified with ultrasound. This paper presents the first results from a larger study developing a qualitative and quantitative ultrasound speech assessment protocol. Data from the first 20 children aged 3 to 18 with CLP will be presented. The data are spoken materials from the CLEFTNET protocol. We will present a recording format compatible with CAPS-A for recording initial observations from the live ultrasound (e.g. double articulations, pharyngeal stops). Two Speech and Language Therapists analysed the data independently to identify error types. The results suggest that all of the error types, for example fronted placement and double articulations, can be identified using ultrasound, but that this is challenging in real time. Ongoing work involves quantitative analysis of the error types using articulatory measures.

Item: Visualising speech: Using ultrasound visual biofeedback to diagnose and treat speech disorders in children with cleft lip and palate (NHS Greater Glasgow & Clyde and University of Strathclyde, Glasgow, 2017-09)
Cleland, Joanne; Crampin, Lisa; Zharkova, Natalia; Wrench, Alan A.; Lloyd, Susan; Palo, Pertti
Children with cleft lip and palate (CLP) often continue to have problems producing clear speech long after their clefts have been surgically repaired, leading to educational and social disadvantage. Speech is of key importance in CLP from both a quality-of-life and a surgical-outcome perspective, yet assessment relies on subjective perceptual methods, with speech and language therapists (SLTs) listening to speech and transcribing errors. This is problematic because perception-based phonetic transcription is well known to be highly unreliable (Howard & Lohmander, 2011), especially in CLP, where the range of error types is arguably far greater than for other speech sound disorders. Moreover, CLP speech is known to be vulnerable to imperceptible error types, such as double articulations, which can only be detected with instrumental techniques such as ultrasound tongue imaging (UTI). Incorrect transcription of these errors can result in misdiagnosis and subsequent inappropriate intervention, which can lead to speech errors becoming deeply ingrained.