Characterization of Dysphonic Voices by Means of a Filterbank-Based Spectral Analysis: Sustained Vowels and Running Speech

Fraile Muñoz, Rubén; Godino-Llorente, J. I.; Sáenz-Lechón, N.; Osma Ruíz, Víctor J.; Gutiérrez-Arriola, Juana Maria
OBJECTIVES: This article presents a comparative study of the spectral power distribution of normal and dysphonic voices, for both sustained vowels and running speech. The objective was to find robust cues of dysphonia in the spectral domain. For this purpose, recordings from two databases were processed, one of which includes both sustained vowels and running speech. Additionally, a new measure of stability, the decorrelation time, is introduced, and its application to the power spectrum is tested as a cue of dysphonia.
MATERIALS AND METHODS: The spectral analysis uses both an auditory model and the filterbank approach as references for the computation of discrete spectrograms. Results are obtained from three sets of recordings belonging to two different databases.
RESULTS: The results indicate that only minor differences exist between the power spectrum shapes of normal and dysphonic voices in sustained vowel phonation tasks. However, the calculated band power decorrelation times indicate that power in bands between 2000 and 6400 Hz is significantly less stable in dysphonic voices. For running speech, the stability of spectral power is a weaker indicator of dysphonia, but there is a significant difference between normal and dysphonic voices in the power level of high-frequency bands (above 5300 Hz). This implies that sampling rates above 10.6 ksps are needed to assess running speech in the spectral domain. The results involving decorrelation times also indicate that, for short-time spectral analysis, frame rates above 100 frames/s should be preferred.
Journal of Voice
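The abstract names two quantities that a short sketch may help make concrete: the 10.6 ksps sampling rate, which follows from the Nyquist criterion applied to the 5300 Hz band edge, and the decorrelation time of a band power sequence. The abstract does not define decorrelation time, so the code below assumes one plausible definition: the shortest lag at which the normalized autocorrelation of the frame-by-frame band power drops below a threshold. The function names, the 0.5 threshold, and the synthetic test signals are all hypothetical and are not taken from the article.

```python
import numpy as np


def min_sampling_rate_hz(highest_band_hz):
    """Nyquist criterion: the sampling rate must exceed twice the highest frequency of interest."""
    return 2.0 * highest_band_hz


def decorrelation_time(band_power, frame_rate, threshold=0.5):
    """Lag (in seconds) at which the normalized autocorrelation of a
    band-power sequence first falls below `threshold`.

    Assumed definition for illustration only; the article may use another.
    Returns infinity if the sequence is constant or never decorrelates.
    """
    x = np.asarray(band_power, dtype=float)
    x = x - x.mean()                      # remove the mean power level
    energy = np.dot(x, x)                 # autocorrelation at lag 0
    if energy == 0.0:
        return float("inf")
    for lag in range(1, len(x)):
        r = np.dot(x[:-lag], x[lag:]) / energy   # biased autocorrelation estimate
        if r < threshold:
            return lag / frame_rate
    return float("inf")


if __name__ == "__main__":
    frame_rate = 100.0                    # frames per second (hypothetical)
    t = np.arange(1000)
    stable = np.sin(2 * np.pi * t / 100)  # slowly varying band power
    unstable = np.random.default_rng(0).standard_normal(1000)  # erratic band power
    print(min_sampling_rate_hz(5300.0))
    print(decorrelation_time(stable, frame_rate))
    print(decorrelation_time(unstable, frame_rate))
```

Under this assumed definition, a less stable band power sequence yields a shorter decorrelation time. The time resolution of the estimate is one frame period, so resolving short decorrelation times requires a correspondingly high frame rate.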