Preface.
Foreword.
1. Reducing Discontinuities at Synthesis Time for Corpus-Based Speech Synthesis. Introduction. Shift-Only F0 Smoothing. Improving Quality of MBROLA Synthesis. Evaluation. Discussions and Conclusion. Bibliography.
2. Voice Quality Variation in a Long-Term Recording of a Single Speaker Speech Corpus. Introduction. Perceptual Experiment. Factors of Voice Quality Variation. Candidates of Acoustic Correlates. Prediction of Voice Quality Difference Scores. Summary. Bibliography.
3. Join Cost for Unit Selection Speech Synthesis. Introduction. Previous Work. Spectral Distances. Perceptual Listening Tests. Results and Discussion. Conclusions. Bibliography.
4. Articulatory Modeling: A Role in Concatenative Text to Speech Synthesis. Introduction. Articulatory Modeling. Rule-Based Control of the Parameters. Concatenative Articulatory Synthesis. Concluding Remarks. Bibliography.
5. Minimizing The Amount of Pitch Modification in Speech Synthesis. Introduction. Speech Corpus Analysis. Text Corpus Analysis. Perceptual Experiment. Conclusion. Bibliography.
6. The Use of Speech Recognition Technology in Speech Synthesis. Introduction. Speech Recognition. ASR in Synthesis. Limitations. Speculations. Bibliography.
7. An HMM-Based Approach to Multilingual Speech Synthesis. Introduction. HMM-Based Speech Synthesis System. F0 Pattern Modeling by HMM. Speech-Parameter Generation from an HMM. Implementation on Festival Architecture. Discussion. Conclusion. Bibliography.
8. Prosody Control For HMM-Based Japanese TTS. Introduction. Outline of HMM-Based TTS System. Prosody Generation Using the Quantification Theory (Type 1). Speech-Rate-Variable Synthesis Method. Conclusions. Bibliography.
9. Synthesizing Expressive Speech Overview: Challenges, and Open Questions. Introduction. Theories of Emotion. Dimensions of Emotional Space. Speech Synthesis Methods. Emotional Speech Data Collection. Experimental Evaluation of Expressive Speech. Presentation of Results From Case Studies. Conclusion. Open Questions and Future Directions. Bibliography.
10. Unit Selection Synthesis of Prosody: Evaluation Using Diphone Transplantation. Introduction. Computing Prosody by Selection. Comparative Evaluation. Results. Conclusion. Bibliography.
11. Toward Expressive Synthetic Speech. Introduction. A Pilot Study For Generating Expressive Speech. Generating Expressive Speech with Limited Resources. Rule-Based Methods for Generating Expressive Speech. Use of an Expressive TTS System. Assessing Performance. Conclusions. Bibliography.
Footnotes.
Copyright Forms.
References.
Index.

Their research attempts to develop models of voice
perception and speaker recognition. Without such models, the goal of
understanding how listeners perceive voices will not be achieved. Initial
studies in the laboratory sought to specify the sources of variability in
listeners’ ratings of vocal quality. More recently, studies have focused on
developing reliable, valid methods to measure perceived vocal quality, by
controlling the factors underlying response variability. They have devised a
new, theoretically motivated method of assessing quality, listener-mediated
analysis-resynthesis, in which listeners explicitly compare synthetic and
natural voice samples and adjust speech synthesizer parameters to create
acceptable auditory matches to voice stimuli. This method is designed to
replace unstable internal standards for qualities like breathiness and roughness
with externally presented stimuli. Initial results indicate that this technique
does control the major hypothetical sources of disagreement in rating scale judgments.

This review suggests that voice quality is best investigated as a multidimensional parameter space that combines individual prosody, temporally structured speech characteristics, spectral divergence, and voice source features, and that it could profitably complement simple linguistic prosodic modeling in speech synthesis.

