Prospects for articulatory synthesis: A position paper
Acoustic analysis and articulatory synthesis

was done by AT&T Bell Labs (Coker, 1976). Lip, jaw, and tongue positions were controlled by rule, and the final synthesis step was performed by a formant-based terminal analog. Current efforts at KTH by Lin and Fant (1992) use a parallel synthesizer with parameters derived from an articulatory model. In developing articulatory modeling for text-to-speech, we can take advantage of parallel work on speech coding based on articulatory modeling (Sondhi and Schroeter, 1987). This work focuses not only on synthesizing speech but also on extracting appropriate vocal tract configurations, so it can also help us obtain articulatory data through an analysis-synthesis procedure. This section has not dealt with the important work carried out to describe speech production in terms of physical models. The inclusion of such models still lies in the future, beyond the next generation of text-to-speech systems, but the results of these experiments will improve the current articulatory and terminal analog models.
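To make the link between an articulatory description and a formant-based terminal analog concrete, here is a minimal sketch. It assumes the simplest possible articulatory model, a uniform tube closed at the glottis and open at the lips (roughly a neutral vowel), whose resonances follow the quarter-wavelength formula F_n = (2n - 1)c / 4L. Those resonances then drive Klatt-style second-order digital resonators connected in parallel. The function names, the speed of sound, the bandwidths, and the gains are illustrative assumptions. This is not the Coker (1976) or Lin and Fant (1992) implementation.

```python
import math

# Assumed speed of sound in warm, moist air (cm/s); illustrative value.
SPEED_OF_SOUND_CM_S = 35000.0


def uniform_tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Hypothetical helper: resonances of a uniform tube closed at the glottis
    and open at the lips (quarter-wavelength resonator), F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


def resonator(x, freq_hz, bw_hz, fs):
    """Klatt-style second-order digital resonator:
    y[n] = A x[n] + B y[n-1] + C y[n-2], with A chosen for unity gain at DC."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def parallel_formant_synth(source, formants, bandwidths, gains, fs):
    """Hypothetical parallel terminal analog: the same source excites every
    resonator, and the weighted resonator outputs are summed."""
    out = [0.0] * len(source)
    for f, bw, g in zip(formants, bandwidths, gains):
        for i, v in enumerate(resonator(source, f, bw, fs)):
            out[i] += g * v
    return out


# Usage: a 17.5 cm tube gives formants at 500, 1500, and 2500 Hz.
fs = 10000
source = [1.0 if n % 100 == 0 else 0.0 for n in range(fs // 10)]  # 100 Hz impulse train, 0.1 s
formants = uniform_tube_formants(17.5)
speech = parallel_formant_synth(source, formants, [60.0, 90.0, 120.0], [1.0, 0.5, 0.25], fs)
```

A realistic articulatory model would replace the uniform tube with an area function computed from lip, jaw, and tongue positions and derive the formants from that area function. The terminal-analog stage could stay the same.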

We can position the different synthesis methods along a "knowledge about speech" scale. Articulatory synthesis clearly requires considerable understanding of the speech act itself, while models based on coding use such knowledge only to a limited extent. All synthesis methods have to model something that is partly unknown, and simplifications or gaps in coverage will unfortunately introduce artificial obstacles as well. A trend in current speech technology, in both speech understanding and speech production, is to avoid explicit formulation of knowledge and to use automatic methods to aid system development. Because such analysis methods lack the human ability to generalize, the generalization has to be present in the data itself, so these methods need large amounts of speech data. Models working close to the waveform now typically use larger units while still modeling prosody by rule. In the middle of the scale, "formant synthesis" is moving toward articulatory models by looking for "higher-level parameters," or toward larger prestored units. Articulatory synthesis, hampered by a lack of data, still has some way to go, but its quality is improving, mostly owing to advanced analysis-synthesis techniques.

1 The foundations for speech synthesis based on acoustical or articulatory modeling can be found in Fant (1960), Holmes et al. (1964), Flanagan (1972), Klatt (1976), and Allen et al. (1987). The paper by Klatt (1987) gives an extensive review of the developments in speech synthesis technology.

