Does anyone know about "speech synthesis driven by emotional function"?

But when I want to synthesize emotional speech, is it necessary to convert the spectrum?

Research Demo - Synthesize Emotional Speech

Self-regulatory processing is learned from a parent's interaction with his or her infant. In combination with genetic and environmental factors, the attachment between parent and child is encoded in the circuits of the infant’s brain. This attachment is the result of the capacity of the parent to appraise the emotional state of the offspring and take necessary actions to change the child’s physiology from stress to well-being, optimizing the chances for survival. Those early connections, especially before the age of three, or later in life during periods of trauma, form the basis for the circuitry of resiliency and health (25-28).

Emotional speech synthesis by sensing affective information from text

Emotional Speech Synthesis: A Review - ResearchGate

This paper explores a unit selection based concatenative approach towards emotional speech synthesis in Hindi. The emotions explored are sad and neutral. The Festival framework is used as the underlying Text-To-Speech (TTS) system. The various steps which are followed to create a new voice in Festival are described here. The developed TTS systems are evaluated by subjective evaluation tests. These tests indicate a significant improvement in the quality of synthesis after necessary prosody modifications. Finally, possible improvements which can be made on the systems are put forward.

In my opinion, converting the spectrum will help to synthesize emotional speech.

In an effort to inject some life in the automated voices that come out of our apps, AI startup Lyrebird has developed a voice-imitation algorithm that can mimic any person’s voice, and read any text with a predefined emotion or intonation. Incredibly, it can do this after analyzing just a few dozen seconds of pre-recorded audio. In an effort to promote its new tool, Lyrebird produced several audio samples using the voices of Barack Obama, Donald Trump, and Hillary Clinton.

CereProc Emotional Text-to-Speech Synthesis - YouTube

Speech can express subjective meanings and intents that, in order to be fully understood, rely heavily on affective perception. Some Text-to-Speech (TTS) systems reveal weaknesses in their emotional expressivity, but this situation can be improved by a better parametrization of the acoustic and prosodic parameters. This paper describes an approach for better emotional expressivity in a speech synthesizer. Our technique uses several linguistic resources that can recognize emotions in a text and assigns appropriate parameters to the synthesizer to carry out a suitable speech synthesis. For evaluation purposes we used the MARY TTS system to read out "happy" and "sad" news. The preliminary perceptual test results are encouraging: human judges listening to speech synthesized with our approach perceived "happy" emotions much better than when they listened to non-affective synthesized speech.
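To make the idea in the abstract above concrete, here is a minimal sketch of "recognize the emotion in the text, then pick synthesizer parameters". It is not the paper's method and not the MARY TTS API: the word lists, the function names (detect_emotion, prosody_for) and the pitch/rate values are all hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical mini-lexicon; a real system would use richer linguistic resources.
EMOTION_LEXICON = {
    "happy": {"win", "wins", "celebrate", "joy", "success", "great"},
    "sad": {"loss", "died", "tragedy", "mourning", "grief", "sad"},
}

# Hypothetical prosody settings (relative changes), not taken from the paper.
PROSODY_PARAMS = {
    "happy": {"pitch": "+15%", "rate": "+10%"},
    "sad": {"pitch": "-10%", "rate": "-15%"},
    "neutral": {"pitch": "+0%", "rate": "+0%"},
}


def detect_emotion(text):
    """Return the emotion whose lexicon matches the most words, else 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {
        emotion: sum(word in vocab for word in words)
        for emotion, vocab in EMOTION_LEXICON.items()
    }
    # On a tie, max() keeps the emotion listed first in EMOTION_LEXICON.
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "neutral"


def prosody_for(text):
    """Look up the (hypothetical) prosody parameters for the detected emotion."""
    return PROSODY_PARAMS[detect_emotion(text)]


print(prosody_for("A day of mourning after the tragedy"))
```

The returned dictionary would then have to be translated into whatever prosody controls the chosen synthesizer actually exposes; that step depends on the TTS system and is left out here.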

Emotional Speech Synthesis Based on Improved …

In this paper, we adopt a difference approach to prosody prediction for emotional text-to-speech synthesis, where the prosodic variations between emotional and neutral speech are decomposed into global and local prosodic variations and predicted using a two-stage model. The global prosodic variations are modeled by the means and standard deviations of the prosodic parameters, while the local prosodic variations are modeled by a classification and regression tree (CART) and dynamic programming. The proposed two-stage prosody prediction model has been successfully implemented as a prosodic module in an emotional text-to-speech synthesis system based on a Festival-MBROLA architecture, which is able to synthesize highly intelligible, natural and expressive speech.
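As a rough illustration of the global stage only, the sketch below shifts and scales a neutral F0 contour so that its mean and standard deviation match target emotional values. This is a generic mean/variance transformation assumed for illustration, not the paper's implementation; the function name, the 0.0 convention for unvoiced frames, and the target values are hypothetical, and the local (CART and dynamic programming) stage is not shown.

```python
from statistics import mean, pstdev


def convert_global_prosody(neutral_f0, target_mean, target_std):
    """Map a neutral F0 contour (Hz) onto target mean and standard deviation.

    Frames with value 0.0 are treated as unvoiced and left untouched
    (an assumed convention, not something the paper specifies).
    """
    voiced = [f for f in neutral_f0 if f > 0.0]
    if not voiced:
        return list(neutral_f0)
    src_mean = mean(voiced)
    src_std = pstdev(voiced)
    # Avoid division by zero when the contour is perfectly flat.
    scale = target_std / src_std if src_std > 0.0 else 1.0
    return [
        (f - src_mean) * scale + target_mean if f > 0.0 else 0.0
        for f in neutral_f0
    ]


# Hypothetical targets: a lower, flatter contour for "sad" speech.
neutral = [0.0, 110.0, 120.0, 130.0, 0.0]
sad = convert_global_prosody(neutral, target_mean=100.0, target_std=4.0)
print(sad)
```

The same transformation could be applied to other prosodic parameters such as duration or energy, provided target statistics for the emotion are available.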

Emotional speech synthesis database - …

While the nature of the mental activity that underlies language learning is widely debated, there is considerable agreement that the course of language development is influenced by determining factors in at least five fields: social, perceptual, cognitive processing, conceptual and linguistic. As well, although individual differences among children do exist, language development has predictable sequences. Most children begin speaking during their second year, and by 21 months are likely to know about 100 words and are able to combine them in short phrases. By the age of four to six, most children are speaking in grammatically complete and fully intelligible sentences. Their first sentences are made of content words and are often missing grammatical function words (e.g., articles and prepositions) and word endings (e.g., plurals and tense markers). Although there is a predictable sequence, the rate of language development among children varies substantially, primarily due to the complex interaction between genetic and environmental factors.