In Finnish, text preprocessing is generally easier, but it also involves some specific difficulties. The expansion of numerals and ordinals in particular may be even more difficult than in other languages, because Finnish has several grammatical cases, each formed with different suffixes. The first two ordinals must be expanded differently in some cases, and with larger numbers the expansion may become rather complex. Digits, Roman numerals, dates, and abbreviations pose the same kinds of difficulties as in other languages. For example, for the Roman numerals I and III there are at least three possible conversions. Some examples of the most difficult abbreviations are given in Table 4.1. In most cases, the correct conversion may be concluded from the type of surrounding characters or from other contextual information. To avoid misconversions, however, some abbreviations must be spelled letter by letter.
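The ambiguity of Roman numerals can be illustrated with a minimal sketch. It assumes, purely for illustration, that the competing readings of a token such as I or III are a plain word or letter, a cardinal number, and an ordinal number; the function names and the reading labels are hypothetical, and the choice between candidates is left to a later context-dependent stage.

```python
import re

# Strict pattern for well-formed Roman numerals from 1 to 3999.
# It also matches the empty string, so emptiness is checked separately.
_ROMAN_RE = re.compile(
    r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token):
    """Return the value of a well-formed Roman numeral, or None otherwise."""
    if not token or not _ROMAN_RE.match(token):
        return None
    total = 0
    # Pair each symbol with its successor; the last symbol is paired with " ".
    for ch, nxt in zip(token, token[1:] + " "):
        value = _VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if nxt in _VALUES and _VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


def candidate_readings(token):
    """List the possible readings of a token (labels are illustrative)."""
    readings = [("word", token)]  # e.g. the pronoun or the letter "I"
    value = roman_to_int(token)
    if value is not None:
        readings.append(("cardinal", value))
        readings.append(("ordinal", value))
    return readings
```

A token that is not a Roman numeral yields only the plain-word reading, whereas I and III each yield three candidates that the surrounding context must disambiguate.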

For some languages, synthetic speech is easier to produce than for others. The number of potential users and the size of the market also differ widely between countries and languages, which affects how many resources are available for developing speech synthesis. Most languages also have some special features that can make the development process either much easier or considerably harder.

Some languages, such as Finnish, Italian, and Spanish, have very regular pronunciation, sometimes with an almost one-to-one correspondence between letters and sounds. At the other extreme is, for example, French, with very irregular pronunciation. Many languages, such as French, German, Danish, and Portuguese, also contain many special stress markers and other non-ASCII characters (Oliveira et al. 1992). In German, the sentence structure differs considerably from that of other languages. For text analysis, the capitalization of nouns may cause some problems, because capitalized words are usually analyzed differently from others.

Finding the correct pronunciation for proper names, especially when they are borrowed from other languages, is usually one of the most difficult tasks for any TTS system. Some common names, such as Nice and Begin, are ambiguous in capitalized contexts, including sentence-initial position, titles, and stand-alone text. For example, a sentence that begins with the word Nice is very problematic because the word may be pronounced as /niis/ or /nais/. Some names of people and places also have special pronunciations, such as Leicester and Arkansas. For correct pronunciation, words of this kind may be included in a dedicated exception dictionary. Unfortunately, it is clear that there is no way to build a database of all proper names in the world.
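The exception dictionary mentioned above can be sketched as a simple lookup that is consulted before any general letter-to-sound rules. The sketch below is illustrative only: the respellings are rough informal approximations rather than a phonetic transcription, and `naive_letter_to_sound` is a hypothetical stand-in for a real rule set.

```python
# Illustrative entries only; the respellings are informal approximations.
EXCEPTION_DICT = {
    "leicester": "LES-ter",
    "arkansas": "AR-kan-saw",
}


def naive_letter_to_sound(word):
    """Hypothetical stand-in for letter-to-sound rules: spell out the letters."""
    return " ".join(word.lower())


def pronounce(word, exceptions=EXCEPTION_DICT, fallback=naive_letter_to_sound):
    """Return the dictionary pronunciation if one exists, else apply the rules."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return fallback(word)
```

Because the dictionary can never cover every proper name, the fallback path remains essential; the dictionary only catches the known irregular cases.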

Special characters and symbols, such as '%', '&', '/', '-', '+', also cause particular problems. In some situations the word order must be changed. For example, must be expanded as and as , not as . The expression '1-2' may be expanded as or , and the character '&' as or . Special characters and character strings in, for example, web sites or e-mail messages must also be expanded with special rules. For example, the character ' ' is usually converted as and e-mail messages may contain character strings, such as header information, that may be omitted. Some languages also include special non-ASCII characters, such as accent markers or special symbols.
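Rule-based symbol expansion of this kind can be sketched as a table of candidate spoken forms plus a selection function. The English words in the table are assumptions chosen for illustration and are not taken from the text above; `CANDIDATES`, `expand_symbols`, and the `choose` callback are hypothetical names. By default the first candidate is used, and a context-aware chooser can be supplied for ambiguous symbols such as '-'.

```python
import re

# Candidate spoken forms per symbol (illustrative assumptions).
CANDIDATES = {
    "%": ["percent"],
    "+": ["plus"],
    "&": ["and"],
    "-": ["minus", "to"],
    "/": ["slash", "per"],
}

# One pattern that matches any symbol in the table.
_SYMBOL_RE = re.compile("|".join(re.escape(s) for s in CANDIDATES))


def expand_symbols(text, choose=None):
    """Replace each known symbol with one of its candidate spoken forms.

    choose(symbol, options, text, position) selects the form to use;
    when it is omitted, the first candidate is taken.
    """
    if choose is None:
        def choose(symbol, options, full_text, position):
            return options[0]

    def replace(match):
        symbol = match.group(0)
        word = choose(symbol, CANDIDATES[symbol], text, match.start())
        return f" {word} "

    # Pad replacements with spaces, then collapse runs of whitespace.
    return " ".join(_SYMBOL_RE.sub(replace, text).split())
```

Keeping the candidates separate from the selection logic means that the table can be replaced per language, while the chooser carries the contextual rules that decide between readings.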