Text to audio conversion

Abstract: You might have already used text-to-speech in products, and maybe even incorporated it into your own application, but you still don’t know how it works. This document will give you a technical overview of text-to-speech so you can understand how it works, and better understand some of the capabilities and limitations of the technology.


Text-to-speech fundamentally functions as a pipeline that converts text into PCM digital audio. The elements of the pipeline are:

 1. Text normalization
 2. Homograph disambiguation
 3. Word pronunciation
 4. Prosody
 5. Concatenation of wave segments
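
The five stages above can be sketched as a toy Python pipeline. This is only an illustration, not how the Windows Speech engine or any real product is implemented: the lookup tables, the rule for the homograph "read", the falling pitch contour, and the sine tone used in place of recorded wave segments are all assumptions made up for this sketch.

```python
import math
import re
import struct

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)

# Hypothetical lookup tables; a real engine uses far larger ones.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {"i": ["AY"], "read": ["R", "IY", "D"], "two": ["T", "UW"],
           "books": ["B", "UH", "K", "S"]}
READ_PAST = ["R", "EH", "D"]  # past-tense reading of the homograph "read"


def normalize(text):
    """Step 1: lowercase, expand abbreviations, spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = re.sub(r"[^a-z0-9']", "", token)  # drop punctuation
        if token.isdigit():
            words.extend(DIGITS[int(d)] for d in token)
        elif token:
            words.append(token)
    return words


def pronounce(words):
    """Steps 2 and 3: resolve the homograph, then look up phonemes."""
    result = []
    for i, word in enumerate(words):
        # Toy rule: "read" after have/has/had is the past-tense form.
        if word == "read" and i > 0 and words[i - 1] in ("have", "has", "had"):
            result.append(READ_PAST)
        else:
            # Words missing from the lexicon are spelled letter by letter.
            result.append(LEXICON.get(word, list(word.upper())))
    return result


def add_prosody(word_phonemes):
    """Step 4: give each phoneme a duration (ms) and a pitch (Hz)."""
    flat = [p for word in word_phonemes for p in word]
    plan = []
    for i, phoneme in enumerate(flat):
        # Pitch falls linearly from 140 Hz to 100 Hz across the utterance.
        pitch = 140.0 - 40.0 * i / max(len(flat) - 1, 1)
        plan.append((phoneme, 80, pitch))
    return plan


def synthesize(plan):
    """Step 5: build one segment per phoneme and concatenate them.

    A sine tone stands in for the recorded wave segment of each phoneme.
    The result is 16-bit little-endian mono PCM.
    """
    pcm = bytearray()
    for _phoneme, duration_ms, pitch in plan:
        for k in range(SAMPLE_RATE * duration_ms // 1000):
            angle = 2 * math.pi * pitch * k / SAMPLE_RATE
            pcm += struct.pack("<h", int(12000 * math.sin(angle)))
    return bytes(pcm)


def text_to_pcm(text):
    """Run all five stages in order."""
    return synthesize(add_prosody(pronounce(normalize(text))))
```

Calling text_to_pcm("I have read 2 books.") returns raw PCM bytes, which can be written to a .wav file with Python's standard wave module (1 channel, 2 bytes per sample, 16000 Hz). The output is a series of tones rather than intelligible speech, because the sketch has no recorded voice data to concatenate.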


I was always fascinated by the Read Out Loud options in Adobe Reader. I found that Adobe Reader uses the Windows Speech engine, which ships with almost all versions of Windows. We can also use this engine programmatically. The Speech engine offers many features, such as speech recognition and text-to-speech (TTS). With speech recognition, you can interact with your PC using voice commands rather than GUI commands. In this example, I show how to use the TTS feature of the Speech engine.

    Processor          : Intel Pentium or better
    RAM                : 512 MB or more
    Cache              : 512 KB
    Hard disk          : 16 GB recommended for the primary partition
    Operating system   : Windows 2000 or later
    Front-end software : Windows application (C#.NET), Microsoft Visual Studio 2008

Mon, 30/05/2011 - 17:55