***********************************************************************
IBM TEXT-TO-SPEECH RUN TIME KIT
Version 6.7.1.0
Readme (linux.readme.6.7.1.0.txt)
Copyright IBM Corporation, 2003. All Rights Reserved
***********************************************************************

CONTENTS
--------
 1. Company
 2. Product
 3. Version
 4. Description
 5. Contact Information
 6. Upgrade Information
 7. What's New
 8. Installation Requirements
 9. End-User Installation Instructions
10. Working with Concatenative Voices
11. Uninstall Instructions
12. General Limitations and Comments
13. Known Problems & F.A.Q.
14. Developer Notes
15. Memory and Performance Tools
16. Logging Utilities
17. Trademark Information

1. COMPANY
-----------
International Business Machines Corporation (IBM)

2. PRODUCT
-----------
IBM Text-to-Speech Run Time Kit

3. VERSION
-----------
IBM Text-to-Speech Run Time Kit, Version 6.7.1.0

4. DESCRIPTION
----------------
IBM Text-to-Speech Run Time Kit provides the speech synthesis engine
and components necessary for applications to produce speech.

IBM Text-to-Speech Run Time Kit, Version 6.7.1.0 produces speech from
recordings of units of human speech. These units (possibly phonemes,
syllables, words, or phrases) are then combined (concatenated)
according to linguistic rules formulated from analyzed text. When these
recorded speech units are entire phrases or sentences, the output can
be very natural, human-sounding speech.

The components for the Text-to-Speech Run Time Kit include:

   Speech synthesis engine
   Data Sets (Per Language):
      Voice 1   Adult male     8 kHz
      Voice 2   Adult female   8 kHz
      Voice 4   Adult male     8 kHz (U.S. English only)

The speech synthesis engine and data include capability for a
concatenative voice dataset representation as well as for a synthesized
voice representation. The concatenative voice is derived from a
professional speaker, speaking a particular language and dialect,
recorded at a particular sampling rate.
When a client program changes languages while doing concatenative
synthesis, a new voice dataset may have to be loaded into memory from
disk if it is not already cached in memory from previous usage.

The system automatically chooses concatenative synthesis if a voice
dataset is available for the language, voice, and sample rate that you
select. For example, if you are using English at 8 kHz with voice 1,
and U.S. English voice 1 at 8 kHz has been installed, the system
automatically does concatenative synthesis. Otherwise, the system does
formant synthesis.

When concatenation is being done, ECI voice selections appear to the
concatenative engine as requests to switch between already-loaded voice
datasets, while voice attribute settings appear as changes in the
phonetic and acoustic data that it receives.

5. CONTACT INFORMATION
-----------------------
Please visit our Web site for enhancements and updates to
Text-to-Speech.

   http://www.software.ibm.com/speech/dev

6. UPGRADE PATH TO FULL VERSION
--------------------------------
The full version is currently included.

7. WHAT'S NEW
--------------
This version of Text-to-Speech includes support for custom filters. An
e-mail filter is provided that converts e-mail messages into a more
natural format. Please refer to the Text-to-Speech SDK for more
information on implementing and using custom filters.

8. INSTALLATION REQUIREMENTS
-----------------------------
Hardware:

   Formant
   - Processor performance equivalent to Intel Pentium 133 MHz with MMX
     with 256K L2 cache
   - 48 MB of RAM in total
   - 10 MB available hard disk space
   - Compatible 16-bit sound card
   - CD-ROM drive

9. INSTALLATION INSTRUCTIONS
-----------------------------
You must have root user access to install IBM Text-to-Speech Run Time
Kit.

The tar files are named after the language runtime that they contain.
You may wish to copy the tar file you need to your hard drive before
continuing with the following installation instructions.

1) Untar the files into a directory (the user's home directory, for
   instance).
2) Run chmod 755 on the setup.sh file that is extracted from the tar
   file.
3) Execute the setup.sh file by running the following command:

      ./setup.sh

4) Follow the onscreen instructions until completion.

Note: Installation has only been tested with RedHat Fedora Core 2 and
Mandrake Linux 10.

11. UNINSTALL INSTRUCTIONS
---------------------------
To uninstall the Text-to-Speech Run Time Kit:

First, use rpm to find which packages are installed:

   rpm -qa | grep viavoice

Then use rpm with the -e flag to remove them. Uninstall all
concatenative voices first.

NOTE: xx_YY refers to the concatenative voices you have installed.

   rpm -e viavoice_tts_concat_xx_YY-6.7-1.0
   rpm -e viavoice_tts_concat-6.7-1.0

Then, uninstall all languages:

   rpm -e viavoice_tts_rte_xx_YY-6.7-1.0
   rpm -e viavoice_tts_rte-6.7-1.0

12. GENERAL LIMITATIONS AND COMMENTS
-------------------------------------
This section contains information that is not specific to any
particular element of the Text-to-Speech Run Time Kit but is general in
nature. It is very important to heed these warnings and follow the
instructions given to avoid abnormal or unpredictable results.

* Currently, only 8 kHz concatenative voices are provided. Application
  programmers requiring higher quality audio should upgrade their voice
  datasets. For more information visit the IBM Text-to-Speech home
  page.

* Currently, Version 6.7.1.0 supports the following languages with
  formant voices. Languages marked with * support both formant and
  concatenative voices:

     Brazilian Portuguese*
     French*
     Canadian French*
     Finnish
     German*
     United States English*
     United Kingdom English*
     Spanish*
     Mexican Spanish
     Italian*
     Chinese Simplified*
     Chinese Traditional*
     Japanese*

* Currently, the included e-mail filter is only available for the
  English language.

* The e-mail filter included with IBM Text-to-Speech recognizes the
  following keywords in an e-mail message:

  Keyword                      Action
  -------                      ------
  Subject:                     Parse out the subject of the message and
                               return a new subject string to the
                               client application.
  To:                          Filter out lines until a recognized
                               keyword is encountered.
  From:                        Parse out the sender of the message and
                               return a new string to the client
                               application.
  Date:                        Parse out the date that the message was
                               sent and return a new string with that
                               date to the client application.
  Sent:                        Parse out the date that the message was
                               sent and return a new string with that
                               date to the client application.
  Alternate-Recipient:         Filter out the current line.
  Mime-Version:                Filter out the current line.
  Return-Path:                 Filter out the current line.
  MR-Received:                 Filter out the current line.
  Content-Type:                Filter out lines until a recognized
                               keyword is encountered.
  Content-Transfer-Encoding:   Filter out the current line.
  Posting-Date:                Filter out the current line.
  Importance:                  Filter out the current line.
  Priority:                    Filter out the current line.
  Sensitivity:                 Filter out the current line.
  UA-Content-ID:               Filter out the current line.
  X400-MTS-Identifier:         Filter out the current line.
  A1-Type:                     Filter out the current line.
  Hop-Count:                   Filter out the current line.
  Content-Disposition:         Filter out the current line.
  Delivered-To:                Filter out the current line.
  X-Originating-IP:            Filter out the current line.
  X-OriginalArrivalTime:       Filter out the current line.
  Full-Name:                   Filter out the current line.
  X-Mailer:                    Filter out the current line.
  CC:                          Filter out the current line.
  Filetime=                    Filter out lines until a recognized
                               keyword is encountered.
  X-Apparently-To:             Filter out the current line.
  Content-Length:              Filter out the current line.
  Auto-Submitted:              Filter out the current line.
  Status:                      Filter out the current line.
  Received:                    Filter out lines until a recognized
                               keyword is encountered.
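
  The three kinds of action in the keyword table can be illustrated with
  a short Python sketch. It is not the filter shipped with the Run Time
  Kit and does not use the ECI filter interface. The action labels, the
  case-insensitive matching, the reduced keyword list, and the format of
  the rewritten "parse" lines are all assumptions made for illustration.
  The sketch also follows the table literally, so a "filter until
  keyword" run only ends at the next recognized keyword.

```python
# Hypothetical subset of the keyword table; the action names are labels
# invented for this sketch, not identifiers from the ECI filter API.
ACTIONS = {
    "Subject:": "parse",
    "From:": "parse",
    "Date:": "parse",
    "Sent:": "parse",
    "To:": "skip_until_keyword",
    "Content-Type:": "skip_until_keyword",
    "Received:": "skip_until_keyword",
    "Mime-Version:": "drop_line",
    "X-Mailer:": "drop_line",
    "CC:": "drop_line",
}


def match_keyword(line):
    """Return the table keyword that starts the line, or None."""
    lowered = line.lower()  # assumption: matching ignores case
    for keyword in ACTIONS:
        if lowered.startswith(keyword.lower()):
            return keyword
    return None


def filter_headers(lines):
    """Apply the table's three kinds of action to a list of lines."""
    kept = []
    skipping = False  # True while inside a "filter until keyword" run
    for line in lines:
        keyword = match_keyword(line)
        if keyword is None:
            # Ordinary text survives unless a skip run is active.
            if not skipping:
                kept.append(line)
            continue
        skipping = False  # any recognized keyword ends a skip run
        action = ACTIONS[keyword]
        if action == "parse":
            # Assumed output format: keyword without the colon, then
            # the value. The real filter's wording is not documented
            # in this readme.
            value = line[len(keyword):].strip()
            kept.append(keyword[:-1] + " " + value)
        elif action == "skip_until_keyword":
            skipping = True
        # "drop_line": nothing is emitted for this line
    return kept
```

  Calling filter_headers() on a list of message lines returns the lines
  that would be passed on for synthesis: "parse" headers are rewritten,
  "drop_line" headers disappear, and continuation lines that follow a
  "skip_until_keyword" header are removed along with it.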

* The included e-mail filter also filters the following "emoticons"
  from messages:

     (R) (C) :-) :-( :-] :) ;) :-#| :( :-> :-< :-\\ (-: >:-< :-| :-o
     :-c |-) |-O :-# :-% :-& :-'| :-)' :-)8 :-* :-/ :-: :-? :-@ (:I
     :-[ *:o) +-(:-).-) <:I @:I [:-|] 8-# 8:-) }(:-( :-{ :-{( :-} :-O
     :-6 :-8( :-9 :-D :-e :-i :-p :-t :-v ::-) 8-) :<| :=) :>) :~) ;-)
     %-) (-) (:-) )8-) *-( *<|:-)-:-) ;-\\ =:-) [:-) O-) 8-| {(:-){:-)

* The eciUpdateFilter function for the included e-mail filter only
  supports changing the behavior for the "From:", "Date:", and
  "Subject:" fields.

* The Text-to-Speech SDK includes a file "maildict.dct" that contains
  translations for common e-mail jargon and abbreviations. For best
  results when processing e-mail messages, use this dictionary file in
  conjunction with the included e-mail filter.

=========
inifilter

The inifilter tool registers and unregisters filters, which are used as
preprocessor add-ins for ECI to modify text.

   inifilter [-ul] /filter:[filterNum] /path:[filterPath]
             /autoload:[y/n] /lang:[lang] /ECIINI:[IniPath]

   -u         Disable specified filter
   -l         Display statistics about specified filter
   filter     Filter number
   path       Fully qualified filename of filter
   autoload   Filter is automatically loaded when language selected
              Valid values are:
                 n   Filter is not automatically loaded
                 y   Filter is automatically loaded
   lang       Language/Dialect for the filter
              Valid language/dialect values are:
                  1.0 - US English
                  1.1 - British English
                  2.0 - Castilian Spanish
                  2.1 - Mexican Spanish
                  3.0 - Standard French
                  3.1 - Canadian French
                  4.0 - Standard German
                  5.0 - Standard Italian
                  6.0 - Mandarin Chinese
                  6.1 - Taiwanese Chinese
                  7.0 - Brazilian Portuguese
                  8.0 - Standard Japanese
                  9.0 - Standard Finnish
                 13.0 - Standard Norwegian
                 14.0 - Standard Swedish
                 15.0 - Standard Danish
   ECIINI     Path to ECIINI file (not used on Windows platforms)
              ECIINI environment variable used on other platforms if
              omitted

   NOTE: If -u is specified, only the language, filter and INI file may
   be specified.

13. KNOWN PROBLEMS & F.A.Q.
----------------------------
The following are known problems in this release:

* Setting the pitch baseline after setting head size may return an
  error in certain situations.

* If you experience unusual behavior, check whether a cmm.log file has
  been created in the directory where you launched the first
  application. Save this file and report the problem to technical
  support. Stop the client application and run "cmmcmd shutdown" before
  restarting the application.

F.A.Q
-----
Q: Can I use version 5.0 and 6.7.1.0 on the system together?

A: When upgrading from Text-to-Speech Run Time Kit version 5.0 to
   version 6.7.1.0, the system will continue to function. You will need
   to reinstall version 5.0 to enable speech synthesis. Currently,
   version 6.7.1.0 does not support all of the languages that version
   5.0 supports. If both versions are on the same system, the version
   6.7.1.0 languages should be installed after the version 5.0
   languages.

Q: Why is my application still synthesizing with formant synthesis?

A: When you install an 8 kHz voice, the system produces concatenative
   synthesis for any application that requests synthesis at 8 kHz. By
   default the system generates audio at 11 kHz. To produce
   concatenative speech, use eciSetDefaultParam or eciSetParam to set
   the sample rate. Also, check that version 5.0 was not installed
   after version 6.7.1.0 if both versions reside on the same machine.

14. DEVELOPER NOTES
--------------------
* To develop applications, include the enclosed eci.h in your
  application. This file defines all the enumerations, typedefs, return
  codes, and function prototypes used in the ECI interface.

* Link your applications to libibmeci.a. This library can be loaded
  dynamically.

* To develop applications, you will require updated versions of
  /usr/local/viavoicetts/lib/libibmeci.a and
  /usr/local/viavoicetts/headers/eci.h.
  The Text-to-Speech SDK Version 6.7.1.0 is a good starting point for
  developing applications. If you have existing TTS applications that
  use the ECI interface, you will need to re-compile and re-link these
  applications with the new files.

* Concatenative Memory Manager (CMM) cmmcmd Utility

  A support utility called cmmcmd was created to interface with the
  Concatenative Memory Manager (CMM).

  Note: This is a support tool and was not intended to be an end-user
  utility.

  Invoke cmmcmd as follows:

     cmmcmd shutdown     -- shuts down the CMM
     cmmcmd timeout ##   -- sets the CMM timeout to ## seconds

15. Memory and Performance Tools
----------------------------------
Due to the computational complexity and amount of memory required to
produce concatenative speech, IBM Text-to-Speech utilizes shared memory
and speech caching to reduce the amount of system resources required.

* The concatenative TTS engine requires more physical memory (to store
  the data required to produce natural speech synthesis) than formant
  synthesis. Since many processes on a server may require access to the
  same data, IBM Text-to-Speech loads one instance and shares it
  between all the processes. In addition, IBM Text-to-Speech allows
  configuration of how long a dataset remains loaded after the last
  access. By default, each concatenative voice remains loaded for 10
  minutes. To configure or stop memory sharing, the Concatenative
  Memory Manager (CMM) utility, cmmcmd.exe, is provided:

     cmmcmd { shutdown | timeout [secs] }

     shutdown         - shut down the server immediately.
     timeout [secs]   - get/set the server time-out to the specified
                        number of seconds. If secs is 0 or omitted, the
                        current shut down time-out is returned.

* The concatenative TTS engine requires more computational power than
  the formant TTS engine. Since the domains of many TTS applications
  are limited to a small vocabulary, IBM Text-to-Speech now provides a
  mechanism (speech caching) to bypass complex computations for text
  that has already been processed.
  The concatenative system can be configured, per language, to set a
  number of phrases 'to remember' as pre-synthesized phrases. In
  addition, the memory can be made persistent (that is, saved on exit
  and reloaded at voice initialization). By default no caching is
  performed. To enable and configure speech caching, the utility
  inicache.exe is provided:

     inicache [-ul] [-p][-n] lang [phrases] [INI]

     -u        Disable voice caching
     -l        Display current voice cache values
     -p        Cache file is persistent (saved for future use)
     -n        Cache file is not persistent
     lang      Language/Dialect for the voice cache
               Valid language/dialect values are:
                   1.0  - US English
                   1.1  - British English
                   2.0  - Castilian Spanish
                   2.1  - Mexican Spanish
                   3.0  - Standard French
                   3.1  - Canadian French
                   4.0  - Standard German
                   5.0  - Standard Italian
                   6.0  - Simplified Chinese
                   6.0d - Simplified Chinese (dual language)
                   6.1  - Traditional Chinese
                   6.1d - Traditional Chinese (dual language)
                   7.0  - Brazilian Portuguese
                   8.0  - Standard Japanese
                   9.0  - Standard Finnish
                  13.0  - Standard Norwegian
                  14.0  - Standard Swedish
                  15.0  - Standard Danish
     phrases   Maximum number of phrases in the voice cache
     INI       Path to ECIINI file (not used on Windows platforms)
               ECIINI environment variable used on other platforms if
               omitted

     NOTE: If -u is specified, only the language and INI file may be
     specified.

16. Logging Utilities
----------------------
Logs must often be produced for auditing, technical support, and
diagnostic purposes. The logging provided by IBM Text-to-Speech is
extremely verbose and is primarily for technical support and diagnostic
purposes. To enable and configure logging, the utility initrace.exe is
provided:

   initrace level [file] [INI]

   level   Tracing level [0 = off, 1 = on]
   file    Name of trace file
           Do not specify trace file if level is 0
   INI     Path to ECIINI file (not used on Windows platforms)
           ECIINI environment variable used on other platforms if
           omitted

   NOTE: Paths that include spaces must be enclosed in double-quotes.

17. TRADEMARK INFORMATION
--------------------------
IBM is a registered trademark or trademark of International Business
Machines Corporation in the United States and other countries.

All other names are registered trademarks, trademarks or service marks
of their respective companies.

Doc Number: linux.readme.6.7.1.0.txt.071602