Raw text has the problem that it cannot always easily be rendered as speech in the way the author wishes. Sable offers a well-defined way of marking up text so that the synthesizer may render it appropriately.
Festival XML Mark-UP: SSML and HTML
(From XML/SGML mark-up)
NOTE: In my Ubuntu distributions, examples are found in the directory /usr/share/doc/festival/examples/. Two are provided here: example1.sable and example2.sable. In these examples I have commented out or amended those elements that result in a segmentation fault or go unrecognized. Generally, Festival's SGML is of document type SABLE.
The idea of a general, synthesizer-nonspecific mark-up language for labelling text has been under discussion for some time. Festival has supported an SGML-based markup language through multiple versions, most recently STML (sproat97). This is based on the earlier SSML (Speech Synthesis Markup Language), which was supported by previous versions of Festival (taylor96). With this version of Festival we support Sable, a similar mark-up language devised by a consortium from Bell Labs, Sun Microsystems, AT&T and Edinburgh (sable98).
Unlike the previous versions, which were SGML based, the implementation of Sable in Festival is now XML based. To the user the difference is negligible, but using XML makes processing of files easier and more standardized. Festival also now includes an XML parser, reducing the dependencies in processing Sable text.
The definition of Sable is by no means settled and is still in development. In this release Festival offers people working on Sable and other XML (and SGML) based markup languages a chance to quickly experiment with prototypes by providing a DTD (document type definition) and the mapping of the elements in the DTD to Festival functions. Although we have not yet (personally) investigated facilities like cascading style sheets and generalized SGML specification languages like DSSSL, we believe the facilities offered by Festival allow rapid prototyping of speech output markup languages.
Primarily we see Sable markup text as a language that will be generated by other programs, e.g. text generation systems, dialog managers, etc. A standard, easy-to-parse format is therefore required, even if it seems overly verbose for human writers.
- An example of Sable with descriptions
- Currently supported Sable tags
- Adding new Sable tags
- XML Software environment requirements
- Rendering Sable files as speech
An example of Sable with descriptions
Here is a simple example of Sable marked up text:
<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
      "Sable.v0_2.dtd"
[]>
<SABLE>
<SPEAKER NAME="male1">

The boy saw the girl in the park <BREAK/> with the telescope.
The boy saw the girl <BREAK/> in the park with the telescope.

Good morning <BREAK /> My name is Stuart, which is spelled
<RATE SPEED="-40%">
<SAYAS MODE="literal">stuart</SAYAS> </RATE>
though some people pronounce it
<PRON SUB="stoo art">stuart</PRON>. My telephone number
is <SAYAS MODE="literal">2787</SAYAS>.

I used to work in <PRON SUB="Buckloo">Buccleuch</PRON> Place,
but no one can pronounce that.

By the way, my telephone number is actually
<AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/>
<AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>
<AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/>
<AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>.
</SPEAKER>
</SABLE>
After the initial definition of the SABLE tags, through the file Sable.v0_2.dtd, which is distributed as part of Festival, the body is given. There are tags for identifying the language and the voice. Explicit boundary markers may be given in the text. Duration and intonation control can also be explicitly specified, as can new pronunciations of words. The last sentence specifies some external audio files to play at that point.
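Since Sable is mainly expected to be produced by other programs, a generator for markup like the example above might be sketched as follows. This is a minimal illustration in Python using only the standard library; the function name build_sable and its arguments are hypothetical, and only the SABLE, SPEAKER and BREAK elements from the example are used. The DOCTYPE header is copied from the example, because ElementTree does not write one itself.

```python
import xml.etree.ElementTree as ET

# Header copied from the example above; ElementTree does not emit a DOCTYPE.
SABLE_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"\n'
    '      "Sable.v0_2.dtd"\n'
    '[]>\n'
)


def build_sable(speaker, phrases):
    """Return a Sable document that speaks `phrases` with a BREAK between them.

    `build_sable` is a hypothetical helper, not part of Festival.
    """
    root = ET.Element("SABLE")
    spk = ET.SubElement(root, "SPEAKER", NAME=speaker)
    for i, phrase in enumerate(phrases):
        last = i == len(phrases) - 1
        # Keep a space before the following BREAK so words stay separated.
        chunk = phrase if last else phrase + " "
        if i == 0:
            spk.text = chunk
        else:
            brk = ET.SubElement(spk, "BREAK")
            # Text after an empty element is stored as that element's tail.
            brk.tail = " " + chunk
    # ET.tostring escapes characters such as & and < in the text for us.
    return SABLE_HEADER + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(build_sable("male1", ["The boy saw the girl in the park",
                                "with the telescope."]))
```

Building the tree with an XML library rather than string concatenation means special characters in the input text are escaped automatically, which matters when the text comes from a dialog manager or other upstream program.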
XML Software environment requirements
Using Sable
Festival supports Sable as a text mode. In command mode, use the following to process a Sable file:
(tts "file.sable" 'sable)
The automatic selection of mode based on file type has also been set up so that files ending in .sable will be automatically synthesized in this mode. Thus
festival --tts fred.sable
will render fred.sable as speech in Sable mode.
Another way of using Sable is through the Emacs interface. The say-buffer command will send the Emacs buffer mode to Festival as its tts-mode. If the Emacs mode is stml or sgml, the file is treated as a Sable file.
People experimenting with Sable (and TTS in general) often want all the waveform output to be saved so it can be played at a later date. The simplest way to do this is with the text2wave script, which respects the auto mode selection:
text2wave fred.sable -o fred.wav
Note: this renders the file as a single waveform (done by concatenating the waveforms for each utterance in the Sable file).
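When several Sable files need converting, the text2wave call shown above can be wrapped in a small script. The following Python sketch assumes text2wave is installed and on the PATH; the helper names text2wave_command and render_all are hypothetical, and the only command-line form used is the one given above (input file followed by -o and the output file).

```python
import shutil
import subprocess
from pathlib import Path


def text2wave_command(sable_file, wav_file=None):
    """Build the argument list for: text2wave <sable_file> -o <wav_file>.

    If no output name is given, the input's suffix is replaced with .wav.
    """
    src = Path(sable_file)
    dst = Path(wav_file) if wav_file else src.with_suffix(".wav")
    return ["text2wave", str(src), "-o", str(dst)]


def render_all(directory):
    """Run text2wave on every .sable file in `directory`.

    Returns the list of output paths. Raises if text2wave is missing
    or if any conversion exits with a non-zero status.
    """
    if shutil.which("text2wave") is None:
        raise RuntimeError("text2wave not found on PATH")
    outputs = []
    for src in sorted(Path(directory).glob("*.sable")):
        cmd = text2wave_command(src)
        subprocess.run(cmd, check=True)
        outputs.append(Path(cmd[-1]))
    return outputs


if __name__ == "__main__":
    # Show the command that would be run for the example file.
    print(" ".join(text2wave_command("fred.sable")))
```

Keeping the command construction separate from the call to subprocess makes it possible to check the arguments without Festival installed.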
If you want the waveform for each utterance in a file saved separately, you can make the tts process save the waveforms during synthesis:
festival> (save_waves_during_tts)
Any future call to tts will cause the waveforms to be saved in a file tts_file_xxx.wav, where xxx is a number. A call to (save_waves_during_tts_STOP) will stop saving the waves. A message is printed when each waveform is saved; otherwise people forget about this and wonder why their disk has filled up.
This is done by inserting a function into tts_hooks which saves the wave. To do other things to each utterance during TTS (such as saving the utterance structure), try redefining the function save_tts_output (see festival/lib/tts.scm).