I've just been having a look at the site and trying to decide whether it has real potential for helping EFL/ESL students with their listening, reading and pronunciation.
As an experiment, I decided to select quite a challenging text and see what the site could do. I also decided to select a British English accent, as I know that in the past TTS systems have struggled more with UK accents than US ones, due to the wider range of sounds in UK English.
Anyway, here are the results. The text is from Wikipedia.org at: http://en.wikipedia.org/wiki/Text_to_speech and is about the challenges of text normalisation in TTS.
- Click here to watch Elizabeth read the text to you.
Or
- Listen using this media player
This is the actual text you should be hearing:
"Text normalization challenges
The process of normalizing text is rarely straightforward. Texts are full of heteronyms, numbers, and abbreviations that all require expansion into a phonetic representation. There are many spellings in English which are pronounced differently based on context. For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".
Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. As a result, various heuristic techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.
Deciding how to convert numbers is another problem that TTS systems have to address. It is a simple programming challenge to convert a number into words, like "1325" becoming "one thousand three hundred twenty-five." However, numbers occur in many different contexts; when a year or perhaps a part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a social security number, as "one three two five". A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.
Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs. "
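To make the number and abbreviation problems in the quoted passage a bit more concrete, here is a minimal Python sketch of context-dependent expansion. It is only an illustration, not how any particular TTS system works: the function names and the context labels ("quantity", "year", "address", "id") are made up for this example, and the rules are deliberately naive.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def cardinal(n):
    """Spell out 0-9999 as a quantity (US style, no 'and')."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def as_pairs(n):
    """Read a four-digit number as two pairs of digits.

    Naive on purpose: years such as 1905 or 2000 would need extra rules.
    """
    high, low = divmod(n, 100)
    return two_digits(high) + " " + two_digits(low)


def as_digits(text):
    """Read a number one digit at a time."""
    return " ".join(ONES[int(d)] for d in text)


def expand_number(text, context):
    """Pick a reading for a number based on a hypothetical context label."""
    if context in ("year", "address"):
        return as_pairs(int(text))
    if context == "id":
        return as_digits(text)
    return cardinal(int(text))


def expand_st(sentence):
    """Guess 'Saint' when 'St' precedes a capitalised word, else 'Street'."""
    sentence = re.sub(r"\bSt\.?(?=\s+[A-Z])", "Saint", sentence)
    return re.sub(r"\bSt\.?(?!\w)", "Street", sentence)


print(expand_number("1325", "quantity"))
print(expand_number("1325", "year"))
print(expand_number("1325", "id"))
print(expand_st("12 St John St."))
```

A real system would have to work out the context label itself from the surrounding words, which is exactly the hard part the passage describes. The "St" rule here is also just a guess based on capitalisation, so it will get some addresses wrong.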
What I like about the site
- The site is free though you do have to register.
- The site gives you a number of options once it has converted the text to speech. These include creating an MP3 file to download, creating an embed code to add the audio to a blog or website, or downloading to an iPod.
- They have quite a selection of avatars and voices
- The site can convert text from a number of sources including Word, PDF, a website (just type in the URL) or even an RSS feed!
- You can make the texts private or public
- There doesn't seem to be a limit on how many you can create
What I'm not so sure about
- I found it hard to get a link to the avatar reading the text. It would have been nice to be able to embed her into my blog, but I just couldn't get that to work.
- Processing the text can take a while.
So, if you've listened to the text, please do send in a comment and let me know what you think about the usability of a tool like this with EFL/ESL students.
Best
Nik Peachey