Small Talk

Two years ago, I used an Emic2 module for a couple of TTS (text-to-speech) projects. Despite its flexibility (someone even made it sing Bohemian Rhapsody here), the Emic2 sounds quite ‘robotic’ when used for direct TTS conversion. Not the best choice for a Talking Alarm Clock project, unless you like to wake up next to Stephen Hawking… (no offense).

So I considered composing full sentences out of spoken language fragments, stored in mp3 format on my VS1053 breakout’s SD card. Then I discovered this google_tts library on GitHub. It takes a text string as input and returns a link to a corresponding mp3 file in the Google cloud. You can specify one of 30 languages, as well as a voice (male or female).

I tried it, and the result is fully satisfying. The sketch starts by retrieving the local time for my location, provided by a simple API on my server. Then it fills a string variable with a natural language sentence like “The current time is <hr> hour and <min> minutes” (in Dutch in my sketch). Next, this string is processed by the library and the resulting link is sent to translate.google.com. The returned mp3 data is streamed to the VS1053 chip and sent as analogue audio to the amplifier.
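
To make the sentence-building step a bit more concrete, here is a minimal sketch of it in plain C++. It is not taken from my actual sketch: the function names timeSentence and urlEncode are made up for illustration, and I use std::string instead of the Arduino String class so it also compiles on a desktop. The percent-encoding part is an assumption about what has to happen before text can go into a link; in the real sketch the library takes care of building the link.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: build the spoken sentence from hour and minute values.
std::string timeSentence(int hour, int minute) {
    return "The current time is " + std::to_string(hour) + " hour and " +
           std::to_string(minute) + " minutes";
}

// Hypothetical helper: percent-encode a string so it can be used in a URL.
// Letters, digits and - _ . ~ are kept; every other byte becomes %XX.
std::string urlEncode(const std::string& text) {
    std::string out;
    char buf[4];  // "%XX" plus the terminating zero
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Calling urlEncode(timeSentence(7, 30)) gives a string in which every space is replaced by %20. Non-ASCII characters (my Dutch sentences have a few) are encoded byte by byte, which should work as long as the text is UTF-8.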

This project is definitely going to replace my alarm clock. From now on, a soft female voice will wake me up in the morning, reading the current time and weather forecast, before telling me to stay in bed for a while. So glad I didn’t use the Emic2…

Here’s a basic sketch (current time only). You can ignore the warnings when compiling it for ESP32: it’ll run just fine on ESP32 boards.