Small Talk

Two years ago, I used an Emic2 module for a couple of TTS (text-to-speech) projects. Despite its flexibility (someone even made it sing Bohemian Rhapsody here), the Emic2 sounds quite ‘robotic’ when used for direct TTS conversion. Not the best choice for a Talking Alarm Clock project, unless you like to wake up next to Stephen Hawking… (no offense).

So I consideredĀ to compose full sentences out of spoken language fragments, stored in mp3 format on my VS1053 breakout’s SD card. Then I discovered this google_tts library on github. It takes a text string as input and returns a link to a correspondig mp3 file in the Google cloud. You can specify one out of 30 languages, as well as a voice (both male and female).

I tried it, and the result is fully satisfying. The sketch starts with retrieving the local time for my location, provided by a simple api on my server. Then it fills a string variable with a natural language sentence like “The current time is <hr> hour and <min> minutes” (in Dutch in my sketch). Next, this string in processed by the library and the resulting link is sent to translate.google.com. The returned mp3 data is streamed to the VS1053 chip and sent as analogue audio to the amplifier.

This project is definitely going to replace my alarm clock. From now on, a soft female voice will wake me up in the morning, reading the current time and weather forecast, before telling me to stay in bed for a while. So glad I didn’t use the Emic2…

Here’s a basic sketch (current time only). You can ignore warnings when compiling it for esp32: it’ll run just fine on esp32 boards.

VS1053 + Matrix keypad

 

The VS1053b is a popular chip, used in many MCU controlled audio projects. It surprises me that nobody seems to use its eight GPIO pins, controllable over SPI by the master MCU. These GPIO pins enabled me to control my esp8266 based Internet radio by means of a matrix keypad, despite the fact that the esp8266 had only one free pin left.

The cheap 12-key membrane keypad shown in the picure uses 7 of these pins, wheras a 16-key matrix keypad uses all of them.

 

Adafruit’s VS1053 library has basic functions for adressing the chip’s GPIO pins. I wrote a simple demo sketch that prints the pressed key. In a real project, this information would typically be used to control other functions of the VS1053, like changing audio source or adjusting volume.

 

Make sure that pin definitions in the demo sketch match your MCU and connect RST to your board’s RST.

The VS1053_GPIO wiring for the demo sketch is different from what the picture shows. This is what I used for my 12-key membrane matrix keypad:

 

 

 

 

 

The connector’s column/row mapping for these particular keypads is usually as follows (from left to right in the picture): row0 – row1 – row2 -row3 – col0 – col1 – col2. The sketch can easily be adapted for a 16-key matrix keypad.

The basic idea of the sketch is:

  • keypadListen() sets all four row pins to INPUT with value 0; it sets all three column pins to OUTPUT with value 1. In this idle state, GPIO_digitalRead() will return B00000111 (decimal 7).
  • In loop(), we check GPIO_digitalRead(). A value higher than 7 means that one of the row pins has changed from 0 to 1 because it got connected to one of the column pins. This happens if a key on that row was pressed (that’s how matrix keypads are wired). We call keypadRead() to find the responsible key.
  • keypadRead() searches the row that caused the GPIO_digitalRead() > 7 condition, by checking each row pin until it reads 1. This gives us the row of the pressed key: R.
  • Next, it finds the column pin that is connected to row pin R by setting all column pins to INPUT with value 0 and row pin R only to OUTPUT with value 1. Checking each column pin until it reads 1 gives the column of the pressed key: C.
  • Now, keymap[R][C] holds the value of the pressed key (10 and 11 for * and #).
  • A simple debouncer is added for, well, debouncing.

 

On-air

Instead of adapting an existing Internet Radio project to my needs (like support for air traffic control stations), I decided that it would be easier to write my own sketch from scratch. I’m satisfied with the first results, and I learned a lot from this project.

Keypad controlled prototype in radio mode (left) and ATC mode (right)

The radio runs without problems on a Wemos D1 R2, driving an Adafruit VS1053 breakout and a 2.4″ ili9341 TFT display. It can be controlled with a 16-key matrix keypad (the pictures show a Wemos D1 Mini Pro, but its wifi turned out to be too slow for some stations).

The prototype uses almost 500 lines of code for the radio. David Bird’s excellent METAR functions take yet another 500 lines. Too big to post on this page, I’m afraid…

It has the following features (more will be added later):

  • Can play mp3 encoded streams from internet radio stations (up to 320 Kbps).
  • Controlled by means of a 4×4 matrix keypad, using VS1053’s own GPIO pins.
  • Offers storage of 40 station presets (grouped in fourĀ  bands).
  • Displays station name and song title on a TFT display.
  • Displays a decoded weather report (METAR) when playing an air traffic control (ATC) channel. The first 10 stations in the preset list are reserved for ATC stations (I used David Bird’s excellent functions for METAR decoding).
  • A 30 Kbyte ring buffer is used for smooth playback.
  • Designed to handle chunked transfer encoded streams if necessary (I was unable to test this, because I couldn’t find any station that uses this encoding).
  • Stereo dB levels can be read directly from the chip and be used to drive VU meters from PWM pins.

Below is a flow chart of the initial connection, followed by the streaming process (basically a finite state machine). In practice, it all comes down to determining each incoming byte’s function within the stream by keeping track of some counters and states. Once the actual streaming has started, an incoming byte can be one of the next types:

  1. audio byte – will be sent to a ringbuffer that feeds the VS1053 decoder
  2. metadata byte – a human readable character from the song title
  3. metadata size – an integer; multiplied by 16, it is the following song title’s length
  4. CR or LF – used as separators
  5. chunk size (if station uses chunked transfer encoding) – size of the next chunk