Dual Core Business

Almost a year after finishing my ESP8266-based Internet Radio project, I still have had no idea why running that sketch on the much more powerful ESP32 produces a heavily stuttering sound, resembling Michael Palin in “A Fish Called Wanda”.  So I wondered if I could solve this problem by dividing the audio streaming process between the ESP32’s two cores: filling the audio buffer on one core, and feeding the VS1053 decoder from that buffer on the other core.

Thanks to its builtin FreeRTOS, pinning a task to a specific core of the ESP32 is easy:

1. Put the code for the task in a function, wrapped in an endless for (;;) { } loop.

2. Create a task handle for the task in the main section of the sketch:

3. Create the task, specifying which function it should execute on which core:

 

(Since there was nothing in my loop() function (yet), I put a delay(1000); command in it. The loop() function runs on core 1, and I didn’t want it to claim too much of that core’s processor time.)

So my Radio sketch will have two independent tasks running on different cores: one that listens to the radio station’s audio stream for filling the circular buffer, and one for feeding the VS1053 decoder with bytes from that buffer. Both processes maintain their own pointer for accessing the circular buffer and the code guarantees that they will never catch up with the other process’ pointer. So there is no need to use a semaphore.

However, my first step into the dual core business produced an error. A ‘watchdog timer’ had become nervous and halted the ESP32. It turned out that including the following line in both functions solved at least that problem (bug?):

 

Now the ESP32 no longer crashed, and it actually produced sound. But alas, it was Michael Palin again…!

I can’t think of any reason for this behaviour. The serial monitor shows that the ESP32 has no problem filling the 30 Kbyte circular buffer, so feeding the VS1053 seems to be the bottleneck. I swapped cores for both tasks, but that didn’t help.

UPDATE: Meanwhile, I had become so desperate about finding a solution that I renamed the project “dESPeRadio“. But then, during a sleepless night, I wondered if the ESP32 might perhaps need a different policy for filling the circular buffer. The chunks it reads from the WiFi client per loop cycle have an upper size limit of 1024 bytes, in order to prevent the VS1053 from running dry during the process. But I didn’t specify a lower size limit.

For the ESP8266 this was OK, but the EP32 is so fast that, after it has rapidly filled the circular buffer, there will never be more than just a couple of free positions in the next loop cycle. All following cycles will therefore be very inefficient, causing so much overhead that the VS1053 actually runs dry in a very regular time pattern – the stuttering.

So I added just one single line for skipping the fill cycle if there’s less than 512 bytes of free buffer space available and uploaded the sketch without much hope, because so many ‘good’ ideas had failed before. What followed was complete silence…, which then turned out to be the pause between two movements of  Brahms’ fourth symphony! Then the music started in clean and, above all, uninterrupted sound!

The radio has been running non-stop now for a couple of hours. It looks like I finally have a solid and stable foundation for further development of my dESPeRadio on ESP32. Although using two cores isn’t necessary, it will be nice way to explore FreeRTOS.

In order to keep the code clean and readable, all future features and controls will be implemented as (optional) includible header files. I’ll make the full code available in my first github repository soon, so if you’re interested, stay tuned.

Plans for additional control options, apart from the already implemented keypad, are:

  • ☑ an embedded web server
  • ☐ the touch screen overlay of my display
  • ☐ IR remote
  • ☐ rotary encoders for volume, bass and treble

Other plans:

  • ☑ add VU meters (I love the analogue ones!)
  • ☐ use a FreeRTOS queue instead of a circular buffer

 

 

 

 

 

Quest for Fire

By no means a gamer myself, I’ve always been interested in the math behind the graphics in video games. A previous post showed how simple clouds can be produced by the Diamond-Square algorithm, running on an esp8266. That algorithm can generate maps and landscapes as well.

Another visual effect on my todo list was algorithmically simulated fire.

 

The video shows my first attempt on a small TFT display. I had to write a relatively simple algorithm, because it needs to run on a microcontroller. Some further experimenting with the color palette, weighting factors and randomization, as well as adding Perlin noise, will hopefully result in a more realistic fire. I also plan to add a rotary encoder for regulating the ‘fire’.

 

Visual memory

Soon after my start with Arduino, I bought an external 32KB EEPROM, just in case a future project would need more memory than the board’s modest SRAM. But when I finally needed it for a sketch, I realized that the 300 KByte RAM of my 480×320 TFT display could be used for this purpose as well.

I had used pixels as memory before in visualisations of IFS fractals, where I made each pixel keep track of the number of times it was ‘hit’ by the iteration process, and then color it accordingly. Recent examples are this autumn inspired version of Barnsley’s Fern and this conifer-like fractal, based on an example on Ken Brakke’s IFS page. Both fractals were produced by an esp8266 on a 320×240 TFT display (ili9341).

Using a display as external memory for projects brings memory intensive algorithms, like the A* path search algorithm for my 15-Puzzle project, within reach of Arduino and ESP boards. The only prerequisite is that your display library lets you read single pixel values. Bonus: leaving the display’s backlight on will show a ‘brain scan’ of the sketch in progress.

 

Busy Beaver

Today I discovered that Langton’s Ant is not the only Turing machine that’s nicknamed after an animal. There’s also the Busy Beaver, invented by Tibor Radó. Despite its chaotic behaviour and unpredictable* outcome, the machine itself is extremely simple and can easily be simulated on a TFT display by an Arduino.

[video shows the final 3 steps of the busiest of all 4-state, 2-color beavers]

The beaver’s workflow is: read cell color -> lookup instructions for current [cell color/system state] combination  -> adjust state -> adjust cell color -> move head -> start again, unless state equals X.

In general, the components of a Turing machine, as described by Alan Turing in 1936, are:

  • an infinite tape, divided into an infinite number of cells, each of which can contain a symbol from a finite set of characters, including a ‘blank’ (the alfabet)
  • a head that can read and optionally change the content of a cell, as well as optionally move to the next or previous cell on tape
  • a finite set of n possible states that the system can be in, plus at least one additional state that marks the end of the process
  • a finite set of instructions that, based on the current state and the currently read symbol, decides about: (1) whether the symbol will be erased, overwritten or left as is; (2) whether the head will move one cell (left or right), or keeps it position over the tape; (3) what the next system state will beome.

The alfabet in my sketch consists of 0 (‘blank’, white cell) and 1 (blue cell). The four system states are represented by A, B, C and D. Note that this 4-state machine has a 5th state X, the exit state. A beaver must have exactly one (1) exit state. Also, all of its instructions must include movement of the head.

The algorithm could (in theory) simulate any 4-state, 2-color Turing machine on a 480×320 TFT diplay, but my sketch is initialized for simulating the ‘busiest’ of all possible 25,600,000,000** 4-state Busy Beavers: the one that stops with the maximum numbers of ‘1‘s on tape (13, after 107 steps). Luckily, it will not run out of Arduino’s limited virtual tape during the process (‘real’ Turing machines have an infinite tape, that’s why they can only exist as mathematical models).

It would have been impossible to simulate the busiest 5-state beaver, since several super computers and my Arduino 😉 are still searching for it. So far, ‘we’ have found one that has 4098 ‘1‘s on tape when it stops.

* Turing proved that no single algorithm can exist, that’s capable of predicting the outcome of all possible beavers. Apart from the trivial ones, you’ll simply have to run them…

**In a Busy Beaver Turing machine, the head will always be moved. That reduces the number of possible [color, state, move] instruction triplets for a 4-state, 2-color beaver to 2x5x2 = 20 (remember the 5th X state). We have to provide instruction triplets to 4×2 = 8 state/color combinations (state X doesn’t need one), so there are 208 different 4-state, 2-color Busy Beavers.

 

 

Newton fractal

Here’s another famous fractal that I wanted to try on Arduino: the Newton fractal. It’s named after Isaac Newton’s iterative method for finding roots of functions. The classic example applies the method to function f(z) = z3 – 1 , where z is a complex number. This function has three roots in the complex plane.

classic newton fractal

The sketch that produced this picture loops over the pixels of a 480×320 display, mapping each of them to a complex number z0, that will serve as the initial value for Newton’s iteration process:

zn+1 = zn – f(zn) / f'(zn)

The basic color of a pixel (red, green or blue) depends on to which of the three roots the process converges. Its color intensity depends on the number of iterations that were needed to reach that root within a pre-defined precision. It’s just that simple!

Apart from playing with different color mappings (always essential for producing visually appealing fractals), I wanted to use modified versions of Newton’s method, as well as to apply them to different functions. The Arduino core has no support for complex calculus, and a library that I found didn’t even compile. So I wrote a couple of basic complex functions and put them in a functions.h file. There must better ways, but it works for me.

Once you have a basic sketch, the canvas of complexity is all yours!

 

 

f(z) = z4 – 1

 

This is my functions.h file. It must be in the same folder as the fractal sketch.

 

And here’s the Newton fractal sketch. Note the #include “functions.h” on the first line.

 

 

Snowflakes & Clouds

I haven’t been working on larger projects for a while, so here’s a couple of recent chips from the workbench.

In my ongoing quest for a faster HX8357D display driver, I was exited to read that the fast TFT_eSPI library now claims to support this beautiful but rather slow display. Unfortunately, the results were disappointing (unstable, readPixel freezes the screen). However, a nice bycatch of my experiments was the discovery that display updates become noticeably faster when the esp8266’s runs at 160MHz. Very useful for my fractal sketches, and for faster aircraft position updates on the Flight Radar‘s map.

Then I wondered how fast an esp8266 could actually drive my ili9341 driven 320×240 display. So I used my fast ‘offset-modulo’ Koch Snowflake algorithm for a snowflake animation speed test.

The result: an esp8266 at 160MHz can draw more than 100 (iteration depth 3) snowflakes per second on an ili9341 display, using the TFT_eSPI library. That’s about 5x faster than Adafruit’s HX8357 library can achieve on my 480×320 display. Hopefully, there will be a fast and reliable library for that display some day. Update: the TFT_eSPI library now supports driving the HX8457 in parallel mode and the results are impressive.

 

My renewed attention to the Koch Snowflake led to the discovery of a fractal that was new to me: the Plasma fractal. A commonly used algorithm to produce it is the Diamond-square algorithm. It’s apparently being used to generate nature-like backgrounds for video games.

I wrote a small demo sketch that produces random clouds (well, sort of…), but it can also be used to produce (non-musical) rock formations or landscapes.

6 samples of randomly generated clouds

(picture quality suffers from camera pixel interference with the TFT display)

 

The Diamond-square algorithm assigns initial values to the four corners of a (2n+1)x(2n+1) grid. Then it starts a process of calculating the remaining elements of the grid, based on neighbour values and a random noise of decreasing magnitude. The visual result depends on the four initial corner values, the random noise and the chosen color mapping.

The name Plasma fractal becomes clear when we show consecutive results in a loop. While still using random noise, the trick is to initialize the random generator with the same seed value for every new loop cycle. By slightly changing the initial corner values for each new cycle, the results will also be slightly different from the previous one, which is exactly what we want for a smooth animation.

The general formula that I used for initializing corner k with value h[k] is:

h[k] = d[k] + a[k]*sin( t*f[k] )   where t is a loop counter.

So the value of corner k will periodically swing around value d[k] with amplitude a[k]. The period of the swing is determined by f[k].

Now we can play with the d, a and f values for each corner, as well as with the color mapping, in order to find visually appealing animations. Here’s a first impression (esp8266 & ili9341 320×240 display). The sketch uses a 65×65 grid, with every point filling a corresponding 3×4 rectangle on the display with it’s color value. Since grid values are stored in 4-byte float variables, a 129×129 grid would become too large for the esp8266.

 

Used values:

Assignment:

Color mapping:

 

 

VS1053 + Matrix keypad

 

The VS1053b is a popular chip, used in many MCU controlled audio projects. It surprises me that nobody seems to use its eight GPIO pins, controllable over SPI by the master MCU. These GPIO pins enabled me to control my esp8266 based Internet radio by means of a matrix keypad, despite the fact that the esp8266 had only one free pin left.

The cheap 12-key membrane keypad shown in the picure uses 7 of these pins, wheras a 16-key matrix keypad uses all of them.

 

Adafruit’s VS1053 library has basic functions for adressing the chip’s GPIO pins. I wrote a simple demo sketch that prints the pressed key. In a real project, this information would typically be used to control other functions of the VS1053, like changing audio source or adjusting volume.

 

Make sure that pin definitions in the demo sketch match your MCU and connect RST to your board’s RST.

The VS1053_GPIO wiring for the demo sketch is different from what the picture shows. This is what I used for my 12-key membrane matrix keypad:

 

 

 

 

 

The connector’s column/row mapping for these particular keypads is usually as follows (from left to right in the picture): row0 – row1 – row2 -row3 – col0 – col1 – col2. The sketch can easily be adapted for a 16-key matrix keypad.

The basic idea of the sketch is:

  • keypadListen() sets all four row pins to INPUT with value 0; it sets all three column pins to OUTPUT with value 1. In this idle state, GPIO_digitalRead() will return B00000111 (decimal 7).
  • In loop(), we check GPIO_digitalRead(). A value higher than 7 means that one of the row pins has changed from 0 to 1 because it got connected to one of the column pins. This happens if a key on that row was pressed (that’s how matrix keypads are wired). We call keypadRead() to find the responsible key.
  • keypadRead() searches the row that caused the GPIO_digitalRead() > 7 condition, by checking each row pin until it reads 1. This gives us the row of the pressed key: R.
  • Next, it finds the column pin that is connected to row pin R by setting all column pins to INPUT with value 0 and row pin R only to OUTPUT with value 1. Checking each column pin until it reads 1 gives the column of the pressed key: C.
  • Now, keymap[R][C] holds the value of the pressed key (10 and 11 for * and #).
  • A simple debouncer is added for, well, debouncing.

 

15-Puzzle (part I)

Part I of this ’15-Puzzle’ project was writing a simple sketch that shows the puzzle in its solved state, then shuffles it and lets you solve it manually. The more challenging part, of course, will be to write a generic solver algorithm that fits inside the limited memory of an Arduino Uno (and finishes within a reasonable amount of time).

 

First, I considered the following algorithms:

  • Human approach algorithm. Reasonably simple to write, but it’s not likely to find the fastest solution in terms of moves. It also becomes more complicated if you want to program the tricks that humans develop by thinking moves ahead.
  • Dijkstra’s algorithm, which would be a nice tribute to my former teacher. It will find the shortest solution, but may not fit into an Uno (and may be rather slow as well).
  • An A* algorithm (had never heard of it until today). It promises to find the shortest solution and, based on heuristics, will usually be faster than Dijkstra.

It didn’t take long before I realized that options 2 and 3 are definitely out of Arduino Uno’s league. Being graph search algorithms, they require storage of many nodes and paths during the search process. Since every node is a permutation of 16 digits, the sketch would certainly run out of memory.

So what remains is the challenge of implementing option 1. For nostalgic reasons, I’m going to use a method that I figured out when I was 8 years old (so probably not the most efficient way to solve this puzzle).

Will be continued. For now, here’s the sketch of part I:

 

More fractals…

At the end of my exploration of the fractal-universe, I had my tireless Arduino Uno create a couple of ‘classics’. The video shows the growth of the famous Barnsley Fern.

This is a special IFS fractal: each iteration step transforms just one single pixel (instead of the usual geometric shape), using one out of four possible transformations. By changing the parameters and/or the probabilities of the transformations, different kinds of fern patterns will appear.

After I’d finished the sketch (the simplest of all my fractal sketches), I could hardly believe that it could generate such a beautiful image (after 50,000 iterations).

And then, of course, there is the mother of all fractals, named after the (disputed) father of all fractalists: Mandelbrot. The sketch to produce it is much simpler than most backtracking sketches, but it may take up to 256 iterative function calls for every pixel to calculate its color. With 153,600 pixels, it took the Uno a couple of minutes to finish the picture.

The black area is a 2D representation of the actual Mandelbrot Set, but the fascinating stuff is happening at its border. By changing the coordinates and zoom level inside the sketch, you can zoom in at any area. Key factors for getting spectacular results are: selection of the area, maximum number of iterations and color mapping.

 

This short exploration wouldn’t be complete without mentioning the elegant Julia fractals. Like Mandelbrot fractals, they are created by iteration of the function f(z) = z² + c  (in the complex plane). For Julia fractals, c is fixed and we examine the function’s behaviour for all z values within our range of interest. So there’s a Julia set Jc for every c in the complex plane, but the most visually appealing Julia fractals will arise from c values at the border of the Mandelbrot set.

Julia fractal (c = -0.79 -0.15i) produced by an esp8266 on a 480×320 TFT display

IFS fractals

The use of backtracking in my Pythagorean tree sketch made me realize that it can be useful for creating other IFS fractals as well (fractals produced by Iterated Function Systems). These fractals are very ‘geometric’ by nature (and can actually be found in nature). Classic examples are the Sierpinski Gasket and the Koch Snowflake.

Memory usage for backtracking is O(d), where d is the chosen recursion depth. For my purpose (displaying contractive fractals on a 480×320 display), d will usually be quite small (<15) in order to prevent objects from becoming single pixels. That means that most sketches will run fine on an Arduino Uno.

The above video shows a (slightly randomized) Binary Tree fractal

Generating Binary Tree fractals resembles the growth of a plant. Starting with a vertical trunk, branches recursively split into two new branches, growing in directions determined by θ (0°<θ<180°) while their length is reduced by a factor r (0<r<1).

In symmetric binary trees, θ and r are fixed. In order to produce more naturally looking ‘plants’, I applied some randomization to both θ and r for every new branch. The drawing order of splitting branches is randomized as well. As a finishing touch, branches with lengths smaller than a treshold value will be drawn in red.

 

 

Another famous IFS fractal is the Sierpinski Carpet. The largest square in the picture is 243×243 pixels wide. Iteration depth is 5, so the smallest black squares become single pixels (you may have to zoom in to see them).

 

 

 

 

And this is the Sierpinski Gasket (or Triangle) with iteration depth 6. It’s interesting that the same pattern can also be generated by elementary cellular automaton Rule 30, by a Lindenmayer system or by a Chaos game method.

 

 

 

Next is the Koch Snowflake (actually a triangle of three Koch curves) with iteration depth 4.

After writing a familiar backtracking based algorithm, I also wrote a far more compact sketch for drawing a Koch curve. It’s the second sketch at the bottom of this page.

 

 

My last example is the space-filling (pseudo) Hilbert curve. It took me some time to understand why a backtracking approach is less suitable in this case. Each level consists of more than just four transformations of its parent’s pattern, because these transformations have to be interconnected as well. Moreover, since levels don’t share any lines with previous levels (see video below), you can’t use the display to preserve lines.

So I went for a solution that iteratively builds an array of drawing instructions. This results in a very simple algorithm, but the size of the instructions array is equal to  4n – 1, where n is the desired iteration depth. With some bitwise tricks, an Arduino Uno can draw the Hilbert curve for up to n=6, which happens to be the maximum for my 480×320 display, because line distances would become smaller than a pixel for n>6.

 

 

This is the sketch that generated the fractal in the Binary Tree video:

 

And here’s my newest sketch for drawing a Koch Snowflake. It superimposes the same pattern of rotation instructions with a different offset and periodicity for each level. I like the elegance of this algorithm. It kind of generates and processes the final string of an L-system on the fly, without any rewriting or intermediate data storage.