2D arrays in PSRAM

This post shows a minimal example sketch for storing a 2D array  in PSRAM memory of ESP32 WROVER modules. It’s mainly for my own reference, now that I’ve finally figured out how these notorious C++ pointers work*.

(An earlier post already showed a simple way to store 2D arrays in PSRAM, but that required some row/column book keeping).

The following sketch creates a 2D array named grid for storing float values in PSRAM. Then it fills all ‘cells’ with values of the form <column>.<row>. Finally, it prints the value of grid[123][234] (which should be 123.234, of course).


Storing, for example, uint16_t values instead of floats only requires replacement of all occurrences of float with uint16_t.

Note that you can store multiple 2D arrays (even of different types) this way. Just declare an additional pair of global pointers of the desired type and add a corresponding ‘create’ section for each array in setup().

Elements in the grid array from the above example can now simply be addressed like grid[x][y], just like in regular 2D arrays. This convenience comes at a price:

  • The presented method needs some extra memory for storing additional column pointers (the term sizeof(float *) * cols in the calculation formula of len).
  • My (and probably everybody’s) favorite ESP graphics library TFT_eSPI can push an entire buffer of pixel values to a TFT display (very fast!). However, that buffer needs to be referrenced to by a single pointer, whereas grid is declared as a double pointer. As far as I know, that means that you’ll have to push the grid array column by column, where grid[i] is a pointer to the column with (zero-offset) index i.

I’m now using the described method in sketches with many single-pixel read/write operations, or with multiple arrays stored in PSRAM. For ‘speedy’ TFT projects, I’ll keep stitching my rows and columns together for storage in a single 1D PSRAM array.


* Finding this piece of information about pointers was very helpful:  “… type is required to declare a pointer. This is the type of data that will live at the memory address it points to.” Before that, I presumed that pointers were supposed to be unsigned integers, long enough to hold any address they could point to.

Self-replicating Cellular Automata

Studying John von Neumann’s self-reproducing universal constructor* reminded me of some unfinished business in my earlier Cellular Automata (CA) projects. Always looking for coding challenges, I wondered if I could write a sketch for simulating self-reproducing CA loops that would fit inside an ESP32.

The general idea is to have a 2-dimensional grid of cells, each of which can be in one out of n states. These cells periodically change their states simultaneously, according to a set of rules. Each cell complies with the same set of rules, which lets it calculate its new state based on its current state and that of its direct neighbours. The way this ‘organism’ will evolve also depends on its initial pattern. A self-reproducing CA is a combination of states, rules and initial pattern that will continuously produce copies of the initial pattern, that will produce more copies of the initial pattern… etc. etc.

I guessed that Von Neumann’s universal automaton, with its 29 cell states and large set of rules, could be well out of ESP32’s league. Fortunately there are some famous (be it less universal) simplifications, so I started with the simplest one I could find: Chou-Reggia-2.

ESP32 WROVER simulating Chou-Reggia-2 on a 320×240 display (1-pixel per cell)

The video shows the indefatigable CA at work, producing ever more copies of the initial pattern (just 5 cells). It mimics a much sought after form of life: old cells don’t die…

They’re just like kids, they grow up so fast 😉


In order to squeeze my Chou-Reggia-2 algorithm into the ESP32, I had to come up with some new programming- and storage techniques for cellular automata. I ended up writing a framework for CA sketches, that can import start patterns and rules from Golly. It runs on ESP32 WROVER modules, using PSRAM for storing a double display/states buffer. The two cores of the ESP32 cooperatively calculate a new grid state based on the current one.

As a proof of concept for the new framework, I initialised it with the specific data for Langton’s Loop, an automaton similar to Chou-Reggia-2, but with more rules. Everything worked fine, as did two slightly more complicated automata that have variables in their Golly rules, as well as substantially more states: Evoloop-finite and Perrier Loop.

Now that I have it up and running, my CA framework for ESP32 WROVER modules reduces simulation of most 2D cellular automata to feeding it with their specific data: states, transition rules (with or without variables), neighborhood (Moore or Von Neumann) and rule symmetry, all imported from Golly.

[ After further optimization, my CA sketches run 5x faster than what is shown in the videos! ]

Close up of Langton’s Loop on an ESP32-driven 320×240 TFT display (slow version)

Because every ‘cell’ is represented by a single pixel on a small 2.4″ display, I had to zoom in on the pattern in order to show the difference between ‘living’ and ‘dead’ cells (Langton’s ‘organism’ grows like coral: inner cells die, but their cell-walls are preserved).

(See this post for an implementation of Perrier Loop inside the same framework)

* Von Neumann’s abstract and purely logical reasoning about self-replication eventually helped biologists to understand how (real) cells manage to make exact copies of themselves.



Dual Core Business

Almost a year after finishing my ESP8266-based Internet Radio project, I still have had no idea why running that sketch on the much more powerful ESP32 produces a heavily stuttering sound, resembling Michael Palin in “A Fish Called Wanda”.  So I wondered if I could solve this problem by dividing the audio streaming process between the ESP32’s two cores: filling the audio buffer on one core, and feeding the VS1053 decoder from that buffer on the other core.

Thanks to its builtin FreeRTOS, pinning a task to a specific core of the ESP32 is easy:

1. Put the code for the task in a function, wrapped in an endless for (;;) { } loop.

2. Create a task handle for the task in the main section of the sketch:

3. Create the task, specifying which function it should execute on which core:


(Since there was nothing in my loop() function (yet), I put a delay(1000); command in it. The loop() function runs on core 1, and I didn’t want it to claim too much of that core’s processor time.)

So my Radio sketch will have two independent tasks running on different cores: one that listens to the radio station’s audio stream for filling the circular buffer, and one for feeding the VS1053 decoder with bytes from that buffer. Both processes maintain their own pointer for accessing the circular buffer and the code guarantees that they will never catch up with the other process’ pointer. So there is no need to use a semaphore.

However, my first step into the dual core business produced an error. A ‘watchdog timer’ had become nervous and halted the ESP32. It turned out that including the following line in both functions solved at least that problem (bug?):


Now the ESP32 no longer crashed, and it actually produced sound. But alas, it was Michael Palin again…!

I can’t think of any reason for this behaviour. The serial monitor shows that the ESP32 has no problem filling the 30 Kbyte circular buffer, so feeding the VS1053 seems to be the bottleneck. I swapped cores for both tasks, but that didn’t help.

UPDATE: Meanwhile, I had become so desperate about finding a solution that I renamed the project “dESPeRadio“. But then, during a sleepless night, I wondered if the ESP32 might perhaps need a different policy for filling the circular buffer. The chunks it reads from the WiFi client per loop cycle have an upper size limit of 1024 bytes, in order to prevent the VS1053 from running dry during the process. But I didn’t specify a lower size limit.

For the ESP8266 this was OK, but the EP32 is so fast that, after it has rapidly filled the circular buffer, there will never be more than just a couple of free positions in the next loop cycle. All following cycles will therefore be very inefficient, causing so much overhead that the VS1053 actually runs dry in a very regular time pattern – the stuttering.

So I added just one single line for skipping the fill cycle if there’s less than 512 bytes of free buffer space available and uploaded the sketch without much hope, because so many ‘good’ ideas had failed before. What followed was complete silence…, which then turned out to be the pause between two movements of  Brahms’ fourth symphony! Then the music started in clean and, above all, uninterrupted sound!

The radio has been running non-stop now for a couple of hours. It looks like I finally have a solid and stable foundation for further development of my dESPeRadio on ESP32. Although using two cores isn’t necessary, it will be nice way to explore FreeRTOS.

In order to keep the code clean and readable, all future features and controls will be implemented as (optional) includible header files. I’ll make the full code available in my first github repository soon, so if you’re interested, stay tuned.

Plans for additional control options, apart from the already implemented keypad, are:

  • ☑ an embedded web server
  • ☐ the touch screen overlay of my display
  • ☐ IR remote
  • ☐ rotary encoders for volume, bass and treble

Other plans:

  • ☑ add VU meters (I love the analogue ones!)
  • ☐ use a FreeRTOS queue instead of a circular buffer






Quest for Fire

By no means a gamer myself, I’ve always been interested in the math behind the graphics in video games. A previous post showed how simple clouds can be produced by the Diamond-Square algorithm, running on an esp8266. That algorithm can generate maps and landscapes as well.

Another visual effect on my todo list was algorithmically simulated fire.


The video shows my first attempt on a small TFT display. I had to write a relatively simple algorithm, because it needs to run on a microcontroller. Some further experimenting with the color palette, weighting factors and randomization, as well as adding Perlin noise, will hopefully result in a more realistic fire. I also plan to add a rotary encoder for regulating the ‘fire’.

[Update: a newer version is here]

Visual memory

Soon after my start with Arduino, I bought an external 32KB EEPROM, just in case a future project would need more memory than the board’s modest SRAM. But when I finally needed it for a sketch, I realized that the 300 KByte RAM of my 480×320 TFT display could be used for this purpose as well.

I had used pixels as memory before in visualisations of IFS fractals, where I made each pixel keep track of the number of times it was ‘hit’ by the iteration process, and then color it accordingly. Recent examples are this autumn inspired version of Barnsley’s Fern and this conifer-like fractal, based on an example on Ken Brakke’s IFS page. Both fractals were produced by an esp8266 on a 320×240 TFT display (ili9341).

Using a display as external memory for projects brings memory intensive algorithms, like the A* path search algorithm for my 15-Puzzle project, within reach of Arduino and ESP boards without PSRAM. The only prerequisite is that your display library lets you read single pixel values. Bonus: leaving the display’s backlight on will show a ‘brain scan’ of the sketch in progress.


Busy Beaver

Today I discovered that Langton’s Ant is not the only Turing machine that’s nicknamed after an animal. There’s also the Busy Beaver [sic], invented by Tibor Radó. Despite its chaotic behaviour and unpredictable* outcome, the machine itself is extremely simple and can easily be simulated on a TFT display by an Arduino.

[video shows the final 3 steps of the busiest of all 4-state, 2-symbol beavers]

The beaver’s workflow is: read cell color -> lookup instructions for current [cell color/system state] combination  -> adjust state -> adjust cell color -> move head -> start again, unless state equals X.

In general, the components of a Turing machine, as described by Alan Turing in 1936, are:

  • an infinite tape, divided into an infinite number of cells, each of which can contain a symbol from a finite set of characters, including a ‘blank’ (the alfabet)
  • a head that can read and optionally change the content of a cell, as well as optionally move to the next or previous cell on tape
  • a finite set of n possible states that the system can be in, plus at least one additional state that marks the end of the process
  • a finite set of instructions that, based on the current state and the currently read symbol, decides about: (1) whether the symbol will be erased, overwritten or left as is; (2) whether the head will move one cell (left or right), or keeps it position over the tape; (3) what the next system state will beome.

The alfabet in my sketch consists of 0 (‘blank’, white cell) and 1 (blue cell). The four system states are represented by A, B, C and D. Note that this 4-state machine has a 5th state X, the exit state. A beaver must have exactly one (1) exit state. Also, all of its instructions must include movement of the head.

The algorithm could (in theory) simulate any 4-state, 2-symbol Turing machine on a 480×320 TFT diplay, but my sketch is initialized for simulating the ‘busiest’ of all possible 25,600,000,000** 4-state Busy Beavers: the one that stops with the maximum numbers of ‘1‘s on tape (13, after 107 steps). Luckily, it will not run out of Arduino’s limited virtual tape during the process (‘real’ Turing machines have an infinite tape, that’s why they can only exist as mathematical models).

It would have been impossible to simulate the busiest 5-state 2-symbol beaver, since several super computers and my Arduino 😉 are still searching for it. So far, ‘we’ have found one that has 4098 ‘1‘s on tape when it stops.

* Turing proved that no single algorithm can exist, that’s capable of predicting the outcome of all possible beavers. Apart from the trivial ones, you’ll simply have to run them…

**In a Busy Beaver Turing machine, the head will always be moved. That reduces the number of possible [color, state, move] instruction triplets for a 4-state, 2-color beaver to 2x5x2 = 20 (remember the 5th X state). We have to provide instruction triplets to 4×2 = 8 state/color combinations (state X doesn’t need one), so there are 208 different 4-state, 2-color Busy Beavers.



Newton fractal

Here’s another famous fractal that I wanted to try on Arduino: the Newton fractal. It’s named after Isaac Newton’s iterative method for finding roots of functions. The classic example applies the method to function f(z) = z3 – 1 , where z is a complex number. This function has three roots in the complex plane.

classic newton fractal

The sketch that produced this picture loops over the pixels of a 480×320 display, mapping each of them to a complex number z0, that will serve as the initial value for Newton’s iteration process:

zn+1 = zn – f(zn) / f'(zn)

The basic color of a pixel (red, green or blue) depends on to which of the three roots the process converges. Its color intensity depends on the number of iterations that were needed to reach that root within a pre-defined precision. It’s just that simple!

Apart from playing with different color mappings (always essential for producing visually appealing fractals), I wanted to use modified versions of Newton’s method, as well as to apply them to different functions. The Arduino core has no support for complex calculus, and a library that I found didn’t even compile. So I wrote a couple of basic complex functions and put them in a functions.h file. There must better ways, but it works for me.

Once you have a basic sketch, the canvas of complexity is all yours!



f(z) = z4 – 1


This is my functions.h file. It must be in the same folder as the fractal sketch.


And here’s the Newton fractal sketch. Note the #include “functions.h” on the first line.



Snowflakes & Clouds

I haven’t been working on larger projects for a while, so here’s a couple of recent chips from the workbench.

In my ongoing quest for a faster HX8357D display driver, I was excited to read that the fast TFT_eSPI library now claims to support this beautiful but rather slow display. Unfortunately, the results were disappointing (unstable, readPixel freezes the screen). However, a nice bycatch of my experiments was the discovery that display updates become noticeably faster when the esp8266’s runs at 160MHz. Very useful for my fractal sketches, and for faster aircraft position updates on the Flight Radar‘s map.

Then I wondered how fast an esp8266 could actually drive my ili9341 driven 320×240 display. So I used my fast ‘offset-modulo’ Koch Snowflake algorithm for a snowflake animation speed test.

The result: an esp8266 at 160MHz can draw more than 100 (iteration depth 3) snowflakes per second on an ili9341 display, using the TFT_eSPI library. That’s about 5x faster than Adafruit’s HX8357 library can achieve on my 480×320 display. Hopefully, there will be a fast and reliable library for that display some day. Update: the TFT_eSPI library now supports driving the HX8457 in parallel mode and the results are impressive.


My renewed attention to the Koch Snowflake led to the discovery of a fractal that was new to me: the Plasma fractal. A commonly used algorithm to produce it is the Diamond-square algorithm. It’s apparently being used to generate nature-like backgrounds for video games.

I wrote a small demo sketch that produces random clouds (well, sort of…), but it can also be used to produce (non-musical) rock formations or landscapes.

6 samples of randomly generated clouds

(picture quality suffers from camera pixel interference with the TFT display)


The Diamond-square algorithm assigns initial values to the four corners of a (2n+1)x(2n+1) grid. Then it starts a process of calculating the remaining elements of the grid, based on neighbour values and a random noise of decreasing magnitude. The visual result depends on the four initial corner values, the random noise and the chosen color mapping.

The name Plasma fractal becomes clear when we show consecutive results in a loop. While still using random noise, the trick is to initialize the random generator with the same seed value for every new loop cycle. By slightly changing the initial corner values for each new cycle, the results will also be slightly different from the previous one, which is exactly what we want for a smooth animation.

The general formula that I used for initializing corner k with value h[k] is:

h[k] = d[k] + a[k]*sin( t*f[k] )   where t is a loop counter.

So the value of corner k will periodically swing around value d[k] with amplitude a[k]. The period of the swing is determined by f[k].

Now we can play with the d, a and f values for each corner, as well as with the color mapping, in order to find visually appealing animations. Here’s a first impression (esp8266 & ili9341 320×240 display). The sketch uses a 65×65 grid, with every point filling a corresponding 3×4 rectangle on the display with it’s color value. Since grid values are stored in 4-byte float variables, a 129×129 grid would become too large for the esp8266.


Used values:


Color mapping:



VS1053 + Matrix keypad


The VS1053b is a popular chip, used in many MCU controlled audio projects. It surprises me that nobody seems to use its eight GPIO pins, controllable over SPI by the master MCU. These GPIO pins enabled me to control my esp8266 based Internet radio by means of a matrix keypad, despite the fact that the esp8266 had only one free pin left.

The cheap 12-key membrane keypad shown in the picure uses 7 of these pins, wheras a 16-key matrix keypad uses all of them.


Adafruit’s VS1053 library has basic functions for adressing the chip’s GPIO pins. I wrote a simple demo sketch that prints the pressed key. In a real project, this information would typically be used to control other functions of the VS1053, like changing audio source or adjusting volume.


Make sure that pin definitions in the demo sketch match your MCU and connect RST to your board’s RST.

The VS1053_GPIO wiring for the demo sketch is different from what the picture shows. This is what I used for my 12-key membrane matrix keypad:






The connector’s column/row mapping for these particular keypads is usually as follows (from left to right in the picture): row0 – row1 – row2 -row3 – col0 – col1 – col2. The sketch can easily be adapted for a 16-key matrix keypad.

The basic idea of the sketch is:

  • keypadListen() sets all four row pins to INPUT with value 0; it sets all three column pins to OUTPUT with value 1. In this idle state, GPIO_digitalRead() will return B00000111 (decimal 7).
  • In loop(), we check GPIO_digitalRead(). A value higher than 7 means that one of the row pins has changed from 0 to 1 because it got connected to one of the column pins. This happens if a key on that row was pressed (that’s how matrix keypads are wired). We call keypadRead() to find the responsible key.
  • keypadRead() searches the row that caused the GPIO_digitalRead() > 7 condition, by checking each row pin until it reads 1. This gives us the row of the pressed key: R.
  • Next, it finds the column pin that is connected to row pin R by setting all column pins to INPUT with value 0 and row pin R only to OUTPUT with value 1. Checking each column pin until it reads 1 gives the column of the pressed key: C.
  • Now, keymap[R][C] holds the value of the pressed key (10 and 11 for * and #).
  • A simple debouncer is added for, well, debouncing.


15-Puzzle (part I)

Part I of this ’15-Puzzle’ project was writing a simple sketch that shows the puzzle in its solved state, then shuffles it and lets you solve it manually. The more challenging part, of course, will be to write a generic solver algorithm that fits inside the limited memory of an Arduino Uno (and finishes within a reasonable amount of time).


First, I considered the following algorithms:

  • Human approach algorithm. Reasonably simple to write, but it’s not likely to find the fastest solution in terms of moves. It also becomes more complicated if you want to program the tricks that humans develop by thinking moves ahead.
  • Dijkstra’s algorithm, which would be a nice tribute to my former teacher. It will find the shortest solution, but may not fit into an Uno (and may be rather slow as well).
  • An A* algorithm (had never heard of it until today). It promises to find the shortest solution and, based on heuristics, will usually be faster than Dijkstra.

It didn’t take long before I realized that options 2 and 3 are definitely out of Arduino Uno’s league. Being graph search algorithms, they require storage of many nodes and paths during the search process. Since every node is a permutation of 16 digits, the sketch would certainly run out of memory.

So what remains is the challenge of implementing option 1. For nostalgic reasons, I’m going to use a method that I figured out when I was 8 years old (so probably not the most efficient way to solve this puzzle).

Will be continued. For now, here’s the sketch of part I: