ESP32-CAM demystified

I’ve always felt a bit intimidated by the many different ESP32 Camera boards, as well as by the abundance of information and example programs. Uncertain whether I’d be capable of writing my own application, I took the plunge anyway and purchased a TTGO-T-Camera Plus. No, not the selfie version, as you may have guessed from the picture 🙂

After examining some examples, I soon noticed that most of their code was based on the same source. Espressif’s well-documented library and some of lewisxhe’s GitHub repositories then helped me find the right ‘abstraction level’ for dealing with these boards, treating low-level libraries and some standard code snippets as black boxes.

Maybe my first basic steps below can help other beginners.


The mission
In order to grasp the basic concept behind all ESP32-Camera applications, I just wanted to make the TFT display of my board show the live camera image. No streaming over WiFi yet, just a continuous transfer of pixel color values from the camera’s sensor to the display, as fast as possible for the chosen resolution. If I could achieve this, everything else would require just general programming skills rather than camera-specific knowledge. And I wanted to use the good old-fashioned Arduino IDE, of course.

Which board to select?
Since my TTGO board wasn’t listed in the Arduino IDE (and couldn’t be added via Board Manager), I could have selected ESP32 Dev Module (with PSRAM enabled) or ESP32 Wrover Module. However, I always prefer to manually add unlisted boards so they can have their own pins_arduino.h file. This requires adding a section in the boards.txt file and creating a pins_arduino.h file in a subdirectory for the added board under variants. On a Windows machine, the location of boards.txt is something like:
But as I said, you can also select a standard board and hope there will be no conflicts.

Basic sketch


Explanation of the most important lines:

Set resolution
The following line in the above code tells the camera to use HQVGA resolution (240×176), which will make the image frames fit on the 240×240 display of my TTGO board.
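
To see why HQVGA fits, it can help to work out where a 240×176 frame lands on a 240×240 panel. The helper below is only an illustration in plain C++ that also compiles on a desktop; the names Placement and centerOnDisplay are made up for this example and are not part of any library.

```cpp
#include <cstdint>

// Top-left corner at which a frame should be drawn, plus a flag telling
// whether the frame fits on the display at all.
struct Placement {
    bool fits;
    int16_t x;
    int16_t y;
};

// Centers a frameW x frameH image on a dispW x dispH display.
// Hypothetical helper, for illustration only.
Placement centerOnDisplay(uint16_t frameW, uint16_t frameH,
                          uint16_t dispW, uint16_t dispH) {
    if (frameW > dispW || frameH > dispH) {
        return {false, 0, 0};  // frame is larger than the panel
    }
    return {true,
            static_cast<int16_t>((dispW - frameW) / 2),
            static_cast<int16_t>((dispH - frameH) / 2)};
}
```

For HQVGA on a 240×240 display this gives x = 0 and y = 32, so the image sits centered with a 32-pixel border above and below. QVGA (320×240) would be reported as not fitting.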

Names of resolutions that are supported by the mounted OV2640 camera are:


Set format
This line defines the format in which a frame is stored in the frame buffer(s).
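
The chosen format also determines how many bytes a frame occupies. As a rough sketch, assuming the usual sizes for uncompressed formats: the PixelFormat enum below is a local stand-in written for this example, not the driver's own type.

```cpp
#include <cstddef>

// Local stand-in for the driver's pixel format constants (illustration only).
enum class PixelFormat { Grayscale, Rgb565, Yuv422, Rgb888, Jpeg };

// Bytes needed per pixel for uncompressed formats. 0 means "variable":
// JPEG frames are compressed, so their size depends on the image content.
size_t bytesPerPixel(PixelFormat f) {
    switch (f) {
        case PixelFormat::Grayscale: return 1;
        case PixelFormat::Rgb565:    return 2;
        case PixelFormat::Yuv422:    return 2;
        case PixelFormat::Rgb888:    return 3;
        case PixelFormat::Jpeg:      return 0;
    }
    return 0;
}

// Size in bytes of one uncompressed frame (0 for JPEG, see above).
size_t frameBytes(size_t width, size_t height, PixelFormat f) {
    return width * height * bytesPerPixel(f);
}
```

An HQVGA frame in RGB565 therefore takes 240 × 176 × 2 = 84,480 bytes per frame buffer.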

All supported format options are:



Espressif’s esp_camera library is what powers most of the sketch. This is an ESP32 core library, so if your Arduino IDE is ‘esp aware’, no installation is required. Once included, it lets you create objects from structs (hope I use the correct C++ terminology) like gpio_config_t, sensor_t, camera_config_t and camera_fb_t.

The #include "pins_arduino.h" line will look for a pins_arduino.h file in your sketch folder or, if not present (as in my case), in the board’s subdirectory under variants (see the earlier “Which board to select?” section). These are the board-specific pin definitions for my board:


As you can see, I included the TFT_eSPI library to drive my board’s ST7789 display. Make sure to change the library’s User_Setup_Select.h file accordingly.

Grabbing a frame

This crucial line creates a pointer *fb to the last filled frame buffer (most recent snapshot). In a streaming situation, it will typically live inside a loop. The frame buffer is a struct that contains a pixel data array buf, as well as metadata like frame size, timestamp etc.

Now we need to process the information in the frame buffer, usually as fast as possible. Regardless of the chosen CAMERA_PIXEL_FORMAT, the frame buffer will always store frame values as 8-bit unsigned integers (uint8_t). Since the example sketch asks for RGB565 color values, the following line reconstructs 16-bit RGB565 color values from pairs of 8-bit values in the fb->buf array and stores them in the 16-bit rgbBitmap array.
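
One way to express this reconstruction is the small function below, written so it can be tested on a desktop. It assumes that the first byte of each pair is the high byte, and the name bytesToRgb565 is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Combines consecutive byte pairs into 16-bit RGB565 pixels.
// Assumption: the first byte of each pair is the high byte. If colors look
// wrong on a real display, swapping the two bytes is the first thing to try.
// Returns the number of pixels written to dst.
size_t bytesToRgb565(const uint8_t* src, size_t srcLen,
                     uint16_t* dst, size_t dstCapacity) {
    size_t pixels = srcLen / 2;  // a trailing odd byte is ignored
    if (pixels > dstCapacity) {
        pixels = dstCapacity;    // never write past the destination array
    }
    for (size_t i = 0; i < pixels; ++i) {
        dst[i] = static_cast<uint16_t>((src[2 * i] << 8) | src[2 * i + 1]);
    }
    return pixels;
}
```

In the sketch this would be called with fb->buf and fb->len as the source and rgbBitmap as the destination. The byte pairs F8 00, 07 E0 and 00 1F come out as 0xF800, 0x07E0 and 0x001F, which are pure red, green and blue in RGB565.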

The rgbBitmap array is then pushed to the display by the following lines:

After the entire frame buffer has been processed, the resources need to be released asap by the following lines:
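
If you want the release to happen automatically, a C++ scope guard is one option. The sketch below is an assumption-laden illustration, not the code from this post: FrameGuard and processOneFrame are hypothetical names, and the camera_fb_t struct and the two esp_camera_fb_* functions are desktop stand-ins that mimic the driver just enough to compile and test the pattern off-device. On the board, these would come from esp_camera.h instead.

```cpp
#include <cstddef>
#include <cstdint>

// ---- Desktop stand-ins (assumptions for this example) -------------------
// On the ESP32 these come from esp_camera.h; only what the example needs
// is mirrored here.
struct camera_fb_t {
    uint8_t* buf;
    size_t len;
};
static uint8_t g_fakePixels[4] = {0xF8, 0x00, 0x07, 0xE0};
static camera_fb_t g_fakeFrame = {g_fakePixels, sizeof(g_fakePixels)};
static int g_buffersHeld = 0;  // how many buffers are currently checked out

camera_fb_t* esp_camera_fb_get() { ++g_buffersHeld; return &g_fakeFrame; }
void esp_camera_fb_return(camera_fb_t*) { --g_buffersHeld; }
// --------------------------------------------------------------------------

// Hypothetical scope guard: grabs a frame on construction and hands it
// back to the driver when it goes out of scope, even on an early return.
class FrameGuard {
public:
    FrameGuard() : fb_(esp_camera_fb_get()) {}
    ~FrameGuard() { if (fb_) esp_camera_fb_return(fb_); }
    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;

    camera_fb_t* get() const { return fb_; }
    explicit operator bool() const { return fb_ != nullptr; }

private:
    camera_fb_t* fb_;
};

// One loop iteration: returns the number of bytes in the grabbed frame.
size_t processOneFrame() {
    FrameGuard frame;
    if (!frame) return 0;     // capture failed, nothing to release
    return frame.get()->len;  // ... convert and display the pixels here ...
}                             // the buffer is returned at this point
```

The benefit is that the buffer cannot be forgotten on an error path; the cost is one more small class to maintain.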


Result and tuning
The above sketch immediately worked and the visual result was smoother than expected. Then I added a couple of lines to measure the achieved frame rate. Running on my TTGO board, the above example sketch attains a stable 22 fps. Not really impressive, but fast enough for a flicker-free display.
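
Measuring the frame rate takes very little code. The class below is a minimal sketch of one way to do it, not necessarily how it was done for the figures in this post; FpsCounter and onFrame are names invented for the example. It takes the current time as a parameter so it can be tested without a board.

```cpp
#include <cstdint>

// Counts frames and reports the average rate once per reporting window.
class FpsCounter {
public:
    explicit FpsCounter(uint32_t windowMs = 1000) : windowMs_(windowMs) {}

    // Call once per displayed frame with the current time in milliseconds.
    // Returns the frame rate over the window that just ended, or a negative
    // value while the current window is still running.
    float onFrame(uint32_t nowMs) {
        if (!started_) {          // first call only starts the clock
            started_ = true;
            windowStart_ = nowMs;
            return -1.0f;
        }
        ++frames_;
        const uint32_t elapsed = nowMs - windowStart_;  // unsigned, wrap-safe
        if (elapsed < windowMs_ || elapsed == 0) {
            return -1.0f;
        }
        const float fps = 1000.0f * static_cast<float>(frames_) /
                          static_cast<float>(elapsed);
        windowStart_ = nowMs;     // start the next window
        frames_ = 0;
        return fps;
    }

private:
    uint32_t windowMs_;
    uint32_t windowStart_ = 0;
    uint32_t frames_ = 0;
    bool started_ = false;
};
```

In loop(), passing millis() to onFrame() after each frame and printing the result with Serial.println() whenever it is not negative would report the rate about once per second.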


Trying to improve the frame rate, I wrote a version that accepts and processes a JPEG-encoded frame buffer. Here it is:

This version also works fine, but its frame rate tops out at 17 fps. A JPEG-encoded frame buffer can be useful, though, in cases where the (much smaller) buffer needs to travel over WiFi.

Two sketch parameters in particular are of interest when trying to get higher frame rates: xclk_freq_hz and fb_count.

The maximum frame rate you can theoretically get is limited by the camera sensor itself. It is shown in this table (for OV2640):

Note the counterintuitive effect of higher frame rates for half the frequency. These figures were confirmed by a dummy sketch that only grabs a buffer and immediately releases it. For ‘real’ applications that actually process the frame buffer, you simply have to find out what gives the best result.

Although a comment in some example mentions that the value of fb_count should only be larger than 1 for JPEG buffers, the RGB565 example in this post works best (on PSRAM boards) with fb_count = 4, and the JPEG example with fb_count = 2. Again, you simply have to play with these parameters to find the best values for your application and board.

What’s next?
Now that I understand the basic concept, the next step will be to stream the frames over HTTP and control the camera via a browser. All examples that I could find use the same concept: an HTTP daemon (web server) that serves an index.html file showing camera controls. Actually, they all use the same files. I’m not sure what to do yet: adapt these files or write something myself. As my camera will be mounted on my DIY 2-axis gimbal, I also want to control its movement via the same web page.

Other ToDos:

  • Save frame captures to SD card or web storage (e.g. for time lapse videos)
  • Stream a continuous 360° panorama view using my new DIY gimbal
  • Experiment with the onboard I2S MEMS microphone
  • Experiment with face/object/color recognition

To be continued…