Fig. 1. The iron robot: final result.
The ESP32 is a very capable microcontroller that can drive multiple devices, such as displays and sensors, simultaneously and at a good refresh rate. In this project, I connect two SPI TFT displays to the ESP32-S3, along with a Time-of-Flight (ToF) sensor that allows the ESP32 to "see" its surroundings.
The first section describes how to connect the displays and run a first example of two animated eyes. The second section is about interaction: I add the ToF sensor and show how to retrieve its data while maintaining a smooth, high-frequency update of the displays.
So, without further introduction, let's jump into the rendering and animations.
The core of the system is a simple animation trick. The ESP32 loads a large 350x350 eye image from flash memory. Since this image is larger than the 240x240 display, only part of it can be rendered at full resolution at any given time. It's like looking at the full image through a circular window. By moving this virtual window, we create the illusion of the eye moving within the screen. So, in reality, the eye never moves; it's the window that moves in the opposite direction over the static texture. In the code, I made a special movement of the window that mimics how a real eye moves: in saccades. It moves quickly in one direction and then slows down as it approaches the desired gaze position.
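To make the idea concrete, here is a minimal sketch of that moving-window logic. It is simplified on purpose: it ignores the circular clipping and transparency handled by the real drawing code, and the names and the fixed 55-pixel travel are illustrative, not the project's exact values.
// Minimal sketch of the moving-window logic (illustrative, not the project's exact
// drawing code): the 350x350 texture never moves, we only change which 240x240
// region of it gets copied to the framebuffer.
#include <stdint.h>
#include <string.h>

static const int TEX_SIZE    = 350;                          // source eye texture
static const int SCREEN_SIZE = 240;                          // round display resolution
static const int MAX_OFFSET  = (TEX_SIZE - SCREEN_SIZE) / 2; // 55 px of travel each way

// eye_x / eye_y are the normalized gaze position (-1.0 .. 1.0).
// Note the minus sign: the window moves opposite to the gaze direction.
void render_window(const uint16_t *texture, uint16_t *framebuffer,
                   float eye_x, float eye_y)
{
    int off_x = MAX_OFFSET - (int)(eye_x * MAX_OFFSET);
    int off_y = MAX_OFFSET - (int)(eye_y * MAX_OFFSET);

    for (int y = 0; y < SCREEN_SIZE; y++) {
        const uint16_t *src = texture + (size_t)(y + off_y) * TEX_SIZE + off_x;
        memcpy(framebuffer + (size_t)y * SCREEN_SIZE, src,
               SCREEN_SIZE * sizeof(uint16_t));
    }
}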
Lifelike Eye Animation: The Saccade Movement
To make the eyes appear more natural and alive, I've implemented a behavior that mimics a biological phenomenon known as saccadic eye movement. Saccades are the extremely rapid, darting motions the eyes make between two points of fixation.
In this project, the saccade behavior serves as the "idle" or "daydreaming" state for the eyes. It activates whenever the ToF sensor does not detect a person or object to focus on. This prevents the eyes from looking static and robotic, giving them a more organic and convincing presence.
How It Works
The saccade logic is governed by a simple yet effective set of rules:
* Activation Condition: The idle saccade behavior begins only when the ToF sensor has not detected a valid target for a specific period (defined as `SACCADE_DELAY_AFTER_TRACK_MS`). This ensures a smooth, natural transition from actively tracking an object to the idle, random-gazing state.
* Timed Intervals: Once in the idle state, the system generates a new random target at regular intervals (defined as `SACCADE_INTERVAL_MS`).
* Generating a New Target: A new target position is calculated by generating random horizontal (x) and vertical (y) coordinates, which are normalized between -1.0 and 1.0.
* Smooth, Interpolated Movement: Instead of instantly snapping to the new random position, the eye's movement is smoothed using Linear Interpolation (LERP), creating a fluid, controlled motion that closely resembles a biological saccade.
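The first three rules translate almost directly into code. The sketch below is a simplified illustration: the two constant names come from config.h, but the variable names and the numeric values are assumptions rather than the project's exact code. The fourth rule, the LERP smoothing, is covered next.
// Sketch of the idle-saccade rules (constant names from config.h; values and
// variable names are illustrative assumptions).
#include <Arduino.h>

static const uint32_t SACCADE_DELAY_AFTER_TRACK_MS = 1500;  // illustrative value
static const uint32_t SACCADE_INTERVAL_MS          = 800;   // illustrative value

static uint32_t last_track_ms   = 0;  // refreshed whenever the ToF sensor sees a target
static uint32_t last_saccade_ms = 0;
static float saccade_target_x = 0.0f; // normalized gaze target, -1.0 .. 1.0
static float saccade_target_y = 0.0f;

void update_idle_saccade()
{
    uint32_t now = millis();

    // Rule 1: only start daydreaming once tracking has been lost long enough.
    if (now - last_track_ms < SACCADE_DELAY_AFTER_TRACK_MS) return;

    // Rules 2 and 3: at regular intervals, pick a new random normalized target.
    if (now - last_saccade_ms >= SACCADE_INTERVAL_MS) {
        saccade_target_x = random(-100, 101) / 100.0f;
        saccade_target_y = random(-100, 101) / 100.0f;
        last_saccade_ms = now;
    }
}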
To achieve a fluid and natural motion for the eyes, we use Linear Interpolation, or LERP. The goal is to move the eye from its current position to a new target position smoothly over several frames, avoiding the robotic "snap" of an instantaneous jump.
In each frame of the main loop, the code calculates the next small step the eye should take to get closer to its destination.
cpp
// From main.cpp
eyes[i].x += (final_target_x - eyes[i].x) * LERP_SPEED;
eyes[i].y += (final_target_y - eyes[i].y) * LERP_SPEED;
Let's break down this calculation for the horizontal (x) position:
(final_target_x - eyes[i].x): Calculates the total distance remaining between the target and the eye's current position.
* LERP_SPEED: Takes a small fraction of that total distance. LERP_SPEED (e.g., 0.1) determines how "fast" the eye moves.
eyes[i].x += ...: Adds this small, calculated step to the eye's current position.
The beauty of this method is that it creates an automatic "easing" effect. When the eye is far from its target, the step is large, causing it to move quickly. As it gets closer, the distance decreases, making each subsequent step smaller. This causes the eye to naturally and smoothly slow down as it approaches its final destination, perfectly mimicking the way organic eyes focus on a new point.
The hardware for the project is straightforward:
ESP32 Dev Board: I used the ESP32-S3 with N16R8 (16MB of flash and 8MB of PSRAM) to store multiple eye images.
LCD Display: I used the 1.28" LCD running the GC9A01 IC driver (Fig. 2B). It's supported by the TFT_eSPI library and has good speed.
Lens: To add emphasis and presence, I added a plano-convex lens over the display (Fig. 2A). A 37mm diameter lens covers slightly more than the display to avoid border effects.
Fig. 2A. Acrylic plano-convex lens, 37mm diameter.
Fig. 2B. 1.28-inch (32mm) round TFT LCD display module, RGB 240x240, GC9A01.
Fig. 2C. ESP32-S3 N16R8.
We have two displays to connect, and most pins share the same lines on the ESP32. The SPI lines (SCL, SDA), the data/command pin (DC), and reset (RST) are all shared. Both screens are reset at the same time.
However, the CS (Chip Select) pins must be kept separate because we will not necessarily draw the same image on both displays. In the code, we switch the active display by toggling the CS pins manually, since driving two displays is not supported directly by the TFT_eSPI library.
Fig. 3. Wiring diagram following the pinout in Table 1. The DC, RST, SDA, and SCL pins are shared by both displays. Chip select (CS) must be addressed separately by the controller to communicate with the right screen.
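In practice this means the CS lines are driven by hand around every SPI transfer. Here is a minimal sketch of the idea, assuming a single shared TFT_eSPI instance with TFT_CS set to -1; PIN_CS1 and PIN_CS2 are the names used in config.h, but the pin numbers and helper names here are illustrative, not the exact project code.
// Sketch of manual chip-select handling with one shared TFT_eSPI instance
// (TFT_CS = -1 in the build flags, so the library never drives CS itself).
#include <TFT_eSPI.h>

static const int PIN_CS1 = 10;   // illustrative pin numbers
static const int PIN_CS2 = 9;

TFT_eSPI tft;                    // one driver instance shared by both screens

void deselect_all()
{
    digitalWrite(PIN_CS1, HIGH); // CS is active low: HIGH means "not selected"
    digitalWrite(PIN_CS2, HIGH);
}

void select_display(int cs_pin)
{
    deselect_all();              // make sure only one screen listens on the bus
    digitalWrite(cs_pin, LOW);
}

void setup_cs_pins()
{
    pinMode(PIN_CS1, OUTPUT);
    pinMode(PIN_CS2, OUTPUT);
    deselect_all();
}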
To connect the devices, I went with old-school wirewrapping wire. Wire wrap is a classic method that involves wrapping 30 AWG wire around square metal posts to create a reliable connection without soldering. It's much better than jumper wire, especially for fast SPI communication.
Fig. 4B. You literally wrap (screw) the wire around the pin header.
Fig. 4C. The result is a solid and reliable link between the headers.
The final setup is shown in Fig. 5A and Fig. 5B. Wire wrap does not give the clean result of a PCB, but it does the job to validate the concept of this project. The next section is about the software.
Fig. 5A. Front view of the setup. This is proof that wire wrap is rarely clean.
Fig. 5B. Back view of the setup. The shared pins are easy to spot; wire wrap makes this sharing much more straightforward than jumper wires.
The project is built using PlatformIO, which handles the toolchain and library management. It relies on a few key libraries:
Framework: arduino
Libraries:
bodmer/TFT_eSPI: A powerful library for display control and rendering.
sparkfun/SparkFun VL53L5CX Arduino Library: Used to interface with the 8x8 Time-of-Flight distance sensor.
LittleFS: Used to store and retrieve image assets from the ESP32's flash memory.
PlatformIO will automatically install these libraries when you build the project.
To get the project running on your own hardware, follow these steps:
Clone the Project: Download the project to your local machine:
Bash
git clone https://github.com/intellar/Dual_Display_ESP32
cd Dual_Display_ESP32
Configure Hardware Pins: Open src/config.h and adjust the pin definitions to match your hardware wiring:
PIN_CS1 and PIN_CS2 (Display Chip Selects)
PIN_TOF_SCL and PIN_TOF_SDA (I2C pins for the sensor)
Prepare Image Assets: The eye textures are stored as 350x350 raw 16-bit RGB565 (.bin) files. A Python-based tool is provided in the image_tools folder to help you convert your own images.
Upload Filesystem to ESP32: Place your generated .bin files into the data folder at the root of the project. Then, use the PlatformIO "Upload Filesystem Image" task to transfer the assets to the ESP32's LittleFS partition.
Compile and Upload: Use PlatformIO to compile and upload the project to your ESP32.
Note on Configuration: All display settings (like GC9A01_DRIVER, TFT_WIDTH, etc.) are defined as build flags in the platformio.ini file. You do not need to edit the User_Setup.h file from the TFT_eSPI library, as these flags will override it.
The config.h file is the central control panel for customizing the project's behavior without digging into the main source code. Here you can adjust:
Hardware Pinout: Define chip-select and sensor pins.
Asset Paths: Specify filenames for the eye texture assets.
Animation Behavior: Fine-tune the eye's movement range (MAX_2D_OFFSET_PIXELS), interpolation speed (LERP_SPEED), and the timing of the idle saccade movements.
Sensor Behavior: Enable/disable the ToF sensor, activate calibration mode, and set the maximum tracking distance.
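To give an idea of its structure, here is a hypothetical excerpt of config.h. Most parameter names are the ones used throughout this article; the asset-path macro names, the pin numbers, and every value marked "placeholder" are my assumptions, not the repository's actual settings.
// Hypothetical excerpt of config.h (placeholders marked; not the repo's real values).
#pragma once

// Hardware pinout
#define PIN_CS1      10          // chip select, display 1 (placeholder)
#define PIN_CS2      9           // chip select, display 2 (placeholder)
#define PIN_TOF_SDA  8           // I2C pins for the VL53L5CX (placeholders)
#define PIN_TOF_SCL  18

// Asset paths (uploaded to LittleFS; macro names hypothetical, file names from the article)
#define EYE_IMAGE_NORMAL  "/image_giant.bin"
#define EYE_IMAGE_BAD     "/image_giant_bad.bin"

// Animation behavior
#define MAX_2D_OFFSET_PIXELS          40     // movement range of the window (placeholder)
#define LERP_SPEED                    0.1f   // interpolation speed (example value from the text)
#define SACCADE_INTERVAL_MS           800    // placeholder
#define SACCADE_DELAY_AFTER_TRACK_MS  1500   // placeholder

// Sensor behavior
#define TOF_CALIBRATION_MODE           0     // set to 1 to run the calibration mode
#define MAX_DIST_TOF                   400   // mm, maximum tracking distance
#define MIN_RELIABLE_PIXELS_IN_WINDOW  4     // reliable pixels needed in a 3x3 window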
To achieve a smooth frame rate and a flicker-free animation, especially on a microcontroller, I can't just redraw things directly to the screen. I use a professional graphics technique called double buffering (or off-screen rendering).
The entire drawing process for each frame is broken down into four distinct, optimized steps:
Clear Buffers
Before drawing anything new, I completely clear two off-screen memory blocks, called framebuffers, to a solid black color. Each framebuffer is a complete 240x240 pixel image stored in the ESP32's PSRAM, one for each eye. All drawing happens on these hidden buffers first.
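Here is a minimal sketch of how these buffers could be allocated and cleared on the ESP32-S3, assuming the standard ESP-IDF heap_caps_malloc API for PSRAM; the variable and function names are illustrative.
// Each 240x240 RGB565 buffer takes 240 * 240 * 2 = 115,200 bytes.
#include <esp_heap_caps.h>
#include <stdint.h>
#include <string.h>

static const int SCREEN_SIZE = 240;
static uint16_t *framebuffer[2];   // one buffer per eye / display

bool allocate_framebuffers()
{
    for (int i = 0; i < 2; i++) {
        framebuffer[i] = (uint16_t *)heap_caps_malloc(
            SCREEN_SIZE * SCREEN_SIZE * sizeof(uint16_t), MALLOC_CAP_SPIRAM);
        if (framebuffer[i] == nullptr) return false;   // PSRAM missing or full
    }
    return true;
}

void clear_framebuffers()
{
    // Step 1 of every frame: fill both buffers with solid black (0x0000).
    for (int i = 0; i < 2; i++) {
        memset(framebuffer[i], 0, SCREEN_SIZE * SCREEN_SIZE * sizeof(uint16_t));
    }
}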
Draw Eye Images (The Optimized Core)
The main eye image is rendered onto each framebuffer. This is the most performance-intensive step, but it's heavily optimized:
Pre-Calculated Scanlines: The screens are circular. To avoid computationally expensive calculations (like sqrt()) in every frame, I pre-calculate the start and end points of every horizontal line that fits within the circular display area during setup. When drawing, the code only renders pixels within this pre-calculated visible area. This turns a complex geometry problem into a simple, lightning-fast array lookup.
Efficient Transparency: The eye image asset uses a specific color (0x0000, or pure black) to act as a "transparent color key." When the drawing function encounters a pixel of this color in the source image, it simply skips it, which is far more efficient than a full alpha blend.
Low-Level Optimizations: The draw_eye_image function uses pointer arithmetic instead of index calculations inside the tightest loop and inlines the pixel-drawing logic to write color data directly to memory, eliminating function call overhead.
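To illustrate the scanline idea, here is a simplified sketch of the one-time pre-calculation and of a row-drawing loop that also skips the transparent color key. This is illustrative code, not the actual draw_eye_image implementation.
// Pre-calculated scanline bounds for a circular 240x240 display.
#include <math.h>
#include <stdint.h>

static const int SCREEN_SIZE = 240;
static const uint16_t TRANSPARENT_KEY = 0x0000;  // pure black is treated as transparent

static int scan_x_start[SCREEN_SIZE];
static int scan_x_end[SCREEN_SIZE];

void precompute_scanlines()
{
    const float c = (SCREEN_SIZE - 1) / 2.0f;   // circle center
    const float r = SCREEN_SIZE / 2.0f;         // circle radius
    for (int y = 0; y < SCREEN_SIZE; y++) {
        float dy = y - c;
        float dx = sqrtf(fmaxf(r * r - dy * dy, 0.0f));
        scan_x_start[y] = (int)ceilf(c - dx);   // first visible column of this row
        scan_x_end[y]   = (int)floorf(c + dx);  // last visible column of this row
    }
}

void draw_row(uint16_t *fb_row, const uint16_t *src_row, int y)
{
    // Only the visible part of the row is touched; transparent pixels are skipped.
    for (int x = scan_x_start[y]; x <= scan_x_end[y]; x++) {
        uint16_t color = src_row[x];
        if (color != TRANSPARENT_KEY) fb_row[x] = color;
    }
}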
Draw Overlays
After the main eye is drawn, any additional information is rendered on top. This includes the FPS counter and the 8x8 ToF sensor debug grid, which are typically drawn only on one screen.
Push to Displays
Only when both framebuffers are complete with the final image do I push them to the physical screens. The display_all_buffers() function sends the entire 240x240 image from each framebuffer to its corresponding display in a single, high-speed operation.
This process ensures that the user only ever sees a complete, finished frame, resulting in a smooth, flicker-free animation.
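For completeness, here is what the final push step could look like, reusing the chip-select helpers and the framebuffer[] array sketched earlier; tft.pushImage() is the TFT_eSPI call that sends a full bitmap. The function name display_all_buffers comes from the project, but this body is only an illustration, not its actual implementation.
// Illustrative body for display_all_buffers(), built on the earlier sketches.
void display_all_buffers()
{
    const int cs_pins[2] = { PIN_CS1, PIN_CS2 };
    for (int i = 0; i < 2; i++) {
        select_display(cs_pins[i]);                    // talk to one screen only
        tft.pushImage(0, 0, SCREEN_SIZE, SCREEN_SIZE,  // send the full 240x240 frame
                      framebuffer[i]);
        deselect_all();
    }
}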
To make the eye's state immediately obvious, I use two different eye textures to visually communicate whether it is idly "daydreaming" or actively tracking an object:
Normal Eye (image_giant.bin): The default texture, used whenever the ToF sensor is idle (paired with random saccade movement).
"Bad" or Tracking Eye (image_giant_bad.bin): I switch to this alternate texture the moment the ToF sensor locks onto a valid target. This texture is designed to look more focused or intense, providing instant visual feedback that the eye has "seen" something and is now in tracking mode.
Two animated eyes are nice, but this project is more involved than that. I want the system to be able to detect and look at the person interacting with it. So, I added a VL53L5CX Time-of-Flight (ToF) sensor.
As the complexity increased, I needed to create a support structure for the electronics. I even asked two AIs, Copilot and Gemini, what the support could look like. While Copilot came back with a cartoon-like render (Fig. 6A), Gemini made a more realistic suggestion (Fig. 6B). Personally, I find Gemini's proposition more interesting, though a bit boring. I made what I believe is a much better support by designing the iron robot head.
Fig. 6A. Copilot Smart (GPT-5).
Fig. 6B. Gemini 2.5 Flash.
The iron robot head is a very simple shape to create in FreeCAD. It consists of a cylinder and a sphere, with pockets for the eyes. When working with CAD, constructing the shape is like a puzzle. Rather than drawing everything explicitly, a much better solution is to use a 180° revolution of the base shape with a mirror plane, which avoids modeling the second eye and all its details.
Note: I am learning CAD design through these projects, as my academic formation is in computer engineering with a PhD in 3D computer vision.
The head design, including support for the TFT displays, the nose hole that acts as a support for the ToF sensor, and a base plate to hold the ESP32-S3, is available on the GitHub repo. The parts were 3D printed on my old Ender 5 printer, but I realized the 4mm thick walls are too sturdy and took 17 hours to print; I plan to reduce them to 2mm.
To give the eyes the ability to "see" and track objects, I integrated the VL53L5CX Time-of-Flight (ToF) sensor.
This sensor projects an invisible infrared laser signal that covers a square 45° x 45° field of view (Fig. 7A). It measures the time it takes for the light to return to its 8x8 grid of collectors, giving it 64 independent distance measurements. This creates a low-resolution depth map of whatever is in front of it (Fig. 7C).
Fig. 7A. Field of view of the VL53L5CX ToF sensor.
Fig. 7B. View of the 2D infrared signal projected by the ToF sensor.
Fig. 7C. The 8x8 collector grid of the ToF sensor.
1. Retrieving the Measurement Matrix
The process starts by continuously checking if the sensor has a new set of measurements ready using myImager.isDataReady(). Once new data is available, myImager.getRangingData(&measurementData) populates a structure with:
A distance_mm array containing the 64 distance values.
A target_status array indicating the validity of each measurement.
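The retrieval step follows the SparkFun library's standard example flow. Below is a minimal sketch of it; the I2C pin numbers and the ranging frequency are assumptions, not the project's actual settings.
// Sketch of ToF data retrieval with the SparkFun VL53L5CX library.
#include <Wire.h>
#include <SparkFun_VL53L5CX_Library.h>

SparkFun_VL53L5CX myImager;
VL53L5CX_ResultsData measurementData;    // holds distance_mm[] and target_status[]

void tof_setup()
{
    Wire.begin(/*SDA*/ 8, /*SCL*/ 18);   // PIN_TOF_SDA / PIN_TOF_SCL (assumed pins)
    Wire.setClock(400000);               // fast-mode I2C

    myImager.begin();
    myImager.setResolution(8 * 8);       // 64 ranging zones
    myImager.setRangingFrequency(15);    // Hz, assumption
    myImager.startRanging();
}

void tof_poll()
{
    // Non-blocking: only read when a new 8x8 frame is available,
    // so the display loop keeps running at full speed.
    if (myImager.isDataReady()) {
        if (myImager.getRangingData(&measurementData)) {
            // measurementData.distance_mm[0..63]   : distances in millimetres
            // measurementData.target_status[0..63] : 5 means a valid measurement
            // -> handed to the target search described in the next step
        }
    }
}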
2. Finding the Most Reliable Target in the Matrix
A raw 8x8 grid can be noisy. A simple approach (finding the single closest pixel) is not robust. To solve this, I implemented a more comprehensive algorithm in process_measurement_data that evaluates every potential target area.
Here’s how the improved logic works:
Iterate Through All Potential Targets: The algorithm iterates through all 64 pixels, treating each one as a potential center of a target.
Evaluate a 3x3 "Window": For each center, it examines the surrounding 3x3 window of pixels.
Find Reliable Pixels: Within each window, it counts the number of "reliable" pixels. A pixel is reliable only if:
Its status code is 5 (valid reading).
Its measured distance is closer than a predefined maximum (MAX_DIST_TOF, set to 400mm).
Identify Candidate Regions: A window must contain a minimum number of reliable pixels (MIN_RELIABLE_PIXELS_IN_WINDOW, set to 4) to be considered a valid object.
Select the Best Overall Region: The algorithm compares all valid candidates and keeps track of the one that has the lowest average distance. This region represents the closest, most solid object detected.
Calculate Final Coordinates: If a "best" region is found, its center pixel's grid position is converted into a normalized format (-1.0 to 1.0) and fed into the eye's LERP movement logic.
If no window meets the minimum criteria, the system reverts to the idle "saccade" behavior. This robust method ensures the eyes reliably lock onto a person or object and are not distracted by noise.
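The following sketch shows one way this search could be written. The constant names match config.h, but the function name and exact structure are illustrative, not a copy of process_measurement_data.
// Simplified sketch of the 3x3-window target search.
#include <stdint.h>

static const int GRID = 8;
static const int MAX_DIST_TOF = 400;                 // mm
static const int MIN_RELIABLE_PIXELS_IN_WINDOW = 4;

// Returns true if a target was found; out_x/out_y are normalized to -1.0 .. 1.0.
bool find_best_target(const int16_t distance_mm[64], const uint8_t target_status[64],
                      float *out_x, float *out_y)
{
    float best_avg = 1e9f;
    int best_row = -1, best_col = -1;

    for (int row = 0; row < GRID; row++) {
        for (int col = 0; col < GRID; col++) {
            int reliable = 0;
            long sum = 0;
            // Examine the 3x3 window centered on (row, col), clipped at the borders.
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int r = row + dr, c = col + dc;
                    if (r < 0 || r >= GRID || c < 0 || c >= GRID) continue;
                    int i = r * GRID + c;
                    if (target_status[i] == 5 && distance_mm[i] < MAX_DIST_TOF) {
                        reliable++;
                        sum += distance_mm[i];
                    }
                }
            }
            if (reliable < MIN_RELIABLE_PIXELS_IN_WINDOW) continue;  // too noisy
            float avg = (float)sum / reliable;
            if (avg < best_avg) {            // keep the closest solid region
                best_avg = avg;
                best_row = row;
                best_col = col;
            }
        }
    }

    if (best_row < 0) return false;          // nothing reliable: fall back to saccades
    // Convert the best window's center grid position to -1.0 .. 1.0.
    *out_x = (best_col - (GRID - 1) / 2.0f) / ((GRID - 1) / 2.0f);
    *out_y = (best_row - (GRID - 1) / 2.0f) / ((GRID - 1) / 2.0f);
    return true;
}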
To ensure the eye-tracking logic is robust and accurate, I built a special Calibration Mode, enabled by setting TOF_CALIBRATION_MODE 1 in config.h.
When active, the system completely ignores the real ToF sensor. Instead, it generates a synthetic 8x8 distance matrix with a predictable, moving target.
How the Simulation Works
The run_calibration_simulation() function drives the mode:
Pre-defined Test Pattern: I created a set of 13 specific coordinates on the 8x8 grid. These points cover the corners, edges, and center of the field of view, including the crucial corner cases to test partial detection.
Timed Movement: Every two seconds (CALIB_INTERVAL_MS), the simulated target automatically moves to the next position in the pattern, creating a predictable path.
Synthetic Data Generation: The function generates a new measurementData matrix in each frame, creating a clear "object" by setting:
Target Pixel: A close distance (e.g., 200mm).
Neighboring Pixels: A slightly further distance (e.g., 300mm).
Background: A far distance (e.g., 1000mm).
While in calibration mode, the random "saccade" movement is disabled, forcing the eyes to only react to the simulated target. This makes it easy to verify that the tracking algorithm works as expected across its entire range.
// Define test positions: 9 inside and 4 on the corners to test partial detection
const int calib_positions[NUM_CALIB_POSITIONS][2] = {
    // --- Fully visible patterns ---
    {1, 1}, {1, 4}, {1, 6},  // Top row
    {4, 1}, {4, 4}, {4, 6},  // Middle row
    {6, 1}, {6, 4}, {6, 6},  // Bottom row
    // --- Partially visible patterns (center of pattern is on the corner pixel) ---
    {0, 0},  // Top-left corner (only 2x2 of the 3x3 pattern is visible)
    {0, 7},  // Top-right corner
    {7, 0},  // Bottom-left corner
    {7, 7}   // Bottom-right corner
};
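And here is a sketch of how the synthetic 8x8 matrix could be filled for one frame, using the three distances listed above. The function and parameter names are illustrative, not the exact run_calibration_simulation code.
// Fill one synthetic frame around the current simulated target position.
#include <stdint.h>
#include <stdlib.h>

static const int GRID = 8;

void fill_synthetic_frame(int16_t distance_mm[64], uint8_t target_status[64],
                          int target_row, int target_col)
{
    for (int row = 0; row < GRID; row++) {
        for (int col = 0; col < GRID; col++) {
            int i = row * GRID + col;
            int dr = abs(row - target_row);
            int dc = abs(col - target_col);
            if (dr == 0 && dc == 0) {
                distance_mm[i] = 200;        // target pixel: close
            } else if (dr <= 1 && dc <= 1) {
                distance_mm[i] = 300;        // 3x3 neighbors: slightly further
            } else {
                distance_mm[i] = 1000;       // background: far away
            }
            target_status[i] = 5;            // mark every cell as a valid reading
        }
    }
}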
The following video shows the calibration mode in action:
I've made a significant enhancement to my development process by transitioning from the traditional Arduino IDE to a more robust and modern setup: Visual Studio Code (VS Code) combined with the PlatformIO IDE extension. This move was driven by the need for better performance and a more feature-rich development environment.
While the Arduino IDE is excellent for beginners, my needs evolved toward more complex applications, which is where VS Code and PlatformIO truly excel.
Key advantages of this new setup include:
Faster Compilation: PlatformIO's build system features caching, which significantly speeds up compilation times for large projects.
Superior IDE Experience: VS Code offers a professional-grade experience with features like:
IntelliSense: Smart code completion and suggestions.
Advanced Debugging: The ability to use a debugger to set breakpoints and inspect variables.
Efficient Library Management: PlatformIO handles dependencies on a per-project basis, which helps avoid version conflicts and ensures a project is always portable.
Automatic Port Detection: It automatically detects the COM port your board is connected to.
This transition has streamlined my workflow, enabling me to develop more sophisticated and reliable applications with increased efficiency and confidence.
A key library in my project is TFT_eSPI by Bodmer. Configuring this library in the Arduino IDE can be cumbersome, as the standard method involves editing global files (User_Setup.h).
This approach has several drawbacks: configuration is global, updates are risky as they can overwrite custom settings, and it's poor for version control.
The solution in PlatformIO is to use build flags inside the platformio.ini file. This lets you define all the necessary hardware and pinout settings (like GC9A01_DRIVER, TFT_WIDTH, TFT_MOSI, etc.) outside of the library source code. This keeps the library pristine, makes the project completely self-contained, and allows for clean version control.
[env:esp32-s3-devkitc-1-n16r8v]
platform = espressif32
board = esp32-s3-devkitc-1
framework = arduino
board_upload.flash_size = 16MB
board_upload.maximum_size = 16777216
board_build.arduino.memory_type = qio_opi
board_build.filesystem = littlefs
build_flags =
    -DBOARD_HAS_PSRAM
    ; TFT_eSPI configuration
    -D USER_SETUP_LOADED=1
    -D GC9A01_DRIVER=1
    -D TFT_WIDTH=240
    -D TFT_HEIGHT=240
    -D TFT_MOSI=11
    -D TFT_SCLK=13
    -D TFT_MISO=17
    -D TFT_DC=4
    -D TFT_RST=6
    -D USE_HSPI_PORT=1
    -D TFT_CS=-1 ; Using manual CS
    -D SPI_FREQUENCY=80000000
    -D SMOOTH_FONT=1
lib_deps =
    sparkfun/SparkFun VL53L5CX Arduino Library@^1.0.3
    bodmer/TFT_eSPI@^2.5.43
Source and CAD model: https://github.com/intellar/Dual_Display_ESP32
Demo videos: https://youtube.com/shorts/loqei5ePCf8 and https://youtube.com/shorts/lkqTNm5tmS0