EchoEar: AI Pet that Can Listen, Move, and Accompany
YouTube Video | Source Code Repository (Open Source) | Replication Tutorial

Project Introduction
EchoEar is an intelligent AI development kit created by Espressif, suitable for toys, smart speakers, smart control centers, and other voice interaction products that require AI capabilities. The device is equipped with an ESP32-S3-WROOM-1 module, a 1.85-inch QSPI circular touch screen, and a dual-microphone array, and it supports offline voice wake-up and sound source localization. EchoEar can achieve full-duplex voice interaction, multimodal recognition, and intelligent agent control, giving developers a solid foundation for building complete edge-side AI applications.
Video Demo
Introducing EchoEar: Espressif's Smart AI Development Kit
About Firmware
Xiaozhi
This example is based on the Xiaozhi AI Robot project, and its code is open source.
Quick Experience: esp-launchpad
The firmware can also be downloaded from the attachments (echoear_xiaozhi_1_0_2.bin).
Feature Showcase
EchoEar's main controller is Espressif's ESP32-S3-WROOM-1-N32R8 module, which supports 2.4 GHz Wi-Fi and Bluetooth 5 (LE). For storage, the device has 8MB of PSRAM and 32MB of flash, plus a microSD card slot that accepts cards up to 32GB, which covers the needs of voice interaction and multimedia processing. It features a 1.85-inch circular touch screen (360×360 resolution) along with the ESP32-S3's native touch sensors, providing intuitive and rich interaction.
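To get a feel for how the 360×360 display relates to the 8MB of PSRAM, the sketch below estimates the size of one full-screen frame buffer. The 16-bit RGB565 pixel format is an assumption (it is common for small LCDs but is not stated here), and the function name is illustrative only.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int = 16) -> int:
    """Return the size in bytes of one full-screen frame buffer."""
    return width * height * bits_per_pixel // 8


PSRAM_BYTES = 8 * 1024 * 1024  # 8MB of PSRAM, from the module description

# Assumption: RGB565, i.e. 16 bits per pixel.
one_frame = framebuffer_bytes(360, 360)
print(one_frame)  # 259200 bytes, roughly 253 KiB

# Fraction of PSRAM taken by two such buffers (double buffering).
print(one_frame * 2 / PSRAM_BYTES)
```

Under that assumption, even a double-buffered UI takes only a small share of PSRAM, leaving the rest for audio buffers and application data.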
For audio, EchoEar has a built-in 3W speaker and a dual-microphone array, supporting local voice wake-up and sound source localization. The device can be powered from 5V DC or from a 3.7V 700mAh lithium battery. It also integrates a USB-C interface for power and firmware download, and reserves a pogo-pin interface for functional expansion.
A major technical highlight of EchoEar is its software stack: the esp-brookesia framework handles overall UI construction and rendering and integrates with esp-gmf, Espressif's new audio-video framework, to provide intelligent functions optimized for edge-side applications. With this stack, EchoEar can achieve full-duplex voice interaction, multimodal recognition, and intelligent agent control, building a more immersive human-computer interaction experience.
Intelligent Conversation and Emotion Recognition
The intelligent conversation and emotion recognition capabilities actively identify user intent and emotional changes. Building on the large model's semantic understanding, the device combines tone, semantics, and context to form a judgment, then responds with anthropomorphic dynamic expressions and voice feedback, which strengthens its emotional expressiveness and personality.
Long-term Memory Capability
The long-term memory capability continuously records multi-turn dialogue and can remember user names, preferences, common phrases, and other core information. It recalls these in later interactions to deliver a personalized experience that matches user habits, enhancing the device's value as an "emotional companion".
Offline Voice Wake-up and Sound Source Localization
Offline voice wake-up and sound source localization, combined with the motor control module and the dual-microphone array, enable precise directional tracking within a 180° range. On each voice wake-up, the device automatically identifies the direction of the sound source and works with the rotating base to turn and make eye contact, making the interaction more immersive and natural.
Custom Voice Characters
EchoEar supports custom voice characters, so you can flexibly switch voice tones and styles to create your own AI voice persona. Developers can also build DIY character voices as needed, for a freer and more personal voice experience.
MCP Protocol and Function Call Capabilities
EchoEar also supports the MCP protocol and Function Call capabilities. It can connect to local smart devices to implement remote control, task distribution, and status feedback, serving as a local control center in a smart home system with stable, efficient edge control and open expansion interfaces.
Motion and Posture Sensing
EchoEar has a built-in IMU sensor that can sense your movements and posture, interacting with you using "body language".
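As a rough illustration of posture sensing, the sketch below derives pitch and roll angles from a single accelerometer reading. It assumes the IMU reports acceleration in units of g on three axes and that the device is nearly stationary; the axis orientation, sign convention, and function name are hypothetical and not taken from the EchoEar firmware.

```python
import math
from typing import Tuple


def tilt_from_accel(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Return (pitch, roll) in degrees from a static accelerometer reading in g."""
    # Assumed convention: z points up when the device sits flat.
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


# Device lying flat: gravity falls entirely on the z axis.
print(tilt_from_accel(0.0, 0.0, 1.0))  # (0.0, 0.0)
```

A gesture layer could then compare these angles against thresholds, for example to treat a large roll as the device being tipped over.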
Replication Tutorial
EchoEar Main Body Replication
Please prepare the following materials before assembly:
| No. | Description |
|---|---|
| 1 | 3D printed shell bottom cover |
| 2 | 3D printed shell top cover |
| 3 | 1.85-inch TFT display screen |
| 4 | M1.6×4mm screws |
| 5 | Rubber gaskets |
| 6 | Microphone dust-proof foam |
| 7 | Touch copper foil |
| 8 | EchoEar MicBoard |
| 9 | M2×5mm screws |
| 10 | EchoEar CoreBoard |
| 11 | EchoEar BaseBoard |
| 12 | 4Ω 3W 2828 square cavity speaker |
| 13 | 3.7V polymer lithium battery 902530 700mAh |
| 14 | 8P unidirectional 0.5mm pitch FPC flexible cable |

PCB Manufacturing Notes:
Board thickness should be 1.0mm.
The LCD connector CN3 exists for compatibility with other screens and can be left unpopulated.
V1.0 Main Body Hardware Assembly
EchoEar consists of three sub-boards: CoreBoard, BaseBoard, and MicBoard. CoreBoard and BaseBoard are connected through two 2×5 pin headers with 1.27mm pitch. MicBoard is connected to CoreBoard through an FPC flexible cable.
EchoEar Main Body 3D Structure Exploded View

Assembly Steps:
Join CoreBoard and BaseBoard through the two pin headers. The headers are not keyed, so take care during assembly that the antenna and the USB Type-C port face the same direction.

Install CoreBoard and BaseBoard into the shell bottom cover and secure them with two M1.6 screws. Make sure the USB Type-C port and the pogo-pin spring contacts seat into the holes in the bottom cover.


Attach the Touch copper foil to the top of the bottom cover, and plug its connector into the Touch interface on CoreBoard. Make sure the copper foil does not block the assembly gap between the front and back covers.

Place the speaker in its mounting position, secure it with two M2 screws, and plug its connector into the SPK interface on CoreBoard.

Install the battery by sticking it to the back of the speaker, and plug its connector into the BAT interface on CoreBoard.

Stick the dust-proof foam over the microphones on MicBoard.

Mount MicBoard in the shell front cover, secure it with two M1.6 screws, and insert the FPC flexible cable.

Join the front and back covers and secure them with two M1.6 screws, then insert the other end of the FPC flexible cable into the MIC interface on CoreBoard.

Insert the screen cable into the FPC connector on CoreBoard with the contacts facing down, then lock the FPC connector.

Peel the adhesive backing off the screen and stick the screen to the shell, then stick the rubber gasket to the bottom of the shell.

Flash the firmware to get your EchoEar
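For this final step, one common route is Espressif's `esptool` command-line utility. The sketch below only builds the command so you can inspect it before running anything. The serial port name, the baud rate, and the flash offset `0x0` are assumptions: the offset presumes the attached `.bin` is a single merged image, which this page does not confirm, so check the firmware's release information first.

```python
import shlex
from typing import List


def build_flash_command(port: str, image: str, offset: str = "0x0",
                        baud: int = 460800) -> List[str]:
    """Build an esptool invocation that writes one image to an ESP32-S3."""
    return [
        "esptool.py", "--chip", "esp32s3",
        "--port", port, "--baud", str(baud),
        "write_flash", offset, image,
    ]


# Hypothetical port name; on Windows it would look like "COM5".
cmd = build_flash_command("/dev/ttyACM0", "echoear_xiaozhi_1_0_2.bin")
print(shlex.join(cmd))

# To actually flash, with the board connected over USB:
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command construction separate from its execution makes it easy to double-check the port and offset before anything is written to the device.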

Hardware Circuit Design Description
Main Body Hardware Circuit Design
Main Circuit Design
The EchoEar hardware mainly comprises power management, MCU, IMU, audio, LCD, and SD card blocks. The overall hardware block diagram is as follows:
Power Supply Methods
EchoEar supports three power supply methods: USB Type-C, lithium battery, and magnetic connector. The main power is 5V, provided by USB. The auxiliary power is 3.7V, provided by the 700mAh battery. When powered externally, the device charges the battery at the same time. During charging, the light on the back is red; it turns green when the battery is fully charged.
EchoEar has a power switch at the bottom. Regardless of the power supply method, pressing the button toggles the device between on and off.
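How long the 700mAh battery lasts depends on the average current draw, which this page does not specify. The sketch below shows the usual back-of-the-envelope estimate; the 80mA and 200mA draw figures and the 85% usable-capacity factor are placeholder assumptions for illustration, not measurements of EchoEar.

```python
def runtime_hours(capacity_mah: float, avg_current_ma: float,
                  usable_fraction: float = 0.85) -> float:
    """Estimate runtime in hours from battery capacity and average current draw."""
    if avg_current_ma <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah * usable_fraction / avg_current_ma


# 700mAh comes from the hardware description; the draw figures are assumed.
for draw_ma in (80, 200):
    print(f"{draw_ma} mA -> {runtime_hours(700, draw_ma):.1f} h")
```

Measuring the real average current in your own firmware build (idle, listening, and speaking) and plugging it in gives a more meaningful number.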

Power Domain Control: Power for the SD card, LCD backlight, and LCD driver is controlled by POWER_CTRL (GPIO9).
Audio
EchoEar uses the ES8311 audio codec together with the NS4150B audio amplifier. For better sound pickup, an ES7210 ADC is connected to two LMA3729T381-OY3S analog microphones (package-compatible with the MSM381A3729H9BPC, which can be used as a substitute). The microphones are spaced 45mm apart, which enables sound source localization.
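To illustrate how a 45mm two-microphone baseline can yield a direction estimate, the sketch below converts a time difference of arrival (TDOA) between the two microphones into a bearing. This is a textbook far-field approximation, not EchoEar's actual algorithm; the 343 m/s speed of sound, the 16 kHz sample rate, and the function name are assumptions.

```python
import math

MIC_SPACING_M = 0.045    # 45mm spacing, from the hardware description
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 °C (assumed)


def bearing_from_delay(delay_s: float) -> float:
    """Return the source angle in degrees (0 = broadside, +/-90 = along the mic axis)."""
    # Far-field model: delay = spacing * sin(angle) / speed_of_sound
    ratio = delay_s * SPEED_OF_SOUND / MIC_SPACING_M
    ratio = max(-1.0, min(1.0, ratio))  # clamp values pushed out of range by noise
    return math.degrees(math.asin(ratio))


max_delay = MIC_SPACING_M / SPEED_OF_SOUND   # about 131 microseconds
print(round(max_delay * 16000, 2))           # delay in samples at an assumed 16 kHz
print(bearing_from_delay(max_delay / 2))     # approximately 30 degrees
```

The largest possible delay is only about two samples at 16 kHz, so a practical implementation needs sub-sample delay estimation (for example, interpolating a cross-correlation peak). A single microphone pair also cannot tell front from back, so the estimate covers a 180° span.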
Bill of Materials
EchoEar Main Body BOM
ESP32-S3-WROOM-2-N32R16V. Note: the EchoEar design targets Espressif's ESP32-S3-WROOM-1-N32R8 module, but the board is currently populated with the ESP32-S3-WROOM-2-N32R16V. For mass production we recommend the ESP32-S3-WROOM-1-N32R8; if you need this module, please contact Espressif Sales.
1.85-inch TFT Display Screen--1.85-inch IPS-360*360-Capacitive Display Screen
4Ω 3W 2828 Square Cavity Speaker--XHXDZ-2828-4R3W-2P1.25 Silver
FPC Flexible Cable--8P Unidirectional 0.5mm Pitch 50mm Long (5 pieces)
M1.6 Screws for BaseBoard/MicBoard/Front-Back Cover Mounting--KM1.6*4 (100 pieces)
Touch Connection Cable--GH1.25-2P with Lock, Single End, 15cm Long (5 pieces)
Touch Copper Foil--Round 15mm Diameter*60 pieces (2 sheets total)
Custom Nitrile Rubber Gaskets--Black Rubber: Use the rubber gasket outline drawing in the attachments to find a supplier for customization
The 3D printed shell can be printed yourself or ordered through JLCPCB's printing service.
The microphone dust-proof foam can be custom-ordered.
Attachment List
echoear_xiaozhi_1_0_2.bin: EchoEar-Xiaozhi version firmware
EchoEar-v1.0-Shell.zip: v1.0 shell, including front cover and back cover.
Rubber Gaskets.zip: Bottom gaskets for anti-slip function.
Microphone Dust-proof Foam.zip: Microphone dust-proof sound-transparent foam.
Known Issues
V1.0 Main Body: Remove three 10µF capacitors: C1 and C10 on BaseBoard and C4 on CoreBoard. These capacitors degrade the supply capability of the on-board VCC power domain and cause intermittent chip resets.
If you run into issues while replicating the project, please leave a comment and let us know!
