Data/ML

B-CUBE

A pocket-sized, offline-capable voice assistant device that runs a local LLM on an ESP32-S3 — no phone or internet required.

C/C++, ESP32, FreeRTOS, llama.cpp, Embedded, LLM

Overview

B-CUBE is a startup concept for a standalone voice-controlled scheduling assistant built into a 4 cm cube. Unlike Siri or Alexa, it requires no smartphone, no cloud, and no internet for core functionality: speech-to-text (STT), LLM inference, and text-to-speech (TTS) all run on-device. The firmware targets a power-constrained ESP32-S3 running FreeRTOS and falls back to a Raspberry Pi Zero 2W co-processor when full open-vocabulary speech recognition is needed.

Key Features

  • Always-on wake word detection via ESP-SR WakeNet9 at ~40 mA active standby
  • On-device command recognition for scheduling vocabulary using ESP-SR MultiNet (~200 commands)
  • Quantized LLM intent extraction (TinyLlama 1.1B Q4 / SmolLM2-135M) via llama.cpp
  • eSpeak-ng TTS on ESP32-S3; Piper TTS (neural voice) on the RPi co-processor
  • SQLite-backed schedule manager with RTC-backed alarms that survive deep sleep
  • OTA dual-partition firmware updates with SHA-256 verification and automatic rollback
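
The OTA feature in the list above (verify the digest, boot the new partition, roll back automatically on failure) can be modelled as a small state machine. What follows is a minimal host-side sketch, not the project's firmware and not the ESP-IDF OTA API: every name in it (`ota_ctx_t`, `ota_stage`, `ota_confirm_or_rollback`, the slot and state enums) is hypothetical. It assumes the SHA-256 digest of the downloaded image has already been computed elsewhere, for example with the mbedTLS library that ships with ESP-IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OTA_DIGEST_LEN 32 /* SHA-256 digest size in bytes */

/* Hypothetical names throughout: this models the idea, not a real API. */
typedef enum { OTA_SLOT_A = 0, OTA_SLOT_B = 1 } ota_slot_t;

typedef enum {
    OTA_IMG_INVALID = 0,    /* empty slot, or an image that failed a check */
    OTA_IMG_PENDING_VERIFY, /* written and selected, not yet confirmed */
    OTA_IMG_VALID           /* confirmed by a successful boot */
} ota_img_state_t;

typedef struct {
    ota_slot_t active;        /* slot the bootloader would start */
    ota_img_state_t state[2]; /* per-slot image state */
} ota_ctx_t;

static ota_slot_t ota_other(ota_slot_t slot)
{
    return slot == OTA_SLOT_A ? OTA_SLOT_B : OTA_SLOT_A;
}

/* Compare two digests without exiting early on the first difference. */
bool ota_digest_matches(const uint8_t *expected, const uint8_t *computed)
{
    assert(expected != NULL && computed != NULL);
    uint8_t diff = 0;
    for (size_t i = 0; i < OTA_DIGEST_LEN; i++) {
        diff |= (uint8_t)(expected[i] ^ computed[i]);
    }
    return diff == 0;
}

/* Called after the image was written to the inactive slot.
 * On a digest mismatch the active slot is left untouched. */
bool ota_stage(ota_ctx_t *ctx, const uint8_t *expected, const uint8_t *computed)
{
    assert(ctx != NULL);
    ota_slot_t target = ota_other(ctx->active);
    if (!ota_digest_matches(expected, computed)) {
        ctx->state[target] = OTA_IMG_INVALID;
        return false;
    }
    ctx->state[target] = OTA_IMG_PENDING_VERIFY;
    ctx->active = target;
    return true;
}

/* Called on the first boot of a staged image: confirm it, or mark it
 * invalid and fall back to the previous slot. */
void ota_confirm_or_rollback(ota_ctx_t *ctx, bool self_test_ok)
{
    assert(ctx != NULL);
    if (ctx->state[ctx->active] != OTA_IMG_PENDING_VERIFY) {
        return; /* nothing pending, nothing to decide */
    }
    if (self_test_ok) {
        ctx->state[ctx->active] = OTA_IMG_VALID;
        return;
    }
    ctx->state[ctx->active] = OTA_IMG_INVALID;
    ctx->active = ota_other(ctx->active);
}
```

In this model a failed digest check never changes the active slot, and a failed self-test on first boot returns the device to the image that was running before the update. What counts as a passing self-test (audio pipeline up, database readable, and so on) is left open here.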

Technical Decisions

C/C++ was chosen over Python because the ESP32-S3 has only 512 KB of SRAM plus 8 MB of PSRAM, while CPython alone requires ~25 MB just to start.

A Hardware Abstraction Layer isolates pin assignments from application logic, so the same firmware binary runs on both the breadboard prototype and the eventual custom 4-layer PCB built around the ESP32-S3-WROOM-1 module.

The ESP32-S3 handles wake word and command recognition, while the RPi Zero 2W co-processor handles Whisper Tiny STT and SmolLM2-135M inference. Because the RPi is powered on only after wake word detection, average idle current draw drops from ~120 mA to ~2 mA.
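
One way to picture the Hardware Abstraction Layer is a per-board lookup table that maps logical signals to GPIO numbers, selected once at startup. The sketch below is an illustration under assumptions, not the project's actual HAL: the signal names, board identifiers, and GPIO numbers are all invented, and how the firmware would detect which board it is running on (a strap pin, a stored board ID, or something else) is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Logical signals the application refers to (hypothetical set). */
typedef enum {
    HAL_SIG_MIC_BCLK,
    HAL_SIG_MIC_DATA,
    HAL_SIG_AMP_ENABLE,
    HAL_SIG_RPI_POWER_EN, /* gates power to the RPi co-processor */
    HAL_SIG_COUNT
} hal_signal_t;

/* Boards supported by the single firmware binary (hypothetical IDs). */
typedef enum {
    HAL_BOARD_BREADBOARD,
    HAL_BOARD_PCB,
    HAL_BOARD_COUNT
} hal_board_t;

typedef struct {
    const char *name;
    uint8_t gpio[HAL_SIG_COUNT]; /* GPIO number for each logical signal */
} hal_board_map_t;

/* Pin numbers are placeholders chosen for illustration only. */
static const hal_board_map_t k_board_maps[HAL_BOARD_COUNT] = {
    [HAL_BOARD_BREADBOARD] = {
        .name = "breadboard",
        .gpio = {
            [HAL_SIG_MIC_BCLK]     = 4,
            [HAL_SIG_MIC_DATA]     = 5,
            [HAL_SIG_AMP_ENABLE]   = 6,
            [HAL_SIG_RPI_POWER_EN] = 7,
        },
    },
    [HAL_BOARD_PCB] = {
        .name = "pcb",
        .gpio = {
            [HAL_SIG_MIC_BCLK]     = 12,
            [HAL_SIG_MIC_DATA]     = 13,
            [HAL_SIG_AMP_ENABLE]   = 14,
            [HAL_SIG_RPI_POWER_EN] = 21,
        },
    },
};

static const hal_board_map_t *s_active_map = NULL;

/* Select the pin map once at startup. */
void hal_init(hal_board_t board)
{
    assert((unsigned)board < (unsigned)HAL_BOARD_COUNT);
    s_active_map = &k_board_maps[board];
}

/* Application code asks for a logical signal, never a raw pin number. */
uint8_t hal_gpio(hal_signal_t sig)
{
    assert(s_active_map != NULL);
    assert((unsigned)sig < (unsigned)HAL_SIG_COUNT);
    return s_active_map->gpio[sig];
}

const char *hal_board_name(void)
{
    assert(s_active_map != NULL);
    return s_active_map->name;
}
```

With a table like this, application code would call `hal_gpio(HAL_SIG_RPI_POWER_EN)` wherever it needs the co-processor power pin, and moving from the breadboard to the PCB changes only which row `hal_init` selects. Choosing the row at runtime rather than with compile-time `#ifdef`s is what keeps a single binary usable on both boards.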