Nova is an experimental software framework designed to bridge multi-modal Large Language Models (LLMs) with the InMoov open-source robotics platform. Rather than a "fully autonomous" system, Nova is an integration layer that explores interactive robotics through cloud-mediated perception and low-latency reasoning.
This research is made possible through the support of industry partners providing the core infrastructure for Nova:
- Radxa: Provided the ROCK 5C high-performance SBC, serving as the primary compute node for local vision processing and serial orchestration.
- DFRobot: Provided the electronic ecosystem, including high-torque servos for articulation and Mega2560 controllers for hardware-level interface.
- Polymaker: Provided advanced PLA+ filaments, ensuring the structural integrity and durability of the 3D-printed humanoid frame.
> **Note:** This project is a technical experiment in robotics integration. It is subject to network latency, mechanical variance, and the probabilistic nature of LLMs.
Nova operates across three primary domains: Physical Control, Real-time Perception, and Cognitive Orchestration. These domains are synchronized through a multi-threaded Python core.
```mermaid
graph TD
subgraph "Perception Layer"
V["USB Camera"] --> FT["FaceTracker Thread (OpenCV SSD)"]
FT --> PC["PID Controller"]
FT --> VC["Visual Context Buffer (Gemini Flash)"]
end
subgraph "Cognitive Layer"
U["Voice Input"] --> STT["Groq Whisper-Turbo"]
STT --> ORC["LLM Orchestrator (GPT-OSS-20B)"]
ORC --> NLU["NLU Pattern Matcher"]
NLU --> SA["Search Engine / Vision Analysis"]
end
subgraph "Execution Layer"
ORC --> TTS["Edge-TTS Pipeline"]
ORC --> JM["Jaw/Neck Serial Commands"]
PC --> NC["Neck Servo (Serial)"]
TTS --> BUFF["Audio Sentence Buffer"]
end
SA --> ORC
VC --> SA
BUFF --> AS["Audio Output"]
```
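In the diagram above, the FaceTracker thread feeds pixel offsets into a PID controller that drives the neck servo. The snippet below is a minimal single-axis sketch of that idea; the gains, output clamp, and pixel-error convention are illustrative assumptions, not Nova's actual values.

```python
import time


class PIDController:
    """Single-axis PID that converts a pixel error into a servo correction."""

    def __init__(self, kp=0.04, ki=0.001, kd=0.01, output_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.output_limit = output_limit          # max degrees per update
        self._integral = 0.0
        self._prev_error = 0.0
        self._prev_time = time.monotonic()

    def update(self, error: float) -> float:
        """error: horizontal offset of the face centre from the frame centre (pixels)."""
        now = time.monotonic()
        dt = max(now - self._prev_time, 1e-3)
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error, self._prev_time = error, now
        output = self.kp * error + self.ki * self._integral + self.kd * derivative
        # Clamp so the controller never slams the servo against its end stops.
        return max(-self.output_limit, min(self.output_limit, output))
```

In a tracking loop this would run once per frame, e.g. `pan_angle += pid.update(face_x - frame_width / 2)`, with the new angle then sent over serial.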
- FaceTracker (Threaded): Employs a Caffe-based SSD detector to maintain low-latency gaze tracking. PID loops calculate servo trajectories to minimize jitter.
- AnimatronicModule: Manages the serial ACK/NAK flow-control protocol with the Arduino Mega (sketched below) and synchronizes Edge-TTS audio streams with heuristic jaw movements.
- LLM Orchestrator: Routes prompts through Groq (for conversation) and Gemini 2.0 Flash (for visual reasoning). It uses regex-based NLU to trigger function calls such as `#VISUAL` or `#SEARCH`.
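The ACK/NAK exchange with the Arduino Mega could look roughly like the sketch below, built on pyserial. The byte values, framing, retry count, and the `NECK 90` command string are assumptions for illustration; the real protocol lives in the AnimatronicModule and the Mega firmware.

```python
import serial  # pyserial

ACK, NAK = b"\x06", b"\x15"  # assumed single-byte replies from the Mega


def send_command(port: serial.Serial, command: str, retries: int = 3) -> bool:
    """Send a newline-terminated command and wait for the Arduino to ACK it."""
    for _ in range(retries):
        port.reset_input_buffer()
        port.write(command.encode("ascii") + b"\n")
        reply = port.read(1)  # blocks up to the port timeout
        if reply == ACK:
            return True
        # NAK or timeout: retry, giving the Mega time to catch up.
    return False


if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", 115200, timeout=0.5) as mega:
        send_command(mega, "NECK 90")  # hypothetical command syntax
```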
| Choice | Rationale | Trade-off |
|---|---|---|
| Groq (Llama-3/20B) | Chosen for <500ms TTFT (Time To First Token) to maintain conversational flow. | Dependency on cloud infrastructure and API availability. |
| Gemini 2.0 Flash | Native multi-modal support allows for direct image-to-text analysis without separate captioning models. | Higher latency than local vision; requires active internet connection. |
| Edge-TTS | High-fidelity neural voices without the overhead of local WaveNet models. | Slightly higher latency than simple eSpeak; requires internet. |
| PID Gaze Control | Prevents aggressive servo "hunting" and provides smoother humanoid-like motion. | Requires manual tuning for different servo hardware. |
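The split between Groq for conversation and Gemini for vision is driven by the orchestrator's regex-based `#VISUAL` / `#SEARCH` triggers mentioned above. A minimal sketch of how such a matcher might look, with the tag syntax and return convention assumed for illustration:

```python
import re

# Assumed tag syntax: the LLM appends "#VISUAL <prompt>" or "#SEARCH <query>"
# to its reply when it wants a tool call; anything else is spoken directly.
TRIGGER = re.compile(r"#(VISUAL|SEARCH)\b\s*(.*)", re.IGNORECASE | re.DOTALL)


def route(llm_reply: str) -> tuple[str, str]:
    """Return (action, argument); action is 'SPEAK', 'VISUAL', or 'SEARCH'."""
    match = TRIGGER.search(llm_reply)
    if not match:
        return "SPEAK", llm_reply
    return match.group(1).upper(), match.group(2).strip()


print(route("Let me look that up. #SEARCH latest InMoov neck gears"))
# ('SEARCH', 'latest InMoov neck gears')
```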
Robotics at this scale is inherently prone to failure. Nova acknowledges the following:
- Network Latency: While Groq is fast, the total loop (STT -> LLM -> TTS) still introduces 1.5s-3s of delay, which can break the illusion of real-time presence.
- Perception Blindspots: The SSD face detector struggles in low-light environments and can lose tracking if the user moves outside of a 60° FOV.
- Servo Saturation: Standard hobby servos (MG996R) have significant deadbands and gear backlash, leading to occasional mechanical jitter.
- Context Drift: The current short-term memory is limited by token windows; long-term memory uses `long_term_memory_converter()` but is still experimental.
- Hardware: InMoov Head/Neck assembly, Arduino Mega, USB Webcam, Microphone.
- Software: Python 3.12+, `ffmpeg`, Groq & Google Generative AI API keys.
- Clone the repository:
  ```bash
  git clone https://github.com/alexbuildstech/nova.git
  cd nova
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Configure your environment: edit `config.py` with your API keys and hardware ports.
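The exact contents of `config.py` are not documented here; the sketch below only illustrates the kind of fields it likely holds. The names and defaults are hypothetical, so use the repository's own file as the reference.

```python
# Hypothetical config.py layout; field names are illustrative only.
GROQ_API_KEY = "gsk_..."        # Groq key for Whisper STT and conversation
GOOGLE_API_KEY = "AIza..."      # Gemini 2.0 Flash key for visual reasoning

SERIAL_PORT = "/dev/ttyUSB0"    # Arduino Mega running the servo firmware
SERIAL_BAUD = 115200

CAMERA_INDEX = 0                # USB webcam used by the FaceTracker thread
TTS_VOICE = "en-US-AriaNeural"  # Edge-TTS neural voice
```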
Nova started as a simple local script using vosk and ollama, but transitioned to a cloud-hybrid stack to achieve the speed required for humanoid interaction. The biggest challenge was not the AI itself, but the synchronization of physical movement with synthetic voice.
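One way that synchronization can be approximated is to derive jaw openness from the loudness of each TTS audio chunk as it is played. The sketch below assumes 16-bit mono PCM chunks and an arbitrary normalization constant; it illustrates the heuristic, not Nova's actual implementation.

```python
import struct


def jaw_angle_from_chunk(chunk: bytes, closed: int = 0, open_max: int = 30) -> int:
    """Map the RMS loudness of a 16-bit mono PCM chunk to a jaw servo angle."""
    n = len(chunk) // 2
    if n == 0:
        return closed
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    rms = (sum(s * s for s in samples) / n) ** 0.5
    # 8000 is an arbitrary full-scale reference; tune it against the TTS volume.
    openness = min(rms / 8000.0, 1.0)
    return closed + int(openness * (open_max - closed))
```

The resulting angle would be sent to the jaw servo alongside each buffered sentence so mouth motion roughly follows speech energy.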
Future iterations aim to move the vision loop to local Jetson-based inference to reduce dependency on the Gemini API for basic object presence.
Keywords: Robotics, Humanoid, LLM Integration, Computer Vision, InMoov, OpenCV, PID Control, Edge-TTS, Groq, Gemini.