The goal of this project is to move past the "chatbot in a box" and create an autonomous, real-time AI performer that doesn't just process text: it sees, hears, and reacts to the physical world with its own procedural body and emotional environment.
Just chatting, listening to jokes that don't quite land
1. The Emergent Actor: The Autonomous Performer
The Emergent Actor serves as the brain and body. It is a multimodal installation designed to break the barrier of traditional UI by using always-on computer vision and audio sensing.
The Procedural Puppet: Rather than using canned animations, the actor uses an OpenPose skeleton. Limbs are animated programmatically to create emergent, non-pre-programmed movement.
View from the AI's Perspective (Left) and the Procedural Puppet (Right)
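One common way to get this kind of emergent, non-keyframed motion is to drive each joint with layered sine oscillators at incommensurate frequencies. A minimal sketch of that idea (the joint names and the `energy` parameter are hypothetical, not the project's actual API):

```python
import math

# Hypothetical joint subset loosely following OpenPose keypoint naming.
JOINTS = ["l_shoulder", "l_elbow", "r_shoulder", "r_elbow", "head"]

def joint_angles(t, energy=1.0, seed=0.0):
    """Procedurally generate joint angles (radians) at time t.

    Layered sine waves with per-joint phase offsets produce smooth,
    non-repeating-feeling motion without canned keyframes. `energy`
    scales amplitude (e.g. driven by how excited the actor is).
    """
    angles = {}
    for i, name in enumerate(JOINTS):
        phase = seed + i * 1.7  # decorrelate the joints from each other
        # Two incommensurate frequencies avoid an obviously looping cycle.
        a = math.sin(t * 1.3 + phase) + 0.5 * math.sin(t * 2.9 + phase * 0.7)
        angles[name] = 0.4 * energy * a
    return angles
```

Sampling `joint_angles(t)` every frame yields continuous motion that never visibly repeats, which is the point of procedural over pre-baked animation.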
Intelligent Interaction: The system features "Intelligent Barge-In," allowing users to interrupt the AI naturally. It even includes a "Visual Memory Manager" to recognize callbacks: do a peace sign twice and it'll call you out for "committing to the bit".
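The callback behavior boils down to remembering gestures across a session. A minimal sketch of that bookkeeping, assuming a hypothetical class name and cue format (not the project's real API):

```python
from collections import Counter

class VisualMemoryManager:
    """Toy sketch of gesture "callback" tracking.

    Counts detected gestures per session; the second time the same
    gesture appears, it emits a cue the dialogue layer can turn into
    a quip about "committing to the bit".
    """

    def __init__(self):
        self.seen = Counter()

    def observe(self, gesture: str):
        self.seen[gesture] += 1
        if self.seen[gesture] == 2:
            return f"callback:{gesture}"  # cue injected into the LLM prompt
        return None  # first (or third+) sighting: no callback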
First test letting the AI change its appearance
Personality: The persona is one of the system's primary configuration points. During development I used a context-aware roaster that analyzes your outfit, posture, and environment to deliver personalized stand-up.
[EARLY WIP] A late night discussion...probably
Live Captions: On-screen feedback showing what the user said, what the AI is thinking, and its response.
Captions (From Top to Bottom) User, AI Thought, AI Response
So many features, I'm still trying to write them all out...
Eye-tracking test: if the user is in front of the screen, the actor makes better eye contact. The same tracking also drives pointing (with a y-offset), etc.
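The eye-contact and pointing behavior both reduce to mapping a detected face position into a normalized look-at target. A minimal sketch of that mapping (the function name and bounding-box convention are assumptions for illustration):

```python
def gaze_target(face_bbox, frame_w, frame_h, y_offset=0.0):
    """Map a detected face box to a normalized look-at point.

    Returns (x, y) in [-1, 1] that the puppet's eyes can aim at;
    pointing reuses the same target with a nonzero y_offset so the
    arm aims slightly below the face. face_bbox = (x, y, w, h) in px.
    """
    fx, fy, fw, fh = face_bbox
    cx = fx + fw / 2  # face center in pixels
    cy = fy + fh / 2
    nx = (cx / frame_w) * 2 - 1   # normalize to [-1, 1]
    ny = (cy / frame_h) * 2 - 1 + y_offset
    return nx, max(-1.0, min(1.0, ny))  # clamp after offset
```

Keeping the target in normalized coordinates means the same value can drive eyes, head turn, or an arm regardless of camera resolution.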
2. The Mood Engine: Visual Synthesis via GPU Hijacking
<TODO: Super Mood Engine Awesome Video Demo Here!>
The Mood Engine is the project's visual nervous system. Instead of using traditional 3D rendering pipelines, it utilizes PyTorch and CUDA, frameworks usually reserved for heavy machine learning, to calculate millions of pixels as batched tensor operations.
(2x Speed) Pre-AI visualization for moods - running this again should produce unique results
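The core trick is that a whole frame can be one batched tensor expression: build a coordinate grid once, then evaluate math over every pixel at the same time, with no per-pixel loop. A small PyTorch sketch of the pattern (the plasma formula here is illustrative, not the engine's actual shader):

```python
import torch

def render_plasma(h, w, t, device=None):
    """Render one (3, H, W) frame entirely as batched tensor ops.

    On CUDA this same pattern scales to millions of pixels per frame,
    since every sin/cos below runs across the full grid in parallel.
    """
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    ys = torch.linspace(-1, 1, h, device=device)
    xs = torch.linspace(-1, 1, w, device=device)
    y, x = torch.meshgrid(ys, xs, indexing="ij")  # full coordinate grid
    # A classic "plasma": summed waves over space and time.
    v = torch.sin(3 * x + t) + torch.cos(3 * y - t) \
        + torch.sin(4 * (x * x + y * y) + t)
    # Phase-shifted copies of v become the three color channels.
    rgb = torch.stack([torch.sin(v), torch.sin(v + 2.1), torch.sin(v + 4.2)])
    return rgb * 0.5 + 0.5  # map from [-1, 1] to [0, 1]
```

Because the frame is an ordinary tensor, it can be post-processed, differentiated, or handed straight to a texture-sharing layer.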
Natural Language to Visuals: Using a "Mood Mapper," the engine parses natural language (e.g., "I'm feeling anxious") and maps it to psychological color ranges and physics-driven shapes.
Different emotions are randomly generated in the pre-processor plugin
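At its simplest, a mood mapper is a lookup from parsed emotion to color and motion parameters. A toy sketch of that table (the values and keyword matching here are placeholders; the real engine parses language far more robustly):

```python
# Hypothetical mood table: hue range (degrees) plus motion parameters.
MOOD_TABLE = {
    "anxious": {"hue": (260, 300), "speed": 2.0, "jitter": 0.8},
    "calm":    {"hue": (180, 220), "speed": 0.3, "jitter": 0.1},
    "joyful":  {"hue": (30, 60),   "speed": 1.2, "jitter": 0.3},
}

def map_mood(text):
    """Naive keyword matcher standing in for real language parsing."""
    text = text.lower()
    for mood, params in MOOD_TABLE.items():
        if mood in text:
            return mood, params
    return "calm", MOOD_TABLE["calm"]  # neutral fallback
```

The interesting part is downstream: `hue`, `speed`, and `jitter` feed directly into the shader parameters, so "I'm feeling anxious" visibly changes the room.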
Generative Mathematics: It renders resolution-independent Signed Distance Fields (SDFs) that morph like liquid metal and infinite 3D lattices that warp in real-time.
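An SDF gives the signed distance from any point to a surface, which is what makes the rendering resolution-independent, and a smooth-minimum blend is the standard trick behind the "liquid metal" merging of shapes. A minimal sketch of both pieces (formulas are the well-known ones, not code from this project):

```python
import math

def sdf_sphere(x, y, z, r=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.sqrt(x * x + y * y + z * z) - r

def smooth_min(a, b, k=0.5):
    """Polynomial smooth-minimum of two SDF values.

    Unlike min(a, b), this rounds the seam where two shapes meet,
    so blended surfaces flow into each other like liquid.
    """
    h = max(0.0, min(1.0, 0.5 + 0.5 * (b - a) / k))
    return b + (a - b) * h - k * h * (1.0 - h)
```

Animating the shape parameters over time while ray-marching the combined field is what produces the morphing geometry at any output resolution.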
Differentiable Art: By bypassing standard OpenGL/DirectX, the system allows for independent per-channel light-wave processing, creating true physical chromatic aberration and near-zero-latency visual broadcasting via SpoutGL.
Audio reactivity: Play a song and watch the background animate to the beat!
Generated backgrounds that react to music (early WIP)
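Audio reactivity usually means extracting a per-chunk energy measure in the bass band and feeding it into the visual parameters. A minimal NumPy sketch of that analysis step (band edges and function name are illustrative):

```python
import numpy as np

def beat_energy(samples, sr=44100, low=20.0, high=150.0):
    """Bass-band energy of one audio chunk, via FFT.

    The returned scalar can drive shader parameters (pulse scale,
    brightness) so the background visibly moves with the beat.
    """
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sr)
    band = (freqs >= low) & (freqs <= high)  # keep only the bass bins
    return float(spectrum[band].mean()) if band.any() else 0.0
```

Smoothing this value over a few frames (e.g. an exponential moving average) keeps the visuals from flickering on every transient.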
Higher-Definition: A post-processor node that scales up content using various techniques, with very little impact on performance.
Early version of my video upscaler
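Since frames are already GPU tensors, a cheap upscale pass can be a single interpolation call. A sketch of the simplest such node, assuming a (3, H, W) frame (this stands in for the fancier techniques, not the project's actual upscaler):

```python
import torch
import torch.nn.functional as F

def upscale(frame, factor=2, mode="bicubic"):
    """Cheap GPU upscale: (3, H, W) -> (3, H*factor, W*factor).

    Runs as one tensor op, so it adds almost nothing to frame time.
    """
    x = frame.unsqueeze(0)  # F.interpolate expects a batch dimension
    y = F.interpolate(x, scale_factor=factor, mode=mode,
                      align_corners=False)
    return y.squeeze(0).clamp(0, 1)  # bicubic can overshoot [0, 1]
```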
Goal
My goal is to have a very fun, interactive character you can talk to. Building an installation around it would be simple, as it only requires a display, camera, mic, and speakers. It can also run on lower-end hardware by using remote inference for the heavy lifting.
Self-Reflection
Developing The Emergent Actor unintentionally stress-tested my patience during late nights of debugging. I programmed a "silence-to-roast" logic gate, so when my microphone failed during a late-night session, my own creation spent hours relentlessly mocking my debugging struggles. Building a system designed to identify your flaws in real time requires tough skin; it's hilarious when the persona works perfectly, even if the developer doesn't.
AI giving me encouragement as I work!
AI made me yawn
I will eventually update the lip sync to match an earlier project from 2024:
2024 version using a 3D model; this one focused on lip sync through viseme/phoneme data