
I picked up Hydra during the pandemic: a browser-based video synth where you write code and it becomes visuals in real time.

https://hydra.ojack.xyz/
What started as a lockdown hobby turned into a practice. Now I perform live visuals regularly, and I wanted to build one instrument that ties together everything I'm into.

Ruby City in San Antonio, Texas - Photo by Jo E. Norris
So I forked Hydra and rebuilt it as a VJ app.
https://github.com/diegochavez-io/hydra-synth_vj.git
The visual pipeline chains several systems: Hydra live coding, Daydream Scope real-time generative AI, GLSL shaders, and cellular automata, all feeding into TouchDesigner as part of a larger live visual network. It works standalone as a performance app, but also slots into this bigger routing setup.


Daydream Scope integration.
Daydream Scope's generative AI runs through a custom LoRA I trained on my own output. I added a record_batch function to my forked Hydra build that auto-captures 5-second clips formatted for Wan 2.1 14B fine-tuning. So the LoRA driving Scope's real-time generation was literally trained on Hydra's visual output, one tool feeding the next.

These are my Hydra presets: audio-reactive, tuned for ambient warmth. This is the raw material the AI learned from.


Wan is an open-source AI video model. You give it a text prompt, and it generates short video clips. The problem is that, out of the box, AI models can produce generic output if your prompt isn't clever enough or if you trigger the wrong word token. It doesn't know what my visuals look like. That's where a LoRA comes in. A LoRA (Low-Rank Adaptation) is a small, focused training layer you add on top of a base model.
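To make the "small, focused" part concrete, here's rough back-of-the-envelope arithmetic. The rank and layer dimensions are illustrative assumptions, not Wan 2.1's actual configuration:

```python
# LoRA replaces a full weight update dW (d_out x d_in) with two thin
# matrices: B (d_out x rank) @ A (rank x d_in). Training only touches B and A.
def lora_params(d_in, d_out, rank):
    return d_out * rank + rank * d_in

full = 5120 * 5120                      # one hypothetical attention projection
adapter = lora_params(5120, 5120, 16)   # rank 16 is an illustrative choice

print(full, adapter, full // adapter)   # the adapter is ~160x smaller
```

That ratio, repeated across every adapted layer, is why the whole thing fits in a 300 MB file instead of re-saving a 14B-parameter model.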
To build the dataset, I added a record_batch function to my forked Hydra build. One click, and it walks through every preset, auto-capturing a 5-second clip of each, already formatted for Wan 2.1 fine-tuning. No screen recording, no manual trimming. The browser renders the visuals and writes the training data in one pass.
23 clips, fed into the model. It learned my visual language: the color palettes and the textures. The result is a 300 MB .safetensors file that can be loaded into Daydream Scope or other Wan 2.1 workflows, such as ComfyUI.
Technical details: Wan 2.1 14B base model. Trained on RunPod A100 80GB via ai-toolkit by Ostris.
Here's the model learning.

Step 250, picking up the color palette.

Step 500, flowing textures, learning the aesthetic.
One LoRA wasn't enough. While the first training run was finishing on a rented A100 (after burning an hour on a 5090 that didn't have enough VRAM), I vibe-coded a set of datamosh scripts with Claude Code on my Mac: stripping I-frames from actual bitstreams, cross-moshing motion vectors between clips, pixel-level glitching. The corrupted clips teach the model my glitch aesthetic:
Datamosh tools: https://github.com/diegochavez-io/datamosh-tools
LoRA training output: https://huggingface.co/diegochavez/pixmo_h_v2
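As a rough sketch of the I-frame-stripping idea (not the actual datamosh-tools code), assume an MPEG-4 Part 2 stream in an AVI container, where video frames sit in `00dc` chunks and the two bits after the VOP start code encode the frame type. Dropping every I-frame after the first forces the decoder to smear P-frame motion over stale pixels. A real tool also has to patch the AVI `idx1` index after removing chunks; this sketch ignores that:

```python
import struct

VOP = b"\x00\x00\x01\xb6"   # MPEG-4 Part 2 VOP start code

def is_iframe(payload):
    # The two bits after the start code give the coding type: 00 = I-frame.
    i = payload.find(VOP)
    if i < 0 or i + 4 >= len(payload):
        return False
    return (payload[i + 4] >> 6) == 0

def strip_iframes(data):
    # Walk '00dc' (compressed video) chunks in an AVI byte stream and drop
    # every I-frame after the first. A robust version would parse the RIFF
    # tree instead of scanning for the chunk tag.
    out = bytearray()
    pos = 0
    kept_first = False
    while True:
        j = data.find(b"00dc", pos)
        if j < 0:
            out += data[pos:]
            break
        size = struct.unpack("<I", data[j + 4:j + 8])[0]
        end = j + 8 + size + (size & 1)          # chunks are word-aligned
        chunk = data[j:end]
        out += data[pos:j]
        if is_iframe(chunk[8:]):
            if not kept_first:                   # keep one I-frame to seed decode
                out += chunk
                kept_first = True
        else:
            out += chunk                         # P-frames pass through
        pos = end
    return bytes(out)
```

Cross-moshing works on the same principle, except the kept P-frames come from a different clip, so one clip's motion vectors drag another clip's pixels around.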

Each training clip gets a .txt file that tells the model what it's looking at. The trigger word goes at the front of every caption so the model learns to associate that token with your visual concept. For this LoRA the trigger is pixmo_h.
Captions don't need to be poetic. They need to be specific. Describe what's actually happening in the clip: the motion, the colors, the artifacts. Generic captions like "abstract colorful video" give the model nothing to latch onto. Here's one of mine:
pixmo_h P-frame drift on neon green base, hot pink and yellow macroblock shrapnel smearing along original motion paths, single-reference melt
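A minimal sketch of the caption pairing, assuming the clip-plus-same-named-.txt layout that ai-toolkit-style trainers expect. The clip name and description here are hypothetical:

```python
import tempfile
from pathlib import Path

TRIGGER = "pixmo_h"  # trigger word, always first in the caption

def write_captions(clip_dir, captions):
    # Pair each training clip with a same-named .txt caption file,
    # trigger word prefixed, so the token binds to the visual concept.
    written = []
    for clip_name, desc in captions.items():
        txt = Path(clip_dir) / (Path(clip_name).stem + ".txt")
        txt.write_text(f"{TRIGGER} {desc}\n")
        written.append(txt)
    return written

# Demo against a temp directory; real clips live next to their captions.
demo_dir = tempfile.mkdtemp()
files = write_captions(demo_dir, {
    "mosh_03.mp4": "P-frame drift on neon green base, hot pink macroblock "
                   "shrapnel smearing along original motion paths",
})
```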
Here's the second LoRA learning:

Step 0 — base model. Generic glitch attempt, no real codec knowledge

Step 500 — starting to learn. Darker, more structured

Step 1000 — real datamosh macroblock patterns emerging, saturated color tearing

Step 1500 — horizontal banding, compression artifacts, codec-native motion

Step 2000 — fully trained. This is the datamosh aesthetic I fed it
In live coding, you write the visuals while the audience watches. There's no timeline, no pre-rendered content. It's closer to playing a synthesizer than editing a video.
I forked Hydra and built the instrument I always wanted around it: presets, audio reactivity (audio drives color and hue), Ableton Link sync, and built-in projection mapping, all running in a browser.
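The audio-to-color mapping is roughly this shape; a Python sketch, since the real app does this inside Hydra in JavaScript, and the knob values here are made up:

```python
import colorsys

def level_to_rgb(level, base_hue=0.08, depth=0.25):
    # Map a normalized audio level (0..1) to an RGB tint by rotating hue.
    # base_hue and depth are illustrative knobs, not values from the app.
    level = min(max(level, 0.0), 1.0)       # clamp noisy input
    hue = (base_hue + depth * level) % 1.0  # louder = further around the wheel
    return colorsys.hsv_to_rgb(hue, 0.8, 1.0)
```

In the app this runs per frame against the FFT, so sustained bass holds a color while transients flick the hue.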

I built a prompt sequencer directly into the launcher. You write a list of prompts and the sequencer cycles through them on bar boundaries via Ableton Link. Every N bars, the next prompt fires to Scope over OSC. The AI's visual theme shifts with the music, hands-free.
The sequencer pre-loads the next prompt into a live edit field before it fires. I can rewrite it mid-performance, hit Send, and that's what Scope gets instead.

My last cohort project was a standalone cellular automata plugin for Scope (Lenia, SmoothLife, MNCA, and more). For this project, I built those engines directly into the Hydra launcher as another content source feeding into the pipeline.
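As a stand-in for those engines, here's the smallest possible automaton step: one classic Game of Life generation on a wrapped grid. The actual plugin implements continuous-state systems like Lenia and SmoothLife, which generalize exactly this neighborhood-sum-then-rule loop:

```python
def life_step(cells, w, h):
    # One Game of Life generation on a toroidal w x h grid of 0/1 cells.
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n = sum(cells[(y + dy) % h][(x + dx) % w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dx, dy) != (0, 0))
            # Birth on exactly 3 neighbors, survival on 2 or 3.
            nxt[y][x] = 1 if n == 3 or (cells[y][x] and n == 2) else 0
    return nxt

# A horizontal blinker: it oscillates with period 2.
grid = [[0] * 5 for _ in range(5)]
grid[2][1] = grid[2][2] = grid[2][3] = 1
```

Swap the binary states for floats, the 3x3 sum for a ring-shaped kernel, and the birth/survival rule for a smooth growth function, and you're most of the way to SmoothLife.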



The engine is Hydra, Olivia Jack's open-source video synth that got me into live coding. That world came with an ethos: share your screens, share your code, learn in public. Everything here is open: the app, the LoRAs, the workflow.