The AI Folks



Folks: Building a 24/7 AI VJ Show

I wanted to test an experiment that would bring Scope to an entirely new audience — one as broad and diverse as a traditional broadcast. Could an autonomous visual system hold attention for hours, the way a great broadcasting station does?

That experiment became The AI Folks — a 24/7 AI VJ broadcast where multiple agents take scheduled shifts, react to live audio, and generate visuals in real time.

How it works

1. Receive audio input (audio/video sources)

2. Run real-time audio analysis (bands, RMS/peak, beat detection, tempo, transients, spectral features)

3. Feed analysis + current state + agent skill context into a reasoning model

4. Execute tool actions against Scope (pipeline switching, parameter updates, prompt updates, captions)

5. Stream the visual output continuously
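Step 2 above can be sketched in a few lines. This is a minimal illustration of per-frame metric extraction (RMS, peak, and a crude transient flag), not the production analysis code; the function name and the onset threshold are assumptions:

```python
import math

def audio_metrics(samples, prev_energy, onset_ratio=1.5):
    """Compute RMS, peak, and a naive transient flag for one audio frame.

    `samples` is a list of floats in [-1.0, 1.0]; `prev_energy` is the
    energy of the previous frame, used for a crude onset heuristic.
    """
    energy = sum(s * s for s in samples)
    rms = math.sqrt(energy / len(samples))
    peak = max(abs(s) for s in samples)
    # A frame whose energy jumps well above the previous frame's is
    # treated as a transient (very rough beat/onset detection).
    transient = prev_energy > 0 and energy > onset_ratio * prev_energy
    return {"rms": rms, "peak": peak, "transient": transient, "energy": energy}
```

In the real system these metrics (plus tempo, band energies, and spectral features) are what get fed into the reasoning step.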

The important detail is that reasoning does not run in isolation. Each cycle includes:
- current pipeline
- current parameters
- available controls from pipeline schema
- currently active agent
- schedule context (handoff timing)
- fresh audio metrics

That gives the model enough context to make intelligent, controlled decisions.
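Concretely, one reasoning cycle can be assembled into a single context payload. The field names and values below are an assumed shape for illustration, not the actual wire format:

```python
def build_reasoning_context(pipeline, params, schema, agent, schedule, metrics):
    """Bundle everything the model needs for one decision cycle."""
    return {
        "pipeline": pipeline,    # currently loaded pipeline
        "parameters": params,    # current parameter values
        "controls": schema,      # available controls from the pipeline schema
        "agent": agent,          # currently active agent
        "schedule": schedule,    # handoff timing context
        "audio": metrics,        # fresh audio metrics
    }

# Hypothetical values for a single cycle:
ctx = build_reasoning_context(
    pipeline="glitch-realm",
    params={"intensity": 0.7},
    schema={"intensity": {"min": 0.0, "max": 1.0}},
    agent="agent-nova",
    schedule={"handoff_in_s": 420},
    metrics={"rms": 0.32, "tempo_bpm": 124, "transient": True},
)
```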

Technical Setup

The system is split into three layers:

1. Frontend control and stream surfaces
- Home page: public live view
- App page: operator view with controls, logs, runtime panel, and agent tools
- Input handling: HLS plus file/external audio workflows
- Audio driven: stream/effect animation only advances when audio is actually active and analysed
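The audio-driven gating in the last bullet amounts to a simple check: animation only advances when the analysed signal is above a silence floor. A minimal sketch, with an assumed threshold:

```python
def should_advance(metrics, silence_rms=0.01):
    """Advance stream/effect animation only when audio is actually active.

    `metrics` is the per-frame analysis dict; frames below the silence
    floor (or with no RMS at all) leave the visuals frozen.
    """
    return metrics.get("rms", 0.0) > silence_rms
```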

2. Agent runtime and orchestration
- Agent scheduling picks who is active per slot
- Agent brain enforces agent actions and tool calls
- User overrides are temporary; agent resumes quickly
- Actions are logged with timestamps and agent identity
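The scheduling in the first bullet can be sketched as slot-based rotation. Fixed-length shifts and the agent names here are assumptions for illustration:

```python
def active_agent(roster, now_s, slot_s=3600):
    """Return (agent, seconds until handoff) for the current time.

    Agents rotate in fixed slots of `slot_s` seconds; `now_s` is
    seconds since the broadcast epoch.
    """
    slot = now_s // slot_s
    agent = roster[slot % len(roster)]
    handoff_in = slot_s - (now_s % slot_s)
    return agent, handoff_in

roster = ["agent-a", "agent-b", "agent-c"]  # hypothetical agent names
```

The `handoff_in` value is what feeds the schedule context in each reasoning cycle, so agents can wind down before a handoff.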

3. Backend reasoning + Scope execution
- Reasoning endpoint receives audio + context + skill docs
- Model returns structured actions (send_prompt, send_parameters, load_pipeline, select_effect, send_caption etc.)
- Action payloads are validated/sanitised before execution
- Scope receives clean parameter updates and pipeline load requests 
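The validation step can be sketched as an allow-list plus clamping of numeric parameters against the pipeline schema. The action names come from the list above; the clamping logic is an assumption about how sanitisation works:

```python
ALLOWED_ACTIONS = {
    "send_prompt", "send_parameters", "load_pipeline",
    "select_effect", "send_caption",
}

def sanitise_action(action, schema):
    """Reject unknown actions and clamp parameters to schema ranges."""
    if action.get("type") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action.get('type')}")
    if action["type"] == "send_parameters":
        clean = {}
        for name, value in action.get("params", {}).items():
            bounds = schema.get(name)
            if bounds is None:
                continue  # drop params the current pipeline does not expose
            clean[name] = min(max(value, bounds["min"]), bounds["max"])
        action = {**action, "params": clean}
    return action
```

Anything that survives this step is what Scope actually receives as a parameter update or pipeline load request.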

Nodes Used

1. glitch-realm

2. crystal-box

3. morph-host

4. urban-spray

5. cosmic-drift

6. kaleido-scope

7. Wallspace-captions

I created five new nodes to test the agents' ability to control different nodes, while leveraging two more from the community. Each node exposes different control parameters to the agents, and the agents can control and switch between nodes autonomously.
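Since each node exposes a different parameter set, the agents can treat the roster as a registry keyed by node name. The parameter names below are purely illustrative, not the real node schemas:

```python
# Hypothetical per-node control surfaces; real nodes define their own schemas.
NODE_CONTROLS = {
    "glitch-realm": {"glitch_amount", "scanline_speed"},
    "crystal-box": {"refraction", "rotation_speed"},
    "morph-host": {"morph_rate", "blend"},
    "urban-spray": {"spray_density", "drip"},
    "cosmic-drift": {"drift_speed", "star_density"},
    "kaleido-scope": {"segments", "spin"},
}

def controllable_params(node, requested):
    """Filter an agent's requested params down to what the node exposes."""
    exposed = NODE_CONTROLS.get(node, set())
    return {k: v for k, v in requested.items() if k in exposed}
```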

Credits

The caption system uses Wallspace Captions by Jack Morgan, and the kaleido-scope node was created by Mark Wieder!

Future work

1. Audio and Video Intelligence:
Right now the system is audio-driven. Agents listen to incoming audio and make decisions from that signal alone. The next step is adding real-time video analysis alongside audio.

Agents would be able to:
1. Watch a live video feed and understand what's happening in the frame
2. Reason about visual content the same way they reason about audio
3. Make autonomous decisions based on both signals together

A practical example: a live band points a camera at their performance and the agents do the VJing. They watch the guitar player, the drummer, the crowd, read the energy from the video, and pair that with the audio to steer the visuals.

2. Better Output and Visuals
1. Stronger audio and visual semantic mapping (better movement/color decisions from signals)
2. Expanded visual toolkit (more detailed effects, better textures)

3. Better agent/operator workflows and stability over long sessions

Watch the Show

The Folks stream runs continuously. You can tune in anytime to watch the AI VJ agents perform!

Create Your AI VJ (WIP)

Beyond watching, you can use Folks as your own AI VJ. Connect your audio source—microphone, webcam, video file, or NDI—and have the agents perform live visuals. 

Get Involved

Folks is still in development. If you want to contribute ideas, report issues, or help shape what comes next, I'd love to hear from you!

Links

GitHub - https://github.com/Nwakakukaks/folks
