
Synthetic media - humans as code - Patrick Debois

talks · 2 min read

This is not what I do daily – people know me from DevOps, but this is my pandemic hobby that turned into a deep exploration of synthetic media. It started with wanting to present remotely without the misery of green screens and ended with the realization that humans are becoming an API call away.

Green screen mastery requires understanding light physics: illumination falls off with the inverse square of distance, evenness has to be verified with specialized light-meter apps, cameras need to capture full 4:4:4 chroma (no subsampling) for clean keying edges, and the movie industry wisdom of “fix it in post” still applies. ML-based rotoscoping eventually made manual keying obsolete. Virtual production inspired by The Mandalorian – game engines rendering backgrounds, VR trackers for camera position, DIY rigs with Arduino and HTC Vive base stations – showed that indie creators could approximate Hollywood techniques.
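The inverse square law is simple enough to see in a few lines of Python; a minimal sketch, where the candela value and distances are illustrative, not measurements from the talk:

```python
def illuminance(intensity_cd: float, distance_m: float) -> float:
    """Illuminance in lux from an idealized point source: E = I / d^2."""
    return intensity_cd / distance_m ** 2

# Doubling the distance from the light quarters the illumination on the screen.
for d in (1.0, 2.0, 4.0):
    print(f"{d:4.1f} m -> {illuminance(1000.0, d):7.1f} lux")
# 1.0 m -> 1000.0 lux,  2.0 m ->  250.0 lux,  4.0 m ->   62.5 lux
```

That steep falloff is why even illumination across a screen needs either distance from the lights or several of them.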

Building a virtual body progressed from multi-camera photogrammetry (cumbersome, days of cleanup) through LiDAR phone scanning to Unreal Engine’s MetaHuman system, which generates realistic 3D humans from a single photo. The rigging uses inverse kinematics: given a skeleton that knows how joints connect, a solver works out the joint angles, so moving one limb produces realistic cascading motion through the chain. Computer vision is replacing physical trackers for body, face, and hand motion capture, and the V-tuber community is pushing this hard from their home studios.
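To make the cascading-motion idea concrete, here is a toy two-joint inverse kinematics solver; this is the textbook planar two-link solution, not MetaHuman’s actual rig solver:

```python
import math

def two_link_ik(x: float, y: float, l1: float, l2: float) -> tuple[float, float]:
    """Return (shoulder, elbow) angles in radians that place the hand at (x, y)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)  # pick the elbow-down solution
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow

# Move only the hand target; both joint angles adjust together --
# the "cascading motion" a rigged skeleton gives you.
print(two_link_ik(1.2, 0.5, l1=1.0, l2=1.0))
```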

Voice synthesis works on spectrograms – time-frequency pictures of the voice, with logarithmic compression applied to the spectrum. Text-to-speech generates audio from text, but the one-to-many problem of intonation and emotion makes it sound robotic without manual annotation. The tool Overdub was a revelation: record a talk, see it as text, rearrange paragraphs, delete filler words, even type new sentences that get generated in your voice. Video editing as text editing.
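A minimal sketch of the spectrogram step, assuming librosa is installed; “talk.wav” is a hypothetical recording and the parameters are typical defaults, not values from the talk:

```python
import librosa
import numpy as np

y, sr = librosa.load("talk.wav", sr=22050)        # waveform samples
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel, ref=np.max)    # the logarithmic compression
print(log_mel.shape)  # (80 mel bands, time frames): the image a TTS model works on
```

Most neural text-to-speech pipelines predict a log-mel spectrogram like this from text, then use a vocoder to turn it back into audio.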

Lip syncing bridges audio to visuals through phoneme-to-viseme mapping – predicting lip shapes from sound patterns. Deep fakes overlay faces convincingly enough that you cannot tell which is real at a glance. Practical applications include live translation (speaking in Hindi with correctly lip-synced facial movements), therapeutic grief sessions with deceased relatives, and virtual performers. Synthesia turns text into a realistic speaking avatar as a commercial product. The realism is getting close to the “humans as code” title: you type text, select a body and background, and a convincing virtual human delivers your message.
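To make phoneme-to-viseme mapping concrete, a toy lookup table; the groupings are a common simplification (bilabials share one mouth shape), not any product’s actual mapping:

```python
# Hypothetical, heavily simplified mapping from phonemes to mouth shapes.
PHONEME_TO_VISEME = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "teeth_on_lip", "v": "teeth_on_lip",
    "aa": "open_jaw", "ae": "open_jaw",
    "ow": "rounded", "uw": "rounded",
}

def visemes(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to the mouth shapes to render, frame by frame."""
    # Unknown phonemes fall back to a neutral mouth shape.
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(visemes(["hh", "ax", "l", "ow"]))  # "hello" -> per-frame mouth shapes
```

Modern systems predict the lip shapes directly from the audio rather than from a fixed table, which is what keeps the sync convincing across languages.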

Watch on YouTube – available on the jedi4ever channel

This summary was generated using AI based on the auto-generated transcript.
