I’ve been messing around with AI avatars since before ChatGPT. At the start of 2022 I came across Synthesia, and started using their avatars to present text article content.
By late 2023, it was possible for me to ‘clone’ myself, and through 2024 I put out a few different versions of my digital twin.
Since starting a YouTube channel this summer, I’ve been rather obsessed with getting the setup right: composition, lighting, a good camera and a decent mic.
Once I had a setup I was happy with, I figured this was about the best source material an avatar could get, so I put it to the test in my latest YouTube video. The results are… well you can’t really tell, can you?
Entering the ‘uncanny valley’
Human communication is nuanced. Our faces convey subtle emotional cues that are difficult for AI to replicate.
When avatars fall short, they appear ‘off’ and make viewers uncomfortable. This is the uncanny valley. Today’s technology has largely overcome it for short-form or professional use.
But I don’t recommend using avatars for public YouTube videos. Longer runtimes increase the chance that uncanny quirks will show. For now, I think keeping avatars to short, professional formats works best.
Setting up for success
High-quality input creates high-quality avatars. Poor lighting or composition leads to weak results.
Composition: Choose a spacious room with an appealing background. Position yourself centrally and align with background elements for balance.
Lighting: Avoid harsh ring lights. Instead, invest in a soft Neewer LED key light and a simple backlight. This combination creates natural shadows and helps you stand out.
Small spaces: If room is limited, use a compact greenscreen attached to your chair and replace the background in post-production with Runway or After Effects.
These adjustments create a professional look without major cost.
Recording your avatar
HeyGen’s Creator plan ($29/month) enables custom avatars with just a two-minute recording. Key steps:
Use a fixed tripod and record with your phone’s back camera for clarity.
Film in both horizontal and vertical formats to cover social media.
Keep it natural: speak freely for two minutes, avoid fidgety gestures, and maintain good posture.
Wear clean, ironed clothing to avoid visual distractions.
Upload the recording directly to HeyGen after consenting to verification. Within minutes, your avatar will be ready.
Getting the voice right
Avatars need matching voices to avoid uncanny dissonance. HeyGen’s default voice training uses only two minutes of data, which can sound artificial. Alternatives:
Record your own audio: Use a quality microphone like the Rode NT USB for the most natural results.
Voice cloning with Eleven Labs: Provide 30 minutes to 3 hours of audio for training. Clean the files carefully to remove filler sounds. Once processed, Eleven Labs produces a near-accurate clone of your voice, which can be integrated directly into HeyGen.
Both approaches raise realism and go down better with audiences. HeyGen’s built-in voice cloning isn’t really up to the job.
Automating the workflow
For frequent creators, automation further reduces effort. By linking Google Docs, Eleven Labs, and HeyGen through Make.com, you can generate videos with minimal manual steps.
Draft your script in Google Docs.
Approve it with a simple code.
Automation pushes the text to Eleven Labs for voice synthesis, then to HeyGen for video creation.
Completed files are stored automatically in Google Drive.
This setup transforms video creation into a streamlined process, ideal for businesses producing content at scale.

A simple Make automation reduces friction in the production process.
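If you’d rather see the logic than the Make canvas, the four steps above can be sketched roughly as follows. This is a minimal sketch in Python, not how Make or either service works internally: the `#approved` marker, the function names, and the three stage callables (`synthesize_voice`, `render_video`, `store_file`) are all my own placeholders. They stand in for the Eleven Labs, HeyGen and Google Drive steps and make no real API calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical approval marker; the workflow above only says "a simple code".
APPROVAL_CODE = "#approved"


@dataclass
class PipelineResult:
    audio_ref: str   # whatever the voice step hands back (placeholder)
    video_ref: str   # whatever the video step hands back (placeholder)
    stored_at: str   # where the storage step says the file ended up


def is_approved(script: str, code: str = APPROVAL_CODE) -> bool:
    """True when the last non-empty line of the script is the approval code."""
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == code


def script_body(script: str, code: str = APPROVAL_CODE) -> str:
    """Return the script text with the trailing approval line removed."""
    lines = script.rstrip().splitlines()
    if lines and lines[-1].strip() == code:
        lines = lines[:-1]
    return "\n".join(lines).strip()


def run_pipeline(
    script: str,
    synthesize_voice: Callable[[str], str],  # stand-in for the voice step
    render_video: Callable[[str], str],      # stand-in for the avatar video step
    store_file: Callable[[str], str],        # stand-in for the storage step
    code: str = APPROVAL_CODE,
) -> Optional[PipelineResult]:
    """Run voice -> video -> storage, but only for an approved script."""
    if not is_approved(script, code):
        return None  # nothing happens until the script is approved
    text = script_body(script, code)
    audio_ref = synthesize_voice(text)
    video_ref = render_video(audio_ref)
    stored_at = store_file(video_ref)
    return PipelineResult(audio_ref, video_ref, stored_at)
```

The point of the approval check is that a half-finished draft never reaches the voice or video steps; only a script that ends with the code moves through the chain.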
Practical tips
Keep avatar videos under five minutes when publishing to platforms like YouTube.
For corporate training, sales, or internal updates, length is less of a concern, as the audience understands the avatar context.
Always consider lighting and background—the avatar is only as good as the material you feed it.
If time allows, use your own recorded voice for authenticity.
AI avatars are a powerful productivity tool. They are not a replacement for human-to-human video in all contexts, but they excel where speed, scale, and consistency matter most. By following best practices in lighting, setup, voice, and automation, you can create professional avatars that engage without distracting uncanny effects.
The future of communication includes AI presenters. Master the workflow now, and you can save time, elevate your content, and stand out in a crowded digital landscape.





