I'm Not a Designer. Here's the AI Video Playbook I Built Anyway.
Let me be upfront about something. I am not a designer. I am not a video editor. I have never opened After Effects in my life, and the last time I touched anything resembling creative production was asking a designer to change a button color in 2019 LOL.
What I am is a performance marketer. Twenty-plus years across agencies, media buying, analytics, and growth strategy. I understand attribution models, creative testing frameworks, and which levers actually move ROAS. What I don't understand — or didn't, until recently — is how to build the creative itself.
That changed this year. Not because I suddenly developed design talent, but because I stopped waiting for someone else to hand me assets and started building a system to produce them myself.
This is that system.
Why I Had To Figure This Out
The pitch for AI-generated video sounds almost too good: describe what you want, press a button, get a polished 30-second ad. If you've tried it, you already know that's not how it works. Here's what actually happens when you go in blind:
You generate a clip and the motion is jittery — like someone filmed it during an earthquake. You try again and the face morphs mid-scene into something that belongs in a horror film. You add text and the AI hallucinates words that don't exist, or spells your brand name wrong in four different ways across four seconds. You spend an afternoon burning tokens on prompt variations trying to brute-force quality out of a single tool, and you end up with nothing usable.
I know because I did all of that. The problem isn't the tools. The tools have gotten genuinely impressive. The problem is that most people approach AI video the way you'd approach a vending machine — put in a prompt, expect a finished product to fall out. That's not how any creative production works, AI or otherwise.
The shift that unlocked everything for me was simple: stop thinking about tools and start thinking about workflow.
Once I treated AI video production as a multi-stage system — with each stage having a specific job and a specific tool assigned to it — the quality jumped immediately. Not because I got better at prompting. Because I stopped asking one tool to do everything.
The System: Five Stages That Actually Work
Stage 1: Storyboard First. Always.
Before you touch a single AI video tool, you need a storyboard. Not a vague idea. Not a one-sentence brief. An actual scene-by-scene breakdown of what the video needs to communicate and in what order. This is where I use Claude or Gemini — not to generate the video, but to structure the idea.
Here's the key constraint that most people get wrong: short clips are your friend. For a 30-second ad, you're targeting roughly 10 clips, each running 2 to 4 seconds. That sounds like a lot of clips for a short video, but there's a very good reason for it.
Every AI video tool degrades with clip length. The longer you ask the model to sustain motion, consistency, and coherence, the more it drifts. Characters change. Lighting shifts. Objects appear and disappear. A 4-second clip is manageable. A 12-second clip is a gamble. A 20-second clip is almost always unusable.
Short clips also give you more control over pacing and editing in the final stage. You can cut, reorder, or replace any single 3-second moment without rebuilding the whole thing.
So: use AI to break your concept into scenes, assign each scene a rough duration, and decide what needs to be communicated in each moment. This document becomes your production brief. Don't skip it.
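To make that concrete, here's a minimal sketch of what the production brief can look like once it's structured. This is illustration only: the scene goals, visuals, and durations below are invented placeholders, and in practice Claude or Gemini drafts the structure from your concept and you edit it by hand before moving on.

```python
# A minimal sketch of a storyboard / production brief for a ~30-second ad.
# Scene goals, visuals, and durations are placeholders; an LLM drafts these
# from the concept, and you edit them by hand before Stage 2.

storyboard = [
    {"scene": 1, "duration_s": 3, "goal": "hook",        "visual": "close-up of the problem moment"},
    {"scene": 2, "duration_s": 3, "goal": "agitate",     "visual": "wide shot showing the cost of the problem"},
    {"scene": 3, "duration_s": 4, "goal": "introduce",   "visual": "product enters frame, hero lighting"},
    {"scene": 4, "duration_s": 3, "goal": "demonstrate", "visual": "hands using the product, close-up"},
    {"scene": 5, "duration_s": 3, "goal": "payoff",      "visual": "character reacts, relieved expression"},
    {"scene": 6, "duration_s": 4, "goal": "proof",       "visual": "result shown in context, wide shot"},
    {"scene": 7, "duration_s": 3, "goal": "brand",       "visual": "logo lock-up over final frame"},
    {"scene": 8, "duration_s": 4, "goal": "cta",         "visual": "text overlay with the offer"},
]

# Sanity checks that encode the constraints from this stage:
# every clip stays in the 2-4 second range, and the total lands near 30 seconds.
assert all(2 <= s["duration_s"] <= 4 for s in storyboard)
total = sum(s["duration_s"] for s in storyboard)
print(f"{len(storyboard)} clips, {total} seconds total")  # 8 clips, 27 seconds total
```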
Stage 2: Create Seed Images — One Per Scene
Once you have your storyboard, the next job is to anchor each scene visually before you ever try to generate motion. This is what I call the seed image stage, and it's the single biggest unlock in the whole system.
The way it works: for each scene in your storyboard, you generate a still image that represents what that clip should look like. This image becomes the input for your video generation in Stage 3. You're doing image-to-video, not text-to-video. That distinction matters enormously, and I'll explain why in a moment.
For seed images, I use Adobe Firefly or Leonardo AI. A few things that matter here:
Lock your character early. If your video has a consistent character — a person, a mascot, a presenter — establish that character in your seed images and reuse the same seed prompt or reference image throughout. Character consistency is the thing that makes or breaks whether a video feels coherent or like a slideshow of unrelated clips. The moment your character's face changes between scenes, the whole thing falls apart.
Stick to 16:9. Generate your seed images in 16:9 from the start. Trying to reformat later introduces cropping problems and composition issues that compound downstream.
Generate options, pick fast, move on. Don't try to get the perfect seed image. Generate three or four options per scene, pick the one that best matches your storyboard intent, and move on. The temptation to chase the ideal image at this stage wastes time and doesn't meaningfully improve the final video. A good-enough seed that gets you into Stage 3 quickly beats a perfect one you spent an hour polishing; the motion and the edit decide far more of the final quality than the still does.
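Here's a small sketch of what locking the character looks like at the prompt level. The character description and scene visuals are placeholders, and the exact prompt fields differ between Firefly and Leonardo, but the principle is the same: the character block never changes, only the scene around it does.

```python
# Illustrative sketch: build one seed-image prompt per storyboard scene,
# reusing an identical character block so the subject stays consistent.
# The character description and scene visuals here are placeholders.

CHARACTER = (
    "a woman in her mid-30s, shoulder-length dark hair, olive green jacket, "
    "warm natural lighting, photorealistic"
)
ASPECT_RATIO = "16:9"  # generate in 16:9 from the start; don't reformat later

def seed_prompt(scene_visual: str) -> str:
    """Combine the fixed character block with a per-scene description."""
    return f"{CHARACTER}. Scene: {scene_visual}. Aspect ratio {ASPECT_RATIO}."

scene_visuals = [
    "close-up of the problem moment",
    "product enters frame, hero lighting",
    "character reacts, relieved expression",
]

for visual in scene_visuals:
    print(seed_prompt(visual))
```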
Stage 3: Image-to-Video — And Why Text-to-Video Fails
Here's the thing nobody tells you clearly enough: text-to-video is not production-ready for commercial creative. Not yet. The outputs are inconsistent, the character drift is severe, and you have almost no control over composition.
Image-to-video is different. When you give the model a concrete reference frame — your seed image — it has an anchor. It knows what the character looks like. It knows the lighting, the composition, the color palette. Its job becomes animating that scene rather than imagining it from scratch. The quality difference is significant.
For this stage I use two tools depending on shot type, and the distinction is important:
Kling AI handles wide shots, environmental scenes, and anything with substantial motion across the frame. Product shots, landscapes, dynamic action in a setting — Kling handles these well.
Runway handles faces, hands, and close-ups. If you have a character speaking, reacting, or doing anything detail-sensitive, Runway's closer control over fine motion makes it the better choice. Hands in particular are notoriously difficult for video AI — Runway manages them better than most alternatives.
Settings that actually work across both tools:
Image-to-video mode (not text-to-video)
Clip length: 4 seconds maximum
Motion intensity: medium or low
That last one trips people up. The instinct is to generate more motion — it feels more dynamic, more like a "real" video. But high motion intensity is where jitter lives. It's where your character's face starts sliding. Keep it at medium or low, and use editing in Stage 5 to create the sense of pacing and energy.
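If it helps to see the tool split and the settings in one place, here's a sketch of a per-clip generation plan. The shot-type labels and the routing function are my own shorthand for the decisions above, not anything Kling or Runway exposes; the scenes listed are placeholders.

```python
# Sketch of a per-clip generation plan that encodes the decisions above.
# "shot_type" labels are my own shorthand; neither Kling nor Runway uses them.

DEFAULTS = {
    "mode": "image-to-video",   # always start from the seed image
    "max_duration_s": 4,        # clips degrade past roughly 4 seconds
    "motion_intensity": "low",  # medium is the ceiling; high is where jitter lives
}

def pick_tool(shot_type: str) -> str:
    """Route faces, hands, and close-ups to Runway; everything wide to Kling."""
    return "Runway" if shot_type in ("face", "close-up", "hands") else "Kling"

clips = [
    {"scene": 1, "shot_type": "close-up"},
    {"scene": 2, "shot_type": "wide"},
    {"scene": 4, "shot_type": "hands"},
]

for clip in clips:
    plan = {**DEFAULTS, "tool": pick_tool(clip["shot_type"]), **clip}
    print(plan)
```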
Stage 4: Control the Motion in Your Prompts
Even within image-to-video, your motion prompt matters. And there are specific words that reliably break quality. Avoid: fast, rapid, intense, dynamic, energetic, powerful.
Every one of those words introduces jitter and instability into the output. The model interprets them as instructions to generate exaggerated motion, and exaggerated motion in AI video almost always looks wrong.
For face shots and close-ups, I add the following phrase to every single prompt: "camera holds still, subtle motion only." It sounds almost too simple, but it works. It tells the model to deprioritize camera movement and focus on the subject's natural, minimal motion. The output is dramatically more usable.
For wide shots and environments, you have a bit more latitude — a slow pan or a subtle zoom reads well and doesn't introduce the instability that fast motion does.
The other thing worth knowing: AI video models are sensitive to contradiction in prompts. If your seed image shows a static interior scene and your prompt asks for sweeping camera movement, you're going to get a degraded output. Keep your motion direction consistent with what the seed image implies.
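Here's a small sketch of how you could sanity-check a motion prompt before spending tokens on it. The word list is exactly the one above, and the stabilizing suffix is the phrase I add for face shots; the function itself is illustrative, not part of any tool.

```python
# Illustrative check on a motion prompt before generating: flag the words
# that reliably introduce jitter, and append the stabilizing phrase for
# face shots and close-ups.

JITTER_WORDS = {"fast", "rapid", "intense", "dynamic", "energetic", "powerful"}
STABILIZER = "camera holds still, subtle motion only"

def check_motion_prompt(prompt: str, is_close_up: bool) -> str:
    flagged = [w for w in JITTER_WORDS if w in prompt.lower().split()]
    if flagged:
        raise ValueError(f"Rewrite the prompt without: {flagged}")
    return f"{prompt}, {STABILIZER}" if is_close_up else prompt

print(check_motion_prompt("she looks up and smiles slowly", is_close_up=True))
# -> "she looks up and smiles slowly, camera holds still, subtle motion only"
```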
Stage 5: Stitch, Polish, Publish
You now have 8 to 12 short clips — each 3 to 4 seconds, each controlled, each visually anchored by your seed images. This is where they become an actual ad. I use Canva for this stage, and it does the job well for performance creative.
The Canva workflow:
Import your clips in storyboard order. Trim any frames at the start or end of each clip where the motion hasn't quite settled — AI video often has a slightly unstable first and last frame. Trimming half a second on each end usually cleans it up. (If you'd rather script that trim than do it by hand, there's a sketch at the end of this stage.)
Add your overlays: text, branding, a logo lock-up. This is also where you control pacing through cut timing. A slightly faster cut pace creates energy without requiring you to generate high-motion clips. This is the editing trick that makes restrained, medium-motion clips feel dynamic.
Add music — something with consistent BPM that matches your cut timing. The combination of tight editing and matched audio does more for the feel of the ad than any individual clip quality improvement.
Final output: export at the highest resolution Canva offers and keep a copy of all your individual clips. You'll want them for testing variations.
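Everything above happens inside Canva, and for most people that's the right call. If you'd rather script the repetitive trim-and-stitch step, here's a rough sketch using moviepy, which is my own substitution rather than part of the workflow; the file names are placeholders, the half-second trim matches the rule of thumb from the import step, and method names differ slightly between moviepy versions (this assumes the 1.x API).

```python
# Rough sketch: trim ~0.5s off the head and tail of each generated clip,
# then stitch them in storyboard order. Uses moviepy (pip install moviepy);
# file names are placeholders. Canva's trim tool does the same job by hand.
from moviepy.editor import VideoFileClip, concatenate_videoclips

TRIM_S = 0.5  # AI clips often have an unstable first and last half-second

clip_files = ["scene_01.mp4", "scene_02.mp4", "scene_03.mp4"]

trimmed = []
for path in clip_files:
    clip = VideoFileClip(path)
    trimmed.append(clip.subclip(TRIM_S, clip.duration - TRIM_S))

final = concatenate_videoclips(trimmed)
final.write_videofile("stitched_ad.mp4", fps=24)
```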
What I Learned That Nobody Tells You
After going through this process several times now, here's the honest version of what I know:
Longer clips always produce worse output. I tested this repeatedly. There is no version of a 10-second AI video clip that holds together as well as three 3-second clips stitched together. Build short.
Consistency beats creativity at every stage. The temptation is to make each scene visually interesting on its own terms. The discipline is to make them all feel like they belong together. Consistency of character, lighting, and color palette is what makes a video feel like a produced ad rather than a highlight reel of AI experiments.
Good enough beats perfect, every time. In performance marketing, the creative that ships and gets tested beats the creative that never ships because you were still optimizing it. The system I've described produces good-enough output in a fraction of the time it would take to brief an agency or wait on a designer. That speed is the competitive advantage.
This is a system, not a tool. That's the whole point. No single AI tool produces polished video. A well-structured workflow using several tools for their specific strengths does. The people who are already producing competitive creative with AI aren't better prompters — they're better systems thinkers.
Why This Matters for Performance Marketers Specifically
Creative iteration speed is one of the most underrated levers in performance marketing. The faster you can generate, test, and learn from creative variants, the faster you can identify what works and scale it.
Historically, that lever has been constrained by production cost and turnaround time. You could have a brilliant hypothesis about why a different opening hook would outperform your control — and it would take two weeks and a significant budget to test it.
With this system, that same test takes a day. Maybe less.
That's not a small improvement. That's a structural change in how quickly a solo operator or lean team can move. And in a media environment where attention patterns shift constantly and what worked last quarter may already be fatiguing, speed of creative iteration is a genuine performance advantage.
The tools are ready. The system works. What's left is building the discipline to use it.
See It In Action
The 30-second video I built using this exact workflow is live here: youtube.com/watch?v=APJhHtDzCb0
If you're figuring this out inside your team, or doing it solo like I am, I'm happy to compare notes. You can find me at MK2 Media.
Mahmood is a performance marketing strategist with 20+ years across agencies, media, analytics, and growth. MK2 Media — Performance Marketing, Simplified