MonDive#33: Make Scroll-Stopping Real UGC Ads With AI
Build realistic video ads from product images using free AI tools

Welcome to the MonDive
Today in MonDive, we’re showing how to create realistic UGC-style video ads using free AI tools, starting from a simple product image and turning it into a finished video.
We’ll go step by step through the process so it’s easy to follow and easy to try. By the end, you’ll be able to make natural-looking video ads for your own products without creators, shoots, or editing.
Alright, let’s dive in.
Why This Matters
Making video ads usually takes time, money, and coordination with creators and editors. This workflow removes that complexity and lets you create realistic ads on your own, without depending on anyone else.
It’s easy to use and costs nothing to get started. With free AI tools and a single product image, you can quickly turn ideas into finished video ads without technical skills or a long setup.
This is useful for founders, brands, and marketers who need content fast. You can create ready-to-use videos for promoting your product anywhere, without the usual hassle of production.
How to Make a Realistic UGC Ad With AI
This workflow shows how to create a realistic UGC-style video ad using AI tools without hiring a designer, influencer, or video editor.
Tools used:
ChatGPT (prompts, script, JSON)
Google AI Studio (Nano Banana) (character image, product image, image merge)
Google Flow (frame-to-video)
You can use any product for this method. In this tutorial, I’ll use a men’s wrist watch, but you can replace it with your own product image anytime.
Step 1: Generate a Character Prompt in ChatGPT
Open ChatGPT and generate a prompt for a male AI influencer image.
Sample prompt:
Write a highly detailed image-generation prompt to create a realistic male influencer (UGC creator).
Requirements:
- Age: 25–35
- Looks like a real lifestyle creator from the USA or UK
- Friendly, confident expression
- Casual premium style (clean t-shirt or casual shirt), well-groomed
- Soft natural daylight, realistic shadows
- Background: minimal indoor (apartment/studio) or clean urban blur
- Framing: chest-up portrait with sharp facial focus
- Photorealistic skin texture and hair detail, high resolution
- No text, no watermark, no logos
Also include a separate Negative Prompt to avoid common AI issues.
Return in this format:
Main Prompt:
Negative Prompt:
The prompt ChatGPT gave me:
Main Prompt:
Photorealistic chest-up portrait of a male UGC lifestyle creator, age 25–35, who looks like a real influencer based in the USA or UK. Friendly, confident, approachable expression with a natural relaxed smile. Authentic facial structure, realistic proportions, subtle asymmetry like a real human face. Well-groomed hair and light stubble or clean-shaven, healthy natural skin tone with visible pores, fine skin texture, and micro-details.
Wearing casual premium clothing such as a clean fitted t-shirt or a relaxed casual button-down shirt in neutral tones. Style feels modern, minimal, and influencer-ready — not flashy, not staged. Natural posture, shoulders relaxed.
Soft natural daylight coming from a window or outdoor ambient source, creating gentle realistic shadows on the face and neck. Balanced exposure, no harsh lighting, no dramatic contrast. Color tones are true-to-life and warm.
Background is either a minimal modern indoor apartment or studio setting (neutral walls, soft depth) or a clean urban environment with a subtle background blur. Background remains unobtrusive and realistic.
Camera framing is chest-up portrait, eye-level angle, sharp focus on the face with shallow depth of field. Professional photography look, DSLR or mirrorless quality, 85mm portrait lens feel, cinematic but natural.
Ultra-high resolution, photorealistic rendering, realistic hair strands, accurate light falloff, lifelike skin shading, zero stylization, zero CGI look. Looks indistinguishable from a real smartphone or camera photo taken by a lifestyle creator.
No text, no logos, no watermarks, no branding.
Negative Prompt:
cartoon, illustration, anime, 3D render, CGI, plastic skin, waxy face, over-smooth skin, beauty filter, airbrushed, uncanny valley, distorted face, extra fingers, asymmetrical eyes, cross-eye, unnatural smile, fake teeth, blurry face, low resolution, over-sharpened, oversaturated colors, harsh studio lighting, dramatic shadows, fashion model posing, runway look, overly muscular, celebrity likeness, stock photo look, watermark, logo, text, subtitles, artifacts, noise, deformed hair, unrealistic skin texture
Step 2: Generate the AI Influencer Image
Open Google AI Studio and go to the image generation section.
Paste the prompt from ChatGPT.
Generate and save the best result.
If the face looks inconsistent or distorted, regenerate and pick the cleanest output.
Step 3: Generate a Product Image
You can use this workflow with:
An AI-generated product image, or
Your real product photo (recommended if you already have one)
In this tutorial, we’ll generate a men’s wrist watch.
Go to ChatGPT and paste this prompt:
Write a premium product photography image prompt for a men's wrist watch.
Requirements:
- Shot type: close-up commercial product photography
- Background: clean neutral (white/gray) or subtle gradient studio backdrop
- Lighting: studio softbox lighting, crisp reflections, sharp edges
- Focus: dial, hands, bezel, strap texture clearly visible
- Realistic materials: metal case, glass reflections, leather or steel strap
- Photorealistic, high resolution
- No text, no watermark, no logos
Also include a Negative Prompt.
Return in this format:
Main Prompt:
Negative Prompt:
The prompt ChatGPT gave me:
Main Prompt:
Close-up premium men’s wrist watch product photography, shot in a professional studio environment. The watch is centered and precisely aligned, captured at a slightly angled perspective to showcase depth and craftsmanship. Dial, hands, bezel, crown, and strap texture are in razor-sharp focus, with clean separation from the background.
Background is a clean neutral white or soft gray studio backdrop, or a subtle gradient that transitions smoothly without banding. Minimal, distraction-free composition with a luxury commercial aesthetic.
Lighting uses high-end studio softbox setup with controlled highlights and crisp reflections on the metal case and bezel. Balanced key and fill lights create sharp edges, accurate contours, and realistic shadow falloff beneath the watch. Glass crystal shows natural reflections without glare or overexposure.
Materials are highly realistic: brushed or polished metal case with visible micro-texture, reflective sapphire or mineral glass, and either a premium leather strap with natural grain and stitching or a stainless-steel bracelet with articulated links. True-to-life color accuracy and material response.
Ultra-high-resolution, photorealistic rendering with macro-level detail, professional DSLR or medium-format camera quality, shallow but controlled depth of field. Looks indistinguishable from a real commercial product photo used for luxury e-commerce or advertising.
No text, no branding, no logos, no watermark.
Negative Prompt:
cartoon, illustration, CGI, 3D render, plastic materials, unrealistic reflections, warped glass, blurry dial, unreadable hands, distorted proportions, uneven lighting, harsh shadows, blown highlights, low resolution, noise, grain, oversharpening, oversaturated colors, fake textures, stock photo look, fingerprints, dust, scratches, text, logo, watermark, brand name, background clutter
Then open Google AI Studio and generate the product image.

If you’re using your own product photo instead, save it as product.png and continue.
Step 4: Merge the Influencer + Product Into One Final Frame
The goal is to create a single image where the influencer looks like he’s naturally using/wearing the product.
In ChatGPT, generate a prompt for merging the influencer and the product.
Sample prompt:
I have two images:
1) A male AI influencer portrait
2) A product image
Write an image merge prompt that places the exact product naturally into the influencer photo.
Rules:
- Keep the influencer’s face and identity consistent
- Place the product realistically with correct scale and perspective
- Match lighting and shadows so it looks like one real photo
- Maintain photorealism (no distorted hands, no fake textures)
- Keep background clean and realistic
- No text, no watermark, no logos
Return in this format:
Main Merge Prompt:
Negative Prompt:
The prompt ChatGPT gave me:
Main Merge Prompt:
Merge the exact product from Image 2 into the scene of Image 1 (male AI influencer portrait) so it appears as if the influencer is naturally presenting or wearing the product in a real lifestyle photo.
Preserve the influencer’s face, identity, expression, and body proportions exactly as shown in Image 1. Facial structure, skin texture, hair, and expression must remain unchanged and fully consistent.
Place the product with realistic scale, correct perspective, and natural positioning relative to the influencer’s body. The product should appear physically plausible in the scene — aligned with gravity, body posture, and camera angle. No floating, stretching, or unnatural placement.
Match lighting, color temperature, and shadow direction between the influencer and the product. Shadows should fall naturally on skin or clothing where the product interacts with the body. Reflections on the product must correspond to the ambient light present in the influencer photo.
Maintain photorealistic material behavior: accurate metal, glass, fabric, or surface textures from the original product image. No smoothing, no artificial shine, no texture loss.
Hands (if visible) must be anatomically correct with natural grip, finger spacing, and skin contact. No distortion, extra fingers, or warped joints.
Keep the background clean, realistic, and unchanged, preserving depth of field and environmental consistency from the influencer image. The final result must look like a single, real camera photo taken in one moment.
Ultra-high resolution, realistic skin detail, accurate shadows, seamless blending, zero stylization.
No text, no branding, no logos, no watermark.
Negative Prompt:
face change, identity drift, different person, altered facial features, distorted hands, extra fingers, broken anatomy, floating product, incorrect scale, wrong perspective, mismatched lighting, fake shadows, flat lighting, plastic textures, CGI look, 3D render, cartoon, illustration, over-smoothed skin, beauty filter, blur, noise, artifacts, warped reflections, duplicated objects, text, logo, watermark, branding, unnatural pose, uncanny valley
Now open Google AI Studio:
Upload the male influencer image
Upload the product image
Paste the merge prompt
Generate the merged image

Save this image. It becomes the base frame for your UGC video, so keep generating until you get a result you’re happy with.
Step 5: Write a 4-Scene UGC Script in ChatGPT
For a realistic UGC ad, you need a short script that the influencer will “speak” on video.
Again, we’ll use ChatGPT to generate it.
Sample prompt for ChatGPT:
Write a 4-scene UGC video script for a men's wrist watch.
Requirements:
- Natural UGC tone (authentic, not salesy)
- Simple spoken English
- Each scene must fit into ~5 seconds of dialogue (keep it short)
Structure:
Scene 1: Hook (attention grabber)
Scene 2: Key benefit (comfort/feel)
Scene 3: Design + durability (why it stands out)
Scene 4: CTA (clear next step)
Output format:
Scene 1:
Visual:
Dialogue:
Scene 2:
Visual:
Dialogue:
Scene 3:
Visual:
Dialogue:
Scene 4:
Visual:
Dialogue:
The output ChatGPT gave me:
Scene 1
Visual Prompt:
Vertical 9:16 smartphone video (1080×1920 or higher). Use the uploaded reference image as the strict identity + wardrobe + environment anchor. Keep the same male creator (face, hair, beard, skin texture, body proportions) and the same outfit (dark t-shirt, jeans). Keep the same apartment location by the window with soft natural daylight coming from the left, same background objects and layout.
Shot design: mid-shot in 9:16 from upper chest to mid-thigh (slightly wider than chest-up so the watch arm can be seen naturally). Subject placed centered with slight headroom. Maintain realistic phone-camera optics (natural perspective, mild handheld sway). Start with the creator looking toward camera, smiling naturally, then a quick glance down at the watch and back to camera. Subtle push-in (very slow) like a real handheld move. Photorealistic skin, pores, and hair strands; realistic shadows and window light.
Keep the watch exactly as in the reference image with correct scale and perspective; reflections must match the window light. No visible branding or readable logo text on the watch.
Dialogue:
“I didn’t think a watch could feel this comfortable… until I wore this.”
Negative Prompt:
identity change, face morphing, different person, different haircut, beauty filter, over-smoothed skin, plastic skin, uncanny valley. Distorted hands, extra fingers, warped wrist, broken anatomy. Watch redesign, dial changes, strap changes, wrong scale, floating watch, mismatched reflections, mismatched lighting. Background changes, object popping, warping, flicker, jitter. Overly cinematic moves, fast zoom, camera jumps. Text, subtitles, watermark, logos, branding.
Scene 2
Visual Prompt:
Vertical 9:16 smartphone video. Use the same reference image as the identity and watch anchor. The creator must remain identical. The watch must remain identical and physically correct.
Shot design: start in a mid-shot (upper chest area) and tilt the phone down smoothly to a tight wrist close-up within 1–2 seconds, then hold for detail. The watch fills the center of the frame, with the dial and bezel sharply visible. Depth of field is realistic: watch is sharp, background softly blurred. The creator slowly rotates the wrist left-right to show natural reflections on the glass and metal edges. Strap texture (leather grain or stitching, or metal link detail) stays crisp. Shadows on the wrist and watch must match the left-side window light with realistic falloff.
No readable branding or logo text on the dial. Maintain photorealism (real materials, no CGI shine).
Dialogue:
“It’s light, smooth on the skin, and I honestly forget I’m wearing it.”
Negative Prompt:
blurry dial, unreadable hands, warped glass, melted bezel, fake textures, plastic metal, over-glossy reflections, incorrect lighting direction, flat shadows. Hand deformation, extra fingers, broken wrist angle. Focus pumping, heavy motion blur, flicker. Text, watermark, logos, branding.
Scene 3
Visual Prompt:
Vertical 9:16 smartphone video. Lock identity to the reference image: same face, hairstyle, beard, skin, clothing, and environment by the window. Lock the watch: same exact watch, same strap, same dial color and layout, same case shape, same finish.
Shot design: upper torso framing (chest to just below the watch hand). The creator lifts the wrist naturally into the lower-middle of the frame while keeping the face in the upper third (classic UGC composition). The creator slowly rotates the wrist and slightly angles it toward the window so the watch catches clean, realistic highlights. Keep movement subtle and human (micro hand tremor, breathing). The watch must look solid: crisp edges, realistic metal micro-texture, believable glass reflections, natural shadow under the case and around strap contact points.
Keep the background stable and clean with soft blur; do not introduce new props.
No readable branding or logo text.
Dialogue:
“The design is clean, but it feels solid. You can tell it’s built to last.”
Negative Prompt:
face drift, identity change, unnatural smile, teeth artifacts. Watch changes, dial redesign, strap swap, incorrect scale, floating watch. CGI look, overly sharp “render” edges, fake sparkle, lens flares. Distorted fingers/wrist. Background warping, flicker, object popping. Text, watermark, logos, branding.
Scene 4
Visual Prompt:
Vertical 9:16 smartphone video. Use the reference image as strict identity lock: same creator, same outfit, same room, same left-side daylight. Keep the watch exactly the same.
Shot design: chest-up framing with the face in the top half, slight headroom. The creator looks directly into camera with a friendly confident expression. Minimal handheld movement (stable but real). The creator brings the watch wrist into the frame near chest level for 1–2 seconds (lower-center area) so it’s clearly visible, then relaxes. Lighting and shadows remain consistent across the whole clip. Photoreal skin and hair detail, clean background blur.
No readable branding or logos.
Dialogue:
“If you like simple watches that just work, check this one out.”
Negative Prompt:
identity drift, face morphing, skin smoothing, beauty filter. Distorted hands, extra fingers, warped wrist. Watch redesign, dial changes, wrong reflections, mismatched lighting. Background changes, flicker, camera jumps. Text, watermark, logos, branding.
Step 6: Convert the Script Into Video-Ready JSON Prompts
We’ll use Google Flow (Frame-to-Video) to generate the ad, and it works best when each scene is generated separately. So you’ll convert your script into 4 separate Veo-3 JSON prompts, one per scene, then paste them into Google Flow scene by scene.
Give this prompt to ChatGPT in the same conversation:
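The script prompt asks for roughly 5 seconds of dialogue per scene, so it can help to sanity-check each line before converting it. Below is a rough Python sketch that estimates speaking time from word count. The pace of about 2.5 words per second is an assumption, not a figure from this workflow, and the function names are only illustrative.

```python
# Assumption: ~2.5 spoken words per second is a rough conversational pace.
# It is not a number from this tutorial; tune it to the voice you get.
WORDS_PER_SECOND = 2.5
SCENE_SECONDS = 5  # scene length used throughout this workflow


def estimated_seconds(dialogue, words_per_second=WORDS_PER_SECOND):
    """Estimate how long a line takes to say, using word count only."""
    return len(dialogue.split()) / words_per_second


def fits_scene(dialogue, scene_seconds=SCENE_SECONDS):
    """True if the estimated speaking time is within the scene length."""
    return estimated_seconds(dialogue) <= scene_seconds


# Example with the Scene 4 line from the script above.
line = "If you like simple watches that just work, check this one out."
status = "fits" if fits_scene(line) else "may run long"
print(f"~{estimated_seconds(line):.1f}s ({status})")
```

This is only a word-count estimate. Pauses and delivery change the real timing, so treat a borderline result as a hint to trim a word or two rather than a hard limit.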
You are converting my 4-scene UGC ad script into Veo-3 JSON prompts for Google Flow (Frame-to-Video).
INPUT:
I will paste my script below. Each scene contains:
- Visual:
- Dialogue:
GOAL:
Return 4 separate Veo-3 JSON objects (one per scene) that I can paste into Google Flow one-by-one.
HARD RULES:
- Output exactly 4 JSON objects labeled:
Scene 1 JSON
Scene 2 JSON
Scene 3 JSON
Scene 4 JSON
- Each scene must be 5 seconds long.
- Keep the same male AI influencer identity and style consistent across all scenes.
- Ensure the watch is clearly visible in at least 2 scenes (wrist close-up shots).
- Dialogue must remain exactly as in my script (do not rewrite).
- Visual directions must match the script visuals and be realistic UGC style.
- No extra commentary outside the JSON.
VEO-3 JSON FORMAT:
For each scene, output a JSON object with exactly these keys:
{
  "model": "veo-3",
  "duration_seconds": 5,
  "aspect_ratio": "9:16",
  "prompt": "...",
  "negative_prompt": "...",
  "dialogue": "...",
  "camera": "...",
  "motion": "...",
  "audio": {
    "voice": "male",
    "style": "natural ugc",
    "pace": "normal",
    "tone": "friendly"
  }
}
FIELD GUIDANCE:
- prompt: describe the full scene visually as a frame-to-video instruction. Mention the influencer, setting, watch visibility, lighting, and UGC realism.
- negative_prompt: list issues to avoid (face changes, warped hands, extra fingers, text, logos, watermarks, unnatural skin, flicker).
- camera: simple camera direction (medium shot, close-up wrist, over-the-shoulder, etc.).
- motion: subtle natural movement (hand gesture, wrist raise, head nod, slight body shift).
- dialogue: copy the dialogue line exactly from the script.
NOW CONVERT THIS SCRIPT:
[PASTE YOUR 4-SCENE SCRIPT HERE]
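ChatGPT does not always follow the hard rules exactly, so a quick check of its reply before pasting anything into Google Flow can save credits. The Python sketch below pulls the JSON objects out of a reply and compares them with the format requested in the prompt above. It checks only that requested format. It does not assume anything about how Google Flow itself reads these fields. The helper names and the sample reply are hypothetical.

```python
import json

# Keys copied from the JSON format requested in the prompt above.
REQUIRED_KEYS = {
    "model", "duration_seconds", "aspect_ratio", "prompt",
    "negative_prompt", "dialogue", "camera", "motion", "audio",
}
AUDIO_KEYS = {"voice", "style", "pace", "tone"}


def extract_json_objects(text):
    """Collect every JSON object found in a reply that mixes labels and JSON."""
    decoder = json.JSONDecoder()
    objects, position = [], 0
    while True:
        start = text.find("{", position)
        if start == -1:
            return objects
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            position = start + 1  # not valid JSON here, keep scanning
            continue
        objects.append(obj)
        position = end


def check_scene(scene, expected_dialogue):
    """Return a list of problems for one scene object (empty if none)."""
    problems = []
    if set(scene) != REQUIRED_KEYS:
        missing = sorted(REQUIRED_KEYS - set(scene))
        extra = sorted(set(scene) - REQUIRED_KEYS)
        problems.append(f"keys differ: missing={missing}, extra={extra}")
    if scene.get("duration_seconds") != 5:
        problems.append("duration_seconds is not 5")
    if scene.get("aspect_ratio") != "9:16":
        problems.append("aspect_ratio is not 9:16")
    if scene.get("dialogue") != expected_dialogue:
        problems.append("dialogue was rewritten")
    audio = scene.get("audio")
    if not isinstance(audio, dict) or set(audio) != AUDIO_KEYS:
        problems.append("audio block keys differ")
    return problems


# Hypothetical one-scene reply, built here only to demonstrate the checks.
script_dialogue = "If you like simple watches that just work, check this one out."
scene = {
    "model": "veo-3", "duration_seconds": 5, "aspect_ratio": "9:16",
    "prompt": "...", "negative_prompt": "...", "dialogue": script_dialogue,
    "camera": "...", "motion": "...",
    "audio": {"voice": "male", "style": "natural ugc",
              "pace": "normal", "tone": "friendly"},
}
reply = "Scene 4 JSON\n" + json.dumps(scene, indent=2)

for found in extract_json_objects(reply):
    print(check_scene(found, script_dialogue) or "looks consistent")
```

In practice you would paste ChatGPT's full reply into `reply` and run `check_scene` once per scene with that scene's dialogue line from your script. A non-empty list means that scene is worth regenerating or fixing by hand.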
Step 7: Generate the Full UGC Video in Google Flow
Open Google Flow and create a new project using Frame-to-Video.
Upload:
The merged image (the influencer with the product)
The JSON prompt from ChatGPT
Choose format:
Portrait (best for Reels / Shorts)

Generate Scene 1
Paste Scene 1 Veo-3 JSON into the prompt box and run it.
Once the clip is generated:
Click Add to Scene (or the equivalent add button) so it becomes part of your timeline/story.
Generate Scene 2–4 (extend the video)
For each next scene:
Click Extend / Add next scene
Paste the next JSON (Scene 2, then Scene 3, then Scene 4)
Generate
Add it to the same scene/timeline
Repeat until all 4 scenes are added in order.

Export / Download
After Scene 4 is added, you now have a complete UGC ad sequence. Export/download the final video in your preferred format.
Final Result
Here, we created a realistic UGC-style watch ad where the influencer looks consistent, the watch sits naturally on the wrist, and the overall scene feels like a real lifestyle shoot.

We made this ad end-to-end using only ~50 Google Flow credits, which the free trial credits available after login typically cover.
Swap the watch with your own product image, and you can create the same high-quality, realistic UGC ad for any product, ready to use across all promotion channels without the usual hassle.
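Swapping the product also means swapping the product name in the ChatGPT prompts from Steps 3 and 5. One way to keep that tidy is to treat the name as a placeholder. This is a minimal Python sketch: the two opening lines are taken from the prompts above, while the dictionary and function names are made up for illustration.

```python
# Opening lines of the Step 3 and Step 5 prompts, with the product name
# replaced by a placeholder. The names below are illustrative only.
PROMPT_OPENERS = {
    "product_image": "Write a premium product photography image prompt for {product}.",
    "ugc_script": "Write a 4-scene UGC video script for {product}.",
}


def openers_for(product):
    """Fill the product name into each prompt's opening line."""
    return {
        name: template.format(product=product)
        for name, template in PROMPT_OPENERS.items()
    }


for name, line in openers_for("a men's wrist watch").items():
    print(f"{name}: {line}")
```

Only the opening line changes automatically here. The requirement bullets that name watch parts (dial, hands, bezel, strap) and the watch-specific rules in the Step 6 prompt still need to be rewritten by hand for a different product.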
And that's a wrap on today's MonDive!