Overview

AI Reels Generator is a SaaS platform that uses Google Gemini's multimodal capabilities to turn plain-text prompts into short-form video optimized for Instagram Reels, TikTok, and YouTube Shorts. The platform handles script generation, scene composition, voiceover synthesis, and final export inside a single workflow.

Problem

Content creators and marketing teams spend 4–6 hours producing a single short-form reel. The process spans scriptwriting, visual sourcing, editing, captioning, and platform-specific formatting. Smaller teams with no dedicated video editor are effectively excluded from this content channel despite it being the highest-reach format on most platforms today.

Research

After surveying 40+ content creators and social media managers, three patterns emerged:

▸78% identified ideation and scripting as the biggest time sinks, not editing
▸65% struggled with platform-specific formatting — aspect ratios, caption burn-ins, hook pacing
▸91% would pay for a tool that reduced production time by more than half

Existing workflows required a separate tool for each stage (script → video → captions), which forced constant context-switching and produced no unified output format. We identified a clear opportunity: a single pipeline from prompt to publishable reel.

Solution

We built an end-to-end generation pipeline:

▸User enters a text prompt describing the reel concept
▸Gemini 1.5 Pro generates a structured script with timestamped scene descriptions
▸The pipeline fetches royalty-free B-roll footage via the Pexels API
▸Gemini TTS synthesizes voiceover audio per scene
▸FFmpeg assembles scenes, overlays captions, and renders the final MP4
▸Supabase stores assets and streams real-time job progress to the UI

The frontend shows live progress for each pipeline step, which keeps the process transparent to users and reduces perceived wait time.

Technical Architecture

The system is composed of four layers:

AI Layer — Gemini 1.5 Pro handles script generation with structured JSON output enforced via function calling. Custom prompt engineering ensures consistent scene descriptions that translate predictably to visual queries.
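To illustrate what a strict script schema can look like on the consuming side, here is a minimal validation sketch. The field names (`title`, `scenes`, `start`, `end`, `description`, `narration`) are assumptions made for this example, not the platform's actual schema, and the code only checks already-returned JSON rather than calling any Gemini API.

```typescript
// Hypothetical script shape; all field names are assumptions for illustration.
interface Scene {
  start: number;       // scene start, in seconds from the beginning of the reel
  end: number;         // scene end, in seconds
  description: string; // visual description, later reused as a footage query
  narration: string;   // text handed to the voiceover step
}

interface ReelScript {
  title: string;
  scenes: Scene[];
}

function isScene(value: unknown): value is Scene {
  if (typeof value !== "object" || value === null) return false;
  const { start, end, description, narration } = value as Record<string, unknown>;
  return (
    typeof start === "number" &&
    typeof end === "number" &&
    start >= 0 &&
    end > start &&
    typeof description === "string" &&
    description.trim().length > 0 &&
    typeof narration === "string"
  );
}

// Parses raw model output and rejects anything that does not match the shape,
// including scenes that overlap the scene before them.
function parseReelScript(raw: string): ReelScript {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Script is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Script must be a JSON object");
  }
  const { title, scenes: rawScenes } = data as Record<string, unknown>;
  if (typeof title !== "string" || !Array.isArray(rawScenes) || rawScenes.length === 0) {
    throw new Error("Script needs a title and at least one scene");
  }
  const scenes: Scene[] = [];
  let previousEnd = 0;
  rawScenes.forEach((scene: unknown, i: number) => {
    if (!isScene(scene)) throw new Error(`Scene ${i} is malformed`);
    if (scene.start < previousEnd) throw new Error(`Scene ${i} overlaps the previous scene`);
    previousEnd = scene.end;
    scenes.push(scene);
  });
  return { title, scenes };
}
```

Rejecting a malformed script at this boundary means the media steps downstream never have to guess at missing or inconsistent timestamps.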

Media Pipeline — A Node.js worker process orchestrates FFmpeg commands via child_process. Each generation job runs in isolation with a unique job ID persisted in Supabase. The worker runs on a dedicated VPS to avoid the filesystem and binary-size limits of serverless platforms.
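A worker of this kind might wrap FFmpeg roughly as follows. This is a sketch under stated assumptions: the `SceneAssets` shape, the file paths, and the choice of codecs are hypothetical, and the flags shown cover only muxing one clip with its voiceover, not the full assembly and caption overlay.

```typescript
import { spawn } from "node:child_process";

// Hypothetical per-scene inputs; names and paths are assumptions.
interface SceneAssets {
  videoPath: string;  // fetched B-roll clip
  audioPath: string;  // synthesized voiceover for the scene
  outputPath: string; // where the muxed scene is written
}

// Builds the FFmpeg argument list for combining one clip with its voiceover.
function buildSceneArgs(assets: SceneAssets): string[] {
  return [
    "-y",                   // overwrite the output file if it exists
    "-i", assets.videoPath, // input 0: video
    "-i", assets.audioPath, // input 1: audio
    "-map", "0:v:0",        // take video from input 0
    "-map", "1:a:0",        // take audio from input 1
    "-c:v", "libx264",
    "-c:a", "aac",
    "-shortest",            // stop at the end of the shorter stream
    assets.outputPath,
  ];
}

// Runs FFmpeg without a shell and rejects with the tail of stderr on failure.
function runFfmpeg(args: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", args, { stdio: ["ignore", "ignore", "pipe"] });
    let stderr = "";
    proc.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    proc.on("error", reject); // e.g. the ffmpeg binary is not installed
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.slice(-500)}`));
    });
  });
}
```

Passing arguments as an array to `spawn`, rather than building a shell string, keeps user-influenced values such as file names from being interpreted by a shell.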

Realtime Layer — Supabase Realtime channels broadcast job state changes (queued → processing → rendering → done) to subscribed clients. The frontend uses a custom useJobProgress hook to consume these events.
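The state progression above can be modeled as a small forward-only reducer, which is the kind of logic a hook like useJobProgress could delegate to. The `JobEvent` payload shape and both function names are assumptions for this sketch; the subscription itself is omitted.

```typescript
// The four states named above, in pipeline order.
const JOB_STATES = ["queued", "processing", "rendering", "done"] as const;
type JobState = (typeof JOB_STATES)[number];

// Hypothetical event payload; the real broadcast shape may differ.
interface JobEvent {
  jobId: string;
  state: JobState;
}

// Accepts an event only if it moves the job forward.
// Duplicate or out-of-order events leave the current state unchanged.
function nextJobState(current: JobState, event: JobEvent): JobState {
  const from = JOB_STATES.indexOf(current);
  const to = JOB_STATES.indexOf(event.state);
  return to > from ? event.state : current;
}

// Maps a state to a coarse percentage for a progress bar.
function progressPercent(state: JobState): number {
  return Math.round((JOB_STATES.indexOf(state) / (JOB_STATES.length - 1)) * 100);
}
```

For example, `nextJobState("rendering", { jobId: "j1", state: "processing" })` stays at `"rendering"`. Allowing forward jumps (queued straight to rendering) is a deliberate choice in this sketch so that a missed event does not stall the UI.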

Storage — Supabase Storage holds raw fetched assets and final renders. Signed URLs provide time-limited access for downloads.

Challenges Faced

FFmpeg in serverless environments was the largest infrastructure challenge. Vercel functions have a 250MB size limit, no persistent filesystem, and short execution timeouts, all of which are incompatible with FFmpeg. The solution was to offload media work entirely to a VPS job worker and treat Vercel as a pure API and UI layer.

Gemini rate limits required per-user quota tracking. We store monthly generation counts in Supabase and enforce limits before job submission, returning descriptive errors rather than failed API calls.
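A pre-submission quota check of this kind can be a small pure function. The sketch below assumes the monthly count has already been read from storage; `checkQuota`, `QuotaResult`, and the message wording are illustrative names, not the platform's actual code.

```typescript
interface QuotaResult {
  allowed: boolean;
  remaining: number; // generations left this month, before the new job is counted
  message: string;   // user-facing explanation
}

// Decides whether a new generation job may be submitted.
// usedThisMonth is assumed to come from a per-user counter in the database.
function checkQuota(usedThisMonth: number, monthlyLimit: number): QuotaResult {
  const remaining = Math.max(0, monthlyLimit - usedThisMonth);
  if (remaining === 0) {
    return {
      allowed: false,
      remaining,
      message: `Monthly limit of ${monthlyLimit} generations reached. The quota resets next month.`,
    };
  }
  return {
    allowed: true,
    remaining,
    message: `${remaining} of ${monthlyLimit} generations remaining this month.`,
  };
}
```

In a concurrent setting, the read and the increment would need to happen atomically (for instance in a single database statement) so that two simultaneous submissions cannot both pass the check; this sketch leaves that part out.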

Caption timing needed precise synchronization between TTS audio duration and the script's scene timestamps. We solved this by having Gemini output expected durations per scene, then adjusting FFmpeg filter timestamps based on actual TTS output duration.
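The adjustment described above amounts to rebuilding the caption timeline from measured audio lengths. The following sketch assumes hypothetical `PlannedScene` and `CaptionCue` shapes and falls back to the expected duration when no measurement is available. The `enable` expression uses FFmpeg's `between(t, start, end)` timeline syntax as one possible way to apply the cues; the source does not say which filter options the platform uses.

```typescript
// Hypothetical shapes for this example.
interface PlannedScene {
  caption: string;
  expectedSeconds: number; // duration estimated in the generated script
}

interface CaptionCue {
  caption: string;
  start: number; // seconds
  end: number;   // seconds
}

// Lays scenes end to end using measured audio durations.
// A missing or non-positive measurement falls back to the expected duration.
function buildCaptionCues(
  scenes: PlannedScene[],
  actualSeconds: Array<number | undefined>,
): CaptionCue[] {
  const cues: CaptionCue[] = [];
  let cursor = 0;
  scenes.forEach((scene, i) => {
    const measured = actualSeconds[i];
    const duration =
      measured !== undefined && measured > 0 ? measured : scene.expectedSeconds;
    cues.push({ caption: scene.caption, start: cursor, end: cursor + duration });
    cursor += duration;
  });
  return cues;
}

// Formats a cue as an FFmpeg timeline expression for a filter's enable option.
function enableExpression(cue: CaptionCue): string {
  return `between(t,${cue.start.toFixed(3)},${cue.end.toFixed(3)})`;
}
```

Because each cue starts where the previous one ended, a voiceover that runs long or short shifts every later caption by the same amount, which keeps text and audio aligned across the whole reel.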

Results

The platform launched to a closed beta of 120 users. After one month:

▸Average reel production time dropped from 4 hours to 8 minutes
▸Beta users reported generating 3× more content per week
▸92% satisfaction rate in post-session surveys across all user segments
▸2,400+ reels generated in the first month with zero critical errors

Future Improvements

The next phase focuses on brand consistency and distribution:

▸Brand kit support — custom fonts, color palettes, and logo overlays per workspace
▸Multi-language voiceover via Gemini's multilingual capabilities
▸Direct publishing to Instagram Graph API and TikTok Creator API
▸Template library with 50+ niche-specific reel formats
▸Hook A/B testing with engagement prediction scoring

Learnings

Building AI-powered pipelines taught me that determinism matters more than creativity at the architecture level. Users need to predict what they'll get. We spent significant effort making Gemini's output consistent through structured prompting and strict JSON schemas — this reliability was what users praised most, not the AI itself.

The broader insight: AI tools succeed when they eliminate friction, not when they replace humans. The best feedback from beta wasn't "this does everything" — it was "this gets me to a draft I'm proud to edit."
