AI-Powered Automated Video Production Platform

Overview

Uniclone AI Movie is an end-to-end AI-powered video production platform that transforms simple text ideas into complete short videos. The system orchestrates multiple AI services—story generation, visual synthesis, voice narration, and video editing—to automate the entire creative pipeline.

Demo Videos

See the platform in action with these example outputs generated by the system:

These demonstrations showcase the complete pipeline from text input to final rendered video, including AI-generated visuals, synchronized narration, subtitle overlay, and background music composition.

The Challenge

Creating video content traditionally requires:

Professional scriptwriting and storyboarding
Visual design and illustration
Voice acting and audio production
Video editing and post-production

This process is time-consuming and expensive, and it requires specialized skills. The goal was to build a platform that democratizes video creation by automating these steps with AI while leaving creative control with the user.

Technical Architecture

Backend: FastAPI Microservices

Built a robust Python backend using FastAPI with:

Asynchronous API handlers for long-running AI operations
SQLAlchemy ORM with PostgreSQL for data persistence
Alembic for database migrations
JWT authentication for user sessions, with passlib for password hashing
AWS S3 integration for scalable media storage

Frontend: Angular SPA

Developed a modern Angular 17 single-page application featuring:

Real-time project management with reactive UI updates
Multi-stage workflow for story → storyboard → video generation
Crew Discussion System - an AI director that provides creative feedback
Interactive storyboard editor with drag-and-drop reordering
Voice selection with preview demos for narration

AI Pipeline Integration

1. Story Generation (GPT-3.5)

Converts user ideas into coherent 650-850 character narratives
Supports multiple story types: romance, adventure, mystery, sci-fi, fairy tale
Content moderation for inappropriate inputs

2. Storyboard Segmentation

Dynamically splits stories into N scenes (user-configurable)
Generates visual descriptions for each scene
Creates background music prompts matching story mood

3. Image Generation

Primary: Stability AI (Stable Diffusion v1.6) with custom parameters
Fallback: Unofficial Midjourney API integration
Generates 1024x576 images optimized for video

4. Image-to-Video Conversion

Stability AI's image-to-video API
Configurable motion parameters (motion_bucket_id, cfg_scale)
Automatic polling for completion with retry logic

5. Text-to-Speech Synthesis

OpenAI TTS with 6 voice options (alloy, echo, fable, onyx, nova, shimmer)
Whisper API for timestamp extraction
Automatic SRT subtitle generation
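
The subtitle step can be sketched as follows. This is a minimal illustration, assuming the timestamp extraction yields a list of timed text segments; the Segment class and both function names are hypothetical and not taken from the project code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed chunk of narration (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds in the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

The resulting string can be written to a .srt file and handed to FFmpeg's subtitles filter in the post-production step.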

6. Background Music Generation

Integration with a music generation API (a Suno-like service)
Custom prompt-based composition
Async polling with 30-second intervals

7. Video Post-Production (FFmpeg)

Video stretching to match narration duration
Subtitle overlay with custom fonts
Audio mixing (narration + background music)
Opening title card generation
Multi-clip concatenation into final video

Key Technical Challenges

1. Video Synchronization

Problem: Matching video duration with narration length while maintaining quality.

Solution: Implemented dynamic video stretching using FFmpeg's setpts filter:

video = video.filter("setpts", f"PTS*{stretch_factor}")

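Because setpts rescales every frame's presentation timestamp, a stretch_factor above 1 slows the clip down so that it lasts as long as its narration. The stretched video is then combined with the mixed audio (narration plus background music) to produce synchronized output.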
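
How stretch_factor is computed is not shown in this write-up. A minimal sketch, assuming it is simply the ratio of narration length to clip length (both function names are hypothetical):

```python
def compute_stretch_factor(clip_seconds: float, narration_seconds: float) -> float:
    """Factor to multiply PTS by so the clip lasts as long as the narration."""
    if clip_seconds <= 0 or narration_seconds <= 0:
        raise ValueError("durations must be positive")
    return narration_seconds / clip_seconds


def setpts_expression(stretch_factor: float) -> str:
    """Expression string for FFmpeg's setpts filter."""
    return f"PTS*{stretch_factor}"
```

For example, a 4-second clip paired with 10 seconds of narration gives a factor of 2.5, so the clip plays at 40% of its original speed.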

2. Async AI Service Orchestration

Problem: Multiple AI APIs with varying response times (2s for GPT, 30-60s for video generation).

Solution:

Designed polling-based architecture for long-running tasks
Implemented retry logic with exponential backoff
Used FastAPI's async capabilities for non-blocking operations
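
The polling pattern can be sketched with asyncio alone. This is an illustration under assumptions, not the project's code: poll_until_done, its parameters, and the convention that check returns None while the job is still running are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def poll_until_done(
    check: Callable[[], Awaitable[Optional[str]]],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    max_attempts: int = 10,
) -> str:
    """Call `check` until it returns a result, doubling the wait each time."""
    delay = initial_delay
    for attempt in range(max_attempts):
        result = await check()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Non-blocking sleep: the event loop keeps serving other requests.
            await asyncio.sleep(delay)
            # Exponential backoff, capped so waits do not grow without bound.
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job not finished after {max_attempts} checks")
```

Inside a FastAPI handler this would be awaited directly, with check wrapping a status request to the external service. Errors raised by check are not retried in this sketch.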

3. State Management Across Services

Problem: Tracking project state across 7+ generation steps with potential failures.

Solution:

Built comprehensive database schema with relationships (User → Project → Storyboard → Advice)
Implemented status tracking for each generation phase
Created rollback mechanisms for failed operations
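
Per-phase status tracking can be illustrated in memory. This is a sketch only: the step names, StepStatus, and ProjectState are hypothetical, and the real system stores this state in PostgreSQL, where rolling back a failed operation would also involve the database transaction (not shown here).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class StepStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Hypothetical step names, loosely following the pipeline stages.
STEPS = ["story", "storyboard", "images", "video", "speech", "music", "render"]


@dataclass
class ProjectState:
    statuses: dict = field(
        default_factory=lambda: {step: StepStatus.PENDING for step in STEPS}
    )

    def run_step(self, step: str, action: Callable[[], None]) -> None:
        """Run one step; on failure mark it FAILED and leave earlier steps intact."""
        self.statuses[step] = StepStatus.RUNNING
        try:
            action()
        except Exception:
            self.statuses[step] = StepStatus.FAILED
            raise
        self.statuses[step] = StepStatus.DONE

    def next_step(self) -> Optional[str]:
        """First step that is not DONE, i.e. where a retry should resume."""
        for step in STEPS:
            if self.statuses[step] is not StepStatus.DONE:
                return step
        return None
```

Recording a status per step means a failure in, say, music generation does not force the story and images to be regenerated.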

4. Media Storage & Delivery

Problem: Handling large video files (10-50MB each) efficiently.

Solution:

Direct S3 uploads with pre-signed URLs
CDN delivery (CloudFront-like) for low-latency streaming
Lazy loading for project thumbnails

Advanced Features

AI Director Crew System

Implemented a unique "crew discussion" feature where AI agents provide directorial feedback:

Analyzes storyboard descriptions for visual coherence
Suggests improvements for scene composition
Modifies prompts for better image generation
Maintains conversation history per project

Dynamic Storyboard Reordering

Built a drag-and-drop interface with backend synchronization:

Optimistic UI updates for responsiveness
Server-side order validation and cascading updates
Automatic reindexing of dependent resources
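
The server-side validation and reindexing might look like the sketch below. The function name and the use of integer scene IDs are assumptions made for illustration.

```python
def apply_reorder(current_ids: list[int], requested_ids: list[int]) -> dict[int, int]:
    """Validate a requested order and return {storyboard_id: new_index}."""
    if len(set(requested_ids)) != len(requested_ids):
        raise ValueError("duplicate storyboard id in requested order")
    if set(requested_ids) != set(current_ids):
        raise ValueError("requested order must contain exactly the project's scenes")
    # Reindex from 0 so positions stay contiguous after the move.
    return {scene_id: index for index, scene_id in enumerate(requested_ids)}
```

If validation fails, the client can revert its optimistic update to the last order confirmed by the server.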

Opening Title Card Generator

Created a custom FFmpeg pipeline for animated title cards:

Fade in/out animations with Lanczos scaling
Dynamic text positioning and font rendering
Silent audio track for seamless concatenation
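
A title card of this kind can be approximated with the ffmpeg command line. The sketch below only builds the argument list and makes several assumptions: a plain black background, default font lookup for drawtext, and a hypothetical function name and defaults. Lanczos scaling would apply when a background image is resized (scale with flags=lanczos) and is not shown.

```python
def title_card_command(
    title: str,
    output_path: str,
    *,
    width: int = 1024,
    height: int = 576,
    duration: float = 3.0,
    fade: float = 0.5,
) -> list[str]:
    """Build an ffmpeg argv for a black title card with fades and silent audio."""
    # NOTE: drawtext needs special characters (':', "'", '\\', '%') escaped;
    # this sketch assumes a plain title without them.
    drawtext = (
        f"drawtext=text='{title}':fontcolor=white:fontsize=48:"
        "x=(w-text_w)/2:y=(h-text_h)/2"
    )
    fade_in = f"fade=t=in:st=0:d={fade}"
    fade_out = f"fade=t=out:st={duration - fade}:d={fade}"
    return [
        "ffmpeg", "-y",
        # Solid-colour video source generated by ffmpeg itself.
        "-f", "lavfi", "-i", f"color=c=black:s={width}x{height}:d={duration}",
        # Silent audio so the card has the same stream layout as narrated clips.
        "-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=44100",
        "-vf", ",".join([drawtext, fade_in, fade_out]),
        "-t", str(duration),
        "-c:v", "libx264", "-c:a", "aac", "-pix_fmt", "yuv420p",
        output_path,
    ]
```

The list can be passed to subprocess.run. Giving the card an audio stream matters because concatenation is simpler when every clip has the same set of streams.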

Results & Impact

Automated 90% of the traditional video production pipeline
Reduced production time from hours to 5-10 minutes
Enabled non-technical users to create professional videos
Modular architecture allows easy integration of new AI models

Code Highlights

Storyboard Generation with Validation

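import json

max_retries = 3
retry_count = 0
while retry_count < max_retries:
    chat_completion = await gpt.client.chat.completions.create(
        messages=[...],
        model="gpt-3.5-turbo",
        response_format={"type": "json_object"}
    )
    # Pull the JSON text out of the first choice
    story_content = chat_completion.choices[0].message.content
    try:
        story_json = json.loads(story_content)
        # Reject well-formed JSON that lacks the expected key
        if "normal_output" not in story_json:
            raise ValueError("Invalid JSON structure")
        break
    except (json.JSONDecodeError, ValueError):
        retry_count += 1
else:
    # Runs only when every attempt failed (the loop ended without break)
    raise RuntimeError("No valid storyboard JSON after retries")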

FFmpeg Video Processing Pipeline

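import ffmpeg  # ffmpeg-python bindings

# `video` and `speech` are streams opened earlier with ffmpeg.input(...)
# Stretch video to match speech duration
video = video.filter("setpts", f"PTS*{stretch_factor}")
# Burn in subtitles after stretching, so cue times line up with the narration
video = video.filter("subtitles", srt_path)
# Combine with audio and encode as H.264 video with AAC audio
stream = ffmpeg.output(video, speech, output_file,
                       vcodec="libx264", acodec="aac")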

What I Learned

Multi-service orchestration: Coordinating 5+ external APIs with different response patterns
Video processing at scale: FFmpeg optimization for batch operations
Async Python patterns: Leveraging asyncio for concurrent AI requests
Database design for media: Modeling complex relationships with file references
Error resilience: Building retry logic and fallback strategies for unreliable services

Future Enhancements

WebSocket support for real-time progress updates
Fine-tuned models for better prompt engineering
Video style transfer and filters
Multi-language subtitle support
Collaborative editing features

Source Code: GitHub Repository
Tech Stack: FastAPI, Angular 17, PostgreSQL, OpenAI, Stability AI, FFmpeg, AWS S3