AI-Powered Automated Video Production Platform

FastAPI · Angular · OpenAI GPT · Stability AI · FFmpeg · PostgreSQL · AWS S3 · Python · TypeScript

Overview

Uniclone AI Movie is an end-to-end AI-powered video production platform that transforms simple text ideas into complete short videos. The system orchestrates multiple AI services—story generation, visual synthesis, voice narration, and video editing—to automate the entire creative pipeline.

Demo Videos

See the platform in action with these example outputs generated by the system:

These demonstrations showcase the complete pipeline from text input to final rendered video, including AI-generated visuals, synchronized narration, subtitle overlay, and background music composition.

The Challenge

Creating video content traditionally requires:

  • Professional scriptwriting and storyboarding
  • Visual design and illustration
  • Voice acting and audio production
  • Video editing and post-production

This process is time-consuming, expensive, and requires specialized skills. The goal was to build a platform that democratizes video creation by automating these steps through AI while maintaining creative control.

Technical Architecture

Backend: FastAPI Microservices

Built a robust Python backend using FastAPI with:

  • Asynchronous API handlers for long-running AI operations
  • SQLAlchemy ORM with PostgreSQL for data persistence
  • Alembic for database migrations
  • JWT authentication for user sessions, with passlib for password hashing
  • AWS S3 integration for scalable media storage

Frontend: Angular SPA

Developed a modern Angular 17 single-page application featuring:

  • Real-time project management with reactive UI updates
  • Multi-stage workflow for story → storyboard → video generation
  • Crew Discussion System - an AI director that provides creative feedback
  • Interactive storyboard editor with drag-and-drop reordering
  • Voice selection with preview demos for narration

AI Pipeline Integration

1. Story Generation (GPT-3.5)

  • Converts user ideas into coherent 650-850 character narratives
  • Supports multiple story types: romance, adventure, mystery, sci-fi, fairy tale
  • Content moderation for inappropriate inputs

2. Storyboard Segmentation

  • Dynamically splits stories into N scenes (user-configurable)
  • Generates visual descriptions for each scene
  • Creates background music prompts matching story mood

3. Image Generation

  • Primary: Stability AI (Stable Diffusion v1.6) with custom parameters
  • Fallback: Unofficial Midjourney API integration
  • Generates 1024x576 images optimized for video

4. Image-to-Video Conversion

  • Stability AI's image-to-video API
  • Configurable motion parameters (motion_bucket_id, cfg_scale)
  • Automatic polling for completion with retry logic
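The polling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual client: `fetch_status` is a hypothetical coroutine standing in for the HTTP call that checks a generation job, and the status strings, interval, and attempt cap are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[dict]],
    interval_s: float = 10.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch_status repeatedly until the job reports a terminal state.

    fetch_status is a hypothetical coroutine returning a dict such as
    {"status": "in-progress"} or {"status": "complete", ...}.
    """
    for attempt in range(max_attempts):
        result = await fetch_status()
        status = result.get("status")
        if status == "complete":
            return result
        if status == "failed":
            raise RuntimeError(f"generation failed: {result}")
        # Still running: wait before asking again, unless this was the last try.
        if attempt < max_attempts - 1:
            await asyncio.sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

Capping the number of polls keeps a stuck job from blocking the pipeline indefinitely; the caller can then decide whether to resubmit or mark the scene as failed.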

5. Text-to-Speech Synthesis

  • OpenAI TTS with 6 voice options (alloy, echo, fable, onyx, nova, shimmer)
  • Whisper API for timestamp extraction
  • Automatic SRT subtitle generation
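Turning timestamps into subtitles is mostly string formatting. The sketch below assumes the transcription step has already been reduced to `(start_seconds, end_seconds, text)` tuples; that tuple shape and the function names are illustrative, not taken from the project.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, then the subtitle text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, `segments_to_srt([(0.0, 1.5, "Hello")])` produces a single cue running from `00:00:00,000` to `00:00:01,500`.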

6. Background Music Generation

  • Integration with music generation API (Suno-like service)
  • Custom prompt-based composition
  • Async polling with 30-second intervals

7. Video Post-Production (FFmpeg)

  • Video stretching to match narration duration
  • Subtitle overlay with custom fonts
  • Audio mixing (narration + background music)
  • Opening title card generation
  • Multi-clip concatenation into final video
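One common way to join clips is FFmpeg's concat demuxer, which reads a text file listing the inputs. The sketch below only builds that list file and the command line; whether the project uses the concat demuxer or another method is an assumption, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def write_concat_list(clips: Sequence[str], list_path: str) -> str:
    """Write an FFmpeg concat-demuxer list file and return its contents."""
    lines = []
    for clip in clips:
        # A single quote inside a quoted path is written as '\''.
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    content = "\n".join(lines) + "\n"
    Path(list_path).write_text(content, encoding="utf-8")
    return content


def concat_command(list_path: str, output_path: str) -> List[str]:
    """Return the ffmpeg argument list that joins the listed clips."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",  # stream copy: no re-encode
        output_path,
    ]
```

Stream copy only works when every clip shares the same codecs and stream parameters, which is one reason a title card would need a matching (even if silent) audio track.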

Key Technical Challenges

1. Video Synchronization

Problem: Matching video duration with narration length while maintaining quality.

Solution: Implemented dynamic video stretching using FFmpeg's setpts filter, with stretch_factor set to the narration duration divided by the clip duration:

video = video.filter("setpts", f"PTS*{stretch_factor}")

The stretched clip is then combined with the mixed audio (narration plus background music) so that picture and sound end together.
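The stretch factor itself is simple arithmetic: multiplying every presentation timestamp by a factor scales the clip's duration by the same factor. A minimal sketch, with hypothetical helper names and durations assumed to be measured in seconds:

```python
def stretch_factor(narration_s: float, clip_s: float) -> float:
    """Factor by which clip timestamps are scaled to match the narration length."""
    if narration_s <= 0 or clip_s <= 0:
        raise ValueError("durations must be positive")
    return narration_s / clip_s


def setpts_expression(factor: float) -> str:
    """Build the expression string passed to FFmpeg's setpts filter."""
    return f"PTS*{factor:.6f}"
```

A 4-second clip paired with 10 seconds of narration gives a factor of 2.5, so the clip plays at 40% speed. A factor below 1 would speed the clip up instead.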

2. Async AI Service Orchestration

Problem: Multiple AI APIs with varying response times (2s for GPT, 30-60s for video generation).

Solution:

  • Designed polling-based architecture for long-running tasks
  • Implemented retry logic with exponential backoff
  • Used FastAPI's async capabilities for non-blocking operations
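Retry with exponential backoff can be sketched as a small wrapper around any awaitable call. The names, default delays, and jitter range below are assumptions chosen for illustration, not the project's actual values.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
) -> T:
    """Await operation(), retrying on failure with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay_s,
            # with random jitter so parallel tasks do not retry in lockstep.
            delay = min(base_delay_s * 2 ** attempt, max_delay_s)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")
```

Because the wrapper awaits `asyncio.sleep` rather than blocking, other requests keep being served while one task waits out its delay.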

3. State Management Across Services

Problem: Tracking project state across 7+ generation steps with potential failures.

Solution:

  • Built comprehensive database schema with relationships (User → Project → Storyboard → Advice)
  • Implemented status tracking for each generation phase
  • Created rollback mechanisms for failed operations
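Per-phase status tracking might look something like the sketch below. The phase names, status values, and the reset-on-failure behaviour are assumptions made for illustration; the real schema lives in the database models and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Phase(str, Enum):
    STORY = "story"
    STORYBOARD = "storyboard"
    IMAGES = "images"
    VIDEO = "video"
    SPEECH = "speech"
    MUSIC = "music"
    FINAL = "final"


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class ProjectState:
    phases: Dict[Phase, Status] = field(
        default_factory=lambda: {p: Status.PENDING for p in Phase}
    )

    def start(self, phase: Phase) -> None:
        self.phases[phase] = Status.RUNNING

    def finish(self, phase: Phase) -> None:
        self.phases[phase] = Status.DONE

    def fail(self, phase: Phase) -> None:
        """Mark a phase failed and reset every later phase to pending."""
        order = list(Phase)
        self.phases[phase] = Status.FAILED
        for later in order[order.index(phase) + 1:]:
            self.phases[later] = Status.PENDING

    def next_phase(self) -> Optional[Phase]:
        """Return the first phase that is not done, or None when finished."""
        for phase in Phase:
            if self.phases[phase] is not Status.DONE:
                return phase
        return None
```

With state recorded per phase, a failure partway through the pipeline can resume from the failed step instead of regenerating the story and images that already succeeded.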

4. Media Storage & Delivery

Problem: Handling large video files (10-50MB each) efficiently.

Solution:

  • Direct S3 uploads with pre-signed URLs
  • CloudFront-like delivery for low-latency streaming
  • Lazy loading for project thumbnails

Advanced Features

AI Director Crew System

Implemented a unique "crew discussion" feature where AI agents provide directorial feedback:

  • Analyzes storyboard descriptions for visual coherence
  • Suggests improvements for scene composition
  • Modifies prompts for better image generation
  • Maintains conversation history per project

Dynamic Storyboard Reordering

Built drag-and-drop interface with backend synchronization:

  • Optimistic UI updates for responsiveness
  • Server-side order validation and cascading updates
  • Automatic reindexing of dependent resources
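The server-side half of a reorder can be reduced to two steps: validate and apply the move, then recompute every scene's position. A minimal sketch with hypothetical function names, assuming scenes are identified by integer ids:

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def move_item(items: List[T], old_index: int, new_index: int) -> List[T]:
    """Return a new list with one item moved; indices are validated first."""
    if not (0 <= old_index < len(items) and 0 <= new_index < len(items)):
        raise IndexError("scene index out of range")
    reordered = list(items)
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


def reindex(scene_ids: List[int]) -> Dict[int, int]:
    """Map each scene id to its new 0-based position after a reorder."""
    return {scene_id: position for position, scene_id in enumerate(scene_ids)}
```

Moving the first of four scenes to the third slot, `move_item([10, 20, 30, 40], 0, 2)`, yields `[20, 30, 10, 40]`. If the server rejects the move, the client can revert its optimistic update.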

Opening Title Card Generator

Created custom FFmpeg pipeline for animated title cards:

  • Fade in/out animations with Lanczos scaling
  • Dynamic text positioning and font rendering
  • Silent audio track for seamless concatenation

Results & Impact

  • Automated 90% of the traditional video production pipeline
  • Reduced production time from hours to 5-10 minutes
  • Enabled non-technical users to create professional videos
  • Modular architecture allows easy integration of new AI models

Code Highlights

Storyboard Generation with Validation

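import json

# Excerpt from inside an async handler
max_retries = 3
retry_count = 0
while retry_count < max_retries:
    chat_completion = await gpt.client.chat.completions.create(
        messages=[...],
        model="gpt-3.5-turbo",
        response_format={"type": "json_object"}
    )
    # Extract the reply text before parsing it
    story_content = chat_completion.choices[0].message.content
    try:
        story_json = json.loads(story_content)
        if "normal_output" not in story_json:
            raise ValueError("Invalid JSON structure")
        break
    except (json.JSONDecodeError, ValueError, TypeError):
        # Malformed, incomplete, or empty reply: ask the model again
        retry_count += 1
else:
    # Every attempt returned unusable JSON
    raise RuntimeError(f"No valid storyboard after {max_retries} attempts")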

FFmpeg Video Processing Pipeline

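import ffmpeg  # ffmpeg-python

# Inputs (video_path and speech_path are assumed names for the scene clip and narration)
video = ffmpeg.input(video_path).video
speech = ffmpeg.input(speech_path).audio
# Stretch video to match speech duration
video = video.filter("setpts", f"PTS*{stretch_factor}")
# Burn in subtitles from the generated SRT file
video = video.filter("subtitles", srt_path)
# Combine with audio and encode
stream = ffmpeg.output(video, speech, output_file,
                       vcodec="libx264", acodec="aac")
stream.run(overwrite_output=True)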

What I Learned

  • Multi-service orchestration: Coordinating 5+ external APIs with different response patterns
  • Video processing at scale: FFmpeg optimization for batch operations
  • Async Python patterns: Leveraging asyncio for concurrent AI requests
  • Database design for media: Modeling complex relationships with file references
  • Error resilience: Building retry logic and fallback strategies for unreliable services
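As an illustration of the asyncio pattern mentioned above, independent per-scene requests can run concurrently while a semaphore caps how many are in flight. This is a sketch under assumptions: `generate_one` is a hypothetical coroutine standing in for any per-scene API call, and the concurrency limit is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def generate_all(
    prompts: Sequence[str],
    generate_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> List[str]:
    """Run generate_one for every prompt, at most max_concurrent at a time.

    Results come back in the same order as prompts.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        # The semaphore keeps the number of in-flight requests bounded.
        async with semaphore:
            return await generate_one(prompt)

    return list(await asyncio.gather(*(limited(p) for p in prompts)))
```

Bounding concurrency matters with rate-limited providers: launching every scene at once is faster on paper but tends to trigger throttling errors.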

Future Enhancements

  • WebSocket support for real-time progress updates
  • Fine-tuned models for better prompt engineering
  • Video style transfer and filters
  • Multi-language subtitle support
  • Collaborative editing features

Source Code: GitHub Repository
Tech Stack: FastAPI, Angular 17, PostgreSQL, OpenAI, Stability AI, FFmpeg, AWS S3