AI-Powered Automated Video Production Platform
Overview
Uniclone AI Movie is an end-to-end AI-powered video production platform that transforms simple text ideas into complete short videos. The system orchestrates multiple AI services—story generation, visual synthesis, voice narration, and video editing—to automate the entire creative pipeline.
Demo Videos
See the platform in action with these example outputs generated by the system:
These demonstrations showcase the complete pipeline from text input to final rendered video, including AI-generated visuals, synchronized narration, subtitle overlay, and background music composition.
The Challenge
Creating video content traditionally requires:
- Professional scriptwriting and storyboarding
- Visual design and illustration
- Voice acting and audio production
- Video editing and post-production
This process is time-consuming, expensive, and requires specialized skills. The goal was to build a platform that democratizes video creation by automating these steps through AI while maintaining creative control.
Technical Architecture
Backend: FastAPI Microservices
Built a robust Python backend using FastAPI with:
- Asynchronous API handlers for long-running AI operations
- SQLAlchemy ORM with PostgreSQL for data persistence
- Alembic for database migrations
- JWT authentication with passlib for secure user sessions
- AWS S3 integration for scalable media storage
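The non-blocking handler pattern described above can be sketched with plain `asyncio` (names like `start_generation` and the in-memory `jobs` dict are illustrative stand-ins; the real platform persists job status through SQLAlchemy and serves the handler via FastAPI):

```python
import asyncio
import uuid

# In-memory job registry standing in for the database layer
# (illustrative only; the platform persists status via SQLAlchemy).
jobs: dict[str, str] = {}

async def long_running_generation(job_id: str) -> None:
    """Simulates a slow AI call; the handler does not block on it."""
    jobs[job_id] = "running"
    await asyncio.sleep(0.01)  # stands in for a 30-60s generation call
    jobs[job_id] = "done"

async def start_generation() -> str:
    """Async handler pattern: schedule work, return a job id immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = "queued"
    asyncio.create_task(long_running_generation(job_id))
    return job_id

async def main() -> None:
    job_id = await start_generation()   # returns before the work finishes
    await asyncio.sleep(0.05)           # client would poll for status here
    print(jobs[job_id])

asyncio.run(main())
```

The client polls a status endpoint with the returned job id instead of holding an HTTP connection open for the full generation time.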
Frontend: Angular SPA
Developed a modern Angular 17 single-page application featuring:
- Real-time project management with reactive UI updates
- Multi-stage workflow for story → storyboard → video generation
- Crew Discussion System - an AI director that provides creative feedback
- Interactive storyboard editor with drag-and-drop reordering
- Voice selection with preview demos for narration
AI Pipeline Integration
1. Story Generation (GPT-3.5)
- Converts user ideas into coherent 650-850 character narratives
- Supports multiple story types: romance, adventure, mystery, sci-fi, fairy tale
- Content moderation for inappropriate inputs
2. Storyboard Segmentation
- Dynamically splits stories into N scenes (user-configurable)
- Generates visual descriptions for each scene
- Creates background music prompts matching story mood
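The actual segmentation is GPT-driven, but the balancing idea — distribute sentences into N roughly equal scenes — can be shown with a small pure function (the function name and regex are illustrative assumptions):

```python
import re

def split_into_scenes(story: str, n_scenes: int) -> list[str]:
    """Split a story into n roughly equal scenes on sentence boundaries.

    Illustrative stand-in for the platform's GPT-driven segmentation;
    n_scenes is clamped to the number of available sentences.
    """
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", story.strip())
                 if s.strip()]
    n_scenes = max(1, min(n_scenes, len(sentences)))
    base, extra = divmod(len(sentences), n_scenes)
    scenes, start = [], 0
    for i in range(n_scenes):
        size = base + (1 if i < extra else 0)  # spread the remainder
        scenes.append(" ".join(sentences[start:start + size]))
        start += size
    return scenes

story = ("A fox finds a map. It leads to a cave. "
         "Inside waits a dragon. They become friends.")
print(split_into_scenes(story, 2))
```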
3. Image Generation
- Primary: Stability AI (Stable Diffusion v1.6) with custom parameters
- Fallback: Unofficial Midjourney API integration
- Generates 1024x576 images optimized for video
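The primary/fallback arrangement reduces to a simple pattern; here the provider callables are injected fakes rather than the real Stability AI and Midjourney HTTP clients:

```python
from typing import Callable

def generate_image(prompt: str,
                   primary: Callable[[str], bytes],
                   fallback: Callable[[str], bytes]) -> bytes:
    """Try the primary provider; fall back to the secondary on failure.

    In the platform, primary wraps Stability AI and fallback wraps the
    unofficial Midjourney API; here both are injected stand-ins.
    """
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

# Fake providers for demonstration:
def flaky_primary(prompt: str) -> bytes:
    raise RuntimeError("rate limited")

def working_fallback(prompt: str) -> bytes:
    return b"PNG-bytes-for:" + prompt.encode()

print(generate_image("a castle at dusk, 1024x576",
                     flaky_primary, working_fallback))
```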
4. Image-to-Video Conversion
- Stability AI's image-to-video API
- Configurable motion parameters (motion_bucket_id, cfg_scale)
- Automatic polling for completion with retry logic
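A minimal sketch of the polling loop, with the status check injected so it can stand in for the Stability AI result endpoint (which returns in-progress until the video is ready):

```python
import time
from typing import Callable

def poll_until_complete(fetch_status: Callable[[], str],
                        interval: float = 1.0,
                        max_attempts: int = 60) -> str:
    """Poll a generation job until it reports 'complete', or time out.

    fetch_status is an injected stand-in for the real result endpoint.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status == "complete":
            return status
        time.sleep(interval)
    raise TimeoutError(f"job not complete after {max_attempts} attempts")

# Fake endpoint that completes on the third poll:
calls = {"n": 0}
def fake_status() -> str:
    calls["n"] += 1
    return "complete" if calls["n"] >= 3 else "in-progress"

print(poll_until_complete(fake_status, interval=0.0))
```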
5. Text-to-Speech Synthesis
- OpenAI TTS with 6 voice options (alloy, echo, fable, onyx, nova, shimmer)
- Whisper API for timestamp extraction
- Automatic SRT subtitle generation
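SRT generation from Whisper-style timestamps is mostly formatting; a sketch (helper names are assumptions, and the segment tuples stand in for Whisper's segment output):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def build_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build an SRT body from (start, end, text) segments, e.g. the
    segment timestamps returned by the Whisper API."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

print(build_srt([(0.0, 2.5, "Once upon a time..."),
                 (2.5, 5.0, "a fox found a map.")]))
```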
6. Background Music Generation
- Integration with music generation API (Suno-like service)
- Custom prompt-based composition
- Async polling with 30-second intervals
7. Video Post-Production (FFmpeg)
- Video stretching to match narration duration
- Subtitle overlay with custom fonts
- Audio mixing (narration + background music)
- Opening title card generation
- Multi-clip concatenation into final video
Key Technical Challenges
1. Video Synchronization
Problem: Matching video duration with narration length while maintaining quality.
Solution: Implemented dynamic video stretching using FFmpeg's setpts filter:
```python
video = video.filter("setpts", f"PTS*{stretch_factor}")
```

Combined with audio mixing, this produces perfectly synchronized output.
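The stretch factor itself is a simple ratio: `setpts` scales presentation timestamps, so a factor above 1 slows the clip down and lengthens it. A sketch (the helper name is an assumption):

```python
def compute_stretch_factor(speech_duration: float, video_duration: float) -> float:
    """PTS multiplier that stretches (or compresses) a clip so its
    length matches the narration. Values > 1 slow the video down."""
    if video_duration <= 0:
        raise ValueError("video_duration must be positive")
    return speech_duration / video_duration

# A 4 s generated clip paired with 6 s of narration needs PTS*1.5:
print(compute_stretch_factor(6.0, 4.0))  # 1.5
```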
2. Async AI Service Orchestration
Problem: Multiple AI APIs with varying response times (2s for GPT, 30-60s for video generation).
Solution:
- Designed polling-based architecture for long-running tasks
- Implemented retry logic with exponential backoff
- Used FastAPI's async capabilities for non-blocking operations
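The exponential-backoff schedule can be made explicit; this helper is a hypothetical illustration of the delays the retry loop sleeps between failed API calls (parameter names are assumptions):

```python
import random

def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_retries: int = 5, jitter: float = 0.0) -> list[float]:
    """Delays for successive retries: base * factor**attempt,
    plus optional random jitter to avoid synchronized retries."""
    return [base * factor ** i + random.uniform(0, jitter)
            for i in range(max_retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```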
3. State Management Across Services
Problem: Tracking project state across 7+ generation steps with potential failures.
Solution:
- Built comprehensive database schema with relationships (User → Project → Storyboard → Advice)
- Implemented status tracking for each generation phase
- Created rollback mechanisms for failed operations
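Per-phase status tracking lets a failed step be retried without redoing earlier work. A minimal sketch — the phase names and class shape are illustrative, not the actual schema:

```python
from enum import Enum
from typing import Optional

class Phase(str, Enum):
    STORY = "story"
    STORYBOARD = "storyboard"
    IMAGES = "images"
    VIDEO = "video"
    SPEECH = "speech"
    MUSIC = "music"
    RENDER = "render"

class ProjectState:
    """Tracks per-phase status so resuming starts at the first
    unfinished phase (illustrative in-memory version)."""
    def __init__(self) -> None:
        self.status = {phase: "pending" for phase in Phase}

    def complete(self, phase: Phase) -> None:
        self.status[phase] = "done"

    def fail(self, phase: Phase) -> None:
        self.status[phase] = "failed"

    def next_pending(self) -> Optional[Phase]:
        for phase in Phase:
            if self.status[phase] != "done":
                return phase
        return None

state = ProjectState()
state.complete(Phase.STORY)
state.fail(Phase.STORYBOARD)
print(state.next_pending())  # resumes at the failed storyboard phase
```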
4. Media Storage & Delivery
Problem: Handling large video files (10-50MB each) efficiently.
Solution:
- Direct S3 uploads with pre-signed URLs
- CloudFront-like delivery for low-latency streaming
- Lazy loading for project thumbnails
Advanced Features
AI Director Crew System

Implemented a unique "crew discussion" feature where AI agents provide directorial feedback:
- Analyzes storyboard descriptions for visual coherence
- Suggests improvements for scene composition
- Modifies prompts for better image generation
- Maintains conversation history per project
Dynamic Storyboard Reordering
Built drag-and-drop interface with backend synchronization:
- Optimistic UI updates for responsiveness
- Server-side order validation and cascading updates
- Automatic reindexing of dependent resources
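The server-side portion of a reorder reduces to a list move plus reindexing; a sketch with hypothetical helper names (the real version rewrites an order column via the ORM):

```python
def reorder_scenes(scene_ids: list[int], from_index: int, to_index: int) -> list[int]:
    """Apply a drag-and-drop move and return the new ordering."""
    scenes = scene_ids.copy()
    scenes.insert(to_index, scenes.pop(from_index))
    return scenes

def order_updates(scene_ids: list[int]) -> dict[int, int]:
    """Map scene id -> new zero-based order index for persistence
    (the 'cascading update' applied to dependent resources)."""
    return {scene_id: idx for idx, scene_id in enumerate(scene_ids)}

new_order = reorder_scenes([101, 102, 103, 104], from_index=3, to_index=0)
print(new_order)  # [104, 101, 102, 103]
print(order_updates(new_order))
```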
Opening Title Card Generator
Created custom FFmpeg pipeline for animated title cards:
- Fade in/out animations with Lanczos scaling
- Dynamic text positioning and font rendering
- Silent audio track for seamless concatenation
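The fade portion of that pipeline can be expressed as FFmpeg `fade` filter specs; this helper is an illustrative sketch (the filter arguments follow FFmpeg's `fade` filter: `t` for type, `st` for start time, `d` for duration):

```python
def fade_filters(duration: float, fade: float = 0.5) -> list[str]:
    """FFmpeg fade-in/fade-out filter specs for a title card of the
    given duration, fading over `fade` seconds at each end."""
    return [
        f"fade=t=in:st=0:d={fade}",
        f"fade=t=out:st={duration - fade}:d={fade}",
    ]

print(fade_filters(3.0))
```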
Results & Impact
- Automated 90% of traditional video production pipeline
- Reduced production time from hours to 5-10 minutes
- Enabled non-technical users to create professional videos
- Modular architecture allows easy integration of new AI models
Code Highlights
Storyboard Generation with Validation
```python
max_retries = 3
retry_count = 0
while retry_count < max_retries:
    chat_completion = await gpt.client.chat.completions.create(
        messages=[...],
        model="gpt-3.5-turbo",
        response_format={"type": "json_object"},
    )
    story_content = chat_completion.choices[0].message.content
    try:
        story_json = json.loads(story_content)
        if "normal_output" not in story_json:
            raise ValueError("Invalid JSON structure")
        break
    except (json.JSONDecodeError, ValueError):
        retry_count += 1
```
FFmpeg Video Processing Pipeline
```python
# Stretch video to match speech duration
video = video.filter("setpts", f"PTS*{stretch_factor}")
# Add subtitles
video = video.filter("subtitles", srt_path)
# Combine with audio
stream = ffmpeg.output(video, speech, output_file,
                       vcodec="libx264", acodec="aac")
```
What I Learned
- Multi-service orchestration: Coordinating 5+ external APIs with different response patterns
- Video processing at scale: FFmpeg optimization for batch operations
- Async Python patterns: Leveraging asyncio for concurrent AI requests
- Database design for media: Modeling complex relationships with file references
- Error resilience: Building retry logic and fallback strategies for unreliable services
Future Enhancements
- WebSocket support for real-time progress updates
- Fine-tuned models for better prompt engineering
- Video style transfer and filters
- Multi-language subtitle support
- Collaborative editing features
Source Code: GitHub Repository
Tech Stack: FastAPI, Angular 17, PostgreSQL, OpenAI, Stability AI, FFmpeg, AWS S3