Overview
Built and deployed the Wan2.2-TI2V-5B diffusion model (5 billion parameters) on serverless GPU infrastructure for text-to-video and image-to-video generation. The system produces 720p video at 24fps in both landscape and portrait orientations, with configurable duration (2–5 seconds), guidance scale, and seed control.
The deployment supports both text-to-video (generating a clip entirely from a text prompt) and image-to-video (animating a still image, with a text prompt guiding the motion). Batch processing and optional S3 storage are built in for integration with downstream pipelines. The model is packaged in a CUDA Docker container and deployed on RunPod's serverless GPU infrastructure.
Key Features
- 5 billion parameter Wan2.2-TI2V diffusion model
- Text-to-video and image-to-video generation modes
- 720p output at 24fps in landscape or portrait
- Configurable duration (2–5s), guidance scale, and seed control
- Batch processing for multiple generations
- Optional S3 storage for output videos
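As a sketch of how these options surface at the API, a request to the endpoint might look like the following. Every field name here is an assumption for illustration; the deployed handler's real schema is not shown in this write-up.

```python
import json

# Illustrative request payload for the serverless endpoint.
# All field names are hypothetical, not the deployed API's actual schema.
payload = {
    "input": {
        "mode": "i2v",               # "t2v" (text-to-video) or "i2v" (image-to-video)
        "prompt": "a paper boat drifting down a rain-soaked street",
        "image_url": "https://example.com/still.jpg",  # only used for i2v
        "duration_s": 3.0,           # supported range: 2-5 seconds
        "orientation": "portrait",   # or "landscape"; output is 720p either way
        "guidance_scale": 5.0,
        "seed": 42,                  # fixed seed for reproducible output
        "upload_to_s3": False,       # return the video inline instead
    }
}

print(json.dumps(payload, indent=2))
```

Batch processing would amount to submitting several such payloads, with the worker queuing them against the same warm model instance.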
Architecture
Text Prompt ──┐
              ├─→ Wan2.2-TI2V-5B → Diffusion Steps → Frame Decode → Video (720p)
Image Input ──┘                                                         ↓
                                                              S3 Upload (optional)

Deployment:

Docker Container (CUDA) → RunPod Serverless GPU → API Endpoint
                                   ↓
                            Batch Processing
Text prompts and optional image inputs feed into the Wan2.2 diffusion pipeline. The model runs iterative denoising steps, decodes the latent frames into pixel space, and encodes the result as an MP4 video at 720p/24fps. The container handles VRAM management for the 5B-parameter model, with batch processing support for multiple requests. Output can be returned directly or uploaded to S3.
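The request-handling flow above can be sketched in plain Python. This is a hedged illustration of the routing and validation logic only: the function and field names are assumptions, and the diffusion pipeline itself is stubbed out. In a real RunPod worker, a function like `handler` would be registered via `runpod.serverless.start({"handler": handler})`.

```python
FPS = 24  # output frame rate

def validate(job_input):
    """Clamp user parameters to the ranges the deployment supports.
    All field names are illustrative assumptions."""
    duration = min(max(float(job_input.get("duration_s", 3.0)), 2.0), 5.0)
    return {
        # An image input switches the pipeline to image-to-video mode.
        "mode": "i2v" if job_input.get("image_url") else "t2v",
        "prompt": job_input["prompt"],
        "num_frames": round(duration * FPS),
        "guidance_scale": float(job_input.get("guidance_scale", 5.0)),
        "seed": job_input.get("seed"),
    }

def handler(job):
    params = validate(job["input"])
    # A real worker would run the Wan2.2-TI2V-5B pipeline here:
    # iterative denoising, latent-to-pixel frame decode, MP4 encode
    # at 720p/24fps, then return the video inline or upload it to S3.
    return {"status": "queued", "params": params}
```

The out-of-range `duration_s` is clamped rather than rejected, one reasonable design choice for a batch-friendly endpoint; the deployed system may handle invalid input differently.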
Sample Output
Three test generations at different frame counts, showing quality and motion progression. All generated by Wan2.2-TI2V-5B at 720p/24fps.
41 Frames (~1.7s)
65 Frames (~2.7s)
81 Frames (~3.4s)
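The approximate durations above follow directly from frame count divided by frame rate at the fixed 24fps output:

```python
FPS = 24  # output frame rate stated above

# Frame counts from the three sample generations.
durations = {frames: frames / FPS for frames in (41, 65, 81)}
for frames, seconds in durations.items():
    print(f"{frames} frames -> {seconds:.1f}s")
```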
Tech Stack
- Wan2.2-TI2V-5B diffusion model
- Docker (CUDA runtime)
- RunPod serverless GPU
- S3 (optional output storage)
I build and deploy production AI systems.
Let's talk about your next project.