Overview
Built and deployed the Wan2.2-TI2V-5B diffusion model (5 billion parameters) on serverless GPU infrastructure for text-to-video and image-to-video generation. The system produces 720p video at 24fps in both landscape and portrait orientations, with configurable duration (2–5 seconds), guidance scale, and seed control.
The deployment supports both text-to-video (generating a clip entirely from a text prompt) and image-to-video (animating a still image, with a text prompt guiding the motion). Batch processing and optional S3 storage are built in for integration with downstream pipelines. The model is packaged in a CUDA Docker container and deployed on RunPod's serverless GPU infrastructure.
Key Features
- 5 billion parameter Wan2.2-TI2V diffusion model
- Text-to-video and image-to-video generation modes
- 720p output at 24fps in landscape or portrait
- Configurable duration (2–5s), guidance scale, and seed control
- Batch processing for multiple generations
- Optional S3 storage for output videos
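As a sketch of how these options surface at the API, a request to the endpoint might look like the following. Every field name here is an assumption for illustration; the deployed handler's real schema is not shown in this write-up.

```python
import json

# Illustrative request payload for the serverless endpoint.
# All field names are hypothetical, not the deployed API's actual schema.
payload = {
    "input": {
        "mode": "i2v",               # "t2v" (text-to-video) or "i2v" (image-to-video)
        "prompt": "a paper boat drifting down a rain-soaked street",
        "image_url": "https://example.com/still.jpg",  # only used for i2v
        "duration_s": 3.0,           # supported range: 2-5 seconds
        "orientation": "portrait",   # or "landscape"; output is 720p either way
        "guidance_scale": 5.0,
        "seed": 42,                  # fixed seed for reproducible output
        "upload_to_s3": False,       # return the video inline instead
    }
}

print(json.dumps(payload, indent=2))
```

Batch processing would amount to submitting several such payloads, with the worker queuing them against the same warm model instance.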
Architecture
Text Prompt ──┐
              ├─→ Wan2.2-TI2V-5B → Diffusion Steps → Frame Decode → Video (720p)
Image Input ──┘                                                         ↓
                                                              S3 Upload (optional)

Deployment:

Docker Container (CUDA) → RunPod Serverless GPU → API Endpoint
                                   ↓
                            Batch Processing
Text prompts and optional image inputs feed into the Wan2.2 diffusion pipeline. The model runs iterative denoising steps, decodes the latent frames into pixel space, and encodes the result as an MP4 video at 720p/24fps. The container handles VRAM management for the 5B-parameter model, with batch processing support for multiple requests. Output can be returned directly or uploaded to S3.
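The request-handling flow above can be sketched in plain Python. This is a hedged illustration of the routing and validation logic only: the function and field names are assumptions, and the diffusion pipeline itself is stubbed out. In a real RunPod worker, a function like `handler` would be registered via `runpod.serverless.start({"handler": handler})`.

```python
FPS = 24  # output frame rate

def validate(job_input):
    """Clamp user parameters to the ranges the deployment supports.
    All field names are illustrative assumptions."""
    duration = min(max(float(job_input.get("duration_s", 3.0)), 2.0), 5.0)
    return {
        # An image input switches the pipeline to image-to-video mode.
        "mode": "i2v" if job_input.get("image_url") else "t2v",
        "prompt": job_input["prompt"],
        "num_frames": round(duration * FPS),
        "guidance_scale": float(job_input.get("guidance_scale", 5.0)),
        "seed": job_input.get("seed"),
    }

def handler(job):
    params = validate(job["input"])
    # A real worker would run the Wan2.2-TI2V-5B pipeline here:
    # iterative denoising, latent-to-pixel frame decode, MP4 encode
    # at 720p/24fps, then return the video inline or upload it to S3.
    return {"status": "queued", "params": params}
```

The out-of-range `duration_s` is clamped rather than rejected, one reasonable design choice for a batch-friendly endpoint; the deployed system may handle invalid input differently.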
Sample Output
Three test generations at different frame counts, showing quality and motion progression. All generated by Wan2.2-TI2V-5B at 720p/24fps.
41 Frames (~1.7s)
65 Frames (~2.7s)
81 Frames (~3.4s)
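The approximate durations above follow directly from frame count divided by frame rate at the fixed 24fps output:

```python
FPS = 24  # output frame rate stated above

# Frame counts from the three sample generations.
durations = {frames: frames / FPS for frames in (41, 65, 81)}
for frames, seconds in durations.items():
    print(f"{frames} frames -> {seconds:.1f}s")
```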
Tech Stack
- Wan2.2-TI2V-5B diffusion model
- Docker (CUDA runtime)
- RunPod serverless GPU
- S3 (optional output storage)
I build and deploy production AI systems.
Let's talk about your next project.