FramePack is a next-frame prediction neural network structure that generates videos progressively. It compresses input contexts to a constant length, so the generation workload is invariant to video length.
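To picture the progressive loop, here is a minimal sketch of next-frame prediction; `predict_next_latent` and `decode_frame` are hypothetical stand-ins for illustration, not FramePack's actual API:

```python
# Minimal sketch of progressive next-frame generation (hypothetical
# method names, not FramePack's real API).
import torch

def generate_video(model, first_frame_latent, num_frames):
    context = [first_frame_latent]
    frames = []
    for _ in range(num_frames):
        # Predict the next frame's latent from the (compressed) context.
        next_latent = model.predict_next_latent(context)
        context.append(next_latent)
        # Decode immediately, so frames are visible while generation runs.
        frames.append(model.decode_frame(next_latent))
    return torch.stack(frames)
```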
Generate 60-second, 30 fps videos (1,800 frames) with a 13B model using only 6 GB of VRAM. Even laptop GPUs can handle it.
As a next-frame prediction model, you'll directly see the generated frames, getting plenty of visual feedback throughout the entire generation process.
Compresses input contexts to a constant length, making the generation workload invariant to video length and enabling ultra-long video generation (see the sketch after this list).
Provides a feature-complete desktop application with a minimal, standalone, high-quality sampling system and memory management.
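The sketch below makes the constant-length idea concrete: older frames are pooled to progressively fewer tokens, so the total context the model attends over stays bounded. The geometric halving schedule and function names are assumptions for illustration, not FramePack's exact kernel configuration:

```python
# Illustrative sketch: pool older frame latents to fewer tokens so the
# total context length stays bounded no matter how long the video is.
# The geometric halving schedule is an assumption, not FramePack's
# exact kernel configuration.
import torch
import torch.nn.functional as F

def compress_context(frame_latents, base_tokens=1536):
    """frame_latents: list of (base_tokens, dim) tensors, newest last."""
    compressed = []
    for age, latent in enumerate(reversed(frame_latents)):
        keep = base_tokens // (2 ** age)  # 1536, 768, 384, ...
        if keep == 0:
            break  # frames older than this are dropped entirely
        pooled = F.adaptive_avg_pool1d(
            latent.T.unsqueeze(0),  # (tokens, dim) -> (1, dim, tokens)
            keep,
        ).squeeze(0).T              # back to (keep, dim)
        compressed.append(pooled)
    # Total tokens < base_tokens * (1 + 1/2 + 1/4 + ...) = 2 * base_tokens,
    # so attention cost does not grow with video length.
    return torch.cat(compressed, dim=0)
```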
The girl dances gracefully, with clear movements, full of charm.
The man dances energetically, leaping mid-air with fluid arm swings and quick footwork.
A one-click package will be released soon. Please check back later.
# We recommend having an independent Python 3.10 environment
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
# Start the GUI
python demo_gradio.py
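Before launching, you can optionally confirm that PyTorch sees a CUDA GPU with enough memory; a minimal check (the 6 GB threshold mirrors the requirement above):

```python
# Optional sanity check before running demo_gradio.py: verify a CUDA
# GPU with roughly 6 GB or more of VRAM is visible to PyTorch.
import torch

assert torch.cuda.is_available(), "No CUDA GPU detected."
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"{torch.cuda.get_device_name(0)}: {total_gb:.1f} GB VRAM")
assert total_gb >= 6, "FramePack needs at least ~6 GB of VRAM."
```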
FramePack is a revolutionary video generation technology that compresses input contexts to a constant length, making the generation workload invariant to video length. Learn about our methods, architecture, and experimental results in detail.
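As a back-of-the-envelope version of the invariance claim, suppose (an illustrative assumption, not the paper's exact schedule) that the frame $i$ steps in the past is compressed to $L_0/2^i$ tokens. The total context length is then bounded by a geometric series:

$$\sum_{i=0}^{\infty} \frac{L_0}{2^i} = 2L_0,$$

a constant independent of how many frames have been generated, so per-frame compute stays flat.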