PixArt-Σ

An advanced, highly data-efficient open-source text-to-image model that evolves the Diffusion Transformer (DiT) architecture to new heights of performance. It features a key structural upgrade to a 4K text token length and utilizes a novel weak-to-strong training strategy alongside a high-efficiency VAE. This allows PixArt-Σ to generate breathtaking 4K ultra-high-definition images while using significantly less computational power than comparable legacy models. It delivers exceptional prompt alignment, precise spatial layouts, and masterful text rendering within artwork. By drastically lowering the VRAM threshold required to produce studio-grade creative imagery, it democratizes high-fidelity content creation for independent creators, small game studios, and budget-conscious design teams.

[←]

Science & Technology