FLUX.2
The second-generation visual foundation model from Black Forest Labs, developed to narrow the gap between open-weight models and closed commercial systems. Unlike earlier diffusion models, it pairs a 24-billion-parameter Mistral-3 vision-language model, which handles semantic understanding of the scene, with a rectified flow transformer that generates the pixels. Its defining feature is native multi-reference input: up to 10 source images can be fed directly into the core model to keep characters, objects, or brand styles consistent across scenes, without external LoRAs. FLUX.2 supports resolutions of up to 4 megapixels, accepts hex codes in the prompt to specify exact colors, and offers improved rendering of in-image text.
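The limits described above can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: `GenerationRequest`, `check_request`, and `hex_colors` are hypothetical names invented for illustration, not part of any Black Forest Labs API, and "4 megapixels" is read here as 4,000,000 pixels, which may not match the limit the model actually enforces.

```python
import re
from dataclasses import dataclass, field

# Limits taken from the entry above; the exact pixel budget is an assumption.
MAX_REFERENCE_IMAGES = 10      # "up to 10 source images"
MAX_PIXELS = 4_000_000         # "4 megapixels", read as 4,000,000 pixels

# Matches #RRGGBB or #RGB hex color codes embedded in a prompt.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


@dataclass
class GenerationRequest:
    """Hypothetical container for a request; not a real FLUX.2 type."""
    prompt: str
    width: int
    height: int
    reference_images: list[str] = field(default_factory=list)


def hex_colors(prompt: str) -> list[str]:
    """Return the hex color codes found in the prompt, in order."""
    return HEX_COLOR.findall(prompt)


def check_request(req: GenerationRequest) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(req.reference_images)} reference images given, "
            f"at most {MAX_REFERENCE_IMAGES} allowed"
        )
    if req.width * req.height > MAX_PIXELS:
        problems.append(
            f"{req.width}x{req.height} exceeds the {MAX_PIXELS}-pixel budget"
        )
    return problems


if __name__ == "__main__":
    req = GenerationRequest(
        prompt="Brand poster with a #FF5733 background and #0af lettering",
        width=2000,
        height=2000,
        reference_images=["logo.png", "mascot.png"],
    )
    print(hex_colors(req.prompt))
    print(check_request(req))
```

A check like this would run before a request reaches whichever inference endpoint or pipeline is in use; the file names in the usage example are placeholders.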
