i1-3B — Text-to-Image

A simple and fully open recipe for strong text-to-image models, from Princeton's Z-Lab. The 3B DiT generates 1024px images, conditioned on T5Gemma text embeddings and decoded with the FLUX.2 VAE. Prompts are rewritten by Qwen3-4B using the official metaprompt; this step is recommended because the model is trained on long, descriptive captions.
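
To make the data flow concrete, here is a minimal Python sketch of how the four stages described above could be chained: prompt rewriting, text encoding, DiT sampling, and VAE decoding. Everything in it is an assumption for illustration: `TextToImagePipeline`, its field names, and the lambda stand-ins are hypothetical and are not the project's actual API. The stand-ins only pass dictionaries along so the ordering of the stages can be checked without loading any model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextToImagePipeline:
    """Hypothetical wiring of the four stages; not the project's real interface."""

    rewrite_prompt: Callable[[str], str]  # stand-in for Qwen3-4B + metaprompt
    encode_text: Callable[[str], Any]     # stand-in for the T5Gemma text encoder
    denoise: Callable[[Any], Any]         # stand-in for sampling with the 3B DiT
    decode_latents: Callable[[Any], Any]  # stand-in for the FLUX.2 VAE decoder

    def __call__(self, prompt: str, rewrite: bool = True) -> Any:
        # Optionally expand the short user prompt into a long, descriptive caption.
        caption = self.rewrite_prompt(prompt) if rewrite else prompt
        # Turn the caption into conditioning for the diffusion model.
        embeddings = self.encode_text(caption)
        # Sample latents conditioned on the text embeddings.
        latents = self.denoise(embeddings)
        # Map the latents back to pixel space.
        return self.decode_latents(latents)


# Placeholder stages (assumptions): they only record what flowed through them.
pipe = TextToImagePipeline(
    rewrite_prompt=lambda p: f"A detailed, descriptive caption of: {p}",
    encode_text=lambda caption: {"caption": caption},
    denoise=lambda emb: {"latents_for": emb["caption"]},
    decode_latents=lambda z: {"image_for": z["latents_for"], "size": (1024, 1024)},
)

result = pipe("a red fox in snow")
print(result["image_for"])
print(result["size"])
```

Passing `rewrite=False` skips the rewriting stage and feeds the raw prompt to the encoder, which is one way to compare outputs with and without the recommended long captions.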