S3-DiT Architecture · Decoupled-DMD · Only 8 NFEs

Sub-Second AI Image Generation

Experience the power of 6 billion parameters with sub-second inference latency on H800 GPUs. Z-Image Turbo delivers photorealistic quality in just 8 inference steps and fits on consumer GPUs with 16GB of VRAM.

Prompt:

A futuristic cyberpunk street in Tokyo at night, neon signs, rain-soaked pavement...

6B Parameters

<1s Inference Latency

8 NFEs Only

16GB VRAM Compatible

🤗 2.26k+ Likes on Hugging Face

📊 #1 Open-Source Model on the AI Arena Leaderboard

⬇️ 186k+ Downloads/Month

AI Image Generator

Harness the power of Z-Image Turbo's 6B parameter model. Enter your prompt and generate photorealistic images in under a second.


Why Z-Image Turbo?

Built on cutting-edge research from Tongyi-MAI, Z-Image Turbo combines unprecedented speed with exceptional quality—ranking #1 among open-source models on the AI Arena Leaderboard.

Sub-Second Inference

Generate images in under 1 second on H800 GPUs. Lightning-fast creation without compromising quality.

6B Parameters

Massive 6 billion parameter model delivers state-of-the-art quality with exceptional detail and accuracy.

Only 8 NFEs

Highly optimized to achieve stunning results with just 8 function evaluations (NFEs, the number of inference steps).

Photorealistic Quality

Excels at generating photorealistic images with exceptional aesthetic quality and fine details.

Bilingual Text Rendering

Accurate rendering of both English and Chinese text within images, a capability most other models struggle with.

16GB VRAM Compatible

Runs comfortably on consumer GPUs with just 16GB VRAM. No expensive hardware required.

Cutting-Edge Technology

Z-Image Turbo is powered by innovative architectures and distillation techniques developed by the Tongyi-MAI research team.

S3-DiT Architecture

Scalable Single-Stream Diffusion Transformer

Z-Image adopts a novel Scalable Single-Stream DiT architecture where text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level as a unified input stream.

Maximizes parameter efficiency vs dual-stream
Unified processing of multimodal inputs
Scalable to billions of parameters
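
To make the single-stream idea concrete, here is a minimal sketch of sequence-level concatenation. It is an illustration only, not Z-Image's actual code: the `Token` class, the `build_single_stream` helper, the modality labels, and the token counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    modality: str  # "text", "semantic", or "vae" (illustrative labels)
    index: int     # position within its own modality


def build_single_stream(n_text: int, n_semantic: int, n_vae: int) -> List[Token]:
    """Concatenate the three token groups along the sequence axis.

    A dual-stream design would keep separate sequences (and separate
    weights) per modality; here every token lands in one shared list.
    """
    stream: List[Token] = []
    for modality, count in (("text", n_text), ("semantic", n_semantic), ("vae", n_vae)):
        stream.extend(Token(modality, i) for i in range(count))
    return stream


# Hypothetical token counts, chosen only to keep the output short.
stream = build_single_stream(n_text=4, n_semantic=2, n_vae=6)
print(len(stream))                   # total length of the unified sequence
print([t.modality for t in stream])  # text tokens first, then semantic, then VAE
```

Because all tokens share one sequence, the same transformer blocks process every modality, which is where the parameter-efficiency claim above comes from.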

Decoupled-DMD

Distribution Matching Distillation

The core few-step distillation algorithm that empowers Z-Image's 8-step generation. It decouples two key mechanisms for optimized performance:

CFG Augmentation — Primary engine driving distillation
Distribution Matching — Regularizer for stability
Enhanced with DMDR (Reinforcement Learning)
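
One way to picture the decoupling is sketched below with toy scalar "scores". This is our reading of the idea under stated assumptions, not the paper's formulation: the function names, the scalar values, and the guidance scale are hypothetical. The sketch only shows that a CFG-guided teacher signal splits algebraically into a distribution-matching term and a CFG-augmentation term.

```python
def cfg_score(cond: float, uncond: float, scale: float) -> float:
    """Classifier-free guidance: extrapolate from the unconditional score."""
    return uncond + scale * (cond - uncond)


def decoupled_terms(real_cond: float, real_uncond: float,
                    fake_cond: float, scale: float) -> tuple:
    """Split (CFG-guided teacher score - student score) into two parts."""
    dm = real_cond - fake_cond                       # distribution matching (regularizer)
    ca = (scale - 1.0) * (real_cond - real_uncond)   # CFG augmentation (engine)
    return ca, dm


# Toy numbers for illustration only.
real_cond, real_uncond, fake_cond, scale = 0.8, 0.5, 0.6, 4.0
ca, dm = decoupled_terms(real_cond, real_uncond, fake_cond, scale)
combined = cfg_score(real_cond, real_uncond, scale) - fake_cond

print(f"CA term: {ca:.3f}, DM term: {dm:.3f}, sum: {ca + dm:.3f}")
print(f"combined signal: {combined:.3f}")  # matches the sum of the two terms
```

With a guidance scale well above 1, the CA term dominates the sum in this toy example, which is consistent with the description of CFG Augmentation as the primary engine.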

Read the research papers:

📄Z-Image Paper 📄Decoupled-DMD 📄DMDR Paper

How It Works

Create stunning AI images in three simple steps

Describe Your Vision

Type a detailed description of the image you want. Z-Image Turbo excels at understanding complex prompts with excellent instruction adherence.

Customize Settings

Choose dimensions (up to 1024px) and adjust quality settings. The default of 8 inference steps is tuned for the best balance of quality and speed.

Generate & Download

Click generate and get your photorealistic image in under a second. Download in JPG, PNG, or WebP format.

Frequently Asked Questions

Got questions? We've got answers.

What is Z-Image Turbo?

Z-Image Turbo is a distilled version of Z-Image (造相), a 6B parameter image generation model developed by Tongyi-MAI. It delivers state-of-the-art quality with only 8 inference steps and sub-second latency.

How is Z-Image Turbo different from other AI generators?

Z-Image Turbo uses the innovative S3-DiT architecture and Decoupled-DMD distillation. It's ranked #1 among open-source models on the AI Arena Leaderboard while requiring only 8 NFEs (vs 20-50 for other models).
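
As a rough worked example of what the step count means, the sketch below compares 8 NFEs with the 20-50 range quoted above. The helper name `nfe_reduction` is hypothetical, and the comparison counts function evaluations only; actual wall-clock speedup also depends on model size and hardware.

```python
def nfe_reduction(baseline_steps: int, turbo_steps: int = 8) -> float:
    """Ratio of function evaluations: baseline model vs. an 8-step model."""
    return baseline_steps / turbo_steps


for steps in (20, 50):
    print(f"{steps} steps -> {nfe_reduction(steps):.2f}x fewer evaluations")
# prints:
# 20 steps -> 2.50x fewer evaluations
# 50 steps -> 6.25x fewer evaluations
```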

What makes the bilingual text rendering special?

Z-Image Turbo excels at accurately rendering complex Chinese and English text within images—a unique capability that most other models struggle with. This makes it ideal for creating images with embedded text.

What are the hardware requirements?

Z-Image Turbo fits comfortably within 16GB VRAM consumer devices. On enterprise-grade H800 GPUs, it achieves sub-second inference latency.
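
A back-of-the-envelope check of the 16GB figure, assuming the weights are stored in a 16-bit format such as bfloat16 (an assumption; the precision is not stated here). The helper `weight_memory_gib` is hypothetical, and the estimate covers the 6B model weights only, not the text encoder, VAE, or activations.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 6e9  # 6 billion parameters, as stated above

print(f"16-bit weights: {weight_memory_gib(N_PARAMS, 2):.2f} GiB")  # about 11.18 GiB
print(f"32-bit weights: {weight_memory_gib(N_PARAMS, 4):.2f} GiB")  # about 22.35 GiB
```

Under this assumption the weights leave a few gigabytes of headroom on a 16GB card, while 32-bit weights alone would not fit.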

What is the optimal number of inference steps?

The recommended setting is 8 inference steps (NFEs), which provides the optimal balance between quality and speed. You can increase steps for potentially higher quality at the cost of longer generation time.

Can I use generated images commercially?

Z-Image Turbo is released under the Apache-2.0 license. Generated images can be used for both personal and commercial projects. Please ensure your prompts don't infringe on copyrights or trademarks.

What image formats and sizes are supported?

We support JPG, PNG, and WebP formats. The maximum resolution is 1024×1024 pixels, with presets for square, landscape, and portrait orientations.
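
If you are preparing sizes ahead of time, a sketch like the following keeps a requested size within the 1024-pixel limit while preserving aspect ratio. The function `fit_dimensions` is hypothetical, and rounding down to a multiple of 16 is an assumption commonly made for latent-diffusion models, not a documented requirement of this service.

```python
def fit_dimensions(width: int, height: int,
                   max_side: int = 1024, multiple: int = 16) -> tuple:
    """Scale (width, height) down so the longer side is at most max_side.

    Sizes already within the limit are not upscaled. Each side is then
    rounded down to a multiple of `multiple` (assumed, see note above).
    """
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        width = width * max_side // longest
        height = height * max_side // longest

    def snap(value: int) -> int:
        return max(multiple, value // multiple * multiple)

    return snap(width), snap(height)


print(fit_dimensions(1920, 1080))  # landscape request -> (1024, 576)
print(fit_dimensions(800, 1200))   # portrait request  -> (672, 1024)
print(fit_dimensions(512, 512))    # already in range  -> (512, 512)
```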

Where can I learn more about the technology?

Check out the research papers on arXiv: Z-Image (2511.22699), Decoupled-DMD (2511.22677), and DMDR (2511.13649). The model is also available on Hugging Face.

Ready to Experience Sub-Second AI Generation?

Join thousands of creators using Z-Image Turbo—the #1 open-source image generator. Try it free, no sign-up required.
