You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v5.11.0).

This model was contributed to Hugging Face Transformers on 2026-06-10.

DiffusionGemma

Overview

DiffusionGemma is engineered to reduce the sequential bottlenecks of standard causal language models. It employs an encoder-decoder architecture specifically optimized for inference speed.

The encoder handles prefill: it processes the initial prompt and generates the KV cache. The decoder then uses bidirectional attention to process an input block of tokens (a ‘canvas’), accessing the cached context via cross-attention.

During inference, DiffusionGemma uses multi-canvas sampling. Rather than generating one token at a time, the model iteratively denoises a full block of tokens using a diffusion sampler. Once a canvas is fully denoised, it is processed by the encoder and appended to the KV cache, after which the model generates the next canvas. This block-autoregressive approach is intended to generate text faster than token-by-token decoding.
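The control flow described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not the library's implementation: the `MASK` placeholder, `toy_denoise_step`, and `block_autoregressive_generate` are hypothetical names, the "denoiser" copies tokens from a known reference instead of predicting them, and a plain Python list stands in for the KV cache.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a position that is still noised


def toy_denoise_step(canvas, target, rng, reveal_fraction=0.5):
    """Resolve a random subset of the still-masked canvas positions.

    A real denoiser would predict tokens with bidirectional attention over the
    canvas and cross-attention to the cached context. This stand-in copies
    from a known target so that the loop structure can actually be run.
    """
    masked = [i for i, tok in enumerate(canvas) if tok == MASK]
    if not masked:
        return canvas
    n_reveal = max(1, int(len(masked) * reveal_fraction))
    for i in rng.sample(masked, n_reveal):
        canvas[i] = target[i]
    return canvas


def block_autoregressive_generate(prompt, reference, canvas_size=4, seed=0):
    """Generate `reference` one canvas at a time; return tokens and step counts."""
    rng = random.Random(seed)
    context = list(prompt)  # stands in for the KV cache built during prefill
    steps_per_canvas = []
    for start in range(0, len(reference), canvas_size):
        target = reference[start:start + canvas_size]
        canvas = [MASK] * len(target)  # every new canvas starts fully noised
        steps = 0
        while MASK in canvas:  # iterate until the whole canvas is denoised
            canvas = toy_denoise_step(canvas, target, rng)
            steps += 1
        context.extend(canvas)  # finished canvas joins the cached context
        steps_per_canvas.append(steps)
    return context[len(prompt):], steps_per_canvas


tokens, steps = block_autoregressive_generate(
    ["Describe", "the", "image"],
    "a cat sitting on a sunny windowsill".split(),
)
print(tokens)
print(steps)
```

The outer loop is autoregressive over canvases, while the inner loop refines all positions of one canvas together, which is where the parallelism comes from in this picture.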

You can find the model card and checkpoint here.

Usage examples

Although DiffusionGemma is a text diffusion model with a custom generation loop, it shares most of its interface with other models that generate text with .generate(). If you’re using another transformers model in your app, you should be able to replace it directly with this model.

Common caveats:

DiffusionGemma doesn’t accept use_cache; it always uses a KV cache.
Common flags like top_k won’t be supported on release day, but will be added over time if they are compatible with text diffusion.

from transformers import DiffusionGemmaForBlockDiffusion, AutoProcessor


model = DiffusionGemmaForBlockDiffusion.from_pretrained(
    "google/diffusiongemma-26B-A4B-it", device_map="auto",
)
processor = AutoProcessor.from_pretrained("google/diffusiongemma-26B-A4B-it")

messages = [
    {
        "role": "user", "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Set `cache_implementation="static"` in `generate` to trigger `torch.compile`.
# After a warm-up call, compiled generation is much faster.
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output.sequences[0][input_len:], skip_special_tokens=True))
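When swapping DiffusionGemma into an app that already passes keyword arguments to .generate(), the caveats listed earlier suggest filtering out flags the model does not accept. The helper below is a minimal sketch of that idea; `adapt_generate_kwargs` and `UNSUPPORTED_FLAGS` are hypothetical names, not part of transformers, and the set only contains the two flags named in the caveats.

```python
# Hypothetical deny-list for illustration, based only on the caveats above.
# Check the DiffusionGemmaGenerationConfig reference for the flags that are
# actually supported in your installed version.
UNSUPPORTED_FLAGS = frozenset({"use_cache", "top_k"})


def adapt_generate_kwargs(kwargs, unsupported=UNSUPPORTED_FLAGS):
    """Split generate kwargs into those to forward and the names dropped."""
    kept = {k: v for k, v in kwargs.items() if k not in unsupported}
    dropped = sorted(k for k in kwargs if k in unsupported)
    return kept, dropped


legacy_kwargs = {"max_new_tokens": 256, "use_cache": True, "top_k": 50}
kept, dropped = adapt_generate_kwargs(legacy_kwargs)
print(kept)     # {'max_new_tokens': 256}
print(dropped)  # ['top_k', 'use_cache']
```

The filtered dictionary can then be passed on as model.generate(**inputs, **kept). Returning the dropped names makes it easy to log what was ignored rather than silently changing sampling behavior.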

As with other models that generate text, you can pass a streamer to stream the output. Unlike other models, DiffusionGemma generates intermediate drafts before the final text. You can visualize them with TextDiffusionStreamer:

from transformers import TextDiffusionStreamer

# (... copy from the example above, up to the `generate` call)
streamer = TextDiffusionStreamer(tokenizer=processor.tokenizer)
model.generate(**inputs, max_new_tokens=256, streamer=streamer)

DiffusionGemmaTextConfig

class transformers.DiffusionGemmaTextConfig

DiffusionGemmaConfig

class transformers.DiffusionGemmaConfig

DiffusionGemmaGenerationOutput

class transformers.DiffusionGemmaGenerationOutput

DiffusionGemmaGenerationMixin

class transformers.DiffusionGemmaGenerationMixin

adjust_generation_fn

generate

DiffusionGemmaGenerationConfig

class transformers.DiffusionGemmaGenerationConfig

EntropyBoundSamplerConfig

class transformers.EntropyBoundSamplerConfig

EntropyBoundSampler

class transformers.EntropyBoundSampler

accept_canvas

initialize_canvas

renoise_canvas

StableAndConfidentStoppingCriteria

class transformers.StableAndConfidentStoppingCriteria

LinearTemperatureScheduleLogitsProcessor

class transformers.LinearTemperatureScheduleLogitsProcessor

DiffusionGemmaPreTrainedModel

class transformers.DiffusionGemmaPreTrainedModel

_forward_unimplemented

DiffusionGemmaModel

class transformers.DiffusionGemmaModel

forward

DiffusionGemmaEncoderModel

class transformers.DiffusionGemmaEncoderModel

forward

DiffusionGemmaEncoderTextModel

class transformers.DiffusionGemmaEncoderTextModel

forward

DiffusionGemmaDecoderModel

class transformers.DiffusionGemmaDecoderModel

forward

DiffusionGemmaForBlockDiffusion

class transformers.DiffusionGemmaForBlockDiffusion

forward