Google DeepMind's DiffusionGemma Gets a Turbo Boost from NVIDIA: Local AI That Thinks in Blocks

Elena
June 11, 2026

For years, every large language model has worked the same way: it generates one word at a time, with each new word depending on the words before it. This is why AI chat feels like it is typing, word by word, pause by pause.

Google DeepMind just changed this. It released DiffusionGemma, a model that generates text in a different way. It starts with noise, then refines a whole block of text at once, much as diffusion models generate images.
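To make the idea of refining a whole block concrete, here is a toy sketch in Python. It is not DiffusionGemma's actual algorithm: the function `refine_block`, the `[noise]` placeholder and the number of steps are all assumptions made for illustration, and the sketch copies words from a known answer where a real model would predict them.

```python
import random

MASK = "[noise]"  # placeholder for a position that has not been refined yet


def refine_block(target_words, steps, seed=0):
    """Toy refinement loop: fill in a whole block over a fixed number of steps.

    A real diffusion model would predict the words; this sketch simply copies
    them from target_words so the loop structure is easy to see.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    block = [MASK] * len(target_words)
    pending = list(range(len(target_words)))
    rng.shuffle(pending)  # positions are resolved in no particular order
    per_step = -(-len(target_words) // steps)  # ceiling division
    snapshots = [list(block)]
    for _ in range(steps):
        # Every step touches the whole block, not just the next position.
        for pos in pending[:per_step]:
            block[pos] = target_words[pos]
        pending = pending[per_step:]
        snapshots.append(list(block))
    return snapshots


words = "the whole block is refined at the same time".split()
for snapshot in refine_block(words, steps=3):
    print(" ".join(snapshot))
```

Each printed line is the same block at a later step. The point to notice is that words appear across the whole block rather than strictly left to right.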

DiffusionGemma does not generate one word at a time. It works on up to 256 words at a time, so the model thinks in blocks. This is a big change, and it makes the model much faster for single users.
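A rough way to see why blocks help is to count how many times the model has to run in sequence. The sketch below uses the 256-word block size from above. The number of refinement steps per block is a made-up value, since the article does not give one, and the function names are hypothetical.

```python
import math

BLOCK_SIZE = 256  # words per block, as stated in the article


def word_by_word_passes(num_words):
    """Word-by-word generation needs one sequential model pass per word."""
    return num_words


def block_passes(num_words, refine_steps):
    """Block generation needs refine_steps passes for each block of words."""
    num_blocks = math.ceil(num_words / BLOCK_SIZE)
    return num_blocks * refine_steps


num_words = 1024
refine_steps = 32  # assumption for illustration only, not a published figure
print("word by word:", word_by_word_passes(num_words), "passes")
print("in blocks:   ", block_passes(num_words, refine_steps), "passes")
```

With these assumed values the block approach runs the model fewer times in sequence. Each pass does more work, though, so the real speedup depends on the hardware and on how many refinement steps the model actually needs.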

DiffusionGemma is built on Gemma 4. Gemma 4 is a model with 26 billion parameters, but it only uses 3.8 billion of them at a time, which makes it efficient to run. By combining a diffusion model with Gemma, the new model gets the best of both worlds: the quality of a large model and the efficiency of a smaller one.
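The two parameter counts can be turned into a quick ratio. This small Python sketch uses only the figures quoted above; the function name is illustrative.

```python
TOTAL_PARAMS = 26e9    # total parameters, from the article
ACTIVE_PARAMS = 3.8e9  # parameters used at a time, from the article


def active_fraction(active, total):
    """Share of the model's parameters that is in use at any one time."""
    return active / total


share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share of parameters: {share:.1%}")
```

In other words, roughly one parameter in seven is in use at any moment.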

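The old way of generating text was slow because it was limited by memory: the processor spends much of its time waiting for data instead of computing. The new way is faster because it uses the computer's processing power more efficiently. DiffusionGemma generates a whole block of text at once, and doing many calculations in parallel is what GPUs are good at.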

The numbers show how much faster DiffusionGemma is. On an NVIDIA H100 Tensor Core GPU it can generate 1,000 words per second. On an NVIDIA DGX Spark it can generate 150 words per second. On an NVIDIA DGX Station it can generate up to 2,000 words per second. This is 4 times faster than the old way.
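To put those rates in everyday terms, the sketch below converts them into waiting time for a single answer. The rates are the ones quoted above. The 600-word answer length is a hypothetical value chosen for illustration.

```python
# Words-per-second figures quoted in the article.
WORDS_PER_SECOND = {
    "NVIDIA DGX Spark": 150,
    "NVIDIA H100 Tensor Core GPU": 1000,
    "NVIDIA DGX Station": 2000,
}

ANSWER_LENGTH = 600  # hypothetical answer length in words


def seconds_to_generate(num_words, words_per_second):
    """Time needed to produce num_words at a steady generation rate."""
    return num_words / words_per_second


for device, rate in WORDS_PER_SECOND.items():
    seconds = seconds_to_generate(ANSWER_LENGTH, rate)
    print(f"{device}: {seconds:.1f} s for {ANSWER_LENGTH} words")
```

This assumes the rate stays constant for the whole answer, which real systems only approximate.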

DiffusionGemma is free and open. It can run on local hardware without needing the cloud, including GeForce RTX GPUs, NVIDIA RTX PRO 6000 workstations, DGX Spark and DGX Station.

The change from the old way to DiffusionGemma is a big deal. It means that AI models can work faster and feel more instantaneous. NVIDIA and Google DeepMind have shown that the future of AI is not about making bigger models. It is about making models that work well with the hardware. DiffusionGemma is available to try out.

DiffusionGemma is a change for developers, researchers and AI enthusiasts. It means they can run AI models on their own hardware without waiting for the cloud. This makes it easier to work with AI and build things. The future of AI is looking brighter with DiffusionGemma.

Some of the features of DiffusionGemma include:

* Generating text in blocks, not one word at a time

* Working on up to 256 words at a time

* Being built on Gemma 4

* Having the quality of a large model and the efficiency of a smaller model

* Being free and open

* Being able to run on local hardware

DiffusionGemma is a new way of generating text. It is faster and more efficient than the old way, and it is going to make a big difference in the world of AI.
