Google DeepMind's DiffusionGemma Gets a Turbo Boost from NVIDIA: Local AI That Thinks in Blocks
For years, every large language model has worked the same way: it generates one word at a time, with each new word depending on the words before it. This is why AI chat feels like it is typing, word by word, pause by pause.
Google DeepMind just changed this with DiffusionGemma, a model that generates text in a different way. It starts with noise, then refines a whole block of text at once, much as diffusion models generate images.
Instead of generating one word at a time, DiffusionGemma works on up to 256 words at a time. In other words, the model thinks in blocks. This is a big change, and it makes the model much faster for single users.
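To make the block idea concrete, here is a minimal sketch that counts sequential model passes for the two approaches. The function names and the `refine_steps` value of 16 are illustrative assumptions, not published DiffusionGemma settings; only the block size of 256 comes from the description above.

```python
import math

BLOCK_SIZE = 256  # from the article: up to 256 words per block


def autoregressive_passes(n_words: int) -> int:
    """One sequential model pass per generated word."""
    return n_words


def block_passes(n_words: int, block_size: int = BLOCK_SIZE,
                 refine_steps: int = 16) -> int:
    """One pass per refinement step, repeated for each block.

    refine_steps is a hypothetical value chosen for illustration.
    """
    n_blocks = math.ceil(n_words / block_size)
    return n_blocks * refine_steps


if __name__ == "__main__":
    for n in (256, 1024):
        print(n, autoregressive_passes(n), block_passes(n))
```

Under these assumed settings, 1,024 words take 1,024 sequential passes one word at a time but only 64 passes in blocks. Each block pass does more work than a single-word pass, so the pass count is not a direct speed ratio.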
DiffusionGemma is built on Gemma 4, a model with 26 billion parameters that uses only 3.8 billion of them at a time, which keeps it efficient. By combining a diffusion model with Gemma, the new model gets the best of both worlds: the quality of a larger model and the efficiency of a smaller one.
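The two parameter counts above imply that only a small share of the model is in use at any moment. A quick sketch of that arithmetic, using a hypothetical helper named `active_fraction`:

```python
TOTAL_PARAMS = 26e9    # total parameters, from the article
ACTIVE_PARAMS = 3.8e9  # parameters used at a time, from the article


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters in use at any one time."""
    return active / total


if __name__ == "__main__":
    print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

That works out to roughly 15 percent, or about one parameter in seven.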
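The old way of generating text was slow because it was limited by memory speed rather than by raw computing power: every new word required another full pass through the model. The new way is faster because it uses the computer's processing power efficiently. DiffusionGemma generates a whole block of text at once, which is the kind of parallel work this hardware is good at.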
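One way to picture the memory limit is to estimate how many bytes of model weights have to be read per generated word. The sketch below is a back-of-the-envelope model, not a measurement. It assumes every pass reads all active weights once, that weights take 2 bytes per parameter, and that a block needs 16 refinement passes. The last two values are assumptions chosen for illustration, and the model ignores all other memory traffic.

```python
ACTIVE_PARAMS = 3.8e9  # parameters used at a time, from the article
BYTES_PER_PARAM = 2    # assumption: 16-bit weights
BLOCK_SIZE = 256       # from the article
REFINE_STEPS = 16      # assumption: hypothetical passes per block


def weight_bytes_per_word(words: int, passes: int) -> float:
    """Weight bytes read across `passes` passes, divided by words produced."""
    return ACTIVE_PARAMS * BYTES_PER_PARAM * passes / words


def one_word_at_a_time() -> float:
    # One pass yields one word.
    return weight_bytes_per_word(words=1, passes=1)


def in_blocks() -> float:
    # REFINE_STEPS passes yield one block of BLOCK_SIZE words.
    return weight_bytes_per_word(words=BLOCK_SIZE, passes=REFINE_STEPS)


if __name__ == "__main__":
    print(f"one at a time: {one_word_at_a_time() / 1e9:.3f} GB per word")
    print(f"in blocks:     {in_blocks() / 1e9:.3f} GB per word")
```

With these assumed values, the weight traffic per word drops from about 7.6 GB to about 0.475 GB, a 16-fold reduction. The real ratio depends on details the article does not give.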
The numbers show how much faster DiffusionGemma is. On an NVIDIA H100 Tensor Core GPU, it can generate 1,000 words per second. On an NVIDIA DGX Spark, it can generate 150 words per second. On an NVIDIA DGX Station, it can generate up to 2,000 words per second. That is 4 times faster than the old way.
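To put those rates in everyday terms, the sketch below converts them into the time needed for a single response. The 500-word response length is an arbitrary assumption, and the DGX Station figure is a best case because the quoted rate is "up to" 2,000 words per second.

```python
# Throughput figures quoted in the article, in words per second.
WORDS_PER_SECOND = {
    "NVIDIA H100": 1000,
    "NVIDIA DGX Spark": 150,
    "NVIDIA DGX Station": 2000,  # quoted as "up to"
}


def seconds_to_generate(n_words: int, words_per_second: float) -> float:
    """Time to produce n_words at a steady rate."""
    return n_words / words_per_second


if __name__ == "__main__":
    for name, rate in WORDS_PER_SECOND.items():
        print(f"{name}: {seconds_to_generate(500, rate):.2f} s")
```

At these rates, a 500-word response takes about half a second on the H100, a little over three seconds on the DGX Spark, and a quarter of a second on the DGX Station.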
DiffusionGemma is free and open. It can run on local hardware without needing the cloud, including GeForce RTX GPUs, NVIDIA RTX PRO 6000 workstations, DGX Spark, and DGX Station.
The change from the old way to DiffusionGemma is a big deal. It means that AI models can work faster and feel more instantaneous. NVIDIA and Google DeepMind have shown that the future of AI is not about making bigger models. It is about making models that work well with the hardware. DiffusionGemma is available to try out.
DiffusionGemma is a big change for developers, researchers, and AI enthusiasts. It means they can run AI models on their own hardware without waiting for the cloud. This makes it easier to work with AI and build things. The future of AI is looking brighter with DiffusionGemma.
Some of the features of DiffusionGemma include:
* Generating text in blocks, not one word at a time
* Working on up to 256 words at a time
* Being built on Gemma 4
* Having the quality of a larger model and the efficiency of a smaller model
* Being free and open
* Being able to run on local hardware
DiffusionGemma is a new way of generating text. It is faster and more efficient than the old way, and it is going to make a big difference in the world of AI.