A Revolutionary AI Chip
Boost AI speed by 8x using 98% less energy. Discover Onyx, the first chip that fully leverages sparsity for efficient AI.
For artificial intelligence models, scale matters.
Although some AI experts warn that performance gains from scaling large language models (LLMs) are diminishing, companies continue to release ever-larger models. Meta’s latest Llama model boasts a staggering 2 trillion parameters.
As models grow in scale, their capabilities increase, but so do their energy consumption and runtime, and with them their carbon emissions. To mitigate these costs, practitioners have turned to smaller, less capable models and to lower-precision parameters wherever accuracy allows.
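To make the lower-precision idea concrete, here is a minimal Python sketch (not from the article) that maps float32 weights onto 8-bit integers with a single scale factor. Real quantization schemes are more elaborate, with per-channel scales and calibration, but the memory arithmetic is the same: one byte per weight instead of four.

```python
import numpy as np

# Illustrative sketch: quantize float32 weights to int8 with one scale factor.
w = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

scale = np.abs(w).max() / 127.0               # map the largest weight to +/-127
w_int8 = np.round(w / scale).astype(np.int8)  # 1 byte per weight instead of 4
w_restored = w_int8.astype(np.float32) * scale

print(f"memory: {w.nbytes} -> {w_int8.nbytes} bytes")         # 4x smaller
print(f"max rounding error: {np.abs(w - w_restored).max():.4f}")
```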
But there is another approach, one that preserves the high performance of massive models while cutting runtime and energy consumption. The key lies in taking advantage of the zeros inside large AI models.
In many models, most of the numbers involved in computation (the weights and activations) are zero, or close enough to zero that they can be treated as zero without any loss of accuracy. This property is called sparsity, and it offers an excellent opportunity to save computational resources: there is no need to waste time and energy adding or multiplying zeros; those operations can simply be skipped. Nor is there any need to store large numbers of zeros in memory; only the non-zero values need to be stored.
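As a software analogy of what the article describes (a chip like Onyx does this in hardware), the sketch below stores only the non-zero weights of a matrix using the standard compressed sparse row (CSR) layout and multiplies by a vector while skipping every zero. All function names and sizes here are illustrative.

```python
import numpy as np

def dense_to_csr(w, threshold=0.0):
    """Keep only the non-zero weights: their values, their column
    indices, and the offset at which each row's entries start."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        for j, x in enumerate(row):
            if abs(x) > threshold:        # near-zero entries count as zero
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Matrix-vector product that performs arithmetic only on stored
    (non-zero) weights; every multiplication by zero is skipped."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# A 256x256 weight matrix in which roughly 90% of the entries are zero.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)) * (rng.random((256, 256)) < 0.1)
x = rng.standard_normal(256)

values, col_idx, row_ptr = dense_to_csr(w)
print(f"stored weights: {len(values)} of {w.size}")           # roughly 10%
assert np.allclose(csr_matvec(values, col_idx, row_ptr, x), w @ x)
```

With 90 percent of the weights zeroed out, the CSR version stores and touches roughly a tenth of the entries while producing the same result as the dense product, which is exactly the saving sparsity promises.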