Microsoft-Backed Startup d-Matrix Launches Innovative AI Processor to Revolutionize Inference Without GPUs

The Santa Clara-based startup has announced its new Corsair processor, which it says will revolutionize AI inference without requiring traditional GPUs or pricey high-bandwidth memory.

The startup is backed by Microsoft, which provides support for its AI inference work.

Corsair achieves 60,000 tokens per second for Llama3 8B models and 30,000 tokens per second for the larger Llama3 70B models, with much better energy and cost efficiency than traditional GPU solutions.

Key Background:

d-Matrix Inc., a Silicon Valley-based hardware startup, today announced Corsair, which it describes as the industry's first AI processor of its kind. Backed by Microsoft, the chip is positioned to redefine AI inference, promising performance gains and cost savings over traditional GPUs paired with expensive high-bandwidth memory.

Corsair's architecture is designed to optimize generative AI models for the most demanding inference workloads. On the processor, Llama3 8B models reach 60,000 tokens per second at 1 millisecond per token. Even heavier workloads, such as Llama3 70B models, can be served from a single server rack at 30,000 tokens per second and 2 milliseconds per token. These figures translate into lower operational and energy costs and challenge the reliance on the GPU-based solutions that have traditionally dominated AI inference.
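For context, the published throughput and latency figures imply a certain level of batching: multiplying aggregate tokens per second by per-token latency gives the number of streams that must be served concurrently. The short sketch below does that arithmetic; the concurrency values are an inference from d-Matrix's numbers, not a figure the company has published.

```python
# Back-of-the-envelope check: throughput (tokens/s) x per-token latency (s)
# gives the implied number of concurrently served streams (Little's law).
# The throughput/latency figures are d-Matrix's published Corsair numbers;
# the concurrency is inferred here, not claimed by the company.

def implied_concurrency(tokens_per_second: float, latency_ms_per_token: float) -> float:
    return tokens_per_second * (latency_ms_per_token / 1000.0)

for model, tps, latency_ms in [("Llama3 8B", 60_000, 1.0),
                               ("Llama3 70B", 30_000, 2.0)]:
    streams = implied_concurrency(tps, latency_ms)
    print(f"{model}: ~{streams:.0f} concurrent streams")
```

In both cases the numbers work out to roughly 60 simultaneous streams per rack, which is consistent with the company's emphasis on multi-user, interactive serving.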

The Corsair processor uses an advanced chiplet-based design that combines compute and memory for maximum efficiency. It is built from Nighthawk and Jayhawk II tiles manufactured on a 6nm process. Each Nighthawk tile contains four neural cores and a RISC-V CPU, optimized for digital in-memory compute (DIMC). The cores support wide data types such as block floating point (BFP), enabling large-scale model inference. Corsair is packaged as a PCIe Gen5 full-height, full-length card and scales further through DMX Bridge cards. Each Corsair unit delivers 2,400 TFLOPs of peak compute at 8-bit precision, with 2GB of integrated memory and up to 256GB of off-chip memory.
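Block floating point, the data type mentioned above, stores one shared exponent per block of values and keeps only small integer mantissas per element, which is what lets in-memory compute designs cut memory traffic relative to full floating point. The sketch below is a generic illustration of that format, not d-Matrix's actual implementation; the block size and mantissa width are arbitrary choices made here for demonstration.

```python
import numpy as np

def bfp_quantize(x: np.ndarray, block_size: int = 16, mantissa_bits: int = 8):
    """Toy block-floating-point encoder: one shared exponent per block of
    values, signed integer mantissas per element. Illustrative only."""
    x = x.reshape(-1, block_size)
    # Shared exponent picked from the largest magnitude in each block.
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30)).astype(int)
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    mant = np.clip(np.round(x / scale),
                   -(2 ** (mantissa_bits - 1)),
                   2 ** (mantissa_bits - 1) - 1).astype(np.int8)
    return mant, scale

def bfp_dequantize(mant: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return mant.astype(np.float32) * scale

vals = np.random.randn(64).astype(np.float32)
mant, scale = bfp_quantize(vals)
err = np.abs(bfp_dequantize(mant, scale).ravel() - vals).max()
print(f"max reconstruction error: {err:.4f}")
```

The point of the format is that each element costs only a few mantissa bits plus a small amortized share of the block's exponent, while dequantization stays a cheap integer-times-scale multiply.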

The processor's design is optimized for transformer-based models, agentic AI, and interactive video generation, areas of AI where computationally efficient solutions are paramount. Its high-speed token generation for interactive applications brings multi-user generative AI closer to commercial viability.

Corsair is already available to early-access customers and will be generally available in the second quarter of 2025. Notably, the company has partnered with Micron Technology, which is also a key partner of Nvidia, to boost the capabilities of its processors. According to Sid Sheth, co-founder and CEO of d-Matrix, "Our goal at d-Matrix is to help bridge the increasingly urgent need for large-scale generative AI and transformers, positioning Corsair as a next-generation solution for AI inference."