Meta on April 10 revealed the latest product of its chip development efforts, a day after Intel introduced its own AI accelerator hardware.
Dubbed the “next-gen” Meta Training and Inference Accelerator (MTIA), this chip serves as the successor to last year’s MTIA v1. Its capabilities include running models for tasks such as ranking and recommending display advertisements across Meta-owned platforms like Facebook.
The next-generation MTIA represents a significant step forward for Meta's chip technology. Unlike its predecessor, which was built on a 7nm manufacturing process, the new MTIA uses a more advanced 5nm process, meaning the components on the chip are smaller and more densely packed, which allows for improved performance.
Physically, the next-gen MTIA is larger than its predecessor and has more processing cores. While it draws considerably more power, 90W versus MTIA v1's 25W, it also doubles the internal memory (128MB versus 64MB) and runs at a higher average clock speed (1.35GHz, up from 800MHz).
According to Meta, the next-gen MTIA is currently operational in 16 of its data center regions and demonstrates up to three times better overall performance compared to MTIA v1.
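Putting the disclosed figures side by side suggests the trade-off Meta is making. The sketch below is a back-of-envelope comparison using only the numbers above; the 3x multiplier is Meta's own "up to" claim, and treating it as a flat scalar is a simplifying assumption.

```python
# Figures Meta disclosed for MTIA v1 and the next-gen MTIA.
# "perf" is normalized overall performance; 3.0 reflects Meta's
# "up to three times" claim, taken here at its upper bound.
V1 = {"power_w": 25, "sram_mb": 64, "clock_ghz": 0.80, "perf": 1.0}
V2 = {"power_w": 90, "sram_mb": 128, "clock_ghz": 1.35, "perf": 3.0}

power_ratio = V2["power_w"] / V1["power_w"]  # how much more power v2 draws
perf_per_watt = (V2["perf"] / V2["power_w"]) / (V1["perf"] / V1["power_w"])

print(f"Power increase:       {power_ratio:.1f}x")   # 3.6x
print(f"Raw performance:      {V2['perf']:.0f}x")    # 3x (Meta's claim)
print(f"Performance per watt: {perf_per_watt:.2f}x") # ~0.83x, i.e. below v1
```

Even at the top of Meta's claimed range, the new chip delivers roughly 0.83x the performance per watt of v1, so the gains come from spending more power per chip rather than from raw efficiency.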
“Because we control the whole stack, we can achieve greater efficiency compared to commercially available GPUs,” Meta wrote in a blog post.
In a bid to keep pace with its competitors in the field of generative AI, Meta is pouring billions of dollars into its own AI initiatives.
A significant portion of this investment is directed towards recruiting top AI researchers. However, the lion’s share of these funds is allocated to the development of hardware, particularly chips designed to power and train Meta’s AI models.
Unexpected Timing and Challenges
Meta’s hardware unveiling comes as something of a surprise, arriving just 24 hours after a press briefing on the company’s ongoing generative AI work. The announcement is notable for several reasons.
Firstly, Meta’s blog post reveals that contrary to expectations, the next-gen MTIA is not currently utilized for generative AI training tasks, although the company asserts it is actively exploring this avenue through various programs.
Secondly, Meta acknowledges that the next-gen MTIA is not intended to replace GPUs for running or training models; rather, it is meant to complement them.
Reading between the lines, it appears that Meta’s progress in the AI race may be slower than anticipated.
Meta’s AI teams are almost certainly under pressure to cut costs. The company is set to spend an estimated $18 billion by the end of 2024 on GPUs for training and running generative AI models, and — with training costs for cutting-edge generative models ranging in the tens of millions of dollars — in-house hardware presents an attractive alternative.
And while Meta’s hardware drags, rivals are pulling ahead, much to the consternation of Meta’s leadership.
This week, Google introduced its fifth-generation custom chip for training AI models, TPU v5p, now generally available to Google Cloud customers. Google also launched Axion, its first chip designed specifically for running models.
Amazon has various custom AI chip families, and Microsoft joined the competition last year with the Azure Maia AI Accelerator and the Azure Cobalt 100 CPU.
Meta stated that it took less than nine months to go from first silicon to production for the next-generation MTIA, a shorter window than is typical for Google’s TPUs.
However, Meta still has significant ground to cover in order to reduce its reliance on third-party GPUs and stay competitive with its rivals.