OpenAI’s ChapGPT, Microsoft’s Copilot, Google’s Bard, and latest Elon Musk’s TruthGPT – what will be the next buzzword for AI? In just under six months, the AI competition has heated up, stirring up ripples in the once-calm AI server market, as AI-generated content (AIGC) models take center stage.
The convenience unprecedentedly brought by AIGC has attracted a massive number of users, with OpenAI’s mainstream model, GPT-3, receiving up to 25 million daily visits, often resulting in server overload and disconnection issues.
Given the evolution of these models has led to an increase in training parameters and data volume, making computational power even more scarce, OpenAI has reluctantly adopted measures such as paid access and traffic restriction to stabilize the server load.
High-end Cloud Computing is gaining momentum
According to Trendforce, AI servers currently have a merely 1% penetration rate in global data centers, which is far from sufficient to cope with the surge in data demand from the usage side. Therefore, besides optimizing software to reduce computational load, increasing the number of high-end AI servers in hardware will be another crucial solution.
Take GPT-3 for instance. The model requires at least 4,750 AI servers with 8 GPUs for each, and every similarly large language model like ChatGPT will need 3,125 to 5,000 units. Considering ChapGPT and Microsoft’s other applications as a whole, the need for AI servers is estimated to reach some 25,000 units in order to meet the basic computing power.
As the emerging applications of AIGC and its vast commercial potential have both revealed the technical roadmap moving forward, it also shed light on the bottlenecks in the supply chain.
The down-to-earth problem: cost
Compared to general-purpose servers that use CPUs as their main computational power, AI servers heavily rely on GPUs, and DGX A100 and H100, with computational performance up to 5 PetaFLOPS, serve as primary AI server computing power. Given that GPU costs account for over 70% of server costs, the increase in the adoption of high-end GPUs has made the architecture more expansive.
Moreover, a significant amount of data transmission occurs during the operation, which drives up the demand for DDR5 and High Bandwidth Memory (HBM). The high power consumption generated during operation also promotes the upgrade of components such as PCBs and cooling systems, which further raises the overall cost.
Not to mention the technical hurdles posed by the complex design architecture – for example, a new approach for heterogeneous computing architecture is urgently required to enhance the overall computing efficiency.
The high cost and complexity of AI servers has inevitably limited their development to only large manufacturers. Two leading companies, HPE and Dell, have taken different strategies to enter the market:
With the booming market for AIGC applications, we seem to be one step closer to a future metaverse centered around fully virtualized content. However, it remains unclear whether the hardware infrastructure can keep up with the surge in demand. This persistent challenge will continue to test the capabilities of cloud server manufacturers to balance cost and performance.
(Photo credit: Google)