How to Get Your Data Center Ready for AI? Part One: Advanced Cooling
by GIGABYTE
The proliferation of artificial intelligence has led to the broader adoption of innovative technology, such as advanced cooling and cluster computing, in data centers around the world. Specifically, the rollout of powerful AI processors with ever higher TDPs has made it all but mandatory for data centers to upgrade or even retrofit their infrastructure to utilize more energy-efficient and cost-effective cooling. In part one of GIGABYTE Technology’s latest Tech Guide, we explore the industry’s most advanced cooling solutions so you can evaluate whether your data center can leverage them to get ready for the era of AI.
Industry insiders familiar with the natural progression of the modern data center will appreciate that the current artificial intelligence (AI) trend has accelerated rather than altered the evolution of this bedrock of the information technology (IT) world. As technology progresses, it is natural for server processors to grow more powerful and draw more power, so the predominant method of cooling them with air conditioning was always going to run into a wall. The advent of AI has spurred the adoption of state-of-the-art CPUs and GPUs characterized by unprecedented thermal design power (TDP). As a result, the inevitable day when air cooling is no longer enough may come sooner rather than later.
Already, the most advanced AI chips on the market are barreling past the thermal limits of air cooling. At the time of writing, the average power density of an air-cooled server rack is well below 20 kilowatts (kW). A single NVIDIA H100 accelerator has a maximum TDP of up to 700 watts, while the next generation of enterprise GPUs, such as the NVIDIA B100 and B200, may have a maximum TDP of 1,000 watts. Take GIGABYTE’s G593-ZD1 AI Server as an example: a single 5U server can house eight GPUs, so its GPUs alone can draw around 5.6 kW. In other words, even a moderately populated rack (which offers 42 to 48 rack units of space) will generate more heat than what’s generally considered cost-effective for air cooling, as the quick estimate below shows. It should therefore come as no surprise that NVIDIA’s rack-scale AI supercomputer, the GB200 NVL72, was designed from the get-go to be deployed with liquid cooling.
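To put those figures in perspective, here is a back-of-the-envelope estimate of the heat load of a rack filled with such servers. This is a rough sketch: the non-GPU power figure is an assumed round number for illustration, not a G593-ZD1 specification.

```python
# Rough estimate of the heat load of a rack populated with 8-GPU AI servers.
# All figures are illustrative; actual system power varies by configuration.

GPU_TDP_W = 700          # e.g., an NVIDIA H100 SXM accelerator
GPUS_PER_SERVER = 8      # e.g., a 5U server such as the G593-ZD1
OTHER_LOAD_W = 2000      # assumed draw of CPUs, memory, storage, fans
SERVER_HEIGHT_U = 5
RACK_HEIGHT_U = 42

server_power_w = GPU_TDP_W * GPUS_PER_SERVER + OTHER_LOAD_W  # 7,600 W
servers_per_rack = RACK_HEIGHT_U // SERVER_HEIGHT_U          # 8 servers
rack_power_kw = server_power_w * servers_per_rack / 1000     # 60.8 kW

print(f"Per server: {server_power_w / 1000:.1f} kW")
print(f"Fully populated rack: {rack_power_kw:.1f} kW")
# Even at half capacity (~30 kW), the rack already exceeds the ~20 kW
# practical ceiling commonly associated with air cooling.
```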
If the AI wave is the “push” that’s motivating data center operators to evaluate new thermal management tools, the “pull” must be the benefits of utilizing such tools, which can be summarized as upgrades to “performance”, “sustainability”, and “cost-efficiency”.
● Performance The most sophisticated chip in the world won’t be able to unleash its full potential if it keeps getting too hot. When processors run at peak capacity, the heat they generate must be quickly dissipated to avoid overheating and throttling. Effective cooling ensures maximum performance as well as stable operations.
● Sustainability Cranking up the AC for better cooling is a double-edged sword because it also consumes more energy, which hampers the data center’s power usage effectiveness (PUE) while ballooning the carbon footprint. The strength of advanced cooling is that it can dissipate more heat with less energy, allowing organizations to ramp up productivity while lowering emissions and moving toward ESG targets.
● Cost-efficiency Procuring new cooling equipment may take a chunk out of the IT budget, but the smaller power bill will help decrease operating expenses (OpEx) in the long run. Improved operational stability also means less downtime and fewer components to repair or replace, which reduces total cost of ownership (TCO). Greater cost-efficiency is a competitive edge that no organization can afford to pass up.
The three benefits listed above all but guarantee that advanced cooling will become a mainstay of data centers and server rooms around the world, regardless of whether AI remains the driving force behind its adoption. In the next sections, we will discuss the three primary cooling methods: direct liquid cooling, immersion cooling, and air cooling. We will also explain how GIGABYTE can help you incorporate them into your IT infrastructure.
Direct Liquid Cooling (DLC) or Direct-to-Chip (D2C) Liquid Cooling
There is no doubt that liquid cooling, which pipes liquid coolant into servers through cooling loops and absorbs heat from key components through cold plates, is gaining traction faster than ever thanks to the demands of AI processors. As already stated, many AI supercomputers are designed with liquid cooling in mind. As data center operators, including major cloud service providers (CSPs), upgrade their infrastructure to adopt liquid cooling, it may only be a matter of time before it becomes the new industry standard.
Adding DLC to your data center may be easier than you think. You may start with the “liquid-to-air” option, which doesn’t require a significant overhaul of the traditional air cooling infrastructure. Hot coolant coming out of the servers is chilled by coolant distribution units (CDUs), which expel the heat into the hot aisle just as air-cooled server racks do. The “liquid-to-liquid” variant can dissipate more heat, but it requires the facility to have a built-in liquid-based cooling loop that draws water from an on-site source. A Rear Door Heat Exchanger (RDHx) can also be added to the back of the server rack for greater energy efficiency.
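To see why liquid is such an effective heat carrier, consider the standard sensible-heat relation Q = ṁ × c × ΔT, which relates heat load, coolant mass flow, specific heat, and temperature rise. The sketch below estimates the water flow a cooling loop would need for a given rack load; the rack power and temperature rise are assumed values for illustration, not specifications of any particular GIGABYTE product.

```python
# Estimate the coolant flow rate needed to remove a given heat load,
# using Q = m_dot * c_p * delta_T (sensible heat transfer).
# All inputs are illustrative assumptions.

RACK_HEAT_LOAD_W = 60_000      # assumed ~60 kW liquid-cooled rack
COOLANT_CP_J_PER_KG_K = 4186   # specific heat of water
DELTA_T_K = 10                 # assumed coolant temperature rise across the loop
WATER_DENSITY_KG_PER_L = 1.0

mass_flow_kg_s = RACK_HEAT_LOAD_W / (COOLANT_CP_J_PER_KG_K * DELTA_T_K)
flow_l_per_min = mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60

print(f"Required flow: {mass_flow_kg_s:.2f} kg/s ≈ {flow_l_per_min:.0f} L/min")
# Roughly 86 L/min of water carries away 60 kW at a 10 K rise. Moving the
# same heat with air would take a vastly larger volume, since air's
# volumetric heat capacity is about 3,500 times lower than water's.
```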
Since reliability and safety are critically important when implementing liquid cooling, GIGABYTE works closely with our verified partners to offer a complete solution. From the cold plates inside the servers, to quality-assured connectors supplied by the industry veteran Stäubli, to leak sensor boards for additional protection against coolant leakage, to manifolds and CDUs at the rack level, you won’t have to shop around when you choose GIGABYTE for DLC. What’s more, not only does GIGABYTE boast a complete lineup of liquid-cooled servers, but we also offer racks such as the DL90-ST0 for fast and easy deployment. The star product is GIGABYTE’s GIGAPOD, a multi-rack cluster computing solution that bolsters the performance of AI chips through liquid cooling.
Immersion Cooling: The Logical Next Step After DLC
For data center operators who want to plan ahead and future-proof their infrastructure, immersion cooling is the logical next step after liquid cooling. By submerging the servers directly in a bath of nonconductive coolant, heat from the components can be removed through a CDU (in single-phase immersion cooling) or through the natural vaporization of the coolant inside the tank (in two-phase immersion cooling). Immersion cooling can lower PUE to 1.02, meaning only 2% more power than what’s used for computing is needed for cooling. However, the barrier to entry is that many aspects of the data center, from the physical infrastructure to safety permits, must be specially prepared for immersion cooling.
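To make that figure concrete, recall that PUE is defined as total facility power divided by IT equipment power, so a PUE of 1.02 means the facility draws only 2% more power than the IT load itself. The comparison below is a quick sketch; the PUE values for air and liquid cooling are ballpark industry figures, not measurements of any specific facility.

```python
# PUE = total facility power / IT equipment power.
# The PUE values below are ballpark industry figures for comparison.

IT_LOAD_KW = 1000  # assumed 1 MW of IT equipment

for method, pue in [("traditional air cooling", 1.5),
                    ("direct liquid cooling", 1.1),
                    ("immersion cooling", 1.02)]:
    total_kw = IT_LOAD_KW * pue
    overhead_kw = total_kw - IT_LOAD_KW
    print(f"{method}: {total_kw:.0f} kW total, {overhead_kw:.0f} kW overhead")
# immersion cooling: 1020 kW total, 20 kW overhead -> only 2% extra power.
```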
If immersion cooling is on your IT roadmap, there are two excellent reasons to choose GIGABYTE as your partner. One, GIGABYTE has a comprehensive product line that encompasses everything from immersion-ready servers to both single-phase and two-phase immersion tanks, available for EIA or OCP servers. GIGABYTE and our partners also supply accessories such as an IT lift for hoisting servers vertically out of the tanks, IT dry racks to hold servers during maintenance, and verified coolant products for submerging the servers.
Two, GIGABYTE has real experience in helping our customers set up immersion cooling in their data centers. To list a couple of examples, the Japanese telco giant KDDI opted for single-phase immersion for its “container-type immersion cooling small data centers”, while one of Taiwan’s foremost semiconductor giants chose two-phase immersion for its “green HPC data centers”. These success cases show that industry pioneers recognize immersion cooling as the best option for maximizing compute performance, sustainability, and cost-efficiency. They also show that GIGABYTE has the real-world expertise necessary to deploy immersion cooling in different vertical sectors.
Better Air Cooling with GIGABYTE’s Proprietary Design and RDHx
While we absolutely recommend that you consider liquid or immersion cooling to prepare for AI workloads, the transition can understandably be complicated, so air-cooled servers may remain your primary source of compute for the foreseeable future. Fortunately, GIGABYTE’s air-cooled solutions are designed with top performance and optimal cooling in mind.
GIGABYTE’s air-cooled servers feature a proprietary airflow-friendly hardware design that delivers optimized ventilation, validated with thermal simulation software. Powerful heat sinks are combined with specially designed air ducts to enhance heat dissipation. An automatic fan speed control system, linked to sensors strategically placed inside the chassis, adjusts fan speed according to the temperature of key components, allowing for unrivaled thermal control while remaining cost-effective and energy-efficient. An RDHx can also be installed on air-cooled server racks to improve the data center’s performance and PUE.
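As a simplified illustration of how such a sensor-driven control loop works, here is a minimal sketch of a temperature-to-fan-speed curve. This is not GIGABYTE’s actual firmware; the temperature thresholds and duty-cycle values are hypothetical.

```python
# Minimal sketch of sensor-driven fan speed control (hypothetical values;
# real server firmware uses per-zone curves tuned with thermal simulation).

FAN_CURVE = [  # (component temperature in °C, fan duty cycle in %)
    (40, 20),
    (60, 45),
    (75, 70),
    (85, 100),
]

def fan_duty(temp_c: float) -> int:
    """Map a sensor reading to a fan duty cycle, interpolating linearly
    between the points of the fan curve."""
    if temp_c <= FAN_CURVE[0][0]:
        return FAN_CURVE[0][1]
    for (t0, d0), (t1, d1) in zip(FAN_CURVE, FAN_CURVE[1:]):
        if temp_c <= t1:
            return round(d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0))
    return FAN_CURVE[-1][1]  # at or above the top of the curve: full speed

for reading in (35.0, 55.0, 72.0, 90.0):
    print(f"{reading:.0f} °C -> {fan_duty(reading)}% duty")
```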
Thank you for reading GIGABYTE’s Tech Guide on “How to Get Your Data Center Ready for AI? Part One: Advanced Cooling”. We hope this article has been helpful and informative. To read on to "Part Two: Cluster Computing", click here. For further consultation on how you can incorporate advanced cooling in your data center, we welcome you to reach out to our representatives at marketing@gigacomputing.com.