Data Center & Open Compute
Digital transformation (DX) and green transformation (GX) are two major trends that modern society is addressing. Data centers, the infrastructure that supports the digital society, play an extremely large role in pressing forward with both DX and GX (Fig. 1).
Putting DX into practice requires society as a whole to share vast storage and computing resources: the diverse, enormous volumes of data generated in daily life, business, and other settings must be aggregated, and artificial intelligence (AI) and other technologies used to extract the knowledge that leads to solving issues. GX likewise demands meticulous management and control of the movements of people, things, devices, and equipment in order to discover and eliminate the wasteful energy consumption hidden in daily life, production activities, logistics and transportation, and other social activities. Data centers carry out the information processing essential to both of these transformations.
Data centers are thus essential for putting DX and GX into practice. However, there is a dilemma to be solved: power consumption is rising rapidly as data center capabilities improve and sites expand, putting the sustainability of data centers themselves at risk. This article explains technologies for reducing power consumption in data centers and the effective heat dissipation technologies that are a prerequisite for further improving their capabilities.
The increase in power consumption by data centers has already surfaced as a social problem. According to an investigation published by a global statistical data company, there were more than 9,000 data centers around the world as of March 2024, and the number being established is expected to rise steadily. Total annual power consumption has been doubling roughly every four years as the number of data centers increases. In the United States, which hosts the greatest number of data centers, individual facilities are also growing larger: data centers on the scale of 500 MW have been established, equivalent to the power demand of a city with a population of 1 million. We are thus entering a period in which it will be necessary to plan data centers and power plants as a set.
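A minimal sketch of the growth figure above: if total data center power consumption doubles roughly every four years, the implied annual growth rate is the fourth root of two, about 19% per year. The calculation below simply restates that arithmetic.

```python
# Implied annual growth when total consumption doubles every four years
# (doubling period taken from the text; everything else is arithmetic).
DOUBLING_PERIOD_YEARS = 4
ANNUAL_GROWTH = 2 ** (1 / DOUBLING_PERIOD_YEARS)  # ~1.19, i.e. ~19 % per year

def projected_multiple(years: float) -> float:
    """Multiple of today's consumption after the given number of years."""
    return ANNUAL_GROWTH ** years
```

For example, after eight years (two doubling periods) consumption is four times today's level.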
In addition, the trend toward rising power consumption is being accelerated by workloads that require enormous calculation performance, such as training generative AI. Training generative AI is said to consume 10 to 20 times the power of the corporate workloads processed by cloud services. Moreover, servers that perform AI processing are often equipped with advanced graphics processing units (GPUs), and some have appeared in which a single chip consumes 3.7 MWh a year, an amount equivalent to roughly a year and a half of electricity use by an average household.
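The household comparison can be checked with a quick calculation. The 3.7 MWh/year figure comes from the text; the average household consumption of about 2.5 MWh/year used below is an assumed value chosen to be consistent with the stated "year and a half" equivalence, not a figure from the article.

```python
# One advanced GPU versus an average household, in annual electricity use.
CHIP_MWH_PER_YEAR = 3.7        # figure stated in the text
HOUSEHOLD_MWH_PER_YEAR = 2.5   # assumed average household consumption

# How many years of household use one chip-year corresponds to (~1.48).
years_of_household_use = CHIP_MWH_PER_YEAR / HOUSEHOLD_MWH_PER_YEAR
```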
The enormous power consumed by GPUs and other advanced electrical and electronic circuits is converted into heat in proportion to the amount consumed, and this heat in turn hinders the stable operation of the circuits themselves. If the rise in temperature is left unchecked, there is a risk that semiconductors may stop operating properly. Powerful air-conditioning equipment is therefore installed in data centers to cool the servers. At present, the power consumed by this cooling equipment accounts for 30% to 50% of a data center's total power consumption (Fig. 2). Meanwhile, servers are being packed at ever higher density in today's data centers to meet growing demand for information processing, which tends to reduce cooling effectiveness. This has led to strong demand for even more highly efficient cooling systems based on new technologies.
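To see what a 30% to 50% cooling share means for overall efficiency, the following sketch computes a simplified PUE-style overhead ratio, under the simplifying assumption (mine, not the article's) that all non-cooling power goes to IT equipment.

```python
# Simplified overhead ratio: total facility power divided by non-cooling
# power, given the fraction of total power spent on cooling.
def overhead_ratio(cooling_fraction: float) -> float:
    """If cooling takes fraction f of the total, IT gets (1 - f), so the
    ratio of total power to useful IT power is 1 / (1 - f)."""
    return 1.0 / (1.0 - cooling_fraction)

# Cooling at 30 % of total gives a ratio of ~1.43; at 50 %, every watt
# delivered to IT equipment costs a second watt of cooling.
ratio_low  = overhead_ratio(0.30)
ratio_high = overhead_ratio(0.50)
```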
Diverse measures are now being introduced to reduce power consumption and improve heat dissipation efficiency in newly established data centers, in anticipation of increases in both power consumption and heat generation. We look here at some leading initiatives (Fig. 3).
First, there has been active progress in improving the efficiency of cooling systems, which account for a large proportion of the power consumed in data centers. A server's workload can vary greatly with how it is used, so the cooling system needs a mechanism that controls its operation according to the amount of heat actually being generated. Cooling the server only when necessary, rather than constantly, reduces power consumption. Such control requires precise, dynamic temperature management that makes effective use of temperature sensors and other devices.
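One simple form of "cool only when necessary" control is a hysteresis (on/off) loop driven by a temperature sensor. The sketch below is an illustrative simplification, not the article's actual control system; the threshold values are assumptions.

```python
# Demand-driven cooling: run the cooler only when the sensor says so.
# Two thresholds (hysteresis) prevent rapid on/off cycling near the limit.
def update_cooler(temp_c: float, cooler_on: bool,
                  on_above: float = 35.0, off_below: float = 30.0) -> bool:
    """Return the new cooler state for the current sensor reading."""
    if temp_c >= on_above:
        return True          # too hot: start cooling
    if temp_c <= off_below:
        return False         # cool enough: stop cooling and save power
    return cooler_on         # inside the band: keep the previous state
```

Between the two thresholds the previous state is kept, which is what keeps the cooler from chattering on and off around a single set point.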
Moreover, in recent years there have been an increasing number of cases in which liquid cooling, whose cooling efficiency is approximately 30% higher than that of air cooling, is combined with conventional air cooling systems. Rather than applying liquid cooling everywhere in these combined systems, sensors detect the areas that generate especially large amounts of heat (hot spots), and the airflow of the air cooling system is then adjusted toward those hot spots. This makes it possible to minimize power consumption while operating the servers stably.
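The hot-spot handling described above can be sketched as a mapping from per-zone sensor readings to fan duty cycles, boosting airflow only where a threshold is exceeded. Zone names, thresholds, and duty values below are illustrative assumptions.

```python
# Per-zone airflow plan: boost fan duty only at detected hot spots.
def airflow_plan(zone_temps: dict, hot_spot_c: float = 45.0,
                 base_duty: float = 0.3, boost_duty: float = 0.9) -> dict:
    """Map each zone to a fan duty cycle; zones at or above the hot-spot
    threshold get boosted airflow, everything else stays at the baseline."""
    return {zone: (boost_duty if t >= hot_spot_c else base_duty)
            for zone, t in zone_temps.items()}

# Example readings: only rack-2 exceeds the threshold and gets a boost.
plan = airflow_plan({"rack-1": 41.0, "rack-2": 48.5, "rack-3": 39.2})
```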
Another, slightly different approach distributes the loads of the many servers installed in a data center from the perspective of improving cooling efficiency. AI is used to predict which servers are likely to see concentrated processing loads or locally high temperatures, and the load is redistributed in advance so that the heat can be handled with the minimum amount of cooling. There is also a method in practical use in which this AI is combined with the cooling system, adjusting airflow before temperatures rise too high so that the cooling system operates only as much as needed.
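A toy version of load distribution for thermal balance: given a forecast of each server's load, place each incoming job on the currently least-loaded server so heat never concentrates in one spot. The greedy policy, server names, and load units are my illustrative simplifications, not the article's actual algorithm.

```python
# Greedy thermal-aware placement: each job goes to the server with the
# lowest forecast load, and the forecast is updated as jobs are assigned.
def assign_jobs(forecast_load: dict, jobs: list) -> dict:
    """jobs is a list of (job_name, load_cost) pairs; returns a mapping
    from job name to the server it was placed on."""
    load = dict(forecast_load)              # don't mutate the caller's dict
    placement = {}
    for job, cost in jobs:
        target = min(load, key=load.get)    # coolest server right now
        placement[job] = target
        load[target] += cost                # account for the added load/heat
    return placement

placement = assign_jobs({"s1": 0.2, "s2": 0.8},
                        [("j1", 0.3), ("j2", 0.3)])
```

After the first job, s1 is still the lighter server (0.5 versus 0.8), so both jobs land on s1 here; with more jobs the load would start spreading to s2.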
Furthermore, in recent years there have also been attempts to introduce liquid immersion cooling in anticipation of the heat generated by further improvements in calculation power. In these systems, the servers themselves are immersed in a fluorine-based inert liquid, silicone oil, or another substance that combines insulating properties with high cooling efficiency. We have also begun to see cases in which data centers themselves are established in cold regions so that they can be cooled efficiently at low power using outside air, and technologies that cool data center equipment by sealing it and installing it underwater are under development at the R&D level.
New technologies are also being developed and introduced to improve the efficiency of the power supply systems that supply the power essential to the operation of servers.
A typical data center receives 6,600 V AC power from the power grid, while large sites receive 22,000 V AC, known as extra-high voltage. This power is then converted many times before it can operate the equipment installed inside the data center, such as the servers and air-conditioning (Fig. 4). For example, the transistors in a server's CPU operate on extremely low-voltage DC power, less than 1 V in the most advanced chips. Before the received high-voltage power drives those transistors, it is converted in the on-site power grid, inside the uninterruptible power supply (UPS), at the entrance of the server rack, on the server board, and inside the semiconductor chip. A certain amount of power is lost in each conversion step, and these losses add up to wasteful consumption.
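The way per-stage losses add up can be made concrete: the overall efficiency of a conversion chain is the product of the stage efficiencies. The five stages below mirror the chain described above (on-site grid, UPS, rack entrance, board, chip), but the 96% per-stage figure is an illustrative assumption; real values depend on the equipment.

```python
# Cumulative efficiency of a multi-stage power conversion chain.
def chain_efficiency(stage_efficiencies: list) -> float:
    """Overall efficiency is the product of each stage's efficiency."""
    total = 1.0
    for eta in stage_efficiencies:
        total *= eta
    return total

# Five stages at an assumed 96 % each: ~18.5 % of the received power is
# lost before it ever reaches the transistors.
eta = chain_efficiency([0.96] * 5)
```

Even modest per-stage losses compound, which is why each conversion step in the chain is a target for efficiency improvements.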
In recent years, there has been a growing trend to replace the conventional silicon-based power semiconductors in these conversion circuits with chips based on new, more power-efficient semiconductor materials such as silicon carbide (SiC) and gallium nitride (GaN). These new materials allow the switching frequency inside the conversion circuits to be raised, which in turn permits smaller capacitors, transformers, and other components. There is now a need for small passive components capable of handling large currents in order to realize power supplies that are both highly efficient and compact.
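The link between switching frequency and component size can be illustrated with the standard buck-converter relation: for fixed input/output voltages and a fixed allowed current ripple, the required inductance scales as 1/f_sw. The voltage, ripple, and frequency values below are illustrative assumptions, not figures from the article.

```python
# Buck-converter inductor sizing: L = V_out * (1 - V_out/V_in) / (f_sw * dI).
# Raising the switching frequency shrinks the required inductance in
# direct proportion.
def required_inductance(v_in: float, v_out: float,
                        f_sw_hz: float, ripple_a: float) -> float:
    """Inductance (henries) needed to keep current ripple at ripple_a."""
    duty = v_out / v_in
    return v_out * (1 - duty) / (f_sw_hz * ripple_a)

# Moving from an assumed 100 kHz (Si) to 1 MHz (GaN) cuts the inductor
# value, and hence its physical size, by a factor of ten.
l_si  = required_inductance(12.0, 1.0, 100e3, 2.0)
l_gan = required_inductance(12.0, 1.0, 1e6, 2.0)
```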
Until now, efforts to suppress power consumption in data centers focused on the cooling and power supply systems, while the heat-generating sources themselves, such as CPUs, GPUs, and other processors as well as memories, were considered off-limits and left untouched because improving performance was the top priority. With the load of AI-related processing expected to increase dramatically in the future, however, drastic measures to reduce power consumption in these semiconductors are now needed as well.
Demand for AI-related calculation processing in data centers is anticipated to rise sharply in the future. This type of processing is characterized by data transfer between processors and memories that is markedly more frequent and voluminous than in other tasks. To reduce the associated power consumption, technology is being introduced that stacks processors and memories in three dimensions, shortening the wiring that connects the two. In recent years, a packaging technology called the "chiplet" has been attracting attention for large-scale cutting-edge semiconductors: a large chip is deliberately partitioned into multiple small chips (chiplets) that are then integrated into a single package. Chiplet technology is also expected to be applied to 3D stacking aimed at reducing the power consumed by AI-related processing.
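Why shorter wiring saves power can be seen from a first-order model: transfer energy is roughly (energy per bit) times (bits moved). The picojoule-per-bit figures below are order-of-magnitude assumptions of my own for long on-board traces versus short 3D-stacked connections, not values from the article.

```python
# First-order data-movement energy: bytes * 8 bits * energy per bit.
def transfer_energy_joules(bytes_moved: float, pj_per_bit: float) -> float:
    """Energy (joules) to move the given volume of data at the given
    per-bit cost in picojoules."""
    return bytes_moved * 8 * pj_per_bit * 1e-12

TB = 1e12  # bytes
# Moving 100 TB at an assumed 10 pJ/bit (long traces) versus 1 pJ/bit
# (short, stacked wiring): the energy drops in direct proportion.
e_long  = transfer_energy_joules(100 * TB, 10.0)
e_short = transfer_energy_joules(100 * TB, 1.0)
```

Because AI workloads move data constantly, even a small per-bit saving multiplies into a large reduction at the facility level.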
Furthermore, attempts are underway to connect processors and memories using optical communication technology, which is expected to reach practical use in the not-too-distant future. On current server boards, data is transmitted as electrical signals, so loss occurs due to wiring resistance and charging/discharging, and this consumes power. Switching the medium to optical communications would speed up data transmission and minimize the power consumed during transfer. However, elements that convert electrical signals into optical signals then become necessary, so new technologies are required to improve conversion efficiency and to mount the conversion elements in this section. Technological development is actively ongoing in this area.
It is now no longer possible to talk about improving the processing power of data centers without looking at reducing power consumption and putting heat dissipation measures into practice. We can expect to see technological developments and the introduction of innovative technologies one after another in this area.