Data Center & Open Compute
The Open Compute Project (OCP) is a community that promotes the open sourcing of efficient hardware specifications and designs in response to the growing demand for data centers. Hardware that complies with OCP specifications and designs is said to deliver high processing power, economy, and reduced power consumption for large data centers. We look here at the background to the expanding demand for data centers, the issues data centers face, and the issues that OCP-compliant hardware is intended to solve.
The use of IT continues to expand and evolve every day, and needs are diversifying. Data centers are the facilities that house the servers, network devices, and other IT equipment that deliver these services. We look here at the background to the expansion in demand for data centers.
The growing use of cloud services has driven the expansion in demand for data centers. Reasons given for this growth include the expansion of online business, the spread of smart devices, and the use of social media. In addition, cloud streaming, cloud gaming, data collection from IoT devices, and other services are in many cases delivered by cloud services. Handling these workloads means sending, receiving, and processing large amounts of data, and this is the main factor behind the expansion in demand for data centers.
AI data centers are data centers that provide artificial intelligence, machine learning, and other AI services. They are designed specifically for AI processing, for example with high-speed networks, parallel processing capabilities, and large memory capacity. In particular, the accuracy and quality of generative AI improve the more it learns, so a large language model (LLM) has to be trained on a large amount of data. Training an LLM means repeating similar calculations over that data, and this kind of processing is performed by graphics processing units (GPUs). It is also common for AI data centers to be equipped with dedicated hardware called "AI accelerators" to achieve even lower latency and higher processing speed. In recent years, the use of generative AI in business has been expanding, and the demand for AI data centers is increasing accordingly.
The equipment in data centers that provide cloud services, AI services, and other services must maintain stable, normal operation over the long term. The power equipment that supplies this equipment is especially important. Power systems need to be built so that the installed capacity is highly utilized, with little waste relative to the power required. We describe here the issues faced by the power systems in data centers.
The amount of heat generated per unit area continues to rise rapidly in data centers, both because higher-performance servers have a higher thermal design power (TDP)*1 and because IT equipment is installed at higher density. In particular, AI servers equipped with GPUs and AI accelerators tend to require large amounts of power and therefore generate a great deal of heat. Heat exhaust problems caused by higher density and power consumption increase the risk of server failures and malfunctions, so cooling is essential. However, installing cooling equipment and supplying the additional power needed to run it have made the associated operating costs and environmental impact major issues.
*1 Thermal design power (TDP): The maximum amount of heat a component is expected to generate under sustained load, which the cooling system is designed to dissipate.
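The link between server power, rack density, and heat per unit area can be sketched with simple arithmetic. The following is a minimal illustration only: the function name, the server counts, the per-server power figures, and the rack footprint are all hypothetical values chosen for the example, and the sketch assumes that all electrical power drawn by the servers ends up as heat.

```python
def heat_density_kw_per_m2(servers_per_rack, watts_per_server, footprint_m2):
    """Return rack heat load per unit floor area in kW/m^2.

    Assumes all electrical power drawn by the servers ends up as heat
    and that every server draws the stated power continuously.
    """
    if footprint_m2 <= 0:
        raise ValueError("footprint_m2 must be positive")
    rack_power_kw = servers_per_rack * watts_per_server / 1000.0
    return rack_power_kw / footprint_m2


# Hypothetical racks on the same 1.2 m^2 footprint (illustrative values only).
general_rack = heat_density_kw_per_m2(20, 500, 1.2)     # 10 kW rack
ai_rack = heat_density_kw_per_m2(8, 10_000, 1.2)        # 80 kW rack

print(f"general-purpose rack: {general_rack:.1f} kW/m^2")
print(f"AI rack:              {ai_rack:.1f} kW/m^2")
```

With these made-up inputs, the AI rack produces eight times the heat on the same floor area, which is the kind of jump that pushes cooling equipment and its power consumption to the foreground.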
Hot spots have also become an issue in data centers. A hot spot is a location where the temperature rises locally because of the operation of IT equipment. Because an air-conditioning system that serves the whole room cannot cool hot spots alone, the entire data center has to be cooled, which consumes excess power. If hot spots are left unaddressed, the risk of server performance degradation and failure rises.
Data centers that provide reliable, continuous services cannot tolerate power outages or power flickers, so their power equipment must deliver high quality and high reliability. It must also be highly maintainable so that the data center can continue to operate through commercial power outages, power flickers, voltage fluctuations, maintenance, expansion work, and failures. However, because the IT equipment is installed at high density, the wiring becomes congested, which reduces maintainability.
There have been cases of data centers where the capacity of the air-conditioning and power equipment could not keep up with the increase in servers, making it difficult to add racks. Moreover, even if there is still room for more equipment in a rack, concentrating servers there increases the load on the floor, and in many cases the structure of the building does not allow additional racks to be installed. It is therefore necessary to carefully consider the selection of an efficient air-conditioning system and the layout plan for outdoor equipment. For these reasons, insufficient space has become a serious issue.
We describe here how OCP-compliant power systems solve issues such as heat exhaust, cooling of hot spots, maintenance, and insufficient space.
OCP-compliant data centers use 21-inch open racks built to OCP specifications. Compared to conventional 19-inch racks, the equipment space is wider and one unit is taller. This makes it possible to increase the number of servers and storage units that can be installed in the racks, which contributes to space saving. The height of 21-inch racks is measured in a unit called "1OU (open unit)" to distinguish it from the units used for 19-inch racks.
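To make the difference in unit height concrete, the short sketch below compares the two units. It assumes the commonly cited values of 48 mm for 1OU and 1.75 inches (44.45 mm) for the conventional 1U; the usable rack height of 2,000 mm and the function name are hypothetical, chosen only for illustration.

```python
MM_PER_INCH = 25.4

# Commonly cited unit heights (assumptions for this sketch).
RACK_UNIT_MM = 1.75 * MM_PER_INCH   # conventional 1U, about 44.45 mm
OPEN_UNIT_MM = 48.0                 # 1OU as used for 21-inch open racks


def units_that_fit(usable_height_mm, unit_height_mm):
    """Number of whole units that fit in the given usable height."""
    if usable_height_mm < 0 or unit_height_mm <= 0:
        raise ValueError("heights must be positive")
    return int(usable_height_mm // unit_height_mm)


usable_height_mm = 2000.0  # hypothetical usable height, not a specification value

print("1U slots: ", units_that_fit(usable_height_mm, RACK_UNIT_MM))
print("1OU slots:", units_that_fit(usable_height_mm, OPEN_UNIT_MM))
print(f"1OU is {OPEN_UNIT_MM - RACK_UNIT_MM:.2f} mm taller than 1U")
```

Under these assumptions, slightly fewer 1OU slots than 1U slots fit in the same height, so the capacity benefit of the 21-inch rack would come from the wider equipment space in each slot rather than from the number of slots.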
High heat exhaust performance is essential to prevent servers from overheating. Switching from a conventional distributed power supply system (Fig. 1) to a centralized power supply system (Fig. 2) makes it possible to separate the power unit, which is a major heat source, from the servers and to manage the temperature of each optimally. In particular, this kind of temperature management in OCP-compliant designs is effective in improving the heat exhaust performance of AI servers, for which there are concerns about high heat generation from GPUs and AI accelerators.
In addition, although air-conditioning is commonly used to cool servers, it has been pointed out that its cooling capacity is limited. Immersion cooling has therefore been attracting attention in recent years as a more efficient cooling method. It cools servers by immersing them in a special non-conductive (dielectric) liquid.
Centralized power supply systems can be operated in a load range where conversion efficiency is higher, so power is supplied with high overall efficiency. Moreover, power can be supplied to the cooling distribution unit (CDU) as well as to the servers. This can reduce the amount of power consumed by the racks, which lowers the cost of operating the data center. Another benefit is that more servers can be installed within the limited power capacity of the data center.
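The effect of operating in a more efficient load range can be illustrated with a rough calculation. The sketch below is not a model of any particular product: the efficiency curve, the PSU ratings and counts, the 12 kW rack load, and the function names are all hypothetical, and the curve is simply interpolated linearly between a few made-up points.

```python
# Hypothetical PSU efficiency curve as (load fraction, efficiency) points.
# These values are illustrative only and do not describe a real product.
EFFICIENCY_CURVE = [(0.10, 0.90), (0.20, 0.94), (0.50, 0.96), (1.00, 0.94)]


def psu_efficiency(load_fraction):
    """Linearly interpolate efficiency on the hypothetical curve."""
    points = EFFICIENCY_CURVE
    if load_fraction <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if load_fraction <= x1:
            return y0 + (y1 - y0) * (load_fraction - x0) / (x1 - x0)
    return points[-1][1]


def input_power_w(it_load_w, psu_count, psu_rating_w):
    """Input power needed when the load is shared equally by all PSUs."""
    if psu_count <= 0 or psu_rating_w <= 0:
        raise ValueError("psu_count and psu_rating_w must be positive")
    load_fraction = it_load_w / (psu_count * psu_rating_w)
    return it_load_w / psu_efficiency(load_fraction)


it_load_w = 12_000  # hypothetical rack IT load

# Distributed: 20 servers, each with two redundant 2,000 W PSUs (lightly loaded).
distributed = input_power_w(it_load_w, psu_count=40, psu_rating_w=2_000)
# Centralized: one power shelf with eight 3,000 W PSUs (loaded near mid-range).
centralized = input_power_w(it_load_w, psu_count=8, psu_rating_w=3_000)

print(f"distributed input: {distributed:.0f} W")
print(f"centralized input: {centralized:.0f} W")
print(f"difference:        {distributed - centralized:.0f} W")
```

In this made-up scenario the centralized shelf draws roughly 540 W less for the same IT load, because its PSUs run at about 50% load while the per-server PSUs sit at about 15%. Real savings depend on the actual efficiency curves and redundancy scheme.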
OCP-compliant servers and storage units are designed with consideration for operability during maintenance. Servers do not have power cables; instead, they connect directly to a bus bar that supplies power. Power is therefore supplied as soon as a server is mounted in the rack, and the layout of the wiring is also simplified. Moreover, blind-mate plugs can be used. These are plugs that mate correctly even when the mating position cannot be seen. This maintenance-friendly design reduces the burden of equipment maintenance work and contributes to maintaining stable operation.
In addition to highly efficient power systems, data centers require high heat exhaust performance, maintainability, and space saving. OCP specifications and designs are a means of meeting these requirements.
Murata Manufacturing's OCP-compliant centralized power system has adopted a high-energy-density power supply unit (PSU) that achieves compactness, space saving, and high power efficiency. This also contributes to improving the efficiency of heat exhaust treatment. In addition, the PSUs can be hot swapped, which makes it possible to recover quickly from various troubles. The power shelf is compatible with both 19-inch and 21-inch racks, so it can be used with IT equipment of various sizes. It also supports delta, wye, single-phase, and other inputs. Furthermore, a wide range of accessories is available, including input cables, power distribution units (PDUs), and mounting kits. Accordingly, the system can be installed in various environments.
These features make the most of the benefits of the OCP. At the same time, the system is equipped to address the various issues faced by power systems for data centers. The OCP is expected to be an important initiative that supports the growing demand for data centers.