
CN111601490B - Reinforcement learning control method for data center active ventilation floor - Google Patents


Info

Publication number
CN111601490B
Authority
CN
China
Prior art keywords
rack
time
value
active ventilation
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010456237.6A
Other languages
Chinese (zh)
Other versions
CN111601490A (en
Inventor
万剑雄
周杰
熊伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202010456237.6A
Publication of CN111601490A
Application granted
Publication of CN111601490B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H05 ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05K PRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00 Constructional details common to different types of electric apparatus
    • H05K7/20 Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709 Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20836 Thermal management, e.g. server temperature control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Thermal Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

A reinforcement learning control method for a data center active ventilation floor. A Markov decision process model is established for the rack hotspot problem of raised-floor data centers, and a reinforcement learning model solving algorithm is provided as the core of the control algorithm. Without raising the power of the computer room air conditioner, the method intelligently controls the fan speed of the active ventilation floor (a floor with a fan attached to the back of an ordinary ventilation floor) according to the current rack temperature distribution. By actively delivering sufficient cold air, it evens out the rack inlet temperature distribution, which relieves the rack hotspot problem common in raised-floor data centers, saves cooling energy, and safeguards the safety and stability of the servers. Compared with existing rack-level airflow management methods for data centers, the method is easier to deploy, more cost-effective, and more generally applicable.

Description

Reinforcement Learning Control Method for a Data Center Active Ventilation Floor

Technical Field

The invention belongs to the technical field of automatic control, and in particular relates to a reinforcement learning control method for an active ventilation floor in a data center.

Background

A rack hotspot is a point at one or several positions on a rack in a data center machine room where the temperature is markedly higher than at other positions. Excessive temperature reduces the working efficiency of some servers in the data center, which lowers overall power density and reliability, and this runs counter to what a data center needs.

Relieving or eliminating rack hotspots by global regulation, for example by raising the power of the computer room air conditioner to supply enough cold air, inevitably leaves most rack areas overcooled. This wastes cooling resources and further inflates cooling energy consumption, which already accounts for nearly half of a data center's total energy consumption. Rack-level cooling schemes are therefore better suited to relieving rack hotspots.

Rack-level cooling schemes already exist, such as installing adaptive ventilation floors, installing baffles, or enclosing a single rack and fitting it with a ventilation duct. All of these are "passive" schemes: they cannot actively deliver cold airflow to the rack, so they are of no help when the cold air supply is insufficient.

The active ventilation floor is another rack-level cooling scheme. It relieves rack hotspots by actively delivering cold air, and is easier to deploy and more cost-effective than the schemes above. The difficulty in controlling it lies mainly in the diversity and dynamics of the environment in which it is placed: the relative positions of the computer room air conditioners and racks and the distribution of servers inside the racks differ; the containment state of the cold and hot aisles differs, as do server rack standards and sealing; the air conditioner power and the heat load of the servers in different racks differ; and so on. As a result, the thermal and airflow behavior of a data center is generally difficult to describe with an analytical model.

Most existing research on active ventilation floors concerns performance modeling and evaluation based on measurement or simulation; there is as yet no research literature on the control problem of active ventilation floors.

Summary of the Invention

To overcome the shortcomings of the prior art described above, the object of the invention is to provide a reinforcement learning control method for a data center active ventilation floor that, without raising the power of the computer room air conditioner, automatically learns the optimal operating policy and plans the rack airflow so that the rack temperature distribution becomes uniform and rack hotspots are relieved. No complex airflow and heat-exchange model has to be built or calibrated, which makes the active ventilation floor more generally applicable.

To achieve this object, the invention adopts the following technical solution:

A reinforcement learning control method for a data center active ventilation floor establishes a Markov decision process model for the rack hotspot problem of a raised-floor data center, and provides a reinforcement learning model solving algorithm, the array-based algorithm, as the core of the reinforcement learning control algorithm. The model consists of four parts: system state, action, reward, and value function. The solution of the model is to keep selecting the optimal action across a sequence of system states so that the cumulative reward of the system is maximized. The reinforcement learning control algorithm uses two evaluation criteria, whether the rack inlet temperature distribution is uniform and whether the energy consumption of the active ventilation floor is low. By continually exploring and learning the complex relationship between the PWM duty-cycle value and whether that value should be raised, lowered, or held, it adjusts the fan speed of the active ventilation floor so that the rack inlet temperature distribution becomes uniform and rack hotspots are relieved.

Compared with the prior art, the beneficial effects of the invention are as follows:

The invention does not need to build and calibrate complex airflow and heat-exchange models. Using the array-based control algorithm, it copes with the diversity and dynamics of the environment in which the active ventilation floor is placed and, based on the uniformity of the rack inlet temperature distribution and the energy consumption of the active ventilation floor, automatically matches each PWM duty-cycle value with the decision to raise, lower, or hold it. It is only necessary to replace the original ordinary ventilation floor with an active ventilation floor running the invention. The invention then runs autonomously, finds the optimal PWM duty-cycle value, adjusts the fan speed of the active ventilation floor, improves the rack inlet temperature distribution, and relieves rack hotspots. Compared with other schemes, the invention is more generally applicable, easier to deploy, and more cost-effective.

Compared with the intelligent control method that uses three intelligent algorithms, the reinforcement learning control method that uses the array-based algorithm is simpler and requires fewer computing resources.

Compared with the reinforcement learning control method that uses the array-based algorithm, the intelligent control method that uses the three intelligent algorithms defines states and actions in a way that addresses the hotspot problem more directly and effectively, and its non-discretized state definition and approximation of the Q function make the intelligent control method more generally applicable.

Brief Description of the Drawings

Figure 1 shows the design and deployment of the active ventilation floor. In the figure, reference numeral 1 is a temperature sensor, 2 is a rack, 3 is a microcontroller, 4 is a driver board, 5 is a switching power supply, 6 is a PC, and 7 is the active ventilation floor.

Detailed Description

Embodiments of the invention are described in detail below with reference to the drawing and examples.

Figure 1 is a schematic diagram of the detailed deployment of the invention. A number of first temperature sensors 1 are distributed evenly at the air inlet of the rack 2 to monitor the temperature distribution at the air inlet of the rack 2. A second temperature sensor is additionally placed under the active ventilation floor to monitor the supply air temperature under the floor.

In this field, a rack 2 is a rectangular iron enclosure that holds a number of servers, and racks are arranged row by row. Within a row, the left and right panels of a rack generally sit flush against the neighboring racks. The front panel of the rack is the air inlet, which draws in cold air to cool the servers, and the rear panel is the air outlet, which discharges the heated air. Monitoring the rack inlet temperature distribution means monitoring the temperature at certain positions on the front panel of the rack; the temperatures at these positions make up the rack inlet temperature distribution, so the number of first temperature sensors 1 depends on the number of these positions.

The reinforcement learning control method for the active ventilation floor runs on a PC. The PC 6 is connected to the microcontroller 3, the microcontroller 3 is connected to the driver board 4, and the driver board 4, once connected to the switching power supply 5 (12 V, 20 A), is connected to the active ventilation floor fan 7. From the temperature distribution returned by the first temperature sensors 1, the PC produces a PWM duty-cycle value and passes it to the microcontroller 3. The microcontroller 3 generates the corresponding PWM signal from this value and sends it to the driver board 4. The driver board 4 uses the PWM signal to control the voltage that the switching power supply 5 delivers to the active ventilation floor fan 7. Controlling the fan supply voltage in this way adjusts the fan speed.

The control method comprises the following parts:

1. A Markov decision process model is established for the rack hotspot problem of a raised-floor data center. (A raised floor is the air supply structure of the data center: the machine room floor is elevated to leave an under-floor space 60-100 cm high through which the computer room air conditioner delivers cold air. Most data centers in China currently use this construction.) The model consists of the following parts A to D:

A. System state $s_t$, defined as the discretized duty cycle of the PWM square-wave signal, by the formula:

[Equation image GDA0003657059700000041]

where $s_t$ is the system state at time $t$, $\mathcal{S}$ is the state space, $s$ is a particular system state in $\mathcal{S}$, $DC$ is the duty-cycle value of the PWM square wave, $\max(DC)$ is the maximum value of $DC$, $D_{TQ}$ is the equal-division ratio used to discretize $DC$, and $k$ is the number of $D_{TQ}$ increments in a given state.

B. System action space $\mathcal{A}$, defined as the change in the fan speed of the active ventilation floor, i.e.

[Equation image GDA0003657059700000045]
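As a minimal sketch of the state and action definitions, the following Python builds a discretized duty-cycle state space and a three-element action set. The percent scale, the value D_TQ = 0.1, the inclusion of k = 0, the -1/0/+1 action encoding, and the clamping in `next_state` are all assumptions made for illustration; the source formulas themselves are not reproduced in the text.

```python
# Assumed scale: duty cycle in percent, so max(DC) = 100.
MAX_DC = 100
# Assumed discretization: D_TQ = 0.1, so one increment is D_TQ * max(DC) = 10.
STEP = 10

# States k * STEP for k = 0, 1, ..., 10 (starting at k = 0 is an assumption).
STATES = list(range(0, MAX_DC + 1, STEP))

# Hypothetical encoding of "lower, hold, raise" the duty cycle by one increment.
ACTIONS = (-1, 0, +1)


def next_state(state, action):
    """Apply one action to a duty-cycle state and clamp it to [0, MAX_DC]."""
    return min(MAX_DC, max(0, state + action * STEP))
```

With these assumptions, `next_state(50, -1)` lowers the duty cycle from 50 to 40, and raising it from 100 leaves it at 100.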

C. Reward $R_{t+1}$, composed of two parts, a quantitative index of how uniform the rack inlet temperature distribution is and the energy consumption of the active ventilation floor fan, by the formula:

[Equation image GDA0003657059700000046]

where $R_{t+1}$ is the reward the system obtains after taking an action at time $t$. The term [Equation image GDA0003657059700000047] expresses the uniformity of the rack inlet temperature distribution; its value is never positive, and the closer it is to 0, the more uniform the distribution. $T_{t,i}$ is the reading at time $t$ of the first temperature sensor numbered $i$. The rack reference temperature at time $t$ (symbol in image GDA0003657059700000048) is given by [Equation image GDA0003657059700000049], in which $T_{t,\mathrm{under}}$ is the reading of the second temperature sensor at time $t$ and $\Delta_T$ is a fixed, positive temperature difference set according to how much the cold and hot airflows mix above and below the active ventilation floor. The set of first temperature sensors (symbol in image GDA00036570597000000410) and their total number (symbol in image GDA00036570597000000411) also appear in the formula. The term $-(A_{\mathrm{ref}} \times DC_t)^3$ expresses the energy consumption of the active ventilation floor fan; its value is never positive, and the closer it is to 0, the lower the fan energy consumption. $A_{\mathrm{ref}}$ is a reference action value that keeps this term at the same order of magnitude as the uniformity term, and $DC_t$ is the duty cycle of the PWM square wave at time $t$.
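The two-part reward can be sketched in Python as follows. Only the energy term, the negative cube of A_ref times the duty cycle, is taken directly from the text. The uniformity term used here (negative mean absolute deviation of the inlet readings from the reference temperature), the reference temperature taken as the under-floor reading plus the fixed difference, and the simple sum of the two terms are assumptions, since the corresponding formulas are not reproduced in the text.

```python
def reward(inlet_temps, t_under, delta_t, duty_cycle, a_ref):
    """Hypothetical reward: uniformity term plus fan-energy term, both <= 0."""
    # Assumed: reference temperature = under-floor reading + fixed difference.
    t_ref = t_under + delta_t
    # Assumed form of the uniformity index: negative mean absolute deviation.
    uniformity = -sum(abs(t - t_ref) for t in inlet_temps) / len(inlet_temps)
    # From the text: energy term is -(A_ref * DC_t) ** 3.
    energy = -(a_ref * duty_cycle) ** 3
    return uniformity + energy
```

For example, inlet readings of 24.0 and 20.0 against a reference of 22.0 give a uniformity term of -2.0; with `a_ref = 2.0` and `duty_cycle = 0.5` the energy term is -1.0, for a total reward of -3.0.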

D. Value function $Q(s_t, a_t)$, an action-value function, by the formula:

$$Q(s_t, a_t) = \mathbb{E}\left[\sum_{y=0}^{\infty} \gamma^{y} R_{t+y+1} \,\middle|\, s_t, a_t\right]$$

where the value function $Q(s, a)$ is called the Q function, $a_t$ is the action taken by the system at time $t$, $\mathbb{E}$ is the expectation, $y$ indexes future times relative to time $t$, $R_{t+y+1}$ is the reward the system obtains after acting at time $t+y$, and $\gamma$ is the discount factor, which expresses how much weight the model gives to future rewards (environmental influence), with $0 \le \gamma < 1$. $\gamma^{y}$, the $y$-th power of $\gamma$, is the discount applied to $R_{t+y+1}$ at time $t+y$.
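To make the discounted sum inside the Q function concrete, the short Python sketch below evaluates it for one finite, already observed reward sequence. It is a single truncated sample, not the expectation itself, and the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**y * rewards[y], where rewards[y] plays the role of R_{t+y+1}."""
    total = 0.0
    for y, r in enumerate(rewards):
        total += (gamma ** y) * r
    return total
```

With three rewards of -1.0 and `gamma = 0.5`, the result is -1.0 - 0.5 - 0.25 = -1.75, which shows how later rewards count for less.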

E. The Markov decision process model can be summarized as follows: in the system state at any time $t$, the optimal action is selected so that the cumulative reward is maximized. The model formula is:

[Equation image GDA0003657059700000054]

subject to

[Equation image GDA0003657059700000055]

where $\gamma^{t}$ is the discount applied to $R_{t+1}$ at time $t$.

2. Solution of the model and solving algorithm

a. Solution of the model. Once the optimal Q function has been computed, the optimal action can be selected from it in the system state at any time $t$ so that the cumulative reward is maximized. The optimal Q function is computed as:

$$Q^{*}(s_t, a_t) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a \in \mathcal{A}} Q^{*}(s_{t+1}, a) \,\middle|\, s_t, a_t\right]$$

At any time $t$, the optimal action is selected by:

$$a_t = \arg\max_{a \in \mathcal{A}} Q^{*}(s_t, a)$$

where $Q^{*}(s_t, a_t)$ is the optimal Q function, $s_{t+1}$ is the system state at time $t+1$, $a$ is any one of the actions the system may take at time $t+1$, that is, an action in the action space $\mathcal{A}$, and $\max_{a \in \mathcal{A}} Q^{*}(s_{t+1}, a)$ is the largest optimal Q value obtainable in state $s_{t+1}$ over all actions in $\mathcal{A}$.
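Selecting the action with the largest Q value for the current state can be sketched in a few lines of Python. The list-based representation of one row of Q values and the tie-breaking rule (first maximum wins) are assumptions for illustration.

```python
def greedy_action(q_row):
    """Return the index of the largest Q value among the actions of one state."""
    return max(range(len(q_row)), key=lambda j: q_row[j])
```

For a state whose three actions have Q values -3.0, -1.0, and -2.0, the function returns index 1.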

b. The solving algorithm computes the optimal Q function and selects the optimal action at each decision so that the cumulative reward is maximized. The reinforcement learning model solving algorithm is an array-based algorithm: a two-dimensional array (row index is the state, column index is the action) stores the Q function. The Q values in the array are updated iteratively from the difference $\delta_{t+1}$ between the Q sample value $Q_{t+1,\mathrm{target}}$ and the Q lookup value $Q_t(s_t, a_t)$, which yields the optimal Q function; the optimal action is then selected by looking it up in the array, so that the cumulative reward of the model is maximized. The Q sample value is computed from the optimal Q function formula together with the $R_{t+1}$ and $s_{t+1}$ obtained from the system in real time. The Q lookup value is the entry found in the row and column of the two-dimensional array that correspond to the $s_t$ and $a_t$ obtained from the system in real time.

The Q sample value is computed as:

$$Q_{t+1,\mathrm{target}} = R_{t+1} + \gamma \max_{a \in \mathcal{A}} Q_t(s_{t+1}, a)$$

where $\max_{a \in \mathcal{A}} Q_t(s_{t+1}, a)$ is the largest Q lookup value in the row of the two-dimensional array corresponding to $s_{t+1}$ at time $t$, and $\mathcal{A}$ is the action space. The array is updated by:

$$\delta_{t+1} = Q_{t+1,\mathrm{target}} - Q_t(s_t, a_t), \qquad Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \beta(s_t, a_t)\,\delta_{t+1}$$

where $Q_t(s_t, a_t)$ is the Q lookup value for $s_t$ and $a_t$ in the two-dimensional array at time $t$, $Q_{t+1}(s_t, a_t)$ is the Q lookup value for $s_t$ and $a_t$ in the array at time $t+1$, and $\beta(s_t, a_t) \in [0, 1]$ is the learning step size of each state-action pair in the array.
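A single array update can be sketched in Python as below. The nested-list table, the integer row and column indices, and the use of one scalar `beta` instead of a per-pair step size are simplifying assumptions.

```python
def q_update(q, s, a, r, s_next, gamma, beta):
    """Update q[s][a] in place from one observed transition; return delta."""
    # Q sample value: reward plus discounted best value in the next state's row.
    target = r + gamma * max(q[s_next])
    # Difference between the sample value and the current lookup value.
    delta = target - q[s][a]
    # Move the stored value toward the sample value by the learning step size.
    q[s][a] += beta * delta
    return delta
```

Starting from `q = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]]`, calling `q_update(q, 0, 1, -3.0, 1, 0.5, 0.5)` gives a sample value of -3.0 + 0.5 × 2.0 = -2.0, a difference of -2.0, and a new entry `q[0][1]` of -1.0.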

3. The model is solved with the reinforcement learning model solving algorithm. Using whether the rack inlet temperature distribution is uniform and whether the energy consumption of the active ventilation floor is low as evaluation criteria, the algorithm continually explores and learns the complex relationship between the PWM duty-cycle value and whether that value should be raised, lowered, or held, and adjusts the fan speed of the active ventilation floor so that the rack inlet temperature distribution becomes uniform and rack hotspots are relieved. Its operating logic on the PC is as follows:

1: Set the rack reference temperature; initialize $\beta(s_t, a_t)$; initialize the array;

2: Set the initial time $t = 0$; the exploration-probability change interval random_slots; the initial action exploration probability $\varepsilon$, the amount $\Delta_\varepsilon$ by which the exploration rate decreases with $t$, and the minimum exploration probability $\varepsilon_{\min}$;

3: Select the initial state $s_0 = \max(DC)$;

4: Start of loop body;

5: If $t$ is less than random_slots, select the action at random from the action space and go to 7; otherwise go to 6;

6: Set the exploration probability $\varepsilon$ to the minimum of $\varepsilon - \Delta_\varepsilon$ and $\varepsilon_{\min}$, and select the action according to the following formula, in which $\mathcal{A}$ is the action space:

$$a_t = \begin{cases} \text{a random action in } \mathcal{A}, & \text{with probability } \varepsilon \\ \arg\max_{a \in \mathcal{A}} Q_t(s_t, a), & \text{with probability } 1 - \varepsilon \end{cases}$$

7: Execute $a_t$ (the PC sends a duty-cycle command to the microcontroller), obtain the next system state $s_{t+1}$ (the PC sends a temperature request command to obtain the rack temperature distribution), and compute $R_{t+1}$ from the reward formula;

8: Update the corresponding value in the array according to the update formula;

9: Increase time $t$ by 1;

10: End of loop body.
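The ten steps above can be sketched as a self-contained Python loop. Everything about the environment is hypothetical: `toy_reward` is a stand-in for the rack and is not a thermal model, and the percent scale, step size, action encoding, and all numeric defaults are assumptions. One deliberate departure is flagged in the code: step 6 in the text takes the minimum of ε − Δε and ε_min, whereas this sketch takes the maximum so that ε decays toward ε_min as a floor; that reading of the intent is an assumption.

```python
import random

MAX_DC, STEP = 100, 10                       # assumed percent scale, D_TQ = 0.1
STATES = list(range(0, MAX_DC + 1, STEP))    # discretized duty cycles
ACTIONS = (-1, 0, 1)                         # assumed: lower, hold, raise


def toy_reward(dc):
    """Hypothetical stand-in for the measured reward at duty cycle dc."""
    hotspot = max(0.0, 6.0 - 0.1 * dc)       # deviation shrinks as the fan speeds up
    return -hotspot - (0.015 * dc) ** 3      # uniformity term plus cubic energy term


def run(steps=5000, random_slots=500, eps=0.9, d_eps=0.001, eps_min=0.05,
        gamma=0.9, beta=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the array (rows: states, columns: actions).
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    # Step 3: start from the maximum duty cycle.
    s = len(STATES) - 1
    for t in range(steps):                   # steps 4 and 10: loop body
        if t < random_slots:                 # step 5: purely random actions
            a = rng.randrange(len(ACTIONS))
        else:                                # step 6: epsilon-greedy selection
            # Assumption: max() keeps eps_min as a floor (the text says minimum).
            eps = max(eps - d_eps, eps_min)
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda j: q[s][j])
        # Step 7: apply the action, observe the next state and the reward.
        s_next = min(len(STATES) - 1, max(0, s + ACTIONS[a]))
        r = toy_reward(STATES[s_next])
        # Step 8: update the array entry for (s, a).
        target = r + gamma * max(q[s_next])
        q[s][a] += beta * (target - q[s][a])
        s = s_next                           # step 9: t advances via the for loop
    return q
```

In a real deployment, the line that computes `r` would be replaced by sending the duty-cycle command to the microcontroller and computing the reward from the returned sensor readings. Because every reward in this toy setting is at most zero and the table starts at zero, all learned Q values stay at or below zero.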

In summary, the invention establishes a Markov decision process model for the rack hotspot problem of raised-floor data centers and provides a reinforcement learning model solving algorithm as the core of the reinforcement learning control algorithm. Without raising the power of the computer room air conditioner, it intelligently controls the fan speed of the active ventilation floor (a floor with a fan attached to the back of an ordinary ventilation floor) according to the current rack temperature distribution. By actively delivering sufficient cold air in this way, it makes the rack inlet temperature distribution uniform and relieves the rack hotspot problem common in raised-floor data centers, which saves cooling energy and safeguards the safety and stability of the servers. Compared with existing rack-level airflow management methods for data centers, the invention is easier to deploy, more cost-effective, and more generally applicable.

Claims (3)

1. The reinforcement learning control method of the data center active ventilation floor is characterized by comprising the following steps:
step 1, arranging a certain number of first temperature sensors for monitoring the temperature distribution of an air inlet of a rack at the air inlet of the rack, and arranging a second temperature sensor for monitoring the air supply temperature under an active ventilation floor under the active ventilation floor;
step 2, establishing a Markov decision process model for the rack hot spot problem of the raised floor structure data center, wherein the model is determined by a system state s t Behavior space
Figure FDA0003657059690000011
Reward R t+1 And a cost function Q(s) t ,a t ) The four parts are formed;
wherein: the system state s t For the system state at the time t,
Figure FDA0003657059690000012
the state space is defined as the duty ratio of a discretized PWM signal square wave, and the formula is as follows:
Figure FDA0003657059690000013
wherein s is
Figure FDA0003657059690000014
DC is the value of the square wave duty ratio of the PWM signal, max (DC) is the maximum value of DC, D TQ For DC discretization of the equivalence ratio, k denotes D in a certain state TQ The number of (2); the PWM signal square wave is generated by the following method: generating a duty ratio value of a PWM signal according to the temperature distribution returned by the first temperature sensor, and transmitting the duty ratio value to the microcontroller, wherein the microcontroller generates a corresponding PWM signal according to the duty ratio value;
space of action
Figure FDA0003657059690000015
Defined as the change in the rotational speed of the active ventilation floor fan,
Figure FDA0003657059690000016
reward R t+1 The temperature distribution uniformity of the air inlet of the rack is quantified, and the energy consumption of the active ventilation floor fan is calculated according to the following formula:
Figure FDA0003657059690000021
wherein R is t+1 The reward obtained after the system takes some action for time t,
Figure FDA0003657059690000022
the temperature distribution uniformity of the air inlet of the rack is shown, the more the formula value is negative, the closer to 0, the more uniform the temperature distribution of the air inlet of the rack is, the T t,i The temperature reading of the first temperature sensor numbered i at time t,
Figure FDA0003657059690000023
for the reference temperature of the rack at time t,
Figure FDA0003657059690000024
T t,under is tReading, delta, of the second temperature sensor at the moment T The fixed temperature difference set according to the mixing degree of the cold air and the hot air on the active ventilation floor is positive,
Figure FDA0003657059690000025
is a collection of the first temperature sensors,
Figure FDA0003657059690000026
the total number of the temperature sensors is one; - (A) ref ×DC t ) 3 Representing the active ventilation floor fan energy consumption, the values of the formula are all negative, the closer to 0, the lower the fan energy consumption, wherein A ref To maintain a reference behavior value of the same order of magnitude as the uniformity of the temperature distribution at the inlet of the frame, DC t The duty ratio of the square wave of the PWM signal at the moment t;
cost function Q(s) t ,a t ) The formula of the behavior cost function is as follows:
Figure FDA0003657059690000027
wherein the merit function Q (s, a) is referred to as the Q function,
Figure FDA0003657059690000028
for the action taken by the system at time t,
Figure FDA0003657059690000029
as a function of the expectation, y is the future time relative to time t, R t+y+1 Represents the reward obtained after the system takes action at the time t + y, gamma represents the attenuation factor, gamma is more than or equal to 0 and less than 1, and gamma is y Y power of gamma, is t + y time R t+y+1 The attenuation factor of (d);
the markov decision process model is summarized as: under the system state at any time t, the accumulated reward is maximized by selecting the optimal behavior, and the model formula is as follows:
Figure FDA00036570596900000210
is constrained to
Figure FDA0003657059690000031
Wherein, γ t Is time t system R t+1 The attenuation factor of (d);
and 3, solving the model by adopting a reinforcement learning model solving algorithm, and adjusting the rotating speed of the fan of the active ventilation floor by continuously exploring and learning the complex relation between the duty ratio of the PWM signal and the rise, fall or maintenance of the duty ratio by using whether the temperature distribution of the air inlet of the rack is uniform and whether the energy consumption of the active ventilation floor is low as evaluation standards, so that the temperature distribution of the air inlet of the rack is uniform, and the hot spot problem of the rack is relieved.
2. The reinforcement learning control method for the active ventilation floor of the data center according to claim 1, wherein in step 2 an optimal Q function is obtained through calculation, so that in the system state at any time $t$ the optimal behavior can be selected according to the optimal Q function and the accumulated reward is maximized, the optimal Q function being calculated as:
$$Q^{*}(s_t, a_t) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \,\middle|\, s_t, a_t\right]$$
at any time $t$, the optimal behavior selection formula is as follows:
$$a_t = \arg\max_{a \in \mathcal{A}} Q^{*}(s_t, a)$$
wherein $Q^{*}(s_t, a_t)$ represents the optimal Q function, $s_{t+1}$ represents the state of the system at time $t+1$, $a'$ represents any one of the actions that the system may take at time $t+1$, namely a certain behavior in the action space $\mathcal{A}$, and $\max_{a'} Q^{*}(s_{t+1}, a')$ represents the largest optimal Q function value that the system can obtain in state $s_{t+1}$ by taking any $a' \in \mathcal{A}$.
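The behavior selection in claim 2 amounts to picking the action with the largest Q value in the current state. A minimal Python sketch follows, assuming the Q values for one state are available as a list with one entry per action. The action labels in `ACTIONS`, the function name `greedy_action`, and the numeric values are hypothetical and chosen only to mirror the raise, lower, or maintain behaviors of the PWM duty ratio.

```python
# Hypothetical labels for the three duty-ratio behaviors; not taken from the patent text.
ACTIONS = ("raise", "lower", "maintain")


def greedy_action(q_row):
    """Return the index of the action with the largest Q value for one state.

    q_row[i] is the Q value of action i in the current state.
    Ties are resolved in favor of the lowest index.
    """
    if not q_row:
        raise ValueError("q_row must contain at least one action value")
    best_index = 0
    for index, value in enumerate(q_row):
        if value > q_row[best_index]:
            best_index = index
    return best_index


# Hypothetical Q values for the current state s_t.
chosen = ACTIONS[greedy_action([0.1, 0.7, 0.3])]
```

This sketch only covers selection from already known Q values; how those values are learned is the subject of claim 3.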
3. The reinforcement learning control method for the active ventilation floor of the data center according to claim 1, wherein in step 3 the reinforcement learning model solving algorithm is an array algorithm in which the Q function is stored in a two-dimensional array whose row index is the state and whose column index is the behavior; the Q values in the array are iteratively updated using the difference $\delta_{t+1}$ between the Q sample value $Q_{t+1,\mathrm{target}}$ and the Q query value $Q_t(s_t, a_t)$, so as to compute the optimal Q function, and the optimal behavior is then selected by querying the array so that the accumulated reward of the model is maximized; wherein the Q sample value is calculated, following the optimal Q function formula, from $R_{t+1}$ and $s_{t+1}$ obtained by the system in real time, and the Q query value is the value found in the row and column of the two-dimensional array corresponding to $s_t$ and $a_t$ obtained by the system in real time;
the Q sample value calculation formula is as follows:
$$Q_{t+1,\mathrm{target}} = R_{t+1} + \gamma \max_{a'} Q_t(s_{t+1}, a')$$
wherein $\max_{a'} Q_t(s_{t+1}, a')$ is the maximum Q query value in the row of the two-dimensional array corresponding to $s_{t+1}$ at time $t$, and the array is updated as follows:
$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \beta(s_t, a_t)\,\delta_{t+1}, \qquad \delta_{t+1} = Q_{t+1,\mathrm{target}} - Q_t(s_t, a_t)$$
wherein $Q_t(s_t, a_t)$ is the Q query value corresponding to $s_t$ and $a_t$ in the two-dimensional array at time $t$, $Q_{t+1}(s_t, a_t)$ is the Q query value corresponding to $s_t$ and $a_t$ in the two-dimensional array at time $t+1$, and $\beta(s_t, a_t) \in [0, 1]$ is the learning step size corresponding to each state-behavior pair in the array.
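The array update of claim 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: states and behaviors are assumed to be already encoded as integer row and column indices, the function name `q_learning_update` is hypothetical, and the step size `beta` is passed as a single number standing in for $\beta(s_t, a_t)$ of the pair being updated.

```python
def q_learning_update(q_table, s_t, a_t, reward, s_next, gamma, beta):
    """Apply one update to the two-dimensional Q array and return delta_{t+1}.

    q_table[s][a] is the Q query value for state index s and behavior index a.
    """
    # Q sample value: R_{t+1} plus the attenuated maximum of the row for s_{t+1}
    q_target = reward + gamma * max(q_table[s_next])
    # Difference between the Q sample value and the current Q query value
    delta = q_target - q_table[s_t][a_t]
    # Move the stored value toward the sample value by the learning step size
    q_table[s_t][a_t] += beta * delta
    return delta


# Hypothetical array with 2 states (rows) and 3 behaviors (columns).
q_table = [[0.0, 0.0, 0.0], [0.0, 2.0, 1.0]]
delta = q_learning_update(
    q_table, s_t=0, a_t=2, reward=1.0, s_next=1, gamma=0.5, beta=0.5
)
```

In this example the Q sample value is 1.0 + 0.5 × 2.0 = 2.0, the difference is 2.0, and the entry for state 0 and behavior 2 moves from 0.0 to 1.0. Repeating the update as new values of $R_{t+1}$ and $s_{t+1}$ arrive is what drives the array toward the optimal Q function.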
CN202010456237.6A 2020-05-26 2020-05-26 Reinforced learning control method for data center active ventilation floor Active CN111601490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456237.6A CN111601490B (en) 2020-05-26 2020-05-26 Reinforced learning control method for data center active ventilation floor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456237.6A CN111601490B (en) 2020-05-26 2020-05-26 Reinforced learning control method for data center active ventilation floor

Publications (2)

Publication Number Publication Date
CN111601490A CN111601490A (en) 2020-08-28
CN111601490B true CN111601490B (en) 2022-08-02

Family

ID=72186518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456237.6A Active CN111601490B (en) 2020-05-26 2020-05-26 Reinforced learning control method for data center active ventilation floor

Country Status (1)

Country Link
CN (1) CN111601490B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114020079B (en) * 2021-11-03 2022-09-16 北京邮电大学 Indoor space temperature and humidity regulation and control method and device
CN120596339B (en) * 2025-08-07 2025-10-28 苏州元脑智能科技有限公司 Temperature control method, device, electronic device, storage medium and product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05159075A (en) * 1991-12-03 1993-06-25 Nippon Telegr & Teleph Corp <Ntt> Interpolation method based on Markov random field with continuous values
CN103473613A (en) * 2013-09-09 2013-12-25 武汉理工大学 Landscape structure-surface temperature-electricity consumption coupling model and application thereof
JP2015082224A (en) * 2013-10-23 2015-04-27 日本電信電話株式会社 Probabilistic server load estimation method and server load estimation apparatus
CN106528941A (en) * 2016-10-13 2017-03-22 内蒙古工业大学 Data center energy consumption optimization resource control algorithm under server average temperature constraint
CN108446783A (en) * 2018-01-29 2018-08-24 杭州电子科技大学 A kind of prediction of new fan operation power and monitoring method
WO2019154739A1 (en) * 2018-02-07 2019-08-15 Abb Schweiz Ag Method and system for controlling power consumption of a data center based on load allocation and temperature measurements
CN110322977A (en) * 2019-07-10 2019-10-11 河北工业大学 A kind of analysis method for reliability of nuclear power reactor core water level monitoring system
CN111144793A (en) * 2020-01-03 2020-05-12 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8478451B2 (en) * 2009-12-14 2013-07-02 Intel Corporation Method and apparatus for dynamically allocating power in a data center
US20130226501A1 (en) * 2012-02-23 2013-08-29 Infosys Limited Systems and methods for predicting abnormal temperature of a server room using hidden markov model
US20140324240A1 (en) * 2012-12-14 2014-10-30 Alcatel-Lucent Usa Inc. Method And System For Disaggregating Thermostatically Controlled Appliance Energy Usage From Other Energy Usage
JP7134949B2 (en) * 2016-09-26 2022-09-12 ディー-ウェイブ システムズ インコーポレイテッド Systems, methods, and apparatus for sampling from a sampling server
US20180100662A1 (en) * 2016-10-11 2018-04-12 Mitsubishi Electric Research Laboratories, Inc. Method for Data-Driven Learning-based Control of HVAC Systems using High-Dimensional Sensory Observations

Also Published As

Publication number Publication date
CN111601490A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN104698843B (en) A kind of data center's energy-saving control method based on Model Predictive Control
US8594857B2 (en) Modulized heat-dissipation control method for datacenter
US10888028B2 (en) Chassis intelligent airflow control and cooling regulation mechanism
WO2024113906A1 (en) Server cluster temperature adjustment method and device
CN111601490B (en) Reinforced learning control method for data center active ventilation floor
Wan et al. Intelligent rack-level cooling management in data centers with active ventilation tiles: A deep reinforcement learning approach
CN104728997A (en) Air conditioner for constant temperature control, constant temperature control system and constant temperature control method
CN114511208A (en) Optimal control method of data center energy consumption based on deep reinforcement learning
US20140206272A1 (en) Container-type data center and method for controlling container-type data center
CN117519980A (en) Energy-saving data center
CN111836524A (en) IT load change-based method for regulating and controlling variable air volume of precision air conditioner between data center columns
WO2025152564A1 (en) Control method for thermal management system, electronic device, computer readable storage medium, and vehicle
CN110263974B (en) Regional energy management system and management method based on distributed optimization algorithm
CN118151689A (en) Control system and method for adjusting temperature of temperature control equipment in machine room
CN119247786B (en) Thermal-electric integrated optimization control method for intelligent connected fuel cell vehicles
CN115793751A (en) Battery cabinet heat management method and device, battery cabinet and readable storage medium
CN111637614A (en) Intelligent control method for active ventilation floor in data center
Pramanik Electronic Cooler Technologies and Superior Data Center Cooling Techniques
CN117608331A (en) Temperature control methods, devices, equipment and storage media for data center computer rooms
CN120645625A (en) Thermal management system for a vehicle
CN119451038A (en) Micro-module data center air conditioning group control method and system
CN111787768A (en) A data center fan and its control method
CN119374202A (en) A method and system for multi-brand central air conditioning certification and adjustment based on mathematical hybrid model driving
CN114368279A (en) Vehicle thermal management control system and method
CN110595008A (en) A multi-equipment collaborative optimization method and system for a ground source heat pump air conditioning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant