JP7724621B2

JP7724621B2 - PID control parameter adjustment method, PID control device, and air conditioner equipped with the same

Info

Publication number: JP7724621B2
Application number: JP2021031322A
Authority: JP
Inventors: 裕介守口
Original assignee: Mitsubishi Electric Building Solutions Corp
Current assignee: Mitsubishi Electric Building Solutions Corp
Priority date: 2021-03-01
Filing date: 2021-03-01
Publication date: 2025-08-18
Anticipated expiration: 2041-03-01
Also published as: JP2022132716A

Description

The present invention relates to a PID control parameter adjustment method, a PID control device, and an air conditioner equipped with the same.

PID control is widely used for machine control and temperature control. When equipment is first installed, autotuning is generally used to determine the PID control parameters. However, when PID control is applied to the temperature and humidity of test rooms, inspection rooms, and similar spaces that require precise temperature and humidity management, the PID control parameters obtained by autotuning may not provide sufficient accuracy. In such cases, an operator has had to adjust the PID control parameters by trial and error while checking the output waveform. As a result, it currently takes time to determine PID control parameters suited to the controlled device.

There have been attempts to use reinforcement learning, a machine learning technique, to set PID control parameters. Patent Document 1 discloses a technique for determining PID control parameters using reinforcement learning.

Patent Document 1: Japanese Patent Application Laid-Open No. 2019-197315

When reinforcement learning is used to determine PID control parameters, the concrete means of reinforcement learning vary and are not uniquely determined. A reinforcement learning algorithm provides only a framework; how the environment, actions, and rewards are encoded as data, how the state of the environment is observed, and how rewards are calculated must be defined separately for the problem to which it is applied.

Reinforcement learning ends when a preset condition is met, such as completing a predetermined number of runs or obtaining a reward (a good result) at or above a predetermined value. If large load fluctuations, environmental changes, or aging of the equipment then occur, the learning results already obtained may no longer provide sufficient control accuracy. In that case, the PID control parameters must be adjusted again, and this readjustment takes as much time as the adjustment at initial installation.
Temperature and humidity control sometimes requires particularly high precision. As load fluctuations, environmental changes, or aging of the equipment progress, the existing learning results may no longer yield control precision that meets the requirement, so applying reinforcement learning to the determination of PID control parameters has not been easy.

The object of the present invention is to provide a method for adjusting the PID control parameters of a PID control device, and in particular a method that responds automatically to load fluctuations and deterioration over time.

The PID control parameter adjustment method according to the present disclosure includes the steps of: performing PID control of a controlled device toward a control target value using predetermined PID control parameters for a first predetermined time; calculating a control evaluation value based on output data of the controlled device during the first predetermined time; storing, as learning data, the control conditions of the PID control during the first predetermined time, the PID control parameters, and the control evaluation value; updating the PID control parameters of the PID control device; and executing again the step of performing PID control of the controlled device using the updated PID control parameters.

The PID control parameter adjustment method according to the present disclosure makes it possible to adjust PID control parameters in a way that responds automatically to load fluctuations and deterioration over time.

FIG. 1 is a diagram showing the configuration of a PID control device according to an embodiment.
FIG. 2 is a block diagram of general PID control.
FIG. 3 is a diagram showing the configuration of a reinforcement learning unit according to the embodiment.
FIG. 4 is a flowchart illustrating the reinforcement learning operation of the PID control device according to the embodiment.
FIG. 5 is a flowchart illustrating the PID control execution process of the embodiment.
FIG. 6 is a flowchart illustrating the evaluation value calculation process of the embodiment.
FIG. 7 is a flowchart illustrating the PID control parameter update process of the embodiment.

First, reinforcement learning as used in the present disclosure is explained, and the related terms are defined.

(Reinforcement learning)
In reinforcement learning, an agent placed in an environment acts on that environment, and the goal is to find a policy that maximizes the reward obtained through those actions. The agent acts on the environment, the environment updates its state and evaluates the action, and the environment reports the state and the reward to the agent. These steps are repeated over time, and the action value function and the policy are optimized so that the expected value of the total reward is maximized.

(Definitions of state, action, and reward in the reinforcement learning of the present disclosure)
In the reinforcement learning of PID control parameters in the present disclosure, the state is the control conditions of PID control, specifically the control target value of PID control and the surrounding environmental data. The action is the update of the PID control parameters by reinforcement learning. The reward is a value (hereinafter referred to as the control evaluation value) obtained by calculating the convergence, responsiveness, stability, and similar properties of the output data of the controlled device after PID control has been executed for a predetermined period.

As described in the problem statement above, even in equipment for which reinforcement learning has finished, reinforcement learning becomes necessary again if large load fluctuations, environmental changes, or aging of the controlled device occur and control accuracy can no longer be obtained. The facility operator must check whether renewed reinforcement learning is needed, and this check requires detailed analysis of the output waveform. In addition, abnormalities in the control results may go unnoticed, so management is cumbersome.

The applicants therefore studied various PID control parameter adjustment methods capable of addressing the above problems and arrived at the adjustment method of the present disclosure. This method has the following two features. The first feature is that reinforcement learning always continues during PID control of the controlled device. The second feature is that output data obtained by executing PID control for a predetermined period is stored at regular intervals, and convergence, responsiveness, and stability are calculated in a way that includes how that output data changes over time.

The first feature makes it possible to correct the PID control parameters automatically even when there are large load fluctuations, changes in the outside air, or aging of the system.
The second feature makes it possible, in PID control that requires precision, to set PID control parameters that give high convergence, good responsiveness, and high stability.

Embodiments of the present invention are described in detail below. In the following description, specific shapes, materials, directions, numerical values, and the like are examples intended to facilitate understanding of the present disclosure and can be changed as appropriate to suit the application, purpose, specifications, and so on.

FIG. 1 shows the configuration of the PID control device 1 of this embodiment. The PID control device 1 has a PID control unit 10, a target setting input unit 30, an environmental data acquisition unit 40, a control data measurement unit 50, and a reinforcement learning unit 60. The components specific to reinforcement learning are the control data measurement unit 50 and the reinforcement learning unit 60.

The PID control unit 10 executes PID control based on preset PID control parameters so that the output of the controlled device 20 matches a target value (the control target value). The control target value for the controlled device 20 is obtained from the target setting input unit 30.

FIG. 2 is a block diagram of the PID control unit 10 and the controlled device 20. K_P, K_I, and K_D are the proportional gain, integral gain, and derivative gain, respectively. The PID control unit 10 of this embodiment is configured to receive PID control parameters from the reinforcement learning unit 60, described later, and to change the value of each gain accordingly.
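
The control law of FIG. 2 can be sketched in discrete time as follows. This is a minimal illustration rather than the implementation described in the patent: the class name `PIDController`, the sampling period `dt`, and the rectangular integration and backward-difference derivative are all assumptions. Only the three gains K_P, K_I, K_D and the ability to overwrite them from outside come from the text.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Discrete-time PID controller with replaceable gains (illustrative)."""
    kp: float                 # proportional gain K_P
    ki: float                 # integral gain K_I
    kd: float                 # derivative gain K_D
    dt: float = 1.0           # sampling period in seconds (assumed)
    _integral: float = 0.0    # running integral of the error
    _prev_error: float = 0.0  # error at the previous step

    def set_gains(self, kp: float, ki: float, kd: float) -> None:
        # Corresponds to receiving updated parameters from the learning unit.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, target: float, measured: float) -> float:
        """Return the control output for one sampling period."""
        error = target - measured
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative
```

Keeping the gains as plain mutable fields means an external component can swap them between control periods without resetting the integral state; whether the integral should be reset on a gain change is a design choice the text does not specify.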

The controlled device 20 is the device on which PID control is performed. A specific example is a constant temperature and humidity room, but the device is not limited to this. The output data of the controlled device 20 is fed back to the PID control unit 10, and PID control is performed based on the deviation from the control target value.

The target setting input unit 30 acquires the control target value of the controlled device 20. The acquired control target value is sent to the PID control unit 10, where it becomes the control target value for PID control. It is also sent to the reinforcement learning unit 60, where it is used as part of the state in reinforcement learning.

The environmental data acquisition unit 40 acquires environmental data for the controlled device 20, such as the outside air temperature, room temperature, and humidity. The specific configuration for acquiring environmental data is not limited; it can be configured to acquire data from various sensors. The environmental data acquired by the environmental data acquisition unit 40 is used as part of the state in reinforcement learning.

The control data measurement unit 50 sequentially measures the output data of the controlled device 20 while PID control is being executed and sends it to the reinforcement learning unit 60. The output data of the control data measurement unit 50 may also serve as the output data fed back from the controlled device 20 to the PID control unit 10.

The reinforcement learning unit 60 stores the control target value for PID control acquired from the target setting input unit 30 and the environmental data of the controlled device 20 acquired from the environmental data acquisition unit 40 as part of the state in reinforcement learning.

The reinforcement learning unit 60 also acquires the output data of the controlled device 20 during PID control from the control data measurement unit 50 and stores it. After PID control has been executed for a predetermined time, the stored output data is used to calculate the control evaluation value.

After PID control has been executed for the predetermined time, the reinforcement learning unit 60 calculates the control evaluation value, which serves as the reward in reinforcement learning. The control evaluation value may be calculated based on the relationship between the output data and the control target value, for example the difference between them. The calculation of the control evaluation value is explained in detail later.

The reinforcement learning unit 60 updates the PID control parameters based on the control evaluation value and transmits them to the PID control unit 10. The PID control unit 10 executes PID control based on the updated PID control parameters and the control target value from the target setting input unit 30.

The reinforcement learning unit 60 also has a function for saving the learning results. In this embodiment, the state in reinforcement learning is the control target value and the environmental data values. The action in reinforcement learning is the update of the PID control parameters, specifically setting new values for the PID control gains K_P, K_I, and K_D. The reward in reinforcement learning is the control evaluation value. When the PID control parameters are updated, the learning results already saved are referenced.

FIG. 3 shows the configuration of the reinforcement learning unit 60. The reinforcement learning unit 60 has at least an input unit 610, a control unit 620, a memory unit 630, a calculation unit 640, and an output unit 650.

The input unit 610 receives the data needed to execute PID control and reinforcement learning. It has a target value input unit 611, an environmental data input unit 612, and an output data measurement unit 613. The target value input unit 611 acquires the control target value from the target setting input unit 30. The environmental data input unit 612 acquires environmental data from the environmental data acquisition unit 40. When a constant temperature and humidity room is controlled, the environmental data includes the outside air temperature, room temperature, humidity, and the like. The output data measurement unit 613 acquires, at predetermined intervals and from the control data measurement unit 50, the output data of the controlled device 20 while PID control is being executed.

The control unit 620 has a control time measurement unit 621. The control time measurement unit 621 measures a first predetermined time T1 during which the PID control unit 10 executes PID control. The first predetermined time T1 corresponds to the duration of one reinforcement learning cycle. The control time measurement unit 621 may also be configured to measure a second predetermined time T2, the interval at which data output from the controlled device 20 is saved. Alternatively, the second predetermined time T2 may be measured by the PID control unit 10.

The memory unit 630 has a measurement data memory unit 631 and a learning data memory unit 632. The measurement data memory unit 631 receives and stores the output data of the controlled device 20 acquired by the output data measurement unit 613, for example one output value per second predetermined time T2. The learning data memory unit 632 stores the control evaluation value that the calculation unit 640, described below, calculates from the data in the measurement data memory unit 631.

The calculation unit 640 has a data calculation unit 641 and a parameter setting unit 642. The data calculation unit 641 calculates the control evaluation value based on the output data of the controlled device 20 stored in the measurement data memory unit 631 at each second predetermined time T2. The control evaluation value is stored in the learning data memory unit 632 as part of the learning data.

The parameter setting unit 642 of the calculation unit 640 updates the values of the PID control parameters to be used in the next PID control period. In this embodiment, the values are updated by extracting from the learning data the action (update of the PID control parameters) for which the control evaluation value takes a predetermined value (normally the maximum) for the control target value obtained from the target value input unit 611 and the environmental data obtained from the environmental data input unit 612.
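
The extraction performed by the parameter setting unit 642 can be pictured as a lookup over stored records. The sketch below is only one possible reading: the names `LearningRecord`, `state_key`, and `select_parameters` are hypothetical, and matching states by rounding each value to a fixed bucket size is an assumption, since the text does not say how similar control conditions are grouped.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Gains = Tuple[float, float, float]  # (K_P, K_I, K_D)


class LearningRecord(NamedTuple):
    target: float            # control target value
    env: Tuple[float, ...]   # environmental data, e.g. (outside temp, humidity)
    gains: Gains             # PID control parameters used in that period
    evaluation: float        # control evaluation value (reward)


def state_key(target: float, env: Sequence[float], step: float = 1.0) -> Tuple[int, ...]:
    # Assumed discretization: bucket each state value to the nearest `step`.
    return tuple(round(v / step) for v in (target, *env))


def select_parameters(records: Sequence[LearningRecord], target: float,
                      env: Sequence[float]) -> Optional[Gains]:
    """Return the gains with the highest evaluation for a matching state."""
    key = state_key(target, env)
    matches = [r for r in records if state_key(r.target, r.env) == key]
    if not matches:
        return None  # no experience recorded for this state yet
    return max(matches, key=lambda r: r.evaluation).gains
```

Returning `None` when no record matches leaves the caller free to fall back to the current gains or to an exploratory update.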

The output unit 650 has a parameter output unit 651. The parameter output unit 651 transmits the PID control parameters determined by the parameter setting unit 642 of the calculation unit 640 to the PID control unit 10. The PID control unit 10 changes the value of each gain based on the transmitted PID control parameters.

Next, the reinforcement learning of this embodiment is described in detail with reference to the flowcharts in FIGS. 4 to 7. FIG. 4 is a flowchart showing an overview of the reinforcement learning.

Step S01: First, the initial values of the PID control parameters are determined. The initial values may be determined by autotuning or set manually. If PID control parameters have already been set, those values may be used and step S01 may be skipped. The process then proceeds to step S02.

Step S02: PID control of the controlled device 20 is executed with the current PID control parameters, and the output data of the controlled device 20 is stored. The control time is the first predetermined time T1. Details of the PID control execution process are explained with FIG. 5.

Step S03: The control evaluation value is calculated based on the stored output data. The control evaluation value reflects a measure of how good the PID control parameters are. Specifically, it is calculated from the viewpoints of the convergence, responsiveness, and stability of the control result. The calculated control evaluation value is stored as part of the learning data. Details of the control evaluation value calculation and learning data storage process are explained with FIG. 6.

Step S04: The PID control parameters are updated, and the updated parameters are sent to the PID control unit 10. The specific update method is explained with FIG. 7. The process then returns to step S02, and PID control is executed for the predetermined time with the updated PID control parameters.

The reinforcement learning flowchart in FIG. 4 differs from general reinforcement learning in that it has no termination condition. Under the reinforcement learning policy described later, if the control evaluation value of PID control deteriorates for a given control target value and environmental data, reinforcement learning proceeds again so that new optimal PID control parameters are found.
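
The flow of FIG. 4 can be sketched as a loop over steps S02 to S04 that never exits on its own. In this illustration the three callables and the function name `adjustment_loop` are hypothetical placeholders for the processes of FIGS. 5 to 7, and the `max_cycles` argument exists only so the sketch can be run and tested; the method described in the text has no such limit.

```python
from typing import Callable, List, Optional, Tuple

Gains = Tuple[float, float, float]  # (K_P, K_I, K_D)


def adjustment_loop(
    initial_gains: Gains,                       # result of step S01
    run_pid: Callable[[Gains], List[float]],    # step S02: returns stored outputs
    evaluate: Callable[[List[float]], float],   # step S03: control evaluation value
    update: Callable[[Gains, float], Gains],    # step S04: next parameters
    max_cycles: Optional[int] = None,           # None means run indefinitely
) -> List[Tuple[Gains, float]]:
    """Repeat S02 -> S03 -> S04 and return the (gains, evaluation) history."""
    history: List[Tuple[Gains, float]] = []
    gains = initial_gains
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        outputs = run_pid(gains)          # control for the period T1
        score = evaluate(outputs)         # score the finished period
        history.append((gains, score))    # keep the pair as learning data
        gains = update(gains, score)      # parameters for the next period
        cycle += 1
    return history
```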

Note that learning results accumulate continuously in the memory unit 630 of the reinforcement learning unit 60, so the memory unit 630 must be designed with this continuous growth in mind. For example, the memory unit 630 may be a hard disk drive, or it may be located on a network-connected server. As reinforcement learning progresses, the growth of the learning result data slows, unless there is a large change in the target value or a sudden deterioration of the equipment.

Next, each process in steps S02 to S04 of the flowchart in FIG. 4 is described in detail.

(PID control execution process)
FIG. 5 is a flowchart of the PID control execution process of step S02 in FIG. 4. The PID control execution process executes PID control of the controlled device 20 with the set PID control parameters for the first predetermined time T1. Output data of the controlled device 20 is acquired during the first predetermined time T1 and saved at intervals of the second predetermined time T2. The saved output data is used in the control evaluation value calculation process of the next step.

Step S11: The elapsed time of PID control is set to 0.

Step S12: PID control is executed with the current PID control parameters.

Step S13: The output data of the controlled device 20 is acquired. The output data is not necessarily saved in step S13.

Step S14: If the second predetermined time T2 has elapsed, the process proceeds to step S15. If it has not elapsed, the process returns to step S12.

Step S15: The output data acquired in step S13 is stored in the measurement data memory unit 631. In this way, output data is stored once per second predetermined time T2.

Step S16: If the first predetermined time T1 has elapsed, the PID control execution process ends. If it has not elapsed, the process returns to step S12.

This completes the PID control execution process. The first predetermined time T1 is the time for which PID control is executed and is set to, for example, one hour. The second predetermined time T2 is shorter than the first predetermined time T1 and is the interval at which output data is stored; it is set to, for example, one second. These values of T1 and T2 are examples. They may be changed as appropriate, taking into account the time the output needs to stabilize, which depends on factors such as the physical size of the controlled device 20.
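
Steps S11 to S16 can be sketched as a single loop. The sketch makes several assumptions that are not in the text: time is counted in control ticks instead of being read from a clock, T1 and T2 are given as whole numbers of ticks, and `control_step` and `read_output` are hypothetical callables standing in for the PID control unit and the control data measurement unit.

```python
from typing import Callable, List


def run_pid_period(control_step: Callable[[], None],
                   read_output: Callable[[], float],
                   t1_ticks: int, t2_ticks: int) -> List[float]:
    """Run PID control for T1 and store one output sample every T2."""
    samples: List[float] = []
    elapsed = 0                  # S11: reset the elapsed time
    since_last_save = 0          # ticks since the last stored sample
    while True:
        control_step()           # S12: one PID control update
        value = read_output()    # S13: acquire the output (not yet stored)
        elapsed += 1
        since_last_save += 1
        if since_last_save < t2_ticks:
            continue             # S14: T2 not elapsed, back to S12
        samples.append(value)    # S15: store the sample
        since_last_save = 0
        if elapsed >= t1_ticks:
            return samples       # S16: T1 elapsed, end of the process
```

As in the flowchart, the T1 check is reached only after a sample has been stored, so a period whose T1 is not a multiple of T2 ends at the next T2 boundary.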

(Control evaluation value calculation and learning data storage process)
FIG. 6 is a flowchart of the control evaluation value calculation and learning data storage process. The control evaluation value is calculated from the PID control output data for the first predetermined time T1, that is, the output data saved at each second predetermined time T2. So that the value reflects how good the current PID control parameters are, the calculation assigns a higher score the more precisely the PID control result matches the output target value.

Step S21: The control evaluation value is calculated based on the PID control output data for the first predetermined time T1. The calculation is designed so that the control evaluation value becomes higher the more precisely the output matches the target value. For example, when a constant temperature and humidity room is controlled, the smaller the deviation between the target temperature and the measured output temperature, the better the control, so the calculation yields a higher control evaluation value for a smaller deviation.
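
One simple scoring rule with the property described in step S21 is the negated mean absolute deviation. This is an illustrative choice and not a formula given in the text; the function name `deviation_score` is hypothetical.

```python
from typing import Sequence


def deviation_score(samples: Sequence[float], target: float) -> float:
    """Return a score that rises as the samples approach the target.

    The score is the negated mean absolute deviation, so 0.0 is the best
    possible value and larger deviations give more negative scores.
    """
    if not samples:
        raise ValueError("at least one output sample is required")
    mean_abs_dev = sum(abs(y - target) for y in samples) / len(samples)
    return -mean_abs_dev


tight = deviation_score([22.9, 23.0, 23.1], target=23.0)
loose = deviation_score([21.0, 23.0, 25.0], target=23.0)
print(tight > loose)  # prints True: the tighter run scores higher
```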

ステップＳ２２：制御評価値の算出が終了すると、学習データの保存を行う。ＰＩＤ制御実行の制御条件（制御目標値、環境データ）とＰＩＤ制御パラメータと制御評価値を組にして、学習データとして記憶する。これによって、制御条件に対して、どのようにＰＩＤ制御パラメータを更新すれば、制御評価値が大きくなる（＝所望の良好な制御結果が得られる）のかということが学習データとして記憶されることになる。 Step S22: Once calculation of the control evaluation value has been completed, the learning data is saved. The control conditions for PID control execution (control target value, environmental data), PID control parameters, and control evaluation value are paired and stored as learning data. This stores as learning data how to update the PID control parameters for the control conditions to increase the control evaluation value (= obtain the desired good control results).

ステップＳ２３：制御評価値によって、次ステップのＰＩＤ制御パラメータ更新における学習方針を決定する。詳細は後述する。 Step S23: The learning policy for updating the PID control parameters in the next step is determined based on the control evaluation value. Details will be described later.

制御評価値の算出には、ＰＩＤ制御結果の、（１）収束性、（２）応答性、（３）安定性を反映するように測定データを基に演算を行うが、これに限定されない。 The control evaluation value is calculated based on measurement data to reflect, but is not limited to, (1) convergence, (2) responsiveness, and (3) stability of the PID control results.

（１）収束性については、第２の所定時間Ｔ２ごとの出力データの値と制御目標値との差、あるいはその累積値を対応させるようにしてもよい。出力データの値と制御目標値との差が小さいほど、制御精度が良いことになり、収束性は良いと言える。また、時間経過における出力データの値と制御目標値との差の累積値は、どれだけ速く収束したかを反映させた値となる。 (1) Convergence may be measured by the difference between the output data value and the control target value for each second predetermined time T2, or by the cumulative value thereof. The smaller the difference between the output data value and the control target value, the better the control accuracy and the better the convergence. Furthermore, the cumulative value of the difference between the output data value and the control target value over time reflects how quickly the convergence occurred.

（２）応答性については、第２の所定時間Ｔ２ごとの出力データの変化幅を対応させるようにしてもよい。一定時間間隔ごとの出力データの変化幅は、出力データの変化率に相当するので、応答性を反映した値となる。 (2) Responsiveness may be measured by the range of change in the output data for each second predetermined time period T2. The range of change in the output data for each fixed time period corresponds to the rate of change in the output data, and therefore is a value that reflects responsiveness.

(3) Stability may be represented by the difference between the maximum and minimum values of the output data after a predetermined time has elapsed, or by the number of times the output data rises and falls around the control target value, or by the amplitude of those swings. These values indicate whether hunting is occurring and, if so, how severe it is, and therefore reflect stability.

Convergence, responsiveness, and stability are interrelated in complex ways and cannot be evaluated entirely independently. The control evaluation value may therefore be obtained, for example, by calculating several control evaluation values, multiplying each by a coefficient, and summing the results. Alternatively, multiple control evaluation values may be used.
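As an illustration of the three measures and their weighted sum, the sketch below computes each one from a list of output samples taken at interval T2. The function names, the sign convention (errors and hunting are subtracted, responsiveness is added, so that a larger value means better control), the weights, and the `settle_index` parameter are assumptions made for this example and are not specified in the disclosure.

```python
def convergence_error(samples, target):
    """Cumulative absolute deviation from the target over the T2-spaced samples."""
    return sum(abs(y - target) for y in samples)


def response_rate(samples):
    """Largest change between consecutive T2-spaced samples."""
    return max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0.0)


def hunting_amplitude(samples, settle_index):
    """Maximum minus minimum of the output after the assumed settling point."""
    tail = samples[settle_index:]
    return max(tail) - min(tail) if tail else 0.0


def target_crossings(samples, target):
    """Number of times the output passes from one side of the target to the other."""
    sides = [y > target for y in samples if y != target]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def control_evaluation(samples, target, settle_index, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the partial values; larger means better control (assumed)."""
    w_conv, w_resp, w_stab = weights
    return (-w_conv * convergence_error(samples, target)
            + w_resp * response_rate(samples)
            - w_stab * hunting_amplitude(samples, settle_index))


samples = [20.0, 22.0, 23.5, 22.8, 23.1, 23.0]   # illustrative temperatures
score = control_evaluation(samples, target=23.0, settle_index=3)
```

The weights are the natural place to tune the trade-off for a particular control target, for example by raising the stability weight where hunting is unacceptable.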

The calculation methods described above are examples. What matters is that the control evaluation value is calculated from the output data within a predetermined time. The specific calculation can be adjusted to suit the control target to which the method is applied.

Next, the determination of the learning policy is described. The learning policy determines how the PID control parameters are updated (the action in reinforcement learning), as described later. For example, the reinforcement learning policy is preferably changed according to the progress of learning.

The learning policies are, for example, as follows:
・Policy A: learning has progressed sufficiently, and the optimal PID control parameters are selected.
・Policy B: the system may have been affected by aging or the like, and the PID control parameters are revised anew through reinforcement learning.
・Policy C: learning is at an early stage, and data acquisition is repeated to accumulate learning data.

For example, which of these policies to select can be decided based on the control evaluation value obtained by executing PID control for a predetermined time.
・If the same control evaluation value is obtained for the same control conditions and the same PID control parameters, policy A is selected.
・If a control evaluation value lower than the previous one is obtained for the same control conditions and the same PID control parameters, policy B is selected. If the control evaluation value drops sharply, policy C may be selected instead. For example, policy C is selected when the control evaluation value has decreased by 10% or more.
・If the amount of data obtained through learning is small, policy C is selected unconditionally.
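The selection rules above can be sketched as a small function. Only the 10% figure comes from the text. The minimum record count, the tolerance used to decide that two evaluation values are "the same", the treatment of an improved value as policy A, and the restriction to positive evaluation values are assumptions made for this example.

```python
def select_policy(n_records, previous, latest,
                  min_records=20, tolerance=0.01, large_drop=0.10):
    """Return "A", "B", or "C" for one set of control conditions and parameters.

    previous and latest are the control evaluation values obtained for the same
    control conditions and the same PID control parameters.
    """
    if n_records < min_records:          # little learning data: always policy C
        return "C"
    if previous <= 0:
        raise ValueError("this sketch assumes positive evaluation values")
    drop = (previous - latest) / previous   # relative decrease since last time
    if drop >= large_drop:               # decrease of 10% or more: policy C
        return "C"
    if drop > tolerance:                 # noticeably lower than before: policy B
        return "B"
    return "A"                           # effectively unchanged (or improved)


policies = [
    select_policy(5, 0.80, 0.80),     # too few records
    select_policy(100, 0.80, 0.80),   # unchanged
    select_policy(100, 0.80, 0.76),   # 5% lower
    select_policy(100, 0.80, 0.70),   # 12.5% lower
]
```

Checking the record count first reflects the statement that policy C is chosen unconditionally when little data is available.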

The learning policy is determined by the control evaluation value corresponding to the state (control conditions) and the action (update of the PID control parameters) in reinforcement learning.

(PID control parameter update process)
FIG. 7 is a flowchart of the PID control parameter update process. In this process, the PID control parameters are updated for the next execution of PID control.

Step S31: The current learning policy of the reinforcement learning is checked. The update method is changed according to the learning policy.
Steps S32 to S34: The PID control parameters are updated according to the current learning policy (described later).
Step S35: The PID control parameters updated in the preceding step are transmitted to the PID control unit 10.

In this embodiment, steps S32 to S34 illustrate the case of three learning policies. Under policy A, update A is executed (step S32). Under policy B, update B is executed (step S33). Under policy C, update C is executed (step S34). Three update methods are shown as examples, but the update methods are not limited to these.

Specific examples of the update methods corresponding to policies A to C are given below.
・Update A (policy A): the action (that is, the PID control parameters) that maximizes the control evaluation value is selected.
・Update B (policy B): the value of the action that maximizes the control evaluation value is randomly increased or decreased.
・Update C (policy C): the value of the action is determined randomly.
These are examples of update methods corresponding to the learning policies, and other update methods may also be used.
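Steps S31 to S34 and the three example updates can be sketched as follows. The parameter ranges in `BOUNDS`, the 10% relative step used for update B, and all function names are hypothetical choices for illustration. The text specifies only that update A selects the best-known parameters, update B perturbs them randomly, and update C chooses them randomly.

```python
import random

# Hypothetical search ranges for each PID gain (not given in the source).
BOUNDS = {"kp": (0.0, 10.0), "ki": (0.0, 5.0), "kd": (0.0, 2.0)}


def clamp(name, value):
    lo, hi = BOUNDS[name]
    return min(max(value, lo), hi)


def update_a(best):
    """Update A: reuse the parameters with the highest control evaluation value."""
    return dict(best)


def update_b(best, rng, rel_step=0.1):
    """Update B: randomly increase or decrease each best-known parameter."""
    return {k: clamp(k, v * (1.0 + rng.uniform(-rel_step, rel_step)))
            for k, v in best.items()}


def update_c(rng):
    """Update C: draw each parameter at random from its assumed range."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def update_parameters(policy, best, rng):
    """Check the policy (S31) and apply the matching update (S32 to S34)."""
    if policy == "A":
        return update_a(best)
    if policy == "B":
        return update_b(best, rng)
    if policy == "C":
        return update_c(rng)
    raise ValueError(f"unknown policy: {policy!r}")


rng = random.Random(0)
best = {"kp": 2.0, "ki": 0.5, "kd": 0.1}
new_a = update_parameters("A", best, rng)
new_b = update_parameters("B", best, rng)
new_c = update_parameters("C", best, rng)
```

The returned dictionary is what would then be sent to the PID control unit in step S35.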

The PID control parameter adjustment method of this embodiment is characterized in that reinforcement learning is not terminated after a predetermined number of iterations or when a predetermined condition is met, but continues at all times. Because reinforcement learning is always running, the control evaluation value is always being calculated. As the control target device gradually deteriorates with age, the control evaluation value begins to fall below its earlier values, which corresponds to policy B above. In that case, the PID control parameters are updated to values randomly increased or decreased from the PID control parameter values that had given the highest control evaluation value so far. Reinforcement learning of new PID control parameters therefore proceeds, and by repeating this process the optimal PID control parameters for the new state are learned. In this way, the method can respond automatically even when the PID control parameters need to be revised because of aging or similar causes.
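The effect of never stopping the learning can be illustrated with a toy loop. This is a simplified sketch under strong assumptions: `toy_evaluation` is a made-up stand-in for "run PID control for the first predetermined time and calculate the control evaluation value", its slowly drifting optimum imitates aging, only policies A and B are modelled, and the loop is bounded by `cycles` solely so that the example terminates.

```python
import random


def perturb(params, rng, rel_step=0.2):
    """Update B: random increase or decrease around the best-known parameters."""
    return {k: v * (1.0 + rng.uniform(-rel_step, rel_step))
            for k, v in params.items()}


def continuous_adjustment(run_and_evaluate, initial, cycles, rng, tolerance=1e-9):
    best, best_score = dict(initial), None
    params, history = dict(initial), []
    for cycle in range(cycles):   # a real controller would loop without an end
        score = run_and_evaluate(params, cycle)   # run control, then evaluate
        history.append((dict(params), score))     # store the learning data
        if params == best:
            # Same parameters as before: a lower score suggests degradation.
            degraded = best_score is not None and score < best_score - tolerance
            best_score = score                    # refresh the stored value
            params = perturb(best, rng) if degraded else dict(best)
        else:
            # An exploratory candidate replaces the incumbent only if it is better.
            if score > best_score:
                best, best_score = dict(params), score
            params = dict(best)
    return best, history


def toy_evaluation(params, cycle):
    """Hypothetical plant: the ideal kp drifts upward as the device ages."""
    optimum = 2.0 + 0.02 * cycle
    return -(params["kp"] - optimum) ** 2


best, history = continuous_adjustment(
    toy_evaluation, {"kp": 2.0}, cycles=200, rng=random.Random(1))
```

Because the incumbent parameters are re-evaluated on every return to them, a drop in their score is noticed without any explicit restart of learning, which is the behavior the paragraph above describes.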

Many components of the PID control device of this embodiment described above can be implemented with computer hardware and software. They may be implemented on a single computer system or by separate computer systems working together. A configuration that does not use a computer is also possible. How the overall system is configured is not essential to the present disclosure.

Because the PID control device of this embodiment performs reinforcement learning continuously, the learning data may become very large. The storage capacity of the storage unit 630 must be chosen with this in mind. The storage unit 630 can be a storage device of a computer system, such as a hard disk drive, or a network-connected storage device in the cloud. From this standpoint, the reinforcement learning unit 60 is preferably implemented as a computer system.

(Application example)
The PID control parameter adjustment method of the present disclosure can be applied to air conditioners used for constant temperature and humidity rooms. Such rooms are widely used as laboratories, measurement rooms, clean rooms, and the like, and require highly accurate temperature and humidity control. Once installed, the equipment is used for a long time, so optimizing the PID control parameters in response to aging is essential. By incorporating the PID control parameter adjustment method of the present disclosure into a PID control device and using it to control an air conditioner, an air conditioner whose temperature and humidity control adapts automatically to aging can be realized. The method is not limited to temperature and humidity control and is applicable to control devices using PID control in general.

1 PID control device, 10 PID control unit, 20 control target device, 30 target setting input unit, 40 environmental data acquisition unit, 50 control data measurement unit, 60 reinforcement learning unit, 610 input unit, 611 target value input unit, 612 environmental data input unit, 613 output data measurement unit, 620 control unit, 621 control time measurement unit, 630 storage unit, 631 measurement data storage unit, 632 learning data storage unit, 640 calculation unit, 641 data calculation unit, 642 parameter setting unit, 650 output unit, 651 parameter output unit, T1 first predetermined time, T2 second predetermined time

Claims

1. A PID control parameter adjustment method for a PID control device, comprising:
a first step of PID-controlling, for a first predetermined time and using predetermined PID control parameters, a value of output data indicating at least one of temperature and humidity of a control target device to a control target value;
a second step of calculating, after the first step is executed, a control evaluation value using the value of the output data of the control target device during the first predetermined time, the control target value, and an equation indicating the relationship between the value of the output data and the control target value;
a third step of storing, after the second step is executed, the control conditions of the PID control for the first predetermined time, the PID control parameters, and the control evaluation value as learning data; and
a fourth step of updating the PID control parameters of the PID control device based on the control evaluation value after the third step is executed,
wherein, after the fourth step is executed, the first step is always executed using the updated PID control parameters.

2. The PID control parameter adjustment method according to claim 1, wherein the first step includes a step of measuring the value of the output data of the control target device during the first predetermined time and storing the value of the output data.

3. The PID control parameter adjustment method according to claim 2, wherein, in the second step, the control evaluation value is calculated based on at least one of:
a difference between the control target value and the value of the stored output data at each second predetermined time, the second predetermined time being shorter than the first predetermined time;
a change range of the value of the stored output data over each second predetermined time;
a difference between the maximum and minimum values of the stored output data; and
the number of times the value of the stored output data increases or decreases around the control target value, or the width of that change.

4. The PID control parameter adjustment method according to claim 1, wherein the fourth step includes:
selecting an update method based on a learning policy determined by the control conditions and the control evaluation values corresponding to the PID control parameters;
updating the PID control parameters based on the selected update method; and
transmitting the updated PID control parameters to the PID control device.

5. A PID control device comprising:
a PID control unit that executes a first process of PID-controlling, for a first predetermined time and using predetermined PID control parameters, a value of output data indicating at least one of temperature and humidity of a control target device to a control target value;
a data calculation unit that executes, after the first process is executed, a second process of calculating a control evaluation value using the value of the output data of the control target device during the first predetermined time, the control target value, and an equation indicating the relationship between the value of the output data and the control target value;
a storage unit that executes, after the second process is executed, a third process of storing the control conditions of the PID control for the first predetermined time, the PID control parameters, and the control evaluation value as learning data; and
a parameter setting unit that executes, after the third process is executed, a fourth process of updating the PID control parameters of the PID control device based on the control evaluation value,
wherein, after the fourth process is executed, the first process is always executed using the updated PID control parameters.

6. An air conditioner comprising the PID control device according to claim 5.

7. A PID control device comprising:
a PID control unit that PID-controls a value of output data indicating at least one of temperature and humidity of a control target device to a control target value;
a target setting input unit for setting the control target value of the PID control unit;
an environmental data acquisition unit that acquires environmental conditions;
a control data measurement unit that measures the value of the output data of the control target device; and
a reinforcement learning unit that learns PID control parameters after the PID control is executed,
wherein the reinforcement learning unit includes:
an input unit that acquires the control target value from the target setting input unit, acquires the environmental conditions from the environmental data acquisition unit, and acquires the value of the output data of the control target device from the control data measurement unit;
a control unit that controls a control time of the PID control unit;
a storage unit that stores the value of the output data of the control target device acquired from the control data measurement unit;
a calculation unit having a function of calculating a control evaluation value using the value of the output data of the control target device stored in the storage unit, the control target value, and an equation indicating the relationship between the value of the output data and the control target value, and a function of updating the PID control parameters based on the control evaluation value; and
an output unit that transmits the PID control parameters updated by the calculation unit to the PID control unit,
and wherein the PID control is always executed after the PID control parameters are learned.

8. An air conditioner comprising the PID control device according to claim 7.