JP2004118658A5

Info

Publication number: JP2004118658A5
Application number: JP2002282987A
Authority: JP
Filing date: 2002-09-27
Publication date: 2005-05-12
Anticipated expiration: 2022-09-27

Description

An ideal algorithm must avoid numerical problems caused by redundancy in the input data, reject outliers in the input values, keep the computational complexity of each update during learning low while maintaining high data efficiency, allow learning in high-dimensional spaces in real time, and, of course, provide accurate function approximation with adequate generalization. A further difficulty specific to function approximation in learning control is that the operating range is often unknown and can be specified only by an upper bound. If function approximation is performed over such a generously estimated operating range, many learning parameters must be allocated, which raises the computational cost. Moreover, parameters that are not properly constrained by the training data may overfit noise. In general, when the complexity of the function to be estimated is unknown, it is difficult to decide how many learning parameters to use, and the problem is especially hard when learning is performed online.

The parameters θ_k must be estimated from data given in the form (x_i, y_i) or (x_i, e_i). Here y_i is the learning target, and e_i is an error signal that approximates the estimation error e_{p,i} = f(x_i) - ^f(x_i) and contains zero-mean noise.

K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems. Prentice Hall, 1989.
J.-J. E. Slotine and W. Li, Applied Nonlinear Control. Prentice Hall, 1991.
J.-J. E. Slotine and W. Li, "On the adaptive control of robot manipulators," International Journal of Robotics Research, vol. 6, no. 3, pp. 49-59, 1987.
L. L. Whitcomb, A. A. Rizzi, and D. E. Koditschek, "Comparative experiments with a new adaptive controller for robot arms," IEEE Transactions on Robotics and Automation, vol. 9, pp. 59-70, Feb. 1993.
A. U. Levin and K. S. Narendra, "Control of nonlinear dynamical systems using neural networks: Controllability and stabilization," IEEE Transactions on Neural Networks, vol. 4, pp. 192-206, Mar. 1993.
F.-C. Chen and H. K. Khalil, "Adaptive control of a class of nonlinear discrete-time systems using neural networks," IEEE Transactions on Automatic Control, vol. 40, pp. 791-801, May 1995.
R. Sanner and J.-J. E. Slotine, "Gaussian networks for direct adaptive control," IEEE Transactions on Neural Networks, vol. 3, pp. 837-863, Nov. 1992.
S. Seshagiri and H. K. Khalil, "Output feedback control of nonlinear systems using RBF neural networks," IEEE Transactions on Neural Networks, vol. 11, pp. 69-79, Jan. 2000.
J. Y. Choi and J. A. Farrell, "Nonlinear adaptive control using networks of piecewise linear approximations," IEEE Transactions on Neural Networks, vol. 11, pp. 390-401, Mar. 2000.
C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning," Artificial Intelligence Review, vol. 11, no. 1-5, pp. 11-73, 1997.
J.-J. E. Slotine and W. Li, "Composite adaptive control of robot manipulators," Automatica, vol. 25, no. 4, pp. 509-519, 1989.
S. Vijayakumar and H. Ogawa, "RKHS based functional analysis for exact incremental learning," Neurocomputing, vol. 29, no. 1-3, pp. 85-113, 1999.
S. Schaal and C. G. Atkeson, "Constructive incremental learning from only local information," Neural Computation, vol. 10, no. 8, pp. 2047-2084, 1998.
L. Ljung and T. Soederstroem, Theory and Practice of Recursive Identification. MIT Press, 1986.
H. K. Khalil, Nonlinear Systems (2nd Edition). Prentice Hall, 1996.
S. Schaal and C. G. Atkeson, "Receptive field weighted regression," Technical report RE-H-209, ATR Human Information Processing Laboratories, 1997.
H. Gomi and M. Kawato, "Neural network control for a closed-loop system using feedback-error-learning," Neural Networks, vol. 6, pp. 933-946, 1993.

Here P_k is the inverse of the covariance matrix of the weighted input x_k, θ_k is the learning parameter of the local model, w_k is the weight of the local model, e is the tracking error, e_pk is the approximation error, and λ is the forgetting factor. The method may include a step of calculating an approximation ^θ_k of the learning parameter of the local model according to this update equation, and a step of optimizing each distance metric by minimizing an error index, defined by a predetermined equation, between the function value y representing the training data and the function approximation ^y.

A control device for a physical system according to still another aspect of the present invention controls the physical system by approximating a nonlinear function describing the dynamics of the physical system with a function approximation obtained by weighting and summing linear local models. The structure of the local models constituting the function approximation and their respective weights are each determined by predetermined learning parameters. The device includes: initialization means for defining an initial structure of the function approximation; receiving means for receiving state data representing the actual state of the physical system; updating means for updating the function approximation by updating the learning parameters of each local model so as to minimize a predetermined error index independently for each local model, based on the tracking error between the target trajectory and the actual trajectory of the physical system obtained from the state data and on the approximation error between the state data and the function approximation; calculation means for calculating a control variable according to the control law of the control system using the updated function approximation; output means for outputting the calculated control variable to the physical system; and control means for controlling the receiving means, the updating means, the calculation means, and the output means so that they operate repeatedly.
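As an illustration of a function approximation built by weighting and summing linear local models, the following minimal sketch evaluates a normalized weighted sum of local linear predictions. The Gaussian kernel, the normalization by the sum of the weights, the scalar distance metric `d`, and all names (`LocalModel`, `kernel_weight`, `local_prediction`, `predict`) are assumptions made for illustration; the equations of this publication are not reproduced in the text, so this should not be read as the claimed formula.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalModel:
    center: List[float]  # c_k: center of the local model
    d: float             # assumed scalar distance metric (larger means narrower kernel)
    slope: List[float]   # linear coefficients (part of theta_k)
    offset: float        # bias term (part of theta_k)


def kernel_weight(model: LocalModel, x: Sequence[float]) -> float:
    """Assumed Gaussian kernel: w_k = exp(-0.5 * d * ||x - c_k||^2)."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, model.center))
    return math.exp(-0.5 * model.d * sq)


def local_prediction(model: LocalModel, x: Sequence[float]) -> float:
    """Linear model evaluated relative to its own center."""
    return model.offset + sum(
        b * (xi - ci) for b, xi, ci in zip(model.slope, x, model.center)
    )


def predict(models: Sequence[LocalModel], x: Sequence[float]) -> float:
    """Normalized weighted sum of the local linear predictions."""
    weights = [kernel_weight(m, x) for m in models]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no local model covers this input")
    return sum(w * local_prediction(m, x) for w, m in zip(weights, models)) / total
```

Because each local model is evaluated relative to its own center, two models that both describe the same straight line produce the same blended prediction regardless of how the weights are split between them.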

Here P_k is the inverse of the covariance matrix of the weighted input x_k, θ_k is the learning parameter of the local model, w_k is the weight of the local model, e is the tracking error, e_pk is the approximation error, and λ is the forgetting factor. The device includes means for calculating an approximation ^θ_k of the learning parameter of the local model according to this update equation, and optimization means for optimizing each distance metric by minimizing an error index, defined by a predetermined equation, between the function value y representing the training data and the function approximation ^y.

FIG. 5 shows a block diagram of a controller 60 that performs nonlinear control according to the present embodiment, together with sensors 62A-62N, which supply the controller 60 with control variables from the physical system to be controlled, such as a robot, and actuators 64A-64M, which operate under the control of the controller 60. The controller 60 includes an input port 70 that receives inputs from the sensors 62A-62N, an output port 72 to which the actuators 64A-64M are connected, a CPU (Central Processing Unit) 74 connected to the input port 70 and the output port 72, and a ROM (Read-Only Memory) 76, a RAM (Random Access Memory) 78, a network board 82, and a memory reader 80, all of which are connected to the CPU 74. The network board 82 is connected to an external network 92. A memory card 90 consisting of an integrated circuit can be inserted into and removed from the memory reader 80, which supplies data and programs stored on the memory card 90 to the CPU 74 and stores data from the CPU 74 on the card.

P_k is the inverse of the covariance matrix of the weighted input x_k, θ_k is the learning parameter, w_k is the weight described above, e is the tracking error, e_pk is the approximation error, and λ is the forgetting factor. The forgetting factor λ is introduced so that parameter updates rely mainly on reasonably recent data, and it takes a value in the interval [0, 1].
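To make the roles of P_k, w_k, and λ concrete, here is a minimal sketch of one step of a standard weighted recursive least squares update with a forgetting factor. It is an assumption-laden illustration, not the update equation of this publication (which is not reproduced in the text): it is written in discrete time, it is driven only by the prediction error of the local model and does not combine the tracking error e with the approximation error e_pk, and it requires 0 < λ ≤ 1 to avoid division by zero. The function name `rls_update` and its argument names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rls_update(theta: Vector, P: Matrix, x: Vector, y: float,
               w: float, lam: float) -> Tuple[Vector, Matrix]:
    """One weighted recursive least squares step with forgetting factor lam.

    P is assumed symmetric, so P x x^T P equals the outer product of (P x)
    with itself.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    if w <= 0.0:
        # A sample with zero weight carries no information for this model.
        return theta, P
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam / w + sum(x[i] * Px[i] for i in range(n))
    # Dividing by lam inflates P, which gradually discounts older data.
    P_new = [[(P[i][j] - Px[i] * Px[j] / denom) / lam for j in range(n)]
             for i in range(n)]
    # Prediction error of the local model before the update.
    err = y - sum(t * xi for t, xi in zip(theta, x))
    P_new_x = [sum(P_new[i][j] * x[j] for j in range(n)) for i in range(n)]
    theta_new = [t + w * g * err for t, g in zip(theta, P_new_x)]
    return theta_new, P_new
```

With λ = 1 nothing is forgotten and the update reduces to ordinary weighted recursive least squares; smaller values of λ make the estimate track recent data more closely at the cost of higher sensitivity to noise.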

FIG. 9 shows a flowchart of a program for adding a local model, as one example of steps 122 and 124 of FIG. 7. Referring to FIG. 9, step 170 first determines whether all the weights w_k computed for a data point x are smaller than a certain threshold. If the result is YES, the data point is not adequately represented by any local model, so a new local model is added in step 172. The initial value of the center c_k of the new local model is set to x. Its width is given a suitable initial value, for example the width of an adjacent local model. This choice rests on the assumption that adjacent local models correspond to adjacent parts of the true function, so the curvature of the true function should not differ greatly between them. Because the newly added local model is adjusted by subsequent updates, choosing the width in this way is not essential; it does, however, help the width of the local model converge to its optimal value sooner.
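The decision described for steps 170 and 172 can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian kernel parameterized by a scalar width, the threshold value, the default width used when no model exists yet, and the names `Kernel`, `gaussian_weight`, and `maybe_add_model` are all hypothetical and not taken from the publication.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Kernel:
    center: List[float]  # c_k
    width: float         # assumed scalar width of the kernel


def gaussian_weight(kernel: Kernel, x: Sequence[float]) -> float:
    """Assumed Gaussian kernel with a scalar width."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, kernel.center))
    return math.exp(-0.5 * sq / kernel.width ** 2)


def maybe_add_model(models: List[Kernel], x: Sequence[float],
                    threshold: float = 0.1,
                    default_width: float = 1.0) -> bool:
    """Add a local model centered at x if every existing weight is below threshold.

    Returns True when a model was added.
    """
    # Step 170: is the data point adequately covered by some local model?
    if any(gaussian_weight(m, x) >= threshold for m in models):
        return False
    # Step 172: center the new model at x and borrow the width of the
    # nearest existing model, falling back to a default for the first model.
    if models:
        nearest = min(models, key=lambda m: math.dist(m.center, x))
        width = nearest.width
    else:
        width = default_width
    models.append(Kernel(center=list(x), width=width))
    return True
```

Borrowing the width from the nearest model only provides a starting point; in the scheme described above the width would still be adjusted by later updates.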

Referring again to FIG. 6, an example of the control law used in the calculation in step 106 has the following form:

FIG. 12 compares the tracking error 230 of a non-adaptive PD controller, the tracking errors 232 and 234 of tracking-error-based adaptive controllers with Γ_k = 10I and Γ_k = 250I, and the tracking error 236 of the RFWR composite adaptive controller of the present embodiment, when Gaussian noise N(0, 0.01) is added to the measurements. As FIG. 12 shows, the performance of the tracking-error-based adaptive controller with Γ_k = 250I degrades markedly in the presence of Gaussian noise. In contrast, the RFWR composite adaptive controller of the present embodiment achieves stable and fast learning.

[Description of reference numerals]
20: target trajectory; 22: actual trajectory; 24: tracking error; 26, 40: true function; 28: function approximation; 30: approximation error; 32: kernel function; 42, 44, 46: range of kernel diameter; 42C: training point; 52, 54, 56: locally approximating linear function; 60: controller; 74: CPU; 76: ROM; 78: RAM; 80: memory reader; 82: network board; 90: memory card; 92: network

Claims

A control method for a physical system, which controls the physical system by approximating a nonlinear function describing the dynamics of the physical system with a function approximation obtained by weighting and summing linear local models, wherein the structure of the local models constituting the function approximation and their respective weights are each determined by predetermined learning parameters, the method comprising the steps of:
defining an initial structure of the function approximation;
receiving state data representing an actual state of the physical system;
updating the function approximation by updating the learning parameters of each local model so as to minimize a predetermined error index independently for each local model, based on a tracking error between a target trajectory and an actual trajectory of the physical system obtained from the state data and on an approximation error between the state data and the function approximation;
calculating a control variable according to a control law of the control system using the updated function approximation;
outputting the calculated control variable to the physical system; and
repeating the receiving step, the updating step, the calculating step, and the outputting step.

The control method of a physical system according to claim 1, wherein the function approximation ^y is expressed by

where

c_k is the center position of the k-th linear model, and
w_k is a weight given by a predetermined kernel function.

The control method of a physical system according to claim 2, wherein the weight w_k is calculated by the following kernel function:

The control method of a physical system according to claim 3, wherein the updating step includes:
a second updating step of updating, for each existing local model, the learning parameters so as to minimize a predetermined error index, based on the tracking error between the target trajectory and the actual trajectory of the physical system obtained from the state data and on the approximation error between the state data and the function approximation;
a step of determining whether the learning parameters of each local model updated in the second updating step satisfy a predetermined condition; and
a step of adding or deleting a local model in response to a determination in the determining step that the learning parameters of a local model satisfy the predetermined condition.

The control method of a physical system, wherein the second updating step includes, for each of the local models:
a step of calculating the weight w_k based on the state data and the tracking error;
a step of calculating, using the weight w_k, an approximation ^θ_k of the learning parameter of the local model according to the following equation:

where P_k is the inverse of the covariance matrix of the weighted input x_k, θ_k is the learning parameter of the local model, w_k is the weight of the local model, e is the tracking error, e_pk is the approximation error, and λ is the forgetting factor; and
a step of optimizing each distance metric by minimizing an error index, defined by a predetermined equation, between the function value y representing the training data and the function approximation ^y.

The control method of a physical system according to claim 5, wherein the determining step includes a step of determining whether the weights w_k (k = 1 to the number of local models) calculated for all the local models are below a predetermined threshold, and
the adding or deleting step includes a step of adding a new local model in response to a determination that all the weights w_k calculated for the local models are below the predetermined threshold.

The control method of a physical system according to claim 6, wherein an initial value of a center position of the local model added in the adding step is selected to be equal to a data point corresponding to the state data.

The control method of a physical system according to claim 6 or 7, wherein the initial value of the width of the local model added in the adding step is selected to be equal to the width of the local model closest to the added local model.

The control method of a physical system according to any one of claims 6 to 8, wherein the optimizing step includes a step of optimizing the distance metric D_{k,ij} so as to minimize the error index J_k defined by the following equation:

using the following gradient descent method:

where

γ is a scalar that determines the size of the penalty, and α is the learning rate.

A computer program for controlling a physical system, comprising computer program code means configured to carry out the control method of a physical system according to any one of claims 1 to 9 when executed on a computer.

A computer program for controlling a physical system according to claim 10, recorded on a computer readable storage medium.

A control device for a physical system, which controls the physical system by approximating a nonlinear function describing the dynamics of the physical system with a function approximation obtained by weighting and summing linear local models, wherein the structure of the local models constituting the function approximation and their respective weights are each determined by predetermined learning parameters, the device comprising:
initialization means for defining an initial structure of the function approximation;
receiving means for receiving state data representing an actual state of the physical system;
updating means for updating the function approximation by updating the learning parameters of each local model so as to minimize a predetermined error index independently for each local model, based on a tracking error between a target trajectory and an actual trajectory of the physical system obtained from the state data and on an approximation error between the state data and the function approximation;
calculation means for calculating a control variable according to a control law of the control system using the updated function approximation;
output means for outputting the calculated control variable to the physical system; and
control means for controlling the receiving means, the updating means, the calculation means, and the output means so that they operate repeatedly.

The control device of a physical system according to claim 12, wherein the function approximation ^y is expressed by

where

c_k is the center position of the k-th linear model, and
w_k is a weight given by a predetermined kernel function.

The control device of a physical system according to claim 13, wherein the weight w_k is calculated by the following kernel function:

The control device of a physical system, wherein the updating means includes:
second updating means for updating, for each existing local model, the learning parameters so as to minimize a predetermined error index, based on the tracking error between the target trajectory and the actual trajectory of the physical system obtained from the state data and on the approximation error between the state data and the function approximation;
determining means for determining whether the learning parameters of each local model updated by the second updating means satisfy a predetermined condition; and
means for adding or deleting a local model in response to a determination by the determining means that the learning parameters of a local model satisfy the predetermined condition.

The control device of a physical system according to claim 15, wherein the second updating means includes, for each of the local models:
means for calculating the weight w_k based on the state data and the tracking error;
means for calculating, using the weight w_k, an approximation ^θ_k of the learning parameter of the local model according to the following equation:

where P_k is the inverse of the covariance matrix of the weighted input x_k, θ_k is the learning parameter of the local model, w_k is the weight of the local model, e is the tracking error, e_pk is the approximation error, and λ is the forgetting factor; and
optimization means for optimizing each distance metric by minimizing an error index, defined by a predetermined equation, between the function value y representing the training data and the function approximation ^y.

The control device of a physical system according to claim 16, wherein the determining means includes means for determining whether the weights w_k (k = 1 to the number of local models) calculated for all the local models are below a predetermined threshold, and
the means for adding or deleting includes adding means for adding a new local model in response to a determination that all the weights w_k calculated for the local models are below the predetermined threshold.

The control device of a physical system according to claim 17, wherein an initial value of a central position of the local model added by the addition means is selected to be equal to a data point corresponding to the state data.

The control device of a physical system according to claim 17 or 18, wherein the initial value of the width of the local model added by the addition means is selected to be equal to the width of the local model closest to the added local model.

The control device of a physical system according to any one of claims 17 to 19, wherein the optimization means includes means for optimizing the distance metric D_{k,ij} so as to minimize the error index J_k defined by the following equation:

using the following gradient descent method:

where

γ is a scalar that determines the size of the penalty, and α is the learning rate.