WO2025126326A1

WO2025126326A1 - Information processing device, information processing method, information processing system, and program

Info

Publication number: WO2025126326A1
Application number: PCT/JP2023/044465
Authority: WO
Inventors: 慧竹村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2023-12-12
Filing date: 2023-12-12
Publication date: 2025-06-19
Anticipated expiration: 2026-06-12

Abstract

To be able to derive a more suitable decision-making result (optimal solution), an information processing device (1) is provided with: an acquisition means for acquiring an output value obtained from each of one or a plurality of models; and a derivation means for executing a plurality of first derivation processes for deriving a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process for deriving a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

Description

Information processing device, information processing method, information processing system, and program

The present invention relates to an information processing device, an information processing method, an information processing system, and a program.

Sequential decision-making techniques are known in which the following process is repeated: a prediction (decision-making result) regarding, for example, demand or supply volume is derived, the result of acting on the derived prediction is observed, and a further prediction is derived based on the observed result.

For example, Patent Document 1 describes an optimal decision-making method for drafting and evaluating capital investment plans that involve sources of uncertainty.

Patent Document 1: JP 2005-108147 A

In general, sequential decision-making techniques are required to derive more appropriate decision-making results (optimal solutions), and the technique described in Patent Document 1 leaves room for improvement in this respect.

One aspect of the present invention has been made in view of the above problem, and one example of its object is to provide a technique capable of deriving a more appropriate decision-making result (optimal solution).

An information processing device according to one aspect of the present invention includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

In an information processing method according to one aspect of the present invention, an information processing device acquires an output value obtained from each of one or more models, and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

A program according to one aspect of the present invention is a program that causes a computer to function as an information processing device. The program causes the computer to acquire an output value obtained from each of one or more models, and to execute a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

An information processing system according to one aspect of the present invention includes an information processing device and a terminal device. The information processing device includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. The terminal device includes an execution means for executing the second optimal solution derived by the information processing device.

According to one aspect of the present invention, a more appropriate decision-making result (optimal solution) can be derived.

FIG. 1 is a block diagram showing the configuration of an information processing device according to an exemplary embodiment.
FIG. 2 is a flow diagram showing the flow of an information processing method according to an exemplary embodiment.
FIG. 3 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 4 is a block diagram showing the configuration of an information processing system according to an exemplary embodiment.
FIG. 5 is a flow diagram showing the flow of processing performed by an information processing system according to an exemplary embodiment.
FIG. 6 is a block diagram showing the configuration of an information processing device according to an exemplary embodiment.
FIG. 7 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 8 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 9 is a diagram for explaining effects achieved by an information processing device according to an exemplary embodiment.
FIG. 10 is a diagram for explaining processing performed by an information processing device according to an application example of an exemplary embodiment.
FIG. 11 is a diagram showing an example of information referred to by an information processing device according to an application example of an exemplary embodiment.
FIG. 12 is a block diagram showing the configuration of a computer that functions as the information processing device according to each exemplary embodiment.

Embodiments of the present invention are illustrated below. However, the present invention is not limited to the exemplary embodiments described below, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining the technical means employed in the exemplary embodiments below may also fall within the scope of the present invention, as may embodiments obtained by appropriately omitting some of those technical means. Furthermore, the effects mentioned in each exemplary embodiment below are examples of effects expected in that exemplary embodiment and do not define the extension of the present invention. In other words, embodiments that do not exhibit the effects mentioned in the exemplary embodiments below may also fall within the scope of the present invention.

First Exemplary Embodiment
A first exemplary embodiment, which is one example of an embodiment of the present invention, will be described in detail with reference to the drawings. This exemplary embodiment is the basis for each of the exemplary embodiments described later. The scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can also be adopted in the other exemplary embodiments included in this disclosure, to the extent that no particular technical obstacle arises. Likewise, each technical means shown in the drawings referred to in describing this exemplary embodiment can also be adopted in the other exemplary embodiments included in this disclosure, to the extent that no particular technical obstacle arises.

<Overview of information processing device 1>
First, an overview of the information processing device 1 according to this exemplary embodiment will be described. The information processing device 1 acquires, from each of a plurality of experts (models), the output value produced by that expert, and makes a decision by referring to the acquired output values. The information processing device 1 performs the acquisition of output values and the decision-making sequentially. For example, in round t, it acquires an output value P1t from expert 1 and an output value P2t from expert 2, and derives the decision-making result (t) for round t by referring to P1t and P2t. The information processing device 1 then updates the parameters used for decision-making, acquires the output values for the next round, and derives the decision-making result for that next round. Note that t is an index representing the number of repetitions, and can also be interpreted as an index indicating timing.
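The round structure described above can be sketched as follows. This is a minimal illustration only: the `Expert` callable type, the equal initial weights, the weighted-sum decision rule, and the no-op `update_parameters` are all assumptions introduced here, since this passage does not specify how the decision is computed or how the parameters are updated.

```python
from typing import Callable, List, Sequence

# Hypothetical expert interface: maps the round index t to an output value.
Expert = Callable[[int], float]


def update_parameters(weights: List[float]) -> List[float]:
    """Placeholder parameter update (assumption): returns the weights unchanged."""
    return list(weights)


def run_rounds(experts: Sequence[Expert], num_rounds: int) -> List[float]:
    """Repeat: acquire P1t, P2t, ... then derive the decision-making result (t)."""
    # Decision-making parameters; equal weights are an illustrative choice.
    weights = [1.0 / len(experts)] * len(experts)
    results = []
    for t in range(1, num_rounds + 1):
        # Acquire the output value of every expert for round t.
        outputs = [expert(t) for expert in experts]
        # Derive the decision for round t from the acquired output values.
        results.append(sum(w * p for w, p in zip(weights, outputs)))
        # Update the parameters before moving on to the next round.
        weights = update_parameters(weights)
    return results
```

For instance, `run_rounds([lambda t: 10.0, lambda t: 20.0], 3)` calls both hypothetical experts once per round and returns one decision per round.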

In this exemplary embodiment, an "expert" may be any hardware, software, or living being that outputs some kind of output value. As an example, an "expert" may be hardware such as a predicted value derivation device that outputs a predicted value as its output value, software such as a predicted value derivation algorithm that outputs a predicted value as its output value, or a person who produces a predicted value by some method. An "expert" is not limited to something that outputs a predicted value; it may be a generative model that outputs some generation result as its output value, or a control model that outputs some control value as its output value. The information processing device 1 according to this exemplary embodiment may itself include the "experts", or may acquire output values from external "experts". An "expert" is also referred to as a "model", an "agent", or the like.

In this exemplary embodiment, the "output value" may be of any kind. As an example, the "output value" may be a predicted value relating to demand or supply, or a predicted value relating to some other case (event). The "output value" also need not relate to prediction. For example, the "output value" in this exemplary embodiment may be an output value relating to some parameter referenced by the information processing device 1. The information processing device 1 according to this exemplary embodiment can be applied to any process in which a decision about a target event is made by referring to the output values from each of a plurality of experts.

In this exemplary embodiment, "intention" refers to some information about the target event, and is not to be interpreted as limited to the intention of a living being (person). For example, in an application scenario in which demand for a target product is predicted, a predicted value of future demand is an example of the "intention" decided by the information processing device 1 according to this exemplary embodiment, or of the "decision-making result" derived by the information processing device 1. The information processing device 1 according to this exemplary embodiment can also be described as a decision-making device, a decision-making result derivation device, or the like. The "decision-making result" is also called an "optimization solution" or an "optimization result".

In this exemplary embodiment, a loss value can be provided (acquired) for the output value provided by each of the plurality of experts. Depending on the "decision-making result" of the information processing device 1, this loss value may be obtainable by observation or may not be obtainable by observation. The information processing device 1 may obtain, by derivation, a loss value that cannot be obtained by observation. Which loss values can be "observed" under which "decision-making result" can be expressed, as an example, by the structure of a directed graph called a feedback graph, but this does not limit this exemplary embodiment.
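As a rough illustration of the feedback-graph idea, the sketch below stores a directed graph as an adjacency mapping, where an edge i -> j is taken to mean that executing decision i reveals the loss value of expert j. The graph contents, the edge semantics, and the names `FEEDBACK_GRAPH` and `observe_losses` are assumptions made for this example and are not taken from the source.

```python
from typing import Dict, List, Optional, Set

# Hypothetical feedback graph: an edge i -> j means that executing decision i
# reveals the loss value associated with expert j.
FEEDBACK_GRAPH: Dict[int, Set[int]] = {
    0: {0, 1},  # decision 0 reveals the losses of experts 0 and 1
    1: {1},     # decision 1 reveals only its own loss
    2: set(),   # decision 2 reveals nothing
}


def observe_losses(
    decision: int,
    true_losses: List[float],
    graph: Dict[int, Set[int]],
) -> List[Optional[float]]:
    """Return the observable loss values; None marks a loss that is not observed."""
    observable = graph.get(decision, set())
    return [loss if j in observable else None for j, loss in enumerate(true_losses)]
```

Entries returned as `None` correspond to loss values that cannot be obtained by observation and would have to be obtained by derivation instead.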

In this exemplary embodiment, the loss value can be expressed, as an example, as the difference between an output value (predicted value) and an observed value (measured value), but this does not limit this exemplary embodiment. The loss value may be the difference between the predicted value and some other predetermined value, or it may be an estimate of the loss. The term "loss value" can also encompass the concept of "reward". For example, the loss value can be expressed as the reward value with its sign reversed (the reward value multiplied by a negative constant). Accordingly, the loss value in this exemplary embodiment may be read as a reward value.
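The two readings of the loss value above can be written down directly. In this sketch the absolute difference is one possible choice of "difference", and the function names and the `scale` parameter are hypothetical; the source only states that the loss may be the difference between predicted and observed values, or the reward multiplied by a negative constant.

```python
def loss_from_prediction(predicted: float, observed: float) -> float:
    """Loss as the difference between predicted and observed values.

    The absolute difference is an assumption; other measures are possible.
    """
    return abs(predicted - observed)


def loss_from_reward(reward: float, scale: float = 1.0) -> float:
    """Loss as the reward multiplied by the negative constant -scale."""
    if scale <= 0:
        raise ValueError("scale must be positive so that the sign is reversed")
    return -scale * reward
```

With `scale=1.0` the second function is a plain sign reversal, so minimizing the loss is the same as maximizing the reward.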

<Configuration of information processing device 1>
Next, the configuration of the information processing device 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the information processing device 1. As shown in FIG. 1, the information processing device 1 includes an acquisition unit 11 and a derivation unit 12.

(Acquisition unit 11)
The acquisition unit 11 acquires the output value obtained from each of a plurality of models (experts). As described above, the "output value" may be, for example, a predicted value relating to demand or supply, or an output value relating to some other case (event). The "output value" may also be an output value relating to some parameter referenced by the information processing device 1. In the sequential decision-making process, the acquisition unit 11 can be configured to acquire, in each round, the output values for that round, but this does not limit this exemplary embodiment.

(Derivation unit 12)
The derivation unit 12 executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means; and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
Here, the first derivation process is sometimes called a base algorithm and the second derivation process a master algorithm, but these terms do not limit this exemplary embodiment.

Reliability is an index indicating the degree to which the output value of each expert is reflected in the decision-making process. It may equally be described as an index indicating the degree to which the first optimal solution of each first derivation process is reflected in the decision-making process. As an example, reliability can be expressed as a relative weight applied to the predicted value of each expert, or as a relative weight applied to each of the first optimal solutions.
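One way to picture the two-level structure is shown below: several base algorithms each produce a first optimal solution from the same expert outputs, and a master step combines them using normalized reliabilities as relative weights. The choice of mean, median, and max as base algorithms and the weighted-average combination are assumptions made for this sketch; the source only states that the second optimal solution depends on the first optimal solutions and their reliabilities.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

# Hypothetical base-algorithm interface: expert outputs in, first optimal solution out.
BaseAlgorithm = Callable[[Sequence[float]], float]


def normalize(reliabilities: Sequence[float]) -> List[float]:
    """Turn non-negative reliabilities into relative weights that sum to 1."""
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one reliability must be positive")
    return [r / total for r in reliabilities]


def master_algorithm(
    outputs: Sequence[float],
    base_algorithms: Sequence[BaseAlgorithm],
    reliabilities: Sequence[float],
) -> float:
    """Second derivation process (sketch): reliability-weighted combination."""
    # First derivation processes: each base algorithm derives its own solution.
    first_solutions = [base(outputs) for base in base_algorithms]
    # Second derivation process: weight each first solution by its reliability.
    weights = normalize(reliabilities)
    return sum(w * s for w, s in zip(weights, first_solutions))


second_solution = master_algorithm(
    outputs=[10.0, 20.0, 60.0],
    base_algorithms=[mean, median, max],
    reliabilities=[2.0, 1.0, 1.0],
)
```

A weighted average suits numeric solutions; for discrete decisions, an alternative under the same reading would be to select one base algorithm at random with probability proportional to its reliability.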

As described above, the information processing device 1 according to this exemplary embodiment acquires an output value obtained from each of one or more models (experts), and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing device 1 derives an optimal solution (makes a decision) through hierarchical processing that uses reliability. The information processing device 1 according to this exemplary embodiment can therefore derive a more appropriate decision-making result (optimal solution).

<Flow of information processing method S1>
Next, the flow of the information processing method S1 according to this exemplary embodiment will be described with reference to FIG. 2. FIG. 2 is a flow diagram showing the flow of the information processing method S1.

(Step S11)
In step S11, the acquisition unit 11 acquires the output value obtained from each of one or more models (experts).

(Step S12)
In step S12, the derivation unit 12 executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11; and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

The information processing device 1 performs steps S11 and S12 for one round, and then performs steps S11 and S12 for the next round.

FIG. 3 is a diagram schematically illustrating the sequential decision-making process performed by the information processing method S1 according to this exemplary embodiment. As shown in FIG. 3, in a given round the information processing device 1 acquires the output values provided by each of a plurality of experts, and derives a decision-making result (optimal solution) by referring to the acquired output values. The derived decision-making result (optimal solution) is then executed, and each expert provides an output value for the next round. A loss value corresponding to that next round can also be observed. As described above, depending on the decision-making result (optimal solution) of the information processing device 1, this loss value may be obtainable by observation or may not be obtainable by observation. The information processing device 1 may obtain, by derivation, a loss value that cannot be obtained by observation. In this way, the information processing device 1 derives decision-making results sequentially.
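The source does not state here how reliabilities change between rounds or how an unobserved loss is derived. Purely as an illustration, the sketch below fills unobserved losses with a fixed fallback value and then applies an exponential-weights update, a standard rule in online learning that is swapped in as an assumption. The function names, the `fallback` value, and the `learning_rate` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def fill_missing_losses(observed: Sequence[Optional[float]], fallback: float) -> List[float]:
    """Replace unobserved losses (None) with a derived value.

    A constant fallback is a placeholder for whatever derivation is actually used.
    """
    return [fallback if loss is None else loss for loss in observed]


def update_reliabilities(
    reliabilities: Sequence[float],
    losses: Sequence[float],
    learning_rate: float = 0.5,
) -> List[float]:
    """Exponential-weights update (assumption): larger loss, smaller reliability."""
    scaled = [r * math.exp(-learning_rate * loss) for r, loss in zip(reliabilities, losses)]
    total = sum(scaled)
    # Renormalize so the reliabilities remain relative weights summing to 1.
    return [s / total for s in scaled]


losses = fill_missing_losses([0.0, None], fallback=1.0)
new_reliabilities = update_reliabilities([0.5, 0.5], losses)
```

After this update the first derivation process, which incurred the smaller loss, carries more weight in the next round.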

As described above, in the information processing method S1 according to this exemplary embodiment, an output value obtained from each of one or more models (experts) is acquired, and a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes, are executed. In other words, the information processing method S1 derives an optimal solution (makes a decision) through hierarchical processing that uses reliability. The information processing method S1 according to this exemplary embodiment can therefore derive a more appropriate decision-making result (optimal solution).

<Configuration of information processing system 100>
Next, the configuration of the information processing system 100 according to this exemplary embodiment will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the information processing system 100. As shown in FIG. 4, the information processing system 100 includes an information processing device 1 and a terminal device 2 that are communicably connected to each other. The components of the information processing device 1 have been described above, so their description is omitted here.

(Terminal device 2)
As shown in FIG. 4, the terminal device 2 includes an execution unit 21 and a loss value acquisition unit 22. The execution unit 21 executes the decision-making result (optimal solution) derived by the information processing device 1, or a process corresponding to that decision-making result (optimal solution). As an example, if the decision-making result predicts today's demand for product A to be X units, the execution unit 21 places an order for X units of product A.

<Flow of information processing method S100>
Next, the flow of the information processing method S100 according to this exemplary embodiment will be described with reference to FIG. 5. FIG. 5 is a flow diagram showing the flow of the information processing method S100 executed by the information processing system 100.

(Steps S11-1, S12-1)
As shown in FIG. 4, in step S11-1, the acquisition unit 11 acquires the output value obtained from each of a plurality of experts (models).

In step S12-1, the derivation unit 12 executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11-1; and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

(Steps S21-1, S22-1)
In step S21-1, the execution unit 21 of the terminal device 2 executes the decision-making result derived in step S12-1 (more specifically, the second optimal solution), or a process corresponding to that decision-making result. In step S22-1, the terminal device 2 provides the result of the execution to the information processing device 1. If executing the decision-making result yields a loss value, the result of the execution may include that loss value.

(Steps S11-2, S12-2)
In step S11-2, the acquisition unit 11 acquires the output values for the current round (round t = 2) obtained from each of the plurality of experts (models). Here, each predicted value may be, as an example, the loss value acquired by the loss value acquisition unit 22 in step S22-1, or a loss value derived by the information processing device 1.

In step S12-2, the derivation unit 12 executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11-2; and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
The decision-making result derived in step S12-2 (more specifically, the second optimal solution) is provided to the terminal device 2 and executed in step S21-2.
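The exchange between the information processing device and the terminal device over successive rounds can be sketched as below. Everything here is a simplified stand-in: the `TerminalDevice` class, the use of an absolute difference from a known `actual_demand` as the loss value, and the plain mean used in `derive` are assumptions for illustration, not the configuration described in the source.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TerminalDevice:
    """Hypothetical stand-in for the terminal device 2."""

    actual_demand: Sequence[float]
    executed: List[float] = field(default_factory=list)

    def execute(self, round_index: int, decision: float) -> float:
        """Execute the decision, then report a loss value as the execution result."""
        self.executed.append(decision)
        # Assumed loss: absolute gap between the decision and the actual demand.
        return abs(decision - self.actual_demand[round_index])


def derive(outputs: Sequence[float]) -> float:
    """Placeholder for the derivation unit; a plain mean is used for brevity."""
    return sum(outputs) / len(outputs)


def run_system(
    expert_outputs_per_round: Sequence[Sequence[float]],
    terminal: TerminalDevice,
) -> List[float]:
    """Alternate device-side derivation and terminal-side execution per round."""
    losses = []
    for t, outputs in enumerate(expert_outputs_per_round):
        decision = derive(outputs)                    # device side: acquire and derive
        losses.append(terminal.execute(t, decision))  # terminal side: execute and report
    return losses


terminal = TerminalDevice(actual_demand=[12.0, 8.0])
reported_losses = run_system([[10.0, 14.0], [6.0, 12.0]], terminal)
```

Each reported loss is what the device would have available, alongside the experts' new output values, when it derives the decision for the following round.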

As described above, in the information processing method S100 according to this exemplary embodiment, an output value obtained from each of one or more models (experts) is acquired, and a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes, are executed. In other words, the information processing method S100 derives an optimal solution (makes a decision) through hierarchical processing that uses reliability. The information processing method S100 according to this exemplary embodiment can therefore derive a more appropriate decision-making result (optimal solution).

Second Exemplary Embodiment
A second exemplary embodiment, which is one example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same functions as components described in the exemplary embodiment above are given the same reference numerals, and their description is omitted as appropriate. The scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can also be adopted in the other exemplary embodiments included in this disclosure, to the extent that no particular technical obstacle arises. Likewise, each technical means shown in the drawings referred to in describing this exemplary embodiment can also be adopted in the other exemplary embodiments included in this disclosure, to the extent that no particular technical obstacle arises.

　＜情報処理システム１００Ａの構成＞
　本例示的実施形態に係る情報処理システム１００Ａの構成について、図６を参照して説明する。図６は、情報処理システム１００Ａの構成を示すブロック図である。図６に示すように、情報処理システム１００Ａは、情報処理装置１Ａと、端末装置２Ａとを含んでいる。また、図６に示すように、情報処理装置１Ａと端末装置２ＡとはネットワークＮを介して通信可能に構成されている。ここで、ネットワークＮの具体的構成は本例示的実施形態を限定するものではないが、一例として、無線ＬＡＮ（Local Area Network）、有線ＬＡＮ、ＷＡＮ（Wide Area Network）、公衆回線網、モバイルデータ通信網、又は、これらのネットワークの組み合わせを用いることができる。 <Configuration of Information Processing System 100A>
The configuration of an information processing system 100A according to this exemplary embodiment will be described with reference to Fig. 6. Fig. 6 is a block diagram showing the configuration of the information processing system 100A. As shown in Fig. 6, the information processing system 100A includes an information processing device 1A and a terminal device 2A. Also, as shown in Fig. 6, the information processing device 1A and the terminal device 2A are configured to be able to communicate with each other via a network N. Here, the specific configuration of the network N does not limit this exemplary embodiment, but as an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public line network, a mobile data communication network, or a combination of these networks can be used.

　＜情報処理装置１Ａの構成＞
　本例示的実施形態に係る情報処理装置１Ａの構成について、図６を参照して説明する。図６は、情報処理装置１Ａの構成を示すブロック図である。 <Configuration of information processing device 1A>
The configuration of an information processing device 1A according to this exemplary embodiment will be described with reference to Fig. 6. Fig. 6 is a block diagram showing the configuration of the information processing device 1A.

　図６に示すように、情報処理装置１Ａは、制御部１０Ａと、記憶部１５Ａと、通信部１６Ａとを備えている。 As shown in FIG. 6, the information processing device 1A includes a control unit 10A, a memory unit 15A, and a communication unit 16A.

　通信部１６Ａは、情報処理装置１Ａの外部の装置と通信を行う。一例として通信部１６Ａは、端末装置２Ａと通信を行う。通信部１６Ａは、制御部１０Ａから供給されたデータを端末装置２Ａに送信したり、端末装置２Ａから受信したデータを制御部１０Ａに供給したりする。 The communication unit 16A communicates with devices external to the information processing device 1A. As an example, the communication unit 16A communicates with the terminal device 2A. The communication unit 16A transmits data supplied from the control unit 10A to the terminal device 2A, and supplies data received from the terminal device 2A to the control unit 10A.

　（記憶部１５Ａ）
　記憶部１５Ａには、制御部１０Ａによって参照される各種の情報、及び制御部１０Ａによって導出された各種の情報が格納されている。一例として、記憶部１５Ａには、
・複数のエキスパートの各々による出力値を含む出力値情報ＰＩ、
・複数のエキスパートの各々による出力値に対応する損失値を含む損失値情報ＬＩ
・複数のエキスパートの各々の信頼度、及び後述する複数の第１の導出部１２１－１、１２１－２、・・・の各々の信頼度の少なくとも何れかを示す信頼度情報ＣＩ
・後述する複数の第１の導出部１２１－１、１２１－２、・・・の各々が導出した第１の最適解（第１の意思決定結果）ＤＲ１－１、ＤＲ１－２、・・・、及び
・後述する第２の導出部１２２が導出した第２の最適解（第２の意思決定結果）ＤＲ２
が格納されている。 (Storage unit 15A)
The storage unit 15A stores various information referenced by the control unit 10A and various information derived by the control unit 10A. As an example, the storage unit 15A stores:
・output value information PI including the output values of each of a plurality of experts,
・loss value information LI including the loss values corresponding to the output values of each of the plurality of experts,
・reliability information CI indicating at least one of the reliability of each of the plurality of experts and the reliability of each of the plurality of first derivation units 121-1, 121-2, ... described later,
・first optimal solutions (first decision-making results) DR1-1, DR1-2, ... derived by the respective first derivation units 121-1, 121-2, ... described later, and
・a second optimal solution (second decision-making result) DR2 derived by the second derivation unit 122 described later.

　なお、例示的実施形態１において説明したように、本例示的実施形態に係る情報処理装置システム１００Ａは、「エキスパート（モデル）」を含む構成であってもよいし、システム外部の「エキスパート（モデル）」から上記出力値を取得する構成であってもよい。 As described in the first exemplary embodiment, the information processing system 100A according to this exemplary embodiment may be configured to include the "experts (models)", or may be configured to obtain the above output values from "experts (models)" external to the system.

　（制御部１０Ａ）
　制御部１０Ａは、図６に示すように、取得部１１、及び導出部１２を備えている。 (Control unit 10A)
As shown in FIG. 6, the control unit 10A includes an acquisition unit 11 and a derivation unit 12.

　（取得部１１）
　取得部１１は、例示的実施形態１と同様に、複数のエキスパート（モデル）の各々から得られる出力値を取得する。ここで、当該出力値に対応する損失値が、観測によって取得可能な場合、取得部１１は、当該損失値を更に取得してもよい。出力値、及び損失値については例示的実施形態１において説明したため同様の説明は省略する。出力値、及び損失値の具体例については後述する。 (Acquisition unit 11)
The acquisition unit 11 acquires output values obtained from each of a plurality of experts (models), as in the first exemplary embodiment. When the loss value corresponding to an output value can be acquired by observation, the acquisition unit 11 may further acquire that loss value. The output values and loss values were described in the first exemplary embodiment, so the same description is omitted here. Specific examples of the output values and loss values will be described later.

　（導出部１２）
　図６に示すように、導出部１２は、複数の第１の導出部１２－１、１２１－２、・・・、及び第２の導出部１２２を備えている。本例示的実施形態に係る導出部１２は、例示的実施形態１と同様に、
・取得部１１が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
・前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を実行する。本例示的実施形態では、上記複数の第１の導出処理の各々は、複数の第１の導出部１２１－１、１２１－２、・・・の各々によって実行され、上記第２の導出処理は、第２の導出部１２２によって実行される。 (Derivation section 12)
As shown in FIG. 6, the derivation unit 12 includes a plurality of first derivation units 121-1, 121-2, ..., and a second derivation unit 122. As in the first exemplary embodiment, the derivation unit 12 according to this exemplary embodiment executes
・a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition unit 11, and
・a second derivation process that derives a second optimal solution according to the first optimal solutions derived by the respective first derivation processes and the reliability of each of those first derivation processes.
In this exemplary embodiment, each of the plurality of first derivation processes is executed by the corresponding one of the first derivation units 121-1, 121-2, ..., and the second derivation process is executed by the second derivation unit 122.

　換言すれば、各々の第１の導出部１２１－ｊ（ｊは第１の導出部を互いに識別するためのインデックス）は、ラウンドｔにおいて、取得部１１が取得した各出力値（ｍ_ｔ）を参照して第１の最適解（第１の意思決定結果）（ｐ_ｔ（ｊ））を導出する。また、第２の導出部１２２は、複数の第１の導出部１２１－１、１２１－２、・・・の各々が導出した第１の最適解（第１の意思決定結果）（ｐ_ｔ（ｊ））と、前記複数の第１の導出手段の各々の信頼度（ｗ_ｔ（ｊ））とに応じて第２の最適解（第２の意思決定結果）を導出する。 In other words, each first derivation unit 121-j (j is an index for distinguishing the first derivation units from one another) derives a first optimal solution (first decision-making result) (p _t (j)) by referring to each output value (m _t ) acquired by the acquisition unit 11 in round t. Also, the second derivation unit 122 derives a second optimal solution (second decision-making result) in accordance with the first optimal solution (first decision-making result) (p _t (j)) derived by each of the multiple first derivation units 121-1, 121-2, ... and the reliability (w _t (j)) of each of the multiple first derivation means.

　なお、第１の導出処理のことをベースアルゴリズムと呼称し、第２の導出処理のことをマスタアルゴリズムと表現することもあるが、当該文言は本例示的実施形態を限定するものではない。また、第１の導出部１２１－１、１２１－２、・・・の各々を、単に、第１の導出部１２１と表記することもある。また、前記第１の導出処理及び前記第２の導出処理は、逐次的に取得する前記出力値を参照したオンライン機械学習処理（オンライン学習アルゴリズム）であると表現することもできる。 The first derivation process may be referred to as a base algorithm, and the second derivation process may be referred to as a master algorithm, but these terms do not limit this exemplary embodiment. Each of the first derivation units 121-1, 121-2, ... may be simply referred to as the first derivation unit 121. The first derivation process and the second derivation process may also be referred to as online machine learning processes (online learning algorithms) that refer to the output values that are sequentially obtained.

　また、上述の信頼度は、例示的実施形態１でも説明したように、各エキスパート（モデル）による出力値を、意思決定処理においてどの程度反映するかを示す指標である。当該信頼度は、各々の第１の導出処理による第１の最適解を、意思決定処理においてどの程度反映するかを示す指標であると表現してもよい。信頼度は、一例として、各エキスパートによる予測値に演算される相対的な重みとして表現することもできるし、第１の最適解の各々に演算される相対的な重みとして表現することもできる。 Furthermore, as explained in the first exemplary embodiment, the reliability is an index indicating the extent to which the output values by each expert (model) are reflected in the decision-making process. The reliability may be expressed as an index indicating the extent to which the first optimal solutions by each first derivation process are reflected in the decision-making process. As an example, the reliability can be expressed as a relative weight calculated on the predicted values by each expert, or as a relative weight calculated on each of the first optimal solutions.
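The "relative weight" reading of reliability can be illustrated with a minimal sketch. It assumes that each first optimal solution is a probability vector over the same K options and that the second derivation process forms a reliability-weighted average; the specification only says that the second optimal solution is derived "according to" the first optimal solutions and their reliabilities, so the averaging rule, the normalization step, and the function name `combine_first_solutions` are illustrative assumptions.

```python
from typing import List, Sequence


def combine_first_solutions(
    first_solutions: Sequence[Sequence[float]],
    reliabilities: Sequence[float],
) -> List[float]:
    """Hypothetical second derivation step: reliability-weighted average."""
    if len(first_solutions) != len(reliabilities):
        raise ValueError("need exactly one reliability per first derivation process")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    # Normalize so that the reliabilities act as relative weights w_t(j).
    weights = [w / total for w in reliabilities]
    num_options = len(first_solutions[0])
    combined = [0.0] * num_options
    for weight, solution in zip(weights, first_solutions):
        for i in range(num_options):
            combined[i] += weight * solution[i]
    return combined


# Two first derivation processes over K = 2 options; the first is weighted 3:1.
second_solution = combine_first_solutions([[0.8, 0.2], [0.4, 0.6]], [3.0, 1.0])
```

Under these assumptions the combined vector stays a probability vector, because it is a convex combination of probability vectors.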

　また、導出部１２は、取得部１１が取得した出力値、及び、前記複数の第１の導出処理の各々が導出した第１の最適解の少なくとも何れかを参照して、前記信頼度を導出してもよい。また、当該信頼度の導出処理を、前記第２の導出処理の一部として実行してもよい。当該構成によれば、前記出力値及び前記第１の最適解の少なくとも何れかを参照して、前記信頼度を導出するので、好適な信頼度を導出することができる。また、そのように好適な信頼度を参照することにより、好適な第２の最適解を導出することができる。なお、信頼度のより具体的な導出処理例については後述する。 The derivation unit 12 may derive the reliability by referring to at least one of the output values acquired by the acquisition unit 11 and the first optimal solutions derived by the respective first derivation processes. The reliability derivation process may be executed as part of the second derivation process. With this configuration, the reliability is derived by referring to at least one of the output values and the first optimal solutions, so that a suitable reliability can be derived. By referring to such a suitable reliability, a suitable second optimal solution can be derived. A more specific example of the reliability derivation process will be described later.

　また、導出部１２は、前記信頼度を、所定のタイミングで初期化する構成としてもよい。一例として、導出部１２は、第２の最適解を所定の回数（例えば１００回）導出する毎に、前記信頼度を初期化する（信頼度を初期値に設定する）構成としてもよい。このように、導出部１２が、前記信頼度を所定のタイミングで初期化することにより、環境の変化に好適に対応することができる。なお、前記信頼度の初期化は、上述した第１の導出処理及び第２の導出処理の少なくとも何れかをリスタートする処理の一部として実行されてもよい。 The derivation unit 12 may also be configured to initialize the reliability at a predetermined timing. As an example, the derivation unit 12 may be configured to initialize the reliability (set the reliability to an initial value) every time the second optimal solution is derived a predetermined number of times (e.g., 100 times). In this way, the derivation unit 12 can appropriately respond to changes in the environment by initializing the reliability at a predetermined timing. Note that the initialization of the reliability may be executed as part of the process of restarting at least one of the first derivation process and the second derivation process described above.
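The periodic initialization described above can be sketched as a small counter-based helper. The class name `ReliabilityTracker`, the uniform initial value, and the default period of 100 derivations are assumptions chosen for illustration; the specification leaves the initial value and the exact timing open.

```python
from typing import List


class ReliabilityTracker:
    """Keeps one reliability per first derivation process and resets it periodically."""

    def __init__(self, num_processes: int, reset_every: int = 100) -> None:
        self.num_processes = num_processes
        self.reset_every = reset_every
        self.derivation_count = 0
        self.reliabilities = self._initial_values()

    def _initial_values(self) -> List[float]:
        # Assumed initial value: every first derivation process is trusted equally.
        return [1.0 / self.num_processes] * self.num_processes

    def record_derivation(self) -> bool:
        """Call once per derived second optimal solution; returns True on a reset."""
        self.derivation_count += 1
        if self.derivation_count % self.reset_every == 0:
            self.reliabilities = self._initial_values()
            return True
        return False


tracker = ReliabilityTracker(num_processes=3, reset_every=100)
tracker.reliabilities = [0.7, 0.2, 0.1]  # pretend the reliabilities have drifted
reset_flags = [tracker.record_derivation() for _ in range(100)]
```

A time-based trigger (for example, once a month) could replace the counter when the period of the environmental change is known.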

　また、導出部１２は、信頼度を初期化するタイミングを、環境変化の周期に応じて決定してもよい。一例として、本例示的実施形態に係る情報処理装置システム１００Ａによる意思決定処理を、環境変化が概ね１ヶ月程度の周期で起こる状況に適用する場合、導出部１２は、前記信頼度を、１ヶ月毎に初期化する構成としてもよい。 The derivation unit 12 may also determine the timing for initializing the reliability according to the period of environmental change. As an example, when the decision-making process of the information processing system 100A according to this exemplary embodiment is applied to a situation in which the environment changes with a period of approximately one month, the derivation unit 12 may be configured to initialize the reliability every month.

　また、導出部１２は、前記第１の最適解に対応する損失値を推定する処理を行ってもよい。ここで、当該損失値を推定する処理は、上述した第２の導出処理の一部として実行してもよい。また、導出部１２は、上記第１の導出処理において、当該導出した損失値を参照して、前記第１の最適解を更新する処理を行ってもよい。
　上述したように、意思決定の結果によっては、出力値に対応する損失値が、観測によって取得可能な場合もあるし、取得不可能な場合もある。上記の構成によれば、導出部１２は、前記第１の最適解に対応する損失値を推定し、推定した損失値を参照して、当該第１の最適解を更新するので、出力値に対応する損失値が、観測によって取得不可能な場合であっても、好適な最適解（意思決定結果）を導出することができる。 The derivation unit 12 may also perform a process of estimating a loss value corresponding to the first optimal solution. Here, the process of estimating the loss value may be executed as a part of the above-mentioned second derivation process. The derivation unit 12 may also perform a process of updating the first optimal solution by referring to the derived loss value in the above-mentioned first derivation process.
As described above, depending on the result of the decision-making, the loss value corresponding to the output value may be obtainable by observation in some cases, but may not be obtainable in other cases. According to the above configuration, the derivation unit 12 estimates a loss value corresponding to the first optimal solution and updates the first optimal solution by referring to the estimated loss value, so that even if the loss value corresponding to the output value cannot be obtained by observation, it is possible to derive a suitable optimal solution (decision-making result).

　また、導出部１２は、前記損失値を、不偏推定量として算出する構成としてもよい。当該構成によれば、不偏推定量として算出された損失値を参照して最適化の導出を行うので、より好適な最適解（意思決定結果）を導出することができる。 The derivation unit 12 may also be configured to calculate the loss value as an unbiased estimator. With this configuration, the optimal solution is derived by referring to the loss value calculated as an unbiased estimator, so that a more suitable optimal solution (decision-making result) can be derived.
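One common way to obtain an unbiased loss estimate when only some losses are observable is importance weighting: an observed loss is divided by the probability that it would be observed. The sketch below assumes that the decision is sampled from a probability vector `p` and that observability is given by a set of directed edges (chosen option, observed option). This estimator is a standard technique swapped in for illustration; the specification does not state which unbiased estimator it uses, and all names here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (chosen option, option whose loss becomes observable)


def observation_probability(p: List[float], edges: Set[Edge], i: int) -> float:
    """Probability that the loss of option i is observed when sampling from p."""
    return sum(p[j] for (j, target) in edges if target == i)


def estimate_losses(
    p: List[float], edges: Set[Edge], chosen: int, observed: Dict[int, float]
) -> List[float]:
    """Importance-weighted loss estimates; unobserved options get 0."""
    estimates = [0.0] * len(p)
    for i, loss in observed.items():
        if (chosen, i) in edges:
            estimates[i] = loss / observation_probability(p, edges, i)
    return estimates


# Assumed toy setting with options indexed from 0: option 0 reveals every loss,
# option 1 reveals the losses of options 1 and 2, option 2 reveals only its own.
edges = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)}
p = [0.5, 0.3, 0.2]
true_losses = [0.2, 0.5, 0.9]

# Average the estimates over the random choice to check unbiasedness.
expected = [0.0, 0.0, 0.0]
for chosen, prob in enumerate(p):
    observed = {i: true_losses[i] for (a, i) in edges if a == chosen}
    for i, value in enumerate(estimate_losses(p, edges, chosen, observed)):
        expected[i] += prob * value
```

The averaged estimates match the true losses only when every option has a positive probability of being observed, which holds in this toy setting.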

　また、導出部１２は、損失値又は信頼度を補正するためのパラメータを更に参照して、前記意思決定結果を導出してもよい。当該構成によれば、各エキスパート（モデル）の信頼度、各第１の導出処理の信頼度、及び損失値の少なくとも何れかを、当該パラメータによって適宜補正可能となるので、情報処理装置１に関し、最低限の精度を保証した運用、いわゆる保守的運用が可能となる。 The derivation unit 12 may further refer to parameters for correcting the loss value or the reliability to derive the decision-making result. With this configuration, at least one of the reliability of each expert (model), the reliability of each first derivation process, and the loss value can be appropriately corrected using the parameters, making it possible to operate the information processing device 1 with a minimum level of accuracy guaranteed, i.e., a conservative operation.

　＜端末装置２Ａの構成＞
　端末装置２Ａは、図６に示すように、制御部２０Ａ、実行部２１、及び通信部２６を備えている。端末装置２Ａは、一例として、店舗に配置された会計用端末、及び倉庫に配置された在庫管理用端末等として具体的に実現することができるが、これは本例示的実施形態を限定するものではない。 <Configuration of terminal device 2A>
As shown in FIG. 6, the terminal device 2A includes a control unit 20A, an execution unit 21, and a communication unit 26. As an example, the terminal device 2A can be specifically realized as a checkout terminal installed in a store, an inventory management terminal installed in a warehouse, or the like, but this is not intended to limit the present exemplary embodiment.

　通信部２６は、端末装置２Ａの外部の装置と通信を行う。一例として通信部２６は、情報処理装置１Ａと通信を行う。通信部２６は、制御部２０Ａから供給されたデータを情報処理装置１Ａに送信したり、情報処理装置１Ａから受信したデータを制御部２０Ａに供給したりする。 The communication unit 26 communicates with devices external to the terminal device 2A. As an example, the communication unit 26 communicates with the information processing device 1A. The communication unit 26 transmits data supplied from the control unit 20A to the information processing device 1A, and supplies data received from the information processing device 1A to the control unit 20A.

　実行部２１は、導出部１２が導出した意思決定結果又は当該意思決定結果に対応する処理を行う。一例として、実行部２１は、導出部１２が導出した（第２の最適解）第２の意思決定結果を実行する。実行部２１は、導出部１２が導出した第２の意思決定結果に代えて、又はそれと共に、導出部１２が導出した複数の第１の意思決定結果の少なくとも一部を実行する構成としてもよい。また、実行部２１は、導出部１２が導出した第２の意思決定結果、及び、導出部１２が導出した複数の第１の意思決定結果の少なくとも一部を表示可能に構成されていてもよいし、第２の意思決定結果、及び、複数の第１の意思決定結果を互いに比較可能に表示する構成としてもよい。 The execution unit 21 performs the decision-making result derived by the derivation unit 12 or a process corresponding to the decision-making result. As an example, the execution unit 21 executes the second decision-making result (second optimal solution) derived by the derivation unit 12. The execution unit 21 may be configured to execute at least a part of the multiple first decision-making results derived by the derivation unit 12 instead of or together with the second decision-making result derived by the derivation unit 12. Furthermore, the execution unit 21 may be configured to be able to display the second decision-making result derived by the derivation unit 12 and at least a part of the multiple first decision-making results derived by the derivation unit 12, or may be configured to display the second decision-making result and the multiple first decision-making results so that they can be compared with each other.

　制御部２０Ａは、図６に示すように、損失値取得部２２、及び損失値提供部２３を備えている。損失値取得部２２は、実行部２１による実行の結果として、各エキスパート（モデル）に関する損失値が観測（測定）によって取得可能である場合に、当該損失値を取得する。損失値提供部２３は、損失値取得部２２が取得した損失値を、通信部２６を介して情報処理装置１Ａに提供する。 As shown in FIG. 6, the control unit 20A includes a loss value acquisition unit 22 and a loss value providing unit 23. When a loss value for each expert (model) can be acquired by observation (measurement) as a result of execution by the execution unit 21, the loss value acquisition unit 22 acquires the loss value. The loss value providing unit 23 provides the loss value acquired by the loss value acquisition unit 22 to the information processing device 1A via the communication unit 26.

　図７は、本例示的実施形態に係る情報処理システム１００Ａによる逐次的意思決定処理を模式的に説明するための図である。図７に示すように、情報処理装置１Ａは、あるラウンドにおいて、各エキスパート（モデル）に対応付けられた出力値を、複数のエキスパートについて取得し、取得した各出力値を参照して意思決定結果を導出する。ここで、当該主力値に対応付けられた損失値が観測によって取得可能である場合、当該損失値を更に取得し、取得した損失値を更に参照して思決定結果を導出してもよい。また、情報処理装置１Ａが行う意思決定処理は、一例として、上述したように、複数の第１の導出部１２－１、１２１－２、・・・、及び第２の導出部１２２による、階層的な意思決定処理である。 FIG. 7 is a diagram schematically illustrating the sequential decision-making process performed by the information processing system 100A according to this exemplary embodiment. As shown in FIG. 7, in a given round, the information processing device 1A acquires, for a plurality of experts, the output value associated with each expert (model), and derives a decision-making result by referring to the acquired output values. Here, if a loss value associated with an output value can be acquired by observation, the information processing device 1A may further acquire that loss value and derive the decision-making result by further referring to it. The decision-making process performed by the information processing device 1A is, as an example, a hierarchical decision-making process carried out by the plurality of first derivation units 121-1, 121-2, ... and the second derivation unit 122, as described above.

　そして、導出された意思決定結果（最適解）が実行されることにより、次のラウンドに対応する損失値が観測され得る。損失値が観測によって取得可能である場合、当該損失値と、当該損失値に対応する出力値とが情報処理装置１Ａに提供され、当該次のラウンドにおける意思決定処理において参照される。このようにして、情報処理システム１００Ａは、意思決定結果の導出を逐次的に行う。 Then, the derived decision-making result (optimal solution) is executed, and a loss value corresponding to the next round can be observed. If a loss value can be obtained by observation, the loss value and an output value corresponding to the loss value are provided to the information processing device 1A and are referenced in the decision-making process in the next round. In this way, the information processing system 100A sequentially derives decision-making results.
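The round-by-round flow described above (acquire output values, derive a decision, execute it, feed back any observed loss) can be sketched as a plain loop. The function `run_rounds` and its three callbacks are hypothetical stand-ins for the acquisition unit 11, the derivation unit 12, and the execution unit 21 together with the loss value acquisition unit 22; a loss of `None` models a round in which no loss can be observed.

```python
from typing import Callable, List, Optional, Tuple


def run_rounds(
    num_rounds: int,
    get_outputs: Callable[[int], List[float]],
    derive: Callable[[List[float], Optional[float]], int],
    execute: Callable[[int], Optional[float]],
) -> List[Tuple[int, Optional[float]]]:
    """Sequentially derive and execute decisions, passing each loss to the next round."""
    history: List[Tuple[int, Optional[float]]] = []
    previous_loss: Optional[float] = None
    for t in range(1, num_rounds + 1):
        outputs = get_outputs(t)                    # output values of the experts
        decision = derive(outputs, previous_loss)   # decision-making result
        previous_loss = execute(decision)           # observed loss, or None
        history.append((decision, previous_loss))
    return history


# Toy stand-ins: pick the option with the smallest predicted loss; only
# option 0 yields an observable loss.
history = run_rounds(
    3,
    get_outputs=lambda t: [0.2, 0.8],
    derive=lambda outputs, loss: outputs.index(min(outputs)),
    execute=lambda decision: 0.25 if decision == 0 else None,
)
```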

　（情報処理装置１Ａによる具体的処理例）
　以下では、情報処理装置１Ａによる具体的処理例について説明する。本例に係る情報処理装置１Ａは、以下のアルゴリズム１とアルゴリズム２とを実行する。アルゴリズム１は、主として、複数の第１の導出部１２１－１、１２１－２、・・・の各々によって実行され、アルゴリズム２は、主として第２の導出部１２２によって実行される。

　（アルゴリズム１の処理）
　まず、アルゴリズム１の処理について説明する。アルゴリズム１の冒頭に示すように、取得部１１は、グラフＧ、パラメータη、及びパラメータＴを取得する。ここで、グラフＧは、アルゴリズム１の冒頭に示すように、頂点Ｖ、及びエッジ（辺、リンク）Ｅとによって規定される。本例示的実施形態において、グラフＧは、一例として、フィードバックグラフと呼ばれる有向グラフである。 (Specific processing example by information processing device 1A)
A specific example of processing by the information processing device 1A will be described below. The information processing device 1A according to this example executes the following algorithm 1 and algorithm 2. Algorithm 1 is mainly executed by each of the multiple first derivation units 121-1, 121-2, ..., and algorithm 2 is mainly executed by the second derivation unit 122.

(Processing of Algorithm 1)
First, the processing of Algorithm 1 will be described. As shown at the beginning of Algorithm 1, the acquisition unit 11 acquires a graph G, a parameter η, and a parameter T. Here, as shown at the beginning of Algorithm 1, the graph G is defined by vertices V and edges (sides, links) E. In this exemplary embodiment, the graph G is, as an example, a directed graph called a feedback graph.

　フィードバックグラフでは、各頂点Ｖは、意思決定結果が取り得る選択肢に対応し、各エッジは損失の観測可能性を示す。一例として、頂点Ｖ（ｉ）から頂点Ｖ（ｊ）への有向エッジＥ（ｉ，ｊ）は、意思決定結果が選択肢ｉを示す場合（プレーヤが選択肢ｉを選択した場合）に、選択肢ｊに付随した損失値が観測可能であることを示す。ここで、ｉ＝ｊの場合は、セルフループとも呼ばれる。 In the feedback graph, each vertex V corresponds to a possible option that the decision-making result can take, and each edge indicates the observability of a loss. As an example, a directed edge E(i,j) from vertex V(i) to vertex V(j) indicates that if the decision-making result indicates option i (if the player selects option i), then the loss value associated with option j is observable. Here, the case of i=j is also called a self-loop.

　図８の上段は、フィードバックグラフの例１を示している。当該例は、「リンゴがおいしいなら出荷したほうがよいが、そうでないなら出荷しないほうがよい」という問題設定に対応するフィードバックグラフである。図８の上段に示すように、当該例のフィードバックグラフには、リンゴを出荷しない選択肢Ｖ（１）と、リンゴを出荷する選択肢Ｖ（２）とが存在し、リンゴを出荷しない選択肢Ｖ（１）を選択した場合、当該リンゴを味見できるので当該リンゴがおいしいかどうかわかることになる。したがって、当該リンゴを出荷した方が良かったか否かがわかることになる。これは、リンゴを出荷しない選択肢Ｖ（１）を選択した場合、
・リンゴを出荷しない選択肢Ｖ（１）に付随した損失値が観測可能であり（エッジＥ（１，１）に対応）、
・リンゴを出荷する選択肢Ｖ（２）に付随した損失値も観測可能である（エッジＥ（１，２）に対応）
ことを示している。 The top part of Fig. 8 shows Example 1 of a feedback graph. This example is a feedback graph that corresponds to the problem setting of "If the apples are delicious, it is better to ship them, but if they are not, it is better not to ship them." As shown in the top part of Fig. 8, the feedback graph of this example has option V(1) of not shipping the apples and option V(2) of shipping the apples. If option V(1) of not shipping the apples is selected, the apples can be tasted and it is possible to know whether they are delicious or not. Therefore, it is possible to know whether it would have been better to ship the apples or not. This means that if option V(1) of not shipping the apples is selected,
・the loss value associated with option V(1) of not shipping the apples is observable (corresponding to edge E(1,1)), and
・the loss value associated with option V(2) of shipping the apples is also observable (corresponding to edge E(1,2)).

　一方で、リンゴを出荷する選択肢Ｖ（２）を選択した場合、当該リンゴを味見できないので当該リンゴがおいしいかどうかわからないことになる。したがって、リンゴを出荷する選択肢Ｖ（２）を選択した場合には、何らの損失値も観測できないことを示している。これは、フィードバックグラフにおいて、選択肢Ｖ（２）を起点（始点）とする外向き矢印（外向きエッジ）が存在しないことに対応する。 On the other hand, if option V(2), which involves shipping the apples, is selected, the apples cannot be tasted and therefore it is not known whether they are delicious or not. This indicates that if option V(2), which involves shipping the apples, is selected, no loss value can be observed. This corresponds to the absence of an outward arrow (outward edge) in the feedback graph that has option V(2) as its origin (starting point).

　図８の下段は、フィードバックグラフの例２を示している。当該例は、「ある商品について適切な量を発注したい」という問題設定に対応するフィードバックグラフである。図８の下段に示すように、当該例のフィードバックグラフには、３００個発注する選択肢Ｖ（１）と、２００個発注する選択肢Ｖ（２）と、１００個発注する選択肢Ｖ（３）とが存在し、より多く発注した場合には、より少なく発注した場合の損失値が観測できるが、より少なく発注した場合には、より多く発注した場合の損失値を観測できない場合があることを示している。 The lower part of Figure 8 shows feedback graph example 2. This example is a feedback graph that corresponds to the problem statement of "want to order the appropriate amount of a certain product." As shown in the lower part of Figure 8, the feedback graph of this example has option V(1) of ordering 300 units, option V(2) of ordering 200 units, and option V(3) of ordering 100 units, which shows that when a larger amount is ordered, the loss value when a smaller amount is ordered can be observed, but when a smaller amount is ordered, it may not be possible to observe the loss value when a larger amount is ordered.

　このように、フィードバックグラフによって、様々な問題設定を表現することができる。例えば、バンディットフィードバックは、複数の頂点の各々がセルフループのみしか有しない場合に対応し、フルフィードバック（full information feedback）は、複数の頂点に含まれる全てのペアが双方向に連結され、かつ、全ての頂点がセルフループを有する場合に対応する。 In this way, feedback graphs can be used to express a variety of problem settings. For example, bandit feedback corresponds to the case where each of the multiple vertices has only self-loops, and full information feedback corresponds to the case where all pairs contained in the multiple vertices are bidirectionally connected and all vertices have self-loops.
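The feedback graphs discussed above can be represented as sets of directed edges. The following sketch uses 1-based vertex numbers to match V(1), V(2), ... in the text. The helper names are hypothetical, and the edge set for the ordering example is an assumption inferred from the description (a larger order reveals the losses of smaller orders), since FIG. 8 itself is not reproduced here.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (i, j): choosing option i reveals the loss of option j


def bandit_graph(k: int) -> Set[Edge]:
    """Bandit feedback: every vertex has only a self-loop."""
    return {(i, i) for i in range(1, k + 1)}


def full_information_graph(k: int) -> Set[Edge]:
    """Full feedback: all pairs connected in both directions, plus self-loops."""
    return {(i, j) for i in range(1, k + 1) for j in range(1, k + 1)}


def observable_options(edges: Set[Edge], chosen: int) -> Set[int]:
    """Options whose loss becomes observable after choosing `chosen`."""
    return {j for (i, j) in edges if i == chosen}


# Example 1 (apples): V(1) = do not ship, V(2) = ship.
apple_graph = {(1, 1), (1, 2)}

# Example 2 (ordering), assumed edges: V(1) = 300, V(2) = 200, V(3) = 100 units.
order_graph = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}
```

With this representation, `observable_options(apple_graph, 2)` is empty, which mirrors the absence of outward edges from V(2) in Example 1.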

　本例示的実施形態に係るアルゴリズム１は、任意のフィードバックグラフに適用可能に構成されているため、本例示的実施形態に係る情報処理システム１００Ａは、非常に広範な問題設定に対して適用することができる。 Since algorithm 1 according to this exemplary embodiment is configured to be applicable to any feedback graph, information processing system 100A according to this exemplary embodiment can be applied to a very wide range of problem settings.

　アルゴリズム１の説明に戻ると、取得部１１が取得するパラメータηは、後述するように、信頼度を導出するために参照される学習率としての役割を有している。また、取得部１１が取得するパラメータＴは、ラウンドの総数を規定する自然数である。 Returning to the explanation of algorithm 1, the parameter η acquired by the acquisition unit 11 serves as a learning rate that is referenced to derive the reliability, as described below. In addition, the parameter T acquired by the acquisition unit 11 is a natural number that specifies the total number of rounds.

　続いて、アルゴリズム１の「１：」に示すように、第１の導出部１２１は、第１の最適解（第１の意思決定結果）ｐ_ｔを導出するために参照するパラメータ（重み因子）ｐ’_ｔの初期値を、

によって設定する。ここで、Ｋは、第１の最適解（第１の意思決定結果）として取り得る選択肢の総数を表しており、上述したフィードバックグラフの頂点の総数｜Ｖ｜でもある。後述の説明からも理解されるように、ｐ_ｔを導出するために参照するパラメータ（重み因子）ｐ’_ｔは、各エキスパート（モデル）の出力値（ｍ_ｔ）の各々の信頼度を表すパラメータと捉えることができる。ただし当該解釈は本例示的実施形態を限定するものではない。 Next, as shown in “1:” of Algorithm 1, the first derivation unit 121 determines the initial value of a parameter (weighting factor) p′ _t to be referenced in order to derive a first optimal solution (first decision-making result) p _t as follows:

$$p'_1 = \frac{\mathbf{1}}{K} \qquad \text{(Equation 1)}$$

Here, K represents the total number of options that can be taken as the first optimal solution (first decision-making result), and is also the total number of vertices |V| of the feedback graph described above. As will be understood from the explanation below, the parameter (weighting factor) p'_t referred to in order to derive p_t can be regarded as a parameter representing the reliability of each output value (m_t) of each expert (model). However, this interpretation does not limit this exemplary embodiment.

　また、上記（式１）において、右辺の分子は、

によって定義されるＫ次元のベクトル（配列）である。ここで、［Ｋ］は、

によって規定される集合（すなわち、自然数１からＫまでを要素とする集合）を表している。 In addition, in the above formula (1), the numerator on the right side is

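$$\mathbf{1}(i) = 1 \quad (i \in [K]) \qquad \text{(Equation 2)}$$

that is, a K-dimensional vector (array) whose elements are all 1. Here, [K] is the set defined by

$$[K] = \{1, 2, \ldots, K\} \qquad \text{(Equation 3)}$$

(i.e., the set whose elements are the natural numbers 1 through K).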

　続いて、第１の導出部１２１は、アルゴリズム１の「２：」～「７：」によって特定されるループ処理を実行する。当該ループ処理は、ラウンドを示すインデックスｔをインクリメントしながら、ｔ＝Ｔ（Ｔはラウンドの総数）となるまで実行される。換言すれば、第１の導出部１２１は、アルゴリズム１の「３：」～「７：」において特定される処理を、ラウンド毎に繰り返す。 Then, the first derivation unit 121 executes the loop process specified by "2:" to "7:" in algorithm 1. The loop process is executed while incrementing an index t indicating the round, until t = T (T is the total number of rounds). In other words, the first derivation unit 121 repeats the process specified in "3:" to "7:" in algorithm 1 for each round.

　より具体的には、まず、アルゴリズム１の「３：」に示すように、第１の導出部１２１が、関数ψ（ｐ）を、

によって設定（定義）する。当該関数ψ（ｐ）は、後述するブレグマン情報量（Bregman divergence）を規定する凸関数としての役割を有している。上記式において、ｐ（ｉ）は、

によって規定される集合の要素である。また、関数ψ（ｐ）の定義式におけるＮ^ｉｎ（ｉ）は、

によって規定される集合（換言すれば、頂点番号ｉの頂点Ｖ（ｉ）について、当該頂点Ｖ（ｉ）に入ってくるエッジの起点（始点）となる頂点Ｖ（ｊ）の頂点番号ｊを要素とする集合）であり、｜Ｎ^ｉｎ（ｉ）｜は、当該集合の要素の数を示している。また、関数ψ（ｐ）の定義式における

は、［］内の条件が満たされる場合、すなわち、｜Ｎ^ｉｎ（ｉ）｜＜Ｋ　が満たされる場合に１を返し、そうでない場合に、０を返す関数である。
　したがって、関数ψ（ｐ）の定義式（式４）における右辺第２項は、ある頂点Ｖ（ｉ）について、当該頂点Ｖ（ｉ）に入ってくるエッジの起点（始点）となる頂点Ｖ（ｊ）の数が、頂点の総数｜Ｖ｜＝Ｋよりも少ない場合に、ｌｏｇ　ｐ（ｉ）に比例する０でない値を有する。当該第２項のことを対数バリア項とも呼ぶ。 More specifically, first, as shown in “3:” of Algorithm 1, the first derivation unit 121 derives the function ψ(p) as follows:

The function ψ(p) serves as the convex function that defines the Bregman divergence described later. In (Equation 4), p(i) is an element of the set defined by (Equation 5). N^in(i) in the definition of the function ψ(p) is the set defined by

$$N^{\mathrm{in}}(i) = \{\, j \in [K] \mid (j, i) \in E \,\} \qquad \text{(Equation 6)}$$

(in other words, for a vertex V(i) with vertex number i, the set whose elements are the vertex numbers j of the vertices V(j) that are the starting points of edges entering the vertex V(i)), and |N^in(i)| indicates the number of elements of that set. Also, in the definition of the function ψ(p),

$$\mathbb{1}\big[\, |N^{\mathrm{in}}(i)| < K \,\big] \qquad \text{(Equation 7)}$$

is a function that returns 1 if the condition in [ ] is satisfied, that is, if |N^in(i)| < K is satisfied, and returns 0 otherwise.
Therefore, the second term on the right side of the definition (Equation 4) of the function ψ(p) has a non-zero value proportional to log p(i) when, for a vertex V(i), the number of vertices V(j) that are the starting points of edges entering V(i) is less than the total number of vertices |V| = K. This second term is also called the logarithmic barrier term.
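The in-neighbor set N^in(i) and the indicator in the logarithmic barrier term can be computed directly from the edge set. In the sketch below, the sign convention (a negative multiple of log p(i)) and the `coefficient` parameter are assumptions, because the text only states that the term is proportional to log p(i); the first term of ψ(p) is not modeled at all.

```python
import math
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (j, i): a directed edge from V(j) to V(i)


def in_neighbors(edges: Set[Edge], i: int) -> Set[int]:
    """N^in(i): vertex numbers j such that an edge from V(j) enters V(i)."""
    return {j for (j, target) in edges if target == i}


def barrier_indicator(edges: Set[Edge], i: int, num_vertices: int) -> int:
    """Returns 1 if |N^in(i)| < K, and 0 otherwise."""
    return 1 if len(in_neighbors(edges, i)) < num_vertices else 0


def log_barrier_term(p: List[float], edges: Set[Edge], coefficient: float = 1.0) -> float:
    """Assumed form: -coefficient * sum_i 1[|N^in(i)| < K] * log p(i)."""
    k = len(p)
    return -coefficient * sum(
        barrier_indicator(edges, i, k) * math.log(p[i - 1]) for i in range(1, k + 1)
    )


apple_graph = {(1, 1), (1, 2)}                         # Example 1 of FIG. 8
full_graph = {(i, j) for i in (1, 2) for j in (1, 2)}  # full feedback, K = 2
apple_barrier = log_barrier_term([0.5, 0.5], apple_graph)
full_barrier = log_barrier_term([0.5, 0.5], full_graph)
```

Under full feedback every vertex has K incoming edges, so every indicator is 0 and the barrier term vanishes, whereas in the apple example both vertices have a single incoming edge and the term is active.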

　このように、本例示的実施形態に係る第１の導出部１２１が第１の最適解を導出するために参照するブレグマン情報量は、凸関数ψ（ｐ）によって記載され、当該凸関数ψ（ｐ）は、上述した対数バリア項を含む。このような対数バリア項を用いることにより、様々な環境（様々な問題設定（様々なフィードバックグラフ））に対して柔軟に適用可能な意思決定処理を行うことができる。 In this manner, the Bregman divergence that the first derivation unit 121 according to this exemplary embodiment refers to in order to derive the first optimal solution is described by the convex function ψ(p), and the convex function ψ(p) includes the logarithmic barrier term described above. By using such a logarithmic barrier term, it is possible to perform decision-making processing that can be flexibly applied to various environments (various problem settings (various feedback graphs)).

　続いて、アルゴリズム１の「４：」に示すように、取得部１１は、各エキスパート（モデル）が出力する出力値（ｍ_ｔ）を取得する。各エキスパート（モデル）が出力する出力値については上述したためここでは説明を省略する。 Next, as shown in “4:” of Algorithm 1, the acquisition unit 11 acquires the output value (m _t ) output by each expert (model). The output value output by each expert (model) has been described above, so a description thereof will be omitted here.

　続いて、アルゴリズム１の「５：」に示すように、第１の導出部１２１は、

によって、ラウンドｔにおける第１の最適解（第１の意思決定結果）ｐ_ｔを導出する。ここで、（式８）の右辺第１項は、ｍ_ｔとｐとの内積を示しており、右辺第２項は、関数ψを用いて規定されるブレグマン情報量（Bregman divergence）

の引数として、ｐと、ｐ’_ｔとを用いたものを示している。また、（式８）におけるΔ’_Ｋは、

によって規定される集合である。 Next, as shown in “5:” of Algorithm 1, the first derivation unit 121

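derives the first optimal solution (first decision-making result) p_t in round t by

$$p_t \in \operatorname*{arg\,min}_{p \in \Delta'_K} \left\{ \langle m_t, p \rangle + D_\psi(p, p'_t) \right\} \qquad \text{(Equation 8)}$$

Here, the first term on the right-hand side of (Equation 8) represents the inner product of m_t and p, and the second term on the right-hand side represents the Bregman divergence defined using the function ψ,

$$D_\psi(p, q) = \psi(p) - \psi(q) - \langle \nabla\psi(q),\, p - q \rangle \qquad \text{(Equation 9)}$$

evaluated with p and p'_t as its arguments. In addition, Δ'_K in (Equation 8) is the set defined by (Equation 10).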

　（式８）に示すように、第１の導出部１２１は、
・ｍ_ｔとｐとの内積＜ｍ_ｔ，ｐ＞と、
・ｐ、ｐ’_ｔ、及びψによって規定されるブレグマン情報量Ｄ_ψ（ｐ，ｐ’_ｔ）と
の線形和が最小となるようなｐを、第１の最適解（第１の意思決定結果）として導出する。 As shown in (Equation 8), the first derivation unit 121 derives, as the first optimal solution (first decision-making result), the p that minimizes the linear sum of
・the inner product <m_t, p> of m_t and p, and
・the Bregman divergence D_ψ(p, p'_t) defined by p, p'_t, and ψ.
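The minimization of the inner product plus a Bregman divergence can be illustrated with a stand-in convex function. The sketch below assumes ψ is the negative entropy and that the feasible set is the ordinary probability simplex, in which case the minimizer has the closed form p(i) ∝ p'(i)·exp(−m(i)). This is not the ψ of the specification, which additionally contains the logarithmic barrier term and uses the set Δ'_K; the function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def bregman_divergence(
    psi: Callable[[Vector], float],
    grad_psi: Callable[[Vector], Vector],
    p: Vector,
    q: Vector,
) -> float:
    """D_psi(p, q) = psi(p) - psi(q) - <grad psi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_psi(q), p, q))
    return psi(p) - psi(q) - inner


def neg_entropy(p: Vector) -> float:
    return sum(x * math.log(x) for x in p)


def neg_entropy_grad(p: Vector) -> Vector:
    return [math.log(x) + 1.0 for x in p]


def first_solution_neg_entropy(m: Vector, p_prime: Vector) -> Vector:
    """Closed-form argmin over the simplex of <m, p> + D_psi(p, p_prime)."""
    unnormalized = [q * math.exp(-mi) for mi, q in zip(m, p_prime)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def objective(m: Vector, p: Vector, p_prime: Vector) -> float:
    """The quantity being minimized: <m, p> + D_psi(p, p_prime)."""
    linear = sum(mi * pi for mi, pi in zip(m, p))
    return linear + bregman_divergence(neg_entropy, neg_entropy_grad, p, p_prime)


m_t = [0.1, 0.5, 0.3]          # output values of the experts (toy numbers)
p_prime_t = [1.0 / 3] * 3      # uniform weighting factor, as in the initial round
p_t = first_solution_neg_entropy(m_t, p_prime_t)
```

Options with smaller output values m_t(i) receive more probability mass, and the Bregman term keeps p_t close to p'_t.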

　このように、第１の導出部１２１は、１又は複数のモデル（エキスパート）の各々から取得した出力値（ｍ_ｔ）を参照して第１の最適解（ｐ_ｔ）を導出する第１の導出処理（ベースアルゴリズム）を実行する。 In this way, the first derivation unit 121 executes a first derivation process (base algorithm) that derives a first optimal solution (p_t) by referring to the output values (m_t) obtained from each of one or more models (experts).

　続いて、アルゴリズム１の「６：」に示すように、情報処理システム１００Ａは、上記第１の最適解ｐ_ｔを実行し、当該最適解の実行に伴う損失値ｌ_ｔを観測する。また、取得部１１は、補正のためのパラメータα_ｔを取得する。ここで、第１の最適解ｐ_ｔの実行は、一例として、端末装置２Ａの実行部２１によって実行され得る。また、損失値の取得は、一例として、損失値取得部２２によって実行される。ただし、導出された第１の最適解ｐ_ｔの少なくとも一部は、実行されることなく、後述するアルゴリズム２を実行する第２の導出部１２２に供給される構成としてもよい。また、第１の最適解ｐ_ｔが示す内容によっては、損失値ｌ_ｔが観測できない場合もあることは上述した通りである。また、損失値ｌ_ｔ、及び補正のためのパラメータα_ｔの少なくとも一部は、後述するアルゴリズム２によって導出された値を用いてもよい。 Next, as shown in "6:" of algorithm 1, the information processing system 100A executes the first optimal solution p _t and observes the loss value l _t associated with the execution of the optimal solution. The acquisition unit 11 acquires a parameter α _t for correction. Here, the execution of the first optimal solution p _t may be executed by the execution unit 21 of the terminal device 2A, as an example. The acquisition of the loss value is executed by the loss value acquisition unit 22, as an example. However, at least a part of the derived first optimal solution p _t may be supplied to the second derivation unit 122 that executes algorithm 2, which will be described later, without being executed. As described above, depending on the contents indicated by the first optimal solution p _t , there are cases where the loss value l _t cannot be observed. At least a part of the loss value l _t and the parameter α _t for correction may use values derived by algorithm 2, which will be described later.

　続いて、アルゴリズム１の「７：」に示すように、第１の導出部１２１は、

によって、パラメータｐ’_ｔを更新する。このように、第１の導出部１２１は、損失値ｌ_ｔを参照して、第１の最適解（ｐ_ｔ）を導出するためのパラメータ（ｐ’_ｔ）を更新する。ここで、ａ_ｔは、

によって規定されるパラメータである。より具体的に言えば、ａ_ｔは、損失ｌ_ｔ（ｉ）、エキスパート（モデル）の出力値ｍ_ｔ（ｉ）、補正パラメータα_ｔ、及び学習率ηによって規定されるパラメータである。（式１１）及び（式１２）から明らかなように、補正パラメータα_ｔは、損失値ｌ_ｔ、出力値ｍ_ｔ、パラメータｐ’_ｔ、及び第１の最適解ｐ_ｔの少なくとも何れかを補正するためのパラメータと捉えることができる。また、（式１２）によって規定されるパラメータａ_ｔも、同様の意味での補正のためのパラメータと捉えることができる。 Next, as shown in “7:” of Algorithm 1, the first derivation unit 121

updates the parameter p'_t according to (Equation 11). In this way, the first derivation unit 121 refers to the loss value l_t and updates the parameter (p'_t) for deriving the first optimal solution (p_t). Here, a_t is a parameter defined by (Equation 12).

More specifically, a _t is a parameter defined by the loss l _t (i), the output value m _t (i) of the expert (model), the correction parameter α _t , and the learning rate η. As is clear from (Equation 11) and (Equation 12), the correction parameter α _t can be considered as a parameter for correcting at least one of the loss value l _t , the output value m _t , the parameter p′ _t , and the first optimal solution p _t . The parameter a _t defined by (Equation 12) can also be considered as a parameter for correction in the same sense.

　このように、第１の導出部１２１は、補正のためのパラメータα_ｔ又はａ_ｔを更に参照して、第１の最適解（ｐ_ｔ）を更新する構成であると捉えることができる。 In this way, the first derivation unit 121 can be considered to be configured to update the first optimal solution (p _t ) by further referring to the parameter α _t or a _t for correction.

　（アルゴリズム２の処理）
　続いて、アルゴリズム２の処理について説明する。アルゴリズム２の冒頭に示すように、取得部１１は、グラフＧ（Ｖ，Ｅ）、及びパラメータＴを取得する。グラフＧ（Ｖ，Ｅ）、及びパラメータＴについては上述したためここでは説明を省略する。 (Processing of Algorithm 2)
Next, the process of algorithm 2 will be described. As shown at the beginning of algorithm 2, the acquisition unit 11 acquires a graph G(V, E) and a parameter T. The graph G(V, E) and the parameter T have been described above, and therefore will not be described here.

　また、アルゴリズム２の冒頭の「Input」に示す通り、アルゴリズム２はベースアルゴリズムＢを参照して実行される。ここで、ベースアルゴリズムＢは、一例として、上述したアルゴリズム１のことを指す。アルゴリズム２は複数のベースアルゴリズムＢを参照して実行される。換言すれば、第２の導出部１２２は、複数のアルゴリズム１の各々を実行する第１の導出部１２１の各々と連携しながら、第２のアルゴリズムを実行する。 Also, as shown in "Input" at the beginning of algorithm 2, algorithm 2 is executed with reference to base algorithm B. Here, base algorithm B refers to algorithm 1 described above, as an example. Algorithm 2 is executed with reference to multiple base algorithms B. In other words, the second derivation unit 122 executes the second algorithm in cooperation with each of the first derivation units 121 that execute each of the multiple algorithms 1.

　続いて、アルゴリズム２の「１：」に示すように、第２の導出部１２２は、ラウンドの総数Ｔを用いて、パラメータＭを

に設定する。ここで、ｌｏｇ_２Ｔの両側の記号は、天井関数を示している。したがって、パラメータＭは、第２の導出部１２２によって、実数である引数（ｌｏｇ_２Ｔ）以上である最小の整数値に設定される（例えばｌｏｇ_２Ｔ＝３．３・・・であれば、Ｍ＝４に設定される）。ここで、パラメータＭは、アルゴリズム２が参照するベースアルゴリズム（アルゴリズム１）の総数としての意味を有するが、これは本例示的実施形態を限定するものではない。 Next, as shown in “1:” of Algorithm 2, the second derivation unit 122 uses the total number of rounds T to derive the parameter M.

Specifically, M = ⌈log_2 T⌉. Here, the symbols on both sides of log_2 T denote the ceiling function. Therefore, the second derivation unit 122 sets the parameter M to the smallest integer that is greater than or equal to the real-valued argument log_2 T (for example, if log_2 T = 3.3..., then M = 4). Here, the parameter M represents the total number of base algorithms (Algorithm 1) that Algorithm 2 refers to, but this does not limit the present exemplary embodiment.
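The ceiling rule described above can be sketched in a few lines. The function name `num_base_algorithms` is hypothetical; only the rule "smallest integer not less than log_2 T" comes from the text.

```python
import math

def num_base_algorithms(T):
    """Return M, the smallest integer greater than or equal to log2(T)."""
    if T < 1:
        raise ValueError("T must be a positive number of rounds")
    return math.ceil(math.log2(T))

print(num_base_algorithms(10))   # log2(10) = 3.32..., so M = 4
print(num_base_algorithms(200))  # log2(200) = 7.64..., so M = 8
```

Because M grows only logarithmically in T, the number of base algorithms run in parallel stays small even for long horizons.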

　また、アルゴリズム２の「１：」に示すように、第２の導出部１２２は、パラメータη（ｊ）を

に設定する。ここで、インデックスｊは、複数のベースアルゴリズムを互いに識別するためのインデックスであり、アルゴリズム２の「１：」に示すように、一例として、１からＭまでの整数値を取る。なお、パラメータη（ｊ）は、後述する信頼度（各ベースアルゴリズムの信頼度）ｗ_ｔ（ｊ）を導出するために参照する学習率としての役割を有している。したがって、第２の導出部１２２は、当該信頼度ｗ_ｔ（ｊ）を導出するために参照する学習率η（ｊ）を、ベースアルゴリズム（第１の導出処理）の総数Ｍの－１／２乗に比例した値に設定すると表現することができる。発明者の得た知見によれば、上述のように学習率η（ｊ）を設定することにより、例えば、学習率η（ｊ）を、ベースアルゴリズムの総数Ｍの－１乗に比例した値に設定した場合等に比べて、信頼度ｗ_ｔ（ｊ）の導出をより好適に行うことができる。なお、以下では、第１の導出部１２１－１，１２１－２，・・・の各々を、上述のインデックスｊを用いて、１２１－ｊと表記することもある。 As shown in “1:” of Algorithm 2, the second derivation unit 122 calculates the parameter η(j) as

Here, the index j distinguishes the multiple base algorithms from one another and, as shown in "1:" of Algorithm 2, takes integer values from 1 to M as an example. The parameter η(j) serves as the learning rate referred to when deriving the reliability w_t(j) (the reliability of each base algorithm), which will be described later. In other words, the second derivation unit 122 sets the learning rate η(j), referred to when deriving the reliability w_t(j), to a value proportional to the -1/2 power of the total number M of base algorithms (first derivation processes). According to the knowledge obtained by the inventor, setting the learning rate η(j) in this way allows the reliability w_t(j) to be derived more suitably than, for example, setting η(j) to a value proportional to the -1 power of M. In the following, each of the first derivation units 121-1, 121-2, ... may also be written as 121-j using the above index j.

　続いて、アルゴリズム２の「２：」に示すように、第２の導出部１２２は、ラウンドｔ＝１における重み因子ｗ_１’を、ラウンド１における意思決定結果に対応するパラメータｐ_１’が、各ｊに関し

を満たすように設定する。ここで、ラウンドｔ＝１における重み因子ｗ_１’は、アルゴリズム２の「２：」に示すように、

を満たす。ここで、Δ_Ｍは、以下の定義式においてＫをＭに置き換えて得られる集合を示す。

　続いて、アルゴリズム２の「３：」に示すように、第２の導出部１２２は、各ベースアルゴリズム

を初期化する。ここで、当該初期化処理には、アルゴリズム２の「３：」に示すように、第２の導出部１２２が、ベースアルゴリズムＢ_ｊに対して、
　　Ｇ，η（ｊ），Ｔ
の各値を渡す処理が含まれる。 Next, as shown in “2:” of Algorithm 2, the second derivation unit 122 sets the weight factor w'_1 in round t=1 so that the parameter p'_1 corresponding to the decision-making result in round 1 satisfies, for each j, the following condition:

Here, as shown in “2:” of Algorithm 2, the weight factor w'_1 in round t=1 satisfies:

Here, Δ_M denotes the set obtained by replacing K with M in the following definition.

Next, as shown in “3:” of Algorithm 2, the second derivation unit 122 initializes each base algorithm

Here, as shown in “3:” of Algorithm 2, the initialization process includes the second derivation unit 122 passing the values
G, η(j), T
to the base algorithm B_j.

　続いて、第２の導出部１２２は、アルゴリズム２の「４：」～「１３：」によって特定されるループ処理を実行する。当該ループ処理は、ラウンドを示すインデックスｔをインクリメントしながら、ｔ＝Ｔとなるまで実行される。換言すれば、第２の導出部１２２は、アルゴリズム２の「５：」～「１３：」において特定される処理を、ラウンド毎に繰り返す。 Then, the second derivation unit 122 executes the loop process specified by "4:" to "13:" of Algorithm 2. The loop process is executed while incrementing the index t indicating the round, until t = T. In other words, the second derivation unit 122 repeats the process specified by "5:" to "13:" of Algorithm 2 for each round.

　より具体的には、まず、アルゴリズム２の「５：」に示すように、取得部１１が、予測値ｍ_ｔを取得し、第２の導出部１２２が、各第１の導出部１２１－ｊを介して、各ベースアルゴリズム（各アルゴリズム１）に、当該予測値ｍ_ｔを供給する。 More specifically, first, as shown in “5:” of algorithm 2, the acquisition unit 11 acquires a predicted value m _t , and the second derivation unit 122 supplies the predicted value m _t to each base algorithm (each algorithm 1) via each first derivation unit 121-j.

　続いて、アルゴリズム２の「６：」に示すように、取得部１１が、各ベースアルゴリズム（各アルゴリズム１）による各第１の最適解（第１の意思決定結果）ｐ_ｔ，ｊを取得する。そして、第２の導出部１２２は、アルゴリズム２の「６：」に示すように、パラメータｈ_ｔ（ｊ）を、

によって設定する。ここで、< , >は内積を示している。 Next, as shown in “6:” of Algorithm 2, the acquisition unit 11 acquires each first optimal solution (first decision-making result) p _t,j by each base algorithm (each algorithm 1). Then, as shown in “6:” of Algorithm 2, the second derivation unit 122 calculates the parameter h _t (j) as follows:

Here, < , > denotes the inner product.

　続いて、アルゴリズム２の「７：」に示すように、第２の導出部１２２は、

によって、各ベースアルゴリズム（各アルゴリズム１）（各第１の導出手段１２１－ｊ）の信頼度を示す信頼度ベクトルｗ_ｔを導出する。ここで、Ｄ_φは、上述したブレグマン情報量（Bregman divergence）を表している。ただし、当該ブレグマン情報量を規定する凸関数φは、

によって与えられる。このように、また、一部上述したが、ｗ’_ｔは、信頼度ｗ_ｔを導出するために参照される重み因子である。このように、第２の導出部１２２は、出力値（ｍ_ｔ）及び前記第１の最適解（ｐ_ｔ）を参照して、前記信頼度（ｗ_ｔ（ｊ））を導出する。 Next, as shown in “7:” of Algorithm 2, the second derivation unit 122

derives a reliability vector w_t indicating the reliability of each base algorithm (each Algorithm 1) (each first derivation unit 121-j). Here, D_φ represents the above-mentioned Bregman divergence. The convex function φ that defines this Bregman divergence is given by:

As partly described above, w'_t is a weighting factor that is referred to in order to derive the reliability w_t. In this manner, the second derivation unit 122 derives the reliability (w_t(j)) by referring to the output value (m_t) and the first optimal solution (p_t).

　続いて、アルゴリズム２の「８：」に示すように、第２の導出部１２２は、
・各ベースアルゴリズム（各アルゴリズム１）による第１の最適解（第１の意思決定結果）ｐ_ｔ，ｊ、
・各ベースアルゴリズムの信頼度を示す信頼度ベクトルｗ_ｔの各成分である信頼度ｗ_ｔ（ｊ）
を用いて、

によって、第２の最適解（第２の意思決定結果）ｐ_ｔを導出する。 Next, as shown in “8:” of Algorithm 2, the second derivation unit 122
uses
・the first optimal solution (first decision-making result) p_{t,j} derived by each base algorithm (each Algorithm 1), and
・the reliability w_t(j), which is each component of the reliability vector w_t indicating the reliability of each base algorithm,

to derive a second optimal solution (second decision-making result) p_t.

　このように、第２の導出部１２２は、上述した複数の第１の導出処理（ベースアルゴリズム）の各々が導出した第１の最適解（ｐ_ｔ,ｊ）と、前記複数の第１の導出処理（ベースアルゴリズム）の各々の信頼度（ｗ_ｔ（ｊ））とに応じて第２の最適解（ｐ_ｔ）を導出する第２の導出処理（マスタアルゴリズム）を実行する。より具体的には、上記式にて表現されるように、第２の導出部１２２は、複数の第１の導出部１２１－ｊが導出した各第１の最適解（第１の意思決定結果）ｐ_ｔ，ｊの加重和であって、当該複数の第１の導出部１２１－ｊの各々の信頼度ｗ_ｔ（ｊ）に応じた加重和によって、第２の最適解（第２の意思決定結果）ｐ_ｔを導出する。 In this way, the second derivation unit 122 executes a second derivation process (master algorithm) that derives a second optimal solution (p_t) in accordance with the first optimal solution (p_{t,j}) derived by each of the multiple first derivation processes (base algorithms) described above and the reliability (w_t(j)) of each of those first derivation processes. More specifically, as expressed in the above formula, the second derivation unit 122 derives the second optimal solution (second decision-making result) p_t as a weighted sum of the first optimal solutions (first decision-making results) p_{t,j} derived by the multiple first derivation units 121-j, with weights corresponding to the reliability w_t(j) of each of the first derivation units 121-j.
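The weighted sum described above can be illustrated as follows. This is only a sketch: it assumes the weights are exactly the reliabilities w_t(j) and that they sum to 1, so that p_t = Σ_j w_t(j)·p_{t,j} is again a probability distribution. The function name `second_optimal_solution` is hypothetical.

```python
def second_optimal_solution(first_solutions, reliabilities):
    """Combine per-base-algorithm solutions p_{t,j} using reliabilities w_t(j).

    first_solutions: list of M probability vectors (one per base algorithm).
    reliabilities:   list of M non-negative weights, assumed to sum to 1.
    """
    if len(first_solutions) != len(reliabilities):
        raise ValueError("need one reliability per base algorithm")
    num_options = len(first_solutions[0])
    p_t = [0.0] * num_options
    for w_j, p_tj in zip(reliabilities, first_solutions):
        for i in range(num_options):
            p_t[i] += w_j * p_tj[i]  # accumulate w_t(j) * p_{t,j}(i)
    return p_t
```

For example, two base algorithms that each put all their mass on a different option, combined with reliabilities 0.25 and 0.75, yield the distribution [0.25, 0.75].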

　上記のように、第２の導出部１２２は、複数の第１の導出部１２１－ｊの各々が導出した第１の意思決定結果（ｐ_ｔ，ｊ）と、当該複数の第１の導出部１２１－ｊの各々の信頼度（ｗ_ｔ（ｊ））とに応じて、第２の意思決定結果ｐ_ｔを導出する。ここで、第１の意思決定結果は、上述したように各エキスパート（モデル）が提供する出力値（ｍ_ｔ）を参照して行われる。したがって、上記の構成によれば、各エキスパート（モデル）が提供する出力値を参照して、第１の導出部１２１及び第２の導出部１２２による階層的な処理により、好適な意思決定結果を導出することができる。
　続いて、アルゴリズム２の「９：」に示すように、第２の導出部１２２は、第２の最適解ｐ_ｔに応じて選択肢ｉ_ｔを特定する。より具体的には、第２の導出部１２２は、第２の最適解ｐ_ｔが示す確率分布に応じた確率で、選択肢ｉ_ｔを選択する。ここで、ｉは、

を満たすものであり、ｉ_ｔは、ｔステップにおける当該ｉを示している。また、当該式におけるＮ^ｏｕｔ（ｉ）は、

によって定義される集合である。換言すれば、Ｎ^ｏｕｔ（ｉ）は、頂点番号ｉの頂点Ｖ（ｉ）について、当該頂点Ｖ（ｉ）から出ていくエッジの終点となる頂点Ｖ（ｊ）の頂点番号ｊを要素とする集合である。
　続いて、アルゴリズム２の「１０：」に示すように、導出された第２の最適解（第２の意思決定結果）ｐ_ｔに応じて選択された選択肢ｉ_ｔが、一例として、端末装置２Ａの実行部２１によって実行される。そして、当該第２の最適解ｐ_ｔに対応する損失値ｌ_ｔ（換言すれば選択された選択肢ｉ_ｔに対応する損失値ｌ_ｔ）が、端末装置２Ａの損失値取得部２２によって取得され、情報処理装置１Ａに提供される。 As described above, the second derivation unit 122 derives the second decision-making result p_t in accordance with the first decision-making result (p_{t,j}) derived by each of the multiple first derivation units 121-j and the reliability (w_t(j)) of each of the multiple first derivation units 121-j. Here, the first decision-making result is derived with reference to the output value (m_t) provided by each expert (model), as described above. Therefore, according to the above configuration, a suitable decision-making result can be derived through hierarchical processing by the first derivation unit 121 and the second derivation unit 122 with reference to the output value provided by each expert (model).
Next, as shown in “9:” of Algorithm 2, the second derivation unit 122 identifies an option i_t according to the second optimal solution p_t. More specifically, the second derivation unit 122 selects an option i_t with a probability given by the probability distribution indicated by the second optimal solution p_t. Here, i satisfies

and i_t denotes that i at step t. In addition, N^out(i) in the expression is the set defined by

In other words, N^out(i) is the set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from the vertex V(i) with vertex number i.
Next, as shown in “10:” of Algorithm 2, an option i _t selected according to the derived second optimal solution (second decision-making result) p _t is executed by the execution unit 21 of the terminal device 2A, for example. Then, a loss value l _t corresponding to the second optimal solution p _t (in other words, a loss value l _t corresponding to the selected option i _t ) is acquired by the loss value acquisition unit 22 of the terminal device 2A and provided to the information processing device 1A.
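The selection of i_t and the set N^out(i) described above can be sketched as follows. The edge-set representation (pairs of vertex numbers), the function names `out_neighbors` and `select_option`, and the three-option graph `E` are all hypothetical choices made for illustration.

```python
import random

def out_neighbors(edges, i):
    """Return N^out(i): vertex numbers j such that the edge (i, j) is in E."""
    return {j for (src, j) in edges if src == i}

def select_option(p_t, rng=random):
    """Draw an option index i_t with probability p_t[i_t]."""
    return rng.choices(range(len(p_t)), weights=p_t, k=1)[0]

# Hypothetical feedback graph on 3 options: choosing 0 reveals the losses of
# options 0 and 1, choosing 1 reveals only itself, choosing 2 reveals all.
E = {(0, 0), (0, 1), (1, 1), (2, 0), (2, 1), (2, 2)}
i_t = select_option([0.2, 0.5, 0.3], random.Random(0))
observed = out_neighbors(E, i_t)  # options whose loss becomes observable
```

The set `observed` is what determines which components of the loss can be obtained by observation in the executed round.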

　なお、アルゴリズム２の「１０：」に示すように、当該選択肢ｉ_ｔの実行は、

を満たす全ての選択肢ｉに対して行われる。ここで、Ｎ^ｏｕｔ（ｉ_ｔ）は、上述したように、頂点番号ｉ_ｔの頂点Ｖ（ｉ_ｔ）について、当該頂点Ｖ（ｉ_ｔ）から出ていくエッジの終点となる頂点Ｖ（ｊ）の頂点番号ｊを要素とする集合であるため、選択肢ｉ_ｔに対応する損失値ｌ_ｔは観測可能な損失値である。
　続いて、アルゴリズム２の「１１：」に示すように、第２の導出部１２２は、アルゴリズム２の「１０：」において観測によって取得した損失値ｌ_ｔ、及びエキスパート（モデル）の出力値ｍ_ｔを参照して、

によって、ハット付きの損失値（＾ｌ_ｔ）を導出する。ここで、Ｐ_ｔ（ｉ）は、第１の導出処理（ベースアルゴリズム）の各々が導出した第１の最適解ｐ_ｔ（ｊ）を用いて

のように算出される。また、（式２６）の右辺第１項における

は、［］内の条件が満たされる場合、すなわち、インデックスｉが集合Ｎ^ｏｕｔ（ｉ_ｔ）の要素である場合（換言すれば、損失値ｌ_ｔが観測によって取得される場合）に１を返し、そうでない場合に、０を返す関数である。
　このようにして第２の導出部１２２が導出するハット付きの損失値（＾ｌ_ｔ）は、

を満たす（ここでＥ_ｔ［］は期待値を表す）。すなわち、第２の導出部１２２は、ハット付きの損失値（＾ｌ_ｔ）と不偏推定量（unbiased estimate）として導出する。このようにして、第２の導出部１２２は、損失値ｌ_ｔが観測によって取得される場合であっても、そうでない場合であっても、ハット付きの損失値（＾ｌ_ｔ）を好適に導出することができるので、損失値ｌ_ｔが観測によって取得される場合であっても、そうでない場合であっても意思決定（最適解の導出）を好適に行うことができる。換言すれば、本例示的実施形態に係る情報処理システム１００Ａは、様々な環境（様々な問題設定（様々なフィードバックグラフ））に対して柔軟に適用可能な意思決定処理を行うことができる。なお、本明細書において、観測によって取得された損失値ｌ_ｔ、及び、第２の導出部１２２によって導出されたハット付きの損失値（＾ｌ_ｔ）の双方を、単に損失値と表現することもある。 As shown in “10:” of Algorithm 2, the execution of the option i _t is as follows:

That is, the execution is performed for all options i that satisfy the above condition. Here, as described above, N^out(i_t) is the set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from the vertex V(i_t) with vertex number i_t, so the loss value l_t corresponding to the option i_t is an observable loss value.
Next, as shown in “11:” of Algorithm 2, the second derivation unit 122 refers to the loss value l_t acquired by observation in “10:” of Algorithm 2 and the output value m_t of the expert (model), and derives the loss value with a hat (^l_t) by the following expression:

Here, P_t(i) is calculated as follows, using the first optimal solution p_t(j) derived by each of the first derivation processes (base algorithms):

In addition, in the first term on the right side of (Equation 26),

is a function that returns 1 if the condition in [ ] is satisfied, i.e., if the index i is an element of the set N^out(i_t) (in other words, if the loss value l_t is obtained by observation), and returns 0 otherwise.
The loss value with a hat (^l_t) derived by the second derivation unit 122 in this manner satisfies

(where E_t[ ] represents the expected value). That is, the second derivation unit 122 derives the loss value with a hat (^l_t) as an unbiased estimator. In this way, the second derivation unit 122 can suitably derive the loss value with a hat (^l_t) whether or not the loss value l_t is obtained by observation, and can therefore suitably perform decision-making (derive an optimal solution) in either case. In other words, the information processing system 100A according to this exemplary embodiment can perform decision-making processing that is flexibly applicable to various environments (various problem settings (various feedback graphs)). Note that in this specification, both the loss value l_t obtained by observation and the loss value with a hat (^l_t) derived by the second derivation unit 122 may be referred to simply as a loss value.
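The unbiasedness property can be checked numerically. Since (Equation 26) is not reproduced in this text, the sketch below assumes a commonly used importance-weighted form, ^l_t(i) = m_t(i) + (l_t(i) − m_t(i)) / P_t(i) · 1[i ∈ N^out(i_t)], with P_t(i) taken as the total probability of the options whose execution reveals the loss of i. That form, the function names, and the edge-set representation are assumptions, and the check requires every option to be observable with positive probability.

```python
def observation_probability(p_t, edges, i):
    """P_t(i): probability that the loss of option i is observed."""
    return sum(p_t[j] for (j, dst) in edges if dst == i)

def estimate_loss(l_t, m_t, p_t, edges, i_t):
    """Assumed estimator: m_t(i) + (l_t(i) - m_t(i)) / P_t(i) * 1[i in N^out(i_t)]."""
    estimate = []
    for i in range(len(p_t)):
        observed = (i_t, i) in edges  # indicator 1[i in N^out(i_t)]
        if observed:
            P = observation_probability(p_t, edges, i)
            estimate.append(m_t[i] + (l_t[i] - m_t[i]) / P)
        else:
            estimate.append(m_t[i])   # l_t[i] is never read in this branch
    return estimate

def expected_estimate(l_t, m_t, p_t, edges):
    """Average the estimator over i_t drawn from p_t to check unbiasedness."""
    K = len(p_t)
    expectation = [0.0] * K
    for i_t in range(K):
        est = estimate_loss(l_t, m_t, p_t, edges, i_t)
        for i in range(K):
            expectation[i] += p_t[i_t] * est[i]
    return expectation
```

Under these assumptions, `expected_estimate` returns the true loss vector l_t up to floating-point error, even though each individual call to `estimate_loss` only reads the components of l_t that were actually observed.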

　続いて、アルゴリズム２の「１２：」に示すように、第２の導出部１２２は、アルゴリズム２の「１１：」にて導出した損失値（＾ｌ_ｔ）、及び補正パラメータα_ｔを、ベースアルゴリズムＢ_ｊに供給する。ここで、当該補正パラメータα_ｔは、アルゴリズム２の「１２：」に示すように、一例として、

によって導出される。換言すれば、第２の導出部１２２は、補正パラメータα_ｔを、損失値（＾ｌ_ｔ）と最適解（意思決定結果）ｐ_ｔとの内積（にマイナス１を乗じたもの）として導出する。当該ステップにおいて提供される損失値（＾ｌ_ｔ）は、一例として、アルゴリズム１の「６：」において取得されるｌ_ｔに対応している。また、当該ステップにおいて提供されるα_ｔ（ｊ）は、アルゴリズム１の「７：」において取得される補正パラメータα_ｔに対応している。 Next, as shown in “12:” of Algorithm 2, the second derivation unit 122 supplies the loss value (^l _t ) derived in “11:” of Algorithm 2 and the correction parameter α _t to the base algorithm B _j . Here, as shown in “12:” of Algorithm 2, the correction parameter α _t is, for example,

In other words, the second derivation unit 122 derives the correction parameter α _t as the inner product (multiplied by minus 1) of the loss value (^l _t ) and the optimal solution (decision-making result) p _t . The loss value (^l _t ) provided in this step corresponds to the l _t obtained in “6:” of Algorithm 1, for example. Also, the α _t (j) provided in this step corresponds to the correction parameter α _t obtained in “7:” of Algorithm 1.
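The correction parameter described above is a single negated inner product, which can be written as follows. The function name `correction_parameter` is hypothetical, and the sketch uses the second optimal solution p_t as the vector paired with the estimated loss, following the wording of the paragraph above.

```python
def correction_parameter(l_hat_t, p_t):
    """Return alpha_t = -<l_hat_t, p_t>, the negated inner product."""
    if len(l_hat_t) != len(p_t):
        raise ValueError("loss estimate and solution must have equal length")
    return -sum(l * p for l, p in zip(l_hat_t, p_t))
```

Because p_t is a probability distribution, the value is the negative of the expected estimated loss under p_t.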

　続いて、アルゴリズム２の「１３：」に示すように、第２の導出部１２２は、ラウンドｔにおける重み因子ｗ’_ｔを、ラウンドｔ＋１における重み因子ｗ’_ｔ＋１に更新する。より具体的には、第２の導出部１２２は、ラウンドｔ＋１における重み因子ｗ’_ｔ＋１を、

によって導出する。ここで、ｇ_ｔは、

によって定義され、ｂ_ｔは、

によって定義される。 Next, as shown in “13:” of Algorithm 2, the second derivation unit 122 updates the weight factor w'_t in round t to the weight factor w'_{t+1} in round t+1. More specifically, the second derivation unit 122 derives the weight factor w'_{t+1} in round t+1 as follows:

Here, g_t is defined by

and b_t is defined by the following expression:

　なお、信頼度ｗ_ｔを導出するために参照するパラメータ（重み因子）ｗ’_ｔも、各ベースアルゴリズムの信頼度（各第１の最適解の信頼度）を表すパラメータと捉えることができる。ただし当該解釈は本例示的実施形態を限定するものではない。 The parameter (weighting factor) w'_t referred to in deriving the reliability w_t can also be regarded as a parameter representing the reliability of each base algorithm (the reliability of each first optimal solution). However, this interpretation does not limit the present exemplary embodiment.

　以上のように、本例示的実施形態では、１又は複数のモデル（エキスパート）の各々から得られる出力値を取得し、取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び、前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理を実行する。換言すれば、本例示的実施形態に係る情報処理装置１は、信頼度を用いた階層的な処理によって最適解の導出（意思決定）を行う。したがって、本例示的実施形態に係る情報処理装置１によれば、より適切な意思決定結果（最適解）を導出することができる。 As described above, in this exemplary embodiment, a plurality of first derivation processes are executed in which output values obtained from each of one or more models (experts) are acquired and a first optimal solution is derived by referring to the acquired output values, and a second derivation process is executed in which a second optimal solution is derived according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing device 1 according to this exemplary embodiment derives an optimal solution (decision-making) by hierarchical processing using reliability. Therefore, according to the information processing device 1 according to this exemplary embodiment, it is possible to derive a more appropriate decision-making result (optimal solution).

　また、前記信頼度は所定のタイミングで初期化される構成としてもよい。このような構成とすることにより、環境の変化に好適に対応することができる。
　また、上述のように、本例示的実施形態に係る情報処理システム１００Ａは、任意のフィードバックグラフＧ（Ｖ，Ｅ）に対して適用可能である。換言すれば、本例示的実施形態に係る情報処理システム１００Ａは、損失値が観測できる場合であってもそうでない場合であっても適用可能である。より具体的には、上述したように、本例示的実施形態では、損失値（＾ｌ_ｔ）を推定によって（より具体的には不偏推定量として）導出してもよく、これにより、情報処理システム１００Ａは、任意のフィードバックグラフＧ（Ｖ，Ｅ）に対して好適に適用可能な構成となっている。換言すれば、情報処理システム１００Ａは、任意のフィードバックグラフＧ（Ｖ，Ｅ）に対応可能であり、かつ、環境の変化に好適に対応可能な構成となっている。 The reliability may be initialized at a predetermined timing, making it possible to appropriately respond to changes in the environment.
Also, as described above, the information processing system 100A according to this exemplary embodiment is applicable to any feedback graph G(V,E). In other words, the information processing system 100A according to this exemplary embodiment is applicable whether the loss value can be observed or not. More specifically, as described above, in this exemplary embodiment, the loss value (^l _t ) may be derived by estimation (more specifically, as an unbiased estimator), and this allows the information processing system 100A to be suitably applicable to any feedback graph G(V,E). In other words, the information processing system 100A is capable of handling any feedback graph G(V,E) and is capable of suitably handling changes in the environment.

　（情報処理システム１００Ａによるより具体的な効果の説明）
　以下では、図９を参照して、情報処理システム１００Ａによるより具体的な効果について説明する。図９は、情報処理システム１００Ａをリンゴの出荷に関する意思決定問題に適用した場合の例を示している。当該問題設定では、リンゴを出荷する、リンゴを出荷しないの２つの選択肢が存在し（上述したアルゴリズム１においてＫ＝２に対応）、パラメータＴの値をＴ＝２００とした。換言すれは、意思決定回数の総数を２００回とした。また、各損失を
　　リンゴがおいしいときの損失：出荷しないと１、出荷すると０
　　リンゴがおいしくないときの損失：出荷しないと０、出荷すると１
とした。 (More specific effects of the information processing system 100A)
More specific effects of the information processing system 100A will be described below with reference to FIG. 9. FIG. 9 shows an example in which the information processing system 100A is applied to a decision-making problem regarding the shipping of apples. In this problem setting, there are two options, shipping apples and not shipping apples (corresponding to K=2 in the above-mentioned Algorithm 1), and the parameter T is set to T=200. In other words, the total number of decisions is 200. The losses were set as follows:
・Loss when the apples are delicious: 1 if they are not shipped, 0 if they are shipped
・Loss when the apples are not delicious: 0 if they are not shipped, 1 if they are shipped

　また、環境変化として、リンゴがおいしい確率が、
　　前半１００回では平均０．９であり、
　　後半１００回では平均０．１である
という設定とした。また、本例示的実施形態に係る情報処理システム１００Ａにおいて、５０回毎に、信頼度を初期化（アルゴリズムを初期化）するよう構成した。また、エキスパート（モデル）の出力値（ｍ_ｔ）は常に０とした。 In addition, the probability that the apples are delicious changes as a result of environmental changes.
Specifically, this probability averaged 0.9 over the first 100 rounds and 0.1 over the latter 100 rounds. In addition, the information processing system 100A according to this exemplary embodiment was configured to initialize the reliability (initialize the algorithm) every 50 rounds. The output value (m_t) of the expert (model) was always set to 0.
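The problem setting above can be reproduced with a short simulation. This sketch covers only the environment and the loss table, not Algorithms 1 and 2. It assumes that "average 0.9" and "average 0.1" mean a fixed per-round probability, and the function names and seed are hypothetical.

```python
import random

def apple_loss(tasty, ship):
    """Loss table from the text: a wrong shipping decision costs 1, a right one 0."""
    return 0 if ship == tasty else 1

def tasty_probability(t):
    """Rounds 1..100 use probability 0.9; rounds 101..200 use 0.1."""
    return 0.9 if t <= 100 else 0.1

def simulate_fixed_policy(ship, T=200, seed=0):
    """Cumulative loss of a policy that always makes the same shipping choice."""
    rng = random.Random(seed)
    total = 0
    for t in range(1, T + 1):
        tasty = rng.random() < tasty_probability(t)
        total += apple_loss(tasty, ship)
    return total
```

Under these assumptions, either fixed policy has an expected cumulative loss of 100 over the 200 rounds (10 in one half and 90 in the other), which is why a method that adapts at the change in round 101 can do substantially better.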

　以上のような問題設定において、本例示的実施形態に係る情報処理システム１００Ａによる損失値の推移と、比較例に係る構成による損失値の推移とを示したのが図９である。図９に示すように、情報処理システム１００Ａによる損失値は、比較例に係る構成による損失値に比べて顕著に小さく抑えられており、また、１００回目において生じる環境変化にも迅速に適用できていることが分かる。また５０回毎の信頼度を初期化の後も、損失値が速やかに収束していることが見てとれる。このように、本例示的実施形態に係る情報処理システム１００Ａは、比較例に係る構成に比べて、環境変化への適用性が顕著に高いことが分かる。 In the above problem setting, Figure 9 shows the progress of the loss value using the information processing system 100A according to this exemplary embodiment and the progress of the loss value using the configuration according to the comparative example. As shown in Figure 9, the loss value using the information processing system 100A is significantly smaller than the loss value using the configuration according to the comparative example, and it can be seen that it is able to quickly adapt to the environmental changes that occur on the 100th iteration. It can also be seen that the loss value quickly converges even after the reliability is initialized every 50 iterations. In this way, it can be seen that the information processing system 100A according to this exemplary embodiment has significantly higher adaptability to environmental changes than the configuration according to the comparative example.

　〔適用例〕
　以下では、本例示的実施形態に係る情報処理システム１００Ａによるより具体的な適用例について説明する。図１０は、本例に係る情報処理装置１による処理を模式的に示す図である。 [Application example]
A more specific application example of the information processing system 100A according to this exemplary embodiment will be described below. FIG. 10 is a diagram schematically showing the processing performed by the information processing device 1 according to this example.

　図８に示すように、本例に係る情報処理装置１は、複数の医療従事者と複数の病院とのマッチングに関する意思決定を行う（意思決定結果を導出する）。ここで、本例に係る情報処理装置１は、上述したように、あるラウンドにおいて、各エキスパート（モデル）に対応付けられた出力値を、複数のエキスパートについて取得し、取得した出力値を参照して意思決定結果を導出する。 As shown in FIG. 8, the information processing device 1 of this example makes a decision regarding matching multiple medical professionals with multiple hospitals (derives a decision-making result). Here, as described above, the information processing device 1 of this example acquires output values associated with each expert (model) for multiple experts in a certain round, and derives a decision-making result by referring to the acquired output values.

　本例に係る情報処理装置１の具体的構成は本適用例を限定するものではないが、例示的実施形態１において説明した情報処理装置１と同様の構成であってもよいし、例示的実施形態２において説明した情報処理装置１Ａと同様の構成であってもよい。 The specific configuration of the information processing device 1 in this example does not limit this application example, but may be the same as the information processing device 1 described in exemplary embodiment 1, or may be the same as the information processing device 1A described in exemplary embodiment 2.

　また、本例に係る情報処理装置１が行う意思決定処理は、一例として、情報処理装置１Ａに関して説明したように、複数の第１の導出部１２１－１、１２１－２、・・・、及び第２の導出部１２２による、階層的な意思決定処理とすることができるが、これは本適用例を限定するものではない。 The decision-making process performed by the information processing device 1 in this example can be, as an example, a hierarchical decision-making process using multiple first derivation units 121-1, 121-2, ... and second derivation unit 122, as described for information processing device 1A, but this is not a limitation of this application example.

　そして、導出された意思決定結果が実行されることにより、次のラウンドに対応する損失値が観測によって取得されるか、または導出によって取得される。当該損失値と、当該損失値に対応する出力値とが本例に係る情報処理装置１に提供され、当該次のラウンドにおける意思決定処理において参照される。 Then, the derived decision-making result is executed, and a loss value corresponding to the next round is obtained by observation or by derivation. The loss value and an output value corresponding to the loss value are provided to the information processing device 1 in this example, and are referenced in the decision-making process in the next round.

　本例に係る各エキスパートへの入力、及び当該各エキスパートの出力値を例示すれば以下の通りである。なお、第１の例示的実施形態において説明したように、本例に係る情報処理システムは、「エキスパート（モデル）」を含む構成であってもよいし、外部の「エキスパート（モデル）」から予測値を取得する構成であってもよい。 The inputs to each expert in this example and the output values of each expert are exemplified below. As explained in the first exemplary embodiment, the information processing system in this example may be configured to include an "expert (model)" or may be configured to obtain predicted values from an external "expert (model)."

　（エキスパートへの入力例）
・時刻ｔ（ラウンドｔ）で観測された各病院と各医療従事者との特徴量
・前回（ラウンドｔ－１）における各病院での来院患者数
　（エキスパートの出力例）
・時刻ｔ（ラウンドｔ）における医療従事者毎の配属先病院。 (Example of input to the expert)
・Features of each hospital and each medical worker observed at time t (round t)
・Number of patients visiting each hospital in the previous round (round t-1)
(Example of expert output)
・The hospital to which each medical worker is assigned at time t (round t).

　（本例に係る処理の流れ）
　本例に係る情報処理システム１００による処理の一例について説明すれば以下の通りである。 (Processing flow according to this example)
An example of the processing performed by the information processing system 100 according to this example will be described below.

　まず、病院における入力担当者は、情報処理システム１００に、端末装置２Ａ等を介して、診断状況、病室の空き状況、診療科目、診療時間などの情報（病院データとも呼ぶ）を入力する。入力された各情報は、一例として記憶部１５Ａに記憶され、制御部１０Ａによって参照される。図１１の下段は、本例に係る情報処理システム１００が管理する病院データの一例である。 First, a person in charge of inputting data at the hospital inputs information (also called hospital data) such as diagnosis status, availability of hospital rooms, medical departments, and consultation hours into the information processing system 100 via a terminal device 2A or the like. Each piece of input information is stored in the memory unit 15A, for example, and is referenced by the control unit 10A. The lower part of Figure 11 is an example of hospital data managed by the information processing system 100 according to this example.

　続いて、各医療従事者は、情報処理システム１００に、自身のデータ（専門、勤続年数、希望病院など）（医療従事者データとも呼ぶ）を入力する。入力された各情報は、一例として記憶部１５Ａに記憶され、制御部１０Ａによって参照される。図１１の上段は、本例に係る情報処理システム１００が管理する医療従事者データの一例である。 Next, each medical worker inputs their own data (specialty, years of service, preferred hospital, etc.) (also referred to as medical worker data) into the information processing system 100. Each piece of input information is stored in the memory unit 15A, as an example, and is referenced by the control unit 10A. The top part of Figure 11 is an example of medical worker data managed by the information processing system 100 of this example.

　続いて、本例に係る情報処理装置１は、病院データと、医療従事者のデータとを参照して、最適な病院と医療従事者のマッチングに関する意思決定結果を導出する。一例として、本例に係る複数のエキスパート（モデル）の各々は、病院データと、医療従事者のデータとを参照して出力値を算出する。そして、本例に係る情報処理装置１は、これらの出力値を参照して、最適な病院と医療従事者のマッチングに関する意思決定結果を導出する。 Then, the information processing device 1 according to this example derives a decision-making result regarding optimal matching of hospitals and medical professionals by referring to the hospital data and medical professional data. As an example, each of the multiple experts (models) according to this example calculates an output value by referring to the hospital data and medical professional data. Then, the information processing device 1 according to this example derives a decision-making result regarding optimal matching of hospitals and medical professionals by referring to these output values.

　そして、本例に係る情報処理システム１００は、端末装置２Ａ等を介して、医療従事者に対して最適な病院の候補を提案する。一例として、本例に係る情報処理システム１００は、端末装置２Ａが備える表示パネル等を介して、医療従事者に対して最適な病院の候補を提示する。また、本例に係る情報処理システム１００は、医療従事者毎に、勤務に関する登録を行う。 Then, the information processing system 100 according to this example proposes optimal hospital candidates to the medical worker via the terminal device 2A or the like. As an example, the information processing system 100 according to this example presents optimal hospital candidates to the medical worker via a display panel or the like provided in the terminal device 2A. Furthermore, the information processing system 100 according to this example registers work-related information for each medical worker.

　本例に係る情報処理システム１００は、一例として、毎月（毎ラウンド）における来院患者数を病院毎に記録する。そして、本例に係る情報処理システム１００は、当該来院患者数に応じた損失値に基づいて、次回（次ラウンド）における配属先を決定する。ここで、損失値の具体例は本例を限定するものではないが、一例として、病院における混雑度合に応じた損失値を用いることができる。 As an example, the information processing system 100 according to this example records the number of patients visiting each month (each round) for each hospital. Then, the information processing system 100 according to this example determines the allocation for the next time (next round) based on a loss value according to the number of patients visiting. Here, the specific example of the loss value is not limited to this example, but as an example, a loss value according to the degree of congestion at the hospital can be used.

　より具体的には、損失値を、
　　（各病院における実際の来院患者数）－（各病院に配属された医療従事者数×医療従事者が一人当たり診察可能な患者数）
によって算出する構成とすることができる。 More specifically, the loss value is
calculated, in one possible configuration, by
(Actual number of patients visiting each hospital) − (Number of medical workers assigned to each hospital × Number of patients that each medical worker can examine).
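The congestion-based loss above is a one-line calculation per hospital. In the sketch below, the function name `hospital_loss`, the hospital names, and all numbers are hypothetical values chosen only to illustrate the arithmetic.

```python
def hospital_loss(actual_patients, assigned_workers, patients_per_worker):
    """(actual patients) - (assigned workers * patients each worker can examine)."""
    return actual_patients - assigned_workers * patients_per_worker

# Hypothetical monthly figures: (actual patients, assigned medical workers).
monthly = {"A": (120, 5), "B": (60, 4)}
losses = {
    name: hospital_loss(patients, workers, 20)
    for name, (patients, workers) in monthly.items()
}
print(losses)  # {'A': 20, 'B': -20}
```

A positive value indicates more patients than the assigned workers can examine, and a negative value indicates spare capacity, so moving a worker from hospital B to hospital A would reduce the larger loss in the next round.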

　ただし、病院によっては、プライバシーへの配慮等の理由により、来院者数の提供を行わない場合も生じ得る。このような場合には、当該病院（当該意思決定結果）に関する損失値は観測可能ではない。 However, some hospitals may not provide the number of visitors due to privacy considerations or other reasons. In such cases, the loss value for that hospital (the decision-making result) is not observable.

　本例に係る情報処理システム１００は、上述したように、損失値が観測可能な場合であっても、そうでない場合であっても、好適に意思決定を行うことができるので、上記のような場合であっても、好適に病院と医療従事者のマッチングを行うことができる。また、年月の経過と共に、病院データや医療従事者データは変化し得る。本例に係る情報処理システム１００は、このような環境変化が生じる場合であっても、好適に病院と医療従事者のマッチングを行うことができる。 As described above, the information processing system 100 according to this example can make appropriate decisions whether or not the loss value is observable, and therefore can appropriately match hospitals with medical personnel even in the above-mentioned cases. In addition, hospital data and medical personnel data may change over time. The information processing system 100 according to this example can appropriately match hospitals with medical personnel even in the case of such environmental changes.

　〔ソフトウェアによる実現例〕
　情報処理装置１、１Ａ、端末装置２、２Ａの制御ブロック（特に取得部１１、導出部１２）は、集積回路（ＩＣチップ）等に形成された論理回路（ハードウェア）によって実現してもよいし、ソフトウェアによって実現してもよい。 [Software implementation example]
The control blocks (particularly the acquisition unit 11 and the derivation unit 12) of the information processing device 1, 1A and the terminal device 2, 2A may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software.

　後者の場合、情報処理装置１、１Ａ、端末装置２、２Ａは、各機能を実現するソフトウェアであるプログラムの命令を実行するコンピュータを備えている。このコンピュータは、例えば少なくとも１つのプロセッサ（制御装置）を備えていると共に、上記プログラムを記憶したコンピュータ読み取り可能な少なくとも１つの記録媒体を備えている。そして、上記コンピュータにおいて、上記プロセッサが上記プログラムを上記記録媒体から読み取って実行することにより、本発明の目的が達成される。上記プロセッサとしては、例えばＣＰＵ（Central Processing Unit）を用いることができる。上記記録媒体としては、「一時的でない有形の媒体」、例えば、ＲＯＭ（Read Only Memory）等の他、テープ、ディスク、カード、半導体メモリ、プログラマブルな論理回路などを用いることができる。また、上記プログラムを展開するＲＡＭ（Random Access Memory）などをさらに備えていてもよい。また、上記プログラムは、該プログラムを伝送可能な任意の伝送媒体（通信ネットワークや放送波等）を介して上記コンピュータに供給されてもよい。なお、本発明の一態様は、上記プログラムが電子的な伝送によって具現化された、搬送波に埋め込まれたデータ信号の形態でも実現され得る。 In the latter case, the information processing devices 1 and 1A and the terminal devices 2 and 2A each include a computer that executes the instructions of a program, which is software that realizes each function. This computer includes, for example, at least one processor (control device) and at least one computer-readable recording medium that stores the program. The object of the present invention is achieved when the processor in the computer reads the program from the recording medium and executes it. The processor can be, for example, a CPU (Central Processing Unit). The recording medium can be a "non-transitory tangible medium" such as a ROM (Read Only Memory), as well as a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium (such as a communication network or broadcast waves) capable of transmitting the program. Note that one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

　〔付記事項Ａ〕
　本開示には、以下の各付記に記載の技術が含まれる。ただし、本発明は、以下の各付記に記載の技術に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。 [Appendix A]
This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

　（付記Ａ１）
　１又は複数のモデルの各々から得られる出力値を取得する取得手段と、
　　前記取得手段が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
　　前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を実行する導出手段と
を備えている情報処理装置。 (Appendix A1)
An information processing device comprising:
acquisition means for acquiring an output value obtained from each of one or more models; and
derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

　（付記Ａ２）
　前記導出手段は、
　　前記出力値及び前記第１の最適解の少なくとも何れかを参照して、前記信頼度を導出し、
　　前記信頼度を所定のタイミングで初期化する
付記Ａ１に記載の情報処理装置。 (Appendix A2)
The information processing device according to appendix A1, wherein the derivation means
derives the reliability by referring to at least one of the output value and the first optimal solution, and
initializes the reliability at a predetermined timing.

　（付記Ａ３）
　前記導出手段は、
　　前記第１の最適解に対応する損失値を推定し、
　　前記損失値を参照して、前記第１の最適解を導出するためのパラメータを更新する
付記Ａ１又はＡ２に記載の情報処理装置。 (Appendix A3)
The information processing device according to appendix A1 or A2, wherein the derivation means
estimates a loss value corresponding to the first optimal solution, and
updates a parameter for deriving the first optimal solution by referring to the loss value.

　（付記Ａ４）
　前記導出手段は、前記損失値を、不偏推定量として算出する
付記Ａ３に記載の情報処理装置。 (Appendix A4)
The information processing device according to appendix A3, wherein the derivation means calculates the loss value as an unbiased estimator.

　（付記Ａ５）
　前記導出手段は、補正のためのパラメータを更に参照して、前記第１の最適解を更新する
付記Ａ３又はＡ４に記載の情報処理装置。 (Appendix A5)
The information processing device according to appendix A3 or A4, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

　（付記Ａ６）
　前記導出手段は、前記信頼度を導出するために参照する学習率を、前記第１の導出処理の総数の－１／２乗に比例した値に設定する
付記Ａ１からＡ５の何れか１つに記載の情報処理装置。 (Appendix A6)
The information processing device according to any one of appendices A1 to A5, wherein the derivation means sets a learning rate referred to for deriving the reliability to a value proportional to the −1/2 power of a total number of the first derivation processes.

　（付記Ａ７）
　前記導出手段が前記第１の最適解を導出するために参照するブレグマン情報量は、対数バリア項を含む凸関数によって規定される
付記Ａ１からＡ５の何れか１つに記載の情報処理装置。 (Appendix A7)
The information processing device according to any one of appendices A1 to A5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

　（付記Ａ８）
　前記第１の導出処理及び前記第２の導出処理は、逐次的に取得する前記出力値を参照したオンライン機械学習処理である
付記Ａ１からＡ７の何れか１つに記載の情報処理装置。 (Appendix A8)
The information processing device according to any one of appendices A1 to A7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values that are sequentially acquired.
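
As a rough illustration of appendices A1, A2, and A6, the following Python sketch combines the first optimal solutions by their reliabilities, sets the learning rate proportional to the −1/2 power of the number of first derivation processes, and re-initializes the reliabilities at a fixed interval. The class name, the exponential-weights update rule, the `scale` constant, and the `reset_every` interval are all assumptions chosen for illustration; this section does not specify the concrete update rule or the reset timing.

```python
import math
from typing import List, Sequence


class ReliabilityCombiner:
    """Sketch of a second derivation process that mixes first optimal solutions by reliability."""

    def __init__(self, num_processes: int, reset_every: int, scale: float = 1.0):
        self.num_processes = num_processes
        self.reset_every = reset_every  # assumed "predetermined timing", in rounds
        # Learning rate proportional to (number of first derivation processes)^(-1/2).
        self.learning_rate = scale / math.sqrt(num_processes)
        self.rounds = 0
        self.reset()

    def reset(self) -> None:
        """Initialize every reliability to the same value."""
        self.reliability = [1.0 / self.num_processes] * self.num_processes

    def combine(self, first_solutions: Sequence[Sequence[float]]) -> List[float]:
        """Reliability-weighted average of the first optimal solutions."""
        dim = len(first_solutions[0])
        return [
            sum(w * sol[k] for w, sol in zip(self.reliability, first_solutions))
            for k in range(dim)
        ]

    def update(self, losses: Sequence[float]) -> None:
        """Exponential-weights update (an assumed rule), then the periodic reset."""
        raw = [w * math.exp(-self.learning_rate * loss)
               for w, loss in zip(self.reliability, losses)]
        total = sum(raw)
        self.reliability = [r / total for r in raw]
        self.rounds += 1
        if self.rounds % self.reset_every == 0:
            self.reset()


combiner = ReliabilityCombiner(num_processes=2, reset_every=100)
second = combiner.combine([[1.0, 0.0], [0.0, 1.0]])  # equal reliabilities at the start
combiner.update([0.0, 1.0])                          # process 0 incurred the smaller loss
```

After the update, the process with the smaller loss carries the larger reliability, so its first optimal solution contributes more to the next second optimal solution.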

　〔付記事項Ｂ〕
　本開示には、以下の各付記に記載の技術が含まれる。ただし、本発明は、以下の各付記に記載の技術に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。 [Appendix B]
This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

　（付記Ｂ１）
　少なくとも１つのプロセッサが、
　１又は複数のモデルの各々から得られる出力値を取得する取得処理と、
　　前記取得処理が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
　　前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を含む導出処理と
を含んでいる情報処理方法。 (Appendix B1)
An information processing method comprising, by at least one processor:
an acquisition process of acquiring an output value obtained from each of one or more models; and
a derivation process including a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired in the acquisition process, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

　（付記Ｂ２）
　前記導出処理において、前記少なくとも１つのプロセッサは、
　　前記出力値及び前記第１の最適解の少なくとも何れかを参照して、前記信頼度を導出し、
　　前記信頼度を所定のタイミングで初期化する
付記Ｂ１に記載の情報処理方法。 (Appendix B2)
The information processing method according to appendix B1, wherein, in the derivation process, the at least one processor
derives the reliability by referring to at least one of the output value and the first optimal solution, and
initializes the reliability at a predetermined timing.

　（付記Ｂ３）
　前記導出処理において、前記少なくとも１つのプロセッサは、
　　前記第１の最適解に対応する損失値を推定し、
　　前記損失値を参照して、前記第１の最適解を導出するためのパラメータを更新する
付記Ｂ１又はＢ２に記載の情報処理方法。 (Appendix B3)
The information processing method according to appendix B1 or B2, wherein, in the derivation process, the at least one processor
estimates a loss value corresponding to the first optimal solution, and
updates a parameter for deriving the first optimal solution by referring to the loss value.

　（付記Ｂ４）
　前記導出処理において、前記少なくとも１つのプロセッサは、前記損失値を、不偏推定量として算出する
付記Ｂ３に記載の情報処理方法。 (Appendix B4)
The information processing method according to appendix B3, wherein, in the derivation process, the at least one processor calculates the loss value as an unbiased estimator.

　（付記Ｂ５）
　前記導出処理において、前記少なくとも１つのプロセッサは、補正のためのパラメータを更に参照して、前記第１の最適解を更新する
付記Ｂ３又はＢ４に記載の情報処理方法。 (Appendix B5)
The information processing method according to appendix B3 or B4, wherein, in the derivation process, the at least one processor updates the first optimal solution by further referring to a parameter for correction.

　（付記Ｂ６）
　前記導出処理において、前記少なくとも１つのプロセッサは、前記信頼度を導出するために参照する学習率を、前記第１の導出処理の総数の－１／２乗に比例した値に設定する
付記Ｂ１からＢ５の何れか１つに記載の情報処理方法。 (Appendix B6)
The information processing method according to any one of appendices B1 to B5, wherein, in the derivation process, the at least one processor sets a learning rate referred to for deriving the reliability to a value proportional to the -1/2 power of a total number of the first derivation processes.

　（付記Ｂ７）
　前記導出処理が前記第１の最適解を導出するために参照するブレグマン情報量は、対数バリア項を含む凸関数によって規定される
付記Ｂ１からＢ５の何れか１つに記載の情報処理方法。 (Appendix B7)
The information processing method according to any one of appendices B1 to B5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

　（付記Ｂ８）
　前記第１の導出処理及び前記第２の導出処理は、逐次的に取得する前記出力値を参照したオンライン機械学習処理である
付記Ｂ１からＢ７の何れか１つに記載の情報処理方法。 (Appendix B8)
The information processing method according to any one of appendices B1 to B7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values obtained sequentially.
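
Appendices B3 and B4 describe estimating the loss value as an unbiased estimator. One common way to do this when only the loss of the chosen option is observed is importance weighting, sketched below in Python. The choice of importance weighting, the function names, and the sample numbers are assumptions for illustration; this section does not state which estimator is used. The sketch assumes every selection probability is strictly positive.

```python
from typing import List, Sequence


def importance_weighted_loss(probs: Sequence[float],
                             chosen: int,
                             observed_loss: float) -> List[float]:
    """Loss estimate that is non-zero only for the option that was actually chosen."""
    estimate = [0.0] * len(probs)
    estimate[chosen] = observed_loss / probs[chosen]
    return estimate


def expected_estimate(probs: Sequence[float],
                      true_losses: Sequence[float]) -> List[float]:
    """Exact expectation of the estimator over the random choice, to check unbiasedness."""
    expectation = [0.0] * len(probs)
    for chosen, p in enumerate(probs):
        est = importance_weighted_loss(probs, chosen, true_losses[chosen])
        expectation = [e + p * v for e, v in zip(expectation, est)]
    return expectation


probs = [0.5, 0.25, 0.25]        # hypothetical selection probabilities
true_losses = [1.0, 2.0, 4.0]    # hypothetical losses, only one of which is observed per round
print(expected_estimate(probs, true_losses))
```

Dividing the observed loss by its selection probability makes the expectation of each component equal to the true loss, which is what makes the estimate unbiased even though most components are zero in any single round.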

　〔付記事項Ｃ〕
　本開示には、以下の各付記に記載の技術が含まれる。ただし、本発明は、以下の各付記に記載の技術に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。 [Appendix C]
This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

　（付記Ｃ１）
　情報処理装置としてコンピュータを機能させるプログラムであって、
　前記コンピュータを、
　１又は複数のモデルの各々から得られる出力値を取得する取得手段と、
　　前記取得手段が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
　　前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を実行する導出手段と
として機能させる情報処理プログラム。 (Appendix C1)
An information processing program for causing a computer to function as an information processing device, the program causing the computer to function as:
acquisition means for acquiring an output value obtained from each of one or more models; and
derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

　（付記Ｃ２）
　前記導出手段は、
　　前記出力値及び前記第１の最適解の少なくとも何れかを参照して、前記信頼度を導出し、
　　前記信頼度を所定のタイミングで初期化する
付記Ｃ１に記載の情報処理プログラム。 (Appendix C2)
The information processing program according to appendix C1, wherein the derivation means
derives the reliability by referring to at least one of the output value and the first optimal solution, and
initializes the reliability at a predetermined timing.

　（付記Ｃ３）
　前記導出手段は、
　　前記第１の最適解に対応する損失値を推定し、
　　前記損失値を参照して、前記第１の最適解を導出するためのパラメータを更新する
付記Ｃ１又はＣ２に記載の情報処理プログラム。 (Appendix C3)
The information processing program according to appendix C1 or C2, wherein the derivation means
estimates a loss value corresponding to the first optimal solution, and
updates a parameter for deriving the first optimal solution by referring to the loss value.

　（付記Ｃ４）
　前記導出手段は、前記損失値を、不偏推定量として算出する
付記Ｃ３に記載の情報処理プログラム。 (Appendix C4)
The information processing program according to appendix C3, wherein the derivation means calculates the loss value as an unbiased estimator.

　（付記Ｃ５）
　前記導出手段は、補正のためのパラメータを更に参照して、前記第１の最適解を更新する
付記Ｃ３又はＣ４に記載の情報処理プログラム。 (Appendix C5)
The information processing program according to appendix C3 or C4, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

　（付記Ｃ６）
　前記導出手段は、前記信頼度を導出するために参照する学習率を、前記第１の導出処理の総数の－１／２乗に比例した値に設定する
付記Ｃ１からＣ５の何れか１つに記載の情報処理プログラム。 (Appendix C6)
The information processing program according to any one of appendices C1 to C5, wherein the derivation means sets a learning rate referred to for deriving the reliability to a value proportional to the −1/2 power of a total number of the first derivation processes.

　（付記Ｃ７）
　前記導出手段が前記第１の最適解を導出するために参照するブレグマン情報量は、対数バリア項を含む凸関数によって規定される
付記Ｃ１からＣ５の何れか１つに記載の情報処理プログラム。 (Appendix C7)
The information processing program according to any one of appendices C1 to C5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

　（付記Ｃ８）
　前記第１の導出処理及び前記第２の導出処理は、逐次的に取得する前記出力値を参照したオンライン機械学習処理である
付記Ｃ１からＣ７の何れか１つに記載の情報処理プログラム。 (Appendix C8)
The information processing program according to any one of appendices C1 to C7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values obtained sequentially.
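
For reference, appendix C7 (like A7 and B7) refers to a Bregman divergence defined by a convex function that includes a logarithmic barrier term. The block below writes out the simplest such case, in which the convex function consists of the logarithmic barrier alone. The symbols \(\psi\), \(\eta\), \(d\), \(p\), and \(q\) are assumptions introduced here; the convex function actually used may contain additional terms.

```latex
\psi(x) = -\frac{1}{\eta}\sum_{i=1}^{d} \ln x_i ,
\qquad
D_\psi(p, q)
  = \psi(p) - \psi(q) - \langle \nabla\psi(q),\, p - q \rangle
  = \frac{1}{\eta}\sum_{i=1}^{d}\left( \frac{p_i}{q_i} - 1 - \ln\frac{p_i}{q_i} \right).
```

The second equality follows from \(\partial\psi(q)/\partial q_i = -1/(\eta q_i)\). Each summand is non-negative and is zero only when \(p_i = q_i\), and the divergence grows without bound as any \(p_i\) approaches zero, which keeps the derived solution in the interior of the feasible region.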

　〔付記事項Ｄ〕
　本開示には、以下の各付記に記載の技術が含まれる。ただし、本発明は、以下の各付記に記載の技術に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。 [Appendix D]
This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

　（付記Ｄ１）
　少なくとも１つのプロセッサを備え、前記少なくとも１つのプロセッサは、
　１又は複数のモデルの各々から得られる出力値を取得する取得処理と、
　　前記取得処理が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
　　前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を含む導出処理と
を実行する情報処理装置。 (Appendix D1)
An information processing device comprising at least one processor, wherein the at least one processor executes:
an acquisition process of acquiring an output value obtained from each of one or more models; and
a derivation process including a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired in the acquisition process, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

　なお、前記情報処理装置は、更にメモリを備えていてもよい。また、前記メモリには、前記各処理を前記少なくとも１つのプロセッサに実行させるためのプログラムが記憶されていてもよい。 The information processing device may further include a memory. The memory may also store a program for causing the at least one processor to execute each of the processes.

　（付記Ｄ２）
　前記導出処理において、前記少なくとも１つのプロセッサは、
　　前記出力値及び前記第１の最適解の少なくとも何れかを参照して、前記信頼度を導出し、
　　前記信頼度を所定のタイミングで初期化する
付記Ｄ１に記載の情報処理装置。 (Appendix D2)
The information processing device according to appendix D1, wherein, in the derivation process, the at least one processor
derives the reliability by referring to at least one of the output value and the first optimal solution, and
initializes the reliability at a predetermined timing.

　（付記Ｄ３）
　前記導出処理において、前記少なくとも１つのプロセッサは、
　　前記第１の最適解に対応する損失値を推定し、
　　前記損失値を参照して、前記第１の最適解を導出するためのパラメータを更新する
付記Ｄ１又はＤ２に記載の情報処理装置。 (Appendix D3)
The information processing device according to appendix D1 or D2, wherein, in the derivation process, the at least one processor
estimates a loss value corresponding to the first optimal solution, and
updates a parameter for deriving the first optimal solution by referring to the loss value.

　（付記Ｄ４）
　前記導出処理において、前記少なくとも１つのプロセッサは、前記損失値を、不偏推定量として算出する
付記Ｄ３に記載の情報処理装置。 (Appendix D4)
The information processing device according to appendix D3, wherein, in the derivation process, the at least one processor calculates the loss value as an unbiased estimator.

　（付記Ｄ５）
　前記導出処理において、前記少なくとも１つのプロセッサは、補正のためのパラメータを更に参照して、前記第１の最適解を更新する
付記Ｄ３又はＤ４に記載の情報処理装置。 (Appendix D5)
The information processing device according to appendix D3 or D4, wherein, in the derivation process, the at least one processor updates the first optimal solution by further referring to a parameter for correction.

　（付記Ｄ６）
　前記導出処理において、前記少なくとも１つのプロセッサは、前記信頼度を導出するために参照する学習率を、前記第１の導出処理の総数の－１／２乗に比例した値に設定する
付記Ｄ１からＤ５の何れか１つに記載の情報処理装置。 (Appendix D6)
The information processing device according to any one of appendices D1 to D5, wherein, in the derivation process, the at least one processor sets a learning rate referred to for deriving the reliability to a value proportional to the −1/2 power of a total number of the first derivation processes.

　（付記Ｄ７）
　前記導出処理が前記第１の最適解を導出するために参照するブレグマン情報量は、対数バリア項を含む凸関数によって規定される
付記Ｄ１からＤ５の何れか１つに記載の情報処理装置。 (Appendix D7)
The information processing device according to any one of appendices D1 to D5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

　（付記Ｄ８）
　前記第１の導出処理及び前記第２の導出処理は、逐次的に取得する前記出力値を参照したオンライン機械学習処理である
付記Ｄ１からＤ７の何れか１つに記載の情報処理装置。 (Appendix D8)
The information processing device according to any one of appendices D1 to D7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.

　〔付記事項Ｅ〕
　本開示には、以下の各付記に記載の技術が含まれる。ただし、本発明は、以下の各付記に記載の技術に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。 [Appendix E]
This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

　（付記Ｅ１）
　情報処理装置としてコンピュータを機能させるプログラムであって、
　前記コンピュータに、
　１又は複数のモデルの各々から得られる出力値を取得する取得処理と、
　　前記取得処理が取得した出力値を参照して第１の最適解を導出する複数の第１の導出処理、及び
　　前記複数の第１の導出処理の各々が導出した第１の最適解と、前記複数の第１の導出処理の各々の信頼度とに応じて第２の最適解を導出する第２の導出処理
を含む導出処理と
を実行させる情報処理プログラム、を記録した一時的でない記録媒体。 (Appendix E1)
A non-transitory recording medium having recorded thereon an information processing program for causing a computer to function as an information processing device, the program causing the computer to execute:
an acquisition process of acquiring an output value obtained from each of one or more models; and
a derivation process including a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired in the acquisition process, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

　本発明は上述した各実施形態に限定されるものではなく、請求項に示した範囲で種々の変更が可能であり、異なる実施形態にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施形態についても本発明の技術的範囲に含まれる。 The present invention is not limited to the above-described embodiments, and various modifications are possible within the scope of the claims. The technical scope of the present invention also includes embodiments obtained by appropriately combining the technical means disclosed in the different embodiments.

　１，１Ａ　　　　・・・情報処理装置
　１００、１００Ａ・・・情報処理システム
　１０Ａ　　　　　・・・制御部
　１１　　　　　　・・・取得部
　１２　　　　　　・・・導出部
　１２１　　　　　・・・第１の導出部
　１２２　　　　　・・・第２の導出部
　Ｓ１，Ｓ１００　・・・情報処理方法 Reference Signs List
1, 1A: Information processing device
100, 100A: Information processing system
10A: Control unit
11: Acquisition unit
12: Derivation unit
121: First derivation unit
122: Second derivation unit
S1, S100: Information processing method

Claims

1. An information processing device comprising:
acquisition means for acquiring an output value obtained from each of one or more models; and
derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

2. The information processing device according to claim 1, wherein the derivation means
derives the reliability by referring to at least one of the output value and the first optimal solution, and
initializes the reliability at a predetermined timing.

3. The information processing device according to claim 1, wherein the derivation means
estimates a loss value corresponding to the first optimal solution, and
updates a parameter for deriving the first optimal solution by referring to the loss value.

4. The information processing device according to claim 3, wherein the derivation means calculates the loss value as an unbiased estimator.

5. The information processing device according to claim 3, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

6. The information processing device according to claim 1, wherein the derivation means sets a learning rate referred to for deriving the reliability to a value proportional to the −1/2 power of a total number of the first derivation processes.

7. The information processing device according to claim 1, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

8. The information processing device according to claim 1, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values that are sequentially acquired.

9. An information processing method in which an information processing device:
acquires an output value obtained from each of one or more models; and
executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output value, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

10. A program for causing a computer to function as an information processing device, the program causing the computer to execute:
a process of acquiring an output value obtained from each of one or more models; and
a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output value, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

11. An information processing system comprising an information processing device and a terminal device, wherein
the information processing device includes:
acquisition means for acquiring an output value obtained from each of one or more models; and
derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes, and
the terminal device includes execution means for executing the second optimal solution derived by the information processing device.