WO2025126326A1 - Information processing device, information processing method, information processing system, and program - Google Patents

Information processing device, information processing method, information processing system, and program

Info

Publication number
WO2025126326A1
WO2025126326A1 PCT/JP2023/044465 JP2023044465W WO2025126326A1 WO 2025126326 A1 WO2025126326 A1 WO 2025126326A1 JP 2023044465 W JP2023044465 W JP 2023044465W WO 2025126326 A1 WO2025126326 A1 WO 2025126326A1
Authority
WO
WIPO (PCT)
Prior art keywords
derivation
information processing
optimal solution
processing device
processes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/044465
Other languages
French (fr)
Japanese (ja)
Inventor
慧 竹村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to PCT/JP2023/044465
Publication of WO2025126326A1
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 99/00: Subject matter not provided for in other groups of this subclass

Definitions

  • The present invention relates to an information processing device, an information processing method, an information processing system, and a program.
  • A sequential decision-making technique is known in which the following process is repeated sequentially: a prediction (decision-making result) regarding demand, supply volume, or the like is derived, the result of executing the derived prediction is observed, and a further prediction is derived based on the observed result.
  • Patent Document 1 describes an optimal decision-making method for planning and evaluating capital investment plans that include uncertain factors.
  • One aspect of the present invention was made in consideration of the above problems, and one of its objectives is to provide a technology that can derive more appropriate decision-making results (optimal solutions).
  • An information processing device includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process for deriving a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • In an information processing method, an information processing device acquires output values obtained from each of one or more models, and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • A program according to one aspect of the present invention is a program that causes a computer to function as an information processing device. The program causes the computer to acquire output values obtained from each of one or more models, and to execute a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • An information processing system includes an information processing device and a terminal device.
  • The information processing device includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process for deriving a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The terminal device includes an execution means for executing the second optimal solution derived by the information processing device.
  • FIG. 1 is a block diagram showing a configuration of an information processing device according to an exemplary embodiment.
  • FIG. 2 is a flow diagram illustrating a flow of an information processing method according to an exemplary embodiment.
  • FIG. 3 is a diagram for explaining a process performed by an information processing device according to an exemplary embodiment.
  • FIG. 4 is a block diagram showing a configuration of an information processing system according to an exemplary embodiment.
  • FIG. 5 is a flow diagram illustrating a process flow of an information processing system according to an exemplary embodiment.
  • FIG. 6 is a block diagram showing a configuration of an information processing device according to an exemplary embodiment.
  • FIG. 7 is a diagram for explaining a process performed by an information processing device according to an exemplary embodiment.
  • FIG. 1 is a block diagram showing a configuration of an information processing device according to an exemplary embodiment.
  • FIG. 1 is a diagram for explaining a process performed by an information processing device according to an exemplary embodiment.
  • 11A to 11C are diagrams for explaining effects achieved by an information processing device according to an exemplary embodiment.
  • 1 is a diagram for explaining processing by an information processing device according to an application example of an exemplary embodiment.
  • FIG. 11 is a diagram showing an example of information referred to by an information processing device according to an application example of the exemplary embodiment.
  • FIG. 1 is a block diagram showing a configuration of a computer that functions as an information processing device according to each exemplary embodiment.
  • a first exemplary embodiment which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings.
  • This exemplary embodiment is the basic form of each exemplary embodiment described later.
  • the scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure to the extent that no particular technical obstacle occurs.
  • each technical means shown in the drawings referred to for explaining this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure to the extent that no particular technical obstacle occurs.
  • The information processing device 1 acquires an output value from each of a plurality of experts (models), and makes a decision by referring to the acquired output values.
  • The information processing device 1 according to this exemplary embodiment sequentially acquires output values and makes decisions. For example, in round t, an output value P1t is acquired from expert 1, an output value P2t is acquired from expert 2, and a decision-making result (t) for round t is derived by referring to the acquired output values P1t and P2t.
  • The information processing device 1 then updates the parameters used for decision-making, acquires the output values of the next round, and derives the decision-making result of the next round.
  • Here, t is an index that represents the number of repetitions, and can also be interpreted as an index indicating timing.
  • The "expert" may be hardware, software, or a living organism that outputs some kind of output value.
  • The "expert" may be hardware such as a predicted value derivation device that outputs a predicted value as an output value, software such as a predicted value derivation algorithm that outputs a predicted value as an output value, or a person who outputs a predicted value as an output value using some method.
  • The "expert" is not limited to one that outputs a predicted value, but may be a generation model that outputs some kind of generation result as an output value, or a control model that outputs some kind of control value as an output value.
  • The information processing device 1 may be configured to include an "expert" or may be configured to obtain an output value from an external "expert".
  • The "expert" is also called a "model", an "agent", or the like.
  • The "output value" may be anything.
  • The "output value" may be, for example, a predicted value related to demand or supply, or a predicted value related to other cases (events).
  • The "output value" does not have to be related to a prediction.
  • The "output value" in this exemplary embodiment may be an output value related to some parameter referenced by the information processing device 1.
  • the information processing device 1 according to this exemplary embodiment can be applied to the general process of making decisions regarding a target event by referring to output values from each of multiple experts.
  • Here, "intention" refers to some information related to the target event, and is not limited to the intention of a living organism (person).
  • A predicted value of future demand is an example of an "intention" determined by the information processing device 1 according to this exemplary embodiment, that is, of a "decision-making result" derived by the information processing device 1.
  • The information processing device 1 according to this exemplary embodiment can also be expressed as a decision-making device, a decision-making result derivation device, or the like.
  • The "decision-making result" is also called an "optimization solution" or an "optimization result".
  • A loss value can be provided (acquired) corresponding to the output value provided by each of the multiple experts.
  • Depending on the "decision-making result" of the information processing device 1, the loss value may or may not be obtainable by observation.
  • The information processing device 1 may derive a loss value that cannot be obtained by observation.
  • Which loss values can be "observed" for which "decision-making result" can be expressed by a directed graph structure called a feedback graph, but this is not a limitation of this exemplary embodiment.
  • the loss value can be expressed as the difference between the output value (predicted value) and the observed value (actual value), as an example, but this does not limit this exemplary embodiment.
  • the loss value may be the difference between the predicted value and another predetermined value.
  • the loss value may also be an estimated value related to the loss.
  • the term "loss value" may also include the concept of "reward.”
  • the loss value may be expressed as the reward value with the sign reversed (the reward value multiplied by a negative constant). Therefore, the loss value according to this exemplary embodiment may be read as the reward value.
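  • The two readings of the loss value described above can be sketched as follows. This is only an illustration: the function names, the use of the absolute difference, the `scale` parameter, and the numeric values are assumptions made for the example and do not appear in this disclosure.

```python
def loss_from_prediction(predicted: float, observed: float) -> float:
    """Loss as the (absolute) difference between an output value and the observed value.

    Using the absolute value is an assumption; the text only says "difference".
    """
    return abs(predicted - observed)


def loss_from_reward(reward: float, scale: float = 1.0) -> float:
    """Loss as the reward with its sign reversed (reward times a negative constant)."""
    if scale <= 0:
        raise ValueError("scale must be positive so that the constant is negative")
    return -scale * reward


# Hypothetical values: one expert predicts a demand of 120, the actual demand is 100.
predicted_demand = 120.0
observed_demand = 100.0
prediction_loss = loss_from_prediction(predicted_demand, observed_demand)
reward_loss = loss_from_reward(5.0)
print(prediction_loss, reward_loss)
```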
  • Fig. 1 is a block diagram showing the configuration of the information processing device 1. As shown in Fig. 1, the information processing device 1 includes an acquisition unit 11 and a derivation unit 12.
  • The acquisition unit 11 acquires output values obtained from each of the multiple models (experts).
  • The "output value" may be, for example, a predicted value related to demand or supply, or an output value related to other cases (events).
  • The "output value" may also be an output value related to some parameter referenced by the information processing device 1.
  • The acquisition unit 11 may be configured to acquire an output value for each round in the sequential decision-making process, but this does not limit the present exemplary embodiment.
  • The derivation unit 12 executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition unit 11, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The first derivation process is sometimes referred to as a base algorithm, and the second derivation process is sometimes referred to as a master algorithm, but these terms do not limit this exemplary embodiment.
  • Reliability is an index showing to what extent the output value of each expert is reflected in the decision-making process.
  • The reliability may also be expressed as an index showing to what extent the first optimal solution of each first derivation process is reflected in the decision-making process.
  • As an example, reliability can be expressed as a relative weight applied to the predicted value of each expert, or as a relative weight applied to each of the first optimal solutions.
  • the information processing device 1 acquires output values obtained from each of one or more models (experts) and executes a plurality of first derivation processes that derive a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the information processing device 1 according to this exemplary embodiment derives an optimal solution (decision-making) by hierarchical processing using reliability. Therefore, according to the information processing device 1 according to this exemplary embodiment, it is possible to derive a more appropriate decision-making result (optimal solution).
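  • The hierarchical derivation described above can be sketched as follows. This is a minimal illustration, not the method defined in this disclosure: the combination rule (a reliability-weighted average), the choice of mean and median as the two first derivation processes, and all names and numeric values are assumptions made for the example.

```python
from statistics import mean, median
from typing import Sequence


def second_derivation(first_solutions: Sequence[float],
                      reliabilities: Sequence[float]) -> float:
    """Combine first optimal solutions using their reliabilities as relative weights."""
    if not first_solutions or len(first_solutions) != len(reliabilities):
        raise ValueError("need one reliability per first optimal solution")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    return sum(w * p for w, p in zip(reliabilities, first_solutions)) / total


# Hypothetical output values acquired from three experts (models).
expert_outputs = [90.0, 100.0, 140.0]

# Two hypothetical first derivation processes (base algorithms).
first_solutions = [mean(expert_outputs), median(expert_outputs)]

# Hypothetical reliabilities: the second process is trusted three times as much.
reliabilities = [1.0, 3.0]

second_solution = second_derivation(first_solutions, reliabilities)
print(first_solutions, second_solution)
```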
  • Fig. 2 is a flow diagram showing the flow of the information processing method S1.
  • In step S11, the acquisition unit 11 acquires output values obtained from one or more models (experts).
  • In step S12, the derivation unit 12 executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11, and executes a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The information processing device 1 performs the processes of steps S11 and S12 in one round, and then performs the processes of steps S11 and S12 again in the next round.
  • FIG. 3 is a diagram for illustrating a sequential decision-making process by the information processing method S1 according to this exemplary embodiment.
  • In a certain round, the information processing device 1 acquires the output value provided by each of multiple experts. A decision-making result (optimal solution) is then derived by referring to the acquired output values. The derived decision-making result (optimal solution) is executed, and each expert provides an output value for the next round. In addition, a loss value corresponding to the next round can be observed.
  • Depending on the decision-making result (optimal solution) of the information processing device 1, the loss value may or may not be obtainable by observation.
  • The information processing device 1 may derive a loss value that cannot be obtained by observation. In this way, the information processing device 1 sequentially derives decision-making results.
  • The information processing method S1 acquires output values from each of one or more models (experts), and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the information processing method S1 according to this exemplary embodiment derives an optimal solution (decision-making) by hierarchical processing using reliability. Therefore, according to the information processing method S1 according to this exemplary embodiment, a more appropriate decision-making result (optimal solution) can be derived.
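  • The round-by-round repetition of steps S11 and S12 can be sketched as a loop. This disclosure does not fix how the reliabilities are updated at this point, so the sketch swaps in a standard exponential-weights (Hedge-style) update as an assumption. It also assumes that the loss of every first derivation process is observed in every round, and uses `min` and `max` as stand-in base algorithms. All names and values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def update_reliabilities(weights: Sequence[float],
                         losses: Sequence[float],
                         learning_rate: float = 0.5) -> List[float]:
    """Exponential-weights update followed by normalization (assumed update rule)."""
    scaled = [w * math.exp(-learning_rate * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_rounds(outputs_per_round: Sequence[Sequence[float]],
               observed_per_round: Sequence[float],
               base_algorithms: Sequence[Callable[[Sequence[float]], float]]
               ) -> Tuple[List[float], List[float]]:
    """Repeat acquisition (S11) and derivation (S12) once per round."""
    weights = [1.0 / len(base_algorithms)] * len(base_algorithms)
    decisions = []
    for outputs, observed in zip(outputs_per_round, observed_per_round):
        # First derivation processes: one first optimal solution per base algorithm.
        first_solutions = [algorithm(outputs) for algorithm in base_algorithms]
        # Second derivation process: reliability-weighted combination.
        decisions.append(sum(w * p for w, p in zip(weights, first_solutions)))
        # Loss of each first optimal solution, here the absolute error (assumed).
        losses = [abs(p - observed) for p in first_solutions]
        weights = update_reliabilities(weights, losses)
    return decisions, weights


decisions, final_weights = run_rounds(
    outputs_per_round=[[8.0, 12.0], [9.0, 13.0], [10.0, 14.0]],
    observed_per_round=[8.0, 9.0, 10.0],
    base_algorithms=[min, max],
)
print(decisions, final_weights)
```

  • In this toy run the first base algorithm always matches the observed value, so its reliability grows from round to round and the combined decision moves toward it.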
  • Fig. 4 is a block diagram showing the configuration of the information processing system 100.
  • The information processing system 100 includes an information processing device 1 and a terminal device 2 that are communicably connected to each other.
  • Each component of the information processing device 1 has been described above, and therefore description thereof will be omitted here.
  • the terminal device 2 includes an execution unit 21 and a loss value acquisition unit 22.
  • the execution unit 21 executes the decision-making result (optimal solution) derived by the information processing device 1, or a process corresponding to the decision-making result (optimal solution). As an example, if the decision-making result predicts X units of product A as today's demand, the execution unit 21 places an order for X units of product A.
  • Fig. 5 is a flow diagram showing the flow of the information processing method S100 executed by the information processing system 100.
  • In step S11-1, the acquisition unit 11 acquires output values obtained from each of a plurality of experts (models).
  • In step S12-1, the derivation unit 12 executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11-1, and executes a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • In step S21-1, the execution unit 21 of the terminal device 2 executes the decision-making result derived in step S12-1 (more specifically, the second optimal solution) or a process corresponding to the decision-making result.
  • In step S22-1, the terminal device 2 provides the result of the execution to the information processing device 1. If a loss value is obtained as a result of executing the decision-making result, the result of the execution may include the loss value.
  • each predicted value may be, for example, the loss value acquired by the loss value acquisition unit 22 in step S22-1 or the loss value derived by the information processing device 1.
  • In step S12-2, the derivation unit 12 executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11-2, and executes a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The decision-making result derived in step S12-2 (more specifically, the second optimal solution) is provided to the terminal device 2 and executed in step S21-2.
  • The information processing method S100 acquires output values from each of one or more models (experts), and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the information processing method S100 according to this exemplary embodiment derives an optimal solution (decision-making) by hierarchical processing using reliability. Therefore, according to the information processing method S100 according to this exemplary embodiment, a more appropriate decision-making result (optimal solution) can be derived.
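  • The exchange between the information processing device 1 and the terminal device 2 in one round (steps S11-1, S12-1, S21-1, and S22-1) can be sketched as follows. The class and method names, the use of `min` and `max` as first derivation processes, the weighted-average combination, and the demand figures are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class InformationProcessingDevice:
    """Hypothetical stand-in for the information processing device 1."""
    reliabilities: List[float]

    def acquire(self, experts: Sequence[Callable[[], float]]) -> List[float]:
        """Step S11-1: acquire one output value from each expert."""
        return [expert() for expert in experts]

    def derive(self, outputs: Sequence[float]) -> float:
        """Step S12-1: first derivation processes, then the second derivation process."""
        first_solutions = [min(outputs), max(outputs)]  # assumed base algorithms
        total = sum(self.reliabilities)
        return sum(w * p for w, p in zip(self.reliabilities, first_solutions)) / total


@dataclass
class TerminalDevice:
    """Hypothetical stand-in for the terminal device 2."""
    actual_demand: float
    orders: List[float] = field(default_factory=list)

    def execute(self, decision: float) -> float:
        """Steps S21-1 and S22-1: place the order and report the resulting loss."""
        self.orders.append(decision)
        return abs(decision - self.actual_demand)


experts = [lambda: 95.0, lambda: 105.0]
device = InformationProcessingDevice(reliabilities=[1.0, 1.0])
terminal = TerminalDevice(actual_demand=100.0)

outputs = device.acquire(experts)
decision = device.derive(outputs)
loss = terminal.execute(decision)
print(outputs, decision, loss)
```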
  • Exemplary embodiment 2: A second exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same functions as those described in the above exemplary embodiment will be given the same reference numerals, and their description will be omitted as appropriate.
  • the scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure, as long as no particular technical hindrance occurs. In addition, each technical means shown in each drawing referred to for explaining this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure, as long as no particular technical hindrance occurs.
  • Fig. 6 is a block diagram showing the configuration of the information processing system 100A.
  • the information processing system 100A includes an information processing device 1A and a terminal device 2A.
  • the information processing device 1A and the terminal device 2A are configured to be able to communicate with each other via a network N.
  • the specific configuration of the network N does not limit this exemplary embodiment, but as an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public line network, a mobile data communication network, or a combination of these networks can be used.
  • Fig. 6 is a block diagram showing the configuration of the information processing device 1A.
  • the communication unit 16A communicates with devices external to the information processing device 1A. As an example, the communication unit 16A communicates with the terminal device 2A. The communication unit 16A transmits data supplied from the control unit 10A to the terminal device 2A, and supplies data received from the terminal device 2A to the control unit 10A.
  • The storage unit 15A stores various information referenced by the control unit 10A and various information derived by the control unit 10A. As an example, the storage unit 15A stores:
  • output value information PI including output values by each of a plurality of experts;
  • loss value information LI including loss values corresponding to the output values by each of the plurality of experts;
  • reliability information CI indicating at least one of the reliability of each of the multiple experts and the reliability of each of the multiple first derivation units 121-1, 121-2, ... described later; and
  • a second optimal solution (second decision-making result) DR2 derived by a second derivation unit 122 described later.
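  • One possible in-memory layout for the information held by the storage unit 15A is sketched below. The field names, the keying by round number, and the use of `None` for loss values that could not be observed are assumptions made for illustration. Only the labels PI, LI, CI, and DR2 come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageUnit:
    """Hypothetical container mirroring the information stored in the storage unit 15A."""
    # PI: round number -> output value of each expert in that round.
    output_values: Dict[int, List[float]] = field(default_factory=dict)
    # LI: round number -> loss value per expert (None when it was not observed).
    loss_values: Dict[int, List[Optional[float]]] = field(default_factory=dict)
    # CI: reliability of each first derivation unit (or of each expert).
    reliabilities: List[float] = field(default_factory=list)
    # DR2: most recently derived second optimal solution.
    second_solution: Optional[float] = None


storage = StorageUnit(reliabilities=[0.5, 0.5])
storage.output_values[1] = [95.0, 105.0]
storage.loss_values[1] = [5.0, None]  # the second loss was not observable
storage.second_solution = 100.0
print(storage)
```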
  • The information processing system 100A may be configured to include an "expert (model)" or may be configured to obtain the above output values from an "expert (model)" external to the system.
  • Control unit 10A: As shown in FIG. 6, the control unit 10A includes an acquisition unit 11 and a derivation unit 12.
  • the acquiring unit 11 acquires output values obtained from each of a plurality of experts (models) in the same manner as in the exemplary embodiment 1.
  • the acquiring unit 11 may further acquire the loss value.
  • the output value and the loss value have been explained in the exemplary embodiment 1, and therefore the explanation thereof will be omitted. Specific examples of the output value and the loss value will be explained later.
  • Each first derivation unit 121-j (j is an index for distinguishing the first derivation units from one another) derives a first optimal solution (first decision-making result) (p_t(j)) by referring to each output value (m_t) acquired by the acquisition unit 11 in round t.
  • The second derivation unit 122 derives a second optimal solution (second decision-making result) in accordance with the first optimal solution (first decision-making result) (p_t(j)) derived by each of the multiple first derivation units 121-1, 121-2, ... and the reliability (w_t(j)) of each of the multiple first derivation units.
  • the first derivation process may be referred to as a base algorithm, and the second derivation process may be referred to as a master algorithm, but these terms do not limit this exemplary embodiment.
  • Each of the first derivation units 121-1, 121-2, ... may be simply referred to as the first derivation unit 121.
  • the first derivation process and the second derivation process may also be referred to as online machine learning processes (online learning algorithms) that refer to the output values that are sequentially obtained.
  • the reliability is an index indicating the extent to which the output values by each expert (model) are reflected in the decision-making process.
  • the reliability may be expressed as an index indicating the extent to which the first optimal solutions by each first derivation process are reflected in the decision-making process.
  • the reliability can be expressed as a relative weight calculated on the predicted values by each expert, or as a relative weight calculated on each of the first optimal solutions.
  • the derivation unit 12 may derive the reliability by referring to the output value acquired by the acquisition unit 11 and at least one of the first optimal solutions derived by each of the multiple first derivation processes.
  • the reliability derivation process may be executed as part of the second derivation process.
  • the reliability is derived by referring to at least one of the output value and the first optimal solution, so that a suitable reliability can be derived.
  • a suitable second optimal solution can be derived.
  • a more specific example of the reliability derivation process will be described later.
  • the derivation unit 12 may also be configured to initialize the reliability at a predetermined timing.
  • the derivation unit 12 may be configured to initialize the reliability (set the reliability to an initial value) every time the second optimal solution is derived a predetermined number of times (e.g., 100 times). In this way, the derivation unit 12 can appropriately respond to changes in the environment by initializing the reliability at a predetermined timing.
  • the initialization of the reliability may be executed as part of the process of restarting at least one of the first derivation process and the second derivation process described above.
  • The derivation unit 12 may also determine the timing for initializing the reliability depending on the period of the environmental change. As an example, when the decision-making process by the information processing system 100A according to this exemplary embodiment is applied to a situation in which the environment changes approximately once a month, the derivation unit 12 may be configured to initialize the reliability every month.
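  • The periodic initialization of the reliability described above can be sketched as follows. The class name, the uniform initial value, and the method names are assumptions made for illustration. The period of 100 derivations follows the example given above.

```python
from typing import List


class ReliabilityTracker:
    """Hypothetical helper that resets reliabilities after a fixed number of derivations."""

    def __init__(self, n_processes: int, reset_every: int = 100) -> None:
        if n_processes <= 0 or reset_every <= 0:
            raise ValueError("n_processes and reset_every must be positive")
        self.n_processes = n_processes
        self.reset_every = reset_every
        self.derivation_count = 0
        self.weights: List[float] = []
        self.reset()

    def reset(self) -> None:
        """Set every reliability back to its initial value (assumed uniform)."""
        self.weights = [1.0 / self.n_processes] * self.n_processes

    def record_derivation(self) -> bool:
        """Call once per derived second optimal solution; returns True if a reset occurred."""
        self.derivation_count += 1
        if self.derivation_count % self.reset_every == 0:
            self.reset()
            return True
        return False


tracker = ReliabilityTracker(n_processes=2, reset_every=3)
tracker.weights = [0.9, 0.1]  # reliabilities learned so far (hypothetical values)
reset_flags = [tracker.record_derivation() for _ in range(3)]
print(reset_flags, tracker.weights)
```

  • A calendar-based schedule (for example, monthly) could replace the counter when the period of the environmental change is known in advance.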
  • the derivation unit 12 may also perform a process of estimating a loss value corresponding to the first optimal solution.
  • the process of estimating the loss value may be executed as a part of the above-mentioned second derivation process.
  • the derivation unit 12 may also perform a process of updating the first optimal solution by referring to the derived loss value in the above-mentioned first derivation process.
  • the loss value corresponding to the output value may be obtainable by observation in some cases, but may not be obtainable in other cases.
  • the derivation unit 12 estimates a loss value corresponding to the first optimal solution and updates the first optimal solution by referring to the estimated loss value, so that even if the loss value corresponding to the output value cannot be obtained by observation, it is possible to derive a suitable optimal solution (decision-making result).
  • The derivation unit 12 may also be configured to calculate the loss value as an unbiased estimator. With this configuration, the optimal solution is derived by referring to the loss value calculated as an unbiased estimator, so that a more suitable optimal solution (decision-making result) can be derived.
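  • As an illustration of what an unbiased loss estimate can look like, the sketch below uses the standard importance-weighted estimator from the online-learning literature. This disclosure does not state which estimator is used, so the choice is an assumption, as are the function names and the numbers. The sketch assumes that only the loss of the chosen option is observed and that options are chosen at random with known probabilities.

```python
import math
from typing import List, Sequence


def importance_weighted_estimate(chosen: int,
                                 probs: Sequence[float],
                                 observed_loss: float) -> List[float]:
    """Estimate all losses from the single observed loss of the chosen option."""
    estimate = [0.0] * len(probs)
    estimate[chosen] = observed_loss / probs[chosen]
    return estimate


def expected_estimate(probs: Sequence[float],
                      true_losses: Sequence[float]) -> List[float]:
    """Exact expectation of the estimator over the random choice of option."""
    expectation = [0.0] * len(probs)
    for chosen, p in enumerate(probs):
        estimate = importance_weighted_estimate(chosen, probs, true_losses[chosen])
        for j, value in enumerate(estimate):
            expectation[j] += p * value
    return expectation


probs = [0.2, 0.3, 0.5]        # hypothetical probabilities of choosing each option
true_losses = [1.0, 0.4, 0.7]  # hypothetical losses, unknown to the learner
expectation = expected_estimate(probs, true_losses)
is_unbiased = all(math.isclose(e, l) for e, l in zip(expectation, true_losses))
print(expectation, is_unbiased)
```

  • Dividing by the selection probability compensates for the rounds in which an option's loss goes unobserved, which is why the expectation of the estimate equals the true loss vector.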
  • the derivation unit 12 may further refer to parameters for correcting the loss value or the reliability to derive the decision-making result.
  • the terminal device 2A includes a control unit 20A, an execution unit 21, and a communication unit 26.
  • the terminal device 2A can be specifically realized as a checkout terminal installed in a store, an inventory management terminal installed in a warehouse, or the like, but this is not intended to limit the present exemplary embodiment.
  • the communication unit 26 communicates with devices external to the terminal device 2A. As an example, the communication unit 26 communicates with the information processing device 1A. The communication unit 26 transmits data supplied from the control unit 20A to the information processing device 1A, and supplies data received from the information processing device 1A to the control unit 20A.
  • The execution unit 21 executes the decision-making result derived by the derivation unit 12 or a process corresponding to the decision-making result. As an example, the execution unit 21 executes the second decision-making result (second optimal solution) derived by the derivation unit 12.
  • the execution unit 21 may be configured to execute at least a part of the multiple first decision-making results derived by the derivation unit 12 instead of or together with the second decision-making result derived by the derivation unit 12.
  • the execution unit 21 may be configured to be able to display the second decision-making result derived by the derivation unit 12 and at least a part of the multiple first decision-making results derived by the derivation unit 12, or may be configured to display the second decision-making result and the multiple first decision-making results so that they can be compared with each other.
  • the control unit 20A includes a loss value acquisition unit 22 and a loss value providing unit 23.
  • the loss value acquisition unit 22 acquires the loss value.
  • the loss value providing unit 23 provides the loss value acquired by the loss value acquisition unit 22 to the information processing device 1A via the communication unit 26.
  • FIG. 7 is a diagram for explaining the sequential decision-making process by the information processing system 100A according to this exemplary embodiment.
  • the information processing device 1A acquires output values associated with each expert (model) for multiple experts, and derives a decision-making result by referring to each acquired output value.
  • the decision-making process performed by the information processing device 1A is, as an example, a hierarchical decision-making process by multiple first derivation units 121-1, 121-2, ... and a second derivation unit 122, as described above.
  • the derived decision-making result (optimal solution) is executed, and a loss value corresponding to the next round can be observed. If a loss value can be obtained by observation, the loss value and an output value corresponding to the loss value are provided to the information processing device 1A and are referenced in the decision-making process in the next round. In this way, the information processing system 100A sequentially derives decision-making results.
  • Algorithm 1 is mainly executed by each of the multiple first derivation units 121-1, 121-2, ..., and algorithm 2 is mainly executed by the second derivation unit 122.
  • the acquisition unit 11 acquires a graph G, a parameter η, and a parameter T.
  • the graph G is defined by a set of vertices V and a set of edges (sides, links) E.
  • the graph G is, as an example, a directed graph called a feedback graph.
  • each vertex V corresponds to a possible option that the decision-making result can take, and each edge indicates the observability of a loss.
  • a directed edge E(i,j) from vertex V(i) to vertex V(j) indicates that if the decision-making result indicates option i (if the player selects option i), then the loss value associated with option j is observable.
  • the top part of Fig. 8 shows Example 1 of a feedback graph.
  • This example is a feedback graph that corresponds to the problem setting of "If the apples are delicious, it is better to ship them, but if they are not, it is better not to ship them.”
  • the feedback graph of this example has option V(1) of not shipping the apples and option V(2) of shipping the apples. If option V(1) of not shipping the apples is selected, the apples can be tasted and it is possible to know whether they are delicious or not. Therefore, it is possible to know whether it would have been better to ship the apples or not.
  • on the other hand, if option V(2), which involves shipping the apples, is selected, the apples cannot be tasted and therefore it is not known whether they are delicious or not.
  • that is, if option V(2) is selected, no loss value can be observed. This corresponds to the absence of an outward arrow (outward edge) in the feedback graph that has option V(2) as its origin (starting point).
  • the lower part of Fig. 8 shows Example 2 of a feedback graph.
  • This example is a feedback graph that corresponds to the problem setting of "wanting to order an appropriate amount of a certain product."
  • the feedback graph of this example has option V(1) of ordering 300 units, option V(2) of ordering 200 units, and option V(3) of ordering 100 units. It shows that when a larger amount is ordered, the loss value for ordering a smaller amount can be observed, but when a smaller amount is ordered, the loss value for ordering a larger amount may not be observable.
  • bandit feedback corresponds to the case where each of the multiple vertices has only a self-loop.
  • full information feedback corresponds to the case where all pairs of the multiple vertices are bidirectionally connected and all vertices have self-loops.
  • since algorithm 1 according to this exemplary embodiment is configured to be applicable to any feedback graph, the information processing system 100A according to this exemplary embodiment can be applied to a very wide range of problem settings.
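  • as an illustration only, the feedback graphs discussed above can be written as adjacency maps. The following is a minimal sketch, not the implementation of this exemplary embodiment: the type alias FeedbackGraph, the function names, and the concrete edge sets chosen for Example 1 and Example 2 (including the self-loops) are assumptions read off the textual description of Fig. 8.

```python
from typing import Dict, Set

# Option i -> set of options whose loss becomes observable when i is selected
# (the out-neighbourhood of vertex V(i)). Hypothetical representation.
FeedbackGraph = Dict[int, Set[int]]


def out_neighbors(graph: FeedbackGraph, i: int) -> Set[int]:
    """Options whose loss is observable after selecting option i."""
    return set(graph.get(i, set()))


def in_neighbors(graph: FeedbackGraph, i: int) -> Set[int]:
    """Options whose selection makes the loss of option i observable."""
    return {j for j, targets in graph.items() if i in targets}


def bandit_graph(k: int) -> FeedbackGraph:
    """Bandit feedback: every vertex has only a self-loop."""
    return {i: {i} for i in range(1, k + 1)}


def full_information_graph(k: int) -> FeedbackGraph:
    """Full-information feedback: all pairs connected, all self-loops present."""
    everyone = set(range(1, k + 1))
    return {i: set(everyone) for i in range(1, k + 1)}


# Example 1 (apples). Assumed edges: not shipping (1) reveals the loss of both
# options; shipping (2) reveals nothing, so V(2) has no outgoing edge.
apples: FeedbackGraph = {1: {1, 2}, 2: set()}

# Example 2 (ordering). Assumed edges: a larger order reveals the loss of itself
# and of every smaller order (1 = 300 units, 2 = 200 units, 3 = 100 units).
ordering: FeedbackGraph = {1: {1, 2, 3}, 2: {2, 3}, 3: {3}}
```

  • with this representation, out_neighbors(apples, 2) is empty, which mirrors the absence of an outward edge from option V(2).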
  • the parameter η acquired by the acquisition unit 11 serves as a learning rate that is referenced to derive the reliability, as described below.
  • the parameter T acquired by the acquisition unit 11 is a natural number that specifies the total number of rounds.
  • the first derivation unit 121 determines the initial value of a parameter (weighting factor) p′ t to be referenced in order to derive a first optimal solution (first decision-making result) p t as follows:
  • K represents the total number of options that can be taken as the first optimal solution (first decision-making result), and is also the total number of vertices
  • the parameter (weighting factor) p' t referred to in order to derive p t can be considered as a parameter representing the reliability of each output value (m t ) of each expert (model).
  • this interpretation does not limit this exemplary embodiment.
  • [K] is the index set defined by $[K] = \{1, 2, \ldots, K\}$ (i.e., a set whose elements are the natural numbers 1 through K).
  • the first derivation unit 121 executes the loop process specified by "2:” to “7:” in algorithm 1.
  • the first derivation unit 121 repeats the process specified in "3:” to "7:” in algorithm 1 for each round.
  • the first derivation unit 121 derives the function ψ(p) as follows:
  • the function ψ(p) serves as a convex function that defines the Bregman divergence, which will be described later.
  • p(i) is defined as follows:
  • $N^{\mathrm{in}}(i)$ in the definition of the function ψ(p) is expressed as $N^{\mathrm{in}}(i) = \{\, j \in [K] \mid \text{edge } E(j,i) \text{ exists} \,\}$ (in other words, for a vertex V(i) with vertex number i, a set whose elements are the vertex numbers j of the vertices V(j) that are the starting points of the edges entering the vertex V(i)), and
  • $|N^{\mathrm{in}}(i)|$ indicates the number of elements of the set.
  • the function 1[ ] in the definition of the function ψ(p) is a function that returns 1 if the condition in [ ] is satisfied.
  • the second term of the function ψ(p) is also called the logarithmic barrier term.
  • the Bregman divergence that the first derivation unit 121 according to this exemplary embodiment refers to in order to derive the first optimal solution is described by a convex function ψ(p), and the convex function ψ(p) includes the logarithmic barrier term described above.
  • By using such a logarithmic barrier term, it is possible to perform decision-making processing that can be flexibly applied to various environments (various problem settings (various feedback graphs)).
  • the acquisition unit 11 acquires the output value (m t ) output by each expert (model).
  • the output value output by each expert (model) has been described above, so a description thereof will be omitted here.
  • the first term on the right-hand side of (Equation 8) represents the inner product of m_t and p, and the second term on the right-hand side represents the Bregman divergence defined using the function ψ.
  • Δ′_K in (Equation 8) is the set defined as follows:
  • in other words, the first derivation unit 121 derives, as the first optimal solution (first decision-making result), the p that minimizes the linear sum of the inner product ⟨m_t, p⟩ of m_t and p and the Bregman divergence D_ψ(p, p′_t) defined by p, p′_t, and ψ.
  • the first derivation unit 121 executes a first derivation process (base algorithm) that derives a first optimal solution ( pt ) by referring to the output values ( mt ) obtained from each of one or more models (experts).
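  • the minimization described above can be sketched numerically under simplifying assumptions. The exact convex function and the exact feasible set of this exemplary embodiment are not reproduced in this text, so the sketch below assumes a regularizer consisting only of a logarithmic barrier, psi(p) = -(1/eta) * sum_i log p(i), and minimizes over the plain probability simplex. The function names and the bisection-based solver are illustrative choices, not the method of the embodiment.

```python
import math
from typing import List


def log_barrier_bregman(p: List[float], q: List[float], eta: float) -> float:
    """Bregman divergence D(p, q) of psi(p) = -(1/eta) * sum_i log p(i)."""
    return sum((pi / qi - 1.0 - math.log(pi / qi)) / eta for pi, qi in zip(p, q))


def first_derivation_step(m: List[float], p_prime: List[float], eta: float,
                          iterations: int = 200) -> List[float]:
    """Minimise <m, p> + D(p, p_prime) over the probability simplex.

    Stationarity gives 1/p(i) = 1/p_prime(i) + eta * (m(i) + lam); the
    multiplier lam is found by bisection so that the p(i) sum to one.
    """
    def candidate(lam: float) -> List[float]:
        return [1.0 / (1.0 / qi + eta * (mi + lam)) for mi, qi in zip(m, p_prime)]

    # At `lo` one denominator is zero (excluded); at `hi` every p(i) <= p_prime(i).
    lo = max(-1.0 / (eta * qi) - mi for mi, qi in zip(m, p_prime))
    hi = -min(m)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # floating-point resolution exhausted
        if sum(candidate(mid)) > 1.0:
            lo = mid  # total mass too large: increase the multiplier
        else:
            hi = mid
    p = candidate(hi)
    total = sum(p)
    return [pi / total for pi in p]  # remove the remaining rounding error
```

  • in this sketch, an output value m of all zeros leaves the weights unchanged, while a larger m(i) for one option moves probability mass away from that option.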
  • the information processing system 100A executes the first optimal solution p t and observes the loss value l t associated with the execution of the optimal solution.
  • the acquisition unit 11 acquires a parameter ⁇ t for correction.
  • the execution of the first optimal solution p t may be executed by the execution unit 21 of the terminal device 2A, as an example.
  • the acquisition of the loss value is executed by the loss value acquisition unit 22, as an example.
  • at least a part of the derived first optimal solution p_t may be supplied to the second derivation unit 122 that executes algorithm 2, which will be described later, without being executed.
  • in that case, the loss value l_t cannot be observed, so at least a part of the loss value l_t and the correction parameter may use values derived by algorithm 2, which will be described later.
  • the first derivation unit 121 refers to the loss value l t and updates the parameter (p' t ) for deriving the first optimal solution (p t ).
  • a_t is defined by (Equation 12). More specifically, a_t is a parameter defined by the loss l_t(i), the output value m_t(i) of the expert (model), the correction parameter, and the learning rate η.
  • the correction parameter ⁇ t can be considered as a parameter for correcting at least one of the loss value l t , the output value m t , the parameter p′ t , and the first optimal solution p t .
  • the parameter a t defined by (Equation 12) can also be considered as a parameter for correction in the same sense.
  • the first derivation unit 121 can be considered to be configured to update the first optimal solution (p t ) by further referring to the parameter ⁇ t or a t for correction.
  • algorithm 2 is executed with reference to base algorithm B.
  • base algorithm B refers to algorithm 1 described above, as an example.
  • Algorithm 2 is executed with reference to multiple base algorithms B.
  • the second derivation unit 122 executes algorithm 2 in cooperation with each of the first derivation units 121 that execute each of the multiple algorithms 1.
  • the second derivation unit 122 uses the total number of rounds T to derive the parameter M.
  • the parameter M has the meaning of the total number of base algorithms (algorithm 1) that algorithm 2 refers to, but this does not limit the present exemplary embodiment.
  • the second derivation unit 122 calculates the parameter η(j) as follows:
  • the index j is an index for distinguishing a plurality of base algorithms from each other, and as an example, takes an integer value from 1 to M, as shown in "1:" of algorithm 2.
  • the parameter η(j) serves as a learning rate referenced to derive the reliability (reliability of each base algorithm) w_t(j), which will be described later. Therefore, it can be expressed that the second derivation unit 122 sets the learning rate η(j) referenced to derive the reliability w_t(j) to a value proportional to the -1/2 power of the total number M of base algorithms (first derivation processes).
  • each of the first derivation parts 121-1, 121-2, . . . may be expressed as 121-j using the above-mentioned index j.
  • ⁇ M denotes a set obtained by replacing K with M in the following definition.
  • the second derivation unit 122 initializes each base algorithm B_j. This initialization includes passing the values G, η(j), and T to each base algorithm B_j.
  • the second derivation unit 122 executes the loop process specified by "4:” to “13:” of algorithm 2.
  • the second derivation unit 122 repeats the process specified by "5:" to "13:" of algorithm 2 for each round.
  • the acquisition unit 11 acquires a predicted value m_t.
  • the second derivation unit 122 supplies the predicted value m t to each base algorithm (each algorithm 1) via each first derivation unit 121-j.
  • the acquisition unit 11 acquires each first optimal solution (first decision-making result) p t,j by each base algorithm (each algorithm 1). Then, as shown in “6:” of Algorithm 2, the second derivation unit 122 calculates the parameter h t (j) as follows: Here, ⁇ ,> denotes the dot product.
  • the second derivation unit 122 derives a reliability vector w_t indicating the reliability of each base algorithm (each algorithm 1) (each first derivation unit 121-j). Here, D_ψ in the expression for w_t represents the above-mentioned Bregman divergence, although the convex function ψ that defines it is given separately for algorithm 2. As described above, w′_t is a weighting factor that is referred to in order to derive the reliability w_t. In this manner, the second derivation unit 122 derives the reliability (w_t(j)) by referring to the output value (m_t) and the first optimal solution (p_t).
  • the second derivation unit 122 derives a second optimal solution (second decision-making result) p_t by using the first optimal solution (first decision-making result) p_{t,j} of each base algorithm (each algorithm 1) and the reliability w_t(j), which is each component of the reliability vector w_t indicating the reliability of each base algorithm, as the weighted sum $p_t = \sum_{j=1}^{M} w_t(j)\, p_{t,j}$.
  • the second derivation unit 122 executes a second derivation process (master algorithm) that derives a second optimal solution (p t ) in accordance with the first optimal solution (p t,j ) derived by each of the multiple first derivation processes (base algorithms) described above and the reliability (w t ( j )) of each of the multiple first derivation processes (base algorithms).
  • the second derivation unit 122 derives the second optimal solution (second decision-making result) p t by a weighted sum of each of the first optimal solutions (first decision-making result) p t,j derived by the multiple first derivation units 121- j , the weighted sum corresponding to the reliability w t (j) of each of the multiple first derivation units 121-j.
  • the second derivation unit 122 derives the second decision-making result p t in accordance with the first decision-making result (p t,j ) derived by each of the multiple first derivation units 121-j and the reliability (w t (j)) of each of the multiple first derivation units 121- j .
  • the first decision-making result is made with reference to the output value (m t ) provided by each expert (model) as described above. Therefore, according to the above configuration, a suitable decision-making result can be derived by hierarchical processing by the first derivation unit 121 and the second derivation unit 122 with reference to the output value provided by each expert (model).
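  • the reliability-weighted sum used by the second derivation process can be sketched as follows. This is only an illustration of the weighted sum described above; the function name second_optimal_solution and the list-based representation of p_{t,j} and w_t are assumptions, and the derivation of the reliability itself is not covered.

```python
from typing import List


def second_optimal_solution(base_solutions: List[List[float]],
                            reliability: List[float]) -> List[float]:
    """Mix the base distributions p_{t,j} with the weights w_t(j)."""
    if len(base_solutions) != len(reliability):
        raise ValueError("one reliability value is needed per base algorithm")
    k = len(base_solutions[0])
    mixed = [0.0] * k
    for w_j, p_j in zip(reliability, base_solutions):
        for i in range(k):
            mixed[i] += w_j * p_j[i]  # contribution of base algorithm j to option i
    return mixed
```

  • if each p_{t,j} is a probability distribution over the K options and the reliabilities sum to one, the result is again a probability distribution, so it can be used directly to select an option.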
  • the second derivation unit 122 identifies an option i t according to the second optimal solution p t . More specifically, the second derivation unit 122 selects an option i t with a probability according to the probability distribution indicated by the second optimal solution p t .
  • here, i_t denotes the option i selected in round t.
  • $N^{\mathrm{out}}(i)$ in the formula satisfies $N^{\mathrm{out}}(i) = \{\, j \in [K] \mid \text{edge } E(i,j) \text{ exists} \,\}$. In other words, $N^{\mathrm{out}}(i)$ is a set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from the vertex V(i) with the vertex number i.
  • an option i t selected according to the derived second optimal solution (second decision-making result) p t is executed by the execution unit 21 of the terminal device 2A, for example.
  • a loss value l t corresponding to the second optimal solution p t is acquired by the loss value acquisition unit 22 of the terminal device 2A and provided to the information processing device 1A.
  • N^out(i_t) is a set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from the vertex V(i_t) with the vertex number i_t, so the loss value l_t corresponding to the option i_t is an observable loss value.
  • the second derivation unit 122 refers to the loss value l_t acquired by observation in "10:" of Algorithm 2 and the output value m_t of the expert (model), and calculates the loss value with a hat ($\hat{l}_t$).
  • P_t(i) is calculated by using the first optimal solution p_t(j) derived by each of the first derivation processes (base algorithms).
  • the first term on the right side of (Equation 26) is a function that returns 1 if the condition in [ ] is satisfied, i.e., if index i is an element of set N^out(i_t) (in other words, if loss value l_t is obtained by observation), and returns 0 otherwise.
  • the loss value with a hat ($\hat{l}_t$) derived by the second derivation unit 122 in this manner satisfies $E_t[\hat{l}_t] = l_t$ (where E_t[ ] represents the expected value). That is, the second derivation unit 122 derives the loss value with a hat ($\hat{l}_t$) as an unbiased estimate. In this way, the second derivation unit 122 can suitably derive the loss value with a hat ($\hat{l}_t$) whether the loss value l_t is obtained by observation or not, and therefore can suitably perform decision-making (derive an optimal solution) whether the loss value l_t is obtained by observation or not.
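  • one common way to obtain an unbiased loss estimate of this kind is importance weighting around the expert output value. The sketch below is an assumption and not the expression of this exemplary embodiment, whose equations are not reproduced in this text: it uses the estimate m_t(i) + 1[i in N^out(i_t)] * (l_t(i) - m_t(i)) / P_t(i), where P_t(i) is taken to be the probability that the loss of option i is observed. Options are numbered from 0, the graph representation and all function names are hypothetical, and every option is assumed to have P_t(i) > 0.

```python
import random
from typing import Dict, List, Set

# Option -> options whose loss is observable after choosing it (0-based indices).
FeedbackGraph = Dict[int, Set[int]]


def sample_option(p: List[float], rng: random.Random) -> int:
    """Draw option i_t with the probabilities given by the distribution p."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]


def observation_probability(graph: FeedbackGraph, p: List[float], i: int) -> float:
    """P_t(i): total probability of the options that reveal the loss of option i."""
    return sum(p[j] for j, targets in graph.items() if i in targets)


def estimate_losses(graph: FeedbackGraph, p: List[float], m: List[float],
                    chosen: int, observed: Dict[int, float]) -> List[float]:
    """Importance-weighted estimate; `observed` maps each option in
    N_out(chosen) to its observed loss value."""
    estimate = []
    for i in range(len(p)):
        value = m[i]  # fall back to the expert output when nothing is observed
        if i in graph[chosen]:
            value += (observed[i] - m[i]) / observation_probability(graph, p, i)
        estimate.append(value)
    return estimate
```

  • averaging this estimate over the random choice of i_t recovers the true loss of every option whose loss can be observed with positive probability, which is the unbiasedness property referred to above.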
  • the information processing system 100A can perform decision-making processing that is flexibly applicable to various environments (various problem settings (various feedback graphs)).
  • both the loss value l_t obtained by observation and the loss value with a hat ($\hat{l}_t$) derived by the second derivation unit 122 may be simply expressed as a loss value.
  • the second derivation unit 122 supplies the loss value ($\hat{l}_t$) derived in "11:" of Algorithm 2 and the correction parameter to the base algorithm B_j.
  • as an example, the second derivation unit 122 derives the correction parameter as the inner product of the loss value ($\hat{l}_t$) and the optimal solution (decision-making result) p_t, multiplied by minus 1.
  • the loss value ($\hat{l}_t$) provided in this step corresponds to the l_t obtained in "6:" of Algorithm 1, for example.
  • the correction parameter provided to the base algorithm B_j in this step corresponds to the correction parameter obtained in "7:" of Algorithm 1.
  • the second derivation unit 122 updates the weight factor w′_t in round t to a weight factor w′_{t+1} in round t+1. More specifically, the second derivation unit 122 derives the weight factor w′_{t+1} in round t+1 as follows:
  • here, g_t is derived and b_t is defined as follows:
  • the parameter (weighting factor) w't referred to in deriving the reliability wt can also be considered as a parameter representing the reliability of each base algorithm (the reliability of each first optimal solution).
  • this interpretation does not limit the present exemplary embodiment.
  • a plurality of first derivation processes are executed in which output values obtained from each of one or more models (experts) are acquired and a first optimal solution is derived by referring to the acquired output values, and a second derivation process is executed in which a second optimal solution is derived according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the information processing device 1 according to this exemplary embodiment derives an optimal solution (decision-making) by hierarchical processing using reliability. Therefore, according to the information processing device 1 according to this exemplary embodiment, it is possible to derive a more appropriate decision-making result (optimal solution).
  • the reliability may be initialized at a predetermined timing, making it possible to appropriately respond to changes in the environment.
  • the information processing system 100A according to this exemplary embodiment is applicable to any feedback graph G(V,E).
  • the information processing system 100A according to this exemplary embodiment is applicable whether the loss value can be observed or not.
  • the loss value ($\hat{l}_t$) may be derived by estimation (more specifically, as an unbiased estimator), and this allows the information processing system 100A to be suitably applicable to any feedback graph G(V,E).
  • the information processing system 100A is capable of handling any feedback graph G(V,E) and is capable of suitably handling changes in the environment.
  • FIG. 9 shows an example in which the information processing system 100A is applied to a decision-making problem regarding the shipping of apples.
  • the total number of decision-making times is set to 200.
  • each loss was defined as follows: when the apples are delicious, the loss is 1 if they are not shipped and 0 if they are shipped; when the apples are not delicious, the loss is 0 if they are not shipped and 1 if they are shipped.
  • the probability that the apples are delicious changes as a result of environmental changes: its average was set to 0.9 for the first 100 times and to 0.1 for the latter 100 times.
  • the reliability was initialized (the algorithm was initialized) every 50 times.
  • the output value (m t ) of the expert (model) was always set to 0.
  • Figure 9 shows the progress of the loss value using the information processing system 100A according to this exemplary embodiment and the progress of the loss value using the configuration according to the comparative example.
  • the loss value using the information processing system 100A is significantly smaller than the loss value using the configuration according to the comparative example, and it can be seen that it is able to quickly adapt to the environmental changes that occur on the 100th iteration. It can also be seen that the loss value quickly converges even after the reliability is initialized every 50 iterations. In this way, it can be seen that the information processing system 100A according to this exemplary embodiment has significantly higher adaptability to environmental changes than the configuration according to the comparative example.
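  • the environment of this experiment can be sketched as a small simulation. The loss table, the total of 200 decisions, and the switch from 0.9 to 0.1 at the 100th decision follow the description above. Drawing whether the apples are delicious as an independent Bernoulli trial in each round, the seed, and the fixed policies used as a reference are assumptions for illustration; they are not the configuration of the comparative example.

```python
import random

NOT_SHIP, SHIP = 0, 1  # hypothetical encoding of the two options


def delicious_probability(round_index: int) -> float:
    """Average probability of delicious apples: 0.9 in rounds 0-99, 0.1 afterwards."""
    return 0.9 if round_index < 100 else 0.1


def loss(option: int, delicious: bool) -> float:
    """Loss table of the experiment."""
    if delicious:
        return 1.0 if option == NOT_SHIP else 0.0
    return 0.0 if option == NOT_SHIP else 1.0


def simulate_fixed_policy(option: int, rounds: int = 200, seed: int = 0) -> float:
    """Cumulative loss of always choosing the same option (assumed Bernoulli draws)."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(rounds):
        delicious = rng.random() < delicious_probability(t)
        total += loss(option, delicious)
    return total
```

  • neither fixed policy can do well in both halves of such a run, which is why adapting quickly after the change at the 100th decision matters in this setting.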
  • FIG. 10 is a diagram schematically showing a process performed by the information processing device 1 according to this example.
  • the information processing device 1 of this example makes a decision regarding matching multiple medical professionals with multiple hospitals (derives a decision-making result).
  • the information processing device 1 of this example acquires output values associated with each expert (model) for multiple experts in a certain round, and derives a decision-making result by referring to the acquired output values.
  • the specific configuration of the information processing device 1 in this example does not limit this application example, but may be the same as the information processing device 1 described in exemplary embodiment 1, or may be the same as the information processing device 1A described in exemplary embodiment 2.
  • the derived decision-making result is executed, and a loss value corresponding to the next round is obtained by observation or by derivation.
  • the loss value and an output value corresponding to the loss value are provided to the information processing device 1 in this example, and are referenced in the decision-making process in the next round.
  • a person in charge of inputting data at the hospital inputs information (also called hospital data) such as diagnosis status, availability of hospital rooms, medical departments, and consultation hours into the information processing system 100 via a terminal device 2A or the like.
  • Each piece of input information is stored in the memory unit 15A, for example, and is referenced by the control unit 10A.
  • the lower part of Figure 11 is an example of hospital data managed by the information processing system 100 according to this example.
  • each medical worker inputs their own data (specialty, years of service, preferred hospital, etc.) (also referred to as medical worker data) into the information processing system 100.
  • Each piece of input information is stored in the memory unit 15A, as an example, and is referenced by the control unit 10A.
  • the top part of Figure 11 is an example of medical worker data managed by the information processing system 100 of this example.
  • the information processing device 1 derives a decision-making result regarding optimal matching of hospitals and medical professionals by referring to the hospital data and medical professional data.
  • each of the multiple experts (models) according to this example calculates an output value by referring to the hospital data and medical professional data.
  • the information processing device 1 according to this example derives a decision-making result regarding optimal matching of hospitals and medical professionals by referring to these output values.
  • the information processing system 100 proposes optimal hospital candidates to the medical worker via the terminal device 2A or the like.
  • the information processing system 100 according to this example presents optimal hospital candidates to the medical worker via a display panel or the like provided in the terminal device 2A.
  • the information processing system 100 according to this example registers work-related information for each medical worker.
  • the information processing system 100 records the number of patients visiting each month (each round) for each hospital. Then, the information processing system 100 according to this example determines the allocation for the next time (next round) based on a loss value according to the number of patients visiting.
  • the specific example of the loss value is not limited to this example, but as an example, a loss value according to the degree of congestion at the hospital can be used.
  • as an example, the loss value may be calculated as follows: loss value = (actual number of patients visiting each hospital) - (number of medical staff assigned to each hospital x number of patients that each medical staff member can see).
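  • the congestion-based loss value described above can be written as a short function. This is a sketch only; the function name, the per-hospital dictionary, and the numbers in it are hypothetical and do not come from the hospital data of this example.

```python
from typing import Dict, Tuple


def congestion_loss(actual_patients: int, assigned_staff: int,
                    patients_per_staff: int) -> int:
    """Actual visiting patients minus the capacity of the assigned staff."""
    return actual_patients - assigned_staff * patients_per_staff


# Hypothetical monthly figures:
# hospital -> (patients visiting, staff assigned, patients each staff member can see)
monthly: Dict[str, Tuple[int, int, int]] = {
    "Hospital A": (120, 5, 20),
    "Hospital B": (60, 4, 20),
}
losses = {name: congestion_loss(*figures) for name, figures in monthly.items()}
```

  • with these figures, a positive value indicates that more patients visited than the assigned staff could see, and a negative value indicates spare capacity.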
  • the information processing system 100 can make appropriate decisions whether or not the loss value is observable, and therefore can appropriately match hospitals with medical personnel even in the above-mentioned cases.
  • hospital data and medical personnel data may change over time.
  • the information processing system 100 according to this example can appropriately match hospitals with medical personnel even in the case of such environmental changes.
  • control blocks (particularly the acquisition unit 11 and the derivation unit 12) of the information processing device 1, 1A and the terminal device 2, 2A may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software.
  • the information processing device 1, 1A and the terminal device 2, 2A are provided with a computer that executes the instructions of a program, which is software that realizes each function.
  • This computer is provided with, for example, at least one processor (control device) and at least one computer-readable recording medium that stores the above program.
  • the object of the present invention is achieved by the processor in the computer reading the above program from the recording medium and executing it.
  • the processor can be, for example, a CPU (Central Processing Unit).
  • the recording medium can be a "non-transitory tangible medium" such as a ROM (Read Only Memory), as well as a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, etc.
  • the computer can also be provided with a RAM (Random Access Memory) for expanding the above program.
  • the above program can also be supplied to the computer via any transmission medium (such as a communication network or broadcast waves) that can transmit the program.
  • one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the above program is embodied by electronic transmission.
  • (Appendix A1) An information processing device comprising: acquisition means for acquiring output values obtained from each of one or more models; and derivation means that executes a plurality of first derivation processes that each derive a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The information processing device according to appendix A1, wherein the derivation means derives the reliability by referring to at least one of the output value and the first optimal solution, and the reliability is initialized at a predetermined timing.
  • The information processing device according to appendix A1 or A2, wherein the derivation means estimates a loss value corresponding to the first optimal solution, and updates a parameter for deriving the first optimal solution by referring to the loss value.
  • Appendix A6 The information processing device according to any one of appendices A1 to A5, wherein the derivation means sets a learning rate referred to for deriving the reliability to a value proportional to the -1/2 power of a total number of the first derivation processes.
  • Appendix A7 The information processing device according to any one of appendices A1 to A5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.
  • Appendix A8 The information processing device according to any one of appendices A1 to A7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values that are sequentially acquired.
  • An information processing method including, by at least one processor: an acquisition process for acquiring output values obtained from each of one or more models; and a derivation process that includes a plurality of first derivation processes that each derive a first optimal solution by referring to the output values acquired by the acquisition process, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • The information processing method according to appendix B1 or B2, wherein the at least one processor estimates a loss value corresponding to the first optimal solution, and updates a parameter for deriving the first optimal solution by referring to the loss value.
  • Appendix B6 The information processing method according to any one of appendices B1 to B5, wherein, in the derivation process, the at least one processor sets a learning rate referred to for deriving the reliability to a value proportional to the -1/2 power of a total number of the first derivation processes.
  • Appendix B7 The information processing method according to any one of appendices B1 to B5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.
  • Appendix B8 The information processing method according to any one of appendices B1 to B7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values obtained sequentially.
  • Appendix C1 An information processing program for causing a computer to function as an information processing device, the program causing the computer to function as: acquisition means for acquiring output values obtained from each of one or more models; and derivation means that executes a plurality of first derivation processes that each derive a first optimal solution by referring to the output values acquired by the acquisition means, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the derivation means is deriving the reliability by referring to at least one of the output value and the first optimal solution;
  • the derivation means is Estimating a loss value corresponding to the first optimal solution;
  • the information processing program according to claim 1 or 2 further comprising updating a parameter for deriving the first optimal solution by referring to the loss value.
  • Appendix C6 The information processing program according to any one of appendices C1 to C5, wherein the derivation means sets a learning rate referred to for deriving the reliability to a value proportional to the -1/2 power of a total number of the first derivation processes.
  • Appendix C7 The information processing program according to any one of appendices C1 to C5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.
  • Appendix C8 The information processing program according to any one of appendices C1 to C7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values obtained sequentially.
  • At least one processor comprising: an acquisition process for acquiring output values from each of the one or more models; an information processing device that executes a derivation process including: a plurality of first derivation processes that derive a first optimal solution by referring to output values acquired by the acquisition process; and a second derivation process that derives a second optimal solution depending on the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • the information processing device may further include a memory.
  • the memory may also store a program for causing the at least one processor to execute each of the processes.
  • Appendix D6 The information processing device according to any one of appendices D1 to D5, wherein, in the derivation process, the at least one processor sets a learning rate referred to for deriving the reliability to a value proportional to the -1/2 power of a total number of the first derivation processes.
  • Appendix D7 The information processing device according to any one of appendices D1 to D5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.
  • Appendix D8 The information processing device according to any one of appendices D1 to D7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.
  • (Appendix E1) A program for causing a computer to function as an information processing device, The computer includes: an acquisition process for acquiring output values from each of the one or more models; A non-transitory recording medium having recorded thereon an information processing program for executing a derivation process including: a plurality of first derivation processes that derive a first optimal solution by referring to output values acquired by the acquisition process; and a second derivation process that derives a second optimal solution depending on the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
  • Reference Signs List 1A Information processing device 100, 100A: Information processing system 10A: Control unit 11: Acquisition unit 12: Derivation unit 121: First derivation unit 122: Second derivation unit S1, S100: Information processing method
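Appendices B6, C6, and D6 set the learning rate used to derive the reliability in proportion to the -1/2 power of the number of first derivation processes, and appendices B7, C7, and D7 define the Bregman divergence through a convex function that includes a logarithmic barrier term. The following is a minimal sketch of both quantities, not the claimed implementation. The function names, the `scale` constant, and the choice of a pure log barrier (with no additional terms) are assumptions made for illustration.

```python
import math
from typing import Sequence


def master_learning_rate(num_first_processes: int, scale: float = 1.0) -> float:
    """Learning rate proportional to N ** (-1/2), where N is the number of
    first derivation processes. `scale` is a hypothetical constant."""
    if num_first_processes < 1:
        raise ValueError("at least one first derivation process is required")
    return scale * num_first_processes ** -0.5


def log_barrier(x: Sequence[float]) -> float:
    """Log-barrier convex function psi(x) = -sum_i log(x_i), for x_i > 0."""
    return -sum(math.log(xi) for xi in x)


def bregman_log_barrier(x: Sequence[float], y: Sequence[float]) -> float:
    """Bregman divergence induced by the log barrier:
    D(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
            = sum_i (x_i / y_i - 1 - log(x_i / y_i))."""
    return sum(xi / yi - 1.0 - math.log(xi / yi) for xi, yi in zip(x, y))


if __name__ == "__main__":
    print(master_learning_rate(4))
    print(bregman_log_barrier([0.2, 0.8], [0.5, 0.5]))
```

The divergence is zero when the two points coincide and positive otherwise, which is the property a regularized update relies on.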

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

To be able to derive a more suitable decision-making result (optimal solution), an information processing device (1) is provided with: an acquisition means for acquiring an output value obtained from each of one or a plurality of models; and a derivation means for executing a plurality of first derivation processes for deriving a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process for deriving a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

Description

Information processing device, information processing method, information processing system, and program

The present invention relates to an information processing device, an information processing method, an information processing system, and a program.

Sequential decision-making techniques are known in which the following process is repeated in sequence: a prediction (decision-making result) regarding, for example, demand or supply volume is derived, the result of executing the derived prediction is observed, and a further prediction is derived based on the observed result.

For example, Patent Document 1 describes an optimal decision-making method for planning and evaluating capital investment plans that include uncertain factors.

Patent Document 1: JP 2005-108147 A

Generally, sequential decision-making techniques are required to derive more appropriate decision-making results (optimal solutions), but the technique described in Patent Document 1 leaves room for improvement in this regard.

One aspect of the present invention has been made in view of the above problem, and one of its objectives is to provide a technique that can derive more appropriate decision-making results (optimal solutions).

An information processing device according to one aspect of the present invention includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes that each derive a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

In an information processing method according to one aspect of the present invention, an information processing device acquires an output value obtained from each of one or more models, and executes a plurality of first derivation processes that each derive a first optimal solution by referring to the acquired output value, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

A program according to one aspect of the present invention is a program that causes a computer to function as an information processing device. The program causes the computer to acquire an output value obtained from each of one or more models, and to execute a plurality of first derivation processes that each derive a first optimal solution by referring to the acquired output value, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

An information processing system according to one aspect of the present invention includes an information processing device and a terminal device. The information processing device includes: an acquisition means for acquiring an output value obtained from each of one or more models; and a derivation means for executing a plurality of first derivation processes that each derive a first optimal solution by referring to the output value acquired by the acquisition means, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. The terminal device includes an execution means for executing the second optimal solution derived by the information processing device.

According to one aspect of the present invention, it is possible to derive a more appropriate decision-making result (optimal solution).

FIG. 1 is a block diagram showing the configuration of an information processing device according to an exemplary embodiment.
FIG. 2 is a flow diagram showing the flow of an information processing method according to an exemplary embodiment.
FIG. 3 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 4 is a block diagram showing the configuration of an information processing system according to an exemplary embodiment.
FIG. 5 is a flow diagram showing the flow of processing performed by an information processing system according to an exemplary embodiment.
FIG. 6 is a block diagram showing the configuration of an information processing device according to an exemplary embodiment.
FIG. 7 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 8 is a diagram for explaining processing performed by an information processing device according to an exemplary embodiment.
FIG. 9 is a diagram for explaining effects achieved by an information processing device according to an exemplary embodiment.
FIG. 10 is a diagram for explaining processing performed by an information processing device according to an application example of an exemplary embodiment.
FIG. 11 is a diagram showing an example of information referred to by an information processing device according to an application example of an exemplary embodiment.
FIG. 12 is a block diagram showing the configuration of a computer that functions as an information processing device according to each exemplary embodiment.

Exemplary embodiments of the present invention are described below. However, the present invention is not limited to the exemplary embodiments shown below, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining the technical means employed in the exemplary embodiments shown below may also be included in the scope of the present invention. Embodiments obtained by appropriately omitting some of the technical means employed in the exemplary embodiments shown below may also be included in the scope of the present invention. Furthermore, the effects mentioned in each exemplary embodiment shown below are examples of effects expected in that exemplary embodiment, and do not define the scope of the present invention. In other words, embodiments that do not exhibit the effects mentioned in the exemplary embodiments shown below may also be included in the scope of the present invention.

First Exemplary Embodiment
A first exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. This exemplary embodiment is the basic form of each exemplary embodiment described later. The scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure to the extent that no particular technical obstacle occurs. In addition, each technical means shown in the drawings referred to for explaining this exemplary embodiment can also be adopted in other exemplary embodiments included in this disclosure to the extent that no particular technical obstacle occurs.

<Overview of information processing device 1>
First, an overview of the information processing device 1 according to this exemplary embodiment will be described. The information processing device 1 acquires, from each of a plurality of experts (models), the output value produced by that expert, and makes a decision by referring to the acquired output values. The information processing device 1 acquires output values and makes decisions sequentially. For example, in round t, it acquires an output value P1t from expert 1 and an output value P2t from expert 2, and derives the decision-making result (t) for round t by referring to P1t and P2t. The information processing device 1 then updates the parameters used for decision-making, acquires the output values for the next round, and derives the decision-making result for that round. Note that t is an index representing the number of repetitions, and can also be interpreted as an index indicating timing.
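The round-by-round procedure described above can be sketched as a simple loop. This is only an illustrative outline under stated assumptions: the names `run_rounds`, `decide`, and `update` are hypothetical, experts are modeled as functions of the round index t, and the decision rule in the usage example (a plain average) is a placeholder rather than the derivation process of this disclosure.

```python
from typing import Callable, List, Sequence

# Hypothetical type: an expert maps a round index t to an output value.
Expert = Callable[[int], float]


def run_rounds(
    experts: Sequence[Expert],
    decide: Callable[[List[float]], float],
    update: Callable[[int, List[float], float], None],
    num_rounds: int,
) -> List[float]:
    """Repeat: acquire outputs, derive a decision, update parameters."""
    decisions: List[float] = []
    for t in range(1, num_rounds + 1):
        # Acquire the output value of every expert for round t (P1t, P2t, ...).
        outputs = [expert(t) for expert in experts]
        # Derive the decision-making result for round t.
        decision = decide(outputs)
        decisions.append(decision)
        # Update the decision-making parameters before the next round.
        update(t, outputs, decision)
    return decisions


if __name__ == "__main__":
    experts = [lambda t: 1.0 * t, lambda t: 3.0 * t]
    average = lambda outputs: sum(outputs) / len(outputs)
    no_update = lambda t, outputs, decision: None
    print(run_rounds(experts, average, no_update, num_rounds=3))
```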

In this exemplary embodiment, the "expert" may be any of hardware, software, or a living organism that outputs some kind of output value. As an example, the "expert" may be hardware such as a predicted value derivation device that outputs a predicted value as an output value, software such as a predicted value derivation algorithm that outputs a predicted value as an output value, or a person who outputs a predicted value as an output value using some method. In addition, the "expert" is not limited to one that outputs a predicted value, but may be a generation model that outputs some kind of generation result as an output value, or a control model that outputs some kind of control value as an output value. The information processing device 1 according to this exemplary embodiment may be configured to include an "expert" or may be configured to obtain an output value from an external "expert". The "expert" is also called a "model" or an "agent".

Furthermore, in this exemplary embodiment, the "output value" may be anything. As an example, the "output value" may be a predicted value related to demand or supply, or a predicted value related to other cases (events). The "output value" also does not have to be related to a prediction. For example, the "output value" in this exemplary embodiment may be an output value related to some parameter referenced by the information processing device 1. The information processing device 1 according to this exemplary embodiment can be applied to any process that makes decisions regarding a target event by referring to output values from each of multiple experts.

In addition, in this exemplary embodiment, "intention" refers to some information related to the target event, and is not limited to the intention of a living organism (person). For example, in an application scenario in which demand for a target product is predicted, a predicted value of future demand is an example of an "intention" determined by the information processing device 1 according to this exemplary embodiment, or of a "decision-making result" derived by the information processing device 1. The information processing device 1 according to this exemplary embodiment can also be expressed as a decision-making device, a decision-making result derivation device, or the like. The "decision-making result" is also called an "optimization solution" or an "optimization result".

Furthermore, in this exemplary embodiment, a loss value can be provided (acquired) corresponding to the output value provided by each of the multiple experts. Depending on the "decision-making result" of the information processing device 1, the loss value may be obtainable by observation, or may not be obtainable by observation. The information processing device 1 may obtain, by derivation, a loss value that cannot be obtained by observation. Which loss values can be "observed" for which "decision-making result" can be expressed, as an example, by the structure of a directed graph called a feedback graph, but this does not limit this exemplary embodiment.
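As a rough illustration of how a feedback graph can encode observability, the sketch below represents the graph as an adjacency mapping. It assumes the common convention that an edge from i to j means that choosing decision i reveals the loss of j; the function name `observed_losses` and the example graph are hypothetical and are not taken from this disclosure.

```python
from typing import Dict, List, Set


def observed_losses(
    feedback_graph: Dict[int, Set[int]],
    chosen: int,
    true_losses: List[float],
) -> Dict[int, float]:
    """Return the losses revealed by the chosen decision.

    feedback_graph[i] is the set of indices j whose loss becomes observable
    when decision i is executed (assumed convention: edge i -> j).
    """
    revealed = feedback_graph.get(chosen, set())
    return {j: true_losses[j] for j in sorted(revealed)}


if __name__ == "__main__":
    # Decision 0 reveals the losses of 0 and 1, decision 1 reveals only its
    # own loss, and decision 2 reveals nothing.
    graph = {0: {0, 1}, 1: {1}, 2: set()}
    losses = [0.3, 0.7, 0.1]
    for choice in (0, 1, 2):
        print(choice, observed_losses(graph, choice, losses))
```

Losses missing from the returned mapping are the ones that would have to be obtained by derivation (estimation) rather than by observation.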

In this exemplary embodiment, the loss value can be expressed, as an example, as the difference between the output value (predicted value) and the observed value (actual value), but this does not limit this exemplary embodiment. The loss value may be the difference between the predicted value and another predetermined value. The loss value may also be an estimated value related to the loss. The term "loss value" may also include the concept of "reward". For example, the loss value may be expressed as the reward value with the sign reversed (the reward value multiplied by a negative constant). Therefore, the loss value according to this exemplary embodiment may be read as the reward value.
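The two readings of the loss value given above can be written down in a few lines. This is a sketch under assumptions: the absolute difference is only one possible measure of the "difference" between a predicted and an observed value, and the function names and the constant `c` are hypothetical.

```python
def loss_from_prediction(predicted: float, observed: float) -> float:
    """Loss as the absolute difference between predicted and observed values.
    The absolute value is an assumed choice of difference measure."""
    return abs(predicted - observed)


def loss_from_reward(reward: float, c: float = 1.0) -> float:
    """Loss as the reward multiplied by a negative constant (-c, with c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive so that -c is a negative constant")
    return -c * reward


if __name__ == "__main__":
    print(loss_from_prediction(12.0, 10.0))
    print(loss_from_reward(3.0))
```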

<Configuration of information processing device 1>
Next, the configuration of the information processing device 1 will be described with reference to Fig. 1. Fig. 1 is a block diagram showing the configuration of the information processing device 1. As shown in Fig. 1, the information processing device 1 includes an acquisition unit 11 and a derivation unit 12.

(Acquisition unit 11)
The acquisition unit 11 acquires output values obtained from each of the multiple models (experts). Here, as described above, the "output value" may be, for example, a predicted value related to demand or supply, or an output value related to other cases (events). The "output value" may also be an output value related to some parameter referenced by the information processing device 1. The acquisition unit 11 may be configured to acquire an output value for each round in the sequential decision-making process, but this does not limit the present exemplary embodiment.

(Derivation unit 12)
The derivation unit 12 executes:
- a plurality of first derivation processes that each derive a first optimal solution by referring to the output values acquired by the acquisition unit 11, and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
Here, the first derivation process is sometimes referred to as a base algorithm, and the second derivation process is sometimes referred to as a master algorithm, but these terms do not limit this exemplary embodiment.

The reliability is an index indicating the extent to which the output value of each expert is reflected in the decision-making process. The reliability may also be described as an index indicating the extent to which the first optimal solution of each first derivation process is reflected in the decision-making process. As an example, the reliability can be expressed as a relative weight applied to the predicted value of each expert, or as a relative weight applied to each of the first optimal solutions.
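The two-level structure, with base algorithms feeding a master algorithm, can be sketched as follows. This is a simplified illustration, not the derivation process of this disclosure: it assumes that each first derivation process is a weighted average of expert outputs and that the second derivation process is a reliability-weighted average of the first optimal solutions. All function names and numeric values are hypothetical.

```python
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def first_derivation(outputs: Sequence[float], expert_weights: Sequence[float]) -> float:
    """Base algorithm (assumed form): weighted average of expert outputs."""
    weights = normalize(expert_weights)
    return sum(w * o for w, o in zip(weights, outputs))


def second_derivation(first_solutions: Sequence[float], reliabilities: Sequence[float]) -> float:
    """Master algorithm (assumed form): combine the first optimal solutions,
    using the reliabilities as relative weights."""
    weights = normalize(reliabilities)
    return sum(w * s for w, s in zip(weights, first_solutions))


if __name__ == "__main__":
    outputs = [10.0, 20.0]                      # output values of two experts
    base_weights = [[1.0, 1.0], [1.0, 3.0]]     # two base algorithms
    firsts = [first_derivation(outputs, w) for w in base_weights]
    print(firsts)
    print(second_derivation(firsts, reliabilities=[1.0, 1.0]))
```

In an online setting the reliabilities would themselves be updated from observed or estimated losses after each round; that update rule is left out here because this passage does not specify it.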

As described above, the information processing device 1 according to this exemplary embodiment acquires output values obtained from each of one or more models (experts), and executes a plurality of first derivation processes that each derive a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing device 1 according to this exemplary embodiment derives an optimal solution (makes a decision) by hierarchical processing using reliability. Therefore, the information processing device 1 according to this exemplary embodiment can derive a more appropriate decision-making result (optimal solution).

<Flow of information processing method S1>
Next, the flow of the information processing method S1 according to the present exemplary embodiment 1 will be described with reference to Fig. 2. Fig. 2 is a flow diagram showing the flow of the information processing method S1.

(Step S11)
In step S11, the acquisition unit 11 acquires output values obtained from one or more models (experts).

(Step S12)
In step S12, the derivation unit 12 executes:
- a plurality of first derivation processes that each derive a first optimal solution by referring to the output values acquired in step S11, and
- a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

The information processing device 1 performs the processes of steps S11 and S12 in one round, and then performs the processes of steps S11 and S12 in the next round.

FIG. 3 is a diagram schematically illustrating the sequential decision-making process of the information processing method S1 according to this exemplary embodiment. As shown in FIG. 3, in a given round, the information processing device 1 acquires the output value provided by each of the multiple experts. It then derives a decision-making result (optimal solution) by referring to the acquired output values. The derived decision-making result (optimal solution) is executed, and each expert provides an output value for the next round. A loss value corresponding to the next round can also be observed. As described above, depending on the decision-making result (optimal solution) of the information processing device 1, the loss value may be obtainable by observation, or may not be obtainable by observation. The information processing device 1 may obtain, by derivation, a loss value that cannot be obtained by observation. In this way, the information processing device 1 derives decision-making results sequentially.

As described above, the information processing method S1 according to this exemplary embodiment acquires output values obtained from each of one or more models (experts), and executes a plurality of first derivation processes that each derive a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing method S1 according to this exemplary embodiment derives an optimal solution (makes a decision) by hierarchical processing using reliability. Therefore, the information processing method S1 according to this exemplary embodiment can derive a more appropriate decision-making result (optimal solution).

<Configuration of information processing system 100>
Next, the configuration of the information processing system 100 according to this exemplary embodiment will be described with reference to Fig. 4. Fig. 4 is a block diagram showing the configuration of the information processing system 100. As shown in Fig. 4, the information processing system 100 includes an information processing device 1 and a terminal device 2 that are communicably connected to each other. The components of the information processing device 1 have been described above, so their description is omitted here.

(Terminal device 2)
As shown in Fig. 4, the terminal device 2 includes an execution unit 21 and a loss value acquisition unit 22. The execution unit 21 executes the decision-making result (optimal solution) derived by the information processing device 1, or a process corresponding to that decision-making result (optimal solution). As an example, if the decision-making result predicts today's demand for product A to be X units, the execution unit 21 places an order for X units of product A.

<Flow of information processing method S100>
Next, the flow of the information processing method S100 according to the first exemplary embodiment will be described with reference to Fig. 5. Fig. 5 is a flow diagram showing the flow of the information processing method S100 executed by the information processing system 100.

(Steps S11-1, S12-1)
As shown in FIG. 5, in step S11-1, the acquisition unit 11 acquires output values obtained from each of a plurality of experts (models).

In step S12-1, the derivation unit 12
- Executing a plurality of first derivation processes to derive a first optimal solution by referring to the output values acquired in step S11-1, and - executing a second derivation process to derive a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.

(Steps S21-1, S22-1)
In step S21-1, the execution unit 21 of the terminal device 2 executes the decision-making result derived in step S12-1 (more specifically, the second optimal solution) or a process corresponding to that decision-making result. In step S22-1, the terminal device 2 provides the result of the execution to the information processing device 1. If executing the decision-making result yields a loss value, the result of the execution may include that loss value.

(Steps S11-2, S12-2)
In step S11-2, the acquisition unit 11 acquires output values for the round (round t=2) obtained from each of the multiple experts (models). Here, each predicted value may be, for example, the loss value acquired by the loss value acquisition unit 22 in step S22-1 or the loss value derived by the information processing device 1.

In step S12-2, the derivation unit 12 executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in step S11-2, and
- a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
The decision-making result derived in step S12-2 (more specifically, the second optimal solution) is provided to the terminal device 2 and executed in step S21-2.

As described above, the information processing method S100 according to this exemplary embodiment acquires output values from each of one or more models (experts), and executes a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing method S100 according to this exemplary embodiment derives an optimal solution (that is, makes a decision) through hierarchical processing that uses reliability. Therefore, the information processing method S100 according to this exemplary embodiment can derive a more appropriate decision-making result (optimal solution).
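The two-level structure summarized above can be sketched as follows. This is only a minimal illustration, assuming that each first optimal solution is a probability distribution over the available options and that the second derivation process forms a reliability-weighted average of those distributions. The combination rule, the function name `second_derivation`, and the sample numbers are assumptions made for illustration; they are not taken from the algorithms of this disclosure.

```python
from typing import Sequence


def second_derivation(first_solutions: Sequence[Sequence[float]],
                      reliabilities: Sequence[float]) -> list[float]:
    """Combine first optimal solutions into a second optimal solution.

    Assumption: the combination is a reliability-weighted average of
    distributions. The text only states that the second optimal solution
    depends on the first optimal solutions and their reliabilities.
    """
    if len(first_solutions) != len(reliabilities):
        raise ValueError("one reliability per first derivation process is required")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must have a positive sum")
    num_options = len(first_solutions[0])
    combined = [0.0] * num_options
    for solution, weight in zip(first_solutions, reliabilities):
        for i, prob in enumerate(solution):
            combined[i] += (weight / total) * prob
    return combined


# Two hypothetical first derivation processes over three options.
p_first = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
w = [0.75, 0.25]
p_second = second_derivation(p_first, w)
```

Under these assumptions the result remains a probability distribution, and a first derivation process with higher reliability pulls the second optimal solution toward its own proposal.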

Exemplary embodiment 2
A second exemplary embodiment, which is an example of an embodiment of the present invention, will be described in detail with reference to the drawings. Components having the same functions as those described in the above exemplary embodiment will be given the same reference numerals, and their description will be omitted as appropriate. The scope of application of each technical means adopted in this exemplary embodiment is not limited to this exemplary embodiment. That is, each technical means adopted in this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure, as long as no particular technical hindrance occurs. In addition, each technical means shown in each drawing referred to for explaining this exemplary embodiment can be adopted in other exemplary embodiments included in this disclosure, as long as no particular technical hindrance occurs.

<Configuration of Information Processing System 100A>
The configuration of an information processing system 100A according to this exemplary embodiment will be described with reference to Fig. 6. Fig. 6 is a block diagram showing the configuration of the information processing system 100A. As shown in Fig. 6, the information processing system 100A includes an information processing device 1A and a terminal device 2A. Also, as shown in Fig. 6, the information processing device 1A and the terminal device 2A are configured to be able to communicate with each other via a network N. Here, the specific configuration of the network N does not limit this exemplary embodiment, but as an example, a wireless LAN (Local Area Network), a wired LAN, a WAN (Wide Area Network), a public line network, a mobile data communication network, or a combination of these networks can be used.

<Configuration of information processing device 1A>
The configuration of an information processing device 1A according to this exemplary embodiment will be described with reference to Fig. 6. Fig. 6 is a block diagram showing the configuration of the information processing device 1A.

As shown in Fig. 6, the information processing device 1A includes a control unit 10A, a storage unit 15A, and a communication unit 16A.

The communication unit 16A communicates with devices external to the information processing device 1A. As an example, the communication unit 16A communicates with the terminal device 2A. The communication unit 16A transmits data supplied from the control unit 10A to the terminal device 2A, and supplies data received from the terminal device 2A to the control unit 10A.

(Storage unit 15A)
The storage unit 15A stores various information referenced by the control unit 10A and various information derived by the control unit 10A. As an example, the storage unit 15A stores:
- output value information PI including the output values of each of a plurality of experts,
- loss value information LI including the loss values corresponding to the output values of each of the plurality of experts,
- reliability information CI indicating at least one of the reliability of each of the plurality of experts and the reliability of each of a plurality of first derivation units 121-1, 121-2, ... described later,
- the first optimal solutions (first decision-making results) DR1-1, DR1-2, ... derived by the respective first derivation units 121-1, 121-2, ... described later, and
- the second optimal solution (second decision-making result) DR2 derived by a second derivation unit 122 described later.

As described in the first exemplary embodiment, the information processing system 100A according to this exemplary embodiment may be configured to include the "experts (models)", or may be configured to obtain the above output values from "experts (models)" external to the system.

(Control unit 10A)
As shown in Fig. 6, the control unit 10A includes an acquisition unit 11 and a derivation unit 12.

(Acquisition unit 11)
As in the first exemplary embodiment, the acquisition unit 11 acquires output values obtained from each of a plurality of experts (models). Here, when a loss value corresponding to an output value can be acquired by observation, the acquisition unit 11 may further acquire that loss value. The output values and the loss values have been explained in the first exemplary embodiment, and the same explanation is therefore omitted. Specific examples of the output values and the loss values will be described later.

(Derivation unit 12)
As shown in Fig. 6, the derivation unit 12 includes a plurality of first derivation units 121-1, 121-2, ..., and a second derivation unit 122. As in the first exemplary embodiment, the derivation unit 12 according to this exemplary embodiment executes:
- a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition unit 11, and
- a second derivation process that derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes.
In this exemplary embodiment, the plurality of first derivation processes are executed by the respective first derivation units 121-1, 121-2, ..., and the second derivation process is executed by the second derivation unit 122.

In other words, each first derivation unit 121-j (where j is an index that distinguishes the first derivation units from one another) derives, in round t, a first optimal solution (first decision-making result) p_t(j) by referring to the output values m_t acquired by the acquisition unit 11. The second derivation unit 122 derives a second optimal solution (second decision-making result) in accordance with the first optimal solutions (first decision-making results) p_t(j) derived by the respective first derivation units 121-1, 121-2, ... and the reliability w_t(j) of each of the plurality of first derivation units.

The first derivation process may be referred to as a base algorithm, and the second derivation process may be referred to as a master algorithm, but these terms do not limit this exemplary embodiment. Each of the first derivation units 121-1, 121-2, ... may be referred to simply as the first derivation unit 121. The first derivation process and the second derivation process can also be described as online machine learning processes (online learning algorithms) that refer to the sequentially acquired output values.

As explained in the first exemplary embodiment, the reliability is an index indicating the extent to which the output value of each expert (model) is reflected in the decision-making process. The reliability may also be described as an index indicating the extent to which the first optimal solution of each first derivation process is reflected in the decision-making process. As an example, the reliability can be expressed as a relative weight applied to the predicted value of each expert, or as a relative weight applied to each of the first optimal solutions.

The derivation unit 12 may derive the reliability by referring to at least one of the output values acquired by the acquisition unit 11 and the first optimal solutions derived by the respective first derivation processes. The reliability derivation process may be executed as part of the second derivation process. With this configuration, the reliability is derived by referring to at least one of the output values and the first optimal solutions, so that a suitable reliability can be derived. By referring to such a suitable reliability, a suitable second optimal solution can in turn be derived. A more specific example of the reliability derivation process will be described later.
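As a rough illustration of how a reliability could be derived from observed or estimated losses, the sketch below uses a multiplicative-weights (Hedge-style) update. This is a standard online-learning technique substituted here for illustration only; the function name `update_reliabilities`, the learning rate `eta`, and the sample losses are assumptions and are not claimed to be the rule used in this disclosure.

```python
import math


def update_reliabilities(reliabilities, losses, eta=0.5):
    """Hedge-style update: down-weight processes that incurred larger loss.

    Assumption: each first derivation process j has a scalar loss for the
    round, and reliabilities are kept normalized so that they sum to 1.
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(reliabilities, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two hypothetical processes start with equal reliability;
# the second one incurs a larger loss in this round.
w = [0.5, 0.5]
w_next = update_reliabilities(w, losses=[0.0, 1.0])
```

After the update, the process with the smaller loss holds the larger share of the reliability, which is the qualitative behavior the paragraph above describes.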

The derivation unit 12 may also be configured to initialize the reliability at a predetermined timing. As an example, the derivation unit 12 may be configured to initialize the reliability (that is, set the reliability to its initial value) every time the second optimal solution has been derived a predetermined number of times (for example, 100 times). By initializing the reliability at a predetermined timing in this way, the derivation unit 12 can respond suitably to changes in the environment. Note that the initialization of the reliability may be executed as part of a process of restarting at least one of the first derivation process and the second derivation process described above.
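The periodic initialization described above can be sketched as a small counter-based wrapper. The class name `ReliabilityTracker`, the method names, and the choice of a uniform initial value are hypothetical; the text only states that the reliability is set back to an initial value after a predetermined number of derivations.

```python
class ReliabilityTracker:
    """Keeps reliabilities and reinitializes them every `reset_every` rounds."""

    def __init__(self, num_processes, reset_every=100):
        self.num_processes = num_processes
        self.reset_every = reset_every
        self.rounds_since_reset = 0
        self.weights = self._initial()

    def _initial(self):
        # Assumption: the initial value is the uniform distribution.
        return [1.0 / self.num_processes] * self.num_processes

    def record_round(self, new_weights):
        """Store updated weights; return True if a reset was triggered."""
        self.weights = list(new_weights)
        self.rounds_since_reset += 1
        if self.rounds_since_reset >= self.reset_every:
            self.weights = self._initial()
            self.rounds_since_reset = 0
            return True
        return False


# With reset_every=3, the third recorded round triggers a reset.
tracker = ReliabilityTracker(num_processes=2, reset_every=3)
flags = [tracker.record_round([0.9, 0.1]) for _ in range(3)]
```

A time-based schedule (for example, a monthly reset) could be handled the same way by comparing timestamps instead of counting rounds.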

The derivation unit 12 may also determine the timing for initializing the reliability in accordance with the period of environmental change. As an example, when the decision-making process of the information processing system 100A according to this exemplary embodiment is applied to a situation in which the environment changes with a period of roughly one month, the derivation unit 12 may be configured to initialize the reliability every month.

The derivation unit 12 may also perform a process of estimating a loss value corresponding to the first optimal solution. This loss value estimation process may be executed as part of the second derivation process described above. In the first derivation process, the derivation unit 12 may also perform a process of updating the first optimal solution by referring to the estimated loss value.
As described above, depending on the result of the decision-making, the loss value corresponding to an output value can be acquired by observation in some cases and cannot in others. With the above configuration, the derivation unit 12 estimates a loss value corresponding to the first optimal solution and updates the first optimal solution by referring to the estimated loss value, so that a suitable optimal solution (decision-making result) can be derived even when the loss value corresponding to the output value cannot be acquired by observation.

The derivation unit 12 may also be configured to calculate the loss value as an unbiased estimator. With this configuration, the optimal solution is derived by referring to a loss value calculated as an unbiased estimator, so that a more suitable optimal solution (decision-making result) can be derived.
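To illustrate what an unbiased loss estimate means, the sketch below uses the standard importance-weighted estimator for the case where only the loss of the chosen option is observed. This particular estimator, the function names, and the numbers are assumptions chosen for illustration and are not stated in the text. The second function averages the estimator over the random choice and recovers the true losses, which is the unbiasedness property.

```python
def importance_weighted_estimate(chosen, observed_loss, probs):
    """Return a loss estimate per option; only the chosen option is observed.

    Assumption: every option has a strictly positive selection probability.
    """
    estimate = [0.0] * len(probs)
    estimate[chosen] = observed_loss / probs[chosen]
    return estimate


def expected_estimate(true_losses, probs):
    """Average the estimator over the random choice to check unbiasedness."""
    expectation = [0.0] * len(probs)
    for chosen, p in enumerate(probs):
        est = importance_weighted_estimate(chosen, true_losses[chosen], probs)
        for i, value in enumerate(est):
            expectation[i] += p * value
    return expectation


# Hypothetical true losses and selection probabilities for three options.
true_losses = [0.2, 0.9, 0.5]
probs = [0.5, 0.3, 0.2]
mean_estimate = expected_estimate(true_losses, probs)
```

Each single estimate is noisy (it is zero for every unchosen option), but its expectation matches the true loss vector up to floating-point rounding.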

The derivation unit 12 may further refer to a parameter for correcting the loss value or the reliability when deriving the decision-making result. With this configuration, at least one of the reliability of each expert (model), the reliability of each first derivation process, and the loss value can be corrected as appropriate by the parameter, which makes it possible to operate the information processing device 1 with a guaranteed minimum level of accuracy, that is, in a so-called conservative manner.

<Configuration of terminal device 2A>
As shown in Fig. 6, the terminal device 2A includes a control unit 20A, an execution unit 21, and a communication unit 26. As an example, the terminal device 2A can be implemented as a checkout terminal installed in a store, an inventory management terminal installed in a warehouse, or the like, but this does not limit this exemplary embodiment.

The communication unit 26 communicates with devices external to the terminal device 2A. As an example, the communication unit 26 communicates with the information processing device 1A. The communication unit 26 transmits data supplied from the control unit 20A to the information processing device 1A, and supplies data received from the information processing device 1A to the control unit 20A.

The execution unit 21 executes the decision-making result derived by the derivation unit 12, or a process corresponding to that decision-making result. As an example, the execution unit 21 executes the second decision-making result (second optimal solution) derived by the derivation unit 12. The execution unit 21 may be configured to execute at least some of the plurality of first decision-making results derived by the derivation unit 12, instead of or together with the second decision-making result. Furthermore, the execution unit 21 may be configured to be able to display the second decision-making result and at least some of the plurality of first decision-making results derived by the derivation unit 12, or may be configured to display the second decision-making result and the plurality of first decision-making results so that they can be compared with one another.

As shown in Fig. 6, the control unit 20A includes a loss value acquisition unit 22 and a loss value providing unit 23. When a loss value for each expert (model) can be acquired by observation (measurement) as a result of execution by the execution unit 21, the loss value acquisition unit 22 acquires that loss value. The loss value providing unit 23 provides the loss value acquired by the loss value acquisition unit 22 to the information processing device 1A via the communication unit 26.

Fig. 7 is a diagram schematically illustrating the sequential decision-making process performed by the information processing system 100A according to this exemplary embodiment. As shown in Fig. 7, in a given round, the information processing device 1A acquires the output values associated with each of a plurality of experts (models), and derives a decision-making result by referring to the acquired output values. Here, if a loss value associated with an output value can be acquired by observation, the information processing device 1A may further acquire that loss value and derive the decision-making result by additionally referring to it. As described above, the decision-making process performed by the information processing device 1A is, as an example, a hierarchical decision-making process carried out by the plurality of first derivation units 121-1, 121-2, ... and the second derivation unit 122.

When the derived decision-making result (optimal solution) is executed, a loss value corresponding to the next round may be observed. If the loss value can be acquired by observation, the loss value and the output value corresponding to it are provided to the information processing device 1A and are referenced in the decision-making process of the next round. In this way, the information processing system 100A derives decision-making results sequentially.

(Specific processing example by information processing device 1A)
A specific example of processing by the information processing device 1A will be described below. The information processing device 1A according to this example executes the following Algorithm 1 and Algorithm 2. Algorithm 1 is mainly executed by each of the plurality of first derivation units 121-1, 121-2, ..., and Algorithm 2 is mainly executed by the second derivation unit 122.

Figure JPOXMLDOC01-appb-M000001
Figure JPOXMLDOC01-appb-M000002

(Processing of Algorithm 1)
First, the processing of Algorithm 1 will be described. As shown at the beginning of Algorithm 1, the acquisition unit 11 acquires a graph G, a parameter η, and a parameter T. Here, as shown at the beginning of Algorithm 1, the graph G is defined by vertices V and edges (sides, links) E. In this exemplary embodiment, the graph G is, as an example, a directed graph called a feedback graph.

In a feedback graph, each vertex V corresponds to an option that the decision-making result can take, and each edge indicates the observability of a loss. As an example, a directed edge E(i,j) from vertex V(i) to vertex V(j) indicates that, when the decision-making result indicates option i (that is, when the player selects option i), the loss value associated with option j is observable. An edge with i=j is also called a self-loop.

The upper part of Fig. 8 shows Example 1 of a feedback graph. This example is a feedback graph corresponding to the problem setting "if the apples are delicious, it is better to ship them, but if they are not, it is better not to ship them." As shown in the upper part of Fig. 8, the feedback graph of this example has an option V(1) of not shipping the apples and an option V(2) of shipping the apples. If option V(1) of not shipping the apples is selected, the apples can be tasted, so it becomes known whether they are delicious. It therefore becomes known whether it would have been better to ship them. This means that, when option V(1) of not shipping the apples is selected:
- the loss value associated with option V(1) of not shipping the apples is observable (corresponding to edge E(1,1)), and
- the loss value associated with option V(2) of shipping the apples is also observable (corresponding to edge E(1,2)).

On the other hand, if option V(2) of shipping the apples is selected, the apples cannot be tasted, so it is not known whether they are delicious. Therefore, when option V(2) of shipping the apples is selected, no loss value can be observed. This corresponds to the absence, in the feedback graph, of any outgoing arrow (outgoing edge) that has option V(2) as its origin (starting point).
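The apple example can be written down directly as an edge set. In the sketch below, the representation of the graph as a set of (i, j) pairs and the helper name `observable_options` are illustrative choices; the edges themselves are the ones named in the text, E(1,1) and E(1,2).

```python
# Feedback graph for the apple example:
# vertex 1 = do not ship the apples, vertex 2 = ship the apples.
# An edge (i, j) means that choosing option i reveals the loss of option j.
edges = {(1, 1), (1, 2)}


def observable_options(chosen, edge_set):
    """Return the options whose loss is revealed after choosing `chosen`."""
    return sorted(j for (i, j) in edge_set if i == chosen)


seen_if_not_shipped = observable_options(1, edges)  # [1, 2]
seen_if_shipped = observable_options(2, edges)      # []
```

Choosing not to ship reveals both losses, while shipping reveals nothing, matching the absence of outgoing edges from V(2).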

The lower part of Fig. 8 shows Example 2 of a feedback graph. This example is a feedback graph corresponding to the problem setting "we want to order an appropriate quantity of a certain product." As shown in the lower part of Fig. 8, the feedback graph of this example has an option V(1) of ordering 300 units, an option V(2) of ordering 200 units, and an option V(3) of ordering 100 units. The graph indicates that when a larger quantity is ordered, the loss values for ordering smaller quantities can be observed, whereas when a smaller quantity is ordered, the loss values for ordering larger quantities may not be observable.

In this way, feedback graphs can express a variety of problem settings. For example, bandit feedback corresponds to the case where each vertex has only a self-loop, and full feedback (full information feedback) corresponds to the case where every pair of vertices is connected in both directions and every vertex has a self-loop.
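The two special cases named above can be constructed as edge sets. The sketch below numbers the options from 1 to K and represents an edge as an (i, j) pair; the function names are hypothetical.

```python
def bandit_feedback_graph(num_options):
    """Bandit feedback: each vertex has only a self-loop."""
    return {(i, i) for i in range(1, num_options + 1)}


def full_feedback_graph(num_options):
    """Full feedback: all ordered pairs are connected, including self-loops."""
    vertices = range(1, num_options + 1)
    return {(i, j) for i in vertices for j in vertices}


bandit = bandit_feedback_graph(3)
full = full_feedback_graph(3)
```

For K options the bandit graph has K edges and the full feedback graph has K * K edges, and every other feedback graph with all self-loops lies between the two.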

Because Algorithm 1 according to this exemplary embodiment is configured to be applicable to any feedback graph, the information processing system 100A according to this exemplary embodiment can be applied to a very wide range of problem settings.

Returning to the explanation of Algorithm 1, the parameter η acquired by the acquisition unit 11 serves, as described later, as a learning rate that is referenced in order to derive the reliability. The parameter T acquired by the acquisition unit 11 is a natural number that specifies the total number of rounds.

Next, as shown in "1:" of Algorithm 1, the first derivation unit 121 sets the initial value of a parameter (weighting factor) p'_t, which is referenced in order to derive a first optimal solution (first decision-making result) p_t, by

Figure JPOXMLDOC01-appb-M000003

Here, K represents the total number of options that the first optimal solution (first decision-making result) can take, and is also the total number of vertices |V| of the feedback graph described above. As will be understood from the explanation below, the parameter (weighting factor) p'_t referenced in order to derive p_t can be regarded as a parameter representing the reliability of each of the output values (m_t) of the experts (models). However, this interpretation does not limit this exemplary embodiment.

In the above formula (Equation 1), the numerator on the right side is the K-dimensional vector (array) defined by

Figure JPOXMLDOC01-appb-M000004

Here, [K] denotes the set defined by

Figure JPOXMLDOC01-appb-M000005

(that is, the set whose elements are the natural numbers 1 through K).

Next, the first derivation unit 121 executes the loop process specified by "2:" to "7:" of Algorithm 1. This loop process is executed while incrementing the index t, which indicates the round, until t = T (where T is the total number of rounds). In other words, the first derivation unit 121 repeats the process specified by "3:" to "7:" of Algorithm 1 for each round.

More specifically, first, as shown in "3:" of Algorithm 1, the first derivation unit 121 sets (defines) the function ψ(p) by
Figure JPOXMLDOC01-appb-M000006
The function ψ(p) serves as the convex function that defines the Bregman divergence described later. In the above formula, p(i) is an element of the set defined by
Figure JPOXMLDOC01-appb-M000007
In addition, N^in(i) in the definition of the function ψ(p) is the set defined by
Figure JPOXMLDOC01-appb-M000008
(in other words, for the vertex V(i) with vertex number i, the set whose elements are the vertex numbers j of the vertices V(j) at which the edges entering V(i) originate), and |N^in(i)| denotes the number of elements of that set. Also, in the definition of the function ψ(p),
Figure JPOXMLDOC01-appb-M000009
is a function that returns 1 if the condition in [ ] is satisfied, that is, if |N^in(i)| < K, and returns 0 otherwise.
Therefore, the second term on the right-hand side of the definition of the function ψ(p) (Equation 4) takes a non-zero value proportional to log p(i) when, for a vertex V(i), the number of vertices V(j) at which edges entering V(i) originate is smaller than the total number of vertices |V| = K. This second term is also called the logarithmic barrier term.
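The condition that switches the logarithmic barrier term on or off can be illustrated with a minimal Python sketch. The function names, the edge-list representation of the feedback graph G(V, E), and the two example graphs are assumptions introduced here for illustration only; they do not appear in Algorithm 1.

```python
def in_neighbors(num_vertices, edges):
    """Return N^in(i) for every vertex i in 1..K.

    `edges` is an assumed representation: a list of (source, destination)
    pairs, where an edge (j, i) means vertex j has an edge entering vertex i.
    """
    n_in = {i: set() for i in range(1, num_vertices + 1)}
    for src, dst in edges:
        n_in[dst].add(src)
    return n_in


def barrier_indicator(n_in, i, num_vertices):
    """Return 1 if |N^in(i)| < K (barrier term active for vertex i), else 0."""
    return 1 if len(n_in[i]) < num_vertices else 0


K = 3
# Assumed example 1: every vertex has only a self-loop.
self_loop_edges = [(i, i) for i in range(1, K + 1)]
# Assumed example 2: every vertex has an edge to every vertex.
complete_edges = [(j, i) for j in range(1, K + 1) for i in range(1, K + 1)]

for name, edges in [("self-loops", self_loop_edges), ("complete", complete_edges)]:
    n_in = in_neighbors(K, edges)
    flags = [barrier_indicator(n_in, i, K) for i in range(1, K + 1)]
    print(name, flags)
```

In the self-loop graph each vertex has a single incoming edge, so the indicator is 1 for every vertex; in the complete graph |N^in(i)| = K, so the indicator is 0 and the barrier term vanishes.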

In this manner, the Bregman divergence that the first derivation unit 121 according to this exemplary embodiment refers to in order to derive the first optimal solution is described by the convex function ψ(p), and the convex function ψ(p) includes the logarithmic barrier term described above. By using such a logarithmic barrier term, it is possible to perform decision-making processing that can be flexibly applied to various environments (various problem settings, i.e., various feedback graphs).

Next, as shown in "4:" of Algorithm 1, the acquisition unit 11 acquires the output value (m_t) output by each expert (model). The output value output by each expert (model) has been described above, so a description thereof is omitted here.

Next, as shown in "5:" of Algorithm 1, the first derivation unit 121 derives the first optimal solution (first decision-making result) p_t in round t by
Figure JPOXMLDOC01-appb-M000010
Here, the first term on the right-hand side of (Equation 8) is the inner product of m_t and p, and the second term on the right-hand side is the Bregman divergence defined using the function ψ,
Figure JPOXMLDOC01-appb-M000011
with p and p'_t as its arguments. In addition, Δ'_K in (Equation 8) is the set defined by
Figure JPOXMLDOC01-appb-M000012

As shown in (Equation 8), the first derivation unit 121 derives, as the first optimal solution (first decision-making result), the p that minimizes the linear sum of
・the inner product <m_t, p> of m_t and p, and
・the Bregman divergence D_ψ(p, p'_t) defined by p, p'_t, and ψ.
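The structure of this minimization, an inner product plus a Bregman divergence, can be sketched in Python as follows. This is only an illustration under a loudly stated assumption: the convex function used below is the negative Shannon entropy, chosen because its minimizer over the probability simplex has a simple closed form. It is not the function ψ(p) of Algorithm 1, which additionally contains the logarithmic barrier term. The Bregman divergence itself follows the standard definition D(p, q) = ψ(p) − ψ(q) − <∇ψ(q), p − q>. All function names are hypothetical.

```python
import math


def neg_entropy(p):
    """Stand-in convex function (assumption): sum_i p(i) * log p(i)."""
    return sum(x * math.log(x) for x in p)


def grad_neg_entropy(p):
    """Gradient of the stand-in convex function."""
    return [math.log(x) + 1.0 for x in p]


def bregman(psi, grad_psi, p, q):
    """Standard Bregman divergence D_psi(p, q)."""
    g = grad_psi(q)
    return psi(p) - psi(q) - sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))


def objective(m, p, p_prime):
    """<m, p> + D(p, p'), the quantity minimized over p."""
    inner = sum(mi * pi for mi, pi in zip(m, p))
    return inner + bregman(neg_entropy, grad_neg_entropy, p, p_prime)


def closed_form_minimizer(m, p_prime):
    """For the negative-entropy stand-in only: p(i) proportional to p'(i) * exp(-m(i))."""
    weights = [q * math.exp(-mi) for q, mi in zip(p_prime, m)]
    total = sum(weights)
    return [w / total for w in weights]


m_t = [1.0, 0.0]          # assumed expert output values
p_prime_t = [0.5, 0.5]    # assumed current parameter p'_t
p_t = closed_form_minimizer(m_t, p_prime_t)
print(p_t, objective(m_t, p_t, p_prime_t))
```

With the function ψ(p) of this exemplary embodiment no such closed form is assumed here; the minimization would instead be carried out numerically over the set Δ'_K.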

In this way, the first derivation unit 121 executes a first derivation process (base algorithm) that derives the first optimal solution (p_t) by referring to the output values (m_t) obtained from each of one or more models (experts).

Next, as shown in "6:" of Algorithm 1, the information processing system 100A executes the first optimal solution p_t and observes the loss value l_t associated with its execution. The acquisition unit 11 also acquires a parameter α_t for correction. Here, the first optimal solution p_t may be executed, as an example, by the execution unit 21 of the terminal device 2A, and the loss value is acquired, as an example, by the loss value acquisition unit 22. However, at least a part of the derived first optimal solution p_t may be supplied, without being executed, to the second derivation unit 122 that executes Algorithm 2 described later. As described above, depending on the content indicated by the first optimal solution p_t, the loss value l_t may not be observable. In addition, values derived by Algorithm 2 described later may be used for at least a part of the loss value l_t and the parameter α_t for correction.

Next, as shown in "7:" of Algorithm 1, the first derivation unit 121 updates the parameter p'_t by
Figure JPOXMLDOC01-appb-M000013
In this way, the first derivation unit 121 refers to the loss value l_t and updates the parameter (p'_t) used to derive the first optimal solution (p_t). Here, a_t is the parameter defined by
Figure JPOXMLDOC01-appb-M000014
More specifically, a_t is a parameter defined by the loss l_t(i), the output value m_t(i) of the expert (model), the correction parameter α_t, and the learning rate η. As is clear from (Equation 11) and (Equation 12), the correction parameter α_t can be regarded as a parameter for correcting at least one of the loss value l_t, the output value m_t, the parameter p'_t, and the first optimal solution p_t. The parameter a_t defined by (Equation 12) can likewise be regarded as a parameter for correction in the same sense.

In this way, the first derivation unit 121 can be regarded as being configured to update the first optimal solution (p_t) by further referring to the parameter α_t or a_t for correction.

(Processing of Algorithm 2)
Next, the processing of Algorithm 2 will be described. As shown at the beginning of Algorithm 2, the acquisition unit 11 acquires the graph G(V, E) and the parameter T. The graph G(V, E) and the parameter T have been described above, so a description thereof is omitted here.

Also, as shown in "Input" at the beginning of Algorithm 2, Algorithm 2 is executed with reference to a base algorithm B. Here, the base algorithm B refers, as an example, to Algorithm 1 described above. Algorithm 2 is executed with reference to a plurality of base algorithms B. In other words, the second derivation unit 122 executes Algorithm 2 in cooperation with the first derivation units 121, each of which executes one of the plurality of instances of Algorithm 1.

Next, as shown in "1:" of Algorithm 2, the second derivation unit 122 uses the total number of rounds T to set the parameter M to
Figure JPOXMLDOC01-appb-M000015
Here, the symbols on either side of log_2 T denote the ceiling function. Therefore, the second derivation unit 122 sets the parameter M to the smallest integer that is greater than or equal to the real-valued argument (log_2 T) (for example, if log_2 T = 3.3..., M is set to 4). Here, the parameter M has the meaning of the total number of base algorithms (instances of Algorithm 1) that Algorithm 2 refers to, but this does not limit the present exemplary embodiment.
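The setting of M described above can be written as a one-line computation. The following Python sketch is illustrative; the function name is an assumption introduced here.

```python
import math


def num_base_algorithms(total_rounds):
    """Return M, the smallest integer greater than or equal to log_2(T)."""
    return math.ceil(math.log2(total_rounds))


print(num_base_algorithms(10))   # log_2(10) = 3.32..., so M = 4
print(num_base_algorithms(200))  # log_2(200) = 7.64..., so M = 8
```

For T = 200, the value used in the example of FIG. 9 described later, this yields M = 8.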

Also, as shown in "1:" of Algorithm 2, the second derivation unit 122 sets the parameter η(j) to
Figure JPOXMLDOC01-appb-M000016
Here, the index j is an index for distinguishing the plurality of base algorithms from one another and, as shown in "1:" of Algorithm 2, takes integer values from 1 to M as an example. The parameter η(j) serves as the learning rate referred to in order to derive the reliability w_t(j) (the reliability of each base algorithm) described later. Therefore, the second derivation unit 122 can be said to set the learning rate η(j), which is referred to in order to derive the reliability w_t(j), to a value proportional to the -1/2 power of the total number M of base algorithms (first derivation processes). According to the knowledge obtained by the inventor, setting the learning rate η(j) in this way allows the reliability w_t(j) to be derived more suitably than, for example, when the learning rate η(j) is set to a value proportional to the -1 power of the total number M of base algorithms. In the following, each of the first derivation units 121-1, 121-2, ... may be written as 121-j using the above-mentioned index j.

Next, as shown in "2:" of Algorithm 2, the second derivation unit 122 sets the weight factor w'_1 in round t = 1 such that the parameter p'_1 corresponding to the decision-making result in round 1 satisfies, for each j,
Figure JPOXMLDOC01-appb-M000017
Here, as shown in "2:" of Algorithm 2, the weight factor w'_1 in round t = 1 satisfies
Figure JPOXMLDOC01-appb-M000018
Here, Δ_M denotes the set obtained by replacing K with M in the following definition.
Figure JPOXMLDOC01-appb-M000019
Next, as shown in "3:" of Algorithm 2, the second derivation unit 122 initializes each base algorithm
Figure JPOXMLDOC01-appb-M000020
Here, as shown in "3:" of Algorithm 2, the initialization includes a process in which the second derivation unit 122 passes the values
  G, η(j), T
to the base algorithm B_j.

Next, the second derivation unit 122 executes the loop process specified by "4:" to "13:" of Algorithm 2. The loop process is executed while incrementing the index t indicating the round, until t = T. In other words, the second derivation unit 122 repeats the process specified in "5:" to "13:" of Algorithm 2 for each round.

More specifically, first, as shown in "5:" of Algorithm 2, the acquisition unit 11 acquires the predicted value m_t, and the second derivation unit 122 supplies the predicted value m_t to each base algorithm (each instance of Algorithm 1) via the corresponding first derivation unit 121-j.

Next, as shown in "6:" of Algorithm 2, the acquisition unit 11 acquires the first optimal solution (first decision-making result) p_{t,j} derived by each base algorithm (each instance of Algorithm 1). Then, as shown in "6:" of Algorithm 2, the second derivation unit 122 sets the parameter h_t(j) by
Figure JPOXMLDOC01-appb-M000021
Here, < , > denotes the inner product.

Next, as shown in "7:" of Algorithm 2, the second derivation unit 122 derives the reliability vector w_t, which indicates the reliability of each base algorithm (each instance of Algorithm 1, i.e., each first derivation unit 121-j), by
Figure JPOXMLDOC01-appb-M000022
Here, D_φ represents the Bregman divergence described above. However, the convex function φ that defines this Bregman divergence is given by
Figure JPOXMLDOC01-appb-M000023
As partly mentioned above, w'_t is a weight factor referred to in order to derive the reliability w_t. In this manner, the second derivation unit 122 derives the reliability (w_t(j)) by referring to the output value (m_t) and the first optimal solution (p_t).

Next, as shown in "8:" of Algorithm 2, the second derivation unit 122 uses
・the first optimal solution (first decision-making result) p_{t,j} derived by each base algorithm (each instance of Algorithm 1), and
・the reliability w_t(j), which is each component of the reliability vector w_t indicating the reliability of each base algorithm,
to derive the second optimal solution (second decision-making result) p_t by
Figure JPOXMLDOC01-appb-M000024

In this way, the second derivation unit 122 executes a second derivation process (master algorithm) that derives the second optimal solution (p_t) in accordance with the first optimal solution (p_{t,j}) derived by each of the plurality of first derivation processes (base algorithms) described above and the reliability (w_t(j)) of each of the plurality of first derivation processes (base algorithms). More specifically, as expressed by the above formula, the second derivation unit 122 derives the second optimal solution (second decision-making result) p_t as a weighted sum of the first optimal solutions (first decision-making results) p_{t,j} derived by the plurality of first derivation units 121-j, weighted according to the reliability w_t(j) of each of the first derivation units 121-j.
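The reliability-weighted sum described above can be sketched in Python as follows. The function name and the numerical values are assumptions introduced for illustration; in particular, the sketch takes the reliabilities w_t(j) as given and does not reproduce how they are derived in "7:" of Algorithm 2.

```python
def combine_first_solutions(first_solutions, reliabilities):
    """Return p_t(i) = sum_j w_t(j) * p_{t,j}(i) for every option i.

    first_solutions: list over base algorithms j of probability vectors p_{t,j}
    reliabilities:   list over base algorithms j of weights w_t(j)
    """
    num_options = len(first_solutions[0])
    return [
        sum(w * p[i] for w, p in zip(reliabilities, first_solutions))
        for i in range(num_options)
    ]


# Assumed example with M = 2 base algorithms and K = 2 options.
p_t_j = [[0.5, 0.5], [0.25, 0.75]]
w_t = [0.5, 0.5]
p_t = combine_first_solutions(p_t_j, w_t)
print(p_t)  # [0.375, 0.625]
```

When the reliabilities are non-negative and sum to 1, and each p_{t,j} is a probability distribution, the combined p_t is again a probability distribution.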

As described above, the second derivation unit 122 derives the second decision-making result p_t in accordance with the first decision-making results (p_{t,j}) derived by the respective first derivation units 121-j and the reliability (w_t(j)) of each of the first derivation units 121-j. Here, each first decision-making result is derived with reference to the output value (m_t) provided by each expert (model), as described above. Therefore, according to the above configuration, a suitable decision-making result can be derived through hierarchical processing by the first derivation units 121 and the second derivation unit 122 with reference to the output value provided by each expert (model).
Next, as shown in "9:" of Algorithm 2, the second derivation unit 122 identifies an option i_t according to the second optimal solution p_t. More specifically, the second derivation unit 122 selects the option i_t with a probability according to the probability distribution indicated by the second optimal solution p_t. Here, i satisfies
Figure JPOXMLDOC01-appb-M000025
and i_t denotes that i at step t. In addition, N^out(i) in that formula is the set defined by
Figure JPOXMLDOC01-appb-M000026
In other words, N^out(i) is, for the vertex V(i) with vertex number i, the set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from V(i).
Next, as shown in "10:" of Algorithm 2, the option i_t selected according to the derived second optimal solution (second decision-making result) p_t is executed, as an example, by the execution unit 21 of the terminal device 2A. Then, the loss value l_t corresponding to the second optimal solution p_t (in other words, the loss value l_t corresponding to the selected option i_t) is acquired by the loss value acquisition unit 22 of the terminal device 2A and provided to the information processing device 1A.
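Selecting an option according to the distribution p_t and determining which options then have observable losses can be sketched in Python as follows. The function names, the edge-list representation of the graph, and the example values are assumptions introduced for illustration.

```python
import random


def select_option(p_t, rng):
    """Draw an option i_t in 1..K with probability p_t(i_t)."""
    options = list(range(1, len(p_t) + 1))
    return rng.choices(options, weights=p_t, k=1)[0]


def out_neighbors(i, edges):
    """Return N^out(i): end points j of edges (i, j) leaving vertex i, sorted."""
    return sorted({dst for src, dst in edges if src == i})


# Assumed example graph with K = 2: option 1 reveals the losses of options
# 1 and 2, whereas option 2 reveals only its own loss.
edges = [(1, 1), (1, 2), (2, 2)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

i_t = select_option([0.375, 0.625], rng)
print(i_t, out_neighbors(i_t, edges))
```

The set returned by `out_neighbors(i_t, edges)` corresponds to N^out(i_t), that is, the options whose loss values become observable once i_t has been selected.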

As shown in "10:" of Algorithm 2, this execution is performed for all options i that satisfy
Figure JPOXMLDOC01-appb-M000027
Here, as described above, N^out(i_t) is, for the vertex V(i_t) with vertex number i_t, the set whose elements are the vertex numbers j of the vertices V(j) that are the end points of the edges going out from V(i_t), so the loss value l_t corresponding to each such option i is an observable loss value.
Next, as shown in "11:" of Algorithm 2, the second derivation unit 122 refers to the loss value l_t acquired by observation in "10:" of Algorithm 2 and the output value m_t of the expert (model), and derives the hatted loss value (^l_t) by
Figure JPOXMLDOC01-appb-M000028
Here, P_t(i) is calculated, using the first optimal solution p_t(j) derived by each of the first derivation processes (base algorithms), as
Figure JPOXMLDOC01-appb-M000029
In addition, in the first term on the right-hand side of (Equation 26),
Figure JPOXMLDOC01-appb-M000030
is a function that returns 1 if the condition in [ ] is satisfied, that is, if the index i is an element of the set N^out(i_t) (in other words, if the loss value l_t is obtained by observation), and returns 0 otherwise.
The hatted loss value (^l_t) derived by the second derivation unit 122 in this manner satisfies
Figure JPOXMLDOC01-appb-M000031
(where E_t[ ] denotes the expected value). That is, the second derivation unit 122 derives the hatted loss value (^l_t) as an unbiased estimate. In this way, the second derivation unit 122 can suitably derive the hatted loss value (^l_t) whether or not the loss value l_t is obtained by observation, and can therefore suitably perform decision-making (derivation of the optimal solution) in either case. In other words, the information processing system 100A according to this exemplary embodiment can perform decision-making processing that is flexibly applicable to various environments (various problem settings, i.e., various feedback graphs). Note that, in this specification, both the loss value l_t obtained by observation and the hatted loss value (^l_t) derived by the second derivation unit 122 may be referred to simply as the loss value.
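The unbiasedness property stated above can be checked numerically with a small Python sketch. The concrete form of the estimator used below is an assumption, since the formula images are not reproduced in this text: the sketch uses the common importance-weighted form ^l_t(i) = (l_t(i) − m_t(i)) / P_t(i) · 1[i ∈ N^out(i_t)] + m_t(i), with P_t(i) assumed to be the sum of p_t(j) over j ∈ N^in(i). It may differ from the actual (Equation 26). All names and numerical values are hypothetical.

```python
def observation_probability(p_t, n_in, i):
    """Assumed form of P_t(i): probability that the loss of option i is observed."""
    return sum(p_t[j] for j in n_in[i])


def loss_estimate(l_t, m_t, p_t, n_in, i_t):
    """Assumed importance-weighted estimate ^l_t(i) for every option i.

    l_t[i] is read only when option i is observed, i.e. when i_t is in N^in(i),
    which is the same condition as i being in N^out(i_t).
    """
    estimate = {}
    for i in p_t:
        observed = i_t in n_in[i]
        prob = observation_probability(p_t, n_in, i)
        correction = (l_t[i] - m_t[i]) / prob if observed else 0.0
        estimate[i] = correction + m_t[i]
    return estimate


def expected_estimate(l_t, m_t, p_t, n_in):
    """Exact expectation of the estimate over the random choice of i_t."""
    expectation = {i: 0.0 for i in p_t}
    for i_t, prob in p_t.items():
        estimate = loss_estimate(l_t, m_t, p_t, n_in, i_t)
        for i in expectation:
            expectation[i] += prob * estimate[i]
    return expectation


# Assumed example with K = 3; n_in[i] lists the vertices with an edge into i.
n_in = {1: {1, 3}, 2: {1, 2}, 3: {3}}
p_t = {1: 0.5, 2: 0.25, 3: 0.25}
l_t = {1: 1.0, 2: 0.0, 3: 0.5}
m_t = {1: 0.25, 2: 0.5, 3: 0.0}
print(expected_estimate(l_t, m_t, p_t, n_in))
```

Under these assumptions the expectation equals l_t for every option i with P_t(i) > 0, regardless of the expert output m_t, which is the sense in which the estimate is unbiased.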

Next, as shown in "12:" of Algorithm 2, the second derivation unit 122 supplies the loss value (^l_t) derived in "11:" of Algorithm 2 and the correction parameter α_t to the base algorithm B_j. Here, as shown in "12:" of Algorithm 2, the correction parameter α_t is derived, as an example, by
Figure JPOXMLDOC01-appb-M000032
In other words, the second derivation unit 122 derives the correction parameter α_t as the inner product of the loss value (^l_t) and the optimal solution (decision-making result) p_t, multiplied by minus 1. The loss value (^l_t) provided in this step corresponds, as an example, to the l_t acquired in "6:" of Algorithm 1. Also, the α_t(j) provided in this step corresponds to the correction parameter α_t acquired in "7:" of Algorithm 1.

Next, as shown in "13:" of Algorithm 2, the second derivation unit 122 updates the weight factor w'_t in round t to the weight factor w'_{t+1} in round t+1. More specifically, the second derivation unit 122 derives the weight factor w'_{t+1} in round t+1 by
Figure JPOXMLDOC01-appb-M000033
Here, g_t is defined by
Figure JPOXMLDOC01-appb-M000034
and b_t is defined by
Figure JPOXMLDOC01-appb-M000035

Note that the parameter (weight factor) w'_t referred to in order to derive the reliability w_t can also be regarded as a parameter representing the reliability of each base algorithm (the reliability of each first optimal solution). However, this interpretation does not limit the present exemplary embodiment.

As described above, in this exemplary embodiment, output values obtained from each of one or more models (experts) are acquired, and the following are executed: a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values; and a second derivation process, which derives a second optimal solution in accordance with the first optimal solution derived by each of the plurality of first derivation processes and the reliability of each of the plurality of first derivation processes. In other words, the information processing device 1 according to this exemplary embodiment derives the optimal solution (makes a decision) through hierarchical processing using reliability. Therefore, the information processing device 1 according to this exemplary embodiment can derive a more appropriate decision-making result (optimal solution).

The reliability may also be configured to be initialized at a predetermined timing. Such a configuration makes it possible to respond suitably to changes in the environment.
Also, as described above, the information processing system 100A according to this exemplary embodiment is applicable to any feedback graph G(V, E). In other words, the information processing system 100A according to this exemplary embodiment is applicable whether or not the loss value can be observed. More specifically, as described above, in this exemplary embodiment the loss value (^l_t) may be derived by estimation (more specifically, as an unbiased estimate), which makes the information processing system 100A suitably applicable to any feedback graph G(V, E). In other words, the information processing system 100A can handle any feedback graph G(V, E) and can also respond suitably to changes in the environment.

(More specific effects of the information processing system 100A)
More specific effects of the information processing system 100A will be described below with reference to FIG. 9. FIG. 9 shows an example in which the information processing system 100A is applied to a decision-making problem regarding the shipping of apples. In this problem setting, there are two options, namely shipping the apple and not shipping the apple (corresponding to K = 2 in Algorithm 1 described above), and the value of the parameter T was set to T = 200. In other words, the total number of decisions was set to 200. In addition, each loss was set as follows:
  Loss when the apple is tasty: 1 if not shipped, 0 if shipped
  Loss when the apple is not tasty: 0 if not shipped, 1 if shipped

In addition, as the environmental change, the probability that an apple is tasty was set to
  an average of 0.9 in the first 100 rounds, and
  an average of 0.1 in the latter 100 rounds.
In addition, the information processing system 100A according to this exemplary embodiment was configured to initialize the reliability (initialize the algorithm) every 50 rounds. The output value (m_t) of the expert (model) was always set to 0.
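The problem setting above can be reproduced with a small Python sketch of the environment. Only the environment (the loss table and the change in the probability that an apple is tasty) is sketched here; the decision-making algorithm itself and the results of FIG. 9 are not reproduced. The function names, the option labels, the fixed random seed, and the interpretation of "average 0.9" as an independent Bernoulli draw per round are assumptions.

```python
import random


def tasty_probability(round_index):
    """Probability that the apple is tasty in round 1..200 (assumed Bernoulli)."""
    return 0.9 if round_index <= 100 else 0.1


def loss(tasty, ship):
    """Loss table of the problem setting: a wrong decision costs 1, a right one 0."""
    if tasty:
        return 0 if ship else 1
    return 1 if ship else 0


def loss_vector(tasty):
    """Losses of both options (K = 2) for one round."""
    return {"ship": loss(tasty, True), "do_not_ship": loss(tasty, False)}


def simulate_environment(total_rounds=200, seed=0):
    """Generate one loss vector per round under the changing environment."""
    rng = random.Random(seed)
    return [
        loss_vector(rng.random() < tasty_probability(t))
        for t in range(1, total_rounds + 1)
    ]


losses = simulate_environment()
first_half = sum(v["ship"] for v in losses[:100]) / 100
second_half = sum(v["ship"] for v in losses[100:]) / 100
print(first_half, second_half)
```

The two printed values are the average losses incurred by always shipping in each half; they illustrate why a fixed decision rule that suits the first 100 rounds performs poorly after the change at round 100.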

For the above problem setting, FIG. 9 shows the transition of the loss value obtained with the information processing system 100A according to this exemplary embodiment and the transition of the loss value obtained with the configuration according to a comparative example. As shown in FIG. 9, the loss value obtained with the information processing system 100A is kept markedly smaller than that obtained with the comparative example, and the system adapts quickly to the environmental change that occurs at the 100th round. It can also be seen that the loss value converges quickly after each initialization of the reliability, which is performed every 50 rounds. Thus, the information processing system 100A according to this exemplary embodiment is markedly more adaptable to environmental changes than the configuration according to the comparative example.

[Application example]
A more specific application example of the information processing system 100A according to this exemplary embodiment will be described below. FIG. 10 is a diagram schematically showing processing performed by the information processing device 1 according to this example.

As shown in FIG. 8, the information processing device 1 according to this example makes a decision regarding the matching of a plurality of medical workers with a plurality of hospitals (derives a decision-making result). Here, as described above, the information processing device 1 according to this example acquires, in a given round, the output values associated with each expert (model) for a plurality of experts, and derives the decision-making result by referring to the acquired output values.

The specific configuration of the information processing device 1 according to this example does not limit this application example; it may be the same as that of the information processing device 1 described in exemplary embodiment 1, or the same as that of the information processing device 1A described in exemplary embodiment 2.

The decision-making process performed by the information processing device 1 according to this example can be, as an example, a hierarchical decision-making process using the plurality of first derivation units 121-1, 121-2, ... and the second derivation unit 122, as described for the information processing device 1A, but this does not limit this application example.

Then, when the derived decision-making result is executed, the loss value corresponding to the next round is acquired either by observation or by derivation. This loss value and the output value corresponding to it are provided to the information processing device 1 according to this example and are referred to in the decision-making process in the next round.

 Examples of the inputs to each expert in this example, and of the output values of each expert, are given below. As explained in the first exemplary embodiment, the information processing system of this example may be configured to include the "experts (models)", or may be configured to obtain predicted values from external "experts (models)".

 (Example of inputs to the experts)
・Features of each hospital and each medical worker observed at time t (round t)
・Number of patients who visited each hospital in the previous round (round t-1)
 (Example of expert output)
・The hospital to which each medical worker is assigned at time t (round t)
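 One possible way to represent these inputs and outputs in code is sketched below. The type names `RoundInput`, `Assignment`, and `Expert`, the field names, and the toy expert `busiest_hospital_expert` are hypothetical and serve only to make the data flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class RoundInput:
    """Inputs available to an expert at round t (field names are assumptions)."""
    hospital_features: Dict[str, List[float]]  # features per hospital at round t
    worker_features: Dict[str, List[float]]    # features per medical worker at round t
    previous_visits: Dict[str, int]            # patient visits per hospital at round t-1

# Expert output: the hospital assigned to each medical worker at round t.
Assignment = Dict[str, str]
Expert = Callable[[RoundInput], Assignment]

def busiest_hospital_expert(x: RoundInput) -> Assignment:
    """Toy expert: send every worker to the hospital with the most visits last round."""
    busiest = max(x.previous_visits, key=x.previous_visits.get)
    return {worker: busiest for worker in x.worker_features}

example_input = RoundInput(
    hospital_features={"H1": [0.2], "H2": [0.8]},
    worker_features={"W1": [1.0], "W2": [0.5]},
    previous_visits={"H1": 120, "H2": 80},
)
example_output = busiest_hospital_expert(example_input)
```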

 (Processing flow of this example)
 An example of the processing performed by the information processing system 100 of this example is described below.

 First, a person in charge of data entry at each hospital inputs information such as diagnosis status, hospital room availability, medical departments, and consultation hours (also referred to as hospital data) into the information processing system 100 via the terminal device 2A or the like. As one example, each piece of input information is stored in the memory unit 15A and referenced by the control unit 10A. The lower part of FIG. 11 shows an example of the hospital data managed by the information processing system 100 of this example.

 Next, each medical worker inputs their own data, such as specialty, years of service, and preferred hospital (also referred to as medical worker data), into the information processing system 100. As one example, each piece of input information is stored in the memory unit 15A and referenced by the control unit 10A. The upper part of FIG. 11 shows an example of the medical worker data managed by the information processing system 100 of this example.

 Next, the information processing device 1 of this example refers to the hospital data and the medical worker data to derive a decision-making result regarding the optimal matching of hospitals and medical workers. As one example, each of the multiple experts (models) of this example calculates an output value by referring to the hospital data and the medical worker data. The information processing device 1 of this example then refers to these output values to derive the decision-making result regarding the optimal matching of hospitals and medical workers.

 The information processing system 100 of this example then proposes optimal hospital candidates to each medical worker via the terminal device 2A or the like. As one example, the information processing system 100 presents the optimal hospital candidates to the medical worker on a display panel or the like provided in the terminal device 2A. The information processing system 100 of this example also registers work-related information for each medical worker.

 As one example, the information processing system 100 of this example records, for each hospital, the number of patients who visited in each month (each round). The information processing system 100 then determines the assignments for the next round based on a loss value that depends on the number of visiting patients. The specific form of the loss value does not limit this example; as one example, a loss value that reflects the degree of congestion at each hospital can be used.

 More specifically, the loss value can be calculated as
  (actual number of patients who visited each hospital) - (number of medical workers assigned to each hospital × number of patients each medical worker can examine).
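 The congestion-based loss described above can be written out as a short Python sketch. The function name `congestion_loss`, the hospital identifiers, and the numbers are illustrative assumptions; a single per-worker capacity is assumed for all hospitals.

```python
from typing import Dict

def congestion_loss(
    actual_visits: Dict[str, int],
    assigned_workers: Dict[str, int],
    patients_per_worker: int,
) -> Dict[str, int]:
    """Per-hospital loss: actual visits minus the capacity of the assigned workers."""
    return {
        hospital: visits - assigned_workers.get(hospital, 0) * patients_per_worker
        for hospital, visits in actual_visits.items()
    }

# Hypothetical round: H1 is understaffed (positive loss), H2 is overstaffed (negative loss).
losses = congestion_loss(
    actual_visits={"H1": 120, "H2": 80},
    assigned_workers={"H1": 3, "H2": 5},
    patients_per_worker=20,
)
```

 With these numbers the loss is 120 - 3 × 20 = 60 for H1 and 80 - 5 × 20 = -20 for H2, so a positive value indicates congestion that the assignment did not cover.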

 However, some hospitals may not provide their visitor counts, for example out of consideration for privacy. In such cases, the loss value for that hospital (that is, for that decision-making result) is not observable.

 As described above, the information processing system 100 of this example can make suitable decisions whether or not the loss value is observable, and can therefore suitably match hospitals with medical workers even in cases such as the above. In addition, hospital data and medical worker data may change over time. The information processing system 100 of this example can suitably match hospitals with medical workers even when such environmental changes occur.
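 One standard way to cope with losses that are only sometimes observed is importance weighting, sketched below in Python. It is shown here purely as an illustration and is not confirmed to be the estimator used in the embodiments; the function name `importance_weighted_loss` and the assumption that the observation probability is known are introduced for this sketch.

```python
import random

def importance_weighted_loss(loss, observed, observe_prob):
    """Return loss / observe_prob when the loss is observed and 0.0 otherwise.

    If the loss is observed with probability observe_prob, the expectation of
    this estimate equals the true loss, so the estimate is unbiased.
    """
    if not observed:
        return 0.0
    return loss / observe_prob

# Simulation with hypothetical values: the average estimate approaches the true loss.
random.seed(0)
true_loss, observe_prob = 0.6, 0.25
num_rounds = 200_000
total = 0.0
for _ in range(num_rounds):
    observed = random.random() < observe_prob
    total += importance_weighted_loss(true_loss if observed else None, observed, observe_prob)
mean_estimate = total / num_rounds
```

 The price of unbiasedness in this sketch is variance: the less often a hospital reports its visitor count, the larger each individual estimate becomes.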

 [Software implementation example]
 The control blocks of the information processing devices 1 and 1A and the terminal devices 2 and 2A (in particular the acquisition unit 11 and the derivation unit 12) may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software.

 In the latter case, the information processing devices 1 and 1A and the terminal devices 2 and 2A each include a computer that executes the instructions of a program, which is software that realizes each function. This computer includes, for example, at least one processor (control device) and at least one computer-readable recording medium storing the program. The object of the present invention is achieved when the processor of the computer reads the program from the recording medium and executes it. The processor can be, for example, a CPU (Central Processing Unit). The recording medium can be a "non-transitory tangible medium", for example a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast waves). Note that one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

 [Appendix A]
 This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

 (Appendix A1)
 An information processing device comprising:
  acquisition means for acquiring output values obtained from each of one or more models; and
  derivation means for executing
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 (Appendix A2)
 The information processing device according to Appendix A1, wherein the derivation means
  derives the reliability by referring to at least one of the output values and the first optimal solution, and
  initializes the reliability at a predetermined timing.

 (Appendix A3)
 The information processing device according to Appendix A1 or A2, wherein the derivation means
  estimates a loss value corresponding to the first optimal solution, and
  updates a parameter for deriving the first optimal solution by referring to the loss value.

 (Appendix A4)
 The information processing device according to Appendix A3, wherein the derivation means calculates the loss value as an unbiased estimator.

 (Appendix A5)
 The information processing device according to Appendix A3 or A4, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

 (Appendix A6)
 The information processing device according to any one of Appendices A1 to A5, wherein the derivation means sets a learning rate, which is referred to in deriving the reliability, to a value proportional to the -1/2 power of the total number of the first derivation processes.

 (Appendix A7)
 The information processing device according to any one of Appendices A1 to A5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

 (Appendix A8)
 The information processing device according to any one of Appendices A1 to A7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.
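
 Two of the quantities named in these appendices, a learning rate proportional to the -1/2 power of the number of first derivation processes and a Bregman divergence induced by a logarithmic barrier, can be illustrated with the following Python sketch. The function names, the proportionality constant `scale`, and the choice of a pure log-barrier function (with no additional terms) are assumptions made for illustration only.

```python
import math

def reliability_learning_rate(num_first_processes, scale=1.0):
    """Learning rate proportional to N ** (-1/2); `scale` is an assumed constant."""
    return scale / math.sqrt(num_first_processes)

def log_barrier(x):
    """Convex log-barrier function psi(x) = -sum_i log(x_i), for x_i > 0."""
    return -sum(math.log(xi) for xi in x)

def bregman_log_barrier(x, y):
    """Bregman divergence of psi: sum_i (x_i / y_i - log(x_i / y_i) - 1)."""
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))

def bregman_from_definition(x, y):
    """Same divergence from the definition psi(x) - psi(y) - <grad psi(y), x - y>."""
    inner = sum((-1.0 / yi) * (xi - yi) for xi, yi in zip(x, y))
    return log_barrier(x) - log_barrier(y) - inner

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
divergence = bregman_log_barrier(p, q)
```

 The divergence is zero when both arguments coincide and grows rapidly as a coordinate of the second argument approaches zero, which is the usual motivation for a barrier term.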

 [Appendix B]
 This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

 (Appendix B1)
 An information processing method in which at least one processor performs:
  an acquisition process of acquiring output values obtained from each of one or more models; and
  a derivation process including
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in the acquisition process, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 (Appendix B2)
 The information processing method according to Appendix B1, wherein, in the derivation process, the at least one processor
  derives the reliability by referring to at least one of the output values and the first optimal solution, and
  initializes the reliability at a predetermined timing.

 (Appendix B3)
 The information processing method according to Appendix B1 or B2, wherein, in the derivation process, the at least one processor
  estimates a loss value corresponding to the first optimal solution, and
  updates a parameter for deriving the first optimal solution by referring to the loss value.

 (Appendix B4)
 The information processing method according to Appendix B3, wherein, in the derivation process, the at least one processor calculates the loss value as an unbiased estimator.

 (Appendix B5)
 The information processing method according to Appendix B3 or B4, wherein, in the derivation process, the at least one processor updates the first optimal solution by further referring to a parameter for correction.

 (Appendix B6)
 The information processing method according to any one of Appendices B1 to B5, wherein, in the derivation process, the at least one processor sets a learning rate, which is referred to in deriving the reliability, to a value proportional to the -1/2 power of the total number of the first derivation processes.

 (Appendix B7)
 The information processing method according to any one of Appendices B1 to B5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

 (Appendix B8)
 The information processing method according to any one of Appendices B1 to B7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.

 [Appendix C]
 This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

 (Appendix C1)
 An information processing program for causing a computer to function as an information processing device, the program causing the computer to function as:
  acquisition means for acquiring output values obtained from each of one or more models; and
  derivation means for executing
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 (Appendix C2)
 The information processing program according to Appendix C1, wherein the derivation means
  derives the reliability by referring to at least one of the output values and the first optimal solution, and
  initializes the reliability at a predetermined timing.

 (Appendix C3)
 The information processing program according to Appendix C1 or C2, wherein the derivation means
  estimates a loss value corresponding to the first optimal solution, and
  updates a parameter for deriving the first optimal solution by referring to the loss value.

 (Appendix C4)
 The information processing program according to Appendix C3, wherein the derivation means calculates the loss value as an unbiased estimator.

 (Appendix C5)
 The information processing program according to Appendix C3 or C4, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

 (Appendix C6)
 The information processing program according to any one of Appendices C1 to C5, wherein the derivation means sets a learning rate, which is referred to in deriving the reliability, to a value proportional to the -1/2 power of the total number of the first derivation processes.

 (Appendix C7)
 The information processing program according to any one of Appendices C1 to C5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

 (Appendix C8)
 The information processing program according to any one of Appendices C1 to C7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.

 [Appendix D]
 This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

 (Appendix D1)
 An information processing device comprising at least one processor, the at least one processor executing:
  an acquisition process of acquiring output values obtained from each of one or more models; and
  a derivation process including
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in the acquisition process, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 The information processing device may further include a memory. The memory may store a program for causing the at least one processor to execute each of the above processes.

 (Appendix D2)
 The information processing device according to Appendix D1, wherein, in the derivation process, the at least one processor
  derives the reliability by referring to at least one of the output values and the first optimal solution, and
  initializes the reliability at a predetermined timing.

 (Appendix D3)
 The information processing device according to Appendix D1 or D2, wherein, in the derivation process, the at least one processor
  estimates a loss value corresponding to the first optimal solution, and
  updates a parameter for deriving the first optimal solution by referring to the loss value.

 (Appendix D4)
 The information processing device according to Appendix D3, wherein, in the derivation process, the at least one processor calculates the loss value as an unbiased estimator.

 (Appendix D5)
 The information processing device according to Appendix D3 or D4, wherein, in the derivation process, the at least one processor updates the first optimal solution by further referring to a parameter for correction.

 (Appendix D6)
 The information processing device according to any one of Appendices D1 to D5, wherein, in the derivation process, the at least one processor sets a learning rate, which is referred to in deriving the reliability, to a value proportional to the -1/2 power of the total number of the first derivation processes.

 (Appendix D7)
 The information processing device according to any one of Appendices D1 to D5, wherein the Bregman divergence referred to in the derivation process to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

 (Appendix D8)
 The information processing device according to any one of Appendices D1 to D7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.

 [Appendix E]
 This disclosure includes the techniques described in the following appendices. However, the present invention is not limited to the techniques described in the following appendices, and various modifications are possible within the scope of the claims.

 (Appendix E1)
 A non-transitory recording medium on which is recorded an information processing program for causing a computer to function as an information processing device, the program causing the computer to execute:
  an acquisition process of acquiring output values obtained from each of one or more models; and
  a derivation process including
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired in the acquisition process, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

 Reference Signs List
 1, 1A: Information processing device
 100, 100A: Information processing system
 10A: Control unit
 11: Acquisition unit
 12: Derivation unit
 121: First derivation unit
 122: Second derivation unit
 S1, S100: Information processing method

Claims (11)

 1. An information processing device comprising:
  acquisition means for acquiring output values obtained from each of one or more models; and
  derivation means for executing
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 2. The information processing device according to claim 1, wherein the derivation means
  derives the reliability by referring to at least one of the output values and the first optimal solution, and
  initializes the reliability at a predetermined timing.

 3. The information processing device according to claim 1 or 2, wherein the derivation means
  estimates a loss value corresponding to the first optimal solution, and
  updates a parameter for deriving the first optimal solution by referring to the loss value.

 4. The information processing device according to claim 3, wherein the derivation means calculates the loss value as an unbiased estimator.

 5. The information processing device according to claim 3 or 4, wherein the derivation means updates the first optimal solution by further referring to a parameter for correction.

 6. The information processing device according to any one of claims 1 to 5, wherein the derivation means sets a learning rate, which is referred to in deriving the reliability, to a value proportional to the -1/2 power of the total number of the first derivation processes.

 7. The information processing device according to any one of claims 1 to 5, wherein the Bregman divergence referred to by the derivation means to derive the first optimal solution is defined by a convex function including a logarithmic barrier term.

 8. The information processing device according to any one of claims 1 to 7, wherein the first derivation process and the second derivation process are online machine learning processes that refer to the output values acquired sequentially.

 9. An information processing method in which an information processing device
  acquires output values obtained from each of one or more models, and
  executes
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 10. A program for causing a computer to function as an information processing device, the program causing the computer to
  acquire output values obtained from each of one or more models, and
  execute
   a plurality of first derivation processes, each of which derives a first optimal solution by referring to the acquired output values, and
   a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes.

 11. An information processing system including an information processing device and a terminal device, wherein
  the information processing device comprises:
   acquisition means for acquiring output values obtained from each of one or more models; and
   derivation means for executing
    a plurality of first derivation processes, each of which derives a first optimal solution by referring to the output values acquired by the acquisition means, and
    a second derivation process that derives a second optimal solution according to the first optimal solution derived by each of the plurality of first derivation processes and a reliability of each of the plurality of first derivation processes, and
  the terminal device comprises execution means for executing the second optimal solution derived by the information processing device.
PCT/JP2023/044465 2023-12-12 2023-12-12 Information processing device, information processing method, information processing system, and program Pending WO2025126326A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/044465 WO2025126326A1 (en) 2023-12-12 2023-12-12 Information processing device, information processing method, information processing system, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/044465 WO2025126326A1 (en) 2023-12-12 2023-12-12 Information processing device, information processing method, information processing system, and program

Publications (1)

Publication Number Publication Date
WO2025126326A1 true WO2025126326A1 (en) 2025-06-19

Family

ID=96056881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/044465 Pending WO2025126326A1 (en) 2023-12-12 2023-12-12 Information processing device, information processing method, information processing system, and program

Country Status (1)

Country Link
WO (1) WO2025126326A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014201770A (en) * 2013-04-02 2014-10-27 株式会社神戸製鋼所 Estimation apparatus and estimation method of prediction model of converter
JP2018139050A (en) * 2017-02-24 2018-09-06 株式会社神戸製鋼所 Index presentation system and index presentation method
WO2019220479A1 (en) * 2018-05-14 2019-11-21 日本電気株式会社 Measure determination system, measure determination method, and measure determination program
JP2023523560A (en) * 2020-04-13 2023-06-06 カリベル・ラブズ・インコーポレーテッド Systems and methods for AI-assisted surgery

Similar Documents

Publication Publication Date Title
US11461515B2 (en) Optimization apparatus, simulation system and optimization method for semiconductor design
US20210142200A1 (en) Probabilistic decision making system and methods of use
US11222262B2 (en) Non-Markovian control with gated end-to-end memory policy networks
US20230316141A1 (en) Systems and methods for weighted federated learning in a hybrid operating room environment
CN112055878B (en) Adjusting a machine learning model based on the second set of training data
US20230368040A1 (en) Online probabilistic inverse optimization system, online probabilistic inverse optimization method, and online probabilistic inverse optimization program
US20150112891A1 (en) Information processor, information processing method, and program
US11074529B2 (en) Predicting event types and time intervals for projects
US11501207B2 (en) Lifelong learning with a changing action set
US11593606B1 (en) System, server and method for predicting adverse events
US8635175B2 (en) Managing capacities and structures in stochastic networks
US20220165417A1 (en) Population-level gaussian processes for clinical time series forecasting
CN114730382A (en) Constrained training of artificial neural networks using labeled medical data with mixed quality
US20220328160A1 (en) Rehabilitation work support apparatus, rehabilitation work support system, rehabilitation work support method, and computer readable medium
WO2025126326A1 (en) Information processing device, information processing method, information processing system, and program
US20210326705A1 (en) Learning device, learning method, and learning program
US12105770B2 (en) Estimation device, estimation method, and computer program product
WO2024228243A1 (en) Information processing device, information processing method, information processing system, and program
KR20240009883A (en) Method, program, and apparatus for training of neural network model based on electrocardiogram
WO2024180789A1 (en) Information processing device, information processing method, program
Hulshof et al. Tactical planning in healthcare using approximate dynamic programming
Song Sequential bayesian risk set inference for robust discrete optimization via simulation
KR102881156B1 (en) Method, program, and apparatus for prediction of health status using electrocardiogram segment
Duan et al. Two-stage appointment scheduling considering patient foldback under a stochastic approximation approach
JP7687518B2 (en) estimation device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23961393

Country of ref document: EP

Kind code of ref document: A1