
WO2025163857A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2025163857A1
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
model
uncertainty
samples
control input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/003301
Other languages
English (en)
Japanese (ja)
Inventor
大地 平野
凛 高野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to PCT/JP2024/003301 priority Critical patent/WO2025163857A1/fr
Publication of WO2025163857A1 publication Critical patent/WO2025163857A1/fr

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators

Definitions

  • This disclosure relates to an information processing device, an information processing method, and a program.
  • Patent Document 1 discloses a technique for performing sampling-based optimal control using a dynamics model approximated by a neural network.
  • the learned dynamics model may contain errors relative to the actual dynamics, and when such a model is used to calculate the optimal control input, those errors can grow large during execution. As a result, there is a risk that the control accuracy achieved using the dynamics model of the controlled object will decrease.
  • the purpose of this disclosure is to solve the above-mentioned problem, namely the risk of reduced control accuracy when performing control using a dynamics model of the controlled object.
  • An information processing device includes: an acquisition unit that acquires a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model that represents the uncertainty of the dynamics model; a generation unit that generates samples of a control input for the controlled object using the uncertainty model; and a calculation unit that calculates an optimal control input from the samples using the dynamics model.
  • an information processing method includes: acquiring a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model representing the uncertainty of the dynamics model; generating samples of control inputs to the controlled object using the uncertainty model; and calculating an optimal control input from the samples using the dynamics model.
  • a program causes a computer to perform the following processing: acquiring a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model representing the uncertainty of the dynamics model; generating samples of control inputs to the controlled object using the uncertainty model; and calculating an optimal control input from the samples using the dynamics model.
  • the present disclosure can improve control accuracy when performing control using a dynamics model of the controlled object.
  • FIG. 1 is a block diagram showing the overall configuration of a control system according to the present disclosure.
  • FIG. 2 is a block diagram showing a hardware configuration of a learning device according to the present disclosure.
  • FIG. 3 is a block diagram showing a hardware configuration of a control device according to the present disclosure.
  • FIG. 4 is a block diagram illustrating a configuration of a learning device according to the present disclosure.
  • FIG. 5 is a block diagram illustrating a configuration of a control device according to the present disclosure.
  • FIG. 6 is a diagram illustrating a process performed by a control device according to the present disclosure.
  • FIG. 7 is a diagram illustrating a process performed by a control device according to the present disclosure.
  • FIG. 8 is a flowchart illustrating a processing operation of the learning device according to the present disclosure.
  • FIG. 9 is a flowchart illustrating a processing operation of a control device according to the present disclosure.
  • FIG. 10 is a block diagram showing a hardware configuration of an information processing device according to the present disclosure.
  • FIG. 11 is a block diagram illustrating a configuration of an information processing device according to the present disclosure.
  • Control target: The control system according to the present disclosure can be used, for example, when a robot arm, which is the controlled object, moves a manipulated object to a desired position, for example by pushing it.
  • in this case, the controlled object is the robot arm and the object.
  • the control input to the robot arm, which is the controlled object, can be, for example, the position of the robot arm's hand.
  • the state of the control target can be, for example, the position and velocity of the robot arm's hand.
  • the state of the control target may also be the position and velocity of an object manipulated by the robot arm's hand.
  • in the following, the controlled object of the control system is described as a robot arm, but the controlled object is not limited to a robot arm and may be anything.
  • Fig. 1 is a diagram showing an example of the configuration of a control system according to this embodiment.
  • the control system 5 includes a learning device 1, a storage device 2, a control device 3, and a control target 4.
  • the learning device 1 performs data communication with the storage device 2 via a communication network or by direct wireless or wired communication.
  • the control device 3 also performs data communication with the storage device 2 and the control target 4 via a communication network or by direct wireless or wired communication.
  • the controlled object 4 performs an operation related to the control target based on the control input provided by the control device 3.
  • the controlled object 4 also supplies a state signal representing the state of the controlled object 4 to the control device 3.
  • the controlled object 4 is, for example, a robot that is the subject of optimal control, such as a robot arm that behaves autonomously as described above.
  • the state signal representing the state of the controlled object is, for example, the output signal of various sensors that detect the position and orientation of the robot.
  • in this embodiment, the state signal is, in particular, the position of the hand of the robot arm that is the controlled object 4.
  • the controlled object 4 may also include an object that is manipulated by the robot arm described above.
  • the state signal may be a detection signal of the position, speed, etc. of the object manipulated by the controlled object 4.
  • the state signal may be a detection signal of the position of an object detected from an image of the object captured by a camera attached to the robot arm.
  • the control device 3 receives the current state of the controlled object 4 as input, calculates a control input based on the control target, and outputs the calculated control input to the controlled object 4.
  • the control input is calculated using a sampling-based optimal control method using an approximate dynamics model and an uncertainty model stored in the storage device 2.
  • sampling-based optimal control methods include Model Predictive Path Integral control and control methods based on the Cross-Entropy Method.
  • the learning device 1 learns an approximate dynamics model of the control target 4 from previously provided learning data, for example, by machine learning using a neural network.
  • the learning device 1 also learns an uncertainty model of the learned approximate dynamics model.
  • the learning device 1 then registers the learned dynamics model and uncertainty model in the storage device 2.
  • the above-mentioned approximate dynamics model is a machine learning model that receives as input the state of the controlled object at the current time and a control input, and outputs the state of the controlled object at the next time. The learning data are data on the state of the controlled object, the control input, and the subsequent state of the controlled object, collected using the controlled object or a simulator of it, for use in learning this approximate dynamics model.
  • the above-mentioned uncertainty model is learned by machine learning and captures both the error of the approximate dynamics model and the stochastic uncertainty of the dynamics of the controlled object. The uncertainty model may also be learned with the same model as the approximate dynamics, for example using a Bayesian neural network model or an ensemble model. A minimal end-to-end sketch of this learning flow is given below.
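  • To make the flow above concrete, the following is a minimal sketch, not taken from the patent: a toy pendulum simulator stands in for the controlled object, transition tuples {x_t, u_t, x_(t+1)} are collected, and a linear least-squares fit stands in for the neural-network dynamics model F. All names and the dynamics are illustrative assumptions.

```python
# Hedged sketch of the data-collection and dynamics-fitting step.
import numpy as np

def true_step(x, u, dt=0.05):
    # Toy "real" dynamics (a damped pendulum), used only to collect
    # {x_t, u_t, x_(t+1)} transition tuples.
    theta, omega = x
    domega = -9.8 * np.sin(theta) - 0.1 * omega + u
    return np.array([theta + dt * omega, omega + dt * domega])

rng = np.random.default_rng(0)
X, U, Y = [], [], []
x = np.array([0.1, 0.0])
for _ in range(2000):
    u = rng.uniform(-2.0, 2.0)
    x_next = true_step(x, u)
    X.append(x); U.append([u]); Y.append(x_next)
    x = x_next
X, U, Y = map(np.asarray, (X, U, Y))

# Approximate dynamics model F: least-squares fit of x_(t+1) on (x_t, u_t).
# The patent learns F with e.g. a neural network; a linear model is a
# stand-in that keeps the sketch short.
Z = np.hstack([X, U, np.ones((len(X), 1))])   # features [x_t, u_t, 1]
W, *_ = np.linalg.lstsq(Z, Y, rcond=None)

def F(x, u):
    return np.concatenate([x, np.atleast_1d(u), [1.0]]) @ W
```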
  • the storage device 2 stores the dynamics model and uncertainty model learned by the learning device 1.
  • the storage device 2 may be an external storage device such as a hard disk connected to or built into the learning device 1 or the control device 3, a storage medium such as flash memory, or a server device that communicates data with the learning device 1 and the control device 3.
  • the storage device 2 may also be composed of multiple storage devices, and may have each of the above-mentioned storage units distributed among them.
  • the control device 3 and the controlled object 4 may be configured as an integrated unit.
  • at least two of the learning device 1, storage device 2, and control device 3 may be configured as an integrated unit.
  • Fig. 2 is a diagram showing an example of the hardware configuration of the learning device 1.
  • the learning device 1 includes, as hardware, a processor 11, a memory 12, and an interface 13.
  • the processor 11, the memory 12, and the interface 13 are connected via a data bus 10.
  • Processor 11 functions as a controller (computing device) that controls the entire learning device 1 by executing programs stored in memory 12.
  • Processor 11 is, for example, a processor such as a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or TPU (Tensor Processing Unit).
  • Processor 11 may also be composed of multiple processors.
  • Processor 11 is an example of a computer.
  • Memory 12 is composed of various types of volatile and non-volatile memory, such as RAM (Random Access Memory), ROM (Read Only Memory), and flash memory. Memory 12 also stores programs for executing the processes performed by learning device 1. Some of the information stored in memory 12 may be stored in one or more external storage devices (e.g., storage device 2) capable of communicating with learning device 1, or in a storage medium that is detachable from learning device 1.
  • Interface 13 is an interface for electrically connecting learning device 1 to other devices. These interfaces may be wireless interfaces such as network adapters for wirelessly sending and receiving data with other devices, or hardware interfaces for connecting to other devices via cables or the like. For example, interface 13 may interface with input devices that accept user input (external input), such as touch panels, buttons, keyboards, and voice input devices, display devices such as displays and projectors, and sound output devices such as speakers.
  • the hardware configuration of the learning device 1 is not limited to the configuration shown in FIG. 2.
  • the learning device 1 may incorporate at least one of a display device, an input device, and a sound output device.
  • the learning device 1 may also be configured to include a storage device 2.
  • FIG. 3 is a diagram showing an example of the hardware configuration of the control device 3.
  • the control device 3 includes, as hardware, a processor 31, a memory 32, and an interface 33.
  • the processor 31, memory 32, and interface 33 are connected via a data bus 30.
  • the processor 31 functions as a controller (arithmetic unit) that performs overall control of the control device 3 by executing programs stored in the memory 32.
  • the processor 31 is, for example, a processor such as a CPU, GPU, or TPU.
  • the processor 31 may also be composed of multiple processors.
  • Memory 32 is composed of various types of volatile and non-volatile memory, such as RAM, ROM, and flash memory. Memory 32 also stores programs for executing the processes performed by control device 3. Some of the information stored in memory 32 may be stored in one or more external storage devices (e.g., storage device 2) that can communicate with control device 3, or in a storage medium that is detachable from control device 3.
  • Interface 33 is an interface for electrically connecting the control device 3 to other devices. These interfaces may be wireless interfaces such as network adapters for wirelessly transmitting and receiving data to other devices, or may be hardware interfaces for connecting to other devices via cables or the like.
  • control device 3 may incorporate at least one of a display device, an input device, and a sound output device.
  • the control device 3 may also be configured to include the storage device 2.
  • Fig. 4 is a diagram showing an example of the configuration of the learning device 1 that learns an approximate dynamics model and an uncertainty model.
  • the learning device 1 has the function of receiving input of learning data and learning an approximate dynamics model and an uncertainty model.
  • the learning device 1 includes a dynamics model learning unit 14 and an uncertainty model learning unit 15.
  • the functions of the dynamics model learning unit 14 and the uncertainty model learning unit 15 are realized when the equipped processor 11 executes a program stored in the memory 12.
  • the learning data is given by a set of tuples {x_t, u_t, x_(t+1)} of the state x_t and control input u_t at the current time t and the state x_(t+1) at the next time t+1.
  • this data may also be given as sequences of states and control inputs x_(0:T), u_(0:T-1).
  • here, x_(0:T) represents the sequence of states x from time 0 to time T.
  • the dynamics model learning unit 14 receives the above-mentioned learning data as input and learns an approximate dynamics model F as shown in the following equation (1):
    x_(t+1) = F(x_t, u_t) ... (1)
  • Various models can be used when learning the approximate dynamics model F.
  • a neural network may be used as the model, but this is not limitative.
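  • As one concrete and purely illustrative choice, F can be fit with a small multilayer perceptron; the following sketch reuses the X, U, Y arrays from the data-collection sketch above. The use of scikit-learn, the network size, and the training settings are assumptions, not the patent's prescription.

```python
# Hedged sketch: neural-network fit of the approximate dynamics model F
# of equation (1), mapping (x_t, u_t) to x_(t+1).
import numpy as np
from sklearn.neural_network import MLPRegressor

inputs = np.hstack([X, U])                      # (x_t, u_t) pairs
nn_model = MLPRegressor(hidden_layer_sizes=(64, 64),
                        max_iter=2000, random_state=0)
nn_model.fit(inputs, Y)                         # learn x_(t+1) = F(x_t, u_t)

def F_nn(x, u):
    z = np.concatenate([x, np.atleast_1d(u)])[None]
    return nn_model.predict(z)[0]               # predicted next state
```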
  • the uncertainty model learning unit 15 receives training data as input and learns a model that represents the uncertainty of the dynamics model.
  • the probability distribution p(x_t, u_t) of the training data {x_t, u_t}, learned using a generative model or the like, is used as the uncertainty model.
  • the reason for using the training data distribution as the uncertainty model of the trained dynamics model is that it is expected that the accuracy of the trained dynamics model will be high in areas where the training data density is high.
  • various generative models can be used when learning the probability distribution of training data.
  • flow-based models and energy-based models can be used, but are not limited to these.
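  • A hedged sketch of such an uncertainty model follows. A Gaussian mixture stands in for the flow-based or energy-based generative models named above; it is fit to the training pairs (x_t, u_t) from the earlier sketch, and its log-density plays the role of the training data distribution p(x_t, u_t).

```python
# Hedged sketch: density model of the training data as uncertainty model.
import numpy as np
from sklearn.mixture import GaussianMixture

XU = np.hstack([X, U])                          # training pairs (x_t, u_t)
density = GaussianMixture(n_components=8, random_state=0).fit(XU)

def log_p_xu(x, u):
    # Log-density of (x, u) under the learned data distribution; high
    # values mark regions where the dynamics model is expected to be
    # reliable, i.e. where the uncertainty is small.
    z = np.concatenate([x, np.atleast_1d(u)])[None]
    return density.score_samples(z)[0]
```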
  • the dynamics model and uncertainty model learned by the learning device 1 are stored in the storage device 2.
  • the approximate dynamics model and uncertainty model do not necessarily have to be learned by the learning device 1, and may be stored in advance in the storage device 2.
  • the approximate dynamics model and uncertainty model do not necessarily have to be generated by learning, and may be generated by a method other than learning.
  • the dynamics model and uncertainty model may be generated from design data, motion analysis data, simulation data, etc. of the robot, which is the control object 4, and may be generated by any method.
  • FIG. 5 is a diagram showing an example of the configuration of the control device 3 that calculates the control input.
  • the control device 3 calculates the control input by a sampling-based method using the current state provided by the controlled object 4 and information on the approximate dynamics model and uncertainty model read from the storage device 2.
  • the control input is the tip position of the robot arm
  • the control input sequence is the time series of the tip position of the robot arm, i.e., the tip trajectory.
  • the state of the controlled object is the position and velocity of the tip of the robot arm, or the position and velocity of the object manipulated by the robot arm.
  • the control device 3 includes an initial control input sequence generator 34, a state/control input sequence sample generator 35, an evaluation cost calculator 38, and a control input sequence updater 39.
  • the state/control input sequence sample generator 35 also includes a control input sample generator 36 and a state transition calculator 37.
  • the control device 3 updates the control input sequence u_(0:T_h-1) and calculates the optimal control input sequence by repeatedly performing the processing of the units 35 to 39.
  • the functions of the initial control input sequence generator 34, the state/control input sequence sample generator 35, the control input sample generator 36, the state transition calculator 37, the evaluation cost calculator 38, and the control input sequence updater 39 are realized by the equipped processor 31 executing a program stored in memory 32.
  • the initial control input sequence generator 34 outputs a time series of control inputs as an initial solution when performing optimization calculations for the control input sequence. This may be given as a zero vector, for example, or a control input policy that has been separately learned or designed in advance may be used.
  • the state/control input sequence sample generation unit 35 uses the state of the controlled object, the control input sequence of the current optimization step, and the uncertainty model and approximate dynamics model learned by the learning device 1 to generate a set of distinct control input sequence samples, together with the state sequence samples calculated from each control input sequence sample.
  • the samples are generated by repeatedly executing the processing by the control input sample generation unit 36 and the state transition calculation unit 37 for the length of the time series.
  • the control input sample generation unit 36 generates samples using the state at a certain time, the nominal input, and the uncertainty model.
  • the state at a certain time is either the initial state x_0 provided by the controlled object 4, or the state at the previous time provided by the state transition calculation unit 37, that is, a state calculated from the input samples.
  • the nominal input refers to either the initial solution of the control input sequence provided by the initial control input sequence generation unit 34, or the control input sequence updated by optimization calculation provided by the control input sequence update unit 39.
  • the control input sample generation unit 36 can generate samples by, for example, regarding a normal distribution with the nominal input as its mean as the likelihood distribution and the distribution of the control input conditioned on the current state, obtained from the uncertainty model, as the prior distribution, as shown in FIG. 6, and drawing an appropriate number of samples from the posterior distribution obtained from these.
  • the posterior distribution can be obtained by synthesizing or combining the likelihood distribution and the prior distribution.
  • the nominal input at the current time is given by u_t
  • the uncertainty model is given by the probability distribution p(x_t,u_t) of the training data.
  • the normal distribution shown in the following equation (2), whose mean is the nominal input u_t and whose variance is a pre-given parameter Σ, is used as the likelihood distribution:
    N(u; u_t, Σ) ... (2)
  • the conditional distribution p(u_t | x_t) of the control input, conditioned on the state at the current time and obtained from the training data distribution p(x_t, u_t), is used as the prior distribution.
  • sampling from the posterior distribution can be performed, for example, by the Markov chain Monte Carlo method.
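  • The following sketch illustrates this with a simple Metropolis-Hastings sampler for a scalar control input. It assumes the log_p_xu density stand-in from the earlier sketch and uses the fact that p(u_t | x_t) is proportional to p(x_t, u_t) as a function of u_t, so the joint log-density can enter the acceptance ratio directly; the variance, step size, and burn-in are illustrative.

```python
# Hedged sketch: MCMC sampling from the posterior whose likelihood is the
# normal distribution N(u; u_nom, sigma^2) of equation (2) and whose prior
# is the conditional p(u | x) from the data density.
import numpy as np

def sample_posterior(x, u_nom, n_samples, sigma=0.5, step=0.2, burn=200):
    rng = np.random.default_rng(0)

    def log_post(u):
        return -0.5 * ((u - u_nom) / sigma) ** 2 + log_p_xu(x, u)

    u, samples = u_nom, []
    for i in range(burn + n_samples):
        u_prop = u + step * rng.standard_normal()   # random-walk proposal
        if np.log(rng.uniform()) < log_post(u_prop) - log_post(u):
            u = u_prop                              # accept
        if i >= burn:
            samples.append(u)
    return np.array(samples)
```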
  • alternatively, to obtain N samples, as shown in FIG. 7, M (>N) samples can be drawn from a normal distribution with the above nominal input as the mean, the N samples with the smallest uncertainty obtained from the uncertainty model can be selected, and the remaining samples discarded; a sketch follows this item.
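  • A sketch of this select-and-discard alternative, again assuming the log_p_xu stand-in from above:

```python
# Hedged sketch: draw M candidates from the nominal normal distribution,
# keep the N with the highest data density (smallest uncertainty).
import numpy as np

def sample_by_selection(x, u_nom, n, m, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    candidates = u_nom + sigma * rng.standard_normal(m)   # M (>N) draws
    scores = np.array([log_p_xu(x, u) for u in candidates])
    keep = np.argsort(scores)[-n:]   # indices of the N densest candidates
    return candidates[keep]
```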
  • the state transition calculation unit 37 calculates the next-time state sample x^k_(t+1) from the input sample u^k_t at a certain time generated by the control input sample generation unit 36 and the corresponding state sample x^k_t, using the approximate dynamics model learned by the learning device 1.
  • since there are N input samples, it is assumed that k ≤ N.
  • the input samples and the next-time state samples are held until sampling is completed. If a sample for a further next time needs to be generated, the calculated next-time state is output to the control input sample generation unit 36.
  • the held inputs and states are output to the evaluation cost calculation unit 38 as sample sequences of the form shown in equation (3):
    {x^k_(0:T_h), u^k_(0:T_h-1)}, k = 1, ..., N ... (3)
  • the evaluation cost calculation unit 38 calculates the evaluation cost S^k for each state sequence and control input sequence sample, based on the control objective. It outputs the calculated evaluation costs, together with the control input sequence samples, to the control input sequence update unit 39.
  • the control input sequence update unit 39 calculates an updated nominal input sequence based on the nominal input sequence, the control input sequence samples, and the evaluation costs corresponding to each sample.
  • the nominal input sequence is either the sequence provided by the initial control input sequence generation unit 34, or the nominal input sequence previously updated by the control input sequence update unit 39.
  • the control input sequence update unit 39 determines whether optimization has ended, and upon completion, outputs a control input or a portion of the control input to the control object 4. If optimization is to continue, the control input sequence update unit 39 outputs an updated nominal input sequence to the control input sample generation unit 36. The determination of the end of optimization can be made, for example, by determining convergence of the evaluation cost, a threshold related to the number of updates to the nominal input sequence, or both.
  • the nominal input sequence is updated as follows. When the minimum cost over the samples is expressed by equation (4) and the difference between each sample and the nominal input sequence is expressed by equation (5), the updated nominal input sequence u_(0:T_h-1)^new is given by equation (6):
    ρ = min_k S^k ... (4)
    δu^k_(0:T_h-1) = u^k_(0:T_h-1) - u_(0:T_h-1) ... (5)
    u_(0:T_h-1)^new = u_(0:T_h-1) + [Σ_k exp(-(S^k - ρ)/λ) δu^k_(0:T_h-1)] / [Σ_k exp(-(S^k - ρ)/λ)] ... (6)
    where λ is a pre-given temperature parameter.
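  • Read as code, with S the vector of evaluation costs S^k, U_samp the control input sequence samples, and u_nom the nominal sequence, the update of equations (4) to (6) is a few lines; treating λ as a pre-given temperature parameter is an assumption consistent with Model Predictive Path Integral control.

```python
# Hedged sketch of the exponentially weighted update of equations (4)-(6).
import numpy as np

def update_nominal(u_nom, U_samp, S, lam=1.0):
    rho = S.min()                      # equation (4): minimum sample cost
    dU = U_samp - u_nom                # equation (5): per-sample deviation
    w = np.exp(-(S - rho) / lam)       # unnormalized sample weights
    w /= w.sum()
    return u_nom + w @ dU              # equation (6): updated nominal input
```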
  • Fig. 8 is a flowchart showing the operation of the learning device 1 in this embodiment.
  • Step S101 The learning device 1 receives input of learning data.
  • Step S102 The dynamics model learning unit 14 and the uncertainty model learning unit 15 respectively learn the approximate dynamics model and the uncertainty model. The learning of each model may be performed in parallel or in any order.
  • Step S103 The learning device 1 outputs the learning results, i.e., the approximate dynamics model and the uncertainty model, to the storage device 2.
  • the approximate dynamics model and uncertainty model do not necessarily have to be learned by the learning device 1, but may be stored in advance in the storage device 2. Furthermore, the approximate dynamics model and uncertainty model do not necessarily have to be generated by learning, but may also be generated by a method other than learning.
  • Figure 9 is a flowchart showing the operation of the control device 3 in the first embodiment.
  • Step S301 First, the control device 3 receives an input of the current state from the controlled object 4.
  • Step S302 The initial control input sequence generator 34 generates an initial solution for the control input sequence.
  • Step S303 The control device 3 starts a loop L31 for sampling the state sequence and the control input sequence. In the loop L31, sampling calculations in the time series direction are performed by repeating the loop, and the number of times the loop is repeated is represented by time t.
  • Step S304 The control input sample generation unit 36 calculates control input samples u^k_t from the state samples x^k_t for the current step t and the nominal input u_t.
  • N samples are calculated based on the current state x_t that the control device 3 received from the controlled object 4.
  • the samples are generated by sampling, using the Markov chain Monte Carlo method or the like, from the posterior distribution obtained when the normal distribution centered on the nominal input is used as the likelihood distribution and the conditional distribution of the input obtained from the training data distribution learned by the learning device 1 is used as the prior distribution.
  • Step S305 The state transition calculation unit 37 calculates the state samples x^k_(t+1) of the next step t+1 from the state samples x^k_t of the current step t and the control input samples u^k_t. This calculation is performed using the approximate dynamics model learned by the learning device 1.
  • Step S306 The control device 3 performs termination processing of the loop L31. Specifically, the control device 3 references the number of repetitions t and determines whether the sampling calculation has been completed up to the termination time. If it is determined that the sampling processing has not been completed, the control device 3 continues the sampling processing for the next time. In this case, the processing returns to step S304. On the other hand, if it is determined that the sampling processing has been completed, the control device 3 terminates the loop L31. In this case, the processing proceeds to step S307.
  • Step S307 The evaluation cost calculation unit 38 calculates the evaluation cost for each of the generated N sample sequences.
  • Step S308 The control input sequence update unit 39 updates the nominal input sequence based on each sample and its evaluation cost.
  • Step S309 The control device 3 determines whether the optimization of the control input sequence has been completed.
  • Step S310 If the optimization is to be continued (step S310: YES), the process returns to step S303. On the other hand, if the optimization is to be ended (step S310: NO), the process proceeds to step S311.
  • Step S311 The control device 3 outputs the control input sequence, or a part of it, resulting from the optimization calculation to the controlled object 4.
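  • The following sketch strings steps S301 to S311 together into one receding-horizon optimization pass, reusing the stand-ins from the earlier sketches (F_nn, sample_posterior, update_nominal); the quadratic evaluation cost, horizon, sample count, and iteration count are illustrative assumptions.

```python
# Hedged sketch of the optimization loop of FIG. 9 for a scalar input.
import numpy as np

def eval_cost(xs, us, x_goal):
    # Illustrative stand-in for step S307: distance to a goal state plus
    # a small control-effort penalty.
    return sum(np.sum((x - x_goal) ** 2) for x in xs) + 1e-3 * np.sum(us ** 2)

def optimize(x0, x_goal, T_h=10, N=32, iters=20):
    u_nom = np.zeros(T_h)                        # S302: initial solution
    for _ in range(iters):                       # S309/S310: outer loop
        U = np.zeros((N, T_h))
        xs = [np.repeat(x0[None], N, axis=0)]    # S301: current state
        for t in range(T_h):                     # loop L31 over time
            for k in range(N):                   # S304: input samples
                U[k, t] = sample_posterior(xs[t][k], u_nom[t], 1)[0]
            xs.append(np.stack([F_nn(xs[t][k], U[k, t])   # S305
                                for k in range(N)]))
        S = np.array([eval_cost([x[k] for x in xs], U[k], x_goal)
                      for k in range(N)])        # S307: evaluation costs
        u_nom = update_nominal(u_nom, U, S)      # S308: update nominal
    return u_nom                                 # S311: output sequence
```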
  • as described above, when performing a sampling-based optimization method, sampling can be performed that takes the uncertainty of the dynamics model into account; that is, samples can be drawn from a distribution that reflects this uncertainty.
  • as a result, optimal control can be performed using the more reliable regions of the learned dynamics model, improving the accuracy of control execution.
  • in the second embodiment, the uncertainty model is different from that in the first embodiment.
  • the learning device 1 may learn the uncertainty model so that the uncertainty model is obtained as an amount corresponding to the variance of the output of the dynamics model.
  • the uncertainty model indicates that the greater the variance of the output of the dynamics model, the greater the uncertainty.
  • such an uncertainty model can be obtained by learning the dynamics model using, for example, an ensemble model or a Bayesian neural network, as sketched below.
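  • A hedged sketch of such a variance-based uncertainty model follows: an ensemble of dynamics models is fit on bootstrap resamples of the training data (reusing the X, U, Y arrays from the earlier sketch), and the spread of the members' predictions at (x, u) serves as the uncertainty. The ensemble size and the linear member models are illustrative stand-ins for the neural networks contemplated here.

```python
# Hedged sketch: ensemble disagreement as the uncertainty model.
import numpy as np

def fit_ensemble(X, U, Y, k=5, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.hstack([X, U, np.ones((len(X), 1))])
    members = []
    for _ in range(k):
        idx = rng.integers(0, len(X), len(X))    # bootstrap resample
        W, *_ = np.linalg.lstsq(Z[idx], Y[idx], rcond=None)
        members.append(W)
    return members

def ensemble_uncertainty(members, x, u):
    z = np.concatenate([x, np.atleast_1d(u), [1.0]])
    preds = np.stack([z @ W for W in members])   # one x_(t+1) per member
    return preds.var(axis=0).sum()               # total predictive variance
```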
  • the configurations of the control system 5 and the control device 3 in the second embodiment are the same as those in the first embodiment.
  • when sampling the control input in the control input sample generation unit 36, for example when generating N samples, the control input samples can be generated by obtaining M (>N) samples from a normal distribution with the nominal input as the mean, selecting the N samples with the smallest uncertainty obtained from the uncertainty model, and discarding the remaining samples, as shown in FIG. 7.
  • the information processing device 100 in this embodiment is configured as a general information processing device and is equipped, for example, with the following hardware configuration:
  • a CPU (Central Processing Unit) 101
  • a ROM (Read Only Memory) 102
  • a RAM (Random Access Memory) 103
  • a program group 104 loaded into the RAM 103
  • a storage device 105 that stores the program group 104
  • a drive device 106 that reads and writes data from and to a storage medium 110 external to the information processing device
  • a communication interface 107 that connects to a communication network 111 outside the information processing device
  • an input/output interface 108 that inputs and outputs data
  • a bus 109 that connects the components
  • FIG. 10 shows an example of the hardware configuration of the information processing device 100; the hardware configuration of the information processing device is not limited to this example.
  • for example, the information processing device may be configured with only a part of the above configuration, such as omitting the drive device 106.
  • the information processing device may use a GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination of these.
  • the information processing device 100 can include the acquisition unit 121, the generation unit 122, and the calculation unit 123 shown in FIG. 11 by having the CPU 101 acquire and execute the program group 104.
  • the program group 104 may be stored in advance in the storage device 105 or the ROM 102, for example, and loaded into the RAM 103 and executed by the CPU 101 as needed.
  • the program group 104 may also be supplied to the CPU 101 via the communication network 111, or may be stored in advance in the storage medium 110, with the drive device 106 reading the programs out and supplying them to the CPU 101.
  • the acquisition unit 121, generation unit 122, and calculation unit 123 described above may also be configured using dedicated electronic circuits for realizing such means.
  • the acquisition unit 121 acquires a dynamics model that approximates the dynamics of the controlled object that has been trained to output a state of the controlled object according to a control input, and an uncertainty model that represents the uncertainty of the dynamics model.
  • the generation unit 122 uses the uncertainty model to generate samples of the control input for the controlled object.
  • the calculation unit 123 uses the dynamics model to calculate an optimal control input from the samples.
  • at least one of the functions of the acquisition unit 121, generation unit 122, and calculation unit 123 described above may be executed by an information processing device installed and connected anywhere on a network; that is, the functions may be realized by so-called cloud computing.
  • Non-transitory computer readable media include various types of tangible storage media.
  • Examples of non-transitory computer readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
  • the program may be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves.
  • transitory computer readable media can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
  • (Appendix 1) An information processing device comprising: an acquisition unit that acquires a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model that represents the uncertainty of the dynamics model; a generation unit that generates samples of a control input for the controlled object using the uncertainty model; and a calculation unit that calculates an optimal control input from the samples using the dynamics model.
  • (Appendix 2) The information processing device according to appendix 1, wherein the acquisition unit acquires the uncertainty model based on learning data used in learning the dynamics model.
  • (Appendix 3) The information processing device according to appendix 2, wherein the acquisition unit acquires the uncertainty model based on a probability distribution of the learning data.
  • (Appendix 4) The information processing device according to appendix 3, wherein the generation unit generates the samples using a distribution based on a preset control input and the probability distribution of the learning data.
  • (Appendix 5) The information processing device according to appendix 3, wherein the generation unit generates the samples using a normal distribution with a preset control input as the mean and the probability distribution of the learning data.
  • (Appendix 6) The information processing device according to appendix 4, wherein the generation unit generates candidate samples of the control input from a normal distribution with a preset control input as the mean, and generates the samples from the candidate samples using the probability distribution of the learning data.
  • (Appendix 7) The information processing device according to appendix 6, wherein the generation unit generates, from the candidate samples, the samples having smaller uncertainty of the dynamics model, using the probability distribution of the learning data.
  • (Appendix 8) The information processing device according to appendix 1, wherein the acquisition unit acquires the uncertainty model based on a variance of the output of the dynamics model.
  • (Appendix 9) An information processing method comprising: acquiring a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model representing the uncertainty of the dynamics model; generating samples of control inputs to the controlled object using the uncertainty model; and calculating an optimal control input from the samples using the dynamics model.
  • (Appendix 10) The information processing method according to appendix 9, comprising acquiring the uncertainty model based on learning data used in learning the dynamics model.
  • (Appendix 11) The information processing method according to appendix 10, comprising acquiring the uncertainty model based on a probability distribution of the learning data.
  • (Appendix 12) The information processing method according to appendix 11, comprising generating the samples using a distribution based on a preset control input and the probability distribution of the learning data.
  • (Appendix 13) The information processing method according to appendix 12, comprising generating the samples using a normal distribution with a preset control input as the mean and the probability distribution of the learning data.
  • (Appendix 14) The information processing method according to appendix 12, comprising generating candidate samples of the control input from a normal distribution with a preset control input as the mean, and generating the samples from the candidate samples using the probability distribution of the learning data.
  • (Appendix 15) The information processing method according to appendix 14, comprising generating, from the candidate samples, the samples having smaller uncertainty of the dynamics model, using the probability distribution of the learning data.
  • (Appendix 16) The information processing method according to appendix 9, comprising acquiring the uncertainty model based on a variance of the output of the dynamics model.
  • (Appendix 17) A computer-readable storage medium storing a program that causes a computer to execute processing comprising: acquiring a dynamics model that approximates the dynamics of a controlled object and outputs a state of the controlled object according to a control input, and an uncertainty model representing the uncertainty of the dynamics model; generating samples of control inputs to the controlled object using the uncertainty model; and calculating an optimal control input from the samples using the dynamics model.
  • 4 Control target
  • 5 Control system
  • 39 Control input sequence update unit
  • 100 Information processing device
  • 101 CPU
  • 102 ROM
  • 103 RAM
  • 104 Program group
  • 105 Storage device
  • 106 Drive device
  • 107 Communication interface
  • 108 Input/output interface
  • 109 Bus
  • 110 Storage medium
  • 111 Communication network
  • 121 Acquisition unit
  • 122 Generation unit
  • 123 Calculation unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

An information processing device (100) according to the present disclosure comprises: an acquisition unit (121) that acquires a dynamics model, which approximates the dynamics of a controlled object and outputs a state of the controlled object in response to a control input, and an uncertainty model, which represents the uncertainty of the dynamics model; a generation unit (122) that generates control input samples for the controlled object using the uncertainty model; and a calculation unit (123) that calculates optimal control inputs from the samples using the dynamics model.
PCT/JP2024/003301 2024-02-01 2024-02-01 Information processing device, information processing method, and program Pending WO2025163857A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2024/003301 WO2025163857A1 (fr) 2024-02-01 2024-02-01 Information processing device, information processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2024/003301 WO2025163857A1 (fr) 2024-02-01 2024-02-01 Information processing device, information processing method, and program

Publications (1)

Publication Number Publication Date
WO2025163857A1 (fr) 2025-08-07

Family

Family ID: 96590291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/003301 Pending WO2025163857A1 (fr) 2024-02-01 2024-02-01 Dispositif de traitement d'informations, procédé de traitement d'informations et programme

Country Status (1)

Country Link
WO (1) WO2025163857A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018124982A * 2017-01-31 2018-08-09 Panasonic Intellectual Property Corporation of America Control device and control method
WO2019004476A1 * 2017-06-30 2019-01-03 Fuji Electric Co., Ltd. Control device and method for adjusting a control device
JP2020535562A * 2017-12-18 2020-12-03 Mitsubishi Electric Corporation Apparatus and method for controlling a system



Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24922263

Country of ref document: EP

Kind code of ref document: A1