
WO2020241037A1 - Learning device, learning method, learning program, automatic control device, automatic control method, and automatic control program - Google Patents


Info

Publication number
WO2020241037A1
WO2020241037A1 (PCT/JP2020/014981)
Authority
WO
WIPO (PCT)
Prior art keywords
learning
state value
target device
value
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/014981
Other languages
English (en)
Japanese (ja)
Inventor
学嗣 浅谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Exawizards Inc
Original Assignee
Exawizards Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Exawizards Inc filed Critical Exawizards Inc
Publication of WO2020241037A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 Controls for manipulators
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/18 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/4155 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form, characterised by programme execution, i.e. part programme or machine function execution, e.g. selection of a programme
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The present invention relates to a learning device, a learning method, and a learning program for training a learning model for controlling a target device, and to an automatic control device, an automatic control method, and an automatic control program that use the learning model.
  • Non-Patent Document 1 states that motion patterns can be self-organized by directly teaching a robot an object-manipulation task and by integrating and learning the image, audio-signal, and motor modalities with a plurality of deep autoencoders.
  • One aspect of the present invention aims to realize a learning device, a learning method, and a learning program capable of learning the operation of a target device with high accuracy.
  • In one aspect, the learning device includes a storage unit that acquires and accumulates, over time, the state value of the target device in operation and the measured value of the operation, and a learning unit that trains, on teacher data, a first learning model which receives at least the state value of the target device in operation and the measured value of the operation as input and predicts a future state value of the target device. The teacher data includes the time-series data of the state values and the measured values accumulated in the storage unit.
  • In one aspect, the learning method includes a storage step of acquiring and accumulating, over time, the state value of the target device in operation and the measured value of the operation, and a learning step of training, on teacher data, a first learning model which receives at least the state value of the target device in operation and the measured value of the operation as input and predicts a future state value of the target device. The teacher data includes the time-series data of the state values and the measured values accumulated in the storage step.
  • In one aspect, the automatic control device includes a first learning model which receives at least the state value of the target device in operation and the measured value of the operation as input and predicts a future state value of the target device, and an automatic control unit which inputs at least the state value of the target device in operation and the measured value of the operation to the first learning model and controls the target device so that its state value approaches the future state value predicted by the first learning model. The first learning model has learned teacher data including past time-series data of the state values and the measured values of the target device.
  • In one aspect, the automatic control method includes an automatic control step of inputting at least the state value of the target device in operation and the measured value of the operation to a first learning model that predicts a future state value of the target device, and controlling the target device so that its state value approaches the future state value predicted by the first learning model. The first learning model has learned teacher data including past time-series data of the state values and the measured values of the target device.
  • FIG. 1 is a block diagram showing a schematic configuration of a learning system 1 according to an embodiment of the present invention.
  • The learning system 1 includes a manipulator (target device) 10, a camera 13, a measuring device 14, an input device 15, a display (display unit) 16, and a learning device (automatic control device) 100.
  • FIG. 2 is a diagram schematically showing the appearance of the learning system 1.
  • The manipulator 10 is equipped with a spoon 17 as an end effector and performs an operation of weighing salt 2: for example, transferring a set amount of the salt 2 from the container 3 to the container 4.
  • The operation of the manipulator 10 is not limited to the weighing of salt; other objects (powders, liquids) may be weighed, and the manipulator may be configured to perform other operations by exchanging the end effector.
  • End effectors are, but are not limited to, spoons, hands (grippers), suction hands, spray guns, or welding torches.
  • the manipulator 10 includes one or more joints 11 and operates by driving each joint 11.
  • the joint 11 may be an arm joint or an end effector joint.
  • The manipulator 10 also includes one or more sensors 12. Each sensor 12 may include, for example, an angle sensor that detects a state value of each joint 11 (for example, a joint angle or a finger angle), or a force sensor that detects the force sense (moment) at a specific portion of the manipulator 10.
  • The camera 13 captures an image of the targets of the operation (salt weighing) of the manipulator 10 (the salt 2, container 3, and container 4) and acquires a captured image.
  • the measuring device 14 is a weighing scale, and measures a measured value (amount of salt 2 transferred from the container 3 to the container 4) of the operation (salt weighing) of the manipulator 10.
  • the measuring device 14 is not limited to the weighing scale, and may be any device capable of measuring the amount of change (for example, the amount of salt) due to the operation of the target device.
  • the input device 15 is an input device for manually operating the manipulator 10.
  • As shown in FIG. 2, the input device 15 has the same shape as the manipulator 10 and includes a sensor that detects the joint angle of each of its joints; it is a master-slave type input device with which the manipulator 10 can be operated intuitively by grasping and moving the device by hand.
  • the input device 15 is not limited to this, and may be composed of a robot controller, a teach pendant, a keyboard, a lever, a button, a switch, a touch pad, and the like.
  • The display 16 is a display device for displaying various information, and may be, for example, a liquid-crystal display.
  • The learning device 100 includes a first learning model 101, a second learning model 102, a storage unit 103, an acquisition unit 104, a learning unit 105, a manual control unit 106, an automatic control unit 107, and a display control unit 108.
  • The first learning model 101 is a learning model which receives at least the state value of the manipulator 10 in operation and the measured value of the operation as input and predicts future state values and measured values of the manipulator 10; any learning model capable of learning time-series data can be used.
  • In one aspect, the first learning model 101 is an RNN (Recurrent Neural Network) such as an MTRNN (Multiple Timescale RNN) or an LSTM (Long Short-Term Memory).
  • The first learning model 101 is not limited to this, and may be an ARIMA (AutoRegressive Integrated Moving Average) model, a one-dimensional CNN (Convolutional Neural Network), or the like.
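As a concrete illustration, the next-frame prediction interface of such a first learning model can be sketched as a minimal recurrent network in NumPy. This is a simplified Elman-style stand-in for the MTRNN/LSTM named above; the layer sizes and parameter dimension are illustrative assumptions, not values from the publication:

```python
import numpy as np

class OneStepRNN:
    """Minimal Elman-style recurrent network mapping the input-parameter
    vector at time t to a prediction of the same vector at time t+1.
    A simplified stand-in for the first learning model 101; all sizes
    here are illustrative."""
    def __init__(self, dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (hidden, dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_out = rng.normal(0.0, 0.1, (dim, hidden))
        self.h = np.zeros(hidden)

    def step(self, x):
        # update the hidden state, then decode the next-frame prediction
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        return self.W_out @ self.h

dim = 10  # joint angles + force sense + image features + measured value + target
model = OneStepRNN(dim)
sequence = np.ones((5, dim))                         # 5 frames of input parameters
preds = np.stack([model.step(x) for x in sequence])  # predictions for frames t+1..t+5
```

Each call to `step` consumes one frame of input parameters and returns the predicted next frame, which is the interface the automatic control described later relies on.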
  • the second learning model 102 can be a learning model capable of compressing and restoring an image.
  • the second learning model 102 is a CAE (Convolutional Auto Encoder).
  • The second learning model 102 is not limited to this, and may be an autoencoder, an RBM (Restricted Boltzmann Machine), a principal component analysis model, or the like.
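The compress-and-restore role of the second learning model can be sketched with a tiny fully-connected autoencoder (a deliberate simplification of the CAE; the pixel and feature counts are hypothetical). `encode()` yields the intermediate-layer feature amount and `decode()` restores an image from it:

```python
import numpy as np

class TinyAutoencoder:
    """Fully-connected stand-in for the second learning model 102:
    encode() produces the intermediate-layer feature amount, decode()
    restores an image from it. Layer sizes are illustrative."""
    def __init__(self, n_pixels=64, n_features=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (n_features, n_pixels))
        self.W_dec = rng.normal(0.0, 0.1, (n_pixels, n_features))

    def encode(self, image):
        return np.tanh(self.W_enc @ image)   # compressed feature amount

    def decode(self, feature):
        return self.W_dec @ feature          # restored image

ae = TinyAutoencoder()
img = np.linspace(0.0, 1.0, 64)   # a flattened 64-pixel "captured image"
feat = ae.encode(img)
restored = ae.decode(feat)
```

After training so that the restored image matches the input, the 8-dimensional `feat` plays the role of the feature amount the storage unit 103 accumulates in place of the raw image.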
  • the storage unit 103 acquires and stores the state value of the operating manipulator 10 over time.
  • the storage unit 103 acquires the joint angle (state value) and force sense (state value) of each joint 11 of the manipulator 10 from the sensor 12 and stores them in a storage unit (not shown).
  • the storage unit 103 also acquires and stores at least one of the captured image of the target object of the operation of the manipulator 10 and the feature amount of the captured image over time.
  • the storage unit 103 stores the captured image of the target object (salt 2, container 3, container 4) captured by the camera 13 in a storage unit (not shown).
  • the storage unit 103 compresses the captured image of the target object captured by the camera 13 by the second learning model 102, acquires the compressed data as the feature amount of the captured image, and stores the compressed data in a storage unit (not shown).
  • the storage unit 103 also acquires and stores the measured values of the operation of the manipulator 10 over time. In one aspect, the storage unit 103 stores the measured values acquired by the acquisition unit 104 in a storage unit (not shown).
  • the second learning model 102 is a learning model such as CAE or an autoencoder that is deep-learned so that the input image and the output image match.
  • the learning unit 105 causes the second learning model 102 to learn the time-series data (moving image data of the operation of the manipulator 10) of the captured image of the camera 13 acquired by the storage unit 103.
  • The storage unit 103 can acquire the feature amount of the captured image from the intermediate layer of the second learning model 102 into which the captured image has been input. That is, the intermediate layer of a learning model deep-trained so that the input image and the output image match expresses the input image in fewer dimensions without reducing its information content, and can therefore suitably be used as a feature amount representing the features of the captured image of the target object.
  • the acquisition unit 104 acquires the measured value (the amount of salt 2 moved to the container 4) of the operation (weighing of salt) of the manipulator 10.
  • The acquisition unit 104 may acquire the measured value from the measuring device 14, which weighs the container 4, by wire or wirelessly, or may acquire it by image analysis of a captured image in which the camera 13 images the display of the measuring device 14.
  • the acquisition unit 104 also acquires the measured value when the operation of the manipulator 10 is completed as a result value.
  • the learning unit 105 causes the first learning model 101 to learn the teacher data.
  • the details of the teacher data will be described later.
  • the learning unit 105 also causes the second learning model 102 to learn the time-series data of the captured images of the camera 13 stored in the storage unit 103.
  • the manual control unit 106 controls the manipulator 10 in response to an instruction from the input device 15 (external).
  • The automatic control unit 107 inputs the set target value (the amount of salt 2 to be moved to the container 4), the state value of the manipulator 10, the feature amount of the captured image, and the measured value of the operation of the manipulator 10 into the first learning model 101, and controls the manipulator 10 so that its state value approaches the future state value predicted by the first learning model 101. Details will be described later.
  • the display control unit 108 displays various information on the display 16.
  • The display content is not particularly limited, but may include a captured image from the camera 13, a predicted future captured image (details will be described later), a modeled image of the manipulator 10, the set target value, the measured value, and the like.
  • FIG. 3 is a flowchart showing an example of the flow of automatic control of the manipulator 10 by the learning device 100. Note that some steps may be performed in parallel or in a different order.
  • The learning device 100, having previously undergone the manual learning or automatic learning described later, can automatically control the manipulator 10 (target device).
  • In step S1, the automatic control unit 107 sets a target value for the operation of the manipulator 10.
  • the automatic control unit 107 may set a value input via an input unit (not shown) as a target value.
  • In step S2, the automatic control unit 107 acquires the state value of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, etc.) from the sensor 12.
  • In step S3, the automatic control unit 107 acquires the feature amount of the image captured by the camera 13 from the second learning model 102.
  • In step S4, the automatic control unit 107 acquires the measured value of the operation of the manipulator 10 from the acquisition unit 104.
  • In step S5, the automatic control unit 107 generates the input parameters to be input to the first learning model 101.
  • FIG. 4 is a diagram illustrating the input parameters and output parameters of the first learning model 101. As shown in FIG. 4, the acquired state values, feature amounts, measured values, and set target value are assigned to the dimensions of the input parameters; each of them may occupy multiple dimensions. In addition, the state values, feature amounts, measured values, and target value are normalized by a normalization term corresponding to each dimension.
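The assembly of such a normalized input vector can be sketched as follows. The dimension counts (3 joint angles, 4 image features) and the normalization statistics are hypothetical; the publication does not give concrete sizes:

```python
import numpy as np

def build_input(state, features, measured, target, mean, std):
    """Concatenate the modalities of FIG. 4 into one input-parameter
    vector and apply the per-dimension normalization terms."""
    x = np.concatenate([state, features, [measured], [target]])
    return (x - mean) / std

# hypothetical sizes: 3 joint angles + 4 image features + 1 measured + 1 target = 9 dims
mean = np.zeros(9)
std = np.ones(9)
x = build_input(np.array([0.1, 0.2, 0.3]),
                np.array([0.5, 0.5, 0.5, 0.5]),
                measured=1.5, target=3.0, mean=mean, std=std)
```

With identity normalization terms, as here, the vector is simply the concatenation of the modalities; in practice `mean` and `std` would come from the teacher-data statistics described later.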
  • In step S6, the automatic control unit 107 inputs the input parameters to the first learning model 101 and acquires the output parameters.
  • The first learning model 101 is trained so that, when input parameters are input, it predicts the input parameters that will be input in the future. For example, when the input parameters at time t are input, the first learning model 101 outputs predicted values of the input parameters at time t + 1; in other words, the first learning model 101 predicts the input parameters one frame ahead.
  • the target value is a fixed value.
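Putting steps S2 to S9 together, the closed-loop use of the one-frame-ahead prediction can be sketched as follows. Every callback, and the identity "model", is a hypothetical stand-in; the real loop would query the sensor 12, the second learning model 102, and the acquisition unit 104:

```python
def control_loop(model, read_state, read_feature, read_measured,
                 target, apply_joint_command, max_steps=100):
    """One possible shape of steps S2-S9: feed the current parameters to
    the first learning model and drive the joints toward its one-frame-
    ahead prediction until the measured value reaches the target."""
    for _ in range(max_steps):
        # S2-S5: assemble the input parameters (state, feature, measured, target)
        x = read_state() + read_feature() + [read_measured(), target]
        pred = model(x)                                # S6: one-frame-ahead prediction
        apply_joint_command(pred[:len(read_state())])  # S7: move joints toward predicted angles
        if read_measured() >= target:                  # S9: completion check
            break

# Hypothetical stubs standing in for the sensors, the feature extractor,
# the measuring device 14, and the manipulator itself.
state = [0.0, 0.0]
measured = {"v": 0.0}
log = []
def read_state(): return list(state)
def read_feature(): return [0.5]
def read_measured():
    measured["v"] += 1.0   # pretend salt accumulates on every reading
    return measured["v"]
def apply_joint_command(angles): log.append(list(angles))

control_loop(lambda x: x, read_state, read_feature, read_measured,
             target=3.0, apply_joint_command=apply_joint_command)
```

The loop issues one joint command per cycle and stops as soon as the measured value reaches the target, mirroring the NO/YES branch of step S9.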
  • In step S7, the automatic control unit 107 controls the manipulator 10 so that the state value of the manipulator 10 approaches the future state value predicted by the first learning model 101.
  • In one aspect, the automatic control unit 107 refers to the parameters indicating the joint angle of each joint 11 among the output parameters of the first learning model 101, and may control each joint 11 so that the joint angle of each joint 11 of the manipulator 10 approaches the predicted joint angle.
  • In step S8, the automatic control unit 107 outputs, to the display control unit 108, the parameters indicating the feature amount of the future captured image among the output parameters of the first learning model 101.
  • the display control unit 108 restores the future captured image from the parameter indicating the feature amount of the future captured image by using the second learning model 102. Then, the display control unit 108 displays the captured image captured by the camera 13 and the restored future captured image on the display 16.
  • FIG. 5 is a diagram showing an example of the display contents of the display 16 in step S8.
  • The display control unit 108 causes the display 16 to display the current captured image 200 captured by the camera 13 and the restored future captured image 201; the automatic control unit 107 controls the manipulator 10 so that the current captured image 200 eventually reaches the state of the future captured image 201. In step S8, the display control unit 108 may instead display only the current captured image 200 on the display 16.
  • In step S9, the automatic control unit 107 determines whether the operation of the manipulator 10 is completed. If it is not completed (NO in step S9), the process returns to step S2 and continues; if it is completed (YES in step S9), the process ends.
  • In one aspect, the operation is determined to be completed when the measured value acquired by the acquisition unit 104 is equal to or greater than the target value, or when the difference between the measured value and the target value is equal to or less than a preset threshold. In another aspect, the first learning model 101 is trained to output a specific parameter indicating completion when the operation of the manipulator 10 is completed, and the automatic control unit 107 may determine that the operation is completed when the first learning model 101 outputs that specific parameter.
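The first completion criterion described above is simple enough to state directly; the threshold is a hypothetical tuning parameter:

```python
def operation_complete(measured, target, threshold):
    """Step S9 test: the operation counts as complete when the measured
    value reaches the target, or is already within the preset threshold
    of it."""
    return measured >= target or abs(measured - target) <= threshold

# e.g. weighing toward a 5.0 g target with a 0.1 g tolerance
done_exact = operation_complete(5.0, 5.0, 0.1)
done_close = operation_complete(4.95, 5.0, 0.1)
done_short = operation_complete(4.0, 5.0, 0.1)
```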
  • FIG. 6 is a flowchart showing an example of the flow of manual learning by the learning device 100. Note that some steps may be performed in parallel or in a different order.
  • In step S11, the user operates the input device 15 to input the operation of the manipulator 10.
  • In one aspect, the input device 15 has the same shape as the manipulator 10 as shown in FIG. 2 and includes a sensor that detects the joint angle of each of its joints; an instruction signal indicating the joint angle of each joint of the input device 15 is transmitted from the input device 15 to the manual control unit 106.
  • In step S12, the manual control unit 106 acquires the instruction from the input device 15 (external) and controls the manipulator 10.
  • In one aspect, the manual control unit 106 refers to the instruction signal from the input device 15 and controls the manipulator 10 so that the joint angle of each joint 11 of the manipulator 10 matches the joint angle of the corresponding joint of the input device 15.
  • In step S13, the storage unit 103 acquires the state value of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, etc.) from the sensor 12 and stores it in chronological order.
  • In step S14, the storage unit 103 acquires the image captured by the camera 13 and stores it in chronological order.
  • In step S15, the storage unit 103 acquires the measured values obtained from the measuring device 14 by the acquisition unit 104 and stores them in chronological order.
  • In step S16, the manual control unit 106 determines whether the operation of the manipulator 10 is completed. If it is not completed (NO in step S16), the process returns to step S11 and continues; if it is completed (YES in step S16), the process proceeds to step S17.
  • the user can specify the completion of the operation by operating the input device 15.
  • In step S17, the acquisition unit 104 acquires the result value of the completed operation.
  • In one aspect, the acquisition unit 104 acquires the result value of the operation of the manipulator 10 (the amount of salt 2 moved to the container 4) from the measuring device 14 via wired or wireless communication, or by analyzing the image captured by the camera 13.
  • steps S11 to S17 may be repeated a plurality of times in order to obtain sufficient teacher data.
  • In step S18, the learning unit 105 trains the second learning model 102 on the time-series data of the captured images stored in the storage unit 103 so that the second learning model 102 can compress and restore the captured images.
  • In step S19, the learning unit 105 generates teacher data from the time-series data of the state values, captured images, and measured values accumulated in the storage unit 103 as a result of the control by the manual control unit 106, and from the result value acquired by the acquisition unit 104 for each operation of the manipulator 10.
  • the learning unit 105 inputs the time-series data of the captured image into the second learning model 102, and acquires the time-series data of the feature amount.
  • the learning unit 105 generates teacher data including time-series data of state values, feature quantities and measured values, and result values.
  • In step S20, the learning unit 105 causes the first learning model 101 to learn the generated teacher data.
  • After that, the manual control unit 106 ends the process.
  • In one aspect, the teacher data includes the time-series data of the state values, feature amounts, and measured values, together with the result value in place of a target value. That is, in one aspect, the teacher data is the time-series data of the input parameters shown in FIG. 4, with the result value acquired by the acquisition unit 104 entered as a fixed value in the parameter to which the set target value would otherwise be assigned.
  • In one aspect, the learning unit 105 sequentially inputs the time-series data of the state values, feature amounts, and measured values included in the teacher data together with the result value (fixed value), and trains the first learning model 101 using the state value, feature amount, and measured value at the next time point, together with the result value (fixed value), as the correct-answer data.
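The pairing of each time step with the next as correct-answer data can be sketched as follows. The 3-dimensional parameter vectors are placeholders for the full state/feature/measured-value vector, and appending the result value as a fixed final slot follows the description above:

```python
import numpy as np

def make_training_pairs(series, result_value):
    """Turn one recorded operation into next-step supervision pairs:
    the parameters at time t (with the result value appended as a fixed
    target slot) are the input, and the parameters at time t+1 form the
    correct-answer data."""
    inputs, targets = [], []
    for t in range(len(series) - 1):
        inputs.append(np.append(series[t], result_value))
        targets.append(np.append(series[t + 1], result_value))
    return np.array(inputs), np.array(targets)

series = np.arange(12.0).reshape(4, 3)  # 4 time steps of 3-dim parameters (placeholder data)
X, Y = make_training_pairs(series, result_value=2.5)
```

A 4-step recording thus yields 3 input/correct-answer pairs, each 4-dimensional after the result value is appended.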
  • When the learning device 100 learns the operation of the target device, it learns, in addition to the time-series data of the state values and the feature amounts of the captured images, the time-series data of the measured values of the operation of the target device (for example, the amount of salt moved, in the case of salt weighing). Learning that reflects the measured values can thus be performed by an algorithm different from reinforcement learning, and the operation of the target device can be learned with high accuracy.
  • When the learning device 100 learns the operation of the target device, after acquiring the result value of the operation, it treats the operation as one whose target value was that result value and performs learning. In one aspect, the learning device 100 accumulates the state values, feature amounts, measured values, and the like related to the operation, and after acquiring the result value, trains the learning model using the accumulated values as teacher data of an operation aimed at that result value. As a result, learning that reflects the result value can be performed by an algorithm different from reinforcement learning, and the operation of the target device can be learned with high accuracy.
  • In one aspect, the time-series data of the multidimensional parameters included in the teacher data is normalized by providing a normalization term for each parameter dimension. That is, in one embodiment, the learning unit 105 calculates the average and variance of each dimension in the teacher data, calculates normalization terms so that the parameters of each dimension have an average of 0 and a variance of 1, normalizes the teacher data, and then trains the first learning model. As a result, the operation of the target device can be learned with high accuracy by matching the averages and variances of multimodal parameters that differ in order of magnitude.
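The per-dimension normalization just described can be sketched directly (the guard for constant dimensions is an added practical assumption):

```python
import numpy as np

def normalize_teacher_data(data):
    """Compute each dimension's mean and standard deviation over the
    teacher data (rows = time steps, columns = parameter dimensions)
    and rescale so every dimension has mean 0 and variance 1."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant dimensions
    return (data - mean) / std, mean, std

# two modalities of very different scale, e.g. a joint angle and an image feature
data = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
normalized, mean, std = normalize_teacher_data(data)
```

After normalization both columns contribute on the same scale, which is the point of matching averages and variances across modalities of different orders of magnitude.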
  • In one aspect, the loss function shown in equation (1) below can be used, where Dim represents the total number of dimensions, Mi represents the number of dimensions of each modality (for example, joint angle (state value), force sense (state value), feature amount, and measured value), t represents the correct-answer data, y represents the prediction data, and N represents the number of data.
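A plausible form of equation (1), consistent with the variables just defined, averages each modality's squared error over its own Mi dimensions so that modalities of different dimensionality contribute comparably:

```latex
L = \frac{1}{N} \sum_{n=1}^{N} \sum_{i} \frac{1}{M_i} \sum_{d \in i} \left( t_{n,d} - y_{n,d} \right)^{2},
\qquad \sum_{i} M_i = \mathrm{Dim}
```

This reconstruction is an assumption based on the stated definitions of Dim, Mi, t, y, and N; the publication's exact weighting may differ.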
  • The learning device 100 can generate teacher data from the time-series data of the state values and feature amounts accumulated in the storage unit 103 as a result of the control by the automatic control unit 107 shown in FIG. 3, and from the measured values and result values acquired by the acquisition unit 104, and can perform learning with them. As a result, the operation accuracy can be improved automatically; that is, the learning device 100 can self-train the learning model without human intervention. Therefore, even when the number of manual learning runs is small and the operating accuracy obtained by manual learning falls short of the desired accuracy, automatic learning can raise the operating accuracy of the target device to the desired level. In other words, high operating accuracy can be obtained with less manual learning, which reduces the labor of the operator performing manual learning and shortens the time required for learning.
  • FIG. 7 is a flowchart showing an example of the flow of automatic learning by the learning device 100. Note that some steps may be performed in parallel or in a different order.
  • The flowchart shown in FIG. 7 is a modification of part of the flowchart shown in FIG. 3: after step S1 is performed, steps S21 to S23 are performed instead of steps S2 to S4.
  • In step S21, the automatic control unit 107 acquires the state value of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, etc.) from the sensor 12 and accumulates it.
  • In step S22, the automatic control unit 107 acquires the feature amount of the image captured by the camera 13 from the second learning model 102 and accumulates it.
  • In step S23, the automatic control unit 107 acquires the measured value of the operation of the manipulator 10 from the acquisition unit 104 and accumulates it.
  • Steps S5 to S9 are then performed, and if the result of step S9 is YES, steps S24 to S26 are performed.
  • In step S24, the acquisition unit 104 acquires the result value of the completed operation from the measuring device 14.
  • In step S25, the learning unit 105 generates teacher data from the time-series data of the state values, feature amounts, and measured values accumulated in the storage unit 103 as a result of the control by the automatic control unit 107, and from the result value acquired by the acquisition unit 104 for each operation of the manipulator 10. Then, in step S26, the learning unit 105 causes the first learning model 101 to learn the generated teacher data. After that, the automatic control unit 107 ends the process.
  • In one aspect, the learning unit 105 may decide whether to perform steps S25 and S26 based on the time taken to complete the operation of the manipulator 10. That is, the learning device 100 may train the first learning model 101 only on the time-series data obtained from automatic control runs with a high operating speed (those whose completion time is shorter than a threshold), thereby speeding up the operation during automatic control. In one aspect, the learning device 100 may divide the result values into predetermined stages and, among the time-series data of the operations that yielded each stage's result value, use only the time-series data of the fast operations as teacher data for the first learning model 101.
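The selection of high-speed runs as teacher data can be sketched as a simple filter; the `(duration, data)` episode representation is a hypothetical choice:

```python
def select_fast_episodes(episodes, time_threshold):
    """Keep only episodes whose operation completed faster than the
    threshold, so that only high-speed trajectories become teacher
    data. Each episode is a (duration_seconds, time_series_data) pair."""
    return [data for duration, data in episodes if duration < time_threshold]

# placeholder episode log: completion time in seconds plus a data handle
episodes = [(12.0, "slow_run"), (4.5, "fast_run"), (6.0, "ok_run")]
fast = select_fast_episodes(episodes, time_threshold=7.0)
```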
  • The interval at which the storage unit 103 acquires the state value, the feature amount, and the measured value is preferably close to the time required to control the manipulator 10. In other words, the sampling rate of the state values, feature amounts, and measured values used for the teacher data is preferably close to the processing rate of the control of the manipulator 10.
  • In one aspect, the automatic control unit 107 performs automatic control using a pseudo first learning model 101 prepared in advance, and measures the time taken from acquiring at least one of the state value, the feature amount, and the measured value to controlling the manipulator 10.
  • The storage unit 103 then adjusts the interval for acquiring at least one of the state value, the feature amount, and the measured value so as to approach that time. As a result, the operation of the target device can be learned with higher accuracy.
  • In the embodiment above, the feature amount of the captured image is included as an input parameter, but it may be omitted. Likewise, although the target value (result value) is included as an input parameter, it may also be omitted.
  • the operation of the manipulator 10 may be an object moving operation, a painting operation, a welding operation, or the like, in addition to the weighing of the object.
  • the measured value may be the amount (weight) of the object, the moving distance of the object, the paint color or range, the temperature, or the like.
  • the measuring device 14 may be a distance measuring device, a camera, a thermometer, or the like, in addition to the weight scale. When the measuring device 14 is a camera, the camera 13 can also be used as the measuring device 14.
  • the present invention can be applied to other controllable target devices (for example, machine tools, 3D printers, construction machines, medical devices, etc.).
  • the control blocks of the learning device 100 (particularly, the storage unit 103, the acquisition unit 104, the learning unit 105, the manual control unit 106, the automatic control unit 107, and the display control unit 108) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or by software.
  • FIG. 8 is a block diagram illustrating the configuration of the computer 910 that can be used as the learning device 100.
  • the computer 910 includes an arithmetic unit 912 connected to each other via a bus 911, a main storage device 913, an auxiliary storage device 914, an input / output interface 915, and a communication interface 916.
  • the arithmetic unit 912, the main storage device 913, and the auxiliary storage device 914 may be, for example, a processor, a RAM (Random Access Memory), and a hard disk drive, respectively. Examples of the processor include a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit).
  • the learning of the first learning model 101 and the second learning model 102 is preferably executed by the GPU.
  • An input device 920 for the user to input various information to the computer 910 and an output device 930 for the computer 910 to output various information to the user are connected to the input / output interface 915.
  • the input device 920 and the output device 930 may be built in the computer 910 or may be connected (external) to the computer 910.
  • the input device 920 may be a keyboard, a mouse, a touch sensor, or the like
  • the output device 930 may be a display, a printer, a speaker, or the like.
  • the communication interface 916 is an interface for the computer 910 to communicate with an external device.
  • the auxiliary storage device 914 stores various programs for operating the computer 910 as the learning device 100. The arithmetic unit 912 expands a program stored in the auxiliary storage device 914 onto the main storage device 913 and executes the instructions included in the program, thereby causing the computer 910 to operate as each unit included in the learning device 100.
  • the recording medium provided in the auxiliary storage device 914 for recording information such as programs may be a computer-readable "non-transitory tangible medium", for example, a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like.
  • the main storage device 913 may be omitted as long as the computer can execute the program recorded on the recording medium without expanding it onto the main storage device 913.
  • each of the above devices (the arithmetic unit 912, the main storage device 913, the auxiliary storage device 914, the input / output interface 915, the communication interface 916, the input device 920, and the output device 930) may be provided singly or in plurality.
  • the above program may be acquired from the outside of the computer 910, and in this case, it may be acquired via an arbitrary transmission medium (communication network, broadcast wave, etc.).
  • the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the above program is embodied by electronic transmission.
  • the learning device according to aspect 1 of the present invention includes: a storage unit that acquires and accumulates, over time, the state value of a target device in operation and the measured value of the operation; and a learning unit that causes a first learning model, to which at least the state value of the target device in operation and the measured value of the operation are input and which predicts a future state value of the target device, to learn teacher data, the teacher data including time-series data of the state value and the measured value accumulated in the storage unit.
  • in aspect 2, the learning device further includes an acquisition unit that acquires a result value, which is the measured value at the time the operation is completed; the input data of the first learning model may further include a target value of the operation of the target device, and the teacher data may further include, in place of the target value, the result value acquired by the acquisition unit.
  • in the first or second aspect, the learning device further includes a manual control unit that controls the target device in response to an instruction from the outside, and the learning unit may perform learning using at least the time-series data of the state value and the measured value accumulated in the storage unit as a result of control by the manual control unit.
  • in the first to third aspects, the learning device may further include an automatic control unit that inputs the state value of the target device in operation and the measured value of the operation into the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model.
  • the learning unit may perform learning using at least the time-series data of the state value and the measured value accumulated in the storage unit as a result of control by the automatic control unit.
  • based on the time taken to complete the operation, the automatic control unit may determine whether or not the learning unit is to perform learning using at least the time-series data of the state value and the measured value accumulated in the storage unit as a result of that operation.
  • the automatic control unit may measure the time taken from the acquisition of the state value or the measured value to the control of the target device, and the interval at which the storage unit acquires the state value or the measured value may be adjusted based on that time.
  • the storage unit may further acquire and accumulate, over time, the feature amount of a captured image of the target object of the operation of the target device; in this case, the feature amount of the captured image is further input to the first learning model, and the teacher data may further include time-series data of the feature amount accumulated in the storage unit.
  • the storage unit may further accumulate the captured image, and the learning unit may cause a second learning model, trained by deep learning so that its input image and output image match, to learn the captured images accumulated in the storage unit, and may obtain the feature amount of a captured image from the second learning model to which that captured image is input.
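A second learning model of this kind (an autoencoder: a network trained so that input and output images match, whose bottleneck activation serves as the feature amount) can be illustrated with a minimal linear autoencoder in NumPy. Random vectors stand in for flattened captured images, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 16))            # stand-in for flattened captured images

W_enc = 0.1 * rng.standard_normal((16, 4))   # encoder: image -> 4-dim feature amount
W_dec = 0.1 * rng.standard_normal((4, 16))   # decoder: feature amount -> image

def forward(X):
    Z = X @ W_enc                            # feature amount of each image
    return Z, Z @ W_dec                      # reconstructed image

initial_loss = np.mean((X - forward(X)[1]) ** 2)

lr = 0.01
for _ in range(200):                         # train so that input and output match
    Z, X_hat = forward(X)
    G = 2.0 * (X_hat - X) / X.size           # gradient of the MSE w.r.t. X_hat
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = np.mean((X - forward(X)[1]) ** 2)
features = forward(X)[0]                     # what the first learning model would receive
```

In the embodiment a deep model would be used; the point is only that the bottleneck `features` replace the raw image as input to the first learning model.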
  • the first learning model may further predict a future feature amount of the captured image, and the learning device may further include a display control unit that displays, on a display unit, a future captured image restored from the future feature amount predicted by the first learning model, together with the captured image of the target object.
  • in the first to tenth aspects, the teacher data may include time-series data of parameters having a plurality of dimensions, and the parameters may be normalized for each dimension.
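Per-dimension normalization as described above keeps parameters with different physical units (e.g., joint angles next to measured weights) on a comparable scale. A common min-max sketch follows; the scaling choice is an assumption, since the aspect does not fix one:

```python
import numpy as np

def normalize_per_dimension(series):
    """Scale each dimension (column) of a (T, D) time series to [0, 1]."""
    series = np.asarray(series, dtype=float)
    lo = series.min(axis=0)
    span = series.max(axis=0) - lo
    span[span == 0.0] = 1.0              # constant dimensions stay at 0
    return (series - lo) / span

data = np.array([[0.0, 100.0],           # e.g., joint angle [deg], weight [g]
                 [5.0, 300.0],
                 [10.0, 200.0]])
norm = normalize_per_dimension(data)
```

Each column is scaled independently, so no single dimension dominates the training loss.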
  • in the first to eleventh aspects, the first learning model may be an RNN (recurrent neural network).
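A minimal Elman-style RNN forward pass shows the shape of such a model: at each time step it consumes an input vector (assumed here to be [state value, measured value, target value]) and emits a predicted future state value. Sizes and weights are illustrative and untrained:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, d_state = 3, 8, 2                 # input dim, hidden dim, state-value dim

W_xh = 0.1 * rng.standard_normal((d_in, d_h))
W_hh = 0.1 * rng.standard_normal((d_h, d_h))
W_hy = 0.1 * rng.standard_normal((d_h, d_state))

def rnn_predict(sequence):
    """Run the RNN over a sequence of inputs; return a predicted state per step."""
    h = np.zeros(d_h)
    outputs = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh)     # recurrent hidden state
        outputs.append(h @ W_hy)             # predicted future state value
    return np.array(outputs)

seq = rng.standard_normal((5, d_in))         # 5 time steps of [state, measured, target]
pred = rnn_predict(seq)
```

Training such a model on the accumulated time-series teacher data would fit it to predict the next state value from the current inputs.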
  • the target device may be a manipulator having a joint, and the state value may be the state value of the joint.
  • the learning method according to aspect 14 of the present invention includes: a storage step of acquiring and accumulating, over time, the state value of a target device in operation and the measured value of the operation; and a learning step of causing a first learning model, to which at least the state value of the target device in operation and the measured value of the operation are input and which predicts a future state value of the target device, to learn teacher data, the teacher data including time-series data of the state value and the measured value accumulated in the storage step.
  • the automatic control device includes: a first learning model to which at least the state value of a target device in operation and the measured value of the operation are input and which predicts a future state value of the target device; and an automatic control unit that inputs at least the state value of the target device in operation and the measured value of the operation into the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model, the first learning model having learned teacher data including time-series data of past state values and measured values of the target device.
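The closed loop described for the automatic control device can be sketched with a stub in place of the trained first learning model and a toy plant in place of the target device; the gains and the stub's behavior are purely illustrative:

```python
def predict_future_state(state, measured, target):
    """Stub for the trained first learning model: predicts a state nearer the target."""
    return state + 0.5 * (target - state)

def run_closed_loop(state, target, steps=20):
    """Repeatedly read the measured value, predict, and drive the device toward the prediction."""
    for _ in range(steps):
        measured = state                      # toy measuring-device reading
        desired = predict_future_state(state, measured, target)
        state += 0.8 * (desired - state)      # controller tracks the predicted state
    return state

final = run_closed_loop(state=0.0, target=1.0)
```

Each iteration mirrors one control cycle: acquire the state and measured values, obtain the predicted future state from the model, and command the device so its state approaches that prediction.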
  • the automatic control method includes an automatic control step of inputting at least the state value of a target device in operation and the measured value of the operation into a first learning model that predicts a future state value of the target device, and of controlling the target device so that the state value of the target device approaches the future state value predicted by the first learning model, the first learning model having learned teacher data including time-series data of past state values and measured values of the target device.
  • the learning device may be realized by a computer. In this case, a learning program that realizes the learning device on the computer by operating the computer as each unit (software element) included in the learning device, and a computer-readable recording medium on which the learning program is recorded, also fall within the scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Manufacturing & Machinery (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Numerical Control (AREA)

Abstract

The present invention enables the operation of a target device to be learned with high accuracy. A learning device (100) includes: a storage unit (103) that acquires, over time, and accumulates state values of a manipulator (10) in operation and measured values of the operation; and a learning unit (105) that causes a first learning model (101), to which at least the state values of the manipulator (10) in operation and the measured values of the operation are input and which predicts a future state value of the manipulator (10), to learn teacher data, the teacher data including time-series data of the state values and measured values accumulated in the storage unit (103).
PCT/JP2020/014981 2019-05-24 2020-04-01 Dispositif d'apprentissage, procédé d'apprentissage, programme d'apprentissage, dispositif de commande automatique, procédé de commande automatique, et programme de commande automatique Ceased WO2020241037A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-098109 2019-05-24
JP2019098109A JP6811465B2 (ja) 2019-05-24 2019-05-24 学習装置、学習方法、学習プログラム、自動制御装置、自動制御方法および自動制御プログラム

Publications (1)

Publication Number Publication Date
WO2020241037A1 true WO2020241037A1 (fr) 2020-12-03

Family

ID=73547509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/014981 Ceased WO2020241037A1 (fr) 2019-05-24 2020-04-01 Dispositif d'apprentissage, procédé d'apprentissage, programme d'apprentissage, dispositif de commande automatique, procédé de commande automatique, et programme de commande automatique

Country Status (2)

Country Link
JP (1) JP6811465B2 (fr)
WO (1) WO2020241037A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022125900A (ja) * 2021-02-17 2022-08-29 株式会社エクサウィザーズ 情報処理装置、情報処理方法、及びプログラム
CN115446867A (zh) * 2022-09-30 2022-12-09 山东大学 一种基于数字孪生技术的工业机械臂控制方法及系统
US20230271319A1 (en) * 2022-02-28 2023-08-31 Denso Wave Incorporated Method of generating a learning model for transferring fluid from one container to another by controlling robot arm based on a machine-learned learning model, and a method and system for weighing the fluid

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022145213A1 (fr) * 2020-12-28 2022-07-07 ソニーグループ株式会社 Dispositif, système et procédé de commande
WO2022145150A1 (fr) * 2020-12-28 2022-07-07 ソニーグループ株式会社 Système de commande et procédé de commande
JP7676911B2 (ja) * 2021-04-27 2025-05-15 マツダ株式会社 解析支援方法、該プログラムおよび解析支援装置
JP7237382B1 (ja) * 2021-12-24 2023-03-13 知能技術株式会社 画像処理装置、画像処理方法、及び画像処理プログラム
JP2024084012A (ja) * 2022-12-12 2024-06-24 オムロン株式会社 ロボットシステム
JP2025020634A (ja) * 2023-07-31 2025-02-13 オムロン株式会社 制御装置、学習済みモデル生成装置、方法、及びプログラム
JP2025042336A (ja) 2023-09-14 2025-03-27 株式会社デンソー データ収集システム、自動化作業システム、データ収集プログラム及び自動化作業プログラム生成プログラム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11272305A (ja) * 1998-03-23 1999-10-08 Toshiba Corp プラント制御装置
JP2018100896A (ja) * 2016-12-20 2018-06-28 ヤフー株式会社 選択装置、選択方法及び選択プログラム
JP2018206286A (ja) * 2017-06-09 2018-12-27 川崎重工業株式会社 動作予測システム及び動作予測方法
JP2019032649A (ja) * 2017-08-07 2019-02-28 ファナック株式会社 制御装置及び機械学習装置

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Multimodal AI robot will be exhibited at GTC Japan 2018 (part 1)", EXAWIZARDS ENGINEER BLOG, 11 September 2018 (2018-09-11), Retrieved from the Internet <URL:https://techblog.exawizards.com/entry/2018/09/11/095201> *
SUZUKI, KANATA ET AL.: "Generation of Folding Action of Flexible Object by Multi-DOF Robot Using Deep Learning", LECTURE PROCEEDINGS(2) OF THE 78TH (2016) NATIONAL CONFERENCE OF ARTIFICIAL INTELLIGENCE AND COGNITIVE SCIENCE OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 78, no. 28, 10 March 2016 (2016-03-10), pages 2-319 - 2-320 *
YUKAWA, TSURUAKI: "Can Multimodal AI Robots Reproduce artisan's Skills? The World's Leading Edge Seen at Interphex Osaka", EXACOMMUNITY, 4 March 2019 (2019-03-04), Retrieved from the Internet <URL:https://community.exawizards.com/aishinbun/20190304> [retrieved on 20200701] *

Also Published As

Publication number Publication date
JP6811465B2 (ja) 2021-01-13
JP2020194242A (ja) 2020-12-03

Similar Documents

Publication Publication Date Title
JP6811465B2 (ja) 学習装置、学習方法、学習プログラム、自動制御装置、自動制御方法および自動制御プログラム
JP6810087B2 (ja) 機械学習装置、機械学習装置を用いたロボット制御装置及びロボットビジョンシステム、並びに機械学習方法
JP7458741B2 (ja) ロボット制御装置及びその制御方法及びプログラム
CN112638596B (zh) 自主学习型机器人装置以及自主学习型机器人装置的动作生成方法
JP6680750B2 (ja) 制御装置及び機械学習装置
US12162151B2 (en) Robot control device, robot system and robot control method
JP2011110620A (ja) ロボットの動作を制御する方法およびロボットシステム
CN102470530A (zh) 生成机器人的教导数据的方法以及机器人教导系统
JP7376318B2 (ja) アノテーション装置
Campbell et al. Learning whole-body human-robot haptic interaction in social contexts
JP2021082049A (ja) 情報処理装置、および、情報処理方法
CN119820564B (zh) 一种虚实结合的机械臂-灵巧手系统映射操作及样本采集方法
CN119347753B (zh) 确定控制策略模型的方法及装置以及用于控制末端执行器的方法及装置
US20250065493A1 (en) Method and system for dexterous manipulation by a robot
Thompson et al. Identification of unknown object properties based on tactile motion sequence using 2-finger gripper robot
CN115958595B (zh) 机械臂引导方法、装置、计算机设备和存储介质
CN111702759A (zh) 示教系统及机器人的示教方法
CN114454176B (zh) 机器人的控制方法、控制装置、机器人和存储介质
CN118265596A (zh) 机器人控制装置、机器人系统以及机器人控制方法
Guo et al. Grasp like humans: Learning generalizable multi-fingered grasping from human proprioceptive sensorimotor integration
CN121018608B (zh) 用于机器人夹爪的控制方法、电子设备、介质及系统
US20250319590A1 (en) Adjustment of manipulated value of robot
JP2025093156A (ja) 動作指令生成装置
CN120155933A (zh) 一种抓取策略模型的训练、抓取方法、装置及系统
CN120155915A (zh) 一种视触觉融合模型训练、操作方法、装置及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20814673

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20814673

Country of ref document: EP

Kind code of ref document: A1