WO2018216492A1 - Learning apparatus, learning control method, program therefor - Google Patents
- Publication number: WO2018216492A1 (international application PCT/JP2018/018141)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- learning
- control
- sensor
- task
- scenes
- Prior art date
- Legal status: Ceased (assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Definitions
- the present invention relates to a learning apparatus, a learning control method, and a program therefor.
- This application claims priority to Japanese Patent Application No. 2017-104535 filed May 26, 2017, the entire contents of which are incorporated herein by reference.
- Conventionally, artificial intelligence technology (hereinafter, “AI technology”) such as neural networks has been widely researched (see JP H6-289918A, for example).
- With the rise of the AI technology called deep learning, the recognition rates of techniques for recognizing targets based on images have improved rapidly in recent years, and the accuracy of image classification is now comparable to, or higher than, that of humans.
- Deep learning technology is expected to be applied not only to image recognition but also to various other fields such as speech recognition, personal authentication, behavior prediction, summary writing, machine translation, monitoring, autonomous driving, failure prediction, sensor data analysis, music genre determination, content generation, and security systems.
- machines can be trained to attain a predetermined ability.
- learning apparatuses that perform machine learning repeatedly perform learning operations until they attain a predetermined ability.
- JP H6-289918A discloses a learning control method for robots.
- an input value that is to be supplied to a driving unit of a robot is modified based on a difference between a target motion targeted in a robot operation set in advance by a person and an actual motion when the robot actually operates.
- in learning apparatuses that control actuators based on information from many sensors, control and sensor information output affect each other, and thus more complex learning has to be performed in order to attain a control method. Accordingly, in learning apparatuses that perform such complex learning, it is not easy for a person to set a target value of the control amount in advance as in JP H6-289918A. On the other hand, in the case of training learning apparatuses without setting a target value, it is necessary to repeat trial-and-error processing an extremely large number of times, which is inefficient.
- An aspect of the present invention is directed to a learning apparatus configured to learn control of a series of operations involved in a predetermined task, the apparatus including: a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- the learning apparatus can classify a series of operations involved in the learning into smaller units according to the divided scenes. Accordingly, it is possible to perform learning for each scene, thereby achieving the learning purpose in a shorter period of time.
- the learning apparatus further includes an output unit configured to output a classification result of the partial operation.
- learned partial operation classification can be used by other learning apparatuses as well.
- the learning apparatus further includes a control unit configured to sort the classified partial operations or the divided scenes, in a time-series order of the task.
- classified partial operations can be sorted and learned in the order of scenes corresponding to a task.
- An aspect of the present invention is directed to an autonomous driving control learning apparatus configured to learn control of a series of operations involved in autonomous driving of a vehicle that does laps of a predetermined course, the apparatus including: a classifying unit configured to divide the course into a plurality of scenes based on a position or an orientation in the course relative to a predetermined reference position, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- An aspect of the present invention is directed to a robot control learning apparatus configured to learn control of a series of operations involved in a task of holding a predetermined workpiece and stacking the workpiece at a placement position according to a shape of the workpiece, the apparatus including: a classifying unit configured to divide the task into a plurality of scenes based on a displacement amount in an operation involved in learning of the task, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- An aspect of the present invention is directed to a learning method for learning control of a series of operations involved in a predetermined task, the method including: dividing, by a computer, the task into a plurality of scenes, and, for each of the divided scenes, specifying, by the computer, a partial operation that is to be performed in the scene, from among the series of operations; and learning, by the computer, control of the partial operation for each of the divided scenes.
- An aspect of the present invention is directed to a program for causing a computer configured to learn control of a series of operations involved in a predetermined task to execute: dividing the task into a plurality of scenes, and, for each of the divided scenes, specifying a partial operation that is to be performed in the scene, from among the series of operations; and learning control of the partial operation for each of the divided scenes.
- An aspect of the present invention is directed to an apparatus configured to perform a predetermined task, the apparatus including: a first sensor configured to sense information regarding a scene in which the apparatus performs the predetermined task; an actuator; a second sensor configured to sense information regarding an operation of the apparatus that is performed through control from the actuator; a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and a storage unit configured to store a learning result obtained by the above-described learning apparatus, wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
- An aspect of the present invention is directed to an autonomous driving control apparatus for doing laps of a predetermined course, the apparatus including: a first sensor configured to sense information regarding a position and an orientation in the predetermined course for the vehicle; an actuator configured to control driving of the vehicle; a second sensor configured to sense a control status of the vehicle as per the actuator; a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and a storage unit configured to store a learning result obtained by the above-described learning apparatus, wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
- FIG. 1 is a block diagram showing a schematic configuration of a learning apparatus in a first embodiment.
- FIG. 2 is a schematic diagram showing a course for autonomous driving of a vehicle that is controlled by the learning apparatus in the first embodiment.
- FIG. 3 is a flowchart showing the outline of the processing performed by the learning apparatus in the first embodiment.
- FIG. 4 is a block diagram showing a detailed configuration of the learning apparatus in the first embodiment.
- FIG. 5 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment.
- FIG. 6 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment.
- FIG. 7 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment.
- FIG. 8 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment.
- FIG. 9 is a diagram showing an example of the hardware configuration of the learning apparatus in the first embodiment.
- FIG. 10 is a block diagram showing a schematic configuration of a learning apparatus in a second embodiment.
- FIG. 1 is a block diagram showing a schematic configuration of a learning apparatus 1 according to this embodiment.
- the learning apparatus 1 learns a predetermined task.
- the learning apparatus 1 according to this embodiment is mounted in a vehicle controlled to drive autonomously (hereinafter, alternatively referred to simply as a “vehicle”) 90, and learns control of the vehicle 90 for autonomous driving along a predetermined course (see FIG. 2).
- the learning apparatus 1 is provided with learning data, for example, from an operator or the like.
- the learning data is data containing, for example, a learning purpose and learning requirements as follows.
- the task is a matter that is required to be achieved through an operation involved in the learning (“operation involved in the learning” in this embodiment refers to various types of control necessary for autonomous driving of the vehicle 90, and may be considered as being an operation that is performed by the vehicle 90 through the various types of control), and refers to doing laps of a course in this embodiment.
- the learning purpose is a standard that is to be achieved by the task, and refers to “do ten laps of a course within a predetermined period of time and reach goal” as described above in this embodiment.
- making the task performable is provided as a learning requirement, in learning in the initial stage level.
- the learning apparatus 1 is described as an apparatus including a computer such as a PC (personal computer) or a server apparatus, but there is no limitation to this, and it may be realized by, for example, any embedded device that has a processor, a RAM, and a ROM. Furthermore, the configuration of elements installed in the apparatuses is not limited to those realized by software.
- the apparatuses may have a configuration realized by hardware.
- a later-described neural network 22 may be configured by an electronic circuit such as a custom LSI (large-scale integration) or an FPGA (field-programmable gate array).
- the learning apparatus 1 includes a control unit 10, a machine learning unit 20, an operation classifying unit 30, and a storage unit 40.
- control unit 10 is connected to a control sensor 91, an actuator 92, and a status detection sensor 93 provided outside the learning apparatus 1.
- the control unit 10 controls the actuator 92 in response to output from the control sensor 91 and the status detection sensor 93, thereby performing autonomous driving of the vehicle 90.
- the control sensor 91 is a sensor group for controlling the autonomous driving of the vehicle 90.
- the control sensor 91 is configured by a sensor for sensing obstacles outside the vehicle or a sensor for detecting the state of the road surface such as a vehicle-mounted camera or a laser, for example.
- the status detection sensor 93 is a sensor group for detecting the control status of the vehicle 90 in an autonomous driving state.
- the status detection sensor 93 is configured by a vibration sensor, a noise sensor, a fuel consumption sensor, a vehicle speed sensor, an acceleration sensor, a yaw rate sensor, or the like.
- the actuator 92 is controlled by the control unit 10 for autonomous driving of the vehicle 90.
- the actuator 92 is configured by, for example, an accelerator actuator, a brake actuator, a steering actuator, or the like.
- the accelerator actuator controls the vehicle driving force by controlling the throttle opening degree in response to a control signal from the control unit 10.
- the brake actuator controls the braking force on vehicle wheels by controlling the operation amount of the brake pedal in response to a control signal from the control unit 10.
- the steering actuator performs control of vehicle steering action by controlling driving of a steering assistance motor of an electronic power steering system in response to a control signal from the control unit 10.
- FIG. 3 is a flowchart showing the outline of the processing flow when the learning apparatus 1 performs learning.
- learning initial stage S1
- learning is performed for the purpose of making the task performable (i.e., making an operation that satisfies the learning requirements in the initial stage performable).
- the learning apparatus 1 in this embodiment is provided with “do one lap of course and reach goal” as a learning requirement in the initial stage.
- operation classification is performed.
- the content of the learning performed in the learning initial stage S1 is analyzed, so that the task is divided into a plurality of portions based on a predetermined parameter (hereinafter, a portion obtained by dividing a task is alternatively referred to as a “scene”), and, in each divided scene, an operation that is to be performed in that scene (hereinafter, alternatively referred to as a “partial operation”) is specified from among the series of operations involved in the task.
- the predetermined parameter for dividing the task is, for example, a displacement amount in the operation involved in the learning of the task, or the environment in which the operation involved in the learning of the task is performed (the time elapsed from when the task is started, the position from where the task is started, etc.).
- the position from where the task is started (the environment in which the operation involved in the learning of the task is performed) is used as the predetermined parameter. That is to say, in this embodiment, the learning apparatus 1 divides the task into scenes based on the position in a course, and classifies the series of operations involved in the learning into scenes based on an operation performed in units of a course corresponding to each divided scene.
- the efficiency of learning can be increased by performing the learning in units of partial operations classified according to the scenes. Increasing the efficiency of learning may refer to, for example, shortening the time required from when the learning is started to when the learning purpose is achieved.
- the state in which the learning is made non-continuable refers to a state in which the operation that is to be acquired through the learning can no longer be continued.
- the state in which the learning is made non-continuable refers to, for example, states in which the vehicle has slid off the track, in which the vehicle has crashed against a wall or the like and is not moving, in which the vehicle has broken down, or the like.
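To make this judgment concrete, it can be sketched as a simple predicate over status-sensor readings, as below. This is a minimal illustration only: the application describes the non-continuable states in prose, and the field names and conditions here are assumptions.

```python
# Hedged sketch: deciding whether a learning run has become non-continuable
# from status-sensor readings. All field names and conditions are
# illustrative assumptions, not taken from the application.

def is_non_continuable(status: dict) -> bool:
    """Return True if the vehicle can no longer continue the current run."""
    slid_off_track = status.get("off_track", False)
    crashed_and_stuck = status.get("collision", False) and status.get("speed", 0.0) == 0.0
    broken_down = status.get("fault_code") is not None
    return slid_off_track or crashed_and_stuck or broken_down
```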
- in the learning final stage (S4), the learning is optimized.
- learning for optimally performing the operation from start to end is performed.
- learning through doing ten laps of a course within a predetermined period of time and reaching a goal is performed as the learning in the final stage.
- FIG. 4 is a block diagram showing a detailed configuration of the learning apparatus 1 according to this embodiment.
- the machine learning unit 20 includes a learning data input/output unit 21, a neural network 22, and a learning result output unit 23.
- the operation classifying unit 30 includes a control data extracting unit 31 and an operation classifying result extracting unit 32.
- the processing of each unit will be described in detail in each step of FIG. 3.
- FIG. 5 is a flowchart showing details of the processing flow in the learning initial stage in step S1 shown in FIG. 3.
- the learning data input/output unit 21 accepts learning data (S101).
- the learning data is data containing, for example, a learning purpose and learning requirements, as described above.
- in the next step (S102), machine learning is performed.
- the control unit 10 operates the actuator 92 by setting a random control amount thereto.
- the control unit 10 reads output (hereinafter, alternatively referred to as “sensor value”) from the control sensor 91 and the status detection sensor 93 for the control amount given at random, and stores the data (the control amount and the sensor values) in the storage unit 40.
- the neural network 22 refers to the storage unit 40 and reads the stored control amount and sensor values, and performs learning of a control operation that matches the learning requirements through deep learning (S102).
- “do one lap of course and reach goal” is set as the purpose in the initial stage level. Accordingly, in the learning apparatus 1, for example, when it is determined based on output from the control sensor 91 that the vehicle has done one lap of the course and reached the goal, the machine learning is judged to have reached the initial stage level (S103: Y), and the learning in the initial stage is ended.
- FIG. 6 is a flowchart showing details of the processing flow in the operation classification in step S2 shown in FIG. 3.
- the control data extracting unit 31 extracts the sensor value of the control sensor 91 when the learning initial stage ends and the control amount of the actuator 92 and the sensor value of the status detection sensor 93 corresponding thereto, from the storage unit 40 (S201). The control data extracting unit 31 inputs the extracted values to the neural network 22 as learning data.
- the neural network 22 performs machine learning based on the learning data input by the control data extracting unit 31 (S202). At this time, the neural network 22 divides the course-running operation into a predetermined number of divided scenes.
- the neural network 22 classifies the course-running operation into scenes based on scene vectors and operation vectors.
- the scene vectors in this embodiment express a scene of the task that is performed by the vehicle 90.
- the scene vectors are acquired, for example, from sensor values (e.g., a position (or distance) from the start point, and a direction from the start point) that are output by the control sensor 91.
- the scene vector at a point l can be expressed as (l_x, l_y).
- the operation vectors express the control status of the driving vehicle 90.
- the operation vectors are acquired, for example, from sensor values (e.g., velocity, acceleration, angular velocity, angular acceleration, etc.) that are output by the status detection sensor 93.
- the operation vector at the point l can be expressed as (v_l, a_l) using a velocity v and an acceleration a at the point l.
- the neural network 22 divides the task into scenes based on the scene vector (l_x, l_y), and learns, for each divided scene, operation classification that is to be learned in that scene based on the operation vector (v_l, a_l). Accordingly, the learning apparatus 1 can learn control of a partial operation according to a scene, by judging in which scene the learning apparatus 1 is currently present. For example, the neural network 22 focuses on the position that is expressed by the scene vector as well as the point at which the operation vector changes, thereby finding acceleration, deceleration, change of direction, and the like of the operation of the vehicle 90, so that the series of operations can be classified into operations corresponding to the scenes based on the change point. Also, for example, the neural network 22 can learn the operation classification based on the similarity levels of the operation vectors.
- the task is divided into five scenes (a) to (e) based on a position on the course.
- the partial operations classified into scenes are, for example, as follows.
- the neural network 22 can preferably sort the divided scenes in the time-series order of the task.
- the neural network 22 can sort the divided scenes in order of positions in the course from closest to farthest from the task start position (start position) in the direction in which the vehicle 90 travels.
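To make the change-point idea concrete, the sketch below segments a recorded lap into scenes wherever the operation vector (v_l, a_l) changes sharply, and returns the scenes in the time-series order of the task. It is a minimal illustration under assumed inputs and an assumed threshold; the application does not prescribe a particular segmentation algorithm (in practice the classification is learned by the neural network 22).

```python
import numpy as np

# Minimal sketch: split a lap trace into scenes at points where the
# operation vector (v_l, a_l) changes sharply. The inputs and the
# threshold are illustrative assumptions.

def segment_scenes(velocities: np.ndarray, accelerations: np.ndarray,
                   threshold: float = 1.0) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of scenes, in time-series order."""
    ops = np.stack([velocities, accelerations], axis=1)    # operation vectors
    deltas = np.linalg.norm(np.diff(ops, axis=0), axis=1)  # change between steps
    change_points = (np.where(deltas > threshold)[0] + 1).tolist()
    boundaries = [0] + change_points + [len(ops)]
    # Consecutive boundaries delimit the scenes; since the trace is recorded
    # from the start position onward, this order already matches the
    # time-series order of the task.
    return list(zip(boundaries[:-1], boundaries[1:]))

# Example: a slow stretch followed by a fast stretch yields two scenes.
scenes = segment_scenes(np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]),
                        np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1]))
print(scenes)  # [(0, 3), (3, 6)]
```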
- the operation classifying result extracting unit 32 extracts the partial operation classification learned by the neural network 22, and stores it in the storage unit 40 (S203). At this time, the learning result output unit 23 can output a learning result according to the partial operation classification. Accordingly, the neural network 22 can, for example, accept a classification result of the partial learning, as learning data, from another learning apparatus 1' that has a similar configuration, thereby omitting learning regarding the partial operation classification.
- FIG. 7 is a flowchart showing details of the processing flow in the learning of control with which the learning is made non-continuable (second learning process) in step S3 shown in FIG. 3.
- the learning data input/output unit 21 selects a partial operation from among the partial operations classified in the processing in step S2, referring to the storage unit 40, and extracts the control amount for the actuator 92 necessary for this partial operation. Furthermore, the learning data input/output unit 21 performs control in the extracted control amount, referring to the storage unit 40, and judges whether or not the learning has been made non-continuable as a result of the control, for example, based on output from the status detection sensor 93 and the like.
- the learning data input/output unit 21 reads the extracted control amount and information as to whether or not the learning has been made non-continuable as a result of the control, as learning data, and gives them to the neural network 22 as learning data.
- the neural network 22 performs learning through deep learning based on the given learning data (S301).
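As one way to picture this second learning process, the sketch below trains a small classifier to predict, from a control amount, whether it makes the learning non-continuable, so that such control can be avoided later. The MLPClassifier stands in for the neural network 22, and all data values are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # stand-in for neural network 22

# Hypothetical training data: each row is a control amount tried during a
# partial operation (throttle, brake, steering), labelled 1 if the run was
# made non-continuable (e.g. the vehicle slid off the track), else 0.
X = np.array([[0.9, 0.0, 0.6],
              [0.3, 0.1, 0.2],
              [0.8, 0.0, 0.7],
              [0.2, 0.3, 0.1]])
y = np.array([1, 0, 1, 0])

failure_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

# In subsequent learning, candidate control amounts predicted to make the
# learning non-continuable can be skipped before being tried.
candidate = np.array([[0.85, 0.0, 0.65]])
avoid = failure_model.predict(candidate)[0] == 1
```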
- the learning result output unit 23 can output a learning result of the control with which the learning is made non-continuable. Accordingly, the neural network 22 can, for example, accept control with which the learning is made non-continuable, as learning data, from another learning apparatus 1' that has a similar configuration, and perform an additional learning process (S302). Thus, it is possible to perform a more efficient learning process. Note that the processing in step S302 is not essential.
- the learning apparatus 1 performs the processing in step S301 (and S302) on all classified partial operations (S303).
- the learning apparatus 1 may perform learning again through the series of operations (S304). Accordingly, faster course-running control can be performed.
- the learning apparatus 1 performs learning of control with which the learning is made non-continuable, for classified partial operations, and thus it is possible to perform learning while avoiding such control in the subsequent learning. Accordingly, it is possible to perform a more efficient learning process.
- FIG. 8 is a flowchart showing details of the processing flow in the optimization learning (third learning process) in step S4 shown in FIG. 3.
- the learning that has been performed in the steps up to S3 is optimized, and thus learning for achieving the learning purpose (“do ten laps of a course within a predetermined period of time and reach goal” in this embodiment) given as learning data when the learning was started is performed.
- the learning data input/output unit 21 refers to the storage unit 40 and extracts learning data (which is set by an operator) input in the learning initial stage (S1 in FIG. 3).
- the learning data input/output unit 21 extracts a status of the neural network 22 after the learning of control with which the learning is made non-continuable, referring to the storage unit 40.
- the learning data input/output unit 21 sets the extracted data to the control unit 10.
- the control unit 10 outputs the control amount for the actuator 92, based on the set data described above, and acquires sensor values of the control sensor 91 and the status detection sensor 93 corresponding thereto.
- the control unit 10 stores the control amount and the sensor values output therefor, in the storage unit 40.
- the neural network 22 reads the control amount and the sensor values stored by the control unit 10 in the above-described processing, and performs learning through deep learning (S401). Accordingly, the neural network 22 can more efficiently learn a control operation that matches the learning requirements from the start to the end of the operation (i.e., from the start to the goal of the course), in a state where control with which the learning is made non-continuable has been learned.
- the processing in step S401 is repeatedly performed until the entire learning is optimized (S402). A result of the optimization learning is extracted by the learning result output unit 23, and stored in the storage unit 40. Accordingly, in the optimization learning, it is possible to perform learning, with the control with which the learning is made non-continuable being excluded.
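The overall final-stage loop can be pictured as repeating task attempts and network updates until the learning purpose is met. The stub classes below are illustrative only; the application describes the roles of the network, the control unit, and the storage unit 40, not their programming interfaces.

```python
# Minimal sketch of the optimization loop (S401-S402) under assumed
# interfaces. Storage, Network, and Controller are stand-ins.

class Storage(list):
    """Stands in for storage unit 40: accumulates (control, sensors) traces."""

class Network:
    def train(self, storage: "Storage") -> None:
        """Deep-learning update from stored control amounts and sensor values (S401)."""

class Controller:
    def run_task(self, network: Network) -> tuple[list, bool]:
        """Attempt the task once (e.g. ten laps), returning the recorded
        trace and whether the learning purpose was achieved."""
        return [], False

def optimize(network: Network, controller: Controller, storage: Storage,
             max_episodes: int = 1000) -> Network:
    for _ in range(max_episodes):
        trace, achieved = controller.run_task(network)
        storage.append(trace)
        network.train(storage)   # S401
        if achieved:             # S402: stop once the entire learning is optimized
            break
    return network
```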
- the learning apparatus 1 itself can classify an operation involved in the learning into partial operations, and perform learning. Accordingly, individual optimization can be performed for each classified operation, and the learning can be performed more efficiently.
- in the learning apparatus 1 according to this embodiment, when learning control of a partial operation, first, control with which the learning is made non-continuable is learned. Accordingly, the learning can be efficiently performed without a person setting detailed conditions in advance for each operation.
- the computer 800 includes a processor 801, a memory 803, a storage apparatus 805, an input interface unit (input I/F unit) 807, a data interface unit (data I/F unit) 809, a communication interface unit (communication I/F unit) 811, and a display apparatus 813.
- the processor 801 controls various types of processing in the computer 800 by executing programs stored in the memory 803. For example, as a result of the processor 801 executing programs stored in the memory 803, the control unit 10, the machine learning unit 20, the operation classifying unit 30, and the like of the learning apparatus 1 can be realized.
- the memory 803 is a storage medium, for example, such as a RAM (random access memory).
- the memory 803 temporarily stores program code of programs that are executed by the processor 801, and data that is required at the time when the programs are executed.
- the storage apparatus 805 is, for example, an auxiliary memory such as a hard disk drive (HDD) or a solid state drive, or a non-volatile storage medium such as a flash memory.
- the storage apparatus 805 stores various programs for realizing the operating system or the above-described configurations. These programs and data are loaded onto the memory 803 as necessary, and referred to by the processor 801. For example, the storage unit 40 described above is realized by the storage apparatus 805.
- the input I/F unit 807 is a device for accepting input from an administrator. Specific examples of the input I/F unit 807 include a keyboard, a mouse, a touch panel, various sensors, a wearable device, and the like.
- the input I/F unit 807 may be connected to the computer 800, for example, via an interface such as a USB (universal serial bus).
- the data I/F unit 809 is a device for inputting data from the outside of the computer 800. Specific examples of the data I/F unit 809 include drive devices and the like for reading data stored in various storage media.
- the data I/F unit 809 may be provided external to the computer 800. In that case, the data I/F unit 809 is connected to the computer 800, for example, via an interface such as a USB.
- the communication I/F unit 811 is a device for performing wired or wireless data communication with an apparatus outside the computer 800 via the Internet N.
- the communication I/F unit 811 may be provided external to the computer 800.
- the communication I/F unit 811 is connected to the computer 800, for example, via an interface such as a USB.
- the display apparatus 813 is a device for displaying various types of information. Specific examples of the display apparatus 813 include a liquid crystal display, an organic EL (electro-luminescence) display, a wearable display, and the like. The display apparatus 813 may be provided external to the computer 800. In that case, the display apparatus 813 is connected to the computer 800, for example, via a display cable or the like.
- the learning apparatus 1 is used for the vehicle 90 controlled to drive autonomously.
- the apparatus to which the learning apparatus 1 is applied is not limited to the example shown in the first embodiment, and the learning apparatus 1 can be applied to various apparatuses.
- an example will be described in which the learning apparatus 1 is applied to the control of a robot whose task is to perform a pick-and-place operation.
- in the second embodiment, mainly the differences from the first embodiment will be described.
- the configuration of the learning apparatus 1 is as in the first embodiment.
- the configuration outside the learning apparatus 1 in this embodiment is such that a control sensor 91' is configured by a sensor group for performing a pick-and-place operation. Specifically, it is configured by a workpiece detection sensor (image sensor), a robot holding force sensor, or the like. Furthermore, the control sensor 91' includes an image recognition algorithm, and can recognize the shape of a workpiece that it is holding.
- the other portions of the configuration outside the learning apparatus 1 are as described in the first embodiment.
- the pick-and-place operation that is a task according to this embodiment refers to an operation that is performed following the procedure below.
1. Recognize workpiece shape and hold the workpiece
2. Lift workpiece held thereby
3. Move lifted workpiece to predetermined position according to workpiece shape
4. Stack workpiece in tube according to shape
- the task is to stack the workpiece in a tube according to its shape.
- the pick-and-place operation that is to be learned may be classified following a procedure similar to that in the first embodiment in which the course for driving of the vehicle 90 is classified.
- the task is divided based on a displacement amount in the operation involved in the learning of the task, into a scene corresponding to an operation of holding a workpiece, a scene corresponding to an operation of carrying a workpiece, and a scene corresponding to an operation of stacking a workpiece.
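One way to visualize displacement-based division: label each step of a recorded end-effector trajectory by its per-step displacement, so that nearly stationary phases (holding, stacking) separate from the moving phase (carrying). The threshold and the hold/stack disambiguation below are assumptions for illustration; the application leaves the concrete criterion to the learned classification.

```python
import numpy as np

# Hedged sketch: divide a pick-and-place trajectory into scenes based on
# the displacement amount between steps. Threshold and labels are
# illustrative assumptions.

def label_scenes(effector_positions: np.ndarray, move_threshold: float = 0.01) -> list[str]:
    """Label each step 'hold', 'carry', or 'stack' from per-step displacement."""
    disp = np.linalg.norm(np.diff(effector_positions, axis=0), axis=1)
    labels = []
    for i, d in enumerate(disp):
        if d < move_threshold:
            # small displacement early in the task -> holding;
            # small displacement late in the task -> stacking
            labels.append("hold" if i < len(disp) // 2 else "stack")
        else:
            labels.append("carry")
    return labels

trajectory = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.001],   # holding
                       [0.1, 0.0, 0.1], [0.2, 0.0, 0.2],     # carrying
                       [0.2, 0.0, 0.201]])                   # stacking
print(label_scenes(trajectory))  # ['hold', 'carry', 'carry', 'stack']
```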
- the state in which the learning is made non-continuable refers to, for example, a state in which a workpiece cannot be placed into a tube.
- the control that is to be learned is, for example, as follows.
- Wrong placement position (workpiece shape and tube inlet shape are different)
- Wrong stacking orientation of workpiece (workpiece shape orientation and tube inlet shape orientation are different)
- in the learning apparatus 1, if the learning of control with which the learning is made non-continuable is performed in advance, learning of properly recognizing workpiece shapes and tube shapes and orientations when holding workpieces can be performed in advance. Accordingly, making the learning non-continuable can be avoided in the learning in the final stage, and thus the efficiency of learning can be further increased.
- the other portions of the configuration are similar to those in the first embodiment.
- the present invention is not limited thereto, and can be applied to a wide variety of fields. Examples thereof include distinguishing non-defective products from defective products (foods, machine parts, chemical products, drugs, and the like) in various fields including industrial fields, fishery fields, agricultural fields, forestry fields, service industries, and medical and health fields. Furthermore, the present invention can be applied to cases where AI technology is applied to products in embedded fields, to systems such as social systems using IT techniques, to analysis of big data, to classification processing in a wide variety of control apparatuses, and the like.
- a “portion”, “unit” or “procedure” does not merely mean a physical configuration, and there are also cases where processing that is performed by a “portion” or the like is realized by software. Furthermore, processing that is performed by one “portion”, “unit”, “procedure”, or apparatus may be realized by two or more physical configurations or apparatuses, and processing that is performed by two or more “portions” or the like, or apparatuses may be realized by one physical unit or apparatus.
- Additional Remark 1: A learning apparatus including at least one hardware processor, wherein, in a case of learning control of a series of operations involved in a predetermined task, the hardware processor divides the task into a plurality of scenes, and, for each of the divided scenes, specifies a partial operation that is to be performed in the scene, from among the series of operations, and learns control of the partial operation for each of the divided scenes.
- Additional Remark 2: A learning method that is performed by at least one hardware processor for executing a step of learning control of a series of operations involved in a predetermined task, the method including: a step of dividing the task into a plurality of scenes, and, for each of the divided scenes, specifying a partial operation that is to be performed in the scene, from among the series of operations; and a step of learning control of the partial operation for each of the divided scenes.
Abstract
In order to provide a technique for shortening the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation, a learning apparatus configured to learn control of a series of operations involved in a predetermined task includes: a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of a partial operation for each of the divided scenes.
Description
The present invention relates to a learning apparatus, a learning control method, and a program therefor.
(CROSS-REFERENCES TO RELATED APPLICATIONS)
This application claims priority to Japanese Patent Application No. 2017-104535 filed May 26, 2017, the entire contents of which are incorporated herein by reference.
Conventionally, artificial intelligence technology (hereinafter, referred to as “AI technology”) such as neural networks has been widely researched (see JP H6-289918A, for example). In particular, with the rise of AI technology called deep learning, the recognition rates of techniques for recognizing targets based on images have improved rapidly in recent years, and the accuracy of image classification is now comparable to, or higher than, that of humans. Deep learning technology is expected to be applied not only to image recognition but also to various other fields such as speech recognition, personal authentication, behavior prediction, summary writing, machine translation, monitoring, autonomous driving, failure prediction, sensor data analysis, music genre determination, content generation, and security systems.
In machine learning such as deep learning, machines can be trained to attain a predetermined ability. At this time, learning apparatuses that perform machine learning repeatedly perform learning operations until they attain a predetermined ability.
For example, JP H6-289918A discloses a learning control method for robots. In the learning control method described in JP H6-289918A, an input value that is to be supplied to a driving unit of a robot is modified based on a difference between a target motion targeted in a robot operation set in advance by a person and an actual motion when the robot actually operates.
In learning apparatuses that control actuators based on information from many sensors, such as control of an engine, driving of an automobile, chemical plants, or the like, control and sensor information output affect each other, and thus more complex learning has to be performed in order to attain a control method. Accordingly, in learning apparatuses that perform such complex learning, it is not easy for a person to set a target value of the control amount in advance as in JP H6-289918A. On the other hand, in the case of training learning apparatuses without setting a target value, it is necessary to repeat trial-and-error processing an extremely large number of times, which is inefficient.
It is an object of the present invention to shorten the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation.
An aspect of the present invention is directed to a learning apparatus configured to learn control of a series of operations involved in a predetermined task, the apparatus including: a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
With this configuration, the learning apparatus can classify a series of operations involved in the learning into smaller units according to the divided scenes. Accordingly, it is possible to perform learning for each scene, thereby achieving the learning purpose in a shorter period of time.
It is possible that the learning apparatus further includes an output unit configured to output a classification result of the partial operation. With this aspect, learned partial operation classification can be used by other learning apparatuses as well.
It is possible that the learning apparatus further includes a control unit configured to sort the classified partial operations or the divided scenes, in a time-series order of the task. With this aspect, classified partial operations can be sorted and learned in the order of scenes corresponding to a task.
An aspect of the present invention is directed to an autonomous driving control learning apparatus configured to learn control of a series of operations involved in autonomous driving of a vehicle that does laps of a predetermined course, the apparatus including: a classifying unit configured to divide the course into a plurality of scenes based on a position or an orientation in the course relative to a predetermined reference position, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
An aspect of the present invention is directed to a robot control learning apparatus configured to learn control of a series of operations involved in a task of holding a predetermined workpiece and stacking the workpiece at a placement position according to a shape of the workpiece, the apparatus including: a classifying unit configured to divide the task into a plurality of scenes based on a displacement amount in an operation involved in learning of the task, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
An aspect of the present invention is directed to a learning method for learning control of a series of operations involved in a predetermined task, the method including: dividing, by a computer, the task into a plurality of scenes, and, for each of the divided scenes, specifying, by the computer, a partial operation that is to be performed in the scene, from among the series of operations; and learning, by the computer, control of the partial operation for each of the divided scenes.
An aspect of the present invention is directed to a program for causing a computer configured to learn control of a series of operations involved in a predetermined task to execute: dividing the task into a plurality of scenes, and, for each of the divided scenes, specifying a partial operation that is to be performed in the scene, from among the series of operations; and learning control of the partial operation for each of the divided scenes.
An aspect of the present invention is directed to an apparatus configured to perform a predetermined task, the apparatus including: a first sensor configured to sense information regarding a scene in which the apparatus performs the predetermined task; an actuator; a second sensor configured to sense information regarding an operation of the apparatus that is performed through control from the actuator; a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and a storage unit configured to store a learning result obtained by the above-described learning apparatus, wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
An aspect of the present invention is directed to an autonomous driving control apparatus for doing laps of a predetermined course, the apparatus including: a first sensor configured to sense information regarding a position and an orientation in the predetermined course for the vehicle; an actuator configured to control driving of the vehicle; a second sensor configured to sense a control status of the vehicle as per the actuator; a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and a storage unit configured to store a learning result obtained by the above-described learning apparatus, wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
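As a sketch of how such an apparatus might use the stored learning result at run time, the snippet below maps sensor values from the first sensor (position/orientation) and the second sensor (control status) to a control amount through a trained regressor. The regressor, the feature layout, and the random training data are all stand-ins; the application does not specify a model interface.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in for the stored learning result

rng = np.random.default_rng(0)
X = rng.random((200, 4))   # [pos_x, pos_y, speed, yaw_rate] -- assumed layout
y = rng.random((200, 3))   # [throttle, brake, steering]     -- assumed layout
learned_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

def control_step(first_sensor: np.ndarray, second_sensor: np.ndarray) -> np.ndarray:
    """Determine the control amount for the actuator from both sensor groups."""
    sensor_values = np.concatenate([first_sensor, second_sensor]).reshape(1, -1)
    return learned_model.predict(sensor_values)[0]

control_amount = control_step(np.array([0.5, 0.2]), np.array([0.3, 0.0]))
```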
According to the present invention, it is possible to provide a technique for shortening the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation.
First Embodiment
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Note that the same constituent elements are denoted by the same reference numerals, and a description thereof may not be repeated. The following embodiments are merely illustrative for the sake of explanation, and are not intended to limit the present invention thereto. Various modifications can be made without departing from the gist thereof.
1. System Outline
Hereinafter, the outline of the system in this embodiment will be described with reference to FIGS. 1 to 3. FIG. 1 is a block diagram showing a schematic configuration of a learning apparatus 1 according to this embodiment. The learning apparatus 1 learns a predetermined task. For example, the learning apparatus 1 according to this embodiment is mounted in a vehicle controlled to drive autonomously (hereinafter, alternatively referred to simply as a “vehicle”) 90, and learns control of the vehicle 90 for autonomous driving along a predetermined course (see FIG. 2). At this time, the learning apparatus 1 is provided with learning data, for example, from an operator or the like. The learning data is data containing, for example, a learning purpose and learning requirements as follows.
Learning Purpose
- Do ten laps of a course within a predetermined period of time and reach goal
Learning Requirements
- Do not slide off track
- Running direction is clockwise
- Reach goal
- “Do one lap of course and reach goal” in initial stage level
Note that the task is a matter that is required to be achieved through an operation involved in the learning (“operation involved in the learning” in this embodiment refers to various types of control necessary for autonomous driving of the vehicle 90, and may be considered as being an operation that is performed by the vehicle 90 through the various types of control), and refers to doing laps of a course in this embodiment. Furthermore, the learning purpose is a standard that is to be achieved by the task, and refers to “do ten laps of a course within predetermined period of time and reach goal” as described above in this embodiment. Thus, in this embodiment, it is considered that making the task performable is provided as a learning requirement, in learning in the initial stage level.
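For illustration, the learning data handed to the apparatus can be pictured as a small structure bundling the purpose with its requirements. The field names below are assumptions; the application describes the learning data only in prose.

```python
from dataclasses import dataclass, field

# Illustrative container for the learning data (purpose + requirements)
# provided by an operator. Field names are assumptions.

@dataclass
class LearningData:
    purpose: str
    requirements: list[str] = field(default_factory=list)
    initial_stage_requirement: str = ""

autonomous_driving = LearningData(
    purpose="Do ten laps of a course within a predetermined period of time and reach goal",
    requirements=["Do not slide off track", "Running direction is clockwise", "Reach goal"],
    initial_stage_requirement="Do one lap of course and reach goal",
)
```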
Furthermore, in the description below, the learning apparatus 1 is described as an apparatus including a computer such as a PC (personal computer) or a server apparatus, but there is no limitation to this, and it may be realized by, for example, any embedded device that has a processor, a RAM, and a ROM. Furthermore, the configuration of elements installed in the apparatuses is not limited to those realized by software. The apparatuses may have a configuration realized by hardware. For example, a later-described neural network 22 may be configured by an electronic circuit such as a custom LSI (large-scale integration) or an FPGA (field-programmable gate array).
As shown in FIG. 1, the learning apparatus 1 includes a control unit 10, a machine learning unit 20, an operation classifying unit 30, and a storage unit 40.
In the vehicle 90, the control unit 10 is connected to a control sensor 91, an actuator 92, and a status detection sensor 93 provided outside the learning apparatus 1. The control unit 10 controls the actuator 92 in response to output from the control sensor 91 and the status detection sensor 93, thereby performing autonomous driving of the vehicle 90.
The control sensor 91 is a sensor group for controlling the autonomous driving of the vehicle 90. For example, the control sensor 91 is configured by a sensor for sensing obstacles outside the vehicle or a sensor for detecting the state of the road surface, such as a vehicle-mounted camera or a laser. Meanwhile, the status detection sensor 93 is a sensor group for detecting the control status of the vehicle 90 in an autonomous driving state. For example, the status detection sensor 93 is configured by a vibration sensor, a noise sensor, a fuel consumption sensor, a vehicle speed sensor, an acceleration sensor, a yaw rate sensor, or the like.
The actuator 92 is controlled by the control unit 10 for autonomous driving of the vehicle 90. The actuator 92 is configured by, for example, an accelerator actuator, a brake actuator, a steering actuator, or the like. The accelerator actuator controls the vehicle driving force by controlling the throttle opening degree in response to a control signal from the control unit 10. The brake actuator controls the braking force on vehicle wheels by controlling the operation amount of the brake pedal in response to a control signal from the control unit 10. The steering actuator performs control of vehicle steering action by controlling driving of a steering assistance motor of an electronic power steering system in response to a control signal from the control unit 10.
Next, the procedure in which the learning apparatus 1 performs learning will be roughly described with reference to FIG. 3. The processing in each step will be described later in detail. FIG. 3 is a flowchart showing the outline of the processing flow when the learning apparatus 1 performs learning. First, as a learning initial stage (S1), learning is performed for the purpose of making the task performable (i.e., making an operation that satisfies the learning requirements in the initial stage performable). In order to learn control of autonomous driving along a predetermined course, the learning apparatus 1 in this embodiment is provided with “do one lap of course and reach goal” as a learning requirement in the initial stage.
After the purpose in the initial stage level has been cleared, operation classification (S2) is performed. In this stage, the content of the learning performed in the learning initial stage S1 is analyzed, so that the task is divided into a plurality of portions based on a predetermined parameter (hereinafter, a portion obtained by dividing a task is alternatively referred to as a “scene”), and, in each divided scene, an operation that is to be performed in that scene (hereinafter, alternatively referred to as a “partial operation”) is specified from among the series of operations involved in the task. The predetermined parameter for dividing the task is, for example, a displacement amount in the operation involved in the learning of the task, or the environment in which the operation involved in the learning of the task is performed (the time elapsed from when the task is started, the position from where the task is started, etc.). In this embodiment, the position from where the task is started (the environment in which the operation involved in the learning of the task is performed) is used as the predetermined parameter. That is to say, in this embodiment, the learning apparatus 1 divides the task into scenes based on the position in a course, and classifies the series of operations involved in the learning into scenes based on an operation performed in units of a course corresponding to each divided scene. The efficiency of learning can be increased by performing the learning in units of partial operations classified according to the scenes. Increasing the efficiency of learning may refer to, for example, shortening the time required from when the learning is started to when the learning purpose is achieved.
In the next step after the operations have been classified, learning of control with which the learning is made non-continuable (S3) is performed for each classified partial operation. The state in which the learning is made non-continuable refers to a state in which the operation that is to be learned can no longer be continued. In this embodiment, this refers to, for example, states in which the vehicle has slid off the track, in which the vehicle has crashed against a wall or the like and is not moving, in which the vehicle has broken down, or the like. If the learning of control with which the learning is made non-continuable is performed in advance, learning can be performed without making the learning non-continuable when performing learning of optimal control in subsequent steps. Accordingly, the efficiency of the learning can be further increased.
In a learning final stage (S4), the learning is optimized. In this stage, in a state where the partial operations classified into scenes and learned are combined, learning for optimally performing the operation from start to end is performed. In this embodiment, learning for doing ten laps of the course within a predetermined period of time and reaching the goal is performed as the learning in the final stage.
2. Detailed Processing
Next, the processing of the learning apparatus 1 in each step will be described in detail with reference to FIGS. 4 to 8. FIG. 4 is a block diagram showing a detailed configuration of the learning apparatus 1 according to this embodiment. As shown in FIG. 4, the machine learning unit 20 includes a learning data input/output unit 21, a neural network 22, and a learning result output unit 23. The operation classifying unit 30 includes a control data extracting unit 31 and an operation classifying result extracting unit 32.
Hereinafter, the processing of each unit will be described in detail in each step of FIG. 3.
2-1. Learning Initial Stage
FIG. 5 is a flowchart showing details of the processing flow in the learning initial stage in step S1 shown in FIG. 3. First, in the learning initial stage (first learning process), the learning data input/output unit 21 accepts learning data (S101). The learning data is data containing, for example, a learning purpose and learning requirements, as described above.
In the next step (S102), machine learning is performed. In this embodiment, conditions that restrict individual control operations are not specified in advance, and thus learning of the control operation is performed by the learning apparatus 1 itself. Specifically, the control unit 10 operates the actuator 92 by setting a random control amount thereto. At this time, it is natural that the vehicle 90 cannot drive along the course, and thus the vehicle 90 drives in a haphazard way while sliding off the track, for example. The control unit 10 reads output (hereinafter, alternatively referred to as “sensor value”) from the control sensor 91 and the status detection sensor 93 for the control amount given at random, and stores the data (the control amount and the sensor values) in the storage unit 40. The neural network 22 refers to the storage unit 40 and reads the stored control amount and sensor values, and performs learning of a control operation that matches the learning requirements through deep learning (S102).
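As a non-limiting illustration, the data-collection loop of step S102 could be sketched as follows in Python. The interfaces are hypothetical stand-ins: actuator for the actuator 92 (e.g. the ActuatorGroup sketched above), control_sensor for the control sensor 91, status_sensor for the status detection sensor 93, and storage for the storage unit 40.

```python
import random

def collect_initial_stage_data(actuator, control_sensor, status_sensor,
                               storage, num_steps=10000):
    """S102 sketch: drive with random control amounts and store the results."""
    for _ in range(num_steps):
        # Set a random control amount to the actuator; no conditions that
        # restrict individual control operations are specified in advance.
        control = {
            "throttle": random.uniform(0.0, 1.0),
            "brake": random.uniform(0.0, 1.0),
            "steering": random.uniform(-1.0, 1.0),
        }
        actuator.apply(control)

        # Read the sensor values output for the random control amount and
        # store the pair; the neural network 22 later reads these for learning.
        storage.append({
            "control": control,
            "control_sensor": control_sensor.read(),
            "status_sensor": status_sensor.read(),
        })
```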
In the learning requirements, “do one lap of course and reach goal” is set as the purpose in the initial stage level. Accordingly, in the learning apparatus 1, for example, when it is determined based on output from the control sensor 91 that the vehicle has done one lap of the course and reached the goal, the machine learning is judged to have reached the initial stage level (S103: Y), and the learning in the initial stage is ended.
2-2. Operation Classification
FIG. 6 is a flowchart showing details of the processing flow in the operation classification in step S2 shown in FIG. 3. First, when performing the operation classification processing, the control data extracting unit 31 extracts the sensor value of the control sensor 91 at the time when the learning initial stage ends, as well as the control amount of the actuator 92 and the sensor value of the status detection sensor 93 corresponding thereto, from the storage unit 40 (S201). The control data extracting unit 31 then inputs the extracted values to the neural network 22 as learning data.
Next, the neural network 22 performs machine learning based on the learning data input by the control data extracting unit 31 (S202). At this time, the neural network 22 divides the course-running operation into a predetermined number of divided scenes.
Hereinafter, the processing in which the neural network 22 classifies the course-running operation into scenes will be described in more detail. The neural network 22 classifies the course-running operation into scenes based on scene vectors and operation vectors. The scene vectors in this embodiment express a scene of the task that is performed by the vehicle 90. In this embodiment, the scene vectors are acquired, for example, from sensor values (e.g., a position (or distance) from the start point, and a direction from the start point) that are output by the control sensor 91. For example, assuming xy coordinates taking the start point as the origin, the scene vector at a point l can be expressed as (l_x, l_y).
Meanwhile, the operation vectors express the control status of the driving vehicle 90. The operation vectors are acquired, for example, from sensor values (e.g., velocity, acceleration, angular velocity, angular acceleration, etc.) that are output by the status detection sensor 93. For example, the operation vector at the point l can be expressed as (v_l, a_l) using a velocity v and an acceleration a at the point l.
The neural network 22 divides the task into scenes based on the scene vector (l_x, l_y), and learns, for each divided scene, operation classification that is to be learned in that scene based on the operation vector (v_l, a_l). Accordingly, the learning apparatus 1 can learn control of a partial operation according to a scene, by judging in which scene the learning apparatus 1 is currently present. For example, the neural network 22 focuses on the position that is expressed by the scene vector as well as the point at which the operation vector changes, thereby finding acceleration, deceleration, change of direction, and the like of the operation of the vehicle 90, so that the series of operations can be classified into operations corresponding to the scenes based on the change point. Also, for example, the neural network 22 can learn the operation classification based on the similarity levels of the operation vectors.
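As a non-limiting illustration, a change-point-based scene division could be sketched as follows in Python, assuming the stored run has been collected as equal-length arrays of scene vectors (l_x, l_y) and operation vectors (v_l, a_l). The threshold is an illustrative tuning parameter, not a value specified by the embodiment.

```python
import numpy as np

def divide_into_scenes(positions: np.ndarray, operations: np.ndarray,
                       threshold: float = 0.5):
    """Return (start, end) index pairs for scenes split at operation change points.

    positions: array of shape (N, 2) of scene vectors (l_x, l_y).
    operations: array of shape (N, 2) of operation vectors (v_l, a_l),
    indexed by the same samples as positions.
    """
    # Magnitude of the frame-to-frame change in the operation vector.
    deltas = np.linalg.norm(np.diff(operations, axis=0), axis=1)

    # Place a scene boundary wherever the change exceeds the threshold,
    # e.g. the onset of braking before a corner or acceleration out of it.
    boundaries = np.flatnonzero(deltas > threshold) + 1

    # The log is ordered from the start position in the travel direction,
    # so slicing already yields scenes in the time-series order of the task.
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(positions)]))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]
```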
In the example of the course shown in FIG. 2, the task is divided into five scenes (a) to (e) based on a position on the course. The partial operations classified into scenes are, for example, as follows.
Scene (a): first straight partial operation (e.g., control of deceleration timing, driving position, and the like when reaching the subsequent first corner)
Scene (b): first corner partial operation (e.g., control of steering at a corner, acceleration timing when entering the second straight, and the like)
Scene (c): second straight partial operation (e.g., control of deceleration timing, driving position, and the like when reaching the subsequent second corner)
Scene (d): second corner partial operation (e.g., control of steering at a corner, acceleration timing when entering the third straight, and the like)
Scene (e): third straight partial operation (e.g., control of acceleration and the like when entering the first straight)
Note that the neural network 22 preferably sorts the divided scenes in the time-series order of the task. For example, in this embodiment, the neural network 22 can sort the divided scenes in order of positions in the course from closest to farthest from the task start position (start position) in the direction in which the vehicle 90 travels.
The operation classifying result extracting unit 32 extracts the partial operation classification learned by the neural network 22, and stores it in the storage unit 40 (S203). At this time, the learning result output unit 23 can output a learning result according to the partial operation classification. Accordingly, the neural network 22 can, for example, accept a classification result of the partial operations, as learning data, from another learning apparatus 1' that has a similar configuration, thereby omitting learning regarding the partial operation classification.
2-3. Learning of Control With Which Learning Is Made Non-Continuable
FIG. 7 is a flowchart showing details of the processing flow in the learning of control with which the learning is made non-continuable (second learning process) in step S3 shown in FIG. 3. First, the learning data input/output unit 21 selects a partial operation from among the partial operations classified in the processing in step S2, referring to the storage unit 40, and extracts the control amount for the actuator 92 necessary for this partial operation. Furthermore, the learning data input/output unit 21 performs control in the extracted control amount, referring to the storage unit 40, and judges whether or not the learning has been made non-continuable as a result of the control, for example, based on output from the status detection sensor 93 and the like. The learning data input/output unit 21 reads the extracted control amount and information as to whether or not the learning has been made non-continuable as a result of the control, as learning data, and gives them to the neural network 22 as learning data. The neural network 22 performs learning through deep learning based on the given learning data (S301).
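As a non-limiting illustration, the assembly of the learning data for step S301 could be sketched as follows in Python, under the assumption that whether the learning has been made non-continuable can be judged from simple status-sensor conditions (an off-track flag, or near-zero speed while the throttle is applied). The replay interface and the threshold values are hypothetical.

```python
def label_non_continuable(partial_operation_log, replay):
    """Replay each stored control amount and label whether it halted learning."""
    learning_data = []
    for control in partial_operation_log:
        # Re-apply the extracted control amount and read the resulting
        # output of the status detection sensor 93 (hypothetical interface).
        status = replay(control)

        non_continuable = (
            status["off_track"]  # the vehicle has slid off the track
            or (status["speed"] < 0.01 and control["throttle"] > 0.1)  # stuck
        )
        learning_data.append((control, non_continuable))
    # The (control amount, label) pairs are given to the neural network 22.
    return learning_data
```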
At this time, the learning result output unit 23 can output a learning result of the control with which the learning is made non-continuable. Accordingly, the neural network 22 can, for example, accept control with which the learning is made non-continuable, as learning data, from another learning apparatus 1' that has a similar configuration, and perform an additional learning process (S302). Thus, it is possible to perform a more efficient learning process. Note that the processing in step S302 is not essential.
The learning apparatus 1 performs the processing in step S301 (and S302) on all classified partial operations (S303).
Although not essential, after learning the control with which the learning is made non-continuable, for all classified partial operations, the learning apparatus 1 may perform learning again through the series of operations (S304). Accordingly, faster course-running control can be performed.
In this manner, the learning apparatus 1 according to this embodiment performs learning of control with which the learning is made non-continuable, for classified partial operations, and thus it is possible to perform learning while avoiding such control in the subsequent learning. Accordingly, it is possible to perform a more efficient learning process.
2-4. Optimization Learning
FIG. 8 is a flowchart showing details of the processing flow in the optimization learning (third learning process) in step S4 shown in FIG. 3. In the optimization learning, the learning that has been performed in the steps up to S3 is optimized, and thus learning for achieving the learning purpose (“do ten laps of a course within predetermined period of time and reach goal” in this embodiment) given as learning data when the learning was started is performed. In the optimization learning, it is possible to perform the learning, with the control with which the learning is made non-continuable, learned in step S3, being excluded. At this time, the learning data input/output unit 21 refers to the storage unit 40 and extracts learning data (which is set by an operator) input in the learning initial stage (S1 in FIG. 3). Furthermore, the learning data input/output unit 21 extracts a status of the neural network 22 after the learning of control with which the learning is made non-continuable, referring to the storage unit 40. The learning data input/output unit 21 sets the extracted data to the control unit 10.
The control unit 10 outputs the control amount for the actuator 92, based on the set data described above, and acquires sensor values of the control sensor 91 and the status detection sensor 93 corresponding thereto. The control unit 10 stores the control amount and the sensor values output therefor, in the storage unit 40.
The neural network 22 reads the control amount and the sensor values stored by the control unit 10 in the above-described processing, and performs learning through deep learning (S401). Accordingly, the neural network 22 can more efficiently learn a control operation that matches the learning requirements from the start to the end of the operation (i.e., from the start to the goal of the course), in a state where control with which the learning is made non-continuable has been learned. The processing in step S401 is repeatedly performed until the entire learning is optimized (S402). A result of the optimization learning is extracted by the learning result output unit 23, and stored in the storage unit 40. Accordingly, in the optimization learning, it is possible to perform learning, with the control with which the learning is made non-continuable being excluded.
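As a non-limiting illustration, one step of the optimization learning with the non-continuable control excluded could be sketched as follows in Python. Here propose_control, non_continuable_model, and train_step are hypothetical stand-ins for the policy output of the neural network 22, the result of the second learning process in step S3, and one deep-learning update, respectively.

```python
def optimization_step(state, propose_control, non_continuable_model,
                      train_step, num_candidates=16):
    """S401 sketch: pick the best candidate control that is not predicted
    to make the learning non-continuable, then perform one learning update."""
    candidates = [propose_control(state) for _ in range(num_candidates)]

    # Exclude control with which the learning would be made non-continuable,
    # using the classifier learned in the second learning process (S3).
    safe = [c for c in candidates
            if not non_continuable_model.predict(state, c)]
    if not safe:
        safe = candidates  # fall back rather than halt the task entirely

    best = max(safe, key=lambda c: c["score"])  # hypothetical candidate score
    train_step(state, best)  # one update toward the learning purpose
    return best
```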
In this manner, with the learning apparatus 1 according to this embodiment, the learning apparatus 1 itself can classify an operation involved in the learning into partial operations, and perform learning. Accordingly, individual optimization can be performed for each classified operation, and the learning can be performed more efficiently. Moreover, with the learning apparatus 1 according to this embodiment, when learning control of a partial operation, first, control with which the learning is made non-continuable is learned. Accordingly, the learning can be efficiently performed without a person setting detailed conditions in advance for each operation.
Hardware Configuration
Hereinafter, an example of the hardware configuration in a case where the learning apparatus 1 described above is realized by a computer 800 will be described with reference to FIG. 9. Note that the configuration of each apparatus may be realized by a plurality of separate devices.
As shown in FIG. 9, the computer 800 includes a processor 801, a memory 803, a storage apparatus 805, an input interface unit (input I/F unit) 807, a data interface unit (data I/F unit) 809, a communication interface unit (communication I/F unit) 811, and a display apparatus 813.
The processor 801 controls various types of processing in the computer 800 by executing programs stored in the memory 803. For example, as a result of the processor 801 executing programs stored in the memory 803, the control unit 10, the machine learning unit 20, the operation classifying unit 30, and the like of the learning apparatus 1 can be realized.
The memory 803 is a storage medium, for example, such as a RAM (random access memory). The memory 803 temporarily stores program code of programs that are executed by the processor 801, and data that is required at the time when the programs are executed.
The storage apparatus 805 is, for example, an auxiliary memory such as a hard disk drive (HDD) or a solid state drive, or a non-volatile storage medium such as a flash memory. The storage apparatus 805 stores various programs for realizing the operating system or the above-described configurations. These programs and data are loaded onto the memory 803 as necessary, and referred to by the processor 801. For example, the storage unit 40 described above is realized by the storage apparatus 805.
The input I/F unit 807 is a device for accepting input from an administrator. Specific examples of the input I/F unit 807 include a keyboard, a mouse, a touch panel, various sensors, a wearable device, and the like. The input I/F unit 807 may be connected to the computer 800, for example, via an interface such as a USB (universal serial bus).
The data I/F unit 809 is a device for inputting data from the outside of the computer 800. Specific examples of the data I/F unit 809 include drive devices and the like for reading data stored in various storage media. The data I/F unit 809 may be provided external to the computer 800. In that case, the data I/F unit 809 is connected to the computer 800, for example, via an interface such as a USB.
The communication I/F unit 811 is a device for performing wired or wireless data communication with an apparatus outside the computer 800 via the Internet N. The communication I/F unit 811 may be provided external to the computer 800. In that case, the communication I/F unit 811 is connected to the computer 800, for example, via an interface such as a USB.
The display apparatus 813 is a device for displaying various types of information. Specific examples of the display apparatus 813 include a liquid crystal display, an organic EL (electro-luminescence) display, a wearable display, and the like. The display apparatus 813 may be provided external to the computer 800. In that case, the display apparatus 813 is connected to the computer 800, for example, via a display cable or the like.
Second Embodiment
In the first embodiment, an example was described in which the learning apparatus 1 is used for the vehicle 90 controlled to drive autonomously. However, the apparatus to which the learning apparatus 1 is applied is not limited to the example shown in the first embodiment, and the learning apparatus 1 can be applied to various apparatuses. In this embodiment, an example will be described in which the learning apparatus 1 is applied to the control of a robot whose task is to perform a pick-and-place operation. In the second embodiment, mainly differences from the first embodiment will be described.
First, differences from the first embodiment in terms of the system configuration according to this embodiment will be described with reference to FIG. 10. The configuration of the learning apparatus 1 is as in the first embodiment. On the other hand, the configuration outside the learning apparatus 1 in this embodiment is such that a control sensor 91' is configured by a sensor group for performing a pick-and-place operation. Specifically, it is configured by a workpiece detection sensor (image sensor), a robot holding force sensor, or the like. Furthermore, the control sensor 91' includes an image recognition algorithm, and can recognize the shape of a workpiece that the robot is holding. The other portions of the configuration outside the learning apparatus 1 are as described in the first embodiment.
Next, differences between learning according to this embodiment and learning according to the first embodiment will be described. The pick-and-place operation that is a task according to this embodiment refers to an operation that is performed following the procedure below.
1. Recognize workpiece shape and hold the workpiece
2. Lift workpiece held thereby
3. Move lifted workpiece to predetermined position according to workpiece shape
4. Stack workpiece in tube according to shape
In learning of robot control according to this embodiment, a given learning purpose and learning requirements are as follows.
Learning Purpose
- Stack ten workpieces in each tube with an inlet (in the shape of a circle, a quadrangle, and a triangle) according to workpiece shape within a predetermined period of time, through pick-and-place operation, from a container containing, in a mixed manner, workpieces having three different shapes (e.g., workpieces in three shapes consisting of a cylinder, a quadratic prism, and a triangular prism)
Learning Requirements
- Do not place workpieces at a position other than predetermined position
- Stack ten workpieces in tubes according to workpiece shapes
- “Stack one workpiece in proper workpiece-shaped tube” in initial stage level
In this embodiment, the task is to stack workpieces in tubes according to their shapes. Furthermore, in this embodiment, the pick-and-place operation that is to be learned may be classified following a procedure similar to that in the first embodiment in which the course for driving of the vehicle 90 is classified. For example, in this embodiment, the task is divided, based on a displacement amount in the operation involved in the learning of the task, into a scene corresponding to an operation of holding a workpiece, a scene corresponding to an operation of carrying a workpiece, and a scene corresponding to an operation of stacking a workpiece, as sketched below.
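As a non-limiting illustration, this displacement-based division could be sketched as follows in Python, assuming a hypothetical log of end-effector displacements per step; the move_threshold value and the near_tube flag are illustrative assumptions, not values from the embodiment.

```python
def classify_pick_and_place_steps(log, move_threshold=0.05):
    """Label each logged step as 'hold', 'carry', or 'stack' by displacement."""
    scenes = []
    for step in log:
        if step["displacement"] < move_threshold:
            # Small displacement: holding at the container, or stacking
            # at the tube, depending on where the end effector is.
            scene = "stack" if step["near_tube"] else "hold"
        else:
            # Large displacement: carrying the workpiece between positions.
            scene = "carry"
        scenes.append(scene)
    return scenes
```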
Furthermore, in this embodiment, the state in which the learning is made non-continuable refers to, for example, a state in which a workpiece cannot be placed into a tube. Accordingly, in the learning stage of control with which the learning is made non-continuable, the control that is to be learned is, for example, as follows.
- Wrong placement position (workpiece shape and tube inlet shape are different)
- Wrong stacking orientation of workpiece (workpiece shape orientation and tube inlet shape orientation are different)
With the learning apparatus 1 according to this embodiment, if the learning of control with which the learning is made non-continuable is performed in advance, learning of properly recognizing workpiece shapes and tube shapes and orientations when holding workpieces can be performed in advance. Accordingly, making the learning non-continuable can be avoided in the learning in the final stage, and thus the efficiency of learning can be further increased.
The other portions of the configuration are similar to those in the first embodiment.
Above, embodiments of the present invention have been described. The foregoing embodiments are for the purpose of facilitating understanding of the present invention, and are not to be interpreted as limiting the present invention. The invention can be altered and improved without departing from the gist thereof. For example, the steps in the above-described processing flows can be partially omitted, be rearranged in any desired order, or be executed in parallel, as long as doing so does not cause conflict in the processing content.
In the foregoing embodiments, examples were described in which the system according to the present invention is used to manage abilities acquired by machines according to AI technology such as deep learning, but the present invention is not limited thereto, and can be applied to a wide variety of fields. Examples thereof include distinguishing non-defective items from defective items, such as foods, machine parts, chemical products, and drugs, in various fields including industrial, fishery, agricultural, forestry, service, and medical and health fields. Furthermore, the present invention can be applied to cases where AI technology is applied to products in embedded fields, to systems such as social systems using IT techniques, to analysis of big data, to classification processing in a wide variety of control apparatuses, and the like.
Note that, in this specification, a “portion”, “unit” or “procedure” does not merely mean a physical configuration, and there are also cases where processing that is performed by a “portion” or the like is realized by software. Furthermore, processing that is performed by one “portion”, “unit”, “procedure”, or apparatus may be realized by two or more physical configurations or apparatuses, and processing that is performed by two or more “portions” or the like, or apparatuses may be realized by one physical unit or apparatus.
Note that part of or the entirety of the foregoing embodiments may be described as in Additional Remarks below, but there is no limitation to this.
Additional Remark 1
A learning apparatus including at least one hardware processor,
wherein, in a case of learning control of a series of operations involved in a predetermined task, the hardware processor
divides the task into a plurality of scenes, and, for each of the divided scenes, specifies a partial operation that is to be performed in the scene, from among the series of operations, and
learns control of the partial operation for each of the divided scenes.
Additional Remark 2
A learning method that is performed by at least one hardware processor for executing a step of learning control of a series of operations involved in a predetermined task, the method including:
a step of dividing the task into a plurality of scenes, and, for each of the divided scenes, specifying a partial operation that is to be performed in the scene, from among the series of operations; and
a step of learning control of the partial operation for each of the divided scenes.
Claims (9)
- A learning apparatus configured to learn control of a series of operations involved in a predetermined task, the apparatus comprising:
a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and
a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- The learning apparatus according to claim 1, wherein the learning apparatus further comprises an output unit configured to output a classification result of the partial operation.
- The learning apparatus according to claim 1, wherein the learning apparatus further comprises a control unit configured to sort the classified partial operations or the divided scenes, in a time-series order of the task.
- An autonomous driving control learning apparatus configured to learn control of a series of operations involved in autonomous driving of a vehicle that does laps of a predetermined course, the apparatus comprising:
a classifying unit configured to divide the course into a plurality of scenes based on a position or an orientation in the course relative to a predetermined reference position, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and
a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- A robot control learning apparatus configured to learn control of a series of operations involved in a task of holding a predetermined workpiece and stacking the workpiece at a placement position according to a shape of the workpiece, the apparatus comprising:
a classifying unit configured to divide the task into a plurality of scenes based on a displacement amount in an operation involved in learning of the task, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations; and
a partially learning unit configured to learn control of the partial operation for each of the divided scenes.
- A learning method for learning control of a series of operations involved in a predetermined task, the method comprising:
dividing, by a computer, the task into a plurality of scenes, and, for each of the divided scenes, specifying, by the computer, a partial operation that is to be performed in the scene, from among the series of operations; and
learning, by the computer, control of the partial operation for each of the divided scenes.
- A program for causing a computer configured to learn control of a series of operations involved in a predetermined task to execute:
dividing the task into a plurality of scenes, and, for each of the divided scenes, specifying a partial operation that is to be performed in the scene, from among the series of operations; and
learning control of the partial operation for each of the divided scenes.
- An apparatus configured to perform a predetermined task, the apparatus comprising:
a first sensor configured to sense information regarding a scene in which the apparatus performs the predetermined task;
an actuator;
a second sensor configured to sense information regarding an operation of the apparatus that is performed through control from the actuator;
a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and
a storage unit configured to store a learning result obtained by the learning apparatus according to any one of claims 1 to 3,
wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
- An autonomous driving control apparatus configured to control a vehicle that does laps of a predetermined course, the apparatus comprising:
a first sensor configured to sense information regarding a position and an orientation in the predetermined course for the vehicle;
an actuator configured to control driving of the vehicle;
a second sensor configured to sense a control status of the vehicle as per the actuator;
a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and
a storage unit configured to store a learning result obtained by the learning apparatus according to any one of claims 1 to 3,
wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2017104535A JP6978722B2 (en) | 2017-05-26 | 2017-05-26 | Learning device, learning control method, and its program |
| JP2017-104535 | 2017-05-26 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018216492A1 true WO2018216492A1 (en) | 2018-11-29 |
Family
ID=62386889
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2018/018141 (Ceased) WO2018216492A1 (en) | Learning apparatus, learning control method, program therefor | 2017-05-26 | 2018-05-10 |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP6978722B2 (en) |
| WO (1) | WO2018216492A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102644164B1 (en) * | 2020-09-16 | 2024-03-07 | 한국생산기술연구원 | Learning method of unit action deep learning model and robot control method using the same |
- 2017-05-26: JP JP2017104535A patent/JP6978722B2/en active Active
- 2018-05-10: WO PCT/JP2018/018141 patent/WO2018216492A1/en not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH06289918A (en) | 1993-04-02 | 1994-10-18 | Nippon Telegr & Teleph Corp <Ntt> | Robot learning control method |
| US20150127155A1 (en) * | 2011-06-02 | 2015-05-07 | Brain Corporation | Apparatus and methods for operating robotic devices using selective state space training |
| US20130151442A1 (en) * | 2011-12-13 | 2013-06-13 | Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University) | Method for learning task skill and robot using thereof |
| US9008840B1 (en) * | 2013-04-19 | 2015-04-14 | Brain Corporation | Apparatus and methods for reinforcement-guided supervised learning |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2018200540A (en) | 2018-12-20 |
| JP6978722B2 (en) | 2021-12-08 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18727891; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18727891; Country of ref document: EP; Kind code of ref document: A1 |