CN119567261B - Robot task execution method, device, robot and active perspective selection system - Google Patents
Robot task execution method, device, robot and active perspective selection system
- Publication number
- CN119567261B (application CN202411924317.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- robot
- model
- camera system
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Manipulator (AREA)
Abstract
The invention provides a robot task execution method and device, a robot, and an active view angle selection system, and relates to the technical field of robots. The robot is connected with a mobile single-camera system. When executing a target task, the robot acquires first observation data of the mobile single-camera system at the current view angle, determines a target action of the robotic arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, and then controls the robotic arm to execute the target action and controls the mobile single-camera system to move to the target view angle so as to complete the target task. During task execution, the view angle of the mobile single-camera system can be dynamically adjusted so that it observes more salient regions, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and improves task execution efficiency.
Description
Technical Field
The present invention relates to the field of robots, and in particular, to a robot task execution method and device, a robot, and an active view angle selection system.
Background
Robotic manipulation is a central challenge in robotics and is critical for a variety of applications, from industrial automation to healthcare. Vision-based robotic manipulation is currently based mainly on imitation learning (IL). Imitation learning allows robots to learn complex tasks by observing expert demonstrations and mapping observations to robotic-arm actions, so the quality of visual observation is critical to the efficiency of imitation learning. Existing methods rely on fixed camera settings. These may be single-camera systems that use only one camera for observation, typically in an eye-in-hand configuration with the camera mounted near the robot end effector (e.g., a wrist camera) or with the camera fixed in the external scene to cover the whole task area, or multi-camera systems consisting of multiple fixed external cameras, possibly combined with a wrist camera.
However, the limited field of view of a single-camera system may leave critical parts or objects in the environment unobserved, which degrades task performance. Multi-camera systems, while providing more comprehensive scene coverage, introduce complexity: large amounts of redundant or irrelevant information can burden the learning algorithm and reduce efficiency. In addition, these passive, static camera settings do not always provide the most task-relevant information, leading to poor decisions that reduce task execution efficiency.
Disclosure of Invention
The invention aims to provide a robot task execution method and device, a robot, and an active view angle selection system, so as to improve task execution efficiency.
In a first aspect, the present invention provides a robot task execution method applied to a robot connected to a mobile single camera system, the robot task execution method comprising:
Acquiring first observation data of a mobile single camera system under a current view angle;
Determining a target action of a robot arm and a target visual angle corresponding to a mobile single-camera system according to the first observation data and a preset target task;
and controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
Further, determining, according to the first observation data and the preset target task, a target action of the robot arm and a target viewing angle corresponding to the mobile single-camera system, includes:
performing action prediction of the current time block of the robotic arm according to the first observation data and the trained control model to obtain the target action;
Performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and the trained view angle model to obtain the target view angle;
the control model and the view angle model are obtained based on sample data combined training under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features of the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model comprises a Transformer model and a SoftMax activation function, the Transformer model is used for extracting features of the input first observation data and target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution method further includes:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
Further, based on each sample data, the joint training obtains a control model and a view angle model, including:
Randomly selecting first sample data and second sample data corresponding to adjacent first time blocks and second time blocks from each sample data;
Inputting second observation data randomly selected from the first sample data under a first view angle into a control model to conduct motion prediction under a first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data;
Inputting the second observation data and the mechanical arm action data in the first sample data into a view angle model to conduct view angle prediction of a second time block, and obtaining a predicted camera view angle output by the view angle model; inputting third observation data of the second sample data under the predicted camera view angle into the control model to conduct motion prediction under a second time block, obtaining a second predicted motion output by the control model, and updating parameters of the control model according to second loss between the second predicted motion and mechanical arm motion data in the second sample data;
using the second loss of the second time block as supervision to update parameters of the view angle model;
And re-executing the step of randomly selecting the first sample data and the second sample data corresponding to the adjacent first time block and second time block from the sample data until the control model and the visual angle model are converged.
In a second aspect, the present invention also provides a robot task execution device applied to a robot connected to a mobile single camera system, the robot task execution device comprising:
the acquisition module is used for acquiring first observation data of the mobile single-camera system under the current view angle;
The determining module is used for determining a target action of the robot mechanical arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task;
and the control module is used for controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
In a third aspect, the present invention also provides a robot, including a memory, and a processor, where the memory stores a computer program executable on the processor, and the processor implements the robot task execution method of the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides an active perspective selection system, including the robot of the third aspect, further including a mobile single camera system, the mobile single camera system being connected to the robot.
In a fifth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the robot task execution method of the first aspect.
According to the robot task execution method and device, the robot, and the active view angle selection system, the robot is connected with a mobile single-camera system. When executing a target task, the robot acquires first observation data of the mobile single-camera system at the current view angle, determines a target action of the robotic arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, controls the robotic arm to execute the target action, and controls the mobile single-camera system to move to the target view angle so as to complete the target task. During task execution, the view angle of the mobile single-camera system can be dynamically adjusted so that it observes more salient regions, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and improves task execution efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a conventional robot task execution process;
Fig. 2 is a schematic flow chart of a method for executing a robot task according to an embodiment of the present invention;
fig. 3 is a schematic view of a scenario of a robot task execution process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model training process according to an embodiment of the present invention;
Fig. 5 is a flow chart of another method for executing a robot task according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a robot task execution device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a robot according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an active view selection system according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, existing robots generally adopt a fixed single-camera system or a fixed multi-camera system; that is, the view angle of the camera system cannot be changed during task execution. After a control model is trained on expert data, the control model converts the observation data of the single-camera or multi-camera system into robotic-arm action outputs.
However, the limited field of view of a single-camera system affects task performance, while a multi-camera system carries a large amount of redundant or irrelevant information, so the control model computes inefficiently and its decisions degrade, resulting in lower task execution efficiency. Based on this, the robot task execution method and device, the robot, and the active view angle selection system provided by the embodiments of the invention adopt a movable single-camera system (i.e., a mobile single-camera system) that is allowed to change its view angle across different positions, together with an imitation-learning-based active view angle selection strategy for the robot. By actively selecting the optimal view angle of the mobile single-camera system for the next time block, the limited field of view of a fixed single-camera system is avoided, and the negative effect of redundant or irrelevant information in a multi-camera system on decision quality is also avoided, thereby improving task execution efficiency.
For the sake of understanding the present embodiment, a detailed description will be given of a method for executing a robot task disclosed in the present embodiment.
The embodiment of the invention provides a robot task execution method, which is applied to a robot connected to a mobile single-camera system and can be executed by a main control device such as a controller of the robot. The mobile single-camera system can change its view angle by changing its position and/or orientation, and may use an RGB camera. Referring to fig. 2, the flow chart of the robot task execution method mainly includes the following steps S210 to S230:
step S210, acquiring first observation data of the mobile single camera system at the current viewing angle.
The mobile single-camera system is communicatively connected to the robot. During task execution, the mobile single-camera system transmits the collected observation data to the robot and can also move under the robot's control to change its view angle. The observation data may be image data, such as video or pictures, taken by the camera in the mobile single-camera system; the observation data transmitted to the robot in the current time block is referred to here as the first observation data.
Step S220, determining a target action of the robot arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task.
The target action of the robotic arm in the current time block can be predicted according to the first observation data, and then the optimal view angle of the next time block, namely the target view angle, is determined according to the first observation data and the predicted target action. The target action may include the action of one robotic arm or the actions of two robotic arms.
In some possible embodiments, the prediction of the target action and the target view angle may be performed by a pre-trained control model and a pre-trained view angle model. Referring to the scenario diagram of the robot task execution process shown in fig. 3, the control model converts the observation input of the mobile single-camera system into a prediction of the robotic-arm actions, and the view angle model outputs the optimal view angle of the next time block according to the observation of the mobile single-camera system and the predicted robotic-arm actions. Based on this, step S220 may include performing action prediction of the current time block of the robotic arm according to the first observation data and the trained control model to obtain the target action, and performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action, and the trained view angle model to obtain the target view angle. The control model and the view angle model are obtained by joint training based on sample data under a plurality of adjacent time blocks; the sample data include an observation data set, robotic-arm position data, and robotic-arm action data, and the observation data set includes observation data under a plurality of view angles.
In this embodiment, the robot may perform one or more types of tasks, with each task type corresponding to its own control model and view angle model. If the robot can execute multiple tasks, the corresponding control model and view angle model can be selected according to the task type of the target task currently being executed.
Optionally, the control model may consist of an encoder and a decoder. The encoder may use a pre-trained multi-view masked autoencoder and is used for extracting features from the input first observation data; the decoder may use a Transformer model and is used for converting the first target features output by the encoder into the target action, where the first target features are obtained by the encoder performing feature extraction on the first observation data. The decoder may be a diffusion-based Transformer model, a classical Transformer decoder, or the like. Building the control model on a pre-trained multi-view masked autoencoder gives it a stronger ability to extract view-dependent features.
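To make the encoder-decoder split concrete, the following is a minimal PyTorch-style sketch, not the patented implementation: the pre-trained multi-view masked autoencoder is stood in for by a hypothetical `StubEncoder`, and a standard Transformer decoder with learnable action queries maps its features to an action chunk. All class names, dimensions, and the chunk length are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

class StubEncoder(nn.Module):
    """Placeholder for the pre-trained multi-view masked autoencoder (assumed, not the real model)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, feat_dim, kernel_size=32, stride=32)  # crude patch embedding

    def forward(self, x):                                       # x: (B, 3, 224, 224)
        return self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, 49, feat_dim) feature tokens

class ControlModel(nn.Module):
    """Encoder-decoder control model: observation image -> chunk of robotic-arm actions."""
    def __init__(self, encoder: nn.Module, feat_dim=512, action_dim=7, chunk_len=16):
        super().__init__()
        self.encoder = encoder                                   # extracts the "first target features"
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.action_queries = nn.Parameter(torch.randn(chunk_len, feat_dim))  # one query per action step
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, obs):                                      # obs: (B, 3, H, W) image at the current view
        feats = self.encoder(obs)                                # (B, num_tokens, feat_dim)
        queries = self.action_queries.unsqueeze(0).expand(obs.shape[0], -1, -1)
        decoded = self.decoder(tgt=queries, memory=feats)        # cross-attend action queries to image features
        return self.action_head(decoded)                         # (B, chunk_len, action_dim) target actions
```

A diffusion-based decoder head could replace the direct regression head above; the regression variant is shown only because it is the simplest to sketch.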
Alternatively, the view angle model may consist of a Transformer model and a SoftMax activation function. The Transformer model is used for performing feature extraction on the input first observation data and target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle, where the second target features are obtained by the Transformer model performing feature extraction on the input first observation data and target action. This view angle model can better extract features of the current observation data and actions so as to predict the target view angle.
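A corresponding sketch of the view angle model, under the assumption that the camera chooses among a fixed, discrete set of candidate view angles; the number of views, the dimensions, and the mean pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViewAngleModel(nn.Module):
    """Transformer + SoftMax head: (observation features, predicted action chunk) -> view distribution."""
    def __init__(self, feat_dim=512, action_dim=7, num_views=6):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, feat_dim)          # lift actions into the feature space
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.view_head = nn.Linear(feat_dim, num_views)

    def forward(self, obs_feats, actions):
        # obs_feats: (B, num_tokens, feat_dim); actions: (B, chunk_len, action_dim)
        tokens = torch.cat([obs_feats, self.action_proj(actions)], dim=1)
        pooled = self.transformer(tokens).mean(dim=1)               # pooled "second target features"
        return torch.softmax(self.view_head(pooled), dim=-1)        # probability over candidate view angles
```

At execution time, the highest-probability view would be taken as the target view angle.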
Step S230, the mechanical arm is controlled to execute the target action, and the mobile single-camera system is controlled to move to the target visual angle so as to complete the target task.
After the target action and the target view angle are obtained in step S220, the robot can control the mechanical arm to execute the target action and send a control instruction corresponding to the target view angle to the mobile single camera system, so that the mobile single camera system moves to the target view angle based on the control instruction. Steps S210 to S230 may then be performed in a loop until the target task is completed.
The robot task execution method provided by the embodiment of the invention can complete not only grasping tasks but also other manipulation tasks, such as placing an object at a designated position or dual-arm interactive operations (e.g., the two robotic arms respectively grasping two objects and assembling them).
The robot task execution method provided by the embodiment of the invention can acquire first observation data of the mobile single-camera system at the current view angle, determine a target action of the robotic arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, control the robotic arm to execute the target action, and control the mobile single-camera system to move to the target view angle so as to complete the target task. During task execution, the view angle of the mobile single-camera system can be dynamically adjusted so that it observes more salient regions, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and improves task execution efficiency.
The embodiment of the invention also provides a training process for the control model and the view angle model. First, sample data under a plurality of adjacent time blocks are acquired; the sample data include an observation data set, robotic-arm position data, and robotic-arm action data, the observation data set includes observation data under a plurality of view angles, and the robotic-arm position data may be robotic-arm joint positions. Then, based on the sample data, the control model and the view angle model are obtained through joint training.
In some possible embodiments, the above-described joint training process may include:
1. first sample data and second sample data corresponding to adjacent first time blocks and second time blocks are randomly selected from the sample data. The first sample data is sample data corresponding to the first time block, and the second sample data is sample data corresponding to the second time block.
2. And inputting second observation data randomly selected from the first sample data under the first view angle into the control model to conduct motion prediction under the first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data.
Specifically, a view angle (i.e., a camera is selected) may be randomly selected from the first sample data, which is herein referred to as a first view angle, the observation data under the first view angle is herein referred to as second observation data, the motion output by the control model after the second observation data is input into the control model is herein referred to as a first prediction motion, and the motion loss (i.e., a first loss) may be calculated according to the first prediction motion and the motion data of the mechanical arm in the first sample data, and the control model may be updated according to the motion loss.
3. And inputting third observation data in the second sample data under the predicted camera view angle into the control model to conduct motion prediction under the second time block, obtaining a second prediction motion output by the control model, and updating parameters of the control model according to a second loss between the second prediction motion and the mechanical arm motion data in the second sample data.
Specifically, after the second observation data and the mechanical arm motion data in the first sample data are input into the view angle model, the view angle output by the view angle model is called as a predicted camera view angle herein, the observation data under the predicted camera view angle is called as third observation data herein, after the third observation data are input into the control model, the motion output by the control model is called as a second predicted motion herein, the motion loss (namely, a second loss) can be calculated according to the second predicted motion and the mechanical arm motion data in the second sample data, and the control model can be further updated according to the motion loss.
4. Parameter updating of the view model is performed using the second loss of the second time block as supervision.
In a specific implementation, the second loss can be used as supervision and a gradient descent algorithm is adopted to update the view angle model. The gradient descent algorithm iteratively adjusts the model parameters along the negative gradient direction of the loss function, thereby minimizing the loss function.
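Written out, the update rule meant here is the usual one, where $\theta$ denotes the view angle model parameters, $L$ the second loss used as supervision, and $\eta$ a learning rate ($\theta$ and $\eta$ are symbols introduced only for illustration):

$$\theta \;\leftarrow\; \theta - \eta\,\nabla_{\theta} L$$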
The above steps are then cycled, i.e., step 1 is re-executed, until the control model and the view angle model converge.
For ease of understanding, the model training process described above is explained below with reference to FIG. 4. As shown in fig. 4, the model training process includes the following steps:
Step S410, randomly sampling two adjacent time blocks in the expert data.
Expert data here refers to the sample data used for training. The expert data corresponding to the two time blocks may be denoted as $D_{t:t+T} = \{O_t, s_t, a_{t:t+T}\}$ and $D_{t+T:t+2T} = \{O_{t+T}, s_{t+T}, a_{t+T:t+2T}\}$, where $t$ denotes the start of the first time block, $t+T$ the start of the second time block, and $t+2T$ the end of the second time block; $O_t$ denotes the observation data at every view angle at the start $t$ of the first time block; $s_t$ denotes the robotic-arm joint positions at the start $t$ of the first time block; and $a_{t:t+T}$ denotes the robotic-arm actions (i.e., the expert actions) between the start $t$ of the first time block and the start $t+T$ of the second time block.
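For reference in the sketches below, one time block of expert data of this form could be held in a structure like the following; the field names and tensor shapes are illustrative assumptions, with the action horizon taken equal to the block length $T$.

```python
from dataclasses import dataclass
import torch

@dataclass
class TimeBlockSample:
    """Expert data for one time block, D_{t:t+T} = {O_t, s_t, a_{t:t+T}}."""
    observations: torch.Tensor     # O_t: (num_views, 3, H, W), one image per candidate view angle
    joint_positions: torch.Tensor  # s_t: (joint_dim,) robotic-arm joint positions at the block start
    expert_actions: torch.Tensor   # a_{t:t+T}: (T, action_dim) expert actions over the block
```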
Step S420: randomly select a camera of the first time block; feed its observation into the control model to predict the robotic-arm actions of the current time block and compute the action loss against the corresponding expert actions to update the control model; feed the observation and the expert actions into the view angle model to predict the optimal camera view angle of the next time block.
A camera of the first time block is randomly selected; assume its view angle is $i$. Its observation $o_t^i$ is fed into the control model to predict the robotic-arm actions of the current time block, $\hat{a}_{t:t+T}$, and the action loss $L_t$ between $\hat{a}_{t:t+T}$ and the expert actions $a_{t:t+T}$ is computed to update the control model. The observation $o_t^i$ and the expert actions $a_{t:t+T}$ are then fed into the view angle model to predict the optimal camera view angle of the next time block; assume it is $j$.
Step S430: for the second time block, feed the observation under the previously predicted optimal camera view angle into the control model to predict the robotic-arm actions, and compute the action loss against the corresponding expert actions to update the control model.
For the second time block, the observation $o_{t+T}^j$ under the previously predicted optimal camera view angle $j$ is fed into the control model to predict the robotic-arm actions $\hat{a}_{t+T:t+2T}$, and the action loss $L_{t+T}$ against the expert actions $a_{t+T:t+2T}$ is computed to update the control model.
Step S440, using the motion loss of the second time block as a supervision, updating the view angle model.
The view angle model is updated using the action loss $L_{t+T}$ of the second time block as supervision.
Step S450, judging whether the control model and the view angle model are converged. If not, step S410 is re-executed, and if so, the flow ends.
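Putting steps S410-S450 together, the following is a minimal sketch of one training iteration, reusing the ControlModel, ViewAngleModel, and TimeBlockSample sketches above and assuming a simple behaviour-cloning MSE action loss. The description does not specify how the gradient of the second loss reaches the view angle model through the discrete view choice, so the sketch uses a REINFORCE-style surrogate (loss-weighted log-probability of the chosen view, without a baseline) purely as one plausible reading; this is an assumption, not the patented mechanism.

```python
import random
import torch
import torch.nn.functional as F

def train_step(control_model, view_model, block_pairs, ctrl_opt, view_opt):
    """One iteration of the joint training loop (steps S410-S450) under the stated assumptions.
    `block_pairs` is a list of (TimeBlockSample, TimeBlockSample) for adjacent time blocks."""
    # S410: randomly sample two adjacent time blocks from the expert data.
    first, second = random.choice(block_pairs)                   # D_{t:t+T}, D_{t+T:t+2T}

    # S420: random view i of the first block -> control model -> action loss L_t -> update control model.
    i = random.randrange(first.observations.shape[0])
    obs_i = first.observations[i].unsqueeze(0)                   # (1, 3, H, W)
    loss_t = F.mse_loss(control_model(obs_i), first.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad(); loss_t.backward(); ctrl_opt.step()

    # S420 (cont.): observation + expert actions -> view model -> best view j for the next block.
    with torch.no_grad():
        feats_i = control_model.encoder(obs_i)
    view_probs = view_model(feats_i, first.expert_actions.unsqueeze(0))   # (1, num_views)
    j = int(view_probs.argmax(dim=-1))

    # S430: observation of view j in the second block -> control model -> action loss L_{t+T}.
    obs_j = second.observations[j].unsqueeze(0)
    loss_t1 = F.mse_loss(control_model(obs_j), second.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad(); loss_t1.backward(); ctrl_opt.step()

    # S440: use L_{t+T} as supervision for the view model (assumed REINFORCE-style surrogate:
    # views whose observations produced a high action loss have their probability pushed down harder).
    view_loss = loss_t1.detach() * torch.log(view_probs[0, j] + 1e-8)
    view_opt.zero_grad(); view_loss.backward(); view_opt.step()

    return loss_t.item(), loss_t1.item()                         # S450: caller checks convergence
```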
For ease of understanding, the active perspective selection inference process of the above-described robot task execution method will be described with reference to fig. 5. As shown in fig. 5, the active perspective selection inference process includes the steps of:
Step S510: feed the observation at the current camera view angle into the control model to predict the robotic-arm actions of the current time block.
The observation $o_t$ at the current camera view angle is fed into the control model to predict the robotic-arm actions of the current time block, $\hat{a}_{t:t+T}$.
In step S520, the current camera-view observation and the predicted actions are fed into the view angle model to predict the optimal camera view angle for the next time block.
The current camera-view observation $o_t$ and the predicted actions $\hat{a}_{t:t+T}$ are fed into the view angle model to predict the optimal camera view angle $j$ for the next time block.
In step S530, the robotic arm executes the predicted actions and the camera is moved to the predicted optimal camera view angle.
Step S540, judging whether the target task is completed. If not, the step S510 is re-executed, and if yes, the flow ends.
The above steps S510 to S530 are looped until the target task is completed.
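The corresponding execution loop could look like the sketch below, where `robot`, `camera`, and `task_done` are hypothetical interfaces assumed only for illustration (capture an image, execute one action, move to a discrete view index, report task completion).

```python
import torch

def run_task(control_model, view_model, robot, camera, task_done, max_blocks=100):
    """Active view angle selection at inference time (steps S510-S540), under the stated assumptions."""
    control_model.eval()
    view_model.eval()
    for _ in range(max_blocks):
        # S510: observe at the current camera view and predict the current action chunk.
        obs = camera.capture().unsqueeze(0)                      # (1, 3, H, W) image o_t
        with torch.no_grad():
            actions = control_model(obs)                         # predicted action chunk for this time block
            # S520: predict the optimal camera view for the next time block.
            feats = control_model.encoder(obs)
            next_view = int(view_model(feats, actions).argmax(dim=-1))
        # S530: execute the action chunk and move the camera to the selected view angle.
        for action in actions[0]:
            robot.execute_action(action)
        camera.move_to_view(next_view)
        # S540: stop once the target task is completed.
        if task_done():
            break
```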
According to the embodiment of the invention, label-free optimal view angle selection for the robot is realized by using the action loss as supervision during imitation learning. By dynamically adjusting the robot's view angle, a single camera can observe more salient regions, which avoids the limited field of view of a conventional single-camera system and reduces the noise fed into the control model, achieving performance comparable to a multi-camera system and even surpassing it in some scenarios.
Corresponding to the above-mentioned robot task execution method, the embodiment of the invention also provides a robot task execution device, which is applied to a robot connected with a mobile single-camera system. Referring to a schematic structural view of a robot task performing device shown in fig. 6, the robot task performing device includes:
an obtaining module 601, configured to obtain first observation data of the mobile single camera system under a current viewing angle;
The determining module 602 is configured to determine, according to the first observation data and a preset target task, a target action of the robotic arm and a target viewing angle corresponding to the mobile single-camera system;
the control module 603 is configured to control the mechanical arm to perform the target action, and control the mobile single-camera system to move to the target view angle, so as to complete the target task.
The robot task execution device provided by the embodiment of the invention can acquire first observation data of the mobile single-camera system at the current view angle, determine a target action of the robotic arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, control the robotic arm to execute the target action, and control the mobile single-camera system to move to the target view angle so as to complete the target task. During task execution, the view angle of the mobile single-camera system can be dynamically adjusted so that it observes more salient regions, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and improves task execution efficiency.
Further, the determining module 602 is specifically configured to: perform action prediction of the current time block of the robotic arm according to the first observation data and the trained control model to obtain the target action; and perform view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action, and the trained view angle model to obtain the target view angle. The control model and the view angle model both correspond to the target task and are obtained by joint training based on sample data under a plurality of adjacent time blocks; the sample data include an observation data set, robotic-arm position data, and robotic-arm action data, and the observation data set includes observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, wherein the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features of the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model includes a Transformer model and a SoftMax activation function; the Transformer model is used for extracting features of the input first observation data and target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution device further comprises a training module, configured to:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
The training module is specifically configured to: randomly select, from the sample data, first sample data and second sample data corresponding to adjacent first and second time blocks; input second observation data, randomly selected from the first sample data at a first view angle, into the control model to perform action prediction in the first time block, obtain a first predicted action output by the control model, and update the parameters of the control model according to a first loss between the first predicted action and the robotic-arm action data in the first sample data; input the second observation data and the robotic-arm action data in the first sample data into the view angle model to perform view angle prediction for the second time block and obtain a predicted camera view angle output by the view angle model; input third observation data of the second sample data under the predicted camera view angle into the control model to perform action prediction in the second time block, obtain a second predicted action output by the control model, and update the parameters of the control model according to a second loss between the second predicted action and the robotic-arm action data in the second sample data; update the parameters of the view angle model using the second loss of the second time block as supervision; and re-execute the step of randomly selecting the first sample data and the second sample data corresponding to the adjacent first and second time blocks until the control model and the view angle model converge.
The implementation principle and the generated technical effects of the robot task execution device provided in this embodiment are the same as those of the foregoing embodiment of the robot task execution method, and for the sake of brief description, reference may be made to corresponding contents in the foregoing embodiment of the robot task execution method where the embodiment of the robot task execution device is not mentioned.
As shown in fig. 7, the robot 700 provided in the embodiment of the present invention includes a processor 701, a memory 702, and a bus, where the memory 702 stores a computer program that can be run on the processor 701, and when the robot 700 runs, the processor 701 and the memory 702 communicate with each other through the bus, and the processor 701 executes the computer program to implement the above-mentioned robot task execution method.
Specifically, the memory 702 and the processor 701 can be general-purpose memories and processors, which are not particularly limited herein.
The embodiment of the invention also provides an active visual angle selection system, as shown in fig. 8, which comprises the robot 801, and further comprises a mobile single-camera system 802, wherein the mobile single-camera system 802 is connected with the robot 801.
The active view selection system provided in this embodiment has the same implementation principle and technical effects as those of the foregoing robot task execution method embodiment, and for brevity description, reference may be made to corresponding contents in the foregoing robot task execution method embodiment where the active view selection system embodiment is not mentioned.
The embodiment of the invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the robot task execution method in the foregoing method embodiments. The computer readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of modules is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411924317.4A CN119567261B (en) | 2024-12-25 | 2024-12-25 | Robot task execution method, device, robot and active perspective selection system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411924317.4A CN119567261B (en) | 2024-12-25 | 2024-12-25 | Robot task execution method, device, robot and active perspective selection system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119567261A CN119567261A (en) | 2025-03-07 |
| CN119567261B (en) | 2025-09-05 |
Family
ID=94800358
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411924317.4A Active CN119567261B (en) | 2024-12-25 | 2024-12-25 | Robot task execution method, device, robot and active perspective selection system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119567261B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114211490A (en) * | 2021-12-17 | 2022-03-22 | 中山大学 | Robot arm gripper pose prediction method based on Transformer model |
| CN118493383A (en) * | 2024-05-15 | 2024-08-16 | 上海交通大学 | Mechanical arm position visual servo method and system based on cradle head hand-eye camera |
| CN118552618A (en) * | 2024-06-03 | 2024-08-27 | 贵州大学 | Grasping pose generation method based on convolutional neural network |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109202912B (en) * | 2018-11-15 | 2020-09-11 | 太原理工大学 | Method for registering target contour point cloud based on monocular depth sensor and mechanical arm |
| CN117980915A (en) * | 2021-07-28 | 2024-05-03 | 谷歌有限责任公司 | Contrast learning and masking modeling for end-to-end self-supervised pre-training |
| CN114519813A (en) * | 2022-02-22 | 2022-05-20 | 广东工业大学 | Mechanical arm target grabbing method and system |
| CN116117786A (en) * | 2022-09-07 | 2023-05-16 | 山东大学 | Method and system for planning track of mechanical arm under high visual visibility |
| CN116690583A (en) * | 2023-07-24 | 2023-09-05 | 清华大学 | Construction and testing method and device of human-computer interaction manipulator grasping and placing system |
| CN117312992B (en) * | 2023-11-30 | 2024-03-12 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Emotion recognition method and system for fusion of multi-view face features and audio features |
| CN117428779A (en) * | 2023-12-01 | 2024-01-23 | 中国农业银行股份有限公司 | Robot grabbing control method, device, equipment and storage medium |
| CN118741573B (en) * | 2024-07-26 | 2025-05-23 | 广州航海学院 | Fault-tolerant method for improving networking robustness of underwater robot |
- 2024-12-25: CN202411924317.4A filed in China; granted as CN119567261B (status: Active)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114211490A (en) * | 2021-12-17 | 2022-03-22 | 中山大学 | Robot arm gripper pose prediction method based on Transformer model |
| CN118493383A (en) * | 2024-05-15 | 2024-08-16 | 上海交通大学 | Mechanical arm position visual servo method and system based on cradle head hand-eye camera |
| CN118552618A (en) * | 2024-06-03 | 2024-08-27 | 贵州大学 | Grasping pose generation method based on convolutional neural network |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119567261A (en) | 2025-03-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7367233B2 (en) | System and method for robust optimization of reinforcement learning based on trajectory-centered models | |
| Wang et al. | Equivariant diffusion policy | |
| Breyer et al. | Comparing task simplifications to learn closed-loop object picking using deep reinforcement learning | |
| CN115917564A (en) | System and method for learning reusable options to transfer knowledge between tasks | |
| CN111890357A (en) | An intelligent robot grasping method based on action demonstration and teaching | |
| Zhang et al. | Modular deep q networks for sim-to-real transfer of visuo-motor policies | |
| CN118061186A (en) | Robot planning method and system based on multi-mode large model predictive control | |
| CN114970826B (en) | Multi-agent collaboration method and device based on task representation and teammate perception | |
| US20230153388A1 (en) | Method for controlling an agent | |
| CN119748461B (en) | Zero sample robot control method, device, terminal and storage medium | |
| CN114355915A (en) | AGV path planning based on deep reinforcement learning | |
| CN115990875A (en) | A State Prediction and Control System for Flexible Cables Based on Latent Space Interpolation | |
| Hafez et al. | Efficient intrinsically motivated robotic grasping with learning-adaptive imagination in latent space | |
| Zakaria et al. | Robotic control of the deformation of soft linear objects using deep reinforcement learning | |
| CN118752492A (en) | Motion control method for multi-task and multi-robot based on deep reinforcement learning | |
| Li et al. | Teleoperation-Driven and Keyframe-Based Generalizable Imitation Learning for Construction Robots | |
| CN119567261B (en) | Robot task execution method, device, robot and active perspective selection system | |
| CN119357647B (en) | Large model feature fusion Ha Xizi attention method for robot operation | |
| CN119304873A (en) | Robot control and model training method, device, equipment and storage medium | |
| CN119610132A (en) | Multi-mode large model robot control method based on meta-learning fine tuning | |
| CN119168835A (en) | A mechanical arm grasping prediction method, electronic device and storage medium | |
| CN120604238A (en) | Open Vocabulary Robot Control Using Multimodal Language Models | |
| WO2023057518A1 (en) | Demonstration-driven reinforcement learning | |
| CN116935492A (en) | A human action prediction method and device based on graph relationship interactive learning | |
| Coskun et al. | Robotic Grasping in Simulation Using Deep Reinforcement Learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |