
CN119567261B - Robot task execution method, device, robot and active perspective selection system - Google Patents

Robot task execution method, device, robot and active perspective selection system

Info

Publication number
CN119567261B
CN119567261B (Application No. CN202411924317.4A)
Authority
CN
China
Prior art keywords
target
robot
model
camera system
data
Prior art date
Legal status
Active
Application number
CN202411924317.4A
Other languages
Chinese (zh)
Other versions
CN119567261A (en)
Inventor
易鹏飞
谢铮
Current Assignee
Beijing Yuanluo Technology Co ltd
Original Assignee
Beijing Yuanluo Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yuanluo Technology Co ltd filed Critical Beijing Yuanluo Technology Co ltd
Priority to CN202411924317.4A priority Critical patent/CN119567261B/en
Publication of CN119567261A publication Critical patent/CN119567261A/en
Application granted granted Critical
Publication of CN119567261B publication Critical patent/CN119567261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot task execution method, a robot task execution device, a robot and an active view angle selection system, and relates to the technical field of robots. The robot is connected to a mobile single-camera system. When executing a target task, the robot acquires first observation data of the mobile single-camera system at the current view angle, determines a target action of the robot's mechanical arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, then controls the mechanical arm to execute the target action and controls the mobile single-camera system to move to the target view angle so as to complete the target task. Because the view angle of the mobile single-camera system can be adjusted dynamically during task execution, the camera can observe the most salient parts of the scene, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and thus improves task execution efficiency.

Description

Robot task execution method and device, robot and active visual angle selection system
Technical Field
The present invention relates to the field of robots, and in particular to a robot task execution method and device, a robot, and an active view angle selection system.
Background
Robotic manipulation is a central challenge in robotics and is critical for a variety of applications, from industrial automation to healthcare. Vision-based robotic manipulation is currently based mainly on imitation learning (Imitation Learning, abbreviated as IL). Imitation learning allows robots to learn complex tasks by observing expert demonstrations and mapping observations to robot-arm actions; visual observation is therefore critical to the efficiency of imitation learning. Existing methods rely on fixed camera setups. These may be single-camera systems that use only one camera for observation, typically in a hand-eye configuration in which the camera is mounted near the robot end effector (e.g., a wrist camera) or fixed in the external scene to cover the whole task area; or they may be multi-camera systems consisting of multiple fixed external cameras, possibly combined with a wrist camera.
However, the limited field of view of a single-camera system may leave critical parts of the environment or objects occluded, which degrades task performance. Multi-camera systems, while providing more comprehensive scene coverage, introduce complexity: the large amount of redundant or irrelevant information can burden the learning algorithm and reduce efficiency. In addition, these passive, static camera setups do not always provide the most task-relevant information, which leads to suboptimal decisions and lowers task execution efficiency.
Disclosure of Invention
The invention aims to provide a robot task execution method, a robot task execution device, a robot and an active visual angle selection system so as to improve task execution efficiency.
In a first aspect, the present invention provides a robot task execution method applied to a robot connected to a mobile single camera system, the robot task execution method comprising:
Acquiring first observation data of a mobile single camera system under a current view angle;
Determining a target action of a robot arm and a target visual angle corresponding to a mobile single-camera system according to the first observation data and a preset target task;
and controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
Further, determining, according to the first observation data and the preset target task, a target action of the robot arm and a target viewing angle corresponding to the mobile single-camera system, includes:
performing motion prediction of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain a target motion;
Performing visual angle prediction of a next time block of the mobile single-camera system according to the first observation data, the target action and the trained visual angle model to obtain a target visual angle;
the control model and the view angle model are obtained based on sample data combined training under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model comprises a Transformer model and a SoftMax activation function, the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution method further includes:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
Further, based on each sample data, the joint training obtains a control model and a view angle model, including:
Randomly selecting first sample data and second sample data corresponding to adjacent first time blocks and second time blocks from each sample data;
Inputting second observation data randomly selected from the first sample data under a first view angle into a control model to conduct motion prediction under a first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data;
Inputting the second observation data and the mechanical arm action data in the first sample data into a view angle model to conduct view angle prediction of a second time block, and obtaining a predicted camera view angle output by the view angle model; inputting third observation data of the second sample data under the predicted camera view angle into the control model to conduct motion prediction under a second time block, obtaining a second predicted motion output by the control model, and updating parameters of the control model according to second loss between the second predicted motion and mechanical arm motion data in the second sample data;
using the second loss of the second time block as supervision to update parameters of the view angle model;
And re-executing the step of randomly selecting the first sample data and the second sample data corresponding to the adjacent first time block and second time block from the sample data until the control model and the visual angle model are converged.
In a second aspect, the present invention also provides a robot task execution device applied to a robot connected to a mobile single camera system, the robot task execution device comprising:
the acquisition module is used for acquiring first observation data of the mobile single-camera system under the current view angle;
The determining module is used for determining a target action of the robot mechanical arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task;
and the control module is used for controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
In a third aspect, the present invention also provides a robot, including a memory, and a processor, where the memory stores a computer program executable on the processor, and the processor implements the robot task execution method of the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides an active perspective selection system, including the robot of the third aspect, further including a mobile single camera system, the mobile single camera system being connected to the robot.
In a fifth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the robot task execution method of the first aspect.
According to the robot task execution method, the robot task execution device, the robot and the active visual angle selection system, the robot is connected with the mobile single-camera system, when the robot executes a target task, the robot can acquire first observation data of the mobile single-camera system under the current visual angle, the target action of the robot mechanical arm and the target visual angle corresponding to the mobile single-camera system are determined according to the first observation data and the preset target task, the mechanical arm is controlled to execute the target action, and the mobile single-camera system is controlled to move to the target visual angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a conventional robot task execution process;
Fig. 2 is a schematic flow chart of a method for executing a robot task according to an embodiment of the present invention;
fig. 3 is a schematic view of a scenario of a robot task execution process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model training process according to an embodiment of the present invention;
Fig. 5 is a flow chart of another method for executing a robot task according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a robot task execution device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a robot according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an active view selection system according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, existing robots generally adopt a fixed single-camera system or a fixed multi-camera system; that is, the view angle of the single-camera or multi-camera system cannot be changed during task execution. After a control model is obtained by training on expert data, the control model converts the observation data of the single-camera or multi-camera system into robot-arm action outputs.
However, the field of view of a single-camera system is limited, which affects task performance, and a multi-camera system carries a large amount of redundant or irrelevant information, which lowers the computational efficiency of the control model and degrades its decisions, resulting in lower task execution efficiency. The robot task execution method, device, robot and active view angle selection system provided by the embodiments of the invention therefore adopt a movable single-camera system (i.e., a mobile single-camera system) that is allowed to change its view angle at different positions, together with an imitation-learning-based active view angle selection strategy for the robot. By actively selecting the optimal view angle of the mobile single-camera system for the next time block, the limited field of view of a single-camera system is avoided, and the negative effect of redundant or irrelevant information in a multi-camera system on the decision quality is also avoided, thereby improving task execution efficiency.
For the sake of understanding the present embodiment, a detailed description will be given of a method for executing a robot task disclosed in the present embodiment.
The embodiment of the invention provides a robot task execution method, which is applied to a robot connected with a mobile single-camera system and can be executed by a main control device such as a controller of the robot, wherein the mobile single-camera system can change the visual angle by changing the position and/or the angle, and the mobile single-camera system can adopt an RGB camera. Referring to fig. 2, a flow chart of a robot task execution method mainly includes steps S210 to S230 as follows:
step S210, acquiring first observation data of the mobile single camera system at the current viewing angle.
The mobile single-camera system is in communication connection with the robot, and in the task execution process, the mobile single-camera system can transmit collected observation data to the robot, and can also move based on the control of the robot so as to change the visual angle. The observation data may be image data, such as video or pictures, taken by a camera in the mobile single camera system, where the observation data transmitted to the robot by the mobile single camera system at the current time block is referred to as first observation data.
Step S220, determining a target action of the robot arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task.
The target motion of the mechanical arm in the current time block can be predicted according to the first observation data, and then the optimal view angle of the next time block, namely the target view angle, is determined according to the first observation data and the predicted target motion. The target motion may include a motion of one mechanical arm, or may include a motion of two mechanical arms.
In some possible embodiments, the prediction of the target action and the target view angle may be performed by a trained control model and a trained view angle model. Referring to the schematic scene diagram of the robot task execution process shown in fig. 3, the control model converts the observation input of the mobile single-camera system into a prediction of the robot-arm actions, and the view angle model outputs the optimal view angle of the next time block according to the observation of the mobile single-camera system and the predicted robot-arm actions. Based on this, step S220 may include performing action prediction of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain the target action, and performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and the trained view angle model to obtain the target view angle. The control model and the view angle model are obtained by joint training on sample data under a plurality of adjacent time blocks; the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
In this embodiment, the robot may perform one or more tasks, each task type corresponding to a set of control models and perspective models. If the robot can execute multiple tasks, a corresponding control model and a corresponding visual angle model can be selected according to the task type of the target task currently executed.
Optionally, the control model may consist of an encoder and a decoder. The encoder may use a pre-trained multi-view masked autoencoder and the decoder may use a Transformer model; the encoder extracts features from the input first observation data, and the decoder converts the first target features output by the encoder into the target action, where the first target features are obtained by the encoder's feature extraction on the first observation data. The decoder may be a diffusion-based Transformer model, a classical Transformer decoder model, or the like. A control model built on a pre-trained multi-view masked autoencoder has a better ability to extract view angle features.
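For illustration only (not the patent's implementation), the following PyTorch-style module sketches one way such an encoder–decoder control model could be assembled. The encoder is a placeholder standing in for a pre-trained multi-view masked autoencoder, and all class names, dimensions and the action-chunk length are assumptions.

```python
import torch
import torch.nn as nn

class ControlModel(nn.Module):
    """Sketch of the control model: a (pre-trained) multi-view masked-autoencoder encoder
    plus a Transformer decoder that turns the encoded observation into a chunk of actions.
    All names, dimensions and the chunk length are illustrative assumptions."""

    def __init__(self, pretrained_encoder: nn.Module, feat_dim: int = 768,
                 action_dim: int = 7, chunk_len: int = 16):
        super().__init__()
        self.encoder = pretrained_encoder  # maps an image to feature tokens of shape (B, N, feat_dim)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.action_queries = nn.Parameter(torch.zeros(chunk_len, feat_dim))  # one learned query per action step
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # observation: (B, C, H, W) image from the mobile single-camera system
        tokens = self.encoder(observation)                               # first target features
        queries = self.action_queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        decoded = self.decoder(tgt=queries, memory=tokens)               # cross-attend to the observation
        return self.action_head(decoded)                                 # (B, chunk_len, action_dim) target actions
```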
Optionally, the view angle model may consist of a Transformer model and a SoftMax activation function. The Transformer model extracts features from the input first observation data and the target action, and the SoftMax activation function converts the second target features output by the Transformer model into the target view angle, where the second target features are obtained by the Transformer model's feature extraction on the input first observation data and target action. Such a view angle model can better extract features of the current observation data and actions in order to predict the target view angle.
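Similarly, a minimal sketch of a view angle model of this form, assuming a fixed discrete set of candidate viewpoints; all names and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ViewAngleModel(nn.Module):
    """Sketch of the view angle model: a Transformer over observation features plus the
    predicted action chunk, followed by a SoftMax over a discrete set of candidate viewpoints.
    The number of candidate viewpoints and all dimensions are illustrative assumptions."""

    def __init__(self, feat_dim: int = 768, action_dim: int = 7, num_views: int = 8):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, feat_dim)   # embed each predicted action step as a token
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.view_head = nn.Linear(feat_dim, num_views)

    def forward(self, obs_tokens: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (B, N, feat_dim) features of the current observation
        # actions:    (B, T, action_dim) action chunk predicted for the current time block
        x = torch.cat([obs_tokens, self.action_proj(actions)], dim=1)
        x = self.backbone(x)                                  # second target features
        logits = self.view_head(x.mean(dim=1))                # pool tokens and score each candidate view
        return torch.softmax(logits, dim=-1)                  # probability of each viewpoint for the next block
```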
Step S230, the mechanical arm is controlled to execute the target action, and the mobile single-camera system is controlled to move to the target visual angle so as to complete the target task.
After the target action and the target view angle are obtained in step S220, the robot can control the mechanical arm to execute the target action and send a control instruction corresponding to the target view angle to the mobile single camera system, so that the mobile single camera system moves to the target view angle based on the control instruction. Steps S210 to S230 may then be performed in a loop until the target task is completed.
The robot task execution method provided by the embodiments of the invention can complete not only grasping tasks but also other manipulation tasks, such as placing an object at a designated position or dual-arm cooperative operations (for example, the two mechanical arms each grasping an object and assembling them), and the like.
The robot task execution method provided by the embodiment of the invention can acquire the first observation data of the mobile single camera system under the current view angle, determine the target action of the robot arm and the target view angle corresponding to the mobile single camera system according to the first observation data and the preset target task, control the robot arm to execute the target action and control the mobile single camera system to move to the target view angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
The embodiment of the invention also provides a training process of the control model and the visual angle model, which comprises the steps of firstly acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, the observation data set comprises observation data under a plurality of visual angles, the mechanical arm position data can be mechanical arm joint positions, and then based on the sample data, carrying out joint training to obtain the control model and the visual angle model.
In some possible embodiments, the above-described joint training process may include:
1. first sample data and second sample data corresponding to adjacent first time blocks and second time blocks are randomly selected from the sample data. The first sample data is sample data corresponding to the first time block, and the second sample data is sample data corresponding to the second time block.
2. And inputting second observation data randomly selected from the first sample data under the first view angle into the control model to conduct motion prediction under the first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data.
Specifically, a view angle (i.e., a camera) may be randomly selected from the first sample data; this is referred to herein as the first view angle, and the observation data under the first view angle is referred to as the second observation data. After the second observation data is input into the control model, the action output by the control model is referred to as the first predicted action. The action loss (i.e., the first loss) can be calculated from the first predicted action and the mechanical arm action data in the first sample data, and the control model can be updated according to this action loss.
3. And inputting the second observation data and the mechanical arm action data in the first sample data into the view angle model to conduct view angle prediction of the second time block, obtaining the predicted camera view angle output by the view angle model; and inputting third observation data in the second sample data under the predicted camera view angle into the control model to conduct motion prediction under the second time block, obtaining a second predicted motion output by the control model, and updating parameters of the control model according to a second loss between the second predicted motion and the mechanical arm motion data in the second sample data.
Specifically, after the second observation data and the mechanical arm action data in the first sample data are input into the view angle model, the view angle output by the view angle model is referred to herein as the predicted camera view angle, and the observation data under the predicted camera view angle is referred to as the third observation data. After the third observation data is input into the control model, the action output by the control model is referred to as the second predicted action; the action loss (i.e., the second loss) can be calculated from the second predicted action and the mechanical arm action data in the second sample data, and the control model can be further updated according to this action loss.
4. Parameter updating of the view model is performed using the second loss of the second time block as supervision.
In a specific implementation, the second loss can be used as supervision and a gradient descent algorithm can be adopted to update the view angle model. The gradient descent algorithm iteratively adjusts the model parameters along the negative gradient direction of the loss function, thereby minimizing the loss function.
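Written out, the update alluded to here is the usual gradient step (with η denoting an assumed learning rate and θ_view the view angle model parameters; the exact optimizer is not fixed by the patent):

```latex
\theta_{\mathrm{view}} \leftarrow \theta_{\mathrm{view}} - \eta \, \nabla_{\theta_{\mathrm{view}}} L_{t+T}
```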
The above steps are then repeated, i.e., step 1 is re-executed, until the control model and the view angle model converge.
For ease of understanding, the model training process described above is described below with reference to FIG. 4. As shown in fig. 4, the model training process includes the following steps:
Step S410, randomly sampling two adjacent time blocks in the expert data.
Expert data here refers to the sample data used for training. The expert data corresponding to two adjacent time blocks may be denoted as D_{t:t+T} = {O_t, s_t, a_{t:t+T}} and D_{t+T:t+2T} = {O_{t+T}, s_{t+T}, a_{t+T:t+2T}}, where t is the start of the first time block, t+T the start of the second time block, and t+2T the end of the second time block; O_t denotes the set of observations from all view angles at the start t of the first time block, s_t the arm joint positions at the start t of the first time block, and a_{t:t+T} the arm actions (i.e., the expert actions) between the start t of the first time block and the start t+T of the second time block.
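For illustration, one possible in-memory representation of such a per-time-block expert sample, assuming PyTorch tensors (the field names are assumptions, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Dict

import torch

@dataclass
class TimeBlockSample:
    """Sketch of one expert sample D_{t:t+T} = {O_t, s_t, a_{t:t+T}}; field names are assumptions."""
    observations: Dict[int, torch.Tensor]  # O_t: view-angle index -> image captured at the start of the block
    joint_positions: torch.Tensor          # s_t: arm joint positions at the start of the block
    expert_actions: torch.Tensor           # a_{t:t+T}: (T, action_dim) expert arm actions over the block
```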
Step S420, randomly selecting a camera for the first time block, inputting its observation into the control model to predict the actions of the mechanical arm in the current time block and calculating the action loss against the corresponding expert actions to update the control model, and inputting the observation together with the expert actions into the view angle model to predict the optimal camera view angle of the next time block.
A camera is randomly selected for the first time block; suppose its view angle is i. The observation under view angle i is input into the control model, which predicts the actions of the mechanical arm in the current time block; the action loss L_t with respect to the expert actions a_{t:t+T} is calculated to update the control model. The observation under view angle i and the expert actions a_{t:t+T} are then input into the view angle model, which predicts the optimal camera view angle of the next time block; suppose this is j.
Step S430, for the second time block, inputting the observation under the previously predicted optimal camera view angle into the control model to predict the actions of the mechanical arm, and calculating the action loss against the corresponding expert actions to update the control model.
For the second time block, the observation under the previously predicted optimal camera view angle j is input into the control model to predict the actions of the mechanical arm; the action loss L_{t+T} with respect to the expert actions a_{t+T:t+2T} is calculated to update the control model.
Step S440, using the motion loss of the second time block as a supervision, updating the view angle model.
The view angle model is updated using the action loss L_{t+T} of the second time block as supervision.
Step S450, judging whether the control model and the view angle model are converged. If not, step S410 is re-executed, and if so, the flow ends.
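A condensed sketch of one iteration of this joint training procedure is given below. It assumes the ControlModel, ViewAngleModel and TimeBlockSample sketches above, an MSE action loss, sampling of the second-block view from the SoftMax output, and a REINFORCE-style surrogate for passing the second loss back to the discrete view choice; these are all assumptions, since the patent does not fix the loss form or how the view angle model is made differentiable.

```python
import random

import torch
import torch.nn.functional as F

def joint_training_step(pairs, control_model, view_model, ctrl_opt, view_opt):
    """One iteration of the joint training of steps S410-S450.
    `pairs` is assumed to be a list of adjacent (D_{t:t+T}, D_{t+T:t+2T}) TimeBlockSample tuples;
    loss form, optimizers and the view-model surrogate are illustrative assumptions."""
    # S410: randomly sample two adjacent time blocks of expert data
    first, second = random.choice(pairs)

    # S420: random camera i for the first block -> action loss L_t updates the control model
    view_i = random.choice(list(first.observations))
    pred_first = control_model(first.observations[view_i].unsqueeze(0))
    loss_t = F.mse_loss(pred_first, first.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad()
    loss_t.backward()
    ctrl_opt.step()

    # S420 (cont.): observation + expert actions -> view model scores candidate views;
    # the view for the second block is sampled from the SoftMax output (a training-time assumption)
    obs_tokens = control_model.encoder(first.observations[view_i].unsqueeze(0)).detach()
    view_probs = view_model(obs_tokens, first.expert_actions.unsqueeze(0))
    view_j = int(torch.multinomial(view_probs[0], 1))

    # S430: observation of view j in the second block -> action loss L_{t+T} updates the control model
    pred_second = control_model(second.observations[view_j].unsqueeze(0))
    loss_t1 = F.mse_loss(pred_second, second.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad()
    loss_t1.backward()
    ctrl_opt.step()

    # S440: L_{t+T} supervises the view angle model via a REINFORCE-style surrogate (an assumption):
    # views that led to a high action loss have their probability pushed down
    view_loss = loss_t1.detach() * torch.log(view_probs[0, view_j] + 1e-8)
    view_opt.zero_grad()
    view_loss.backward()
    view_opt.step()

    return loss_t.item(), loss_t1.item()
```

S450 then amounts to calling this step in a loop until both losses stop improving, i.e., until the two models converge.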
For ease of understanding, the active perspective selection inference process of the above-described robot task execution method will be described with reference to fig. 5. As shown in fig. 5, the active perspective selection inference process includes the steps of:
Step S510, inputting the observation at the current camera view angle into the control model to predict the actions of the mechanical arm in the current time block.
The observation o_t at the current camera view angle is input into the control model, which predicts the actions of the mechanical arm in the current time block.
In step S520, the current-view observation and the predicted actions are input into the view angle model, which predicts the optimal camera view angle for the next time block.
The current-view observation o_t and the predicted actions are input into the view angle model, which predicts the optimal camera view angle j for the next time block.
In step S530, the mechanical arm executes the predicted actions and the camera is moved to the predicted optimal camera view angle.
Step S540, judging whether the target task is completed. If not, the step S510 is re-executed, and if yes, the flow ends.
The above steps S510 to S530 are looped until the target task is completed.
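For illustration, the inference loop of steps S510 to S540 could look as follows, assuming the model sketches above and simple robot/camera interfaces (capture, execute, move_to) and a task_done check that are not part of the patent:

```python
import torch

def run_task(robot, camera, control_model, view_model, candidate_views, task_done):
    """Sketch of the active view angle selection inference loop (steps S510-S540).
    `robot`, `camera`, `candidate_views` and `task_done` are assumed interfaces."""
    control_model.eval()
    view_model.eval()
    with torch.no_grad():
        while not task_done():
            # S510: current-view observation o_t -> control model -> action chunk for the current time block
            o_t = camera.capture()                            # (C, H, W) image at the current view angle
            actions = control_model(o_t.unsqueeze(0))         # (1, chunk_len, action_dim)

            # S520: observation + predicted actions -> view model -> best camera view j for the next block
            obs_tokens = control_model.encoder(o_t.unsqueeze(0))
            view_probs = view_model(obs_tokens, actions)
            j = int(view_probs.argmax(dim=-1))

            # S530: execute the predicted actions and move the camera to the predicted optimal view angle
            robot.execute(actions[0])
            camera.move_to(candidate_views[j])
```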
In the embodiments of the invention, label-free optimal view angle selection for the robot is achieved by using the action loss as supervision during imitation learning. By dynamically adjusting the robot's view angle, a single camera can observe the most salient parts of the scene, which avoids the limited field of view of previous single-camera systems and reduces the noise fed into the control model; this achieves performance comparable to a multi-camera system and can even surpass it in some scenarios.
Corresponding to the above-mentioned robot task execution method, the embodiment of the invention also provides a robot task execution device, which is applied to a robot connected with a mobile single-camera system. Referring to a schematic structural view of a robot task performing device shown in fig. 6, the robot task performing device includes:
an obtaining module 601, configured to obtain first observation data of the mobile single camera system under a current viewing angle;
The determining module 602 is configured to determine, according to the first observation data and a preset target task, a target action of the robotic arm and a target viewing angle corresponding to the mobile single-camera system;
the control module 603 is configured to control the mechanical arm to perform the target action, and control the mobile single-camera system to move to the target view angle, so as to complete the target task.
The robot task execution device provided by the embodiment of the invention can acquire the first observation data of the mobile single camera system under the current view angle, determine the target action of the robot arm and the target view angle corresponding to the mobile single camera system according to the first observation data and the preset target task, control the robot arm to execute the target action and control the mobile single camera system to move to the target view angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
Further, the determining module 602 is specifically configured to predict the action of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain the target action, and to predict the view angle of the next time block of the mobile single-camera system according to the first observation data, the target action and the trained view angle model to obtain the target view angle, where the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, wherein the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model includes a Transformer model and a SoftMax activation function, the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution device further comprises a training module, configured to:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
The training module is specifically configured to: randomly select, from the sample data, first sample data and second sample data corresponding to adjacent first and second time blocks; input second observation data, randomly selected under a first view angle from the first sample data, into the control model to perform action prediction for the first time block, obtain the first predicted action output by the control model, and update the parameters of the control model according to the first loss between the first predicted action and the mechanical arm action data in the first sample data; input the second observation data and the mechanical arm action data in the first sample data into the view angle model to perform view angle prediction for the second time block and obtain the predicted camera view angle output by the view angle model; input third observation data under the predicted camera view angle in the second sample data into the control model to perform action prediction for the second time block, obtain the second predicted action output by the control model, and update the parameters of the control model according to the second loss between the second predicted action and the mechanical arm action data in the second sample data; update the parameters of the view angle model using the second loss of the second time block as supervision; and re-execute the selection of first and second sample data until the control model and the view angle model converge.
The implementation principle and the generated technical effects of the robot task execution device provided in this embodiment are the same as those of the foregoing embodiment of the robot task execution method, and for the sake of brief description, reference may be made to corresponding contents in the foregoing embodiment of the robot task execution method where the embodiment of the robot task execution device is not mentioned.
As shown in fig. 7, the robot 700 provided in the embodiment of the present invention includes a processor 701, a memory 702, and a bus, where the memory 702 stores a computer program that can be run on the processor 701, and when the robot 700 runs, the processor 701 and the memory 702 communicate with each other through the bus, and the processor 701 executes the computer program to implement the above-mentioned robot task execution method.
Specifically, the memory 702 and the processor 701 can be general-purpose memories and processors, which are not particularly limited herein.
The embodiment of the invention also provides an active visual angle selection system, as shown in fig. 8, which comprises the robot 801, and further comprises a mobile single-camera system 802, wherein the mobile single-camera system 802 is connected with the robot 801.
The active view selection system provided in this embodiment has the same implementation principle and technical effects as those of the foregoing robot task execution method embodiment, and for brevity description, reference may be made to corresponding contents in the foregoing robot task execution method embodiment where the active view selection system embodiment is not mentioned.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored; the computer program, when executed by a processor, performs the robot task execution method of the preceding method embodiments. The computer readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of modules is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.

Claims (9)

1. A robot task execution method, characterized in that it is applied to a robot communicatively connected to a mobile single-camera system; the robot task execution method comprises:
acquiring first observation data of the mobile single-camera system at a current view angle;
determining a target action of the robot's mechanical arm and a target view angle corresponding to the mobile single-camera system according to the first observation data and a preset target task; and
controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target view angle, so as to complete the target task; wherein the mobile single-camera system changes its view angle by changing its position and/or angle;
wherein determining the target action of the robot's mechanical arm and the target view angle corresponding to the mobile single-camera system according to the first observation data and the preset target task comprises:
performing action prediction of the current time block of the mechanical arm according to the first observation data and a trained control model to obtain the target action; and
performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and a trained view angle model to obtain the target view angle;
wherein the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training based on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
2. The robot task execution method according to claim 1, characterized in that the control model comprises an encoder and a decoder, the encoder adopts a pre-trained multi-view masked autoencoder, and the decoder comprises a Transformer model; the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
3. The robot task execution method according to claim 1, characterized in that the view angle model comprises a Transformer model and a SoftMax activation function; the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
4. The robot task execution method according to claim 1, characterized in that the robot task execution method further comprises:
acquiring sample data under a plurality of adjacent time blocks, the sample data comprising an observation data set, mechanical arm position data and mechanical arm action data, the observation data set comprising observation data under a plurality of view angles; and
jointly training the control model and the view angle model based on the sample data.
5. The robot task execution method according to claim 4, characterized in that jointly training the control model and the view angle model based on the sample data comprises:
randomly selecting, from the sample data, first sample data and second sample data corresponding to adjacent first and second time blocks;
inputting second observation data, randomly selected under a first view angle from the first sample data, into the control model to perform action prediction for the first time block, obtaining a first predicted action output by the control model, and updating parameters of the control model according to a first loss between the first predicted action and the mechanical arm action data in the first sample data;
inputting the second observation data and the mechanical arm action data in the first sample data into the view angle model to perform view angle prediction for the second time block, obtaining a predicted camera view angle output by the view angle model; inputting third observation data under the predicted camera view angle in the second sample data into the control model to perform action prediction for the second time block, obtaining a second predicted action output by the control model, and updating parameters of the control model according to a second loss between the second predicted action and the mechanical arm action data in the second sample data;
updating parameters of the view angle model using the second loss of the second time block as supervision; and
re-executing the step of randomly selecting first sample data and second sample data corresponding to adjacent first and second time blocks from the sample data until the control model and the view angle model converge.
6. A robot task execution device, characterized in that it is applied to a robot communicatively connected to a mobile single-camera system; the robot task execution device comprises:
an acquisition module, configured to acquire first observation data of the mobile single-camera system at a current view angle;
a determination module, configured to determine a target action of the robot's mechanical arm and a target view angle corresponding to the mobile single-camera system according to the first observation data and a preset target task; and
a control module, configured to control the mechanical arm to execute the target action and control the mobile single-camera system to move to the target view angle, so as to complete the target task; wherein the mobile single-camera system changes its view angle by changing its position and/or angle;
wherein the determination module is specifically configured to: perform action prediction of the current time block of the mechanical arm according to the first observation data and a trained control model to obtain the target action; and perform view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and a trained view angle model to obtain the target view angle; wherein the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training based on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
7. A robot, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the robot task execution method according to any one of claims 1 to 5.
8. An active view angle selection system, characterized by comprising the robot according to claim 7 and further comprising a mobile single-camera system, the mobile single-camera system being connected to the robot.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the robot task execution method according to any one of claims 1 to 5.
CN202411924317.4A 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system Active CN119567261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411924317.4A CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411924317.4A CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Publications (2)

Publication Number Publication Date
CN119567261A CN119567261A (en) 2025-03-07
CN119567261B true CN119567261B (en) 2025-09-05

Family

ID=94800358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411924317.4A Active CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Country Status (1)

Country Link
CN (1) CN119567261B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
CN118493383A (en) * 2024-05-15 2024-08-16 上海交通大学 Mechanical arm position visual servo method and system based on cradle head hand-eye camera
CN118552618A (en) * 2024-06-03 2024-08-27 贵州大学 Grasping pose generation method based on convolutional neural network

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109202912B (en) * 2018-11-15 2020-09-11 太原理工大学 Method for registering target contour point cloud based on monocular depth sensor and mechanical arm
CN117980915A (en) * 2021-07-28 2024-05-03 谷歌有限责任公司 Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN114519813A (en) * 2022-02-22 2022-05-20 广东工业大学 Mechanical arm target grabbing method and system
CN116117786A (en) * 2022-09-07 2023-05-16 山东大学 Method and system for planning track of mechanical arm under high visual visibility
CN116690583A (en) * 2023-07-24 2023-09-05 清华大学 Construction and testing method and device of human-computer interaction manipulator grasping and placing system
CN117312992B (en) * 2023-11-30 2024-03-12 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Emotion recognition method and system for fusion of multi-view face features and audio features
CN117428779A (en) * 2023-12-01 2024-01-23 中国农业银行股份有限公司 Robot grabbing control method, device, equipment and storage medium
CN118741573B (en) * 2024-07-26 2025-05-23 广州航海学院 Fault-tolerant method for improving networking robustness of underwater robot

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
CN118493383A (en) * 2024-05-15 2024-08-16 上海交通大学 Mechanical arm position visual servo method and system based on cradle head hand-eye camera
CN118552618A (en) * 2024-06-03 2024-08-27 贵州大学 Grasping pose generation method based on convolutional neural network

Also Published As

Publication number Publication date
CN119567261A (en) 2025-03-07

Similar Documents

Publication Publication Date Title
JP7367233B2 (en) System and method for robust optimization of reinforcement learning based on trajectory-centered models
Wang et al. Equivariant diffusion policy
Breyer et al. Comparing task simplifications to learn closed-loop object picking using deep reinforcement learning
CN115917564A (en) System and method for learning reusable options to transfer knowledge between tasks
CN111890357A (en) An intelligent robot grasping method based on action demonstration and teaching
Zhang et al. Modular deep q networks for sim-to-real transfer of visuo-motor policies
CN118061186A (en) Robot planning method and system based on multi-mode large model predictive control
CN114970826B (en) Multi-agent collaboration method and device based on task representation and teammate perception
US20230153388A1 (en) Method for controlling an agent
CN119748461B (en) Zero sample robot control method, device, terminal and storage medium
CN114355915A (en) AGV path planning based on deep reinforcement learning
CN115990875A (en) A State Prediction and Control System for Flexible Cables Based on Latent Space Interpolation
Hafez et al. Efficient intrinsically motivated robotic grasping with learning-adaptive imagination in latent space
Zakaria et al. Robotic control of the deformation of soft linear objects using deep reinforcement learning
CN118752492A (en) Motion control method for multi-task and multi-robot based on deep reinforcement learning
Li et al. Teleoperation-Driven and Keyframe-Based Generalizable Imitation Learning for Construction Robots
CN119567261B (en) Robot task execution method, device, robot and active perspective selection system
CN119357647B (en) Large model feature fusion Ha Xizi attention method for robot operation
CN119304873A (en) Robot control and model training method, device, equipment and storage medium
CN119610132A (en) Multi-mode large model robot control method based on meta-learning fine tuning
CN119168835A (en) A mechanical arm grasping prediction method, electronic device and storage medium
CN120604238A (en) Open Vocabulary Robot Control Using Multimodal Language Models
WO2023057518A1 (en) Demonstration-driven reinforcement learning
CN116935492A (en) A human action prediction method and device based on graph relationship interactive learning
Coskun et al. Robotic Grasping in Simulation Using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant