
CN119567261B - Robot task execution method, device, robot and active perspective selection system - Google Patents

Robot task execution method, device, robot and active perspective selection system

Info

Publication number
CN119567261B
CN119567261B (Application No. CN202411924317.4A)
Authority
CN
China
Prior art keywords
target
robot
model
camera system
data
Prior art date
Legal status
Active
Application number
CN202411924317.4A
Other languages
Chinese (zh)
Other versions
CN119567261A (en)
Inventor
易鹏飞
谢铮
Current Assignee
Beijing Yuanluo Technology Co ltd
Original Assignee
Beijing Yuanluo Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yuanluo Technology Co ltd filed Critical Beijing Yuanluo Technology Co ltd
Priority to CN202411924317.4A priority Critical patent/CN119567261B/en
Publication of CN119567261A publication Critical patent/CN119567261A/en
Application granted granted Critical
Publication of CN119567261B publication Critical patent/CN119567261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot task execution method, a robot task execution device, a robot and an active view angle selection system, and relates to the technical field of robots. The robot is connected to a mobile single-camera system. When executing a target task, the robot acquires first observation data of the mobile single-camera system at the current view angle, determines a target action of the robot's mechanical arm and a target view angle for the mobile single-camera system according to the first observation data and the preset target task, then controls the mechanical arm to execute the target action and controls the mobile single-camera system to move to the target view angle so as to complete the target task. Because the view angle of the mobile single-camera system can be adjusted dynamically during task execution, the camera can observe the most salient parts of the scene, which avoids the limited field of view of a fixed single-camera system, reduces noise in the observation data, improves decision quality, and thus improves task execution efficiency.

Description

Robot task execution method and device, robot and active visual angle selection system
Technical Field
The present invention relates to the field of robots, and in particular to a robot task execution method and device, a robot, and an active view angle selection system.
Background
Robotic manipulation is a central challenge in robotics and is critical for a variety of applications, from industrial automation to healthcare. Vision-based robotic manipulation is currently based mainly on imitation learning (Imitation Learning, abbreviated as IL). Imitation learning allows robots to learn complex tasks by observing expert demonstrations and mapping observations to robot-arm actions; visual observation is therefore critical to the efficiency of imitation learning. Existing methods rely on fixed camera setups. These may be single-camera systems that use only one camera for observation, typically in a hand-eye configuration in which the camera is mounted near the robot end effector (e.g., a wrist camera) or fixed in the external scene to cover the whole task area; or they may be multi-camera systems consisting of multiple fixed external cameras, possibly combined with a wrist camera.
However, the limited field of view of a single-camera system may leave critical parts of the environment or objects occluded, which degrades task performance. Multi-camera systems, while providing more comprehensive scene coverage, introduce complexity: the large amount of redundant or irrelevant information can burden the learning algorithm and reduce efficiency. In addition, these passive, static camera setups do not always provide the most task-relevant information, which leads to suboptimal decisions and lowers task execution efficiency.
Disclosure of Invention
The invention aims to provide a robot task execution method, a robot task execution device, a robot and an active visual angle selection system so as to improve task execution efficiency.
In a first aspect, the present invention provides a robot task execution method applied to a robot connected to a mobile single camera system, the robot task execution method comprising:
Acquiring first observation data of a mobile single camera system under a current view angle;
Determining a target action of a robot arm and a target visual angle corresponding to a mobile single-camera system according to the first observation data and a preset target task;
and controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
Further, determining, according to the first observation data and the preset target task, a target action of the robot arm and a target viewing angle corresponding to the mobile single-camera system, includes:
performing motion prediction of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain a target motion;
Performing visual angle prediction of a next time block of the mobile single-camera system according to the first observation data, the target action and the trained visual angle model to obtain a target visual angle;
the control model and the view angle model are obtained based on sample data combined training under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model comprises a Transformer model and a SoftMax activation function, the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution method further includes:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
Further, based on each sample data, the joint training obtains a control model and a view angle model, including:
Randomly selecting first sample data and second sample data corresponding to adjacent first time blocks and second time blocks from each sample data;
Inputting second observation data randomly selected from the first sample data under a first view angle into a control model to conduct motion prediction under a first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data;
Inputting the second observation data and the mechanical arm action data in the first sample data into a view angle model to conduct view angle prediction of a second time block, and obtaining a predicted camera view angle output by the view angle model; inputting third observation data of the second sample data under the predicted camera view angle into the control model to conduct motion prediction under a second time block, obtaining a second predicted motion output by the control model, and updating parameters of the control model according to second loss between the second predicted motion and mechanical arm motion data in the second sample data;
using the second loss of the second time block as supervision to update parameters of the view angle model;
And re-executing the step of randomly selecting the first sample data and the second sample data corresponding to the adjacent first time block and second time block from the sample data until the control model and the visual angle model are converged.
In a second aspect, the present invention also provides a robot task execution device applied to a robot connected to a mobile single camera system, the robot task execution device comprising:
the acquisition module is used for acquiring first observation data of the mobile single-camera system under the current view angle;
The determining module is used for determining a target action of the robot mechanical arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task;
and the control module is used for controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target visual angle so as to complete the target task.
In a third aspect, the present invention also provides a robot, including a memory, and a processor, where the memory stores a computer program executable on the processor, and the processor implements the robot task execution method of the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides an active perspective selection system, including the robot of the third aspect, further including a mobile single camera system, the mobile single camera system being connected to the robot.
In a fifth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the robot task execution method of the first aspect.
According to the robot task execution method, the robot task execution device, the robot and the active visual angle selection system, the robot is connected with the mobile single-camera system, when the robot executes a target task, the robot can acquire first observation data of the mobile single-camera system under the current visual angle, the target action of the robot mechanical arm and the target visual angle corresponding to the mobile single-camera system are determined according to the first observation data and the preset target task, the mechanical arm is controlled to execute the target action, and the mobile single-camera system is controlled to move to the target visual angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a conventional robot task execution process;
Fig. 2 is a schematic flow chart of a method for executing a robot task according to an embodiment of the present invention;
fig. 3 is a schematic view of a scenario of a robot task execution process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model training process according to an embodiment of the present invention;
Fig. 5 is a flow chart of another method for executing a robot task according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a robot task execution device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a robot according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an active view selection system according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, existing robots generally adopt a fixed single-camera system or a fixed multi-camera system; that is, the view angle of the single-camera or multi-camera system cannot be changed during task execution. After a control model is obtained by training on expert data, the control model converts the observation data of the single-camera or multi-camera system into robot-arm action outputs.
However, the field of view of a single-camera system is limited, which affects task performance, and a multi-camera system carries a large amount of redundant or irrelevant information, which lowers the computational efficiency of the control model and degrades its decisions, resulting in lower task execution efficiency. The robot task execution method, device, robot and active view angle selection system provided by the embodiments of the invention therefore adopt a movable single-camera system (i.e., a mobile single-camera system) that is allowed to change its view angle at different positions, together with an imitation-learning-based active view angle selection strategy for the robot. By actively selecting the optimal view angle of the mobile single-camera system for the next time block, the limited field of view of a single-camera system is avoided, and the negative effect of redundant or irrelevant information in a multi-camera system on the decision quality is also avoided, thereby improving task execution efficiency.
For the sake of understanding the present embodiment, a detailed description will be given of a method for executing a robot task disclosed in the present embodiment.
The embodiment of the invention provides a robot task execution method, which is applied to a robot connected with a mobile single-camera system and can be executed by a main control device such as a controller of the robot, wherein the mobile single-camera system can change the visual angle by changing the position and/or the angle, and the mobile single-camera system can adopt an RGB camera. Referring to fig. 2, a flow chart of a robot task execution method mainly includes steps S210 to S230 as follows:
step S210, acquiring first observation data of the mobile single camera system at the current viewing angle.
The mobile single-camera system is in communication connection with the robot, and in the task execution process, the mobile single-camera system can transmit collected observation data to the robot, and can also move based on the control of the robot so as to change the visual angle. The observation data may be image data, such as video or pictures, taken by a camera in the mobile single camera system, where the observation data transmitted to the robot by the mobile single camera system at the current time block is referred to as first observation data.
Step S220, determining a target action of the robot arm and a target visual angle corresponding to the mobile single-camera system according to the first observation data and a preset target task.
The target motion of the mechanical arm in the current time block can be predicted according to the first observation data, and then the optimal view angle of the next time block, namely the target view angle, is determined according to the first observation data and the predicted target motion. The target motion may include a motion of one mechanical arm, or may include a motion of two mechanical arms.
In some possible embodiments, the prediction of the target action and the target view angle may be performed by a trained control model and a trained view angle model. Referring to the schematic scene diagram of the robot task execution process shown in fig. 3, the control model converts the observation input of the mobile single-camera system into a prediction of the robot-arm actions, and the view angle model outputs the optimal view angle of the next time block according to the observation of the mobile single-camera system and the predicted robot-arm actions. Based on this, step S220 may include performing action prediction of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain the target action, and performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and the trained view angle model to obtain the target view angle. The control model and the view angle model are obtained by joint training on sample data under a plurality of adjacent time blocks; the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
In this embodiment, the robot may perform one or more tasks, each task type corresponding to a set of control models and perspective models. If the robot can execute multiple tasks, a corresponding control model and a corresponding visual angle model can be selected according to the task type of the target task currently executed.
Optionally, the control model may consist of an encoder and a decoder. The encoder may use a pre-trained multi-view masked autoencoder and the decoder may use a Transformer model; the encoder extracts features from the input first observation data, and the decoder converts the first target features output by the encoder into the target action, where the first target features are obtained by the encoder's feature extraction on the first observation data. The decoder may be a diffusion-based Transformer model, a classical Transformer decoder model, or the like. A control model built on a pre-trained multi-view masked autoencoder has a better ability to extract view angle features.
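For illustration only (not the patent's implementation), the following PyTorch-style module sketches one way such an encoder–decoder control model could be assembled. The encoder is a placeholder standing in for a pre-trained multi-view masked autoencoder, and all class names, dimensions and the action-chunk length are assumptions.

```python
import torch
import torch.nn as nn

class ControlModel(nn.Module):
    """Sketch of the control model: a (pre-trained) multi-view masked-autoencoder encoder
    plus a Transformer decoder that turns the encoded observation into a chunk of actions.
    All names, dimensions and the chunk length are illustrative assumptions."""

    def __init__(self, pretrained_encoder: nn.Module, feat_dim: int = 768,
                 action_dim: int = 7, chunk_len: int = 16):
        super().__init__()
        self.encoder = pretrained_encoder  # maps an image to feature tokens of shape (B, N, feat_dim)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.action_queries = nn.Parameter(torch.zeros(chunk_len, feat_dim))  # one learned query per action step
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # observation: (B, C, H, W) image from the mobile single-camera system
        tokens = self.encoder(observation)                               # first target features
        queries = self.action_queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        decoded = self.decoder(tgt=queries, memory=tokens)               # cross-attend to the observation
        return self.action_head(decoded)                                 # (B, chunk_len, action_dim) target actions
```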
Optionally, the view angle model may consist of a Transformer model and a SoftMax activation function. The Transformer model extracts features from the input first observation data and the target action, and the SoftMax activation function converts the second target features output by the Transformer model into the target view angle, where the second target features are obtained by the Transformer model's feature extraction on the input first observation data and target action. Such a view angle model can better extract features of the current observation data and actions in order to predict the target view angle.
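Similarly, a minimal sketch of a view angle model of this form, assuming a fixed discrete set of candidate viewpoints; all names and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ViewAngleModel(nn.Module):
    """Sketch of the view angle model: a Transformer over observation features plus the
    predicted action chunk, followed by a SoftMax over a discrete set of candidate viewpoints.
    The number of candidate viewpoints and all dimensions are illustrative assumptions."""

    def __init__(self, feat_dim: int = 768, action_dim: int = 7, num_views: int = 8):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, feat_dim)   # embed each predicted action step as a token
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.view_head = nn.Linear(feat_dim, num_views)

    def forward(self, obs_tokens: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (B, N, feat_dim) features of the current observation
        # actions:    (B, T, action_dim) action chunk predicted for the current time block
        x = torch.cat([obs_tokens, self.action_proj(actions)], dim=1)
        x = self.backbone(x)                                  # second target features
        logits = self.view_head(x.mean(dim=1))                # pool tokens and score each candidate view
        return torch.softmax(logits, dim=-1)                  # probability of each viewpoint for the next block
```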
Step S230, the mechanical arm is controlled to execute the target action, and the mobile single-camera system is controlled to move to the target visual angle so as to complete the target task.
After the target action and the target view angle are obtained in step S220, the robot can control the mechanical arm to execute the target action and send a control instruction corresponding to the target view angle to the mobile single camera system, so that the mobile single camera system moves to the target view angle based on the control instruction. Steps S210 to S230 may then be performed in a loop until the target task is completed.
The robot task execution method provided by the embodiments of the invention can complete not only grasping tasks but also other manipulation tasks, such as placing an object at a designated position or dual-arm cooperative operations (for example, the two mechanical arms each grasping an object and assembling them), and the like.
The robot task execution method provided by the embodiment of the invention can acquire the first observation data of the mobile single camera system under the current view angle, determine the target action of the robot arm and the target view angle corresponding to the mobile single camera system according to the first observation data and the preset target task, control the robot arm to execute the target action and control the mobile single camera system to move to the target view angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
The embodiment of the invention also provides a training process of the control model and the visual angle model, which comprises the steps of firstly acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, the observation data set comprises observation data under a plurality of visual angles, the mechanical arm position data can be mechanical arm joint positions, and then based on the sample data, carrying out joint training to obtain the control model and the visual angle model.
In some possible embodiments, the above-described joint training process may include:
1. first sample data and second sample data corresponding to adjacent first time blocks and second time blocks are randomly selected from the sample data. The first sample data is sample data corresponding to the first time block, and the second sample data is sample data corresponding to the second time block.
2. And inputting second observation data randomly selected from the first sample data under the first view angle into the control model to conduct motion prediction under the first time block, obtaining a first prediction motion output by the control model, and updating parameters of the control model according to first loss between the first prediction motion and mechanical arm motion data in the first sample data.
Specifically, a view angle (i.e., a camera) may be randomly selected from the first sample data; this is referred to herein as the first view angle, and the observation data under the first view angle is referred to as the second observation data. After the second observation data is input into the control model, the action output by the control model is referred to as the first predicted action. The action loss (i.e., the first loss) can be calculated from the first predicted action and the mechanical arm action data in the first sample data, and the control model can be updated according to this action loss.
3. And inputting the second observation data and the mechanical arm action data in the first sample data into the view angle model to conduct view angle prediction of the second time block, obtaining the predicted camera view angle output by the view angle model; and inputting third observation data in the second sample data under the predicted camera view angle into the control model to conduct motion prediction under the second time block, obtaining a second predicted motion output by the control model, and updating parameters of the control model according to a second loss between the second predicted motion and the mechanical arm motion data in the second sample data.
Specifically, after the second observation data and the mechanical arm action data in the first sample data are input into the view angle model, the view angle output by the view angle model is referred to herein as the predicted camera view angle, and the observation data under the predicted camera view angle is referred to as the third observation data. After the third observation data is input into the control model, the action output by the control model is referred to as the second predicted action; the action loss (i.e., the second loss) can be calculated from the second predicted action and the mechanical arm action data in the second sample data, and the control model can be further updated according to this action loss.
4. Parameter updating of the view model is performed using the second loss of the second time block as supervision.
In a specific implementation, the second loss can be used as supervision and a gradient descent algorithm can be adopted to update the view angle model. The gradient descent algorithm iteratively adjusts the model parameters along the negative gradient direction of the loss function, thereby minimizing the loss function.
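Written out, the update alluded to here is the usual gradient step (with η denoting an assumed learning rate and θ_view the view angle model parameters; the exact optimizer is not fixed by the patent):

```latex
\theta_{\mathrm{view}} \leftarrow \theta_{\mathrm{view}} - \eta \, \nabla_{\theta_{\mathrm{view}}} L_{t+T}
```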
The above steps are then repeated, i.e., step 1 is re-executed, until the control model and the view angle model converge.
For ease of understanding, the model training process described above is described below with reference to FIG. 4. As shown in fig. 4, the model training process includes the following steps:
Step S410, randomly sampling two adjacent time blocks in the expert data.
Expert data here refers to the sample data used for training. The expert data corresponding to two adjacent time blocks may be denoted as D_{t:t+T} = {O_t, s_t, a_{t:t+T}} and D_{t+T:t+2T} = {O_{t+T}, s_{t+T}, a_{t+T:t+2T}}, where t is the start of the first time block, t+T the start of the second time block, and t+2T the end of the second time block; O_t denotes the set of observations from all view angles at the start t of the first time block, s_t the arm joint positions at the start t of the first time block, and a_{t:t+T} the arm actions (i.e., the expert actions) between the start t of the first time block and the start t+T of the second time block.
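For illustration, one possible in-memory representation of such a per-time-block expert sample, assuming PyTorch tensors (the field names are assumptions, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Dict

import torch

@dataclass
class TimeBlockSample:
    """Sketch of one expert sample D_{t:t+T} = {O_t, s_t, a_{t:t+T}}; field names are assumptions."""
    observations: Dict[int, torch.Tensor]  # O_t: view-angle index -> image captured at the start of the block
    joint_positions: torch.Tensor          # s_t: arm joint positions at the start of the block
    expert_actions: torch.Tensor           # a_{t:t+T}: (T, action_dim) expert arm actions over the block
```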
Step S420, randomly selecting a camera for the first time block, inputting its observation into the control model to predict the actions of the mechanical arm in the current time block and calculating the action loss against the corresponding expert actions to update the control model, and inputting the observation together with the expert actions into the view angle model to predict the optimal camera view angle of the next time block.
A camera is randomly selected for the first time block; suppose its view angle is i. The observation under view angle i is input into the control model, which predicts the actions of the mechanical arm in the current time block; the action loss L_t with respect to the expert actions a_{t:t+T} is calculated to update the control model. The observation under view angle i and the expert actions a_{t:t+T} are then input into the view angle model, which predicts the optimal camera view angle of the next time block; suppose this is j.
Step S430, for the second time block, inputting the observation under the previously predicted optimal camera view angle into the control model to predict the actions of the mechanical arm, and calculating the action loss against the corresponding expert actions to update the control model.
For the second time block, the observation under the previously predicted optimal camera view angle j is input into the control model to predict the actions of the mechanical arm; the action loss L_{t+T} with respect to the expert actions a_{t+T:t+2T} is calculated to update the control model.
Step S440, using the motion loss of the second time block as a supervision, updating the view angle model.
The view angle model is updated using the action loss L_{t+T} of the second time block as supervision.
Step S450, judging whether the control model and the view angle model are converged. If not, step S410 is re-executed, and if so, the flow ends.
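A condensed sketch of one iteration of this joint training procedure is given below. It assumes the ControlModel, ViewAngleModel and TimeBlockSample sketches above, an MSE action loss, sampling of the second-block view from the SoftMax output, and a REINFORCE-style surrogate for passing the second loss back to the discrete view choice; these are all assumptions, since the patent does not fix the loss form or how the view angle model is made differentiable.

```python
import random

import torch
import torch.nn.functional as F

def joint_training_step(pairs, control_model, view_model, ctrl_opt, view_opt):
    """One iteration of the joint training of steps S410-S450.
    `pairs` is assumed to be a list of adjacent (D_{t:t+T}, D_{t+T:t+2T}) TimeBlockSample tuples;
    loss form, optimizers and the view-model surrogate are illustrative assumptions."""
    # S410: randomly sample two adjacent time blocks of expert data
    first, second = random.choice(pairs)

    # S420: random camera i for the first block -> action loss L_t updates the control model
    view_i = random.choice(list(first.observations))
    pred_first = control_model(first.observations[view_i].unsqueeze(0))
    loss_t = F.mse_loss(pred_first, first.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad()
    loss_t.backward()
    ctrl_opt.step()

    # S420 (cont.): observation + expert actions -> view model scores candidate views;
    # the view for the second block is sampled from the SoftMax output (a training-time assumption)
    obs_tokens = control_model.encoder(first.observations[view_i].unsqueeze(0)).detach()
    view_probs = view_model(obs_tokens, first.expert_actions.unsqueeze(0))
    view_j = int(torch.multinomial(view_probs[0], 1))

    # S430: observation of view j in the second block -> action loss L_{t+T} updates the control model
    pred_second = control_model(second.observations[view_j].unsqueeze(0))
    loss_t1 = F.mse_loss(pred_second, second.expert_actions.unsqueeze(0))
    ctrl_opt.zero_grad()
    loss_t1.backward()
    ctrl_opt.step()

    # S440: L_{t+T} supervises the view angle model via a REINFORCE-style surrogate (an assumption):
    # views that led to a high action loss have their probability pushed down
    view_loss = loss_t1.detach() * torch.log(view_probs[0, view_j] + 1e-8)
    view_opt.zero_grad()
    view_loss.backward()
    view_opt.step()

    return loss_t.item(), loss_t1.item()
```

S450 then amounts to calling this step in a loop until both losses stop improving, i.e., until the two models converge.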
For ease of understanding, the active perspective selection inference process of the above-described robot task execution method will be described with reference to fig. 5. As shown in fig. 5, the active perspective selection inference process includes the steps of:
Step S510, inputting the observation at the current camera view angle into the control model to predict the actions of the mechanical arm in the current time block.
The observation o_t at the current camera view angle is input into the control model, which predicts the actions of the mechanical arm in the current time block.
In step S520, the current-view observation and the predicted actions are input into the view angle model, which predicts the optimal camera view angle for the next time block.
The current-view observation o_t and the predicted actions are input into the view angle model, which predicts the optimal camera view angle j for the next time block.
In step S530, the mechanical arm executes the predicted actions and the camera is moved to the predicted optimal camera view angle.
Step S540, judging whether the target task is completed. If not, the step S510 is re-executed, and if yes, the flow ends.
The above steps S510 to S530 are looped until the target task is completed.
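For illustration, the inference loop of steps S510 to S540 could look as follows, assuming the model sketches above and simple robot/camera interfaces (capture, execute, move_to) and a task_done check that are not part of the patent:

```python
import torch

def run_task(robot, camera, control_model, view_model, candidate_views, task_done):
    """Sketch of the active view angle selection inference loop (steps S510-S540).
    `robot`, `camera`, `candidate_views` and `task_done` are assumed interfaces."""
    control_model.eval()
    view_model.eval()
    with torch.no_grad():
        while not task_done():
            # S510: current-view observation o_t -> control model -> action chunk for the current time block
            o_t = camera.capture()                            # (C, H, W) image at the current view angle
            actions = control_model(o_t.unsqueeze(0))         # (1, chunk_len, action_dim)

            # S520: observation + predicted actions -> view model -> best camera view j for the next block
            obs_tokens = control_model.encoder(o_t.unsqueeze(0))
            view_probs = view_model(obs_tokens, actions)
            j = int(view_probs.argmax(dim=-1))

            # S530: execute the predicted actions and move the camera to the predicted optimal view angle
            robot.execute(actions[0])
            camera.move_to(candidate_views[j])
```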
In the embodiments of the invention, label-free optimal view angle selection for the robot is achieved by using the action loss as supervision during imitation learning. By dynamically adjusting the robot's view angle, a single camera can observe the most salient parts of the scene, which avoids the limited field of view of previous single-camera systems and reduces the noise fed into the control model; this achieves performance comparable to a multi-camera system and can even surpass it in some scenarios.
Corresponding to the above-mentioned robot task execution method, the embodiment of the invention also provides a robot task execution device, which is applied to a robot connected with a mobile single-camera system. Referring to a schematic structural view of a robot task performing device shown in fig. 6, the robot task performing device includes:
an obtaining module 601, configured to obtain first observation data of the mobile single camera system under a current viewing angle;
The determining module 602 is configured to determine, according to the first observation data and a preset target task, a target action of the robotic arm and a target viewing angle corresponding to the mobile single-camera system;
the control module 603 is configured to control the mechanical arm to perform the target action, and control the mobile single-camera system to move to the target view angle, so as to complete the target task.
The robot task execution device provided by the embodiment of the invention can acquire the first observation data of the mobile single camera system under the current view angle, determine the target action of the robot arm and the target view angle corresponding to the mobile single camera system according to the first observation data and the preset target task, control the robot arm to execute the target action and control the mobile single camera system to move to the target view angle so as to complete the target task. In the task execution process, the visual angle of the mobile single-camera system can be dynamically adjusted, so that the mobile single-camera system can observe more significant parts, the problem of limited visual field range of the single-camera system is avoided, meanwhile, the noise of observed data is reduced, the decision effect is improved, and the task execution efficiency is improved.
Further, the determining module 602 is specifically configured to predict the action of the current time block of the mechanical arm according to the first observation data and the trained control model to obtain the target action, and to predict the view angle of the next time block of the mobile single-camera system according to the first observation data, the target action and the trained view angle model to obtain the target view angle, where the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
Further, the control model comprises an encoder and a decoder, wherein the encoder adopts a pre-trained multi-view masked autoencoder, the decoder comprises a Transformer model, the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
Further, the view angle model includes a Transformer model and a SoftMax activation function, the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
Further, the robot task execution device further comprises a training module, configured to:
acquiring sample data under a plurality of adjacent time blocks, wherein the sample data comprises an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles;
Based on the sample data, a control model and a visual angle model are obtained through combined training.
The training module is specifically configured to: randomly select, from the sample data, first sample data and second sample data corresponding to adjacent first and second time blocks; input second observation data, randomly selected under a first view angle from the first sample data, into the control model to perform action prediction for the first time block, obtain the first predicted action output by the control model, and update the parameters of the control model according to the first loss between the first predicted action and the mechanical arm action data in the first sample data; input the second observation data and the mechanical arm action data in the first sample data into the view angle model to perform view angle prediction for the second time block and obtain the predicted camera view angle output by the view angle model; input third observation data under the predicted camera view angle in the second sample data into the control model to perform action prediction for the second time block, obtain the second predicted action output by the control model, and update the parameters of the control model according to the second loss between the second predicted action and the mechanical arm action data in the second sample data; update the parameters of the view angle model using the second loss of the second time block as supervision; and re-execute the selection of first and second sample data until the control model and the view angle model converge.
The implementation principle and the generated technical effects of the robot task execution device provided in this embodiment are the same as those of the foregoing embodiment of the robot task execution method, and for the sake of brief description, reference may be made to corresponding contents in the foregoing embodiment of the robot task execution method where the embodiment of the robot task execution device is not mentioned.
As shown in fig. 7, the robot 700 provided in the embodiment of the present invention includes a processor 701, a memory 702, and a bus, where the memory 702 stores a computer program that can be run on the processor 701, and when the robot 700 runs, the processor 701 and the memory 702 communicate with each other through the bus, and the processor 701 executes the computer program to implement the above-mentioned robot task execution method.
Specifically, the memory 702 and the processor 701 can be general-purpose memories and processors, which are not particularly limited herein.
The embodiment of the invention also provides an active visual angle selection system, as shown in fig. 8, which comprises the robot 801, and further comprises a mobile single-camera system 802, wherein the mobile single-camera system 802 is connected with the robot 801.
The active view selection system provided in this embodiment has the same implementation principle and technical effects as those of the foregoing robot task execution method embodiment, and for brevity description, reference may be made to corresponding contents in the foregoing robot task execution method embodiment where the active view selection system embodiment is not mentioned.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored; the computer program, when executed by a processor, performs the robot task execution method of the preceding method embodiments. The computer readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of modules is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.

Claims (9)

1. A robot task execution method, characterized in that it is applied to a robot communicatively connected to a mobile single-camera system; the robot task execution method comprises:
acquiring first observation data of the mobile single-camera system at a current view angle;
determining a target action of the robot's mechanical arm and a target view angle corresponding to the mobile single-camera system according to the first observation data and a preset target task; and
controlling the mechanical arm to execute the target action and controlling the mobile single-camera system to move to the target view angle, so as to complete the target task; wherein the mobile single-camera system changes its view angle by changing its position and/or angle;
wherein determining the target action of the robot's mechanical arm and the target view angle corresponding to the mobile single-camera system according to the first observation data and the preset target task comprises:
performing action prediction of the current time block of the mechanical arm according to the first observation data and a trained control model to obtain the target action; and
performing view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and a trained view angle model to obtain the target view angle;
wherein the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training based on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
2. The robot task execution method according to claim 1, characterized in that the control model comprises an encoder and a decoder, the encoder adopts a pre-trained multi-view masked autoencoder, and the decoder comprises a Transformer model; the encoder is used for extracting features from the input first observation data, and the decoder is used for converting the first target features output by the encoder into the target action.
3. The robot task execution method according to claim 1, characterized in that the view angle model comprises a Transformer model and a SoftMax activation function; the Transformer model is used for extracting features from the input first observation data and the target action, and the SoftMax activation function is used for converting the second target features output by the Transformer model into the target view angle.
4. The robot task execution method according to claim 1, characterized in that the robot task execution method further comprises:
acquiring sample data under a plurality of adjacent time blocks, the sample data comprising an observation data set, mechanical arm position data and mechanical arm action data, the observation data set comprising observation data under a plurality of view angles; and
jointly training the control model and the view angle model based on the sample data.
5. The robot task execution method according to claim 4, characterized in that jointly training the control model and the view angle model based on the sample data comprises:
randomly selecting, from the sample data, first sample data and second sample data corresponding to adjacent first and second time blocks;
inputting second observation data, randomly selected under a first view angle from the first sample data, into the control model to perform action prediction for the first time block, obtaining a first predicted action output by the control model, and updating parameters of the control model according to a first loss between the first predicted action and the mechanical arm action data in the first sample data;
inputting the second observation data and the mechanical arm action data in the first sample data into the view angle model to perform view angle prediction for the second time block, obtaining a predicted camera view angle output by the view angle model; inputting third observation data under the predicted camera view angle in the second sample data into the control model to perform action prediction for the second time block, obtaining a second predicted action output by the control model, and updating parameters of the control model according to a second loss between the second predicted action and the mechanical arm action data in the second sample data;
updating parameters of the view angle model using the second loss of the second time block as supervision; and
re-executing the step of randomly selecting first sample data and second sample data corresponding to adjacent first and second time blocks from the sample data until the control model and the view angle model converge.
6. A robot task execution device, characterized in that it is applied to a robot communicatively connected to a mobile single-camera system; the robot task execution device comprises:
an acquisition module, configured to acquire first observation data of the mobile single-camera system at a current view angle;
a determination module, configured to determine a target action of the robot's mechanical arm and a target view angle corresponding to the mobile single-camera system according to the first observation data and a preset target task; and
a control module, configured to control the mechanical arm to execute the target action and control the mobile single-camera system to move to the target view angle, so as to complete the target task; wherein the mobile single-camera system changes its view angle by changing its position and/or angle;
wherein the determination module is specifically configured to: perform action prediction of the current time block of the mechanical arm according to the first observation data and a trained control model to obtain the target action; and perform view angle prediction of the next time block of the mobile single-camera system according to the first observation data, the target action and a trained view angle model to obtain the target view angle; wherein the control model and the view angle model both correspond to the target task, the control model and the view angle model are obtained by joint training based on sample data under a plurality of adjacent time blocks, the sample data comprise an observation data set, mechanical arm position data and mechanical arm action data, and the observation data set comprises observation data under a plurality of view angles.
7. A robot, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the robot task execution method according to any one of claims 1 to 5.
8. An active view angle selection system, characterized by comprising the robot according to claim 7 and further comprising a mobile single-camera system, the mobile single-camera system being connected to the robot.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the robot task execution method according to any one of claims 1 to 5.
CN202411924317.4A 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system Active CN119567261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411924317.4A CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411924317.4A CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Publications (2)

Publication Number Publication Date
CN119567261A CN119567261A (en) 2025-03-07
CN119567261B true CN119567261B (en) 2025-09-05

Family

ID=94800358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411924317.4A Active CN119567261B (en) 2024-12-25 2024-12-25 Robot task execution method, device, robot and active perspective selection system

Country Status (1)

Country Link
CN (1) CN119567261B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
CN118493383A (en) * 2024-05-15 2024-08-16 上海交通大学 Mechanical arm position visual servo method and system based on cradle head hand-eye camera
CN118552618A (en) * 2024-06-03 2024-08-27 贵州大学 Grasping pose generation method based on convolutional neural network

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109202912B (en) * 2018-11-15 2020-09-11 太原理工大学 Method for registering target contour point cloud based on monocular depth sensor and mechanical arm
CN117980915A (en) * 2021-07-28 2024-05-03 谷歌有限责任公司 Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN114519813A (en) * 2022-02-22 2022-05-20 广东工业大学 Mechanical arm target grabbing method and system
CN116117786A (en) * 2022-09-07 2023-05-16 山东大学 Method and system for planning track of mechanical arm under high visual visibility
CN116690583A (en) * 2023-07-24 2023-09-05 清华大学 Construction and testing method and device of human-computer interaction manipulator grasping and placing system
CN117312992B (en) * 2023-11-30 2024-03-12 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Emotion recognition method and system for fusion of multi-view face features and audio features
CN117428779A (en) * 2023-12-01 2024-01-23 中国农业银行股份有限公司 Robot grabbing control method, device, equipment and storage medium
CN118741573B (en) * 2024-07-26 2025-05-23 广州航海学院 Fault-tolerant method for improving networking robustness of underwater robot

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
CN118493383A (en) * 2024-05-15 2024-08-16 上海交通大学 Mechanical arm position visual servo method and system based on cradle head hand-eye camera
CN118552618A (en) * 2024-06-03 2024-08-27 贵州大学 Grasping pose generation method based on convolutional neural network

Also Published As

Publication number Publication date
CN119567261A (en) 2025-03-07

Similar Documents

Publication Publication Date Title
JP7367233B2 (en) System and method for robust optimization of reinforcement learning based on trajectory-centered models
Wang et al. Equivariant diffusion policy
Breyer et al. Comparing task simplifications to learn closed-loop object picking using deep reinforcement learning
CN115917564A (en) System and method for learning reusable options to transfer knowledge between tasks
CN111890357A (en) An intelligent robot grasping method based on action demonstration and teaching
Zhang et al. Modular deep q networks for sim-to-real transfer of visuo-motor policies
CN118061186A (en) Robot planning method and system based on multi-mode large model predictive control
CN114970826B (en) Multi-agent collaboration method and device based on task representation and teammate perception
US20230153388A1 (en) Method for controlling an agent
CN119748461B (en) Zero sample robot control method, device, terminal and storage medium
CN114355915A (en) AGV path planning based on deep reinforcement learning
CN115990875A (en) A State Prediction and Control System for Flexible Cables Based on Latent Space Interpolation
Hafez et al. Efficient intrinsically motivated robotic grasping with learning-adaptive imagination in latent space
Zakaria et al. Robotic control of the deformation of soft linear objects using deep reinforcement learning
CN118752492A (en) Motion control method for multi-task and multi-robot based on deep reinforcement learning
Li et al. Teleoperation-Driven and Keyframe-Based Generalizable Imitation Learning for Construction Robots
CN119567261B (en) Robot task execution method, device, robot and active perspective selection system
CN119357647B (en) Large model feature fusion Ha Xizi attention method for robot operation
CN119304873A (en) Robot control and model training method, device, equipment and storage medium
CN119610132A (en) Multi-mode large model robot control method based on meta-learning fine tuning
CN119168835A (en) A mechanical arm grasping prediction method, electronic device and storage medium
CN120604238A (en) Open Vocabulary Robot Control Using Multimodal Language Models
WO2023057518A1 (en) Demonstration-driven reinforcement learning
CN116935492A (en) A human action prediction method and device based on graph relationship interactive learning
Coskun et al. Robotic Grasping in Simulation Using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant