CN119319568B - Robotic arm control method, device, equipment and storage medium - Google Patents
- Publication number: CN119319568B (application CN202411855423.1A)
- Authority
- CN
- China
- Prior art keywords
- mechanical arm
- information
- grabbing
- task
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B25J9/1661 — Programme-controlled manipulators; programme controls characterised by programming/planning systems, specifically task planning and object-oriented languages
- B25J19/02 — Accessories fitted to manipulators; sensing devices
- B25J9/161 — Programme controls characterised by the control system; hardware, e.g. neural networks, fuzzy logic, interfaces, processor
- Y02P90/02 — Climate change mitigation technologies in the production of goods; total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The application discloses a mechanical arm control method, device, equipment and storage medium in the technical field of intelligent control. The method comprises: acquiring voice information and image information; analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm; obtaining the grabbing pose of the mechanical arm according to the grabbing task target information; and controlling the mechanical arm to complete the grabbing task based on the mechanical arm grabbing pose controller.
Description
Technical Field
The present application relates to the field of intelligent control technologies, and in particular, to a method, an apparatus, a device, and a storage medium for controlling a mechanical arm.
Background
In recent years, large language model technology has attracted wide attention as an important breakthrough in the field of artificial intelligence. Large models can understand human language input and generate high-quality replies, opening new possibilities for human-computer interaction. With the development of the field, multi-modal large models have achieved a new breakthrough in task understanding: they can not only understand complex natural-language instructions but also process picture information, analyze environmental scenes, infer task targets, and further generate new sentences and even code. By combining a multi-modal large language model, a robot can execute a series of complex, highly intelligent tasks, greatly improving the naturalness and fluency of human-computer interaction. Although the related technology has received wide attention and application in the field of embodied intelligence, current research results remain limited to laboratory scenes, and large-scale adoption in daily life and commercialization are still far off.
At present, the overall task execution process of a robot mainly comprises high-level strategy formulation at the top layer, sensor perception of the environment at the bottom layer, and controller execution bridging the two; these have long been hot topics in academia, and the advent of ChatGPT has opened new possibilities for the interaction between the robot's top-layer and bottom-layer paradigms. However, control of the mechanical arm grabbing task is still not intelligent enough: operation requires an explicit instruction, and this poor degree of intelligence prevents the grabbing task from being applied well in practice.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present application and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The application mainly aims to provide a mechanical arm control method, device, equipment, and storage medium, to solve the technical problem that the existing mechanical arm grabbing task requires an explicit instruction, has a poor degree of intelligence, and cannot be applied well in practice.
In order to achieve the above object, the present application provides a method for controlling a mechanical arm, the method comprising:
acquiring voice information and image information;
analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm;
obtaining the grabbing pose of the mechanical arm according to the grabbing task target information; and
controlling the mechanical arm to complete a grabbing task based on the mechanical arm grabbing pose controller.
In an embodiment, the analyzing the voice information and the image information based on the large language model to obtain the grabbing task target information of the mechanical arm includes:
identifying keyword information from the voice information, and performing color extraction and depth-information extraction on the image information to obtain a supplementary constraint;
matching the voice information with the image information based on the supplementary constraint to obtain a target object; and
obtaining task target information based on the keyword information and the image information, wherein the task target information comprises the target object category, name, and camera coordinate information.
In an embodiment, the obtaining the gripper pose of the mechanical arm according to the gripper task target information includes:
extracting a reference image of the target object according to the camera coordinate information in the grabbing task target information; and
obtaining the grabbing pose of the mechanical arm according to the reference image and the camera coordinate information in the grabbing task target information.
In an embodiment, the controlling the mechanical arm to complete the grabbing task according to the grabbing pose of the mechanical arm includes:
acquiring a preset hand-eye calibration matrix and a preset inverse kinematics matrix of the mechanical arm;
performing coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain the mechanical arm base coordinates of the grabbing pose; and
controlling the mechanical arm to execute a grabbing action based on the mechanical arm base coordinates so as to complete the grabbing task.
In an embodiment, the performing coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain a mechanical arm base coordinate of the mechanical arm grabbing pose includes:
converting the grabbing pose of the mechanical arm into mechanical arm space coordinates according to the preset hand-eye calibration matrix; and
converting the mechanical arm space coordinates into the mechanical arm base coordinates under the mechanical arm base coordinate system according to the preset inverse kinematics matrix.
In an embodiment, the controlling the mechanical arm to perform the grabbing action based on the mechanical arm base coordinates includes:
acquiring the current state of the mechanical arm, and acquiring an expected state according to the base coordinates of the mechanical arm;
generating an expected track according to the current state, the expected state and a mechanical arm dynamics model;
and controlling the mechanical arm to execute grabbing action based on the expected track.
In an embodiment, the controlling the mechanical arm to perform the grabbing action based on the desired track includes:
constructing a control law on a sliding mode surface, with an adaptive neural network fitting the time-varying output constraint; and
controlling the mechanical arm to execute the grabbing action according to the time-varying output constraint, the control law, and the desired track.
In addition, in order to achieve the above object, the present application also provides a robot arm control device, including:
the acquisition module is used for acquiring voice information and image information;
the model analysis module is used for analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm;
the mechanical arm control module is used for obtaining the mechanical arm grabbing pose according to the grabbing task target information;
the mechanical arm control module is further used for controlling the mechanical arm to complete the grabbing task based on the mechanical arm grabbing pose controller.
In addition, in order to achieve the above object, the application also proposes mechanical arm control equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the mechanical arm control method as described above.
In addition, in order to achieve the above object, the present application also proposes a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the mechanical arm control method as described above.
Furthermore, to achieve the above object, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the robot arm control method as described above.
One or more technical solutions provided by the application have at least the following technical effects:
external voice information and an environment image are analyzed through a large language model, the matched target in the voice information and the environment image is understood, and a grabbing task is generated automatically according to the position of the matched target in the image; by fully utilizing the large language model to process task input information and identify the task target to generate grabbing task information, the success rate of the grabbing task is improved and the range of task scenes is widened.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a first embodiment of a method for controlling a mechanical arm according to the present application;
Fig. 2 is a practical display diagram of a mechanical arm acquiring voice information and image information according to a first embodiment of the mechanical arm control method of the present application;
FIG. 3 is a schematic diagram illustrating a robotic arm performing a grabbing action based on a target task according to a first embodiment of a method for controlling a robotic arm of the present application;
FIG. 4 is a schematic diagram of an overall control logic architecture of a robot according to a first embodiment of the present application;
FIG. 5 is a schematic flow chart of a second embodiment of a method for controlling a mechanical arm according to the present application;
FIG. 6 is a schematic diagram illustrating a variation of an input signal of a mechanical arm in a control process according to a second embodiment of the present application;
FIG. 7 is a schematic diagram showing a joint output error according to a second embodiment of the present application;
FIG. 8 is a schematic diagram showing whether to compensate for hysteresis effects according to a second embodiment of the present application;
FIG. 9 is a schematic diagram illustrating the influence of different parameters η on convergence time when a position error e11 is provided in a second embodiment of the mechanical arm control method of the present application;
FIG. 10 is a schematic diagram illustrating the influence of different parameters η on convergence time when a position error e12 is provided in a second embodiment of the mechanical arm control method of the present application;
FIG. 11 is a schematic block diagram of a mechanical arm control device according to an embodiment of the present application;
Fig. 12 is a schematic device structure diagram of a hardware operating environment related to a control method of a mechanical arm in an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the technical solution of the present application and are not intended to limit the present application.
For a better understanding of the technical solution of the present application, the following detailed description will be given with reference to the drawings and the specific embodiments.
The method comprises: acquiring voice information and image information; analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm; obtaining the grabbing pose of the mechanical arm according to the grabbing task target information; and controlling the mechanical arm to complete the grabbing task based on the mechanical arm grabbing pose controller.
In this embodiment, for convenience of description, the mechanical arm control device is taken as the execution subject in the following description.
In recent years, large language model technology has attracted wide attention as an important breakthrough in the field of artificial intelligence. Large models can understand human language input and generate high-quality replies, opening new possibilities for human-computer interaction. With the development of the field, multi-modal large models have achieved a new breakthrough in task understanding: they can not only understand complex natural-language instructions but also process picture information, analyze environmental scenes, infer task targets, and further generate new sentences and even code. By combining a multi-modal large language model, a robot can execute a series of complex, highly intelligent tasks, greatly improving the naturalness and fluency of human-computer interaction. Although the related technology has received wide attention and application in the field of embodied intelligence, current research results remain limited to laboratory scenes, and large-scale adoption in daily life and commercialization are still far off.
At present, the overall task execution process of a robot mainly comprises high-level strategy formulation at the top layer, sensor perception of the environment at the bottom layer, and controller execution bridging the two; these have long been hot topics in academia, and the advent of ChatGPT has opened new possibilities for the interaction between the robot's top-layer and bottom-layer paradigms. However, control of the mechanical arm grabbing task is still not intelligent enough: operation requires an explicit instruction, and this poor degree of intelligence prevents the grabbing task from being applied well in practice.
The application provides a solution: the acquired external voice information and environment image are analyzed by a large language model, the matched target in the voice information and environment image is understood, and the grabbing task is generated automatically according to the position of the matched target in the image. The large language model is fully utilized to process the task input information and identify the task target to generate grabbing task information, improving the success rate of the grabbing task and widening the range of task scenes.
On this basis, the application discloses a mechanical arm control method, device, equipment and storage medium in the technical field of intelligent control, comprising: acquiring voice information and image information; analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm; obtaining the grabbing pose of the mechanical arm according to the grabbing task target information; and controlling the mechanical arm to complete the grabbing task based on the mechanical arm grabbing pose controller.
It should be noted that the execution body of this embodiment may be a computing service device having data processing, network communication, and program running functions, such as a tablet computer, a personal computer, or a mobile phone, or an electronic device or mechanical arm control device capable of implementing the above functions. This embodiment and the following embodiments are described by taking the mechanical arm control device as an example.
Based on this, an embodiment of the present application provides a method for controlling a mechanical arm, and referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the method for controlling a mechanical arm according to the present application.
In this embodiment, the mechanical arm control method includes steps S10 to S40:
step S10, voice information and image information are acquired.
It should be noted that the voice information may be the user's speech addressed to the mechanical arm; external human voice is captured in real time and taken as input information.
It should be emphasized that the image information may be an environment image acquired in real time by the image acquisition device of the mechanical arm; this environment image is taken as the image information.
It should be further noted that the mechanical arm may capture the voice of surrounding people and the surrounding environment in real time simultaneously, or may trigger the image acquisition device only after recognizing voice information and then acquire the surrounding environment image.
And step S20, analyzing the voice information and the image information based on a large language model to obtain the grabbing task target information of the mechanical arm.
It is understood that large language models (LLMs) are deep learning models trained on large amounts of text data. These models typically have billions of parameters, enabling them to understand and generate human language and to perform tasks such as translation, summarization, question answering, and dialogue. With growing computing resources and algorithmic advances, large language models have achieved significant results in the field of natural language processing (NLP).
It should be noted that the overall task execution process of a robot mainly comprises high-level strategy formulation at the top layer, sensor perception of the environment at the bottom layer, and controller execution; the related technologies are all hot topics in academia. With the advent of ChatGPT, new possibilities arose for the interaction pattern between the robot's top-layer and bottom-layer paradigms. At the task-aware planning level, discussion of the application capabilities of large language models has unfolded globally. A multi-modal large language model can receive pictures as input and performs excellently in picture understanding. Considering the task scene of object grabbing, task-oriented grasp prediction can be constructed with the task target as a guide, using the task understanding capability and rich semantic knowledge of the large language model; in some task scenes, the robot can better imitate real human thinking and operation by means of the large language model, greatly improving task executability. Using the large language model to optimize data analysis and decision-making also greatly expands, for example, the task scenes of unmanned aerial vehicles, enabling autonomous information collection and control in disaster scenes and scenes where communication is difficult.
It should be understood that the voice information and the image information may be input into a large language model; the trained large language model generates, from the voice information and the image information, a control task/control instruction that the mechanical arm can understand, so as to control the mechanical arm to execute the grabbing task.
The grabbing task target information may be information such as the position of the target object and an image of the target object.
In one possible embodiment, step S20 may include steps a21 to a23:
And step A21, recognizing keyword information according to the voice information, and performing color extraction and depth information extraction on the image information to obtain supplementary constraint.
It is understood that the keyword information may be the information in the voice related to the grabbing task; for example, the voice information may be "help me sort a file", in which case the keyword may be understood as "file".
In a specific implementation, the supplementary constraint may be information that grounds the keyword information in the image information, such as the picture specification, the coordinate origin, and the output-format specification.
In implementation, the input signal of the grabbing task is human voice. A trigger keyword is set first; when keyword information is recognized, the operator's voice signal is converted into text by a voice recognition module and stored as part of the text input to the large model. The voice recognition module is monitored, and when it is activated, the vision equipment of the mechanical arm is triggered to photograph the task scene; RGB-D information is extracted from the photographed picture, which is finally transmitted to the multi-modal large model as its picture input.
It should be emphasized that, with the voice information and the image information serving as input to the large language model, the prompt must be further supplemented in order to guarantee consistent model output and relatively stable grabbing performance: the picture specification, the coordinate origin, and the output-format specification need to be added to the prompt in combination with the specific camera and mechanical arm equipment of the system.
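Purely as an illustrative sketch — the trigger keyword, the ASR backend, and the prompt template below are assumptions for exposition, not part of the disclosed embodiment — the keyword-triggered capture and prompt assembly described above might be organized as follows:

```python
import speech_recognition as sr  # assumption: any ASR backend could be substituted

TRIGGER_WORD = "robot"  # hypothetical trigger keyword

def listen_for_command(recognizer: sr.Recognizer, mic: sr.Microphone) -> str:
    """Capture ambient speech and return the recognized text ('' on failure)."""
    with mic as source:
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""

def build_prompt(command_text: str, width: int, height: int) -> str:
    """Assemble the supplementary-constraint prompt: picture specification,
    coordinate origin, and output-format specification are appended so the
    multimodal model returns consistent, parseable results."""
    return (
        f"Task instruction: {command_text}\n"
        f"Image specification: {width}x{height} RGB-D, origin at the top-left pixel.\n"
        "Output format: JSON with fields 'category', 'name', and 'bbox' "
        "(pixel coordinates [x1, y1, x2, y2])."
    )
```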
And step A22, matching the voice information with the image information based on the supplementary constraint to obtain a target object.
It is understood that processing the image information may mean recognizing the object categories and object coordinates in the picture.
It should be noted that matching the voice information with the image information based on the supplementary constraint may mean matching an object in the image with a keyword in the voice.
It should be noted that large language model training is already mature: such a model can quickly extract information from images, extract keywords from text content, and match the two. This embodiment does not limit how the large language model is trained, which is not described here.
It is understood that the target object may be the object in the image with the highest degree of matching to the voice information.
And step A23, obtaining task target information based on the keyword information and the image information, wherein the task target information comprises target object category, name and camera coordinate information.
In implementation, the large model analyzes the voice information input by the operator according to the recognized text information and the supplementary constraint, judges the task target, and identifies the target object that best fits the voice input. After the target task is judged, object matching is performed according to the image information provided by the vision module; if object information meeting the task requirement exists, a target bounding box is delineated, the geometric center point coordinates of the object are calculated from the bounding box and the picture information, and the obtained information is output.
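A minimal sketch of the center-point computation, assuming the large model returns a pixel bounding box and that pinhole intrinsics fx, fy, cx, cy are available from camera calibration (both assumptions, not details given in the patent):

```python
import numpy as np

def bbox_center_to_camera_frame(bbox, depth_map, fx, fy, cx, cy):
    """Back-project the geometric center of a bounding box to 3-D camera
    coordinates using the pinhole model and the aligned depth image."""
    x1, y1, x2, y2 = bbox
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # pixel-space geometric center
    z = float(depth_map[int(v), int(u)])       # depth at the center pixel (meters)
    x = (u - cx) * z / fx                      # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```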
In this embodiment, the grabbing target of the mechanical arm is determined by matching the voice information and the image information input into the large language model, so that the user's requirement can be identified more flexibly and the task target determined from it, allowing grabbing tasks to be executed flexibly in more scenes.
The above is only a possible implementation of step S20 provided in this embodiment, and the specific implementation of step S20 in this embodiment is not specifically limited.
And step S30, obtaining the grabbing pose of the mechanical arm according to the grabbing task target information.
It can be understood that the mechanical arm grabbing pose may be a parameter the mechanical arm can interpret, for example a quaternion array representing the specific pose of the mechanical arm end-effector in space.
In one possible embodiment, step S30 may include steps a31 to a32:
And step A31, extracting a reference image of the target object according to the camera coordinate information of the grabbing task target information.
It is understood that the reference image may be an image of the target object cropped from the image information.
It should be noted that cropping the image of the target object separately allows the mechanical arm to recognize it accurately in the environment image, rather than merely marking the target inside the full image; this avoids unclear marking and wrong grabbing caused by several objects lying too close together.
And step A32, obtaining the grabbing pose of the mechanical arm according to the reference image and the camera coordinate information in the grabbing task target information.
It should be noted that obtaining the mechanical arm grabbing pose from the reference image and the camera coordinate information may proceed as follows: according to the target object coordinate information returned by the large model, the trained grasp neural network computes, from the image of the surrounding environment photographed by the vision equipment, the optimal grabbing pose for the object at the specified coordinate point, and returns a quaternion array representing the specific pose of the mechanical arm end-effector in space.
It is worth noting that, while the trained grasp neural network generates the mechanical arm grabbing pose from the position of the target object in the image information, the neural network weights can be updated to continuously improve the accuracy of the generated pose and raise the grabbing success rate of the mechanical arm.
It should be emphasized that the neural network weight update law can be designed with reference to the following: R is a positive diagonal matrix, S(Z) is the radial basis function vector for network input Z, σ is a gain coefficient, and the hysteresis parameters are set to fixed constants.
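The update formula itself appears only as an image in the original text. With the definitions just given, a conventional σ-modification RBF weight update law of the kind these symbols suggest is shown below as an assumed reconstruction, not the patent's verbatim formula:

```latex
\dot{\hat{W}} \;=\; R\left( S(Z)\, e^{\mathsf{T}} \;-\; \sigma \hat{W} \right)
```

where e denotes the tracking error; the σ-term keeps the weight estimates bounded during adaptation.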
In this embodiment, the grabbing task implied in the user's voice information is identified through the large language model, the position of the task target is further resolved from the image information, and the mechanical arm grabbing pose is generated from the target position by the neural network, so that the target object is grabbed accurately.
The above is only a possible implementation of step S30 provided in this embodiment, and the specific implementation of step S30 in this embodiment is not specifically limited.
And step S40, controlling the mechanical arm to finish the grabbing task based on the mechanical arm grabbing pose controller.
It should be noted that, after the large model finishes task processing and the grasp neural network finishes its calculation, the motion flow of the mechanical arm is triggered in the form of topic messages. The mechanical arm obtains the coordinate information carried by the topic and, combining the hand-eye calibration matrix and the mechanical arm inverse kinematics matrix, converts the coordinates in the task plane space into position information in the mechanical arm base coordinate system.
It is emphasized that, starting from the observation pose and following the coordinate information, the mechanical arm end mechanism moves above the grabbing target, then approaches the target and moves precisely to the grabbing position so that the end mechanism reaches the optimal grabbing pose. After reaching the grabbing pose, the end actuator closes to grasp the object, the mechanical arm continues to the terminal pose, and the end jaw opens. After the object is placed at the target location, the mechanical arm returns to the initial observation pose and waits for the next topic command.
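The staged motion just described could be organized as in the following sketch; `arm` and its methods are a hypothetical driver interface, not an API disclosed in the patent:

```python
def execute_grasp_sequence(arm, grasp_pose, place_pose, observe_pose, hover=0.10):
    """Staged grasp flow: hover above the target, descend to the optimal grasp
    pose, close the end actuator, carry to the terminal pose, release, return."""
    arm.move_to(grasp_pose.translated(z=hover))  # move above the grabbing target
    arm.move_to(grasp_pose)                      # approach and reach the grasp pose
    arm.close_gripper()                          # close end actuator to grasp
    arm.move_to(place_pose)                      # continue to the terminal pose
    arm.open_gripper()                           # open the end jaw to release
    arm.move_to(observe_pose)                    # return to the observation pose
```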
In a specific implementation, an actual display diagram of the robotic arm acquiring voice information and image information may refer to fig. 2, and a schematic diagram of the robotic arm performing a grabbing action based on a target task may refer to fig. 3.
It should be noted that the overall control logic architecture of the mechanical arm can refer to fig. 4. Coordinate-system conversion training is performed according to the kinematic model of the mechanical arm to train the adaptive neural network controller. When a keyword is present in the operator's or user's voice, the voice is input to the large language model and the keyword simultaneously triggers the vision module to acquire an environment image. The environment image is transmitted to the grasp-model neural network and is also input to the large language model to supplement the user's voice information, yielding a grabbing target based on the user's intention. Based on the grabbing target, the grabbing track of the mechanical arm is generated through the adaptive neural network controller and the grasp-model neural network — the motion planning in the figure — and the grabbing command is executed based on that motion planning.
This embodiment provides a mechanical arm control method in which external voice information and an environment image are analyzed by a large language model, the matched target in them is understood, and a grabbing task is generated automatically according to the position of the matched target in the image. Task input information is processed by fully utilizing the large language model, and the task target is identified to generate grabbing task information, improving the success rate of the grabbing task and widening the range of task scenes.
In the second embodiment of the present application, the same or similar content as in the first embodiment of the present application may be referred to the above description, and will not be repeated. On this basis, referring to fig. 5, step S40 further includes steps S41 to S43:
step S41, a preset hand-eye calibration matrix and a preset inverse kinematics matrix of the mechanical arm are obtained.
It can be understood that the mechanical arm involves three coordinate systems: the coordinate system of the mechanical arm image acquisition device, the coordinate system of the gripper at the end of the mechanical arm, and the coordinate system of the mechanical arm base.
It should be understood that the preset hand-eye calibration matrix may be the transformation matrix between the coordinate system of the image acquisition device and the coordinate system of the end gripper, and the inverse kinematics matrix may be the transformation matrix between the coordinate system of the end gripper and the coordinate system of the base.
And S42, carrying out coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain the mechanical arm base coordinates of the mechanical arm grabbing pose.
It is understood that the mechanical arm base coordinates of the grabbing pose refer to that pose expressed in the coordinate system of the mechanical arm base.
In one possible implementation, step S42 may include steps a421 to a422:
And step A421, converting the grabbing pose of the mechanical arm into mechanical arm space coordinates according to the preset hand-eye calibration matrix.
It should be understood that the mechanical arm space coordinates may be the coordinates at which the end of the mechanical arm is located.
It can be appreciated that hand-eye calibration is a key technology in robot vision, used to determine the relative positional relationship between the camera (eye) and the manipulator end-effector (hand); with the preset hand-eye calibration matrix, points in the camera coordinate system can be converted into the manipulator's coordinate system.
It should be noted that the mechanical arm needs to know how to adjust its posture to grasp the object correctly. The pose (rotation) of the target object in the mechanical arm base coordinate system therefore still needs to be calculated, which the hand-eye calibration matrix makes possible. According to the calculated position and pose of the target object in the base coordinate system, corresponding instructions are sent to the mechanical arm controller so that the arm moves to the designated position and adjusts to the correct pose, completing the grabbing action.
And step A422, converting the space coordinates of the mechanical arm into the base coordinates of the mechanical arm under the base coordinate system of the mechanical arm according to the preset inverse kinematics matrix.
It may be appreciated that converting the mechanical arm space coordinates into the mechanical arm base coordinates according to the preset inverse kinematics matrix may mean converting the point expressed in the arm space coordinates into the mechanical arm base coordinate system.
The coordinates of the target object in the image are thus converted, by two coordinate transformations, into spatial coordinates in the mechanical arm base coordinate system, and a control track is generated and the grabbing action executed based on those coordinates.
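In homogeneous coordinates, the two-stage conversion can be written compactly; the transform symbols below are notational assumptions rather than symbols taken from the patent:

```latex
\tilde{p}_{\mathrm{base}} \;=\; T^{\mathrm{base}}_{\mathrm{ee}}\; T^{\mathrm{ee}}_{\mathrm{cam}}\; \tilde{p}_{\mathrm{cam}},
\qquad \tilde{p} = \begin{bmatrix} x & y & z & 1 \end{bmatrix}^{\mathsf{T}}
```

Here T^ee_cam plays the role of the preset hand-eye calibration matrix and T^base_ee that of the end-to-base transform obtained from the arm's kinematics.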
In this embodiment, the position of the target object in the image is converted into spatial coordinates in the mechanical arm base coordinate system by two spatial transformations, and a grabbing action track satisfying the grabbing task is generated by the neural network from those coordinates, thereby completing the grabbing action on the target object.
The above is only a possible implementation of step S42 provided in this embodiment, and the specific implementation of step S42 is not specifically limited in this embodiment.
And step S43, controlling the mechanical arm to execute a grabbing action based on the mechanical arm base coordinates so as to complete a grabbing task.
It will be appreciated that a collision-free path to the target location is planned based on the mechanical arm base coordinates; RRT (Rapidly-exploring Random Trees), the A* algorithm, or other path-planning techniques may be employed.
Further, the result of the path planning is converted into a time sequence of joint angles or a position sequence of the end-effector, i.e., a specific motion track is generated.
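As a minimal sketch of this conversion step — the function and its parameters are illustrative assumptions, not taken from the patent — planned joint-space waypoints can be turned into a timed trajectory by bounding the per-step joint velocity:

```python
import numpy as np

def time_parameterize(waypoints: np.ndarray, max_joint_vel: float, dt: float = 0.01):
    """Convert planned joint-space waypoints (N x DOF) into a dt-spaced sequence
    whose per-step motion never exceeds max_joint_vel * dt on any joint."""
    trajectory = [waypoints[0]]
    for target in waypoints[1:]:
        while np.linalg.norm(target - trajectory[-1], np.inf) > max_joint_vel * dt:
            step = np.clip(target - trajectory[-1],
                           -max_joint_vel * dt, max_joint_vel * dt)
            trajectory.append(trajectory[-1] + step)
        trajectory.append(target)
    return np.asarray(trajectory)
```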
In one possible implementation, step S43 may include steps a431 to a433:
and step A431, acquiring the current state of the mechanical arm, and acquiring the expected state according to the base coordinates of the mechanical arm.
It should be noted that the desired state may be the state corresponding to the grabbing action, i.e., the spatial coordinates, in the mechanical arm base coordinate system, that correspond to the mechanical arm grabbing pose.
It is understood that the current state of the mechanical arm may be the state corresponding to the spatial coordinates of the arm's current pose in the base coordinate system.
And step A432, generating a desired track according to the current state, the desired state and a mechanical arm dynamics model.
It can be understood that the mechanical arm model parameters and the moments of inertia of the arm can be obtained, and the mechanical arm dynamics model can be constructed from them.
It should be noted that the system parameter matrices in the dynamics model are the mass matrix M, the Coriolis and centrifugal force matrix C, and the gravity vector G. Together, these matrices and vectors describe the forces and moments experienced by the mechanical arm during motion, which is essential for accurate control and simulation.
Further, the simulated mechanical arm model parameters are: link 1 mass m1 = 2.00 kg and length l1 = 0.35 m; link 2 mass m2 = 0.85 kg and length l2 = 0.31 m. I1 and I2 are the moments of inertia of the two links, respectively.
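The explicit entries of M, C, and G are given as formula images in the original and are not reproduced here; a two-link arm with the parameters above obeys the standard rigid-body form these definitions imply, shown only for reference:

```latex
M(q)\,\ddot{q} \;+\; C(q,\dot{q})\,\dot{q} \;+\; G(q) \;=\; \tau
```

where q is the joint angle vector and τ the vector of joint torques.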
And step A433, controlling the mechanical arm to execute a grabbing action based on the expected track so as to complete the grabbing task.
Controlling the mechanical arm to execute the grabbing action based on the desired track to complete the grabbing task may include: constructing a control law on a sliding mode surface, with an adaptive neural network fitting the time-varying output constraint; and controlling the mechanical arm to execute the grabbing action according to the time-varying output constraint, the control law, and the desired track.
It should be noted that a set of kinematic and dynamic models of the isomorphic mechanical arm is constructed together with a plurality of desired Cartesian-space reference trajectories, each defined as an execution state. The spatial track coordinates in the Cartesian coordinate system are converted into joint-space coordinates according to the constructed kinematic model, and the controller is then designed on this basis, adopting a controller form under constraint conditions designed with the adaptive neural network algorithm.
It can be understood that, in this embodiment, the mechanical arm dynamics model is constructed for the arm model, a mathematical description of the hysteresis effect is introduced at the same time, and the repeatability of the grabbing motion is considered.
It should be noted that, since the track generated by the trained neural network is a motion track based on a dynamics model that accounts for the hysteresis effect of the mechanical arm, a fast sliding mode is constructed on the sliding mode surface to adjust the control law of the arm. Meanwhile, the corresponding running track is generated considering the time-varying output constraint and the control law of the mechanical arm.
It is emphasized that, considering the repeatability of the grabbing motion, the mechanical arm dynamics model with the hysteresis effect introduced is described in terms of the joint angle, velocity, and acceleration vectors; the positive-definite inertia matrix, the Coriolis and centrifugal force matrix, and the gravity matrix; and the system input signal, which is affected by the hysteresis effect. Simplified symbols are used for these quantities in the subsequent equations. The output signal under the hysteresis effect is expressed through a hysteresis operator whose parameter is a positive constant; when the stated condition on this parameter holds, the hysteresis equation can be solved, and the hysteresis characteristic can then be rewritten as a linear term in the input plus a bounded residual.
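The hysteresis equations were likewise formula images in the original. A widely used backlash-like hysteresis model matching the surrounding description — ζ a positive constant making the differential equation solvable, after which the output splits into a linear term plus a bounded residual — is shown here as an assumed stand-in, not the patent's verbatim model:

```latex
\frac{dw}{dt} \;=\; \zeta \left| \frac{du}{dt} \right| \left( c\,u - w \right) \;+\; B_1 \frac{du}{dt},
\qquad \zeta > 0,\; c > B_1 > 0
```

whose solution can be rewritten as w(t) = c u(t) + d(u) with d(u) bounded.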
Further, from the model information obtained above, the dynamics model is rewritten in error coordinates: a position error and a velocity error are defined with respect to the desired trajectory, and a virtual control law with gain factor K1 is designed. The output error e1 under consideration is subject to a time-varying output constraint, i.e., it must remain within a prescribed time-varying bound at all times.
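The error definitions, virtual control law, and constraint were given as formulas in the original and are not reproduced in this text; a standard backstepping form consistent with the description above — an illustrative reconstruction, not the patent's verbatim equations — reads:

```latex
e_1 = q - q_d, \qquad
\alpha = \dot{q}_d - K_1 e_1, \qquad
e_2 = \dot{q} - \alpha, \qquad
|e_{1,i}(t)| < k_{b,i}(t)
```

with q_d the desired joint trajectory, K1 a positive-definite gain, and k_b(t) the time-varying output bound.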
Further, the control law is constructed by the fast sliding mode method: a sliding mode surface is built from the tracking errors using a gain coefficient and a positive-definite matrix, where p and q are positive odd numbers satisfying the fast-terminal-sliding-mode condition. From this surface the final concrete control law is obtained, expressed with the Jacobian matrix of the mechanical arm, and the arm is driven to track its desired track.
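The sliding surface and final control law also appear only as images in the original. A generic fast terminal sliding mode structure matching the stated conditions on p and q, with RBF compensation of the hysteresis term, is sketched below purely for orientation; β1, β2, K2, and η are assumed gain symbols, and the patent's actual law may differ:

```latex
s = e_2 + \beta_1 e_1 + \beta_2\, e_1^{\,p/q}, \qquad p,\,q \text{ positive odd},\; 0 < \tfrac{p}{q} < 1,
\qquad
\tau = -K_2\, s \;-\; \hat{W}^{\mathsf{T}} S(Z) \;-\; \eta\, \operatorname{sgn}(s)
```

The parameter η here corresponds to the convergence-time parameter studied in figs. 9 and 10.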
In a specific implementation, fig. 6 shows the variation of the mechanical arm input signal during control, and fig. 7 compares the joint output errors: relative to a controller designed from a quadratic Lyapunov function, combining the barrier Lyapunov function further improves the output performance of the arm. Fig. 8 compares performance with and without hysteresis compensation; compensating the hysteresis signal with the neural network further improves the stability of the system and its overall accuracy. Figs. 9 and 10 show the effect of different parameters η on the system convergence time; the specific parameter settings are shown in Table 1:
TABLE 1
In this embodiment, the neural network that generates the grabbing path is optimized and trained with respect to the hysteresis effect of the mechanical arm and the time-varying output constraint, yielding a mechanical arm grabbing path that satisfies the grabbing task requirements.
The above is only a possible implementation of step S43 provided in this embodiment, and the specific implementation of step S43 in this embodiment is not specifically limited.
This embodiment provides a mechanical arm control method in which, for a controller subject to a time-varying output constraint and a hysteresis effect, an adaptive neural network is constructed to fit the unknown constraint, so that a grabbing path satisfying the task requirements is generated; on this basis, the fast terminal sliding mode technique is combined to improve the response speed of the system output signal while guaranteeing the safety and stability of the system.
It should be noted that the foregoing examples are only for understanding the present application, and are not meant to limit the control method of the mechanical arm of the present application, and more forms of simple transformation based on the technical concept are all within the scope of the present application.
The present application also provides a mechanical arm control device, referring to fig. 11, the mechanical arm control device includes:
an acquisition module 10 for acquiring voice information and image information;
The model analysis module 20 is configured to analyze the voice information and the image information based on a large language model, so as to obtain grabbing task target information of the mechanical arm;
The mechanical arm control module 30 is configured to obtain a mechanical arm grabbing pose according to the grabbing task target information;
the mechanical arm control module 30 is further configured to control the mechanical arm to complete a grabbing task based on the mechanical arm grabbing pose controller.
The mechanical arm control device provided by the application adopts the mechanical arm control method of the above embodiment, and can solve the technical problem that the existing mechanical arm grabbing task requires an explicit instruction, has a poor degree of intelligence, and cannot be applied well in practice. Compared with the prior art, the beneficial effects of the mechanical arm control device are the same as those of the mechanical arm control method provided by the above embodiment, and the other technical features of the device are the same as those disclosed in the method of the embodiment, which are not repeated here.
The model analysis module 20 is further configured to identify keyword information according to the voice information, perform color extraction and depth information extraction on the image information, and obtain a supplemental constraint;
matching the voice information with the image information based on the supplementary constraint to obtain a target object;
And obtaining task target information based on the keyword information and the image information, wherein the task target information comprises target object category, name and camera coordinate information.
The mechanical arm control module 30 is further configured to extract a reference image of the target object according to camera coordinate information of the capturing task target information;
and obtaining the grabbing pose of the mechanical arm according to the reference image and the camera coordinate information in the grabbing task target information.
The mechanical arm control module 30 is further configured to obtain a preset hand-eye calibration matrix and a preset inverse kinematics matrix of the mechanical arm;
Performing coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain mechanical arm base coordinates of the mechanical arm grabbing pose;
And controlling the mechanical arm to execute a grabbing action based on the mechanical arm base coordinates so as to complete a grabbing task.
The mechanical arm control module 30 is further configured to convert the mechanical arm grabbing pose into mechanical arm space coordinates according to the preset hand-eye calibration matrix;
and converting the space coordinates of the mechanical arm into the base coordinates of the mechanical arm under the base coordinate system of the mechanical arm according to the preset inverse kinematics matrix.
The mechanical arm control module 30 is further configured to obtain a current state of the mechanical arm, and obtain a desired state according to the base coordinates of the mechanical arm;
generating an expected track according to the current state, the expected state and a mechanical arm dynamics model;
And controlling the mechanical arm to execute a grabbing action based on the expected track so as to complete grabbing tasks.
The mechanical arm control module 30 is further configured to construct a control law on a sliding mode surface, with an adaptive neural network fitting the time-varying output constraint;
and to control the mechanical arm to execute the grabbing action according to the time-varying output constraint, the control law, and the desired track.
The application provides mechanical arm control equipment which comprises at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the mechanical arm control method in the first embodiment.
Referring now to fig. 12, a schematic diagram of a robot arm control device suitable for implementing embodiments of the present application is shown. The robot arm control device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The robot arm control apparatus shown in fig. 12 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 12, the robot arm control apparatus may include a processing device 1001 (e.g., a central processing unit or graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data necessary for the operation of the robot arm control apparatus. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005, and an input/output (I/O) interface 1006 is also connected to the bus. In general, the following may be connected to the I/O interface 1006: input devices 1007 such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; output devices 1008 including a liquid crystal display (LCD), speakers, and vibrators; storage devices 1003 including magnetic tape and hard disk; and a communication device 1009. The communication device 1009 may allow the robot arm control device to communicate wirelessly or by wire with other devices to exchange data. While the figure shows a robot arm control device with various systems, it should be understood that not all of the illustrated systems are required to be implemented or provided; more or fewer systems may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1009, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above-described functions defined in the method of the embodiments of the present application are performed.
By adopting the mechanical arm control method of the foregoing embodiment, the mechanical arm control equipment provided by the present application can solve the technical problems that existing mechanical arm grabbing tasks require explicit instructions, exhibit a poor degree of intelligence, and cannot be readily applied in practice. Compared with the prior art, the beneficial effects of the mechanical arm control equipment provided by the present application are the same as those of the mechanical arm control method provided by the foregoing embodiment, and its other technical features are the same as those disclosed in the method of the foregoing embodiment, which are not repeated here.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon for performing the mechanical arm control method of the above-described embodiments.
The computer readable storage medium provided by the present application may be, for example, a USB flash drive, but is not limited thereto; it may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, optical fiber cable, RF (Radio Frequency), or any suitable combination of the foregoing.
The computer readable storage medium may be included in the mechanical arm control equipment, or may exist separately without being incorporated into the mechanical arm control equipment.
The computer readable storage medium carries one or more programs which, when executed by the mechanical arm control equipment, cause the equipment to: acquire voice information and image information; analyze the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm; obtain the mechanical arm grabbing pose according to the grabbing task target information; and control the mechanical arm, based on the mechanical arm grabbing pose, to complete the grabbing task.
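Read as pseudocode, the flow these program instructions describe could be sketched as below; `parser`, `pose_estimator`, and `arm` are hypothetical stand-ins for the large language model front end, the grasp-pose network, and the pose-based controller, none of which are named in the disclosure:

```python
def run_grasp_task(audio, image, parser, pose_estimator, arm):
    """One pass of the disclosed pipeline: speech + vision -> LLM parsing ->
    grasp pose -> pose-based control of the mechanical arm."""
    # 1. Parse voice and image with the large language model to get the task
    #    target (assumed to carry object category, name, camera coordinates).
    task = parser.parse(audio=audio, image=image)

    # 2. Estimate the grabbing pose, expressed in the camera frame, from the
    #    grabbing task target information.
    grasp_pose_cam = pose_estimator.predict(image, task)

    # 3. Hand-eye calibration and inverse kinematics map the pose to the base
    #    frame; trajectory generation and the sliding-mode law then execute it.
    arm.execute_grasp(grasp_pose_cam)
```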
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote computer case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation of the module itself.
The readable storage medium provided by the present application is a computer readable storage medium storing computer readable program instructions (i.e., a computer program) for performing the mechanical arm control method described above, and can solve the technical problems that existing mechanical arm grabbing tasks require explicit instructions, exhibit a poor degree of intelligence, and cannot be readily applied in practice. Compared with the prior art, the beneficial effects of the computer readable storage medium provided by the present application are the same as those of the mechanical arm control method provided by the foregoing embodiments, and are not repeated here.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the mechanical arm control method described above.
The computer program product provided by the present application can solve the technical problems that existing mechanical arm grabbing tasks require explicit instructions, exhibit a poor degree of intelligence, and cannot be readily applied in practice. Compared with the prior art, the beneficial effects of the computer program product provided by the present application are the same as those of the mechanical arm control method provided by the foregoing embodiments, and are not repeated here.
The foregoing description covers only some embodiments of the present application and does not limit its patent scope; all equivalent structural changes made using the description and drawings of the present application under its technical concept, and all direct or indirect applications in other related technical fields, are likewise included within the patent protection scope of the present application.
Claims (6)
1. A mechanical arm control method, characterized by comprising the following steps:
acquiring voice information and image information;
analyzing the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm;
wherein the analyzing the voice information and the image information based on the large language model to obtain the grabbing task target information of the mechanical arm comprises:
identifying keyword information from the voice information, and performing color extraction and depth information extraction on the image information to obtain supplementary constraints, wherein the supplementary constraints comprise specification information of a target object picture, a coordinate origin, and an output information specification;
matching the voice information with the image information based on the supplementary constraints to obtain a target object;
acquiring task target information based on the keyword information and the image information, wherein the task target information comprises a target object category, a name, and camera coordinate information;
obtaining the mechanical arm grabbing pose according to the grabbing task target information;
controlling, based on the mechanical arm grabbing pose, the mechanical arm to complete the grabbing task;
wherein the controlling, based on the mechanical arm grabbing pose, the mechanical arm to complete the grabbing task comprises:
acquiring a preset hand-eye calibration matrix and a preset inverse kinematics matrix of the mechanical arm;
performing coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain mechanical arm base coordinates of the mechanical arm grabbing pose;
acquiring the current state of the mechanical arm, and obtaining a desired state according to the mechanical arm base coordinates;
generating a desired trajectory according to the current state, the desired state, and a mechanical arm dynamics model;
constructing a control law on a sliding mode surface with an adaptive neural network fitting a time-varying output constraint, wherein the control law is used for hysteresis compensation in the mechanical arm control process; and
controlling the mechanical arm to execute the grabbing action according to the time-varying output constraint, the control law, and the desired trajectory.
2. The mechanical arm control method according to claim 1, wherein the obtaining the mechanical arm grabbing pose according to the grabbing task target information comprises:
extracting a reference image of the target object according to the camera coordinate information in the grabbing task target information; and
obtaining the mechanical arm grabbing pose according to the reference image and the camera coordinate information in the grabbing task target information.
3. The mechanical arm control method according to claim 1, wherein the performing coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain the mechanical arm base coordinates of the mechanical arm grabbing pose comprises:
converting the mechanical arm grabbing pose into mechanical arm space coordinates according to the preset hand-eye calibration matrix; and
converting the mechanical arm space coordinates into the mechanical arm base coordinates under the mechanical arm base coordinate system according to the preset inverse kinematics matrix.
4. A mechanical arm control device, characterized in that the mechanical arm control device comprises:
an acquisition module, configured to acquire voice information and image information;
a model analysis module, configured to analyze the voice information and the image information based on a large language model to obtain grabbing task target information of the mechanical arm;
wherein the model analysis module is further configured to identify keyword information from the voice information, and to perform color extraction and depth information extraction on the image information to obtain supplementary constraints, the supplementary constraints comprising specification information of the target object picture, a coordinate origin, and an output information specification; and to acquire task target information based on the keyword information and the image information, the task target information comprising a target object category, a name, and camera coordinate information;
a mechanical arm control module, configured to obtain the mechanical arm grabbing pose according to the grabbing task target information;
wherein the mechanical arm control module is further configured to control, based on the mechanical arm grabbing pose, the mechanical arm to complete the grabbing task;
and wherein the mechanical arm control module is further configured to: acquire a preset hand-eye calibration matrix and a preset inverse kinematics matrix of the mechanical arm; perform coordinate transformation on the mechanical arm grabbing pose according to the preset hand-eye calibration matrix and the preset inverse kinematics matrix to obtain mechanical arm base coordinates of the mechanical arm grabbing pose; acquire the current state of the mechanical arm and obtain a desired state according to the mechanical arm base coordinates; generate a desired trajectory according to the current state, the desired state, and a mechanical arm dynamics model; construct a control law on a sliding mode surface with an adaptive neural network fitting a time-varying output constraint, the control law being used for hysteresis compensation in the mechanical arm control process; and control the mechanical arm to execute the grabbing action according to the time-varying output constraint, the control law, and the desired trajectory.
5. A mechanical arm control equipment, comprising a memory, a processor, and a mechanical arm control program stored on the memory and executable on the processor, the mechanical arm control program being configured to implement the mechanical arm control method according to any one of claims 1 to 3.
6. A storage medium having stored thereon a mechanical arm control program which, when executed by a processor, implements the mechanical arm control method according to any one of claims 1 to 3.
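To make the coordinate-transformation chain recited in claims 1 and 3 concrete: a minimal sketch, assuming the preset hand-eye calibration is available as a 4×4 homogeneous matrix and the inverse kinematics step is supplied by an arm-specific solver (the claims speak of a preset inverse kinematics matrix; the sketch generalizes this to a solver callable, and both `T_base_cam` and `ik_solver` are hypothetical names, not part of the claims):

```python
import numpy as np

def grasp_pose_to_joint_targets(T_cam_grasp, T_base_cam, ik_solver):
    """Claim-3-style transformation chain.

    T_cam_grasp : 4x4 homogeneous grabbing pose in the camera frame
    T_base_cam  : 4x4 hand-eye calibration matrix (camera -> arm base frame)
    ik_solver   : arm-specific callable mapping a 4x4 base-frame pose to
                  joint angles (stands in for the preset inverse kinematics)
    """
    T_base_grasp = T_base_cam @ T_cam_grasp   # grabbing pose in the base frame
    return ik_solver(T_base_grasp)            # joint-space target for the grasp
```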
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411855423.1A CN119319568B (en) | 2024-12-17 | 2024-12-17 | Robotic arm control method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119319568A CN119319568A (en) | 2025-01-17 |
| CN119319568B true CN119319568B (en) | 2025-04-25 |
Family
ID=94230504
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411855423.1A Active CN119319568B (en) | 2024-12-17 | 2024-12-17 | Robotic arm control method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119319568B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119927906B (en) * | 2025-01-22 | 2025-08-01 | 江淮前沿技术协同创新中心 | Interaction method and device for cooperative control of upper limbs of humanoid robot |
| CN119897866A (en) * | 2025-03-17 | 2025-04-29 | 东北大学 | A control system solution for intelligent robotic arm based on multi-model collaboration |
| CN120347769B (en) * | 2025-06-18 | 2025-08-26 | 西安电子科技大学 | Wearable mechanical arm and intelligent body-building method thereof |
| CN120503219B (en) * | 2025-07-22 | 2025-11-04 | 中移(杭州)信息技术有限公司 | Execution method, device, equipment, storage medium and product of grabbing task |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114734444B (en) * | 2022-04-27 | 2023-06-27 | 博众精工科技股份有限公司 | Target positioning method and device, electronic equipment and storage medium |
| CN117808876A (en) * | 2023-11-18 | 2024-04-02 | 郑州煤机智控技术创新中心有限公司 | Unmanned vehicle mechanical arm autonomous grabbing system and method suitable for small target |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112171661A (en) * | 2020-08-25 | 2021-01-05 | 广西大学 | Method for grabbing target object by mechanical arm based on visual information fusion |
| CN114967459A (en) * | 2022-05-31 | 2022-08-30 | 安徽工业大学 | A Control Method for Time Convergence of Manipulator and Its 7 DOF Manipulator |
| CN118578381A (en) * | 2024-05-29 | 2024-09-03 | 深圳市优必选科技股份有限公司 | Robot object grasping method, device, electronic device and storage medium |
Non-Patent Citations (1)
| Title |
|---|
| Neural Network Control of a Mechanical Arm with Constraints; Zhang Zhao; China Master's Theses Full-text Database (Electronic Journals), Information Science & Technology; 2022-11-15; pp. 8-11 and 46-59 of the main text * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN119319568B (en) | Robotic arm control method, device, equipment and storage medium | |
| Zhang et al. | Virtual submerged floating operational system for robotic manipulation | |
| Li | Human–robot interaction based on gesture and movement recognition | |
| Gu et al. | Automated assembly skill acquisition and implementation through human demonstration | |
| CN101977240A (en) | IPhone smart phone based robot human-machine interactive system | |
| CN119772905B (en) | Mechanical arm control method, system and equipment for realizing multi-mode general operation task | |
| CN113219854A (en) | Robot simulation control platform, method and computer storage medium | |
| EP3916507B1 (en) | Methods and systems for enabling human robot interaction by sharing cognition | |
| WO2024146961A1 (en) | Controlling agents using language-based success detectors | |
| CN120063242A (en) | Robot semantic navigation method, system, terminal and storage medium | |
| CN116629373A (en) | A model training system, training method, device and storage medium | |
| Lo et al. | Realization of sign language motion using a dual-arm/hand humanoid robot | |
| CN113858217A (en) | Multi-robot interaction three-dimensional visual pose perception method and system | |
| CN119347801B (en) | Method, device, equipment, medium and product for optimizing action simulation of intelligent smart hand | |
| CN119105383B (en) | Robot control method, device, equipment, storage medium and product | |
| Du et al. | A novel natural mobile human-machine interaction method with augmented reality | |
| CN117901090B (en) | Visual servo method and system for lens holding robot | |
| WO2025126663A1 (en) | Information processing device, information processing method, and program | |
| Chen et al. | Robotic pick-and-handover maneuvers with camera-based intelligent object detection and impedance control | |
| EP4335598A1 (en) | Action abstraction controller for fully actuated robotic manipulators | |
| Kong et al. | Mobile manipulator control based on voice and visual signal | |
| Sofge et al. | Collaborating with humanoid robots in space | |
| US20240408757A1 (en) | Human-in-loop robot training and testing system with generative artificial intelligence (ai) | |
| Patel | Human robot interaction with cloud assisted voice control and vision system | |
| Zhang et al. | A Robotic Manipulation Framework Based on Large Model and Behavior Cloning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |