
CN118999570A - Robot path planning method, device and medium based on quantum search and reinforcement learning - Google Patents

Robot path planning method, device and medium based on quantum search and reinforcement learning

Info

Publication number
CN118999570A
CN118999570A (application CN202411105626.9A)
Authority
CN
China
Prior art keywords
robot
quantum
search
model
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411105626.9A
Other languages
Chinese (zh)
Other versions
CN118999570B (en)
Inventor
陆骊工
占美晓
黄超
苏燕红
刘锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Peoples Hospital
Original Assignee
Zhuhai Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Peoples Hospital filed Critical Zhuhai Peoples Hospital
Priority to CN202411105626.9A priority Critical patent/CN118999570B/en
Publication of CN118999570A publication Critical patent/CN118999570A/en
Application granted granted Critical
Publication of CN118999570B publication Critical patent/CN118999570B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • G06N10/60Quantum algorithms, e.g. based on quantum optimisation, quantum Fourier or Hadamard transforms
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a robot path planning method, device and medium based on quantum search and reinforcement learning. The method comprises: quantizing the real environment and the robot state, including dynamically mapping them into the quantum realm to establish a quantum domain; searching in the quantum domain based on a preset quantum search model using the quantum superposition principle to obtain a plurality of search samples; determining one or more target search samples from the plurality of search samples according to the probability that the robot reaches the target position; and predetermining a basic reinforcement learning model, training it with the quantum search model and the target search samples, and taking the trained reinforcement learning model as the robot path planning model. The method can quickly obtain high-quality target search samples, effectively reduce the number of training rounds for robot path planning, and improve the quality and speed of robot path planning.

Description

Robot path planning method, device and medium based on quantum search and reinforcement learning
Technical Field
The invention relates to the field of robot path planning, in particular to a method, a device and a medium for robot path planning based on quantum search and reinforcement learning.
Background
With the development of sensor, battery and artificial intelligence (AI) technologies, robots have taken on important tasks in various unstructured environments, such as rescue at disaster sites, search operations within building clusters, and package management in indoor cargo areas. However, although robots can operate in these complex and diverse environments, their speed of adaptation is still limited, mainly due to their limited on-board computing power and dynamic disturbances from the environment, such as dynamic obstacles and variations in battery power. These factors lead to inaccurate robot planning and delayed responses in emergency situations.
Reinforcement learning (RL) is a machine learning method that allows an agent (e.g., a robot) to learn how to take actions that maximize a cumulative reward by interacting with the environment. Reinforcement learning offers several key advantages for robot adaptation to dynamic environments. Adaptivity: reinforcement learning algorithms allow robots to learn and adapt in unknown or changing environments; through trial and error, the robot can gradually understand the patterns in the environment and learn to cope effectively with changes in those patterns. Exploration-exploitation balance: reinforcement learning algorithms are typically designed with a balancing mechanism between exploration and exploitation, which means the robot will not only exploit the best known strategy but also explore unknown possibilities to find better strategies. However, although introducing reinforcement learning and environmental feedback to enhance the robot's exploration ability can increase its adaptation speed, such learning requires a large amount of interaction with the environment and is therefore time-consuming and costly, and it also carries certain risks, such as collisions between the robot and obstacles.
The above disclosure of background art is only for aiding in understanding the inventive concept and technical solution of the present application, and it does not necessarily belong to the prior art of the present application, nor does it necessarily give technical teaching; the above background should not be used to assess the novelty and creativity of the present application in the event that no clear evidence indicates that such is already disclosed prior to the filing date of the present application.
Disclosure of Invention
The invention aims to provide a robot path planning method, device and medium based on quantum search and reinforcement learning. Unlike traditional quantum reinforcement learning methods, which are based on replacing the neural network model with a quantum network model, the method is based on quantizing the real environment and on the quantum superposition mechanism: it quickly obtains high-quality target search samples for training the path planning strategy. This improves the accuracy of robot path planning and, without changing the underlying reinforcement learning algorithm framework, reduces the number of interactions between reinforcement learning and the environment, effectively reduces the number of training rounds, and improves the efficiency and speed of robot path planning.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a robot path planning method based on quantum search and reinforcement learning comprises the following steps:
quantizing the real environment and the robot state, including dynamically mapping the real-world environment and the robot state into the quantum realm to establish a quantum domain;
Searching in the quantum domain by utilizing a quantum superposition principle based on a preset quantum search model and obtaining a plurality of search samples, wherein each search sample corresponds to the probability of the robot reaching the target position;
determining one or more target search samples from a plurality of search samples according to the probability that the robot reaches the target position;
And a basic reinforcement learning model is predetermined, the basic reinforcement learning model is trained by utilizing the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model.
Further, in any one or a combination of the foregoing aspects, in the process of quantizing the robot state, the robot state is encoded and represented as a quantum superposition state;
Based on a preset quantum search model, carrying out parallel search in the quantum domain by utilizing the quantum superposition state and obtaining a plurality of search samples, wherein the parallel search is used for controlling a robot to explore a plurality of positions in the quantum domain at the same time.
Further, according to any one or a combination of the above technical solutions, the real-world environment and the robot state are dynamically mapped into the quantum realm to create a quantum domain, wherein the quantum domain includes a quantized simulation environment, quantum state parameters and quantum state initialization information;
The method for establishing the quantized simulation environment comprises: based on the reachability of the robot in the real environment, encoding the environment so that each position in the environment has a unique binary code, the binary code corresponding to an orthogonal basis of the quantum code, to obtain the quantized simulation environment;
The quantum state parameters are obtained by mapping elements of the real-world reinforcement learning framework into the quantized simulation environment, wherein the elements of the real-world reinforcement learning framework comprise the current state s_t, the current action a_t, the next state s_t+1 and the reward r;
The quantum state initialization information is obtained by mapping an initial position and a target position of the robot in the real environment and obstacle distribution information in the real environment into the quantized simulation environment.
Further, in any one or a combination of the foregoing, the reachability of the robot in the real environment is determined by an adjacency matrix and expressed as the following formula:
A = A_surround ⊙ A_obs ⊙ A_robot
wherein A represents the reachability matrix of the robot in the real environment, A_surround represents the reachability matrix of the robot under the influence of the characteristics of the surrounding environment, A_obs represents the reachability matrix of the robot under the influence of the obstacle distribution, A_robot represents the reachability matrix of the robot under the influence of the robot settings, and ⊙ represents the element-wise product operator.
Further, any one or a combination of the above technical solutions may further include setting the probability value corresponding to the position of the robot at the initial moment to be greater than the probability values corresponding to other positions; and/or,
And setting an exit node, so that when the robot reaches a target position, the robot is transferred to the exit node with 100% probability at the next moment, and when the robot is positioned at the exit node, the robot is not transferred to the target position at the next moment.
Further, in any one or a combination of the foregoing technical solutions, based on the open-system master equation, the quantum search model is constructed from a quantum random walk model and a classical random walk model, and the quantum search model is represented by the following formula:
dρ/dt = (1 − p)L_QW(ρ) + pL_CRW(ρ) + L_target(ρ)
wherein ρ represents the probability distribution of the robot state, 1 − p represents the weighting coefficient of the quantum random walk model, p represents the weighting coefficient of the classical random walk model, L_QW(ρ) = −i[A′, ρ] represents the quantum random walk model, where A′ represents the reachability matrix of the robot in the quantized simulation environment, L_CRW(ρ) represents the classical random walk model, built from the jump operators L_ij = (A′_ij/d_j)|i⟩⟨j|, where d_j represents the degree of position j and |i⟩ represents the i-th orthogonal basis state of the Hilbert space, and L_target(ρ) = 2|n+1⟩⟨n|ρ|n⟩⟨n+1| − {|n⟩⟨n|, ρ} represents that, when the robot state is at the target position (n), the robot state will irreversibly transfer to the exit node (n+1) at the next moment.
Further, in the quantized simulation environment, the probability that the robot is located in each grid cell is represented by the diagonal elements ρ_ii of the probability distribution ρ;
After a parallel search of duration T, the probability p(T) that the robot reaches the target position in the quantized simulation environment is expressed as p(T) = Σ_n ρ_nn(T), where n ranges over all target positions in the environment.
Further, in any one or a combination of the foregoing, determining one or more target search samples from the plurality of search samples by comparing the probabilities of the robot reaching the target position includes:
determining the search sample corresponding to the highest probability of the robot reaching the target position as the target search sample; or
determining the search samples whose probabilities of the robot reaching the target position are greater than a preset threshold as the target search samples; or
sorting the probabilities of the robot reaching the target position from high to low, and taking the search samples corresponding to the top k probabilities as the target search samples, wherein k ≥ 2 and k is a positive integer.
Further, in any one or a combination of the foregoing aspects, the structure of the target search sample is (s_t_b, a_t_b, s_t+1_b, r_b), where s_t_b is the current state in the quantized simulation environment corresponding to the target search sample, a_t_b is the current action of the robot in the quantized simulation environment corresponding to the target search sample, s_t+1_b is the next state in the quantized simulation environment corresponding to the target search sample, including the robot state and the environment state, and r_b is the reward returned by the real environment corresponding to the target search sample.
Further, according to any one or a combination of the foregoing aspects, training the basic reinforcement learning model using the quantum search model and the target search sample includes the steps of:
Packaging the quantum search model and the target search sample into a quantum meditation function module, wherein the input of the quantum meditation function module is the current state and the output is the optimal robot action;
And adding the quantum meditation function module as a robot action selection module into a basic reinforcement learning model, and training the basic reinforcement learning model by taking a target search sample as a training set.
Further, any one or a combination of the foregoing, the optimal robot action is determined by:
Based on the quantum search model, with the current state as input, the robot traverses all possible actions in the quantum domain to obtain, for each action, the probability that the robot can reach the target position when that action is selected; according to the magnitude of the probability of reaching the target position corresponding to each robot action, the robot action with the maximum probability of reaching the target position at the current moment is determined as the optimal robot action.
Further, any one or a combination of the foregoing aspects, a planned path of the robot from the initial position to the target position is obtained using the robot path planning model.
According to another aspect of the present invention, there is provided a robot path planning apparatus in which a robot path planning model is provided; wherein the robot path planning model is obtained by:
quantizing the real environment and the robot state, including dynamically mapping the real world environment and the robot state to a quantum field to establish a quantum field;
Searching in the quantum domain by utilizing a quantum superposition principle based on a preset quantum search model and obtaining a plurality of search samples, wherein each search sample corresponds to the probability of the robot reaching the target position;
determining one or more target search samples from a plurality of search samples according to the probability that the robot reaches the target position;
And a basic reinforcement learning model is predetermined, the basic reinforcement learning model is trained by utilizing the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model.
Further, any one or a combination of a plurality of the foregoing technical solutions, and the robot path planning model is obtained based on the robot path planning method based on quantum search and reinforcement learning according to any one or a combination of a plurality of the foregoing technical solutions.
According to another aspect of the present invention there is provided a computer readable storage medium storing program instructions configured to be invoked by a processor to perform the steps of a method as set out in any one or a combination of the above.
The technical scheme provided by the invention has the following beneficial effects:
a. The invention is based on real environment quantization and the quantum superposition mechanism, and can quickly obtain high-quality target search samples for path planning strategy training. This not only improves the accuracy of robot path planning, but also reduces the number of interactions between reinforcement learning and the environment without changing the underlying reinforcement learning algorithm architecture, effectively reduces the number of training rounds, and improves the efficiency and speed of robot path planning;
b. According to the invention, the state of the robot is expressed as a quantum superposition state in the process of quantizing the state of the robot, so that parallel searching can be realized, the robot can explore a plurality of positions at the same time, the efficiency of acquiring each searching sample can be effectively improved, and the speed of planning a robot path is improved;
c. The method combines quantum search with parallel search, so that useful high-quality target search samples, namely samples whose reward is not 0 or that provide more information about the target position, can be obtained easily;
d. According to the method, the reachability of the robot in the real environment is determined based on the characteristics of the surrounding environment, the obstacle distribution and the robot settings, and this reachability is mapped into the quantum domain to construct the quantized simulation environment, so that the accuracy of robot path planning can be improved;
e. According to the invention, the quantum search model is constructed through the weighted combination of the quantum random walk model and the classical random walk model, so that the quick search is ensured, and the anti-interference capability in the search process is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.
FIG. 1 is a flow chart of a method for robotic path planning based on quantum search and reinforcement learning provided by an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of factors affecting real environment quantization provided by an exemplary embodiment of the present invention;
FIG. 3 is a flow chart of a quantum search process provided by an exemplary embodiment of the present invention;
FIG. 4 is a framework diagram of a quantum search and reinforcement learning fusion provided by an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of a comparison of the number of reinforcement learning training rounds with a model of quantum search and the number of reinforcement learning training rounds corresponding to other models according to an exemplary embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Based on the defects of the prior art, the invention provides a robot path planning method, device and medium based on quantum search and reinforcement learning, and the method and device are based on a quantum superposition mechanism, so that a high-quality target search sample is quickly obtained for path planning strategy training, the accuracy of robot path planning can be improved, the number of training rounds can be effectively reduced, and the speed of robot path planning is improved.
In one embodiment of the present invention, a method for planning a robot path based on quantum search and reinforcement learning is provided, see fig. 1, the method comprising the steps of:
quantizing the real environment and the robot state, including dynamically mapping the real world environment and the robot state to a quantum field to establish a quantum field;
Searching in the quantum domain by utilizing a quantum superposition principle based on a preset quantum search model and obtaining a plurality of search samples, wherein each search sample corresponds to the probability of the robot reaching the target position;
determining one or more target search samples from a plurality of search samples according to the probability that the robot reaches the target position;
And a basic reinforcement learning model is predetermined, the basic reinforcement learning model is trained by utilizing the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model.
In one embodiment of the invention, in the process of quantizing robot states, the robot states are encoded, and the robot states are represented as quantum superposition states configured to represent probabilities that the robot is at various positions in the quantum domain at the same time. Based on a preset quantum search model, carrying out parallel search in the quantum domain by utilizing the quantum superposition state and obtaining a plurality of search samples, wherein the parallel search is used for controlling a robot to explore a plurality of positions in the quantum domain at the same time.
Embodiments of the present invention are further described below with reference to fig. 1-5.
The real-world environment and the robot state are quantized, including dynamically mapping the real-world environment and the robot state to quantum fields to create quantum domains. The quantum domain comprises a quantized simulation environment, quantum state parameters and quantum state initialization information.
S1, real environment quantization, namely dynamically mapping the real world environment to the quantum field, and establishing a quantized simulation environment.
S1.1. In one embodiment of the invention, an adjacency matrix is used to represent the reachability of the robot in the real environment, and the real-world environment dynamics in the quantum domain are described based on this reachability matrix and used by the quantum search model built later. Of course, the method for quantizing the real environment is not limited to the adjacency matrix; any method that achieves the same technical effect as the adjacency matrix can be used for real environment quantization, such as a list, tree or graph describing the relationship between adjacent robot states.
The method of using the adjacency matrix to represent the accessibility of the robot in a real environment is a preferred embodiment provided by the present invention. In this embodiment, the accessibility of the robot in the real environment is determined by encoding the environment, and a quantized simulation environment is established according to the accessibility of the robot in the real environment, in which each position has a unique binary code corresponding to the orthogonal basis of the quantum code.
S1.1.1. For example, a 400 m × 400 m real environment is divided into a 13 × 13 two-dimensional grid. Of course, in other embodiments, the grid may be divided with different sizes and dimensions according to the size of the real environment, the task requirements and the required granularity.
Then, for the path planning task, as shown in fig. 2, the following three aspects affect the accessibility of the robot in a real environment:
(1) Characteristics of the surrounding environment: for example, to ensure robot safety, in unstructured and uncertain environments the robot needs to be kept a first distance (e.g. 1.0m or more) from the obstacle, whereas in structured environments (obstacles are simple geometric structures such as rectangular, triangular, circular etc.) the robot needs to be kept a second distance (e.g. 0.5m or more) from the obstacle, said second distance being smaller than said first distance; it should be noted that, 1.0m and 0.5m in this paragraph are only exemplary, and not limiting the scope of the present invention, in other embodiments or specific applications, the first distance may be determined to be 0.5m, 2m, 3m, etc. according to actual needs, and the second distance may be determined to be 0.1m, 0.2m, 1m, etc.;
(2) Barrier distribution: such as the layout of buildings, trees, stones, etc., can affect the accessibility of the robot;
(3) Robot settings: the capabilities, fuel level and condition of the robot affect its reachability. For example, a small robot cannot traverse a muddy road, and a robot with little remaining fuel cannot reach a distant location. Of course, in other embodiments, the reachability of the robot may also be determined based on other empirical knowledge.
Based on the above analysis, the present invention proposes the following formula for determining the reachability of a robot in a real environment:
A = A_surround ⊙ A_obs ⊙ A_robot, wherein A represents the reachability matrix of the robot in the real environment, A_surround represents the reachability matrix of the robot under the influence of the characteristics of the surrounding environment, A_obs represents the reachability matrix of the robot under the influence of the obstacle distribution, A_robot represents the reachability matrix of the robot under the influence of the robot settings, and ⊙ represents the element-wise product operator. In the reachability matrix, if the robot can reach the environment position corresponding to a matrix element, the value of that matrix element is set to 1; otherwise it is set to 0.
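As a minimal illustration (not part of the patent text), the following NumPy sketch combines the three 0/1 masks by an element-wise product to obtain A; the 13 × 13 grid size follows the example above, while the individual masks and the blocked transition are hypothetical placeholders.

```python
import numpy as np

n_cells = 13 * 13  # 400 m x 400 m environment divided into a 13 x 13 grid

# 0/1 adjacency masks over grid cells (1 = transition allowed, 0 = blocked)
A_surround = np.ones((n_cells, n_cells), dtype=int)  # safety-distance constraints
A_obs      = np.ones((n_cells, n_cells), dtype=int)  # obstacle layout (buildings, trees, ...)
A_robot    = np.ones((n_cells, n_cells), dtype=int)  # robot capability / fuel / condition

# Hypothetical example: a transition blocked by an obstacle
A_obs[5, 6] = A_obs[6, 5] = 0

# Element-wise (Hadamard) product: a transition is reachable only if every factor allows it
A = A_surround * A_obs * A_robot
print(A.shape, A[5, 6], A[0, 1])  # (169, 169) 0 1
```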
S1.1.2. A simulation environment is then built in the Python programming language based on the reachability matrix A. As shown in the upper right of fig. 3, the simulation environment has a maze-like structure, where a reachability-matrix element value of 1 corresponds to a white cell in the simulation environment and a value of 0 corresponds to a black cell. The cyan cell in the simulation environment represents the initial robot position and the red cell represents the target position. Of course, in other embodiments, other software and programming languages may be used to build the simulation environment.
S1.2, the state of the robot is quantized, the state of the robot in the real environment is mapped into the quantized simulation environment, and the state of the robot in the quantum domain is expressed as a quantum superposition state by encoding the state of the robot, so that the robot can explore a plurality of positions in the quantized simulation environment at the same time.
S1.2.1. In one embodiment of the present invention, in order to use the efficient parallel search principle of the quantum superposition mechanism and to ease the later fusion with the reinforcement learning framework, the elements of the real-world reinforcement learning framework (including the current state s_t, the current action a_t, the next state s_t+1 and the reward r) are mapped into the quantum domain to obtain the quantum state parameters. The quantum state parameters include the current state |s_t⟩, the current action |a_t⟩, the next state |s_t+1⟩ and the reward |r⟩ in the quantized simulation environment, wherein |s_t⟩ is the current state in the quantized simulation environment, including the robot state and the environment state, |a_t⟩ is the current action of the robot in the quantized simulation environment, |s_t+1⟩ is the next state in the quantized simulation environment, including the robot state and the environment state, and |r⟩ is the reward returned in the quantum domain in superposition.
In the quantum domain, to accelerate computation using quantum superposition, the quantum-dynamics-based quantum search model takes the superposed state |s_t⟩ as input; the robot can select a superposition of actions |a_t⟩, obtain the superposed next state |s_t+1⟩, and obtain the corresponding superposed reward |r⟩. In the quantum domain, the robot may be at multiple positions at each moment, so a discrete probability distribution ρ is used to represent the superposed state |s_t⟩. The number of possible values in the discrete probability distribution ρ is equal to the number of simulation environment grid cells, 13 × 13 = 169 in one embodiment of the invention.
S1.2.2. Each grid cell in the quantized simulation environment has a unique binary code corresponding to an orthogonal basis state of the quantum code, and the required number of qubits is obtained by the following equation:
N_bit = ⌈log2(N_cell)⌉
where N_bit represents the number of qubits and N_cell represents the number of grid cells; the number of qubits required in one embodiment of the present invention is 8, although in other embodiments a suitable number of qubits may be selected according to the actual situation. In the quantum domain, the robot action |a_t⟩ in the quantized simulation environment is another probability distribution over all possible actions of the robot; a discrete uniform distribution is used to represent the robot action |a_t⟩ in one embodiment of the invention, although other probability distributions may be used in other embodiments.
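A minimal sketch of this encoding step follows; the ceil(log2(·)) form of the qubit-count rule is inferred from the stated example (169 cells mapping to 8 qubits), and the variable names are illustrative.

```python
import math

def qubits_needed(n_cells: int) -> int:
    """N_bit = ceil(log2(N_cell)): the smallest register whose basis states
    can label every grid cell."""
    return math.ceil(math.log2(n_cells))

n_cells = 13 * 13
n_bits = qubits_needed(n_cells)
print(n_bits)  # 8

# Each grid cell gets a unique binary code, i.e. one computational-basis state
# |b7 b6 ... b0> of the 8-qubit register.
codes = [format(i, f"0{n_bits}b") for i in range(n_cells)]
print(codes[0], codes[168])  # 00000000 10101000
```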
S1.3, according to the initial position and the target position of the robot in the real environment and the obstacle distribution information in the real environment, quantum state initialization information is determined, wherein the quantum state initialization information comprises the initial position, the target position and the initial position of the obstacle of the robot in the quantized simulation environment.
And each grid position of the quantized simulation environment is provided with a binary code, the binary codes are in one-to-one correspondence with the quantum code orthogonal bases, and when the robot searches in parallel, each moment can be positioned at a plurality of grid positions, so that the probability distribution on the quantum code orthogonal bases can be used for describing the robot search in parallel. Thus, by quantum state initialization, the initial positions of the robot, the target point, and the obstacle can be set.
As shown in the lower right of fig. 3, the probability value corresponding to the robot's position at the initial moment is set higher than the probabilities of the other positions. An exit node is added at the position corresponding to the target point; the exit node is usually a virtual node, and when the robot reaches the target position during the parallel search, it transfers to the exit node with 100% probability at the next moment. If the robot is at the exit node, it will not transfer back to the target position at the next moment and remains at the exit node. Therefore, after a period of searching, the probability that the robot is at the exit node represents the total probability that the robot can reach the target point within the search period, and this total probability is used to judge the quality of the searched sample. Thus, after the quantum random walk parallel search, the probability value of the exit node is taken as the future cumulative reward |r⟩ obtained after the robot takes an action from the initial position.
In the present invention, sample quality refers to how much information a sample provides about where the target point is. In conventional reinforcement learning, a sample obtained by searching the environment is likely to be a useless sample (its reward is 0 and it provides no positional information about the target point), whereas the present invention can easily obtain useful samples (i.e., samples whose reward is not 0 or that provide more target position information) through quantum search combined with parallel search. Reinforcement learning is then trained with the obtained high-quality samples, which improves the efficiency of the path planning training and the accuracy and speed of robot path planning.
S2, searching in parallel in the quantized simulation environment by utilizing a quantum superposition principle based on a preset quantum search model to obtain a plurality of search samples, wherein in the parallel searching process, each search process simultaneously searches different paths by taking the initial position of the robot as a starting point and the target position as an end point, and when the search process reaches the exit node, the search process is ended to obtain the search sample corresponding to the search process.
One or more target search samples are determined from a plurality of the search samples by comparing probabilities of arrival of the robot at the target position, and the target search sample in the present invention is a high-quality search sample. There are various ways to determine high quality target search samples.
The first way to determine the target search samples is to determine the search sample corresponding to the highest probability of the robot reaching the target position as the target search sample. Another way is to determine the search samples whose probabilities of the robot reaching the target position are greater than a preset threshold as the target search samples. In addition, the target search samples may be determined by sorting the probabilities of the robot reaching the target position from high to low and taking the search samples corresponding to the top k probabilities (for example, the top three) as the target search samples, where k ≥ 2 and k is a positive integer.
The first way of determining the target search sample is optimal, which not only ensures that an effective target search sample can be obtained, but also obtains the target search sample with the highest quality.
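For illustration, the three selection rules above can be sketched as follows; the (sample, probability) pairing and the function names are assumptions, not interfaces defined by the patent.

```python
from typing import Any, List, Tuple

# Each entry pairs a search sample with the probability of reaching the target.
Scored = Tuple[Any, float]

def select_best(samples: List[Scored]) -> List[Any]:
    """Rule 1 (preferred): keep the single sample with the highest probability."""
    return [max(samples, key=lambda sp: sp[1])[0]]

def select_above_threshold(samples: List[Scored], thr: float) -> List[Any]:
    """Rule 2: keep every sample whose probability exceeds the preset threshold."""
    return [s for s, p in samples if p > thr]

def select_top_k(samples: List[Scored], k: int = 3) -> List[Any]:
    """Rule 3: keep the k most probable samples (k >= 2)."""
    return [s for s, _ in sorted(samples, key=lambda sp: sp[1], reverse=True)[:k]]

scored = [("sample_a", 0.12), ("sample_b", 0.71), ("sample_c", 0.45)]
print(select_best(scored), select_top_k(scored, k=2))
```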
In this embodiment, the open-system master equation (Lindblad master equation) may be used to describe the environmental dynamics of the robot in the quantized simulation environment and to construct a quantum search model for fast search of high-quality samples. Of course, the method of constructing the quantum search model is not limited to the Lindblad master equation; any method capable of achieving the same technical effect can be used to construct the quantum search model, for example approaches based on Redfield relaxation or Floquet theory, together with numerical techniques such as the Suzuki-Trotter expansion and sparse solvers.
In a preferred embodiment of the present invention, step S2 comprises the following steps.
S2.1. Based on the open-system master equation, the quantum search model is constructed from a quantum random walk model and a classical random walk model; the classical random walk model increases the anti-interference capability of the quantum search model. During the search of the quantized simulation environment, the evolution of the discrete probability distribution ρ over time is governed by the following open-system (Lindblad) master equation, which is a weighted combination of the quantum random walk model and the classical random walk model, thereby ensuring fast search while improving the anti-interference capability of the quantum search model:
dρ/dt = (1 − p)L_QW(ρ) + pL_CRW(ρ) + L_target(ρ)
wherein ρ represents the discrete probability distribution of the robot state, 1 − p represents the weighting coefficient of the quantum random walk model, p represents the weighting coefficient of the classical random walk model, L_QW(ρ) = −i[A′, ρ] represents the quantum random walk model, where A′ represents the reachability matrix of the robot in the quantized simulation environment, obtained by converting the reachability matrix A of the robot in the real environment into the quantum domain, L_CRW(ρ) represents the classical random walk model, built from the jump operators L_ij = (A′_ij/d_j)|i⟩⟨j|, where A′_ij is the value of element (i, j) of the reachability matrix A′, indicating whether the robot can reach the position corresponding to (i, j), d_j represents the degree of position j, and |i⟩ represents the i-th orthogonal basis state of the Hilbert space, and L_target(ρ) = 2|n+1⟩⟨n|ρ|n⟩⟨n+1| − {|n⟩⟨n|, ρ} represents that, when the robot state is at the target position (n), the robot state will irreversibly transfer to the exit node (n+1) at the next moment.
In the quantized simulation environment, the probability that the robot is located in each grid cell is represented by the diagonal elements ρ_ii of the probability distribution ρ. After a parallel search of duration T, the probability p(T) that the robot reaches the target position in the quantized simulation environment is expressed as p(T) = Σ_n ρ_nn(T), where n ranges over all target positions in the environment.
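The following is a minimal numerical sketch of this search dynamics on a toy graph. It assumes the classical term takes the standard Lindblad-dissipator form built from the jump operators L_ij defined above (the patent's own rendering of L_CRW is not reproduced here), integrates the master equation with a simple forward-Euler scheme, and reads p(T) off the exit-node population that accumulates the probability of having reached the target.

```python
import numpy as np

def evolve(A, start, targets, p=0.1, dt=0.01, T=10.0):
    """Integrate dρ/dt = (1-p)·L_QW(ρ) + p·L_CRW(ρ) + L_target(ρ) and return
    the probability accumulated at the virtual exit node after time T."""
    n = A.shape[0]
    dim = n + 1                                  # last index = virtual exit node
    basis = np.eye(dim)

    Ap = np.zeros((dim, dim))
    Ap[:n, :n] = A                               # A' embedded in the enlarged space

    # Jump operators L_ij = (A'_ij / d_j) |i><j| for the classical walk term
    deg = np.maximum(A.sum(axis=0), 1)
    jumps = [np.outer(basis[i], basis[j]) * (A[i, j] / deg[j])
             for i in range(n) for j in range(n) if A[i, j]]

    rho = np.zeros((dim, dim), dtype=complex)
    rho[start, start] = 1.0                      # robot initially localised at `start`

    exit_idx = dim - 1
    sink_in  = [np.outer(basis[exit_idx], basis[t]) for t in targets]  # |exit><t|
    sink_out = [np.outer(basis[t], basis[exit_idx]) for t in targets]  # |t><exit|
    proj     = [np.outer(basis[t], basis[t]) for t in targets]         # |t><t|

    for _ in range(int(T / dt)):
        d_rho = -1j * (1 - p) * (Ap @ rho - rho @ Ap)            # (1-p)·L_QW
        for L in jumps:                                           # p·L_CRW (Lindblad form)
            d_rho += p * (L @ rho @ L.T
                          - 0.5 * (L.T @ L @ rho + rho @ L.T @ L))
        for s_in, s_out, P in zip(sink_in, sink_out, proj):       # L_target: sink to exit node
            d_rho += 2 * (s_in @ rho @ s_out) - (P @ rho + rho @ P)
        rho = rho + dt * d_rho

    return rho[exit_idx, exit_idx].real          # p(T): exit-node population

# 3-node toy path graph: start at node 0, target at node 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(evolve(A, start=0, targets=[2]))
```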
S2.2. After the robot selects different actions, the probabilities of the robot reaching the target position in the future corresponding to each search sample are compared, and the search sample with the maximum probability value is determined as the target search sample. The search samples in the invention have the same structure as samples in conventional reinforcement learning, consisting of the current state s_t, the current action a_t, the next state s_t+1 and the reward r. At the current state s_t, the robot traverses all possible actions and, according to the quantum search model, obtains the probability of reaching the target position when each different action a is selected; the optimal robot action at the current moment is then selected as a_t according to the probability of reaching the target position corresponding to each robot action.
S3, a basic reinforcement learning model is determined in advance, the basic reinforcement learning model is trained by utilizing the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model. According to the invention, the high-quality target search sample is integrated into the reinforcement learning process, and the basic reinforcement learning model is trained, so that the learning efficiency and the path planning strategy quality can be improved.
In a preferred embodiment of the invention, step S3 comprises the following sub-steps.
S3.1. The quantum search model and the target search sample are packaged into a quantum meditation function module, whose input is the current state and whose output is the optimal robot action.
As shown in fig. 3, the input of the quantum meditation function module is the current state s_t and its output is the optimal robot action a_t. Given the input s_t, the robot traverses all possible actions in the quantum domain and, according to the quantum search model, obtains the probability of reaching the target position when each different action a is selected; the optimal robot action at the current moment is selected and denoted a_t according to the probability of reaching the target position corresponding to each robot action, and is output. Preferably, the optimal robot action is the robot action corresponding to the maximum probability of reaching the target position.
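A minimal sketch of this action-selection step is shown below; quantum_search_probability stands in for a routine such as the evolution sketch above, and its signature is an assumption.

```python
def quantum_meditation(state, actions, quantum_search_probability):
    """Return the action whose quantum search yields the highest probability of
    reaching the target position from `state`."""
    best_action, best_prob = None, -1.0
    for a in actions:
        prob = quantum_search_probability(state, a)  # p(T) for this (state, action) pair
        if prob > best_prob:
            best_action, best_prob = a, prob
    return best_action
```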
And S3.2, adding the quantum meditation function module serving as a robot action selection module into a basic reinforcement learning model, and training the basic reinforcement learning model by taking a target search sample as a training set. The quantum meditation function is fused with the traditional reinforcement learning flow, so that the learning training efficiency and the path planning strategy quality can be improved.
As shown in fig. 4, the quantum meditation function is fused into the conventional reinforcement learning process as the robot action selection module. During reinforcement learning training, the current state s_t is passed to the quantum meditation function module, which returns the optimal robot action a_t. Then, under the reinforcement learning framework, the robot performs the optimal action in the simulation environment and obtains the reward r and the next state s_t+1, and the high-quality target search sample (s_t, a_t, s_t+1, r) is stored for training the path planning strategy.
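A sketch of this fusion into the training loop follows; env, agent and select_action are assumed interfaces (a Gym-style reset/step environment and an agent with store/update methods), not components named in the patent, and select_action would wrap the quantum meditation module sketched above.

```python
def train(env, agent, select_action, episodes=100):
    """Reinforcement learning loop with action selection delegated to the
    quantum meditation module (select_action); interfaces are assumed."""
    for _ in range(episodes):
        s_t = env.reset()
        done = False
        while not done:
            a_t = select_action(s_t)            # quantum meditation picks the action
            s_t1, r, done = env.step(a_t)       # environment returns reward and next state
            agent.store((s_t, a_t, s_t1, r))    # high-quality target search sample
            agent.update()                      # update the path-planning policy
            s_t = s_t1
```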
It should be noted that, in the quantum domain, the reward is a probability value, and the high-quality target search samples are obtained through this probability; the reward in the reinforcement learning process is the reward returned by the real environment and is not a probability value. The reward function in the reinforcement learning training provided by the invention considers not only the probability of the robot reaching the target position, but also the length of the path and the time to reach the target point, as described in detail below.
In a preferred embodiment of the invention, the basic reinforcement learning model may be trained using the temporal-difference prediction error (TD prediction error). The TD prediction error and the reinforcement learning parameter update are given by the following equations:
TD_error = r + γV(s_t+1) − V(s_t)
V(s_t) ← V(s_t) + α[r + γV(s_t+1) − V(s_t)]
where r represents the reward returned by the real environment, γ represents the discount factor for future cumulative rewards, α represents the reinforcement learning rate, V(s_t+1) represents the state value predicted at time t+1 in the reinforcement learning process, V(s_t) represents the state value predicted at time t in the reinforcement learning process, and TD_error represents the difference between the reinforcement learning expected value and the predicted value.
In one example of the present invention, the environment reward r is set as follows: the reward is +1 when the robot reaches the target position, and −0.01 otherwise. The reinforcement learning rate α is set to 0.01 in one embodiment of the invention.
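A minimal tabular TD(0) sketch using the reward scheme and learning rate stated above follows; the discount factor γ = 0.99 is an assumed value, not one given in the text.

```python
from collections import defaultdict

alpha, gamma = 0.01, 0.99       # learning rate from the text; gamma is assumed
V = defaultdict(float)          # tabular state-value estimates V(s)

def td_update(s_t, r, s_t1):
    """V(s_t) <- V(s_t) + alpha * [r + gamma * V(s_t+1) - V(s_t)]"""
    td_error = r + gamma * V[s_t1] - V[s_t]
    V[s_t] += alpha * td_error
    return td_error

# Example transition: one step that does not reach the goal (reward -0.01)
print(td_update(s_t=(2, 3), r=-0.01, s_t1=(2, 4)))
```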
The loss function of the reinforcement learning training described above may be minimized iteratively using the Adam optimization method.
Referring to fig. 5, the abscissa in fig. 5 represents different environment dimensions and the ordinate represents the number of training rounds required. Model one is a deep reinforcement learning baseline with a replay buffer (abbreviated DQN.w); model two is a deep reinforcement learning baseline with a replay buffer and Monte Carlo tree search (abbreviated MCT DQN.w); and the model proposed by the present invention is a deep reinforcement learning model with a replay buffer and quantum search (abbreviated Quantum DQN.w). As can be seen from fig. 5, for different environment dimensions, the number of training rounds required by the path planning method provided by the invention is smaller than that required by model one and model two, which indicates that the method can quickly obtain high-quality target search samples for path planning strategy training, reduce the number of training rounds, and improve the accuracy and speed of the robot path planning method.
In one embodiment of the present invention, a planned path of the robot from the initial position to the target position is obtained using the robot path planning model determined in any of the above embodiments.
In one embodiment of the invention, a robot path planning device is provided, wherein a robot path planning model is arranged in the robot path planning device; wherein the robot path planning model is obtained by:
quantizing the real environment and the robot state, including dynamically mapping the real world environment and the robot state to a quantum field to establish a quantum field;
Searching in the quantum domain by utilizing a quantum superposition principle based on a preset quantum search model and obtaining a plurality of search samples, wherein each search sample corresponds to the probability of the robot reaching the target point position;
Determining one or more target search samples from a plurality of search samples according to the probability that the robot reaches the target point position;
And a basic reinforcement learning model is predetermined, the basic reinforcement learning model is trained by utilizing the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model.
In one embodiment of the invention, a robot path planning model training apparatus is provided that includes a quantization module, a quantum search module, and a reinforcement learning module. Wherein:
the quantization module is configured to quantize real world environments and robot states, including dynamically mapping real world environments and robot states to quantum fields to create quantum domains;
The quantum search module is preset with a quantum search model, is configured to search in the quantum domain by utilizing a quantum superposition principle and obtain a plurality of search samples, each search sample corresponds to the probability of a robot reaching a target point position, and is further configured to determine one or more target search samples from the plurality of search samples according to the probability of the robot reaching the target point position;
The reinforcement learning module is preset with a basic reinforcement learning model, and is configured to train the basic reinforcement learning model by utilizing the quantum search model and the target search sample, and determine that the trained reinforcement learning model is a robot path planning model.
In one embodiment of the invention, a computer readable storage medium is provided for storing program instructions configured to be invoked by a processor to perform the steps of the method of any of the embodiments above.
The robot path planning apparatus, the robot path planning model training apparatus, and the computer-readable storage medium embodiment described above are the same invention concept as the robot path planning method embodiment based on the quantum search and the reinforcement learning, and the entire contents of the robot path planning method embodiment based on the quantum search and the reinforcement learning are incorporated into the robot path planning apparatus, the robot path planning model training apparatus, and the computer-readable storage medium embodiment by way of reference.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely illustrative of the embodiments of this application and it will be appreciated by those skilled in the art that variations and modifications may be made without departing from the principles of the application, and it is intended to cover all modifications and variations as fall within the scope of the application.

Claims (15)

1.一种基于量子搜索和强化学习的机器人路径规划方法,其特征在于,包括以下步骤:1. A robot path planning method based on quantum search and reinforcement learning, characterized in that it includes the following steps: 将真实环境和机器人状态量子化,包括将现实世界的环境和机器人状态动态映射到量子领域建立量子域;Quantize the real environment and robot state, including dynamically mapping the real-world environment and robot state to the quantum domain to establish a quantum domain; 基于预设的量子搜索模型,利用量子叠加原理在所述量子域中进行搜索并获得多个搜索样本,每个搜索样本对应有机器人到达目标位置的概率;Based on a preset quantum search model, the quantum superposition principle is used to search in the quantum domain and obtain multiple search samples, each of which corresponds to the probability that the robot reaches the target position; 根据所述机器人到达目标位置的概率,从多个搜索样本中确定一个或多个目标搜索样本;Determine one or more target search samples from a plurality of search samples according to the probability that the robot reaches the target position; 预先确定基础强化学习模型,并利用所述量子搜索模型和所述目标搜索样本对所述基础强化学习模型进行训练,并确定训练后的强化学习模型为机器人路径规划模型。A basic reinforcement learning model is predetermined, and the basic reinforcement learning model is trained using the quantum search model and the target search sample, and the trained reinforcement learning model is determined to be a robot path planning model. 2.根据权利要求1所述的基于量子搜索和强化学习的机器人路径规划方法,其特征在于,在将机器人状态量子化过程中,对机器人状态进行编码,将机器人状态表示为量子叠加态;2. The robot path planning method based on quantum search and reinforcement learning according to claim 1 is characterized in that, in the process of quantizing the robot state, the robot state is encoded and represented as a quantum superposition state; 基于预设的量子搜索模型,利用所述量子叠加态在所述量子域中进行并行搜索并获得多个搜索样本,其中,所述并行搜索为控制机器人同时探索所述量子域中的多个位置。Based on a preset quantum search model, the quantum superposition state is used to perform a parallel search in the quantum domain and obtain multiple search samples, wherein the parallel search is to control the robot to simultaneously explore multiple positions in the quantum domain. 3.根据权利要求1所述的基于量子搜索和强化学习的机器人路径规划方法,其特征在于,将现实世界的环境和机器人状态动态映射到量子领域建立量子域,所述量子域包括量子化仿真环境、量子态参数和量子态初始化信息;3. 
The robot path planning method based on quantum search and reinforcement learning according to claim 1 is characterized in that the real-world environment and robot state are dynamically mapped to the quantum field to establish a quantum domain, and the quantum domain includes a quantized simulation environment, quantum state parameters and quantum state initialization information; 所述量子化仿真环境的建立方法包括:基于机器人在真实环境中的可达性,对环境编码使环境中的每个位置均有一个唯一的二进制编码,该二进制编码与量子编码的正交基相对应,获得所述量子化仿真环境;The method for establishing the quantized simulation environment comprises: based on the reachability of the robot in the real environment, encoding the environment so that each position in the environment has a unique binary code, the binary code corresponds to an orthogonal basis of the quantum code, and obtaining the quantized simulation environment; 所述量子态参数通过将现实世界强化学习框架中的元素映射到所述量子化仿真环境中得到,现实世界强化学习框架中的元素包括当前时刻状态st、当前时刻动作at、下一时刻状态st+1以及奖励r;The quantum state parameters are obtained by mapping elements in a real-world reinforcement learning framework to the quantized simulation environment, where the elements in the real-world reinforcement learning framework include a current state s t , a current action a t , a next state s t+1 and a reward r; 所述量子态初始化信息通过将机器人在真实环境中的初始位置和目标位置以及真实环境中的障碍物分布信息映射到所述量子化仿真环境中得到。The quantum state initialization information is obtained by mapping the initial position and target position of the robot in the real environment and the obstacle distribution information in the real environment into the quantized simulation environment. 4.根据权利要求3所述的基于量子搜索和强化学习的机器人路径规划方法,其特征在于,通过邻接矩阵确定机器人在真实环境中的可达性并表示为以下公式:4. The robot path planning method based on quantum search and reinforcement learning according to claim 3 is characterized in that the reachability of the robot in the real environment is determined by the adjacency matrix and expressed as the following formula: 其中,A表示机器人在真实环境中的可达矩阵,Asurround代表周围环境的特征影响下的机器人可达矩阵,Aobs代表障碍分布影响下的机器人可达矩阵,Arobot代表机器人设置影响下的机器人可达性矩阵,代表元素乘积算子。Among them, A represents the reachability matrix of the robot in the real environment, A surround represents the reachability matrix of the robot under the influence of the characteristics of the surrounding environment, A obs represents the reachability matrix of the robot under the influence of obstacle distribution, and A robot represents the reachability matrix of the robot under the influence of robot settings. Represents the element-wise product operator. 5.根据权利要求3所述的基于量子搜索和强化学习的机器人路径规划方法,其特征在于,还包括设置机器人初始时刻所在位置对应的概率值大于其它位置对应的概率值;和/或,5. The robot path planning method based on quantum search and reinforcement learning according to claim 3 is characterized in that it also includes setting the probability value corresponding to the robot's initial position to be greater than the probability values corresponding to other positions; and/or, 设置退出节点,使机器人到达目标位置时,下一时刻机器人以100%的概率转移到退出节点。Set the exit node so that when the robot reaches the target position, the robot will transfer to the exit node with a 100% probability at the next moment. 6.根据权利要求1所述的基于量子搜索和强化学习的机器人路径规划方法,其特征在于,基于开放系统主方程,采用量子随机游走模型和经典随机游走模型构建所述量子搜索模型,所述量子搜索模型通过以下公式表示:6. 
6. The robot path planning method based on quantum search and reinforcement learning according to claim 1, characterized in that the quantum search model is constructed from a quantum random walk model and a classical random walk model based on the open-system master equation, and the quantum search model is expressed by the following formula:

dρ/dt = (1-p)LQW(ρ) + pLCRW(ρ) + Ltarget(ρ)

where ρ represents the probability distribution of the robot state, 1-p represents the weighting coefficient of the quantum random walk model and p represents the weighting coefficient of the classical random walk model; LQW(ρ) = -i[A, ρ] represents the quantum random walk model, where A represents the reachability matrix of the robot in the quantized simulation environment; LCRW(ρ) = Σij (Lij ρ Lij† - ½{Lij†Lij, ρ}) represents the classical random walk model, where Lij = (A′ij/dj)|i><j|, dj represents the degree of position j, and |i> represents the i-th orthogonal basis state of the Hilbert space; Ltarget(ρ) = 2|n+1><n|ρ|n><n+1| - {|n><n|, ρ} indicates that, when the robot state is at the target position (n), the robot state irreversibly transfers to the exit node (n+1) at the next moment.

7. The robot path planning method based on quantum search and reinforcement learning according to claim 6, characterized in that, in the quantized simulation environment, the probability of the robot being located at each grid cell is given by the diagonal elements of the probability distribution ρ, i.e. the elements ρij with i = j;
after a parallel search of duration T, the probability p(T) of the robot reaching the target position in the quantized simulation environment is expressed as p(T) = Σn ρnn(T), where n ranges over all target positions in the environment.

8. The robot path planning method based on quantum search and reinforcement learning according to claim 1, characterized in that determining one or more target search samples from the multiple search samples by comparing the probabilities of the robot reaching the target position includes:
determining the search sample corresponding to the highest probability of the robot reaching the target position as the target search sample; or,
determining the search samples corresponding to all probabilities of the robot reaching the target position that are greater than a preset threshold as the target search samples; or,
sorting the probabilities of the robot reaching the target position from high to low, and taking the search samples corresponding to the top k probabilities as the target search samples, where k ≥ 2 and k is a positive integer.
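The evolution equation of claim 6, the arrival probability of claim 7 and the top-k selection of claim 8 could be prototyped with a simple Euler integration such as the sketch below. The Lindblad-style form of the classical-walk term, the single target/exit pair and every identifier are assumptions made for illustration, not a definitive implementation of the claimed model.

```python
import numpy as np

def step_master_equation(rho, A, L_ops, target, exit_node, p, dt):
    """One Euler step of dρ/dt = (1-p)·LQW(ρ) + p·LCRW(ρ) + Ltarget(ρ).
    LQW is the coherent commutator with the reachability matrix, LCRW a
    dissipative sum over the hopping operators Lij, and the last term
    irreversibly moves population from the target node to the exit node."""
    l_qw = -1j * (A @ rho - rho @ A)
    l_crw = sum(L @ rho @ L.conj().T
                - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L)
                for L in L_ops)
    n, e = target, exit_node
    sink = np.zeros_like(rho)
    sink[e, e] += 2.0 * rho[n, n]   # population gained by the exit node
    sink[n, :] -= rho[n, :]         # anticommutator part: drain row n
    sink[:, n] -= rho[:, n]         # anticommutator part: drain column n
    return rho + dt * ((1 - p) * l_qw + p * l_crw + sink)

def target_probability(rho, targets):
    """p(T): summed diagonal population on the target positions (claim 7)."""
    return float(np.real(sum(rho[n, n] for n in targets)))

def pick_top_k(samples, probs, k=2):
    """Keep the k search samples with the highest arrival probability (claim 8)."""
    order = np.argsort(np.asarray(probs))[::-1][:k]
    return [samples[i] for i in order]
```

Repeated calls to step_master_equation up to time T, followed by target_probability, give one arrival probability per candidate search sample; pick_top_k then retains the best candidates as the training set.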
9. The robot path planning method based on quantum search and reinforcement learning according to claim 1, characterized in that the structure of the target search sample is (st_b, at_b, st+1_b, rb), where st_b is the current state in the quantized simulation environment corresponding to the target search sample, at_b is the current action of the robot in the quantized simulation environment corresponding to the target search sample, st+1_b is the next state in the quantized simulation environment corresponding to the target search sample, including the robot state and the environment state, and rb is the reward returned by the real environment corresponding to the target search sample.

10. The robot path planning method based on quantum search and reinforcement learning according to claim 1, characterized in that training the basic reinforcement learning model using the quantum search model and the target search samples includes the following steps:
encapsulating the quantum search model and the target search samples into a quantum meditation function module, the input of the quantum meditation function module being the current state and its output being the optimal robot action;
adding the quantum meditation function module to the basic reinforcement learning model as the robot action selection module, and training the basic reinforcement learning model with the target search samples as the training set.

11. The robot path planning method based on quantum search and reinforcement learning according to claim 10, characterized in that the optimal robot action is determined as follows:
based on the quantum search model, with the current state as input, the robot traverses all possible actions in the quantum field and obtains, for each action, the probability of reaching the target position when that action is chosen; according to the magnitude of the probability of reaching the target position corresponding to each robot action, the robot action with the highest probability of reaching the target position at the current moment is determined as the optimal robot action.

12. The robot path planning method based on quantum search and reinforcement learning according to any one of claims 1-11, characterized in that the robot path planning model is used to obtain a planned path of the robot from the initial position to the target position.
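How the quantum meditation function module of claims 10-11 might sit inside a reinforcement-learning loop is sketched below. Here quantum_search_model is assumed to return the arrival probability for a (state, action) pair, and the agent/replay interfaces (observe, step, update, append) are hypothetical placeholders introduced only for this example, not part of the disclosed method.

```python
import numpy as np

def quantum_meditation(state, actions, quantum_search_model):
    """Evaluate every candidate action with the quantum search model and
    return the action whose predicted probability of reaching the target
    position is highest (claims 10-11)."""
    arrival_probs = [quantum_search_model(state, a) for a in actions]
    return actions[int(np.argmax(arrival_probs))]

def train_step(agent, replay, quantum_search_model, actions):
    """One illustrative update: the meditation module selects the action,
    the transition is stored as (s_t, a_t, s_t+1, r), and the agent is
    trained on the accumulated target search samples (claim 9 structure)."""
    s_t = agent.observe()                      # hypothetical interface
    a_t = quantum_meditation(s_t, actions, quantum_search_model)
    s_next, r = agent.step(a_t)                # hypothetical interface
    replay.append((s_t, a_t, s_next, r))
    agent.update(replay)                       # hypothetical interface
    return s_next, r
```

Because the action with the highest predicted arrival probability is chosen instead of a random exploratory action, fewer training rounds should be wasted on transitions that never approach the target, which is the efficiency gain the claims describe.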
13. A robot path planning device, characterized in that a robot path planning model is provided in the robot path planning device, wherein the robot path planning model is obtained by:
quantizing the real environment and the robot state, including dynamically mapping the real-world environment and the robot state to the quantum field to establish a quantum domain;
based on a preset quantum search model, searching in the quantum domain using the quantum superposition principle to obtain multiple search samples, each search sample corresponding to a probability that the robot reaches the target position;
determining one or more target search samples from the multiple search samples according to the probability that the robot reaches the target position;
predetermining a basic reinforcement learning model, training the basic reinforcement learning model using the quantum search model and the target search samples, and determining the trained reinforcement learning model as the robot path planning model.

14. The robot path planning device according to claim 13, characterized in that the robot path planning model is obtained by the robot path planning method based on quantum search and reinforcement learning according to any one of claims 2-11.

15. A computer-readable storage medium for storing program instructions, characterized in that the program instructions are configured to be called by a processor to execute the steps of the method according to any one of claims 1 to 12.
CN202411105626.9A 2024-08-12 2024-08-12 Robot path planning method, device and medium based on quantum search and reinforcement learning Active CN118999570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411105626.9A CN118999570B (en) 2024-08-12 2024-08-12 Robot path planning method, device and medium based on quantum search and reinforcement learning

Publications (2)

Publication Number Publication Date
CN118999570A true CN118999570A (en) 2024-11-22
CN118999570B (en) 2025-04-22

Family

ID=93477576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411105626.9A Active CN118999570B (en) 2024-08-12 2024-08-12 Robot path planning method, device and medium based on quantum search and reinforcement learning

Country Status (1)

Country Link
CN (1) CN118999570B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1672171A (en) * 2002-07-31 2005-09-21 雅马哈发动机株式会社 Intelligent electromechanical control suspension system based on quantum soft computing
US20210365606A1 (en) * 2019-12-09 2021-11-25 Nanjing Normal University Quantum computing method for expressway traffic flow distribution simulation considering destination selection
CN111260065A (en) * 2020-01-09 2020-06-09 湖北师范大学 A two-dimensional quantum convolution calculation method
CN112651508A (en) * 2020-01-10 2021-04-13 腾讯科技(深圳)有限公司 Prediction method, device, equipment and storage medium of adiabatic evolution path
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 A path planning method and system based on reinforcement learning and heuristic search
CN115907016A (en) * 2021-09-30 2023-04-04 合肥本源量子计算科技有限责任公司 A method and related device for searching target range value based on quantum computing
CN115409187A (en) * 2022-08-03 2022-11-29 中国人民解放军战略支援部队信息工程大学 Quantum machine learning simulation method and system based on OpenMP parallel model
CN118426455A (en) * 2023-06-12 2024-08-02 中科视拓(北京)科技有限公司 Underwater path planning method based on quantum behavior particle swarm algorithm
CN117313887A (en) * 2023-10-27 2023-12-29 成都信息工程大学 Quantum measurement learning method based on fuzzy learning
CN118014123A (en) * 2024-01-12 2024-05-10 深圳新奥能源科技有限公司 Energy consumption data early warning method, system, equipment and medium based on smart home

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHAO HUANG et al.: "Quantum Exploration-based Reinforcement Learning for Efficient Robot Path Planning in Sparse-Reward Environment", 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, 30 August 2024 (2024-08-30) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120279557A (en) * 2025-03-19 2025-07-08 北京科曼达电子技术有限公司 Mobile terminal information extraction system and method based on OCR and local large language model

Also Published As

Publication number Publication date
CN118999570B (en) 2025-04-22

Similar Documents

Publication Publication Date Title
Mousavi et al. Traffic light control using deep policy‐gradient and value‐function‐based reinforcement learning
CN109540150B (en) A multi-robot path planning method applied to hazardous chemicals environment
Husbands et al. Controlling Autonomous Robots
Zhao et al. ASPW-DRL: assembly sequence planning for workpieces via a deep reinforcement learning approach
CN111664852A (en) Unmanned aerial vehicle path planning method and device
Nearchou Adaptive navigation of autonomous vehicles using evolutionary algorithms
CN117371895A (en) Multi-ground unmanned vehicle path planning method, system and medium in unknown environment
CN118999570B (en) Robot path planning method, device and medium based on quantum search and reinforcement learning
Wang et al. Research on dynamic path planning of wheeled robot based on deep reinforcement learning on the slope ground
Zhao et al. An improved ant colony algorithm based on Q-Learning for route planning of autonomous vehicle
Handley The genetic planner: The automatic generation of plans for a mobile robot via genetic programming
Paul et al. Deep reinforcement learning based cooperative control of traffic signal for multi‐intersection network in intelligent transportation system using edge computing
Sinkar et al. Multi-agent path finding using dynamic distributed deep learning model
Jin et al. WOA-AGA algorithm design for robot path planning
CN117688823B (en) Rock-soil particle track prediction method, electronic equipment and medium
Liu Shortest path selection algorithm for cold chain logistics transportation based on improved artificial bee colony
Cao et al. Research on Optimization Model of Supply Chain Robot Task Allocation Based on Genetic Algorithm and Software Practice
Fihri et al. Particle Swarm Algorithm Setting using Deep Reinforcement Learning in the Artificial Neural Network Optimization Learning Process.
Talamini et al. Selfish vs. global behavior promotion in car controller evolution
Setiawan et al. Exploring Deep Q-Network for Autonomous Driving Simulation Across Different Driving Modes
CN118670400B (en) Multi-agent route planning method and device based on artificial potential field and PPO
Enache et al. Creating an Artificial Evolutionary Intelligence Using a Graphic Engine
Davis DEVELOPMENT OF A 3D SIMULATION TOOL LEVERAGING GENETIC ALGORITHMS FOR NEURAL NETWORK OPTIMIZATION IN AUTOMATED DRIVING SYSTEMS: AN APPLICATION FOR SCENARIO ANALYSIS
Ibrahim et al. Features of microscopic pedestrian movement in a panic situation based on cellular automata model
Sarti Neuroevolution trajectory networks: illuminating the evolution of artificial neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant