
CN119557599A - Explainable UAV mission decision method and device - Google Patents


Info

Publication number
CN119557599A
CN119557599A
Authority
CN
China
Prior art keywords
decision
data set
task
feature
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510088173.1A
Other languages
Chinese (zh)
Inventor
杨阳
蔡怀广
白江波
章路
张文生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority application: CN202510088173.1A
Publication: CN119557599A
Legal status: Pending


Classifications

    • G06F 18/20 — Physics; Computing or calculating; Electric digital data processing; Pattern recognition; Analysing
    • G06F 18/2115 — Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/092 — Neural networks; Learning methods; Reinforcement learning
    • Y02T 10/40 — Climate change mitigation technologies related to transportation; Engine management systems


Abstract

The application discloses an interpretable unmanned aerial vehicle task decision method and device. The method comprises: obtaining a first input data set for an unmanned aerial vehicle decision task; inputting the first input data set into a pre-constructed first preset decision model to obtain a first strategy data set for outputting a decision result, where the first strategy data set comprises an adjustment scheme indicating the next action of the unmanned aerial vehicle for a target task; inputting the first strategy data set and the first input data set into a feature analysis model based on the SHAP algorithm to obtain a decision feature data set for the unmanned aerial vehicle decision task, where the decision feature data set comprises unmanned aerial vehicle decision feature data corresponding to the adjustment scheme; and constructing a second preset decision model based on a decision tree algorithm using the decision feature data set, to obtain decision tree data presenting the decision path and decision basis for the adjustment scheme, which serves as an interpretation data set for explaining the unmanned aerial vehicle decision task.

Description

Interpretable unmanned aerial vehicle task decision method and device
Technical Field
The present disclosure relates generally to the field of artificial intelligence algorithm technology and natural language processing technology, and more particularly, to an interpretable unmanned aerial vehicle task decision method and apparatus.
Background
In the related art, with the rapid development of artificial intelligence technologies, the processing of various intelligent tasks has become a prominent problem. Some intelligent decision methods and systems have been proposed for handling unmanned aerial vehicle tasks in the unmanned aerial vehicle field. Traditional intelligent decision systems rely primarily on rule-based systems and expert systems, making decisions via predefined sets of rules and logic. Although these systems offer strong interpretability and perform reliably in certain specific scenarios, their lack of adaptivity and flexibility makes it difficult for them to cope with complex environments such as unmanned aerial vehicle task decision-making.
For decisions on unmanned aerial vehicle tasks involving complex environments, interpretability research into such task-related decisions is needed, and an improved unmanned aerial vehicle task decision method is therefore required.
Disclosure of Invention
The embodiments of the present disclosure provide an interpretable unmanned aerial vehicle task decision method and device, which use the SHAP algorithm to perform feature analysis on dynamic decision results for unmanned aerial vehicle task data comprising time series and dynamic environments, thereby providing a comprehensive interpretation of the decision process of complex unmanned aerial vehicle adjustment decision schemes.
In one general aspect, an interpretable unmanned aerial vehicle task decision method is provided. The method comprises: obtaining a first input data set for an unmanned aerial vehicle decision task, where the first input data set is unmanned aerial vehicle task data based on a time series and a dynamic environment and stored in tabular form, the task data comprising position data for a target task, position data of at least one obstacle related to the target task, current route data related to the target task, and time data related to the target task; inputting the first input data set into a pre-constructed first preset decision model to obtain a first strategy data set for outputting a decision result, where the first strategy data set comprises an adjustment scheme indicating the next action of the unmanned aerial vehicle for the target task; inputting the first strategy data set and the first input data set into a feature analysis model based on the SHAP algorithm to obtain a decision feature data set for the unmanned aerial vehicle decision task, where the decision feature data set comprises unmanned aerial vehicle decision feature data corresponding to the adjustment scheme; and constructing a second preset decision model based on a decision tree algorithm using the decision feature data set, to obtain decision tree data presenting a decision path and decision basis for the adjustment scheme as an interpretation data set for explaining the unmanned aerial vehicle decision task.
Optionally, the step of inputting the first strategy data set and the first input data set into the SHAP-based feature analysis model to obtain the decision feature data set for the unmanned aerial vehicle decision task may include: for each strategy in the first strategy data set, extracting the current input data corresponding to that strategy from the first input data set; performing SHAP-value-based feature analysis on the current input data of each strategy, and performing feature extraction based on the marginal contributions of the different features analyzed under the different strategies, to obtain the feature set with the greatest influence on the current strategy as the decision feature data set for that strategy's current input data; and aggregating the decision feature data sets for the current input data of all strategies to obtain the decision feature data set for the unmanned aerial vehicle decision task.
Optionally, the marginal contribution of each feature contained in the feature set with the greatest influence on the current strategy may be calculated by the following formula:

$$\varphi_i(U)=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(U(S\cup\{i\})-U(S)\bigr)$$

where $\varphi_i$ denotes the marginal contribution function corresponding to feature $i$, $U$ denotes the utility function obtained after cooperation of a preset feature subset, $N$ denotes the complete feature set, $n$ denotes the number of features in the complete set, $S$ denotes a feature subset and $|S|$ its number of features, and $i$ denotes the index of the current feature.
Optionally, the construction process of the second preset decision model may include: selecting a plurality of features from the decision feature data set; constructing each node of a decision tree according to preset evaluation parameters for the plurality of features; and ending the construction process when a preset termination condition is met.
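As an illustrative aside (not part of the claimed method), the node-construction and termination logic described above can be sketched in Python, using Gini impurity as a stand-in for the "preset evaluation parameters" and a purity/maximum-depth check as the "preset termination condition"; all names and data below are hypothetical:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Find the (feature index, threshold) minimising weighted Gini impurity."""
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build nodes until the preset termination condition holds."""
    if gini(labels) == 0.0 or depth >= max_depth:   # termination: pure node or max depth
        return max(set(labels), key=labels.count)    # leaf: majority class
    f, t, _ = best_split(rows, labels)
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

# Hypothetical decision-feature rows: [distance_to_obstacle, deviation_from_course]
rows = [[1.0, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
labels = ["hold_course", "hold_course", "adjust_course", "adjust_course"]
tree = build_tree(rows, labels)
```

On these toy rows the root node splits on the obstacle-distance feature, matching the intuition that proximity to an obstacle is what drives a course adjustment; each interior node of the resulting tree is one presentable step of the decision path.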
Alternatively, the first preset decision model may include at least one of a reinforcement learning algorithm-based decision model, a deep learning algorithm-based decision model, and a Bayesian network algorithm-based decision model.
Optionally, the first preset decision model may include a decision model based on a reinforcement learning algorithm. In this case, the step of inputting the first input data set into the pre-constructed first preset decision model to obtain the first strategy data set may include: performing reinforcement learning operations comprising action, feedback, adjustment, and re-action based on the first input data set, to obtain a decision result that satisfies the condition of maximizing a preset cumulative reward objective as the first strategy data set.
Optionally, the first preset decision model may comprise a decision model based on a deep learning algorithm. In this case, the first preset decision model may be trained by: acquiring a first sample data set for an unmanned aerial vehicle decision task, where the first sample data set is sample unmanned aerial vehicle task data based on a time series and a dynamic environment and stored in tabular form, the sample task data comprising position data for a preset task, position data of at least one obstacle related to the preset task, current course data related to the preset task, and time data related to the preset task; determining a corresponding sample strategy data set based on the first sample data set, where the corresponding sample strategy data set comprises an adjustment scheme indicating the next action of the unmanned aerial vehicle for the preset task; calculating a preset loss function based on the first sample data set and the corresponding sample strategy data set; and adjusting the model parameters of the first preset decision model according to the preset loss function to obtain the trained first preset decision model.
In another general aspect, there is provided an interpretable unmanned aerial vehicle task decision device. The device comprises: a data acquisition module configured to acquire a first input data set for an unmanned aerial vehicle decision task, where the first input data set is unmanned aerial vehicle task data based on a time series and a dynamic environment and stored in tabular form, the task data comprising position data for a target task, position data of at least one obstacle related to the target task, current route data related to the target task, and time data related to the target task; a decision generation module configured to input the first input data set into a pre-constructed first preset decision model to obtain a first strategy data set for outputting a decision result, where the first strategy data set comprises an adjustment scheme indicating the next action of the unmanned aerial vehicle for the target task; a feature analysis module configured to input the first strategy data set and the first input data set into a feature analysis model based on the SHAP algorithm to obtain a decision feature data set for the unmanned aerial vehicle decision task, where the decision feature data set comprises unmanned aerial vehicle decision feature data corresponding to the adjustment scheme; and a decision interpretation module configured to construct a second preset decision model based on a decision tree algorithm using the decision feature data set, to obtain decision tree data presenting a decision path and decision basis for the adjustment scheme as an interpretation data set for explaining the unmanned aerial vehicle decision task.
Optionally, the operation of inputting the first strategy data set and the first input data set into the SHAP-based feature analysis model to obtain the decision feature data set for the unmanned aerial vehicle decision task may include: for each strategy in the first strategy data set, extracting the current input data corresponding to that strategy from the first input data set; performing SHAP-value-based feature analysis on the current input data of each strategy, and performing feature extraction based on the marginal contributions of the different features analyzed under the different strategies, to obtain the feature set with the greatest influence on the current strategy as the decision feature data set for that strategy's current input data; and aggregating the decision feature data sets for the current input data of all strategies to obtain the decision feature data set for the unmanned aerial vehicle decision task.
Optionally, the marginal contribution of each feature contained in the feature set with the greatest influence on the current strategy may be calculated by the following formula:

$$\varphi_i(U)=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(U(S\cup\{i\})-U(S)\bigr)$$

where $\varphi_i$ denotes the marginal contribution function corresponding to feature $i$, $U$ denotes the utility function obtained after cooperation of a preset feature subset, $N$ denotes the complete feature set, $n$ denotes the number of features in the complete set, $S$ denotes a feature subset and $|S|$ its number of features, and $i$ denotes the index of the current feature.
Optionally, the construction process of the second preset decision model may include: selecting a plurality of features from the decision feature data set; constructing each node of a decision tree according to preset evaluation parameters for the plurality of features; and ending the construction process when a preset termination condition is met.
Alternatively, the first preset decision model may include at least one of a reinforcement learning algorithm-based decision model, a deep learning algorithm-based decision model, and a Bayesian network algorithm-based decision model.
Optionally, the first preset decision model may include a reinforcement learning algorithm-based decision model. In this case, the operation of the decision generation module inputting the first input data set into the pre-constructed first preset decision model to obtain the first strategy data set may include: performing reinforcement learning operations comprising action, feedback, adjustment, and re-action based on the first input data set, to obtain a decision result that satisfies the condition of maximizing a preset cumulative reward objective as the first strategy data set.
Optionally, the first preset decision model may comprise a decision model based on a deep learning algorithm. In this case, the first preset decision model may be trained by: acquiring a first sample data set for an unmanned aerial vehicle decision task, where the first sample data set is sample unmanned aerial vehicle task data based on a time series and a dynamic environment and stored in tabular form, the sample task data comprising position data for a preset task, position data of at least one obstacle related to the preset task, current course data related to the preset task, and time data related to the preset task; determining a corresponding sample strategy data set based on the first sample data set, where the corresponding sample strategy data set comprises an adjustment scheme indicating the next action of the unmanned aerial vehicle for the preset task; calculating a preset loss function based on the first sample data set and the corresponding sample strategy data set; and adjusting the model parameters of the first preset decision model according to the preset loss function to obtain the trained first preset decision model.
In another general aspect, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the interpretable unmanned aerial vehicle task decision method described above.
In another general aspect, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the interpretable unmanned aerial vehicle task decision method described above.
In another general aspect, there is provided a computing device comprising at least one processor and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the interpretable unmanned aerial vehicle task decision method described above.
According to the interpretable unmanned aerial vehicle task decision method and device of the present disclosure, the SHAP algorithm is used to perform feature analysis on dynamic decision results for unmanned aerial vehicle task data comprising time series and dynamic environments, thereby providing a comprehensive interpretation of the decision process of complex unmanned aerial vehicle decision schemes. In addition, through the SHAP-based feature analysis, a user can clearly see the influence of each unmanned aerial vehicle decision feature corresponding to the adjustment scheme on the final decision, and thus understand the model's adjustment-decision logic, which increases trust in the model. Furthermore, by using the decision tree model to display the feature splits for the decision and the decision paths for the adjustment scheme, a detailed explanation of each adjustment decision point is provided, making it easy for the user to understand and trace the basis of each decision point.
Drawings
The foregoing and other objects and features of embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings in which the embodiments are shown, in which:
fig. 1 is a flow chart illustrating an interpretable unmanned aerial vehicle task decision method according to an embodiment of the present disclosure;
fig. 2 is a block diagram illustrating an interpretable unmanned aerial vehicle task decision device, according to an embodiment of the present disclosure;
fig. 3 is a block diagram illustrating a computing device according to an embodiment of the present disclosure.
Detailed Description
The following detailed description is provided to assist the reader in obtaining a thorough understanding of the methods, apparatus, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and/or systems described herein will be apparent after an understanding of the present disclosure. For example, the order of operations described herein is merely an example and is not limited to those set forth herein, but may be altered as will be apparent after an understanding of the disclosure of the application, except for operations that must occur in a specific order. Furthermore, descriptions of features known in the art may be omitted for clarity and conciseness.
Reference will now be made in detail to embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments will be described below in order to explain the present disclosure by referring to the figures.
In the related art, as described above, it is difficult for conventional decision systems to cope with complex dynamic environments in the unmanned aerial vehicle field. For decisions on unmanned aerial vehicle tasks involving such complex dynamic environments, an improved method that makes these task-related decisions interpretable is therefore needed.
In order to solve the problems in the related art, the present disclosure proposes an interpretable unmanned aerial vehicle task decision method capable of providing a comprehensive interpretation of a decision process of a complex unmanned aerial vehicle decision scheme by performing feature analysis on a dynamic decision result for unmanned aerial vehicle task data including a time sequence and a dynamic environment by using a SHAP algorithm.
That is, the interpretable unmanned aerial vehicle task decision method of the present disclosure extracts and analyzes, via the SHAP algorithm, the features of a black-box model (i.e., a model without interpretability) such as a reinforcement learning decision model, and uses the extracted features to retrace the entire decision process with a white-box model (a model with strong interpretability) such as a decision tree, thereby providing a comprehensive interpretation of the decision process of complex unmanned aerial vehicle decision schemes related to time series and dynamic environments.
An interpretable unmanned aerial vehicle task decision method and apparatus according to embodiments of the present disclosure are described in detail below with reference to fig. 1-3.
Fig. 1 is a flow chart illustrating an interpretable unmanned aerial vehicle task decision method 100, according to an embodiment of the present disclosure.
Referring to fig. 1, in step S101, a first input dataset for a drone decision task is acquired, according to an embodiment of the present disclosure. Here, the first input data set is unmanned aerial vehicle task data based on a time series and a dynamic environment stored in a tabular form. For example, the unmanned mission data includes location data for a target mission, location data for at least one obstacle associated with the target mission, current course data associated with the target mission, time data associated with the target mission.
As an example, the position data may include coordinate data: the position data for the target task may include target coordinate data, and the position data of the at least one obstacle may include coordinate data of the at least one obstacle. Further, the time data related to the target task may take different forms, such as time representations processed according to different rules. It should be noted that the data input to the model is in tabular form.
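For concreteness, a first input data set of the kind described above might be laid out as follows, assuming a pandas DataFrame as the tabular container; the column names and values are illustrative, not prescribed by the disclosure:

```python
# One possible tabular layout for the first input data set. Each row is a
# snapshot of the dynamic environment at one point in the time series.
import pandas as pd

first_input = pd.DataFrame({
    "t": [0, 1, 2],                          # time data (time-step index)
    "target_x": [100.0, 100.0, 100.0],       # target-task position coordinates
    "target_y": [200.0, 200.0, 200.0],
    "obstacle_x": [40.0, 42.5, 45.0],        # position of a moving obstacle
    "obstacle_y": [80.0, 80.0, 81.0],
    "course_deg": [45.0, 47.0, 50.0],        # current route/heading data
})

print(first_input.shape)  # (3, 6)
```

The obstacle columns change from row to row while the target columns stay fixed, which is exactly the time-series/dynamic-environment structure the first preset decision model consumes.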
By applying the interpretable decision method according to the present disclosure (described in detail later) to an unmanned aerial vehicle decision task that must consider a variety of factors (such as target location, dynamic obstacle location, and time series), the decision system analyzes input data including, for example, dynamic position coordinates and dynamic time series, provides a decision basis for the unmanned aerial vehicle's action adjustments, and helps formulate an optimal adjustment scheme for subsequent actions (e.g., the operations to perform and changes of course).
According to an embodiment of the present disclosure, in step S102, a first input data set is input to a first preset decision model constructed in advance, resulting in a first policy data set for outputting a decision result. Here, the first policy data set includes an adjustment scheme for indicating a next action of the drone for the target task.
As an example, the first preset decision model may include at least one of a reinforcement learning algorithm-based decision model, a deep learning algorithm-based decision model, and a Bayesian network algorithm-based decision model.
That is, the first preset decision model may include one of a decision model based on a reinforcement learning algorithm, a decision model based on a deep learning algorithm, and a decision model based on a Bayesian network algorithm, and preferably includes a decision model based on a reinforcement learning algorithm. An exemplary Bayesian network algorithm is the naïve Bayes algorithm, but the disclosure is not limited thereto.
According to the embodiments of the present disclosure, performing feature analysis with the SHAP (SHapley Additive exPlanations) algorithm, described later, on the unmanned aerial vehicle input data corresponding to decisions generated by at least one of these three kinds of decision models can reduce the computational complexity of producing the interpretation data and improve the computational efficiency and real-time applicability of the overall decision method.
Further, by performing feature analysis on input data corresponding to unmanned plane decisions generated based on a decision model of a deep learning algorithm using a SHAP algorithm to be described later, decision quality and interpretability of the entire model can be improved using feature extraction capability of deep learning.
Further, by performing feature analysis on input data corresponding to unmanned aerial vehicle decisions generated based on decision models of bayesian network algorithms using a SHAP algorithm to be described later, probability relationships between variables can be described using bayesian networks, and unmanned aerial vehicle decisions can be made by inference algorithms.
As one example, the first preset decision model may comprise a reinforcement learning algorithm-based decision model. In this case, step S102 may further comprise performing reinforcement learning operations comprising action, feedback, adjustment, and re-action based on the first input data set, to obtain a decision result that satisfies the condition of maximizing a preset cumulative reward objective as the first strategy data set.
That is, by employing reinforcement learning algorithms, an optimal drone decision strategy is generated from environmental conditions that include dynamic environmental changes associated with the drone mission (e.g., changes in the position and course of the drone over time, changes in the position of obstacles associated with the drone over time, etc.). The system gradually optimizes its decision process by constantly interacting with the environment to achieve the intended goal.
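The act–feedback–adjust–act-again loop described above can be illustrated with a deliberately tiny tabular Q-learning sketch; the one-dimensional corridor, reward values, and hyperparameters below are illustrative stand-ins for the unmanned aerial vehicle environment, not part of the disclosure:

```python
# Minimal Q-learning sketch of the reinforcement learning loop:
# action -> feedback (reward) -> adjustment (Q update) -> re-action.
import random

random.seed(0)
N_STATES, TARGET = 6, 5          # corridor cells 0..5, target at cell 5
ACTIONS = (-1, +1)               # adjust course: move left / move right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(2000):            # episodes of interaction with the environment
    s = 0
    while s != TARGET:
        if random.random() < eps:                         # occasional exploration
            a = random.choice(ACTIONS)
        else:                                             # greedy action
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)             # action
        r = 1.0 if s2 == TARGET else -0.1                 # feedback (reward)
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS)
                              - q[(s, a)])                # adjustment
        s = s2                                            # re-action from new state

# Greedy policy extracted from the learned Q-values.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy heads toward the target from every cell, i.e. the decision result maximizing the cumulative reward objective; in the disclosed method this learned policy plays the role of the first strategy data set.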
According to the embodiments of the present disclosure, combining a reinforcement-learning-based decision model with a SHAP-based feature analysis model to explain complex unmanned aerial vehicle task strategies involving time series and dynamic environments overcomes two limitations of the SHAP algorithm: that it is suited mainly to perception tasks rather than scenarios involving time-series and dynamic-environment data, and that it suffers from a heavy computational burden, high computational complexity, and low real-time efficiency.
As another example, the first preset decision model may include a decision model based on a deep learning algorithm, and in this case, step S102 may further include steps S1021 to S1024:
In step S1021, a first sample dataset for a drone decision task is acquired. Here, the first sample data set is sample unmanned aerial vehicle task data based on a time series and a dynamic environment stored in a table form, for example, the sample unmanned aerial vehicle task data includes position data for a preset task, position data of at least one obstacle related to the preset task, current course data related to the preset task, and time data related to the preset task.
In step S1022, a corresponding sample policy dataset is determined based on the first sample dataset. Here, the corresponding sample policy dataset comprises an adjustment scheme for indicating a next action of the drone for the preset task.
In step S1023, a preset loss function is calculated based on the first sample data set and the corresponding sample policy data set.
In step S1024, the model parameters of the first preset decision model are adjusted according to the preset loss function, so as to obtain a trained first preset decision model.
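Steps S1021–S1024 can be sketched end-to-end with a deliberately simplified stand-in model: a single linear layer trained by gradient descent on a mean-squared-error "preset loss function". The shapes, data, and learning rate are toy assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))          # S1021: sample task rows (4 toy features)
true_w = np.array([0.5, -1.0, 2.0, 0.0])
y = X @ true_w                        # S1022: corresponding sample strategy targets

w = np.zeros(4)                       # parameters of the stand-in decision model
for _ in range(200):
    pred = X @ w
    loss = np.mean((pred - y) ** 2)   # S1023: preset loss function (MSE)
    grad = 2 * X.T @ (pred - y) / len(X)
    w -= 0.1 * grad                   # S1024: adjust model parameters
```

After the loop, the parameters `w` have converged close to `true_w`, i.e. the trained first preset decision model reproduces the sample strategy data set; a real implementation would replace the linear layer with a deep network but keep the same four-step structure.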
According to an embodiment of the disclosure, at step S103, the first policy dataset and the first input dataset are input to a SHAP algorithm-based feature analysis model, resulting in a decision feature dataset for the unmanned aerial vehicle decision task. Here, the decision feature data set includes unmanned aerial vehicle decision feature data corresponding to the adjustment scheme.
As an example, step S103 may further include step S1031 and step S1032:
In step S1031, for each policy in the first policy data set, current input data corresponding to each policy is extracted from the first input data set.
In step S1032, SHAP-value-based feature analysis is performed on the current input data of each policy, and feature extraction is performed based on the marginal contributions of different features under different policies, so as to obtain the feature set having the greatest influence on the current policy of each policy as the decision feature data set for the current input data of that policy. The decision feature data sets for the current input data of all policies are then aggregated to obtain the decision feature data set for the unmanned aerial vehicle decision task.
Here, the SHAP algorithm provides a unified feature importance measure by attributing the contribution of each feature to the model output on the basis of Shapley value theory. Thus, the SHAP algorithm is used to perform feature analysis on the input data of a single decision policy: after the decision result is generated, the decision is subjected to feature importance analysis using the SHAP algorithm.
As an example, the marginal contribution for each feature contained in the feature set that affects the current policy to the greatest extent for each policy may be calculated by the following formula:
$$\varphi_i(U) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\left(U(S \cup \{i\}) - U(S)\right) \tag{1}$$

Wherein, $\varphi_i(U)$ represents the marginal contribution function corresponding to each feature, U represents the utility function obtained after cooperation of a preset feature subset, N represents the full feature set, n represents the number of features in the full set, S represents a feature subset and |S| its size, i represents the index of the current feature, and k is an integer greater than or equal to 1 and less than or equal to n.
Further, U(S) represents the utility obtained after cooperation of the feature subset S (e.g., a flight path predicted based on the neural network, a heading adjustment angle predicted based on the neural network, etc.).
In addition, the meaning of the above formula (1) is that, for n features, the Shapley value of each feature is calculated from the change in the result of executing the decision (i.e., the utility U(S) of the policy set) when the feature subset S is employed. Here, a larger value indicates that the corresponding feature is more important to the flight path adjustment decision and should therefore be preserved.
For example, in a scenario where there are multiple obstacles related to a target task, features corresponding to parameters of a portion of the multiple obstacles may be more important than features corresponding to parameters of another portion of the multiple obstacles, and thus features corresponding to those parameters of the portion of the multiple obstacles should be more preserved.
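For a small number of features, formula (1) can be evaluated exactly by enumerating all subsets. The additive utility function and the feature names below are illustrative assumptions, chosen so that each feature's Shapley value can be checked against its known individual contribution.

```python
from itertools import combinations
from math import factorial

def shapley_value(i, features, utility):
    """Exact marginal contribution of feature i per formula (1)."""
    n = len(features)
    others = [f for f in features if f != i]
    total = 0.0
    for k in range(n):
        for S in combinations(others, k):
            # weight |S|! (n - |S| - 1)! / n! for each coalition S not containing i
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (utility(set(S) | {i}) - utility(set(S)))
    return total

# Hypothetical additive utility: each feature contributes a fixed amount
contrib = {"obstacle1_x": 0.6, "obstacle2_x": 0.3, "time1": 0.1}
u = lambda S: sum(contrib[f] for f in S)

feats = list(contrib)
values = {f: shapley_value(f, feats, u) for f in feats}
print(values)  # for an additive utility, each Shapley value equals that feature's own contribution
```

For an additive utility the Shapley values also satisfy the efficiency property: they sum to U(N), the utility of the full feature set.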
According to the embodiment of the disclosure, through the SHAP-algorithm-based feature analysis, a user can clearly understand the influence of each unmanned aerial vehicle decision feature on the final unmanned aerial vehicle decision, thereby understanding the decision logic of the model and improving trust in the model. For example, given the policy set obtained by reinforcement learning (the adjustment scheme for the next action of the target task) and the data set from before reinforcement learning was performed (e.g., the unmanned aerial vehicle information and obstacle information related to the target task), the SHAP-algorithm-based feature analysis in the present disclosure identifies which information is more important.
According to an embodiment of the disclosure, in step S104, a second preset decision model based on a decision tree algorithm is constructed using the decision feature data set, and decision tree data for presenting a decision path and a decision basis for the adjustment scheme is obtained as an interpretation data set for interpreting the unmanned aerial vehicle decision task.
According to embodiments of the present disclosure, a comprehensive interpretation of the decision process of complex unmanned aerial vehicle decision tasks is provided by utilizing the SHAP algorithm to perform feature analysis on dynamic decision results for data including time series and dynamic environments.
In addition, by using the feature analysis result of the SHAP algorithm, the feature subset with the greatest influence on decision making can be extracted, and the features in the feature subset not only help understand the decision logic of the model, but also can be used for further model optimization and debugging.
In addition, by utilizing the decision tree model to display the feature segmentation and the decision paths, detailed explanation of each decision point is provided, and the basis of each decision point is convenient for users to understand and trace back.
For example, the construction process of the second preset decision model may include selecting a plurality of features from the decision feature dataset, constructing each node of the decision tree according to preset evaluation parameters for the plurality of features, ending the construction process until a preset termination condition is met.
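The construction process described above can be sketched as follows. A Gini impurity criterion stands in for the "preset evaluation parameters" and a purity/depth check for the "preset termination condition"; the two-feature data set and the "left"/"right" adjustment labels are illustrative assumptions.

```python
def gini(labels):
    """Impurity of a label list -- the preset evaluation parameter here."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Pick the (feature, threshold) pair minimizing weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    split = best_split(rows, labels)
    if split is None or depth >= max_depth:        # preset termination condition
        return max(set(labels), key=labels.count)  # leaf: majority decision
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build(*map(list, zip(*left)), depth + 1, max_depth),
            "right": build(*map(list, zip(*right)), depth + 1, max_depth)}

def decide(node, row):
    """Follow the decision path; the visited nodes form the decision basis."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

# rows = [obstacle1_x, current_heading] (hypothetical retained features); label = adjustment
rows = [[1.0, 10], [1.2, 15], [3.0, 80], [3.5, 85]]
labels = ["left", "left", "right", "right"]
tree = build(rows, labels)
print(decide(tree, [0.9, 12]))
```

Each internal node records the feature and threshold used for the split, which is what allows the decision path to be presented to a user as a traceable explanation.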
According to the embodiment of the disclosure, by constructing the decision tree model by utilizing the characteristics, the decision path and the decision basis for the unmanned aerial vehicle decision are shown through the tree structure, so that each unmanned aerial vehicle decision process can be clearly interpreted and understood, and the local interpretation capability for the unmanned aerial vehicle decision can be further enhanced. In addition, the calculation amount of the decision tree model in the prediction stage is small, the interpretation can be provided quickly, the method is suitable for real-time application, and the actual use efficiency and effect of the whole decision model (or called system) are improved.
Practical applications of the interpretable unmanned aerial vehicle task decision method according to embodiments of the present disclosure are illustrated below in conjunction with table 1 below. In this example, it is assumed that the number of obstacles is three.
TABLE 1
Referring to table 1, the first input data set for the unmanned aerial vehicle decision task includes unmanned aerial vehicle task data, specifically including location data for the target task (e.g., "target_x", "target_y" in table 1), location data for at least one obstacle associated with the target task (e.g., "obstacle 1_X", "obstacle 1_Y", "obstacle 2_X", "obstacle 2_Y", "obstacle 3_X", "obstacle 3_Y" in table 1), current course data associated with the target task (e.g., "current heading" in table 1), and time data associated with the target task (e.g., "time 1", "time 2", "time 3" in table 1).
Specifically, obstacle 1_X and obstacle 1_Y respectively represent X, Y coordinates of a first obstacle, obstacle 2_X and obstacle 2_Y respectively represent X, Y coordinates of a second obstacle, obstacle 3_X and obstacle 3_Y respectively represent X, Y coordinates of a third obstacle, target_x and target_y respectively represent X and Y coordinates of a target position of the unmanned aerial vehicle, a current heading represents a current heading angle of the unmanned aerial vehicle, and time 1, time 2 and time 3 respectively represent different normalized time representations. These parameters together form the basis for navigation and obstacle avoidance decisions of the unmanned aerial vehicle when performing tasks.
In this example, the first preset decision model may be a reinforcement learning based neural network and the first policy data set for outputting the decision result may be, for example, "generate decision" in table 1. Here, "generating a decision" means the final decision output, representing the next action or adjustment of the drone. The decision result is obtained by inputting the parameters into the neural network based on reinforcement learning, and the unmanned aerial vehicle can adjust the flight path of the unmanned aerial vehicle in real time so as to avoid the obstacle and smoothly reach the target position.
Subsequently, by inputting the above parameters and the generated decisions into the SHAP-algorithm-based feature analysis model, a decision feature data set for the unmanned aerial vehicle decision task is obtained. By calculating the Shapley value of each feature, it may be found, for example, that the absolute Shapley values of obstacle 3_X, obstacle 3_Y, and the time parameters are smaller, so that these data are secondary compared with the other data and can be ignored, while the other data are retained.
For example, the third obstacle has less influence on the decision result than the other two obstacles, and therefore can ignore the corresponding data, and similarly, the three times of different normalization have less influence on the decision result, and therefore can ignore the corresponding data.
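The pruning step described above, dropping features whose absolute Shapley value is small, can be sketched as a simple threshold filter. All numeric Shapley values and the 0.05 threshold below are illustrative assumptions, arranged to reproduce the outcome described in the text (obstacle 3 and the time features are ignored).

```python
# Hypothetical Shapley values for the Table 1 features (illustrative numbers only)
shap_values = {
    "obstacle1_x": 0.42, "obstacle1_y": 0.38,
    "obstacle2_x": 0.25, "obstacle2_y": 0.21,
    "obstacle3_x": 0.02, "obstacle3_y": -0.01,
    "current_heading": 0.30, "target_x": 0.35, "target_y": 0.33,
    "time1": 0.01, "time2": -0.02, "time3": 0.00,
}

threshold = 0.05  # assumed cutoff below which a feature is treated as secondary
retained = {f: v for f, v in shap_values.items() if abs(v) >= threshold}
dropped = sorted(set(shap_values) - set(retained))
print(dropped)  # obstacle 3 and the time features are ignored
```

The `retained` dictionary is what would then feed the construction of the second preset decision model.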
Finally, when the decision feature data set is utilized to construct a second preset decision model based on the decision tree algorithm, the data related to the parameters of 'obstacle 1_X', 'obstacle 1_Y', 'obstacle 2_X', 'obstacle 2_Y', 'current heading', 'target_x', 'target_y' can be utilized to construct the second preset decision model based on the decision tree algorithm, so that decision tree data for presenting a decision path and a decision basis for an adjustment scheme of the unmanned aerial vehicle can be obtained as an interpretation data set for interpreting the unmanned aerial vehicle decision task.
Fig. 2 is a block diagram illustrating an interpretable unmanned aerial vehicle task decision device 200, according to an embodiment of the present disclosure.
Referring to fig. 2, an interpretable unmanned aerial vehicle task decision device 200 according to an embodiment of the present disclosure may include a data acquisition module 210, a decision generation module 220, a feature analysis module 230, and a decision interpretation module 240.
According to an embodiment of the present disclosure, the data acquisition module 210 performs the operation of acquiring a first input data set for a drone decision task. Here, the first input data set is unmanned aerial vehicle task data based on a time series and a dynamic environment stored in a tabular form. For example, the unmanned mission data includes location data for a target mission, location data for at least one obstacle associated with the target mission, current course data associated with the target mission, time data associated with the target mission.
According to an embodiment of the present disclosure, the decision generation module 220 performs the operation of inputting a first input data set into a pre-trained first preset decision model, resulting in a first policy data set for outputting decision results. Here, the first policy data set includes an adjustment scheme for indicating a next action of the drone for the target task.
As an example, the first preset decision model may include at least one of a reinforcement learning algorithm-based decision model, a deep learning algorithm-based decision model, and a bayesian network algorithm-based decision model.
Further, where the first preset decision model comprises a reinforcement learning algorithm-based decision model, the operation of the decision generation module 220 inputting the first input data set into the pre-constructed first preset decision model to obtain a first policy data set for outputting a decision result may comprise performing reinforcement learning operations including actions, feedback, adjustments, and re-actions based on the first input data set to obtain a decision result satisfying a condition for maximizing a preset cumulative rewards objective as the first policy data set.
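The action-feedback-adjustment-re-action loop can be sketched with tabular Q-learning on a toy environment. The one-dimensional corridor (the drone starts at cell 0, the target is at cell 4) is an illustrative assumption, not the environment of the disclosure; the loop converges to the policy that maximizes the preset cumulative reward by moving toward the target.

```python
import random

random.seed(0)
N_STATES, TARGET = 5, 4
ACTIONS = [-1, +1]                         # move left / move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimates per (state, action)

for _ in range(300):                       # episodes: act, observe feedback, adjust, re-act
    s = 0
    while s != TARGET:
        if random.random() < 0.1:          # occasional exploratory action
            a = random.randrange(2)
        else:                              # otherwise act greedily
            a = max((0, 1), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == TARGET else -0.01                  # feedback (preset reward)
        Q[s][a] += 0.5 * (r + 0.9 * max(Q[s2]) - Q[s][a])   # adjustment of the estimate
        s = s2                                              # re-act from the new state

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy[:4])  # greedy action in states 0-3 is "move right" (index 1)
```

The greedy policy extracted from `Q` plays the role of the first policy data set: the action it selects in the current state is the adjustment scheme for the drone's next action.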
In addition, in the case where the first preset decision model includes a deep learning algorithm-based decision model, the decision generation module 220 may train the pre-built first preset decision model by the following operations 1) to 4):
In operation 1), a first sample dataset for a drone decision task is acquired. Here, the first sample data set is sample unmanned aerial vehicle task data based on a time series and a dynamic environment stored in a table form, for example, the sample unmanned aerial vehicle task data includes position data for a preset task, position data of at least one obstacle related to the preset task, current course data related to the preset task, and time data related to the preset task.
In operation 2), a corresponding sample policy dataset is determined based on the first sample dataset. Here, the corresponding sample policy dataset comprises an adjustment scheme for indicating a next action of the drone for the preset task.
In operation 3), a preset loss function is calculated based on the first sample data set and the corresponding sample policy data set.
In operation 4), according to the preset loss function, the model parameters of the first preset decision model are adjusted to obtain a trained first preset decision model.
According to an embodiment of the present disclosure, the feature analysis module 230 performs the operation of inputting the first policy dataset and the first input dataset into a SHAP algorithm-based feature analysis model, resulting in a decision feature dataset for the unmanned aerial vehicle decision task. Here, the decision feature data set includes unmanned aerial vehicle decision feature data corresponding to the adjustment scheme.
Here, as an example, the operation of the feature analysis module 230 inputting the first policy data set and the first input data set into the SHAP-algorithm-based feature analysis model to obtain the decision feature data set for the unmanned aerial vehicle decision task may include: for each policy in the first policy data set, extracting current input data corresponding to the policy from the first input data set; performing SHAP-value-based feature analysis on the current input data of each policy, and performing feature extraction based on the marginal contributions of the analyzed features in the contexts corresponding to the different policies, to obtain the feature set having the greatest influence on the current policy of each policy as the decision feature data set for the current input data of that policy; and aggregating the decision feature data sets for the current input data of all policies to obtain the decision feature data set for the unmanned aerial vehicle decision task.
As an example, the value of the marginal contribution corresponding to each feature included in the feature set having the greatest degree of influence on the current policy of each policy may be calculated by the above formula (1).
According to an embodiment of the present disclosure, the decision interpretation module 240 performs the operation of constructing a second preset decision model based on a decision tree algorithm using the decision feature data set, resulting in decision tree data for presenting a decision path and decision reasons for the adjustment scheme as an interpretation data set for interpreting the unmanned aerial vehicle decision task.
By way of example, the construction process of the second preset decision model may comprise selecting a plurality of features from the decision feature dataset, constructing the individual nodes of the decision tree according to preset evaluation parameters for the plurality of features, ending the construction process until a preset termination condition is met.
It should be noted that the operations performed with respect to the above-described respective structural blocks may be similar to those described with reference to fig. 1, and will not be repeated here.
Fig. 3 is a block diagram illustrating a computing device 300 according to an embodiment of the disclosure.
Referring to fig. 3, a computing device 300 according to an embodiment of the present disclosure may include a processor 310 and a memory 320. The processor 310 may include, but is not limited to, a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a microcomputer, a Field Programmable Gate Array (FPGA), a system on a chip (SoC), a microprocessor, an Application Specific Integrated Circuit (ASIC), etc. Memory 320 may store computer-executable instructions to be executed by processor 310. Memory 320 includes high-speed random access memory and/or nonvolatile computer readable storage media. When the processor 310 executes computer-executable instructions stored in the memory 320, an interpretable drone task decision method as described above may be implemented.
An interpretable unmanned aerial vehicle task decision method according to an embodiment of the present disclosure can be written as computer programs/instructions to form a computer program product and stored on a computer readable storage medium. When executed by a processor, the computer program/instructions may implement the interpretable unmanned aerial vehicle task decision method described above. The instructions in the computer-readable storage medium, when executed by a processor of an electronic device/server, enable the electronic device/server to perform the interpretable unmanned aerial vehicle task decision method as described above. Examples of computer readable storage media include read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, nonvolatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk memory, hard disk drive (HDD), solid state disk (SSD), card memory (such as a multimedia card, Secure Digital (SD) card, or eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, hard disk, solid state disk, and any other device configured to store computer programs and any associated data, data files, and data structures in a non-transitory manner and to provide the computer programs and any associated data, data files, and data structures to a computer or processor so that the computer or processor can execute the computer programs.
In one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner by one or more processors or computers.
According to the interpretable unmanned aerial vehicle task decision-making method and device, the SHAP algorithm is utilized to conduct feature analysis on dynamic decision-making results aiming at unmanned aerial vehicle task data comprising time sequences and dynamic environments, so that comprehensive interpretation of decision-making processes of complex unmanned aerial vehicle decision-making schemes is provided.
On the other hand, by using the SHAP algorithm-based feature analysis, a user can clearly know the influence of each unmanned aerial vehicle decision feature corresponding to the unmanned aerial vehicle adjustment scheme on the final unmanned aerial vehicle decision, so that unmanned aerial vehicle decision logic of the model is understood, and the trust degree of the model is improved.
On the other hand, by utilizing the decision tree model to display the feature segmentation for unmanned aerial vehicle decision and the decision path for unmanned aerial vehicle adjustment scheme, the detailed explanation of each unmanned aerial vehicle decision point is provided, and the basis of each unmanned aerial vehicle decision point is convenient for a user to understand and trace back.
On the other hand, by optimizing the calculation of the SHAP value and the construction of the decision tree model, the calculation complexity is reduced, the real-time application efficiency of the whole interpretable unmanned aerial vehicle decision model is improved, and the whole interpretable unmanned aerial vehicle decision model can quickly provide the interpretation related to the adjustment scheme of the respective unmanned aerial vehicle in actual use.
On the other hand, regarding the application of the interpretable unmanned aerial vehicle task decision method and apparatus according to the embodiments of the present disclosure, they can be used in various fields of data relating to time series and dynamic environments, for example, other data processing fields other than unmanned aerial vehicle fields, and the like.
Although a few embodiments of the present disclosure have been disclosed and described, it would be appreciated by those skilled in the art that changes and modifications may be made in these embodiments without departing from the spirit and scope of the present disclosure, the scope of which is defined in the claims and their equivalents.

Claims (10)

1. An interpretable unmanned aerial vehicle task decision method, characterized in that the unmanned aerial vehicle task decision method comprises:
acquiring a first input data set for an unmanned aerial vehicle decision task, the first input data set being unmanned aerial vehicle task data based on a time series and a dynamic environment stored in tabular form, the unmanned aerial vehicle task data comprising position data for a target task, position data of at least one obstacle related to the target task, current route data related to the target task, and time data related to the target task;
inputting the first input data set into a pre-constructed first preset decision model to obtain a first policy data set for outputting a decision result, the first policy data set comprising an adjustment scheme for indicating a next action of the unmanned aerial vehicle for the target task;
inputting the first policy data set and the first input data set into a SHAP-algorithm-based feature analysis model to obtain a decision feature data set for the unmanned aerial vehicle decision task, the decision feature data set comprising unmanned aerial vehicle decision feature data corresponding to the adjustment scheme; and
constructing a second preset decision model based on a decision tree algorithm using the decision feature data set to obtain decision tree data for presenting a decision path and a decision basis for the adjustment scheme, as an interpretation data set for interpreting the unmanned aerial vehicle decision task.
2. The unmanned aerial vehicle task decision method according to claim 1, characterized in that the step of inputting the first policy data set and the first input data set into the SHAP-algorithm-based feature analysis model to obtain the decision feature data set for the unmanned aerial vehicle decision task comprises:
for each policy in the first policy data set, extracting current input data corresponding to the policy from the first input data set;
performing SHAP-value-based feature analysis on the current input data of each policy, performing feature extraction based on the marginal contributions of the analyzed features in the contexts corresponding to the different policies to obtain the feature set having the greatest influence on the current policy of each policy as the decision feature data set for the current input data of that policy, and aggregating the decision feature data sets for the current input data of all policies to obtain the decision feature data set for the unmanned aerial vehicle decision task.
3. The unmanned aerial vehicle task decision method according to claim 2, characterized in that the marginal contribution of each feature contained in the feature set having the greatest influence on the current policy of each policy is calculated by the following formula:
$$\varphi_i(U) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\left(U(S \cup \{i\}) - U(S)\right)$$
wherein $\varphi_i(U)$ represents the marginal contribution function corresponding to each feature, U represents the utility function obtained after cooperation of a preset feature subset, N represents the full feature set, n represents the number of features in the full set, S represents a feature subset and |S| its size, i represents the index of the current feature, and k is an integer greater than or equal to 1 and less than or equal to n.
4. The unmanned aerial vehicle task decision method according to claim 1, characterized in that the construction process of the second preset decision model comprises:
selecting a plurality of features from the decision feature data set;
constructing the nodes of the decision tree according to preset evaluation parameters for the plurality of features, ending the construction process when a preset termination condition is met.
5. The unmanned aerial vehicle task decision method according to claim 1, characterized in that the first preset decision model comprises at least one of a decision model based on a reinforcement learning algorithm, a decision model based on a deep learning algorithm, and a decision model based on a Bayesian network algorithm.
6. The unmanned aerial vehicle task decision method according to claim 5, characterized in that the first preset decision model comprises a decision model based on a reinforcement learning algorithm, and the step of inputting the first input data set into the pre-constructed first preset decision model to obtain the first policy data set for outputting a decision result comprises:
performing, based on the first input data set, reinforcement learning operations comprising action, feedback, adjustment, and re-action, to obtain a decision result satisfying the condition of maximizing a preset cumulative reward objective as the first policy data set.
7. The unmanned aerial vehicle task decision method according to claim 6, characterized in that the first preset decision model comprises a decision model based on a deep learning algorithm, and the first preset decision model is trained as follows:
acquiring a first sample data set for the unmanned aerial vehicle decision task, the first sample data set being sample unmanned aerial vehicle task data based on a time series and a dynamic environment stored in tabular form, the sample unmanned aerial vehicle task data comprising position data for a preset task, position data of at least one obstacle related to the preset task, current route data related to the preset task, and time data related to the preset task;
determining a corresponding sample policy data set based on the first sample data set, the corresponding sample policy data set comprising an adjustment scheme for indicating a next action of the unmanned aerial vehicle for the preset task;
calculating a preset loss function based on the first sample data set and the corresponding sample policy data set;
adjusting the model parameters of the first preset decision model according to the preset loss function to obtain the trained first preset decision model.
8. An interpretable unmanned aerial vehicle task decision device, characterized in that the unmanned aerial vehicle task decision device comprises:
a data acquisition module configured to acquire a first input data set for an unmanned aerial vehicle decision task, the first input data set being unmanned aerial vehicle task data based on a time series and a dynamic environment stored in tabular form, the unmanned aerial vehicle task data comprising position data for a target task, position data of at least one obstacle related to the target task, current route data related to the target task, and time data related to the target task;
a decision generation module configured to input the first input data set into a pre-constructed first preset decision model to obtain a first policy data set for outputting a decision result, the first policy data set comprising an adjustment scheme for indicating a next action of the unmanned aerial vehicle for the target task;
a feature analysis module configured to input the first policy data set and the first input data set into a SHAP-algorithm-based feature analysis model to obtain a decision feature data set for the unmanned aerial vehicle decision task, the decision feature data set comprising unmanned aerial vehicle decision feature data corresponding to the adjustment scheme;
a decision interpretation module configured to construct a second preset decision model based on a decision tree algorithm using the decision feature data set to obtain decision tree data for presenting a decision path and decision reasons for the adjustment scheme, as an interpretation data set for interpreting the unmanned aerial vehicle decision task.
9. A computer program product, characterized in that the computer program product comprises a computer program/instructions which, when executed by a processor, implement the interpretable unmanned aerial vehicle task decision method according to any one of claims 1 to 7.
10. A computing device, characterized in that the computing device comprises: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when run by the at least one processor, cause the at least one processor to perform the interpretable unmanned aerial vehicle task decision method according to any one of claims 1 to 7.
CN202510088173.1A 2025-01-20 2025-01-20 Explainable UAV mission decision method and device Pending CN119557599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510088173.1A CN119557599A (en) 2025-01-20 2025-01-20 Explainable UAV mission decision method and device


Publications (1)

Publication Number Publication Date
CN119557599A true CN119557599A (en) 2025-03-04

Family

ID=94747869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510088173.1A Pending CN119557599A (en) 2025-01-20 2025-01-20 Explainable UAV mission decision method and device

Country Status (1)

Country Link
CN (1) CN119557599A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719477A (en) * 2022-11-18 2023-02-28 同济大学 Interpretable automatic driving decision making system and method thereof
CN117332842A (en) * 2023-10-12 2024-01-02 中国航空工业集团公司沈阳飞机设计研究所 Decision tree-based decision model interpretable method
CN119207757A (en) * 2024-07-16 2024-12-27 深圳大学 Method, device, storage medium and computer program product for interpreting results of intelligent medical diagnosis system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20250304