CN111753303B

CN111753303B - A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning

Info

Publication number: CN111753303B
Application number: CN202010747186.2A
Authority: CN
Inventors: 蒋远; 苏小红; 王甜甜
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2023-02-07
Anticipated expiration: 2040-07-29
Also published as: CN111753303A

Abstract

The invention discloses a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning, comprising the following steps: 1) analyzing the source code to obtain the corresponding intermediate code representation; 2) slicing the intermediate code to obtain code segments smaller than the source program; 3) converting each input code segment into a low-dimensional continuous real-valued vector using a code segment representation method; 4) inputting the vector representation of the code segment into a coarse-grained code vulnerability detection model based on deep learning to judge whether the code segment contains defects; 5) constructing a fine-grained code vulnerability detection model based on reinforcement learning to predict the code lines that specifically cause vulnerabilities in code segments containing defects. The invention provides a complete multi-granularity code vulnerability detection framework, applies reinforcement learning to fine-grained code vulnerability detection for the first time, and provides a new staged code representation learning model that makes full use of the semantic information of a program, thereby improving the accuracy and practicality of vulnerability detection.

Description

A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning

Technical Field

The invention relates to a code vulnerability detection method, and in particular to a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning.

Background

Software vulnerabilities are defects that exist in software during its life cycle. Attackers may exploit these defects to bypass system access control and illegally obtain elevated privileges, and then manipulate the system arbitrarily, for example by triggering privileged commands, accessing sensitive information, impersonating identities, or monitoring system operation. If security-related vulnerabilities are not identified and repaired in time, they are easily exploited by malicious attackers; the system is then compromised, leading to unreliable results or to serious security problems such as arbitrary command execution and arbitrary file reading.

Code analysis is the main means of inspecting software code and discovering its inherent defects and the ways they can be exploited, and it has long been a research focus in information security and software security. However, as software systems grow larger and more complex, and as the frequency of vulnerabilities and the sophistication of attacks keep rising, traditional vulnerability detection tools based on predefined rules can no longer meet the needs of modern software development. More and more researchers have therefore turned to code vulnerability detection methods based on machine learning and deep learning. Machine-learning-based methods rely on experts to manually define code features (for example, software complexity metrics, function calls, code changes, and system calls) and then use machine learning models to automatically classify code as vulnerable or non-vulnerable. Because the definition of code features is highly subjective, such methods usually apply only to specific projects and generalize poorly. Moreover, the code fed to the machine learning model is usually coarse-grained, so the exact location of the vulnerable code lines cannot be determined. Deep-learning-based methods do not require experts to define features manually and can automatically generate vulnerability patterns from large amounts of historical data. They are expected to change how source code vulnerability detection is done, shifting vulnerability patterns for all types of vulnerabilities from manual expert definition to automatic generation, and to significantly improve detection effectiveness. However, research on these methods has only just begun. Most work focuses on coarse-grained detection, for example at the function level or file level, while research on the "vulnerability structure" of code is very scarce. The vulnerability structure lets a detection tool not only judge whether code contains a vulnerability but also indicate the specific form the vulnerability takes and where it occurs. For deep-learning-based vulnerability detection to be applied well in practice, research on code vulnerability structure is necessary.

VulDeeLocator (Z. Li, D. Zou, S. Xu, Z. Chen, Y. Zhu, and H. Jin, VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector, arXiv preprint arXiv:2001.02350, 2020) is currently the only retrievable publication that uses deep learning to achieve statement-level, fine-grained vulnerability localization. The method adds the location information of vulnerable statements when training the deep neural network, so that the model focuses its attention on vulnerability-related statements during training and thereby achieves statement-level detection. However, judging from the experimental results, VulDeeLocator is not significantly better than traditional rule-based vulnerability detection tools at fine-grained detection, because its network structure fails to fully capture the semantic information of the program and the code structure related to the vulnerability.

Summary of the Invention

The purpose of the present invention is to provide a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning that can not only detect defective software modules (for example, functions) at a coarser granularity, but also locate, at a finer granularity, the code statements within a module that may cause the vulnerability; that is, it achieves multi-granularity code vulnerability detection. In addition, the accuracy of a vulnerability detection model depends on the accuracy of its input, namely the accuracy of the code segment representation. To address the problem that previous token-based code representation methods fail to make full use of code semantic information, the present invention proposes a new staged code representation learning method (Staged Code Representation Learning). The method first learns a vector representation of each statement in the code, and then learns a vector representation of the whole program on top of the statement vectors. Separating the vector representation of statements from that of the program allows the learned code vectors to capture subtler structural (syntactic) differences and more complex semantic information.

The purpose of the present invention is achieved through the following technical solution:

A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning. First, the source code is parsed to obtain the corresponding intermediate code representation. Compared with the source code, the intermediate code representation captures more program control flow and variable definition-use information. Second, using key points that may trigger vulnerabilities as slicing criteria, the intermediate code is sliced to obtain code segments (code gadgets) smaller than the source program; this shortens the model's input sequence and keeps vulnerability-irrelevant statements from obscuring important information. Third, the staged code representation learning model proposed by the present invention converts each input code segment into a low-dimensional continuous real-valued vector. The vector representation of the code segment is then input into the coarse-grained code vulnerability detection model based on deep learning, which judges whether the corresponding code segment contains a vulnerability. Finally, if the coarse-grained model detects that the input code segment contains a vulnerability, the fine-grained detection model based on reinforcement learning makes the next judgment, namely finding the code lines likely to cause the vulnerability.

The method specifically comprises the following steps:

Step 1: Use the Clang tool to statically analyze the source program and obtain its intermediate code representation;

Step 2: Extract the key points that may cause vulnerabilities, generate slicing criteria, slice the intermediate code, and merge the forward and backward slices to obtain the code segments of the program;

Step 3: Use the staged code representation learning method to represent each code segment as a low-dimensional continuous real-valued vector;

Step 4: Input the vector representation of the code segment into the coarse-grained code vulnerability detection model based on deep learning to determine whether the code segment contains a vulnerability;

Step 5: Construct a fine-grained vulnerability detection model based on reinforcement learning to predict the code lines that specifically cause the vulnerability in code segments containing defects.
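The control flow of the five steps can be sketched as follows. This is only an illustrative outline, not the patented implementation: the function name `detect`, the callable parameters, and the 0.5 decision threshold are assumptions introduced here. The point it shows is that the fine-grained model runs only on code segments the coarse-grained model has already flagged.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def detect(
    statements: Sequence[str],
    encode_statement: Callable[[str], Vector],
    encode_program: Callable[[Sequence[Vector]], Vector],
    coarse_detector: Callable[[Vector], float],
    fine_detector: Callable[[Sequence[Vector]], List[int]],
    threshold: float = 0.5,  # assumed decision threshold, not stated in the patent
) -> List[int]:
    """Return 1-based indices of statements flagged as vulnerable.

    An empty list means the coarse-grained model judged the segment clean.
    """
    # Step 3, first stage: one vector per statement.
    stmt_vecs = [encode_statement(s) for s in statements]
    # Step 3, second stage: one vector for the whole code segment.
    segment_vec = encode_program(stmt_vecs)
    # Step 4: coarse-grained decision on the whole segment.
    if coarse_detector(segment_vec) < threshold:
        return []
    # Step 5: fine-grained decision, one 0/1 action per statement.
    actions = fine_detector(stmt_vecs)
    return [i for i, a in enumerate(actions, start=1) if a == 1]
```

Passing the four models as callables keeps the sketch independent of any particular deep learning library; steps 1 and 2 (parsing and slicing) are assumed to have produced `statements` already.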

Compared with the prior art, the present invention has the following advantages:

1. The present invention proposes a new multi-granularity code vulnerability detection method and framework. Compared with existing methods that can detect vulnerabilities only at a coarser granularity, it can both detect vulnerable code modules at coarse granularity and locate vulnerable statements at fine granularity, which improves the practicality of data-driven vulnerability detection and the interpretability of the model's predictions.

2. The present invention proposes a new staged code representation learning method. Compared with existing token-based code representation methods, it makes full use of the local and global semantic information of the program and improves the accuracy of the generated code representation vectors, which in turn improves the model's vulnerability detection ability.

3. The present invention is the first to apply reinforcement learning to the fine-grained code vulnerability detection task. The model repeatedly tries possible combinations of vulnerable statements on the training instances, and the number of vulnerable statements contained in each combination is fed back to the policy (Policy) module as the signal that adjusts the model's behavior. Through the accumulation of these signals, the model automatically learns the code structure related to the vulnerability.

Brief Description of the Drawings

Figure 1 is the overall flow chart of the multi-granularity code vulnerability detection method proposed by the present invention.

Figure 2 shows the statement node in the syntax tree corresponding to a function call key point.

Figure 3 shows the statement node in the syntax tree corresponding to an array definition key point.

Figure 4 shows the statement node in the syntax tree corresponding to a pointer definition key point.

Figure 5 shows the statement node in the syntax tree corresponding to an assignment expression key point.

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings, but is not limited thereto. Any modification or equivalent replacement of the technical solution that does not depart from its spirit and scope shall fall within the protection scope of the present invention.

This embodiment uses deep learning and reinforcement learning to implement coarse-grained code vulnerability identification and fine-grained vulnerable statement localization, respectively. First, the source code is parsed to obtain the corresponding intermediate code representation (for example, LLVM IR). Second, using key points that may cause vulnerabilities as slicing criteria, the intermediate code is sliced to obtain code segments (code gadgets) smaller than the source program. Third, the staged code representation learning method converts each input code segment into a low-dimensional continuous real-valued vector. The vector representation of the code segment is then input into the coarse-grained vulnerability detection model based on deep learning, which judges whether the corresponding code segment contains a vulnerability. Finally, if the coarse-grained model detects that the input code segment contains a vulnerability, the fine-grained vulnerability detection model based on reinforcement learning makes the next judgment, namely finding the code lines that cause the vulnerability.

As shown in Figure 1, the specific steps are as follows:

Step 1: Use the Clang tool to statically analyze the source program and obtain its intermediate code representation.
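As a hedged illustration of step 1, the sketch below builds a Clang command line that emits textual LLVM IR. The patent names only the Clang tool; the specific flags (`-S -emit-llvm` for a `.ll` file, `-g` for debug information that links IR instructions back to source lines, `-O0` to avoid optimization), the helper names, and the `ir` output directory are assumptions chosen for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def clang_ir_command(source: str, out_dir: str = "ir") -> List[str]:
    """Build the argument list for compiling one C file to textual LLVM IR."""
    src = Path(source)
    out = Path(out_dir) / (src.stem + ".ll")
    # -S -emit-llvm: write human-readable LLVM IR instead of an object file.
    # -g: keep debug info so IR instructions carry source line numbers.
    # -O0: no optimization, so the IR stays close to the source statements.
    return ["clang", "-S", "-emit-llvm", "-g", "-O0", str(src), "-o", str(out)]


def emit_ir(source: str, out_dir: str = "ir") -> Path:
    """Run Clang (must be installed and on PATH) and return the path of the .ll file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = clang_ir_command(source, out_dir)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Keeping the command construction separate from the `subprocess.run` call makes the flag choice easy to inspect or change without invoking the compiler.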

Step 2: Extract the key points that may cause vulnerabilities, generate slicing criteria, slice the intermediate code, and merge the forward and backward slices to obtain the code segments of the program. The specific steps are as follows:

Step 21: Use program analysis to parse the program into an abstract syntax tree;

Step 22: Traverse the generated abstract syntax tree with a node matching algorithm and find the four kinds of syntax tree nodes that may cause code vulnerabilities: (1) function call statement nodes (Callee), as shown in Figure 2; (2) array definition statement nodes (IdentifierDeclStatement whose statement contains the "[" and "]" characters), as shown in Figure 3; (3) pointer definition statement nodes (IdentifierDeclStatement whose statement contains the "*" character), as shown in Figure 4; (4) expression statement nodes (ExpressionStatement), as shown in Figure 5;

Step 23: Filter the four kinds of nodes extracted from the program and select the syntax tree nodes that meet the conditions as key points that may trigger vulnerabilities. For example, if the identifier of a "Callee" node is in a predefined list of library functions that may trigger vulnerabilities, then the parent node (CallExpression) of that "Callee" node is the statement S of the slicing criterion;

Step 24: Extract the variables involved in statement S as V of the slicing criterion, and combine S and V to obtain the slicing criterion (S, V) used to extract slices;

Step 25: Use program analysis to parse the code into a program dependence graph, perform forward and backward slicing on the program dependence graph according to the slicing criterion generated in step 24, and merge the forward and backward slices to obtain the program slices related to the four kinds of key points that may cause vulnerabilities;

Step 26: Convert the program slices of the source program into program slices of the intermediate code according to the correspondence between the source program and the intermediate code.
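Steps 22 to 25 can be illustrated on a toy program with a simplified sketch. Everything concrete here is an assumption: the `RISKY_CALLS` list stands in for the predefined library function list (which the text does not enumerate), only function call key points are handled, and the dependence graph is a hand-written dictionary rather than the output of a real analyzer. The slice also follows all dependences of statement S instead of restricting them to the variables V.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical list of risky library calls; the patent's actual list is not given.
RISKY_CALLS = {"memset", "memmove", "strcpy"}


def find_key_points(callees: Dict[int, str]) -> List[int]:
    """Return the lines whose called function is in the risky list."""
    return sorted(line for line, name in callees.items() if name in RISKY_CALLS)


def reachable(start: int, edges: Dict[int, Set[int]]) -> Set[int]:
    """Breadth-first reachability over dependence edges, including the start line."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def slice_gadget(criterion_line: int, deps: Dict[int, Set[int]]) -> List[int]:
    """Merge the forward and backward slices of one key point.

    deps[a] holds the lines that depend (by data or control) on line a.
    """
    reverse: Dict[int, Set[int]] = {}
    for src, dsts in deps.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    forward = reachable(criterion_line, deps)      # lines affected by S
    backward = reachable(criterion_line, reverse)  # lines that S depends on
    return sorted(forward | backward)


# Toy program (line: statement):
#   1: char buf[100];
#   2: int n = read_len();
#   3: int unused = 0;
#   4: memset(buf, 'A', n);
#   5: puts(buf);
callees = {2: "read_len", 4: "memset", 5: "puts"}
deps = {1: {4, 5}, 2: {4}, 4: {5}}
key_points = find_key_points(callees)
gadget_lines = slice_gadget(key_points[0], deps)
```

In this toy case the only key point is the `memset` call on line 4, and the merged slice keeps lines 1, 2, 4 and 5 while dropping the unrelated line 3, which is the input-shortening effect the slicing step is meant to achieve.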

Step 3: Use the staged code representation learning method to represent each code segment as a low-dimensional continuous real-valued vector. The specific steps are as follows:

Step 31: Split the code segment into statements;

Step 32: Construct a CNN-based statement encoding network (Statement Encode Network, SENet); a schematic of the model is shown in the upper middle part of Figure 1;

Step 33: Split each statement obtained in step 31 into a token sequence using spaces as the delimiter, use the token sequence as the input of the statement encoding network, and output the vector representation of the statement;

Step 34: Construct an LSTM-based program encoding network (Program Encode Network, PENet); a schematic of the model is shown in the upper right part of Figure 1;

Step 35: Use the vector representations of the statements in the code segment obtained in step 33 as the input of the program encoding network, and output the hidden state of the last time step as the vector representation of the code segment.
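The two-stage encoding in steps 31 to 35 can be sketched in plain Python to show how the data flows. This is a toy forward pass under stated assumptions, not the trained networks of the patent: the layer sizes, the random untrained weights, the ReLU and max-over-time pooling in the convolution stage, and the bias-free LSTM gates are all choices made here for brevity.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

EMB, FILTERS, WINDOW, HIDDEN = 4, 6, 2, 5  # hypothetical layer sizes


def rand_matrix(rows: int, cols: int) -> List[Vector]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


embeddings: Dict[str, Vector] = {}
conv_w = rand_matrix(FILTERS, WINDOW * EMB)
# One weight matrix per LSTM gate (input, forget, output, candidate), applied to [x; h].
lstm_w = {g: rand_matrix(HIDDEN, FILTERS + HIDDEN) for g in "ifoc"}


def embed(token: str) -> Vector:
    """Look up (or lazily create) the embedding of a token."""
    if token not in embeddings:
        embeddings[token] = [rng.uniform(-0.1, 0.1) for _ in range(EMB)]
    return embeddings[token]


def encode_statement(statement: str) -> Vector:
    """SENet-like stage: space-split tokens, 1-D convolution, ReLU, max over time."""
    vecs = [embed(t) for t in statement.split(" ")]
    while len(vecs) < WINDOW:  # zero-pad statements shorter than the window
        vecs.append([0.0] * EMB)
    pooled = [0.0] * FILTERS  # ReLU outputs are non-negative, so 0 is a safe start
    for i in range(len(vecs) - WINDOW + 1):
        window = [x for v in vecs[i:i + WINDOW] for x in v]
        for f, a in enumerate(matvec(conv_w, window)):
            pooled[f] = max(pooled[f], max(0.0, a))
    return pooled


def encode_program(statement_vecs: List[Vector]) -> Vector:
    """PENet-like stage: run an LSTM over statement vectors, return the last hidden state."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in statement_vecs:
        xh = x + h
        i = [sigmoid(a) for a in matvec(lstm_w["i"], xh)]
        f = [sigmoid(a) for a in matvec(lstm_w["f"], xh)]
        o = [sigmoid(a) for a in matvec(lstm_w["o"], xh)]
        g = [math.tanh(a) for a in matvec(lstm_w["c"], xh)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h
```

A code segment given as a list of intermediate-code statements is encoded with `encode_program([encode_statement(s) for s in segment])`. In practice both stages would be built with a deep learning framework and trained jointly with the detector in step 4; the pure-Python version only makes the statement-then-program structure explicit.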

Step 4: Input the vector representation of the code segment into the coarse-grained code vulnerability detection model based on deep learning to determine whether the code segment contains a vulnerability. The specific steps are as follows:

Step 41: Construct a coarse-grained code vulnerability detection model (Detector Network, DNet) based on a single fully connected hidden layer; a schematic of the model is shown in the lower middle part of Figure 1;

Step 42: Use the vector representation of the code segment as the input of the model, and output the probability that the code segment contains a vulnerability;

Step 43: Use the predicted probability and the true label as inputs to the cross-entropy loss function, calculate the prediction error, and use the backpropagation algorithm to update the parameters of the staged code representation learning model (the CNN-based SENet and the LSTM-based PENet) and of the coarse-grained vulnerability detection model (DNet).
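A minimal sketch of the DNet forward pass and its loss, assuming a ReLU hidden layer and a sigmoid output for binary classification (the patent specifies only a single fully connected hidden layer and a cross-entropy loss; the activation choices and the function names are illustrative):

```python
import math
from typing import List

Vector = List[float]


def dnet_forward(
    x: Vector,
    w_hidden: List[Vector],
    b_hidden: Vector,
    w_out: Vector,
    b_out: float,
) -> float:
    """Single hidden layer followed by a sigmoid; returns P(segment is vulnerable)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU is an assumption
        for row, b in zip(w_hidden, b_hidden)
    ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


def cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

During training, the gradient of this loss would flow back through DNet into PENet and SENet, so that the representation and the detector are updated together as step 43 describes. An automatic differentiation framework would normally handle that; it is omitted here.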

Step 5: Construct a fine-grained vulnerability detection model based on reinforcement learning (Policy Network, PNet) to predict the code lines that specifically cause the vulnerability in code segments containing defects. The specific steps are as follows:

Step 51: Construct the fine-grained code vulnerability prediction model PNet based on reinforcement learning; a schematic of the model is shown on the right side of Figure 1;

Step 52: Concatenate the vector representation of the current program statement with the vector representation of the statement's context, and use the result as the state representation of the reinforcement learning at time step t;

Step 53: According to the state representation at time t, predict the action taken by the agent (policy). If the action is "relevant", the input statement at time t is a statement that may cause a code vulnerability; if the action is "irrelevant", the current statement will not cause a code vulnerability. The same action prediction is made for every statement of the code segment, producing an action sequence;

Step 54: Calculate the reward according to the formula (rendered as an image in the original publication and not recoverable from this text), where U is the code lines marked as vulnerability-related in the action sequence predicted in step 53, and V is the code lines where the real vulnerability is located;

Step 55: Update the parameters of PNet according to the classic REINFORCE algorithm and the policy gradient algorithm, so that PNet automatically learns the code structure related to the vulnerability.
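Steps 52 to 55 can be sketched with a deliberately small policy. Several parts are assumptions made for illustration: the policy is a single logistic unit rather than the patent's PNet, no baseline is subtracted from the reward, and `overlap_reward` is a stand-in that uses the Jaccard overlap of U and V, because the patent's actual reward formula is not available in this text.

```python
import math
import random
from typing import List, Sequence, Set

Vector = List[float]


def make_state(stmt_vec: Vector, context_vec: Vector) -> Vector:
    """Step 52: the state is the statement vector concatenated with its context vector."""
    return stmt_vec + context_vec


def relevance_prob(theta: Vector, state: Vector) -> float:
    """Probability that the policy picks the 'relevant' action for this state."""
    z = sum(w * s for w, s in zip(theta, state))
    return 1.0 / (1.0 + math.exp(-z))


def sample_actions(theta: Vector, states: Sequence[Vector], rng: random.Random) -> List[int]:
    """Step 53: sample one action per statement (1 = relevant, 0 = irrelevant)."""
    return [1 if rng.random() < relevance_prob(theta, s) else 0 for s in states]


def overlap_reward(predicted: Set[int], actual: Set[int]) -> float:
    """Stand-in reward (NOT the patent's formula): Jaccard overlap of U and V."""
    union = predicted | actual
    return len(predicted & actual) / len(union) if union else 1.0


def reinforce_step(
    theta: Vector,
    states: Sequence[Vector],
    actions: Sequence[int],
    reward: float,
    lr: float = 0.1,
) -> Vector:
    """Step 55: one REINFORCE update, theta += lr * R * sum_t grad log pi(a_t | s_t).

    For a logistic policy, grad log pi(a | s) with respect to theta is (a - p) * s.
    """
    grad = [0.0] * len(theta)
    for s, a in zip(states, actions):
        p = relevance_prob(theta, s)
        for k, sk in enumerate(s):
            grad[k] += (a - p) * sk
    return [w + lr * reward * g for w, g in zip(theta, grad)]
```

One training iteration would sample an action sequence, collect the 1-based positions of the 1s as U, score them against the labeled lines V, and call `reinforce_step`. Action sequences that earn a higher reward push the policy further toward repeating them.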

Example:

Taking a specific vulnerability instance from the Software Assurance Reference Dataset (SARD) as an example, the detection process of the proposed multi-granularity code vulnerability detection method is analyzed below. The contents of the four source code files involved in the instance are shown in Tables 1 to 4. First, steps 1 and 2 of the embodiment are executed: the source code is converted into intermediate code, and a program slice of the intermediate code is generated with the dangerous function memset, which may cause a vulnerability, as the key point, as shown in Table 5. In this slice, the actual vulnerable statements are lines 10, 11, 34 and 35. Then steps 3 and 4 are executed to learn the vector representation of each line of code and of the whole program, and the program vector is used as the input to the coarse-grained vulnerability detection model (DNet). The model predicts 1, meaning the slice contains a vulnerability. Finally, step 5 is executed: the vector representation of each statement is used as the input to the fine-grained vulnerability detection model (PNet), which outputs the predicted action sequence shown in Table 6, where 0 means the corresponding statement does not contain a vulnerability and 1 means it may contain one. Because the positions of the 1s in Table 6 are also 10, 11, 34 and 35, the fine-grained model accurately identifies the specific vulnerability locations. This example shows that the proposed method achieves both coarse-grained detection of vulnerable code and localization of the specific statements that may cause the vulnerability.
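The comparison made in this example, between the positions of the 1s in the action sequence and the labeled vulnerable lines, can be written as a short check. The action sequence below is hypothetical: it has an assumed length of 40 and is constructed to match the result described in the example, since the actual contents of Table 6 are not reproduced here. Positions are assumed to be 1-based.

```python
from typing import List


def flagged_lines(actions: List[int]) -> List[int]:
    """Return the 1-based positions of the statements whose action is 1."""
    return [i for i, a in enumerate(actions, start=1) if a == 1]


true_lines = [10, 11, 34, 35]  # vulnerable lines stated in the example

# Hypothetical action sequence with 1s at the labeled positions.
actions = [0] * 40
for line in true_lines:
    actions[line - 1] = 1

exact_match = flagged_lines(actions) == true_lines
```

An exact match is the strictest criterion; a looser evaluation could instead measure the overlap between the flagged and labeled line sets.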

Table 1 CWE124_Buffer_Underwrite__char_declare_memmove_53a.c

Table 2 CWE124_Buffer_Underwrite__char_declare_memmove_53b.c

Table 3 CWE124_Buffer_Underwrite__char_declare_memmove_53c.c

Table 4 CWE124_Buffer_Underwrite__char_declare_memmove_53d.c

Table 5 Program slice of the intermediate code generated with memset as the key point

Table 6 Action sequence predicted by the fine-grained vulnerability detection model for Table 5

Claims

1. A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning, characterized in that said method comprises the steps:

Step 1: Use the Clang tool to statically analyze the source program to obtain the intermediate code representation of the program;

Step 2: Extract the key points that may cause vulnerabilities, generate slicing criteria, slice the intermediate code, and merge the forward and backward slices to obtain the code segments of the program. The specific steps are as follows:

Step 21: using program analysis technology to parse the program into an abstract syntax tree form;

Step 22: Traverse the generated abstract syntax tree with a node matching algorithm and find the four kinds of syntax tree nodes that may cause code vulnerabilities: (1) function call statement nodes; (2) array definition statement nodes; (3) pointer definition statement nodes; (4) expression statement nodes;

Step 23: Filter the above four types of nodes extracted in the program, and select the syntax tree nodes that meet the conditions as the key points that may cause vulnerabilities;

Step 24: Extract the variables involved in statement S as V of the slicing criterion, and combine S and V to obtain the slicing criterion (S, V) used to extract slices;

Step 25: Use program analysis to parse the code into a program dependence graph, perform forward and backward slicing on the program dependence graph according to the slicing criterion generated in step 24, and merge the forward and backward slices to obtain the program slices related to the four kinds of key points that may cause vulnerabilities;

Step 26: Convert the program slice of the source program into the program slice of the intermediate code according to the corresponding relationship between the source program and the intermediate code;

Step 3: Use the code segmentation representation learning method to represent the code segment as a low-dimensional continuous real-valued vector. The specific steps are as follows:

Step 31: Split the code segment in units of statements;

Step 32: Construct a CNN-based statement encoding network SENet;

Step 33: splitting each statement obtained in step 31 into a token sequence with a space as a delimiter as the input of the statement encoding network, and outputting a vector representation of the statement;

Step 34: Construct PENet, an LSTM-based program encoding network;

Step 35: using the vector representation of each statement contained in the code segment in step 33 as the input of the program encoding network, outputting the vector representation of the hidden layer of the last time step as the vector representation of the code segment;

Step 4: Input the vector representation of the code segment into the coarse-grained code vulnerability detection model based on deep learning to determine whether the code segment contains vulnerabilities. The specific steps are as follows:

Step 41: Build a coarse-grained code vulnerability detection model DNet based on a fully connected single hidden layer;

Step 42: take the vector representation of the code segment as the input of the model, and output the probability that the code segment contains a vulnerability;

Step 43: Use the predicted probability and the true label as inputs to the cross-entropy loss function, calculate the prediction error, and use the backpropagation algorithm to update the parameters of the staged code representation learning model and of the coarse-grained vulnerability detection model DNet, where the representation learning model comprises the CNN-based SENet and the LSTM-based PENet;

Step 5: Construct a fine-grained vulnerability detection model based on reinforcement learning, and predict the code line that specifically causes the vulnerability in the code segment containing the defect. The specific steps are as follows:

Step 51: Construct PNet, a fine-grained code vulnerability prediction model based on reinforcement learning;

Step 52: Concatenate the vector representation of the current program statement with the vector representation of the statement's context, and use the result as the state representation of the reinforcement learning at time step t;

Step 53: According to the state representation at time t, predict the action taken by the agent (policy). If the action is "relevant", the input statement at time t is a statement that causes a code vulnerability; if the action is "irrelevant", the current statement will not cause a code vulnerability. The same action prediction is made for every statement of the code segment, producing an action sequence;

Step 54: Calculate the reward according to the formula (rendered as an image in the original publication and not recoverable from this text), where U is the code lines marked as vulnerability-related in the action sequence predicted in step 53, and V is the code lines where the real vulnerability is located;

Step 55: Update the parameters of PNet according to the classic REINFORCE algorithm and the policy gradient algorithm, so that PNet automatically learns the code structure related to the vulnerability.