CN118673497A - Code risk detection method and device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN118673497A
Authority
CN
China
Prior art keywords
risk
function
code
taint
transfer analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410552728.9A
Other languages
Chinese (zh)
Other versions
CN118673497B (en)
Inventor
王柏柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang eCommerce Bank Co Ltd
Original Assignee
Zhejiang eCommerce Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang eCommerce Bank Co Ltd
Priority to CN202410552728.9A
Publication of CN118673497A
Application granted
Publication of CN118673497B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3604 Analysis of software for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Stored Programmes (AREA)

Abstract

This specification discloses a code risk detection method and apparatus, an electronic device, and a computer storage medium. The method includes: acquiring a code risk detection result for source code and determining the key risk code corresponding to that result; generating a taint transfer analysis prompt word and a key risk code prompt word based on the key risk code; constructing a target prompt word from the key risk code prompt word and the taint transfer analysis prompt word; and determining a taint transfer analysis result by applying a target large language model to the target prompt word, then updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.

Description

Code risk detection method and device, electronic equipment and computer storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for detecting code risk, an electronic device, and a computer storage medium.
Background
In recent years, as network security risks have grown, code security has received increasing attention. When code is checked for security risks, detection is usually tuned to avoid missed reports, which tends to produce a large number of false positives, so security engineers must spend a great deal of time investigating the reported risks.
Disclosure of Invention
This specification provides a code risk detection method and apparatus, an electronic device, and a computer storage medium. The technical solution is as follows:
In a first aspect, this specification provides a code risk detection method, the method comprising:
acquiring a code risk detection result for source code, and determining a key risk code corresponding to the code risk detection result;
generating a taint transfer analysis prompt word based on the key risk code, and generating a key risk code prompt word based on the key risk code;
constructing a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word; and
determining a taint transfer analysis result by using a target large language model based on the target prompt word, and updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
In a second aspect, this specification provides a code risk detection apparatus, the apparatus comprising:
an acquisition module, adapted to acquire a code risk detection result for source code and determine a key risk code corresponding to the code risk detection result;
a generation module, adapted to generate a taint transfer analysis prompt word based on the key risk code and generate a key risk code prompt word based on the key risk code;
a construction module, adapted to construct a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word; and
an updating module, adapted to determine a taint transfer analysis result by using a target large language model based on the target prompt word, and update the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
In a third aspect, this specification provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the above method steps.
In a fourth aspect, this specification provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the above method steps.
In a fifth aspect, this specification provides a computer program product storing at least one instruction to be loaded by a processor to perform any of the above method steps.
The technical solutions provided by some embodiments of this specification have the following beneficial effects. A code risk detection result corresponding to the source code is acquired, and the key risk code corresponding to that result is determined, which eliminates code that is redundant relative to the full code covered by the detection result. The key risk code prompt word is then constructed from the key risk code, so that redundant code does not interfere with the target large language model when the target prompt word is later constructed, and the input to the target large language model is reduced.
In addition, a taint transfer analysis prompt word is determined for the key risk code, and the target prompt word for the target large language model is constructed from the key risk code prompt word and the taint transfer analysis prompt word. This improves both how accurately the model understands the target prompt word and how efficiently it performs taint transfer analysis. Finally, the risk detection result is updated based on the obtained taint transfer analysis result so that false positives are screened out. This addresses the problem that code security risk detection, in trying to avoid missed risks, generates a large number of false positives that security engineers must spend a great deal of time screening.
Drawings
To illustrate the technical solutions of this specification or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this specification, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a code risk detection system provided in this specification;
FIG. 2 is a schematic flow chart of a code risk detection method according to an embodiment of this specification;
FIG. 3 is a schematic flow chart of determining a key risk code according to an embodiment of this specification;
FIG. 4 is a schematic flow chart of determining a key risk code according to another embodiment of this specification;
FIG. 5 is a schematic flow chart of determining a key risk code prompt word according to an embodiment of this specification;
FIG. 6 is a schematic flow chart of constructing a target prompt word according to an embodiment of this specification;
FIG. 7 is a schematic flow chart of constructing a code prior knowledge prompt word according to an embodiment of this specification;
FIG. 8 is a schematic flow chart of determining a taint transfer analysis result according to an embodiment of this specification;
FIG. 9 is a schematic flow chart of determining a reference prompt word corresponding to a reference function according to an embodiment of this specification;
FIG. 10 is a schematic structural diagram of a code risk detection apparatus according to an embodiment of this specification;
FIG. 11 is a schematic structural diagram of an electronic device provided in this specification;
FIG. 12 is a schematic diagram of the architecture of the operating system and user space provided in this specification;
FIG. 13 is an architecture diagram of the Android operating system in FIG. 12;
FIG. 14 is an architecture diagram of the iOS operating system in FIG. 12.
Detailed Description
The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on this specification without inventive effort fall within the scope of protection of this specification.
In the description of this specification, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements that are not listed or that are inherent to such a process, method, article, or apparatus. A person of ordinary skill in the art will understand the specific meaning of these terms in this specification according to the specific circumstances. In addition, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
The present specification is described in detail below with reference to specific examples.
Referring to fig. 1, fig. 1 is a schematic view of a code risk detection system provided in the present specification. As shown in fig. 1, the code risk detection system may include at least a client cluster and a service platform 100.
The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, …, and a client n corresponding to a user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device, including but not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem. Electronic devices may go by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), or an electronic device in a 5G network or a future evolved network.
The service platform 100 may be a standalone server device, such as a rack-mounted, blade, tower, or cabinet server, or hardware with strong computing capability such as a workstation or mainframe computer. It may also be a server cluster composed of multiple servers. The servers in the cluster may be arranged symmetrically, so that each server is functionally equivalent in the transaction link and can provide services externally on its own, that is, without the assistance of another server.
In one or more embodiments of this specification, the service platform 100 may establish a communication connection with at least one client in the client cluster and use that connection to exchange data during code risk detection, such as online transaction data. For example, the service platform 100 may recommend content to a client based on the target neural network model obtained by the code risk detection method of this specification; as another example, the service platform 100 may obtain training data, such as first training data, from a client.
It should be noted that the service platform 100 communicates with at least one client in the client cluster over a network, which may be wireless or wired. Wireless networks include, but are not limited to, cellular networks, wireless local area networks, infrared networks, and Bluetooth networks; wired networks include, but are not limited to, Ethernet, Universal Serial Bus (USB), and controller area networks. In one or more embodiments of this specification, data exchanged over the network (e.g., a target compression package) is represented using techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may be used in place of or in addition to those described above.
The code risk detection system embodiment provided in this specification and the code risk detection method in one or more embodiments belong to the same concept. The execution subject of the code risk detection method in one or more embodiments of this specification may be the service platform 100 described above, or it may be the electronic device corresponding to a client, as determined by the actual application environment. The implementation of the code risk detection system embodiment is described in detail in the method embodiments below and is not repeated here.
Based on the schematic view of the scenario shown in fig. 1, a detailed description is provided below of a code risk detection method provided in one or more embodiments of the present disclosure.
Referring to FIG. 2, FIG. 2 is a schematic flow chart of a code risk detection method according to an embodiment of this specification. The method may be implemented by a computer program and may run on a code risk detection apparatus based on the von Neumann architecture. The computer program may be integrated into an application or may run as a standalone tool application. The code risk detection apparatus may be a service platform.
Specifically, the code risk detection method includes:
S202: Acquire a code risk detection result for the source code, and determine the key risk code corresponding to the code risk detection result.
Optionally, acquiring a code risk detection result for the source code includes:
performing code risk detection on the source code to obtain the code risk detection result for the source code.
Code risk detection includes, but is not limited to, static program analysis, that is, analyzing a program without running it. For security risk scenarios, static program analysis generally uses a data flow or taint analysis algorithm to compute the relationships between program variables within and across functions, and finally determines whether the code carries a security risk under specific conditions.
Optionally, acquiring a code risk detection result for the source code includes:
performing risk detection on the source code based on a code risk detection model to obtain the code risk detection result for the source code, where the code risk detection model may be obtained by training an initial code risk detection model with sample code and the reference risk labels corresponding to the sample code.
The source code is input into the code risk detection model, which processes it to produce the code risk detection result.
In general, to ensure as far as possible that no potential security risk is missed, the code risk detection result for the source code may contain false positives. The result therefore needs to be screened, and the detection results that are false positives removed.
The code risk detection result is typically risk function call information indicating one or more possible security vulnerabilities in the source code. The function code corresponding to the code risk detection result is usually the complete function code of every risk function in the risk function call information, but not all of the code in a risk function contributes to making it a risk function. The code in each risk function therefore needs to be filtered to remove irrelevant code, so as to determine the key risk code that makes each function a risk function.
S204: Generate a taint transfer analysis prompt word based on the key risk code, and generate a key risk code prompt word based on the key risk code.
After all key risk codes of the source code are determined, they may be divided according to the functions they belong to in order to process them more efficiently. For example, the key risk codes corresponding to one, two, or more functions are placed in one group, which yields multiple key risk code sets. Of course, all key risk codes may also be placed in a single group.
Therefore, the taint transfer analysis prompt word and the key risk code prompt word may be generated from all key risk codes together, or they may be generated separately for each key risk code set. It should be understood that each key risk code set may correspond to one taint transfer analysis prompt word and one key risk code prompt word.
Here, taint transfer refers to the process by which a taint variable is passed along in the source code, where a taint variable may be untrusted data or confidential data in a function. The taint transfer analysis prompt word instructs the target large language model to perform taint transfer analysis on the key risk code that the prompt word corresponds to.
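The grouping described in S204 can be sketched as follows. This is a minimal illustration only: the `KeyRiskCode` container, the `functions_per_group` parameter, and the wording of both prompt words are assumptions introduced here, not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeyRiskCode:
    function: str          # risk function the statements were extracted from
    statements: List[str]  # key risk code statements kept for that function


def group_key_risk_codes(codes: List[KeyRiskCode],
                         functions_per_group: int) -> List[List[KeyRiskCode]]:
    """Split key risk codes into sets covering at most `functions_per_group` functions."""
    if functions_per_group < 1:
        raise ValueError("functions_per_group must be at least 1")
    return [codes[i:i + functions_per_group]
            for i in range(0, len(codes), functions_per_group)]


def prompt_words_for(group: List[KeyRiskCode]) -> Tuple[str, str]:
    """Return (taint transfer analysis prompt word, key risk code prompt word) for one set."""
    names = ", ".join(code.function for code in group)
    # Hypothetical instruction text; the real wording would depend on the model used.
    taint_prompt = (
        "Decide whether the taint variable is transferred through "
        f"the following functions in order: {names}."
    )
    code_prompt = "\n\n".join(
        f"# {code.function}\n" + "\n".join(code.statements) for code in group
    )
    return taint_prompt, code_prompt


codes = [
    KeyRiskCode("upload", ["save(filename, content)"]),
    KeyRiskCode("save", ['filepath = "pre_path" + filename',
                         "file_write(filepath, content)"]),
    KeyRiskCode("file_write", ["open(path, 'w').write(data)"]),
]
groups = group_key_risk_codes(codes, functions_per_group=2)
prompts = [prompt_words_for(group) for group in groups]
```

Passing `functions_per_group=len(codes)` corresponds to the case where all key risk codes form a single group.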
S206: Construct a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word.
After the key risk code prompt word and the taint transfer analysis prompt word are obtained, they may be spliced together according to a preset splicing rule to obtain the target prompt word. The preset splicing rule may be determined based on the target large language model. Here, each key risk code set corresponds to one target prompt word.
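As a rough sketch of the splicing step, the preset splicing rule can be modeled as a format template. The template text, the `DEFAULT_RULE` name, and the function name below are hypothetical; the specification only states that the rule may depend on the target large language model.

```python
# Hypothetical splicing rule: task description first, then the code to analyze.
DEFAULT_RULE = "{taint_prompt}\n\nKey risk code:\n{code_prompt}"


def build_target_prompt_word(code_prompt: str, taint_prompt: str,
                             rule: str = DEFAULT_RULE) -> str:
    """Splice the two prompt words into one target prompt word using `rule`."""
    return rule.format(taint_prompt=taint_prompt.strip(),
                       code_prompt=code_prompt.strip())


target = build_target_prompt_word(
    code_prompt="file_write(filepath, content)\n",
    taint_prompt="Trace the taint variable.",
)
```

A different model could be served by passing another `rule`, for example one that places the code before the instruction.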
S208: Determine a taint transfer analysis result by using the target large language model based on the target prompt word, and update the code risk detection result based on the taint transfer analysis result to obtain the target code risk detection result.
After the target prompt word is input into the target large language model, the model extracts the semantic information of the part of the target prompt word that corresponds to the taint transfer analysis prompt word and determines its taint transfer analysis task from that information. It then performs taint transfer analysis on the key risk code in the part of the target prompt word that corresponds to the key risk code prompt word, obtaining a function taint transfer result for the key risk code. Here, the target large language model is a language model built from a neural network with a large number of parameters, trained on a large amount of unlabeled text using self-supervised or semi-supervised learning.
The target large language model then performs taint transfer analysis on the key risk codes in the source code one after another based on the target prompt words, obtaining the taint transfer analysis result, which determines whether the taint variable is successfully transferred through the key risk code corresponding to the source code.
If the taint variable is successfully transferred through the key risk code corresponding to the source code, the code risk detection result for the source code is not a false positive. If the taint variable is not successfully transferred, the code risk detection result contains a false positive, and the result is corrected and updated to eliminate it.
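The decision logic of S208 might look like the sketch below. The model is represented by an arbitrary callable, and the "TRANSFERRED" / "NOT TRANSFERRED" answer convention is an assumption made for illustration; the specification does not define the model's output format. `fake_model` is a stand-in so the example runs without any real model.

```python
from typing import Callable, Iterable


def taint_transferred(target_prompt_words: Iterable[str],
                      model: Callable[[str], str]) -> bool:
    """Query the model once per target prompt word, in order.

    Returns True only if every answer reports a successful transfer, meaning
    the reported risk is treated as genuine rather than a false positive.
    """
    prompts = list(target_prompt_words)
    if not prompts:
        raise ValueError("at least one target prompt word is required")
    for prompt in prompts:
        answer = model(prompt).strip().upper()
        # Assumed answer convention: "TRANSFERRED" or "NOT TRANSFERRED".
        if not answer.startswith("TRANSFERRED"):
            return False
    return True


def fake_model(prompt: str) -> str:
    """Stand-in for a large language model, used only to exercise the sketch."""
    return "transferred" if "file_write(filepath" in prompt else "not transferred"


genuine = taint_transferred(["Key risk code:\nfile_write(filepath, content)"], fake_model)
false_positive = not taint_transferred(["Key risk code:\nlog(message)"], fake_model)
```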
In this specification, the code risk detection result corresponding to the source code is acquired and the key risk code corresponding to that result is determined, which eliminates code that is redundant relative to the full code covered by the detection result. The key risk code prompt word is then constructed from the key risk code, so that redundant code does not interfere with the target large language model when the target prompt word is later constructed, and the input to the target large language model is reduced.
In addition, a taint transfer analysis prompt word is determined for the key risk code, and the target prompt word for the target large language model is constructed from the key risk code prompt word and the taint transfer analysis prompt word. This improves both how accurately the model understands the target prompt word and how efficiently it performs taint transfer analysis. Finally, the risk detection result is updated based on the obtained taint transfer analysis result so that false positives are screened out. This addresses the problem that code security risk detection, in trying to avoid missed risks, generates a large number of false positives that security engineers must spend a great deal of time screening.
Referring to FIG. 3, FIG. 3 is a schematic flow chart of determining a key risk code according to an embodiment of this specification. Specifically, determining the key risk code corresponding to the code risk detection result in S202 includes:
S302: Acquire a risk function call link from the code risk detection result, and determine the objective function in the risk function call link and the function variables of the objective function.
The risk function call link is a function call link that carries a security risk. A function call link is a chain of function calls: calling one function triggers a call to another, and this continues to form a call chain. The function call link includes the function call process and the functions involved in it. It may also include the code line numbers of the function call points.
A risk function call link can thus be understood as a function call link along which a taint variable may propagate directly to a taint aggregation point (sink). A taint aggregation point is a function site that directly performs a security-sensitive operation or leaks private data to the outside. A security-sensitive operation generally means a violation of data integrity, that is, data being lost, misplaced, or changed during transmission, storage, or processing, so that its authenticity and accuracy are affected.
After the risk function call link is acquired from the code risk detection result, the function call relations of the link are obtained and the start function of the link is determined. The start function is the first function called in the risk function call link and is the starting point of the whole link. The start function is taken as the objective function, and its function variables are determined. A function variable is a variable used in the function; during a function call it can be passed to other functions as an input parameter. It should be understood that, because the risk function call link is obtained from the code risk detection result, the function variables of the start function are typically taint variables.
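Step S302 can be illustrated with a small sketch. It assumes the risk function call link is available as ordered (caller, callee) pairs and that the analyzed source happens to be Python, so the standard `ast` module can read the start function's parameters; the function names `upload`, `save`, and `file_write` are made up for the example.

```python
import ast
from typing import List, Tuple


def start_function(call_edges: List[Tuple[str, str]]) -> str:
    """Return the first caller in the link that is never itself called."""
    callees = {callee for _, callee in call_edges}
    for caller, _ in call_edges:
        if caller not in callees:
            return caller
    raise ValueError("no start function: the call link is empty or cyclic")


def function_variables(source: str, name: str) -> List[str]:
    """Return the parameter names of function `name`, used here as its function variables."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return [arg.arg for arg in node.args.args]
    raise KeyError(name)


SOURCE = '''
def upload(filename, content):
    save(filename, content)

def save(filename, content):
    filepath = "pre_path" + filename
    file_write(filepath, content)
'''

link = [("upload", "save"), ("save", "file_write")]
objective = start_function(link)
variables = function_variables(SOURCE, objective)
```

Treating only the parameters as function variables is a simplification; the specification's definition also covers other variables used inside the function.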
S304: a key risk code for each risk function associated with the function variable is determined from the risk function call link.
After the function variables of the objective function are obtained, determining a risk function associated with the function variables of the objective function from the risk function call link, and determining key risk codes associated with the function variables of the objective function in each risk function. Here, the key risk code may be understood as a function statement code of a function related to a function variable of the objective function.
For example, the risk function is a function B, where the function B includes a plurality of rows of function codes, for example, a first row of function codes is a, a second behavior function code B, and a third behavior function code c, where the second behavior function code B is associated with a function variable of the objective function, that is, the second behavior function code B directly or indirectly calls the function variable of the objective function, and then the second behavior function code B is a key risk code of the function B for the function variable in the objective function.
In the specification, the target function and the function variable thereof of the risk function call link in the code risk detection result are determined, so that the key risk codes of all risk functions related to the function variable are determined from the risk function call link through the function variable, redundant codes in single functions compared with the function variable are eliminated, and then key risk code prompt words are constructed based on the key risk codes, so that interference of other redundant codes on a target large language model is prevented, and meanwhile, input of a subsequent target large language model is reduced.
In the embodiment provided in this specification, updating the code risk detection result based on the taint transfer analysis result in S208 to obtain the target code risk detection result includes:
determining the type of the taint transfer analysis result, and if the type is link taint transfer failure, removing the target risk function call link corresponding to the failed taint transfer from the code risk detection result.
The type of the taint transfer analysis result is determined for each risk function call link in the code risk detection result, in order to judge whether that link is a false positive. When the type is link taint transfer failure, the target risk function call link corresponding to the failed taint transfer is removed from the code risk detection result. When the type is link taint transfer success, the target risk function call link corresponding to the successful taint transfer is retained in the code risk detection result.
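The update rule above amounts to a filter over the reported links. The following sketch assumes the detection result is a mapping from a link identifier to its call link and that each analysis result is one of two enum values; these data shapes and names are illustrative assumptions. Links with no analysis result are kept, a conservative choice made here so that an unanalyzed risk is not silently dropped.

```python
from enum import Enum
from typing import Dict, List


class LinkTaintTransfer(Enum):
    SUCCESS = "link taint transfer success"
    FAILURE = "link taint transfer failure"


def update_detection_result(
    detection_result: Dict[str, List[str]],
    analysis: Dict[str, LinkTaintTransfer],
) -> Dict[str, List[str]]:
    """Drop every risk function call link whose taint transfer failed."""
    return {
        link_id: link
        for link_id, link in detection_result.items()
        if analysis.get(link_id) is not LinkTaintTransfer.FAILURE
    }


detection_result = {
    "link-1": ["upload", "save", "file_write"],
    "link-2": ["render", "escape", "respond"],
    "link-3": ["import_job", "run"],
}
analysis = {
    "link-1": LinkTaintTransfer.SUCCESS,
    "link-2": LinkTaintTransfer.FAILURE,  # false positive: taint never reaches the sink
}
target_result = update_detection_result(detection_result, analysis)
```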
Referring to fig. 4, fig. 4 is a schematic flow chart of another embodiment of determining a key risk code. Specifically, determining the key risk codes of each risk function associated with the function variables from the risk function call link in S304 includes:
S402: determining at least one risk function associated with a function variable in the risk function call link.
Function splitting is performed on the risk function call link to obtain at least two called functions corresponding to the risk function call link, and one or more risk functions associated with the function variables are then determined from these called functions. Here, the number of risk functions depends on the number of called functions in the risk function call link that are associated with the function variables.
Specifically, the one or more risk functions associated with the function variables may be determined from the at least two called functions corresponding to the risk function call link by using the data flow information of the function variables.
S404: searching the function code statements of the risk function for code statements associated with the function variables, to obtain the key risk code corresponding to each risk function.
After the risk functions in the risk function call link are determined, the function code statements of each risk function are acquired, and the code statements associated with the function variables are extracted from them, thereby obtaining the key risk code corresponding to each risk function.
In a specific embodiment, when the function variables corresponding to the target function are filename and content, and the code statements associated with these function variables have been searched from the function code statements of one risk function, the key risk code corresponding to that risk function is determined to be the statements filepath = "pre_path" + filename and file_write(filepath, content).
In this specification, at least one risk function associated with a function variable in the risk function call link is determined by means of the function variables corresponding to the target function, and the code statements associated with the function variables in the function code statements of each such risk function are then determined, so as to obtain the key risk code of each risk function in the risk function call link. In this embodiment, determining the risk functions associated with the function variables filters out functions that are unrelated to the function variables of the target function, which effectively reduces the model input of the subsequent target large language model. Extracting only the code related to the function variables within each risk function further filters out useless code in each risk function, which reduces the model input still further and effectively eliminates interference from useless code with the target large language model.
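The statement search in S404 can be illustrated with a rough sketch. It assumes that each statement is a single line and that assignments have the form name = expression; a variable assigned from a tracked variable is itself tracked afterwards, as a crude stand-in for the data flow information mentioned above. The helper name extract_key_risk_code and the sample function body are hypothetical.

```python
import re


def extract_key_risk_code(statements, function_variables):
    """Return the statements that use the tracked variables, directly or indirectly."""
    tracked = set(function_variables)
    key_code = []
    for stmt in statements:
        # Ignore string literals so that text inside quotes is not treated as a name.
        without_strings = re.sub(r'"[^"]*"', "", stmt)
        identifiers = set(re.findall(r"[A-Za-z_]\w*", without_strings))
        if identifiers & tracked:
            key_code.append(stmt)
            # A simple assignment propagates the taint to the assigned name.
            match = re.match(r"\s*([A-Za-z_]\w*)\s*=(?!=)", stmt)
            if match:
                tracked.add(match.group(1))
    return key_code


body = [
    'log("start")',
    'filepath = "pre_path" + filename',
    "counter = counter + 1",
    "file_write(filepath, content)",
]
key_code = extract_key_risk_code(body, ["filename", "content"])
```

With the function variables filename and content, only the filepath assignment and the file_write call are kept, which matches the key risk code given in the specific embodiment above.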
Referring to fig. 5, fig. 5 is a schematic flow chart of determining key risk code prompt words according to an embodiment of this specification. Specifically, generating the prompt word for the key risk code based on the key risk code in S204 includes:
S502: determining a taint variable corresponding to the risk function based on the code risk detection result.
Each risk function has its corresponding taint variable. It can be understood that the taint variable of a risk function may come from the previous risk function in the taint transfer data flow, or from an external input. Therefore, the taint transfer information can be determined based on the code risk detection result, and the taint variable corresponding to each risk function can be determined from it.
During taint transfer, the taint variable may change. Assuming the initial taint variable in the risk function call link is W, when it is transferred to the next risk function the taint variable may become N, where N = W + 1. Therefore, the taint variable for the key risk code of each risk function can be determined based on the taint transfer information in the code risk detection result. The taint transfer information refers to the transfer information of the taint variable in the function call link.
In the risk function call link, the function taint transfer result of the current risk function may affect the taint variable of the next risk function, and therefore the function taint transfer result of the next risk function. The function taint transfer result of each risk function in the risk function call link therefore needs to be determined sequentially based on the target large language model.
S504: searching for the target sub-function corresponding to the end code statement in the key risk code corresponding to the risk function.
After the key risk code corresponding to each risk function is determined, the end code statement in that key risk code is determined. The end code statement refers to the code statement that is executed last when the key risk code of the risk function runs, and this statement is taken as the target sub-function. The target sub-function is used to transfer the taint variable between risk functions.
S506: generating a taint transfer analysis prompt word for the key risk code based on the taint variable and the target sub-function.
The taint transfer analysis prompt word may be: determine whether the taint variable can be successfully transferred, through the key risk code, to the target sub-function of the key risk code. Because the taint variables, target sub-functions, and key risk codes of different risk functions may differ, different risk functions may correspond to different taint transfer analysis prompt words.
In some embodiments, the key risk code corresponding to the risk function func(filename, content) is { filepath = "pre_path" + filename; file_write(filepath, content); }, the taint variables of the risk function are filename and content, and the target sub-function is the file_write function. The taint transfer analysis prompt word for this key risk code may then be: "The parameters of the current func function are filename and content, and filename and content are controllable. Please analyze, based on the key risk code, whether the taint of the parameters filename and content will finally reach the file_write function."
In this specification, the taint variable corresponding to each risk function is determined according to the code risk detection result, and the target sub-function used to transfer the taint variable between risk functions is determined based on the key risk code corresponding to each risk function. A taint transfer analysis prompt word for the key risk code of each risk function is then generated based on the taint variable and the target sub-function of that risk function. In this way, the overall taint transfer process of the source code is split into taint transfer processes of individual risk functions, which improves the understanding of the target large language model in each single call and reduces the input of each single call.
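Steps S504 and S506 can be sketched as follows. This is only an illustration: it assumes that the end code statement is the last statement of the key risk code and that it is a plain call of the form name(arguments). The helper names find_target_subfunction and build_taint_prompt are hypothetical, and the prompt wording paraphrases the example above.

```python
def find_target_subfunction(key_risk_code):
    """Take the callee of the last (end) statement as the target sub-function."""
    end_statement = key_risk_code[-1].strip()
    return end_statement.split("(", 1)[0].strip()


def build_taint_prompt(func_name, taint_variables, key_risk_code):
    """Compose a taint transfer analysis prompt word for one risk function."""
    target = find_target_subfunction(key_risk_code)
    params = ", ".join(taint_variables)
    return (
        f"The parameters of the current {func_name} function are {params}, "
        f"and {params} are controllable. Please analyze, based on the key risk "
        f"code, whether the taint of {params} will finally reach the "
        f"{target} function."
    )


key_risk_code = ['filepath = "pre_path" + filename', "file_write(filepath, content)"]
prompt = build_taint_prompt("func", ["filename", "content"], key_risk_code)
```

Because the prompt is built per risk function, each call to the model only needs the taint variables, the key risk code, and the target sub-function of that one function.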
Referring to fig. 6, fig. 6 is a schematic flow chart of constructing a target prompt word according to an embodiment of this specification. As shown in fig. 6, constructing the target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word in S206 includes:
S602: constructing a code prior knowledge prompt word for the risk function call link, and constructing an output specification prompt word.
The code prior knowledge prompt word is used to instruct the target large language model to learn and understand the prior knowledge, where the prior knowledge includes a series of vulnerability codes and sanitization functions. A sanitization function is typically a function used to clean data or variables, which may involve deleting duplicates, handling errors, normalizing data formats, or performing other tasks related to data cleaning and preparation.
By providing the vulnerability codes and sanitization functions, the target large language model learns which code is vulnerable and learns that the taint variable of vulnerable code is eliminated after it passes through a sanitization function, which improves the accuracy with which the target large language model removes false alarm information from the code risk detection result.
The output specification prompt word is used to standardize the output of the target large language model so that its output format is stable, which makes it easier to subsequently generate new prompt words from that output. The output specification prompt word may be: please output the result in JSON format.
In this case, the taint transfer analysis result may be expressed in JSON format and split into two parts. The first part answers whether the taint variable can reach the target function. The second part, used when the taint variable can reach the target function, clearly describes the taint source at the target function, that is, performs taint tracing, so as to determine the function corresponding to the taint convergence point and the position of the taint variable in the initial source function of the taint variable.
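A two-part JSON result of this kind could be consumed as sketched below. The specification does not name the JSON fields, so the keys reachable and taint_source, and the nested function and variable keys, are assumptions made purely for illustration.

```python
import json


def parse_analysis_output(raw):
    """Split the model's JSON answer into (reachable, taint_source).

    The field names are hypothetical; taint_source is only meaningful
    when the taint variable can reach the target function.
    """
    data = json.loads(raw)
    reachable = bool(data["reachable"])
    source = data.get("taint_source") if reachable else None
    return reachable, source


raw_output = (
    '{"reachable": true, '
    '"taint_source": {"function": "func", "variable": "filename"}}'
)
reachable, source = parse_analysis_output(raw_output)
```

Fixing the output format in this way lets the next prompt word be generated mechanically from the previous answer, without free-text parsing.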
S604: constructing the target prompt word based on the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word.
After the code prior knowledge prompt word and the output specification prompt word are determined, the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word are concatenated to obtain the target prompt word.
The code prior knowledge prompt word instructs the target large language model to learn and understand the prior knowledge, so that the model learns which code is vulnerable and learns that the taint variable of vulnerable code is eliminated after it passes through a sanitization function, which improves the accuracy with which the model removes false alarm information from the code risk detection result. The key risk code prompt word provides the target large language model with the key risk code on which taint transfer analysis is performed, which prevents redundant code from interfering with the model's understanding. The taint transfer analysis prompt word instructs the target large language model to perform taint transfer analysis on the key risk code in the key risk code prompt word. The output specification prompt word standardizes the output of the target large language model, which makes it easier to generate new prompt words based on that output.
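The concatenation in S604 can be sketched in a few lines. The function name build_target_prompt, the blank-line separator, and the sample prompt texts are assumptions; the code prior knowledge and output specification parts are treated as optional here so that the same helper also covers a target prompt word built only from the two mandatory parts.

```python
def build_target_prompt(key_risk_code_prompt, taint_analysis_prompt,
                        prior_knowledge_prompt=None, output_spec_prompt=None):
    """Concatenate the prompt parts in a fixed order, skipping absent ones."""
    parts = [
        prior_knowledge_prompt,   # what vulnerable code and sanitizers look like
        key_risk_code_prompt,     # the reduced code to analyze
        taint_analysis_prompt,    # the question to answer
        output_spec_prompt,       # the required answer format
    ]
    return "\n\n".join(part for part in parts if part)


target_prompt = build_target_prompt(
    key_risk_code_prompt='filepath = "pre_path" + filename; file_write(filepath, content);',
    taint_analysis_prompt="Will the taint of filename, content reach file_write?",
    prior_knowledge_prompt="Known vulnerable patterns and sanitization functions: ...",
    output_spec_prompt="Please output the result in JSON format.",
)
```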
Referring to fig. 7, fig. 7 is a schematic flow chart of constructing a code prior knowledge prompt word according to an embodiment of this specification. As shown in fig. 7, constructing the code prior knowledge prompt word for the risk function call link in S602 includes:
S702: determining the risk type of the risk function in the risk function call link, and determining code prior knowledge database information matched with the risk type.
The risk types of the risk functions may be divided according to the specific scenario; for example, the risk types may include a first risk type, a second risk type, a third risk type, and so on.
Each risk type has a corresponding code prior knowledge database. The code prior knowledge database may include a series of vulnerability codes and sanitization functions corresponding to the risk type, and the code prior knowledge database information may be the location information of the code prior knowledge database.
S704: generating the code prior knowledge prompt word for the risk function call link based on the code prior knowledge database information.
After the code prior knowledge database information is determined, it is used to generate a code prior knowledge prompt word that instructs the target large language model to learn the code prior knowledge identified by that information.
Specifically, when the code prior knowledge database information is the location information of the code prior knowledge database, the code prior knowledge prompt word may be: please perform model learning on the code prior knowledge in the code prior knowledge database based on the location information of the code prior knowledge database.
In this specification, the risk type of the risk function in the risk function call link is determined in order to determine the corresponding code prior knowledge database information, so that the generated code prior knowledge prompt word is more targeted and the code prior knowledge learned by the target large language model is more targeted, which improves both the relevance and the efficiency of the model's learning of code prior knowledge.
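The lookup from risk type to database information can be illustrated with a plain mapping. The risk type keys and the location strings below are placeholders invented for this sketch; the specification only says that each risk type has its own code prior knowledge database and that the database information may be its location.

```python
# Hypothetical mapping from risk type to the location of its prior knowledge database.
PRIOR_KNOWLEDGE_DATABASES = {
    "first_risk_type": "kb/first_risk_type",
    "second_risk_type": "kb/second_risk_type",
}


def build_prior_knowledge_prompt(risk_type):
    """S702: look up the database information; S704: wrap it in a prompt word."""
    location = PRIOR_KNOWLEDGE_DATABASES[risk_type]
    return (
        "Please perform model learning on the code prior knowledge in the "
        f"code prior knowledge database located at {location}."
    )


prior_prompt = build_prior_knowledge_prompt("first_risk_type")
```

Selecting the database by risk type keeps the prior knowledge in the prompt limited to vulnerability codes and sanitization functions that are relevant to the link being analyzed.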
Referring to fig. 8, fig. 8 is a flow chart of determining a taint transfer analysis result according to an embodiment of this specification. As shown in fig. 8, determining the taint transfer analysis result by using the target large language model based on the target prompt word in S208 includes:
S802: inputting the target prompt word into the target large language model.
The target prompt word includes at least the key risk code prompt word and the taint transfer analysis prompt word. It may also include the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word; or the code prior knowledge prompt word, the key risk code prompt word and the taint transfer analysis prompt word; or the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word.
For specific descriptions of the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word, refer to the related description of S604, which is not repeated here.
S804: acquiring, through the target large language model, the risk function corresponding to the target prompt word and the risk function call link corresponding to the risk function; determining the taint variable corresponding to the risk function based on the target prompt word; and performing taint transfer analysis on the key risk code corresponding to the risk function based on the taint variable, to obtain a function taint transfer analysis result for the risk function.
The target prompt word may be determined based on the key risk code corresponding to a single risk function, so the risk function corresponding to the target prompt word can be obtained through the target large language model. The code risk detection result includes one or more risk function call links, and the risk function is contained in its corresponding risk function call link, so the corresponding risk function call link can be obtained from the code risk detection result based on the risk function.
The taint variable corresponding to the risk function can be determined from the code risk detection result. It can be understood that the form of the taint variable may change during taint transfer, so each risk function has its own corresponding taint variable.
As instructed by the target prompt word, the target large language model first determines the taint variable corresponding to the risk function, and then performs taint transfer analysis on the key risk code corresponding to the risk function based on the taint variable, so as to judge whether the taint variable is transferred successfully in that key risk code and thereby obtain the function taint transfer analysis result.
S806: determining, based on the risk function call link, whether a next reference function of the risk function exists.
The risk function call link includes the call relationships between the risk functions, so it can be determined through the risk function call link whether the risk function has a next reference function to which the taint variable is passed.
S808: if not, generating the taint transfer analysis result based on the function taint transfer analysis result.
When it is determined based on the risk function call link that no next reference function of the risk function exists, the risk function is the function to which the taint variable is finally transferred, and the taint transfer analysis result can be determined based on the function taint transfer analysis result of that risk function.
That is, when the function taint transfer analysis result of the risk function is the taint transfer success type, the taint transfer analysis result is link taint transfer success; when the function taint transfer analysis result of the risk function is the taint transfer failure type, the taint transfer analysis result is link taint transfer failure.
S810: if so, acquiring a reference prompt word corresponding to the reference function associated with the function taint transfer analysis result, taking the reference prompt word as the target prompt word and the reference function as the risk function, and executing the step of inputting the target prompt word into the target large language model.
In the risk function call link, the function taint transfer result of the current risk function may affect the taint variable of the next reference function, and therefore the function taint transfer result of the next reference function. The function taint transfer result of each risk function in the risk function call link therefore needs to be determined sequentially based on the target large language model.
Therefore, when it is determined based on the risk function call link that a next reference function of the risk function exists, the risk function is not the function to which the taint variable is finally transferred, and the next reference function of the risk function is determined. At this point, the reference prompt word corresponding to the reference function associated with the function taint transfer analysis result is acquired. The reference prompt word may include the key risk code prompt word and the taint transfer analysis prompt word corresponding to the reference function. It may also include the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word corresponding to the reference function; or the code prior knowledge prompt word, the key risk code prompt word and the taint transfer analysis prompt word corresponding to the reference function; or the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word corresponding to the reference function.
After the reference prompt word corresponding to the reference function is obtained, the reference prompt word is taken as the target prompt word, the reference function is taken as the risk function, and step S802 is executed again, until no next reference function of the risk function exists in the risk function call link, which indicates that the target large language model has completed the taint transfer analysis of the whole risk function call link.
In this specification, taint transfer analysis is performed on the risk functions in the risk function call link one by one through the target large language model, which reduces the input of each single call to the target large language model, improves the accuracy with which the model understands the target prompt word as well as its processing efficiency, and allows the final taint transfer analysis result to be determined accurately and efficiently.
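The loop formed by S802 to S810 can be sketched with the model call abstracted away. Everything named here is hypothetical: analyze_link, build_prompt, and stub_model are illustrative, the model is replaced by a lookup table, and returning early on the first failed function is a simplification that assumes no additional taint variables enter later functions.

```python
def analyze_link(link, initial_taint, build_prompt, ask_model):
    """Analyze a call link one risk function at a time.

    link: ordered list of risk function names.
    build_prompt(function, taint_variables) -> target prompt word.
    ask_model(prompt) -> (transferred, output_variables).
    Returns True for link taint transfer success, False for failure.
    """
    taint_variables = list(initial_taint)
    for function in link:
        prompt = build_prompt(function, taint_variables)  # input for S802
        transferred, outputs = ask_model(prompt)          # S804
        if not transferred:
            return False  # simplification: taint cannot reach later functions
        # S810: the outputs become the taint variables of the next reference function.
        taint_variables = list(outputs)
    return True  # S808: the last function's success decides the link result


# A stand-in for the target large language model, keyed by function name.
STUB_ANSWERS = {
    "upload": (True, ["filename", "content"]),
    "save_file": (True, ["filepath", "content"]),
    "file_write": (True, []),
}


def build_prompt(function, taint_variables):
    return f"{function}|{','.join(taint_variables)}"


def stub_model(prompt):
    return STUB_ANSWERS[prompt.split("|", 1)[0]]


link_ok = analyze_link(
    ["upload", "save_file", "file_write"], ["filename", "content"],
    build_prompt, stub_model,
)
link_blocked = analyze_link(
    ["render"], ["user_input"], build_prompt, lambda prompt: (False, []),
)
```

Each iteration sends the model only one function's prompt word, which is the point of analyzing the link function by function rather than in a single call.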
Referring to fig. 9, fig. 9 is a flow chart of determining a reference prompt word corresponding to a reference function according to an embodiment of this specification. As shown in fig. 9, acquiring the reference prompt word corresponding to the reference function associated with the function taint transfer analysis result in S810 includes:
S902: determining a reference taint variable corresponding to the reference function based on the function taint transfer analysis result, and searching for a reference sub-function corresponding to the end code statement in the reference key risk code corresponding to the reference function.
The function taint transfer analysis result corresponding to the risk function indicates whether the taint variable is transferred successfully in the risk function. When the taint variable is transferred successfully in the risk function, the taint variable corresponding to the risk function is passed on to the reference function. Because operations may be performed on the taint variable, it may change when it is passed to the reference function, so the reference taint variable corresponding to the reference function needs to be determined based on the function taint transfer analysis result.
The key risk code corresponding to the reference function, that is, the reference key risk code, is looked up, the end code statement is found in the reference key risk code, and the end code statement is taken as the reference sub-function.
Optionally, determining the reference taint variable corresponding to the reference function based on the function taint transfer analysis result in S902 includes:
acquiring the key risk code of the risk function corresponding to the function taint transfer analysis result, and determining the target sub-function corresponding to the end code statement in the key risk code;
if the function taint transfer analysis result is the taint transfer success type, acquiring the function output variable of the target sub-function, and taking the function output variable as the reference taint variable corresponding to the reference function.
The taint variable of the risk function is generally passed to the next reference function through the target sub-function corresponding to the end code statement in the key risk code. When the function taint transfer analysis result of the risk function is the taint transfer success type, the function output variable of the target sub-function is the reference taint variable of the next reference function.
When the function taint transfer analysis result of the risk function is the taint transfer failure type, the function output variable of the target sub-function is not the reference taint variable of the next reference function. In this case, whether the next reference function has an additional reference taint variable can be determined based on the code risk detection result. If the next reference function has no additional reference taint variable, its reference taint variable is empty and its function taint transfer analysis result is the taint transfer failure type.
S904: generating a reference taint transfer analysis prompt word for the reference key risk code based on the reference taint variable, the reference key risk code and the reference sub-function.
Step S904 is similar to step S506; refer to the description of step S506, which is not repeated here.
S906: determining the reference prompt word corresponding to the reference function based on the reference taint transfer analysis prompt word.
The reference prompt word corresponding to the reference function is generated based on at least the reference taint transfer analysis prompt word. The reference prompt word may include the key risk code prompt word and the reference taint transfer analysis prompt word corresponding to the reference function. It may also include the code prior knowledge prompt word, the key risk code prompt word, the reference taint transfer analysis prompt word and the output specification prompt word corresponding to the reference function; or the code prior knowledge prompt word, the key risk code prompt word and the reference taint transfer analysis prompt word corresponding to the reference function; or the key risk code prompt word, the reference taint transfer analysis prompt word and the output specification prompt word corresponding to the reference function.
In this specification, the reference taint variable corresponding to the reference function is determined based on the function taint transfer analysis result corresponding to the risk function, and the reference taint transfer analysis prompt word for the reference key risk code is obtained, so that the reference prompt word for the reference function is generated and the accuracy of the reference prompt word corresponding to the reference function is improved.
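The rule for choosing the reference taint variable in S902 can be sketched as below. The function name and parameters are hypothetical, and the handling of additional taint variables on the success path is not specified in the text, so this sketch simply ignores them in that case.

```python
def reference_taint_variables(transfer_succeeded, target_subfunction_outputs,
                              additional_taint_variables=()):
    """Choose the taint variables handed to the next reference function.

    On success, the output variables of the target sub-function carry the taint.
    On failure, only additional taint variables found in the code risk detection
    result (if any) remain; an empty list means the next function's result is
    also the taint transfer failure type.
    """
    if transfer_succeeded:
        return list(target_subfunction_outputs)
    return list(additional_taint_variables)


after_success = reference_taint_variables(True, ["filepath", "content"])
after_failure = reference_taint_variables(False, ["filepath", "content"])
after_failure_with_extra = reference_taint_variables(
    False, ["filepath"], additional_taint_variables=["content"]
)
```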
Next, the code risk detection device provided in this specification is described in detail with reference to fig. 10, which is a schematic diagram of a code risk detection device according to an embodiment of this specification. Note that the code risk detection device shown in fig. 10 is used to perform the method of the embodiments shown in fig. 1 to 9 of this specification. For convenience of explanation, only the parts relevant to this specification are shown; for specific technical details that are not disclosed here, refer to the embodiments shown in fig. 1 to 9 of this specification.
Referring to fig. 10, a schematic diagram of the code risk detection device of this specification is shown. The code risk detection device 1 may be implemented as all or part of a user terminal by software, hardware or a combination of both. According to some embodiments, the code risk detection device 1 comprises an acquisition module 11, a generation module 12, a construction module 13 and an update module 14, specifically:
the acquisition module 11 is adapted to acquire a code risk detection result for the source code and determine the key risk code corresponding to the code risk detection result;
the generation module 12 is adapted to generate a taint transfer analysis prompt word based on the key risk code and generate a key risk code prompt word based on the key risk code;
the construction module 13 is adapted to construct a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word;
the update module 14 is adapted to determine a taint transfer analysis result by using the target large language model based on the target prompt word, and update the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
Optionally, the acquisition module 11 includes:
an acquisition unit adapted to acquire a risk function call link from the code risk detection result and determine the target function in the risk function call link and the function variables of the target function;
a determining unit adapted to determine the key risk codes of the risk functions associated with the function variables from the risk function call link.
Optionally, the determining unit includes:
a determining subunit adapted to determine at least one risk function associated with the function variable in the risk function call link;
a searching subunit adapted to search the function code statements of the risk function for code statements associated with the function variable, to obtain the key risk code corresponding to each risk function.
Optionally, the generation module 12 includes:
a taint variable determining unit adapted to determine the taint variable corresponding to the risk function based on the code risk detection result;
a target sub-function searching unit adapted to search for the target sub-function corresponding to the end code statement in the key risk code corresponding to the risk function;
a generating unit adapted to generate a taint transfer analysis prompt word for the key risk code based on the taint variable and the target sub-function.
Optionally, the construction module 13 includes:
a first construction unit adapted to construct a code prior knowledge prompt word for the risk function call link and construct an output specification prompt word;
a second construction unit adapted to construct the target prompt word based on the code prior knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output specification prompt word.
Optionally, the first construction unit includes:
a code prior knowledge database information determining subunit adapted to determine the risk type of the risk function in the risk function call link and determine the code prior knowledge database information matched with the risk type;
a code prior knowledge prompt word generating subunit adapted to generate the code prior knowledge prompt word for the risk function call link based on the code prior knowledge database information.
Optionally, the update module 14 includes:
an input unit adapted to input the target prompt word into the target large language model;
a function taint transfer analysis result determining unit adapted to acquire, through the target large language model, the risk function corresponding to the target prompt word and the risk function call link corresponding to the risk function, determine the taint variable corresponding to the risk function based on the target prompt word, and perform taint transfer analysis on the key risk code corresponding to the risk function based on the taint variable, to obtain a function taint transfer analysis result for the risk function;
a judging unit adapted to determine, based on the risk function call link, whether a next reference function of the risk function exists;
a taint transfer analysis result generating unit adapted to generate the taint transfer analysis result based on the function taint transfer analysis result if no next reference function exists;
an execution unit adapted to, if a next reference function exists, acquire the reference prompt word corresponding to the reference function associated with the function taint transfer analysis result, take the reference prompt word as the target prompt word and the reference function as the risk function, and execute the step of inputting the target prompt word into the target large language model.
Optionally, the execution unit includes:
The reference sub-function searching sub-unit is suitable for determining a reference taint variable corresponding to the reference function based on the function taint transfer analysis result, and searching a reference sub-function corresponding to an end code statement from a reference key risk code corresponding to the reference function;
A reference taint transfer analysis prompt word generation subunit adapted to generate a reference taint transfer analysis prompt word for the reference key risk code based on the reference taint variable and the reference sub-function;
The reference prompt word determining subunit is suitable for determining the reference prompt word corresponding to the reference function based on the reference taint transfer analysis prompt word.
Optionally, the reference sub-function searching sub-unit is suitable for acquiring a key risk code of a risk function corresponding to the function taint transfer analysis result, and determining a target sub-function corresponding to an end code statement in the key risk code;
if the function taint transfer analysis result is the taint transfer success type, acquiring a function output variable of the target sub-function, and taking the function output variable as a reference taint variable corresponding to the reference function.
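For illustration only, the derivation of the reference taint variable can be sketched as below. The sketch assumes that the end code statement has the simple form `out = callee(args)` and that the function output variable is the name on the left of the assignment; both assumptions, the regular expression, and the names end_statement_call and reference_taint_variable are hypothetical. A practical implementation would more likely parse the code than match text.

```python
import re
from typing import List, Optional, Tuple

# Matches "out = callee(args)" or "return callee(args)", with an optional ";".
CALL = re.compile(
    r"^\s*(?:return\s+)?(?:(?P<out>\w+)\s*=\s*)?(?P<callee>[\w.]+)\s*\(.*\)\s*;?\s*$"
)


def end_statement_call(key_risk_code: List[str]) -> Optional[Tuple[Optional[str], str]]:
    """Return (output variable, target sub-function) for the end code statement."""
    if not key_risk_code:
        return None
    match = CALL.match(key_risk_code[-1])
    if match is None:
        return None
    return match.group("out"), match.group("callee")


def reference_taint_variable(key_risk_code: List[str], transfer_succeeded: bool) -> Optional[str]:
    """Pick the reference taint variable only when the transfer succeeded."""
    if not transfer_succeeded:
        return None
    call = end_statement_call(key_risk_code)
    return call[0] if call else None
```

With the key risk code `["sql = 'SELECT ...' + uid", "rows = db.query(sql)"]` and a successful transfer, the sketch selects `rows` as the reference taint variable and `db.query` as the target sub-function.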
Optionally, the updating module 14 is adapted to determine a type of the taint transfer analysis result, and if the type of the taint transfer analysis result is a link taint transfer failure type, remove the target risk function call link corresponding to the taint transfer failure from the code risk detection result.
It should be noted that, when the code risk detection device provided in the foregoing embodiment performs the code risk detection method, the division into the foregoing functional modules is used only as an example. In practical applications, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the code risk detection device and the code risk detection method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.
The present disclosure further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded by a processor and executed by the processor, where the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to 9, and the details are not repeated herein.
The present disclosure further provides a computer program product, where at least one instruction is stored, where the at least one instruction is loaded by the processor and executed by the processor, where the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to 9, and details are not repeated herein.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device provided in the present disclosure. The electronic device in this specification may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the electronic device, and performs the functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. Memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments of this specification, and the like. The operating system may be an Android system (including systems deeply developed based on the Android system), an IOS system developed by Apple (including systems deeply developed based on the IOS system), or another system. The data storage area may store data created by the electronic device in use, such as phonebooks, audio and video data, chat log data, and the like.
Referring to fig. 12, fig. 12 is a schematic diagram of an operating system and a user space provided in the present specification, where the memory 120 may be divided into an operating system space and a user space, and an operating system may be run in the operating system space, and a native and a third party application may be run in the user space. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish the specific application scenarios of a third-party application program, data communication between the third-party application program and the operating system needs to be established, so that the operating system can obtain the current scenario information of the third-party application program at any time and perform targeted system resource adaptation based on the current scenario.
Fig. 13 is a schematic diagram of the Android operating system in fig. 12. Taking the case where the operating system is an Android system as an example, the programs and data stored in the memory 120 are shown in fig. 13: a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360 and an application layer 380 may be stored in the memory 120, where the Linux kernel layer 320, the system runtime library layer 340 and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, power management, and the like. The system runtime library layer 340 provides the main feature support for the Android system through some C/C++ libraries. For example, the SQLite library provides support for databases, the OpenGL/ES library provides support for 3D graphics, the Webkit library provides support for browser kernels, and the like. Also provided in the system runtime library layer 340 is the Android runtime library (Android Runtime), which primarily provides some core libraries that allow developers to write Android applications using the Java language. The application framework layer 360 provides various APIs that may be used in building applications, and developers can also build their own applications by using these APIs, for example activity management, window management, view management, notification management, content provider, package management, call management, resource management, and location management.
At least one application program is running in the application layer 380, and these application programs may be native application programs of the operating system, such as a contact program, a short message program, a clock program, a camera application, etc.; and may also be a third party application developed by a third party developer, such as a game-like application, instant messaging program, photo beautification program, etc.
FIG. 14 is a block diagram of the IOS operating system of FIG. 12. Taking the case where the operating system is an IOS system as an example, the programs and data stored in the memory 120 are shown in FIG. 14. The IOS system comprises: a core operating system layer 420 (Core OS layer), a core services layer 440 (Core Services layer), a media layer 460 (Media layer), and a touchable layer 480 (Cocoa Touch layer). The core operating system layer 420 includes an operating system kernel, drivers, and underlying program frameworks that provide functionality closer to the hardware for use by the program frameworks at the core services layer 440. The core services layer 440 provides system services and/or program frameworks required by the application, such as a Foundation framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, a motion framework, and the like. The media layer 460 provides audiovisual interfaces for applications, such as a graphics and image related interface, an audio technology related interface, a video technology related interface, a wireless playback (AirPlay) interface for audio and video transmission, and so forth. The touchable layer 480 provides various commonly used interface-related frameworks for application development and is responsible for user touch interactions on the electronic device, such as a local notification service, a remote push service, an advertisement framework, a game tool framework, a message user interface (UI) framework, the UIKit user interface framework, a map framework, and so forth.
Among the frameworks shown in fig. 14, the frameworks relevant to most applications include, but are not limited to, the Foundation framework in the core services layer 440 and the UIKit framework in the touchable layer 480. The Foundation framework provides many basic object classes and data types and provides the most basic system services for all applications, independent of the UI. The classes provided by the UIKit framework form a basic UI class library for creating touch-based user interfaces. iOS applications can provide UIs based on the UIKit framework, so it provides the application's infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and this description is not repeated here.
The input device 130 is configured to receive input instructions or data, and the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined into a touch display screen, which receives touch operations performed on or near it by a user using a finger, a stylus, or any other suitable object, and displays the user interface of each application program. Touch display screens are typically provided on the front panel of an electronic device. The touch display screen may be designed as a full screen, a curved screen, or a special-shaped screen. The touch display screen may also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this specification.
In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above-described figures does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than illustrated, or may combine certain components, or may have a different arrangement of components. For example, the electronic device further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described herein.
In this specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is an operating system of the electronic device. The operating system may be an Android system, an IOS system, or another operating system, which is not limited in this specification.
The electronic device of the present specification may further have a display device mounted thereon, and the display device may be any of various devices capable of realizing a display function, for example: cathode ray tube (CRT) displays, light-emitting diode (LED) displays, electronic ink screens, liquid crystal displays (LCD), plasma display panels (PDP), and the like. A user may utilize a display device on electronic device 101 to view displayed text, images, video, etc. The electronic device may be a smart phone, a tablet computer, a gaming device, an AR (augmented reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, an electronic article of clothing, etc.
In the electronic device shown in fig. 11, where the electronic device may be a terminal, the processor 110 may be configured to invoke the code risk detection program stored in the memory 120 and specifically perform the following operations:
acquiring a code risk detection result aiming at a source code, and determining a key risk code corresponding to the code risk detection result;
Generating a taint transfer analysis prompt word based on the key risk code, and generating a key risk code prompt word based on the key risk code;
constructing a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word;
and determining a taint transfer analysis result by adopting a target large language model based on the target prompt word, and updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
Optionally, when the processor 110 executes the key risk code corresponding to the determined code risk detection result, specific execution is performed:
acquiring a risk function call link from a code risk detection result, and determining a target function and a function variable of the target function in the risk function call link;
a key risk code for each risk function associated with the function variable is determined from the risk function call link.
Optionally, when the processor 110 executes the key risk code of each risk function associated with the function variable determined from the risk function call link, specific execution is performed:
Determining at least one risk function associated with a function variable in a risk function call link;
Searching code sentences associated with the function variables from the function code sentences of the risk functions to obtain key risk codes corresponding to the risk functions.
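The search for code statements associated with the function variable can be illustrated with the minimal sketch below. It assumes, purely for illustration, that "associated" means the variable name occurs textually in the statement; a practical implementation would more likely rely on a syntax tree or data-flow information. The names key_risk_code and key_risk_codes_for_link are hypothetical.

```python
import re
from typing import Dict, List


def key_risk_code(statements: List[str], variable: str) -> List[str]:
    """Keep only the statements of one function that mention the variable."""
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return [s for s in statements if pattern.search(s)]


def key_risk_codes_for_link(
    link_sources: Dict[str, List[str]], variable: str
) -> Dict[str, List[str]]:
    """Map each risk function in the call link to its key risk code.

    Functions whose statements never mention the variable are treated as
    unrelated to it and are left out of the result.
    """
    selected: Dict[str, List[str]] = {}
    for name, statements in link_sources.items():
        code = key_risk_code(statements, variable)
        if code:
            selected[name] = code
    return selected
```

Reducing each function to the statements that touch the variable keeps the later prompt words short, which is one plausible reason for extracting key risk codes rather than sending whole function bodies to the model.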
Optionally, when the processor 110 executes to generate the key risk code prompt word based on the key risk code, specific execution is performed:
determining a taint variable corresponding to the risk function based on the code risk detection result;
Searching a target sub-function corresponding to the end code statement from the key risk codes corresponding to the risk functions;
and generating a taint transfer analysis prompt word aiming at the key risk code based on the taint variable and the target sub-function.
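As a hedged example, a taint transfer analysis prompt word could be produced from the taint variable and the target sub-function with a simple template. The wording of the template and the name taint_transfer_prompt are assumptions made for this sketch; this specification does not fix the text of the prompt word.

```python
def taint_transfer_prompt(taint_variable: str, target_subfunction: str) -> str:
    """Build a question asking whether the taint reaches the target sub-function."""
    if not taint_variable or not target_subfunction:
        raise ValueError("both the taint variable and the target sub-function are required")
    return (
        f"Treat the variable `{taint_variable}` as tainted input. "
        f"Decide whether its value can reach the call to `{target_subfunction}` "
        "in the key risk code, and explain each step of the transfer."
    )
```

For example, `taint_transfer_prompt("sql", "db.query")` yields a question about whether `sql` reaches the call to `db.query`.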
Optionally, when the processor 110 executes the target prompt word constructed based on the key risk code prompt word and the taint transfer analysis prompt word, the specific implementation is as follows:
Constructing a code priori knowledge prompt word aiming at a risk function call link, and constructing an output standard prompt word;
And constructing a target prompt word based on the code priori knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output standard prompt word.
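A minimal sketch of assembling the target prompt word from its four parts is given below. The order of the parts, the bracketed section titles and the name build_target_prompt are assumptions of the sketch and are not prescribed by this specification.

```python
def build_target_prompt(
    prior_knowledge: str,
    key_risk_code: str,
    taint_analysis: str,
    output_standard: str,
) -> str:
    """Join the four prompt words into one target prompt word."""
    sections = [
        ("Code priori knowledge", prior_knowledge),
        ("Key risk code", key_risk_code),
        ("Taint transfer analysis", taint_analysis),
        ("Output standard", output_standard),
    ]
    # Empty parts are skipped so the model is not shown blank sections.
    return "\n\n".join(
        f"[{title}]\n{body.strip()}" for title, body in sections if body.strip()
    )
```

Placing the output standard last is a design choice of the sketch: the format requirement is then the final thing the model reads before answering.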
Optionally, when the processor 110 executes constructing the code priori knowledge prompt word for the risk function call link, specific execution is performed:
Determining a risk type of a risk function in a risk function call link, and determining code priori knowledge database information matched with the risk type;
code priori knowledge prompt words for the risk function call links are generated based on the code priori knowledge database information.
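The matching of code priori knowledge to a risk type can be pictured as a lookup keyed by risk type, as in the sketch below. The in-memory dictionary stands in for the code priori knowledge database; its keys, its entries and the name prior_knowledge_prompt are illustrative assumptions only.

```python
from typing import Dict, List

# Hypothetical stand-in for the code priori knowledge database.
PRIOR_KNOWLEDGE: Dict[str, List[str]] = {
    "sql_injection": [
        "Parameterized queries keep user input out of the SQL text.",
        "String concatenation into a query passes the taint on.",
    ],
    "command_injection": [
        "Passing arguments as a list avoids shell interpretation.",
    ],
}


def prior_knowledge_prompt(
    risk_type: str, knowledge: Dict[str, List[str]] = PRIOR_KNOWLEDGE
) -> str:
    """Return the priori knowledge prompt word for a risk type, or "" if none."""
    notes = knowledge.get(risk_type, [])
    if not notes:
        return ""
    lines = "\n".join(f"- {note}" for note in notes)
    return f"Known facts for risk type {risk_type}:\n{lines}"
```

Returning an empty string for an unknown risk type lets the caller omit the priori knowledge part rather than fail.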
Optionally, when the processor 110 determines the taint transfer analysis result based on the target prompt word using the target large language model, the specific implementation is as follows:
inputting a target prompt word into a target large language model;
Acquiring a risk function corresponding to a target prompt word and a risk function call link corresponding to the risk function through a target large language model, determining a taint variable corresponding to the risk function based on the target prompt word, and carrying out taint transfer analysis on key risk codes corresponding to the risk function based on the taint variable to obtain a function taint transfer analysis result aiming at the risk function;
determining whether a next reference function of the risk function exists based on the risk function call link;
if the next reference function does not exist, generating a taint transfer analysis result based on the function taint transfer analysis result;
if the next reference function exists, acquiring a reference prompt word corresponding to the reference function associated with the function taint transfer analysis result, taking the reference prompt word as the target prompt word and the reference function as the risk function, and executing the step of inputting the target prompt word into the target large language model.
Optionally, when the processor 110 executes to obtain the reference prompt word corresponding to the reference function associated with the function taint transfer analysis result, specific execution is performed:
Determining a reference taint variable corresponding to the reference function based on the function taint transfer analysis result, and searching a reference sub-function corresponding to an end code statement from a reference key risk code corresponding to the reference function;
generating a reference taint transfer analysis prompt word aiming at the reference key risk code based on the reference taint variable and the reference sub-function;
And determining the reference prompt word corresponding to the reference function based on the reference taint transfer analysis prompt word.
Optionally, when the processor 110 determines the reference taint variable corresponding to the reference function based on the function taint transfer analysis result, the specific implementation is as follows:
Acquiring a key risk code of a risk function corresponding to a function taint transfer analysis result, and determining a target sub-function corresponding to an end code statement in the key risk code;
if the function taint transfer analysis result is the taint transfer success type, acquiring a function output variable of the target sub-function, and taking the function output variable as a reference taint variable corresponding to the reference function.
Optionally, when the processor 110 performs updating the code risk detection result based on the taint transfer analysis result to obtain the target code risk detection result, specific execution is performed:
And determining the type of the taint transfer analysis result, and if the type of the taint transfer analysis result is the type of the link taint transfer failure, removing the target risk function call link corresponding to the taint transfer failure from the code risk detection result.
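The update step can be illustrated by filtering the detected links according to the type of their taint transfer analysis result. In this sketch the type labels "success" and "failure", the LinkVerdict record and the function update_detection_result are hypothetical; links without a verdict are kept, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkVerdict:
    link_id: str       # identifier of a risk function call link
    result_type: str   # "success" or "failure" (hypothetical labels)


def update_detection_result(links: List[str], verdicts: List[LinkVerdict]) -> List[str]:
    """Drop the links whose taint transfer analysis ended in failure."""
    failed = {v.link_id for v in verdicts if v.result_type == "failure"}
    return [link for link in links if link not in failed]
```

Removing only the links judged as failures means the step can lower the number of false alarms without adding any new finding to the original code risk detection result.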
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, object features, interactive behavior features, user information, and the like referred to in this specification are all acquired with sufficient authorization.
The foregoing disclosure is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the claims, which follow the meaning of the claims of the present invention.

Claims (14)

1. A code risk detection method, the method comprising:
acquiring a code risk detection result aiming at a source code, and determining a key risk code corresponding to the code risk detection result;
Generating a taint transfer analysis prompt word based on the key risk code, and generating a key risk code prompt word based on the key risk code;
Constructing a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word;
And determining a taint transfer analysis result by adopting a target large language model based on the target prompt word, and updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
2. The method of claim 1, the determining a key risk code corresponding to the code risk detection result, comprising:
Acquiring a risk function call link from the code risk detection result, and determining a target function in the risk function call link and a function variable of the target function;
And determining key risk codes of each risk function associated with the function variables from the risk function call link.
3. The method of claim 2, the determining key risk codes for each risk function associated with the function variable from the risk function call link, comprising:
Determining at least one risk function associated with the function variable in the risk function call link;
searching code sentences associated with the function variables from the function code sentences of the risk functions to obtain key risk codes corresponding to the risk functions.
4. The method of claim 2, the generating a key risk code prompt word based on the key risk code, comprising:
determining a taint variable corresponding to the risk function based on the code risk detection result;
searching a target sub-function corresponding to an end code statement from the key risk codes corresponding to the risk functions;
and generating a taint transfer analysis prompt word aiming at the key risk code based on the taint variable and the target sub-function.
5. The method of claim 2, the constructing a target prompt word based on the key risk code prompt word and the taint transfer analysis prompt word, comprising:
constructing a code priori knowledge prompt word aiming at the risk function call link, and constructing an output standard prompt word;
and constructing a target prompt word based on the code priori knowledge prompt word, the key risk code prompt word, the taint transfer analysis prompt word and the output standard prompt word.
6. The method of claim 5, the constructing code priori knowledge prompt words for the risk function call link comprising:
determining a risk type of a risk function in the risk function call link, and determining code priori knowledge database information matched with the risk type;
and generating a code priori knowledge prompt word aiming at the risk function call link based on the code priori knowledge database information.
7. The method of any of claims 2 to 6, the determining a taint transfer analysis result using a target large language model based on the target prompt word, comprising:
inputting the target prompt word into a target large language model;
Acquiring a risk function corresponding to the target prompt word and a risk function call link corresponding to the risk function through the target large language model, determining a taint variable corresponding to the risk function based on the target prompt word, and carrying out taint transfer analysis on a key risk code corresponding to the risk function based on the taint variable to obtain a function taint transfer analysis result aiming at the risk function;
determining whether a next reference function of the risk function exists based on the risk function call link;
If the next reference function does not exist, generating a taint transfer analysis result based on the function taint transfer analysis result;
if the next reference function exists, acquiring a reference prompt word corresponding to the reference function associated with the function taint transfer analysis result, taking the reference prompt word as the target prompt word and the reference function as the risk function, and executing the step of inputting the target prompt word into a target large language model.
8. The method of claim 7, wherein the obtaining the reference prompt word corresponding to the reference function associated with the function taint transfer analysis result comprises:
Determining a reference taint variable corresponding to the reference function based on the function taint transfer analysis result, and searching a reference sub-function corresponding to an end code statement from a reference key risk code corresponding to the reference function;
Generating a reference taint transfer analysis prompt word for the reference key risk code based on the reference taint variable and the reference sub-function;
And determining the reference prompt word corresponding to the reference function based on the reference taint transfer analysis prompt word.
9. The method of claim 8, the determining a reference taint variable corresponding to the reference function based on the function taint transfer analysis result, comprising:
acquiring a key risk code of a risk function corresponding to the function taint transfer analysis result, and determining a target sub-function corresponding to an end code statement in the key risk code;
And if the function taint transfer analysis result is a taint transfer success type, acquiring a function output variable of the target sub-function, and taking the function output variable as a reference taint variable corresponding to the reference function.
10. The method of claim 2, the updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result, comprising:
and determining the type of the taint transfer analysis result, and if the type of the taint transfer analysis result is the type of the link taint transfer failure, removing the target risk function call link corresponding to the taint transfer failure from the code risk detection result.
11. A code risk detection apparatus, the apparatus comprising:
The acquisition module is suitable for acquiring a code risk detection result aiming at a source code and determining a key risk code corresponding to the code risk detection result;
The generation module is suitable for generating a stain transfer analysis prompt word based on the key risk code and generating a key risk code prompt word based on the key risk code;
The construction module is suitable for constructing target prompt words based on the key risk code prompt words and the stain transfer analysis prompt words;
And the updating module is suitable for determining a taint transfer analysis result by adopting a target large language model based on the target prompt word, and updating the code risk detection result based on the taint transfer analysis result to obtain a target code risk detection result.
12. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 10.
13. A computer program product storing at least one instruction for loading by a processor and performing the method steps of any one of claims 1 to 10.
14. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-10.
CN202410552728.9A 2024-05-06 2024-05-06 Code risk detection method and device, electronic equipment and computer storage medium Active CN118673497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410552728.9A CN118673497B (en) 2024-05-06 2024-05-06 Code risk detection method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410552728.9A CN118673497B (en) 2024-05-06 2024-05-06 Code risk detection method and device, electronic equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN118673497A true CN118673497A (en) 2024-09-20
CN118673497B CN118673497B (en) 2025-11-28

Family

ID=92718119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410552728.9A Active CN118673497B (en) 2024-05-06 2024-05-06 Code risk detection method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN118673497B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210149995A1 (en) * 2019-11-18 2021-05-20 International Business Machines Corporation System and Method for Negation Aware Sentiment Detection
US20220207163A1 (en) * 2020-12-30 2022-06-30 Atlassian Pty Ltd Apparatuses, methods, and computer program products for programmatically parsing, classifying, and labeling data objects
CN116800525A (en) * 2023-07-21 2023-09-22 浙江网商银行股份有限公司 Honeypot protection method and device, storage medium and electronic equipment
CN117555720A (en) * 2024-01-11 2024-02-13 腾讯科技(深圳)有限公司 Code repairing method, device, equipment and medium
CN117806967A (en) * 2023-12-29 2024-04-02 北京百度网讯科技有限公司 Method for processing code and training method of code processing model
CN117874754A (en) * 2024-01-12 2024-04-12 中电云计算技术有限公司 Sensitive information leakage monitoring and linkage large model analysis method
CN117909235A (en) * 2024-01-18 2024-04-19 百度(中国)有限公司 Code risk detection method, training method and device of deep learning model

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119538266A (en) * 2024-11-22 2025-02-28 北京火山引擎科技有限公司 A safety detection method, device, equipment, medium and product
CN119538266B (en) * 2024-11-22 2025-07-22 北京火山引擎科技有限公司 Language model-based rebound shell detection method, device, equipment, medium and product for cloud security protection and business risk identification
CN119939605A (en) * 2025-01-27 2025-05-06 北京火山引擎科技有限公司 A method, device, equipment, medium and product for detecting attack chains
CN119691757A (en) * 2025-02-25 2025-03-25 上海交通大学 Firmware logic vulnerability detection method, system, electronic device and computer-readable storage medium based on large language model thinking chain
CN119691757B (en) * 2025-02-25 2025-05-27 上海交通大学 Firmware logic vulnerability detection method, system, electronic device and computer-readable storage medium based on large language model thinking chain

Also Published As

Publication number Publication date
CN118673497B (en) 2025-11-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant