WO2021018929A1 - Computer-implemented method, system and computer program for identifying a malicious file - Google Patents
Computer-implemented method, system and computer program for identifying a malicious file
- Publication number
- WO2021018929A1 (Application PCT/EP2020/071334)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- features
- file
- malicious file
- computer
- fuzzy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/043—Architecture, e.g. interconnection topology based on fuzzy logic, fuzzy membership or fuzzy inference, e.g. adaptive neuro-fuzzy inference systems [ANFIS]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/048—Fuzzy inferencing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/145—Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the present invention relates to a computer-implemented method, system and computer program for identifying a malicious file, combining different types of analysis, processes and procedures that allow malicious files to be detected and classified.
- a method for identifying a malware file using multiple classifiers is known from US patent application US2010192222A1.
- such a method uses multiple classifiers, including static and dynamic classifiers, and is thus unable to identify malware based only on static analysis.
- the patent EP2882159 discloses a computer-implemented method of profiling cyber threats detected in a target environment, which comprises receiving, from a Security Information and Event Manager (SIEM) monitoring the target environment, alerts triggered by a detected potential cyber threat, and, for each alert: retrieving captured packet data related to the alert; extracting data pertaining to a set of attributes from the captured packet data triggering the alert; and applying fuzzy logic to data pertaining to one or more of the attributes to determine values for one or more output variables indicative of a level of an aspect of risk attributable to the cyber threat.
- SIEM Security Information and Event Manager
- the present invention relates, in accordance with a first aspect, to a computer-implemented method for identifying a malicious file.
- the method comprises: performing a static analysis of a potentially malicious file to obtain a set of features that provide an abstract view of the file; performing a static machine learning classification process using said set of features as inputs, to obtain a preliminary classification output, i.e. a score; and performing a fuzzy inference procedure based on possibilistic logic using as input variables said set of features and said preliminary classification output, to generate an enhanced classification output that identifies the potentially malicious file as a malicious file or a benign file.
- the method comprises performing a further static machine learning classification process, using as inputs several or all of the above-mentioned sets of features, to obtain a corresponding further preliminary classification output; and performing said fuzzy inference procedure based on possibilistic logic also using said further preliminary classification output as an input variable.
- the above mentioned fuzzy inference procedure comprises a fuzzification process that converts the input variables into fuzzy variables.
- the fuzzification process comprises deriving membership functions relating the input variables with output variables through membership degrees of values of the input variables in predefined fuzzy sets, and representing said membership functions with linguistic variables, said linguistic variables being said fuzzy variables.
- the fuzzy inference procedure further comprises an inference decision-making process comprising firing fuzzy possibilistic rules with values of said linguistic variables for said input variables, to generate a fuzzy output that identifies the degree of belief that the potentially malicious file has to be a malicious file or a benign file.
- the method of the first aspect of the present invention further comprises selecting which fuzzy possibilistic rules to fire in said inference decision-making process, based on at least said values of the linguistic variables for the input variables.
- the fuzzy inference procedure based on possibilistic logic further comprises a defuzzification process that converts the above mentioned fuzzy possibilistic output into a crisp output, wherein said crisp output constitutes the above mentioned enhanced classification output.
- the above-mentioned set or sets of features may comprise: the API (Application Programming Interface) function calls; the representation of an executable file as a stream of entropy values, where each value describes the amount of entropy over a small chunk of code in a specific location of the potentially malicious file; and the sequence of assembly language instructions executed by a software program constituting the potentially malicious file, in particular the operational codes of the machine language instructions;
- the fuzzy inference procedure based on possibilistic logic is based on a PGL+ algorithm.
- the proof method for PGL+ is complete and involves a semantical unification model of disjunctive fuzzy constants and three other inference patterns, together with a deductive mechanism in modus ponens style.
- the PGL+ algorithm can comprise applying three algorithms sequentially: a first algorithm that extends the fuzzy possibilistic rules by means of implementing a first set of rules; a second algorithm that translates the fuzzy possibilistic rules into a semantically equivalent set of 1-weighted clauses by means of implementing a second set of rules; and a third algorithm that computes a maximum degree of possibilistic entailment of a goal from the equivalent set of 1-weighted clauses.
- the fuzzy inference procedure based on possibilistic logic comprises executing formulas of the form (A, c), where A is a Horn clause (fact or rule) with disjunctive fuzzy constants and c is a degree in the unit interval [0, 1] which denotes a lower bound on the belief in A in terms of necessity measures. Every fact and rule has attached a degree of belief or weight in the real interval [0, 1] that denotes a lower bound on the belief in that fact or rule in terms of necessity measures. Thus, facts and rules that are demonstrated to be key for the decision system have a higher weight, and facts and rules not so useful in the decision system have a lower weight.
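- As a purely illustrative sketch of this representation (the Python encoding and field names are assumptions, not part of the actual PGL+ implementation), a weighted fact or rule (A, c) could be held in a plain data structure:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class WeightedClause:
    """A pair (A, c): a Horn clause A with a necessity lower bound c in [0, 1]."""
    body: Tuple[str, ...]   # antecedent predicates (empty tuple for a fact)
    head: str               # consequent predicate, possibly with a fuzzy constant
    weight: float           # degree of belief c, 0 < c <= 1

# a fact: "it is almost sure that the file entropy is around 20"
fact = WeightedClause(body=(), head="entropy(around_20)", weight=0.9)

# a rule: "if the entropy is high, the file is encrypted", believed to degree >= 0.7
rule = WeightedClause(body=("entropy(high)",), head="encrypted", weight=0.7)
```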
- the rules created by the system can have a higher degree of belief than the rules created by a human, or vice versa.
- the system may create rules of the following form:
- the facts can have different degrees of belief depending on the source of the information.
- file management API functions, e.g. CopyFile, CreateFile, EncryptFile, etc.
- networking APIs, e.g. HttpCreateServerSession, DnsAcquireContextHandle, RpcStringBindingCompose, etc.
- the machine learning models can be enhanced by further using Reinforcement Learning methods.
- Reinforcement Learning (RL) is a set of techniques that allow solving problems in highly uncertain or almost unknown domains.
- the method can use machine learning to select the most relevant features, using RL-guided methods to derive the future reward (i.e. accuracy) of using each feature.
- the machine learning technique will be able to use a Q-Table (rewards table) of the RL method to accurately predict which feature and split set to use for prediction, thus creating a quasi-optimal decision tree (DT) from which to derive the rules for the system, as sketched below. This last module makes the system keep learning from new threats, a key aspect when it comes to cybersecurity.
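- A minimal sketch of the tabular Q-learning update that such an RL-guided selector could rely on; the state/action encoding, the feature names and the hyper-parameters are illustrative assumptions, with the validation-accuracy gain used as the reward:

```python
# Candidate features the decision-tree builder may split on (hypothetical names).
ACTIONS = ["entropy_mean", "api_CreateFile", "opcode_jmp_freq"]

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

q = {}
# After measuring that splitting on "entropy_mean" improved validation accuracy by 0.03:
q_update(q, state=("root",), action="entropy_mean", reward=0.03,
         next_state=("entropy_mean",))
```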
- the present invention also relates to a system for identifying a malicious file, the system comprising one or more computing entities adapted to perform the steps of the method of the first aspect of the invention for all its embodiments, said one or more computing entities including at least the following modules operatively connected to each other:
- a pre-processing computing module configured and arranged to perform a static analysis of a potentially malicious file to obtain a set of features that provide an abstract view of the file;
- a machine learning module configured and arranged to perform a static machine learning classification process using as inputs said set of features, to obtain a preliminary classification output;
- a fuzzy inference module configured and arranged to perform a fuzzy inference procedure based on possibilistic logic using as input variables said set of features and said preliminary classification output, to generate an enhanced possibilistic classification output that identifies the potentially malicious file as a malicious file or a benign file.
- a computer program product is one embodiment that has a computer-readable medium including computer program instructions encoded thereon that, when executed on at least one processor in a computer system, cause the processor to perform the operations indicated herein as embodiments of the invention.
- the limitations mentioned above associated with the prior-art methods are addressed by aggregating and combining multiple static features and the output of preferably multiple static classifiers to infer the maliciousness of a file based on a set of fuzzy rules. These rules might be inferred using the knowledge of cyber security experts or using any machine learning technique.
- the user has access to all the decisions taken in order to decide if a file is malicious. Additionally, an expert user can create additional rules, or modify the ones created by the method, system or computer program of the present invention.
Brief Description of the Figures
- Fig. 1 schematically shows the system of the second aspect of the invention, for an embodiment, depicting its main modules.
- Fig. 2 is an Entropy versus Chunk diagram showing an example of a static analysis of the method of the first aspect of the invention to provide a set of features of an abstract view of an executable file, in the form of a stream of entropy values of the structural entropy of the executable file, computed using Shannon's formula, according to an embodiment, by means of the pre-processing computing module of the system of the second aspect of the invention.
- Fig. 3 shows gray scale images constituting sets of features obtained by respective static analyses of the method of the first aspect of the invention, representing abstracts views of different malware files (Rammnit, Lollipop, Kelihos_ver3), according to corresponding embodiments, by means of the pre-processing computing module of the system of the second aspect of the invention.
- Fig. 4 schematically shows an overview of the preprocessing module of the system of the second aspect of the present invention, decomposed into five components for performing five corresponding static analyses on an executable file, including those associated with the embodiments of Figures 2 and 3.
- Fig. 5 schematically shows the system of the second aspect of the invention, for an embodiment in which the machine learning module includes one submodule, or static classifier, for each set of features provided by a respective static analyser of the pre-processing module.
- Fig. 6 schematically shows the system of the second aspect of the invention, for an embodiment for which the machine learning module includes only one static classifier that includes as inputs all the set of features provided by all the static analysers of the pre-processing module.
- Fig. 7 schematically shows the system of the second aspect of the invention, for an embodiment that differs from that of Figure 5 in that the machine learning module comprises, in addition, a further submodule that takes as inputs all the sets of features provided by all the static analysers of the pre-processing module.
- Fig. 8 schematically shows the system of the second aspect of the present invention, for an embodiment, including the preprocessing module, the machine learning module, and a fuzzy inference module decomposed into several functional blocks.
- Fig. 9 is a diagram that shows the membership function of some fuzzy subsets of sets of features obtained with a static analyzer, particularly of entropy values, for an embodiment of the fuzzification process performed according to the method of the first aspect of the invention, by means of the fuzzy inference module of the system of the second aspect of the invention.
- Fig. 10 graphically shows the membership function of some fuzzy subsets associated with scores obtained from a machine learning process applied on the sets of features of Figure 9, for an embodiment of the fuzzification process performed according to the method of the first aspect of the invention, by means of the fuzzy inference module of the system of the second aspect of the invention.
- Fig. 11 is a diagram that shows membership functions of scores obtained at the fuzzification process, as part of a defuzzification process to obtain crisp values, according to an embodiment of the method and system of the present invention.
Detailed Description of Preferred Embodiments
- Fig. 1 shows an embodiment of the system of the second aspect of the present invention.
- the proposed system includes three components: a preprocessing module 110, a machine learning module 120 and a fuzzy inference module 130.
- the preprocessing module 110 is responsible for the extraction of features/characteristics 111 from a given software program 100 (also termed file or executable).
- the machine learning module 120, which can be composed of one or more machine learning submodules 121, can, given one or more of said extracted features/characteristics 111, output a score 123 (i.e. a preliminary classification output) indicating the maliciousness of the software program 100 with respect to the input features 111.
- the fuzzy inference module 130 is responsible for performing inference upon fuzzy rules and given facts, i.e. characteristics of the software program 100 and the output scores 123 of the machine learning methods implemented by the machine learning submodules 121, to derive a reasonable output or conclusion 140 (i.e. an enhanced classification output), that is, whether a file 100 is malicious or not. Notice that the invention might be applied to classifying malware into families without needing any significant modification.
- the term "given facts" refers herein to the facts, data and input information of the fuzzy inference module 130. These data are the features extracted by the pre-processing 110 and machine learning 120 modules.
- the preprocessing 110 and machine learning 120 modules are either independent modules or are comprised in a common feature extraction module.
- a file 100 is received at a client or server computer, and then a static type of analysis of the file 100, i.e. without executing the file, is initiated.
- This static analysis is performed by the preprocessing module 110, which processes the file 100 and generates an abstract view thereof.
- This abstract view might be represented by sets of features 111.
- each set of features 111 is used as input to one or more static classifiers 122, each implemented in one of the cited machine learning submodules 121.
- the output 123 of each machine learning classifier 122 is a value in the range [0, 1].
- a value close to 0 means that the executable 100 does not contain suspicious/malicious indicators with regard to a specific group of features 111; conversely, values close to 1 indicate maliciousness.
- Any machine learning method can be used as classifier. For instance, neural networks, support vector machines or decision trees.
- the fuzzy inference module 130 receives as input at least one or more features 111 extracted by the preprocessing module 110 and the output 123 of one or more static classifiers 122, and performs the inference procedure upon the rules and given facts to derive a reasonable output or conclusion 140, that is, whether a file is malicious or not.
- Preprocessing module description
- The preprocessing module 110 is responsible for the feature extraction process. It analyses the software program 100 with static techniques (i.e. the program 100 is not executed). It extracts various characteristics from the program's 100 syntax and semantics.
- the software program 100 can take varying formats including, but not limited to, Portable Executable (PE), Disk Operating System (DOS) executable files, New Executable (NE) files, Linear Executable (LE) files, Executable and Linkable Format (ELF) files, JAVA Archive (JAR) files, and SHOCKWAVE/FLASH (SWF) files.
- PE Portable Executable
- DOS Disk Operating System
- NE New Executable
- LE Linear Executable
- ELF Executable and Linkable Format
- JAR JAVA Archive
- SWF SHOCKWAVE/FLASH
- the preprocessing module 110 extracts at least one of the following sets or subsets (groups) of features (but is not limited to them):
- API Application Programming Interfaces
- API functions and system calls are related to services provided by the operating system. They support various key operations such as networking, security, system services, file management, and so on. In addition, they include various functions for utilizing system resources, such as memory, file system, network or graphics.
- API function calls can provide key information to represent the behavior of the software 100.
- every API function and system call has an associated feature.
- the feature takes a value in {0, 1}: 0 (or False) if the API function or system call has not been called by the program; 1 (or True) otherwise.
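- For illustration only, the binary API-call feature vector could be built as follows; the monitored API names are taken from the examples above, and the list of imported APIs is a hypothetical placeholder for the disassembler's output:

```python
# Hypothetical whitelist of API functions to monitor (a real system would use
# a feature-selection step to pick the most informative ones).
MONITORED_APIS = ["CopyFile", "CreateFile", "EncryptFile",
                  "HttpCreateServerSession", "DnsAcquireContextHandle"]

def api_feature_vector(imported_apis):
    """Return one 0/1 feature per monitored API: 1 if the program calls it, 0 otherwise."""
    called = set(imported_apis)
    return [1 if api in called else 0 for api in MONITORED_APIS]

# e.g. a sample that imports CreateFile and DnsAcquireContextHandle:
print(api_feature_vector(["CreateFile", "DnsAcquireContextHandle"]))  # [0, 1, 0, 0, 1]
```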
- An executable file 100 is represented as a stream of entropy values, where each value describes the amount of entropy over a small chunk of code in a specific location of the file 100. For each chunk of code, the entropy is computed using Shannon's formula. There is empirical evidence that the entropy time series of samples from a given family are similar to each other and distinct from those belonging to a different family. This is the result of reusing code to create new malware variants. In consequence, the structural entropy of an executable 100 can be used to detect whether it is benign or malware and to classify it into its corresponding family.
- FIG. 2 shows an example of the above mentioned computed entropy versus chunk, for an embodiment.
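- A minimal sketch of how such an entropy stream could be computed with Shannon's formula over fixed-size chunks; the chunk size and the input file name are illustrative assumptions:

```python
import math

def entropy_stream(data: bytes, chunk_size: int = 256):
    """Structural entropy: Shannon entropy (in bits, range [0, 8]) of each fixed-size chunk."""
    stream = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        total = len(chunk)
        # Shannon's formula: H = -sum(p * log2(p)) over the byte-value frequencies.
        h = -sum((chunk.count(b) / total) * math.log2(chunk.count(b) / total)
                 for b in set(chunk))
        stream.append(h)
    return stream

with open("sample.exe", "rb") as f:   # hypothetical input file
    print(entropy_stream(f.read())[:10])
```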
- a software program 100 is disassembled (IDA Pro, Radare2, Capstone, etc.) and its sequence of assembly language instructions is extracted for further analysis.
- the operational codes of the machine language instructions are extracted.
- every byte is interpreted as one pixel in an image. Then, the resulting array is organized as a 2-D array and visualized as a gray-scale image, as shown in Fig. 3.
- the main benefit of visualizing a malicious executable 100 as an image is that the different sections of a binary can be easily differentiated.
- malware authors usually change only a small part of the code to produce new variants. Thus, if old malware is re-used to create new binaries, the resulting ones will be very similar. Additionally, by representing malware as an image it is possible to detect small changes while retaining the global structure of the samples.
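- As an illustrative sketch of this byte-to-image conversion (assuming NumPy and Pillow; the fixed image width and the file name are placeholders, since the width may in practice depend on the file size):

```python
import numpy as np
from PIL import Image

def bytes_to_grayscale(data: bytes, width: int = 256) -> Image.Image:
    """Interpret every byte as one pixel and reshape into a 2-D gray-scale image."""
    height = len(data) // width
    pixels = np.frombuffer(data[:height * width], dtype=np.uint8).reshape(height, width)
    return Image.fromarray(pixels, mode="L")

with open("sample.exe", "rb") as f:   # hypothetical input file
    bytes_to_grayscale(f.read()).save("sample.png")
```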
- This group of features comprises hand-crafted features defined by cyber security experts. For instance, the size in bytes and the entropy of the sections of the Portable Executable file, the frequency of use of the registers, the frequency of a set of keywords from an executable, the attributes of the headers of the Portable Executable, among others.
- Fig. 4 presents an overview of the preprocessing module 110 decomposed into the five aforementioned components.
- Machine learning module description
- The use of machine learning algorithms to address the problem of malicious software detection and classification has increased during the last decade. Instead of directly dealing with raw malware, machine learning solutions first have to extract features that provide an abstract view of the software. The extracted features can then be used to feed at least one machine learning method.
- the system of the second aspect of the invention comprises and uses multiple machine learning submodules 121, each receiving as inputs the set of features provided by a respective static analyser of the preprocessing module 110.
- the system receives a file 100 (such as an executable file) at a client or server computer.
- the preprocessing module 110 is responsible for extracting a set of features 111 from the file 100, by means of the static analysers. These features 111 are used as input to the machine learning submodules 121.
- the system has at least as many machine learning submodules 121 as groups of features.
- the output of each machine learning submodule 121 is a value in the range [0, 1].
- a value close to 0 means that the executable 100 does not contain suspicious/malicious indicators with regard to a specific group of features 111; otherwise, the value will be close to 1.
- Any machine learning method can be used as static classifier. For instance, neural networks, support vector machines or decision trees.
- a feed-forward neural network with at least three layers can be used: (1) an input layer, (2) one fully-connected layer and (3) an output layer.
- the input layer has size equal to the length of the feature vector.
- the output layer has only one neuron and outputs the probability of an executable of being malicious or not. Additionally, a dropout after every fully-connected layer can be added.
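- A minimal sketch of such a feed-forward classifier, assuming PyTorch; the hidden-layer size and the dropout rate are illustrative choices, not values prescribed by the method:

```python
import torch.nn as nn

class FeedForwardClassifier(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128, dropout: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),  # input layer -> fully-connected layer
            nn.ReLU(),
            nn.Dropout(dropout),            # dropout after the fully-connected layer
            nn.Linear(hidden, 1),           # single output neuron
            nn.Sigmoid(),                   # probability that the file is malicious
        )

    def forward(self, x):
        return self.net(x)
```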
- convolutional neural networks have achieved great success in image and time series related classification tasks.
- Convolutional neural networks consist of a sequence of convolutional layers, the output of which is connected only to local regions in the input. This structure allows learning filters able to recognize specific patterns in the input data.
- the convolutional network can be composed of five or more layers: (1) the input layer, (2) one convolutional layer, (3) one pooling layer, (4) one fully-connected layer and (5) the output layer.
- Static classifier embodiment 1: API function calls.
- the behavior of an executable file can be modelled by their use of the API functions.
- the executable file is disassembled to analyze and extract the API function calls it performs.
- every API function and system call has an associated feature.
- the feature takes a value in {0, 1}: 0 (or False) if the API function or system call has not been called by the program; 1 (or True) otherwise.
- Because the number of API function calls a program can execute is huge and some of them are irrelevant for modelling the program's behavior, in some implementations only a subset of the available API function calls is considered. To select the most informative API function calls to record, any feature selection technique might be applied, as in the sketch below.
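- As one possible (assumed, not prescribed) feature-selection step, a chi-squared filter from scikit-learn could rank the binary API-call features; the data below are random toy values:

```python
from sklearn.feature_selection import SelectKBest, chi2
import numpy as np

# X: n_samples x n_api binary call matrix, y: 0 = benign, 1 = malicious (toy data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 500))
y = rng.integers(0, 2, size=200)

# Keep the 100 API-call features with the highest chi-squared score w.r.t. the label.
selector = SelectKBest(chi2, k=100)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (200, 100)
```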
- a feed-forward network can be utilized to analyze the API functions invoked by a computer program.
- the feed-forward network may have one or more hidden layers followed by an output layer, which generates a classification for the file (e.g. malicious or benign).
- the classification of the file can be provided at an output of the feed-forward network.
- Static classifier embodiment 2: Structural entropy.
- an executable file can be represented as a stream of entropy values, where each value describes the amount of entropy over a small chunk of code in a specific location of the file. For each chunk of code, the entropy is computed using Shannon's formula.
- a convolutional neural network can be utilized to analyze the stream of entropy values by applying a plurality of kernels to detect certain patterns in the variation between entropy values of adjacent chunks.
- the convolutional network can detect malicious executables by providing a classification of the disassembled binary file (maliciousness score: [0,1]).
- the convolutional neural network may include a convolutional layer, a pooling layer, a fully connected layer and an output layer.
- the convolutional neural network can be configured to process streams variable in length. As such, one or more techniques can be applied to generate fixed length representations of the entropy values.
- the first convolutional layer can be configured to process the stream of entropy values by applying a plurality of kernels K1,1, K1,2, ..., K1,x to the entropy values.
- Each kernel applied to the first convolutional layer can be configured to detect changes between entropy values of adjacent chunks in a file. According to some implementations, each kernel applied to the first convolutional layer can be adapted to detect a specific sequence of entropy values, having w values.
- although the convolutional neural network has been indicated as comprising three convolutional layers, it should be appreciated that the convolutional neural network can include fewer or more convolutional layers.
- the pooling layer can be configured to further process the output from a preceding convolutional layer by compressing (e.g. subsampling or down sampling) the output from the preceding convolution layer.
- the pooling layer can compress the output by applying one or more pooling functions, including for example a maximum pooling function.
- the output of the pooling layer can be further processed by the one or more fully connected layers and the output layer in order to generate a classification for the file (e.g. malicious or benign).
- the classification of the file can be provided at an output of the convolutional neural network.
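- A compact sketch of such a 1-D convolutional classifier over a fixed-length entropy stream, assuming PyTorch; the kernel count, kernel size and input length are illustrative:

```python
import torch
import torch.nn as nn

class EntropyCNN(nn.Module):
    def __init__(self, n_kernels: int = 32, kernel_size: int = 8):
        super().__init__()
        self.conv = nn.Conv1d(1, n_kernels, kernel_size)   # kernels over adjacent chunks
        self.pool = nn.AdaptiveMaxPool1d(1)                # max pooling to a fixed size
        self.fc = nn.Linear(n_kernels, 1)

    def forward(self, x):                  # x: (batch, 1, n_chunks), fixed length
        h = torch.relu(self.conv(x))
        h = self.pool(h).squeeze(-1)
        return torch.sigmoid(self.fc(h))   # maliciousness score in [0, 1]

scores = EntropyCNN()(torch.rand(4, 1, 512))   # 4 files, 512 entropy values each
```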
- Static classifier embodiment 3: Assembly language instructions.
- a binary file can be disassembled thereby forming a discernible sequence of instructions having one or more identifying features (e.g. instruction mnemonics).
- a convolutional neural network (CNN) can be utilized to analyze the disassembled binary file by applying a plurality of kernels (filters) adapted to detect certain sequences of instructions in the disassembled file.
- the convolutional network can detect malicious executables by providing a classification of the disassembled binary file (maliciousness score: [0,1]).
- the convolutional neural network may include a convolutional layer, a pooling layer, a fully connected layer and an output layer.
- the convolutional neural network can be configured to process a sequence of instructions that are variable in length.
- one or more techniques can be applied to generate fixed length representations of the instructions.
- the fixed-length representations of the instructions are encoded in a way that allows the network to capture their meaning.
- mnemonics are encoded using one-hot vector representations.
- each one-hot vector is then mapped to a word embedding, that is, a vector of real numbers.
- This vector representation of the opcodes can be generated during the training phase of the convolutional network or using any other approach such as neural probabilistic language models, e.g. the SkipGram model, Word2Vec model, Recurrent Neural Network models, etc.
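- A minimal sketch of this opcode-encoding step, assuming PyTorch; the vocabulary and the embedding size are illustrative, with the integer id playing the role of the one-hot index:

```python
import torch
import torch.nn as nn

# Each mnemonic gets an integer id (equivalent to a one-hot index) and is mapped
# to a learned embedding vector of real numbers.
vocab = {"cmp": 0, "jne": 1, "dec": 2, "mov": 3, "jmp": 4}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=16)

sequence = ["cmp", "jne", "dec"]             # instruction sequence from the disassembler
ids = torch.tensor([vocab[m] for m in sequence])
vectors = embedding(ids)                     # shape: (3, 16), real-valued word embeddings
```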
- the first convolutional layer can be configured to process the encoded fixed-length mnemonic representations by applying a plurality of kernels K1,1, K1,2, ..., K1,x to them.
- Each kernel applied at the first convolutional layer can be configured to detect a specific sequence of instructions.
- each kernel applied to the first convolutional layer can be adapted to detect a sequence having a number of instructions. That is, kernels K can be adapted to detect instances where a number of instructions appear in a certain order.
- kernel K1 ,1 can be adapted to detect the instruction sequence [cmp, jne, dec] while kernel K1 ,2 can be adapted to detect the instruction set [dec, mov, jmp].
- the size of each kernel corresponds to the window size of the first convolutional layer.
- the convolutional layer may have kernels of different size. For instance, one kernel may be adapted to detect the instruction sequence [dec, mov, jmp] while another kernel may be adapted to detect the instruction set [dec, mov, jmp, pull, sub].
- although the convolutional neural network is shown to include one convolutional layer, it should be appreciated that the convolutional neural network can include a different number of convolutional layers; for instance, it can include more convolutional layers, such as two.
- the kernels K2,1, K2,2, ..., K2,x applied to the second convolutional layer can be adapted to detect specific sequences of two or more of the sequences of instructions detected at the first convolutional layer. Consequently, the second convolutional layer would generate increasingly abstract representations of the sequence of instructions from the disassembled binary file.
- the pooling layer can be configured to further process the output from a preceding convolutional layer by compressing (e.g. subsampling or down sampling) the output from the preceding convolution layer.
- the pooling layer can compress the output by applying one or more pooling functions, including for example a maximum pooling function.
- the output of the pooling layer can be further processed by the one or more fully connected layers and the output layer in order to generate a classification for the disassembled binary file (e.g. malicious or benign).
- the classification of the disassembled binary file can be provided at an output of the convolutional neural network.
- Static classifier embodiment 4: Image-based representation of malware's hexadecimal content.
- a software program can be visualized as an image, where every byte is interpreted as one pixel. The resulting array is then organized as a 2-D array and visualized as a gray-scale image.
- Approaches such as convolutional neural networks can yield classifiers that can learn to extract features that are at least as effective as human-engineered features.
- a convolutional neural network implementation to extract features can advantageously make use of the connectivity structure between feature maps to extract local and invariant features from an image.
- a convolutional neural network (CNN) can be utilized to analyze the file by applying a plurality of kernels (filters) adapted to detect certain local and invariant patterns in the pixels of the representation of the software program as a gray-scale image.
- the convolutional network can detect malicious executables by providing a classification of the binary file (maliciousness score: [0,1]).
- the convolutional neural network may include at least a convolutional layer, a pooling layer, a fully connected layer and an output layer. In some implementations, it may include more than one convolutional, pooling and fully-connected layer. According to some implementations, each kernel applied to the first convolutional layer can be adapted to detect a pattern in the pixels of the image of size w x h, where w is the width and h is the height of the kernel. Subsequent convolutional layers detect increasingly abstract features.
- the pooling layer can be configured to further process the output from a preceding convolutional layer by compressing (e.g. subsampling or down sampling) the output from the preceding convolution layer.
- the pooling layer can compress the output by applying one or more pooling functions, including for example the maximum pooling function.
- the output of the pooling layer can be further processed by the one or more fully connected layers and the output layer in order to generate a classification for the file (e.g. malicious or benign).
- the classification of the file can be provided at an output of the convolutional neural network.
- Static classifier embodiment 5: Miscellaneous features.
- the so-called "miscellaneous" features include other applicable software characteristics. These characteristics at least include the keywords occurring in the program's code and the fields of the header of a file in any format. Other types of features may also be used.
- The next table illustrates the fields of the header of a file in Portable Executable format.
- these fields are: MajorLinkerVersion, MinorLinkerVersion, SizeOfCode, SizeOfInitializedData, etc. They contain relevant information with suitable characteristics to use as features. These characteristics are specific to the header of a Portable Executable file, but other file types will have other relevant header information and characteristics.
- the preprocessing module 110 is responsible for extracting a set of informative features 111 from the file 100. These features 111 are then aggregated and fed as input to a common static classifier 122, which will determine whether the file 100 is malicious or not.
- the input of the static classifier 122 is the features 111 from the distinct groups extracted by the preprocessing module 110.
- the output 123 of the static classifier 122 is a value in the range [0,1].
- a value close to 0 means that the executable 100 does not contain suspicious/malicious indicators with regard to the features 111; otherwise, the value will be close to 1.
- Any machine learning method can be used as classifier. For instance, neural networks, support vector machines or decision trees.
- the preprocessing module 110 is responsible for extracting a set of informative features 111 from the file 100. These features 111 are used as input to static classifiers.
- the system has as many static classifiers as sets of features and, in contrast to the embodiment of Fig. 5, a further static classifier that aggregates and uses the features of all groups as input.
- the output 123 of each machine learning classifier 122 is a value in the range [0, 1].
- a value close to 0 means that the executable 100 does not contain suspicious/malicious indicators with regard to a specific group of features 111; otherwise, the value will be close to 1.
- Any machine learning method can be used as classifier. For instance, neural networks, support vector machines or decision trees.
- the last component of the malware detection system is the fuzzy inference engine 130. Its aim is to infer whether an executable is malicious by applying a set of fuzzy rules to the output of the machine learning methods and the features extracted by the preprocessing module.
- This component 130 performs the following steps:
- the fuzzy inference module 130 can be decomposed into functional blocks, as depicted in Fig. 8, and described below in detail.
- Fuzzification 131 involves two processes: deriving the membership functions for input and output variables, and representing them with linguistic variables. (Given two inputs, x1 and y1, determine the degree to which the input variables belong to each of the appropriate fuzzy sets.)
- the input values are two-fold: a feature vector of program characteristics named F, of size |F|, where F_i ∈ F corresponds to the value of the i-th feature of the program 100; this feature vector is extracted by the preprocessing module 110. And a score vector named S, containing the output scores 123 of the machine learning algorithms, whose size |S| is equal to the number of distinct algorithms that have been applied to predict the maliciousness of the program based on distinct groups of features.
- the entropy of a byte sequence refers to its amount of disorder (uncertainty) or statistical variation.
- the entropy value ranges from 0 to 8. If all byte values occur equally often, the entropy is largest. On the contrary, if certain byte values occur with high probability, the entropy value is smaller. According to studies, the entropy of plain text, native executables, packed executables and encrypted executables tends to differ greatly. In consequence, the [0,8] range can be further divided into at least six sub-ranges or subsets, which are:
- a trapezoidal waveform is utilized for this type of membership function. For instance, an entropy of 4.0 will belong to "very low" with degree 0.6 and to "low" with degree 0.4.
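- A sketch of a trapezoidal membership function; the breakpoints below are hypothetical values chosen so that an entropy of 4.0 reproduces the 0.6/0.4 example above:

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical breakpoints for two adjacent entropy subsets:
very_low = trapmf(4.0, a=0.0, b=0.0, c=3.6, d=4.6)   # -> 0.6
low      = trapmf(4.0, a=3.6, b=4.6, c=5.2, d=5.8)   # -> 0.4
```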
- the score 123 of a given machine learning classifier 122 is a value in the range [0, 1].
- a value close to 0 means that the executable 100 does not contain suspicious/malicious indicators with regard to a specific group of features 111 and is a low threat; otherwise, the value will be close to 1.
- This score 123 can be further divided into at least three sub-ranges or subsets which are:
- for instance, a score of 0.4 belongs to "LOW" with degree 0.38 and to "MEDIUM" with degree 1.0.
- the fuzzy sets corresponding to all machine learning classifiers 122 are defined using the same membership functions for simplicity purposes. However, this is not a constraint and they might be defined with different membership functions and fuzzy sets.
- the rule base and the database of the invention are jointly referred to as the knowledge base 132.
- the knowledge base 132 comprises:
- IF-THEN rules indicate what action or actions should be taken in terms of the currently observed information.
- a fuzzy rule associates a condition described using linguistic variables and fuzzy sets to an output or a conclusion.
- the IF part is mainly used to capture knowledge and the THEN part can be utilized to give the conclusion or output in linguistic variable form.
- IF-THEN rules are widely used by the inference engine to compute the degree to which the input data matches the condition of a rule.
- Fuzzy sets are sets whose elements have degrees of membership. Fuzzy set theory permits the gradual assessment of the membership of elements in a set; this is described with the aid of a membership function valued in the real unit interval [0,1].
- the membership function represents the degree of truth.
- the system has one fuzzy set associated with every input feature. See the membership functions of the features "entropy" and "machine learning score" previously presented.
- the IF-THEN rules and the membership functions of the fuzzy sets might be defined by experts in the field or by exploiting approximation techniques from neural networks.
- experts extract comprehensible rules from their vast knowledge of the field. These rules are fine-tuned using the available input-output data.
- neural network techniques are used to automatically derive rules from the data.
- Every rule has attached a degree of belief or weight in the real interval (0, 1] that denotes a lower bound on the belief in the rule in terms of necessity measures. Thus, rules that are demonstrated to be key for the decision system have a higher weight, and rules not so useful in the decision system have a lower weight.
- the rules created by the system may have a higher degree of belief than the rules created by a human, or vice versa.
- the system may create rules of the following form:
- the decision-making unit (Inference Engine) 135 performs the inference procedure upon the fuzzy rules and given facts to derive a reasonable output or conclusion 140.
- the inference engine is based on the PGL+ reasoning system, for reasoning under possibilistic uncertainty and disjunctive vague knowledge.
- PGL+ is a possibilistic logic programming framework with fuzzy constants based on the Horn-rule fragment of Gödel infinitely-valued logic, with an efficient proof algorithm based on a complete calculus and oriented to goals (conclusions). Fuzzy constants are interpreted as disjunctive imprecise knowledge and the partial matching between them is computed by means of a fuzzy unification mechanism based on a necessity-like measure.
- the output of the Inference Engine 135 is a conclusion involving fuzzy constants together with the degree on the belief on the conclusion.
- the belief degree to classify the file 100 as malware is used, and fuzzy constants are transformed into crisp values using membership functions analogous to the ones used by the fuzzifier 131.
- the invention may use, but is not limited to, one of the following defuzzification methods 136:
- the output fuzzy set might be decomposed into at least three sub-ranges or subsets, which are represented as membership functions in Fig. 11:
- the fuzzy output is converted to a crisp output using, but not limited to, any of the aforementioned defuzzification methods 136.
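- As an illustration of one such method, a centre-of-gravity (centroid) defuzzification could be computed as follows; the sampled membership values are toy numbers:

```python
import numpy as np

def centroid_defuzzify(xs, mu):
    """Centre-of-gravity defuzzification: crisp = sum(x * mu(x)) / sum(mu(x))."""
    xs, mu = np.asarray(xs, dtype=float), np.asarray(mu, dtype=float)
    return float((xs * mu).sum() / mu.sum())

# Aggregated output membership sampled over the maliciousness-score axis (toy values):
xs = np.linspace(0.0, 1.0, 11)
mu = np.array([0, 0, 0, 0.2, 0.5, 0.8, 1.0, 0.8, 0.5, 0.2, 0])
print(centroid_defuzzify(xs, mu))   # crisp maliciousness score of 0.6
```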
- An unseen executable (XXXXXXXXXXX.exe) 100 is passed as input to the system.
- the preprocessing module 110 extracts a subset of features 111 that provides an abstract view of the program. In particular, the preprocessing module 110 extracts at least the following features 111:
- the aforementioned data is passed as input to some machine learning methods to calculate a maliciousness score 123 based on a particular feature or subset of features 111.
- Machine learning model 1 outputs a maliciousness score 123 equal to 0.65 with respect to the structural entropy of the executable 100. (A machine learning model is defined as the output generated when a machine learning algorithm is trained with training data.)
- Machine learning model 2 outputs a maliciousness score 123 equal to 0.15 with respect to the sequence of instructions of the executable 100.
- Machine learning model 3 outputs a maliciousness score 123 equal to
- Rule 2: IF entropy(file) is "very_high" AND ML_score(ENTROPY) is "high" THEN file 100 is encrypted, with a degree of belief of at least 0.9.
- Rule 3: IF has_section(UPX0) OR has_section(UPX1) OR has_section("X") THEN file 100 is compressed, with a degree of belief of at least 0.9.
- Rule 4: IF file 100 is encrypted AND ML_score(API) is "low" AND ML_score(Opcodes) is "low" THEN file 100 is benign, with a degree of belief of at least 0.7.
- Rule 5: IF file 100 is encrypted AND ML_score(API) is "medium" AND ML_score(Opcodes) is "medium" THEN file 100 is suspicious, with a degree of belief of at least 0.8.
- PGL+ involves a semantical unification model of disjunctive fuzzy constants and three other inference patterns, together with a deductive mechanism based on a modus ponens style.
- the PGL+ system allows expressing both ill-defined properties and the weights with which properties and patterns can be asserted. For instance, suppose that the problem observation corresponds to the following statement: "it is almost sure that the file entropy is around_20". This statement can be represented in the proposed system with the formula:
- entropy(.) is a classical predicate expressing the entropy property of the problem domain
- around_20 is a fuzzy constant
- the degree 0.9 expresses how much the formula entropy(around_20) is believed, in terms of a necessity measure.
- the PGL+ system computes the degree of belief of the crisp property encrypted by conveniently combining the degrees of belief 0.9 and 0.7 together with the degree of partial matching between the fuzzy constants high and around_20.
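- A sketch of this combination, assuming the standard min-based possibilistic modus ponens (the necessity of the conclusion is bounded by the minimum of the fact weight, the rule weight and the fuzzy unification degree); the partial-matching degree 0.8 between high and around_20 is an illustrative assumption:

```python
def possibilistic_modus_ponens(fact_weight, rule_weight, unification_degree):
    """Lower bound on the necessity of the conclusion under possibilistic modus ponens."""
    return min(fact_weight, rule_weight, unification_degree)

# Fact (entropy(around_20), 0.9) fired against rule (entropy(high) -> encrypted, 0.7),
# with an assumed partial matching of 0.8 between "around_20" and "high":
belief = possibilistic_modus_ponens(0.9, 0.7, 0.8)   # -> 0.7
```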
- the inference procedure based on the PGL+ reasoning system is divided into three algorithms which are applied sequentially.
- a completion algorithm which extends the set of rules and facts with all valid clauses by means of the following Generalized Resolution and Fusion inference rules:
- the completion algorithm first computes the set of valid clauses that can be derived by applying the Generalized Resolution rule (i.e. by chaining clauses). Then, from this new set of valid clauses, the algorithm computes all valid clauses that can be derived by applying the Fusion rule (i.e. by fusing clauses). As the Fusion rule stretches the body of rules and the Generalized Resolution rule modifies the body or the head of rules, the chaining and fusion steps have to be performed while new valid clauses are derived. As the chaining and fusion steps cannot produce infinite loops and each valid clause is either an original clause or can be derived from at least two clauses, in the worst case each combination of clauses derives a different valid clause. Hence, given a finite set N of facts and rules, in the worst case the number of valid clauses is
- c1, c2 and c3 can derive a new valid clause if c1 and c2, c1 and c3, or c2 and c3 derive a valid clause different from c1, c2 and c3.
- each clause can be replaced by
- D_q can be computed from this finite set of facts by applying the UN and IN rules.
- the above mechanism can be recursively applied for determining
- Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Tangible non-transitory “storage” type media include any or all of the memory or other storage for the computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
- All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks.
- Such communications may enable loading of the software from one computer or processor into another, for example, from a management server or host computer of a scheduling system into the hardware platform(s) of a computing environment or other system implementing a computing environment or similar functionalities in connection with image processing.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- the physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software.
- terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
- a machine-readable medium may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s), or the like, which may be used to implement the system or any of its components shown in the drawings.
- Volatile storage media may include dynamic memory, such as a main memory of such a computer platform.
- Tangible transmission media may include coaxial cables; copper wire and fiber optics, including the wires that form a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- RF radio frequency
- IR infrared
- Computer-readable media may include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Computer Hardware Design (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Fuzzy Systems (AREA)
- Automation & Control Theory (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Virology (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention concerns a computer-implemented method, a system and computer programs for identifying a malicious file. The method consists in performing a static analysis of a potentially malicious file to obtain a set of features that provide an abstract view of the file; performing a static machine learning classification process using said set of features as input data, to obtain a preliminary classification output; and performing a fuzzy inference procedure based on possibilistic logic using said set of features and said preliminary classification output as input variables, to generate an enhanced classification output that identifies the potentially malicious file as a malicious file or a benign file.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20744065.2A EP4004827A1 (fr) | 2019-07-30 | 2020-07-29 | Procédé mis en oeuvre par ordinateur, système et programme informatique permettant d'identifier un fichier malveillant |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19382656.7 | 2019-07-30 | ||
| EP19382656 | 2019-07-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021018929A1 true WO2021018929A1 (fr) | 2021-02-04 |
Family
ID=67514512
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2020/071334 Ceased WO2021018929A1 (fr) | 2019-07-30 | 2020-07-29 | Procédé mis en œuvre par ordinateur, système et programme informatique permettant d'identifier un fichier malveillant |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4004827A1 (fr) |
| WO (1) | WO2021018929A1 (fr) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114036521A (zh) * | 2021-11-29 | 2022-02-11 | 北京航空航天大学 | 一种Windows恶意软件对抗样本生成方法 |
| CN114510721A (zh) * | 2022-02-18 | 2022-05-17 | 哈尔滨工程大学 | 一种基于特征融合的静态恶意代码分类方法 |
| US20220179953A1 (en) * | 2020-12-08 | 2022-06-09 | Mcafee, Llc | Systems, methods, and media for analyzing structured files for malicious content |
| CN114840737A (zh) * | 2021-12-03 | 2022-08-02 | 图模智能信息技术云南有限公司 | 一种个性化智能搜索方法 |
| CN115690499A (zh) * | 2022-10-31 | 2023-02-03 | 国能信息技术有限公司 | 恶意视频识别方法、装置及存储介质 |
| CN118965346A (zh) * | 2024-07-24 | 2024-11-15 | 西安电子科技大学杭州研究院 | 一种恶意软件的检测方法、装置、设备及产品 |
| CN119397524A (zh) * | 2024-07-12 | 2025-02-07 | 中国人民解放军网络空间部队信息工程大学 | 基于静态多特征优化与融合的恶意软件家族分类方法 |
| CN119783101A (zh) * | 2024-12-23 | 2025-04-08 | 重庆大学 | 用于恶意软件检测的敏感api构建方法及计算机程序产品 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100192222A1 (en) | 2009-01-23 | 2010-07-29 | Microsoft Corporation | Malware detection using multiple classifiers |
| EP2882159A1 (fr) | 2013-12-06 | 2015-06-10 | Cyberlytic Limited | Profilage de menaces électroniques détectées dans un environnement cible et génération automatique d'une ou plusieurs bases de règles pour un système expert utilisable pour établir le profil de telles menaces dans ledit environnement |
| WO2015160367A1 (fr) * | 2014-04-18 | 2015-10-22 | Hewlett-Packard Development Company, L.P. | Supervision pré-cognitive d'informations et d'événements de sécurité |
| US9495633B2 (en) | 2015-04-16 | 2016-11-15 | Cylance, Inc. | Recurrent neural networks for malware analysis |
| US9690938B1 (en) | 2015-08-05 | 2017-06-27 | Invincea, Inc. | Methods and apparatus for machine learning based malware detection |
| US9705904B1 (en) | 2016-07-21 | 2017-07-11 | Cylance Inc. | Neural attention mechanisms for malware analysis |
2020
- 2020-07-29: EP EP20744065.2A, patent EP4004827A1 (fr), not active, Withdrawn
- 2020-07-29: WO PCT/EP2020/071334, patent WO2021018929A1 (fr), not active, Ceased
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100192222A1 (en) | 2009-01-23 | 2010-07-29 | Microsoft Corporation | Malware detection using multiple classifiers |
| EP2882159A1 (fr) | 2013-12-06 | 2015-06-10 | Cyberlytic Limited | Profilage de menaces électroniques détectées dans un environnement cible et génération automatique d'une ou plusieurs bases de règles pour un système expert utilisable pour établir le profil de telles menaces dans ledit environnement |
| WO2015160367A1 (fr) * | 2014-04-18 | 2015-10-22 | Hewlett-Packard Development Company, L.P. | Supervision pré-cognitive d'informations et d'événements de sécurité |
| US9495633B2 (en) | 2015-04-16 | 2016-11-15 | Cylance, Inc. | Recurrent neural networks for malware analysis |
| US9690938B1 (en) | 2015-08-05 | 2017-06-27 | Invincea, Inc. | Methods and apparatus for machine learning based malware detection |
| US9705904B1 (en) | 2016-07-21 | 2017-07-11 | Cylance Inc. | Neural attention mechanisms for malware analysis |
Non-Patent Citations (3)
| Title |
|---|
| ALSINET T ET AL: "Formalizing argumentative reasoning in a possibilistic logic programming setting with fuzzy unification", INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, ELSEVIER SCIENCE, NEW YORK, NY, US, vol. 48, no. 3, 1 August 2008 (2008-08-01), pages 711 - 729, XP022757645, ISSN: 0888-613X, [retrieved on 20080618], DOI: 10.1016/J.IJAR.2007.07.004 * |
| DUBOIS D ET AL: "Possibilistic logic: a retrospective and prospective view", FUZZY SETS AND SYSTEMS, ELSEVIER, AMSTERDAM, NL, vol. 144, no. 1, 16 May 2004 (2004-05-16), pages 3 - 23, XP004503187, ISSN: 0165-0114, DOI: 10.1016/J.FSS.2003.10.011 * |
| NATARAJ ET AL.: "Malware Images: Visualization and Automatic Classification" |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220179953A1 (en) * | 2020-12-08 | 2022-06-09 | Mcafee, Llc | Systems, methods, and media for analyzing structured files for malicious content |
| US11755728B2 (en) * | 2020-12-08 | 2023-09-12 | Mcafee, Llc | Systems, methods, and media for analyzing structured files for malicious content |
| CN114036521A (zh) * | 2021-11-29 | 2022-02-11 | 北京航空航天大学 | 一种Windows恶意软件对抗样本生成方法 |
| CN114036521B (zh) * | 2021-11-29 | 2024-05-03 | 北京航空航天大学 | 一种Windows恶意软件对抗样本生成方法 |
| CN114840737A (zh) * | 2021-12-03 | 2022-08-02 | 图模智能信息技术云南有限公司 | 一种个性化智能搜索方法 |
| CN114510721A (zh) * | 2022-02-18 | 2022-05-17 | 哈尔滨工程大学 | 一种基于特征融合的静态恶意代码分类方法 |
| CN114510721B (zh) * | 2022-02-18 | 2024-07-05 | 哈尔滨工程大学 | 一种基于特征融合的静态恶意代码分类方法 |
| CN115690499A (zh) * | 2022-10-31 | 2023-02-03 | 国能信息技术有限公司 | 恶意视频识别方法、装置及存储介质 |
| CN119397524A (zh) * | 2024-07-12 | 2025-02-07 | 中国人民解放军网络空间部队信息工程大学 | 基于静态多特征优化与融合的恶意软件家族分类方法 |
| CN118965346A (zh) * | 2024-07-24 | 2024-11-15 | 西安电子科技大学杭州研究院 | 一种恶意软件的检测方法、装置、设备及产品 |
| CN119783101A (zh) * | 2024-12-23 | 2025-04-08 | 重庆大学 | 用于恶意软件检测的敏感api构建方法及计算机程序产品 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4004827A1 (fr) | 2022-06-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP4004827A1 (fr) | Procédé mis en oeuvre par ordinateur, système et programme informatique permettant d'identifier un fichier malveillant | |
| Yan et al. | A survey of adversarial attack and defense methods for malware classification in cyber security | |
| Zhang et al. | Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes | |
| Islam et al. | Android malware classification using optimum feature selection and ensemble machine learning | |
| Ravi et al. | A Multi-View attention-based deep learning framework for malware detection in smart healthcare systems | |
| Tran et al. | NLP-based approaches for malware classification from API sequences | |
| Kumar et al. | Malicious code detection based on image processing using deep learning | |
| Fan et al. | Malicious sequential pattern mining for automatic malware detection | |
| Silivery et al. | A model for multi-attack classification to improve intrusion detection performance using deep learning approaches | |
| Ring et al. | Malware detection on windows audit logs using LSTMs | |
| Eke et al. | The use of machine learning algorithms for detecting advanced persistent threats | |
| Zulfiqar et al. | DeepDetect: An innovative hybrid deep learning framework for anomaly detection in IoT networks | |
| Gayathri et al. | Adversarial training for robust insider threat detection | |
| Someya et al. | FCGAT: Interpretable malware classification method using function call graph and attention mechanism | |
| Hamad et al. | Bertdeep-ware: A cross-architecture malware detection solution for iot systems | |
| Osei et al. | An Attention-based Wide and Deep Neural Network for Reentrancy Vulnerability Detection in Smart Contracts | |
| Lodha et al. | SQL injection and its detection using machine learning algorithms and BERT | |
| Otsubo et al. | Compiler provenance recovery for multi-cpu architectures using a centrifuge mechanism | |
| Abdullah | A comparison of several intrusion detection methods using the NSL-KDD dataset | |
| Maniriho et al. | Deep learning models for detecting malware attacks | |
| CN114969734A (zh) | 一种基于api调用序列的勒索病毒变种检测方法 | |
| Song et al. | Multi-model Smart Contract Vulnerability Detection Based on BiGRU | |
| CN119577765A (zh) | 一种面向汇编代码的恶意软件智能分类检测方法和系统 | |
| Nofal et al. | SQL injection attacks detection and prevention based on neuro-fuzzy technique | |
| Zhang | Clement: Machine learning methods for malware recognition based on semantic behaviours |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20744065 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2020744065 Country of ref document: EP Effective date: 20220228 |
|
| WWW | Wipo information: withdrawn in national office |
Ref document number: 2020744065 Country of ref document: EP |