WO2023048921A1 - Automatic graph-based detection of potential security threats - Google Patents
- Publication number: WO2023048921A1 (PCT application PCT/US2022/042279)
- Authority: WIPO (PCT)
- Prior art keywords: network, graph, patterns, connection, Bayesian network
- Prior art date
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Definitions
- Malicious entities often use multiple assets and resources to execute attacks against targets, such as cloud services and resources.
- the attacks typically are intended to obtain unauthorized access to the targets and/or cause the targets to execute malicious code.
- the malicious entities typically refine their tools and develop new attack paths.
- security domain experts usually perform a substantial amount of manual work, including manually analyzing data and writing customized queries, which limits the scalability of these conventional techniques.
- Correlation engines were originally developed for on-premises systems with complete control over single-sourced security logs, whereas modern cloud systems typically involve multiple solutions, cloud services, and data sources. Accordingly, automated techniques that utilize a correlation engine may not be intelligent enough to comprehensively detect attack patterns, especially considering the complexity and breadth of many newer attack patterns.
- Graph-based detection of potential security threats utilizes graph(s) to detect the potential security threats.
- a graph is a mathematical structure that is used to model pairwise relations between objects.
- the graph includes graph nodes (a.k.a. vertices), and at least some pairs of the graph nodes are connected by respective edges (a.k.a. links). For instance, a first edge may connect graph nodes A and B; a second edge may connect graph nodes B and C; and a third edge may connect graph nodes C and A.
- a graph node may be connected to any one or more other graph nodes by one or more respective edges.
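As a concrete illustration, the three-node example above (edges A-B, B-C, and C-A) can be held as an adjacency mapping. This is a common representation, not one prescribed by the text; only the node names follow the example.

```python
# Edges of the triangle graph from the text: A-B, B-C, C-A.
edges = [("A", "B"), ("B", "C"), ("C", "A")]

# Build an undirected adjacency mapping: each edge connects a pair of
# graph nodes in both directions.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(adjacency["A"])  # A is connected to B and C
```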
- a potential security threat may be a potential negative (e.g., malicious) action or event that is facilitated by a vulnerability and that is configured to result in an unwanted impact to a computing system.
- a Bayesian network is initialized using an association graph, based on (e.g., based at least in part on) correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the network nodes of the Bayesian network are grouped among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created. Patterns in the Bayesian network are identified. Each pattern includes at least one connection.
- Each connection is between a respective pair of network nodes.
- At least one redundant connection, which is redundant with regard to one or more other connections, is removed from the patterns in the Bayesian network. Scores are assigned to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates a likelihood of the respective pattern to indicate a security threat.
- An output graph is automatically generated. The output graph includes each pattern that has a score that is greater than or equal to a score threshold and does not include each pattern that has a score that is less than the score threshold. Each pattern in the output graph represents a potential security threat.
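Taken together, the later steps of the pipeline above (prune redundant connections, score patterns, and keep those at or above the score threshold) can be sketched in Python. This is a hedged illustration, not the claimed implementation: the text does not pin down what makes a connection redundant, so the sketch drops an edge whose endpoints already share a common neighbor, and the historical-knowledge scoring is abstracted into a caller-supplied function.

```python
def remove_redundant(connections):
    """Drop an edge (u, v) when u and v are already linked through a common
    neighbor; one plausible reading of 'redundant with regard to one or more
    other connections'."""
    edges = {frozenset(e) for e in connections}
    kept = set(edges)
    for e in sorted(edges, key=sorted):
        u, v = sorted(e)
        others = kept - {e}
        neighbors_u = {w for f in others if u in f for w in f if w != u}
        neighbors_v = {w for f in others if v in f for w in f if w != v}
        if neighbors_u & neighbors_v:  # a two-hop path u-w-v survives
            kept.discard(e)
    return kept

def build_output_graph(patterns, score_fn, score_threshold):
    """Score each pattern (a set of connections) and keep only those whose
    score meets the threshold; each kept pattern represents a potential
    security threat."""
    return [p for p in patterns if score_fn(p) >= score_threshold]
```

For example, with `score_fn=len` and a threshold of 2, a one-connection pattern is excluded from the output graph while a two-connection pattern is retained.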
- FIG. 1 is a block diagram of an example automatic graph-based detection system in accordance with an embodiment.
- FIG. 2 depicts a flowchart of an example method for performing automatic graph-based detection of potential security threats in accordance with an embodiment.
- FIG. 3 is a block diagram of an example computing system in accordance with an embodiment.
- FIGS. 4-6 depict example Bayesian networks in accordance with embodiments.
- FIG. 7 depicts an example computer in which embodiments may be implemented.
- references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” or the like, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the relevant art(s) to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- Descriptors such as “first”, “second”, “third”, etc. are used to reference some elements discussed herein. Such descriptors are used to facilitate the discussion of the example embodiments and do not indicate a required order of the referenced elements, unless an affirmative statement is made herein that such an order is required.
- Example embodiments described herein are capable of performing automatic graph-based detection of potential security threats.
- Example techniques described herein have a variety of benefits as compared to conventional techniques for detecting potential security threats.
- the example techniques may be capable of automatically detecting potential security threats with less noise and/or with greater speed, efficiency, reliability, and/or effectiveness than the conventional techniques.
- the example techniques may be capable of more accurately and/or precisely detecting potential security threats.
- the example techniques may be more scalable than the conventional techniques.
- Each operation that is performed by the example techniques (e.g., unsupervised learning, statistical testing, causal inference, clustering, supervised classification, utilizing manually generated rules) may be performed as a separate module. Accordingly, the example techniques may provide a modular solution for detecting potential security threats.
- the example techniques may reduce an amount of time and/or assets (e.g., processor, memory, network bandwidth) that are consumed to detect potential security threats and/or to respond to (e.g., mitigate) negative impacts that result from the potential security threats. For example, by using graph(s) to automatically detect potential security threats, an amount of time and/or assets consumed to detect the potential security threats and/or to respond to the negative impacts that result from those potential security threats may be reduced. For instance, the example techniques may prevent the negative impacts of the potential security threats from occurring in which case the amount of time and/or assets consumed to respond to the negative impacts may be avoided.
- the example techniques may improve (e.g., increase) a user experience of a user whose computing device or account is affected by the potential security threats, increase efficiency of the user, and/or reduce a cost associated with detecting the potential security threats and responding to the corresponding negative impacts.
- FIG. 1 is a block diagram of an example automatic graph-based detection system 100 in accordance with an embodiment.
- the automatic graph-based detection system 100 operates to perform automatic graph-based detection of potential security threats.
- the automatic graph-based detection system 100 includes a plurality of protectable entities 102A-102M and a computing system 106.
- Each of the protectable entities 102A-102M may be a processing system, an application, a service, a client, a user (e.g., a user ID), or any entity that possesses sensitive, proprietary, and/or important information.
- An example of a processing system is a system that includes at least one processor that is capable of manipulating data in accordance with a set of instructions.
- a processing system may be a computer, a personal digital assistant, a cellular telephone, an Internet of things (IoT) device, etc.
- Examples of a computer include but are not limited to a desktop computer, a laptop computer, a tablet computer, a wearable computer (e.g., a smart watch or a head-mounted computer), a server computer (e.g., a web server, a file server, or a print server), and a client computer.
- the protectable entities 102A-102M are configured to generate logs (a.k.a. log data) and/or events (e.g., security alerts). For instance, a first protectable entity 102A is shown to generate first logs and events 104A; a second protectable entity 102B is shown to generate second logs and events 104B; and an Mth protectable entity 102M is shown to generate Mth logs and events 104M.
- a log that is generated by a protectable entity includes multiple log entries such that each log entry indicates an action that is performed with regard to the protectable entity. For instance, a log entry may indicate an action that is performed on the protectable entity or by the protectable entity.
- the log entry may indicate a request that is received by the protectable entity, data accessed by the protectable entity in response to the request, and/or an operation performed on the data by the protectable entity.
- An event that is generated by a protectable entity indicates an occurrence that is encountered by the protectable entity.
- an event may be in the form of a security alert.
- a security alert that is generated by a protectable entity indicates an occurrence that potentially negatively impacts security of the protectable entity.
- the protectable entity may identify the occurrence using a security alert based on a confidence that the occurrence is to result in a negative impact to the security of the protectable entity being greater than or equal to a confidence threshold and/or as a result of an estimated severity (e.g., estimated extent) of the negative impact being greater than or equal to a severity threshold.
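The confidence and severity gating just described can be sketched as a simple predicate. The threshold values below are hypothetical, and the text's "and/or" is read here as a logical OR.

```python
def should_raise_security_alert(confidence: float,
                                estimated_severity: float,
                                confidence_threshold: float = 0.8,
                                severity_threshold: float = 0.7) -> bool:
    # Raise a security alert when the confidence that the occurrence will
    # negatively impact security meets its threshold, and/or when the
    # estimated severity of that impact meets its threshold.
    # Threshold values are illustrative, not taken from the text.
    return (confidence >= confidence_threshold
            or estimated_severity >= severity_threshold)

print(should_raise_security_alert(0.9, 0.1))  # True: confidence gate met
print(should_raise_security_alert(0.2, 0.3))  # False: neither gate met
```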
- the computing system 106 is a processing system that is configured to receive the logs and events 104A-104M from the protectable entities 102A-102M.
- the computing system 106 may be a physical processing system or a virtual processing system.
- the computing system 106 may host any one or more of the protectable entities 102A-102M, though the scope of the example embodiments is not limited in this respect.
- the computing system 106 may be connected to any one or more of the protectable entities 102A-102M via a network. For instance, communication between the protectable entities 102A-102M and the computing system 106 may be carried out over the network using well-known network communication protocols.
- the network may be a wide-area network (e.g., the Internet), a local area network (LAN), another type of network, or a combination thereof.
- the computing system 106 may be connected to any one or more of the protectable entities 102A-102M via a direct connection (e.g., and not via a network).
- the computing system 106 may not be connected to one or more of the protectable entities 102A-102M.
- a protectable entity may store its logs and/or events in a storage that is external to the protectable entity, and the computing system 106 may be connected to the storage (e.g., without being connected to the protectable entity).
- an intermediary may obtain logs and/or events from a protectable entity, and the computing system 106 may be connected to the intermediary in order to obtain the logs and/or events.
- the computing system 106 includes automatic graph-based detection logic 108, which is configured to perform automatic graph-based detection of potential security threats.
- the automatic graph-based detection logic 108 analyzes the logs and events 104A-104M, which are obtained directly or indirectly from the protectable entities 102A-102M, to detect the security threats. For instance, the automatic graph-based detection logic 108 may retrieve the logs and events 104A-104M from the protectable entities 102A-102M, from storage associated with the protectable entities 102A-102M, or from an intermediary that obtains the logs and events 104A-104M from the protectable entities 102A-102M.
- the automatic graph-based detection logic 108 may retrieve the logs and events 104A-104M directly from the protectable entities 102A-102M by intercepting the logs and events 104A-104M from the protectable entities 102A-102M in real time.
- the automatic graph-based detection logic 108 generates an association graph based on (e.g., based at least in part on) the logs and events 104A-104M.
- the automatic graph-based detection logic 108 initializes a Bayesian network using the association graph, based on correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the automatic graph-based detection logic 108 groups the network nodes of the Bayesian network among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created.
- the automatic graph-based detection logic 108 identifies patterns in the Bayesian network. Each pattern includes at least one connection. Each connection is between a respective pair of network nodes.
- the automatic graph-based detection logic 108 removes at least one redundant connection, which is redundant with regard to one or more other connections, from the patterns in the Bayesian network.
- the automatic graph-based detection logic 108 assigns scores to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates a likelihood of the respective pattern to indicate a security threat.
- the automatic graph-based detection logic 108 automatically generates an output graph.
- the output graph includes each pattern that has a score that is greater than or equal to a score threshold.
- the output graph does not include each pattern that has a score that is less than the score threshold.
- Each pattern in the output graph represents a potential security threat.
- the automatic graph-based detection logic 108 may include or be incorporated into security information and event management (SIEM) logic.
- SIEM logic is configured to perform security information management (SIM) and security event management (SEM) to provide real-time analysis of security alerts generated by applications and network hardware.
- SIM may be used to provide long-term storage, analysis (e.g., trend analysis), and reporting of log data.
- SIM may be used to store the log data in a central repository.
- SEM may be used to provide real-time monitoring and correlation of events, notifications, and console views. It will be recognized that the example techniques described herein may be implemented using SIEM logic.
- the automatic graph-based detection logic 108 may use machine learning (ML) to perform any one or more of its operations.
- the automatic graph-based detection logic 108 may use the machine learning to develop and refine the clusters among which the network nodes of the Bayesian network are grouped, the patterns that are identified in the Bayesian network, the scores that are assigned to the respective patterns in the Bayesian network, and/or the output graph. For example, the automatic graph-based detection logic 108 may use the machine learning to analyze the network nodes of the Bayesian network to identify the intents associated with respective subsets of the network nodes. In accordance with this example, the automatic graph-based detection logic 108 may then group the network nodes associated with each intent into a respective cluster.
- the automatic graph-based detection logic 108 may use the machine learning to analyze the network nodes of the Bayesian network to identify the patterns among the network nodes.
- the automatic graph-based detection logic 108 may use the machine learning to analyze historical information, which indicates the historical patterns and the historical security threats, to derive the scores that are to be assigned to the respective patterns in the Bayesian network.
- the automatic graph-based detection logic 108 may use the machine learning to analyze the patterns and the scores to generate the output graph.
- the automatic graph-based detection logic 108 may use the machine learning to establish the score threshold against which the scores of the respective patterns in the Bayesian network are compared for purposes of generating the output graph.
- the automatic graph-based detection logic 108 may use a neural network to perform the machine learning to predict the intents associated with the respective subsets of the network nodes (and the corresponding clusters), the patterns among the network nodes in the Bayesian network, the scores that are to be assigned to the respective patterns in the Bayesian network, and/or the score threshold against which the scores are compared.
- the automatic graph-based detection logic 108 may use the predicted intents, patterns, scores, and/or score threshold to generate the output graph.
- Examples of a neural network include but are not limited to a feed forward neural network and a long short-term memory (LSTM) neural network.
- a feed forward neural network is an artificial neural network for which connections between units in the neural network do not form a cycle.
- the feed forward neural network allows data to flow forward (e.g., from the input nodes toward the output nodes), but the feed forward neural network does not allow data to flow backward (e.g., from the output nodes toward the input nodes).
- the automatic graph-based detection logic 108 employs a feed forward neural network to train a machine learning model that is used to determine ML-based confidences. Such ML-based confidences may be used to determine likelihoods that events will occur.
- An LSTM neural network is a recurrent neural network that has memory and allows data to flow forward and backward in the neural network.
- the LSTM neural network is capable of remembering values for short time periods or long time periods. Accordingly, the LSTM neural network may keep stored values from being iteratively diluted over time.
- the LSTM neural network may be capable of storing information, such as historical intents, patterns, scores, score thresholds, and security threats over time. For instance, the LSTM neural network may generate the output graph by utilizing such information.
- the LSTM neural network may be capable of remembering relationships between features, such as events that are represented by the network nodes of the Bayesian network, sequences (e.g., temporal sequences) of such events, entities associated with such events, probabilities that such events, sequences, and/or entities correspond to a potential security threat, and ML-based confidences that are derived therefrom.
- the automatic graph-based detection logic 108 may include training logic and inference logic.
- the training logic is configured to train a machine learning algorithm that the inference logic uses to determine (e.g., infer) the ML-based confidences.
- the training logic may provide sample events, sample sequences of the sample events, sample entities associated with the sample events, sample probabilities that the sample events, sample sequences, and/or sample entities correspond to a potential security threat, and sample confidences as inputs to the algorithm to train the algorithm.
- the sample data may be labeled.
- the machine learning algorithm may be configured to derive relationships between the features (e.g., events, sequences, entities, and probabilities that the events, sequences, and/or entities correspond to a potential security threat) and the resulting ML-based confidences.
- the inference logic is configured to utilize the machine learning algorithm, which is trained by the training logic, to determine the ML-based confidence when the features are provided as inputs to the algorithm.
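The training-logic/inference-logic split can be illustrated with a toy model. The text does not name a learning algorithm, so the sketch below substitutes plain logistic regression fitted by stochastic gradient descent: `train` plays the role of the training logic, and `ml_confidence` plays the role of the inference logic that maps a feature vector to an ML-based confidence.

```python
import math

class ConfidenceModel:
    """Toy stand-in for the training-logic / inference-logic pair:
    logistic regression fitted with stochastic gradient descent."""

    def __init__(self, n_features: int, learning_rate: float = 0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = learning_rate

    def train(self, samples, labels, epochs: int = 200) -> None:
        # Training logic: adjust weights so predicted confidences move
        # toward the labels (1 = known threat pattern, 0 = benign).
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                p = self.ml_confidence(x)
                err = p - y
                self.bias -= self.lr * err
                self.weights = [w - self.lr * err * xi
                                for w, xi in zip(self.weights, x)]

    def ml_confidence(self, features) -> float:
        # Inference logic: ML-based confidence in [0, 1] for a feature vector.
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))
```

With labeled samples whose features resemble historical threats versus benign activity, the trained model assigns a higher confidence to threat-like feature vectors than to benign ones.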
- the automatic graph-based detection logic 108 may be implemented in various ways to perform automatic graph-based detection of potential security threats, including being implemented in hardware, software, firmware, or any combination thereof.
- the automatic graph-based detection logic 108 may be implemented as computer program code configured to be executed in one or more processors.
- at least a portion of the automatic graph-based detection logic 108 may be implemented as hardware logic/electrical circuitry.
- the automatic graph-based detection logic 108 may be implemented in a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system-on-a-chip system (SoC), a complex programmable logic device (CPLD), etc.
- Each SoC may include an integrated circuit chip that includes one or more of a processor (a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.
- the automatic graph-based detection logic 108 may be partially or entirely incorporated in a SIEM program, though the example embodiments are not limited in this respect.
- FIG. 2 depicts a flowchart 200 of an example method for performing automatic graph-based detection of potential security threats in accordance with an embodiment.
- Flowchart 200 may be performed by the computing system 106 shown in FIG. 1, for example.
- flowchart 200 is described with respect to computing system 300 shown in FIG. 3, which is an example implementation of the computing system 106.
- the computing system 300 includes automatic graph-based detection logic 308 and a store 310.
- the automatic graph-based detection logic 308 includes association graph logic 312, initialization logic 314, grouping logic 316, pattern identification logic 318, redundancy removal logic 320, scoring logic 322, and output graph logic 324.
- the store 310 may be any suitable type of store.
- One type of store is a database.
- the store 310 may be a relational database, an entity-relationship database, an object database, an object relational database, an extensible markup language (XML) database, etc.
- the store 310 is shown to store historical information 340 for non-limiting illustrative purposes. Further structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the discussion regarding flowchart 200.
- a Bayesian network is initialized using an association graph, based on correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the initialization logic 314 initializes the Bayesian network using the association graph.
- the initialization logic 314 may receive association graph information 328, which identifies the graph nodes and the correlations among the graph nodes.
- the initialization logic 314 may analyze the association graph information to determine the graph nodes and the correlations among the graph nodes.
- the initialization logic 314 may initialize the Bayesian network based on the graph nodes and the correlations that are identified by the association graph information 328.
- the initialization logic 314 may generate Bayesian network information 330, which indicates (e.g., identifies) the network nodes of the Bayesian network and the connections among the network nodes.
- the Bayesian network is initialized at step 202 using the association graph based on pairwise correlations among the graph nodes that are included in the association graph.
- the Bayesian network is initialized at step 202 by performing a test of significance on the association graph to identify the correlations among the graph nodes in the association graph.
- a test of significance compares observed data with a hypothesis to determine whether the hypothesis is true.
- the hypothesis may be that correlations exist between some pairs of graph nodes in the association graph and do not exist between other pairs of graph nodes in the association graph.
- the test of significance may determine likelihoods that observed relationships between some pairs of graph nodes constitute actual correlations and likelihoods that the absence of observed relationships between other pairs of graph nodes constitutes absence of actual correlations.
- the test of significance may identify the correlations among the graph nodes based on the likelihood associated with each pair of graph nodes. For instance, a likelihood that is greater than or equal to a likelihood threshold may indicate existence of a correlation between the respective pair of graph nodes. A likelihood that is less than the likelihood threshold may indicate that a correlation between the respective pair of graph nodes does not exist.
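One concrete form such a test of significance could take, assuming co-occurrence counts of the two graph nodes are available, is a chi-square test of independence on a 2x2 contingency table. The patent does not specify which test is used; this is an illustration only.

```python
import math

def chi_square_p_value(a: int, b: int, c: int, d: int) -> float:
    """p-value for independence of two events from a 2x2 contingency table
    [[a, b], [c, d]] (a = both observed, d = neither observed).
    For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))

def correlated(a: int, b: int, c: int, d: int, alpha: float = 0.05) -> bool:
    # Treat the pair of graph nodes as correlated when the observed
    # relationship is unlikely under the independence hypothesis.
    return chi_square_p_value(a, b, c, d) < alpha

print(correlated(40, 10, 10, 40))  # True: strong co-occurrence
print(correlated(25, 25, 25, 25))  # False: no association
```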
- initializing the Bayesian network at step 202 includes assigning a weight to each pair of network nodes in the Bayesian network.
- Each weight represents an extent to which the network nodes in the respective pair are related. For instance, each weight may indicate a confidence that the network nodes in the respective pair are related.
- each weight may be calculated using the expression log{1 + P(AiAj)/[P(Ai)P(Aj)]}, where P(Ai) is a probability of a first network node that is included in the respective pair, P(Aj) is a probability of a second network node that is included in the respective pair, and P(AiAj) is a probability of a combination of the first and second network nodes.
- initializing the Bayesian network at step 202 further includes removing a connection between each pair of network nodes in the Bayesian network that has a weight that is less than or equal to a weight threshold.
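Assuming the node probabilities are estimated elsewhere (e.g., from event frequencies), the weight expression and the pruning step can be written directly; the weight threshold value is left to the caller.

```python
import math

def connection_weight(p_i: float, p_j: float, p_ij: float) -> float:
    """Weight for a pair of network nodes:
    log{1 + P(AiAj) / [P(Ai) P(Aj)]}."""
    return math.log(1.0 + p_ij / (p_i * p_j))

def prune_connections(weights: dict, weight_threshold: float) -> dict:
    # Remove each connection whose weight is less than or equal to the
    # weight threshold, per the initialization step described above.
    return {pair: w for pair, w in weights.items() if w > weight_threshold}
```

For two independent events with P = 0.5 each, P(AiAj) = 0.25 gives a weight of log(2) ≈ 0.69, while perfect co-occurrence (P(AiAj) = 0.5) gives log(3) ≈ 1.10; a weight threshold of 1.0 would therefore keep only the co-occurring pair's connection.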
- each graph node of the association graph represents an entity from a plurality of entities or an event from a plurality of events.
- entities include but are not limited to a user, an internet protocol (IP) address, an alert, a host (e.g., client host), a virtual machine (VM), a file, a cloud subscription, and a domain controller.
- each network node of the Bayesian network represents an event from the plurality of events.
- each network node may indicate an event name, a provider, and/or a join type associated with the event.
- a join type may indicate that the corresponding event is joined by (i.e., associated with) a user, a host, an IP address, and so on.
- the network nodes of the Bayesian network are grouped among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created.
- each network node of the Bayesian network may be grouped into a single cluster.
- Examples of an intent include but are not limited to attempting to access file(s) maliciously, attempting to access file(s) legitimately, uploading file(s), and partitioning memory.
- the network nodes of the Bayesian network may be grouped among the clusters using an unsupervised clustering algorithm.
- the network nodes of the Bayesian network may be grouped among the clusters using a supervised clustering algorithm.
- grouping logic 316 groups the network nodes of the Bayesian network among the clusters that correspond to respective intents. For example, the grouping logic 316 may analyze the Bayesian network information 330, which indicates the network nodes of the Bayesian network and the connections among the network nodes, to determine the intent that is associated with each connection. For instance, the grouping logic 316 may assign probabilities for the respective intents to each connection in the Bayesian network such that each probability indicates a confidence that the respective intent is to be associated with the connection. The grouping logic 316 may identify the intent associated with each connection to be the intent having a confidence that is no less than (e.g., is greater than) confidences of the other intents with regard to the connection.
- the grouping logic 316 may group the network nodes having connections that are associated with each intent into a respective cluster. For instance, the grouping logic 316 may group network nodes having connections that are associated with a first intent into a first cluster; the grouping logic 316 may group network nodes having connections that are associated with a second intent into a second cluster, and so on. In further accordance with this implementation, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, the grouping logic 316 creates a connection between the arbitrary network node and each of the other network nodes that are included in that cluster.
- the grouping logic 316 may create the connections to the other network nodes based on the grouping.
- the grouping logic 316 may generate grouping information 332 to indicate the network nodes of the Bayesian network, the cluster in which each network node is grouped, and/or the connections among the network nodes.
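The connection-creation rule of step 204 can be sketched as follows, assuming connections are stored as unordered node pairs and clusters as node sets (the data structures are illustrative assumptions):

```python
def propagate_cluster_connections(connections, clusters):
    """For each connection between an arbitrary network node and a node in a
    cluster, create a connection between the arbitrary node and each of the
    other nodes in that cluster (step 204)."""
    result = set(connections)
    cluster_of = {node: cluster
                  for cluster in map(frozenset, clusters)
                  for node in cluster}
    for pair in connections:
        for node in pair:
            (other,) = pair - {node}  # the arbitrary node of this pair
            members = cluster_of.get(node, frozenset())
            for member in members - {node, other}:
                result.add(frozenset({other, member}))
    return result

# Illustrative network: nodes D and E grouped in one cluster.
connections = {frozenset(p) for p in
               [("A", "B"), ("A", "C"), ("A", "D"),
                ("B", "C"), ("C", "D"), ("C", "F"), ("D", "E")]}
clusters = [{"D", "E"}]
expanded = propagate_cluster_connections(connections, clusters)
```

Because A and C are each connected to clustered node D, the expanded set gains connections A-E and C-E.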
- each network node of the Bayesian network represents an event from a plurality of events
- the events represented by the network nodes in each cluster are configured to achieve the intent with which the cluster corresponds.
- Network node A may represent “atypical travel.”
- Network node B may represent “unfamiliar sign-in properties.”
- Network node C may represent “user and IP address reconnaissance (SMB).”
- Network node D may represent “network mapping reconnaissance (DNS).”
- Network node E may represent “user and group membership reconnaissance (SAMR).”
- Network node F may represent “new group add suspiciously.”
- Network node G may represent “an uncommon file was created and added to startup folder.”
- Network node H may represent “fake Windows binary set to autostart.”
- Network node I may represent “Sticky Keys binary hijack detected.”
- Network node J may represent “user account created under suspicious circumstances.”
- Network node K may represent “new local admin added using Net commands.”
- Network nodes A and B may be grouped in a first cluster based on nodes A and B corresponding to an intent of “exploitation.”
- Network nodes C-E may be grouped in a second cluster based on nodes C-E corresponding to an intent of "reconnaissance."
- a plurality of patterns in the Bayesian network are identified.
- Each pattern includes at least one connection.
- Each connection is between a respective pair of network nodes.
- each pattern may represent a respective sequence of connections.
- each network node of the Bayesian network represents an event from a plurality of events
- each pattern may represent a respective sequence of events.
- the pattern identification logic 318 identifies the patterns in the Bayesian network. For example, the pattern identification logic 318 may analyze the grouping information 332 to identify the patterns. In accordance with this example, the pattern identification logic 318 may review the connections among the network nodes, as indicated by the grouping information 332, to identify the patterns.
- the pattern identification logic 318 may generate pattern information 334, which indicates the plurality of patterns in the Bayesian network.
- the pattern information 334 may further indicate the network nodes of the Bayesian network, the cluster in which each network node is grouped, and/or the connections among the network nodes.
- identifying the plurality of patterns at step 206 includes automatically identifying a first subset of the plurality of patterns using a machine learning technique. In accordance with this embodiment, identifying the plurality of patterns at step 206 further includes identifying a second subset of the plurality of patterns using a manually generated rule (e.g., based on each pattern in the second subset not being automatically identified by using the machine learning technique).
- the manually generated rule may be a human-generated rule.
- One example of a manually generated rule may specify that a designated network node joined by a domain controller is not suspicious, whereas the designated network node joined by a client host is suspicious. When a request is sent to the domain controller, the domain controller forwards the request to a client host.
- the network node being joined by the domain controller may not indicate that the domain controller has been compromised.
- An algorithm used in the machine learning technique may not be capable of learning this distinction, and the manually generated rule may be used to ensure that the distinction is made.
- the first subset includes at least one pattern from the plurality of patterns
- the second subset includes at least one pattern from the plurality of patterns.
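A sketch of combining the two identification paths is shown below; the classifier and the rule are hypothetical stand-ins (the client-host rule loosely mirrors the domain-controller example above):

```python
def identify_patterns(candidates, ml_classifier, manual_rules):
    """Identify patterns two ways (step 206): the machine learning classifier
    supplies a first subset, and manually generated rules supply a second
    subset of patterns the classifier did not identify."""
    first_subset = {p for p in candidates if ml_classifier(p)}
    second_subset = {p for p in candidates
                     if p not in first_subset
                     and any(rule(p) for rule in manual_rules)}
    return first_subset | second_subset

# Hypothetical rule: a node joined by a client host is suspicious, while the
# same node joined by a domain controller is not.
def client_host_rule(pattern):
    return "join:client-host" in pattern

# Hypothetical classifier: flags patterns carrying a model-assigned marker.
def ml_classifier(pattern):
    return "ml-flagged" in pattern

candidates = [
    frozenset({"node-X", "join:client-host"}),
    frozenset({"node-X", "join:domain-controller"}),
    frozenset({"node-Y", "ml-flagged"}),
]
patterns = identify_patterns(candidates, ml_classifier, [client_host_rule])
```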
- the redundancy removal logic 320 may remove at least one redundant connection from the patterns in the Bayesian network.
- the redundancy removal logic 320 may analyze the pattern information 334 to identify the connections among the network nodes and the plurality of patterns in the Bayesian network. The redundancy removal logic 320 may compare each connection with other connection(s) to determine whether each connection is redundant with regard to any one or more of the other connection(s). In accordance with this example, the redundancy removal logic 320 may remove from the patterns, which are indicated by the pattern information 334, each connection that is determined to be redundant with regard to one or more other connections.
- the redundancy removal logic 320 may generate updated pattern information 336, which indicates the plurality of patterns in the Bayesian network as revised to exclude the redundant connection(s).
- the updated pattern information 336 may further indicate the network nodes of the Bayesian network, the cluster in which each network node is grouped, and/or the non-redundant connections among the network nodes.
- removing the at least one redundant connection at step 208 includes removing a first connection, which is between a first network node and a second network node, from the patterns in the Bayesian network based on the first connection being redundant with regard to a second connection, which is between the first network node and a third network node, and a third connection, which is between the third network node and the second network node.
- the first connection may be removed from the patterns based on the first connection linking the first network node and the second network node and a combination of the second and third connections also linking the first network node and the second network node.
- removing the at least one redundant connection at step 208 includes removing a first connection, which is between a first network node and a second network node, from the patterns in the Bayesian network based on the first connection being redundant with regard to a second connection, which is between the first network node and a third network node, as a result of the second network node and the third network node being in a same cluster.
- the second network node and the third network node may be deemed to be equivalent as a result of the second network node and the third network node being in the same cluster.
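This equivalence-based removal can be sketched by canonicalizing each connection, replacing every clustered node with its cluster identifier and keeping one connection per canonical form (names and data are illustrative assumptions):

```python
def remove_cluster_equivalent(connections, clusters):
    """Treat network nodes that share a cluster as equivalent: if connections
    a-b and a-c both exist and b and c are in the same cluster, one of the two
    is redundant and is removed (step 208, second form)."""
    cluster_of = {node: index for index, cluster in enumerate(clusters)
                  for node in cluster}
    seen, kept = set(), set()
    for pair in sorted(connections, key=sorted):
        # Replace each clustered node by its cluster id; equivalent
        # connections collapse to the same canonical form.
        canonical = frozenset(cluster_of.get(node, node) for node in pair)
        if canonical not in seen:
            seen.add(canonical)
            kept.add(pair)
    return kept

# Illustrative network: D and E share a cluster, so A-D/A-E are equivalent,
# as are C-D/C-E; one connection of each equivalent pair is removed.
connections = {frozenset(p) for p in
               [("A", "D"), ("A", "E"), ("C", "D"), ("C", "E"), ("D", "E")]}
kept = remove_cluster_equivalent(connections, [{"D", "E"}])
```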
- scores are assigned to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates a likelihood of the respective pattern to indicate a security threat.
- the likelihood of a pattern to indicate a security threat may be based on an extent to which the pattern corresponds to (e.g., matches) a historical pattern that corresponds to a historical security threat, a number of historical patterns with which the pattern corresponds that correspond to a historical security threat, an extent to which the pattern corresponds to a historical pattern that does not correspond to a historical security threat, and/or a number of historical patterns with which the pattern corresponds that do not correspond to a historical security threat.
- the historical patterns and the historical security threats may be those that have been identified within a specified period of time or that have occurred within the specified period of time.
- the specified period of time may be a 20-day time period that ends with a current time, a 30-day time period that ends with the current time, or a 60-day time period that ends with the current time.
- the scoring logic 322 assigns the scores to the respective patterns in the Bayesian network based on historical information 340, which indicates the historical patterns and the historical security threats. For instance, the scoring logic 322 may retrieve the historical information 340 from the store 310 for purposes of determining the scores to be assigned to the respective patterns. The scoring logic 322 may generate scoring information 338, which indicates the plurality of patterns in the Bayesian network (as revised to exclude the redundant connect! on(s)) and the scores that are assigned to the respective patterns. For instance, the scoring information 338 may cross-reference the patterns to the corresponding scores. The scoring information 338 may further indicate the network nodes of the Bayesian network, the cluster in which each network node is grouped, and/or the non-redundant connections among the network nodes.
- assigning the scores to the respective patterns in the Bayesian network at step 210 is performed using a classifier that is trained using features that are derived from labeled data. Accordingly, the scores may be assigned using supervised classification.
- the labeled data represents the knowledge of the historical patterns and the historical security threats. For instance, the labeled data may indicate known attack patterns, user feedback, and/or manually generated labels.
- the known attack patterns may include patterns that have been previously flagged as representing potential security threats.
- the user feedback may include impressions (e.g., opinions) of users regarding the historical patterns (e.g., whether one or more of the historical patterns correspond to one or more of the historical security threats).
- a pattern that is known to be non-threatening or that has affected relatively few (e.g., one or two) computing devices may be assigned a relatively low score (e.g., a score of zero), whereas a pattern that is known to represent a potential security threat or that has affected a substantial number of computing devices may be assigned a relatively high score.
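The specification trains a classifier on features derived from labeled data; as a simplified stand-in, the sketch below scores a pattern by how often it matches historical threat patterns versus benign ones (all names and data are hypothetical):

```python
def conn(a, b):
    """An unordered connection between two named network nodes."""
    return frozenset({a, b})

def score_pattern(pattern, threat_history, benign_history):
    """Score a pattern by the fraction of matching historical patterns that
    correspond to a historical security threat; a stand-in for the trained
    classifier of step 210. A pattern matches a historical pattern when its
    connections are a subset of the historical pattern's connections."""
    threat_hits = sum(pattern <= historical for historical in threat_history)
    benign_hits = sum(pattern <= historical for historical in benign_history)
    total = threat_hits + benign_hits
    return threat_hits / total if total else 0.0

# Hypothetical historical knowledge: each pattern is a set of connections.
threat_history = [
    frozenset({conn("atypical travel", "suspicious inbox manipulation rule"),
               conn("atypical travel", "crypto-mining activity")}),
    frozenset({conn("atypical travel", "suspicious inbox manipulation rule")}),
]
benign_history = [
    frozenset({conn("connection to a custom network indicator",
                    "credit card number")}),
]

threat_score = score_pattern(
    frozenset({conn("atypical travel", "suspicious inbox manipulation rule")}),
    threat_history, benign_history)
benign_score = score_pattern(
    frozenset({conn("connection to a custom network indicator",
                    "credit card number")}),
    threat_history, benign_history)
```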
- a first pattern, which includes a connection between a first network node representing “atypical travel” and a second network node representing “suspicious inbox manipulation rule,” is assigned a score of 0.8.
- a second pattern, which includes a connection between the first network node and a third network node representing “network communication with a malicious machine detected,” is assigned a score of 0.71.
- a third pattern, which includes a connection between the first network node and a fourth network node representing “crypto-mining activity,” is assigned a score of 0.71.
- a fourth pattern, which includes a connection between a fifth network node representing “suspected brute-force attack” and a sixth network node representing “suspicious behavior by cmd.exe was observed,” is assigned a score of 0.7.
- a fifth pattern, which includes a connection between a seventh network node representing “failed SSH brute force attack” and an eighth network node representing “executable application control policy violation was audited,” is assigned a score of 0.2.
- a sixth pattern, which includes a connection between a ninth network node representing “connection to a custom network indicator” and a tenth network node representing “credit card number,” is assigned a score of 0.19.
- an output graph is automatically generated.
- the output graph includes each pattern that has a score that is greater than or equal to a score threshold and does not include each pattern that has a score that is less than the score threshold.
- Each pattern in the output graph represents a potential security threat.
- the output graph logic 324 automatically generates an output graph 342, which includes each pattern that has a score that is greater than or equal to the score threshold and which does not include each pattern that has a score that is less than the score threshold. For instance, the output graph logic 324 may compare the score that is assigned to each pattern and the score threshold to determine whether the respective score is greater than or equal to the score threshold. The output graph logic 324 may aggregate the patterns that have the scores that are greater than or equal to the score threshold to generate the output graph 342.
- the score threshold may be set to equal 0.7 for non-limiting, illustrative purposes.
- the output graph may be automatically generated to include the first, second, third, and fourth patterns because their corresponding scores of 0.8, 0.71, 0.71, and 0.7 are greater than or equal to the score threshold of 0.7.
- the output graph may be automatically generated to not include the fifth and sixth patterns because their corresponding scores of 0.2 and 0.19 are less than the score threshold of 0.7.
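The thresholding of step 212 reduces to a simple filter over the assigned scores; using the illustrative scores above (pattern names are placeholders):

```python
def build_output_graph(scored_patterns, score_threshold):
    """Include each pattern whose score is greater than or equal to the score
    threshold; every retained pattern represents a potential security threat."""
    return {pattern: score for pattern, score in scored_patterns.items()
            if score >= score_threshold}

scored_patterns = {
    "first": 0.8, "second": 0.71, "third": 0.71,
    "fourth": 0.7, "fifth": 0.2, "sixth": 0.19,
}
output_graph = build_output_graph(scored_patterns, score_threshold=0.7)
```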
- one or more steps 202, 204, 206, 208, 210, and/or 212 of flowchart 200 may not be performed. Moreover, steps in addition to or in lieu of steps 202, 204, 206, 208, 210, and/or 212 may be performed.
- the method of flowchart 200 further includes generating the association graph based on information regarding a computer network. The information indicates requests that are received in the computer network, data that are accessed in response to the requests, and/or operations that are performed on the data.
- the association graph logic 312 generates the association graph based on computer network information 326, which indicates the requests that are received in the computer network, the data that are accessed in response to the requests, and the operations that are performed on the data.
- the association graph logic 312 may generate the association graph information 328 to describe the association graph.
- the association graph information 328 may indicate the graph nodes that are included in the association graph and correlations among the graph nodes.
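One way such an association graph could be assembled is to treat entities and events that co-occur in the same record of computer network information (a request, the user and IP address behind it, the operation performed) as related graph nodes; the record format below is an assumption for illustration:

```python
from collections import defaultdict
from itertools import combinations

def build_association_graph(records):
    """Build an association graph from computer network information: items
    observed together in a record become graph nodes, and each co-occurring
    pair is counted as evidence of a relationship between its nodes."""
    edges = defaultdict(int)
    nodes = set()
    for record in records:
        nodes.update(record)
        for a, b in combinations(sorted(record), 2):
            edges[(a, b)] += 1  # co-occurrence count between two graph nodes
    return nodes, dict(edges)

# Hypothetical records: one entry per observed request.
records = [
    {"user:alice", "ip:10.0.0.5", "event:file-read"},
    {"user:alice", "ip:10.0.0.5", "event:file-upload"},
]
nodes, edges = build_association_graph(records)
```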
- the method of flowchart 200 further includes identifying redundant connections among the patterns in the Bayesian network by performing a conditional independence test on the network nodes of the Bayesian network.
- the conditional independence test is configured to determine whether each network node in the Bayesian network has a causal relation to each other network node in the Bayesian network.
- Conditional independence is often formulated in terms of conditional probability, which may be expressed using the following equation: P(A|B) = P(A∧B)/P(B), where P(A|B) is the probability of event A occurring given that event B has occurred, and P(A∧B) is the probability of events A and B occurring together.
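An empirical version of such a test can be sketched over boolean event samples by checking whether P(A, B | C) = P(A | C) P(B | C) holds in the data (the sample format and tolerance are illustrative assumptions):

```python
def cond_prob(samples, events, given):
    """Empirical P(all events occur | all given occur) over boolean samples."""
    base = [s for s in samples if all(s[g] for g in given)]
    if not base:
        return 0.0
    return sum(all(s[e] for e in events) for s in base) / len(base)

def conditionally_independent(samples, a, b, c, tol=1e-9):
    """Test whether P(a, b | c) = P(a | c) P(b | c) holds empirically, i.e.,
    whether events a and b are conditionally independent given event c."""
    lhs = cond_prob(samples, [a, b], [c])
    rhs = cond_prob(samples, [a], [c]) * cond_prob(samples, [b], [c])
    return abs(lhs - rhs) <= tol

# Hypothetical observations: given C, events A and B vary independently.
independent_samples = [
    {"A": True,  "B": True,  "C": True},
    {"A": True,  "B": False, "C": True},
    {"A": False, "B": True,  "C": True},
    {"A": False, "B": False, "C": True},
]
# Given C, events A and B always occur together: clearly dependent.
dependent_samples = [
    {"A": True,  "B": True,  "C": True},
    {"A": True,  "B": True,  "C": True},
    {"A": False, "B": False, "C": True},
    {"A": False, "B": False, "C": True},
]
```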
- the redundancy removal logic 320 identifies the redundant connections among the patterns in the Bayesian network.
- removing the at least one redundant connection at step 208 is performed as a result of identifying the redundant connections among the patterns in the Bayesian network.
- identifying the redundant connections among the patterns in the Bayesian network includes automatically identifying a first subset of the redundant connections using a machine learning technique. In accordance with this aspect, identifying the redundant connections among the patterns in the Bayesian network further includes identifying a second subset of the redundant connections using a manually generated rule (e.g., based on each redundant connection in the second subset not being automatically identified by using the machine learning technique). The first subset includes at least one of the redundant connections, and the second subset includes at least one of the redundant connections.
- the computing system 300 may not include one or more of the automatic graph-based detection logic 308, the store 310, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, and/or the output graph logic 324.
- the computing system 300 may include components in addition to or in lieu of the automatic graph-based detection logic 308, the store 310, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, and/or the output graph logic 324.
- FIG. 4 depicts an example Bayesian network 400 in accordance with an embodiment.
- the Bayesian network 400 includes network nodes A-F.
- Network node A is connected to network nodes B, C, and D via respective connections 408, 406, and 404.
- Network nodes B and C are connected via connection 410.
- Network nodes C and D are connected via connection 412.
- Network nodes C and F are connected via connection 414.
- Network nodes D and E are connected via connection 402.
- FIG. 5 depicts another example Bayesian network 500 in accordance with an embodiment.
- the Bayesian network 500 is similar to the Bayesian network 400 shown in FIG. 4, except that network nodes D and E in the Bayesian network 500 have been grouped in a cluster 502.
- network nodes D and E may have been grouped into the cluster 502 in accordance with step 204 of flowchart 200 in FIG. 2.
- the Bayesian network 500 includes network nodes A-F.
- Network node A is connected to network nodes B, C, and D via respective connections 408, 406, and 404.
- Network nodes B and C are connected via connection 410.
- Network nodes C and D are connected via connection 412.
- Network nodes C and F are connected via connection 414.
- Network nodes D and E are connected via connection 402.
- a connection is created between the arbitrary network node and each of the other network nodes that are included in the cluster 502. For instance, because network node A is connected to network node D (i.e., one of the network nodes in the cluster 502) via connection 404, a connection 504 is established to connect network node A to network node E (i.e., the other network node in the cluster 502).
- Similarly, because network node C is connected to network node D via connection 412, a connection 512 is established to connect network node C to network node E (i.e., the other network node in the cluster 502).
- FIG. 6 depicts yet another example Bayesian network 600 in accordance with an embodiment.
- the Bayesian network 600 includes network nodes A-F. Initially, network node A is connected to network nodes B, C, and D via respective connections 608, 606, and 604; network nodes B and C are connected via connection 610; network nodes C and D are connected via connection 612; network nodes C and F are connected via connection 614; and network nodes D and E are connected via connection 602.
- connections 604 and 610 have been removed because each of connections 604 and 610 is redundant with regard to one or more other connections. For instance, connections 604 and 610 may have been removed in accordance with step 208 of flowchart 200 in FIG. 2.
- Connection 604 is redundant with regard to a combination of connection 606 and connection 612 because connection 604 connects network nodes A and D and the combination of connections 606 and 612 connects network nodes A and D. For example, connection 606 connects network nodes A and C, and connection 612 connects network nodes C and D. The combination of connections 606 and 612 therefore connects network nodes A and D.
- the redundancy of connection 604 may be described using the following equation: P(A,D|C) = P(A|C)P(D|C), which indicates that network nodes A and D are conditionally independent given network node C.
- Connection 610 is redundant with regard to a combination of connections 608 and 606 because connection 610 connects network nodes B and C and the combination of connections 608 and 606 connects network nodes B and C.
- connection 608 connects network nodes B and A
- connection 606 connects network nodes A and C.
- the combination of connections 608 and 606 therefore connects network nodes B and C.
- the redundancy of connection 610 may be described using the following equation: P(B,C|A) = P(B|A)P(C|A), which indicates that network nodes B and C are conditionally independent given network node A.
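The two removals in FIG. 6 can be reproduced with a small two-hop redundancy check. Which connection of a cycle is dropped is decided by the conditional independence test in the specification, so the explicit removal order below is an assumption for illustration:

```python
def is_redundant(connection, connections):
    """A connection between nodes a and b is redundant when some intermediate
    node c supplies a two-hop path a-c, c-b among the other connections."""
    a, b = tuple(connection)
    others = connections - {connection}
    nodes = {n for pair in others for n in pair}
    return any(frozenset({a, c}) in others and frozenset({c, b}) in others
               for c in nodes - connection)

# The connections of FIG. 6 before pruning.
connections = {frozenset(p) for p in
               [("A", "B"), ("A", "C"), ("A", "D"),
                ("B", "C"), ("C", "D"), ("C", "F"), ("D", "E")]}

# Connection A-D (604) is redundant with regard to A-C (606) plus C-D (612).
assert is_redundant(frozenset({"A", "D"}), connections)
connections.discard(frozenset({"A", "D"}))

# Connection B-C (610) is redundant with regard to B-A (608) plus A-C (606).
assert is_redundant(frozenset({"B", "C"}), connections)
connections.discard(frozenset({"B", "C"}))
```

After both removals, the five remaining connections form a tree, so no further connection is redundant.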
- any one or more of the automatic graph-based detection logic 108, the automatic graph-based detection logic 308, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, the output graph logic 324, and/or flowchart 200 may be implemented in hardware, software, firmware, or any combination thereof.
- any one or more of the automatic graph-based detection logic 108, the automatic graph-based detection logic 308, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, the output graph logic 324, and/or flowchart 200 may be implemented, at least in part, as computer program code configured to be executed in one or more processors.
- any one or more of the automatic graph-based detection logic 108, the automatic graph-based detection logic 308, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, the output graph logic 324, and/or flowchart 200 may be implemented, at least in part, as hardware logic/electrical circuitry.
- Such hardware logic/electrical circuitry may include one or more hardware logic components. Examples of a hardware logic component include but are not limited to a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system-on-a-chip system (SoC), a complex programmable logic device (CPLD), etc.
- a SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.
- An example system ( Figure 1, 102A-102M, 106A-106N; Figure 3, 300; Figure 7, 700) to perform automatic graph-based detection of potential security threats comprises a memory ( Figure 7, 704, 708, 710) and one or more processors (Figure 7, 702) coupled to the memory.
- the one or more processors are configured to initialize ( Figure 2, 202) a Bayesian network using an association graph, based on correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the one or more processors are further configured to group ( Figure 2, 204) the network nodes of the Bayesian network among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created.
- the one or more processors are further configured to identify ( Figure 2, 206) a plurality of patterns in the Bayesian network, each pattern including at least one connection, each connection being between a respective pair of network nodes.
- the one or more processors are further configured to remove (Figure 2, 208) at least one redundant connection, which is redundant with regard to one or more other connections, from the patterns in the Bayesian network.
- the one or more processors are further configured to assign ( Figure 2, 210) scores to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates a likelihood of the respective pattern to indicate a security threat.
- the one or more processors are further configured to automatically generate (Figure 2, 212) an output graph ( Figure 3, 342), which includes each pattern that has a score that is greater than or equal to a score threshold and which does not include each pattern that has a score that is less than the score threshold. Each pattern in the output graph represents a potential security threat.
- (A2) In the example system of A1, wherein the one or more processors are further configured to generate the association graph based on information regarding a computer network, the information indicating requests that are received in the computer network, data that are accessed in response to the requests, and operations that are performed on the data.
- (A5) In the example system of any of A1-A4, wherein the one or more processors are configured to: assign a weight to each pair of network nodes in the Bayesian network, each weight representing an extent to which the network nodes in the respective pair are related; and remove a connection between each pair of network nodes in the Bayesian network that has a weight that is less than or equal to a weight threshold.
- each graph node of the association graph represents an entity from a plurality of entities or an event from a plurality of events; and wherein each network node of the Bayesian network represents an event from the plurality of events.
- (A7) In the example system of any of A1-A6, wherein the one or more processors are configured to: automatically identify a first subset of the plurality of patterns using a machine learning technique; and identify a second subset of the plurality of patterns using a manually generated rule; and wherein each of the first subset and the second subset includes at least one pattern from the plurality of patterns.
- the one or more processors are configured to: identify redundant connections among the patterns in the Bayesian network by performing a conditional independence test on the network nodes of the Bayesian network, the conditional independence test configured to determine whether each network node in the Bayesian network has a causal relation to each other network node in the Bayesian network; and remove the at least one redundant connection as a result of identifying the redundant connections among the patterns in the Bayesian network.
- (A10) In the example system of any of A1-A9, wherein the one or more processors are configured to remove a first connection, which is between a first network node and a second network node, from the patterns in the Bayesian network based on the first connection being redundant with regard to a second connection, which is between the first network node and a third network node, as a result of the second network node and the third network node being equivalent.
- An example method of performing automatic graph-based detection of potential security threats comprises initializing ( Figure 2, 202) a Bayesian network using an association graph, based on correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the method further comprises grouping ( Figure 2, 204) the network nodes of the Bayesian network among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created.
- the method further comprises identifying ( Figure 2, 206) a plurality of patterns in the Bayesian network, each pattern including at least one connection, each connection being between a respective pair of network nodes.
- the method further comprises removing ( Figure 2, 208) at least one redundant connection, which is redundant with regard to one or more other connections, from the patterns in the Bayesian network.
- the method further comprises assigning ( Figure 2, 210) scores to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates a likelihood of the respective pattern to indicate a security threat.
- the method further comprises automatically generating ( Figure 2, 212) an output graph ( Figure 3, 342), which includes each pattern that has a score that is greater than or equal to a score threshold and which does not include each pattern that has a score that is less than the score threshold.
- Each pattern in the output graph represents a potential security threat.
- (B2) In the method of B1, further comprising: generating the association graph based on information regarding a computer network, the information indicating requests that are received in the computer network, data that are accessed in response to the requests, and operations that are performed on the data.
- initializing the Bayesian network comprises: initializing the Bayesian network using the association graph based on pairwise correlations among the graph nodes that are included in the association graph.
- initializing the Bayesian network comprises: initializing the Bayesian network by performing a test of significance on the association graph to identify the correlations among the graph nodes in the association graph.
- initializing the Bayesian network comprises: assigning a weight to each pair of network nodes in the Bayesian network, each weight representing an extent to which the network nodes in the respective pair are related; and removing a connection between each pair of network nodes in the Bayesian network that has a weight that is less than or equal to a weight threshold.
- each graph node of the association graph represents an entity from a plurality of entities or an event from a plurality of events; and wherein each network node of the Bayesian network represents an event from the plurality of events.
- identifying the plurality of patterns in the Bayesian network comprises: automatically identifying a first subset of the plurality of patterns using a machine learning technique; and identifying a second subset of the plurality of patterns using a manually generated rule; and wherein each of the first subset and the second subset includes at least one pattern from the plurality of patterns.
- identifying the redundant connections among the patterns in the Bayesian network comprises: automatically identifying a first subset of the redundant connections using a machine learning technique; and identifying a second subset of the redundant connections using a manually generated rule; and wherein each of the first subset and the second subset includes at least one of the redundant connections.
- removing at least one redundant connection comprises: removing a first connection, which is between a first network node and a second network node, from the patterns in the Bayesian network based on the first connection being redundant with regard to a second connection, which is between the first network node and a third network node, as a result of the second network node and the third network node being equivalent.
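The equivalence-based redundancy removal described above can be sketched as canonicalizing each connection's endpoints and keeping only the first connection seen per canonical pair. The equivalence mapping and the keep-first policy are hypothetical choices made for illustration:

```python
# Sketch: if two network nodes are equivalent, a connection to one makes a
# parallel connection to the other redundant. The equivalence classes are a
# hypothetical input here.

def remove_redundant_connections(connections, equivalent):
    """connections: list of (a, b) pairs; equivalent: dict mapping a node
    to a canonical representative of its equivalence class. Keeps only the
    first connection seen per canonicalized pair."""
    seen, kept = set(), []
    for a, b in connections:
        key = frozenset((equivalent.get(a, a), equivalent.get(b, b)))
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept

# "mirror_read" is treated as equivalent to "backup_read" in this example.
conns = [("login", "backup_read"), ("login", "mirror_read")]
kept = remove_redundant_connections(
    conns, equivalent={"mirror_read": "backup_read"})
```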
- assigning the scores comprises: assigning the scores to the respective patterns in the Bayesian network using a classifier that is trained using features that are derived from labeled data, which represents the knowledge of the historical patterns and the historical security threats.
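The application does not name a classifier family, so the following sketch uses plain logistic regression as a stand-in, trained on labeled historical patterns. The two features (number of connections, number of rare events) and the toy training set are illustrative assumptions:

```python
# Sketch: score patterns with a classifier trained on labeled historical
# data. Logistic regression and the feature set are stand-ins; the
# application only requires features derived from labeled data.
import math

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Plain per-sample gradient-descent logistic regression.
    Returns (weights, bias)."""
    n_feat = len(samples[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def score(x, w, b):
    """Likelihood, in [0, 1], that the pattern indicates a threat."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Features: (number of connections, number of rare events in the pattern).
train_x = [(1, 0), (2, 0), (3, 2), (4, 3)]
train_y = [0, 0, 1, 1]          # 1 = historically a confirmed threat
w, b = train_logistic(train_x, train_y)

benign_score = score((1, 0), w, b)
threat_score = score((4, 3), w, b)
```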
- An example computer program product (Figure 7, 718, 722) comprising a computer-readable storage medium having instructions recorded thereon for enabling a processor-based system (Figure 1, 102A-102M, 106A-106N; Figure 3, 300; Figure 7, 700) to perform automatic graph-based detection of potential security threats by performing operations.
- the operations comprise initializing (Figure 2, 202) a Bayesian network using an association graph, based on correlations among graph nodes that are included in the association graph, to establish connections among network nodes that are included in the Bayesian network.
- the operations further comprise grouping (Figure 2, 204) the network nodes of the Bayesian network among clusters that correspond to respective intents such that, for each connection between a respective pair of network nodes, which includes an arbitrary network node and a network node that is included in a cluster, a connection between the arbitrary network node and each of the other network nodes that are included in that cluster is created.
- the operations further comprise identifying (Figure 2, 206) a plurality of patterns in the Bayesian network, each pattern including at least one connection, each connection being between a respective pair of network nodes.
- the operations further comprise removing (Figure 2, 208) at least one redundant connection, which is redundant with regard to one or more other connections, from the patterns in the Bayesian network.
- the operations further comprise assigning (Figure 2, 210) scores to the respective patterns in the Bayesian network, based on knowledge of historical patterns and historical security threats, such that each score indicates the likelihood that the respective pattern indicates a security threat.
- the operations further comprise automatically generating (Figure 2, 212) an output graph (Figure 3, 342), which includes each pattern whose score is greater than or equal to a score threshold and excludes each pattern whose score is less than the score threshold, each pattern in the output graph representing a potential security threat.
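The intent-cluster grouping rule above (a connection from an arbitrary node into a cluster is expanded to connections with every node in that cluster) can be sketched directly; the set-based representation and the cluster membership are hypothetical inputs:

```python
# Sketch of the cluster-grouping rule: close the connection set under
# "connect to one cluster member implies connect to all members". Cluster
# membership is assumed given (e.g., from an intent-grouping step).

def expand_cluster_connections(connections, clusters):
    """connections: set of frozenset({a, b}) undirected pairs.
    clusters: list of sets of network nodes sharing an intent.
    Returns the connection set closed under the cluster rule."""
    expanded = set(connections)
    for cluster in clusters:
        for conn in list(connections):
            outside = conn - cluster
            if len(outside) == 1:          # exactly one endpoint in cluster
                (node,) = outside          # the arbitrary (outside) node
                for member in cluster:
                    if member != node:
                        expanded.add(frozenset({node, member}))
    return expanded

conns = {frozenset({"login", "read_a"})}
clusters = [{"read_a", "read_b", "read_c"}]   # one "data access" intent
out = expand_cluster_connections(conns, clusters)
```

After expansion, "login" is connected to all three nodes of the cluster, not only "read_a".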
- FIG. 7 depicts an example computer 700 in which embodiments may be implemented.
- Computer 700 may be a physical computer or a virtual computer (e.g., virtual machine). Any one or more of the protectable entities 102A-102M and/or the computing system 106 shown in FIG. 1 and/or the computing system 300 shown in FIG. 3 may be implemented using computer 700, including one or more features of computer 700 and/or alternative features.
- Computer 700 may be a general-purpose computing device in the form of a conventional personal computer, a mobile computer, or a workstation, for example, or computer 700 may be a special purpose computing device.
- the description of computer 700 provided herein is provided for purposes of illustration, and is not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).
- computer 700 includes a processing unit 702, a system memory 704, and a bus 706 that couples various system components including system memory 704 to processing unit 702.
- Bus 706 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- System memory 704 includes read only memory (ROM) 708 and random access memory (RAM) 710.
- A basic input/output system (BIOS) 712 is stored in ROM 708.
- Computer 700 may include one or more of the following drives: a hard disk drive 714 for reading from and writing to a hard disk, a magnetic disk drive 716 for reading from or writing to a removable magnetic disk 718, and an optical disk drive 720 for reading from or writing to a removable optical disk 722 such as a CD ROM, DVD ROM, or other optical media.
- Hard disk drive 714, magnetic disk drive 716, and optical disk drive 720 are connected to bus 706 by a hard disk drive interface 724, a magnetic disk drive interface 726, and an optical drive interface 728, respectively.
- the drives and their associated computer-readable storage media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer.
- Although a hard disk, a removable magnetic disk, and a removable optical disk are described, other types of computer-readable storage media can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like.
- External storage (e.g., cloud storage) and/or a cache may be used in addition to or in lieu of the hard disk drive 714, the magnetic disk drive 716, and/or the optical disk drive 720.
- a number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These programs include an operating system 730, one or more application programs 732, other program modules 734, and program data 736.
- Application programs 732 or program modules 734 may include, for example, computer program logic for implementing any one or more of (e.g., at least a portion of) the automatic graph-based detection logic 108, the automatic graph-based detection logic 308, the association graph logic 312, the initialization logic 314, the grouping logic 316, the pattern identification logic 318, the redundancy removal logic 320, the scoring logic 322, the output graph logic 324, and/or flowchart 200 (including any step of flowchart 200), as described herein.
- a user may enter commands and information into the computer 700 through input devices such as keyboard 738 and pointing device 740.
- Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, touch screen, camera, accelerometer, gyroscope, or the like.
- These and other input devices are often connected to processing unit 702 through a serial port interface 742 that is coupled to bus 706, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB).
- a display device 744 (e.g., a monitor) is also connected to bus 706 via an interface, such as a video adapter 746.
- computer 700 may include other peripheral output devices (not shown) such as speakers and printers.
- Computer 700 is connected to a network 748 (e.g., the Internet) through a network interface or adapter 750, a modem 752, or other means for establishing communications over the network.
- Modem 752 which may be internal or external, is connected to bus 706 via serial port interface 742.
- The terms "computer program medium" and "computer-readable storage medium" are used to generally refer to media (e.g., non-transitory media) such as the hard disk associated with hard disk drive 714, removable magnetic disk 718, and removable optical disk 722, as well as other media such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like.
- A computer-readable storage medium is not a signal, such as a carrier signal or a propagating signal, and does not constitute a signal per se.
- Such computer-readable storage media are distinguished from, and non-overlapping with, communication media; they do not include communication media.
- Communication media embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave.
- The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wireless media such as acoustic, RF, infrared and other wireless media, as well as wired media. Example embodiments are also directed to such communication media.
- Computer programs and modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 750 or serial port interface 742. Such computer programs, when executed or loaded by an application, enable computer 700 to implement features of embodiments discussed herein. Accordingly, such computer programs represent controllers of the computer 700.
- Example embodiments are also directed to computer program products comprising software (e.g., computer-readable instructions) stored on any computer-useable medium.
- Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein.
- Embodiments may employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable media include, but are not limited to, storage devices such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMS-based storage devices, nanotechnology-based storage devices, and the like.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202280061252.8A CN117980903A (en) | 2021-09-24 | 2022-09-01 | Graph-based automatic detection of potential security threats |
| EP22777470.0A EP4406193A1 (en) | 2021-09-24 | 2022-09-01 | Automatic graph-based detection of potential security threats |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163248390P | 2021-09-24 | 2021-09-24 | |
| US63/248,390 | 2021-09-24 | ||
| US17/520,594 | 2021-11-05 | ||
| US17/520,594 US11928207B2 (en) | 2021-09-24 | 2021-11-05 | Automatic graph-based detection of potential security threats |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023048921A1 true WO2023048921A1 (en) | 2023-03-30 |
Family
ID=83447948
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2022/042279 Ceased WO2023048921A1 (en) | 2021-09-24 | 2022-09-01 | Automatic graph-based detection of potential security threats |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023048921A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240231922A1 (en) * | 2023-01-11 | 2024-07-11 | Vmware, Inc. | Anti-affinity for containerized computing service |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190132344A1 (en) * | 2016-12-16 | 2019-05-02 | Patternex, Inc. | Method and system for employing graph analysis for detecting malicious activity in time evolving networks |
Non-Patent Citations (2)
| Title |
|---|
| SATORU KOBAYASHI ET AL: "Mining Causality of Network Events in Log Data", IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, vol. 15, no. 1, 1 March 2018 (2018-03-01), US, pages 53 - 67, XP055733051, ISSN: 1932-4537, DOI: 10.1109/TNSM.2017.2778096 * |
| SUN XIAOYAN ET AL: "Using Bayesian Networks for Probabilistic Identification of Zero-Day Attack Paths", IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, vol. 13, no. 10, 1 October 2018 (2018-10-01), USA, pages 2506 - 2521, XP055974500, ISSN: 1556-6013, DOI: 10.1109/TIFS.2018.2821095 [retrieved on 2022-10-25] * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22777470; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280061252.8; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 202417022705; Country of ref document: IN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022777470; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022777470; Country of ref document: EP; Effective date: 20240424 |