HK1208105B - Method and apparatus for compilation of finite automata - Google Patents
Method and apparatus for compilation of finite automata Download PDFInfo
- Publication number
- HK1208105B HK1208105B HK15108609.9A HK15108609A HK1208105B HK 1208105 B HK1208105 B HK 1208105B HK 15108609 A HK15108609 A HK 15108609A HK 1208105 B HK1208105 B HK 1208105B
- Authority
- HK
- Hong Kong
- Prior art keywords
- pattern
- nfa
- selected sub
- sub
- node
- Prior art date
Links
Abstract
A method and corresponding apparatus are provided implementing run time processing using Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic and a unified deterministic finite automata (DFA) may be generated using the subpatterns selected from all patterns in the set, and at least one non-deterministic finite automata (NFA) may be generated for at least one pattern in the set, optimizing run time performance of the run time processing.
Description
Background
The Open Systems Interconnection (OSI) reference model defines seven network protocol layers (L1-L7) for communication over a transmission medium. Higher layers (L4-L7) represent end-to-end communications and lower layers (L1-L3) represent local communications.
The network application aware system needs to process, filter and switch network protocol layers ranging from L3 to L7, for example L7 network protocol layers such as hypertext transfer protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), and L4 network protocol layers such as Transmission Control Protocol (TCP). In addition to handling network protocol layers, network application aware systems also need to protect these protocols simultaneously, and access and content based security through the L4-L7 network protocol layers includes wire-speed firewalls, Virtual Private Networks (VPNs), Secure Sockets Layer (SSL), Intrusion Detection Systems (IDS), internet protocol security (IPSec), anti-virus (AV) and anti-spam functions.
The network processor may be used for high throughput L2 and L3 network protocol processing, i.e., performing packet processing for line-rate forwarded packets. Typically, a general purpose processor is used to process L4-L7 network protocols that require more intelligent processing. While a general purpose processor may perform computationally intensive tasks, it does not provide sufficient performance for processing data so that it can be forwarded at wire speed.
Content aware networks need to examine the content of packets at "line speed". The content may be analyzed to determine if there has been a security breach or intrusion. A number of patterns and rules in the form of regular expressions are applied to ensure that all security breaches or intrusions are detected. Regular expressions are a compact method that describes patterns in strings. The simplest pattern of regular expression matching is a single character or string of characters, e.g.,/c/or/cat. Regular expressions also include operators and meta-characters with special meanings.
By using meta-characters, regular expressions can be used for more complex searches, such as "abc. That is, the string "abc" is found to be followed by the character "xyz" but the number of characters between "abc" and "xyz" is not limited. Another example is the regular expression "abc.. xyz; ", that is, the string" abc "is found to be followed by two characters, followed by the string" abc "and an unlimited number of characters, followed by the string" xyz ".
Intrusion Detection System (IDS) applications examine the contents of all individual packets flowing through the network and identify suspicious patterns that may indicate attempts to break into or compromise the system. One example of a suspicious pattern may be a particular text string followed by 100 characters in a packet, followed by another particular text string.
Content searching is typically performed using search methods such as Deterministic Finite Automata (DFA) or non-deterministic finite automata (NFA) to process regular expressions.
Disclosure of Invention
Embodiments of the present invention provide a method, apparatus, computer program product and corresponding system for compilation and run-time processing of finite automata.
According to one embodiment, a method may select, in a security device operatively coupled to a network, a sub-pattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic in at least one processor operatively coupled to at least one memory. The method may generate a unified Deterministic Finite Automaton (DFA) using sub-patterns selected from all patterns in the set. The method may generate at least one non-deterministic finite automaton (NFA) for at least one pattern in the set. The portion of the at least one pattern used to generate the at least one NFA and the at least one traversal direction for runtime processing of the at least one NFA may be determined based on whether a length of the selected sub-pattern is fixed or variable and a location of the selected sub-pattern within the at least one pattern. The method may store the generated unified DFA and the at least one NFA in at least one memory.
The at least one heuristic may include maximizing a number of unique sub-patterns selected and a length of each sub-pattern selected.
The method may determine whether the length of the selected sub-pattern is fixed or variable.
The method may calculate a hash value for each sub-pattern selected and store each hash value calculated in association with an identifier of the pattern from which the sub-pattern was selected.
The at least one heuristic may comprise maximizing a number of unique sub-patterns selected, and the method may comprise calculating a hash value of the selected sub-pattern for each pattern in the set. The method may compare the computed hash value to a list of hash values of sub-patterns selected for other patterns in the set. If the computed hash value is found in the list, the method may determine whether to replace (i) the selected sub-pattern with another sub-pattern from the pattern or (ii) the selected alternative (alternate) sub-pattern from other patterns in the set with another pattern selected sub-pattern in the set. Other patterns in the set may be identified based on an association in the list with the computed hash value. The determination of whether to replace (i) or (ii) may be based on comparing the lengths of the sub-patterns considered for replacement in order to maximize the length of the selected unique sub-pattern.
The at least one heuristic may include identifying sub-patterns of each pattern and ignoring a given sub-pattern of the identified sub-patterns of each pattern if the given sub-pattern has a length less than a minimum threshold.
The at least one heuristic may include accessing a knowledge base of sub-patterns associated with the historical usage frequency indicator and ignoring a given sub-pattern if the historical usage frequency indicator for the given sub-pattern of the identified sub-patterns for each pattern in the accessed knowledge base is greater than or equal to a usage frequency threshold.
The at least one heuristic may include identifying sub-patterns for each pattern and maximizing a number of consecutive text characters in the selected sub-pattern for each pattern by selecting the given sub-pattern based on the given sub-pattern having a maximum number of consecutive text characters in the identified sub-pattern and based on the given sub-pattern being unique among all sub-patterns selected for the set of one or more regular expressions.
The at least one heuristic may comprise: setting a priority for a given sub-mode of each of the given sub-modes of each mode based on the sub-mode type of each sub-mode and the length of the given sub-mode, the longer length sub-mode being prioritized over another sub-mode of lesser length, selecting a unique sub-mode as a selected sub-mode based on the priority setting, the selected unique sub-mode having a length of at least a minimum length threshold, and selecting a non-unique sub-mode as the selected sub-mode based on the priority setting if none of the given sub-modes is unique and has a length of at least the minimum length threshold.
The at least one heuristic may include prioritizing a given sub-mode based on a sub-mode type for each of the given sub-modes of each mode, the sub-mode type being text-only, alternative, single character repeat, or multiple character repeat, and the highest to lowest order of priority for the sub-mode types being text-only, alternative, single character repeat, and multiple character repeat.
If the first element of the selected sub-pattern is a first element of at least one pattern and the length of the selected sub-pattern is fixed, the position of the selected sub-pattern may be a start position of the at least one pattern and the portion of the at least one pattern used to generate the at least one NFA may be at least one pattern excluding the selected sub-pattern. The at least one NFA may be a single NFA, and the at least one traversal direction of the at least one NFA may be a forward traversal direction.
Generating the unified DFA may include: associating a DFA node of the generated unified DFA associated with a last element of the selected sub-pattern with metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a starting node of the generated at least one NFA, instructions for transitioning to traversing the at least one NFA in a forward traversal direction and reporting a match of the selected sub-pattern, a leading offset within the payload of a leading character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and a length of the selected sub-pattern. The start node of the at least one NFA may be associated with a first element of the at least one mode used to generate the portion of the at least one NFA, and the payload start offset of the at least one NFA may be associated with a byte after another byte of the end offset of the selected sub-mode.
Generating at least one NFA may include: associating the NFA node of the generated at least one NFA with metadata indicating to the walker instructions for terminating the traversal and reporting a lag offset within the payload of a lag character matching at the NFA node as an end offset of the at least one pattern and a final match of the at least one pattern, the NFA node being associated with a last element of the at least one pattern.
The position of the selected sub-pattern may be an intermediate position of the at least one pattern if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern. The portion of the at least one mode used to generate the at least one NFA may include a lag portion and a leading portion of the at least one mode if the length of the selected sub-mode is fixed. The lag portion of the at least one mode may be the at least one mode excluding the selected sub-mode and the leading portion of the at least one mode. The leading portion of at least one mode may exclude selected sub-modes and the lagging portion of at least one mode. The at least one NFA may include a lagging NFA and a leading NFA, the at least one traversal direction may include a forward traversal direction and a reverse traversal direction, the lagging NFA may have a forward traversal direction, and the leading NFA may have a reverse traversal direction. The lagging portion of at least one mode can be used to generate a lagging NFA and the leading portion of at least one mode can be used to generate a leading NFA.
Generating the unified DFA may include: associating the generated unified DFA node with metadata, the DFA node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of a lagging NFA, instructions for transitioning to traversing the lagging NFA in a forward traversal direction, the starting node of the lagging NFA associated with a first element of a lagging portion, a payload start offset of the lagging NFA associated with a byte following an end offset of the selected sub-pattern, a pointer to a starting node of a leading NFA, instructions for transitioning to traversing the leading NFA in a reverse traversal direction and reporting an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as the end offset of the selected sub-pattern, a match of the selected sub-pattern, and a length of the selected sub-pattern, the start node of the preamble NFA is associated with the last element of the preamble section, and the payload start offset of the preamble NFA is determined by subtracting the length of the selected sub-pattern from the end offset of the selected sub-pattern.
Generating at least one NFA may include: associating a hysteresis node of the hysteresis NFA with metadata, the hysteresis node associated with a last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload, instructions for terminating traversal of the generated hysteresis NFA and reporting a hysteresis offset of a hysteresis character of the payload within the payload that matches the last element at the hysteresis node and a match of a hysteresis portion of the at least one pattern. Generating at least one NFA may include: associating a leading node of the leading NFA with metadata, the leading node being associated with a first element of the at least one pattern, the metadata indicating instructions to the walker to terminate traversal of the generated leading NFA and report a match of a leading portion of the at least one pattern and report a leading offset within the payload of a leading character of the payload matching the first element at the leading node as a start offset of the at least one pattern if required by a qualifier associated with the at least one pattern.
The position of the selected sub-pattern may be an intermediate position of the at least one pattern if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern may not be the last element of the at least one pattern, and the position of the selected sub-pattern may be a start position of the at least one pattern if the first element of the selected sub-pattern is the first element of the at least one pattern. The portion of the at least one pattern used to generate the at least one NFA may include the hysteresis portion and the entire portion of the at least one pattern if the length of the sub-pattern is fixed or variable. The lagging portion of the at least one mode may be the at least one mode excluding the leading portion of the at least one mode. The preamble part may include a first element of the at least one pattern, a last element of the selected sub-pattern, and all elements in the at least one pattern between them. The whole part of the at least one pattern may be the at least one pattern, and the preamble part is the selected sub-pattern if the position of the selected sub-pattern is the start position. The at least one NFA may include a lagging NFA and an umbrella NFA. The at least one traversal direction may include a forward traversal direction and a reverse traversal direction. The lagging NFA may have a forward direction of traversal and the umbrella NFA may have a reverse direction of traversal. The hysteresis portion of the at least one mode can be used to generate a hysteresis NFA, and the entire portion of the at least one mode can be used to generate an umbrella NFA.
Generating the unified DFA may include: associating a DFA node of the generated unified DFA with metadata, the DFA node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a start node of the lagging NFA, instructions to transition to traversing the lagging NFA in a forward traversal direction and reporting a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and reporting a length if the length of the selected sub-pattern is fixed, the start node of the lagging NFA associated with the first element of the lagging section.
Generating at least one NFA may include: associating a lagging node of the generated lagging NFA with metadata, the lagging node associated with a last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a starting node of the umbrella NFA, instructions to transition to traversing the umbrella NFA in a reverse traversal direction and optionally reporting an offset within the payload of characters matching the last element of the at least one pattern at the lagging node and optionally reporting a match of a lagging portion of the at least one pattern, the starting node of the umbrella NFA associated with the last element of the at least one pattern. Generating at least one NFA may include: associating umbrella nodes of the generated umbrella NFA with metadata, the umbrella nodes being associated with first elements of the at least one pattern, the metadata indicating instructions to the walker to terminate the traversal and report a final match of the at least one pattern and, if a qualifier associated with the at least one pattern requires, report a start offset within the payload of a start character matching the first elements of the at least one pattern at the umbrella nodes as a start offset of the at least one pattern.
The position of the selected sub-pattern may be an intermediate position of the at least one pattern if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, and the position of the selected sub-pattern may be a start position of the at least one pattern if the first element of the selected sub-pattern is the first element of the at least one pattern. The portion of the at least one pattern used to generate the at least one NFA may include a lag portion and a leading portion of the at least one pattern if the length of the sub-pattern is fixed or variable. The lagging portion of the at least one mode may be the at least one mode excluding the leading portion of the at least one mode. The preamble part may include a first element of the at least one pattern, a last element of the selected sub-pattern, and all elements in the at least one pattern between them. The hysteresis part may be the selected sub-pattern if the position of the selected sub-pattern is the start position. The at least one NFA may include a lagging NFA and a leading NFA. The at least one traversal direction may include a forward traversal direction and a reverse traversal direction. The lagging NFA may have a forward traversal direction and the leading NFA may have a reverse traversal direction. The hysteresis portion of at least one mode may be used to generate a hysteresis NFA. The preamble portion of at least one pattern may be used to generate a preamble NFA.
Generating the unified DFA may include: associating a DFA node of the generated unified DFA with metadata, the DFA node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a starting node of a lagging NFA, instructions for transitioning to traversing the lagging NFA in a forward traversal direction, the starting node of the lagging NFA associated with a first element of a lag portion, a pointer to a starting node of a leading NFA, instructions for transitioning to traversing the leading NFA in a reverse traversal direction and reporting a match of the selected sub-pattern and an offset within the payload of a character at a DFA node that matches a last element of the selected sub-pattern as an end offset of the selected sub-pattern and reporting a length if the length of the selected sub-pattern is fixed, a start node of the leading NFA being associated with the last element of the selected sub-pattern.
Generating at least one NFA may include: associating a hysteresis node of the generated lagging NFA with metadata, the hysteresis node associated with a last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload instructions for terminating traversal of the lagging NFA and reporting a hysteresis offset within the payload of a hysteresis character at the hysteresis node that matches the last element of the at least one pattern and reporting a match of a hysteresis portion of the at least one pattern. Generating at least one NFA may include: associating a leading node of the generated leading NFA with metadata, the leading node associated with a first element of the at least one pattern, the metadata indicating to the walker instructions for terminating traversal of the leading NFA and reporting a leading offset within the payload of a leading character at the leading node that matches the first element of the at least one pattern.
The position of the selected sub-pattern may be an intermediate position of the at least one pattern if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern. At least one NFA may be a single NFA if the length of the selected sub-pattern is fixed or variable. The at least one traversal direction may include a forward traversal direction for a runtime processing node of the single NFA associated with an element of the lag portion of the at least one mode and a reverse traversal direction for a runtime processing node of the single NFA associated with all elements of the at least one mode. The hysteresis part of the at least one pattern may be the at least one pattern excluding a leading part of the at least one pattern, and the leading part may include a first element of the at least one pattern, a last element of the selected sub-pattern, and all elements in the at least one pattern between them.
Generating the unified DFA may include: associating a DFA node of the generated unified DFA with metadata, the node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a starting node of the single NFA, instructions to transition to traversing the generated single NFA in a forward traversal direction and reporting a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and reporting a length if the length of the selected sub-pattern is fixed, the starting node associated with a next element in the at least one pattern that immediately follows the last element of the selected sub-pattern.
Generating at least one NFA may include: associating a hysteresis node of the single NFA with metadata, the hysteresis node associated with a last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload, instructions for transitioning to traversing the single NFA in a reverse traversal direction with a payload start offset associated with an end offset of the selected sub-pattern. Generating at least one NFA may include: associating a leading node of the single NFA with metadata, the leading node associated with a first element of the at least one pattern, the metadata indicating to the walker instructions to terminate the traversal and report an offset within the payload of characters of the leading node that match the first element of the at least one pattern as a starting offset of the at least one pattern and a final match of the at least one pattern if required by a qualifier associated with the at least one pattern.
The position of the selected sub-pattern may be an intermediate position of the at least one pattern if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern. At least one NFA may be a single NFA if the length of the selected sub-pattern is fixed. The at least one traversal direction may include a reverse traversal direction for a runtime processing node of a single NFA associated with a leading portion of the at least one pattern and a forward traversal direction for a runtime processing node of a single NFA associated with all elements of the at least one pattern. The leading portion may be at least one mode excluding a lagging portion of the at least one mode. The hysteresis part may include a first element of the selected sub-pattern, a last element of the at least one pattern, and all elements in the at least one pattern between them.
Generating the unified DFA may include: associating a DFA node of the generated unified DFA with metadata, the node being associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a start node of the single NFA, instructions to transition to traversing the generated single NFA in a reverse traversal direction and reporting a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and a length of the selected sub-pattern, the start node being associated with the last element of the preamble portion, the payload start offset being determined by subtracting the length of the selected sub-pattern from the end offset of the selected sub-pattern.
Generating at least one NFA may include: associating a leading node of the single NFA with metadata, the leading node associated with a first element of the at least one pattern, the metadata indicating instructions to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload to traverse the single NFA in a forward traversal direction. Generating at least one NFA may include: associating a hysteresis node of the single NFA with metadata, the hysteresis node associated with a last element of the at least one pattern, the metadata indicating to the walker instructions to terminate the traversal and report an offset within the payload of characters at the hysteresis node that match the last element of the at least one pattern and a final match of the at least one pattern.
The position of the selected sub-pattern may be an end position of the at least one pattern if a first element of the selected sub-pattern is a last element of the at least one pattern, and the portion of the at least one pattern used to generate the at least one NFA may be the at least one pattern excluding the selected sub-pattern if a length of the selected sub-pattern is fixed, and the at least one traversal direction may be a reverse traversal direction.
Generating the unified DFA may include: associating the DFA node with metadata, the DFA node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a start node of the generated at least one NFA, instructions to transition to traversing the generated at least one NFA in a reverse traversal direction and reporting a match of the selected sub-pattern and an offset within the payload of characters at the DFA node that match the last element of the selected sub-pattern as an end offset of the selected sub-pattern, the start node of the generated at least one NFA associated with the last element of the portion, the payload start offset of the generated at least one NFA determined by subtracting a length of the selected sub-pattern from the end offset of the selected sub-pattern.
Generating at least one NFA may include: associating the NFA node associated with the first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters matching the first element of the portion at the NFA node as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
The position of the selected sub-pattern may be an end position of the at least one pattern if a first element of the selected sub-pattern is a last element of the at least one pattern, and the portion of the at least one pattern used to generate the at least one NFA may be the at least one pattern and the at least one traversal direction may be a reverse traversal direction if a length of the selected sub-pattern is variable or fixed.
Generating the unified DFA may include: associating the DFA node with metadata, the DFA node associated with a last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload a pointer to a start node of the generated at least one NFA, instructions to transition to traversing the generated at least one NFA in a reverse traversal direction and reporting a match of the selected sub-pattern and an offset within the payload of a character of the DFA node matching the last element of the selected sub-pattern as an end offset of the selected sub-pattern and reporting a length if the length of the selected sub-pattern is fixed, the start node of the generated at least one NFA associated with the last element of the selected sub-pattern, the payload start offset of the generated at least one NFA associated with the end offset of the selected sub-pattern.
Generating at least one NFA may include: associating the NFA node associated with the first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with the payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters matching the first element of the portion at the NFA node as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
Storing the generated unified DFA and the at least one NFA in the at least one memory may include generating a binary image including the generated unified DFA and the at least one NFA.
The at least one processor may include a DFA coprocessor and an NFA coprocessor configured as acceleration units to offload DFA and NFA runtime processing, respectively.
Another example embodiment disclosed herein includes an apparatus corresponding to the operation consistent with the apparatus embodiments disclosed herein.
Additionally, yet another example embodiment may include a non-transitory computer readable medium having stored thereon a sequence of instructions which, when loaded and executed by a processor, causes the processor to perform the method disclosed herein.
Drawings
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the invention.
FIG. 1 is a block diagram of one embodiment of a security device in which embodiments disclosed herein may be implemented.
Fig. 2A-G are example NFA and DFA graphs and tables illustrating the graph expansion concept.
FIG. 3A is another block diagram of one embodiment of a security device in which embodiments disclosed herein may be implemented.
Fig. 3B is a flow diagram of one example embodiment of a method that may be implemented in at least one processor operatively coupled to at least one memory in a security device operatively coupled to a network.
FIG. 4 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on a selected sub-pattern having a fixed length and a selected sub-pattern position being a start position of a regular expression pattern.
FIG. 5 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being an intermediate position of a regular expression pattern and the length of the selected sub-pattern being fixed.
FIG. 6 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being a middle or starting position of a regular expression pattern and the length of the sub-pattern being fixed or variable.
FIG. 7 is a block diagram of another embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being an intermediate or starting position of a regular expression pattern and the length of the selected sub-pattern being fixed or variable.
FIG. 8 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being an intermediate position of a regular expression pattern and the length of the selected sub-pattern being fixed or variable.
FIG. 9 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being an intermediate position of a regular expression pattern and the length of the selected sub-pattern being fixed.
FIG. 10 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being the end position of a regular expression pattern and the length of the selected sub-pattern being fixed.
FIG. 11 is a block diagram of one embodiment for generating a unified DFA and at least one NFA based on the position of a selected sub-pattern being the end position of a regular expression pattern and the length of the selected sub-pattern being fixed or variable.
FIG. 12 is a block diagram of an example internal structure of a computer, optionally within one embodiment disclosed herein.
Detailed Description
Before describing in detail the exemplary embodiments of the present invention, an exemplary security application in which these embodiments may be implemented and an exemplary security application using typical processes of Deterministic Finite Automata (DFA) and non-deterministic finite automata (NFA) are described immediately below to assist the reader in understanding the inventive features of the present invention.
FIG. 1 is a block diagram of one embodiment of a security device 102 in which embodiments of the present invention may be implemented. The security device 102 may include a network services processor 100. The security device 102 may be a separate system that may switch packets received at one network interface 103a to another network interface 103b and may perform multiple security functions on the packets before forwarding the received packets. For example, the security device 102 may be used to perform security processing on a packet 101a that may be received over a Wide Area Network (WAN)105a or any other suitable network before forwarding the processed packet 101b to a Local Area Network (LAN)105b or any other suitable network.
Network services processor 100 may be configured to handle the Open Systems Interconnection (OSI) network layer L2-L7 protocols encapsulated in received packets. As known to those skilled in the art, the OSI reference model defines seven network protocol layers (L1-L7). The physical layer (L1) represents the actual electrical and physical interface that connects the device to the transmission medium. The data link layer (L2) performs data framing. The network layer (L3) formats the data into packets. The transport layer (L4) handles end-to-end transport. The session layer (L5) manages communication between devices, e.g., whether the communication is half duplex or full duplex. The presentation layer (L6) manages data formatting and presentation, such as syntax, control code, special graphics and character sets. The application layer (L7) allows communication between users, such as file transfer and email.
Network services processor 100 may schedule and queue work (e.g., packet processing operations) for higher level network protocols, such as L4-L7, and enable processing of the higher level network protocols in received packets to be performed to forward the packets at wire speed (i.e., the data transfer rate of the network through which data may be sent and received). By processing the protocol to forward packets at wire speed, the network services processor 100 does not slow down the network data transfer rate. The network service processor 100 may receive packets from the network interface 103a or 103b, which may be a physical hardware interface, and perform L2-L7 network protocol processing on the received packets. Network services processor 100 may then forward processed packet 101b through network interface 103a or 103b to another hop in the network, to a final destination, or through another bus (not shown) for further processing by a host processor (not shown). Network protocol processing may include processing network security protocols such as firewalls, application firewalls, Virtual Private Networks (VPNs) including IP security (IPSec) and/or Secure Sockets Layer (SSL), Intrusion Detection Systems (IDS), and anti-virus (AV).
Network server processor 100 may use multiple processors (i.e., cores) to deliver high application performance. Each of the cores (not shown) may be dedicated to performing data plane or control plane operations. The data plane operations may include packet operations for forwarding packets. The control plane operations may include processing portions of complex higher level protocols, such as internet protocol security (IPSec), Transmission Control Protocol (TCP), and Secure Sockets Layer (SSL). The data plane operations may include processing other portions of these complex higher-level protocols.
Network services processor 100 may also include a dedicated coprocessor (not shown) of an offload (offload) core so that network services processor 100 achieves high throughput. For example, the network services processor 100 may include an acceleration unit 106 that may include a hardware accelerated hyper non-deterministic automata (HNA) co-processor 108 for NFA processing and a hardware accelerated Hyper Finite Automata (HFA) co-processor 110 for DFA processing. The HNA 108 and HFA 110 co-processors may be configured to bypass the burdensome burden of a general purpose core (not shown) of the network services processor 100 to perform computation and memory intensive pattern matching methods.
The network services processor 100 may perform pattern searching, regular expression processing, content verification, transformation, and security accelerated packet processing. Regular expression processing and pattern searching can be used to perform string matching for AV and IDS applications, as well as other applications that require string matching. A memory controller (not shown) in network services processor 100 may control access to memory 104 operatively coupled to the network services processor 100. The memory may be internal (i.e., on-chip) or external (i.e., off-chip) or a combination thereof and may be configured to store received data packets, such as packet 101a, for processing by network services processor 100. The memory may be configured to store compiled rule data for lookup and pattern matching in DFA and NFA graph expression retrieval. The compiled rule data may be stored as a binary image 112 that includes the compiled rule data for the DFA and NFA, or as multiple binary images that separate the DFA compiled rule data from the NFA compiled rule data.
Typical content aware application processing may use DFAs or NFAs to identify patterns in the content of received packets. Both DFA and NFA are finite automata, i.e., computational models, each including a set of states, a starting state, an input letter (a set of all possible characters), and a transition function. The calculation starts in the start state and changes to a new state according to a transition function.
Patterns are commonly expressed using regular expressions that include atomic elements, e.g., common text characters such as A-Z, 0-9, and meta characters such as ^ a, |, and |. The atomic elements of a regular expression are the symbols to be matched (single characters). Atomic elements may be combined with meta-characters that allow concatenation (+), alternative (|), and clinet ("). The meta-characters used for concatenation may be used to create multiple character matching patterns from a single character (or sub-string), while the meta-characters used for alternatives (|) may be used to create regular expressions that may match any sub-string in two or more sub-strings. The meta character clinstar (×) allows pattern matching any number of times, including the absence of previous characters or strings.
Combining different operators and single characters allows the construction of complex sub-patterns of expressions. For example, a sub-pattern, such as (th (is | at)) may match multiple strings, such as: th, this, that, thisis, thisat, thatis, or thatat. Another example of a complex sub-pattern of an expression may be a sub-pattern that incorporates a character class construct [ … ] that allows listing a list of characters sought. For example, gr [ ea ] y looks for both grey and gray. Other examples of complex submodes are submodes that may use dashes to indicate a range of characters, such as [ A-Z ] or a source character that matches any one character "". The elements of the schema may be atomic elements or a combination of one or more atomic elements and one or more meta-character combinations.
The input to the DFA or NFA state machine is typically a string of (8-bit) bytes from the input stream (i.e., the received packet), that is, the alphabet may be a single byte (one character or symbol). Each byte in the input stream may produce a transition from one state to another. The states and transition functions of the DFA or NFA state machine may be represented by graphs. Each node in the graph may represent a state and the arcs in the graph may represent state transitions. The current state of the state machine may be represented by a node identifier that selects a particular node in the graph.
Using DFA to process regular expressions and find one or more patterns of regular expression descriptions in the input character stream can be characterized as having deterministic runtime performance. The next state of the DFA can be determined from the input character (or symbol) and the current state of the DFA, since there is only one state transition per DFA state. In this way, the runtime performance of the DFA is considered deterministic and the behavior can be fully predicted from the inputs. However, a tradeoff for certainty is a graph in which the number of nodes (or graph size) can grow exponentially with the size of the pattern.
In contrast, the number of nodes (or pattern size) that can characterize an NFA pattern grows linearly with the size of the pattern. However, using NFA to process regular expressions and find one or more patterns of regular expression descriptions in the input character stream may be characterized as having non-deterministic runtime performance. For example, given an input character (or symbol) and the current state of the NFA, it is possible that there is more than one next state of the NFA to transition to. Thus, the next state of the NFA cannot be uniquely determined from the input and the current state of the NFA. Thus, the runtime performance of NFA is considered non-deterministic, as the behavior cannot be fully predicted from the input.
Fig. 2A-G illustrate the DFA "graph unrolling" concept. FIGS. 2A, 2B, and 2C show graphs for patterns ". sup. > a [ < Lambda >/n ]", ". sup. < Lambda [ < Lambda >/n ] [ < Lambda \ n ]", respectively, and FIGS. 2D, 2E, and 2F show graphs of DFAs for the same patterns, respectively. As shown in fig. 2A-2F and summarized in the table of fig. 2G, NFAs may grow linearly for some modes, while DFAs for the same mode may grow exponentially to produce graph unrolling. As shown, for a given mode or modes, the number of DFA states may be greater than the number of NFA states, typically on the order of hundreds or thousands of states. This is an example of "graph unrolling," which is a flag property of DFAs.
According to embodiments disclosed herein, content searching may be performed using DFAs, NFAs, or a combination thereof. According to one embodiment, a runtime processor, a co-processor, or a combination thereof may be implemented in hardware and may be configured to implement a compiler and a walker (walker).
The compiler may compile a schema or an input list of schemas (also referred to as signatures or rules) into a DFA, NFA, or a combination thereof. The DFA and NFA may be binary data structures such as DFA and NFA graphs and tables.
The walker may perform runtime processing, i.e., actions to identify that a pattern exists in the input stream or to match a pattern with content in the input stream. The content may be a payload portion of an Internet Protocol (IP) datagram or any other suitable payload in the input stream. Runtime processing of a DFA or NFA graph may be referred to as traversing the DFA or NFA graph with payload to determine a pattern match. A processor configured to generate a DFA, NFA, or combination thereof may be referred to herein as a compiler. A processor configured to implement runtime processing of payloads using generated DFAs, NFAs, or a combination thereof may be referred to herein as a walker. In accordance with embodiments disclosed herein, the network services processor 100 may be configured to implement a compiler and a walker in the security device 102.
Fig. 3A is a block diagram of another embodiment of the security device 102 of fig. 1 in which embodiments of the present invention may be implemented. As described with reference to fig. 1, the security device 102 may be operatively coupled to one or more networks and may include a memory 104 and a network services processor 100 that may include an acceleration unit 106. Referring to FIG. 3A, the web services processor 100 may be configured to implement a compiler 306 that generates the binary image 112 and a traverser 320 that uses the binary image 112. For example, compiler 306 may generate binary image 112 that includes compiled rule data used by traverser 302 to perform a pattern matching method on received packet 101a (shown in fig. 1). According to embodiments disclosed herein, the compiler 306 may generate the binary image 112 by determining compiled rule data for a DFA, NFA, or a combination thereof based on at least one heuristic (heuristic), as described further below. Compiler 306 may determine rule data that is advantageously suitable for use with DFAs and NFAs.
According to embodiments disclosed herein, the compiler 306 may generate the binary image 112 by processing a rule set 310, which may include one or more sets of regular expression patterns 304 and optional qualifiers 308. From rule set 310, compiler 306 may generate unified DFA312 using a sub-pattern selected from all of the one or more regular expression patterns and at least one NFA314 for at least one pattern in set of one or more regular expression patterns 304 for use by walker 302 during runtime processing, and generate metadata (not shown) including mapping information for transitioning walker 320 between a state (not shown) of unified DFA312 and a state of the at least one NFA 314. Unified DFA312 and at least one NFA314 may be represented in a data structure as a graph or in any other suitable form, and the mappings in the metadata may be represented in a data structure as one or more tables or in any other suitable form. According to embodiments disclosed herein, if the sub-mode selected from the mode is the mode, the NFA is not generated for the mode. According to embodiments disclosed herein, each NFA generated may be used for a particular pattern in a set, while a unified DFA may be generated based on all sub-patterns from all patterns in the set.
The traverser 320 traverses the unified DFA312 and the at least one NFA314 with the payload by transitioning the states of the unified DFA312 and the at least one NFA based on the consumed bytes from the payload in the received packet 101 a. In this way, the traverser 320 traverses through the unified DFA312 and the at least one NFA314 with the payload.
The rule set 310 may include a set 304 of one or more regular expression patterns and may be in the form of a Perl-compatible regular expression (PCRE) script file or any other suitable form. The PCRE has become a practical rule for regular expression syntax in security and networking applications. As more applications requiring deep packet inspection have emerged or more threats have become prevalent in the internet, the corresponding signatures/patterns used to identify viruses/attacks or applications have also become more complex. For example, signature databases have evolved from having a simple string pattern to a regular expression (regex) pattern with wildcard characters, ranges, character classes, and advanced PCRE signatures.
As shown in FIG. 3A, optional qualifiers 308 may each be associated with a pattern in the set of regular expression patterns 304. For example, an optional qualifier 322 may be associated with the schema 316. The optional qualifiers 308 may each be one or more qualifiers that specify a desired customization, advanced PCRE signature options, or other suitable options for handling the schema associated with the qualifier. For example, qualifier 322 may indicate whether a start offset (i.e., the position in the payload of the first matching character of the pattern that matches in the payload) option in the advanced PCRE signature option for pattern 316 is desired.
As applications emerge, the start offset has become important for processing in Deep Packet Inspection (DPI) systems. Traditionally, only finite automata are required to report the presence or absence of a given pattern in the input and report the end offset of a matching pattern in the payload for processing. 4-11, if qualifier 322 indicates that it is desired to start an offset, compiler 306 may generate binary image 112 in a manner that causes traverser 320 to report (i.e., declare) an offset in the payload of the location of the first matching character of the pattern that matches in the payload.
According to embodiments disclosed herein, compiler 306 may generate unified DFAs 312 using sub-patterns 302 selected from all patterns in set 304 of one or more regular expression patterns. Compiler 306 may select sub-pattern 302 from each pattern in set of one or more regular expression patterns 304 based on at least one heuristic as described further below. Compiler 306 may also generate at least one NFA314 for at least one pattern 316 in the set, and at least one traversal direction for a portion (not shown) of the at least one pattern 316 used to generate the at least one NFA314 and for runtime processing (i.e., traversal) of the at least one NFA314 may be determined based on whether a length of a selected sub-pattern 318 is fixed or variable and a position of the selected sub-pattern 318 within the at least one pattern 316. Compiler 306 may store unified DFA312 and at least one NFA314 in at least one memory 104.
The compiler may determine whether the length of the selected potential sub-pattern is fixed or variable. For example, a sub-pattern, such as "cedf" may be determined to have a fixed length of 4, since "cdef" is a string, while a complex sub-pattern including operators may be determined to have a variable length. For example, a complex sub-pattern, such as "a. times.cd [. sup./n ] {0,10}. times.y" may have "cd [. sup./n ] {0,10 }" as a selected sub-pattern, which may have a variable length of 2 to 12.
According to embodiments disclosed herein, sub-mode selection may be based on at least one heuristic. A sub-pattern is a set of one or more contiguous elements from a pattern, where each element from the pattern may be represented by a node in a DFA or NFA graph for matching bytes or characters from the payload. The element may be a single text character represented by a node or a character class represented by a node as described above. Compiler 306 may determine which sub-patterns in a pattern are better suited for NFA based on whether the sub-patterns may cause excessive DFA graph expansion as described above with reference to fig. 2A-G. For example, generating a DFA from a sub-pattern that includes consecutive text characters will not produce a DFA graph expansion, whereas a complex sub-pattern, as described above, may include operators as well as characters, and thus may cause a DFA graph expansion. Sub-patterns, for example, comprising wildcard characters or larger character classes repeated multiple times (e.g., [ < Lambda >/n ] > or [ < Lambda >/n ] {16})) can cause an excess state in the DFA and thus can be more advantageously suited for NFA.
As disclosed above, selecting a sub-pattern from each pattern in the set of one or more regular expressions 304 may be based on at least one heuristic. According to one embodiment, the at least one heuristic may comprise maximizing the number of unique sub-patterns selected and the length of each sub-pattern selected. For example, a pattern, such as "ab. times.cdef. times.mn", may have a plurality of potential sub-patterns, such as "ab. times.," cdef ", and". times.mn ". The compiler may select "cdef" as a sub-pattern of the pattern because it is the largest sub-pattern in the pattern "ab. However, if a sub-pattern "cdef" has been selected for another pattern, the compiler may select an alternative sub-pattern for the pattern "ab. Alternatively, the compiler may replace the sub-pattern "cdef" with another sub-pattern for other patterns, thereby enabling the sub-pattern "cdef" to be selected for the pattern "ab.
In this way, compiler 306 may select sub-modes of mode 304 based on the context of the possible sub-modes for each mode of mode 304, thereby enabling maximization of the number of unique sub-modes selected and the length of each sub-mode selected. As such, the compiler 306 may generate a unified DFA312 from the selected sub-patterns 302 that minimizes the number of false positives (i.e., no matches or partial matches) in the pattern matching of the at least one NFA314 by increasing the probability of pattern matching in the at least one NFA 314.
By maximizing the sub-pattern length, false positives in NFA processing can be avoided. False positives in NFA processing can cause non-deterministic runtime processing and thus can reduce runtime performance. In addition, by maximizing the number of unique sub-patterns selected, compiler 306 implements a 1: 1 transition.
For example, if multiple schemas share a selected sub-schema, the walker of the unified DFA will need to transition to multiple at least one DFA, since each at least one NFA is a per-schema NFA, and sub-schema matches from the unified DFA represent partial matches for each of the multiple schemas. In this way, maximizing the number of unique submodes reduces DFA: NFA 1: the number of N transitions thereby reduces the runtime processing of the walker 320.
To achieve maximizing the number of unique sub-patterns, compiler 302 may calculate hash value 326 for the selected sub-pattern 318 and store the calculated hash value 326 in association with an identifier (not shown) of the pattern 316 from which the sub-pattern 318 is selected. For example, compiler 306 may calculate a hash value for the selected sub-pattern for each pattern in set 304. The calculated hash values 324 may be stored in the at least one memory 104 as a table or in any suitable manner. The hashing method used may be any suitable hashing method. The compiler may compare the computed hash value to a list of hash values of sub-patterns selected for other patterns in the set to determine if the selected sub-pattern is unique.
If the computed hash value is found in the list, the compiler may determine whether to (i) replace the selected sub-pattern with another sub-pattern from the pattern or (ii) replace the selected sub-pattern with an alternative sub-pattern selected from another pattern in the set with another pattern selected sub-pattern in the set. Another pattern in the set may be identified based on an association with the hash value computed in the list. The determination of whether to replace (i) or (ii) may be based on comparing the lengths of the sub-patterns considered for replacement in order to maximize the length of the selected unique sub-pattern as described above. Alternative selected sub-patterns may include selecting the next longest sub-pattern or the next highest priority sub-pattern identified for a given pattern. The potential sub-patterns may be prioritized based on, for example, the likelihood of causing DFA unrolling or the magnitude of the expected DFA unrolling.
According to embodiments disclosed herein, the at least one heuristic may include identifying sub-patterns for each mode and ignoring a given sub-pattern of the identified sub-patterns for each mode if the given sub-pattern has a length less than a minimum threshold. For example, to reduce false positives in at least one NFA, the compiler may ignore sub-patterns having a length less than a minimum threshold, as such sub-patterns may result in a higher probability of false positives in at least one NFA.
The at least one heuristic may include accessing a knowledge base (not shown) of sub-patterns associated with the historical usage frequency indicator and ignoring a given sub-pattern if the historical usage frequency indicator for the given sub-pattern of the identified sub-patterns for each pattern in the accessed knowledge base is greater than or equal to a usage frequency threshold. For example, an application or protocol specific sub-mode may have a high frequency of use, such as payload for hypertext transfer protocol (HTTP), "carrier return line feed," or a clear traffic (clear traffic) such as multiple consecutive 0 s from a binary file, or any other frequently used sub-mode.
The at least one heuristic may include identifying sub-patterns for each pattern, and maximizing a number of consecutive text characters in a selected sub-pattern by, for each pattern, selecting the given sub-pattern based on the given sub-pattern having a maximum number of sub-patterns identified and based on the given sub-pattern being unique among all sub-patterns selected for the set of one or more regular expressions. As disclosed above, maximizing the length of the selected sub-pattern may enable a higher probability of a match in at least one NFA.
The at least one heuristic may include prioritizing a given sub-pattern based on a sub-pattern type of each sub-pattern of a given sub-pattern of each pattern and a length of the given sub-pattern. The sub-pattern type may be text-only, alternative (alternative), single character repetition, or multi-character repetition, and the highest to lowest priority for the sub-pattern type may be text-only, alternative (alternative), single character repetition, and multi-character repetition. In this way, a sub-pattern that is a text string having a length of at least a minimum length threshold may be prioritized over a complex sub-pattern of variable length.
Compiler 306 may cause the longer length sub-mode to be prioritized over another sub-mode of lesser length. Compiler 306 may select a unique sub-mode as the selected sub-mode based on the priority setting. As described above, the selected unique sub-pattern may have a length of at least a minimum length threshold.
If a given sub-mode is not unique and has a length of at least a minimum length threshold, compiler 306 may select a non-unique sub-mode as the selected sub-mode based on a priority setting. In this way, compiler 306 may select a sub-pattern from a pattern that is a repetition of a sub-pattern selected from another pattern, rather than selecting a sub-pattern having a length less than a minimum threshold. To facilitate finalizing the sub-patterns, compiler 306 may perform multiple passes on the patterns and order the possible sub-patterns by length. As such, compiler sub-pattern selection for a given pattern in the set of one or more regular expressions 304 may be performed within the context of sub-pattern selection for other patterns in the set of one or more regular expressions 304.
As described above, qualifier 322 may indicate that a report start offset is desired. However, the start offset may not be readily identifiable. Finding a start offset in the payload that matches a pattern such as "a. times.b" or "a. times.d" for example, may be difficult given a payload such as "axycamb", because two patterns may match "axycamb" and "amb". As such, it may be necessary to track the offsets for the two instances of "a" in the payload as potential start offsets. According to embodiments disclosed herein, there is no need to track the potential start offset, since the start offset is not determined until a match for the entire pattern is determined to have been found in the payload. Matches that determine the entire pattern may be found using match results from the unified DFA, the at least one NFA, or a combination thereof.
According to embodiments disclosed herein, if the payload in the received packet 101 includes content that matches the sub-pattern 318 selected from the pattern 316, the walker may transition to traverse at least one NFA for the pattern 318. The walker 320 may report the match and offset for the selected sub-pattern 318 identifying the position in the received packet of the last character of the matching sub-pattern as the ending offset for the sub-pattern in the payload. If the sub-pattern is a subset of the patterns, the sub-pattern matching may be a partial matching for the patterns. As such, the traverser 320 may continue to search the payload for the remainder of the pattern by traversing at least one NFA for the pattern to determine a final match for the pattern. It should be appreciated that the pattern may traverse one or more payloads in the received packet 101 a.
Fig. 3B is a flow diagram (350) of one example embodiment of a method that may be implemented in at least one processor operatively coupled to at least one memory in a security device operatively coupled to a network. The method may begin (352) and select a sub-pattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic (354). The method may generate a unified deterministic automaton (DFA) (356) using the selected sub-patterns from all the patterns in the set. The method may generate at least one non-deterministic automaton (NFA) for at least one pattern in the set, the portion of the at least one pattern used to generate the at least one NFA and the at least one traversal direction for runtime processing of the at least one NFA being determined based on whether a length of the selected sub-pattern is fixed or variable and a position of the selected sub-pattern within the at least one pattern (358). The method may store the generated unified DFA and the at least one NFA in at least one memory (360). The method then ends in the example embodiment (362).
As disclosed above, compiler 306 may generate unified DFA312 and at least one NFA314 to enable traverser 320 to search received packet 101a for a match for set 304 of one or more regular expression patterns. Compiler 306 may select a sub-pattern from each pattern in set 304 of one or more regular expression patterns based on at least one heuristic. A unified DFA312 may be generated using sub-patterns 302 selected from all patterns in the set 304. Compiler 306 may generate at least one NFA314 for at least one pattern 316 in set 304. The portion of the at least one pattern used to generate the at least one NFA314 and the at least one traversal direction for runtime processing of the at least one NFA314 may be determined based on whether the length of the selected sub-pattern 318 is fixed or variable and the position of the selected sub-pattern 318 within the at least one pattern 316 as disclosed below with reference to fig. 4-11.
Fig. 4 is a block diagram 400 for generating a unified DFA312 and at least one NFA314 based on a fixed length of a selected sub-pattern 404 and a position of the selected sub-pattern being a starting position of at least one pattern 406. As shown in FIG. 4, a first element 408 of the selected sub-pattern 404 is a first element of at least one pattern 406. The portion 410 of the at least one pattern used to generate the at least one NFA402 may be the at least one pattern 406 that excludes the selected sub-pattern 404. At least one NFA314 may be a single NFA402 and at least one traversal direction of at least one NFA314 may be a forward traversal direction 412. For example, for a given mode, such as "cavium," the forward traversal direction will traverse the input payload through the nodes of at least one NFA314 in a traversal direction from "c" to "m," while the reverse traversal direction will traverse the input payload in a traversal direction from "m" to "c.
According to the example embodiment of fig. 4, compiler 306 may associate DFA nodes 414 of unified DFA312 associated with last element 416 of selected sub-pattern 404 with metadata 418. The metadata 418 may indicate a pointer 420 to a start node 422 of a single NFA402 to a traverser 320 configured to traverse the unified DFA312 and at least one NFA314 with a payload 426. Metadata 418 may include instructions to transition to traverse a single NFA402 in forward traversal direction 412. A starting node 422 of a single NFA402 may be associated with a first element 424 of at least one pattern 406 used to generate portion 410 of a single NFA 402. The metadata 418 may indicate to the walker 320 to report the matching of the selected sub-pattern 404, the leading offset (of offsets 428) of the leading character (of characters 430) within the payload 426 that matches the last element 416 of the selected sub-pattern 404 at the DFA node 414 as the ending offset of the selected sub-pattern, and the length of the selected sub-pattern. The payload start offset used to traverse a single NFA402 may be the offset of the byte following the byte of the end offset in the payload 426. For example, the next character in the payload to begin traversal of a single NFA402 at start node 422 may be determined to be the byte following the byte of the end offset in the payload. Since the length of the selected sub-mode is fixed, compiler 306 may determine the length of the selected sub-mode and include it in metadata 418. The traverser 320 may use the length included in the metadata 418 to determine a start offset of the pattern 406 within the payload 426. For example, if a qualifier in qualifier 308 requires, traverser 320 may determine the start offset by subtracting the length included in metadata 418 from the determined end offset.
It should be appreciated that reporting may be performed in any suitable manner. For example, walker 320 may report the ending offset by declaring the ending offset to network service processor 100, e.g., by writing to a memory location, triggering an interrupt, sending or publishing a message, etc. Alternatively, the traverser 320 may report the end offset or any other offset or information based on the matching results by declaring the end offset or other established results in its own data structure for use in the traverser's own process.
According to the example embodiment of fig. 4, compiler 306 may associate NFA node 432 of the generated single NFA with metadata 434 that indicates to the walker an instruction to terminate the traversal because a final match for the entire pattern 406 has been identified. NFA node 432 may be associated with a last element 436 of at least one pattern 406. Metadata 434 may instruct traversator 320 to report the lag offset (in offset 428) of the lag character (in character 430) within payload 426 that matched at NFA node 432 as the ending offset of at least one pattern 406 and the final match of at least one pattern 406.
The traversator 320 can associate each traversal for a given schema with a transaction identifier. In this way, sub-pattern lengths, payload character offsets, and pattern matching results may be reported in association with corresponding transaction identifiers. In an example embodiment, the network service processor 100 may correlate the walker result information for a given pattern based on the transaction identifier used to seek traversal of the given pattern.
Fig. 5 is a block diagram 500 for generating a unified DFA312 and at least one NFA314 based on the position of a selected sub-pattern 504 being an intermediate position of at least one pattern 506 and the length of the selected sub-pattern 504 being fixed. According to the example embodiment of fig. 5, the portion of the at least one pattern 506 used to generate the at least one NFA314 includes a lag portion 508 and a leading portion 510 of the at least one pattern 506. As shown in fig. 5, the lag portion 508 of the at least one mode 506 may be the at least one mode 506 excluding the selected sub-mode 506 and the leading portion 510 of the at least one mode 506. The leading portion 510 of the at least one pattern 506 excludes the selected sub-pattern 506 and the lagging portion 508 of the at least one pattern 506.
According to the example embodiment of fig. 5, at least one NFA314 includes a lagging NFA 512 and a leading NFA 514. The at least one traversal direction includes a forward traversal direction 516 and a reverse traversal direction 518. The lagging NFA 512 may be traversed in a forward traversal direction 516, and the leading NFA514 may be traversed in a reverse traversal direction 518. The lagging portion 508 of the at least one pattern 506 can be used to generate a lagging NFA 512 and the leading portion 510 of the at least one pattern 506 can be used to generate a leading NFA 514.
According to the example embodiment of fig. 5, compiler 306 may associate DFA nodes 515 of unified DFA312 with metadata 520 having a last element 522 of selected sub-pattern 504. The metadata 520 may indicate to a walker configured to traverse the unified DFA312 and the at least one NFA314 with a payload, such as the payload 426 of fig. 4. The metadata 520 may include a pointer 524 to a start node 526 of the lagging NFA 512, instructions to transition the traverser 320 to traverse the lagging NFA 512 in the forward traversal direction 516 with the payload starting at an offset of a byte following the byte ending the offset in the payload 426. The start node 526 of the late NFA 512 may be associated with a first element 528 of the late portion 508. The metadata 520 may indicate a pointer 530 to a starting node 532 of the leading NFA514 and instructions for the walker 320 to transition to traverse the leading NFA514 in the reverse traversal direction 518. The starting node 532 of the leading NFA514 may be associated with the last element 534 of the leading portion 510. Metadata 520 may indicate to walker 320 the offset of the character (of character 430) within payload 426 (of offset 428) reported at DFA node 515 that matches the last element of the selected sub-pattern 522 as the ending offset of the selected sub-pattern 504, the matching of the selected sub-pattern, and the length of the selected sub-pattern. The traversator 320 may use the length included in the metadata 520 to determine a starting payload offset for starting the reverse traversal at the start node 532 by subtracting the length of the selected sub-pattern in the metadata 520 from the ending offset of the selected sub-pattern 504.
According to the example embodiment of fig. 5, compiler 306 may associate hysteresis node 536 of hysteresis NFA 512 associated with last element 538 of at least one pattern 506 with metadata 540. Metadata 540 may indicate to walker 320 an instruction to terminate traversing lagging NFA 512 and report the lagging offset (in offset 428) within payload 426 (in offset 428) of the lagging character (in character 430) of payload 426 that matches the last element 538 of lagging node 536. Metadata 540 may indicate to walker 320 that a match of lag portion 508 of at least one pattern 506 is reported.
According to the example embodiment of fig. 5, the compiler 306 may associate a leading node 542 of the leading NFA514 associated with the first element of the at least one pattern 506 with metadata 546 indicating to the walker 320 an instruction to terminate traversal of the leading NFA 514. Metadata 546 may indicate to walker 320 that a match of the leading portion 510 of at least one pattern 506 is reported. If a qualifier associated with at least one pattern 506, such as one of qualifiers 308, is required, metadata 546 may indicate to walker 320 the leading offset (of offsets 428) of the leading character (of characters 430) within payload 426 that matches the first element 544 of leading node 542 of report payload 426 as the start offset for the at least one pattern 506.
Fig. 6 is a block diagram 600 of one embodiment for generating a unified DFA312 and at least one NFA314 based on the position of a selected sub-pattern being a middle position or starting position of at least one pattern and the length of the sub-pattern being fixed or variable. According to the example embodiment of fig. 6, the portion of the at least one pattern 606 used to generate the at least one NFA314 includes a hysteresis portion 608 and an entire portion 610 of the at least one pattern 606. The lag portion 608 of the at least one pattern 606 may be the at least one pattern 606 excluding the leading portion 612 of the at least one pattern 606. The leading portion 612 includes a first element 614 of the at least one pattern 606, a last element 616 of the selected sub-pattern 604, and all elements in between in the at least one pattern 606. The entire portion 610 of the at least one pattern 606 may be the at least one pattern 606.
If the first element 618 of the selected sub-pattern 604 is not the first element of the at least one pattern 606 and the last element 616 of the selected sub-pattern 604 is not the last element 620 of the at least one pattern 606, the position of the selected sub-pattern is an intermediate position of the at least one pattern 606 and the start portion 622 precedes the selected sub-pattern 604 in the at least one pattern 606.
If the first element 618 of the selected sub-pattern 604 is the first element 614 of the at least one pattern, the position of the selected sub-pattern is the starting position of the at least one pattern 606. If the position of the selected sub-pattern is the start position, the start part 622 is not present and the preamble part 612 is the selected sub-pattern 604.
According to the example embodiment of fig. 6, at least one NFA includes a lagging NFA 624 and an umbrella NFA 626. The at least one traversal direction includes a forward traversal direction 628 and a reverse traversal direction 630. Hysteretic NFA 624 has a forward direction of traversal 628 and umbrella NFA 626 has a reverse direction of traversal 630. The hysteresis portion 608 of at least one mode 606 can be used by the compiler 306 to generate a hysteresis NFA 624. The entire portion 610 of at least one pattern 606 may be used by compiler 306 to generate umbrella-shaped NFA 626.
According to the example embodiment of fig. 6, compiler 306 may associate DFA nodes 632 of unified DFA312 with metadata 634 for the last element 616 of selected sub-pattern 604. Metadata 634 may indicate to walker 320 a pointer 636 that points to the start node 638 of lagging NFA 624 and instructions to transition to traverse lagging NFA 624 in forward traversal direction 628. The start node 638 of the lagging NFA 624 may be associated with a first element 640 of the lagging portion 608. Metadata 634 may indicate to walker 320 to report the matching of the selected sub-pattern 604 and the offset of the character (of character 430) within payload 426 (of offsets 428) of the DFA node that matches the last element 616 of the selected sub-pattern 604 as the ending offset of the selected sub-pattern 604 and report the length if the length of the selected sub-pattern 604 is fixed.
According to the example embodiment of fig. 6, the compiler 306 may associate a lag node 642 of the lag NFA 624 associated with the last element 620 of the at least one pattern 606 with the metadata 652. The metadata 652 may indicate to the walker 320 a pointer 644 to the starting node 646 of the umbrella shape NFA 626, an instruction to transition to traverse the umbrella shape 626 in the reverse traversal direction 630. The starting node 646 of the umbrella 626 can be associated with the last element 620 of at least one schema 606. Metadata 652 may indicate to the walker an offset (of offsets 428) within payload 426 of characters (of characters 430) that optionally report a match with the last element 620 of at least one pattern 606 at lag node 642 and optionally report a match of lag portion 608 of at least one pattern 606.
According to the example embodiment of fig. 6, compiler 306 may associate umbrella node 648 of umbrella NFA 626 associated with first element 614 of at least one schema 606 with metadata 650. Metadata 650 may indicate instructions to walker 320 for terminating traversal and reporting a final match of at least one pattern 606. If a qualifier associated with at least one pattern 606 in qualifiers 308 requires, metadata 650 may indicate to the walker a start offset (in offsets 428) within payload 426 of a start character reported at umbrella node 648 that matches first element 614 of at least one pattern 606 as the start offset of at least one pattern 606.
Fig. 7 is a block diagram 700 of another embodiment of generating a unified DFA312 and at least one NFA314 based on the position of a selected sub-pattern 704 being a middle or starting position of at least one pattern 706 and the length of the selected sub-pattern 704 being fixed or variable. According to the example embodiment of fig. 7, the portion of the at least one pattern used to generate NFA314 includes a lag portion 708 and a leading portion 712 of the at least one pattern 706. The hysteresis portion 708 of the at least one pattern 706 may be the at least one pattern 706 excluding the preamble portion 712 of the at least one pattern 706. The preamble portion 712 includes a first element 714 of the at least one pattern 706, a last element 716 of the selected sub-pattern 704, and all elements in between in the at least one pattern 706. If the position of the selected sub-pattern 704 is a start position, the preamble part 712 may be the selected sub-pattern.
If the first element 718 of the selected sub-mode 704 is not the first element 714 of the at least one mode 706 and the last element 716 of the selected sub-mode 704 is not the last element 720 of the at least one mode 706, the position of the selected sub-mode is a middle position of the at least one mode 706 and the start portion 722 precedes the selected sub-mode 704 in the at least one mode 706.
If the first element 718 of the selected sub-pattern 704 is the first element 714 of the at least one pattern, the position of the selected sub-pattern is the starting position of the at least one pattern 706. If the position of the selected sub-pattern is the start position, the start part 722 is not present and the preamble part 712 is the selected sub-pattern 704.
According to the example embodiment of fig. 7, at least one NFA314 includes a lagging NFA 724 and a leading NFA 726, and at least one traversal direction includes a forward traversal direction 728 and a reverse traversal direction 730. Lagging NFA 724 has a forward direction of traversal 728. The leading NFA 726 has a reverse traversal direction 730. The hysteresis portion 708 of at least one mode 706 may be used to generate a hysteresis NFA 724. The preamble portion 712 of at least one pattern 706 may be used to generate a preamble NFA 726.
According to the example embodiment of fig. 7, compiler 306 may associate DFA node 732 of unified DFA312 associated with last element 716 of selected sub-pattern 704 with metadata 734. The metadata 734 may indicate to the walker 320 a pointer 736 to a start node 738 of the lagging NFA 724 and instructions to transition to traverse the lagging NFA 724 in the forward traversal direction 728. The start node 738 of the retard NFA 724 may be associated with the first element 740 of the retard portion 708. The start offset of the payload for starting the forward traversal of the lagging NFA 724 may be an offset of a byte following the byte of the end offset of the selected sub-pattern 704. The metadata 734 may indicate to the walker 320 a pointer 744 to the start node 746 of the leading NFA 726 and instructions to transition to traverse the leading NFA 726 in the reverse traversal direction 730. The starting node 746 of the leading NFA 726 may be associated with the last element 716 of the selected sub-pattern 704. The payload offset used to begin the reverse traversal of the leading NFA 726 may be the ending offset of the selected sub-pattern 704. Metadata 734 may indicate to walker 320 to report the matching of the selected sub-pattern 704 and the offset within payload 426 (in offset 428) of the character (in character 430) of DFA node 732 that matches the last element 716 of the selected sub-pattern 704 as the ending offset of the selected sub-pattern 704 and report the length if the length of the selected sub-pattern 704 is fixed.
According to the example embodiment of fig. 7, the compiler 306 may associate a hysteresis node 742 of the hysteresis NFA 724 associated with the last element 720 of the at least one schema 706 with metadata 752. Metadata 752 may indicate to walker 320 to terminate traversal of the lagging NFA and report the lagging offset (of offsets 428) within payload 426 of the lagging character (of characters 430) of lagging node 742 that matches the last element 720 of at least one pattern 706 and report the matching of the lagging portion 708 of at least one pattern 706.
According to the example embodiment of fig. 7, the compiler 306 may associate a leading node 748 of the generated leading NFA 724 associated with the first element 714 of the at least one schema 706 with metadata 750. Metadata 750 may indicate to walker 320 instructions to terminate traversal of leading NFA 726 and report a leading offset (of offsets 428) of the leading character (of characters 430) within the payload that matches the first element 714 of at least one pattern 706 at leading node 748.
The embodiment of fig. 7 may be considered an optimization of the embodiment of fig. 6 because the walker 320 need not traverse the NFA for the lag portion 708 in the reverse direction.
Fig. 8 is a block diagram 800 of one embodiment of generating a unified DFA312 and at least one NFA314 based on the position of the selected sub-pattern 804 being a middle position of the at least one pattern 806 and the length of the selected sub-pattern 804 being fixed or variable. According to the example embodiment of fig. 8, at least one NFA314 is a single NFA 854. The at least one traversal direction includes a forward traversal direction 828 for runtime processing nodes of a single NFA 854 associated with elements of the lag portion 808 of the at least one pattern 806 and a reverse traversal direction 830 for runtime processing nodes of a single NFA 854 associated with all elements of the at least one pattern 806. The lag portion 808 of the at least one pattern 806 is the at least one pattern 806 that excludes the lead portion 812 of the at least one pattern 806. The preamble 812 includes a first element 814 of the at least one pattern 806, a last element 816 of the selected sub-pattern 804, and all elements in between in the at least one pattern 806.
According to the example embodiment of fig. 8, compiler 806 may associate DFA nodes 832 of unified DFA312 associated with last element 816 of selected sub-pattern 804 with metadata 834. The metadata 834 may indicate to the walker 320 a pointer 836 that points to the start node 856 of a single NFA 854 and instructions to transition to traverse the single NFA 854 in the forward traversal direction 828. The start node 856 may be associated with a next element 840 in the at least one schema 806 immediately following the last element 816 of the selected sub-schema 804. Metadata 834 may indicate to the walker 320 to report the match of the selected sub-pattern 804, the offset within the payload 426 (in offset 428) of the character (in character 430) of the DFA node 832 that matches the last element 816 of the selected sub-pattern 804 as the ending offset of the selected sub-pattern 804 and report the length if the length of the selected sub-pattern 804 is fixed.
According to the example embodiment of fig. 8, compiler 806 may associate hysteresis node 842 of a single NFA 854 associated with last element 820 of at least one pattern 806 with metadata 852 indicating instructions to traverse 320 to transition to traverse the single NFA 854 in opposite traversal direction 830 with payload beginning at the end offset of selected sub-pattern 804. The compiler 306 may associate the leader node 848 associated with the first element 814 of at least one mode 806 with the metadata 850 for a single NFA 854. The metadata 850 may indicate to the walker 320 instructions to terminate the traversal and report the offset of the character (of the characters 430) within the payload 426 (of the offset 428) that matches the first element 814 of the at least one pattern 806 at the leading node 848 if needed by a qualifier associated with the at least one pattern 806 in the qualifiers 308 as a starting offset for the at least one pattern 806 and a final match for the at least one pattern 806.
Fig. 9 is a block diagram 900 of one embodiment of generating a unified DFA312 and at least one NFA314 based on the position of a selected sub-pattern 904 being a middle position of at least one pattern 906 and the length of the selected sub-pattern 904 being fixed. According to the example embodiment of fig. 9, the at least one NFA314 may be a single NFA 954, and the at least one traversal direction includes a reverse traversal direction 930 for runtime processing nodes of the single NFA 954 that are associated with the leading portion 912 of the at least one mode 906 and a forward traversal direction 928 for runtime processing nodes of the single NFA 954 that are associated with all elements of the at least one mode 906. The leading portion 912 can be at least one pattern 906 of the lagging portion 908 excluding the at least one pattern 906. The hysteresis portion 908 includes a first element 918 of the selected sub-pattern 904, a last element 920 of the at least one pattern 906, and all elements in the at least one pattern 906 between them.
According to the example embodiment of fig. 9, compiler 306 may associate DFA node 932 of unified DFA312 associated with last element 916 of selected sub-pattern 904 with metadata 956. The metadata 956 may indicate to the walker 320 a pointer 936 pointing to the start node 946 of a single NFA 954 and instructions to transition to traverse the single NFA 954 in the reverse traversal direction 930. The start node 946 may be associated with the last element 934 of the preamble 912. Metadata 956 may indicate to walker 320 a match reporting the selected sub-pattern 904. Metadata 956 may indicate to walker 320 the offset of the character (of characters 430) within payload 426 (within offset 428) reported at DFA node 932 associated with the last element 916 of the selected sub-mode 904 as the ending offset of the selected sub-mode 904 and the length of the selected sub-mode. The traverser 320 may use the length (if included in the metadata 956) to determine the payload start offset of the start node 946 by subtracting the length of the selected sub-mode in the metadata 956 from the end offset of the selected sub-mode.
According to the example embodiment of fig. 9, compiler 306 may associate leader node 948 of a single NFA 954 associated with first element 914 of at least one mode 906 with metadata 950. Metadata 950 may indicate to walker 320 an instruction to transition to traverse a single NFA 954 in forward traversal direction 928. The compiler 306 may associate a hysteresis node 942 of a single NFA 954 associated with the last element 920 of at least one pattern 906 with metadata 952. Metadata 952 may indicate to walker 320 an instruction to terminate the traversal. Metadata 952 may indicate to the walker that the offset (in offset 428) of the character (in character 430) within payload 426 that matches the last element 920 of at least one pattern 906 at lag node 942 and the final match of at least one pattern 906 are reported.
Fig. 10 is a block diagram 1000 of one embodiment of generating a unified DFA312 and at least one NFA314 based on the position of the selected sub-pattern 1004 being a middle position of the at least one pattern 1006 and the length of the selected sub-pattern 1004 being fixed. According to the example embodiment of fig. 10, if the last element 1016 of the selected sub-mode 1004 may be the last element of the at least one mode 1016, the position of the selected sub-mode 1004 may be the end position of the at least one mode 1006 and the at least one NFA314 may be a single NFA 1054. If the length of the selected sub-pattern 1004 is fixed, a portion 1012 of the at least one pattern 1006 used to generate the single NFA1054 may be the at least one pattern 1006 excluding the selected sub-pattern 1004. At least one traversal direction may be a reverse traversal direction 1030 for traversing a single NFA 1054.
According to the example embodiment of fig. 10, compiler 306 may associate DFA node 1032 corresponding to the last element 1016 of the selected sub-mode 1004 with metadata 1052. The metadata 1052 may indicate to the walker 320 a pointer 1036 to the start node 1045 of the single NFA1054 and instructions to transition to traverse the single NFA1054 in the reverse traversal direction 1030. The starting node 1046 of the single NFA 1046 is associated with the last element 1034 of the portion 1012. The metadata 1052 may indicate to the walker 320 to report the matching of the selected sub-mode 1004 and the offset of the character (of the characters 430) within the payload 426 (of the offsets 428) of the character (of the characters 430) of the DFA node 1032 that matches the last element 1016 of the selected sub-mode 1004 as the ending offset of the selected sub-mode 1004 and the length of the selected sub-mode 1004. The traverser 320 may use the length (if included in the metadata 1052) to determine the payload start offset of the start node 1046 by subtracting the length of the selected sub-pattern in the metadata 1052 from the end offset of the selected sub-pattern 1004.
According to the example embodiment of fig. 10, compiler 306 may associate NFA node 1048 associated with first element 1014 of portion 1012 with metadata 1050. Metadata 1050 may indicate to walker 320 to terminate traversal and report a final match of at least one pattern 1006 and report an offset within payload 426 (of offsets 428) of a character (of characters 430) matching first element 1014 of portion 1012 of NFA node 1048 within a payload 426 as a starting offset of at least one pattern 1006 if a qualifier associated with at least one pattern 1006 of qualifiers 308 requires.
Fig. 11 is a block diagram 1100 of one embodiment of generating a unified DFA312 and at least one NFA314 based on the position of a selected sub-pattern 1104 being the end position of at least one pattern 1106 and the length of the selected sub-pattern 1104 being fixed or variable. According to the example embodiment of fig. 11, if the last element 1116 of the selected sub-mode 1104 may be the last element of the at least one mode 1106, the position of the selected sub-mode 1104 is the end position of the at least one mode 1106 and the at least one NFA314 may be a single NFA 1154. The portion 1112 of the at least one pattern 1106 used to generate the single NFA1154 may be the at least one pattern 1106 if the length of the selected sub-pattern 1104 is fixed or variable. At least one traversal direction may be a reverse traversal direction 1130 for traversing a single NFA 1154.
According to the example embodiment of fig. 11, compiler 306 may associate DFA node 1132 corresponding to last element 1116 of selected sub-pattern 1104 with metadata 1152. The metadata 1152 may indicate to the walker 320 a pointer 1136 pointing to the starting node 1156 of a single NFA1154 and instructions to transition to traverse the single NFA1154 in the reverse traversal direction 1130. The starting node 1146 of a single NFA1154 may be associated with the last element 1116 of the selected sub-pattern 1104. Metadata 1152 may indicate to walker 320 the match of the selected sub-pattern 1104 and the offset of the character (of characters 430) within payload 426 (of offsets 428) of the DFA node 1132 that matches the last element 1116 of the selected sub-pattern 1104 as the ending offset of the selected sub-pattern 1104 and report the length if the length of the selected sub-pattern 1104 is fixed.
According to the example embodiment of fig. 11, compiler 306 may associate NFA node 1148 associated with first element 1114 of portion 1112 with metadata 1150. Metadata 1150 may indicate to walker 320 to terminate traversal and report a final match of at least one pattern 1106. Metadata 1152 may indicate to walker 320 to report the offset (in offsets 428) of the character (in characters 430) matching the first element 1114 of portion 1112 in NFA node 1148 within payload 426 as the start offset of at least one pattern 1106 if a qualifier associated with the at least one pattern 1106 in qualifiers 304 requires.
FIG. 12 is a block diagram of an example of the internal structure of a computer 1200 in which various embodiments of the invention may be implemented. Computer 1200 includes a system bus 1202, which is a collection of hardware lines used for data transfer among the components of a computer or processing system. The system bus 1202 is essentially a shared conduit that connects different elements of a computer system (e.g., processor, disk storage, memory, input/output ports, network ports, etc.) that enables the transfer of information between the elements. Operative with system bus 1202 is an I/O device interface 1204 for connecting various input and output devices (e.g., keyboard, mouse, displays, printers, speakers, etc.) to the computer 1200. Network interface 1206 allows computer 1200 to connect to various other devices attached to a network. Memory 1208 provides volatile storage for computer software instructions 1210 and data 1212 that can be used to implement embodiments of the present invention. Disk storage 1214 provides non-volatile storage for computer software instructions 1210 and data 1212, which can be used to implement embodiments of the present invention. Central processing unit 1218 also operates with system bus 1202 and provides execution of computer instructions.
Further example embodiments of the invention may be configured using a computer program product; the control may be programmed, for example, in software for implementing an exemplary embodiment of the present invention. Further example embodiments of the invention may include a non-transitory computer readable medium containing instructions that are executable by a processor and that, when executed, cause the processor to perform the methods described herein. It should be understood that the elements of the block diagrams and flow diagrams described herein may be implemented in software, hardware, firmware, or other similar implementations as determined in the future. Furthermore, the elements of the block diagrams and flow diagrams described herein may be combined or divided in any manner of software, hardware, or firmware.
It is to be understood that the term "herein" may be transferred to an application or patent incorporating the teachings presented herein such that the subject matter, definitions or data are transferred to the application or patent doing so.
If implemented in software, the software may be written in any language that can support the example embodiments disclosed herein. The software may be stored in any form of a computer readable medium, such as Random Access Memory (RAM), Read Only Memory (ROM), compact disk read only memory (CD-ROM), and the like. In operation, a general-purpose or special-purpose processor loads and executes software in a manner well understood in the art. It should also be understood that the block diagrams and flow diagrams may include more or fewer elements, be arranged or oriented differently, or be characterized differently. It will be appreciated that the implementations may specify block diagrams, flow diagrams, and/or network diagrams, and numbers of block diagrams and flow diagrams, that illustrate implementations of embodiments of the invention.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
Claims (73)
1. A security device operatively coupled to a network, the security device comprising:
at least one memory;
at least one processor operatively coupled to the at least one memory, the at least one processor configured to:
selecting a sub-pattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic;
generating a unified Deterministic Finite Automaton (DFA) using the sub-patterns selected from all patterns in the set;
generating at least one non-deterministic finite automaton (NFA) for at least one pattern in the set, at least one traversal direction of the at least one pattern for generating the portion of the at least one NFA and for runtime processing of the at least one NFA being determined based on whether a length of the selected sub-pattern is fixed or variable and a position of the selected sub-pattern within the at least one pattern; and is
Storing the generated unified DFA and the at least one NFA in the at least one memory.
2. The security device of claim 1, wherein the at least one heuristic comprises maximizing a number of unique sub-patterns selected and a length of each sub-pattern selected.
3. The security device of claim 1, wherein the processor is further configured to determine whether a length of the selected sub-mode is fixed or variable.
4. The security device of claim 1, wherein the at least one processor is further configured to:
calculating the hash value of each selected sub-mode; and is
Storing each computed hash value in association with an identifier of a mode from which the sub-mode is selected.
5. The security device of claim 1, wherein the at least one heuristic comprises maximizing a number of unique sub-patterns selected, and to maximize the number of unique sub-patterns selected, the at least one processor is further configured for, for each pattern in the set:
calculating the hash value of the selected sub-mode;
comparing the calculated hash value to a list of hash values of sub-patterns selected for other patterns in the set; and is
If the calculated hash value is found in the list, determining whether to replace (i) the selected sub-pattern with another sub-pattern from the pattern or (ii) the selected sub-pattern with the other pattern in the set selected from the other pattern in the set identified based on an association in the list with the calculated hash value, wherein the determination of whether to replace (i) or (ii) is based on comparing lengths of sub-patterns considered for the replacement, thereby maximizing the length of the unique sub-pattern selected.
6. The security device of claim 1, wherein the at least one heuristic comprises identifying sub-patterns of each pattern and ignoring a given sub-pattern of the identified sub-patterns of each pattern if the given sub-pattern has a length less than a minimum threshold.
7. The security device of claim 1, wherein the at least one heuristic comprises accessing a knowledge base of sub-patterns associated with historical usage frequency indicators and ignoring a given sub-pattern of the identified sub-patterns for each pattern if the historical usage frequency indicator of the given sub-pattern is greater than or equal to a usage frequency threshold in the accessed knowledge base.
8. The security device of claim 1, wherein the at least one heuristic comprises identifying sub-patterns for each pattern, and selecting a given sub-pattern for each pattern to maximize a number of consecutive text characters in the selected sub-pattern by having a maximum number of consecutive text characters in the identified sub-pattern based on the given sub-pattern, and by being unique among all sub-patterns selected for the set of one or more regular expressions based on the given sub-pattern.
9. The security device of claim 1, wherein the at least one heuristic comprises:
prioritizing a given sub-mode of each mode based on a sub-mode type of each sub-mode and a length of the given sub-mode, a longer length sub-mode being prioritized over another sub-mode of lesser length;
selecting a unique sub-mode as the selected sub-mode based on the priority setting, the selected unique sub-mode having a length of at least a minimum length threshold; and is
Selecting a non-unique sub-pattern as the selected sub-pattern based on the priority setting if the given sub-pattern is not unique and has a length of at least the minimum length threshold.
10. The security device of claim 1, wherein the at least one heuristic comprises prioritizing a given sub-mode of each mode based on the sub-mode type of each of the given sub-mode, the sub-mode type being text-only, alternative, single character repetition, or multi-character repetition, and the highest to lowest priority for the sub-mode type being text-only, alternative, single character repetition, and multi-character repetition.
11. The security apparatus of claim 1, wherein if the first element of the selected sub-pattern is a first element of the at least one pattern and a length of the selected sub-pattern is fixed, a position of the selected sub-pattern is a starting position of the at least one pattern, a portion of the at least one pattern used to generate the at least one NFA is the at least one pattern excluding the selected sub-pattern, the at least one NFA is a single NFA, and the at least one traversal direction of the at least one NFA is a forward traversal direction.
12. The security device of claim 11, wherein to generate the unified DFA, the at least one processor is further configured to:
associating a DFA node of the generated unified DFA associated with a last element of the selected sub-pattern with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a starting node of the generated at least one NFA, instructions for transitioning to traversing the at least one NFA in a forward traversal direction and reporting a match of the selected sub-pattern, a leading offset within the payload of a leading character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and a length of the selected sub-pattern, the starting node of the at least one NFA being associated with a first element of the at least one pattern used to generate the portion of the at least one NFA, a character offset of the at least one NFA in the payload is associated with a byte after the end offset of the selected sub-mode, a payload start offset of the at least one NFA is associated with a byte after another byte of the end offset of the selected sub-mode.
13. The security device of claim 11, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating an NFA node of the generated at least one NFA with metadata, the metadata indicating instructions to the walker to terminate the traversal and report a lag offset within a payload of a lag character matched at the NFA node as a final match of the at least one pattern and an end offset of the at least one pattern, the NFA node associated with a last element of the at least one pattern.
14. The security device of claim 1, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and a leading portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding the selected sub-mode and the leading portion of the at least one mode, the leading portion of the at least one mode excluding the selected sub-mode and the lag portion of the at least one mode; and is
The at least one NFA includes a lagging NFA and a leading NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the leading NFA has the reverse traversal direction, the lagging portion of the at least one mode is used to generate the lagging NFA, and the leading portion of the at least one mode is used to generate the leading NFA.
15. The security device of claim 14, wherein to generate the unified DFA, the at least one processor is further configured to:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a starting node of the lagging NFA, an instruction to transition to traverse the lagging NFA in the forward traversal direction, the starting node of the lagging NFA associated with a first element of the hysteresis section, a payload start offset of the lagging NFA associated with a byte after another byte of an end offset of the selected sub-pattern, a pointer to a starting node of the leading NFA to transition to traverse the leading NFA in the reverse traversal direction and report an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as the selected sub-pattern Instructions for the ending offset of a pattern, the matching of the selected sub-pattern, and a length of the selected sub-pattern, the starting node of the preamble NFA being associated with a last element of the preamble portion, the payload starting offset of the preamble NFA being determined by subtracting the length of the selected sub-pattern from the ending offset of the selected sub-pattern.
16. The security device of claim 14, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating a lagging node of the lagging NFA with metadata, the lagging node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating traversing the generated lagging NFA and reporting a matching of a lagging offset within the payload of a lagging character of the payload and the lagging portion of the at least one pattern matching the last element at the lagging node; and is
Associating a leading node of the leading NFA with metadata associated with the first element of the at least one pattern, the metadata indicating instructions to the walker to terminate traversal of the generated leading NFA and report a match of the leading portion of the at least one pattern and report a leading offset within the payload of the leading character of the payload matching the first element at the leading node as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
17. The security apparatus of claim 1, wherein if a first element of the selected sub-pattern is not a first element of the at least one pattern and a last element of the selected sub-pattern is not a last element of the at least one pattern, a position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the first element of the selected sub-pattern is the first element of the at least one pattern, a position of the selected sub-pattern is a start position of the at least one pattern, and if a length of the sub-pattern is fixed or variable:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and an entire portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding a leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween, the entire portion of the at least one mode being the at least one mode, the leading portion being the selected sub-mode if the position of the selected sub-mode is the starting position; and is
The at least one NFA includes a lagging NFA and an umbrella NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the umbrella NFA has the reverse traversal direction, the lagging portion of the at least one pattern is used to generate the lagging NFA, and the entire portion of the at least one pattern is used to generate the umbrella NFA.
18. The security device of claim 17, wherein to generate the unified DFA, the at least one processor is further configured to:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a start node of the lagging NFA, instructions to transition to traverse the lagging NFA in the forward traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and report a length of the selected sub-pattern if the length is fixed, the start node of the lagging NFA associated with a first element of the lagging portion.
19. The security device of claim 17, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating a hysteresis node of the generated hysteretic NFA with metadata, the hysteresis node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a starting node of the umbrella NFA, instructions to transition to traverse the umbrella NFA in the reverse traversal direction and optionally report an offset within the payload of a character at the hysteresis node that matches the last element of the at least one pattern and optionally report a match of the hysteresis portion of the at least one pattern, the starting node of the umbrella NFA associated with the last element of the at least one pattern; and is
Associating umbrella nodes of the generated umbrella NFA with metadata associated with the first elements of the at least one pattern, the metadata indicating instructions to the walker to terminate the traversal and report a final match of the at least one pattern and report a start offset within the payload of a start character of the umbrella nodes that matches the first elements of the at least one pattern as a start offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
20. The security apparatus of claim 1, wherein if a first element of the selected sub-pattern is not a first element of the at least one pattern and a last element of the selected sub-pattern is not a last element of the at least one pattern, a position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the first element of the selected sub-pattern is the first element of the at least one pattern, a position of the selected sub-pattern is a start position of the at least one pattern, and if a length of the sub-pattern is fixed or variable:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and a leading portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding the leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween, the lag portion being the selected sub-mode if the position of the selected sub-mode is the starting position; and is
The at least one NFA includes a lagging NFA and a leading NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the leading NFA has the reverse traversal direction, the lagging portion of the at least one mode is used to generate the lagging NFA, and the leading portion of the at least one mode is used to generate the leading NFA.
21. The security device of claim 20, wherein to generate the unified DFA, the at least one processor is further configured to:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the lagging NFA, instructions to transition to traverse the lagging NFA in the forward traversal direction, the starting node of the lagging NFA associated with a first element of the hysteresis section, a pointer to a starting node of the leading NFA, instructions to transition to traverse the leading NFA in the reverse traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and report the sub-pattern if the length of the selected sub-pattern is fixed Instructions of length, the starting node of the leading NFA associated with a last element of the selected sub-mode.
22. The security device of claim 20, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating a hysteresis node of the generated lagging NFA with metadata, the hysteresis node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating traversing the lagging NFA and reporting a hysteresis offset within the payload of a lagging character at the hysteresis node that matches the last element of the at least one pattern and reporting a match of the lagging portion of the at least one pattern; and is
Associating a leading node of the generated leading NFA with metadata, the leading node associated with the first element of the at least one pattern, the metadata indicating to the walker instructions to terminate traversal of the leading NFA and report a leading offset within the payload of a leading character of the leading node that matches the first element of the at least one pattern.
23. The security device of claim 1, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed or variable:
the at least one NFA is a single NFA and the at least one traversal direction includes a forward traversal direction for runtime processing nodes of the single NFA associated with elements of a hysteresis portion of the at least one mode and a reverse traversal direction for runtime processing nodes of the single NFA associated with all elements of the at least one mode, the hysteresis portion of the at least one mode being the at least one mode excluding a leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween.
24. The security device of claim 23, wherein to generate the unified DFA, the at least one processor is further configured to:
associating DFA nodes of the generated unified DFA with metadata, the DFA nodes associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the single NFA, instructions to transition to traversing the generated single NFA in the forward traversal direction and reporting a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern, and reporting the length if the length of the selected sub-pattern is fixed, the start node is associated with a next element in the at least one pattern immediately following the last element of the selected sub-pattern.
25. The security device of claim 23, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating a hysteresis node of the single NFA with metadata associated with a last element of the at least one pattern, the metadata indicating to a walker configured to traverse the unified DFA and the at least one NFA produced with a payload, instructions for transitioning to traversing the single NFA in the reverse traversal direction with a payload start offset associated with an end offset of the selected sub-pattern; and is
Associating a leading node of the single NFA with metadata, the leading node associated with the first element of the at least one pattern, the metadata indicating to the walker instructions to terminate the traversal and report an offset within the payload of characters of the leading node that match the first element of the at least one pattern if required by a qualifier associated with the at least one pattern as a starting offset of the at least one pattern and a final match of the at least one pattern.
26. The security device of claim 1, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed:
the at least one NFA is a single NFA, and the at least one traversal direction includes a reverse traversal direction for runtime processing nodes of the single NFA associated with a leading portion of the at least one pattern and a forward traversal direction for runtime processing nodes of the single NFA associated with all elements of the at least one pattern, the leading portion being the at least one pattern excluding a lagging portion of the at least one pattern, the lagging portion including the first element of the selected sub-pattern, the last element of the at least one pattern, and all elements of the at least one pattern in between.
27. The security device of claim 26, wherein to generate the unified DFA, the at least one processor is further configured to:
associating a DFA node of the generated unified DFA with metadata, the node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the single NFA, instructions to transition to traverse the generated single NFA in the reverse traversal direction and report a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and a length of the selected sub-pattern, the start node is associated with a last element of the preamble part, and a payload start offset is determined by subtracting a length of the selected sub-pattern from the end offset of the selected sub-pattern.
28. The security device of claim 26, wherein to generate the at least one NFA, the at least one processor is further configured to:
associating a leading node of the single NFA with metadata, the leading node associated with a first element of the at least one pattern, the metadata indicating instructions to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload to traverse the single NFA in the forward traversal direction; and is
Associating a hysteresis node of the single NFA with metadata associated with the last element of the at least one pattern, the metadata indicating instructions to the walker to terminate the traversal and report an offset within the payload of characters of the hysteresis node that match the last element of the at least one pattern and a final match of the at least one pattern.
29. The security device of claim 1, wherein if a last element of the selected sub-pattern is a last element of the at least one pattern, a position of the selected sub-pattern is an end position of the at least one pattern, and if a length of the selected sub-pattern is fixed, a portion of the at least one pattern used to generate the at least one NFA is the at least one pattern excluding the selected sub-pattern, and the at least one traversal direction is a reverse traversal direction.
30. The security device of claim 29, wherein to generate the unified DFA, the processor is further configured to:
associating a DFA node with metadata, the DFA node corresponding to the last element of the selected sub-mode, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the generated at least one NFA, instructions to transition to traversing the generated at least one NFA in a reverse traversal direction and reporting a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern, the starting node of the generated at least one NFA is associated with a last element of the portion, a payload start offset of the generated at least one NFA is determined by subtracting a length of the selected sub-mode from the end offset of the selected sub-mode.
31. The security device of claim 29, wherein to generate the at least one NFA, the processor is further configured to:
associating an NFA node associated with a first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters of the NFA node that match the first element of the portion as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
32. The security device of claim 1, wherein if a last element of the selected sub-pattern is a last element of the at least one pattern, a position of the selected sub-pattern is an end position of the at least one pattern, and if a length of the selected sub-pattern is variable or fixed, a portion of the at least one pattern used to generate the at least one NFA is the at least one pattern, and the at least one traversal direction is a reverse traversal direction.
33. The security device of claim 32, wherein to generate the unified DFA, the processor is further configured to:
associating a DFA node with metadata, the DFA node corresponding to the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a start node of the generated at least one NFA, instructions to transition to traverse the generated at least one NFA in a reverse traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and report the length if the length of the selected sub-pattern is fixed, the start node of the generated at least one NFA being associated with a last element of the selected sub-pattern, the start offset of the generated at least one NFA being related to the end offset of the selected sub-pattern And (4) connecting.
34. The security device of claim 32, wherein to generate the at least one NFA, the processor is further configured to:
associating an NFA node associated with a first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters of the NFA node that match the first element of the portion as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
35. The security device of claim 1, wherein to store the generated unified DFA and at least one NFA in the at least one memory, the processor is further configured to generate a binary image comprising the generated unified DFA and at least one NFA.
36. The security device of claim 1, wherein the at least one processor comprises a DFA coprocessor and an NFA coprocessor configured as acceleration units to offload DFA and NFA runtime processing, respectively.
37. A method for compiling a finite automaton, comprising:
in at least one processor in a security device operatively coupled with at least one memory, the security device operatively coupled to a network:
selecting a sub-pattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic;
generating a unified Deterministic Finite Automaton (DFA) using the sub-patterns selected from all patterns in the set;
generating at least one non-deterministic finite automaton (NFA) for at least one pattern in the set, at least one traversal direction of the at least one pattern for generating the portion of the at least one NFA and for runtime processing of the at least one NFA being determined based on whether a length of the selected sub-pattern is fixed or variable and a position of the selected sub-pattern within the at least one pattern; and is
Storing the generated unified DFA and the at least one NFA in the at least one memory.
38. The method of claim 37, wherein the at least one heuristic comprises maximizing a number of unique sub-patterns selected and a length of each sub-pattern selected.
39. The method of claim 37, further comprising determining whether a length of the selected sub-mode is fixed or variable.
40. The method of claim 37, further comprising:
calculating the hash value of each selected sub-mode; and is
Storing each computed hash value in association with an identifier of a mode from which the sub-mode is selected.
41. The method of claim 37, wherein the at least one heuristic comprises maximizing a number of unique sub-patterns selected, and further comprising: for each mode in the set of modes,
calculating the hash value of the selected sub-mode;
comparing the calculated hash value to a list of hash values of sub-patterns selected for other patterns in the set; and is
If the calculated hash value is found in the list, determining whether to replace (i) the selected sub-pattern with another sub-pattern from the pattern or (ii) the selected sub-pattern with the other pattern in the set selected from the other pattern in the set identified based on an association in the list with the calculated hash value, wherein the determination of whether to replace (i) or (ii) is based on comparing lengths of sub-patterns considered for the replacement, thereby maximizing the length of the unique sub-pattern selected.
42. The method of claim 37, wherein the at least one heuristic comprises identifying sub-patterns for each pattern and ignoring a given sub-pattern of the identified sub-patterns for each pattern if the given sub-pattern has a length less than a minimum threshold.
43. The method of claim 37, wherein the at least one heuristic comprises accessing a knowledge base of sub-patterns associated with historical usage frequency indicators and ignoring a given sub-pattern of the identified sub-patterns for each pattern if the historical usage frequency indicator of the given sub-pattern is greater than or equal to a usage frequency threshold in the accessed knowledge base.
44. The method of claim 37, wherein the at least one heuristic comprises: identifying sub-patterns for each pattern, and maximizing, for each pattern, the number of consecutive text characters in a given sub-pattern of the identified sub-patterns by selecting the sub-pattern, the given sub-pattern being selected based on: a given sub-pattern of the identified sub-patterns has a maximum number of consecutive text characters in the identified sub-pattern, and the given sub-pattern is unique among all sub-patterns selected for the set of one or more regular expressions.
45. The method of claim 37, wherein the at least one heuristic comprises:
prioritizing a given sub-mode of each mode based on a sub-mode type of each sub-mode and a length of the given sub-mode, a longer length sub-mode being prioritized over another sub-mode of lesser length;
selecting a unique sub-mode as the selected sub-mode based on the priority setting, the selected unique sub-mode having a length of at least a minimum length threshold; and is
Selecting a non-unique sub-pattern as the selected sub-pattern based on the priority setting if none of the given sub-patterns is unique and has a length of at least the minimum length threshold.
46. The method of claim 37, wherein the at least one heuristic comprises prioritizing a given sub-mode of each mode based on a sub-mode type for the given sub-mode, the sub-mode type being text-only, alternative, single character repeat, or multi-character repeat, and a highest to lowest priority for the sub-mode type being text-only, alternative, single character repeat, and multi-character repeat.
47. The method of claim 37, wherein if the first element of the selected sub-pattern is a first element of the at least one pattern and a length of the selected sub-pattern is fixed, a position of the selected sub-pattern is a starting position of the at least one pattern, the portion of the at least one pattern used to generate the at least one NFA is the at least one pattern excluding the selected sub-pattern, the at least one NFA is a single NFA, and the at least one traversal direction of the at least one NFA is a forward traversal direction.
48. The method of claim 47, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA associated with a last element of the selected sub-pattern with metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the generated at least one NFA, instructions for transitioning to traversing the at least one NFA in a forward traversal direction and reporting a match of the selected sub-pattern, a leading offset within the payload of a leading character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and a length of the selected sub-pattern, the starting node of the at least one NFA being associated with a first element of the at least one pattern used to generate the portion of the at least one NFA, a payload start offset of the at least one NFA is associated with a byte following another byte of the end offset of the selected sub-mode.
49. The method of claim 37, wherein generating the at least one NFA comprises:
associating an NFA node of the generated at least one NFA with metadata, the metadata indicating instructions to the walker to terminate the traversal and report a lag offset within a payload of a lag character matched at the NFA node as a final match of the at least one pattern and an end offset of the at least one pattern, the NFA node associated with a last element of the at least one pattern.
50. The method of claim 37, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and a leading portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding the selected sub-mode and the leading portion of the at least one mode, the leading portion of the at least one mode excluding the selected sub-mode and the lag portion of the at least one mode; and is
The at least one NFA includes a lagging NFA and a leading NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the leading NFA has the reverse traversal direction, the lagging portion of the at least one mode is used to generate the lagging NFA, and the leading portion of the at least one mode is used to generate the leading NFA.
51. The method of claim 50, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a starting node of the lagging NFA, instructions for transitioning to traverse the lagging NFA in the forward traversal direction, the starting node of the lagging NFA associated with a first element of the hysteresis section, a payload start offset of the lagging NFA associated with a byte following an end offset of the selected sub-pattern, a pointer to a starting node of the leading NFA, instructions for transitioning to traverse the leading NFA in the reverse traversal direction and reporting an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as the selected sub-pattern Instructions to determine a payload start offset of the preamble NFA based on the length of the selected sub-mode, the start node of the preamble NFA being associated with a last element of the preamble section, the payload start offset of the preamble NFA being determined by subtracting the length of the selected sub-mode from the end offset of the selected sub-mode.
52. The method of claim 50, wherein generating the at least one NFA comprises:
associating a lagging node of the lagging NFA with metadata, the lagging node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating traversing the generated lagging NFA and reporting a matching of a lagging offset within the payload of a lagging character of the payload and the lagging portion of the at least one pattern matching the last element at the lagging node; and is
Associating a leading node of the leading NFA with metadata associated with the first element of the at least one pattern, the metadata indicating instructions to the walker to terminate traversal of the generated leading NFA and report a match of the leading portion of the at least one pattern and report a leading offset within the payload of the leading character of the payload matching the first element at the leading node as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
53. The method of claim 37, wherein if a first element of the selected sub-pattern is not a first element of the at least one pattern and a last element of the selected sub-pattern is not a last element of the at least one pattern, a position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the first element of the selected sub-pattern is the first element of the at least one pattern, a position of the selected sub-pattern is a starting position of the at least one pattern, and if a length of the sub-pattern is fixed or variable:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and an entire portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding a leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween, the entire portion of the at least one mode being the at least one mode, the leading portion being the selected sub-mode if the position of the selected sub-mode is the starting position; and is
The at least one NFA includes a lagging NFA and an umbrella NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the umbrella NFA has the reverse traversal direction, the lagging portion of the at least one pattern is used to generate the lagging NFA, and the entire portion of the at least one pattern is used to generate the umbrella NFA.
54. The method of claim 53, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the unified DFA and the at least one NFA generated with a payload a pointer to a start node of the lagging NFA, instructions to transition to traverse the lagging NFA in the forward traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and report the length if the length of the selected sub-pattern is fixed, the start node of the lagging NFA associated with a first element of the lagging portion.
55. The method of claim 53, wherein generating the at least one NFA comprises:
associating a hysteresis node of the generated hysteretic NFA with metadata, the hysteresis node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a starting node of the umbrella NFA, instructions to transition to traverse the umbrella NFA in the reverse traversal direction and optionally report an offset within the payload of a character at the hysteresis node that matches the last element of the at least one pattern and optionally report a match of the hysteresis portion of the at least one pattern, the starting node of the umbrella NFA associated with the last element of the at least one pattern; and is
Associating umbrella nodes of the generated umbrella NFA with metadata associated with the first elements of the at least one pattern, the metadata indicating instructions to the walker to terminate the traversal and report a final match of the at least one pattern and report a start offset within the payload of a start character of the umbrella nodes that matches the first elements of the at least one pattern as a start offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
56. The method of claim 37, wherein if a first element of the selected sub-pattern is not a first element of the at least one pattern and a last element of the selected sub-pattern is not a last element of the at least one pattern, a position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the first element of the selected sub-pattern is the first element of the at least one pattern, a position of the selected sub-pattern is a starting position of the at least one pattern, and if a length of the sub-pattern is fixed or variable:
the portion of the at least one mode used to generate the at least one NFA includes a lag portion and a leading portion of the at least one mode, the lag portion of the at least one mode being the at least one mode excluding the leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween, the lag portion being the selected sub-mode if the position of the selected sub-mode is the starting position; and is
The at least one NFA includes a lagging NFA and a leading NFA, the at least one traversal direction includes a forward traversal direction and a reverse traversal direction, the lagging NFA has the forward traversal direction, the leading NFA has the reverse traversal direction, the lagging portion of the at least one mode is used to generate the lagging NFA, and the leading portion of the at least one mode is used to generate the leading NFA.
57. The method of claim 56, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the lagging NFA, instructions to transition to traverse the lagging NFA in the forward traversal direction, the starting node of the lagging NFA associated with a first element of the hysteresis section, a pointer to a starting node of the leading NFA, instructions to transition to traverse the leading NFA in the reverse traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and report the sub-pattern if the length of the selected sub-pattern is fixed Instructions of length, the starting node of the leading NFA associated with a last element of the selected sub-mode.
58. The method of claim 56, wherein generating the at least one NFA comprises:
associating a hysteresis node of the generated lagging NFA with metadata, the hysteresis node associated with the last element of the at least one pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating traversing the lagging NFA and reporting a hysteresis offset within the payload of a lagging character at the hysteresis node that matches the last element of the at least one pattern and reporting a match of the lagging portion of the at least one pattern; and is
Associating a leading node of the generated leading NFA with metadata associated with the first element of the at least one pattern, the metadata indicating instructions to the walker to terminate traversal of the leading NFA and report a match of the leading portion and a leading offset within the payload of a leading character of the leading node that matches the first element of the at least one pattern.
59. The method of claim 37, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed or variable:
the at least one NFA is a single NFA and the at least one traversal direction includes a forward traversal direction for runtime processing nodes of the single NFA associated with elements of a hysteresis portion of the at least one mode and a reverse traversal direction for runtime processing nodes of the single NFA associated with all elements of the at least one mode, the hysteresis portion of the at least one mode being the at least one mode excluding a leading portion of the at least one mode, the leading portion including the first element of the at least one mode, the last element of the selected sub-mode, and all elements in the at least one mode therebetween.
60. The method of claim 59, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA with metadata, the node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the single NFA, instructions to transition to traversing the generated single NFA in the forward traversal direction and reporting a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern, and reporting the length if the length of the selected sub-pattern is fixed, the start node is associated with a next element in the at least one pattern immediately following the last element of the selected sub-pattern.
61. The method of claim 59, wherein generating the at least one NFA comprises:
associating a hysteresis node of the single NFA with metadata associated with a last element of the at least one pattern, the metadata indicating instructions to a walker configured to traverse the unified DFA and the at least one NFA produced with a payload to traverse the single NFA in the reverse traversal direction with a payload start offset associated with an end offset of the selected sub-pattern; and is
Associating a leading node of the single NFA with metadata, the leading node associated with the first element of the at least one pattern, the metadata indicating to the walker instructions to terminate the traversal and report an offset within the payload of characters of the leading node that match the first element of the at least one pattern if required by a qualifier associated with the at least one pattern as a starting offset of the at least one pattern and a final match of the at least one pattern.
62. The method of claim 37, wherein if the first element of the selected sub-pattern is not the first element of the at least one pattern and the last element of the selected sub-pattern is not the last element of the at least one pattern, the position of the selected sub-pattern is an intermediate position of the at least one pattern, and if the length of the selected sub-pattern is fixed:
the at least one NFA is a single NFA, and the at least one traversal direction includes a reverse traversal direction for runtime processing nodes of the single NFA associated with a leading portion of the at least one pattern and a forward traversal direction for runtime processing nodes of the single NFA associated with all elements of the at least one pattern, the leading portion being the at least one pattern excluding a lagging portion of the at least one pattern, the lagging portion including the first element of the selected sub-pattern, the last element of the at least one pattern, and all elements of the at least one pattern in between.
63. The method of claim 62, wherein generating the unified DFA comprises:
associating a DFA node of the generated unified DFA with metadata, the node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload a pointer to a starting node of the single NFA, instructions to transition to traverse the generated single NFA in the reverse traversal direction and report a match of the selected sub-pattern, an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an ending offset of the selected sub-pattern and a length of the selected sub-pattern, the start node is associated with a last element of the preamble part, and a payload start offset is determined by subtracting a length of the selected sub-pattern from the end offset of the selected sub-pattern.
64. The method of claim 62, wherein generating the at least one NFA comprises:
associating a leading node of the single NFA with metadata, the leading node associated with a first element of the at least one pattern, the metadata indicating instructions to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload to traverse the single NFA in the forward traversal direction; and is
Associating a hysteresis node of the single NFA with metadata associated with the last element of the at least one pattern, the metadata indicating instructions to the walker to terminate the traversal and report an offset within the payload of characters of the hysteresis node that match the last element of the at least one pattern and a final match of the at least one pattern.
65. The method of claim 37, wherein if a last element of the selected sub-pattern is a last element of the at least one pattern, a position of the selected sub-pattern is an end position of the at least one pattern, and if a length of the selected sub-pattern is fixed, the portion of the at least one pattern used to generate the at least one NFA is the at least one pattern excluding the selected sub-pattern, and the at least one traversal direction is a reverse traversal direction.
66. The method of claim 65, wherein generating the unified DFA comprises:
associating a DFA node with metadata, the DFA node associated with the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a start node of the generated at least one NFA, instructions to transition to traverse the generated at least one NFA in a reverse traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern, the start node of the generated at least one NFA associated with a last element of the portion.
67. The method of claim 65, wherein generating the at least one NFA comprises:
associating an NFA node associated with a first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters of the NFA node that match the first element of the portion as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
68. The method of claim 37, wherein if a last element of the selected sub-pattern is a last element of the at least one pattern, a position of the selected sub-pattern is an end position of the at least one pattern, and if a length of the selected sub-pattern is variable or fixed, the portion of the at least one pattern used to generate the at least one NFA is the at least one pattern, and the at least one traversal direction is a reverse traversal direction.
69. The method of claim 68, wherein generating the unified DFA comprises:
associating a DFA node with metadata, the DFA node corresponding to the last element of the selected sub-pattern, the metadata indicating to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, a pointer to a start node of the generated at least one NFA, instructions to transition to traverse the generated at least one NFA in a reverse traversal direction and report a match of the selected sub-pattern and an offset within the payload of a character of the DFA node that matches the last element of the selected sub-pattern as an end offset of the selected sub-pattern and report the length if the length of the selected sub-pattern is fixed, the start node of the generated at least one NFA being associated with a last element of the selected sub-pattern, the start offset of the generated at least one NFA being related to the end offset of the selected sub-pattern And (4) connecting.
70. The method of claim 68, wherein generating the at least one NFA comprises:
associating an NFA node associated with a first element of the portion with metadata indicating, to a walker configured to traverse the generated unified DFA and the at least one NFA with a payload, instructions for terminating the traversal and reporting a final match of the at least one pattern and reporting an offset within the payload of characters of the NFA node that match the first element of the portion as a starting offset of the at least one pattern if a qualifier associated with the at least one pattern requires.
71. The method of claim 37, wherein storing the generated unified DFA and at least one NFA in the at least one memory comprises generating a binary image comprising the generated unified DFA and at least one NFA.
72. The method of claim 37, wherein the at least one processor comprises a DFA coprocessor and an NFA coprocessor configured as acceleration units to offload DFA and NFA runtime processing, respectively.
73. A non-transitory computer readable medium having stored thereon sequences of instructions which, when loaded and executed by a processor, cause the processor to:
selecting a sub-pattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic;
generating a unified Deterministic Finite Automaton (DFA) using the sub-patterns selected from all patterns in the set;
generating at least one non-deterministic finite automaton (NFA) for at least one pattern in the set, at least one traversal direction of the at least one pattern for generating the portion of the at least one NFA and for runtime processing of the at least one NFA being determined based on whether a length of the selected sub-pattern is fixed or variable and a position of the selected sub-pattern within the at least one pattern; and is
Storing the generated unified DFA and the at least one NFA in the at least one memory.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/015,248 US9426165B2 (en) | 2013-08-30 | 2013-08-30 | Method and apparatus for compilation of finite automata |
| US14/015,248 | 2013-08-30 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1208105A1 HK1208105A1 (en) | 2016-02-19 |
| HK1208105B true HK1208105B (en) | 2019-07-05 |
Family
ID=
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9426166B2 (en) | Method and apparatus for processing finite automata | |
| US9426165B2 (en) | Method and apparatus for compilation of finite automata | |
| US9419943B2 (en) | Method and apparatus for processing of finite automata | |
| US9602532B2 (en) | Method and apparatus for optimizing finite automata processing | |
| US10002326B2 (en) | Compilation of finite automata based on memory hierarchy | |
| US9904630B2 (en) | Finite automata processing based on a top of stack (TOS) memory | |
| CN107122221B (en) | compiler for regular expressions | |
| US9438561B2 (en) | Processing of finite automata based on a node cache | |
| US8990259B2 (en) | Anchored patterns | |
| US10110558B2 (en) | Processing of finite automata based on memory hierarchy | |
| US9823895B2 (en) | Memory management for finite automata processing | |
| HK1208105B (en) | Method and apparatus for compilation of finite automata | |
| HK1208103B (en) | Method and apparatus for processing finite automata | |
| Yu et al. | Fast packet pattern-matching algorithms | |
| HK1212119B (en) | Method and apparatus for processing of finite automata | |
| HK1193278A (en) | Compiler for regular expressions | |
| HK1193278B (en) | Compiler for regular expressions | |
| HK1211718A1 (en) | System and method to traverse nfa generated for regular expression patterns | |
| HK1214695B (en) | Compilation of finite automata based on memory hierarchy |