
WO2016175829A1 - Mapping nodes in a network - Google Patents

Mapping nodes in a network

Info

Publication number
WO2016175829A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
log files
log file
node
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2015/028485
Other languages
French (fr)
Inventor
Noam Fraenkel
Erez AGAMI
Efrat EGOZI LEVI
Ohad Assulin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Enterprise Development LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development LP
Priority to PCT/US2015/028485
Publication of WO2016175829A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Definitions

  • FIG. 1 is a block diagram of a computing device for mapping nodes within a computer network according to an example of the principles described herein.
  • FIG. 2 is a block diagram of a computing device for mapping nodes within a computer network according to another example of the principles described herein.
  • FIG. 3 is a flowchart showing a method of mapping nodes within a computer network according to another example of the principles described herein.
  • FIG. 4 is a flowchart showing a method of comparing log files according to another example of the principles described herein.
  • FIG. 5 is a flowchart showing a method of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network according to another example of the principles described herein.
  • a network administrator is generally responsible for the security and operation of a network.
  • the network administrator needs to understand risks to the network, the impact of proposed changes and what elements of the network may be causing issues. To accomplish this, the network administrator may need to know what hardware and software components make up the network.
  • the network and its components may be described in terms of nodes, each of which is a point capable of sending and receiving data.
  • a node may include hardware and/or software elements.
  • the network administrator may determine what network elements constitute each node. As the size of the network grows, keeping track of individual nodes may be challenging. This information defining nodes and their relationships is sometimes organized using only a simple spreadsheet. However, as the size of the network and the number of nodes increase, it may not be effective to track the nodes using only a spreadsheet.
  • configuration management databases (CMDBs)
  • a CMDB maintains data identifying and defining attributes of the nodes of a network.
  • the CMDB may also maintain data describing the relationships between nodes.
  • a number of rules may be created by the network administrator that allow the CMDB to proactively scan the network to recognize and define nodes.
  • running proactive scans of the network with a CMDB requires providing credentials to access the various elements of the network. This may be cumbersome to set up.
  • the CMDB requires ongoing maintenance of the rules governing nodes. These rules must be accurate, carefully defined, and up-to-date in order to provide a proper view of the nodes.
  • the present specification describes a computing device that does not use a CMDB and/or that does not require a network administrator to implement a spreadsheet in order to identify and manage the nodes in the network. Instead, the computing device may receive a number of log files from any number of nodes within a network and, based on the log files, identify each node in the network and the relationships between nodes. This will be described in more detail below.
  • the present specification describes a method of mapping nodes in a network.
  • the method may comprise operating a computing device that receives log files from nodes on a computer network, the computing device comparing the log files to identify log files from different nodes and a same node, and mapping the different nodes in the computer network based on the log files.
  • the present specification describes a computing device for automatically mapping nodes in a network.
  • the computing device may comprise a network interface for receiving log files from nodes on a network and at least one processor to implement a comparator and a node mapper, wherein the comparator electronically compares the log files received through the network interface and determines whether log files have originated from a same node or different nodes in the network, and wherein the mapper generates a map of the different nodes in the network based on the comparison of log files.
  • the present specification describes a non-transitory computer-readable storage medium comprising instructions to be executed by a processor of a device for mapping a number of nodes within a computer network.
  • the instructions when executed by the processor, may cause the processor to compare a plurality of log files received by the processor to identify log files from different nodes and a same node and map the different nodes in the computer network based on the log files.
  • node is meant to be understood as a point within a network, capable of sending or receiving data.
  • a node may be a device, such as a printer or workstation, a system, a router, a switch, or a storage location on a disk, among other networked computing devices.
  • a node may be a module within a computing device.
  • a single computing device may support multiple network nodes.
  • Fig. 1 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to an example of the principles described herein.
  • the computing device (105) may be implemented in any electronic computing device. Examples of electronic computing devices include servers, desktop computers, laptop computers, personal digital assistants (PDAs), mobile devices, smartphones, gaming systems, and tablets, among other electronic devices.
  • the computing device (105) may be utilized in any data processing scenario including stand-alone hardware, mobile applications, through a computer network, or combinations thereof. Further, the computing device (105) may be used in a computer network, a public cloud network, a private cloud network, a hybrid cloud network, other forms of networks, or combinations thereof. In one example, the methods provided by the computing device (105) are provided as a service over a network by, for example, a third party.
  • the service may comprise, for example, the following: a Software as a Service (SaaS) hosting a number of applications; a Platform as a Service (PaaS) hosting a computer platform comprising, for example, operating systems, hardware, and storage, among others; an Infrastructure as a Service (IaaS) hosting equipment such as, for example, servers, storage components, network, and components, among others; application program interface (API) as a service (APIaaS), other forms of network services, or combinations thereof.
  • the present methods may be implemented on one or multiple hardware platforms, in which the modules in the computing device (105) can be executed on one or across multiple platforms. Such modules can run on various forms of cloud technologies and hybrid cloud technologies or be offered as a SaaS (Software as a Service) that can be implemented on or off the cloud.
  • the methods provided by the computing device (105) are executed by a local administrator.
  • the computing device (105) may comprise various hardware components.
  • these hardware components may include at least one processor (110).
  • the processor (110) may include the hardware architecture to retrieve executable code from the data storage device and execute the executable code.
  • the executable code may, when executed by processor (110), cause the processor (110) to implement at least the functionality of comparing log files received and determining whether log files have originated from a same node (135) or different nodes (135) in the network (130) according to the methods of the present specification described herein.
  • the log files comprise information specific to each of the nodes (135) such as events that occur at the node or messages sent or received between nodes. Some of this information may also include a source identification value (source ID) that identifies a source of the log file.
  • any number of log files may have been received by the computing device (105) such that a comparison of a subsequent log file's source ID would require a comparison of that value to a myriad of other similar values associated with each log file received. Instead, the comparison is accomplished as described herein, in order to prevent the comparison of one to many.
  • the log file may also include a timestamp of when the log file was created, hardware installation dates, hardware identification information, software installation dates, software update information, current message throughput of a hardware device, a number of messages received by a hardware device, the time taken to address any number of types of messages received by a hardware device, among other data associated with each node.
  • Each log file may also comprise data that indicates whether a quality of service threshold is being reached, thereby describing the general health of the node. This information is used by the processor (110) to make the comparison described above. Specific examples of how the comparison is made will be described in more detail below.
  • the executable code may also, when executed by processor (110), cause the processor (110) to implement at least the functionality of generating a map of different nodes (135) in the network (130) based on the comparison of log files.
  • the map of nodes may be represented to a user of the computing device (105) on a display device via a number of graphical user interfaces (GUIs).
  • the node map may further show how each node is functionally related to each other node in the network (130).
  • the executable code may, when executed by processor (110), cause the processor (110) to determine if a plurality of nodes (135) have a functional relationship between them such that changes in the plurality of nodes affect how each of the nodes operates. Details of the functionality of the processor (110) will be described in more detail below.
  • the computing device (105) may further comprise a number of modules (115, 120) to achieve the methods described herein.
  • the various modules (115, 120) may comprise executable program code that may be executed separately.
  • the modules (115, 120) are executable program code stored on a data storage device (140) on the computing device (105).
  • the various modules (115, 120) may be stored as separate computer program products.
  • the various modules (115, 120) within the computing device (105) may be combined within a number of computer program products, each computer program product comprising a number of the modules (115, 120).
  • the modules (115, 120) may include a comparator (115) to, when executed by the processor (110), compare the log files received by the processor (110) from the network interface (125) to identify log files from different nodes (135) and a same node (135).
  • the comparison of the log files received allows the computing device to discover which nodes exist within the network (130).
  • a number of nodes (135) within the network (130) each send any number of log files to the computing device (105). As mentioned above, this data allows the comparator (115) to determine what nodes exist in the network by comparing data in the log files received.
  • the modules (115, 120) may also include a mapper (120) to map the different nodes operating on the network (130).
  • the mapper (120) receives data from the comparator (115) indicating whether a log file has originated from a new node that has not yet been discovered by the comparator (115), or from a same node (135) already identified by the comparator (115).
  • the mapper (120) may also receive information indicating the functional relationship among a plurality of nodes (135) in the network (130) and associate those nodes (135) as such.
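The division of labor between the comparator and the mapper can be illustrated with a minimal sketch. The class name `NodeMap`, its methods, and the node identifiers are hypothetical and do not come from the specification; the sketch only assumes that the comparator reports a node identifier per log file and, separately, pairs of functionally related nodes.

```python
class NodeMap:
    """Hypothetical stand-in for the state kept by the mapper (120)."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.links: set[frozenset[str]] = set()

    def record(self, node_id: str) -> bool:
        """Register a node; return True only if it was not known before."""
        is_new = node_id not in self.nodes
        self.nodes.add(node_id)
        return is_new

    def link(self, first: str, second: str) -> None:
        """Associate two nodes reported as functionally related."""
        self.record(first)
        self.record(second)
        self.links.add(frozenset((first, second)))


node_map = NodeMap()
# The third log file comes from a node that has already been identified.
results = [node_map.record(n) for n in ["web-1", "db-1", "web-1"]]
node_map.link("web-1", "db-1")
print(results)          # [True, True, False]
print(len(node_map.nodes), len(node_map.links))  # 2 1
```

Storing each link as a `frozenset` makes the relationship undirected, which is one possible reading of "functionally related"; a directed relationship would need ordered pairs instead.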
  • the computing device (105) may further comprise a data storage device (140).
  • the data storage device (140) may store data such as executable program code that is executed by the processor (110) or other processing device such as the comparator (115) and the mapper (120).
  • the data storage device (140) may also store computer code representing a number of applications that the processor (110) executes to implement at least the functionality described herein.
  • the data storage device (140) may include various types of memory modules, including volatile and nonvolatile memory.
  • the data storage device (140) of the present example includes Random Access Memory (RAM), Read Only Memory (ROM), and Hard Disk Drive (HDD) memory.
  • Many other types of memory may also be utilized, and the present specification contemplates the use of many varying type(s) of memory in the data storage device (140) as may suit a particular application of the principles described herein.
  • different types of memory in the data storage device (140) may be used for different data storage needs.
  • the processor (110) may boot from Read Only Memory (ROM), maintain nonvolatile storage in the Hard Disk Drive (HDD) memory, and execute program code stored in Random Access Memory (RAM).
  • the data storage device (140) may comprise a computer readable medium, a computer readable storage medium, or a non-transitory computer readable medium, among others.
  • the data storage device (140) may be, but is not limited to, a system, apparatus, or device implementing electronic, magnetic, optical, electromagnetic, infrared, or semiconductor principles, or combinations of the foregoing.
  • a computer readable storage medium may include, for example, the following: an electrical connection having a number of wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store computer usable program code for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable storage medium may be any non-transitory medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Fig. 2 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to another example of the principles described herein. Similar to Fig. 1, the computing device (105) includes a comparator (115). In this example, the comparator (115) may further comprise a characteristics generator (205) and a signature generator (210) to identify log files from different nodes and a same node.
  • the characteristics generator (205) may generate a vector of characteristics from each log file.
  • the vector of characteristics represents a plurality of characteristics of a corresponding log file.
  • the vector of characteristics comprises a number of values that describe, in this case, features of the log file.
  • the values in the vector of characteristics may be arranged in any order and may comprise any number of dimensional values.
  • the data within each log file is described in the same order of dimensional values so that the signature generator (210) can easily compare a signature derived from the vectors of characteristics to vectors of characteristics derived from new log files received by the computing device (105).
  • a vector of characteristics comprises three groups of dimensional values: numerical values associated with the frequency of any given term that appears in the log file; a numerical value of the compression ratio of the log file, based on the size of the log file in an unsampled state relative to the size of the log file after being subjected to a compression algorithm; and numerical values identifying a number of text patterns within the log file.
  • each of these three groups will be described in more detail below.
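The fixed ordering of the three groups can be pictured with a small sketch. The class name `CharacteristicsVector`, its field names, and the sample values are hypothetical; the only property taken from the text is that every log file is described in the same order of dimensions so that vectors remain comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacteristicsVector:
    """Hypothetical container for the three groups of dimensional values."""

    term_values: tuple[float, ...]     # term-frequency dimensions
    compression_ratio: float           # single compression-ratio dimension
    pattern_counts: tuple[float, ...]  # text-pattern dimensions

    def as_list(self) -> list[float]:
        """Flatten the groups in one fixed order shared by all log files."""
        return [*self.term_values, self.compression_ratio, *self.pattern_counts]


vec = CharacteristicsVector((2.0, 0.0, 1.0), 0.164, (3.0, 0.0))
flat = vec.as_list()
print(flat)  # [2.0, 0.0, 1.0, 0.164, 3.0, 0.0]
```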
  • a parent node may include a number of child nodes about which the parent node relays information to the computing device (105).
  • the log files of each child node may be included with the log files of the parent node forming a group of log files.
  • the computing device (105) may analyze the text of the group of log files from this group of nodes before analyzing the text of each individual node in the group.
  • the vector of characteristics may comprise a first group of values associated with the frequency of any given term that appears in the log file.
  • Log files comprise human readable text that comprises a number of words that help to describe the type and behavior of each of the nodes. These words may appear multiple times in the text and may indicate what type of node the log file originates from, or at least may help to identify the node when two log files are compared later.
  • "bag-of-words" (BOW) modeling and "term frequency-inverse document frequency" (TF-IDF) modeling may be implemented to determine which words appear, and how often, in the log file.
  • in bag-of-words modeling, the text within the log file is represented as the "bag" or multiset of its words, disregarding grammar and even word order.
  • Each distinct word within the log file may be counted and a number may be assigned defining how many times each of those distinct words appears in the log file.
  • Each distinct word itself may be assigned a certain dimension in the vector of characteristics to which the value
  • the number of dimensions used in this category may be around 30,000. These values form part of the vector of characteristics and may be saved on the data storage device (140) to be later combined with the other dimensional values in the vector of characteristics and for future comparison with other log files.
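As a minimal sketch of the bag-of-words step, the following counts each distinct word and places each count at the dimension assigned to that word. The tokenizer, the function names, and the six-word vocabulary are assumptions made for illustration; a real vocabulary would be far larger, as noted above.

```python
import re
from collections import Counter


def bag_of_words(log_text: str) -> Counter:
    """Count each distinct word, ignoring grammar and word order."""
    words = re.findall(r"[a-z0-9_]+", log_text.lower())
    return Counter(words)


def to_dimensions(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Place each count at the fixed dimension assigned to its word."""
    return [counts.get(word, 0) for word in vocabulary]


sample = "Queue was created. Queue was deleted. Connection opened."
counts = bag_of_words(sample)
# Hypothetical vocabulary; its order fixes the dimension of each word.
vocabulary = ["queue", "was", "created", "deleted", "connection", "timeout"]
vector = to_dimensions(counts, vocabulary)
print(vector)  # [2, 2, 1, 1, 1, 0]
```

Words outside the vocabulary (here, "opened") are simply dropped, and vocabulary words absent from the log file contribute a zero.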
  • TF-IDF is a numerical value that is intended to reflect how important a word is to a log file.
  • the TF-IDF provides a weighting value that describes how important any word is in the log file.
  • the TF-IDF value increases proportionally to the number of times a word appears in the log file. However, this value is offset by the frequency of the word across log files generally, which helps to adjust for the fact that some words appear more frequently in log files in general.
  • These weighting values describing the importance of each word in the log file may also be added to the number of dimensional values in the vector of characteristics. As mentioned above, the number of dimensions resulting from the bag-of-words (BOW) modeling and TF-IDF modeling may exceed 30,000.
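The weighting can be sketched with one common textbook variant of TF-IDF, in which the term frequency is the word's share of the log file and the inverse document frequency is the logarithm of the number of log files divided by the number of log files containing the word. The specification does not state which variant is used, so the formula, the function name `tf_idf`, and the sample logs are assumptions.

```python
import math
from collections import Counter


def tf_idf(term_counts: Counter, doc_freq: dict, num_logs: int) -> dict:
    """Weight each term of one log file against a collection of log files."""
    total = sum(term_counts.values())
    weights = {}
    for term, count in term_counts.items():
        tf = count / total                              # share within this log
        idf = math.log(num_logs / doc_freq.get(term, 1))  # rarity across logs
        weights[term] = tf * idf
    return weights


# Hypothetical, already tokenized log files.
logs = [
    ["queue", "was", "created"],
    ["queue", "was", "deleted"],
    ["disk", "was", "full", "disk"],
]
# Number of log files in which each term appears at least once.
doc_freq = Counter(term for log in logs for term in set(log))
weights = tf_idf(Counter(logs[2]), doc_freq, len(logs))
print(weights["was"])                    # 0.0
print(weights["disk"] > weights["full"])  # True
```

The word "was" appears in every log file, so its weight is zero, while "disk" is both frequent in the third log file and absent from the others, so it receives the largest weight.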
  • compression ratio value is a numerical value indicating the compression ratio of the log file based on the size of the log file in an unsampled state to the size of the log file that is subjected to a compression algorithm. This is a single value that occupies a single dimension in the vector of characteristics.
  • each log file may be subjected to a sampling process that reduces the volume of text within the log file but does not lose any information necessary to properly identify the log file. This may be done to improve performance by not subjecting the processor (110) to an increase in processing steps. This sampling is optional, however, and the methods described herein may use the complete log files and may also implement multiple processors to achieve relatively quick and easy processing of the log files.
  • the compression ratio value is computed using the unsampled log file.
  • the processor may refer back to the original copy of the log file to compute the compression ratio value.
  • the characteristics generator may divide the size of the unsampled log file by the size of the log file after it was subjected to a compression algorithm.
  • a compression algorithm encodes log files using fewer memory bits than the original log file.
  • a log file from a first source may have a compression ratio of 0.164 while a log file from a second source may have a compression ratio of 0.169.
  • This compression value is added to a dimension in the vector of characteristics along with the BOW and TF-IDF values described above.
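A minimal sketch of the compression ratio value follows. It uses DEFLATE via Python's `zlib` module purely as a stand-in, since the specification does not name a compression algorithm, and it divides the unsampled size by the compressed size as described above; the reciprocal carries the same information and yields values below 1. The function name and sample log line are hypothetical.

```python
import zlib


def compression_ratio(log_bytes: bytes) -> float:
    """Return unsampled size divided by compressed size (assumed convention)."""
    if not log_bytes:
        raise ValueError("log file is empty")
    compressed = zlib.compress(log_bytes, 9)  # DEFLATE as a stand-in algorithm
    return len(log_bytes) / len(compressed)


# Highly repetitive text compresses well, so the ratio is large.
repetitive = b"2015-04-30 INFO Queue was created\n" * 200
ratio = compression_ratio(repetitive)
print(ratio > 10)  # True
```

Because the ratio reflects how repetitive a node's log text is, log files from the same node tend to produce similar values, which is what makes a single dimension useful for comparison.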
  • the vector of characteristics may further comprise numerical values identifying a number of text patterns within the log file.
  • those words may further comprise a number of text patterns indicating how those words are arranged in the text of the log file.
  • the size of the patterns may vary.
  • the characteristics generator (205) may identify a number of times a pattern of text appears in the log file. For example, where the text in the log file reads "Queue was created," that may be indicative of a specific text pattern and the characteristics generator (205) may count the number of times that phrase appears in the log file. This may be done for any phrase.
  • each text pattern may be assigned a text pattern ID as well as a dimension in the vector of characteristics.
  • the number of dimensional values in the vector of characteristics describing a text pattern may be in excess of 3,000.
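Counting text patterns can be sketched as follows, assuming the patterns are already known phrases keyed by a text pattern ID. How the patterns are discovered is not covered here; the function name, the case-insensitive matching, and the sample phrases other than "Queue was created" are assumptions.

```python
def count_patterns(log_text: str, patterns: dict[int, str]) -> dict[int, int]:
    """Count non-overlapping occurrences of each pattern, keyed by pattern ID."""
    lowered = log_text.lower()
    return {pid: lowered.count(phrase.lower()) for pid, phrase in patterns.items()}


# Hypothetical pattern IDs; each ID also names a dimension in the vector.
patterns = {0: "Queue was created", 1: "Connection refused"}
log_text = "Queue was created\nQueue was created\nconnection refused\n"
pattern_counts = count_patterns(log_text, patterns)
print(pattern_counts)  # {0: 2, 1: 1}
```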
  • the characteristics generator (205) may further comprise a signature generator (210) that assigns a unique signature to the vector of characteristics.
  • a signature generator (210) may create a unique signature for each log file by applying a machine learning algorithm to the plurality of vectors of characteristics.
  • a machine learning algorithm applies a number of processes to learn which log files are different from the others.
  • the signature generator (210) may learn from and make predictions on data using such algorithms by building a model from example vectors of characteristics of individual log files in order to make data-driven decisions on which log files originated from which nodes.
  • the signature generator (210), implementing the machine learning algorithm, receives a set of such vectors of characteristics, where each vector of characteristics is already associated with the source ID, in order to train (statistically learn) a signature model.
  • methods of machine learning create a signature which is a mathematical function that can be applied on a vector of characteristics and output a signature unique to that node of the log file.
  • a new log file received after the training method above may be transformed into a vector of characteristics as described above.
  • the model may be applied to the vector of characteristics of the new log file. This provides two outputs: the signature of the node to which the log file is most similar and a confidence value indicating a confidence level of the decision that the new log file belongs to that node.
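The two outputs can be illustrated with a deliberately simple stand-in model. The specification does not name a machine learning algorithm; the sketch below substitutes a nearest-centroid classifier with cosine similarity, which is an assumption chosen only because it fits in a few lines. The source IDs and vectors are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Learn one centroid (the stand-in 'signature') per source ID."""
    return {
        source_id: [sum(column) / len(vectors) for column in zip(*vectors)]
        for source_id, vectors in labeled.items()
    }


def classify(model: dict[str, list[float]], vector: list[float]) -> tuple[str, float]:
    """Return the most similar node and a confidence value for that decision."""
    best = max(model, key=lambda source_id: cosine(model[source_id], vector))
    return best, cosine(model[best], vector)


# Hypothetical training vectors already associated with a source ID.
training = {
    "node-a": [[5.0, 0.0, 1.0], [4.0, 1.0, 1.0]],
    "node-b": [[0.0, 6.0, 2.0], [1.0, 5.0, 2.0]],
}
model = train(training)
signature, confidence = classify(model, [5.0, 1.0, 1.0])
print(signature, round(confidence, 3))
```

Here the similarity score doubles as the confidence value; a probabilistic classifier would supply a calibrated probability instead.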
  • the signature may be used to compare to a number of vectors of characteristics in a similar way to determine whether any other log files have originated from the same node or one of a number of different nodes within a computer network.
  • a minimal similarity threshold may be set such that where the minimal similarity threshold between a signature and a vector of characteristics is reached, it is determined that the two nodes from which the log files originated are in fact the same node.
  • the threshold of minimal similarity may be based on all the log files of the same node.
  • a plurality of log files comprising the same source ID may be each converted into a vector of characteristics and the machine learning algorithm may be used to create the signature for that node based on those log files known to originate from the same node.
  • a confidence value may be produced indicating a confidence of what node each of those log files originated from.
  • the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within, for example, plus or minus one standard deviation. In another example, the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within plus or minus a factor of the standard deviation.
  • the node associated with that subsequent log file is marked as a new node connected to the network.
  • where the threshold of minimal similarity is less than plus or minus three standard deviations of the average confidence value, the node associated with that subsequent log file is marked as an existing node connected to the network.
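One way to read the thresholding above is as a band around the average confidence value of a node's known log files. The sketch below takes the lower edge of that band, the mean minus a factor `k` of the standard deviation, as the cut-off; that reading, the function names, and the sample confidence values are assumptions rather than the specification's exact rule.

```python
from statistics import mean, stdev


def similarity_threshold(confidences: list[float], k: float = 1.0) -> float:
    """Lower edge of the band: mean minus k sample standard deviations."""
    return mean(confidences) - k * stdev(confidences)


def is_existing_node(confidence: float, confidences: list[float], k: float = 1.0) -> bool:
    """Treat a new log file as coming from a known node if it clears the cut-off."""
    return confidence >= similarity_threshold(confidences, k)


# Hypothetical confidence values of log files known to share one source ID.
known = [0.95, 0.97, 0.93, 0.96, 0.94]
print(round(similarity_threshold(known), 4))   # 0.9342
print(is_existing_node(0.94, known))           # True
print(is_existing_node(0.80, known))           # False
print(is_existing_node(0.91, known, k=3.0))    # True
```

Raising `k` widens the band, so more log files are attributed to existing nodes and fewer nodes are marked as new.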
  • the signature of the new node may be used to improve the quality of all signatures learned by the signature generator (210).
  • the new data derived from the new signature may be used to further differentiate individual nodes from each other using the machine learning algorithm.
  • the vector of characteristics and the signature generated may be used by the signature generator (210) to identify that existing node when a log file is received from that node again.
  • the threshold of similarity may be adjusted such that a relatively lower threshold of similarity is used thereby allowing the computing device (105) to define characteristics of each node with more specificity.
  • the computing device of Fig. 2 also shows a linked node module (215), a number of peripheral device adapters (225), and a display device (240). These will now be described in more detail.
  • the linked node module (215) may discover which nodes among the number of nodes in the network are functionally linked together. To accomplish this, a time series pattern for each node may be created, thereby capturing the frequency of the log files generated by the node. The behavior of each node may then be captured and analyzed. Specifically, the "normal" and "abnormal" behavior of each node is captured. A correlation between the normal and abnormal behavior patterns is identified among each of the nodes to determine which nodes in the network are behaving similarly. The identification of the normal and abnormal behaviors may be accomplished using, for example, a predictive analytics tool. In one example, "abnormal" behavior may include a rare event such as where a certain node generates log files more often than that node had generated in the past.
  • normal behavior may include log generations where the number of log files generated by any given node over a period of time does not deviate from a certain level of activity.
  • the determination as to whether the behavior of any given node is "normal" or "abnormal" may depend on a determination as to whether the activity of that node is similar to activity in the past.
  • a threshold time period may be used to determine whether activity from a certain node is normal or abnormal.
  • a threshold number of log files received by the processor may be used to make this determination.
  • both a threshold time period and a threshold number of log files received may be used to determine whether the activity of a given node is or is not "normal."
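The linking step can be sketched by flagging abnormal time windows per node and then correlating the flags between nodes. The specification mentions a predictive analytics tool without naming one, so the deviation rule (more than `k` standard deviations from the node's own mean), the Pearson correlation, and the per-window log counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def abnormal_flags(counts: list[int], k: float = 2.0) -> list[int]:
    """Mark windows whose log count deviates strongly from the node's own past."""
    mu, sigma = mean(counts), pstdev(counts)
    return [1 if sigma and abs(c - mu) > k * sigma else 0 for c in counts]


def pearson(xs: list[int], ys: list[int]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


# Hypothetical log files generated per time window by three nodes.
node_a = [10, 11, 9, 10, 40, 10, 11, 10]
node_b = [5, 5, 6, 5, 22, 5, 4, 5]
node_c = [7, 8, 7, 30, 7, 8, 7, 7]

flags_a, flags_b, flags_c = (abnormal_flags(n) for n in (node_a, node_b, node_c))
print(pearson(flags_a, flags_b) > 0.99)  # True: bursts coincide
print(pearson(flags_a, flags_c) < 0)     # True: bursts do not coincide
```

Nodes whose abnormal windows coincide become candidates for a functional link, which a user can then validate.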
  • the nodes discovered by the linked node module (215) to be linked together are presented to a user via a graphical user interface presented on the display device (240) associated with the computing device (105).
  • the user may assess and validate whether any node so indicated by the linked node module (215) is linked to another node and how it is linked.
  • the text patterns of the unique signature created by the node identification module (115) may also be presented to the user for help in the assessment and validation process.
  • the user may also be presented with information describing how each of the nodes interacts based upon each node's role. With this information, the user may validate the functional or physical connection between two nodes.
  • the peripheral device adapters (225) may provide an interface to input/output devices, such as, for example, the display device (240), a mouse, or a keyboard.
  • the peripheral device adapters (225) may also provide access to other external devices such as an external storage device, a number of network devices such as, for example, servers, switches, and routers, client devices, other types of computing devices, and combinations thereof.
  • the display device (240) may be provided to allow a user of the computing device (105) to interact with and implement the functionality of the computing device (105) as described above.
  • the peripheral device adapters (225) may also create an interface between the processor (110) and the display device (240), a printer, or other media output devices.
  • the network interface (125) may provide an interface to other computing devices (135) within, for example, a network (130), thereby enabling the transmission of data between the computing device (135) and other devices located within the network (245).
  • the network interface (125) may provide an interface between the computing device (105) and each node communicatively coupled to the network (130) so that each node (135) may send to the computing device (105) its log files.
  • the computing device (105) may then receive those log files via the network interface (125).
  • Figs. 1 and 2 show a single network and a single node (135) communicatively coupled to the network (130)
  • the present specification contemplates that the network interface (125) communicatively couples the computing device (105) to any number of networks (130), nodes (135), or networks (130) including any number of nodes (135).
  • mapper (120) receives data from the comparator (115) describing what nodes (135) are present in the network (130) as well as information indicating the functional relationship among a plurality of nodes (135) in the network (130).
  • the mapper (120) may cause the display device (240), via the processor, to display the number of graphical user interfaces (GUIs) on the display device (240) showing this data.
  • GUIs may include aspects of the executable code including the display of the nodes discovered by the linked node module (215) to be linked together.
  • Examples of display devices (240) include a computer screen, a laptop screen, a mobile device screen, a personal digital assistant (PDA) screen, and a tablet screen, among other display devices (240).
  • Fig. 3 is a flowchart showing a method (300) of mapping nodes within a computer network according to one example of the principles described herein.
  • the method (300) may begin with operating (305) a computing device that receives log files from nodes on a computer network. As described above, any number of log files may be received by the computing device (105) from any number of nodes.
  • a node may send to the computing device a plurality of log files each of which describe different aspects of the node.
  • the method (300) may continue with the computing device comparing (310) the log files to identify log files from different nodes and a same node. As described above, this is done by generating a vector of characteristics for each log file, learning which log files are different from the others via a machine learning algorithm, and creating a signature for the node based on the vector of characteristics. The learned signatures can then be compared with vectors of characteristics from newly received log files to identify log files from different nodes and same nodes.
  • the method (300) may then continue with the computing device mapping (315) the different nodes in the computer network based on the log files.
  • the mapping (315) may be displayed on a display device (240) associated with the computing device (105).
  • Fig. 4 is a flowchart showing a method (400) of comparing log files according to another example of the principles described herein.
  • the method (400) may begin with generating (405) a vector of characteristics from each log file, the vector of characteristics representing a plurality of
  • the vector of characteristics comprises, at least, a word frequency value, a compression ratio value of the log file, and a text pattern value. The process of how these values are obtained is described above.
  • the method (400) may continue with generating (410) a signature from each vector of characteristics.
  • the signature is generated by implementing a machine learning algorithm as described above.
  • the method (400) may further include comparing (415) a signature from one of the vector of characteristics with a vector of
  • This process allows the log files to be compared to a signature so that a source ID from each individual log file received does not have to be compared to a source ID for each log file already received by the computing device (105).
  • Fig. 5 is a flowchart showing a method (500) of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network.
  • the method (500) may begin with creating (505) a number of time series patterns for each mapped node based on the frequency of log files received from each node.
  • the time series pattern may describe a number of log files received from a mapped node within a predefined period of time.
  • the method (500) may continue with determining (510) whether a correlation among any time series patterns for any node exists. As described above, a correlation exists where normal and abnormal behavior of any two nodes are similar.
  • the method (500) may then continue with presenting (515), to a user of a computing device, any detected correlation of time series pattern.
  • the presentation may be made to a user via a display device (240) on the computing device (105).
  • a user may then verify that a correlation exists based on the information presented.
  • the computer usable program code may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer usable program code, when executed via, for example, the processor (110) of the computer (105) or other programmable data processing apparatus, implements the functions or acts specified in the flowchart and/or block diagram block or blocks.
  • the computer usable program code may be embodied within a computer readable storage medium; the computer readable storage medium being part of the computer program product.
  • the computer readable storage medium is a non-transitory computer readable medium.
  • mapping nodes within a computer network uses the log files received from a number of computing devices coupled to the network. This alleviates a network administrator from having to maintain an ever increasing spreadsheet file to keep track of nodes attached to the network. Additionally, the system also relieves the network administrator from having to gain credentials in order to gain access to each node in the network and retrieve information about the nodes. Instead, the system passively accepts and analyzes log files without requiring access credentials to each node. Still further, the system does not interface with a siloed network that could have difficulties integrating with a CMDB.
  • a computing device implementing the comparator may receive any number of log files from any number of nodes and determine, based on the characteristics derived from the log files and what type of node those log files came from, whether each log file did or did not come from the same node.
  • a computing device implementing the linked node module may also determine whether the receipt of log files from any plurality of nodes is normal or abnormal and, based on that information, determine whether there is a functional link among a plurality of nodes that may affect the performance among the nodes in the computer network.
  • each log file may comprise a source ID
  • a one to many comparison would have to be completed by comparing a source ID of a new log file to each and every log file received previously by the computing device.
  • a machine learning algorithm may be used to train the computing device to recognize same or different nodes and assign a signature to new nodes when discovered.
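The time series steps of method (500) summarized in the list above can be illustrated with a small sketch. It assumes log file arrival times are available as integer seconds, uses fixed-width buckets for the "predefined period of time," flags buckets above a threshold count as abnormal, and uses Pearson correlation as a stand-in for the correlation test; the specification does not name a particular correlation measure, and all function names here are hypothetical.

```python
from math import sqrt

def time_series(timestamps, bucket_seconds, start, end):
    """Count log files received per fixed-width time bucket in [start, end)."""
    n_buckets = (end - start + bucket_seconds - 1) // bucket_seconds
    counts = [0] * n_buckets
    for t in timestamps:
        if start <= t < end:
            counts[(t - start) // bucket_seconds] += 1
    return counts

def abnormal_buckets(counts, max_expected):
    """Indices of buckets whose log file count exceeds a threshold (assumed rule)."""
    return [i for i, c in enumerate(counts) if c > max_expected]

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical arrival times (seconds) for one node, bucketed into 10 s windows.
node_a = time_series([0, 5, 12, 13, 14, 31], bucket_seconds=10, start=0, end=40)
```

Two nodes whose series yield a correlation near 1 would be candidates for a functional link, to be presented to the user for validation rather than accepted automatically.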

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Examples relate to mapping nodes in a network. In an example, a method of mapping nodes in a network may comprise operating a computing device that receives log files from nodes on a computer network, the computing device comparing the log files to identify log files from different nodes and a same node, and mapping the different nodes in the computer network based on the log files. In another example, a computing device may map nodes in a network by receiving log files from nodes on a network and, with at least one processor, implementing a comparator and node mapper, wherein the comparator electronically compares log files received and determines whether log files have originated from a same node or different nodes in the network, and wherein the mapper generates a map of nodes in the network based on the comparison of log files through the network interface.

Description

MAPPING NODES IN A NETWORK
BACKGROUND
[0001] To manage a computer network, a network administrator may need to know what hardware and software components make up the network. Often a network is described in terms of nodes, each of which may be a point capable of sending and receiving data. A node may include hardware and/or software elements.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The accompanying drawings illustrate various examples of the principles described herein and are a part of the specification. The examples are given merely for illustration, and do not limit the scope of the claims.
[0003] Fig. 1 is a block diagram of a computing device for mapping nodes within a computer network according to an example of the principles described herein.
[0004] Fig. 2 is a block diagram of a computing device for mapping nodes within a computer network according to another example of the principles described herein.
[0005] Fig. 3 is a flowchart showing a method of mapping nodes within a computer network according to another example of the principles described herein.
[0006] Fig. 4 is a flowchart showing a method of comparing log files according to another example of the principles described herein.
[0007] Fig. 5 is a flowchart showing a method of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network according to another example of the principles described herein.
[0008] Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
DETAILED DESCRIPTION
[0009] A network administrator is generally responsible for the security and operation of a network. The network administrator needs to understand risks to the network, the impact of proposed changes and what elements of the network may be causing issues. To accomplish this, the network administrator may need to know what hardware and software components make up the network. As noted above, the network and its components may be described in terms of nodes, each of which is a point capable of sending and receiving data. A node may include hardware and/or software elements.
[0010] The network administrator may determine what network elements constitute each node. As the size of the network grows, keeping track of individual nodes may be challenging. This information defining nodes and their relationships is sometimes organized using only a simple spreadsheet. However, as the size of the network and the number of nodes increase, it may not be effective to track the nodes using only a spreadsheet.
[0011] To deal with the volume of information for larger networks, network administrators have implemented configuration management databases (CMDBs). A CMDB maintains data identifying and defining attributes of the nodes of a network. The CMDB may also maintain data describing the relationships between nodes. A number of rules may be created by the network administrator that allows the CMDB to proactively scan the network to recognize and define nodes.
[0012] However, there may be challenges to integrate a CMDB into a network. Additionally, running proactive scans of the network with a CMDB requires providing credentials to access the various elements of the network. This may be cumbersome to set up. Further, the CMDB requires ongoing maintenance of the rules governing nodes. These rules must be accurate, carefully defined, and up-to-date in order to provide a proper view of the nodes.
[0013] The present specification describes a computing device that does not use a CMDB and/or that does not require a network administrator to implement a spreadsheet in order to identify and manage the nodes in the network. Instead, the computing device may receive a number of log files from any number of nodes within a network and, based on the log files, identify each node in the network and the relationships between nodes. This will be described in more detail below.
[0014] In one example, the present specification describes a method of mapping nodes in a network. The method may comprise operating a computing device that receives log files from nodes on a computer network, the computing device comparing the log files to identify log files from different nodes and a same node, and mapping the different nodes in the computer network based on the log files.
[0015] In another example, the present specification describes a computing device for automatically mapping nodes in a network. The computing device may comprise a network interface for receiving log files from nodes on a network and at least one processor to implement a comparator and node mapper wherein the comparator electronically compares log files received and determines whether log files have originated from a same node or different nodes in the network and wherein the mapper generates a map of the different nodes in the network based on the comparison of log files through the network interface.
[0016] In yet another example, the present specification describes a non-transitory computer-readable storage medium comprising instructions executable by a processor of a device for mapping a number of nodes within a computer network. The instructions, when executed by the processor, may cause the processor to compare a plurality of log files received by the processor to identify log files from different nodes and a same node and map the different nodes in the computer network based on the log files.
[0017] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present systems and methods. It will be apparent, however, to one skilled in the art that the present apparatus, systems and methods may be practiced without these specific details. Reference in the specification to "an example" or similar language means that a particular feature, structure, or characteristic described in connection with that example is included as described, but may not be included in other examples.
[0018] As used in the present specification and in the appended claims, the term "node" is meant to be understood as a point within a network, capable of sending or receiving data. In one example, a node may be a device, such as a printer or workstation, a system, a router, a switch, or a storage location on a disk, among other networked computing devices. In another example, a node may be a module within a computing device. A single computing device may support multiple network nodes.
[0019] Additionally, as used in the present specification and in the appended claims, the term "a number of" or similar language is meant to be understood broadly as any number greater than or equal to one.
[0020] Turning now to the figures, Fig. 1 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to an example of the principles described herein. The computing device (105) may be implemented in any electronic computing device. Examples of electronic computing devices include servers, desktop computers, laptop computers, personal digital assistants (PDAs), mobile devices, smartphones, gaming systems, and tablets, among other electronic devices.
[0021] The computing device (105) may be utilized in any data processing scenario including stand-alone hardware, mobile applications, through a computer network, or combinations thereof. Further, the computing device (105) may be used in a computer network, a public cloud network, a private cloud network, a hybrid cloud network, other forms of networks, or combinations thereof. In one example, the methods provided by the computing device (105) are provided as a service over a network by, for example, a third party. In this example, the service may comprise, for example, the following: a Software as a Service (SaaS) hosting a number of applications; a Platform as a Service (PaaS) hosting a computer platform comprising, for example, operating systems, hardware, and storage, among others; an Infrastructure as a Service (IaaS) hosting equipment such as, for example, servers, storage components, network, and components, among others; application program interface (API) as a service (APIaaS), other forms of network services, or combinations thereof. The present methods may be implemented on one or multiple hardware platforms, in which the modules in the computing device (105) can be executed on one or across multiple platforms. Such modules can run on various forms of cloud technologies and hybrid cloud technologies or offered as a SaaS (Software as a service) that can be implemented on or off the cloud. In another example, the methods provided by the computing device (105) are executed by a local administrator.
[0022] To achieve its desired functionality, the computing device (105) may comprise various hardware components. Among these hardware components may be a processor (e.g., at least one processor (110)). The processor (110) may include the hardware architecture to retrieve executable code from the data storage device and execute the executable code.
[0023] The executable code may, when executed by processor (110), cause the processor (110) to implement at least the functionality of comparing log files received and determining whether log files have originated from a same node (135) or different nodes (135) in the network (130) according to the methods of the present specification described herein. The log files comprise information specific to each of the nodes (135) such as events that occur at the node or messages sent or received between nodes. Some of this information may also include a source identification value (source ID) that identifies a source of the log file. Although a source ID is present on each log file, any number of log files may have been received by the computing device (105) such that a comparison of a subsequent log file's source ID would require a comparison of that value to a myriad of other similar values associated with each log file received. Instead, the comparison is accomplished as described herein, in order to avoid a one-to-many comparison.
[0024] The log file may also include a timestamp of when the log file was created, hardware installation dates, hardware identification information, software installation dates, software update information, current message throughput of a hardware device, a number of messages received by a hardware device, the time taken to address any number of types of messages received by a hardware device, among other data associated with each node. Each log file may also comprise data that indicates whether a quality of service threshold is being reached thereby describing the general health of the node. This information is used by the processor (110) to make the comparison described above. Specific examples of how the comparison is made will be described in more detail below.
[0025] The executable code may also, when executed by processor (110), cause the processor (110) to implement at least the functionality of generating a map of different nodes (135) in the network (130) based on the comparison of log files. In one example, the map of nodes may be represented to a user of the computing device (105) on a display device via a number of graphical user interfaces (GUIs). In one example, in addition to showing that the different nodes exist, the node map may further show how each node is functionally related to each node in the network (130). In this example, the executable code may, when executed by processor (110), cause the processor (110) to determine if a plurality of nodes (135) have a functional relationship between them such that changes in the plurality of nodes affect how each of the nodes operates. Details of the functionality of the processor (110) will be described in more detail below.
[0026] The computing device (105) may further comprise a number of modules (115, 120) to achieve the methods described herein. The various modules (115, 120) may comprise executable program code that may be executed separately. In one example, the modules (115, 120) are executable program code stored on a data storage device (140) on the computing device (105). In one example, the various modules (115, 120) may be stored as separate computer program products. In another example, the various modules (115, 120) within the computing device (105) may be combined within a number of computer program products; each computer program product comprising a number of the modules (115, 120).
[0027] In one example, the modules (115, 120) may include a comparator (115) to, when executed by the processor (110), compare the log files received by the processor (110) from the network interface (125) to identify log files from different nodes (135) and a same node (135). The comparison of the log files received allows the computing device to discover which nodes exist within the network (130). In one example, a number of nodes (135) within the network (130) send any number of log files each to the computing device (105). As mentioned above, this data allows the comparator (115) to determine what nodes exist in the network by comparing data in the log files received.
[0028] The modules (115, 120) may also include a mapper (120) to map the different nodes operating on the network (130). The mapper (120) receives data from the comparator (115) indicating whether a log file has originated from a new node that has not been discovered by the comparator (115) yet, or if the log file is from a same node (135) already identified by the comparator (115). As mentioned above, the mapper (120) may also receive information indicating the functional relationship among a plurality of nodes (135) in the network (130) and associate those nodes (135) as such.
[0029] The computing device (105) may further comprise a data storage device (140). The data storage device (140) may store data such as executable program code that is executed by the processor (110) or other processing device such as the comparator (115) and the mapper (120). The data storage device (140) may also store computer code representing a number of applications that the processor (110) executes to implement at least the functionality described herein.
[0030] The data storage device (140) may include various types of memory modules, including volatile and nonvolatile memory. For example, the data storage device (140) of the present example includes Random Access Memory (RAM), Read Only Memory (ROM), and Hard Disk Drive (HDD) memory. Many other types of memory may also be utilized, and the present specification contemplates the use of many varying type(s) of memory in the data storage device (140) as may suit a particular application of the principles described herein. In certain examples, different types of memory in the data storage device (140) may be used for different data storage needs. For example, in certain examples the processor (110) may boot from Read Only Memory (ROM), maintain nonvolatile storage in the Hard Disk Drive (HDD) memory, and execute program code stored in Random Access Memory (RAM).
[0031] The data storage device (140) may comprise a computer readable medium, a computer readable storage medium, or a non-transitory computer readable medium, among others. For example, the data storage device (140) may be, but is not limited to, a system, apparatus, or device implementing electronic, magnetic, optical, electromagnetic, infrared, or semiconductor principles or combinations of the foregoing. More specific examples of the computer readable storage medium may include, for example, the following: an electrical connection having a number of wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store computer usable program code for use by or in connection with an instruction execution system, apparatus, or device. In another example, a computer readable storage medium may be any non-transitory medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
[0032] Fig. 2 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to another example of the principles described herein. Similar to Fig. 1, the computing device (105) shows a comparator (115). In this example, the comparator (115) may further comprise a characteristics generator (205) and a signature generator (210) to identify log files from different nodes and a same node.
[0033] The characteristics generator (205) may generate a vector of characteristics from each log file. The vector of characteristics represents a plurality of characteristics of a corresponding log file. The vector of characteristics comprises a number of values that describe, in this case, features of the log file. The values in the vector of characteristics may be arranged in any order and may comprise any number of dimensional values. During operation, the data within each log file is described in the same order of dimensional values so that the signature generator (210) can easily compare a signature derived from the vectors of characteristics to vectors of characteristics derived from new log files received by the computing device (105).
[0034] In one example, a vector of characteristics comprises three groups of dimensional values: numerical values associated with the frequency of any given term that appears in the log file; a numerical value of the compression ratio of the log file based on the size of the log file in an unsampled state to the size of the log file after being subjected to a compression algorithm; and numerical values identifying a number of text patterns within the log file. Each of these values will be described in more detail below. Although the present specification describes the above three groups of dimensional values, the present specification contemplates the use of a number of additional dimensional values based on the text or other attributes of the log files received by the computing device (105).
[0035] Additionally, although the present examples may describe a single log file being evaluated and compared by the comparator (115), the present specification contemplates the comparison of a group of log files to other groups of log files. In some examples, a parent node may include a number of child nodes about which the parent node relays information to the computing device (105). Here, the log files of each child node may be included with the log files of the parent node forming a group of log files. For better efficiency, the computing device (105) may analyze the text of the group of log files from this group of nodes before analyzing the text of each individual node in the group.
[0036] As described above, the vector of characteristics may comprise a first group of values associated with the frequency of any given term that appears in the log file. Log files comprise human readable text that comprises a number of words that help to describe the type and behavior of each of the nodes. These words may appear multiple times in the text and may indicate what type of node the log file originates from or at least may help to identify the node when two log files are compared later. In order to determine how often each word appears in the text of a log file, "bag-of-words" modeling and "term frequency-inverse document frequency" (TF-IDF) modeling may be implemented.
[0037] In a bag-of-words (BOW) model, text within the log file is represented as the "bag" or multiset of its words and may disregard grammar and even word order. Each distinct word within the log file may be counted and a number may be assigned defining how many times each of those distinct words appears in the log file. Each distinct word itself may be assigned a certain dimension in the vector of characteristics, and that dimension holds the value representing how frequently that word appears in the log file. As the number of various words increases, the number of dimensions used in this category also increases. In some examples, the number of dimensions used in the category may be around 30,000. These values form part of the vector of characteristics and may be saved on the data storage device (140) to be later combined with the other dimensional values in the vector of characteristics and for future comparison with other log files.
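As a minimal sketch of the bag-of-words step, the snippet below counts distinct words in a log file and lays the counts out in a fixed dimension order. The tokenization rule (lowercased runs of letters, digits, and underscores) and the function names are assumptions made for illustration; the specification does not prescribe them.

```python
import re
from collections import Counter

def bag_of_words(log_text):
    """Return a multiset of lowercased words, ignoring grammar and word order."""
    return Counter(re.findall(r"[a-z0-9_]+", log_text.lower()))

def to_dimensions(bag, vocabulary):
    """Map word counts onto a fixed dimension order, one dimension per word."""
    return [bag.get(word, 0) for word in vocabulary]

bag = bag_of_words("Queue was created. Queue was deleted.")
# The vocabulary order is shared by every log file so vectors stay comparable.
dims = to_dimensions(bag, ["created", "queue", "missing"])
```

Keeping one shared vocabulary order across all log files is what allows vectors from different log files to be compared dimension by dimension.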
[0038] TF-IDF is a numerical value that is intended to reflect how important a word is to a log file. Along with the BOW, the TF-IDF provides a weighting value that describes how important any word is in the log file. The TF-IDF value increases proportionally to the number of times a word appears in the log file. However, this value is offset by the frequency of the word across log files generally, which helps to adjust for the fact that some words appear more frequently in log files in general. These weighting values describing the importance of each word in the log file may also be added to the number of dimensional values in the vector of characteristics. As mentioned above, the number of dimensions resulting from the BOW modeling and TF-IDF modeling may exceed 30,000.
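The weighting can be made concrete with one common TF-IDF formulation: term frequency within a log file multiplied by the logarithm of the inverse fraction of log files containing the term. This particular formula and the function name are assumptions; the specification does not state which TF-IDF variant is used.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists, one per log file.
    Returns one {word: weight} dict per log file."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per log file
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

logs = [["queue", "created", "queue"], ["queue", "error"]]
w = tf_idf(logs)
```

In this toy input the word "queue" appears in every log file, so its weight is zero, while "created" and "error" receive positive weights because each is specific to one log file.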
[0039] With the BOW and TF-IDF values added to the vector of characteristics, a compression ratio value may also be added. This compression ratio value is a numerical value indicating the compression ratio of the log file based on the size of the log file in an unsampled state relative to the size of the log file that is subjected to a compression algorithm. This is a single value that occupies a single dimension in the vector of characteristics. In one example, each log file may be subjected to a sampling process that reduces the volume of text within the log file but does not lose any information necessary to properly identify the log file. This may be done to improve performance by not subjecting the processor (110) to an increase in processing steps. This sampling is optional, however, and the methods described herein may use the complete log files and may also implement multiple processors to achieve a relatively quick and easy processing of the log files. The compression ratio value, however, is computed using the unsampled log file. Where the log file was sampled, the processor may refer back to the original copy of the log file to compute the compression ratio value. During operation, the characteristics generator may divide the size of the unsampled log file by the size of the log file after it was subjected to a compression algorithm. A compression algorithm encodes log files using fewer memory bits than the original log file representation by, in one example, eliminating statistical redundancy. By way of an example only, a log file from a first source may have a compression ratio of 0.164 while a log file from a second source may have a compression ratio of 0.169. This compression value is added to a dimension in the vector of characteristics along with the BOW and TF-IDF values described above.
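By way of illustration only, a compression ratio value may be computed as sketched below. The use of zlib is an assumption, as the description does not name a compression algorithm. The sketch also assumes the ratio is the compressed size divided by the unsampled size, because the example values quoted above (0.164 and 0.169) are below one; the inverse of the returned value corresponds to dividing the unsampled size by the compressed size.

```python
import zlib


def compression_ratio(raw: bytes) -> float:
    """Compressed size divided by original size (assumed convention).

    A smaller value indicates more statistical redundancy in the log file.
    """
    if not raw:
        raise ValueError("empty log file")
    return len(zlib.compress(raw)) / len(raw)


# A highly repetitive, made-up log file compresses to a small fraction of its size.
repetitive = b"2015-04-30 12:00:00 INFO Queue was created\n" * 500
ratio = compression_ratio(repetitive)
```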
[0040] Along with the BOW and TF-IDF and compression ratio values, the vector of characteristics may further comprise numerical values identifying a number of text patterns within the log file. Along with each log file comprising a number of different words, those words may further comprise a number of text patterns indicating how those words are arranged in the text of the log file. The size of the patterns may vary. During operation, the characteristics generator (205) may identify a number of times a pattern of text appears in the log file. For example, where the text in the log file reads "Queue was created," that may be indicative of a specific text pattern and the characteristics generator (205) may count the number of times that phrase appears in the log file. This may be done for any phrase. In one example, each text pattern may be assigned a text pattern ID as well as a dimension in the vector of characteristics. In one example, the number of dimensional values in the vector of characteristics describing a text pattern may be in excess of 3,000.
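By way of illustration only, counting text patterns may be sketched as follows. The sketch assumes that each text pattern is a literal phrase matched without regard to case; the pattern IDs, the second phrase, and the function name are hypothetical.

```python
def count_text_patterns(log_text, patterns):
    """Return {pattern_id: occurrences} for each known phrase.

    `patterns` maps a hypothetical pattern ID to a literal phrase. Each ID
    would occupy one dimension in the vector of characteristics.
    """
    lowered = log_text.lower()
    return {pid: lowered.count(phrase.lower()) for pid, phrase in patterns.items()}


patterns = {0: "Queue was created", 1: "Connection refused"}
log_text = (
    "10:00 Queue was created\n"
    "10:01 Queue was created\n"
    "10:02 Connection refused\n"
)
counts = count_text_patterns(log_text, patterns)
```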
[0041] These values added to the vector of characteristics by the characteristics generator (205) now describe the log file in a unique way. The number of dimensions in the vector of characteristics, however, may exceed 30,000. The characteristics generator (205) may further comprise a signature generator (210) that assigns a unique signature to the vector of characteristics.
[0042] After a plurality of vectors of characteristics have been computed, a signature generator (210) may create a unique signature for each log file by applying a machine learning algorithm to the plurality of vectors of characteristics. A machine learning algorithm applies a number of processes to learn which log files are different from the others. In this example, the signature generator (210) may learn from and make predictions on data using such algorithms by building a model from example vectors of characteristics of individual log files in order to make data-driven decisions on which log files originated from which nodes. The signature generator (210), implementing the machine learning algorithm, receives a set of such vectors of characteristics, where each vector of characteristics is already associated with the source ID, in order to train (statistically learn) a signature model. Specifically, methods of machine learning create a signature, which is a mathematical function that can be applied to a vector of characteristics to output a signature unique to the node of the log file.
[0043] With this signature model, a new log file received after the training method above may be transformed into a vector of characteristics as described above. Instead of comparing the source ID of the new log file to other source IDs of past log files received, the model may be applied to the vector of characteristics of the new log file. This provides two outputs: the signature of the node to which the log file is most similar and a confidence value indicating a confidence level of the decision that the new log file belongs to that node.
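By way of illustration only, training a signature model and applying it to a new vector of characteristics may be sketched as follows. The description does not name a particular machine learning algorithm; this sketch substitutes a simple nearest-centroid classifier with cosine similarity, and uses the similarity itself as a stand-in for the confidence value. The node names and vectors are made up.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_signatures(labelled_vectors):
    """Average the vectors that share a source ID into one centroid per node."""
    groups = defaultdict(list)
    for source_id, vector in labelled_vectors:
        groups[source_id].append(vector)
    return {
        source_id: [sum(col) / len(vectors) for col in zip(*vectors)]
        for source_id, vectors in groups.items()
    }


def classify(signatures, vector):
    """Return (most similar node, similarity used as a stand-in confidence)."""
    best = max(signatures, key=lambda sid: cosine(signatures[sid], vector))
    return best, cosine(signatures[best], vector)


# Hypothetical three-dimensional vectors; real vectors may exceed 30,000 dimensions.
training = [
    ("web-01", [5.0, 0.0, 1.0]),
    ("web-01", [4.0, 1.0, 1.0]),
    ("db-01", [0.0, 6.0, 2.0]),
    ("db-01", [1.0, 5.0, 2.0]),
]
signatures = train_signatures(training)
node, confidence = classify(signatures, [4.5, 0.5, 1.0])
```

Note that the source ID is used only during training; the new vector is classified without reference to any source ID.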
[0044] In this example, the signature may be used to compare to a number of vectors of characteristics in a similar way to determine whether any other log files have originated from the same node or one of a number of different nodes within a computer network. A minimal similarity threshold may be set such that where the minimal similarity threshold between a signature and a vector of characteristics is reached, it is determined that the two nodes from which the log files originated are in fact the same node.
[0045] In one example, the threshold of minimal similarity may be based on all the log files of the same node. When the training method is implemented, a plurality of log files comprising the same source ID may each be converted into a vector of characteristics and the machine learning algorithm may be used to create the signature for that node based on those log files known to originate from the same node.
[0046] When a signature is assigned to all of the vectors of characteristics, a confidence value may be produced indicating a confidence of what node each of those log files originated from. Where the confidence values have a statistically normal distribution, the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within, for example, plus or minus one standard deviation. In another example, the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within plus or minus a factor of the standard deviation.
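By way of illustration only, deriving a threshold band from the confidence values of one node's log files may be sketched as follows. The sketch assumes the confidence values are roughly normally distributed and uses the population standard deviation; the factor `k` and the sample values are hypothetical.

```python
import statistics


def similarity_threshold(confidences, k=1.0):
    """Return (low, high) = mean minus/plus k population standard deviations."""
    mean = statistics.mean(confidences)
    spread = statistics.pstdev(confidences)
    return mean - k * spread, mean + k * spread


# Made-up confidence values for log files known to come from the same node.
node_confidences = [0.90, 0.92, 0.94, 0.96, 0.98]
low, high = similarity_threshold(node_confidences, k=1.0)
```

Setting `k` to a value other than 1.0 corresponds to the example in which the band is a factor of the standard deviation.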
[0047] In the example where the minimal similarity is not greater than plus or minus one standard deviation, the node associated with that subsequent log file is marked as a new node connected to the network. In another example, if the threshold of minimal similarity is less than plus or minus three standard deviations of the average confidence value, the node associated with that subsequent log file is marked as an existing node connected to the network.
[0048] In examples where the processor (110) has identified a new node based on a log file received and transformed into a vector of characteristics, the signature of the new node may be used to improve the quality of all signatures learned by the signature generator (210). In this case, the new data derived from the new signature may be used to further differentiate individual nodes from each other using the machine learning algorithm. Additionally, where a new node has been identified via a new log file received, the vector of characteristics and the signature generated may be used by the signature generator (210) to identify that existing node when a log file is received from that node again. In this example, the threshold of similarity may be adjusted such that a relatively lower threshold of similarity is used thereby allowing the computing device (105) to define characteristics of each node with more specificity.
[0049] The computing device of Fig. 2 also shows a linked node module (215), a number of peripheral device adapters (225), and a display device (240). These will now be described in more detail.
[0050] The linked node module (215) may discover which nodes among the number of nodes in the network are functionally linked together. To accomplish this, a time series pattern for each node may be created thereby capturing the frequency of the log files generated by the node. The behavior of each node may then be captured and analyzed. Specifically, the "normal" and "abnormal" behavior of each node is captured. A correlation between the normal and abnormal behavior patterns is identified among each of the nodes to determine which nodes in the network are behaving similarly. The identification of the normal and abnormal behaviors may be accomplished using, for example, a predictive analytics tool. In one example, "abnormal" behavior may include a rare event such as where a certain node generates log files more or less often than that node had generated in the past. One example would be where a certain node has generated log files every hour for the past 4 months and then begins generating log files every 4 hours instead. In this example, the decrease in log file generation would be considered a rare event and thus an "abnormal" event. Conversely, "normal" behavior may include log generations where the number of log files generated by any given node over a period of time does not deviate from a certain level of activity.
[0051] The determination as to whether the behavior of any given node is "normal" or "abnormal" may depend on a determination as to whether the activity of that node is similar to activity in the past. In one example, a threshold time period may be used to determine whether activity from a certain node is normal or abnormal. In another example, a threshold number of log files received by the processor may be used to make this determination. In yet another example, both a threshold time period and a threshold number of log files received may be used to determine whether the activity of a given node is or is not "normal."
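By way of illustration only, a determination of "normal" or "abnormal" activity may be sketched as follows. The description mentions a predictive analytics tool without naming one; this sketch substitutes a simple deviation test against the node's own past counts. The parameters `min_history` (standing in for a threshold time period) and `tolerance` (standing in for a threshold on the number of log files) are hypothetical.

```python
import statistics


def is_abnormal(history, current, min_history=24, tolerance=3.0):
    """Flag `current` if it is far from the node's past per-window counts.

    history     -- log-file counts for earlier windows of equal length
    min_history -- hypothetical minimum number of past windows required
    tolerance   -- hypothetical number of standard deviations allowed
    """
    if len(history) < min_history:
        return False  # not enough past activity to judge
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    if spread == 0:
        return current != mean  # perfectly steady node: any change is rare
    return abs(current - mean) > tolerance * spread


# A node that sent one log file per hour (four per 4-hour window) ...
steady_history = [4] * 48
# ... and now sends one log file per 4-hour window.
slowed_down = is_abnormal(steady_history, 1)
unchanged = is_abnormal(steady_history, 4)
```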
[0052] In one example, the nodes discovered by the linked node module (215) to be linked together are presented to a user via a graphical user interface presented on the display device (240) associated with the computing device (105). The user may assess and validate whether any node so indicated by the linked node module (215) is linked to another node and how it is linked. The text patterns of the unique signature created by the node identification module (115) may also be presented to the user for help in the assessment and validation process. The user may also be presented with information describing how each of the nodes interacts based upon each node's role. With this information, the user may validate the functional or physical connection between two nodes.
[0053] The peripheral device adapters (225) may provide an interface to input/output devices, such as, for example, the display device (240), a mouse, or a keyboard. The peripheral device adapters (225) may also provide access to other external devices such as an external storage device, a number of network devices such as, for example, servers, switches, and routers, client devices, other types of computing devices, and combinations thereof.
[0054] The display device (240) may be provided to allow a user of the computing device (105) to interact with and implement the functionality of the computing device (105) as described above. The peripheral device adapters (225) may also create an interface between the processor (110) and the display device (240), a printer, or other media output devices. The network interface (125) may provide an interface to other computing devices (135) within, for example, a network (130), thereby enabling the transmission of data between the computing device (105) and other devices located within the network (245). Specifically, the network interface (125) may provide an interface between the computing device (105) and each node communicatively coupled to the network (130) so that each node (135) may send to the computing device (105) its log files. The computing device (105) may then receive those log files via the network interface (125). Although Figs. 1 and 2 show a single network and a single node (135) communicatively coupled to the network (130), the present specification contemplates that the network interface (125) communicatively couples the computing device (105) to any number of networks (130), nodes (135), or networks (130) including any number of nodes (135).
[0055] As described above, the mapper (120) receives data from the comparator (115) describing what nodes (135) are present in the network (130) as well as information indicating the functional relationship among a plurality of nodes (135) in the network (130). The mapper (120) may cause the display device (240), via the processor, to display a number of graphical user interfaces (GUIs) showing this data. The GUIs may include aspects of the executable code including the display of the nodes discovered by the linked node module (215) to be linked together. Examples of display devices (240) include a computer screen, a laptop screen, a mobile device screen, a personal digital assistant (PDA) screen, and a tablet screen, among other display devices (240).
[0056] Fig. 3 is a flowchart showing a method (300) of mapping nodes within a computer network according to one example of the principles described herein. The method (300) may begin with operating (305) a computing device that receives log files from nodes on a computer network. As described above, any number of log files may be received by the computing device (105) from any number of nodes. In one example, a node may send to the computing device a plurality of log files, each of which describes different aspects of the node.
[0057] The method (300) may continue with the computing device comparing (310) the log files to identify log files from different nodes and a same node. As described above, this is done by generating a vector of characteristics for each log file, learning which log files are different from the others via a machine learning algorithm, and creating a signature for the node based on the vector of characteristics. The learned signatures can then be compared with vectors of characteristics from new log files received to identify log files from different nodes and same nodes.
[0058] The method (300) may then continue with the computing device mapping (315) the different nodes in the computer network based on the log files. The mapping (315) may be displayed on a display device (240) associated with the computing device (105).
[0059] Fig. 4 is a flowchart showing a method (400) of comparing log files according to another example of the principles described herein. The method (400) may begin with generating (405) a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file. As described above, the vector of characteristics comprises, at least, a word frequency value, a compression ratio value of the log file, and a text pattern value. The process of how these values are obtained is described above.
[0060] The method (400) may continue with generating (410) a signature from each vector of characteristics. The signature is generated by implementing a machine learning algorithm as described above.
[0061] The method (400) may further include comparing (415) a signature from one of the vectors of characteristics with a vector of characteristics generated from a new log file. This process allows the log files to be compared to a signature so that a source ID from each individual log file received does not have to be compared to a source ID for each log file already received by the computing device (105).
[0062] Fig. 5 is a flowchart showing a method (500) of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network. The method (500) may begin with creating (505) a number of time series patterns for each mapped node based on the frequency of log files received from each node. The time series pattern may describe a number of log files received from a mapped node within a predefined period of time.
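By way of illustration only, creating a time series pattern from log file arrival times may be sketched as follows. The sketch assumes arrival times are expressed in seconds and that the predefined period of time is a fixed window length; the function name and the sample arrival times are hypothetical.

```python
from collections import Counter


def time_series(timestamps, start, end, window):
    """Count log files per fixed window over [start, end).

    timestamps -- arrival times in seconds (any consistent epoch)
    window     -- window length in seconds (the predefined period of time)
    """
    n_windows = (end - start + window - 1) // window  # round up
    buckets = Counter(
        (t - start) // window for t in timestamps if start <= t < end
    )
    return [buckets.get(i, 0) for i in range(n_windows)]


# Seven made-up arrivals over three one-hour windows.
arrivals = [0, 10, 3599, 3600, 7300, 7400, 7500]
series = time_series(arrivals, start=0, end=10800, window=3600)
```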
[0063] The method (500) may continue with determining (510) whether a correlation among any time series patterns for any node exists. As described above, a correlation exists where normal and abnormal behavior of any two nodes are similar.
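By way of illustration only, determining whether a correlation exists among time series patterns may be sketched as follows. The description does not name a correlation measure; this sketch assumes the Pearson correlation coefficient and a hypothetical threshold of 0.9. The node names and counts are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(series_by_node, threshold=0.9):
    """Return node pairs whose series correlate at or above `threshold`."""
    nodes = sorted(series_by_node)
    return [
        (a, b)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if pearson(series_by_node[a], series_by_node[b]) >= threshold
    ]


# "app" and "db" spike and drop in the same windows; "printer" does not.
series_by_node = {
    "app": [4, 4, 9, 4, 4, 1],
    "db": [8, 8, 18, 8, 8, 2],
    "printer": [1, 5, 2, 6, 1, 5],
}
pairs = correlated_pairs(series_by_node)
```

Pairs returned by such a test are candidates only; as described below, they may be presented to a user for verification.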
[0064] The method (500) may then continue with presenting (515), to a user of a computing device, any detected correlation of time series pattern. The presentation may be made to a user via a display device (240) on the computing device (105). A user may then verify that a correlation exists based on the information presented.
[0065] Aspects of the present device and method are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to examples of the principles described herein. Each block of the flowchart illustrations and block diagrams, and combinations of blocks in the flowchart illustrations and block diagrams, may be implemented by computer usable program code. The computer usable program code may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer usable program code, when executed via, for example, the processor (110) of the computer (105) or other programmable data processing apparatus, implements the functions or acts specified in the flowchart and/or block diagram block or blocks. In one example, the computer usable program code may be embodied within a computer readable storage medium; the computer readable storage medium being part of the computer program product. In one example, the computer readable storage medium is a non-transitory computer readable medium.
[0066] The specification and figures describe mapping nodes within a computer network. The computing device uses the log files received from a number of computing devices coupled to the network. This relieves a network administrator from having to maintain an ever-increasing spreadsheet file to keep track of nodes attached to the network. Additionally, the system also relieves the network administrator from having to gain credentials in order to gain access to each node in the network and retrieve information about the nodes. Instead, the system passively accepts and analyzes log files without requiring access credentials to each node. Still further, the system does not interface with a siloed network that could have difficulties integrating with a CMDB. Even further, no pre-existing knowledge of the network or individual nodes within the network is used because the information used to identify individual nodes and their functional connections is received from the log files. Instead, a computing device implementing the comparator may receive any number of log files from any number of nodes and determine, based on the characteristics derived from the log files and what type of node those log files came from, whether each log file did or did not come from the same node. A computing device implementing the linked node module may also determine whether the receipt of log files from any plurality of nodes is normal or abnormal and, based on that information, determine whether there is a functional link among a plurality of nodes that may affect the performance among the nodes in the computer network. Although each log file may comprise a source ID, without the present device a one-to-many comparison would have to be completed by comparing a source ID of a new log file to each and every log file received previously by the computing device. Instead, in one example, a machine learning algorithm may be used to train the computing device to recognize same or different nodes and assign a signature to new nodes when discovered.
[0067] The preceding description has been presented to illustrate and describe examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching.

Claims

WHAT IS CLAIMED IS:
1. A method of mapping nodes in a network, the method comprising:
operating a computing device that receives log files from nodes on a computer network;
the computing device comparing the log files to identify log files from different nodes and a same node; and
the computing device mapping the different nodes in the computer network based on the log files.
2. The method of claim 1, further comprising:
generating a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file; and
generating a signature from each vector of characteristics;
wherein comparing log files comprises comparing a signature from one of the vector of characteristics with a vector of characteristics generated from a new log file.
3. The method of claim 2, wherein, if the signature from one of the vector of characteristics matches the vector of characteristics generated from a new log file, the signature is applied to the new log file.
4. The method of claim 2, wherein a characteristic among the plurality of characteristics for each log file comprises a word frequency value;
wherein the word frequency value for each log file is generated by determining a number of times a word is found within each of the log files individually, and
wherein the word frequency value for each log file is included in a corresponding one of the vector of characteristics for each log file.
5. The method of claim 2, wherein a characteristic among the plurality of characteristics for each log file comprises a compression ratio value of each log file;
wherein the compression ratio value for each log file is generated by comparing the size of each log file versus a corresponding size of each log file when a data compression process is applied to the log file; and
wherein the compression ratio value is included in a corresponding one of the vector of characteristics for each log file.
6. The method of claim 1, wherein a characteristic among the plurality of characteristics for each log file comprises a text pattern value;
wherein the text pattern value for each log file is generated by identifying a number of text patterns, assigning a value to each text pattern; and identifying a text pattern value corresponding to the text pattern identified within each log file; and
wherein the text pattern value for each log file is included in a corresponding one of the vector of characteristics for each log file.
7. The method of claim 1, further comprising:
creating a time series pattern for each mapped node based on the frequency of log files received from each node;
determining whether a correlation among any time series patterns for any node exists; and
where a temporal correlation exists between a plurality of nodes, determining that the plurality of nodes have a functional relationship.
8. A computing device for automatically mapping nodes in a network, the device comprising:
a network interface for receiving log files from nodes on a network; and
at least one processor to implement a comparator and node mapper;
wherein the comparator electronically compares log files received and determines whether log files have originated from a same node or different nodes in the network; and
wherein the mapper generates a map of the different nodes in the network based on the comparison of log files through the network interface.
9. The computing device of claim 8, the comparator further comprising a characteristics generator to generate a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file.
10. The computing device of claim 9, the comparator further comprising a signature generator to generate a signature that identifies each node based on the vector of characteristics.
11. The computing device of claim 10, wherein comparing log files received further comprises comparing signatures associated with a plurality of log files received and wherein the plurality of log files are determined to originate from a same node when the signatures associated with the plurality of log files are similar with a threshold minimal similarity.
12. The computing device of claim 9, further comprising a linked component module to determine, based on a time series pattern for each mapped node, a frequency of log files received from each node and determine that a plurality of mapped nodes are functionally connected based on a temporal correlation of log files received by each of the plurality of nodes.
13. A non-transitory computer-readable storage medium comprising instructions which, when executed by a processor of a device for mapping a number of nodes within a computer network, causes the processor to:
compare a plurality of log files received by the processor to identify log files from different nodes and a same node; and
map the different nodes in the computer network based on the log files.
14. The non-transitory computer-readable storage medium of claim 13, further comprising instructions to, when executed by the processor, create a vector of characteristics describing a plurality of characteristics for each of the log files.
15. The non-transitory computer-readable storage medium of claim 13, further comprising instructions to, when executed by the processor, determine if the plurality of log files are functionally related by:
creating a time series pattern for each node based on the amount of log files received from one node within a period of time;
determining whether the time series patterns for each of the plurality of nodes match.
PCT/US2015/028485 2015-04-30 2015-04-30 Mapping nodes in a network Ceased WO2016175829A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/028485 WO2016175829A1 (en) 2015-04-30 2015-04-30 Mapping nodes in a network

Publications (1)

Publication Number Publication Date
WO2016175829A1 true WO2016175829A1 (en) 2016-11-03

Family

ID=57198613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/028485 Ceased WO2016175829A1 (en) 2015-04-30 2015-04-30 Mapping nodes in a network

Country Status (1)

Country Link
WO (1) WO2016175829A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222557A1 (en) * 2008-02-29 2009-09-03 Raymond Harry Putra Rudy Analysis system, information processing apparatus, activity analysis method and program product
US20100238815A1 (en) * 2009-03-18 2010-09-23 Hunt Technologies, Llc Network Status Detection
KR20110042379A (en) * 2008-08-18 2011-04-26 센소매틱 일렉트로닉스, 엘엘씨 Mobile wireless network for asset tracking and supply chain monitoring
US20110252063A1 (en) * 2010-04-13 2011-10-13 Isaacson Scott A Relevancy filter for new data based on underlying files
US20120250752A1 (en) * 2011-03-30 2012-10-04 Hunt Technologies, Llc Grid Event Detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15890961

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15890961

Country of ref document: EP

Kind code of ref document: A1