WO2016175829A1

WO2016175829A1 - Mapping nodes in a network

Info

Publication number: WO2016175829A1
Application number: PCT/US2015/028485
Authority: WO
Inventors: Noam Fraenkel; Erez AGAMI; Efrat EGOZI LEVI; Ohad Assulin
Original assignee: Hewlett Packard Enterprise Development LP
Current assignee: Hewlett Packard Enterprise Development LP
Priority date: 2015-04-30
Filing date: 2015-04-30
Publication date: 2016-11-03
Anticipated expiration: 2017-10-30

Abstract

Examples relate to mapping nodes in a network. In an example, a method of mapping nodes in a network may comprise operating a computing device that receives log files from nodes on a computer network, the computing device comparing the log files to identify log files from different nodes and a same node, and mapping the different nodes in the computer network based on the log files. In another example, a computing device may map nodes in a network by receiving log files from nodes on a network through a network interface and, with at least one processor, implementing a comparator and a node mapper, wherein the comparator electronically compares the log files received and determines whether log files have originated from a same node or different nodes in the network, and wherein the mapper generates a map of nodes in the network based on the comparison of log files.

Description

MAPPING NODES IN A NETWORK

BACKGROUND

[0001] To manage a computer network, a network administrator may need to know what hardware and software components make up the network. Often a network is described in terms of nodes, each of which may be a point capable of sending and receiving data. A node may include hardware and/or software elements.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] The accompanying drawings illustrate various examples of the principles described herein and are a part of the specification. The examples are given merely for illustration, and do not limit the scope of the claims.

[0003] Fig. 1 is a block diagram of a computing device for mapping nodes within a computer network according to an example of the principles described herein.

[0004] Fig. 2 is a block diagram of a computing device for mapping nodes within a computer network according to another example of the principles described herein.

[0005] Fig. 3 is a flowchart showing a method of mapping nodes within a computer network according to another example of the principles described herein.

[0006] Fig. 4 is a flowchart showing a method of comparing log files according to another example of the principles described herein.

[0007] Fig. 5 is a flowchart showing a method of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network according to another example of the principles described herein.

[0008] Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

DETAILED DESCRIPTION

[0009] A network administrator is generally responsible for the security and operation of a network. The network administrator needs to understand risks to the network, the impact of proposed changes and what elements of the network may be causing issues. To accomplish this, the network administrator may need to know what hardware and software components make up the network. As noted above, the network and its components may be described in terms of nodes, each of which is a point capable of sending and receiving data. A node may include hardware and/or software elements.

[0010] The network administrator may determine what network elements constitute each node. As the size of the network grows, keeping track of individual nodes may be challenging. This information defining nodes and their relationships is sometimes organized using only a simple spreadsheet. However, as the size of the network and the number of nodes increase, it may not be effective to track the nodes using only a spreadsheet.

[0011] To deal with the volume of information for larger networks, network administrators have implemented configuration management databases (CMDBs). A CMDB maintains data identifying and defining attributes of the nodes of a network. The CMDB may also maintain data describing the relationships between nodes. A number of rules may be created by the network administrator that allows the CMDB to proactively scan the network to recognize and define nodes.

[0012] However, there may be challenges to integrate a CMDB into a network. Additionally, running proactive scans of the network with a CMDB requires providing credentials to access the various elements of the network. This may be cumbersome to set up. Further, the CMDB requires ongoing maintenance of the rules governing nodes. These rules must be accurate, carefully defined, and up-to-date in order to provide a proper view of the nodes.

[0013] The present specification describes a computing device that does not use a CMDB and/or that does not require a network administrator to implement a spreadsheet in order to identify and manage the nodes in the network. Instead, the computing device may receive a number of log files from any number of nodes within a network and, based on the log files, identify each node in the network and the relationships between nodes. This will be described in more detail below.

[0014] In one example, the present specification describes a method of mapping nodes in a network. The method may comprise operating a computing device that receives log files from nodes on a computer network, the computing device comparing the log files to identify log files from different nodes and a same node, and mapping the different nodes in the computer network based on the log files.

[0015] In another example, the present specification describes a computing device for automatically mapping nodes in a network. The computing device may comprise a network interface for receiving log files from nodes on a network and at least one processor to implement a comparator and a node mapper, wherein the comparator electronically compares the log files received and determines whether log files have originated from a same node or different nodes in the network, and wherein the mapper generates a map of the different nodes in the network based on the comparison of the log files received through the network interface.

[0016] In yet another example, the present specification describes a non-transitory computer-readable storage medium comprising instructions to be executed by a processor of a device for mapping a number of nodes within a computer network. The instructions, when executed by the processor, may cause the processor to compare a plurality of log files received by the processor to identify log files from different nodes and a same node and map the different nodes in the computer network based on the log files.

[0017] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present systems and methods. It will be apparent, however, to one skilled in the art that the present apparatus, systems and methods may be practiced without these specific details. Reference in the specification to "an example" or similar language means that a particular feature, structure, or characteristic described in connection with that example is included as described, but may not be included in other examples.

[0018] As used in the present specification and in the appended claims, the term "node" is meant to be understood as a point within a network, capable of sending or receiving data. In one example, a node may be a device, such as printer or workstation, a system, a router, a switch, or a storage location on a disk, among other networked computing devices. In another example, a node may be a module within a computing device. A single computing device may support multiple network nodes.

[0019] Additionally, as used in the present specification and in the appended claims, the term "a number of" or similar language is meant to be understood broadly as any greater than or equal to one.

[0020] Turning now to the figures, Fig. 1 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to an example of the principles described herein. The computing device (105) may be implemented in any electronic computing device. Examples of electronic computing devices include servers, desktop computers, laptop computers, personal digital assistants (PDAs), mobile devices, smartphones, gaming systems, and tablets, among other electronic devices.

[0021] The computing device (105) may be utilized in any data processing scenario including stand-alone hardware, mobile applications, through a computer network, or combinations thereof. Further, the computing device (105) may be used in a computer network, a public cloud network, a private cloud network, a hybrid cloud network, other forms of networks, or combinations thereof. In one example, the methods provided by the computing device (105) are provided as a service over a network by, for example, a third party. In this example, the service may comprise, for example, the following: a Software as a Service (SaaS) hosting a number of applications; a Platform as a Service (PaaS) hosting a computer platform comprising, for example, operating systems, hardware, and storage, among others; an Infrastructure as a Service (IaaS) hosting equipment such as, for example, servers, storage components, and network components, among others; application program interface (API) as a service (APIaaS), other forms of network services, or combinations thereof. The present methods may be implemented on one or multiple hardware platforms, in which the modules in the computing device (105) can be executed on one or across multiple platforms. Such modules can run on various forms of cloud technologies and hybrid cloud technologies or be offered as a SaaS (Software as a Service) that can be implemented on or off the cloud. In another example, the methods provided by the computing device (105) are executed by a local administrator.

[0022] To achieve its desired functionality, the computing device (105) may comprise various hardware components. Among these hardware components may be a processor (e.g., at least one processor (110)). The processor (110) may include the hardware architecture to retrieve executable code from the data storage device and execute the executable code.

[0023] The executable code may, when executed by processor (110), cause the processor (110) to implement at least the functionality of comparing log files received and determining whether log files have originated from a same node (135) or different nodes (135) in the network (130) according to the methods of the present specification described herein. The log files comprise information specific to each of the nodes (135) such as events that occur at the node or messages sent or received between nodes. Some of this information may also include a source identification value (source ID) that identifies a source of the log file. Although a source ID is present on each log file, any number of log files may have been received by the computing device (105) such that a comparison of a subsequent log file's source ID would require a comparison of that value to a myriad of other similar values associated with each log file received. Instead, the comparison is accomplished as described herein, in order to avoid such a one-to-many comparison.

[0024] The log file may also include a timestamp of when the log file was created, hardware installation dates, hardware identification information, software installation dates, software update information, current message throughput of a hardware device, a number of messages received by a hardware device, the time taken to address any number of types of messages received by a hardware device, among other data associated with each node. Each log file may also comprise data that indicates whether a quality of service threshold is being reached thereby describing the general health of the node. This information is used by the processor (110) to make the comparison described above. Specific examples of how the comparison is made will be described in more detail below.

[0025] The executable code may also, when executed by processor (110), cause the processor (110) to implement at least the functionality of generating a map of different nodes (135) in the network (130) based on the comparison of log files. In one example, the map of nodes may be represented to a user of the computing device (105) on a display device via a number of graphical user interfaces (GUIs). In one example, in addition to showing that the different nodes exist, the node map may further show how each node is functionally related to each other node in the network (130). In this example, the executable code may, when executed by processor (110), cause the processor (110) to determine if a plurality of nodes (135) have a functional relationship between them such that changes in the plurality of nodes affect how each of the nodes operates. Details of the functionality of the processor (110) will be described in more detail below.

[0026] The computing device (105) may further comprise a number of modules (115, 120) to achieve the methods described herein. The various modules (115, 120) may comprise executable program code that may be executed separately. In one example, the modules (115, 120) are executable program code stored on a data storage device (140) on the computing device (105). In one example, the various modules (115, 120) may be stored as separate computer program products. In another example, the various modules (115, 120) within the computing device (105) may be combined within a number of computer program products; each computer program product comprising a number of the modules (115, 120).

[0027] In one example, the modules (115, 120) may include a comparator (115) to, when executed by the processor (110), compare the log files received by the processor (110) from the network interface (125) to identify log files from different nodes (135) and a same node (135). The comparison of the log files received allows the computing device to discover which nodes exist within the network (130). In one example, a number of nodes (135) within the network (130) send any number of log files each to the computing device (105). As mentioned above, this data allows the comparator (115) to determine what nodes exist in the network by comparing data in the log files received.

[0028] The modules (115, 120) may also include a mapper (120) to map the different nodes operating on the network (130). The mapper (120) receives data from the comparator (115) indicating whether a log file has originated from a new node that has not been discovered by the comparator (115) yet, or if the log file is from a same node (135) already identified by the comparator (115). As mentioned above, the mapper (120) may also receive information indicating the functional relationship among a plurality of nodes (135) in the network (130) and associate those nodes (135) as such.
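The division of labor between the comparator and the mapper can be illustrated with a minimal sketch. It is not the implementation described in the specification: the function name `map_nodes` and the `identify` callback (a stand-in for the comparator's decision about which node a log file came from) are assumptions made for illustration.

```python
from collections import defaultdict

def map_nodes(log_files, identify):
    """Group log file names under the node label chosen by `identify`.

    `identify` is a hypothetical stand-in for the comparator: it takes the
    text of one log file and returns a label for the node it came from.
    """
    node_map = defaultdict(list)
    for name, text in log_files:
        node_map[identify(text)].append(name)
    return dict(node_map)

# Toy comparator: treat the first word of the log as the node label.
logs = [("a.log", "db start"), ("b.log", "web start"), ("c.log", "db stop")]
node_map = map_nodes(logs, lambda text: text.split()[0])
```

In this toy run, a label seen for the first time creates a new entry in the map, while a label already present simply collects another log file.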

[0029] The computing device (105) may further comprise a data storage device (140). The data storage device (140) may store data such as executable program code that is executed by the processor (110) or other processing device such as the comparator (115) and the mapper (120). The data storage device (140) may also store computer code representing a number of applications that the processor (110) executes to implement at least the functionality described herein.

[0030] The data storage device (140) may include various types of memory modules, including volatile and nonvolatile memory. For example, the data storage device (140) of the present example includes Random Access Memory (RAM), Read Only Memory (ROM), and Hard Disk Drive (HDD) memory. Many other types of memory may also be utilized, and the present specification contemplates the use of many varying type(s) of memory in the data storage device (140) as may suit a particular application of the principles described herein. In certain examples, different types of memory in the data storage device (140) may be used for different data storage needs. For example, in certain examples the processor (110) may boot from Read Only Memory (ROM), maintain nonvolatile storage in the Hard Disk Drive (HDD) memory, and execute program code stored in Random Access Memory (RAM).

[0031] The data storage device (140) may comprise a computer readable medium, a computer readable storage medium, or a non-transitory computer readable medium, among others. For example, the data storage device (140) may be, but not limited to a system, apparatus, or device implementing electronic, magnetic, optical, electromagnetic, infrared, or semiconductor principles or combinations of the foregoing. More specific examples of the computer readable storage medium may include, for example, the following: an electrical connection having a number of wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store computer usable program code for use by or in connection with an instruction execution system, apparatus, or device. In another example, a computer readable storage medium may be any non-transitory medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

[0032] Fig. 2 is a block diagram of a computing device (105) for mapping nodes (135) within a computer network (130) according to another example of the principles described herein. Similar to Fig. 1, the computing device (105) shows a comparator (115). In this example, the comparator (115) may further comprise a characteristics generator (205) and a signature generator (210) to identify log files from different nodes and a same node.

[0033] The characteristics generator (205) may generate a vector of characteristics from each log file. The vector of characteristics represents a plurality of characteristics of a corresponding log file. The vector of characteristics comprises a number of values that describe, in this case, features of the log file. The values in the vector of characteristics may be arranged in any order and may comprise any number of dimensional values. During operation, the data within each log file is described in the same order of dimensional values so that the signature generator (210) can easily compare a signature derived from the vectors of characteristics to vectors of characteristics derived from new log files received by the computing device (105).

[0034] In one example, a vector of characteristics comprises three groups of dimensional values: numerical values associated with the frequency of any given term that appears in the log file; a numerical value of the compression ratio of the log file based on the size of the log file in an unsampled state to the size of the log file after being subjected to a compression algorithm; and numerical values identifying a number of text patterns within the log file. Each of these values will be described in more detail below. Although the present specification describes the above three groups of dimensional values, the present specification contemplates the use of a number of additional dimensional values based on the text or other attributes of the log files received by the computing device (105).

[0035] Additionally, although the present examples may describe a single log file being evaluated and compared by the comparator (115), the present specification contemplates the comparison of a group of log files to other groups of log files. In some examples, a parent node may include a number of child nodes about which the parent node relays information to the computing device (105). Here, the log files of each child node may be included with the log files of the parent node forming a group of log files. For better efficiency, the computing device (105) may analyze the text of the group of log files from this group of nodes before analyzing the text of each individual node in the group.

[0036] As described above, the vector of characteristics may comprise a first group of values associated with the frequency of any given term that appears in the log file. Log files comprise human readable text that comprises a number of words that help to describe the type and behavior of each of the nodes. These words may appear multiple times in the text and may indicate what type of node the log file originates from or at least may help to identify the node when two log files are compared later. In order to determine how often each word appears in the text of a log file, a "bag-of-words" modeling and a "term frequency-inverse document frequency" (TF-IDF) modeling may be implemented.

[0037] In a bag-of-words (BOW) model, text within the log file is represented as the "bag" or multiset of its words and may disregard grammar and even word order. Each distinct word within the log file may be counted and a number may be assigned defining how many times each of those distinct words appears in the log file. Each distinct word itself may be assigned a certain dimension in the vector of characteristics, in which the value representing how frequently that word appears in the log file is stored. As the number of various words increases, the number of dimensions used in this category also increases. In some examples, the number of dimensions used in the category may be around 30,000. These values form part of the vector of characteristics and may be saved on the data storage device (140) to be later combined with the other dimensional values in the vector of characteristics and for future comparison with other log files.
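A bag-of-words count of the kind described above can be sketched in a few lines. The tokenization rule, the helper names `bag_of_words` and `to_vector`, and the six-word vocabulary are illustrative assumptions; the specification does not prescribe them.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count each distinct lower-cased word, ignoring grammar and word order."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def to_vector(counts, vocabulary):
    """Place each count at the fixed dimension assigned to its word."""
    return [counts.get(word, 0) for word in vocabulary]

log_text = "Queue was created. Queue was deleted. Connection opened."
# Hypothetical vocabulary; each position is one dimension of the vector.
vocabulary = ["connection", "created", "deleted", "opened", "queue", "was"]
vector = to_vector(bag_of_words(log_text), vocabulary)
```

Keeping the vocabulary order fixed across log files is what lets vectors from different files be compared dimension by dimension.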

[0038] TF-IDF is a numerical value that is intended to reflect how important a word is to a log file. Along with the BOW, the TF-IDF provides a weighting value that describes how important any word is in the log file. The TF-IDF value increases proportionally to the number of times a word appears in the log file. However, this value is offset by the frequency of the word across the other log files, which helps to adjust for the fact that some words appear more frequently in log files generally. These weighting values describing the importance of each word in the log file may also be added to the number of dimensional values in the vector of characteristics. As mentioned above, the number of dimensions resulting from the BOW modeling and TF-IDF modeling may exceed 30,000.
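The weighting can be made concrete with a small sketch. It uses the common textbook form, term frequency multiplied by the logarithm of the inverse document frequency, which is an assumption here because the specification does not state an exact formula. The function name `tf_idf` and the sample logs are likewise illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per tokenized log file."""
    n_docs = len(documents)
    # Number of log files in which each word appears at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

logs = [
    "queue created queue full".split(),
    "queue created".split(),
    "disk full disk error".split(),
]
weights = tf_idf(logs)
```

In the third log, "error" appears in no other file and so outweighs "full", which also appears in the first log, even though both occur once.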

[0039] With the BOW and TF-IDF values added to the vector of characteristics, a compression ratio value may also be added. This compression ratio value is a numerical value indicating the compression ratio of the log file based on the size of the log file in an unsampled state to the size of the log file that is subjected to a compression algorithm. This is a single value that occupies a single dimension in the vector of characteristics. In one example, each log file may be subjected to a sampling process that reduces the volume of text within the log file but does not lose any information necessary to properly identify the log file. This may be done to improve performance by not subjecting the processor (110) to an increase in processing steps. This sampling is optional, however, and the methods described herein may use the complete log files and may also implement multiple processors to achieve a relatively quick and easy processing of the log files. The compression ratio value, however, is computed using the unsampled log file. Where the log file was sampled, the processor may refer back to the original copy of the log file to compute the compression ratio value. During operation, the characteristics generator may divide the size of the unsampled log file by the size of the log file after it was subjected to a compression algorithm. A compression algorithm encodes log files using fewer memory bits than the original log file representation by, in one example, eliminating statistical redundancy. By way of an example only, a log file from a first source may have a compression ratio of 0.164 while a log file from a second source may have a compression ratio of 0.169. This compression value is added to a dimension in the vector of characteristics along with the BOW and TF-IDF values described above.
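A compression ratio of this kind can be sketched with a general-purpose compressor. The choice of `zlib` is an assumption, since no particular algorithm is named. The example values of 0.164 and 0.169 are consistent with compressed size divided by original size, so that convention is assumed here; the reciprocal would follow the division as literally stated.

```python
import zlib

def compression_ratio(log_bytes):
    """Compressed size divided by the original (unsampled) size."""
    if not log_bytes:
        return 0.0
    return len(zlib.compress(log_bytes)) / len(log_bytes)

# A highly repetitive log compresses well, giving a ratio close to zero.
repetitive = b"2015-04-30 12:00:00 INFO Queue was created\n" * 200
ratio = compression_ratio(repetitive)
```

Logs from the same kind of node tend to share a similar level of redundancy, which is why a single number can still help to tell sources apart.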

[0040] Along with the BOW and TF-IDF and compression ratio values, the vector of characteristics may further comprise numerical values identifying a number of text patterns within the log file. Along with each log file comprising a number of different words, those words may further comprise a number of text patterns indicating how those words are arranged in the text of the log file. The size of the patterns may vary. During operation, the characteristics generator (205) may identify a number of times a pattern of text appears in the log file. For example, where the text in the log file reads "Queue was created," that may be indicative of a specific text pattern and the characteristics generator (205) may count the number of times that phrase appears in the log file. This may be done for any phrase. In one example, each text pattern may be assigned a text pattern ID as well as a dimension in the vector of characteristics. In one example, the number of dimensional values in the vector of characteristics describing a text pattern may be in excess of 3,000.
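Counting text patterns can be sketched as follows. The pattern list, where the position of each pattern serves as its pattern ID, and the helper name `count_patterns` are hypothetical; how patterns are discovered in the first place is outside this sketch.

```python
def count_patterns(text, patterns):
    """Return one count per pattern, in pattern-ID order."""
    return [text.count(pattern) for pattern in patterns]

# Index in this list doubles as the pattern ID and vector dimension.
patterns = ["Queue was created", "Connection refused"]
log_text = (
    "12:00 Queue was created\n"
    "12:01 Connection refused\n"
    "12:02 Queue was created\n"
)
pattern_counts = count_patterns(log_text, patterns)
```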

[0041] These values added to the vector of characteristics by the characteristics generator (205) now describe the log file in a unique way. The number of dimensions in the vector of characteristics, however, may exceed 30,000. The characteristics generator (205) may further comprise a signature generator (210) that assigns a unique signature to the vector of characteristics.

[0042] After a plurality of vectors of characteristics have been computed, a signature generator (210) may create a unique signature for each log file by applying a machine learning algorithm to the plurality of vectors of characteristics. A machine learning algorithm applies a number of processes to learn which log files are different from the others. In this example, the signature generator (210) may learn from and make predictions on data using such algorithms by building a model from example vectors of characteristics of individual log files in order to make data-driven decisions on which log files originated from which nodes. The signature generator (210), implementing the machine learning algorithm, receives a set of such vectors of characteristics, where each vector of characteristics is already associated with the source ID, in order to train (statistically learn) a signature model. Specifically, methods of machine learning create a signature which is a mathematical function that can be applied on a vector of characteristics and output a signature unique to that node of the log file.

[0043] With this signature model, a new log file received after the training method above may be transformed into a vector of characteristics as described above. Instead of comparing the source ID of the new log file to other source IDs of past log files received, the model may be applied to the vector of characteristics of the new log file. This provides two outputs: the signature of the node to which the log file is most similar and a confidence value indicating a confidence level of the decision that the new log file belongs to that node.
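The train-then-apply flow can be illustrated with a deliberately simple stand-in model. The specification does not name a machine learning algorithm, so this sketch substitutes a nearest-centroid classifier with cosine similarity as the confidence value. The source IDs, vectors, and function names are all made up for illustration.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train_signatures(labeled_vectors):
    """Average the vectors for each source ID into one centroid per node."""
    groups = defaultdict(list)
    for source_id, vector in labeled_vectors:
        groups[source_id].append(vector)
    return {
        source_id: [sum(col) / len(vectors) for col in zip(*vectors)]
        for source_id, vectors in groups.items()
    }

def classify(signatures, vector):
    """Return (most similar node, confidence) for a new vector."""
    return max(
        ((node, cosine(centroid, vector)) for node, centroid in signatures.items()),
        key=lambda pair: pair[1],
    )

training = [
    ("db-01", [9.0, 1.0, 0.0]),
    ("db-01", [8.0, 2.0, 0.0]),
    ("web-01", [0.0, 1.0, 9.0]),
    ("web-01", [1.0, 2.0, 8.0]),
]
signatures = train_signatures(training)
node, confidence = classify(signatures, [7.0, 1.0, 1.0])
```

The two return values mirror the two outputs described above: the best-matching node and a confidence in that match. A production model would likely be more sophisticated than a centroid average.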

[0044] In this example, the signature may be used to compare to a number of vectors of characteristics in a similar way to determine whether any other log files have originated from the same node or one of a number of different nodes within a computer network. A minimal similarity threshold may be set such that where the minimal similarity threshold between a signature and a vector of characteristics is reached, it is determined that the two nodes from which the log files originated are in fact the same node.

[0045] In one example, the threshold of minimal similarity may be based on all the log files of the same node. When the training method is implemented, a plurality of log files comprising the same source ID may each be converted into a vector of characteristics and the machine learning algorithm may be used to create the signature for that node based on those log files known to originate from the same node.

[0046] When a signature is assigned to all of the vectors of characteristics, a confidence value may be produced indicating a confidence of what node each of those log files originated from. Where the confidence values have a statistically normal distribution, the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within, for example, plus or minus one standard deviation. In another example, the threshold of minimal similarity may be set to the average confidence value of the log files for that type of node within plus or minus a factor of the standard deviation.
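One reading of this thresholding is an acceptance band centered on the mean confidence of a node's known log files. The sketch below follows that reading; the function names, the use of the sample standard deviation, and the confidence values are assumptions for illustration only.

```python
import statistics

def similarity_band(confidences, factor=1.0):
    """Mean confidence plus or minus `factor` sample standard deviations."""
    mean = statistics.mean(confidences)
    spread = factor * statistics.stdev(confidences)
    return mean - spread, mean + spread

def within_band(confidence, band):
    """True if a new log file's confidence falls inside the band."""
    low, high = band
    return low <= confidence <= high

# Hypothetical confidence values for log files known to come from one node.
known = [0.90, 0.92, 0.94, 0.96, 0.98]
band = similarity_band(known)
```

Passing a larger `factor` widens the band, which corresponds to the variant that uses a multiple of the standard deviation.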

[0047] In the example where the minimal similarity is not greater than plus or minus one standard deviation, the node associated with that subsequent log file is marked as a new node connected to the network. In another example, if the threshold of minimal similarity is less than plus or minus three standard deviations of the average confidence value, the node associated with that subsequent log file is marked as an existing node connected to the network.

[0048] In examples where the processor (110) has identified a new node based on a log file received and transformed into a vector of characteristics, the signature of the new node may be used to improve the quality of all signatures learned by the signature generator (210). In this case, the new data derived from the new signature may be used to further differentiate individual nodes from each other using the machine learning algorithm. Additionally, where a new node has been identified via a new log file received, the vector of characteristics and the signature generated may be used by the signature generator (210) to identify that existing node when a log file is received from that node again. In this example, the threshold of similarity may be adjusted such that a relatively lower threshold of similarity is used thereby allowing the computing device (105) to define characteristics of each node with more specificity.

[0049] The computing device of Fig. 2 also shows a linked node module (215), a number of peripheral device adapters (225), and a display device (240). These will now be described in more detail.

[0050] The linked node module (215) may discover which nodes among the number of nodes in the network are functionally linked together. To accomplish this, a time series pattern for each node may be created thereby capturing the frequency of the log files generated by the node. The behavior of each node may then be captured and analyzed. Specifically, the "normal" and "abnormal" behavior of each node is captured. A correlation between the normal and abnormal behavior patterns is identified among each of the nodes to determine which nodes in the network are behaving similarly. The identification of the normal and abnormal behaviors may be accomplished using, for example, a predictive analytics tool. In one example, "abnormal" behavior may include a rare event such as where a certain node generates log files more or less often than that node had generated in the past. One example would be where a certain node has generated log files every hour for the past 4 months and then begins generating log files every 4 hours instead. In this example, the decrease in log file generation would be considered a rare event thus being an "abnormal" event. Conversely, "normal" behavior may include log generations where the number of log files generated by any given node over a period of time does not deviate from a certain level of activity.
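Correlating the time series of two nodes can be sketched with a Pearson correlation over per-interval log counts. The specification does not name a correlation measure, so Pearson is an assumption, and the node names and counts are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log files generated per hour by three nodes.
series = {
    "app-01": [4, 4, 5, 12, 13, 4],
    "db-01": [2, 2, 3, 7, 8, 2],
    "printer-01": [1, 0, 1, 0, 1, 0],
}
linked = pearson(series["app-01"], series["db-01"])
unrelated = pearson(series["app-01"], series["printer-01"])
```

Here the application and database nodes spike in the same hours and correlate strongly, which would make them candidates for a functional link, while the printer does not.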

[0051] The determination as to whether the behavior of any given node is "normal" or "abnormal" may depend on a determination as to whether the activity of that node is similar to activity in the past. In one example, a threshold time period may be used to determine whether activity from a certain node is normal or abnormal. In another example, a threshold number of log files received by the processor may be used to make this determination. In yet another example, both a threshold time period and a threshold number of log files received may be used to determine whether the activity of a given node is or is not "normal."
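By way of illustration only, the threshold-based determination described above might be sketched as follows. The function name `classify_windows`, the parameter `baseline_windows` (standing in for a threshold time period), and the parameter `tolerance` (standing in for a threshold number of log files, expressed as a fraction of past activity) are assumptions made for this sketch and are not taken from the specification.

```python
from statistics import mean


def classify_windows(counts, baseline_windows=4, tolerance=0.5):
    """Label each time window of one node as "normal" or "abnormal".

    counts: number of log files received from the node in each window.
    baseline_windows: how many preceding windows define past activity (assumed).
    tolerance: allowed fractional deviation from the past mean (assumed).
    """
    labels = []
    for i, count in enumerate(counts):
        history = counts[max(0, i - baseline_windows):i]
        if len(history) < baseline_windows:
            # Not enough past activity to judge; treat as normal.
            labels.append("normal")
            continue
        baseline = mean(history)
        deviation = abs(count - baseline)
        labels.append("abnormal" if deviation > tolerance * baseline else "normal")
    return labels


# A node that logs hourly (24 files per day) and then drops to every 4 hours (6 per day).
daily_counts = [24, 24, 24, 24, 24, 6, 6]
labels = classify_windows(daily_counts)
```

In this sketch, the days on which the node drops from 24 to 6 log files are labeled "abnormal" because the count deviates from recent activity by more than the assumed tolerance.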

[0052] In one example, the nodes discovered by the linked node module (215) to be linked together are presented to a user via a graphical user interface presented on the display device (240) associated with the computing device (105). The user may assess and validate whether any node so indicated by the linked node module (215) is linked to another node and how it is linked. The text patterns of the unique signature created by the node identification module (115) may also be presented to the user for help in the assessment and validation process. The user may also be presented with information describing how each of the nodes interacts based upon each node's role. With this information, the user may validate the functional or physical connection between two nodes.

[0053] The peripheral device adapters (225) may provide an interface to input/output devices, such as, for example, the display device (240), a mouse, or a keyboard. The peripheral device adapters (225) may also provide access to other external devices such as an external storage device, a number of network devices such as, for example, servers, switches, and routers, client devices, other types of computing devices, and combinations thereof.

[0054] The display device (240) may be provided to allow a user of the computing device (105) to interact with and implement the functionality of the computing device (105) as described above. The peripheral device adapters (225) may also create an interface between the processor (110) and the display device (240), a printer, or other media output devices. The network interface (125) may provide an interface to other computing devices (135) within, for example, a network (130), thereby enabling the transmission of data between the computing device (105) and other devices located within the network (130). Specifically, the network interface (125) may provide an interface between the computing device (105) and each node communicatively coupled to the network (130) so that each node (135) may send its log files to the computing device (105). The computing device (105) may then receive those log files via the network interface (125). Although Figs. 1 and 2 show a single network and a single node (135) communicatively coupled to the network (130), the present specification contemplates that the network interface (125) communicatively couples the computing device (105) to any number of networks (130), nodes (135), or networks (130) including any number of nodes (135).

[0055] As described above, the mapper (120) receives data from the comparator (115) describing what nodes (135) are present in the network (130) as well as information indicating the functional relationship among a plurality of nodes (135) in the network (130). The mapper (120) may cause the display device (240), via the processor, to display a number of graphical user interfaces (GUIs) showing this data. The GUIs may include aspects of the executable code including the display of the nodes discovered by the linked node module (215) to be linked together. Examples of display devices (240) include a computer screen, a laptop screen, a mobile device screen, a personal digital assistant (PDA) screen, and a tablet screen, among other display devices (240).

[0056] Fig. 3 is a flowchart showing a method (300) of mapping nodes within a computer network according to one example of the principles described herein. The method (300) may begin with operating (305) a computing device that receives log files from nodes on a computer network. As described above, any number of log files may be received by the computing device (105) from any number of nodes. In one example, a node may send to the computing device a plurality of log files, each of which describes different aspects of the node.

[0057] The method (300) may continue with the computing device comparing (310) the log files to identify log files from different nodes and a same node. As described above, this is done by generating a vector of characteristics for each log file, learning which log files are different from the others via a machine learning algorithm, and creating a signature for the node based on the vector of characteristics. The learned signatures can then be compared with vectors of characteristics from new log files received to identify log files from different nodes and same nodes.

[0058] The method (300) may then continue with the computing device mapping (315) the different nodes in the computer network based on the log files. The mapping (315) may be displayed on a display device (240) associated with the computing device (105).

[0059] Fig. 4 is a flowchart showing a method (400) of comparing log files according to another example of the principles described herein. The method (400) may begin with generating (405) a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file. As described above, the vector of characteristics comprises, at least, a word frequency value, a compression ratio value of the log file, and a text pattern value. The process of how these values are obtained is described above.
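As a non-limiting sketch, a vector of characteristics containing the three kinds of values named above might be assembled as follows. The function name `characteristics_vector`, the `vocabulary` argument, the two regular expressions in `TEXT_PATTERNS`, the use of zlib as the data compression process, the normalization of word counts by total word count, and the choice to test only the first line of the log file are all assumptions made for illustration.

```python
import re
import zlib
from collections import Counter

# Hypothetical text patterns; the value assigned to each is its position (1, 2, ...).
TEXT_PATTERNS = [
    re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "),  # timestamp prefix
    re.compile(r"^\[\w+\] "),                              # bracketed level prefix
]


def characteristics_vector(log_text, vocabulary):
    """Build [word frequencies..., compression ratio, text pattern value]."""
    # Word frequency: how often each vocabulary word occurs, relative to all words.
    words = re.findall(r"[A-Za-z]+", log_text.lower())
    counts = Counter(words)
    total = len(words) or 1
    word_freqs = [counts[w] / total for w in vocabulary]

    # Compression ratio: original size versus size after compression.
    raw = log_text.encode("utf-8")
    compression_ratio = len(raw) / len(zlib.compress(raw)) if raw else 0.0

    # Text pattern value: value of the first pattern matching the first line, else 0.
    first_line = log_text.splitlines()[0] if log_text else ""
    pattern_value = 0
    for value, pattern in enumerate(TEXT_PATTERNS, start=1):
        if pattern.match(first_line):
            pattern_value = value
            break

    return word_freqs + [compression_ratio, float(pattern_value)]


log = (
    "2015-04-30 10:00:00 error disk full\n"
    "2015-04-30 10:00:01 error disk full\n"
)
vec = characteristics_vector(log, ["error", "disk", "login"])
```

Here `vec` holds three word frequency values, followed by the compression ratio value and the text pattern value for the sample log text.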

[0060] The method (400) may continue with generating (410) a signature from each vector of characteristics. The signature is generated by implementing a machine learning algorithm as described above.

[0061] The method (400) may further include comparing (415) a signature from one of the vectors of characteristics with a vector of characteristics generated from a new log file. This process allows the log files to be compared to a signature so that a source ID from each individual log file received does not have to be compared to a source ID for each log file already received by the computing device (105).
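For illustration only, the comparison of a learned signature with the vector of characteristics of a new log file might resemble the following sketch. Cosine similarity is swapped in here as a simple similarity measure in place of the machine learning algorithm described above; the names `cosine_similarity` and `assign_node`, the `threshold` value of 0.95, the example vectors, and the "node-N" naming scheme are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_node(vector, signatures, threshold=0.95):
    """Return the id of the best-matching signature, or register a new node.

    signatures: dict mapping node id -> signature vector; updated in place
    when no existing signature meets the assumed similarity threshold.
    """
    best_id, best_score = None, -1.0
    for node_id, signature in signatures.items():
        score = cosine_similarity(vector, signature)
        if score > best_score:
            best_id, best_score = node_id, score
    if best_id is not None and best_score >= threshold:
        return best_id
    new_id = f"node-{len(signatures) + 1}"
    signatures[new_id] = list(vector)
    return new_id


signatures = {"node-1": [0.33, 0.33, 0.0, 1.6, 1.0]}
same = assign_node([0.30, 0.35, 0.0, 1.5, 1.0], signatures)   # close to node-1
other = assign_node([0.9, 0.0, 0.5, 0.2, 3.0], signatures)    # unlike node-1
```

In this sketch, a new log file is matched against a small set of signatures rather than against every log file previously received, which is the saving described above.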

[0062] Fig. 5 is a flowchart showing a method (500) of determining whether a correlation among any time series patterns exists among a plurality of nodes in a network. The method (500) may begin with creating (505) a number of time series patterns for each mapped node based on the frequency of log files received from each node. The time series pattern may describe a number of log files received from a mapped node within a predefined period of time.

[0063] The method (500) may continue with determining (510) whether a correlation among any time series patterns for any node exists. As described above, a correlation exists where normal and abnormal behavior of any two nodes are similar.
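One possible way to test for such a correlation, offered only as a sketch, is to compute a Pearson correlation coefficient between the time series patterns of each pair of nodes. The choice of Pearson correlation, the names `pearson` and `correlated_pairs`, the `min_correlation` value of 0.9, and the example node names and counts are assumptions and are not drawn from the specification.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(series_by_node, min_correlation=0.9):
    """Return (node_a, node_b, r) for every pair whose correlation meets the threshold."""
    nodes = sorted(series_by_node)
    pairs = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            r = pearson(series_by_node[a], series_by_node[b])
            if r >= min_correlation:
                pairs.append((a, b, r))
    return pairs


# Log files received per time window from three hypothetical nodes.
series = {
    "web": [10, 10, 40, 10, 10, 42],
    "db": [5, 5, 20, 5, 5, 21],
    "mail": [7, 8, 7, 8, 7, 8],
}
pairs = correlated_pairs(series)
```

In this example, only the "db" and "web" nodes show bursts of log files in the same windows, so only that pair is reported as a candidate functional link for presentation to the user.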

[0064] The method (500) may then continue with presenting (515), to a user of a computing device, any detected correlation of time series pattern. The presentation may be made to a user via a display device (240) on the computing device (105). A user may then verify that a correlation exists based on the information presented.

[0065] Aspects of the present device and method are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to examples of the principles described herein. Each block of the flowchart illustrations and block diagrams, and combinations of blocks in the flowchart illustrations and block diagrams, may be implemented by computer usable program code. The computer usable program code may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer usable program code, when executed via, for example, the processor (110) of the computer (105) or other programmable data processing apparatus, implements the functions or acts specified in the flowchart and/or block diagram block or blocks. In one example, the computer usable program code may be embodied within a computer readable storage medium, the computer readable storage medium being part of the computer program product. In one example, the computer readable storage medium is a non-transitory computer readable medium.

[0066] The specification and figures describe mapping nodes within a computer network. The computing device uses the log files received from a number of computing devices coupled to the network. This relieves a network administrator from having to maintain an ever increasing spreadsheet file to keep track of nodes attached to the network. Additionally, the system relieves the network administrator from having to gain credentials in order to gain access to each node in the network and retrieve information about the nodes. Instead, the system passively accepts and analyzes log files without requiring access credentials to each node. Still further, the system does not interface with a siloed network that could have difficulties integrating with a CMDB. Even further, no pre-existing knowledge of the network or individual nodes within the network is used because the information used to identify individual nodes and their functional connections is received from the log files. Instead, a computing device implementing the comparator may receive any number of log files from any number of nodes and determine, based on the characteristics derived from the log files and what type of node those log files came from, whether each log file did or did not come from the same node. A computing device implementing the linked node module may also determine whether the receipt of log files from any plurality of nodes is normal or abnormal and, based on that information, determine whether there is a functional link among a plurality of nodes that may affect the performance among the nodes in the computer network. Although each log file may comprise a source ID, without the present device a one-to-many comparison would have to be completed by comparing a source ID of a new log file to each and every log file received previously by the computing device. Instead, in one example, a machine learning algorithm may be used to train the computing device to recognize same or different nodes and assign a signature to new nodes when discovered.

[0067] The preceding description has been presented to illustrate and describe examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching.

Claims

CLAIMS

WHAT IS CLAIMED IS:

1. A method of mapping nodes in a network, the method comprising:

operating a computing device that receives log files from nodes on a computer network;

the computing device comparing the log files to identify log files from different nodes and a same node; and

the computing device mapping the different nodes in the computer network based on the log files.

2. The method of claim 1, further comprising:

generating a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file; and

generating a signature from each vector of characteristics;

wherein comparing log files comprises comparing a signature from one of the vector of characteristics with a vector of characteristics generated from a new log file.

3. The method of claim 2, wherein, if the signature from one of the vector of characteristics matches the vector of characteristics generated from a new log file, the signature is applied to the new log file.

4. The method of claim 2, wherein a characteristic among the plurality of characteristics for each log file comprises a word frequency value;

wherein the word frequency value for each log file is generated by determining a number of times a word is found within each of the log files individually, and

wherein the word frequency value for each log file is included in a corresponding one of the vector of characteristics for each log file.

5. The method of claim 2, a characteristic among the plurality of characteristics for each log file comprises a compression ratio value of each log file;

wherein the compression ratio value for each log file is generated by comparing the size of each log file versus a corresponding size of each log file when a data compression process is applied to the log file; and

wherein the compression ratio value is included in a corresponding one of the vector of characteristics for each log file.

6. The method of claim 1, wherein a characteristic among the plurality of characteristics for each log file comprises a text pattern value;

wherein the text pattern value for each log file is generated by identifying a number of text patterns, assigning a value to each text pattern; and identifying a text pattern value corresponding to the text pattern identified within each log file; and

wherein the text pattern value for each log file is included in a corresponding one of the vector of characteristics for each log file.

7. The method of claim 1, further comprising:

creating a time series pattern for each mapped node based on the frequency of log files received from each node;

determining whether a correlation among any time series patterns for any node exists; and

where a temporal correlation exists between a plurality of nodes, determining that the plurality of nodes have a functional relationship.

8. A computing device for automatically mapping nodes in a network, the device comprising:

a network interface for receiving log files from nodes on a network; and

at least one processor to implement a comparator and node mapper;

wherein the comparator electronically compares log files received and determines whether log files have originated from a same node or different nodes in the network; and

wherein the mapper generates a map of the different nodes in the network based on the comparison of log files through the network interface.

9. The computing device of claim 8, the comparator further comprising a characteristics generator to generate a vector of characteristics from each log file, the vector of characteristics representing a plurality of characteristics of a corresponding log file.

10. The computing device of claim 9, the comparator further comprising a signature generator to generate a signature that identifies each node based on the vector of characteristics.

11. The computing device of claim 10, wherein comparing log files received further comprises comparing signatures associated with a plurality of log files received and wherein the plurality of log files are determined to originate from a same node when the signatures associated with the plurality of log files are similar with a threshold minimal similarity.

12. The computing device of claim 9, further comprising a linked component module to determine, based on a time series patterns for each mapped node, a frequency of log files received from each node and determine that a plurality of mapped nodes are functionally connected based on a temporal correlation of log files received by each of the plurality of node.

13. A non-transitory computer-readable storage medium comprising instructions which, when executed by a processor of a device for mapping a number of nodes within a computer network, causes the processor to:

compare a plurality of log files received by the processor to identify log files from different nodes and a same node; and

map the different nodes in the computer network based on the log files.

14. The non-transitory computer-readable storage medium of claim 13, further comprising instructions to, when executed by the processor, create a vector of characteristics describing a plurality of characteristics for each of the log files.

15. The non-transitory computer-readable storage medium of claim 13, further comprising instructions to, when executed by the processor, determine if the plurality of log files are functionally related by:

creating a time series pattern for each node based on the amount of log files received from one node within a period of time;

determining whether the time series patterns for each of the plurality of nodes match.