JP2017199250A

JP2017199250A - Computer system, data analysis method, and computer

Info

Publication number: JP2017199250A
Application number: JP2016090661A
Authority: JP
Inventors: Chie Masuda (増田 千絵); Daisuke Matsubara (松原 大典)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2017-11-02

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computer resources used, and to improve analysis accuracy, in a system having a plurality of analyzers.
SOLUTION: A computer system has a plurality of computers. These include a computer having an analysis control unit that contains a plurality of analysis units for executing analysis processing, and a computer having a feature amount calculation unit that calculates the feature amounts used by the analysis units and a distribution unit that selects the analysis unit to be used. The distribution unit manages distribution information containing a plurality of entries, each including a feature amount, a type of analysis processing, and a result of the analysis processing. Referring to the distribution information, the distribution unit retrieves an entry whose feature amount is similar to the feature amount calculated from the data, selects the analysis processing to be executed on the basis of the result contained in the retrieved entry, and, on receiving the result of the analysis processing, adds the feature amount calculated from the data, the type of analysis processing, and its result to the distribution information.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a method for managing a system that includes a plurality of analyzers, each of which executes analysis processing using data transmitted and received over a network.

In recent years, solutions that collect large amounts of information, known as big data, and make use of that information have attracted attention. Network operation management is one such solution.

In network operation management using big data, the information flowing through network devices is analyzed at the packet level. This technique is used in fields such as intrusion detection systems (IDS).

There are two kinds of IDS: signature-type IDS and anomaly-type IDS. A signature-type IDS detects unauthorized access by determining whether passing information matches a registered pattern. An anomaly-type IDS, by contrast, detects unknown unauthorized access that is not registered as a pattern by analyzing the passing information. More specifically, an anomaly-type IDS generates a learning model through machine learning on normal traffic, and determines whether traffic is normal by comparing the passing information against that learning model.
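The anomaly-type approach described above (learn a model of normal traffic, then flag traffic that deviates from it) can be sketched as follows. This is a minimal illustration, assuming a simple per-feature Gaussian model and a z-score threshold; the class name `GaussianAnomalyModel` and the `threshold` parameter are hypothetical, and the source does not specify which learning model is used.

```python
from statistics import mean, pstdev


class GaussianAnomalyModel:
    """Hypothetical anomaly model: learns per-feature mean/spread from normal traffic."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold  # max allowed z-score per feature (assumed)
        self.means: list[float] = []
        self.stds: list[float] = []

    def fit(self, normal_samples: list[list[float]]) -> None:
        # Transpose rows into feature columns and learn each column's statistics.
        columns = list(zip(*normal_samples))
        self.means = [mean(col) for col in columns]
        # A zero spread would make the z-score undefined, so use a tiny floor.
        self.stds = [pstdev(col) or 1e-9 for col in columns]

    def is_normal(self, sample: list[float]) -> bool:
        # Normal only if every feature lies within `threshold` standard deviations.
        return all(
            abs(x - m) / s <= self.threshold
            for x, m, s in zip(sample, self.means, self.stds)
        )


model = GaussianAnomalyModel(threshold=3.0)
model.fit([[10, 0.10], [12, 0.20], [11, 0.15], [9, 0.12]])
print(model.is_normal([11, 0.14]))   # True: close to the normal traffic
print(model.is_normal([500, 0.14]))  # False: far outside it
```

A real anomaly-type IDS would use a richer model; the Gaussian choice here only keeps the sketch short.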

Machine-learning-based analysis of this kind is expected to be applied not only to detecting unauthorized access but also in various other fields, such as guaranteeing the operational quality of each device and managing devices in a system in which many devices are distributed.

Anomaly-type IDS suffers from a low detection rate and a high false-positive rate. One conceivable solution is to improve detection accuracy (analysis accuracy) by using a specialized analyzer for each type of unauthorized access. Another is ensemble learning, in which a plurality of analyzers are integrated to form a single high-performance analyzer (see, for example, Patent Document 1).

Patent Document 1 states: "Abnormalities caused by unauthorized access are classified into a plurality of groups defined by three types: abnormalities in traffic volume or communication range, abnormalities in communication procedure, and abnormalities in transmitted/received data. A system having detection modules that use feature amounts specialized for detecting each group is configured to detect unauthorized access. The system has detection modules using feature amounts specialized for detecting the time-slot-type, flow-count-type, and flow-payload-type groups, and takes the logical OR of the detection results of the detection modules as the final output; when any one of the detection modules judges an abnormality, the system issues an alert, thereby detecting unauthorized access."
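The OR-combination scheme quoted from Patent Document 1 can be sketched as follows, mainly to make its cost visible: every detection module runs for every input, regardless of the final verdict. The function `or_ensemble` and the detector signature are hypothetical illustrations, not the cited document's implementation.

```python
from typing import Callable, Mapping, Sequence

# A detector takes a feature vector and returns True if it judges an abnormality.
Detector = Callable[[Sequence[float]], bool]


def or_ensemble(detectors: Mapping[str, Detector],
                features: Mapping[str, Sequence[float]]) -> tuple[bool, int]:
    """Run every detection module and OR their verdicts.

    Returns the final verdict and the number of modules executed, which is
    always len(detectors): no module is ever skipped.
    """
    executed = 0
    verdict = False
    for name, detect in detectors.items():
        executed += 1
        # detect() is evaluated first, so it runs even if verdict is already True.
        verdict = detect(features[name]) or verdict
    return verdict, executed
```

This always-run-everything property is what the next paragraph identifies as the source of the high processing load.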

However, in the method described in Patent Document 1, the plurality of analyzers operate in parallel, so the processing load is heavy and the processing time is long. The method of Patent Document 1 therefore degrades system performance. Consequently, there is a need for a technique that improves analysis accuracy while reducing the amount of computer resources used for analysis processing.

The technique described in Patent Document 2 is known as a way to reduce the amount of computer resources used for analysis processing. Patent Document 2 states: "The gateway device includes an operation information acquisition unit that acquires operation information of one or more devices on a first network and transmits it to an analysis device; the analysis device includes a failure analysis unit that analyzes failures of the devices using the operation information; and the operation information acquisition unit narrows the acquired operation information down to highly important items according to predetermined importance levels before transmitting it to the analysis device 300."

Patent Document 1: JP 2006-115129 A
Patent Document 2: JP 2013-34243 A

Because the technique of Patent Document 2 requires rules to be defined in advance, all analyzers must still analyze unknown information, as in the conventional approach. The amount of computer resources used for analysis processing therefore cannot be reduced.

An object of the present invention is to improve analysis accuracy and to reduce the amount of computer resources used for analysis processing in a system that performs a plurality of machine-learning-based analyses.

A representative example of the invention disclosed in this application is as follows. A computer system includes a plurality of computers, each having a processor, a memory connected to the processor, and an interface connected to the processor for connecting to other devices. The plurality of computers include a computer having an analysis control unit that contains a plurality of analysis units, each executing analysis processing using data transmitted and received over a network, and a computer having a feature amount calculation unit, which uses the data to calculate the feature amounts used by each of the analysis units, and a distribution unit, which selects the analysis unit to be used. The distribution unit manages distribution information containing a plurality of entries, each including a feature amount used by an analysis unit, a type of analysis processing, and a result of the analysis processing; the result of the analysis processing is a value indicating whether the analysis processing is necessary.
The distribution unit refers to the distribution information, searches for a similar entry containing a feature amount similar to the feature amount calculated from the data, and selects the analysis processing to be executed on the basis of the result of the analysis processing contained in the similar entry. The analysis unit that executes the analysis processing selected by the distribution unit performs that analysis using the feature amount calculated from the data and transmits the result to the distribution unit. When the distribution unit receives the result of the analysis processing, it adds to the distribution information an entry containing the feature amount calculated from the data, the type of the analysis processing, and the result of the analysis processing.
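The claimed loop (look up similar entries, decide from their stored results, run only the selected analyses, and append each new result) can be sketched as below. This is a hypothetical illustration under stated assumptions: similarity is Euclidean distance within a fixed `radius`, analysis is skipped only when similar entries exist and all of them are "normal", and analysis is run when no similar entry exists. The source passage does not fix any of these choices, and the names `Entry` and `DistributionUnit` are invented for the sketch.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class Entry:
    features: tuple[float, ...]  # feature amount
    analysis_type: str           # type of analysis processing
    abnormal: bool               # analysis result: True means analysis is needed


@dataclass
class DistributionUnit:
    # Maps analysis type -> analyzer returning True for "abnormal".
    analyzers: dict[str, Callable[[Sequence[float]], bool]]
    radius: float = 1.0          # assumed similarity threshold
    entries: list[Entry] = field(default_factory=list)

    def _similar(self, x: Sequence[float], analysis_type: str) -> list[Entry]:
        # Assumption: "similar" means within `radius` in Euclidean distance.
        return [e for e in self.entries
                if e.analysis_type == analysis_type
                and math.dist(e.features, x) <= self.radius]

    def process(self, x: Sequence[float]) -> dict[str, bool]:
        results: dict[str, bool] = {}
        for analysis_type, analyze in self.analyzers.items():
            similar = self._similar(x, analysis_type)
            # Skip only when known-similar traffic was always judged normal;
            # with no similar entry, analyze (assumed conservative default).
            if similar and not any(e.abnormal for e in similar):
                continue
            abnormal = analyze(x)
            results[analysis_type] = abnormal
            # Feed the result back into the distribution information.
            self.entries.append(Entry(tuple(x), analysis_type, abnormal))
        return results
```

Because every executed analysis appends an entry, later inputs near a known-normal point skip the analyzer, which is where the resource saving comes from; inputs near known-abnormal points keep being analyzed.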

According to the present invention, the analysis unit is selected on the basis of the distribution information, so the amount of computer resources used for analysis processing can be reduced. In addition, reflecting the analysis results of the analysis units in the distribution information improves analysis accuracy. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

FIG. 1 is a diagram illustrating a configuration example of the computer system according to the first embodiment.
FIG. 2 is a diagram illustrating details of the software configuration of the analysis device according to the first embodiment.
FIG. 3 is a diagram illustrating an example of the distribution information included in the distribution information group according to the first embodiment.
FIG. 4 is a flowchart illustrating the processing executed by the distribution unit according to the first embodiment.
FIG. 5 is a diagram illustrating an example of the feature amount space according to the first embodiment.
FIG. 6 is a diagram illustrating a configuration example of the computer system according to the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described below are merely examples, and the embodiments to which the present invention can be applied are not limited to them. Furthermore, the embodiments may be applied individually, or some or all of them may be combined.

FIG. 1 is a diagram illustrating a configuration example of the computer system according to the first embodiment.

The computer system consists of a data center 100 and a plurality of terminals 101, which are connected to each other via an external network (NW) 105.

The data center 100 includes an NW device 102, a computer 103, and an analysis device 104. There may be two or more of each device. The computer 103 and the analysis device 104 are connected to the NW device 102.

The NW device 102 connects external devices and internal devices via a network; examples include a switch, a router, and a gateway. The NW device 102 mirrors the data transmitted and received between the terminals 101 and the computer 103, and sends the mirrored data to the analysis device 104.

The computer 103 executes various processes in response to processing requests from the terminals 101. For example, the computer 103 provides services to the terminals 101 as a Web server, a database server, or the like. This embodiment is not limited by the configuration of the computer 103 or by the type of service it provides.

The analysis device 104 acquires, from the NW device 102, the data (packets) passing through the NW device 102 or logs of that data, and analyzes the data to detect unauthorized access aimed at stealing, destroying, or tampering with data, or at making the computer 103 malfunction. In the following description, the data passing through the NW device 102, or the logs of that data, are also referred to as observation data.

Known types of unauthorized access include DoS (Denial of Service) attacks, U2R (User to Root) attacks, R2L (Remote to Local) attacks, and Probe attacks.

A DoS attack renders the receiving system inoperable by sending it a large amount of data or abnormal data. U2R and R2L attacks illegitimately break into a system by sending abnormal data. A Probe attack investigates the services, protocols, and the like of a system.

The analysis device 104 has, as hardware, a CPU 110, a memory 111, a storage device 112, and an I/F 113, which are connected to one another via an internal bus or the like. The terminals 101, the NW device 102, and the computer 103 are assumed to have the same hardware as the analysis device 104.

The CPU 110 executes programs stored in the memory 111; the functions of the analysis device 104 are realized by the CPU 110 executing these programs. In the following description, when a functional unit is described as the subject of a process, this means that the CPU 110 is executing the program that realizes that functional unit.

The memory 111 stores the programs executed by the CPU 110 and includes a work area that the programs use during processing. The programs stored in the memory 111 are described later.

The storage device 112 stores information persistently; it may be an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like. The storage device 112 is used as a storage unit 130 that stores various kinds of information, which are described later.

The I/F 113 is an interface for connecting to other devices, for example a network interface.

The programs stored in the memory 111 are as follows. The memory 111 stores programs that realize a feature amount calculation unit 120, a distribution unit 121, and an analysis control unit 122.

The feature amount calculation unit 120 uses observation data to calculate the various feature amounts used in analysis processing.

The analysis control unit 122 executes a plurality of analysis processes. In this embodiment, the analysis control unit 122 has analysis units (analyzers) for detecting DoS attacks, U2R attacks, R2L attacks, and Probe attacks, respectively. Each analysis unit executes analysis processing to detect its respective attack.

The analysis units may use different feature amounts for analysis processing. In this embodiment, the following feature amounts are used.

The U2R attack analysis unit 201 (see FIG. 2) and the R2L attack analysis unit 202 (see FIG. 2) execute analysis processing using feature amounts calculated from a single packet, because U2R and R2L attacks stem from abnormal data contained in packets.

The DoS attack analysis unit 200 (see FIG. 2) executes analysis processing using feature amounts calculated from a flow, that is, an aggregate of a plurality of packets, because DoS attacks stem from abnormalities in communication volume and communication range.

The Probe attack analysis unit 203 (see FIG. 2) executes analysis processing using feature amounts calculated from a flow or from a plurality of packets acquired over a fixed period, because a Probe attack may stem either from abnormalities in communication volume and range or from abnormalities in the communication procedure.

"IPsweep", one type of Probe attack, pings unspecified IP addresses to identify running systems. To detect IPsweep, for example, the number of packets sent from the same source can be used as a feature amount.
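A sketch of the IPsweep feature mentioned above, counting packets per source IP over a recent time window. The `(timestamp, source_ip)` log layout, the function name, and the `window` parameter are assumptions for illustration; the source only says that the packet count from the same source may be used.

```python
from collections import Counter
from typing import Iterable


def packets_per_source(log: Iterable[tuple[float, str]],
                       now: float, window: float) -> Counter:
    """Count packets per source IP whose timestamp lies in (now - window, now].

    Each log record is a hypothetical (timestamp_seconds, source_ip) pair.
    """
    return Counter(src for ts, src in log if now - window < ts <= now)
```

A source that suddenly accounts for many packets in the window is then a candidate for the Probe analyzer.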

An analysis unit may use one or more types of feature amounts for analysis processing. The present invention is not limited by the feature amounts used in analysis processing.

The distribution unit 121 selects, on the basis of the feature amounts calculated by the feature amount calculation unit 120, the analysis unit to be instructed to execute analysis processing.

The information stored in the storage unit 130, which is realized by the storage device 112, is as follows. The storage unit 130 stores log information 140, a feature amount information group 141, a distribution information group 142, and a learning data group 143. The information stored in the storage unit 130 may instead be stored in the memory 111.

The log information 140 manages, as logs, the observation data acquired from the NW device 102. It contains a plurality of entries, each including a time stamp, a source IP address, a packet size, and the like.

The feature amount information group 141 manages the feature amounts used by each analysis unit. The distribution information group 142 is used to select analysis units. The learning data group 143 manages the learning data that each analysis unit uses for machine learning.

FIG. 2 is a diagram illustrating details of the software configuration of the analysis device 104 according to the first embodiment. The lines connecting the functional units indicate logical connections.

First, the analysis control unit 122, the feature amount information group 141, the distribution information group 142, and the learning data group 143 are described in detail.

The analysis control unit 122 includes a DoS attack analysis unit 200, a U2R attack analysis unit 201, an R2L attack analysis unit 202, and a Probe attack analysis unit 203. Each analysis unit refers to the feature amount information group 141 and executes analysis processing related to the received observation data.

The feature amount information group 141 includes packet feature amount information 210, flow feature amount information 211, and periodic feature amount information 212.

The packet feature amount information 210 manages per-packet feature amounts, the flow feature amount information 211 manages per-flow feature amounts, and the periodic feature amount information 212 manages feature amounts calculated from observation data over an arbitrary time range.

The distribution information group 142 includes packet distribution information 220, flow distribution information 221, and periodic distribution information 222.

The packet distribution information 220 is used to select analysis units on the basis of per-packet feature amounts, the flow distribution information 221 on the basis of per-flow feature amounts, and the periodic distribution information 222 on the basis of feature amounts calculated from observation data over an arbitrary time range.

Because there are multiple combinations of feature amounts used for analysis, there are multiple instances of each of the packet distribution information 220, the flow distribution information 221, and the periodic distribution information 222.

The learning data group 143 includes DoS attack analysis learning data 230, U2R attack analysis learning data 231, R2L attack analysis learning data 232, and Probe attack analysis learning data 233. Each set of learning data contains feature amounts of normal communication.

The DoS attack analysis learning data 230 is used by the DoS attack analysis unit 200, the U2R attack analysis learning data 231 by the U2R attack analysis unit 201, the R2L attack analysis learning data 232 by the R2L attack analysis unit 202, and the Probe attack analysis learning data 233 by the Probe attack analysis unit 203.

Next, the processing flow of the analysis device 104 is described.

The analysis units included in the analysis control unit 122 perform machine learning using the learning data stored in the learning data group 143. Machine learning may be executed periodically or upon receiving an instruction from a user.

The feature amount calculation unit 120 stores the observation data received from the NW device 102 in the log information 140 and calculates various feature amounts from the log information 140, for example per-packet feature amounts, per-flow feature amounts, and feature amounts of the observation data over an arbitrary time range.
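The three granularities of feature amounts (per packet, per flow, and per time range) can be sketched from log entries as follows. The `LogEntry` fields follow the log information 140 described earlier (time stamp, source IP address, packet size) plus an assumed destination address; keying a flow by `(src_ip, dst_ip)` alone is a simplification, and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    ts: float     # time stamp in seconds
    src_ip: str   # source IP address
    dst_ip: str   # destination IP address (assumed field)
    size: int     # packet size in bytes


def packet_features(e: LogEntry) -> dict:
    # Per-packet feature amount: here just the packet size.
    return {"size": e.size}


def flow_features(log: list[LogEntry]) -> dict:
    # Per-flow feature amounts: packet and byte counts per (src_ip, dst_ip).
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for e in log:
        f = flows[(e.src_ip, e.dst_ip)]
        f["packets"] += 1
        f["bytes"] += e.size
    return dict(flows)


def period_features(log: list[LogEntry], start: float, end: float) -> dict:
    # Feature amounts over the time range [start, end).
    window = [e for e in log if start <= e.ts < end]
    return {"packets": len(window), "sources": len({e.src_ip for e in window})}
```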

The feature amount calculation unit 120 stores the calculated feature amounts in the feature amount information group 141 and then instructs the distribution unit 121 to start processing.

On the basis of the feature amount information group 141 and the distribution information group 142, the distribution unit 121 determines whether a feature amount indicates an abnormality. If it does, the distribution unit 121 selects, on the same basis, the analysis unit that is to execute analysis processing related to the received observation data.

The distribution unit 121 instructs the analysis control unit 122 to execute the selected analysis unit. Specifically, the distribution unit 121 calls the analysis function corresponding to the selected analysis unit, passing the calculated feature amounts to the analysis control unit 122 as arguments.
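The call-by-analysis-function step can be sketched as a lookup table from analysis type to function, invoking only the selected entries with the feature amounts as arguments. The function names, the table `ANALYSIS_FUNCTIONS`, and the fixed thresholds standing in for the learned analyzers are all hypothetical placeholders.

```python
from typing import Callable, Sequence

AnalysisFn = Callable[[Sequence[float]], str]


def analyze_dos(features: Sequence[float]) -> str:
    # Placeholder rule standing in for the learned DoS analyzer.
    return "abnormal" if features[0] > 1000 else "normal"


def analyze_probe(features: Sequence[float]) -> str:
    # Placeholder rule standing in for the learned Probe analyzer.
    return "abnormal" if features[0] > 50 else "normal"


ANALYSIS_FUNCTIONS: dict[str, AnalysisFn] = {
    "DoS": analyze_dos,
    "Probe": analyze_probe,
}


def run_selected(selected: Sequence[str], features: Sequence[float]) -> dict[str, str]:
    """Call only the analysis functions chosen by the distribution unit."""
    return {name: ANALYSIS_FUNCTIONS[name](features) for name in selected}
```

Unselected analyzers are never called, so their cost is avoided entirely.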

The processing executed by the distribution unit 121 is described in detail with reference to FIG. 4.

The analysis control unit 122 instructs the analysis unit corresponding to the called analysis function to execute analysis processing. The analysis unit executes analysis processing for detecting unauthorized access on the basis of the feature amounts passed as arguments. The analysis control unit 122 outputs the result of that analysis processing to the distribution unit 121.

The distribution unit 121 updates the distribution information group 142 on the basis of the result of the analysis processing.

As described above, when the distribution unit 121 detects a feature amount indicating an abnormality, it selects the analysis unit to execute analysis processing on the basis of the feature amount information group 141 and the distribution information group 142. Because only the necessary analysis processing is executed, the amount of computer resources used by the analysis device 104 can be reduced. In addition, because the distribution information group 142 is updated on the basis of the analysis results, the accuracy with which the analysis device 104 detects unauthorized access improves.

FIG. 3 is a diagram illustrating an example of the distribution information included in the distribution information group 142 according to the first embodiment; it shows an example of the flow distribution information 221.

The flow distribution information 221 contains a plurality of entries, each including a feature amount 301, an analysis type 302, and an analysis result 303.

The feature amount 301 serves as the index for selecting an analysis unit. In FIG. 3, the feature amount 301 consists of the number of transmitted packets 311 and the connection ratio 312.

The number of transmitted packets 311 is the number of packets sent from a terminal 101 to the computer 103 over a given flow. The connection ratio 312 is the proportion, among the connections created during a predetermined period (for example, the 5 seconds preceding the time at which the packet was sent), of connections created between the terminal 101 that sent the packet and the computer 103.
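A sketch of how the connection ratio 312 might be computed for one packet, assuming connections are recorded as `(creation_time, source, destination)` tuples and that the denominator is every connection created in the window; the function name and record layout are hypothetical, and the 5-second default mirrors the example period above.

```python
from typing import Sequence


def connection_ratio(connections: Sequence[tuple[float, str, str]],
                     t: float, src: str, dst: str, window: float = 5.0) -> float:
    """Share of connections created in [t - window, t] that link src to dst.

    `connections` holds hypothetical (creation_time, source, destination) records.
    """
    recent = [(s, d) for ts, s, d in connections if t - window <= ts <= t]
    if not recent:
        return 0.0  # no connections in the window: define the ratio as 0
    matching = sum(1 for s, d in recent if s == src and d == dst)
    return matching / len(recent)
```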

The analysis type 302 indicates the type of analysis processing, and the analysis result 303 holds its result: either "normal", indicating normal communication, or "abnormal", indicating abnormal communication. In this embodiment, the analysis result 303 is used as information indicating whether to execute the analysis processing: if the analysis result 303 is "normal", the analysis processing is judged unnecessary, and if it is "abnormal", the analysis processing is judged necessary.
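One entry of the flow distribution information and the "normal means skip, abnormal means analyze" rule can be sketched as follows. The field names are hypothetical renderings of items 311, 312, 302, and 303, and any values used with it are illustrative, not taken from FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowDistributionEntry:
    sent_packets: int        # number of transmitted packets 311
    connection_ratio: float  # connection ratio 312
    analysis_type: str       # analysis type 302, e.g. "DoS"
    analysis_result: str     # analysis result 303: "normal" or "abnormal"


def analysis_needed(entry: FlowDistributionEntry) -> bool:
    # "normal" means the analysis can be skipped; "abnormal" means it must run.
    if entry.analysis_result not in ("normal", "abnormal"):
        raise ValueError(f"unexpected analysis result: {entry.analysis_result!r}")
    return entry.analysis_result == "abnormal"
```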

図４は、実施例１の振分部１２１が実行する処理を説明するフローチャートである。図５は、実施例１の特徴量空間の一例を示す図である。 FIG. 4 is a flowchart illustrating processing executed by the distribution unit 121 according to the first embodiment. FIG. 5 is a diagram illustrating an example of the feature amount space according to the first embodiment.

振分部１２１は、特徴量算出部１２０から処理の開始指示を受け付けた場合、以下で説明する処理を開始する。なお、特徴量算出部１２０は、観測データの受信に伴って更新された特徴量情報の識別情報、及び特徴量情報のエントリの識別情報を振分部１２１に入力するものとする。 When the distribution unit 121 receives a process start instruction from the feature amount calculation unit 120, the distribution unit 121 starts the process described below. Note that the feature amount calculation unit 120 inputs the identification information of the feature amount information and the identification information of the entry of the feature amount information updated with the reception of the observation data to the distribution unit 121.

The distribution unit 121 selects one piece of distribution information from the distribution information group 142 (step S401).

Specifically, the distribution unit 121 searches for distribution information that includes the updated feature amount and generates a list of the distribution information found. The distribution unit 121 refers to this list and selects one piece of distribution information. At this time, the distribution unit 121 acquires, from the updated feature amount information in the feature amount information group 141, the feature amounts corresponding to the feature amount 301 of the selected distribution information.

Next, the distribution unit 121 refers to the selected distribution information and determines whether there is an entry whose feature amount 301 is similar to the feature amount of the acquired entry (step S402). Specifically, the following processing is executed.

The distribution unit 121 plots the feature amount of each entry in a feature amount space whose axes are the feature amounts 301 of the selected distribution information. When the flow distribution information 221 shown in FIG. 3 is selected, the feature amount of each entry is plotted in the feature amount space shown in FIG. 5. Because the flow distribution information 221 shown in FIG. 3 includes two feature amounts, this feature amount space is two-dimensional. In general, for distribution information including n feature amounts, the feature amount space is n-dimensional.
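Projecting a feature record onto the axes named by the feature amount 301 can be sketched as below. The function name and the feature keys (`sent_packets`, `conn_ratio`) are hypothetical; the sketch only illustrates that the number of axes, and hence the dimension of the space, equals the number of feature amounts in the distribution information.

```python
from typing import Mapping, Sequence, Tuple


def to_point(axes: Sequence[str], features: Mapping[str, float]) -> Tuple[float, ...]:
    """Return the coordinates of `features` in the space spanned by `axes`.

    Features not named in `axes` are ignored; a missing axis raises KeyError.
    """
    return tuple(float(features[name]) for name in axes)


# Two feature amounts give a two-dimensional point, as in FIG. 5.
flow_axes = ["sent_packets", "conn_ratio"]
print(to_point(flow_axes, {"sent_packets": 12, "conn_ratio": 0.25, "bytes": 900}))  # (12.0, 0.25)
```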

In FIG. 5, the white circles and black circles indicate the feature amounts of entries in the flow distribution information 221. A white circle indicates the feature amount of an entry whose analysis result 303 for a given analysis process is “normal”, and a black circle indicates the feature amount of an entry whose analysis result 303 for that analysis process is “abnormal”. Here, the analysis result 303 of the Probe attack analysis process is assumed. White and black circles are distinguished only for the purpose of explanation.

The distribution unit 121 acquires, from the updated feature amount information, the entry that includes the updated feature amount, and plots the feature amount of that entry in the feature amount space. In FIG. 5, the cross mark indicates the point corresponding to the feature amount of the acquired entry.

The distribution unit 121 calculates the distance in the feature amount space between the feature amount of each entry and the feature amount of the acquired entry. It then identifies the entry with the shortest distance among the entries whose analysis result 303 is “normal” (the first entry), and the entry with the shortest distance among the entries whose analysis result 303 is “abnormal” (the second entry).
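Finding the first and second entries amounts to two nearest-neighbour searches restricted by analysis result. The sketch below assumes Euclidean distance (the text does not name a metric) and a hypothetical entry layout of `(point, result)` pairs; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[Tuple[float, ...], str]  # (point in feature space, "normal" or "abnormal")


def nearest_by_result(entries: Sequence[Entry], query: Sequence[float],
                      result: str) -> Optional[Tuple[int, float]]:
    """Return (index, distance) of the closest entry with the given result, or None."""
    best: Optional[Tuple[int, float]] = None
    for i, (point, res) in enumerate(entries):
        if res != result:
            continue
        d = math.dist(point, query)  # Euclidean distance (assumed metric)
        if best is None or d < best[1]:
            best = (i, d)
    return best


entries: List[Entry] = [((1.0, 1.0), "normal"), ((2.0, 2.0), "normal"),
                        ((8.0, 8.0), "abnormal"), ((5.0, 5.0), "abnormal")]
first = nearest_by_result(entries, (3.0, 3.0), "normal")     # first entry and r1
second = nearest_by_result(entries, (3.0, 3.0), "abnormal")  # second entry and r2
```

The returned distances play the roles of r1 and r2 in the conditions that follow.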

In the example shown in FIG. 5, the entry corresponding to the point (α2, β2) is the first entry, and the entry corresponding to the point (α3, β3) is the second entry.

The distribution unit 121 identifies a similar entry based on the distance r1 between the feature amount of the first entry and that of the acquired entry, and the distance r2 between the feature amount of the second entry and that of the acquired entry. Specifically, the distribution unit 121 applies the following four conditions.

(Condition 1) r1 ≤ R1 and r2 > R2
(Condition 2) r1 > R1 and r2 ≤ R2
(Condition 3) r1 ≤ R1 and r2 ≤ R2
(Condition 4) r1 > R1 and r2 > R2

R1 is the reference distance within which the acquired entry is judged similar to an entry whose analysis result 303 is “normal”. R2 is the reference distance within which the acquired entry is judged similar to an entry whose analysis result 303 is “abnormal”. R2 is set larger than R1.

Usually, most traffic consists of normal communication, and little traffic contains abnormal communication. Entries whose analysis result 303 is “abnormal” therefore lie in an area different from the area in which entries whose analysis result 303 is “normal” are distributed, and, in general, no entries whose analysis result 303 is “normal” exist around entries whose analysis result 303 is “abnormal”. Making R2 sufficiently large therefore improves the accuracy of detecting feature amounts that indicate an abnormality.

(Condition 1) means that the feature amount of the acquired entry lies within the circle of radius R1 centered on the first entry but outside the circle of radius R2 centered on the second entry. (Condition 2) means that it lies within the circle centered on the second entry but outside the circle centered on the first entry. (Condition 3) means that it lies within both circles. (Condition 4) means that it lies within neither circle. In an n-dimensional feature amount space, these circles become hyperspheres.

When (Condition 1) is satisfied, the distribution unit 121 determines that the feature amount of the acquired entry is similar to the feature amount 301 of the first entry. When (Condition 2) is satisfied, it determines that the feature amount of the acquired entry is similar to the feature amount 301 of the second entry. When (Condition 3) is satisfied, it also determines that the feature amount of the acquired entry is similar to the feature amount 301 of the second entry; that is, when the acquired entry is close to both, the entry whose analysis result is “abnormal” takes priority. When (Condition 4) is satisfied, it determines that no entry has a similar feature amount 301. This completes the description of step S402.
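The four conditions reduce to a small decision function. In this sketch the function name and return labels are hypothetical, and `None` for r1 or r2 stands for the case in which no entry with that analysis result exists yet; how that case is handled is an assumption, since the text does not discuss it. Conditions 2 and 3 are checked first because both resolve to the second entry.

```python
from typing import Optional


def similar_entry(r1: Optional[float], r2: Optional[float],
                  R1: float, R2: float) -> Optional[str]:
    """Apply Conditions 1-4 and return "first", "second", or None (no similar entry)."""
    near_normal = r1 is not None and r1 <= R1    # inside the circle of radius R1
    near_abnormal = r2 is not None and r2 <= R2  # inside the circle of radius R2
    if near_abnormal:   # Condition 2 or Condition 3: the "abnormal" entry wins
        return "second"
    if near_normal:     # Condition 1
        return "first"
    return None         # Condition 4


# With R1 = 1.0 and R2 = 3.0 (R2 > R1), a point 2.5 away from the nearest
# "abnormal" entry is still treated as similar to it.
print(similar_entry(0.5, 2.5, R1=1.0, R2=3.0))  # second
```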

If it is determined that no entry has a feature amount 301 similar to the feature amount of the acquired entry, the distribution unit 121 selects all the analysis units and instructs all of them to execute their analysis processes (step S409). The distribution unit 121 then proceeds to step S406.

When the distribution unit 121 receives an analysis result from the analysis control unit 122, it temporarily stores in the memory 111 an entry that associates the identification information of the selected distribution information, the feature amount of the acquired entry, the analysis type, and the analysis result.

If it is determined that there is an entry whose feature amount 301 is similar to the feature amount of the acquired entry, the distribution unit 121 selects one analysis process based on the analysis types 302 of the similar entry (step S403) and determines whether the analysis result 303 corresponding to that analysis process is “normal” (step S404).

If the analysis result 303 corresponding to the selected analysis process is determined to be “normal”, the distribution unit 121 proceeds to step S405.

If the analysis result 303 corresponding to the selected analysis process is determined to be “abnormal”, the distribution unit 121 selects the analysis unit corresponding to that analysis process and instructs it to execute the analysis process (step S410). The distribution unit 121 then proceeds to step S405.
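Steps S403 to S405 and S410 amount to a loop over the analysis types of the similar entry. The sketch assumes the similar entry's analysis type 302 and analysis result 303 rows are available as a dictionary; the function name is hypothetical.

```python
from typing import Dict, List


def select_analyzers(similar_rows: Dict[str, str]) -> List[str]:
    """Return the analysis types to execute, given the similar entry's rows.

    Keys are analysis types 302; values are analysis results 303.
    """
    selected: List[str] = []
    for analysis_type, result in similar_rows.items():  # S403: take one analysis type
        if result == "normal":                           # S404: analysis not needed
            continue
        selected.append(analysis_type)                   # S410: instruct this analyzer
    return selected                                      # S405: all types processed


rows = {"DoS": "normal", "U2R": "abnormal", "R2L": "normal", "Probe": "abnormal"}
print(select_analyzers(rows))  # ['U2R', 'Probe']
```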

In step S405, the distribution unit 121 determines whether processing has been completed for all analysis types 302 of the similar entry.

If processing has not been completed for all analysis types 302 of the similar entry, the distribution unit 121 returns to step S403 and repeats the same processing.

If processing has been completed for all analysis types 302 of the similar entry, the distribution unit 121 determines whether processing has been completed for all distribution information (step S406).

Specifically, the distribution unit 121 determines whether processing has been completed for all distribution information included in the list of distribution information.

If processing has not been completed for all distribution information, the distribution unit 121 returns to step S401 and repeats the same processing.

If processing has been completed for all distribution information, the distribution unit 121 determines whether an analysis unit has been selected at least once (step S407), that is, whether step S409 or step S410 has been executed at least once.

If no analysis unit has been selected, the distribution unit 121 ends the process.

If an analysis unit has been selected at least once, the distribution unit 121 waits until it has received all the analysis results from the analysis control unit 122, updates the distribution information group 142 (step S408), and then ends the process.

Specifically, the distribution unit 121 refers to the entries stored in the memory 111, identifies the distribution information to be updated, and adds one entry to that distribution information. The distribution unit 121 sets the feature amount of the acquired entry in the feature amount 301 of the added entry, generates a row for every analysis type in the analysis type 302, and sets the corresponding analysis result in the analysis result 303 of each row. “Normal” is set in the analysis result 303 of any analysis process whose execution was not instructed.
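Putting steps S401 to S410 together, one pass of the distribution unit 121 over a single updated feature record might look like the sketch below. Everything here is an illustrative assumption rather than the specified implementation: the `DistInfo` class, Euclidean distance, the rule that an entry counts as “abnormal” for the neighbourhood test if any of its rows is “abnormal”, and the `run` callback standing in for the analysis control unit 122.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping, Optional, Set, Tuple

ANALYSIS_TYPES = ("DoS", "U2R", "R2L", "Probe")
Point = Tuple[float, ...]
Rows = Dict[str, str]  # analysis type 302 -> analysis result 303


@dataclass
class DistInfo:
    """Hypothetical in-memory form of one piece of distribution information."""
    axes: List[str]   # feature amounts 301 (the axes of the space)
    R1: float         # reference distance for "normal" entries
    R2: float         # reference distance for "abnormal" entries (R2 > R1)
    entries: List[Tuple[Point, Rows]] = field(default_factory=list)


def _nearest(info: DistInfo, q: Point, abnormal: bool) -> Optional[Tuple[float, Rows]]:
    """Closest entry with the requested label, as (distance, rows), or None."""
    best: Optional[Tuple[float, Rows]] = None
    for point, rows in info.entries:
        # Assumption: an entry is "abnormal" if any of its analysis results is.
        if ("abnormal" in rows.values()) != abnormal:
            continue
        d = math.dist(point, q)
        if best is None or d < best[0]:
            best = (d, rows)
    return best


def distribute(infos: List[DistInfo], features: Mapping[str, float],
               run: Callable[[str, Point], str]) -> Set[str]:
    """One pass of steps S401-S410; returns the analysis types that were executed."""
    executed: Set[str] = set()
    pending: List[Tuple[DistInfo, Point, Rows]] = []  # held in memory 111 until S408
    for info in infos:                                # S401 / S406
        q = tuple(float(features[a]) for a in info.axes)
        first = _nearest(info, q, abnormal=False)
        second = _nearest(info, q, abnormal=True)
        if second is not None and second[0] <= info.R2:  # Conditions 2 and 3
            chosen = [t for t, r in second[1].items() if r == "abnormal"]  # S403-S405, S410
        elif first is not None and first[0] <= info.R1:  # Condition 1
            chosen = []  # every row of a "normal" entry is "normal": nothing to run
        else:                                            # Condition 4
            chosen = list(ANALYSIS_TYPES)                # S409: all analyzers
        if not chosen:
            continue
        rows = {t: "normal" for t in ANALYSIS_TYPES}     # non-executed -> "normal"
        for t in chosen:
            rows[t] = run(t, q)
        executed.update(chosen)
        pending.append((info, q, rows))
    for info, q, rows in pending:                        # S407 / S408
        info.entries.append((q, rows))
    return executed
```

Deferring the appends until after the loop mirrors the text, where entries are held in memory and added to the distribution information only once all results have arrived.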

In FIG. 3, the analysis type 302 of the distribution information includes rows for all analysis processes, but the present invention is not limited to this. For example, it may include only rows for the analysis processes that use the feature amount 301.

In the first embodiment, a system that performs analysis processing using packet feature amounts has been described as an example, but the present invention is not limited to this. A system having a plurality of analysis units that analyze data other than packets achieves the same effect.

According to the first embodiment, the analysis device 104 determines, based on the distribution information, whether the calculated feature amount is similar to a feature amount for which some analysis result indicated an abnormality. When such a feature amount is detected, the analysis device 104 selects the analysis unit that executes the analysis process using that feature amount and instructs it to execute the analysis process. This reduces the amount of computer resources that the analysis device 104 uses for analysis processing.

Further, because the distribution information is updated based on the analysis results of the analysis units, the amount of data available for judging the similarity of feature amounts increases, so the analysis units can be selected appropriately. This improves the analysis accuracy of the entire system.

The second embodiment differs from the first embodiment in that the distribution unit 121 and the analysis control unit 122 are implemented on different devices. The second embodiment is described below, focusing on the differences from the first embodiment.

FIG. 6 is a diagram illustrating a configuration example of the computer system according to the second embodiment.

In the second embodiment, the configuration of the data center 100 differs from that of the first embodiment. Specifically, the data center 100 includes the NW device 102, the computer 103, a distribution device 600, and an analysis device 601. Two or more of each device may exist. The computer 103 and the distribution device 600 are connected to the NW device 102.

The data center 100 of the second embodiment differs from that of the first embodiment in that it includes, as separate devices, the distribution device 600, which selects the analysis process, and the analysis device 601, which executes it.

The memory 111 of the distribution device 600 stores programs that implement the feature amount calculation unit 120 and the distribution unit 121. The storage unit 130 of the distribution device 600 stores the log information 140, the feature amount information group 141, and the distribution information group 142.

The memory 111 of the analysis device 601 stores a program that implements the analysis control unit 122. The storage unit 130 of the analysis device 601 stores the learning data group 143.

The processing executed by the feature amount calculation unit 120 and the analysis control unit 122 is the same as in the first embodiment. The contents of the log information 140, the feature amount information group 141, the distribution information group 142, and the learning data group 143 are also the same as in the first embodiment.

The processing executed by the distribution unit 121 differs in part from that of the first embodiment: specifically, the method of instructing execution of the analysis process is different.

For example, in step S410, the distribution unit 121 transmits to the analysis device 601 an instruction to execute the analysis process that includes, as arguments, the type of analysis process and the calculated feature amount.

Alternatively, an analysis device 601 may be provided for each analysis process. In this case, the distribution unit 121 holds information associating the identification information of each analysis device 601 with the type of analysis process. When instructing execution of an analysis process, the distribution unit 121 uses this information to identify the analysis device 601 that executes the selected analysis process, and transmits to that device an instruction to execute the analysis process that includes the calculated feature amount as an argument.
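For the variant with one analysis device 601 per analysis process, the table held by the distribution unit 121 and the resulting dispatch can be sketched as below. The device identifiers, the message fields, and the injected `send` transport are all hypothetical; the text does not specify a wire format or protocol. Including the analysis type in the message also covers the single-device case of step S410, where one analysis device 601 receives both arguments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Dispatcher:
    """Hypothetical dispatcher inside the distribution device 600."""
    devices: Dict[str, str]            # analysis type -> analysis device 601 identifier
    send: Callable[[str, dict], None]  # transport to a device (assumed, e.g. an RPC call)

    def instruct(self, analysis_type: str, features: Sequence[float]) -> str:
        """Send an execution instruction to the device for `analysis_type`; return its ID."""
        device_id = self.devices[analysis_type]
        message = {"analysis_type": analysis_type, "features": list(features)}
        self.send(device_id, message)
        return device_id


sent = []
d = Dispatcher({"DoS": "analyzer-dos", "Probe": "analyzer-probe"},
               send=lambda dev, msg: sent.append((dev, msg)))
d.instruct("Probe", (5.0, 5.0))
```

Because the mapping is plain data, adding or removing an analysis device 601 only changes the table, which matches the flexibility the second embodiment claims.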

The second embodiment achieves the same effects as the first embodiment. In addition, because the distribution device 600 and the analysis device 601 are separate devices, analysis devices 601 can be added and removed without restriction, so the system configuration can be changed flexibly. Furthermore, adding the distribution device 600 to an existing computer system yields a computer system having the effects of the present invention.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations. Part of the configuration of an embodiment can also be supplemented with, deleted, or replaced by another configuration.

Some or all of the above configurations, functions, processing units, processing means, and the like may be implemented in hardware, for example by designing them as integrated circuits. The present invention can also be implemented by software program code that realizes the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a CPU of the computer reads the program code stored on the storage medium. The program code read from the storage medium itself realizes the functions of the embodiments described above, and the program code and the storage medium storing it constitute the present invention. Examples of storage media for supplying such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, and ROMs.

The program code that realizes the functions described in the present embodiment can be implemented in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, shell, PHP, and Java (registered trademark).

Furthermore, the software program code that realizes the functions of the embodiments may be distributed over a network and stored in a storage means such as a hard disk or memory of a computer, or in a storage medium such as a CD-RW or CD-R, and the CPU of the computer may read and execute the program code stored in that storage means or storage medium.

In the embodiments described above, the control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of a product are necessarily shown. All the components may be connected to each other.

DESCRIPTION OF SYMBOLS
100 Data center
101 Terminal
102 NW device
103 Computer
104 Analysis device
105 External NW
110 CPU
111 Memory
112 Storage device
113 I/F
120 Feature amount calculation unit
121 Distribution unit
122 Analysis control unit
130 Storage unit
140 Log information
141 Feature amount information group
142 Distribution information group
143 Learning data group
200 DoS attack analysis unit
201 U2R attack analysis unit
202 R2L attack analysis unit
203 Probe attack analysis unit
210 Packet feature amount information
211 Flow feature amount information
212 Periodic feature amount information
220 Packet distribution information
221 Flow distribution information
222 Periodic distribution information
230 Learning data for DoS attack analysis
231 Learning data for U2R attack analysis
232 Learning data for R2L attack analysis
233 Learning data for Probe attack analysis
600 Distribution device
601 Analysis device

Claims

A computer system comprising a plurality of computers,
Each of the plurality of computers has a processor, a memory connected to the processor, and an interface connected to the processor and to another device,
The plurality of computers include:
A computer having an analysis control unit including a plurality of analysis units that perform analysis processing using data transmitted and received via a network; and
A computer having a feature amount calculation unit that uses the data to calculate the feature amounts used by each of the plurality of analysis units, and a distribution unit that selects the analysis unit to be used,
The distribution unit manages distribution information including a plurality of entries including the feature amount used by each of the plurality of analysis units, the type of the analysis process, and a result of the analysis process,
The result of the analysis process is a value indicating whether or not the analysis process is necessary,
The distribution unit:
Searches, with reference to the distribution information, for a similar entry including a feature amount similar to the feature amount calculated from the data; and
Selects the analysis process to be executed based on the result of the analysis process included in the similar entry,
The analysis unit that executes the analysis process selected by the distribution unit performs an analysis process that uses the feature amount calculated from the data, and transmits the result of the analysis process to the distribution unit, and
When the distribution unit receives the result of the analysis process, the distribution unit adds an entry including the feature amount calculated from the data, the type of the analysis process, and the result of the analysis process to the distribution information.

The computer system according to claim 1,
The feature amount calculation unit calculates a plurality of feature amounts of different types used by each of the plurality of analysis units,
The entry included in the distribution information includes the plurality of feature amounts,
The distribution unit:
Calculates, in a feature amount space in which each type of feature amount is one component, the distance between the plurality of feature amounts included in each of the plurality of entries and the plurality of feature amounts calculated from the data; and
Identifies, as the similar entry, the entry with the smallest distance among the entries whose distance is equal to or less than a predetermined threshold.

The computer system according to claim 2,
The distribution unit selects all of the analysis processes when there is no entry whose distance is equal to or less than the predetermined threshold.

The computer system according to claim 2,
The distribution unit:
Identifies, as a first entry, the entry with the smallest distance among entries whose analysis result is a value indicating that the analysis process is unnecessary;
Identifies, as a second entry, the entry with the smallest distance among entries whose analysis result is a value indicating that the analysis process is necessary;
Identifies the first entry as the similar entry when a first distance, between the plurality of feature amounts included in the first entry and the plurality of feature amounts calculated from the data, is equal to or less than a first threshold, and a second distance, between the plurality of feature amounts included in the second entry and the plurality of feature amounts calculated from the data, is greater than a second threshold;
Identifies the second entry as the similar entry when the first distance is greater than the first threshold and the second distance is equal to or less than the second threshold; and
Identifies the second entry as the similar entry when the first distance is equal to or less than the first threshold and the second distance is equal to or less than the second threshold.

The computer system according to claim 4, wherein
The first threshold is smaller than the second threshold.

A method for analyzing data in a computer system comprising a plurality of computers,
Each of the plurality of computers has a processor, a memory connected to the processor, and an interface connected to the processor and to another device,
The plurality of computers include:
A computer having an analysis control unit including a plurality of analysis units that perform analysis processing using data transmitted and received via a network; and
A computer having a feature amount calculation unit that calculates feature amounts, and a distribution unit that selects the analysis unit to be used,
The distribution unit manages distribution information including a plurality of entries including the feature amount used by each of the plurality of analysis units, the type of the analysis process, and a result of the analysis process,
The result of the analysis process is a value indicating whether or not the analysis process is necessary,
The data analysis method includes:
A first step in which the feature amount calculation unit calculates, using the data, the feature amounts used by each of the plurality of analysis units;
A second step in which the distribution unit searches, with reference to the distribution information, for a similar entry including a feature amount similar to the feature amount calculated from the data;
A third step in which the distribution unit selects the analysis process to be executed based on the result of the analysis process included in the similar entry;
A fourth step in which the analysis unit that executes the analysis process selected by the distribution unit executes an analysis process that uses the feature amount calculated from the data, and transmits the result of the analysis process to the distribution unit; and
A fifth step in which, when the distribution unit receives the result of the analysis process, the distribution unit adds an entry including the feature amount calculated from the data, the type of the analysis process, and the result of the analysis process to the distribution information.

The data analysis method according to claim 6, wherein
The entry included in the distribution information includes a plurality of feature amounts of different types,
The first step includes a step in which the feature amount calculation unit calculates the plurality of feature amounts used by each of the plurality of analysis units, and
The second step includes:
A sixth step in which the distribution unit calculates, in a feature amount space in which each type of feature amount is one component, the distance between the plurality of feature amounts included in each of the plurality of entries and the plurality of feature amounts calculated from the data; and
A seventh step in which the distribution unit identifies, as the similar entry, the entry with the smallest distance among the entries whose distance is equal to or less than a predetermined threshold.

The data analysis method according to claim 7, wherein
The second step includes a step in which the distribution unit selects all the analysis processes when there is no entry whose distance is equal to or less than the predetermined threshold.

The data analysis method according to claim 7, wherein the seventh step includes:
A step in which the distribution unit identifies a first entry having the minimum distance among entries whose result of the analysis process indicates that the analysis process is unnecessary;
A step in which the distribution unit identifies a second entry having the minimum distance among entries whose result of the analysis process indicates that the analysis process is necessary; and
A step in which the distribution unit identifies the similar entry as follows:
When a first distance between the plurality of feature amounts included in the first entry and the plurality of feature amounts calculated from the data is equal to or less than a first threshold, and a second distance between the plurality of feature amounts included in the second entry and the plurality of feature amounts calculated from the data is greater than a second threshold, the first entry is identified as the similar entry;
When the first distance is greater than the first threshold and the second distance is equal to or less than the second threshold, the second entry is identified as the similar entry; and
When the first distance is equal to or less than the first threshold and the second distance is equal to or less than the second threshold, the second entry is identified as the similar entry.
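The three cases above can be expressed as a small decision function. This sketch assumes a Euclidean distance and hypothetical names, and returns `None` when neither distance is within its threshold, a case the claim leaves open. Because the "necessary" entry wins whenever it is close enough, an analysis is never skipped merely because an "unnecessary" entry is also nearby.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple


class Entry(NamedTuple):
    features: Tuple[float, ...]  # feature amounts, one per type
    needed: bool                 # result of the analysis process


def nearest(entries: Sequence[Entry], features: Sequence[float],
            needed: bool) -> Tuple[Optional[Entry], float]:
    """Nearest entry with the given result, and its distance (inf if none)."""
    best, best_d = None, math.inf
    for e in entries:
        if e.needed == needed:
            d = math.dist(e.features, features)
            if d < best_d:
                best, best_d = e, d
    return best, best_d


def similar_entry(entries: Sequence[Entry], features: Sequence[float],
                  th1: float, th2: float) -> Optional[Entry]:
    first, d1 = nearest(entries, features, needed=False)   # "unnecessary"
    second, d2 = nearest(entries, features, needed=True)   # "necessary"
    if d2 <= th2:
        # Covers "only the second is close" and "both are close".
        return second
    if d1 <= th1:
        return first  # only the "unnecessary" entry is close enough
    return None       # not covered by the claim
```

The dependent claim that follows additionally requires the first threshold to be smaller than the second, which makes a match to an "unnecessary" entry the stricter of the two.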

The data analysis method according to claim 9, wherein the first threshold value is smaller than the second threshold value.

A computer comprising a processor, a memory connected to the processor, and an interface connected to the processor for connecting to another device, wherein
The computer includes:
An analysis control unit including a plurality of analysis units that perform analysis processing using data transmitted and received via a network;
A feature amount calculation unit that calculates, using the data, the feature amount used by the analysis units; and
A distribution unit that selects the analysis unit to be used;
The distribution unit holds distribution information including a plurality of entries, each including the feature amount used by each of the plurality of analysis units, the type of the analysis process, and a result of the analysis process;
The result of the analysis process is a value indicating whether or not the analysis process is necessary;
The distribution unit:
Searches the distribution information for a similar entry including a feature amount similar to the feature amount calculated from the data; and
Selects the analysis process to be executed based on the result of the analysis process included in the similar entry;
The analysis unit that executes the analysis process selected by the distribution unit executes the analysis process using the feature amount calculated from the data, and outputs the result of the analysis process to the distribution unit; and
When the distribution unit receives the result of the analysis process, the distribution unit adds an entry including the feature amount calculated from the data, the type of the analysis process, and the result of the analysis process to the distribution information.
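Taken together, the computer of this claim forms a feedback loop: select analyses from past results, run them, and append the new results to the distribution information. A compact sketch under the same assumptions (Euclidean distance, per-type search, the analysis result modelled as a boolean, hypothetical class and function names):

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


@dataclass
class Entry:
    features: Features   # feature amounts, one per type
    analysis_type: str   # kind of analysis process
    needed: bool         # result: was the analysis necessary?


@dataclass
class DistributionUnit:
    """Hypothetical distribution unit holding the distribution information."""
    threshold: float
    entries: List[Entry] = field(default_factory=list)

    def select(self, features: Features, kinds: Sequence[str]) -> List[str]:
        selected = []
        for kind in kinds:
            near = [e for e in self.entries
                    if e.analysis_type == kind
                    and math.dist(e.features, features) <= self.threshold]
            if not near:
                selected.append(kind)  # no similar entry: run the analysis
                continue
            similar = min(near, key=lambda e: math.dist(e.features, features))
            if similar.needed:
                selected.append(kind)
        return selected

    def record(self, features: Features, kind: str, needed: bool) -> None:
        # Add the new result so later data with similar features can reuse it.
        self.entries.append(Entry(tuple(features), kind, needed))


def process(data, unit: DistributionUnit,
            calc_features: Callable[[object], Features],
            analyzers: Dict[str, Callable[[Features], bool]]) -> List[str]:
    """Compute features, run only the selected analyses, record results."""
    features = calc_features(data)
    ran = unit.select(features, list(analyzers))
    for kind in ran:
        unit.record(features, kind, analyzers[kind](features))
    return ran
```

On the first pass every analysis runs because the distribution information is empty; once an analysis has been recorded as unnecessary for some feature amounts, later data with nearby feature amounts skips it, which is where the saving in computer resources would come from.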

The computer according to claim 11, wherein
The feature amount calculation unit calculates a plurality of feature amounts of different types used by each of the plurality of analysis units,
The entry included in the distribution information includes the plurality of feature amounts,
The distribution unit:
Calculates, in a feature amount space in which each type of feature amount forms one component, a distance between the plurality of feature amounts included in each of the plurality of entries and the plurality of feature amounts calculated from the data; and
Identifies, as the similar entry, the entry having the smallest distance among entries whose distance is equal to or less than a predetermined threshold.

The computer according to claim 12, wherein the distribution unit selects all of the analysis processes when there is no entry whose distance is equal to or less than the predetermined threshold.

The computer according to claim 12, wherein the distribution unit:
Identifies a first entry having the minimum distance among entries whose result of the analysis process indicates that the analysis process is unnecessary;
Identifies a second entry having the minimum distance among entries whose result of the analysis process indicates that the analysis process is necessary;
Identifies the first entry as the similar entry when a first distance between the plurality of feature amounts included in the first entry and the plurality of feature amounts calculated from the data is equal to or less than a first threshold, and a second distance between the plurality of feature amounts included in the second entry and the plurality of feature amounts calculated from the data is greater than a second threshold;
Identifies the second entry as the similar entry when the first distance is greater than the first threshold and the second distance is equal to or less than the second threshold; and
Identifies the second entry as the similar entry when the first distance is equal to or less than the first threshold and the second distance is equal to or less than the second threshold.

The computer according to claim 14, wherein the first threshold value is smaller than the second threshold value.