WO2015114804A1 - Unauthorized-access detection method and detection system

Info

Publication number: WO2015114804A1
Application number: PCT/JP2014/052288
Authority: WO
Inventors: 進芹田; 雅之吉野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2014-01-31
Filing date: 2014-01-31
Publication date: 2015-08-06
Anticipated expiration: 2016-07-31
Also published as: JP6039826B2; JPWO2015114804A1

Abstract

Existing techniques for detecting unauthorized network access by malware-infected computers or the like cannot generate effective URL regular expressions from small samples of malicious URLs. On the basis of feature quantities for past network accesses and malicious URLs obtained from malware analysis results, this invention expands the sample of malicious URLs by searching an access log for URLs similar to said malicious URLs and generates a URL regular expression. Said URL regular expression is added to detection rules to detect unauthorized access.

Description

Unauthorized access detection method and system

The present invention relates to technology for detecting unauthorized network access performed by malware-infected computers and the like.

Malware that has infected a computer inside an organization communicates with an external server prepared by the attacker, downloading new malware and uploading information obtained from the computer. Communication involved in these activities is generally called unauthorized access.

A known technique for detecting unauthorized access uses a URL blacklist. A URL blacklist is a list of URLs that known malware has used for access (called malicious URLs). Registering the URL blacklist in a security device such as a firewall, IDS/IPS, or proxy server makes it possible to detect access to an external server identified by a URL on the blacklist. When unauthorized access is detected, interrupting the access prevents the damage from spreading. This technique is generally called web filtering.
However, attackers deliberately vary the URLs with which malware communicates in order to evade blacklist detection. One known technique, for example, embeds a random number in part of the URL. Attackers also sometimes reuse existing malware in new attacks, so the domain in the URL may differ while the path portion is identical or similar. Such varied URLs cannot be detected by an exact-match search against the blacklist. A known way to handle varied URLs is to express URLs as regular expressions. A regular expression is a way of representing a set of character strings with a single character string. Patent Document 1 discloses a method that generates a URL regular expression from multiple URL samples that are candidates for regular expression generation, based on character string frequency information.
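The difference between exact-match blacklisting and regular-expression matching can be illustrated with a minimal sketch. The URLs and the hand-written pattern below are hypothetical examples, not taken from the patent, and anchoring the pattern to the whole URL is an assumption.

```python
import re

# Hypothetical blacklist holding one known malicious URL.
blacklist = {"http://evil.example/gate.php?id=1234"}

# Hypothetical variant: different domain and a different random number.
variant = "http://bad.example/gate.php?id=9876"

# Exact-match lookup misses the variant.
exact_hit = variant in blacklist

# A hand-written pattern: any host, fixed path, numeric id (illustrative only).
pattern = re.compile(r"http://[^/]+/gate\.php\?id=\d+")
regex_hit = pattern.fullmatch(variant) is not None
```

Here `exact_hit` is false while `regex_hit` is true, which is the gap that URL regular expressions are meant to close.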

Patent Document 1: US Patent Application Publication No. 2009/0265786

The method of Patent Document 1 can detect varied malicious URLs. However, because it generates regular expressions from the frequency information of character strings appearing in the samples, it requires at least a certain number of URL samples. When the number of samples is small, it cannot generate a URL regular expression that is effective for detection.

In view of the above problem, an object of the present invention is to generate a URL regular expression effective for detection, and thereby detect unauthorized access, even when only a few URL samples are available as the basis for the regular expression.

To solve the above problem, the present invention proposes an unauthorized access detection method that generates a URL regular expression for detecting unauthorized access from a trace of malware access behavior obtained from malware analysis results, and updates a detection rule. The method comprises: a step of extracting malware features from trace analysis of the access behavior of new malware that was collected by an analyst or that caused an infection over the network; a step of searching an access feature storage unit, in which access features extracted as needed from past network access logs are recorded, for similar URLs within a predetermined threshold distance, using the malware features as a query; a step of generating a URL regular expression from the connection destination URLs of the malware features and the retrieved similar URLs; and a step of applying the URL regular expression to pattern matching against the connection destination URLs contained in the access log, calculating the match rate, and setting the URL regular expression as a new detection rule when the match rate is at or below a recommended value.
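The final step, checking a candidate regular expression against the access log, might look like the following minimal sketch. The function names, the use of whole-URL matching, and the default recommended value of 0.01 are assumptions for illustration; the text only states that the rule is adopted when the match rate is at or below a recommended value.

```python
import re

def match_rate(pattern: str, logged_urls: list) -> float:
    """Fraction of logged destination URLs matched by the candidate pattern."""
    if not logged_urls:
        return 0.0
    compiled = re.compile(pattern)
    hits = sum(1 for url in logged_urls if compiled.fullmatch(url))
    return hits / len(logged_urls)

def accept_rule(pattern: str, logged_urls: list, recommended: float = 0.01) -> bool:
    """Adopt the pattern as a detection rule only if its match rate is low enough."""
    return match_rate(pattern, logged_urls) <= recommended

# Hypothetical access log: 99 ordinary accesses and one suspicious one.
logged = ["http://a.example/x"] * 99 + ["http://b.example/gate.php?id=1"]
candidate = r"http://[^/]+/gate\.php\?id=\d+"
```

With this sample log, the narrow candidate passes the check, whereas a pattern such as `.*` that matches every logged URL is rejected.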

Further, in the unauthorized access detection method of the present invention, the step of searching the access feature storage unit for similar URLs within a predetermined threshold distance, using the malware features as a query, comprises: a first step of determining an access feature to be a similar access feature when the value of a distance function, defined between the features of the malware features other than the connection destination URL and the corresponding features of the access feature other than the connection destination URL, is smaller than a predetermined threshold; and a second step of determining a URL to be a similar URL, and retrieving it, when the value of a distance function measuring the difference between character strings, defined between a connection destination URL of the malware features and a connection destination URL included in the similar access feature, is smaller than a predetermined threshold.
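The two-step search can be sketched as follows. The text does not fix the distance functions, so this sketch assumes Euclidean distance over three numeric features for the first step and Levenshtein edit distance for the second; the dictionary keys, thresholds, and sample data are likewise hypothetical. The User-Agent is omitted because it is not numeric.

```python
import math

NUMERIC_KEYS = ("avg_packet_size", "access_interval", "post_count")

def feature_distance(a: dict, b: dict) -> float:
    """Euclidean distance over the non-URL numeric features (assumed metric)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in NUMERIC_KEYS))

def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between two strings (assumed string metric)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def search_similar_urls(malware, sessions, feature_threshold, url_threshold):
    similar = set()
    for session in sessions:
        # Step 1: keep only sessions whose non-URL features are close.
        if feature_distance(malware, session) >= feature_threshold:
            continue
        # Step 2: within those sessions, keep URLs close to a malicious URL.
        for m_url in malware["urls"]:
            for s_url in session["urls"]:
                if edit_distance(m_url, s_url) < url_threshold:
                    similar.add(s_url)
    return similar

malware = {"avg_packet_size": 500, "access_interval": 60, "post_count": 3,
           "urls": ["http://evil.example/gate.php?id=1234"]}
sessions = [
    {"avg_packet_size": 510, "access_interval": 62, "post_count": 3,
     "urls": ["http://evil.example/gate.php?id=9876", "http://news.example/index.html"]},
    {"avg_packet_size": 4000, "access_interval": 5, "post_count": 0,
     "urls": ["http://evil.example/gate.php?id=1235"]},
]
found = search_similar_urls(malware, sessions, feature_threshold=20, url_threshold=5)
```

The second session is discarded in step 1 even though its URL is close, which shows how the behavioral features gate the URL comparison. In practice the numeric features would probably need normalization before a single threshold is meaningful.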

Further, to solve the above problem, the present invention configures an unauthorized access detection system, built on a plurality of servers connected to a network that is connected to the Internet, to comprise: a malware analysis function that executes new malware, either collected or found infecting a client, in a virtual test environment and generates a trace of the malware's access behavior; a malware feature extraction function that extracts malware features from the trace of the malware's access behavior; an access feature extraction function that stores and manages past access logs of clients, extracts access features from the access logs as appropriate, and stores them in an access feature storage unit; a similar URL search function that searches the access feature storage unit for similar URLs within a predetermined threshold distance, using the malware features as a query; a regular expression generation function that generates a URL regular expression from the connection destination URLs of the malware features and the retrieved similar URLs, applies the URL regular expression to pattern matching against the connection destination URLs contained in the access log, calculates the match rate, and adds the URL regular expression to the detection rules as a new rule when the match rate is at or below a recommended value; and a malicious URL detection function that applies the detection rules, updated with the added URL regular expression, to a URL being accessed and determines whether the access is unauthorized.

According to the present invention, an effective URL regular expression can be generated, and unauthorized access detected, even when only a small number of malicious URL samples are available.

FIG. 1 is a diagram showing an example of the system configuration of the unauthorized access detection system of the present embodiment.
FIG. 2 is a diagram explaining the relationships among the functions of the malware analysis server, the detection rule setting server, the log management server, and the proxy server.
FIG. 3 is a diagram showing an example of malware features.
FIG. 4 is a diagram showing an example of access features.
FIG. 5 is a diagram showing an example of detection rule management information.
FIG. 6 is a diagram showing an example of the processing flow of the malware feature extraction function.
FIG. 7 is a diagram showing an example of the processing flow of the URL regular expression generation function.
FIG. 8 is a diagram showing an example of the processing flow of the similar URL search function.

Hereinafter, a mode for carrying out the present invention (hereinafter, the "embodiment") will be described with reference to the drawings as appropriate.

FIG. 1 shows an example of the system configuration of the unauthorized access detection system 100 of the present embodiment. As shown in FIG. 1, the system includes a proxy server 121, a log management server 122, a malware analysis server 123, and a detection rule setting server 124, which are connected to one another via a network 101.

The unauthorized access detection system 100 is connected, together with a plurality of clients 125, to a local area network 120 installed within an organization. The local area network 120 is connected to the Internet 110 via a firewall 130 and the network 101.

The attacker server 111 on the Internet 110 is a server used by a party that attacks organizations, such as the one described above, that are connected to the network. Once the attacker succeeds in getting malware into the organization, the attacker uses the attacker server 111 to communicate with the malware that has infected a client 125 of the organization. Through this communication, the attacker sends new malware and receives files taken from inside the organization. A plurality of attacker servers 111 are installed on the Internet 110.

The firewall 130 sits between the local area network 120 and the Internet 110 and has a function of discarding (blocking) or permitting (passing) packets that meet specific conditions among the packets traveling between the two networks. In particular, by discarding packets that do not pass through the proxy server 121, it ensures that all access from the local area network 120 to the Internet 110 goes through the proxy server 121.

The proxy server 121 relays packets exchanged between the clients 125 and servers on the Internet. Registering malicious URLs in the proxy server 121 makes it possible to detect unauthorized access. When unauthorized access is detected, aborting the access cuts off communication with the attacker. The proxy server 121 also has a function of recording the full history of accesses made by the clients 125. This record is called the access log. Details of the processing of the proxy server 121 are described with reference to FIG. 2.

The log management server 122 has functions for storing the access log output by the proxy server 121 and for searching for URLs used to generate URL regular expressions. Details of the processing of the log management server 122 are described with reference to FIG. 2.

The malware analysis server 123 has a function of executing malware in a virtual environment or the like and recording its behavior, such as its network access. In FIG. 1 the malware analysis server 123 is connected to the local area network, but it may instead be connected to the Internet 110. Details of the processing of the malware analysis server 123 are described with reference to FIG. 2.

The detection rule setting server 124 has a function of generating a URL regular expression using features extracted from the data in which the malware analysis server 123 recorded the malware's network access behavior, together with features extracted from the access log held by the log management server 122. It also has a function of setting the generated URL regular expression in the proxy server 121 as a detection rule.

The client 125 has a function of accessing the Internet 110 via the network 101. A client 125 may become infected with malware, for example when a user runs an executable file attached to a forged e-mail. A client 125 infected with malware communicates with the attacker without the legitimate user noticing.

The hardware configuration of each of these network-connected devices includes at least a CPU (Central Processing Unit); an auxiliary storage device such as a hard disk drive; main storage devices such as ROM (Read Only Memory) and RAM (Random Access Memory); an I/O (Input/Output) interface connected to input devices such as a keyboard and mouse and to output devices such as a display; and a network interface for connecting to the local area network 120 and the Internet 110.

With reference to FIG. 2, an overview is given of the malware analysis server 123, the detection rule setting server 124, the log management server 122, and the proxy server 121, and of the processing these servers perform in cooperation.

The malware analysis server 123 includes a malware analysis function 231. The malware analysis function 231 executes malware in a virtual test environment on the malware analysis server 123 and records the malware's behavior, such as file creation, registry changes, and access over the network. The present invention uses the record of access over the network in particular. The access record lists the accesses made by the malware in chronological order. The record of each access includes the access time and the transmitted packet. Analyzing the packet yields information such as the connection destination URL, destination IP address, destination port, source port, protocol, and User-Agent. The malware analysis function 231 can be realized with a technique generally called dynamic analysis. In the present embodiment, an existing dynamic analysis technique is implemented on the malware analysis server 123, the malware is executed under a debugger or emulator, and a trace of the malware's control flow is recorded.

There are broadly two ways to supply malware to the malware analysis server 123. One is to manually copy malware found on a computer other than the malware analysis server 123 (a client 125) to the malware analysis server 123 as a malware sample. The other is to place the malware analysis server 123 where attackers can easily target it (for example, somewhere easily reachable from outside, such as on the network 101 outside the firewall 130) and let it become infected with malware. This approach is generally called a honeypot.

The detection rule setting server 124 includes a malware feature extraction function 241, a regular expression generation function 242, a detection rule management information storage unit 243, a detection rule setting function 244, and the like.
The malware feature extraction function 241 extracts the information used to generate URL regular expressions from the malware analysis result output by the malware analysis function 231 (the chronological record of accesses made by the malware). Details of the malware features 300 are described with reference to FIG. 3, and details of the processing with reference to FIG. 6.

Based on the information output by the malware feature extraction function 241, the regular expression generation function 242 queries the log management server 122 and obtains the set of similar URLs from which to generate a URL regular expression. It generates regular expression candidates from the obtained set of similar URLs and uses the access log 221 to produce the URL regular expression 502. It stores the generated URL regular expression 502 in the detection rule management information storage unit 243. Details of the processing of the regular expression generation function 242 are described with reference to FIG. 7.

The detection rule management information storage unit 243 holds the URL regular expressions generated by the regular expression generation function 242, information on the devices to which each URL regular expression is applied, and the like. Details are described with reference to FIG. 5.
The detection rule setting function 244 sets the URL regular expressions generated by the regular expression generation function 242 in the proxy server 121.

The log management server 122 includes an access log storage unit 221, an access feature extraction function 222, an access feature storage unit 223, a similar URL search function 224, and the like.
The access log storage unit 221 holds the access log 213 output by the proxy server 121, recorded over a period of, for example, one year or more. The access log storage unit 221 includes the date and time at which a client 125 made an access, the IP address of that client 125, the connection destination URL, the User-Agent used for the access, the referrer, the size of the transmitted packet, the size of the received packet, and the like. In general, a plurality of proxy servers 121 are installed for reasons such as load balancing, so the access log they output is split across multiple files. The log management server 122 merges these split logs and stores them.
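Merging the logs from several proxy servers could be done as in the sketch below. It assumes that each proxy's log is already ordered by time and that the merged log should be ordered by time as well; the text only says the split logs are merged. The field names and sample entries are hypothetical.

```python
import heapq

def merge_proxy_logs(*per_proxy_logs):
    """Merge per-proxy logs, each already sorted by time, into one sorted list."""
    return list(heapq.merge(*per_proxy_logs, key=lambda event: event["time"]))

# Hypothetical logs from two proxy servers (time in seconds).
proxy_a = [{"time": 10, "url": "http://a.example/"},
           {"time": 30, "url": "http://c.example/"}]
proxy_b = [{"time": 20, "url": "http://b.example/"}]
merged = merge_proxy_logs(proxy_a, proxy_b)
```

`heapq.merge` reads its inputs lazily, so the same approach works on iterators over large log files without loading them fully into memory.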

The access feature extraction function 222 analyzes the access log 221 and calculates the access features 223 for each series of accesses. Because calculating access features from the enormous volume of access log data on demand would take time, the access features 223 are calculated as appropriate when the access log 221 is recorded.
The access feature storage unit 223 holds the information needed to search for similar URLs. Details are described with reference to FIG. 4.
The similar URL search function 224 searches for the set of similar URLs that the regular expression generation function 242 needs to generate a regular expression. Details of the processing are described with reference to FIG. 7.

The proxy server 121 includes a malicious URL detection function 211 and a detection rule storage unit 212.
The malicious URL detection function 211 checks whether the URL that a client 125 is about to access matches a preset URL regular expression. If it matches, the function judges the access to be unauthorized and performs control such as aborting the access.
The detection rules 212 include rules by which the proxy server 121 decides whether to permit or block access by a client 125. URL regular expressions are one kind of detection rule, but the detection rules 212 also include rules based on packet size, protocol type, and the like. Composite rules can be set; for example, access is aborted when the URL of the Internet host that a client 125 is trying to reach matches a URL regular expression and the packet size is 1 MB or more.
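A composite rule like the 1 MB example might be represented as in this minimal sketch. The class name, the whole-URL matching, and the interpretation of 1 MB as 1024 × 1024 bytes are assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class CompositeRule:
    url_pattern: str        # URL regular expression
    min_packet_bytes: int   # packet-size condition

    def blocks(self, url: str, packet_bytes: int) -> bool:
        """Block only when the URL matches AND the packet is large enough."""
        url_matches = re.fullmatch(self.url_pattern, url) is not None
        return url_matches and packet_bytes >= self.min_packet_bytes

# Hypothetical rule: suspicious path plus a packet of 1 MB or more.
rule = CompositeRule(r"http://[^/]+/gate\.php\?id=\d+", 1024 * 1024)
```

For example, `rule.blocks("http://bad.example/gate.php?id=42", 2 * 1024 * 1024)` returns true, while the same URL with a 512-byte packet is allowed through.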

With reference to FIG. 3, an example is described of the malware features 300 that the malware feature extraction function 241 of the detection rule setting server 124 extracts from the malware analysis result output by the malware analysis function 231 (the chronological record of accesses made by the malware). The malware features 300 include a malware ID 301, a connection destination URL list 302, an average packet size 303, an access time interval 304, a User-Agent 305, a POST count 306, and the like.

The malware ID 301 is an identifier that uniquely identifies the malware. For example, a hash value such as MD5 is used as the malware ID 301. The connection destination URL list 302 is the list of URLs accessed by the malware that the malware analysis function 231 analyzed. These are judged to be malicious URLs. The average packet size 303 is the average size of the packets that the malware transmitted during its series of accesses. The access time interval 304 is a quantity representing the temporal pattern of the malware's series of accesses. For example, the average interval between accesses can be used. As a quantity representing a more sophisticated temporal pattern, the periodicity of access times could also be extracted. The User-Agent 305 is an identifier that identifies the program that made the access. The POST count 306 is the number of POST requests made during the malware's series of accesses. These malware features characterize the malware's behavior, and an access whose features are similar to the malware features is likely to be an access by malware.
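As a rough sketch, the features described above could be computed from a chronological access record as follows. The record field names (`time`, `url`, `size`, `user_agent`, `method`), the use of the mean interval for the access time interval, and taking the User-Agent from the first record are assumptions for illustration; at least one record is required.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class MalwareFeatures:
    malware_id: str          # e.g. an MD5 hash of the sample
    urls: tuple              # connection destination URL list
    avg_packet_size: float
    access_interval: float   # mean gap between accesses, in seconds
    user_agent: str
    post_count: int

def extract_features(malware_id, records):
    times = sorted(r["time"] for r in records)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return MalwareFeatures(
        malware_id=malware_id,
        urls=tuple(r["url"] for r in records),
        avg_packet_size=mean(r["size"] for r in records),
        access_interval=mean(gaps) if gaps else 0.0,
        user_agent=records[0]["user_agent"],
        post_count=sum(1 for r in records if r["method"] == "POST"),
    )

# Hypothetical access record of one analyzed sample.
records = [
    {"time": 0, "url": "http://evil.example/a", "size": 100, "user_agent": "bot/1.0", "method": "GET"},
    {"time": 60, "url": "http://evil.example/b", "size": 200, "user_agent": "bot/1.0", "method": "POST"},
    {"time": 180, "url": "http://evil.example/c", "size": 300, "user_agent": "bot/1.0", "method": "POST"},
]
features = extract_features("sample-hash", records)
```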

With reference to FIG. 4, an example is described of the access features 223 that the access feature extraction function 222 of the log management server 122 extracts from the access log 221. The access features 223 include a session ID 401, an event ID list 402, a connection destination URL list 403, an average packet size 404, an access time interval 405, a User-Agent 406, a POST count 407, and the like.

The session ID 401 is an identifier that identifies a connected series of accesses made by a client 125. The event ID list 402 holds the identifiers of the events that belong to the session identified by the session ID. Here, an event is a single access recorded in the access log storage unit 221. The connection destination URL list 403 is a list of the URLs that the client 125 accessed during the session. The average packet size 404 is the average size of the packets the client 125 transmitted during the session. The access time interval 405 is a quantity representing the temporal pattern of the series of accesses in the session; the same kind of quantity as the access time interval 304 of the malware features 300 can be used. The User-Agent 406 is an identifier that identifies the program that made the access. The POST count 407 is the number of POST accesses the client 125 transmitted during the session.

The following describes how sessions are determined. First, the events in the access log storage unit 221 are classified by client 125. A client 125 is identified by the source IP address included in the event or by user authentication information. Next, the events classified by client 125 are divided into sessions. When the time interval between accesses exceeds a predetermined threshold (for example, 30 minutes), the later access is judged to belong to a different session. In addition, events with different User-Agents are judged to belong to different sessions. In this way, the access log 221 is decomposed into a plurality of sessions.
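The session rules above can be sketched as follows. This is one possible reading: events are first grouped by client and User-Agent, and each group is then split wherever the gap between consecutive accesses exceeds the threshold. The field names and sample events are hypothetical, and the client is identified by source IP only.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # the 30-minute example threshold

def split_sessions(events, gap=SESSION_GAP_SECONDS):
    # Group by client and User-Agent: a different value of either
    # means a different session.
    groups = defaultdict(list)
    for event in events:
        groups[(event["client_ip"], event["user_agent"])].append(event)

    sessions = []
    for group in groups.values():
        group.sort(key=lambda e: e["time"])
        current = [group[0]]
        for prev, event in zip(group, group[1:]):
            # A gap longer than the threshold starts a new session.
            if event["time"] - prev["time"] > gap:
                sessions.append(current)
                current = []
            current.append(event)
        sessions.append(current)
    return sessions

# Hypothetical events (time in seconds).
events = [
    {"client_ip": "10.0.0.1", "user_agent": "x", "time": 0},
    {"client_ip": "10.0.0.1", "user_agent": "x", "time": 600},
    {"client_ip": "10.0.0.1", "user_agent": "x", "time": 2401},
    {"client_ip": "10.0.0.1", "user_agent": "y", "time": 100},
    {"client_ip": "10.0.0.2", "user_agent": "x", "time": 50},
]
sessions = split_sessions(events)
```

In this sample, the first client's accesses with User-Agent `x` form two sessions because the third access comes 1801 seconds after the second.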

With reference to FIG. 5, an example is described of the detection rule management information 243 created by the regular expression generation function 242. The detection rule management information storage unit 243 includes a rule ID 501, a URL regular expression 502, a target device ID 503, a countermeasure 504, a setting date 505, and the like.
The rule ID 501 is an identifier that uniquely identifies a detection rule 243. The URL regular expression 502 expresses, as a regular expression, the connection destination URLs for which the proxy server 121 judges a client 125's access to be unauthorized. The target device ID 503 is information identifying the device to which the detection rule is applied. For example, the IP address of the proxy server 121 can be used. The countermeasure 504 is the control that the malicious URL detection function 211 of the target device (proxy server) performs when a client 125's connection destination URL matches the URL regular expression. For example, blocking the communication or notifying an administrator can be used as the countermeasure. The setting date 505 is the date and time at which the detection rule was set. Using the setting date enables operations such as deleting rules once a certain period has elapsed since they were set.
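The rule record and the age-based cleanup mentioned above might be modeled as in this sketch. The field names, the countermeasure strings, and the 90-day retention period are hypothetical; the text does not specify a period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DetectionRule:
    rule_id: str
    url_regex: str
    target_device_id: str   # e.g. IP address of the proxy server
    countermeasure: str     # e.g. "block" or "notify"
    set_at: datetime        # setting date

def drop_expired(rules, now, max_age=timedelta(days=90)):
    """Keep only rules set within max_age of the given time."""
    return [rule for rule in rules if now - rule.set_at <= max_age]

rules = [
    DetectionRule("r1", r"http://[^/]+/gate\.php\?id=\d+", "192.0.2.10", "block",
                  datetime(2014, 12, 1)),
    DetectionRule("r2", r"http://[^/]+/old\.php", "192.0.2.10", "notify",
                  datetime(2014, 6, 1)),
]
active = drop_expired(rules, now=datetime(2015, 1, 1))
```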

An example of the processing flow of the malware feature quantity extraction function 241 of the detection rule setting server 124 is described with reference to the flowchart of FIG. 6.
In step S601, the malware feature quantity extraction function 241 reads the malware analysis results (time-series records of the accesses made by the malware) output by the malware analysis function 231. When multiple malware samples have been analyzed, there are multiple malware analysis results.

In step S602, the malware feature quantity extraction function 241 extracts the malware feature quantity 300 shown in FIG. 3 from the malware analysis results output by the malware analysis function 231. As described above, a malware analysis result includes OS API calls and network access logs. The malware feature quantity 300 is used to search the access log output by the proxy server 121. Therefore, information that does not appear in the access log, such as OS API calls, is excluded, and only the logs related to network access are selected. The selected logs are then analyzed to extract the items shown in FIG. 3.

In step S603, the malware feature quantity extraction function 241 removes duplicate malware feature quantities 300. When the malware analysis function 231 has analyzed multiple malware samples, multiple malware feature quantities 300 are extracted in step S602. Some of them may differ in the malware hash value (malware ID 301) while being identical in every other feature. In that case, only one of those malware feature quantities 300 is kept.
In step S604, the malware feature quantity extraction function 241 sends the malware feature quantities 300 deduplicated in step S603 to the regular expression generation function 242.
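The duplicate removal of step S603 can be sketched as below. The dictionary keys (`malware_id`, `url`, `pk`, `atime`, `ua`, `pfreq`) are hypothetical names for the items of the malware feature quantity, and keeping the first record of each group is one possible choice, since the text only says that any one of the duplicates is selected.

```python
def deduplicate_features(features):
    """Keep one record per distinct combination of non-ID features."""
    seen = set()
    unique = []
    for feature in features:
        # Build a hashable key from every item except the malware ID (hash value).
        key = tuple(sorted((k, v) for k, v in feature.items() if k != "malware_id"))
        if key not in seen:
            seen.add(key)
            unique.append(feature)
    return unique


# Illustrative data only: the first two records differ only in malware_id.
features = [
    {"malware_id": "hash-aaa", "url": "http://evil.example/a.exe",
     "pk": 512, "atime": 60, "ua": "Mozilla/4.0", "pfreq": 3},
    {"malware_id": "hash-bbb", "url": "http://evil.example/a.exe",
     "pk": 512, "atime": 60, "ua": "Mozilla/4.0", "pfreq": 3},
    {"malware_id": "hash-ccc", "url": "http://other.example/b.bin",
     "pk": 900, "atime": 5, "ua": "Mozilla/4.0", "pfreq": 0},
]
unique_features = deduplicate_features(features)
```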

An example of the processing flow of the regular expression generation function 242 is described with reference to the flowchart of FIG. 7.
In step S701, the regular expression generation function 242 acquires the malware feature quantity 300 output by the malware feature quantity extraction function 241. When multiple malware samples have been analyzed, multiple malware feature quantities 300 are acquired.

In step S702, the regular expression generation function 242 sends the malware feature quantity 300 acquired in step S701 to the similar URL search function 224 of the log management server 122. On receiving it, the similar URL search function 224 uses the malware feature quantity 300 as a query to search the access feature quantity storage unit 223 for similar URLs, and sends the similar URLs it finds to the regular expression generation function 242. Details of the similar URL search function 224 are described with reference to FIG. 8.
In step S703, the regular expression generation function 242 receives the similar URLs sent by the similar URL search function 224.

In step S704, the regular expression generation function 242 generates a URL regular expression from the set of similar URLs. The method described in Patent Document 1 is adopted for generating a regular expression from multiple character strings.
With this method, a regular expression can be generated from a set of similar URLs. In general, however, the regular expression that represents a given set of character strings is not uniquely determined, and this holds for the above method as well. For example, compare the two regular expressions http://www.sample.com/path[a-zA-Z]{5,10}.exe and http://www.sample.com/[a-zA-Z]*.exe: every URL expressed by the former can also be expressed by the latter. In that sense, the latter is a coarser-grained regular expression. If a coarse-grained regular expression is stored in the detection rule storage unit 212 of the proxy server 121 and used to check the URLs that clients 125 try to access, the probability of misidentifying a benign URL as malicious increases. A finer-grained regular expression is therefore preferable.
In step S705, the regular expression generation function 242 checks the granularity of the regular expression using the access log 221. First, the URL regular expression generated in step S704 is pattern-matched against the URLs in the access log 221 other than the similar URLs received in step S703, and the match rate is calculated. A recommended value for the match rate (for example, 3%) is determined in advance. If the calculated match rate exceeds the recommended value, the regular expression is too coarse, so the regular expression generated in step S704 is discarded.
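The granularity check of step S705 can be illustrated with the two example expressions from the text. This is a sketch under assumptions: the dots are escaped so that they match literally, whole-URL matching with `re.fullmatch` is assumed, and the log contents are invented for the example. The 3% recommended value is the example figure given in the text.

```python
import re

RECOMMENDED_MATCH_RATE = 0.03  # example recommended value from the text (3%)


def match_rate(url_regex, log_urls, similar_urls):
    """Fraction of log URLs, excluding the similar URLs, matched by the regex."""
    pattern = re.compile(url_regex)
    excluded = set(similar_urls)
    candidates = [url for url in log_urls if url not in excluded]
    if not candidates:
        return 0.0
    hits = sum(1 for url in candidates if pattern.fullmatch(url))
    return hits / len(candidates)


def is_acceptable(url_regex, log_urls, similar_urls, limit=RECOMMENDED_MATCH_RATE):
    """A regex is kept only if its match rate does not exceed the limit."""
    return match_rate(url_regex, log_urls, similar_urls) <= limit


# Illustrative data only.
similar_urls = [
    "http://www.sample.com/pathabcdef.exe",
    "http://www.sample.com/pathQwErTy.exe",
]
log_urls = similar_urls + [
    "http://www.sample.com/setup.exe",
    "http://www.sample.com/update.exe",
    "http://www.sample.com/tool.exe",
    "http://www.sample.com/viewer.exe",
] + [f"http://intranet.example/page{i}" for i in range(94)]

fine = r"http://www\.sample\.com/path[a-zA-Z]{5,10}\.exe"
coarse = r"http://www\.sample\.com/[a-zA-Z]*\.exe"

fine_ok = is_acceptable(fine, log_urls, similar_urls)
coarse_ok = is_acceptable(coarse, log_urls, similar_urls)
```

With this sample log, the fine-grained expression matches none of the 98 remaining URLs and is kept, while the coarse expression matches 4 of 98 (about 4.1%) and would be discarded.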

In step S706, of the URL regular expressions whose granularity was checked in step S705, the regular expression generation function 242 stores in the detection rule management information storage unit 243 only those whose match rate against the URLs of the access log 221 is at or below the recommended value. At this point, the target device ID, countermeasure, setting date, and so on are added to the URL regular expression. The target device ID and countermeasure may be registered as predetermined values, or may be registered manually by an administrator who has reviewed the generated regular expression.

An example of the processing flow of the similar URL search function 224 of the log management server 122 is described with reference to the flowchart of FIG. 8.
In step S801, the similar URL search function 224 receives the malware feature quantity 300 sent by the regular expression generation function 242.

In step S802, the similar URL search function 224 searches the access feature quantities stored in the access feature quantity storage unit 223, using the features of the malware feature quantity 300 other than the destination URL as the search key. When the value of the distance function of Equation 1 is smaller than a preset threshold, the malware feature quantity 300 and the access feature quantity 223 are regarded as similar, and a set of similar access feature quantities is obtained.
The distance function d(Cm, Ca) (Equation 1) defined between a malware feature quantity Cm and an access feature quantity Ca multiplies the absolute difference of each pair of corresponding features other than the destination URL by a per-feature weight coefficient wf, and uses the linear sum of these terms as the distance. For non-numeric features such as User-Agent, a discrete distance is used: 0 if the values match and 1 if they do not.
(Equation 1) d(Cm, Ca) = wf1·|cm_pk − ca_pk| + wf2·|cm_atime − ca_atime| + wf3·f(cm_ua, ca_ua) + wf4·|cm_pfreq − ca_pfreq| + ...
Here, cm_pk, cm_atime, cm_ua, and cm_pfreq are the malware features (average packet size, access time interval, User-Agent, Post count); ca_pk, ca_atime, ca_ua, and ca_pfreq are the access features (average packet size, access time interval, User-Agent, Post count); wf1, wf2, wf3, and wf4 are the weight coefficients for the respective features; and f(cm_ua, ca_ua) is the distance function for the User-Agent feature, equal to 0 when cm_ua = ca_ua and 1 when cm_ua ≠ ca_ua.
As a result of the search, the set of matching access feature quantities (sessions) is obtained.
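The four terms of Equation 1 that the text writes out can be computed as in the following sketch. The dictionary keys mirror the suffixes used in the equation (`pk`, `atime`, `ua`, `pfreq`); the weight values, the threshold, and the sample features are assumptions chosen only to make the example concrete, and any further terms implied by the trailing ellipsis are omitted.

```python
def feature_distance(cm, ca, weights):
    """Weighted distance between a malware feature cm and an access feature ca.

    Numeric features contribute weight * |difference|; User-Agent contributes
    weight * 0 when equal and weight * 1 otherwise (discrete distance).
    """
    ua_term = 0.0 if cm["ua"] == ca["ua"] else 1.0
    return (
        weights["pk"] * abs(cm["pk"] - ca["pk"])
        + weights["atime"] * abs(cm["atime"] - ca["atime"])
        + weights["ua"] * ua_term
        + weights["pfreq"] * abs(cm["pfreq"] - ca["pfreq"])
    )


def find_similar_sessions(cm, access_features, weights, threshold):
    """Return the access features whose distance to cm is below the threshold."""
    return [ca for ca in access_features
            if feature_distance(cm, ca, weights) < threshold]


# Illustrative values only.
weights = {"pk": 0.01, "atime": 0.1, "ua": 1.0, "pfreq": 0.5}
cm = {"pk": 512, "atime": 60, "ua": "Mozilla/4.0", "pfreq": 3}
close = {"pk": 500, "atime": 62, "ua": "Mozilla/4.0", "pfreq": 3}
far = {"pk": 1400, "atime": 5, "ua": "OtherAgent/1.0", "pfreq": 0}

similar_sessions = find_similar_sessions(cm, [close, far], weights, threshold=1.0)
```

Here the first session has distance 0.01·12 + 0.1·2 = 0.32 and is retained, while the second has distance 8.88 + 5.5 + 1 + 1.5 = 16.88 and is rejected.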

In step S803, the similar URL search function 224 obtains the destination URLs contained in the similar access feature quantities (sessions) obtained in step S802.
In step S804, the similar URL search function 224 selects, from the destination URLs obtained in step S803, those that are similar to the destination URL contained in the malware feature quantity received in step S801. Specifically, a distance function is defined between the destination URL in the malware feature quantity and each destination URL in the similar access feature quantities (sessions), and pairs whose distance is smaller than a preset threshold are regarded as similar. As the distance function, a measure of string closeness such as the edit distance (also called the Levenshtein distance) can be used.
The edit distance (Levenshtein distance) is a number indicating how different two character strings are. Specifically, it is the minimum number of operations (character insertions, deletions, or substitutions) needed to transform one string into the other.
In step S805, the similar URL search function 224 sends the similar URLs found in step S804 to the regular expression generation function 242.
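The URL comparison of step S804 can be sketched with a standard dynamic-programming computation of the edit distance. The function names, the sample URLs, and the threshold of 5 are assumptions for illustration; the text does not specify a threshold value.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def select_similar_urls(malware_url, candidate_urls, threshold):
    """Keep candidate URLs whose edit distance to malware_url is below threshold."""
    return [url for url in candidate_urls
            if edit_distance(malware_url, url) < threshold]


# Illustrative data only.
malware_url = "http://evil.example/path/abcde.exe"
candidates = [
    "http://evil.example/path/abxde.exe",   # one substitution
    "http://evil2.example/path/abcde.exe",  # one insertion
    "http://news.example/index.html",       # unrelated
]
similar = select_similar_urls(malware_url, candidates, threshold=5)
```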

As described above, and as shown in FIG. 2, in the unauthorized access detection system 100 of this embodiment, the process of creating a new URL regular expression detection rule is started when a user submits a copy of malware (a malware sample) to the malware analysis server 123, or when the malware analysis server 123 determines that it has been infected with malware via the network 101.
The malware analysis server 123 analyzes the access behavior of the submitted or infecting malware and accumulates records of its accesses. The detection rule setting server 124 extracts malware feature quantities from the malware analysis results (time-series records of the accesses made by the malware) and instructs the log management server 122 to search for similar access feature quantities in the past access logs, which it manages, of the clients 125 on the local area network. The log management server 122 extracts access feature quantities from past access logs as needed and stores them. From the access feature quantities similar to the malware feature quantity, it extracts the destination URLs that are similar to the destination URL in the malware feature quantity and reports them to the detection rule setting server 124 as similar URLs. The detection rule setting server 124 creates a new URL regular expression based on the destination URL in the malware feature quantity and the similar URLs, and judges whether it is suitable as a detection rule by calculating its match rate against the URLs in the access log. If the calculated match rate is at or below the recommended value, the new URL regular expression is saved in the detection rule management information storage unit 243 and set in the detection rule storage unit 212 of the proxy server 121, where it is used by the malicious URL detection function from then on.
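Once a URL regular expression has been set in the proxy server, its use by the malicious URL detection function can be pictured as in the sketch below. The rule representation as (regular expression, countermeasure) pairs, the whole-URL matching, and the default result "allow" are assumptions; the text only states that a matching access can be blocked or reported to the administrator.

```python
import re


def decide(url, rules, default="allow"):
    """Return the countermeasure of the first rule whose regex matches the URL."""
    for url_regex, countermeasure in rules:
        if re.fullmatch(url_regex, url):
            return countermeasure
    return default


# Illustrative rule only, based on the example expression in the text.
proxy_rules = [
    (r"http://www\.sample\.com/path[a-zA-Z]{5,10}\.exe", "block"),
]
blocked = decide("http://www.sample.com/pathabcdef.exe", proxy_rules)
allowed = decide("http://www.sample.com/index.html", proxy_rules)
```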

This embodiment showed an example in which the unauthorized access detection system 100 is divided into the proxy server 121, the log management server 122, the malware analysis server 123, and the detection rule setting server 124. However, two or more of these servers may be configured on the same server. A server such as the proxy server 121 may also be implemented as distributed processing across multiple servers. All of the servers may also be configured on a single server.

100: unauthorized access detection system; 101: network; 110: Internet; 111: attacker server; 120: local area network; 121: proxy server; 122: log management server; 123: malware analysis server; 124: detection rule setting server; 125: client; 130: firewall; 211: malicious URL detection function; 212: detection rules; 213: access log; 221: access log; 222: access feature quantity extraction function; 223: access feature quantity; 224: similar URL search function; 231: malware analysis function; 241: malware feature quantity extraction function; 242: regular expression generation function; 243: detection rule management information; 244: detection rule setting function; 300: malware feature quantity

Claims

1. A method for detecting unauthorized access in which a URL regular expression for detecting unauthorized access is generated from a trace of malware access behavior obtained from a malware analysis result, and a detection rule is updated, the method comprising:
a step of extracting a malware feature quantity from an analysis of the trace of access behavior of new malware collected by an analyst or contracted from the network;
a step of extracting access feature quantities from past network access logs as needed and recording them in an access feature quantity storage unit, and searching the access feature quantity storage unit, with the malware feature quantity as a query, for similar URLs whose distance is within a predetermined threshold;
a step of generating a URL regular expression from the destination URL of the malware feature quantity and the similar URLs found; and
a step of applying the URL regular expression to pattern matching against the destination URLs contained in the access log, calculating the match rate, and, when the match rate is at or below a recommended value, setting the URL regular expression in the detection rule as a new rule.

2. The unauthorized access detection method according to claim 1, wherein the step of searching the access feature quantity storage unit, with the malware feature quantity as a query, for similar URLs whose distance is within a predetermined threshold comprises:
a first step of judging an access feature quantity to be a similar access feature quantity when the value of a distance function defined between the features of the malware feature quantity other than the destination URL and the corresponding features of the access feature quantity other than the destination URL is smaller than a predetermined threshold; and
a second step of judging a destination URL to be a similar URL, and retrieving it, when the value of a string-difference distance function defined between the destination URL of the malware feature quantity and a destination URL contained in the similar access feature quantity is smaller than a predetermined threshold.

3. The unauthorized access detection method, wherein the malware feature quantity and the access feature quantity include at least the following feature data items: destination URL, average packet size, access time interval, User-Agent, and Post count.

4. An unauthorized access detection system configured on a plurality of servers connected to a network that is connected to the Internet, the system comprising:
a malware analysis function that runs newly collected or infecting malware in a virtual test environment and generates a trace of the malware's access behavior;
a malware feature quantity extraction function that extracts a malware feature quantity from the trace of the malware's access behavior;
an access feature quantity extraction function that stores and manages past access logs of clients, extracts access feature quantities from the access logs as needed, and stores them in an access feature quantity storage unit;
a similar URL search function that searches the access feature quantity storage unit, with the malware feature quantity as a query, for similar URLs whose distance is within a predetermined threshold;
a regular expression generation function that generates a URL regular expression from the destination URL of the malware feature quantity and the similar URLs found, applies the URL regular expression to pattern matching against the destination URLs contained in the access log, calculates the match rate, and adds the URL regular expression to the detection rules as a new rule when the match rate is at or below a recommended value; and
a function that applies the detection rules, updated by adding the URL regular expression, to a URL being accessed and determines whether the access is unauthorized.

5. The unauthorized access detection system according to claim 4, wherein the similar URL search function judges an access feature quantity in the access feature quantity storage unit to be a similar access feature quantity, and retrieves it, when the value of a distance function defined between the features of the malware feature quantity other than the destination URL and the corresponding features of the access feature quantity other than the destination URL is smaller than a predetermined threshold, and
judges a destination URL to be a similar URL, and retrieves it, when the value of a string-difference distance function defined between the destination URL of the malware feature quantity and a destination URL contained in the similar access feature quantity is smaller than a predetermined threshold.

6. The unauthorized access detection system, wherein the malware feature quantity and the access feature quantity include at least the following feature data items: destination URL, average packet size, access time interval, User-Agent, and Post count.