WO2015114804A1 - Unauthorized-access detection method and detection system - Google Patents

Unauthorized-access detection method and detection system

Info

Publication number
WO2015114804A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
malware
url
feature amount
regular expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2014/052288
Other languages
French (fr)
Japanese (ja)
Inventor
進 芹田
雅之 吉野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to PCT/JP2014/052288 (WO2015114804A1)
Priority to JP2015559696A (JP6039826B2)
Publication of WO2015114804A1
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures

Definitions

  • the present invention relates to technology for detecting unauthorized network access performed by malware-infected computers.
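  • the URL blacklist is a list of URLs (called malicious URLs) that known malware has used for its accesses.
  • registering the URL blacklist in a security device such as a firewall, IDS/IPS, or proxy server makes it possible to detect access to the external servers identified by the listed URLs.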
  • Such a technique is generally called web filtering.
  • the attacker intentionally changes the URL with which the malware communicates in order to escape detection by the blacklist. For example, a technique of incorporating a random number into a part of a URL is known. In addition, an attacker may reuse existing malware for attacks.
  • Patent Document 1 discloses a method for generating a URL regular expression from a plurality of URL samples that are candidates for regular expression generation based on frequency information of character strings.
  • According to Patent Document 1, it is possible to detect a changed malicious URL.
  • however, since the method of Patent Document 1 generates a regular expression based on frequency information of character strings appearing in the samples, it requires a certain number of URL samples. When the number of samples is small, a URL regular expression effective for detection cannot be generated.
  • the present invention has been made in consideration of the above problem, and its object is to generate a URL regular expression effective for detection, and to detect unauthorized access, even when there are few URL samples on which to base the regular expression.
  • the present invention proposes an unauthorized access detection method that generates a URL regular expression for detecting unauthorized access from a trace of malware access behavior obtained from a malware analysis result, and updates a detection rule. The method comprises: a step of extracting a malware feature amount by analyzing the trace of the access behavior of new malware that was collected by a user or acquired by infection from the network; a step of searching an access feature amount storage unit, which records access feature amounts extracted from past access logs on the network, for a similar URL satisfying a distance within a predetermined threshold, using the malware feature amount as a query; a step of generating a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URL; and a step of applying the URL regular expression to pattern matching against the connection destination URLs included in the access log, calculating the matching rate, and setting the URL regular expression as a new detection rule when the matching rate is equal to or less than a recommended value.
  • the step of searching the access feature amount storage unit for a similar URL satisfying a distance within a predetermined threshold, using a malware feature amount as a query, is characterized by comprising: a first step of determining an access feature amount to be similar when the distance function value defined between the feature amounts other than the connection destination URL of the malware feature amount and the corresponding feature amounts other than the connection destination URL of the access feature amount is smaller than a predetermined threshold; and a second step of retrieving, as a similar URL, a connection destination URL included in the similar access feature amount when the distance function value of the character-string difference defined between it and the connection destination URL of the malware feature amount is smaller than a predetermined threshold.
  • in addition, the present invention proposes an unauthorized access detection system configured on a plurality of servers connected to a network that is connected to the Internet, the system comprising:
  • a malware analysis function that executes new malware, which has infected a client or has been collected, in a virtual test environment and generates a trace of the malware's access behavior;
  • a malware feature extraction function that extracts a malware feature amount from the trace of malware access behavior;
  • an access feature amount extraction function that manages the clients' past access logs, extracts access feature amounts from them as appropriate, and stores them in an access feature amount storage unit;
  • a similar URL search function that searches the access feature amount storage unit, using the malware feature amount as a query, for similar URLs that satisfy a distance within a predetermined threshold;
  • a regular expression generation function that generates a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URLs, applies the URL regular expression to pattern matching against the connection destination URLs included in the access log, calculates the matching rate, and adds the URL regular expression to the detection rules as a new rule when the matching rate is equal to or less than a recommended value; and
  • a malicious URL detection function that judges, using the detection rules updated with the URL regular expression, whether an access to a given URL is unauthorized.
  • an effective URL regular expression can be generated and unauthorized access can be detected even if there are only a small number of malicious URL samples.
  • FIG. 1 is a diagram illustrating an example of a system configuration of an unauthorized access detection system 100 according to the present embodiment.
  • this system includes a proxy server 121, a log management server 122, a malware analysis server 123, and a detection rule setting server 124, and each device is configured to be connected to each other via a network 101.
  • the unauthorized access detection system 100 is connected together with a plurality of clients 125 to a local area network 120 installed in a certain organization.
  • the local area network 120 is connected to the Internet 110 via the firewall 130 and the network 101.
  • the attacker server 111 on the Internet 110 is a server used by an attacker who attacks the organization connected to the network. When the attacker succeeds in infiltrating malware into the organization, the attacker uses the attacker server 111 to communicate with the malware that has infected the client 125 of the organization. Through this communication, new malware is transmitted and files acquired from within the organization are received.
  • a plurality of attacker servers 111 are installed on the Internet 110.
  • the firewall 130 has a function of discarding (blocking) or permitting (passing) a packet that meets a specific condition from among packets traveling between the local area network 120 and the Internet 110. In particular, by discarding packets that do not pass through the proxy server 121, all accesses from the local area network 120 to the Internet 110 can be performed through the proxy server 121.
  • the proxy server 121 relays packet exchange between the client 125 and a server on the Internet. By registering a malicious URL in the proxy server 121, unauthorized access can be detected. When unauthorized access is detected, communication with the attacker can be blocked by canceling access.
  • the proxy server 121 has a function of recording all the history of accesses performed by the client 125. This record is called an access log. Details of the processing of the proxy server 121 will be described with reference to FIG.
  • the log management server 122 has a function of storing the access log output by the proxy server 121 and searching for URLs used for generating a URL regular expression. Details of the processing of the log management server 122 will be described with reference to FIG.
  • the malware analysis server 123 has a function of executing malware in a virtual environment and recording network access behavior.
  • the malware analysis server 123 is connected to the local area network, but may be connected to the Internet 110. Details of the processing of the malware analysis server 123 will be described with reference to FIG.
  • the detection rule setting server 124 has a function of generating a URL regular expression using the feature amount extracted from the data in which the malware analysis server 123 recorded the network access behavior of the malware and the feature amount extracted from the access log of the log management server 122. It also has a function of setting the generated URL regular expression as a detection rule in the proxy server 121.
  • the client 125 has a function of accessing the Internet 110 via the network 101.
  • the client 125 may be infected with malware by executing an executable file attached to the forged mail.
  • the client 125 infected with malware communicates with an attacker without being noticed by a legitimate user.
  • each of these devices connected to the network includes at least a CPU (Central Processing Unit); a main storage device such as a ROM (Read Only Memory) and a RAM (Random Access Memory); an auxiliary storage device such as a hard disk drive; an I/O (Input/Output) interface connected to input devices such as a keyboard and a mouse and to an output device such as a display; and a network interface for connecting to the local area network 120 and the Internet 110.
  • the malware analysis server 123 includes a malware analysis function 231.
  • the malware analysis function 231 executes malware on the malware analysis server 123 in a virtual test environment, and records file generation, registry change, access behavior via the network, and the like performed by the malware.
  • the present invention uses a record of access via a network.
  • in the access record, the accesses made by the malware are recorded in chronological order.
  • Each access record includes the access time and the transmitted packet.
  • from the transmitted packet, information such as a connection destination URL, a connection destination IP address, a connection destination port, a transmission source port, a protocol, and a User-Agent can be acquired.
  • Such a malware analysis function 231 can be realized by a technique generally called dynamic analysis.
  • the existing dynamic analysis technology is installed in the malware analysis server 123, the malware is executed by a debugger or an emulator, and a trace of the malware control flow is recorded.
  • There are two main methods for preparing malware in the malware analysis server 123.
  • One is a method of manually copying malware (a malware specimen) found on a computer (client 125) other than the malware analysis server 123 to the malware analysis server 123.
  • the other is a method in which the malware analysis server 123 is installed in a place that an attacker can easily target (for example, on the network 101 outside the firewall 130, where access from the outside is easy) so that it becomes infected with malware.
  • This method is generally called a honeypot.
  • the detection rule setting server 124 includes a malware feature amount extraction function 241, a regular expression generation function 242, a detection rule management information storage unit 243, a detection rule setting function 244, and the like.
  • the malware feature amount extraction function 241 extracts the information used for generating a URL regular expression (the malware feature amount 300) from the analysis result of the malware output by the malware analysis function 231 (the time series of accesses made by the malware). Details of the malware feature amount 300 will be described with reference to FIG. 3, and details of the processing will be described with reference to FIG.
  • the regular expression generation function 242 makes an inquiry to the log management server 122 based on the information output by the malware feature amount extraction function 241 and acquires a set of similar URLs for generating a URL regular expression.
  • a regular expression candidate is generated from the acquired set of similar URLs and checked against the access log 221 to produce the URL regular expression 502.
  • the generated URL regular expression 502 is stored in the detection rule management information storage unit 243. Details of the processing of the regular expression generation function 242 will be described with reference to FIG.
  • the detection rule management information storage unit 243 includes a URL regular expression generated by the regular expression generation function 242 and information on a device to which the URL regular expression is applied. Details will be described with reference to FIG.
  • the detection rule setting function 244 has a function of setting the URL regular expression generated by the regular expression generation function 242 in the proxy server 121.
  • the log management server 122 includes an access log storage unit 221, an access feature amount extraction function 222, an access feature amount storage unit 223, a similar URL search function 224, and the like.
  • the access log storage unit 221 includes a record of the access log 213 output from the proxy server 121 over, for example, one year or more.
  • each record in the access log storage unit 221 includes the date and time at which the client 125 made the access, the IP address of that client 125, the connection destination URL, the User-Agent used for the access, the referer, the size of the transmitted packet, and the size of the received packet.
  • a plurality of proxy servers 121 are installed for reasons such as load distribution. For this reason, the output access log is also divided into a plurality of files.
  • the log management server 122 merges and stores these divided logs.
  • the access feature amount extraction function 222 analyzes the access log 221 and calculates an access feature amount 223 for each series of accesses. Because calculating access feature amounts on demand from a large volume of access log data takes time, the access feature amount 223 is calculated in advance, as the access log 221 is recorded.
  • the access feature amount storage unit 223 includes information necessary for searching for similar URLs. Details will be described with reference to FIG.
  • the similar URL search function 224 searches for a set of similar URLs necessary for the regular expression generation function 242 to generate a regular expression. Details of the processing will be described with reference to FIG.
  • the proxy server 121 includes a malicious URL detection function 211 and a detection rule storage unit 212.
  • the malicious URL detection function 211 checks whether the URL that the client 125 is about to access matches a URL regular expression set in advance. If it matches, the malicious URL detection function 211 determines that the access is unauthorized and stops the access.
  • the detection rule 212 includes rules by which the proxy server 121 determines whether to permit or block access by the client 125.
  • a URL regular expression is one kind of detection rule; the detection rule 212 also includes rules based on packet size, protocol type, and the like. For example, a compound rule can be set that stops access if the URL of the Internet host that the client 125 tries to access matches the URL regular expression and the packet size is 1 MB or more.
  • FIG. 3 shows the malware feature quantity 300 that the malware feature quantity extraction function 241 of the detection rule setting server 124 extracts from the malware analysis result (the time series of accesses made by malware) output by the malware analysis function 231.
  • the malware feature amount 300 includes a malware ID 301, a connection destination URL list 302, an average packet size 303, an access time interval 304, a User-Agent 305, a Post count 306, and the like.
  • the malware ID 301 is an identifier for uniquely identifying malware. For example, a hash value such as MD5 is used as the malware ID 301.
  • the connection destination URL list 302 is a list of URLs accessed by malware analyzed by the malware analysis function 231. These are determined to be malicious URLs.
  • the average packet size 303 is an average size of packets transmitted by malware in a series of malware accesses.
  • the access time interval 304 is an amount representing the time pattern of a series of accesses by the malware. For example, the average interval between accesses can be used. As an amount representing a more advanced time pattern, the periodicity of the access times may also be extracted.
  • the User-Agent 305 is an identifier for specifying the program that made the access.
  • the POST count 306 is the number of POST requests made in a series of malware accesses.
  • These malware feature amounts are features that characterize the behavior of malware, and an access having a feature amount similar to the malware feature amount is likely to be an access by malware.
  • the access feature amount 223 includes a session ID 401, an event ID list 402, a connection destination URL list 403, an average packet size 404, an access time interval 405, a User-Agent 406, a Post count 407, and the like.
  • the session ID 401 is an identifier for specifying a series of related accesses (a session) made by the client 125.
  • the event ID list 402 is a list of identifiers specifying the events that belong to the session identified by the session ID 401. Here, an event refers to one access recorded in the access log storage unit 221.
  • the connection destination URL list 403 is a list in which URLs accessed by the client 125 during a session are recorded.
  • the average packet size 404 is an average size of packets transmitted by the client 125 during the session.
  • the access time interval 405 is an amount representing the time pattern of the accesses in the session. The same kind of amount as the access time interval 304 of the malware feature amount 300 can be used.
  • User-Agent 406 is an identifier for identifying the program that has accessed.
  • the POST count 407 is the number of POST accesses transmitted by the client 125 during the session.
  • each event included in the access log storage unit 221 is classified by client 125.
  • the client 125 is identified by the source IP address included in the event and by user authentication information.
  • the events classified by client 125 are then divided into sessions.
  • when the interval between consecutive events exceeds a predetermined threshold (for example, 30 minutes), the later event is determined to belong to another session.
  • an event with a different User-Agent is also determined to belong to another session.
  • in this way, the access log 221 is decomposed into a plurality of sessions.
  • the detection rule management information storage unit 243 includes a rule ID 501, a URL regular expression 502, a target device ID 503, a countermeasure 504, a setting date 505, and the like.
  • the rule ID 501 is an identifier for uniquely specifying the detection rule 243.
  • the URL regular expression 502 expresses, as a regular expression, the connection destination URLs for which the proxy server 121 judges an access by the client 125 to be unauthorized.
  • the target device ID 503 is information for identifying a device to which the detection rule is applied. For example, the IP address of the proxy server 121 can be used.
  • the countermeasure 504 is the content of the control performed by the malicious URL detection function 211 of the target device (proxy server) when the connection destination URL of the client 125 matches the URL regular expression. For example, countermeasures such as blocking the communication or notifying the administrator can be used.
  • the setting date 505 represents the date and time when the detection rule was set. Using the setting date enables operations such as deleting a rule once a certain period has passed since it was set.
  • in step S601, the malware feature amount extraction function 241 reads the malware analysis result (the time series of accesses made by the malware) output by the malware analysis function 231.
  • in step S602, the malware feature amount extraction function 241 extracts the malware feature amount 300 shown in FIG. 3 from the malware analysis result output by the malware analysis function 231.
  • the malware analysis result includes OS API calls and a network access log.
  • the malware feature amount 300 is used for searching the access log output by the proxy server 121. For this reason, information that is not included in the access log, such as OS API calls, is excluded, and the log entries related to network access are selected. The selected log entries are then analyzed, and the information listed in FIG. 3 is extracted.
  • in step S603, the malware feature amount extraction function 241 excludes duplicate malware feature amounts 300.
  • when the malware analysis function 231 analyzes a plurality of malware samples, a plurality of malware feature amounts 300 are extracted in step S602.
  • even when the malware hash values (malware ID 301) differ, another malware feature amount 300 may describe what is in fact the same malware. In that case, only one malware feature amount 300 is selected.
  • the malware feature amount extraction function 241 transmits the malware feature amounts 300 from which duplicates were excluded in step S603 to the regular expression generation function 242.
  • in step S701, the regular expression generation function 242 acquires the malware feature quantity 300 output by the malware feature quantity extraction function 241.
  • when a plurality of malware samples are analyzed, a plurality of malware feature quantities 300 are acquired.
  • in step S702, the regular expression generation function 242 transmits the malware feature quantity 300 acquired in step S701 to the similar URL search function 224 of the log management server 122.
  • the similar URL search function 224, having received the malware feature quantity 300, searches the access feature quantity storage unit 223 for similar URLs using the malware feature quantity 300 as a query.
  • in step S703, the regular expression generation function 242 receives the similar URLs transmitted by the similar URL search function 224.
  • in step S704, the regular expression generation function 242 generates a URL regular expression from the set of similar URLs.
  • as the method for generating a regular expression from a plurality of character strings, the method described in Patent Document 1 is adopted.
  • in this way, a regular expression can be generated from a set of similar URLs.
  • in general, and with the above method as well, the regular expression that expresses a given set of character strings is not uniquely determined. For example, comparing the two regular expressions http://www.sample.com/path[a-zA-Z]{5,10}.exe and http://www.sample.com/[a-zA-Z]*.exe, every URL expressed by the former can also be expressed by the latter.
  • in step S705, the regular expression generation function 242 uses the access log 221 to check the granularity of the regular expression.
  • the URL regular expression generated in step S704 is applied to pattern matching against the URLs of the access log 221 other than the similar URLs received in step S703, and the matching rate is calculated.
  • a recommended value (for example, 3%) of the matching rate is determined in advance. If the calculated matching rate is larger than the recommended value, the granularity of the regular expression is too coarse, and the regular expression generated in step S704 is discarded.
  • in step S706, among the URL regular expressions whose granularity was checked in step S705, the regular expression generation function 242 saves to the detection rule management information storage unit 243 only those whose matching rate with the URLs of the access log 221 is equal to or less than the recommended value.
  • at this time, the target device ID, countermeasure, setting date, and the like are added to the URL regular expression.
  • for these items, predetermined values may be registered, or an administrator who has confirmed the regular expression generation result may register the values manually.
  • in step S801, the similar URL search function 224 receives the malware feature quantity 300 transmitted by the regular expression generation function 242.
  • in step S802, the similar URL search function 224 searches the access feature amounts stored in the access feature amount storage unit 223, using the feature amounts other than the connection destination URL of the malware feature quantity 300 as a search key, and acquires the similar access feature amounts (sessions).
  • the distance function d(Cm, Ca) (Equation 1) defined between the malware feature quantity Cm and the access feature quantity Ca multiplies the absolute value of the difference of each corresponding feature quantity other than the connection destination URL by a weighting factor wf for that feature quantity, and uses the linear sum of these terms as the distance.
  • a non-numerical feature quantity such as User-Agent uses a discrete distance of 0 if the values match and 1 if they do not.
  • in step S803, the similar URL search function 224 acquires the connection destination URLs included in the similar access feature amounts (sessions) acquired in step S802.
  • in step S804, the similar URL search function 224 selects, from the connection destination URLs acquired in step S803, those similar to the connection destination URL included in the malware feature amount received in step S801.
  • a distance function is defined between the connection destination URL included in the malware feature quantity and the connection destination URL included in the similar access feature quantity (session), and a pair whose distance is smaller than a preset threshold is considered similar.
  • as a distance function between character strings, the “edit distance” (also called the Levenshtein distance) can be used.
  • the edit distance (Levenshtein distance) is a numerical value indicating how different two character strings are. Specifically, it is given as the minimum number of steps required to transform one character string into the other by inserting, deleting, or replacing characters.
  • the similar URL search function 224 transmits the similar URLs found in step S804 to the regular expression generation function 242.
  • in the unauthorized access detection system 100, when a user inputs a copy of malware (a malware sample) into the malware analysis server 123, or when the malware analysis server 123 is infected with malware from the network, a process for creating a new URL regular expression detection rule is started.
  • the malware analysis server 123 analyzes the access behavior of the input or infecting malware, and accumulates access records.
  • the detection rule setting server 124 extracts the malware feature amount from the malware analysis result (the time series of the accesses performed by the malware), and instructs the log management server 122 to search the past access logs of the clients 125 on the local area network, which it manages, for similar access feature amounts.
  • the log management server 122 extracts and stores access feature amounts from the past access logs as appropriate. From the access feature amounts similar to the malware feature amount, it extracts the connection destination URLs that are similar to the connection destination URL included in the malware feature amount, and reports them to the detection rule setting server 124 as similar URLs.
  • the detection rule setting server 124 creates a new URL regular expression based on the connection destination URL included in the malware feature amount and the similar URLs, and judges whether the new URL regular expression is appropriate as a detection rule by calculating its matching rate with the URLs included in the access log.
  • a new URL regular expression judged appropriate is stored in the detection rule management information storage unit 243, set in the detection rule storage unit 212 of the proxy server 121, and used thereafter by the malicious URL detection function.
  • the unauthorized access detection system 100 is divided into a proxy server 121, a log management server 122, a malware analysis server 123, and a detection rule setting server 124.
  • an example in which any of these servers are configured on the same server is also conceivable.
  • an example in which the proxy server 121 or the like is configured by distributed processing across a plurality of servers is also conceivable.
  • An example in which all servers are configured on the same server is also conceivable.
  • 100: unauthorized access detection system, 101: network, 110: Internet, 111: attacker server, 120: local area network, 121: proxy server, 122: log management server, 123: malware analysis server, 124: detection rule setting server, 125: client, 130: firewall, 211: malicious URL detection function, 212: detection rule, 213: access log, 221: access log, 222: access feature amount extraction function, 223: access feature amount, 224: similar URL search function, 231: malware analysis function, 241: malware feature extraction function, 242: regular expression generation function, 243: detection rule management information, 244: detection rule setting function, 300: malware feature
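
The two numeric checks described in the list above, the weighted feature distance d(Cm, Ca) used in step S802 and the matching-rate check of steps S705 and S706, can be sketched in Python as follows. This is a minimal illustration and not the applicants' implementation: the feature names, the weight values, and the function names are assumptions, and only the 3% recommended value comes from the text. The caller is assumed to pass log URLs from which the similar URLs received in step S703 have already been removed.

```python
import re

# Hypothetical weights w_f for the numeric features (values not given in the source).
WEIGHTS = {"avg_packet_size": 0.001, "access_interval": 0.01, "post_count": 1.0}
UA_WEIGHT = 5.0  # hypothetical weight for the User-Agent term


def feature_distance(cm, ca):
    """Weighted linear sum of absolute feature differences, URLs excluded."""
    d = sum(w * abs(cm[f] - ca[f]) for f, w in WEIGHTS.items())
    # Non-numerical feature: discrete distance, 0 if equal and 1 otherwise.
    d += UA_WEIGHT * (0 if cm["user_agent"] == ca["user_agent"] else 1)
    return d


def matching_rate(pattern, log_urls):
    """Fraction of access-log URLs that the URL regular expression matches in full."""
    if not log_urls:
        return 0.0
    rx = re.compile(pattern)
    return sum(1 for u in log_urls if rx.fullmatch(u)) / len(log_urls)


def accept_rule(pattern, log_urls, recommended=0.03):
    """Keep the expression only if its matching rate is at or below the recommended value."""
    return matching_rate(pattern, log_urls) <= recommended
```

An expression that matches a large share of ordinary traffic is too coarse to serve as a detection rule, which is why the check compares against an upper bound rather than a lower one.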

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

Existing techniques for detecting unauthorized network access by malware-infected computers or the like cannot generate effective URL regular expressions from small samples of malicious URLs. On the basis of feature quantities for past network accesses and malicious URLs obtained from malware analysis results, this invention expands the sample of malicious URLs by searching an access log for URLs similar to said malicious URLs and generates a URL regular expression. Said URL regular expression is added to detection rules to detect unauthorized access.

Description

Unauthorized access detection method and detection system

The present invention relates to a technique for detecting unauthorized network access performed by malware-infected computers and the like.

Malware that has infected a computer inside an organization communicates with an external server prepared by the attacker, for example to download additional malware or to upload information taken from the computer. Communication involved in these activities is generally called unauthorized access.

A method using a URL blacklist is known as a technique for detecting unauthorized access. A URL blacklist is a list of URLs that known malware has used for access (called malicious URLs). By registering the URL blacklist in a security device such as a firewall, IDS/IPS, or proxy server, access to an external server identified by a URL on the blacklist can be detected. When unauthorized access is detected, interrupting the access prevents the damage from spreading. This kind of technique is generally called web filtering.
However, attackers intentionally vary the URLs that malware communicates with in order to evade blacklist detection. For example, a known technique is to embed a random number in part of the URL. Attackers also sometimes reuse existing malware in new attacks, so the domain in the URL may differ while the path portion is identical or similar. Such varied URLs cannot be detected by an exact-match search against the blacklist. A known way to handle varied URLs is to express URLs as regular expressions. A regular expression is a way of representing a set of character strings with a single string. Patent Document 1 discloses a method of generating a URL regular expression from a plurality of URL samples that are candidates for regular expression generation, based on frequency information of character strings.

US Patent Application Publication No. 2009/0265786

According to Patent Document 1 described above, varied malicious URLs can be detected. However, because the method of Patent Document 1 generates a regular expression from frequency information of the character strings appearing in the samples, it requires at least a certain number of URL samples. When the number of samples is small, it cannot generate a URL regular expression that is effective for detection.

In view of the above problem, an object of the present invention is to generate a URL regular expression effective for detection, and to detect unauthorized access, even when only a few URL samples are available as the basis for the regular expression.

To solve the above problem, the present invention proposes an unauthorized access detection method that generates a URL regular expression for unauthorized access detection from a trace of malware access behavior obtained from a malware analysis result and updates a detection rule, the method comprising: a step of extracting a malware feature amount from trace analysis of the access behavior of new malware that was collected by an analyst or that caused an infection via the network; a step of searching an access feature amount storage unit, in which access feature amounts extracted as needed from past network access logs are recorded, for similar URLs that satisfy a distance within a predetermined threshold, using the malware feature amount as a query; a step of generating a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URLs; and a step of applying the URL regular expression to pattern matching against the connection destination URLs contained in the access log, calculating the matching rate, and setting the URL regular expression as a new detection rule when the matching rate is at or below a recommended value.

Further, to solve the above problem, in the unauthorized access detection method of the present invention, the step of searching the access feature amount storage unit for similar URLs satisfying a distance within a predetermined threshold using the malware feature amount as a query comprises: a first step of judging an access feature amount to be a similar access feature amount when the value of a distance function, defined between the features of the malware feature amount other than the connection destination URL and the corresponding features of the access feature amount other than the connection destination URL, is smaller than a predetermined threshold; and a second step of judging a URL to be a similar URL and retrieving it when the value of a distance function measuring the difference between character strings, defined between the connection destination URL of the malware feature amount and a connection destination URL contained in the similar access feature amount, is smaller than a predetermined threshold.
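The two-step search described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not fix the distance functions at this point, so a relative Euclidean distance over the non-URL features and the Levenshtein edit distance are used as stand-ins, and the names `Features`, `feature_distance`, `edit_distance`, `search_similar_urls` and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Features:
    urls: list[str]          # connection destination URLs
    avg_packet_size: float   # bytes
    access_interval: float   # seconds
    post_count: int


def feature_distance(a: Features, b: Features) -> float:
    """Relative Euclidean distance over the non-URL features (assumed metric)."""
    def rel(x: float, y: float) -> float:
        m = max(abs(x), abs(y))
        return 0.0 if m == 0 else (x - y) / m
    return sqrt(rel(a.avg_packet_size, b.avg_packet_size) ** 2
                + rel(a.access_interval, b.access_interval) ** 2
                + rel(a.post_count, b.post_count) ** 2)


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, standing in for the string-difference distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete from s
                           cur[j - 1] + 1,             # insert into s
                           prev[j - 1] + (cs != ct)))  # substitute
        prev = cur
    return prev[-1]


def search_similar_urls(malware: Features, sessions: list[Features],
                        feat_thr: float = 0.5, url_thr: int = 8) -> list[str]:
    similar = set()
    for sess in sessions:
        # First step: keep only sessions whose non-URL features are close.
        if feature_distance(malware, sess) >= feat_thr:
            continue
        # Second step: within those sessions, keep URLs close to a malware URL.
        for url in sess.urls:
            if any(edit_distance(url, m) < url_thr for m in malware.urls):
                similar.add(url)
    return sorted(similar)


malware = Features(["http://evil.example/gate.php?id=123"], 500.0, 60.0, 10)
sessions = [
    Features(["http://bad.example/gate.php?id=987",
              "http://news.example/index.html"], 520.0, 58.0, 9),
    Features(["http://evil.example/gate.php?id=124"], 50000.0, 1.0, 0),
]
print(search_similar_urls(malware, sessions))
```

In this toy data the second session is rejected in the first step even though its URL is nearly identical, which shows that the behavioral features act as a filter before any string comparison takes place.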

Further, to solve the above problem, the present invention configures an unauthorized access detection system, built on a plurality of servers connected to a network that is connected to the Internet, to include: a malware analysis function that executes new malware, which has infected a client or has been collected, in a virtual test environment and generates a trace of the malware's access behavior; a malware feature amount extraction function that extracts a malware feature amount from the trace of the malware's access behavior; an access feature amount extraction function that stores and manages past access logs of clients, extracts access feature amounts from the access logs as appropriate, and stores them in an access feature amount storage unit; a similar URL search function that searches the access feature amount storage unit for similar URLs satisfying a distance within a predetermined threshold, using the malware feature amount as a query; a regular expression generation function that generates a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URLs, applies the URL regular expression to pattern matching against the connection destination URLs contained in the access log, calculates the matching rate, and adds the URL regular expression to the detection rules as a new rule when the matching rate is at or below a recommended value; and a malicious URL detection function that applies the detection rules, updated with the added URL regular expression, to a URL being accessed and determines whether or not the access is unauthorized.

According to the present invention, an effective URL regular expression can be generated and unauthorized access can be detected even when only a small number of malicious URL samples are available.

A diagram showing an example of the system configuration of the unauthorized access detection system of this embodiment.
A diagram explaining the relationship among the functions of the malware analysis server, the detection rule setting server, the log management server, and the proxy server.
A diagram showing an example of the malware feature amount.
A diagram showing an example of the access feature amount.
A diagram showing an example of the detection rule management information.
A diagram showing an example of the processing flow of the malware feature amount extraction function.
A diagram showing an example of the processing flow of the URL regular expression generation function.
A diagram showing an example of the processing flow of the similar URL search function.

Hereinafter, a mode for carrying out the present invention (hereinafter referred to as the "embodiment") is described with reference to the drawings as appropriate.

FIG. 1 is a diagram showing an example of the system configuration of the unauthorized access detection system 100 of this embodiment. As shown in FIG. 1, the system includes a proxy server 121, a log management server 122, a malware analysis server 123, and a detection rule setting server 124, and these devices are connected to one another via a network 101.

The unauthorized access detection system 100 is connected, together with a plurality of clients 125, to a local area network 120 installed within an organization. The local area network 120 is connected to the Internet 110 via a firewall 130 and the network 101.

The attacker server 111 on the Internet 110 is a server used by a party that attacks organizations connected to the network, such as the organization described above. Once the attacker has succeeded in getting malware into the organization, the attacker uses the attacker server 111 to communicate with the malware that has infected the organization's client 125, for example to send additional malware or to receive files taken from inside the organization. A plurality of attacker servers 111 are installed on the Internet 110.

The firewall 130 sits between the local area network 120 and the Internet 110 and has a function of discarding (blocking) or permitting (passing) packets that meet specific conditions among the packets traveling between the two networks. In particular, by discarding packets that do not pass through the proxy server 121, it ensures that all access from the local area network 120 to the Internet 110 goes through the proxy server 121.

The proxy server 121 relays packets exchanged between the client 125 and servers on the Internet. Registering malicious URLs in the proxy server 121 makes it possible to detect unauthorized access. When unauthorized access is detected, communication with the attacker can be cut off by stopping the access. The proxy server 121 also has a function of recording the entire history of accesses made by the client 125. This record is called an access log. Details of the processing of the proxy server 121 are described with reference to FIG. 2.

The log management server 122 has a function of storing the access log output by the proxy server 121 and searching it for URLs to be used in generating a URL regular expression. Details of the processing of the log management server 122 are described with reference to FIG. 2.

The malware analysis server 123 has a function of executing malware in a virtual environment or the like and recording its behavior, including its network access. Although the malware analysis server 123 is connected to the local area network in FIG. 1, it may instead be connected to the Internet 110. Details of the processing of the malware analysis server 123 are described with reference to FIG. 2.

The detection rule setting server 124 has a function of generating a URL regular expression using feature amounts extracted from the data in which the malware analysis server 123 recorded the malware's network access behavior, together with feature amounts extracted from the access log held by the log management server 122. It also has a function of setting the generated URL regular expression in the proxy server 121 as a detection rule.

The client 125 has a function of accessing the Internet 110 via the network 101. The client 125 may become infected with malware, for example by running an executable file attached to a forged e-mail. A client 125 infected with malware communicates with the attacker without the legitimate user noticing.

The hardware configuration of each of these network-connected devices includes at least a CPU (Central Processing Unit), an auxiliary storage device such as a hard disk drive, main storage devices such as ROM (Read Only Memory) and RAM (Random Access Memory), an I/O (Input/Output) interface connected to input devices such as a keyboard and mouse and to output devices such as a display, and a network interface for connecting to the local area network 120 and the Internet 110.

With reference to FIG. 2, an overview is given of the malware analysis server 123, the detection rule setting server 124, the log management server 122, and the proxy server 121, and of the processing these servers perform in cooperation.

The malware analysis server 123 includes a malware analysis function 231. The malware analysis function 231 executes malware in a virtual test environment on the malware analysis server 123 and records the behavior of the malware, such as file creation, registry changes, and access over the network. The present invention uses, in particular, the record of access over the network. The access record lists the accesses made by the malware in chronological order. Each access record includes the time of the access and the packets sent. By analyzing the packets, information such as the connection destination URL, destination IP address, destination port, source port, protocol, and User-Agent can be obtained. Such a malware analysis function 231 can be realized with a technique generally called dynamic analysis. In this embodiment, an existing dynamic analysis technique is implemented on the malware analysis server 123; the malware is run under a debugger or emulator and a trace of its control flow is recorded.
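As a rough illustration of how a connection destination URL and User-Agent could be recovered from a captured packet, the sketch below parses the head of a plain-text HTTP request. It assumes unencrypted HTTP/1.x and a request that fits in one buffer; the function name `parse_http_request` and the sample request are hypothetical and are not part of the described system.

```python
def parse_http_request(raw: bytes) -> dict:
    """Extract method, destination URL and User-Agent from a raw HTTP/1.x request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Requests sent through a proxy carry an absolute URL; otherwise rebuild
    # the URL from the Host header and the request target.
    if target.startswith("http"):
        url = target
    else:
        url = f"http://{headers.get('host', '')}{target}"

    return {"method": method, "url": url,
            "user_agent": headers.get("user-agent", "")}


raw = (b"POST /gate.php?id=1 HTTP/1.1\r\n"
       b"Host: evil.example\r\n"
       b"User-Agent: Mozilla/4.0\r\n"
       b"\r\n"
       b"payload")
print(parse_http_request(raw))
```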

There are two main ways to supply malware to the malware analysis server 123. One is to manually copy malware found on a computer other than the malware analysis server 123 (a client 125) to the malware analysis server 123 as a malware specimen. The other is to place the malware analysis server 123 where an attacker can easily target it (for example, somewhere easily reachable from outside, such as on the network 101 outside the firewall 130) and let it become infected with malware. This approach is generally called a honeypot.

The detection rule setting server 124 includes a malware feature amount extraction function 241, a regular expression generation function 242, a detection rule management information storage unit 243, a detection rule setting function 244, and the like.
The malware feature amount extraction function 241 extracts the information used for generating a URL regular expression from the malware analysis result output by the malware analysis function 231 (the chronological record of the accesses made by the malware). Details of the malware feature amount 300 are described with reference to FIG. 3, and details of the processing with reference to FIG. 6.

The regular expression generation function 242 queries the log management server 122 on the basis of the information output by the malware feature amount extraction function 241 and obtains a set of similar URLs for generating a URL regular expression. It generates regular expression candidates from the obtained set of similar URLs and uses the access log 221 to produce the URL regular expression 502. The generated URL regular expression 502 is stored in the detection rule management information storage unit 243. Details of the processing of the regular expression generation function 242 are described with reference to FIG. 7.
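The way the access log is used to vet a candidate can be sketched as below, following the matching-rate criterion stated earlier (a candidate is adopted when its matching rate against the logged URLs is at or below a recommended value). The function names, the sample patterns, and the 1% recommended value are assumptions for illustration; how candidates themselves are produced is not shown here.

```python
import re


def match_rate(pattern: str, log_urls: list[str]) -> float:
    """Fraction of logged connection destination URLs the candidate fully matches."""
    if not log_urls:
        return 0.0
    rx = re.compile(pattern)
    hits = sum(1 for url in log_urls if rx.fullmatch(url))
    return hits / len(log_urls)


def accept_as_rule(pattern: str, log_urls: list[str],
                   recommended: float = 0.01) -> bool:
    """Adopt the candidate only if it matches few of the past accesses.

    A pattern that matches a large share of ordinary traffic would block
    legitimate access, so it is rejected as too general.
    """
    return match_rate(pattern, log_urls) <= recommended


# Toy access log: 199 ordinary URLs and one suspicious one.
log_urls = [f"http://site{i}.example/index.html" for i in range(199)]
log_urls.append("http://bad.example/gate.php?id=987")

narrow = r"http://[^/]+/gate\.php\?id=\d+"
broad = r"http://.*"
print(accept_as_rule(narrow, log_urls), accept_as_rule(broad, log_urls))
```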

The detection rule management information storage unit 243 holds the URL regular expressions generated by the regular expression generation function 242, information on the devices to which each URL regular expression is applied, and the like. Details are described with reference to FIG. 5.
The detection rule setting function 244 has a function of setting the URL regular expression generated by the regular expression generation function 242 in the proxy server 121.

The log management server 122 includes an access log storage unit 221, an access feature amount extraction function 222, an access feature amount storage unit 223, a similar URL search function 224, and the like.
The access log storage unit 221 holds the access log 213 output by the proxy server 121, recorded over, for example, one year or more. For each access it contains the date and time at which the client 125 made the access, the IP address of the client 125, the connection destination URL, the User-Agent used for the access, the referrer, the size of the packets sent, the size of the packets received, and so on. In general, a plurality of proxy servers 121 are installed for reasons such as load balancing, so the access log is output as multiple separate files. The log management server 122 merges these separate logs and stores them.

The access feature amount extraction function 222 analyzes the access log 221 and calculates an access feature amount 223 for each series of accesses. Because computing access feature amounts on demand from a huge volume of access log data would take time, the access feature amount 223 is calculated as appropriate at the time the access log 221 is recorded.
The access feature amount storage unit 223 holds the information needed to search for similar URLs. Details are described with reference to FIG. 4.
The similar URL search function 224 searches for the set of similar URLs that the regular expression generation function 242 needs in order to generate a regular expression. Details of the processing are described with reference to FIG. 7.

The proxy server 121 includes a malicious URL detection function 211 and a detection rule storage unit 212.
The malicious URL detection function 211 checks whether the URL that the client 125 is trying to access matches a URL regular expression set in advance. If it matches, the function judges the access to be unauthorized and performs control such as stopping the access.
The detection rules 212 include the rules by which the proxy server 121 decides whether to permit or block access by the client 125. A URL regular expression is one kind of detection rule, but the detection rules 212 also include rules based on packet size, protocol type, and the like. Compound rules can be set; for example, access is stopped when the URL of the Internet host that the client 125 is trying to reach matches a URL regular expression and the packet size is 1 MB or more.
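A compound rule of the kind just described (URL regular expression plus a packet-size condition) might be evaluated as in the following sketch. The `DetectionRule` fields, the `decide` function, the first-match-wins policy, and the use of 1,000,000 bytes for "1 MB" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class DetectionRule:
    url_regex: str             # URL regular expression
    min_packet_bytes: int = 0  # optional extra condition on packet size
    action: str = "block"      # countermeasure taken on a match


def decide(url: str, packet_bytes: int, rules: list[DetectionRule]) -> str:
    """Return the action of the first rule whose conditions all hold, else 'allow'."""
    for rule in rules:
        url_hit = re.fullmatch(rule.url_regex, url) is not None
        if url_hit and packet_bytes >= rule.min_packet_bytes:
            return rule.action
    return "allow"


rules = [DetectionRule(r"http://[^/]+/upload\.php", min_packet_bytes=1_000_000)]

print(decide("http://x.example/upload.php", 2_000_000, rules))  # large upload to a matching URL
print(decide("http://x.example/upload.php", 500, rules))        # matching URL, small packet
print(decide("http://x.example/index.html", 2_000_000, rules))  # non-matching URL
```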

With reference to FIG. 3, an example is described of the malware feature amount 300 that the malware feature amount extraction function 241 of the detection rule setting server 124 extracts from the malware analysis result output by the malware analysis function 231 (the chronological record of the accesses made by the malware). The malware feature amount 300 includes a malware ID 301, a connection destination URL list 302, an average packet size 303, an access time interval 304, a User-Agent 305, a POST count 306, and the like.

The malware ID 301 is an identifier that uniquely identifies the malware. For example, a hash value such as MD5 is used as the malware ID 301. The connection destination URL list 302 is the list of URLs accessed by the malware analyzed by the malware analysis function 231. These are judged to be malicious URLs. The average packet size 303 is the average size of the packets sent by the malware over its series of accesses. The access time interval 304 is a quantity representing the temporal pattern of the malware's series of accesses. For example, the mean time between accesses can be used. As a quantity representing a more sophisticated temporal pattern, the periodicity of the access times could also be extracted. The User-Agent 305 is an identifier that identifies the program that made the access. The POST count 306 is the number of POST requests made in the malware's series of accesses. These malware feature amounts characterize the behavior of the malware, and an access whose feature amounts are similar to the malware feature amounts is likely to be an access by malware.

With reference to FIG. 4, an example is described of the access feature amount 223 that the access feature amount extraction function 222 of the log management server 122 extracts from the access log 221. The access feature amount 223 includes a session ID 401, an event ID list 402, a connection destination URL list 403, an average packet size 404, an access time interval 405, a User-Agent 406, a POST count 407, and the like.

The session ID 401 is an identifier that identifies a connected series of accesses made by the client 125. The event ID list 402 is a list of identifiers specifying the events that belong to the session identified by the session ID. Here, an event means a single access contained in the access log storage unit 221. The connection destination URL list 403 is a list of the URLs accessed by the client 125 during the session. The average packet size 404 is the average size of the packets sent by the client 125 during the session. The access time interval 405 is a quantity representing the temporal pattern of the series of accesses in the session. The same kind of quantity as the access time interval 304 of the malware feature amount 300 can be used. The User-Agent 406 is an identifier that identifies the program that made the access. The POST count 407 is the number of POST accesses sent by the client 125 during the session.
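A minimal sketch of how the fields listed above could be computed for one session is given below. The `LogEvent` record, the dictionary keys, and the choice of the mean gap between accesses as the access time interval are assumptions; the text also allows other temporal quantities such as periodicity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LogEvent:
    event_id: int
    timestamp: float   # seconds since some epoch
    url: str
    method: str        # "GET", "POST", ...
    sent_bytes: int
    user_agent: str


def access_features(session_id: str, events: list[LogEvent]) -> dict:
    """Summarize one non-empty session into the feature fields described in the text."""
    events = sorted(events, key=lambda e: e.timestamp)
    gaps = [b.timestamp - a.timestamp for a, b in zip(events, events[1:])]
    return {
        "session_id": session_id,
        "event_ids": [e.event_id for e in events],
        "urls": [e.url for e in events],
        "avg_packet_size": mean(e.sent_bytes for e in events),
        "access_interval": mean(gaps) if gaps else 0.0,  # mean gap, one possible choice
        "user_agent": events[0].user_agent,
        "post_count": sum(e.method == "POST" for e in events),
    }


session = [
    LogEvent(1, 0.0, "http://a.example/", "GET", 100, "UA-1"),
    LogEvent(2, 10.0, "http://a.example/gate.php", "POST", 200, "UA-1"),
    LogEvent(3, 30.0, "http://a.example/gate.php", "POST", 300, "UA-1"),
]
print(access_features("s-001", session))
```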

How sessions are determined is described below. First, the events contained in the access log storage unit 221 are grouped by client 125. The client 125 is identified by the source IP address contained in the event or by the user's authentication information. Next, the events grouped by client 125 are divided into sessions. When the time gap between accesses exceeds a predetermined threshold (for example, 30 minutes), the later access is judged to belong to a different session. In addition, events with different User-Agents are judged to belong to different sessions. In this way, the access log 221 is broken down into a plurality of sessions.
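The session rules above can be sketched as follows. The `Event` record and `split_sessions` are hypothetical names, and the sketch adopts one possible reading of the rules: events are grouped by client and User-Agent first, and each group is then cut wherever the gap between consecutive accesses exceeds the threshold.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Event:
    client: str        # source IP or authenticated user
    timestamp: float   # seconds
    user_agent: str
    url: str


def split_sessions(events: list[Event], gap_seconds: float = 30 * 60) -> list[list[Event]]:
    """Split events into sessions by client, User-Agent, and time gap."""
    ordered = sorted(events, key=lambda e: (e.client, e.user_agent, e.timestamp))
    sessions = []
    for _, group in groupby(ordered, key=lambda e: (e.client, e.user_agent)):
        current = []
        for ev in group:
            # A gap longer than the threshold starts a new session.
            if current and ev.timestamp - current[-1].timestamp > gap_seconds:
                sessions.append(current)
                current = []
            current.append(ev)
        sessions.append(current)
    return sessions


events = [
    Event("10.0.0.1", 0, "UA-1", "http://a.example/"),
    Event("10.0.0.1", 600, "UA-1", "http://a.example/x"),
    Event("10.0.0.1", 4000, "UA-1", "http://a.example/y"),  # more than 30 min after previous
    Event("10.0.0.1", 100, "UA-2", "http://b.example/"),    # different User-Agent
    Event("10.0.0.2", 50, "UA-1", "http://c.example/"),     # different client
]
print([len(s) for s in split_sessions(events)])
```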

With reference to FIG. 5, an example is described of the detection rule management information 243 created by the regular expression generation function 242. The detection rule management information storage unit 243 includes a rule ID 501, a URL regular expression 502, a target device ID 503, a countermeasure 504, a setting date 505, and the like.
The rule ID 501 is an identifier that uniquely identifies a detection rule. The URL regular expression 502 expresses, as a regular expression, the connection destination URLs for which the proxy server 121 judges an access by the client 125 to be unauthorized. The target device ID 503 is information identifying the device to which the detection rule is applied. For example, the IP address of the proxy server 121 can be used. The countermeasure 504 is the control performed by the malicious URL detection function 211 of the target device (proxy server) when the connection destination URL of the client 125 matches the URL regular expression. For example, blocking the communication or notifying an administrator can be used as the countermeasure. The setting date 505 indicates the date and time at which the detection rule was set. Using the setting date enables operations such as deleting rules once a certain period has passed since they were set.

An example of the processing flow of the malware feature amount extraction function 241 of the detection rule setting server 124 will be described with reference to the flowchart of FIG. 6.
In step S601, the malware feature amount extraction function 241 reads the malware analysis results (time-series records of the accesses made by the malware) output by the malware analysis function 231. When multiple malware samples have been analyzed, there are multiple malware analysis results.

In step S602, the malware feature amount extraction function 241 extracts the malware feature amount 300 shown in FIG. 3 from the malware analysis results output by the malware analysis function 231. As described above, a malware analysis result includes logs of OS API calls and of network accesses. The malware feature amount 300 is used to search the access log output by the proxy server 121. Therefore, information that does not appear in the access log, such as OS API calls, is excluded, and only the logs related to network access are selected. The selected logs are then analyzed to extract the information shown in FIG. 3.
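The filtering and extraction in step S602 might look like the sketch below. The record layout ("kind", "time", "url", "size", "user_agent", "method") is hypothetical, and the feature set is taken from the items listed in the claims (connection destination URL, average packet size, access time interval, User-Agent, Post count). Treating the access time interval as the mean gap between consecutive accesses is an assumption.

```python
from statistics import mean


def extract_malware_features(malware_id, records):
    """Build one malware feature record from an analysis trace.

    `records` is a list of dicts; only those with kind == "network" are
    used, because OS API calls never appear in the proxy access log.
    Times are assumed to be seconds as floats.
    """
    net = sorted((r for r in records if r["kind"] == "network"),
                 key=lambda r: r["time"])
    if not net:
        return None
    gaps = [b["time"] - a["time"] for a, b in zip(net, net[1:])]
    return {
        "malware_id": malware_id,
        "urls": sorted({r["url"] for r in net}),
        "avg_packet_size": mean(r["size"] for r in net),
        "access_interval": mean(gaps) if gaps else 0.0,
        "user_agent": net[0]["user_agent"],
        "post_count": sum(1 for r in net if r["method"] == "POST"),
    }
```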

In step S603, the malware feature amount extraction function 241 removes duplicate malware feature amounts 300. When the malware analysis function 231 has analyzed multiple malware samples, multiple malware feature amounts 300 are extracted in step S602. Some of them may differ in the malware hash value (malware ID 301) while being identical in every other feature. In that case, only one of those malware feature amounts 300 is kept.
In step S604, the malware feature amount extraction function 241 sends the malware feature amounts 300 deduplicated in step S603 to the regular expression generation function 242.
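The deduplication of step S603 can be sketched as follows, assuming each malware feature amount is a dict with a hypothetical "malware_id" key holding the hash. Two records count as duplicates when every field other than the hash is equal, and the first one seen is kept.

```python
def deduplicate_features(features):
    """Drop records that differ only in "malware_id" (the sample hash)."""
    seen = set()
    unique = []
    for feat in features:
        # Build a hashable key from all fields except the malware hash.
        key = tuple(sorted((k, repr(v)) for k, v in feat.items()
                           if k != "malware_id"))
        if key not in seen:
            seen.add(key)
            unique.append(feat)
    return unique
```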

An example of the processing flow of the regular expression generation function 242 will be described with reference to the flowchart of FIG. 7.
In step S701, the regular expression generation function 242 acquires the malware feature amount 300 output by the malware feature amount extraction function 241. When multiple malware samples have been analyzed, multiple malware feature amounts 300 are acquired.

In step S702, the regular expression generation function 242 sends the malware feature amount 300 acquired in step S701 to the similar URL search function 224 of the log management server 122. On receiving the malware feature amount 300, the similar URL search function 224 uses it as a query to search the access feature amount storage unit 223 for similar URLs, and sends the similar URLs it finds to the regular expression generation function 242. Details of the similar URL search function 224 are described with reference to FIG. 8.
In step S703, the regular expression generation function 242 receives the similar URLs sent by the similar URL search function 224.

In step S704, the regular expression generation function 242 generates a URL regular expression from the set of similar URLs. The method described in Patent Document 1 is adopted as the method for generating a regular expression from a plurality of character strings.
With this method, a regular expression can be generated from a set of similar URLs. In general, however, and also with this method, the regular expression that represents a given set of character strings is not uniquely determined. For example, comparing the two regular expressions http://www.sample.com/path[a-zA-Z]{5,10}.exe and http://www.sample.com/[a-zA-Z]*.exe, every URL represented by the former is also represented by the latter. In that sense, the latter is a coarse-grained regular expression. If a coarse-grained regular expression is stored in the detection rule storage unit 212 of the proxy server 121 and used to check the URLs that the clients 125 try to access, the probability of mistakenly judging a benign URL to be malicious increases. A finer-grained regular expression is therefore preferable.
In step S705, the regular expression generation function 242 checks the granularity of the regular expression using the access log 221. First, the URL regular expression generated in step S704 is matched against the URLs in the access log 221 other than the similar URLs received in step S703, and the match rate is calculated. A recommended value for the match rate (for example, 3%) is determined in advance. If the calculated match rate exceeds the recommended value, the regular expression is too coarse, and the regular expression generated in step S704 is discarded.

In step S706, of the URL regular expressions whose granularity was checked in step S705, the regular expression generation function 242 saves in the detection rule management information storage unit 243 only those whose match rate against the URLs of the access log 221 is equal to or less than the recommended value. At this point, the target device ID, the countermeasure, the setting date, and so on are added to the URL regular expression. The target device ID and the countermeasure may be registered as predetermined values, or may be registered manually by an administrator who has reviewed the generated regular expression.
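The granularity check of steps S705 and S706 can be sketched as follows. The function names are hypothetical, full-string matching is an assumed policy, and the regular expressions in the usage note escape the literal dots, which the examples in the text do not.

```python
import re

RECOMMENDED_MATCH_RATE = 0.03  # example recommended value (3%) from the text


def match_rate(url_regex, log_urls, similar_urls):
    """Fraction of access-log URLs, excluding the similar URLs the regex
    was generated from, that the regex matches in full."""
    similar = set(similar_urls)
    others = [u for u in log_urls if u not in similar]
    if not others:
        return 0.0
    pattern = re.compile(url_regex)
    hits = sum(1 for u in others if pattern.fullmatch(u))
    return hits / len(others)


def accept_regex(url_regex, log_urls, similar_urls,
                 limit=RECOMMENDED_MATCH_RATE):
    """True when the regex is fine-grained enough to be saved as a rule."""
    return match_rate(url_regex, log_urls, similar_urls) <= limit
```

With a log in which one of ten unrelated URLs is http://www.sample.com/setup.exe, the coarse pattern `http://www\.sample\.com/[a-zA-Z]*\.exe` matches that URL and is rejected, while the finer pattern `http://www\.sample\.com/path[a-zA-Z]{5,10}\.exe` does not match it and is accepted.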

An example of the processing flow of the similar URL search function 224 of the log management server 122 will be described with reference to the flowchart of FIG. 8.
In step S801, the similar URL search function 224 receives the malware feature amount 300 sent by the regular expression generation function 242.

In step S802, the similar URL search function 224 uses the features of the malware feature amount 300 other than the connection destination URL as a search key and searches the access feature amounts stored in the access feature amount storage unit 223. An access feature amount is regarded as similar to the malware feature amount 300 when the distance function of Equation 1 is smaller than a preset threshold, and the set of similar access feature amounts is obtained.
The distance function d(Cm, Ca) (Equation 1) defined between a malware feature amount Cm and an access feature amount Ca takes, for each corresponding feature other than the connection destination URL, the absolute value of the difference, multiplies it by a per-feature weighting coefficient wf, and uses the linear sum of these terms as the distance. For non-numeric features such as User-Agent, a discrete distance is used: 0 if the values match and 1 if they do not.
(Equation 1)
$$ d(Cm, Ca) = wf_1 \cdot |cm_{pk} - ca_{pk}| + wf_2 \cdot |cm_{atime} - ca_{atime}| + wf_3 \cdot f(cm_{ua}, ca_{ua}) + wf_4 \cdot |cm_{pfreq} - ca_{pfreq}| + \cdots $$
Here, cm_pk, cm_atime, cm_ua, and cm_pfreq are the malware feature amounts (average packet size, access time interval, User-Agent, Post count); ca_pk, ca_atime, ca_ua, and ca_pfreq are the access feature amounts (average packet size, access time interval, User-Agent, Post count); wf1, wf2, wf3, and wf4 are the per-feature weighting coefficients; and f(cm_ua, ca_ua) is the distance function for the User-Agent feature, equal to 0 when cm_ua = ca_ua and 1 when cm_ua ≠ ca_ua.
As a result of the search, the set of matching access feature amounts (sessions) is obtained.
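Equation 1 restricted to the four features named in the text can be sketched as follows. The dictionary keys mirror the subscripts of Equation 1 (pk, atime, ua, pfreq); the weight values and the threshold are not given in the text and must be supplied by the caller.

```python
def feature_distance(cm, ca, weights):
    """Weighted distance of Equation 1 between a malware feature record
    `cm` and an access feature record `ca` (destination URL excluded)."""
    d = weights["pk"] * abs(cm["pk"] - ca["pk"])
    d += weights["atime"] * abs(cm["atime"] - ca["atime"])
    # Non-numeric feature: discrete distance, 0 on match and 1 otherwise.
    d += weights["ua"] * (0 if cm["ua"] == ca["ua"] else 1)
    d += weights["pfreq"] * abs(cm["pfreq"] - ca["pfreq"])
    return d


def similar_sessions(cm, sessions, weights, threshold):
    """Access feature records whose distance to `cm` is below `threshold`."""
    return [ca for ca in sessions
            if feature_distance(cm, ca, weights) < threshold]
```

Because the User-Agent term contributes either 0 or wf3, choosing wf3 at or above the threshold makes a User-Agent mismatch disqualifying on its own. That is a tuning choice, not something the text prescribes.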

In step S803, the similar URL search function 224 acquires the connection destination URLs included in the similar access feature amounts (sessions) obtained in step S802.
In step S804, the similar URL search function 224 selects, from the connection destination URLs acquired in step S803, those that are similar to the connection destination URL included in the malware feature amount received in step S801. Specifically, a distance function is defined between the connection destination URL included in the malware feature amount and a connection destination URL included in a similar access feature amount (session), and pairs whose distance is smaller than a preset threshold are regarded as similar. As the distance function, the edit distance (also called the Levenshtein distance), which measures how close two character strings are, can be used, among others.
The edit distance (Levenshtein distance) is a numerical value indicating how different two character strings are. Specifically, it is the minimum number of operations, each an insertion, deletion, or substitution of a single character, needed to transform one character string into the other.
In step S805, the similar URL search function 224 sends the similar URLs found in step S804 to the regular expression generation function 242.
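The URL comparison of step S804 can be sketched with the standard dynamic-programming computation of the Levenshtein distance. The function names and the threshold are hypothetical; the text names the edit distance only as one distance function that can be used.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similar_urls(malware_url, candidate_urls, threshold):
    """Candidate URLs whose edit distance to `malware_url` is below
    `threshold`."""
    return [u for u in candidate_urls
            if edit_distance(malware_url, u) < threshold]
```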

As described above, in the unauthorized access detection system 100 of this embodiment, as shown in FIG. 2, the process of creating a new URL regular expression detection rule is started either when a user submits a malware copy (malware sample) to the malware analysis server 123 or when the malware analysis server 123 determines that it has been infected with malware via the network 101.
The malware analysis server 123 analyzes the access behavior of the submitted or contracted malware and accumulates access records. The detection rule setting server 124 extracts the malware feature amount from the malware analysis result (time-series records of the accesses made by the malware) and instructs the log management server 122 to search the past access logs of the clients 125 on the local area network, which the log management server 122 manages, for similar access feature amounts. The log management server 122 extracts access feature amounts from past access logs as appropriate and stores them. From the access feature amounts similar to the malware feature amount, it extracts the connection destination URLs that are similar to the connection destination URL included in the malware feature amount, and reports them to the detection rule setting server 124 as similar URLs. The detection rule setting server 124 creates a new URL regular expression based on the connection destination URL included in the malware feature amount and the similar URLs, and judges whether the new URL regular expression is suitable as a detection rule by calculating its match rate against the URLs included in the access log. When the calculated match rate is equal to or less than the recommended value, the new URL regular expression is saved in the detection rule management information storage unit 243 and set in the detection rule storage unit 212 of the proxy server 121, where it is used by the malicious URL detection function from then on.

In this embodiment, the unauthorized access detection system 100 has been described as an example divided into the proxy server 121, the log management server 122, the malware analysis server 123, and the detection rule setting server 124. However, configurations in which any two or more of these servers are implemented on the same server are also possible. Configurations in which, for example, the proxy server 121 is implemented by distributed processing across a plurality of servers are also possible, as are configurations in which all the servers are implemented on a single server.

100: unauthorized access detection system, 101: network, 110: Internet, 111: attacker server, 120: local area network, 121: proxy server, 122: log management server, 123: malware analysis server, 124: detection rule setting server, 125: client, 130: firewall, 211: malicious URL detection function, 212: detection rule, 213: access log, 221: access log, 222: access feature amount extraction function, 223: access feature amount, 224: similar URL search function, 231: malware analysis function, 241: malware feature amount extraction function, 242: regular expression generation function, 243: detection rule management information, 244: detection rule setting function, 300: malware feature amount

Claims (6)

1. An unauthorized access detection method that generates a URL regular expression for unauthorized access detection from a trace of malware access behavior obtained from a malware analysis result and updates a detection rule, the method comprising:
a step of extracting a malware feature amount from trace analysis of the access behavior of new malware collected by an analyst or contracted via the network;
a step of searching an access feature amount storage unit, in which access feature amounts extracted as needed from past access logs on the network are recorded, for similar URLs that satisfy a distance within a predetermined threshold, using the malware feature amount as a query;
a step of generating a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URLs; and
a step of applying the URL regular expression to pattern matching against the connection destination URLs included in the access log, calculating the match rate, and setting the URL regular expression as a new detection rule when the match rate is equal to or less than a recommended value.
2. The unauthorized access detection method according to claim 1, wherein the step of searching the access feature amount storage unit for similar URLs that satisfy a distance within a predetermined threshold, using the malware feature amount as a query, comprises:
a first step of determining an access feature amount to be a similar access feature amount when the value of a distance function defined between the features of the malware feature amount other than the connection destination URL and the corresponding features of the access feature amount other than the connection destination URL is smaller than a predetermined threshold; and
a second step of determining a connection destination URL to be a similar URL, and retrieving it, when the value of a distance function for the difference between character strings, defined between the connection destination URL of the malware feature amount and the connection destination URL included in the similar access feature amount, is smaller than a predetermined threshold.
3. The unauthorized access detection method according to claim 2, wherein the malware feature amount and the access feature amount each include at least the following feature data items: connection destination URL, average packet size, access time interval, User-Agent, and Post count.
4. An unauthorized access detection system configured on a plurality of servers connected to a network that is connected to the Internet, the system comprising:
a malware analysis function that executes new malware, which has infected a client or has been collected, in a virtual test environment and generates a trace of the access behavior of the malware;
a malware feature amount extraction function that extracts a malware feature amount from the trace of the access behavior of the malware;
an access feature amount extraction function that stores and manages the past access logs of clients, extracts access feature amounts from the access logs as appropriate, and stores them in an access feature amount storage unit;
a similar URL search function that searches the access feature amount storage unit for similar URLs that satisfy a distance within a predetermined threshold, using the malware feature amount as a query;
a regular expression generation function that generates a URL regular expression from the connection destination URL of the malware feature amount and the retrieved similar URLs, applies the URL regular expression to pattern matching against the connection destination URLs included in the access log, calculates the match rate, and adds the URL regular expression to the detection rules as a new rule when the match rate is equal to or less than a recommended value; and
a malicious URL detection function that applies the detection rules, updated by adding the URL regular expression, to a URL to be accessed and determines whether the access is unauthorized.
5. The unauthorized access detection system according to claim 4, wherein the similar URL search function:
determines an access feature amount in the access feature amount storage unit to be a similar access feature amount, and retrieves it, when the value of a distance function defined between the features of the malware feature amount other than the connection destination URL and the corresponding features of the access feature amount other than the connection destination URL is smaller than a predetermined threshold; and
determines a connection destination URL to be a similar URL, and retrieves it, when the value of a distance function for the difference between character strings, defined between the connection destination URL of the malware feature amount and the connection destination URL included in the similar access feature amount, is smaller than a predetermined threshold.
6. The unauthorized access detection system according to claim 5, wherein the malware feature amount and the access feature amount each include at least the following feature data items: connection destination URL, average packet size, access time interval, User-Agent, and Post count.
PCT/JP2014/052288 2014-01-31 2014-01-31 Unauthorized-access detection method and detection system Ceased WO2015114804A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2014/052288 WO2015114804A1 (en) 2014-01-31 2014-01-31 Unauthorized-access detection method and detection system
JP2015559696A JP6039826B2 (en) 2014-01-31 2014-01-31 Unauthorized access detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/052288 WO2015114804A1 (en) 2014-01-31 2014-01-31 Unauthorized-access detection method and detection system

Publications (1)

Publication Number Publication Date
WO2015114804A1 true WO2015114804A1 (en) 2015-08-06

Family

ID=53756416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/052288 Ceased WO2015114804A1 (en) 2014-01-31 2014-01-31 Unauthorized-access detection method and detection system

Country Status (2)

Country Link
JP (1) JP6039826B2 (en)
WO (1) WO2015114804A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017146670A (en) * 2016-02-15 2017-08-24 Necプラットフォームズ株式会社 Router device and router device filtering method
JP2018132787A (en) * 2017-02-13 2018-08-23 株式会社日立ソリューションズ Log analysis support apparatus and log analysis support method
JP2018525717A (en) * 2016-01-12 2018-09-06 ▲騰▼▲訊▼科技(深▲セン▼)有限公司 Search processing method and device
WO2018159337A1 (en) * 2017-03-03 2018-09-07 日本電信電話株式会社 Profile generation device, attack detection apparatus, profile generation method, and profile generation program
JP2018142927A (en) * 2017-02-28 2018-09-13 沖電気工業株式会社 System and method for addressing malware unauthorized communication
JP2019047335A (en) * 2017-09-01 2019-03-22 日本電信電話株式会社 Detector, detection method, and detection program
WO2019225251A1 (en) * 2018-05-21 2019-11-28 日本電信電話株式会社 Learning method, learning device and learning program
JPWO2021106172A1 (en) * 2019-11-28 2021-06-03

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090265786A1 (en) * 2008-04-17 2009-10-22 Microsoft Corporation Automatic botnet spam signature generation
JP2012118713A (en) * 2010-11-30 2012-06-21 Nippon Telegr & Teleph Corp <Ntt> List generation method, list generation device, and list generation program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2638205C (en) * 2007-07-26 2015-06-16 Magna International Inc. Truck box with external storage structural frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAKEO HARIU: "Shinka suru Kyoi to Korekara no Cyber Security", NTT GIJUTSU JOURNAL, vol. 24, no. 8, 1 August 2012 (2012-08-01), pages 13 - 17 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018525717A (en) * 2016-01-12 2018-09-06 ▲騰▼▲訊▼科技(深▲セン▼)有限公司 Search processing method and device
JP2017146670A (en) * 2016-02-15 2017-08-24 Necプラットフォームズ株式会社 Router device and router device filtering method
JP2018132787A (en) * 2017-02-13 2018-08-23 株式会社日立ソリューションズ Log analysis support apparatus and log analysis support method
JP2018142927A (en) * 2017-02-28 2018-09-13 沖電気工業株式会社 System and method for addressing malware unauthorized communication
JPWO2018159337A1 (en) * 2017-03-03 2019-06-27 日本電信電話株式会社 Profile generation device, attack detection device, profile generation method, and profile generation program
WO2018159337A1 (en) * 2017-03-03 2018-09-07 日本電信電話株式会社 Profile generation device, attack detection apparatus, profile generation method, and profile generation program
US11470097B2 (en) 2017-03-03 2022-10-11 Nippon Telegraph And Telephone Corporation Profile generation device, attack detection device, profile generation method, and profile generation computer program
JP2019047335A (en) * 2017-09-01 2019-03-22 日本電信電話株式会社 Detector, detection method, and detection program
WO2019225251A1 (en) * 2018-05-21 2019-11-28 日本電信電話株式会社 Learning method, learning device and learning program
JPWO2019225251A1 (en) * 2018-05-21 2020-12-10 日本電信電話株式会社 Learning methods, learning devices and learning programs
JPWO2021106172A1 (en) * 2019-11-28 2021-06-03
JP7315023B2 (en) 2019-11-28 2023-07-26 日本電信電話株式会社 Rule generator and rule generator
US12282550B2 (en) 2019-11-28 2025-04-22 Nippon Telegraph And Telephone Corporation Rule generating device and rule generating program

Also Published As

Publication number Publication date
JP6039826B2 (en) 2016-12-07
JPWO2015114804A1 (en) 2017-03-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14880557

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2015559696

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14880557

Country of ref document: EP

Kind code of ref document: A1