
WO2018105320A1 - Information processing device, information processing method, and program

Information processing device, information processing method, and program

Info

Publication number
WO2018105320A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning dictionary
noise
training data
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2017/040727
Other languages
English (en)
Japanese (ja)
Inventor
良太 高橋
崇光 佐々木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2017207085A external-priority patent/JP6782679B2/ja
Application filed by Panasonic Intellectual Property Corp of America filed Critical Panasonic Intellectual Property Corp of America
Priority to EP17877549.0A priority Critical patent/EP3553712B1/fr
Priority to CN201780022736.0A priority patent/CN109074519B/zh
Publication of WO2018105320A1 publication Critical patent/WO2018105320A1/fr
Priority to US16/255,877 priority patent/US10601852B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • This disclosure relates to an abnormality detection technology used in an in-vehicle network or the like.
  • In recent years, automobiles have been equipped with a large number of electronic control units (hereinafter referred to as ECUs) for controlling various systems.
  • The ECUs are connected to an in-vehicle network, and communication is performed through the in-vehicle network in order to realize various functions of the automobile.
  • A typical standard for such in-vehicle networks is the Controller Area Network (CAN).
  • A network conforming to the CAN protocol can be constructed as a closed communication path within a single vehicle.
  • However, it is not uncommon for the network in each automobile to be built and installed so that it can be accessed from the outside.
  • For example, an in-vehicle network may be provided with a port for extracting information flowing through the network for the purpose of diagnosing each in-vehicle system, or a car navigation system having a wireless LAN function may be connected to it. Allowing external access to the in-vehicle network can improve convenience for automobile users, but it also increases threats.
  • An attack frame is an abnormal frame that differs in some way from the normal frames flowing through an in-vehicle network that is not under attack.
  • As a countermeasure, a technique is disclosed in which abnormal data detection processing for frames flowing on a CAN bus (hereinafter also referred to as CAN messages or simply messages) is executed using an evaluation model obtained as a result of learning with learning data (see Patent Document 1 and Patent Document 2).
  • This disclosure provides an information processing apparatus and the like that are useful for detecting anomalies due to attacks in an in-vehicle network of a vehicle such as an automobile.
  • An information processing apparatus according to one aspect of the present disclosure includes a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each being an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) that are M-dimensional hypercubes of the same size; a first noise addition step of acquiring the number S of data elements (S is an integer of 0 or more) included in each of the third regions and adding noise elements to each third region that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • An information processing method according to one aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor. The method includes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each being an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of the same size; a first noise addition step of acquiring the number S of data elements (S is an integer of 0 or more) included in each of the third regions and adding noise elements to each third region that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • The present disclosure thus provides an information processing apparatus and the like that can quickly provide a learning dictionary with a reduced false detection rate, used for detecting anomalies caused by attacks in an in-vehicle network of a vehicle such as an automobile.
  • FIG. 1A is a block diagram illustrating a configuration example of an abnormality detection system including an information processing device according to Embodiment 1.
  • FIG. 1B is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 1C is a block diagram illustrating a configuration example of an abnormality detection system including the information processing apparatus according to Embodiment 1.
  • FIG. 2 is a block diagram illustrating a configuration example of an abnormality determination unit and a learning unit that configure the above-described abnormality detection system.
  • FIG. 3 is a schematic diagram for explaining a learning dictionary generated by the learning unit using training data.
  • FIG. 4 is a schematic diagram for explaining the abnormality determination by the abnormality determination unit.
  • FIG. 5 is a diagram showing a data flow in the learning unit that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit that performs abnormality determination.
  • FIG. 7 is a diagram illustrating an example of an inappropriate determination boundary that does not fit the distribution of training data.
  • FIG. 8 is a flowchart illustrating an example of a training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 9A is a diagram illustrating an example of training data before normalization distributed in an M-dimensional space.
  • FIG. 9B is a diagram illustrating an example of training data after normalization distributed in an M-dimensional space.
  • FIG. 9C is a diagram illustrating an example of training data after addition of noise elements distributed in the M-dimensional space.
  • FIG. 10 is a flowchart showing another example of the training data processing method for obtaining an appropriate learning dictionary, which is executed in the abnormality detection system.
  • FIG. 11A is a diagram for explaining an example of division of an M-dimensional region in the M-dimensional space.
  • FIG. 11B is a diagram for describing an example of training data after adding noise elements distributed in an M-dimensional space.
  • FIG. 12A is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12B is a diagram illustrating a determination boundary of a learning dictionary generated using training data without adding noise and a determination boundary of a learning dictionary generated using the same training data added with noise.
  • FIG. 12C is a bar graph showing a false detection rate in an abnormality detection test performed using each learning dictionary whose determination boundaries are shown in FIGS. 12A and 12B.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system according to the second embodiment, for determining which training data processing method to select and whether to perform a parameter search in each processing method.
  • FIG. 14 is a flowchart illustrating an example of a processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • FIG. 15 is a flowchart illustrating another example of the processing method for obtaining a more appropriate learning dictionary, which is executed in the abnormality detection system according to the second embodiment.
  • Another countermeasure is to monitor the CAN messages flowing through the in-vehicle network.
  • This method can be realized by adding a monitoring ECU (node) to each vehicle, and is relatively easy to introduce.
  • Proposed monitoring methods can be roughly divided into three types: rule-based methods, methods that use the data transmission cycle, and methods that detect outliers in message contents using LOF (Local Outlier Factor).
  • Rule-based methods and methods using the data transmission cycle can deal with known attack patterns, but detecting unknown attack patterns requires detection based on the content of the messages, such as a method using LOF.
  • However, the ECUs connected to an in-vehicle network do not always have sufficient data processing capacity and storage capacity, and a method is not practical if, in such an execution environment, it cannot perform detection at the speed required for a car traveling on a road at several tens of kilometers per hour.
  • The present inventors therefore conceived of using an anomaly detection algorithm called Isolation Forest, or iForest (see Non-Patent Document 1), which requires less retained data and less computation than LOF, as an anomaly detection method for an in-vehicle network. Furthermore, the present inventors propose a technology that, when Isolation Forest is used, allows anomaly detection to be executed at the required speed and with the highest possible accuracy even with limited computer resources.
  • An information processing apparatus according to one aspect of the present disclosure is an information processing apparatus including a processor, and the processor executes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each being an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of the same size; a first noise addition step of acquiring the number S of data elements (S is an integer of 0 or more) included in each of the third regions and adding noise elements to each third region that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • The processor may further execute a first determination step of determining whether N is equal to or greater than a predetermined second threshold value; when it is determined in the first determination step that N is not equal to or greater than the second threshold value, the division step, the first noise addition step, the generation step, and the learning dictionary data output step may be executed.
  • When the processor determines in the first determination step that N is equal to or greater than the second threshold value, the processor may execute a second noise addition step of adding K noise elements (K is a natural number smaller than N), each being an M-dimensional vector, to the second region at a density different from that of the data elements of the training data, and then execute the generation step and the learning dictionary data output step.
  • When the processor determines in the first determination step that N is not equal to or greater than the second threshold value, the processor may further acquire test data for Isolation Forest and execute a second determination step of determining whether N is equal to or greater than a predetermined third threshold value. When it is determined in the second determination step that N is not equal to or greater than the third threshold value, a set of the above steps is executed a plurality of times using different values of L in the division step to output a plurality of learning dictionary data; abnormality detection on the test data is then executed using each of the plurality of learning dictionary data, an evaluation step evaluates each of the plurality of learning dictionary data based on the result of the abnormality detection, and a learning dictionary data selection step selects the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the second determination step that N is equal to or greater than the third threshold value, the set may be executed once using a predetermined value of L in the division step.
  • Thereby, the learning dictionary can be generated at a speed suitable for the execution environment.
  • The processor may determine the number of different values of L so as to have a negative correlation with the value of N.
  • Thereby, the learning dictionary can be generated at a speed suitable for the execution environment.
  • The processor may determine, as the value of the first threshold T, any number smaller than the median of the numbers of data elements included in the third regions within the first region.
  • Thereby, the learning dictionary can be generated at a speed suitable for the execution environment.
  • When the processor determines in the first determination step that N is equal to or greater than the second threshold value, the processor may further acquire test data for Isolation Forest and execute a third determination step of determining whether N is equal to or greater than a predetermined fourth threshold value. When it is determined in the third determination step that N is not equal to or greater than the fourth threshold value, a set of the above steps is executed a plurality of times using different values of K in the second noise addition step to output a plurality of learning dictionary data; abnormality detection on the test data is then executed using each of the plurality of learning dictionary data, an evaluation step evaluates each of the plurality of learning dictionary data based on the result of the abnormality detection, and a learning dictionary data selection step selects the best learning dictionary data from the plurality of learning dictionary data based on the result of the evaluation step. When it is determined in the third determination step that N is equal to or greater than the fourth threshold value, the set may be executed once using a predetermined value of K in the second noise addition step.
  • Thereby, the learning dictionary can be generated at a speed suitable for the execution environment.
  • The processor may determine the number of different values of K so as to have a negative correlation with the value of N.
  • Thereby, the learning dictionary can be generated at a speed suitable for the execution environment.
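  • As an illustration of such a negative correlation, the following sketch (a hypothetical helper, not taken from the patent) maps the number N of training data elements to the number of candidate values of L or K to try; larger N yields fewer candidates so that dictionary generation still finishes within the available time.

```python
def candidate_count(n_training: int, max_candidates: int = 8, scale: int = 10_000) -> int:
    """Hypothetical mapping with a negative correlation: the more training data
    elements N, the fewer parameter candidates (values of L or K) are searched."""
    count = max_candidates - n_training // scale
    return max(1, min(max_candidates, count))

# Usage sketch: N = 1,000 -> 8 candidates, N = 30,000 -> 5, N = 100,000 -> 1
for n in (1_000, 30_000, 100_000):
    print(n, candidate_count(n))
```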
  • The first region may be a region defined by the hypercube [0, 1]^M in the M-dimensional space, and the second region may be a region defined by the hypercube [−0.5, 1.5]^M in the M-dimensional space.
  • An abnormality detection system according to one aspect of the present disclosure includes any one of the information processing apparatuses described above and an abnormality determination device that includes a memory storing the learning dictionary data output from the information processing apparatus and a processor, and that is connected to a network; the processor of the abnormality determination device acquires data flowing through the network and executes abnormality determination of the acquired data based on the learning dictionary data stored in the memory.
  • Thereby, abnormality detection is performed using a learning dictionary that is updated quickly while taking accuracy into consideration.
  • An information processing method according to one aspect of the present disclosure is an information processing method executed using an information processing apparatus including a processor. The method includes: a data element acquisition step of receiving input of N data elements (N is an integer of 2 or more), each being an M-dimensional vector (M is an integer of 2 or more), used as training data for Isolation Forest; a normalization step of normalizing the training data so that it is distributed over an M-dimensional first region; a division step of dividing an M-dimensional second region, which is larger than the first region and includes the first region, into L^M third regions (L is an integer of 4 or more) of the same size; a first noise addition step of acquiring the number S of data elements (S is an integer of 0 or more) included in each of the third regions and adding noise elements to each third region that includes fewer data elements than a first threshold T (T is a natural number); a generation step of generating noise-added training data including the noise elements; and a learning dictionary data output step of generating and outputting Isolation Forest learning dictionary data using the noise-added training data.
  • A program according to an aspect of the present disclosure is a program that causes a processor included in a computer to execute the above information processing method.
  • FIGS. 1A to 1C are block diagrams respectively showing a configuration example of an abnormality detection system including an information processing apparatus according to Embodiment 1.
  • FIGS. 1A to 1C show abnormality detection systems 100A, 100B, and 100C having different configurations, respectively.
  • The abnormality detection systems 100A to 100C are systems that detect anomalies in data flowing through a monitored network using an algorithm called Isolation Forest, and each includes an abnormality determination unit 110 and a learning unit 120.
  • The abnormality determination unit 110 determines whether data flowing through the in-vehicle network 210 included in the vehicle 20 is normal or abnormal.
  • The vehicle 20 is, for example, an automobile.
  • The in-vehicle network 210 is a network conforming to, for example, the CAN standard, and in each of the configuration examples of FIGS. 1A to 1C it includes a bus and a plurality of ECUs and diagnostic ports connected to the bus.
  • The plurality of ECUs include ECUs having different functions, such as an ECU that collects and analyzes measurement data from various sensors, an ECU that controls the engine, an ECU that controls the brakes, and an ECU that monitors the network.
  • The data flowing through the in-vehicle network 210 is message data flowing through the bus.
  • The learning unit 120 performs prior learning so that the abnormality determination unit 110 can make the above determination. More specifically, the learning unit 120 learns using training data and generates a learning dictionary that the abnormality determination unit 110 uses for the determination.
  • The data of the generated learning dictionary (hereinafter also referred to as learning dictionary data) is stored, for example, in a storage device (not shown).
  • The abnormality determination unit 110 reads the learning dictionary from the storage device and determines whether unknown data to be judged normal or abnormal, that is, message data acquired from the in-vehicle network 210, is abnormal based on whether it deviates from the learning dictionary. More specifically, the learning dictionary generated by the learning unit 120 includes a plurality of binary trees, and the abnormality determination unit 110 determines whether the data is abnormal using the average of the scores calculated from the plurality of binary trees. The binary tree used in Isolation Forest is called an Isolation Tree or iTree.
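  • As a concrete illustration of this scoring, the following minimal sketch uses scikit-learn's IsolationForest class (the choice of library is an assumption; the patent does not prescribe an implementation). The ensemble of iTrees yields an average anomaly score per sample, and points falling outside the learned boundary are labeled -1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
train = rng.normal(loc=0.5, scale=0.1, size=(1000, 2))  # stand-in for normal message features

# Fit an ensemble of isolation trees (iTrees); this plays the role of the learning dictionary.
model = IsolationForest(n_estimators=100, random_state=0).fit(train)

queries = np.array([[0.5, 0.5],   # close to the training distribution
                    [1.4, 1.4]])  # far outside it
print(model.decision_function(queries))  # score averaged over the iTrees; lower = more anomalous
print(model.predict(queries))            # +1 = normal, -1 = abnormal
```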
  • The abnormality determination unit 110 and the learning unit 120 are functional components provided by a processor that reads and executes a predetermined program. The configuration examples shown in FIGS. 1A to 1C differ in where the processors that provide these functional components are located.
  • In the configuration of FIG. 1A, the learning unit 120 is provided by a processor and a memory included in the external server 10, a so-called server computer outside the vehicle 20.
  • The external server 10 is one example of the information processing apparatus in the present embodiment.
  • The learning unit 120 acquires, for example, messages flowing through the in-vehicle network 210 as training data from the vehicle 20 via a communication network.
  • The learning unit 120 also outputs Isolation Forest learning dictionary data generated using the training data and provides it to the abnormality determination unit 110 of the vehicle 20 via the communication network.
  • The learning dictionary data is stored in a storage device, such as the flash memory of a microcontroller included in a monitoring ECU connected to the in-vehicle network 210 for network monitoring, and the abnormality determination unit 110 is provided by the processor of that microcontroller.
  • The abnormality determination unit 110 performs abnormality determination on messages acquired from the bus, using the learning dictionary data acquired from the storage device.
  • In this configuration, learning dictionary data updated after shipment of the vehicle 20 can be provided to the abnormality determination unit 110.
  • In the configuration of FIG. 1B, both the abnormality determination unit 110 and the learning unit 120 are provided by a processor and a memory included in the external server 10 outside the vehicle 20.
  • Such an external server 10 is also an example of the information processing apparatus in the present embodiment.
  • The learning unit 120 acquires, for example, messages flowing through the in-vehicle network 210 as training data from the vehicle 20 via a communication network.
  • The learning unit 120 outputs the Isolation Forest learning dictionary data generated using the training data, but the output destination is not outside the external server 10; the data is stored in a storage device provided in the external server 10, for example a hard disk drive (not shown).
  • In this configuration, the abnormality determination is performed on the external server 10 instead of on the vehicle 20. That is, the messages flowing through the in-vehicle network 210 are transmitted to the external server 10 via the communication network, and the messages received by the external server 10 are input to the abnormality determination unit 110.
  • The abnormality determination unit 110 acquires the learning dictionary data from the storage device, performs abnormality determination of the messages using the learning dictionary data, and transmits the result to the vehicle 20 via the communication network.
  • In this configuration, the learning dictionary data used by the abnormality determination unit 110 in the external server 10 is updated as needed.
  • In the configuration of FIG. 1C, both the abnormality determination unit 110 and the learning unit 120 are provided by a microcontroller in a monitoring ECU that is connected to the in-vehicle network 210 of the vehicle 20 and monitors the in-vehicle network 210.
  • In this case, the monitoring ECU is one example of the information processing apparatus in the present embodiment.
  • The learning unit 120 directly acquires, for example, messages flowing through the in-vehicle network 210 and uses them as training data.
  • The learning unit 120 outputs the Isolation Forest learning dictionary data generated using the training data, but the output destination is not outside the vehicle 20; the data is stored in a storage device on the vehicle 20, for example a flash memory in the monitoring ECU.
  • In this configuration, learning dictionary generation and abnormality determination are both performed on the vehicle 20.
  • That is, the learning unit 120 acquires message data flowing through the in-vehicle network 210 to which the monitoring ECU is connected and uses it as training data to generate a learning dictionary.
  • The generated learning dictionary data is stored in the storage device of the monitoring ECU.
  • The abnormality determination unit 110 then acquires the learning dictionary data from the storage device and executes abnormality determination of the messages using the learning dictionary data.
  • In this configuration, the learning dictionary data used by the abnormality determination unit 110 on the vehicle 20 can be updated.
  • Note that each configuration shown in FIGS. 1A to 1C need not be fixed on the vehicle 20 after shipment and may be dynamically changeable. For example, switching between these configurations may be possible depending on the communication speed between the vehicle 20 and the external server 10, the usage rate of the computer resources of the monitoring ECU, the remaining battery charge when the vehicle 20 is an electric vehicle, or an operation by the driver.
  • FIG. 2 is a block diagram illustrating a configuration example of the abnormality determination unit 110 and the learning unit 120 included in the abnormality detection system 100.
  • The learning unit 120 includes a training data receiving unit 122 and a learning dictionary generation unit 124.
  • The training data receiving unit 122 receives input of training data.
  • The training data here consists of two or more M-dimensional vectors, where M is an integer of 2 or more.
  • The value of each dimension is, for example, the value of each byte from the beginning of the payload of a CAN message, which has a maximum of 8 bytes.
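  • A minimal sketch of this feature extraction is shown below, under the assumption that each byte of the up-to-8-byte CAN payload becomes one component of an 8-dimensional vector (zero-padding of shorter payloads is an illustrative choice, not something stated here).

```python
import numpy as np

def can_payload_to_vector(payload: bytes, dims: int = 8) -> np.ndarray:
    """Map each byte of a CAN payload (at most 8 bytes) to one vector component;
    shorter payloads are padded with zeros for illustration."""
    values = list(payload[:dims]) + [0] * max(0, dims - len(payload))
    return np.asarray(values, dtype=float)

# Usage sketch
msg = bytes([0x12, 0x34, 0x00, 0xFF, 0x7A])
print(can_payload_to_vector(msg))  # [ 18.  52.   0. 255. 122.   0.   0.   0.]
```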
  • The learning unit 120 generates learning dictionary data using the training data received by the training data receiving unit 122 and outputs the learning dictionary data to a storage unit 112 of the abnormality determination unit 110, described later.
  • In FIG. 3, the data elements are a point group distributed in the M-dimensional space, each point is indicated by a white circle, and the learning dictionary is a boundary in the M-dimensional space indicated by a thick solid line.
  • In the following, this boundary is also referred to as a determination boundary.
  • In FIG. 3, the determination boundary is shown as a boundary line.
  • The abnormality determination unit 110 includes a storage unit 112, a determination target data receiving unit 114, a determination target data conversion unit 116, and a determination execution unit 118.
  • The storage unit 112 stores the learning dictionary data output from the learning unit 120 as described above. Data used for conversion of the determination target data, described later, is also stored in the storage unit 112.
  • The determination target data receiving unit 114 acquires data that is the target of abnormality determination, that is, CAN messages, from the in-vehicle network 210.
  • The determination target data conversion unit 116 converts the CAN message received by the determination target data receiving unit 114 into a format that the determination execution unit 118 can process. In this conversion, for example, the portion to be judged is extracted from the CAN message and normalized using the conversion data for the determination target data. The normalization will be described later.
  • The determination execution unit 118 determines whether the determination target data is normal or abnormal, that is, performs abnormality determination, based on the learning dictionary stored as learning dictionary data in the storage unit 112.
  • FIG. 4 is a schematic diagram for explaining this abnormality determination.
  • In FIG. 4, two pieces of data, determination target data A and determination target data B, are plotted in the M-dimensional space based on their values.
  • The determination execution unit 118 determines whether each piece of data is normal or abnormal based on whether it is positioned inside or outside the determination boundary of the learning dictionary, and outputs the result.
  • In this example, determination target data A, located inside the determination boundary, is determined to be normal,
  • and determination target data B, located outside the determination boundary, is determined to be abnormal.
  • When an abnormality is determined, the monitoring ECU including the abnormality determination unit 110 and the learning unit 120 executes, for example, another program that receives the determination result as input and outputs an error message to the bus, or transmits a command for restricting part or all of the functions of another ECU or shifting another ECU to a special operation mode corresponding to the abnormality.
  • A notification of the occurrence of an abnormality may also be issued to the driver of the vehicle 20 by a display on the instrument panel or by voice.
  • Information regarding the occurrence of an abnormality may be recorded in a log. This log is acquired and used, for example, by a mechanic of the vehicle 20 through a diagnostic port included in the in-vehicle network 210.
  • Each component of the abnormality determination unit 110 and the learning unit 120 executes a part of the Isolation Forest algorithm, and cooperates as described above to execute the entire Isolation Forest algorithm.
  • FIG. 5 is a diagram illustrating a data flow in the learning unit 120 that generates the learning dictionary.
  • FIG. 6 is a diagram illustrating a data flow in the abnormality determination unit 110 that performs abnormality determination. These diagrams are based on sequence diagrams showing the flow of data and are drawn in a form that also serves as a flowchart showing the processing order in each unit.
  • The training data receiving unit 122 receives input and acquires training data (step S51). If the learning dictionary is generated before the vehicle 20 is shipped, the training data input source is, for example, a location in a storage device that is manually specified or preset at that stage. If the learning dictionary is generated after the vehicle 20 is shipped, the input source is, for example, the in-vehicle network 210 to which the monitoring ECU including the learning unit 120 is connected.
  • The learning dictionary generation unit 124 normalizes the input training data (step S52) and generates a learning dictionary by the Isolation Forest method using the normalized training data (step S53).
  • Normalization is a calculation process that converts the original distribution range of the input training data in the M-dimensional space so that the distribution range fits within a predetermined region of the same space while maintaining the relative positional relationship of the training data.
  • The generated learning dictionary data is passed to the abnormality determination unit 110 (step S54), and the abnormality determination unit 110 stores the learning dictionary data in the storage unit 112 (step S55).
  • The data used in the normalization calculation process is also passed from the learning unit 120 to the abnormality determination unit 110.
  • This data includes the maximum and minimum values of each component of the feature vector, which are necessary for the conversion.
  • Normalization of unknown data to be judged is later executed using this data.
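  • A sketch of this normalization is shown below, assuming simple per-component min-max scaling into [0, 1] (consistent with the first region described later); the per-component minima and maxima are exactly the conversion data passed to the abnormality determination unit.

```python
import numpy as np

def fit_minmax(train: np.ndarray):
    """Learn per-component minimum and maximum values from the training data."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(data: np.ndarray, mins: np.ndarray, maxs: np.ndarray) -> np.ndarray:
    """Scale data component-wise into [0, 1] using the stored conversion data."""
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard against constant components
    return (data - mins) / span

train = np.random.default_rng(1).uniform(0, 255, size=(500, 8))
mins, maxs = fit_minmax(train)                       # stored together with the learning dictionary
normalized_train = apply_minmax(train, mins, maxs)   # used for dictionary generation (steps S52/S53)
unknown = np.array([[10, 200, 3, 128, 0, 99, 255, 42]], dtype=float)
normalized_unknown = apply_minmax(unknown, mins, maxs)  # same coefficients reused at detection time (step S63)
```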
  • In the abnormality determination, the determination target data receiving unit 114 first acquires the data of a CAN message that is the target of abnormality determination from the in-vehicle network 210 (step S61).
  • The determination execution unit 118 reads the learning dictionary data stored in the storage unit 112 (step S62). The determination target data conversion unit 116 reads data such as the coefficients used for normalizing the training data from the storage unit 112 and uses this data to normalize the determination target data, that is, the acquired CAN message data (step S63). The determination execution unit 118 then determines whether the normalized data is normal or abnormal based on the learning dictionary data (step S64).
  • The above is the outline of the abnormality detection processing executed in the abnormality detection system 100, from the generation of the learning dictionary using training data to the abnormality determination using this learning dictionary.
  • By using the Isolation Forest method for this abnormality detection, the load on computer resources is reduced compared with conventional methods, and the processing can be executed at higher speed.
  • However, depending on the training data, the obtained learning dictionary may have an inappropriate determination boundary that does not fit the distribution of the training data. FIG. 7 shows an example of such an inappropriate determination boundary.
  • With such a boundary, an erroneous determination is made in which data is determined to be abnormal although it is actually normal.
  • In FIG. 7, the data elements indicated by black circles are data elements determined to be abnormal, and many of them are actually normal data elements.
  • Such erroneous detection, based on an erroneous determination that normal data is abnormal, is hereinafter referred to as overdetection.
  • A learning dictionary that causes such erroneous determination may be obtained when, for example, the amount of abnormal data included in the training data is insufficient.
  • The processing performed in the abnormality detection system 100 to obtain an appropriate learning dictionary even in such a case is described below.
  • FIG. 8 is a flowchart showing a first processing method which is an example of a training data processing method for obtaining the appropriate learning dictionary described above.
  • The first processing method is executed by the learning dictionary generation unit 124 in the learning unit 120 after it receives Isolation Forest training data consisting of two or more M-dimensional vectors.
  • In the following, the processing by the learning dictionary generation unit 124 may be described as processing of the learning unit 120.
  • First, the learning unit 120 reads the parameters used for this processing (step S80). Details of the parameters are described in the following steps.
  • Next, the learning unit 120 acquires the number of data elements of the input training data (step S81).
  • Next, the learning unit 120 determines the number of noise elements to be added to the training data based on the number of data elements (step S82).
  • Each noise element is also an M-dimensional vector.
  • One of the parameters acquired in step S80 is used to determine the number of noise elements in step S82, and is, for example, a real number greater than 0 and less than 1.
  • As the number of noise elements, a value obtained by multiplying the number of data elements acquired in step S81 by this parameter and rounding to an integer is used. That is, the number of noise elements is determined to be smaller than the number of data elements of the training data.
  • Next, the learning unit 120 normalizes the input training data (step S83). FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane.
  • The distribution range of the training data, distributed as shown in FIG. 9A before normalization, is converted so as to fit within the [0, 1]^2 region of the two-dimensional plane.
  • Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 adds the number of noise elements determined in step S82 over an M-dimensional region that is larger than the first region and includes the first region, in this example a two-dimensional plane region (step S84).
  • FIG. 9C shows an example of the training data after the noise elements have been added, with the noise elements indicated by dotted outline circles distributed in the two-dimensional plane.
  • In this example, the noise elements are added so as to be distributed over the region [−0.5, 1.5]^2.
  • Such a region is an example of the second region in the present embodiment.
  • As shown in FIG. 9C, as a result of the processing of step S84, a number of noise elements smaller than the number of data elements of the original training data is added so as to be distributed over a region wider than the distribution range of the original training data. The distribution density of the noise elements is therefore lower than the distribution density of the data elements of the original training data. In addition, the noise elements are added so as to have a uniform distribution over the above region as a whole.
  • The learning unit 120 then generates noise-added training data consisting of the elements that are M-dimensional vectors in the second region, that is, both the training data elements and the noise elements, which in this example are two-dimensional vectors (step S85).
  • Finally, the learning unit 120 generates Isolation Forest learning dictionary data using the noise-added training data generated in step S85 and outputs the learning dictionary data (step S86).
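  • The following is a minimal sketch of the first processing method under stated assumptions: the training data is already normalized into [0, 1]^M, the noise-count parameter is a ratio in (0, 1), the noise is drawn uniformly over [−0.5, 1.5]^M, and scikit-learn's IsolationForest stands in for the learning dictionary generation (the patent itself does not name a library).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def first_processing_method(train: np.ndarray, noise_ratio: float = 0.3, seed: int = 0):
    """Sketch of steps S81-S86: count the data elements, add a smaller number of
    uniformly distributed noise elements over the second region [-0.5, 1.5]^M,
    and fit an Isolation Forest on the noise-added training data."""
    rng = np.random.default_rng(seed)
    n, m = train.shape                                   # S81: number of data elements
    n_noise = int(round(n * noise_ratio))                # S82: fewer noise elements than data elements
    noise = rng.uniform(-0.5, 1.5, size=(n_noise, m))    # S84: uniform noise over the second region
    noisy_train = np.vstack([train, noise])              # S85: noise-added training data
    return IsolationForest(random_state=seed).fit(noisy_train)  # S86: learning dictionary

# Usage sketch: train is assumed to be already normalized into [0, 1]^M (step S83)
train = np.random.default_rng(2).uniform(0.2, 0.8, size=(400, 2))
model = first_processing_method(train)
```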
  • In the above, steps S82 and S84 are an example of the second noise addition step, step S85 is an example of the generation step, and step S86 is an example of the learning dictionary data output step in this embodiment.
  • In this way, the learning unit 120 does not generate the learning dictionary using the normalized training data as is, as in the conventional method. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise over a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • In the above example, the number of noise elements, which is smaller than the number of data elements of the original training data, is determined using a parameter that takes a real value greater than 0 and less than 1.
  • However, the method of determining the number of noise elements is not limited to this.
  • For example, the number of noise elements may be obtained by subtracting a certain number from the number of data elements of the training data.
  • Alternatively, the number of data elements of the training data may be divided into a plurality of ranges, and a predetermined number of noise elements may be used for each range.
  • In this case, the correspondence between the number of training data elements and the number of noise elements is stored, for example, in a data table in the memory of the information processing apparatus.
  • The first processing method has been described using an example in which the data elements of the training data are two-dimensional vectors, but the idea underlying the first processing method can be generalized and applied to higher-dimensional spaces.
  • That is, the first processing method can also be applied to training data consisting of vectors of three or more dimensions. If the training data are M-dimensional vectors, the range of the first region is read as [0, 1]^M and the range of the second region is read as [−0.5, 1.5]^M. In other words, the first region is an M-dimensional region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIG. 10 is a flowchart showing a second processing method as another example of the training data processing method for obtaining the appropriate learning dictionary described above.
  • The second processing method is also executed by the learning dictionary generation unit 124 in the learning unit 120 after it receives Isolation Forest training data consisting of two or more M-dimensional vectors.
  • In the following, the processing by the learning dictionary generation unit 124 may be described as processing of the learning unit 120.
  • A case in which the second processing method also starts from the initial state of the training data shown in FIG. 9A is described as an example.
  • The description of steps common to the first processing method may be simplified.
  • First, the learning unit 120 reads the parameters used for this processing (step S100). Details of the parameters are described in the following steps.
  • Next, the learning unit 120 normalizes the input training data (step S101).
  • The content of this processing is the same as in the first processing method, and FIG. 9B shows an example of training data after normalization distributed on a two-dimensional plane. The distribution range of the training data, distributed as shown in FIG. 9A before normalization, is converted so as to fit within the [0, 1]^2 region of the two-dimensional plane. Such a region is an example of the first region in the present embodiment.
  • Next, the learning unit 120 sets an M-dimensional region that is larger than the first region and includes the first region, in this example a two-dimensional plane region, as the second region, and divides the second region into third regions that are equal M-dimensional hypercubes (step S102).
  • FIG. 11A is a diagram for explaining the second region and the third regions in the two-dimensional plane. In the example shown in FIG. 11A, the second region is the region [−0.5, 1.5]^2, and each third region is a sub-region obtained by dividing the second region into 64 regions.
  • One of the parameters acquired in step S100 is used to determine the number of third regions obtained by dividing the second region in step S102; the value of this parameter (corresponding to L) is 8 in the example of FIG. 11A.
  • The number of third regions is this value raised to the Mth power, that is, 8 squared, or 64, in this example.
  • Next, the learning unit 120 acquires the number S of data elements of the training data included in each third region (step S103), and determines a first threshold T (T is a natural number), which is a threshold for the number of training data elements in each third region (step S104). For example, one of the parameters acquired in step S100 is used to determine the first threshold T.
  • This parameter may be the same as or different from the parameter used in step S102; if different, it may be calculated from the parameter used in step S102.
  • As the first threshold T, the number of training data elements included in a particular third region within the first region may be specified.
  • For example, that particular third region may be indicated by a specific rank when the third regions are arranged in order of the number of training data elements they contain.
  • In that case, the number of training data elements included in the third region of this specific rank is used as the first threshold.
  • The rank may be indicated by the position counted from the minimum or maximum value, or by the position counted from the average or the median toward larger or smaller values.
  • Next, the learning unit 120 determines whether noise elements need to be added to each third region using the above S and T, determines the number of noise elements to be added to each third region, and executes the procedure for adding the noise elements.
  • Specifically, the learning unit 120 checks whether there is a third region for which it has not yet been determined whether noise elements need to be added (step S105). If there is such a region (YES in step S105), the learning unit 120 selects one of them (step S106) and determines whether the number S of training data elements in that third region is smaller than the first threshold T (step S107).
  • When the number S of training data elements in the third region is smaller than the first threshold T (YES in step S107), (T − S) noise elements are added so that the total number of data elements and noise elements in the third region becomes T (step S108).
  • If the number S of training data elements in the third region is equal to or greater than the first threshold T (NO in step S107), the learning unit 120 again checks whether there is an unprocessed third region (step S105).
  • FIG. 11B is a diagram for explaining an example of the training data and noise elements distributed in the two-dimensional space at the time of NO in step S105, that is, after all third regions have been processed.
  • In FIG. 11B, the noise elements are indicated by dotted outline circles.
  • For example, to one third region within the first region, T − S = 3 noise elements are added.
  • To another third region, T − S = 1 noise element is added. Since all other third regions within the first region contain 9 or more data elements, no noise elements are added to them. The other hatched third regions lie outside the first region and contain no training data elements, so nine noise elements are added to each of them.
  • The noise elements are generated as random numbers according to a uniform distribution within each third region.
  • The learning unit 120 then generates noise-added training data including the training data elements and the added noise elements (step S109), generates Isolation Forest learning dictionary data using this noise-added training data, and outputs the learning dictionary data (step S110).
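  • The following is a minimal sketch of the second processing method under stated assumptions: the training data is already normalized into [0, 1]^M, the second region [−0.5, 1.5]^M is divided into L^M equal cells, the first threshold T is derived from the median occupied-cell count (one of the options described above), and scikit-learn's IsolationForest again stands in for the dictionary generation. Enumerating all L^M cells, as done here for clarity, is only practical for small M.

```python
import itertools
import numpy as np
from sklearn.ensemble import IsolationForest

def second_processing_method(train: np.ndarray, L: int = 8, seed: int = 0):
    """Sketch of steps S102-S110: divide [-0.5, 1.5]^M into L^M equal cells,
    add (T - S) uniform noise elements to every cell holding S < T data
    elements, and fit an Isolation Forest on the noise-added training data."""
    rng = np.random.default_rng(seed)
    n, m = train.shape
    edges = np.linspace(-0.5, 1.5, L + 1)
    cell = 2.0 / L                                       # side length of each third region

    # S103: count the training data elements contained in each cell
    idx = np.clip(((train + 0.5) / cell).astype(int), 0, L - 1)
    counts = {}
    for key in map(tuple, idx):
        counts[key] = counts.get(key, 0) + 1

    # S104: first threshold T, here taken from the median occupied-cell count (one option above)
    T = max(1, int(np.median(list(counts.values()))))

    # S105-S108: for each cell with S < T, add (T - S) noise elements uniformly inside that cell
    noise_blocks = []
    for c in itertools.product(range(L), repeat=m):
        s = counts.get(c, 0)
        if s < T:
            low = edges[list(c)]
            noise_blocks.append(rng.uniform(low, low + cell, size=(T - s, m)))
    noisy_train = np.vstack([train] + noise_blocks)      # S109: noise-added training data
    return IsolationForest(random_state=seed).fit(noisy_train)  # S110: learning dictionary

# Usage sketch (two-dimensional training data normalized into [0, 1]^2)
train = np.random.default_rng(3).uniform(0.1, 0.9, size=(300, 2))
model = second_processing_method(train)
```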
  • In the above, step S101 is an example of the normalization step, step S102 is an example of the division step, steps S103 to S108 are an example of the first noise addition step, step S109 is an example of the generation step, and step S110 is an example of the learning dictionary data output step in this embodiment.
  • In this way, the learning unit 120 does not generate the learning dictionary using the normalized training data as is, as in the conventional method. Instead, the learning unit 120 generates the learning dictionary using data obtained by adding noise over a region that includes the periphery of the distribution range of the normalized training data in the M-dimensional space.
  • As a result, the abnormality detection system 100 can perform abnormality detection with a reduced overdetection rate.
  • Furthermore, in the second processing method, the number of noise elements added within the first region, where the training data is distributed, is determined according to the density of each subdivided region. The second processing method therefore suppresses the occurrence of places where data elements and noise elements become overcrowded within the first region, which can occur with the first processing method.
  • In Isolation Forest, places where vector data is overcrowded in the training data tend to fall inside the determination boundary. Therefore, if data elements and noise elements are likely to become overcrowded, there is an increased possibility of an erroneous determination in which even abnormal data is determined to be normal.
  • Such erroneous detection, based on an erroneous determination that abnormal data is normal, is hereinafter referred to as detection omission, in contrast to the overdetection described above.
  • Accordingly, in the abnormality detection system 100, in which the abnormality determination of unknown data is performed based on a learning dictionary generated by the second processing method, abnormality detection can be performed while suppressing the occurrence of overdetection and also suppressing the possibility of detection omission.
  • As with the first processing method, the concept underlying this processing method can be generalized and applied to higher-dimensional spaces, and the second processing method can also be applied to training data consisting of vectors of three or more dimensions.
  • If the training data are M-dimensional vectors, the range of the first region is read as [0, 1]^M and the range of the second region is read as [−0.5, 1.5]^M. In other words, the first region is an M-dimensional region defined by a first hypercube in the M-dimensional space, and the second region is an M-dimensional region defined by a second hypercube that is larger than the first hypercube in the M-dimensional space.
  • FIGS. 12A and 12B show the determination boundary of a learning dictionary generated using training data without added noise and the determination boundary of a learning dictionary generated using the same training data with noise added by the above processing method.
  • The training data 1 in FIG. 12A and the training data 2 in FIG. 12B are different types of data acquired from the same in-vehicle network. Comparing the two, training data 1 has data elements distributed almost uniformly from the center to the periphery of the distribution, while training data 2 has a sparse distribution of data elements at the periphery; it can be said that training data 2 is more likely to contain outliers than training data 1.
  • In each figure, a circle indicates a data element of the training data.
  • A solid-line box represents the determination boundary of the learning dictionary generated using the training data without added noise, and a broken-line box represents the determination boundary of the learning dictionary generated using the training data with added noise. Noise elements are not shown in either figure.
  • An abnormality detection test was performed using each of the learning dictionaries whose determination boundaries are shown in FIGS. 12A and 12B, and FIG. 12C shows the false detection rates in this test.
  • For each set of training data, the left column shows the false detection rate of the learning dictionary obtained without adding noise to the training data, and the right column shows the false detection rate of the learning dictionary obtained by adding noise to the training data.
  • In summary, in the technique according to the present embodiment, a small number of data elements that deviate to some extent from the original training data, which consists mostly of normal data elements, is added in the data space at a density lower than that of the original training data. These added data elements are referred to above as noise elements. In an abnormality detection system that uses a learning dictionary generated from such noise-added training data, false detection is reduced.
  • The first processing method and the second processing method described in the first embodiment differ in the algorithms of the programs executed in the information processing apparatus to realize them, and can be selectively executed, for example, by switching the program read by a given processor.
  • The time required for adding noise elements depends more strongly on the amount of training data in the second processing method than in the first processing method, and it takes longer as the training data increases. That is, the processing load on the processor is larger in the second processing method.
  • With either method, the detection accuracy of the generated learning dictionary (a low false detection rate) is improved compared with the conventional method as described above, but the second processing method is superior in this respect.
  • In terms of accuracy alone, it would therefore be desirable for the second processing method to always be executed in the abnormality detection system.
  • With the configuration of the abnormality detection system 100A in FIG. 1A or the abnormality detection system 100B in FIG. 1B, in which the learning unit 120 runs on the external server 10, the difference in processing load described above is unlikely to be a problem.
  • However, with a configuration such as the abnormality detection system 100C in FIG. 1C, it is assumed that computer resources such as the processor operation speed are limited. That is, in a traveling vehicle, the learning dictionary might not be generated or updated at the necessary speed by the second processing method.
  • For example, in the first processing method, the parameter used to determine the number of noise elements can take a real number larger than 0 and smaller than 1. However, it is difficult to predict in advance which value in this range will generate a learning dictionary better suited for anomaly detection. To find out, for example, the accuracy of anomaly detection performed on test data can be compared across a plurality of learning dictionaries generated with different parameter values. Naturally, however, if such a comparison is made to search for the optimum parameter, it takes more time until the learning dictionary used for abnormality detection is determined. If the learning dictionary is determined slowly, abnormality detection either cannot be executed until the learning dictionary is determined or must be performed using an old learning dictionary.
  • In the second processing method, the parameters are, for example, the number of divisions L and the first threshold T. The former divides the first region into two or more parts in each dimension and provides third regions on both sides outside the first region, so L can take an integer value of 4 or more. If the latter is a value used for specifying one of the third regions in the second region, it can take, for example, a real value of 1 or more and less than or equal to the number of third regions in the second region.
  • By searching these parameters as well, a learning dictionary capable of detecting abnormalities with higher accuracy may be obtained.
  • On the other hand, such a search takes more time until the learning dictionary used for abnormality detection is determined, so the execution of abnormality detection is delayed or accuracy is sacrificed.
  • In view of this, the inventors devised a method for quickly deciding, in the abnormality detection system, which training data processing method to select and whether to perform a parameter search, so that the abnormality detection system performs abnormality detection at the required speed and with the highest possible accuracy.
  • FIG. 13 is a flowchart illustrating an example of a processing method, executed in the abnormality detection system 100, for deciding which training data processing method to select and whether to perform a parameter search in each processing method.
  • This processing method includes steps executed by the learning dictionary generation unit 124 in the learning unit 120 after it receives Isolation Forest training data consisting of two or more M-dimensional vectors.
  • In the following, the processing by the learning dictionary generation unit 124 is described as processing of the learning unit 120.
  • Part of the following may also be described as processing by the abnormality determination unit 110.
  • In the initial state, the training data receiving unit 122 has already received the training data.
  • First, the learning unit 120 acquires the number N of data elements of the training data (step S130).
  • Next, the learning unit 120 determines whether N is equal to or greater than a predetermined second threshold (step S131).
  • The second threshold is a threshold used to decide whether to use the first processing method or the second processing method as the training data processing method. It is determined by the available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus. Using a predetermined threshold in this way allows a quick decision.
  • When it is determined that N is equal to or greater than the second threshold, that is, when the number of data elements of the training data is large, the learning unit 120 selects the first processing method, which can be completed in a shorter time (step S132).
  • When it is determined that N is not equal to or greater than the second threshold, the learning unit 120 selects the second processing method, which provides a learning dictionary capable of detecting abnormalities with higher accuracy (step S133).
  • the learning unit 120 determines whether N is greater than or equal to a predetermined third threshold (step S134).
  • The third threshold is used to determine whether or not to perform a parameter search when executing the selected training data processing method.
  • Like the second threshold, the third threshold is determined according to available computer resources, such as the computing capability of the processor that implements the learning unit 120, and is stored in the memory of the information processing apparatus.
  • The third threshold may be related to the second threshold, or the two may be values independent of each other.
  • When it is determined that N is greater than or equal to the third threshold, the learning unit 120 determines not to execute a parameter search, so that processing can be completed in a shorter time (step S135).
  • When it is determined that N is not greater than or equal to the third threshold, that is, when the training data has few data elements, the learning unit 120 determines to perform a parameter search for obtaining a learning dictionary capable of detecting an abnormality with higher accuracy (step S136).
  • When generating and outputting learning dictionary data (step S137) after passing through steps S132 and S135, the learning unit 120 executes the first processing method shown in the flowchart of FIG. 8.
  • When generating and outputting learning dictionary data (step S137) after passing through steps S133 and S135, the learning unit 120 executes the second processing method shown in the flowchart of FIG. 10.
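  • For illustration only, the decision flow of FIG. 13 can be sketched as follows; the concrete threshold values and the function name are assumptions and are not part of the embodiment.

```python
# Minimal sketch of the FIG. 13 decision flow (steps S130-S136), assuming
# concrete threshold values chosen from available computer resources.
SECOND_THRESHOLD = 10_000  # assumed value for the second threshold
THIRD_THRESHOLD = 50_000   # assumed value for the third threshold

def decide_strategy(n_elements: int) -> tuple[str, bool]:
    """Return (processing_method, do_parameter_search) for N training data elements."""
    if n_elements >= SECOND_THRESHOLD:
        method = "first"   # faster first processing method (step S132)
    else:
        method = "second"  # more accurate second processing method (step S133)
    # A large N makes a parameter search costly, so it is skipped (steps S135/S136).
    do_search = n_elements < THIRD_THRESHOLD
    return method, do_search

# Example: with 30,000 training vectors the first method is selected
# and a parameter search is still performed.
print(decide_strategy(30_000))  # ('first', True)
```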
  • FIG. 14 is a flowchart of the first processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the first processing method shown in the flowchart of FIG. 8 are denoted by common reference numerals, and detailed description thereof is omitted.
  • In this processing method, the learning unit 120 executes the set of steps S82 and S84 to S86 a plurality of times while changing the parameter value.
  • The plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. The data used for normalization in step S83 is also provided from the learning unit 120 to the abnormality determination unit 110 and stored in the storage unit 112.
  • It is assumed here that the abnormality determination unit 110 has acquired test data for Isolation Forest. This test data is, for example, input to the abnormality determination unit 110 in advance and stored in the storage unit 112. When it is determined in step S131 that N is not greater than or equal to the second threshold, the abnormality determination unit 110 reads this test data from the storage unit 112. The abnormality determination unit 110 then normalizes the test data using the data used for normalization in step S83, and executes abnormality determination on the test data using each set of learning dictionary data (step S140).
  • The learning unit 120 evaluates the abnormality determination performed with each set of learning dictionary data in step S140 and, based on the evaluation result, selects the best learning dictionary data as the learning dictionary data used for actual abnormality detection (step S141). For this evaluation, a known evaluation measure such as recall or F-measure can be used, for example. Note that step S141 may be performed by the abnormality determination unit 110.
  • In the present embodiment, steps S82 and S84 are examples of the second noise addition step, step S85 is an example of the generation step, step S86 is an example of the learning dictionary data output step, step S131 is an example of the first determination step, step S134 is an example of the second determination step, and steps S140 and S141 are examples corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the first processing method is executed through steps S132 and S135 is whether the set of steps S82 and S84 to S86 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected, based on the evaluation result, as the learning dictionary data used for abnormality detection.
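  • A minimal sketch of this first processing method with parameter search (FIG. 14) is shown below, using scikit-learn's IsolationForest as a stand-in for the learning dictionary and F-measure as the evaluation measure; the noise-ratio parameter, its candidate values, and the extent of the second region are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

def add_uniform_noise(x_norm: np.ndarray, noise_ratio: float, rng) -> np.ndarray:
    """Append uniformly distributed noise vectors over a region slightly larger
    than the normalized [0, 1]^M first region (corresponding to steps S82, S84)."""
    n, m = x_norm.shape
    noise = rng.uniform(-0.1, 1.1, size=(int(n * noise_ratio), m))  # assumed second region
    return np.vstack([x_norm, noise])

def search_first_method(x_train_norm, x_test_norm, y_test, candidates=(0.05, 0.1, 0.2)):
    """Fit one model per parameter value (S82, S84-S86 repeated), evaluate each on
    test data (S140), and keep the best one (S141). y_test uses 1 for anomalies."""
    rng = np.random.default_rng(0)
    best_model, best_score = None, -1.0
    for ratio in candidates:
        model = IsolationForest(n_estimators=100, random_state=0)
        model.fit(add_uniform_noise(x_train_norm, ratio, rng))
        pred = (model.predict(x_test_norm) == -1).astype(int)  # -1 means anomaly
        score = f1_score(y_test, pred)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```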
  • FIG. 15 is a flowchart of the second processing method including parameter search, which is executed in the abnormality detection system 100.
  • steps common to the second processing method shown in the flowchart of FIG. 10 are denoted by common reference numerals, and detailed description thereof is omitted.
  • In this processing method, the learning unit 120 executes the set of steps S102 to S110 a plurality of times while changing the combination of the two types of parameter values.
  • The plurality of learning dictionary data generated and output as a result are stored in the storage unit 112 of the abnormality determination unit 110. The data used for normalization in step S101 is also provided from the learning unit 120 to the abnormality determination unit 110 and stored in the storage unit 112.
  • The processing of steps S150 and S151 is the same as that of steps S140 and S141, respectively.
  • In the present embodiment, step S102 is an example of the division step, steps S103 to S108 are examples of the first noise addition step, step S109 is an example of the generation step, step S110 is an example of the learning dictionary data output step, step S131 is an example of the first determination step, step S134 is an example of the second determination step, and steps S150 and S151 are examples corresponding to the test data acquisition step, the evaluation step, and the learning dictionary data selection step.
  • One difference from the case where the second processing method is executed through steps S133 and S135 is whether the set of steps S102 to S110 is executed only once or a plurality of times before the learning dictionary data used for abnormality detection is output. Another difference is that a plurality of learning dictionary data are evaluated using test data, and the best learning dictionary data is selected, based on the evaluation result, as the learning dictionary data used for abnormality detection.
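  • The second processing method with parameter search (FIG. 15) can be sketched in the same spirit; the sketch below divides an assumed second region into equal hypercube cells and fills sparse cells with uniform noise, and the candidate values for the two parameters are illustrative assumptions (practical only for small M, since the number of cells grows exponentially with the dimension).

```python
import itertools
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

def add_grid_noise(x_norm, divisions, t_threshold, rng, low=-0.5, high=1.5):
    """Divide the second region [low, high]^M into equal hypercube cells, count the
    data elements S in each cell, and add (T - S) uniformly distributed noise vectors
    to every cell holding fewer than T elements (corresponding to steps S102-S108)."""
    n, m = x_norm.shape
    edges = np.linspace(low, high, divisions + 1)
    cell_idx = np.stack(
        [np.clip(np.digitize(x_norm[:, d], edges) - 1, 0, divisions - 1) for d in range(m)],
        axis=1)
    counts = {}
    for idx in map(tuple, cell_idx):
        counts[idx] = counts.get(idx, 0) + 1
    noise_blocks = []
    for idx in itertools.product(range(divisions), repeat=m):  # enumerate all cells
        s = counts.get(idx, 0)
        if s < t_threshold:
            lo = edges[list(idx)]
            hi = edges[[i + 1 for i in idx]]
            noise_blocks.append(rng.uniform(lo, hi, size=(t_threshold - s, m)))
    return np.vstack([x_norm] + noise_blocks) if noise_blocks else x_norm

def search_second_method(x_train_norm, x_test_norm, y_test,
                         division_candidates=(4, 6), t_candidates=(2, 5)):
    """Try combinations of the two parameters (S102-S110 repeated), evaluate on test
    data (S150), and keep the best learning dictionary (S151)."""
    rng = np.random.default_rng(0)
    best_model, best_score = None, -1.0
    for div, t in itertools.product(division_candidates, t_candidates):
        model = IsolationForest(n_estimators=100, random_state=0)
        model.fit(add_grid_noise(x_train_norm, div, t, rng))
        pred = (model.predict(x_test_norm) == -1).astype(int)
        score = f1_score(y_test, pred)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```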
  • Comparing the time costs of these patterns, the cost is largest when the second processing method is executed with a parameter search, is also large when the first processing method is executed with a parameter search, and is significantly smaller for the remaining two patterns.
  • The second threshold and the third threshold may be values independent of each other, or may be determined in consideration of this relationship among the time costs.
  • The threshold used in step S134 may be switched according to the determination result in step S131, that is, depending on whether the first processing method or the second processing method is used for adding noise. For example, the third threshold may be used when the second processing method is used, and a fourth threshold, which is another predetermined threshold, may be used instead of the third threshold when the first processing method is used. Step S134 in the case where the fourth threshold is used in this way is an example of the third determination step in the present embodiment.
  • In the above description, two determinations are made: the determination of the noise addition processing method and the determination of whether or not to execute a parameter search for each processing method. However, both are not mandatory.
  • The time cost may be adjusted by only one of these determinations.
  • Furthermore, the number of parameter values tried in the search may be changed in stages; that is, as the number of data elements in the training data increases, the number of parameter values to be tried may be reduced.
  • This number of parameter values may be a value calculated from the number of data elements, or a value determined in advance for each predetermined range of the number of data elements. In other words, it is sufficient that there is a negative correlation between the number of data elements in the training data and the number of parameter values. In this way, when the training data has many data elements, the increase in computational load can be suppressed so that the time required to determine the learning dictionary data does not become too long.
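  • As a small illustrative sketch (the ranges and counts below are assumptions only), the number of candidate parameter values could be derived from N as follows.

```python
def num_parameter_candidates(n_elements: int) -> int:
    """Return how many parameter values to try; fewer candidates for larger N,
    so that the search cost stays bounded (illustrative values only)."""
    if n_elements < 1_000:
        return 8
    if n_elements < 10_000:
        return 4
    return 2  # many data elements: try only a couple of values
```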
  • In the above description, whether the first processing method or the second processing method is executed for processing the training data is determined according to the result of comparing the number N of data elements in the training data with the second threshold; however, the present disclosure is not limited to this.
  • For example, the options may instead be two: executing either the first processing method or the second processing method, and not executing the training data processing at all.
  • Embodiments 1 and 2 have been described as examples of the technology according to the present disclosure.
  • the technology according to the present disclosure is not limited to this, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are appropriately performed.
  • the following modifications are also included in one embodiment of the present disclosure.
  • the system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip.
  • The system LSI is a computer system including a microprocessor, a ROM, a RAM, and the like. A computer program is recorded in the RAM. The system LSI achieves its functions by the microprocessor operating according to the computer program recorded in the RAM.
  • each part of the constituent elements constituting each of the above devices may be individually made into one chip, or may be made into one chip so as to include a part or all of them.
  • Although the term system LSI is used here, it may also be called an IC, an LSI, a super LSI, or an ultra LSI depending on the degree of integration.
  • the method of circuit integration is not limited to LSI's, and implementation using dedicated circuitry or general purpose processors is also possible.
  • An FPGA (Field Programmable Gate Array) or a reconfigurable processor that allows reconfiguration of the connections and settings of the circuit cells inside the LSI may be used.
  • If integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or other derived technologies, the functional blocks may naturally be integrated using that technology. The application of biotechnology is one such possibility.
  • A part or all of the constituent elements constituting each of the above devices may be configured as an IC card or a single module that can be attached to and detached from the device.
  • This IC card or module is a computer system including a microprocessor, ROM, RAM, and the like. Further, this IC card or module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
  • each component may be configured by dedicated hardware or may be realized by executing a software program suitable for each component.
  • Each component may be realized by a program executor such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • the software that realizes the information processing apparatus of the above-described embodiment is a program as follows.
  • That is, this program causes a computer to execute an information processing method including: a data element acquisition step of receiving input of N data elements (N is an integer equal to or greater than 2) that are M-dimensional vectors (M is an integer equal to or greater than 2) used as training data for Isolation Forest; a normalization step of normalizing the training data so as to be distributed over an M-dimensional first region; a first noise addition step of adding noise elements in a uniform distribution to an M-dimensional second region that is larger than the first region and includes the first region; a generation step of generating noise-added training data including the data elements and the noise elements; and a learning dictionary data output step of generating and outputting learning dictionary data for Isolation Forest using the noise-added training data.
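  • As an illustration of this processing flow, a minimal sketch is given below using scikit-learn's IsolationForest in place of the learning dictionary generation; the margin defining the second region and the noise ratio are assumed values.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def build_learning_dictionary(x_train: np.ndarray, noise_ratio: float = 0.1,
                              margin: float = 0.1, seed: int = 0):
    """x_train is an (N, M) array of M-dimensional data elements (N >= 2)."""
    n, m = x_train.shape
    # Normalization step: scale the training data into the first region [0, 1]^M.
    mins, maxs = x_train.min(axis=0), x_train.max(axis=0)
    scale = np.where(maxs > mins, maxs - mins, 1.0)
    x_norm = (x_train - mins) / scale
    # First noise addition step: add uniformly distributed noise elements over the
    # second region [-margin, 1 + margin]^M, which is larger than and contains [0, 1]^M.
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-margin, 1.0 + margin, size=(int(n * noise_ratio), m))
    # Generation step: the noise-added training data contains both data and noise elements.
    x_noisy = np.vstack([x_norm, noise])
    # Learning dictionary data output step: fit the Isolation Forest and return it
    # together with the normalization data needed later for monitored data.
    model = IsolationForest(n_estimators=100, random_state=seed).fit(x_noisy)
    return model, (mins, scale)
```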
  • As described in the above embodiments, the present disclosure can be implemented as an information processing apparatus that generates learning dictionary data using training data and provides the learning dictionary data to an abnormality determination apparatus that performs abnormality determination. It can also be realized as an abnormality detection system including this information processing apparatus and the abnormality determination apparatus.
  • In the abnormality detection system configured as shown in FIG. 1A or FIG. 1C, for example, this abnormality determination apparatus is the monitoring ECU that realizes the abnormality determination unit connected to the in-vehicle network 210. In the abnormality detection system configured as shown in FIG. 1B, it is the external server 10 that realizes the abnormality determination unit.
  • This network is typically an in-vehicle CAN network as described above, but is not limited thereto.
  • a network such as CAN-FD (CAN with Flexible Data rate), FlexRay, Ethernet, LIN (Local Interconnect Network), MOST (Media Oriented Systems Transport) may be used.
  • an in-vehicle network combining these networks as sub-networks with a CAN network may be used.
  • each component may be a circuit.
  • a plurality of components may constitute one circuit as a whole, or may constitute separate circuits.
  • Each circuit may be a general-purpose circuit or a dedicated circuit.
  • a process executed by a specific component may be executed by another component instead of the specific component.
  • the order of the plurality of processes may be changed, and the plurality of processes may be executed in parallel.
  • This disclosure can be used for an in-vehicle network system including an in-vehicle network.

Abstract

An information processing device is disclosed that includes a processor. The processor: receives input of data elements that are two or more vectors used as training data; normalizes the training data so as to be distributed over a first region; divides a multidimensional second region containing the first region into third regions that are hypercubes of equal size; acquires the number S of data elements contained in each of the third regions; adds, in a uniform distribution, to each third region containing fewer data elements than a first threshold T, (T − S) noise elements that are vectors; generates noise-added training data containing the vectors in the second region; and, using the generated noise-added training data, generates and outputs Isolation Forest learning dictionary data.
PCT/JP2017/040727 2016-12-06 2017-11-13 Dispositif de traitement d'informations, procédé de traitement d'informations et programme Ceased WO2018105320A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP17877549.0A EP3553712B1 (fr) 2016-12-06 2017-11-13 Dispositif de traitement d'informations, procédé de traitement d'informations et programme
CN201780022736.0A CN109074519B (zh) 2016-12-06 2017-11-13 信息处理装置、信息处理方法以及程序
US16/255,877 US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662430570P 2016-12-06 2016-12-06
US62/430570 2016-12-06
JP2017207085A JP6782679B2 (ja) 2016-12-06 2017-10-26 情報処理装置、情報処理方法及びプログラム
JP2017-207085 2017-10-26

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/255,877 Continuation US10601852B2 (en) 2016-12-06 2019-01-24 Information processing device, information processing method, and recording medium storing program

Publications (1)

Publication Number Publication Date
WO2018105320A1 true WO2018105320A1 (fr) 2018-06-14

Family

ID=62491996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/040727 Ceased WO2018105320A1 (fr) 2016-12-06 2017-11-13 Dispositif de traitement d'informations, procédé de traitement d'informations et programme

Country Status (1)

Country Link
WO (1) WO2018105320A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013084175A (ja) * 2011-10-12 2013-05-09 Sony Corp 情報処理装置、情報処理方法、及びプログラム
US20150078654A1 (en) * 2013-09-13 2015-03-19 Interra Systems, Inc. Visual Descriptors Based Video Quality Assessment Using Outlier Model
JP2016133895A (ja) * 2015-01-16 2016-07-25 キヤノン株式会社 情報処理装置、情報処理方法、及びプログラム

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAGA, TOMOYUKI ET AL.: "Proposal of statistical abnormality detecting system for vehicle-mounted newtwork using Cloud", SCIS2016 SYMPOSIUM ON CRYPTOGRAPHY AND INFORMATION SECURITY, 19 January 2016 (2016-01-19), pages 1 - 8, XP009515155 *
LIU, FEI TONY ET AL.: "Isolation-Based Anomaly Detection", ACM TRANSACT IONS ON KNOWLEDGE DISCOVERY FROM DATA (TKDD), vol. 6, no. 1, 1 March 2012 (2012-03-01), pages 1 - 39, XP055492079 *
See also references of EP3553712A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345137A (zh) * 2018-10-22 2019-02-15 广东精点数据科技股份有限公司 一种基于农业大数据的异常值检测方法
CN109508738A (zh) * 2018-10-31 2019-03-22 北京国双科技有限公司 一种信息处理方法及相关设备
CN109948738A (zh) * 2019-04-11 2019-06-28 合肥工业大学 涂装烘干室的能耗异常检测方法、装置及系统
CN110243599A (zh) * 2019-07-02 2019-09-17 西南交通大学 多维离群列车动车组轴箱轴承温度异常状态监测方法
CN114019940A (zh) * 2020-03-02 2022-02-08 阿波罗智联(北京)科技有限公司 用于检测异常的方法和装置
CN114035544A (zh) * 2020-03-02 2022-02-11 阿波罗智联(北京)科技有限公司 用于检测异常的方法和装置
WO2023127111A1 (fr) * 2021-12-28 2023-07-06 富士通株式会社 Procédé de génération, programme de génération et dispositif de traitement d'informations
JPWO2023127111A1 (fr) * 2021-12-28 2023-07-06
JP7639946B2 (ja) 2021-12-28 2025-03-05 富士通株式会社 生成方法、生成プログラム、及び、情報処理装置

Similar Documents

Publication Publication Date Title
JP6782679B2 (ja) 情報処理装置、情報処理方法及びプログラム
WO2018105320A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et programme
US12118080B2 (en) Anomaly detection for vehicular networks for intrusion and malfunction detection
EP4075726A1 (fr) Système multi-agents unifié pour la détection et l'isolement d'anomalies
US10986503B2 (en) Cooperative security in wireless sensor networks
CN112288723B (zh) 缺陷检测方法、装置、计算机设备及存储介质
CN109859054B (zh) 网络社团挖掘方法、装置、计算机设备及存储介质
US11595431B2 (en) Information processing apparatus, moving apparatus, and method
US20240411864A2 (en) System and method for detection of anomalous controller area network (can) messages
US11972334B2 (en) Method and apparatus for generating a combined isolation forest model for detecting anomalies in data
CN113328985A (zh) 一种被动物联网设备识别方法、系统、介质及设备
JP2021179935A (ja) 車両用異常検出装置及び車両用異常検出方法
JP6939898B2 (ja) ビットアサイン推定装置、ビットアサイン推定方法、プログラム
US20190370503A1 (en) Functional safety over trace-and-debug
US12177239B2 (en) Attack analyzer, attack analysis method and attack analysis program
CN112511294B (zh) 一种基于对抗式图神经网络结构的不可区分混淆器的设计方法
US11474889B2 (en) Log transmission controller
US9971676B2 (en) Systems and methods for state based test case generation for software validation
US20240039949A1 (en) Attack estimation verification device, attack estimation verification method, and storage medium storing attack estimation verification program
US20150113645A1 (en) System and method for operating point and box enumeration for interval bayesian detection
JP2022153081A (ja) 攻撃分析装置、攻撃分析方法、及び攻撃分析プログラム
US20240202336A1 (en) Method and system for incremental centroid clustering
CN119577560A (zh) 基于异常数据筛选的发动机故障智能诊断方法及装置
Bhattacharya CAN-ADS: Machine Learning-based Anomaly Detection System for CAN Bus Network in Agriculture Machinery
Wang et al. An Intrusion Detection System Based on the Double-Decision-Tree Method for In-Vehicle Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17877549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017877549

Country of ref document: EP

Effective date: 20190708