
WO2020168756A1 - Cluster log feature extraction method, and apparatus, device and storage medium - Google Patents

Cluster log feature extraction method, and apparatus, device and storage medium

Info

Publication number
WO2020168756A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
value
log
collected
log data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/118288
Other languages
French (fr)
Chinese (zh)
Inventor
吴超勇
陈仕财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Publication of WO2020168756A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Definitions

  • This application relates to infrastructure operation and maintenance, and in particular to a cluster log feature extraction method, apparatus, device, and storage medium.
  • In the era of explosive information growth, file sizes and data scales of terabytes or even petabytes have become a reality, and cluster storage systems have grown to clusters of 64 nodes.
  • Managing such a large cluster system has become a severe challenge for data centers, making it especially important to track the running status of cluster nodes in time and to accurately locate node error information.
  • A commonly used cluster storage system log management method can send system logs periodically or in real time, achieving centralized transmission of the logs; however, the logs are neither analyzed nor managed, so the operating status of the entire cluster storage system cannot be understood globally and error messages cannot be located quickly.
  • To solve these problems, this application provides a cluster log feature extraction method, applied to an electronic device, comprising the following steps: collecting the logs of a server cluster through a Flume client and sending them to an HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface; cleaning the log data with Hadoop to filter out the raw data, where the raw data includes at least server disk occupancy, memory usage, CPU occupancy, and business interface call volume; extracting feature values from the raw data, including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index; and using the Pearson correlation coefficient to screen out effective features, computing the Pearson correlation coefficient between each extracted feature value and the raw data, comparing the result with a correlation threshold, and treating data above the threshold as valid and data below the threshold as invalid and removing it.
  • This application also provides a cluster log feature extraction apparatus, comprising a log collection module, a data cleaning module, a feature extraction module, and an effective feature screening module.
  • The log collection module is used to collect the logs of the server cluster through the Flume client and send them to the HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface.
  • The data cleaning module is used to clean the log data with Hadoop and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.
  • The feature extraction module is used to extract feature values from the raw data, including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index; the effective feature screening module uses the Pearson correlation coefficient to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data and compared with the correlation threshold, with data above the threshold treated as valid and data below the threshold treated as invalid and removed.
  • This application also provides an electronic device comprising a memory and a processor, the memory storing a cluster log feature extraction program which, when executed by the processor, implements the following steps:
  • the Flume client collects the logs of the server cluster and sends them to the HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through the API interface;
  • the log data is cleaned with Hadoop to filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume;
  • feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data; and
  • the Pearson correlation coefficient is used to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data and compared with the correlation threshold, with data above the threshold considered valid and data below the threshold considered invalid and removed.
  • This application also provides a non-volatile computer-readable storage medium storing a computer program that includes program instructions which, when executed by a processor, implement the cluster log feature extraction method described above.
  • This application can effectively screen out the valid information in the production data of each host in a server cluster and extract feature values of the production data from that information, which facilitates failure prediction and failure classification for the production system and reduces the occurrence of production accidents.
  • FIG. 1 is a schematic flowchart of a cluster log feature extraction method according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the hardware architecture of an electronic device according to an embodiment of the present application;
  • FIG. 3 is a block diagram of a cluster log feature extraction program according to an embodiment of the present application;
  • FIG. 4 is a unit structure diagram of a log collection module according to an embodiment of the present application;
  • FIG. 5 is a unit structure diagram of a feature extraction module according to an embodiment of the present application;
  • FIG. 6 is a unit structure diagram of a data cleaning module according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a Flume Agent process reading data.
  • The cluster log feature extraction method of this embodiment includes the following steps:
  • Step S10: The logs of the server cluster are collected by the Flume (distributed mass log collection, aggregation, and transmission system) client and sent to the HBase database server.
  • Flume takes the Agent process as its smallest independent operating unit; one Agent process is a complete data collection tool.
  • An Agent comprises the components Source (data collection component), Channel (temporary transit storage), and Sink: the Source collects data from the server and passes it to the Channel; the Channel stores the Events (data units) passed in by the Source component; and the Sink reads and removes Events from the Channel and passes them to the backend.
  • Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.
  • Step S30: Hadoop (a distributed system infrastructure) is used to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.
  • Step S50: Feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data.
  • Step S70: The Pearson correlation coefficient is used to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data, and the result is compared with the correlation threshold. Data above the threshold is considered valid; data below the threshold is considered invalid and is removed.
  • During data cleaning, the Laida criterion (the 3σ rule) is used to remove data containing gross errors, through the following steps:
  • for the log data x_1, x_2, ..., x_n, compute the arithmetic mean, the residual errors, and the standard deviation, where x_i is the log data from a single Agent collection;
  • if the residual error of a value x_b exceeds three times the standard deviation, x_b is considered a singular value containing a gross error, and the singular value is removed.
  • The median here means that the variable values x_1, x_2, ..., x_n are arranged in order of magnitude to form a sequence; the value in the middle of that sequence is called the median.
  • Feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data; the square root amplitude, waveform index, impulse index, and kurtosis index are calculated using the formulas given in the detailed description below.
  • Here x_i is the log data from a single Agent collection; N is the number of log data collections; x̄ is the arithmetic mean of the collected log data; X_rms is its effective value; X_p is its peak value; X_r is its square root amplitude; X_ws is its waveform index; X_if is its impulse index; and X_kv is its kurtosis index.
  • The Pearson correlation coefficient is used to screen out effective features. Specifically, the Pearson correlation coefficient is computed between each of the above feature values and the raw data, and the result is compared with the correlation threshold: a value above the threshold indicates valid data, while a value below the threshold indicates invalid data that must be removed, so that only valid data is retained. For example, with a correlation threshold of 0.7, if the correlation coefficient between the square root amplitude and the raw data is 0.2, the square root amplitude is invalid data; if the correlation coefficient between the kurtosis index and the raw data is 0.85, the kurtosis index is deemed valid data. The formula for the Pearson correlation coefficient is given in the detailed description below.
  • Here x_i is a data value from a single Agent collection; y_i is a feature value extracted from the data collected by a single Agent; and N is the number of log data collections.
  • Flume includes multiple first-level Agents and one second-level Agent.
  • Each first-level Agent collects the log data of one server, and the log data collected by the first-level Agents is aggregated at the second-level Agent, which transmits it to HDFS.
  • The electronic device 2 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions.
  • It can be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • The electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23, which can be communicatively connected to each other through a system bus.
  • The memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • The memory 21 may be an internal storage unit of the electronic device 2, for example, a hard disk or internal memory of the electronic device 2.
  • The memory 21 may also be an external storage device of the electronic device 2, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 2.
  • The memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device.
  • The memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the cluster log feature extraction program code.
  • The memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 22 is generally used to control the overall operation of the electronic device 2, for example, performing control and processing related to the data interaction or communication of the electronic device 2.
  • The processor 22 is configured to run the program code or process the data stored in the memory 21, for example, to run the cluster log feature extraction program.
  • The network interface 23 may include a wireless network interface or a wired network interface.
  • The network interface 23 is usually used to establish a communication connection between the electronic device 2 and other electronic devices.
  • The network interface 23 is used to connect the electronic device 2 to a push platform through a network, establishing a data transmission channel and a communication connection between the electronic device 2 and the push platform.
  • The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • The electronic device 2 may also include a display, which may also be called a display screen or display unit.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) display, etc.
  • The display is used to show the information processed in the electronic device 2 and to display a visualized user interface.
  • FIG. 2 only shows the electronic device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • The memory 21, which contains a readable storage medium, may include an operating system, a cluster log feature extraction program 50, and the like.
  • The processor 22 implements the following steps when executing the cluster log feature extraction program 50 in the memory 21:
  • Step S10: The logs of the server cluster are collected by the Flume (distributed mass log collection, aggregation, and transmission system) client and sent to the HBase database server.
  • Flume takes the Agent component as its smallest independent operating unit; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.
  • Step S30: Hadoop (a distributed system infrastructure) is used to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.
  • Step S50: Feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data.
  • Step S70: The Pearson correlation coefficient is used to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data, and the result is compared with the correlation threshold. Data above the threshold is considered valid; data below the threshold is considered invalid and is removed.
  • The cluster log feature extraction program stored in the memory 21 can be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete this application.
  • FIG. 3 shows a schematic diagram of the program modules of the cluster log feature extraction program.
  • In this embodiment, the cluster log feature extraction program 50 can be divided into a log collection module 501, a data cleaning module 502, a feature extraction module 503, and an effective feature screening module 504.
  • A program module referred to in this application is a series of computer program instruction segments capable of completing specific functions, better suited than a whole program to describing the execution of the cluster log feature extraction program in the electronic device 2.
  • The cluster log feature extraction method is implemented through the specific functions of these program modules.
  • This application also provides a cluster log feature extraction apparatus, comprising a log collection module 501, a data cleaning module 502, a feature extraction module 503, and an effective feature screening module 504.
  • The log collection module 501 is configured to collect the logs of the server cluster through the Flume (distributed mass log collection, aggregation, and transmission system) client and send them to the HBase database server.
  • Flume takes the Agent component as its smallest independent operating unit; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.
  • The data cleaning module 502 is configured to use Hadoop (a distributed system infrastructure) to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.
  • The feature extraction module 503 is used to extract feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index from the raw data.
  • The effective feature screening module 504 uses the Pearson correlation coefficient to screen out effective features: it computes the Pearson correlation coefficient between each extracted feature value and the raw data and compares the result with the correlation threshold. Data above the threshold is considered valid; data below the threshold is considered invalid and is removed.
  • The data cleaning module 502 includes a Laida criterion determination unit 5021, which uses the Laida criterion to remove data containing gross errors through the steps described above: for the data values x_i collected by a single Agent, compute the arithmetic mean, the residual errors, and the standard deviation; any value x_b whose residual error exceeds three times the standard deviation is considered a singular value containing a gross error and is removed.
  • The data cleaning module 502 further includes a singular value replacement unit 5022, which replaces each identified singular value in the log data with the median so as to preprocess the production data information.
  • The median here means that the variable values x_1, x_2, ..., x_n are arranged in order of magnitude to form a sequence; the value in the middle of that sequence is called the median.
  • The feature extraction module 503 includes a mean extraction unit 5031, an effective value extraction unit 5032, a peak value extraction unit 5033, a square root amplitude extraction unit 5034, a waveform index extraction unit 5035, an impulse index extraction unit 5036, and a kurtosis index extraction unit 5037, which extract from the raw data the feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index.
  • The square root amplitude, waveform index, impulse index, and kurtosis index are calculated using the formulas given in the detailed description.
  • Here x_i is the log data from a single Agent collection; N is the number of log data collections; x̄ is the arithmetic mean of the collected log data; X_rms is its effective value; X_p is its peak value; X_r is its square root amplitude; X_ws is its waveform index; X_if is its impulse index; and X_kv is its kurtosis index.
  • The Pearson correlation coefficient is used to screen out effective features. Specifically, the Pearson correlation coefficient is computed between each of the above feature values and the raw data, and the result is compared with the correlation threshold: a value above the threshold indicates valid data, while a value below the threshold indicates invalid data that must be removed, so that only valid data is retained. For example, with a correlation threshold of 0.7, if the correlation coefficient between the square root amplitude and the raw data is 0.2, the square root amplitude is invalid data; if the correlation coefficient between the kurtosis index and the raw data is 0.85, the kurtosis index is deemed valid data. The formula for the Pearson correlation coefficient is given above.
  • Here x_i is a data value from a single Agent collection; y_i is a feature value extracted from that data; and N is the number of data collections.
  • The log collection module 501 further includes an Agent setting unit 5011, which is configured to set up multiple first-level Agents and one second-level Agent for Flume; each first-level Agent collects the log data of one server, the log data collected by the first-level Agents is aggregated at the second-level Agent, and the second-level Agent transmits it to HDFS.
  • The specific implementation of the cluster log feature extraction apparatus of this application is substantially the same as that of the cluster log feature extraction method and the electronic device described above, and will not be repeated here.
  • An embodiment of this application also provides a non-volatile computer-readable storage medium.
  • The computer-readable storage medium may be a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, etc., or any combination of these.
  • The non-volatile computer-readable storage medium includes a cluster log feature extraction program, and the cluster log feature extraction program 50 implements the following operations when executed by the processor 22:
  • Step S10: The logs of the server cluster are collected through the Flume client and sent to the HBase database server.
  • Flume takes the Agent component as its smallest independent operating unit; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.
  • Step S30: Hadoop is used to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.
  • Step S50: Feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data.
  • Step S70: The Pearson correlation coefficient is used to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data, and the result is compared with the correlation threshold. Data above the threshold is considered valid; data below the threshold is considered invalid and is removed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A cluster log feature extraction method, and an apparatus, a device and a storage medium. The method comprises: collecting, by a Flume client, the logs of a server cluster and sending them to a database (S10); performing data cleaning on the log data to screen out raw data (S30); extracting feature values of the raw data, comprising the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index (S50); and computing the Pearson correlation coefficient between each extracted feature value and the raw data, comparing the calculated correlation coefficients with a correlation threshold, regarding data whose correlation coefficient is above the threshold as valid, and regarding data whose correlation coefficient is below the threshold as invalid and removing it (S70). Valid information in the production data of each host in a server cluster can be effectively screened out, and feature values of the production data extracted from that information, thereby facilitating failure prediction and failure classification for a production system and reducing the occurrence of production accidents.

Description

Cluster log feature extraction method, apparatus, device and storage medium

This application claims priority to Chinese Patent Application No. 201910123928.1, filed on February 19, 2019, the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to infrastructure operation and maintenance, and in particular to a cluster log feature extraction method, apparatus, device, and storage medium.

Background

In the era of explosive information growth, file sizes and data scales of terabytes or even petabytes have become a reality, and cluster storage systems have grown to clusters of 64 nodes. Managing such a large cluster system has become a severe challenge for data centers, making it especially important to track the running status of cluster nodes in time and to accurately locate node error information. In the actual operation of cluster storage systems, a commonly used log management method can send system logs periodically or in real time, achieving centralized transmission of the logs; however, the logs are neither analyzed nor managed, so the operating status of the entire cluster storage system cannot be understood globally and error messages cannot be located quickly. Moreover, as the number of cluster nodes increases, managing the cluster system becomes more and more complicated. It is therefore particularly important to extract, from massive server data, the features that reflect server performance, to accurately locate potential failures of cluster nodes, and to carry out the corresponding performance checks in advance.

Summary of the Invention

To solve the above problems, this application provides a cluster log feature extraction method, applied to an electronic device, comprising the following steps: collecting the logs of a server cluster through a Flume client and sending them to an HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface; cleaning the log data with Hadoop to filter out the raw data, where the raw data includes at least server disk occupancy, memory usage, CPU occupancy, and business interface call volume; extracting feature values from the raw data, including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index; and using the Pearson correlation coefficient to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data and compared with a correlation threshold, with data above the threshold considered valid and data below the threshold considered invalid and removed.

This application also provides a cluster log feature extraction apparatus, comprising a log collection module, a data cleaning module, a feature extraction module, and an effective feature screening module. The log collection module is used to collect the logs of the server cluster through the Flume client and send them to the HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface. The data cleaning module is used to clean the log data with Hadoop and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume. The feature extraction module is used to extract feature values from the raw data, including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index. The effective feature screening module uses the Pearson correlation coefficient to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data and compared with the correlation threshold; data above the threshold is valid, and data below the threshold is invalid and is removed.

This application also provides an electronic device comprising a memory and a processor, the memory storing a cluster log feature extraction program which, when executed by the processor, implements the following steps: collecting the logs of the server cluster through the Flume client and sending them to the HBase database, where the Flume client collects the logs of each server in the cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface; cleaning the log data with Hadoop to filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume; extracting feature values from the raw data, including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index; and using the Pearson correlation coefficient to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data and compared with the correlation threshold, with data above the threshold considered valid and data below the threshold considered invalid and removed.

This application also provides a non-volatile computer-readable storage medium storing a computer program that includes program instructions which, when executed by a processor, implement the cluster log feature extraction method described above.

This application can effectively screen out the valid information in the production data of each host in a server cluster and extract feature values of the production data from that information, which facilitates failure prediction and failure classification for the production system and reduces the occurrence of production accidents.

Brief Description of the Drawings

The above features and technical advantages of the present application will become clearer and easier to understand from the following description of its embodiments in conjunction with the accompanying drawings.

FIG. 1 is a schematic flowchart of a cluster log feature extraction method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of the hardware architecture of an electronic device according to an embodiment of the present application;

FIG. 3 is a block diagram of a cluster log feature extraction program according to an embodiment of the present application;

FIG. 4 is a unit structure diagram of a log collection module according to an embodiment of the present application;

FIG. 5 is a unit structure diagram of a feature extraction module according to an embodiment of the present application;

FIG. 6 is a unit structure diagram of a data cleaning module according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a Flume Agent process reading data.

Detailed Description

Embodiments of the cluster log feature extraction method, apparatus, and storage medium described in this application will be described below with reference to the accompanying drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various ways, or combinations thereof, without departing from the spirit and scope of this application. Therefore, the drawings and description are illustrative in nature and are not intended to limit the scope of protection of the claims. In addition, in this specification the drawings are not drawn to scale, and the same reference numerals denote the same parts.

As shown in FIG. 1, the cluster log feature extraction method of this embodiment includes the following steps:

Step S10: The logs of the server cluster are collected by the Flume (distributed mass log collection, aggregation, and transmission system) client and sent to the HBase database server. Flume takes the Agent process as its smallest independent operating unit; one Agent process is a complete data collection tool. As shown in FIG. 7, an Agent comprises the components Source (data collection component), Channel (temporary transit storage), and Sink, which together form the Agent: the Source collects data from the server and passes it to the Channel; the Channel stores the Events (data units) passed in by the Source component; and the Sink reads and removes Events from the Channel and passes them to the backend. Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.
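In production, each Agent is driven by Flume configuration rather than custom code, but the collect-and-ship loop that an Agent performs can be illustrated with a minimal stand-in sketch. The sketch below is not Flume itself and does not come from the patent: the backend endpoint, the payload shape, and the use of the psutil and requests libraries are illustrative assumptions.

```python
# A minimal stand-in for one per-server Agent's periodic collect-and-ship loop.
# NOT Flume itself; the endpoint and payload shape are hypothetical.
import time

import psutil    # third-party: pip install psutil
import requests  # third-party: pip install requests

BACKEND_API = "http://hbase-gateway.example.com/api/logs"  # hypothetical endpoint

def collect_once(host: str) -> dict:
    """Gather the raw metrics the method names: disk, memory, CPU.
    Business interface call volume would come from application logs."""
    return {
        "host": host,
        "ts": int(time.time()),
        "disk_pct": psutil.disk_usage("/").percent,
        "mem_pct": psutil.virtual_memory().percent,
        "cpu_pct": psutil.cpu_percent(interval=1),
    }

def run_agent(host: str, period_s: int = 60) -> None:
    """Periodically collect the local metrics and ship them to the backend,
    as each per-server Agent does on its schedule."""
    while True:
        requests.post(BACKEND_API, json=collect_once(host), timeout=10)
        time.sleep(period_s)
```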

Step S30: Hadoop (a distributed system infrastructure) is used to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.

Step S50: Feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data.

Step S70: The Pearson correlation coefficient is used to screen out effective features: the Pearson correlation coefficient is computed between each extracted feature value and the raw data, and the result is compared with the correlation threshold. Data above the threshold is considered valid; data below the threshold is considered invalid and is removed.

Further, during data cleaning, the Laida criterion (the 3σ rule) is used to remove data containing gross errors, through the following steps:

For the log data $x_1, x_2, \ldots, x_n$, compute the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}, \qquad i = 1, 2, \ldots, n,$$

where $x_i$ is the log data from a single Agent collection.

Calculate the standard deviation $S_x$:

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^{2}}$$

If the residual error $v_b$ of some $x_b$ in the log data ($1 \le b \le n$) satisfies

$$|v_b| > 3S_x,$$

then $x_b$ is considered a singular value containing a gross error, and the singular value is removed.

Further, the Laida rule can effectively identify singular values in the production data, but removing them leaves null values. Therefore, each identified singular value in the log data is replaced with the median, preprocessing the production data information. The median here means that the variable values $x_1, x_2, \ldots, x_n$ are arranged in order of magnitude to form a sequence; the value in the middle of that sequence is called the median.
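A compact sketch of this cleaning step follows, assuming the collected values arrive as a plain list of floats and that the sample standard deviation (n−1 denominator) is intended; values flagged by the 3σ test are replaced with the median rather than dropped, so no null values remain.

```python
import statistics

def clean_with_laida(values: list[float]) -> list[float]:
    """Flag gross-error outliers with the Laida (3-sigma) criterion and
    replace each one with the median of the series."""
    mean = statistics.fmean(values)
    s_x = statistics.stdev(values)   # sample std dev, n-1 denominator assumed
    median = statistics.median(values)
    return [x if abs(x - mean) <= 3 * s_x else median for x in values]
```

Note that with the sample standard deviation the largest possible residual is only (n−1)/√n standard deviations, so the 3σ test can only fire once roughly a dozen or more collections are available; the criterion is meant for reasonably long series.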

In an optional embodiment, feature values including the mean, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data, where:

The effective value is calculated using the following formula:

$$X_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}}$$

The peak value is calculated using the following formula: $X_p = \max(x_i)$

The square root amplitude is calculated using the following formula:

$$X_r = \left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^{2}$$

The waveform index is calculated using the following formula:

$$X_{ws} = \frac{X_{rms}}{\dfrac{1}{N}\sum_{i=1}^{N}|x_i|}$$

The impulse index is calculated using the following formula:

$$X_{if} = \frac{X_p}{\dfrac{1}{N}\sum_{i=1}^{N}|x_i|}$$

The kurtosis index is calculated using the following formula:

$$X_{kv} = \frac{\dfrac{1}{N}\sum_{i=1}^{N}x_i^{4}}{X_{rms}^{4}}$$

where $x_i$ is the log data from a single Agent collection; $N$ is the number of log data collections; $\bar{x}$ is the arithmetic mean of the collected log data; $X_{rms}$ is the effective value of the collected log data; $X_p$ is the peak value of the collected log data; $X_r$ is the square root amplitude of the collected log data; $X_{ws}$ is the waveform index of the collected log data; $X_{if}$ is the impulse index of the collected log data; and $X_{kv}$ is the kurtosis index of the collected log data.
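Read literally, the formulas above map to a short routine. The sketch below assumes one cleaned series per metric, provided as a plain list of floats; it is an illustrative reading of the formulas, not code from the patent.

```python
import math

def extract_features(x: list[float]) -> dict[str, float]:
    """Compute the seven feature values defined by the formulas above."""
    n = len(x)
    mean = sum(x) / n                                    # arithmetic mean
    rms = math.sqrt(sum(v * v for v in x) / n)           # effective value X_rms
    peak = max(x)                                        # peak value X_p
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2   # square root amplitude X_r
    abs_mean = sum(abs(v) for v in x) / n                # mean of absolute values
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "square_root_amplitude": sra,
        "waveform_index": rms / abs_mean,                       # X_ws
        "impulse_index": peak / abs_mean,                       # X_if
        "kurtosis_index": (sum(v**4 for v in x) / n) / rms**4,  # X_kv
    }
```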

The Pearson correlation coefficient is used to screen out effective features. Specifically, the Pearson correlation coefficient is computed between each of the above feature values and the raw data, and the calculated correlation coefficient is compared with the correlation threshold: a value above the threshold indicates valid data, while a value below the threshold indicates invalid data that must be removed, so that only valid data is retained. For example, with a correlation threshold of 0.7, if the correlation coefficient between the square root amplitude and the raw data is 0.2, the square root amplitude is invalid data; if the correlation coefficient between the kurtosis index and the raw data is 0.85, the kurtosis index is deemed valid data. The formula for the Pearson correlation coefficient is as follows:

$$r = \frac{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^{2}}}$$

where $x_i$ is a data value from a single Agent collection; $y_i$ is a feature value extracted from the data collected by a single Agent; $\bar{x}$ is the arithmetic mean of the log data $x_1, x_2, \ldots, x_n$; $\bar{y}$ is the arithmetic mean of $y_1, y_2, \ldots, y_n$; and $N$ is the number of log data collections.
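Continuing the sketch, the screening step correlates each feature's per-collection series against the raw series using the formula just given. The 0.7 threshold is taken from the worked example above, and comparing the signed coefficient (rather than its absolute value) against the threshold follows the text as written.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series,
    per the formula above."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(raw: list[float],
                    feature_series: dict[str, list[float]],
                    threshold: float = 0.7) -> dict[str, float]:
    """Keep only the features whose correlation with the raw data is
    above the threshold; the rest are discarded as invalid."""
    kept = {}
    for name, series in feature_series.items():
        r = pearson(raw, series)
        if r > threshold:  # above the correlation threshold: valid
            kept[name] = r
    return kept
```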

In an optional embodiment, Flume includes multiple first-level Agents and one second-level Agent. Each first-level Agent collects the log data of one server; the log data collected by the first-level Agents is aggregated at the second-level Agent, which transmits it to HDFS (a distributed file system).

Referring to FIG. 2, a schematic diagram of the hardware architecture of an embodiment of the electronic device of this application is shown. In this embodiment, the electronic device 2 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers). As shown in FIG. 2, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23, which can be communicatively connected to each other through a system bus. The memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and so on. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, for example, a hard disk or internal memory of the electronic device 2. In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the cluster log feature extraction program code. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example, performing control and processing related to the data interaction or communication of the electronic device 2. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example, to run the cluster log feature extraction program.

The network interface 23 may include a wireless network interface or a wired network interface, and is usually used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to a push platform through a network, establishing a data transmission channel and a communication connection between the electronic device 2 and the push platform. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.

Optionally, the electronic device 2 may also include a display, which may also be called a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) display, or the like. The display is used to show the information processed in the electronic device 2 and to display a visualized user interface.

It should be pointed out that FIG. 2 only shows the electronic device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.

The memory 21, which contains a readable storage medium, may include an operating system, a cluster log feature extraction program 50, and the like. The processor 22 implements the following steps when executing the cluster log feature extraction program 50 in the memory 21:

Step S10: The logs of the server cluster are collected by the Flume (distributed mass log collection, aggregation, and transmission system) client and sent to the HBase database server. Flume takes the Agent component as its smallest independent operating unit; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple Agents: one Agent is set up for each server, periodically collecting the log data on that server and sending it to the backend through the API interface.

Step S30: Hadoop (a distributed system infrastructure) is used to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy, memory usage, CPU occupancy, and business interface call volume.

Step S50: extract from the raw data the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index.

Step S70: use the Pearson correlation coefficient to screen out the effective features. The Pearson correlation coefficient between each extracted feature value and the raw data is calculated and compared with a correlation threshold: data above the threshold is considered valid, and data below the threshold is considered invalid and is removed.

It should be noted that the specific implementation of the electronic device of the present application is substantially the same as the specific implementation of the cluster log feature extraction method described above, and will not be repeated here.

In this embodiment, the cluster log feature extraction program stored in the memory 21 may be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present application. For example, FIG. 3 shows a schematic diagram of the program modules of the cluster log feature extraction program. In this embodiment, the cluster log feature extraction program 50 may be divided into a log collection module 501, a data cleaning module 502, a feature extraction module 503, and an effective feature screening module 504. The program module referred to in this application is a series of computer program instruction segments that can complete specific functions, and is more suitable than a whole program for describing the execution process of the cluster log feature extraction program in the electronic device 2. The cluster log feature extraction method is realized through the specific functions of these program modules.

The present application also provides a cluster log feature extraction apparatus, including a log collection module 501, a data cleaning module 502, a feature extraction module 503, and an effective feature screening module 504.

The log collection module 501 is configured to collect the logs of the server cluster through the flume client and send them to the HBase database server. Flume takes the Agent component as its smallest independently running unit; one Agent component is a complete data collection tool. Flume collects log data from the servers through multiple Agents, one Agent being set up for each server. Each Agent periodically collects the log data on its server and sends it to the backend through an API interface.

The data cleaning module 502 is configured to use Hadoop to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy rate, memory usage rate, CPU occupancy rate, and service interface call volume.

The feature extraction module 503 is configured to extract from the raw data the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index.

The effective feature screening module 504 uses the Pearson correlation coefficient to screen out the effective features: the Pearson correlation coefficient between each extracted feature value and the raw data is calculated and compared with a correlation threshold; data above the threshold is considered valid, and data below the threshold is considered invalid and is removed.

In an optional embodiment, as shown in FIG. 6, the data cleaning module 502 includes a Laida criterion determination unit 5021, which uses the Laida criterion (the 3σ rule) to remove data with gross errors through the following steps:

For the log data $x_1, x_2, \ldots, x_n$, calculate the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}, \quad i = 1, 2, \ldots, n$$

where $x_i$ is the data value collected by a single Agent.

Calculate the standard deviation $S_x$:

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

If the residual error $v_b$ ($1 \le b \le n$) of a data point $x_b$ satisfies

$$|v_b| > 3S_x$$

then $x_b$ is regarded as a singular value containing a gross error, and that singular value is removed.

Further, the data cleaning module 502 also includes a singular value replacement unit 5022. The Laida criterion can effectively identify the singular values in the production data, but removing them leaves null values. The singular value replacement unit 5022 therefore replaces each identified singular value in the log data with the median, completing the preprocessing of the production data. Here the median is obtained by arranging the values $x_1, x_2, \ldots, x_n$ in order of magnitude to form a sequence; the value in the middle position of that sequence is called the median.
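Taken together, the two cleaning units amount to a few lines of code. The following is a minimal Python sketch, assuming the samples of one metric arrive as a list of numbers: values whose residual exceeds three standard deviations are treated as singular values and replaced with the median so that no null values remain.

    import statistics

    def laida_clean(values):
        """Apply the Laida (3-sigma) criterion and replace outliers with the median."""
        n = len(values)
        if n < 2:
            return list(values)
        mean = sum(values) / n
        residuals = [x - mean for x in values]
        # Standard deviation with the (n - 1) denominator, matching S_x above.
        s_x = (sum(v * v for v in residuals) / (n - 1)) ** 0.5
        median = statistics.median(values)
        return [median if abs(x - mean) > 3 * s_x else x for x in values]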

In an optional embodiment, as shown in FIG. 5, the feature extraction module 503 includes a mean value extraction unit 5031, an effective value extraction unit 5032, a peak value extraction unit 5033, a square root amplitude extraction unit 5034, a waveform index extraction unit 5035, an impulse index extraction unit 5036, and a kurtosis index extraction unit 5037, which respectively extract from the raw data the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index, where:

The effective value is calculated with the formula

$$X_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$$

The peak value is calculated with the formula $X_p = \max(x_i)$.

The square root amplitude is calculated with the formula

$$X_r = \left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2$$

The waveform index is calculated with the formula

$$X_{ws} = \frac{X_{rms}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

The impulse index is calculated with the formula

$$X_{if} = \frac{X_p}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

The kurtosis index is calculated with the formula

$$X_{kv} = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i^4}{X_{rms}^4}$$

where $x_i$ is the log data collected by a single Agent; $N$ is the number of log data collections; $\bar{x}$ is the arithmetic mean of the collected log data; $X_{rms}$ is the effective value, $X_p$ the peak value, $X_r$ the square root amplitude, $X_{ws}$ the waveform index, $X_{if}$ the impulse index, and $X_{kv}$ the kurtosis index of the collected log data.
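These seven feature values translate directly into code. The sketch below follows the formulas as reconstructed above, assuming the cleaned samples of one metric are held in a non-empty list x:

    def extract_features(x):
        """Compute the seven feature values for one metric's samples."""
        n = len(x)
        mean = sum(x) / n
        rms = (sum(v * v for v in x) / n) ** 0.5                # effective value
        peak = max(x)                                           # peak value
        sra = (sum(abs(v) ** 0.5 for v in x) / n) ** 2          # square root amplitude
        mean_abs = sum(abs(v) for v in x) / n
        waveform = rms / mean_abs                               # waveform index
        impulse = peak / mean_abs                               # impulse index
        kurtosis = (sum(v ** 4 for v in x) / n) / rms ** 4      # kurtosis index
        return {
            "mean": mean,
            "rms": rms,
            "peak": peak,
            "square_root_amplitude": sra,
            "waveform_index": waveform,
            "impulse_index": impulse,
            "kurtosis_index": kurtosis,
        }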

The Pearson correlation coefficient is used to screen out the effective features. Specifically, the Pearson correlation coefficient between each of the above feature values and the raw data is calculated and compared with a correlation threshold: a feature above the threshold is considered valid data, while a feature below the threshold is considered invalid data and must be removed, so that only the valid data is retained. For example, with a correlation threshold of 0.7, if the correlation coefficient between the square root amplitude and the raw data is 0.2, the square root amplitude is invalid data; if the correlation coefficient between the kurtosis index and the raw data is 0.85, the kurtosis index is deemed valid data. The formula of the Pearson correlation coefficient is as follows:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

where $x_i$ is the data value collected by a single Agent; $y_i$ is a feature value extracted from the data collected by a single Agent; $\bar{x}$ is the arithmetic mean of the log data $x_1, x_2, \ldots, x_n$; $\bar{y}$ is the arithmetic mean of $y_1, y_2, \ldots, y_n$; and $N$ is the number of data collections.
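A sketch of the screening step under the same assumptions: pearson implements the formula above, and each feature series is kept only if its correlation with the raw series reaches the threshold. The 0.7 default mirrors the example in the text; taking the absolute value of negative correlations is an assumption of this sketch.

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length series."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        if sx == 0 or sy == 0:
            return 0.0  # a constant series carries no correlation information
        return cov / (sx * sy)

    def screen_features(raw, feature_series, threshold=0.7):
        """Keep feature series whose |r| against the raw series passes the threshold."""
        return {
            name: series
            for name, series in feature_series.items()
            if abs(pearson(raw, series)) >= threshold
        }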

In an optional embodiment, as shown in FIG. 4, the log collection module 501 also includes an Agent setting unit 5011, configured to set up Flume with multiple first-level Agents and one second-level Agent. Each first-level Agent collects the log data of one server; the log data collected by the multiple first-level Agents is gathered at the second-level Agent and transmitted by the second-level Agent to HDFS.
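To make the two-tier topology concrete, here is a hedged Python sketch of the second-level role only: it takes batches handed over by first-level collectors and appends them to a file in HDFS. It uses the third-party hdfs (WebHDFS) client; the NameNode address, user, and target path are invented for the example, and the original design realizes this role with a Flume Agent rather than custom code.

    from hdfs import InsecureClient  # third-party WebHDFS client, assumed available

    # Hypothetical second-level aggregator configuration.
    client = InsecureClient("http://namenode.example.com:9870", user="flume")

    def forward_batch(batch_lines, hdfs_path="/cluster-logs/current.log"):
        """Append one first-level batch to the aggregated log file in HDFS."""
        data = "".join(batch_lines)
        # append=True assumes the target file already exists in HDFS.
        client.write(hdfs_path, data=data, encoding="utf-8", append=True)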

It should be noted that the specific implementation of the cluster log feature extraction apparatus of the present application is substantially the same as the specific implementations of the cluster log feature extraction method and the electronic device described above, and will not be repeated here.

In addition, an embodiment of the present application also proposes a computer non-volatile readable storage medium, which may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer non-volatile readable storage medium includes a cluster log feature extraction program and the like, and the cluster log feature extraction program 50 implements the following operations when executed by the processor 22:

Step S10: collect the logs of the server cluster through the flume client and send them to the HBase database server. Flume takes the Agent component as its smallest independently running unit; one Agent component is a complete data collection tool. Flume collects log data from the servers through multiple Agents, one Agent being set up for each server. Each Agent periodically collects the log data on its server and sends it to the backend through an API interface.

Step S30: use Hadoop to clean the log data and filter out the raw data, where the raw data includes at least the server disk occupancy rate, memory usage rate, CPU occupancy rate, and service interface call volume.

Step S50: extract from the raw data the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index.

Step S70: use the Pearson correlation coefficient to screen out the effective features. The Pearson correlation coefficient between each extracted feature value and the raw data is calculated and compared with a correlation threshold: data above the threshold is considered valid, and data below the threshold is considered invalid and is removed.

The specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as the specific implementations of the cluster log feature extraction method and the electronic device 2 described above, and will not be repeated here.

The foregoing are only preferred embodiments of the present application and are not intended to limit it; for those skilled in the art, the application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (20)

1. A cluster log feature extraction method, applied to an electronic device, comprising the following steps:

collecting the logs of a server cluster through a flume client and sending them to an HBase database, wherein the flume client collects the logs of each server in the server cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface;

using Hadoop to clean the log data and filter out raw data, wherein the raw data includes at least the server disk occupancy rate, memory usage rate, CPU occupancy rate, and service interface call volume;

extracting from the raw data feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index;

using the Pearson correlation coefficient to screen out effective features: calculating the Pearson correlation coefficient between each extracted feature value and the raw data and comparing it with a correlation threshold, wherein data above the threshold is valid data, and data below the threshold is invalid data and is removed.

2. The cluster log feature extraction method according to claim 1, wherein in the data cleaning process the Laida criterion is used to remove data with gross errors, comprising the following steps:

for the log data $x_1, x_2, \ldots, x_n$, calculating the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}$$

where $x_i$ is the log data collected by a single Agent;

calculating the standard deviation

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

and, if the residual error $v_b$ ($1 \le b \le n$) of $x_b$ in the log data satisfies

$$|v_b| > 3S_x$$

determining that $x_b$ is a singular value containing a gross error and removing the singular value.
3. The cluster log feature extraction method according to claim 2, wherein the singular values of the log data are replaced with the median, the median being obtained by arranging the log data $x_1, x_2, \ldots, x_n$ in order of magnitude, the value in the middle position being called the median.

4. The cluster log feature extraction method according to claim 2, wherein the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index are extracted from the raw data, wherein:

the effective value is calculated with the formula

$$X_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$$

the peak value is calculated with the formula $X_p = \max(x_i)$;

the square root amplitude is calculated with the formula

$$X_r = \left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2$$

the waveform index is calculated with the formula

$$X_{ws} = \frac{X_{rms}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

the impulse index is calculated with the formula

$$X_{if} = \frac{X_p}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

the kurtosis index is calculated with the formula

$$X_{kv} = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i^4}{X_{rms}^4}$$

where $x_i$ is the log data collected by a single Agent; $N$ is the number of data collections; $\bar{x}$ is the arithmetic mean of the collected log data; $X_{rms}$ is the effective value, $X_p$ the peak value, $X_r$ the square root amplitude, $X_{ws}$ the waveform index, $X_{if}$ the impulse index, and $X_{kv}$ the kurtosis index of the collected log data.
5. The cluster log feature extraction method according to claim 2, wherein the formula of the Pearson correlation coefficient is as follows:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

where $x_i$ is the log data collected by a single Agent; $y_i$ is a feature value extracted from the data collected by a single Agent; $\bar{x}$ is the arithmetic mean of the log data $x_1, x_2, \ldots, x_n$; $\bar{y}$ is the arithmetic mean of $y_1, y_2, \ldots, y_n$; and $N$ is the number of log data collections.
6. The cluster log feature extraction method according to claim 1, wherein Flume includes multiple first-level Agents and one second-level Agent, each first-level Agent collects the log data of one server, and the log data collected by the multiple first-level Agents is gathered at the second-level Agent and transmitted by the second-level Agent to HDFS.

7. A cluster log feature extraction apparatus, comprising: a log collection module, a data cleaning module, a feature extraction module, and an effective feature screening module, wherein:

the log collection module is configured to collect the logs of a server cluster through a flume client and send them to an HBase database, wherein the flume client collects the logs of each server in the server cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface;

the data cleaning module is configured to use Hadoop to clean the log data and filter out raw data, wherein the raw data includes at least the server disk occupancy rate, memory usage rate, CPU occupancy rate, and service interface call volume;

the feature extraction module is configured to extract from the raw data feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index;

the effective feature screening module uses the Pearson correlation coefficient to screen out effective features: calculating the Pearson correlation coefficient between each extracted feature value and the raw data and comparing it with a correlation threshold, wherein data above the threshold is valid data, and data below the threshold is invalid data and is removed.

8. The cluster log feature extraction apparatus according to claim 7, wherein the data cleaning module includes a Laida criterion determination unit, which uses the Laida criterion to remove data with gross errors, comprising the following steps:

for the log data $x_1, x_2, \ldots, x_n$, calculating the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}$$

where $x_i$ is the data value collected by a single Agent;

calculating the standard deviation

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

and, if the residual error $v_b$ ($1 \le b \le n$) of the data $x_b$ satisfies

$$|v_b| > 3S_x$$

regarding $x_b$ as a singular value containing a gross error and removing the singular value.
9. The cluster log feature extraction apparatus according to claim 8, wherein the data cleaning module further includes a singular value replacement unit, which replaces the singular values of the log data with the median, the median being obtained by arranging the log data $x_1, x_2, \ldots, x_n$ in order of magnitude, the value in the middle position being called the median.

10. The cluster log feature extraction apparatus according to claim 8, wherein the feature extraction module includes a mean value extraction unit, an effective value extraction unit, a peak value extraction unit, a square root amplitude extraction unit, a waveform index extraction unit, an impulse index extraction unit, and a kurtosis index extraction unit, which respectively extract from the raw data the feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index, wherein:

the effective value is calculated with the formula

$$X_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$$

the peak value is calculated with the formula $X_p = \max(x_i)$;

the square root amplitude is calculated with the formula

$$X_r = \left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2$$

the waveform index is calculated with the formula

$$X_{ws} = \frac{X_{rms}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

the impulse index is calculated with the formula

$$X_{if} = \frac{X_p}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$$

the kurtosis index is calculated with the formula

$$X_{kv} = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i^4}{X_{rms}^4}$$

where $x_i$ is the log data collected by a single Agent; $N$ is the number of log data collections; $\bar{x}$ is the arithmetic mean of the collected log data; $X_{rms}$ is the effective value, $X_p$ the peak value, $X_r$ the square root amplitude, $X_{ws}$ the waveform index, $X_{if}$ the impulse index, and $X_{kv}$ the kurtosis index of the collected log data.
11. The cluster log feature extraction apparatus according to claim 8, wherein the formula of the Pearson correlation coefficient is as follows:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

where $x_i$ is the log data collected by a single Agent; $y_i$ is a feature value extracted from the data collected by a single Agent; $\bar{x}$ is the arithmetic mean of the log data $x_1, x_2, \ldots, x_n$; $\bar{y}$ is the arithmetic mean of $y_1, y_2, \ldots, y_n$; and $N$ is the number of log data collections.
12. The cluster log feature extraction apparatus according to claim 7, wherein the log collection module further includes an Agent setting unit configured to set up Flume with multiple first-level Agents and one second-level Agent, each first-level Agent collecting the log data of one server, and the log data collected by the multiple first-level Agents being gathered at the second-level Agent and transmitted by the second-level Agent to HDFS.

13. An electronic device, comprising a memory and a processor, wherein the memory stores a cluster log feature extraction program which, when executed by the processor, implements the following steps:

collecting the logs of a server cluster through a flume client and sending them to an HBase database, wherein the flume client collects the logs of each server in the server cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface;

using Hadoop to clean the log data and filter out raw data, wherein the raw data includes at least the server disk occupancy rate, memory usage rate, CPU occupancy rate, and service interface call volume;

extracting from the raw data feature values including the mean value, effective value, peak value, square root amplitude, waveform index, impulse index, and kurtosis index;

using the Pearson correlation coefficient to screen out effective features: calculating the Pearson correlation coefficient between each extracted feature value and the raw data and comparing it with a correlation threshold, wherein data above the threshold is valid data, and data below the threshold is invalid data and is removed.

14. The electronic device according to claim 13, wherein in the data cleaning the Laida criterion is used to remove data with gross errors, comprising the following steps:

for the log data $x_1, x_2, \ldots, x_n$, calculating the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}$$

where $x_i$ is the data value collected by a single Agent;

calculating the standard deviation

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

and, if the residual error $v_b$ ($1 \le b \le n$) of $x_b$ in the log data satisfies

$$|v_b| > 3S_x$$

regarding $x_b$ as a singular value containing a gross error and removing the singular value.
15. The electronic device according to claim 14, wherein the singular values in the log data are replaced with the median, the median being obtained by arranging the log data $x_1, x_2, \ldots, x_n$ in order of magnitude, the value in the middle position being called the median.

16. The electronic device according to claim 13, wherein Flume includes multiple first-level Agents and one second-level Agent, each first-level Agent collects the log data of one server, and the log data collected by the multiple first-level Agents is gathered at the second-level Agent and transmitted by the second-level Agent to HDFS.

17. A computer non-volatile readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, implement the cluster log feature extraction method of claim 1.

18. The computer non-volatile readable storage medium according to claim 17, wherein in the data cleaning process the Laida criterion is used to remove data with gross errors, comprising the following steps:

for the log data $x_1, x_2, \ldots, x_n$, calculating the arithmetic mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the residual errors

$$v_i = x_i - \bar{x}$$

where $x_i$ is the log data collected by a single Agent;

calculating the standard deviation

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

and, if the residual error $v_b$ ($1 \le b \le n$) of $x_b$ in the log data satisfies

$$|v_b| > 3S_x$$

determining that $x_b$ is a singular value containing a gross error and removing the singular value.
19. The computer non-volatile readable storage medium according to claim 18, wherein the singular values of the log data are replaced with the median, the median being obtained by arranging the log data $x_1, x_2, \ldots, x_n$ in order of magnitude, the value in the middle position being called the median.

20. The computer non-volatile readable storage medium according to claim 17, wherein Flume includes multiple first-level Agents and one second-level Agent, each first-level Agent collects the log data of one server, and the log data collected by the multiple first-level Agents is gathered at the second-level Agent and transmitted by the second-level Agent to HDFS.
PCT/CN2019/118288 2019-02-19 2019-11-14 Cluster log feature extraction method, and apparatus, device and storage medium Ceased WO2020168756A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910123928.1 2019-02-19
CN201910123928.1A CN109992569A (en) 2019-02-19 2019-02-19 Cluster log feature extracting method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2020168756A1 true WO2020168756A1 (en) 2020-08-27

Family

ID=67129790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118288 Ceased WO2020168756A1 (en) 2019-02-19 2019-11-14 Cluster log feature extraction method, and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN109992569A (en)
WO (1) WO2020168756A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992569A (en) * 2019-02-19 2019-07-09 平安科技(深圳)有限公司 Cluster log feature extracting method, device and storage medium
CN110737648B (en) * 2019-09-17 2024-05-07 平安科技(深圳)有限公司 Performance feature dimension reduction method and device, electronic equipment and storage medium
CN111290916B (en) * 2020-02-18 2022-11-25 深圳前海微众银行股份有限公司 Big data monitoring method, device, equipment and computer-readable storage medium
CN111984499B (en) * 2020-08-04 2024-05-28 中国建设银行股份有限公司 Fault detection method and device for big data cluster
CN112069036B (en) * 2020-11-10 2021-09-03 南京信易达计算技术有限公司 Management and monitoring system based on cluster computing
CN113945684A (en) * 2021-10-14 2022-01-18 中国计量科学研究院 Big data-based micro air station self-calibration method
CN114911843A (en) * 2022-05-11 2022-08-16 中国平安人寿保险股份有限公司 Service index reporting method, device and computer readable storage medium
CN117056182B (en) * 2023-07-13 2024-05-03 北京新数科技有限公司 SQL SERVER database performance evaluation method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356550B1 (en) * 2001-06-25 2008-04-08 Taiwan Semiconductor Manufacturing Company Method for real time data replication
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN106570151A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Data collection processing method and system for mass files
CN106845799A (en) * 2016-12-29 2017-06-13 中国电力科学研究院 A kind of appraisal procedure of battery energy storage system typical condition
CN107092592A (en) * 2017-04-10 2017-08-25 浙江鸿程计算机系统有限公司 A kind of personalized method for recognizing semantics in the place based on type multiple-situation data and cost-sensitive integrated model
CN109992569A (en) * 2019-02-19 2019-07-09 平安科技(深圳)有限公司 Cluster log feature extracting method, device and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9904893B2 (en) * 2013-04-02 2018-02-27 Patternex, Inc. Method and system for training a big data machine to defend
CN105353644B (en) * 2015-09-29 2018-06-15 中国人民解放军63892部队 Radar Target Track flavor and method based on real-equipment data information excavating
US10394868B2 (en) * 2015-10-23 2019-08-27 International Business Machines Corporation Generating important values from a variety of server log files
CN106769032B (en) * 2016-11-28 2018-11-02 南京工业大学 Method for predicting service life of slewing bearing
CN108399199A (en) * 2018-01-30 2018-08-14 武汉大学 A kind of collection of the application software running log based on Spark and service processing system and method
CN109032910A (en) * 2018-07-24 2018-12-18 北京百度网讯科技有限公司 Log collection method, device and storage medium
CN109033404B (en) * 2018-08-03 2022-03-11 北京百度网讯科技有限公司 Log data processing method, device and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356550B1 (en) * 2001-06-25 2008-04-08 Taiwan Semiconductor Manufacturing Company Method for real time data replication
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN106570151A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Data collection processing method and system for mass files
CN106845799A (en) * 2016-12-29 2017-06-13 中国电力科学研究院 A kind of appraisal procedure of battery energy storage system typical condition
CN107092592A (en) * 2017-04-10 2017-08-25 浙江鸿程计算机系统有限公司 A kind of personalized method for recognizing semantics in the place based on type multiple-situation data and cost-sensitive integrated model
CN109992569A (en) * 2019-02-19 2019-07-09 平安科技(深圳)有限公司 Cluster log feature extracting method, device and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113867142A (en) * 2021-09-09 2021-12-31 北京小米移动软件有限公司 Sensor control method, device, electronic device and storage medium
CN114201304A (en) * 2021-12-15 2022-03-18 平安科技(深圳)有限公司 Application running method, device, device and storage medium
CN114201304B (en) * 2021-12-15 2024-11-12 平安科技(深圳)有限公司 Application program operation method, device, equipment and storage medium
CN114689916A (en) * 2022-03-31 2022-07-01 国网河北省电力有限公司营销服务中心 Intelligent electric energy meter metering error analysis system
CN114840566A (en) * 2022-04-06 2022-08-02 西人马(深圳)科技有限责任公司 A method and system for wind vibration data management based on two-level architecture
CN114840566B (en) * 2022-04-06 2025-05-30 合肥新理科技有限公司 A wind vibration data management method and system based on two-level architecture
CN117171594A (en) * 2023-09-04 2023-12-05 中国建设银行股份有限公司 Indicator configuration method, device, electronic equipment and computer-readable medium
CN119249121A (en) * 2024-08-26 2025-01-03 中国建设银行股份有限公司 Feature screening method, device, equipment and medium

Also Published As

Publication number Publication date
CN109992569A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
WO2020168756A1 (en) Cluster log feature extraction method, and apparatus, device and storage medium
US12155693B1 (en) Rapid predictive analysis of very large data sets using the distributed computational graph
US9590880B2 (en) Dynamic collection analysis and reporting of telemetry data
JP2022160405A (en) ALARM LOG COMPRESSION METHOD, APPARATUS AND SYSTEM, AND STORAGE MEDIUM
WO2021051529A1 (en) Method, apparatus and device for estimating cloud host resources, and storage medium
US20210092160A1 (en) Data set creation with crowd-based reinforcement
US10860962B2 (en) System for fully integrated capture, and analysis of business information resulting in predictive decision making and simulation
KR102301946B1 (en) Visual tools for failure analysis in distributed systems
CN104598495A (en) Hierarchical storage method and system based on distributed file system
US11636549B2 (en) Cybersecurity profile generated using a simulation engine
US11372904B2 (en) Automatic feature extraction from unstructured log data utilizing term frequency scores
CA3167981C (en) Offloading statistics collection
CN104917836A (en) Method and device for monitoring and analyzing availability of computing equipment based on cluster
EP3440569A1 (en) System for fully integrated capture, and analysis of business information resulting in predictive decision making and simulation
CN112015995B (en) Method, device, equipment and storage medium for data analysis
CN110519263A (en) Anti- brush amount method, apparatus, equipment and computer readable storage medium
US20150281037A1 (en) Monitoring omission specifying program, monitoring omission specifying method, and monitoring omission specifying device
Lee et al. Detecting anomaly teletraffic using stochastic self-similarity based on Hadoop
CN112597490A (en) Security threat arrangement response method and device, electronic equipment and readable storage medium
CN119621386A (en) Fault diagnosis method, device, electronic equipment and storage medium
CN118820026A (en) Cloud service cluster status monitoring method, device, equipment and storage medium
US20120233224A1 (en) Data processing
CN116132111B (en) Attack identification method and device based on mouse track data in network traffic
JP6508202B2 (en) INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
WO2025027655A1 (en) Method and system for data management in a network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19915659

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19915659

Country of ref document: EP

Kind code of ref document: A1