CN114036120A - A real-time analysis method and system based on massive log data

Info

Publication number: CN114036120A
Application number: CN202111298565.9A
Authority: CN
Inventors: 王宜才; 丁正; 顾晓东; 祝敬安; 韦红; 刘志永; 卢亚洲; 高树江; 邢喜云
Original assignee: Shanghai Xinfang Software Co ltd; Shanghai Cintel Intelligent System Co ltd
Current assignee: Shanghai Xinfang Software Co ltd; Shanghai Cintel Intelligent System Co ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2022-02-11

Abstract

The present application discloses a real-time analysis method based on massive log data. The method includes: acquiring input query conditions; combining the query conditions into keywords; searching the stored big data according to the keywords; and analyzing the resulting set according to a set analysis strategy to obtain an analysis result. The big data is obtained as follows: the data associated with the business in each business process of each service point is recorded to generate log data for each service point; the log data of each service point is collected in real time and cached; the cached log data is filtered and structured to obtain big data for persistent storage; and the big data is stored. The present application enables operators to analyze call logs in real time, thereby minimizing log analysis time and locating problems quickly.

Description

A real-time analysis method and system based on massive log data

Technical Field

The present invention relates to the field of communications and, in particular, to a real-time analysis method based on massive log data.

Background

With the development of communication technology, more and more users communicate by means of mobile phones, fixed-line telephones, networks, and other communication technologies. In use, this involves invoking various voice, video, and audio devices.

Users want a safe and stable call environment in which abnormal calls are reduced or eliminated. Operators likewise want to eliminate abnormal calls at the source through technical means, to give the public a safe and stable call environment, and to analyze and summarize users' usage habits in order to improve the user experience.

Current storage and analysis techniques for service logs fall mainly into the following categories:

1. Implementation through text files

A text file is saved at each service point. The text file includes the time, service name, machine name, automaton number, calling number, called number, original called number, message type, and content. When analysis is needed, the text files of each service point are scanned, the required information is filtered out, and the results are then analyzed and organized.

2. Implementation through a relational database

The service logs at each service point are aggregated into a relational database in real time, and the data is indexed as required. When analysis is needed, a full scan and query is run over the data that matches conditions such as time period and call number to obtain the required information.

3. Implementation based on a combination of a relational database and text files

The data transmitted from each service point is classified and summarized by calling number, called number, service name, and automaton number, and each call is assigned a unique task identifier (ID). The different data of each call is then stored in separate data files, and the database stores the correspondence between the calling number, called number, service name, automaton number, task ID, and the data files. In use, the task ID is found according to the query conditions, the list of data files is obtained, all the files are read, and the data is aggregated for analysis and display.

Here, a service point is any of the telecommunication devices and application software used in the communication process.

The above methods all have drawbacks, as follows:

The text-file method must read and query the files one by one, so data analysis lags considerably.

In the relational-database method, all data resides in the relational database. For ever-growing massive logs, for example logs numbering in the hundreds of millions, both index building and analytical queries respond slowly.

The method combining a relational database with text files must preprocess every record, classifying and summarizing it by calling number, called number, service name, and automaton number, so the data preprocessing node becomes a bottleneck.

In addition, all of the above methods share the following drawbacks:

1. Data storage is inconvenient to manage. Whether the logs are kept as files or in a database, after they accumulate for some time their sheer volume makes storage and backup harder, and improper handling causes data loss.

2. With massive logs, data query responses lag severely and cannot meet the basic requirement of real-time data analysis.

Summary of the Invention

The present invention provides a real-time analysis method based on massive log data to reduce log analysis time.

The real-time analysis method based on massive log data provided by the present invention is implemented as follows:

acquiring input query conditions and combining the query conditions into keywords,

searching the stored big data according to the keywords,

analyzing the resulting set according to a set analysis strategy to obtain an analysis result;

wherein,

the big data is obtained as follows:

recording the data associated with the business in each business process of each service point to generate log data for each service point,

collecting the log data of each service point in real time and caching it,

filtering and structuring the cached log data to obtain big data for persistent storage,

storing the big data.
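The query side of the method (conditions to keywords, keyword search, analysis of the result set) can be illustrated with a minimal Python sketch. The record fields, the exact-match search, and the message-type count used as the "analysis strategy" are all assumptions made for illustration; the source does not specify them.

```python
from collections import Counter

# Hypothetical structured log records standing in for the stored big data.
STORED_LOGS = [
    {"call_id": "c1", "calling": "13800000001", "called": "02112345678",
     "service": "voice", "msg_type": "INVITE"},
    {"call_id": "c1", "calling": "13800000001", "called": "02112345678",
     "service": "voice", "msg_type": "BYE"},
    {"call_id": "c2", "calling": "13800000002", "called": "02112345678",
     "service": "video", "msg_type": "INVITE"},
]


def build_keywords(conditions):
    """Combine query conditions into a keyword dict, dropping empty values."""
    return {field: value for field, value in conditions.items()
            if value not in (None, "")}


def search(records, keywords):
    """Return the records whose fields match every keyword exactly."""
    return [r for r in records
            if all(r.get(k) == v for k, v in keywords.items())]


def analyze(result_set):
    """Assumed analysis strategy: count message types in the result set."""
    return Counter(r["msg_type"] for r in result_set)


keywords = build_keywords({"calling": "13800000001", "service": "voice", "called": ""})
result = analyze(search(STORED_LOGS, keywords))
```

Here the empty `called` condition is dropped, so the search is restricted only by the calling number and the service.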

Preferably, the big data is stored in a distributed cluster, wherein hot data is stored on a first distributed storage device, all data is stored as cold data on a second distributed storage device, and cold data that has been accessed is transferred to the first distributed storage device as hot data, the performance of the first distributed storage device being higher than that of the second distributed storage device,

and searching the stored big data according to the keywords includes:

searching the hot data first,

and searching the cold data only if nothing is found in the hot data.
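The hot-first lookup with promotion of accessed cold data might look like the following sketch. Two in-memory dictionaries stand in for the first and second distributed storage devices, and the class and method names are hypothetical.

```python
class TieredStore:
    """Two-tier lookup: a hot tier in front of a complete cold tier."""

    def __init__(self):
        self.hot = {}   # stands in for the first (higher-performance) device
        self.cold = {}  # stands in for the second device; holds all data

    def put(self, call_id, records, hot=True):
        # All data is written to the cold tier; recent data also goes to the hot tier.
        self.cold[call_id] = records
        if hot:
            self.hot[call_id] = records

    def search(self, call_id):
        # Search the hot tier first.
        if call_id in self.hot:
            return self.hot[call_id], "hot"
        # Fall back to the cold tier, then promote what was accessed.
        if call_id in self.cold:
            self.hot[call_id] = self.cold[call_id]
            return self.cold[call_id], "cold"
        return None, "miss"


store = TieredStore()
store.put("c1", ["INVITE", "BYE"], hot=False)
first = store.search("c1")    # served from the cold tier and promoted
second = store.search("c1")   # served from the hot tier
```

The sketch omits eviction from the hot tier, which the source does not describe.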

Optionally, the method further includes displaying the analysis result graphically or in a table, wherein the graphical display is a time-sequenced two-dimensional flowchart in which each step shows the detailed information of that step.

Optionally, collecting the log data of each service point in real time and caching it includes:

collecting the log data of each service point from that service point,

caching the log data in the corresponding queues in a distributed cluster according to the type of the log data, and managing the cached data in a producer-consumer mode, wherein cached data is deleted once it has been used by all consumers.
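The rule that a cached record is deleted only after every consumer has used it can be sketched as follows. This is a single-process illustration with hypothetical names (`TopicQueue`, the consumer names `filter` and `audit`), not a distributed implementation.

```python
class TopicQueue:
    """Cache queue for one log type; a record is dropped once all consumers have read it."""

    def __init__(self, consumers):
        self.consumers = set(consumers)
        self.records = {}       # offset -> record
        self.pending = {}       # offset -> consumers that have not yet read the record
        self.next_offset = 0

    def publish(self, record):
        """Producer side: append a record and mark it unread for every consumer."""
        offset = self.next_offset
        self.records[offset] = record
        self.pending[offset] = set(self.consumers)
        self.next_offset += 1
        return offset

    def consume(self, consumer):
        """Return the oldest record this consumer has not read, or None."""
        for offset in sorted(self.pending):
            if consumer in self.pending[offset]:
                record = self.records[offset]
                self.pending[offset].discard(consumer)
                if not self.pending[offset]:
                    # Every consumer has used the record, so delete it.
                    del self.pending[offset]
                    del self.records[offset]
                return record
        return None


# One queue per log type, as in "cached in the corresponding queues".
queues = {"voice": TopicQueue(["filter", "audit"])}
queues["voice"].publish("call c1 INVITE")
a = queues["voice"].consume("filter")
remaining_after_one = len(queues["voice"].records)
b = queues["voice"].consume("audit")
remaining_after_all = len(queues["voice"].records)
```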

The present invention also provides a real-time analysis system based on massive log data, the system comprising:

a log data generating device, configured to record the data associated with the business in each business process of each service point and generate log data for each service point,

a log data collection device, configured to collect the log data of each service point in real time,

a big data cache device, configured to cache the collected log data of each service point,

a data filtering confirmation device, configured to filter and structure the cached log data to obtain big data for persistent storage,

a big data storage device, configured to store the big data,

a log data analysis device, configured to acquire input query conditions, combine the query conditions into keywords, search the stored big data according to the keywords, and analyze the resulting set according to a set analysis strategy to obtain an analysis result.

Preferably, the system further includes

an analysis result display device, configured to display the analysis result graphically or in a table, wherein the graphical display is a time-sequenced two-dimensional flowchart in which each step shows the detailed information of that step.

Optionally, the log data generating device includes:

a log data sorting unit, configured to interact with each service point according to a standard protocol, organize the data provided by each service point, and generate log data;

a log data file unit, configured to write the log data generated by the log data sorting unit to files in a form suitable for reading;

the log data collection device includes:

a log data collection unit, configured to track log files and provide event data to the log data aggregation unit;

a log data aggregation unit, configured to process the data collected by the log data collection unit and transmit it to the first log data output unit at least once, and, if the first log data output unit is blocked and has not acknowledged all transmitted events, to keep trying to send data to the first log data output unit until the first log data output unit acknowledges that the events have been received;

a first log data output unit, configured to interact with the big data cache device, obtain data from the log data aggregation unit, transmit the obtained data to the big data cache device, and, on receiving an acknowledgement event from the big data cache device, return it to the log data aggregation unit.

Optionally, the big data cache device includes:

a producer unit, configured to interact with the first log data output unit, obtain log data, and publish the data to the corresponding data queue unit according to its type;

a data queue unit, configured to receive and store the data sent by the producer unit, interact with the consumer unit, and read data from the queue for the consumer unit;

a consumer unit, configured to interact with the data queue unit, read data from the queue, and transmit the data to the data filtering confirmation device;

the data filtering confirmation device includes:

a log data input unit, configured to interact with the big data cache device and obtain data from it as a continuous stream;

a log data filtering unit, configured to apply any one or any combination of filtering, reorganization, and confirmation to the data, and to convert it into a common format;

a second log data output unit, configured to output the data generated by the log data filtering unit to the big data storage device;

the big data storage device includes:

a first data interface unit, configured to handle data input for storage and query output for the big data;

a search and analysis unit, configured to retrieve the corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit;

a data storage unit, configured to store the log data.

Optionally, the log data analysis device includes:

a second data interface unit, configured to interact with the big data storage device, provide a unified read interface for querying the data in the big data storage device, and obtain the data that the data analysis unit needs for its calculations;

a data analysis unit, configured to interact with the second data interface unit and the query interface unit and to provide real-time computing and data mining capabilities;

a query interface unit, configured to interact with the analysis result display device, analyze the query conditions to form a suitable query statement, pass the statement to the data analysis unit for querying, organize the query results into a result set in a specified format, and return the result set to the analysis result display device;

the analysis result display device includes:

a query control unit, configured to start and stop log data query tasks, accept the input of query conditions, and check that the conditions are well formed,

a query request unit, configured to receive query requests from the query control unit, interact with the log data analysis device to obtain the data that the data processing unit needs for its calculations, and interact with the data processing unit to hand the queried data over for processing;

a data processing unit, configured to convert and/or process the data from the query request unit as required by the query conditions;

a result display unit, configured to display the log data organized by the data processing unit in graphical or tabular form.

Optionally, the big data cache device is a distributed cluster cache device, and the device further includes:

a first cluster management unit, configured for cluster management of the producer unit, the data queue unit, and the consumer unit;

the big data storage device is a distributed cluster storage device, and the device further includes:

a second cluster management unit, configured to manage the individual nodes and provide the search and analysis function on all nodes.

The present application aggregates the log data generated at each service point into big data for storage and searches the big data according to query conditions. This improves the efficiency of finding the causes of abnormal calls in service logs and avoids the problems of untimely log storage and slow analysis. Once deployed by an operator, the real-time analysis method and system based on massive log data of the present application enable the operator to analyze call logs in real time, which minimizes log analysis time, locates problems quickly, spares operators and users from losses, and provides telephone users with a safe call environment. For example, faults that telephone users encounter during calls can be located in real time and handled promptly, which greatly reduces the number of faulty calls in the communication network, reduces the losses that telephone users and operators incur from telephone faults, and improves the operator's quality of service.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a real-time analysis system based on massive logs according to an embodiment of the present application.

FIG. 2 is a schematic diagram of the log data generating device.

FIG. 3 is a schematic diagram of the log data collection device.

FIG. 4 is a schematic diagram of the big data cache device.

FIG. 5 is a schematic diagram of the data filtering confirmation device.

FIG. 6 is a schematic diagram of the big data storage device.

FIG. 7 is a schematic diagram of the log data analysis device.

FIG. 8 is a schematic diagram of the analysis result display device.

FIG. 9 is a schematic flowchart of the real-time analysis method based on massive logs of the present application.

Detailed Description

To make the objectives, technical means, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings.

FIG. 1 is a schematic structural diagram of a real-time analysis system based on massive logs according to an embodiment of the present application. The system includes a log data generating device that outputs log data in a formatted form, a log data collection device that collects log data in real time, a big data cache device that stores log data temporarily, a data filtering confirmation device that organizes, filters, and confirms log data, a big data storage device that stores log data persistently, a log data analysis device that analyzes log data, and an analysis result display device that responds to queries, wherein:

The log data generating device is configured to implement the log data output function at each service point. Specifically, it generates log files in a specified format, at a specified time interval, and in a specified path. The log file records in detail the service modules used during the user's communication, the flow through those service modules, and the content and status of the information passed between them.

The log data collection device is configured to interact with the log data generating devices located at the service points, perform preliminary organization of the log data collected from them, and generate standard, unified log data, that is, structured log data. It also interacts with the big data cache device and pushes the structured log data to the big data cache device for storage. The log data collection device obtains a large amount of log data in real time from the log data generating devices of the service nodes. In a telecommunication call service, for example, a single call produces multiple log records, and the number grows as the call lengthens. If the data were not processed, finding all the records of one call in the massive data set later would be very cumbersome. Therefore, when log data is recorded, a unique identifier is generated for each call from the time, service, numbers, and so on, and is recorded in every line of log data. Log data also contains cases such as the data for one action being stored across several lines, or fields being empty. The log data collection device tidies up these cases and submits data in a uniform format to the big data cache device.
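A minimal sketch of the per-call identifier and the tidying of empty fields follows. The source says only that the identifier is generated from the time, service, numbers, and so on; hashing those values with SHA-256, the comma-separated line layout, and the `-` placeholder for empty fields are assumptions made here for illustration.

```python
import hashlib


def make_call_id(start_time, service, calling, called):
    """Derive a stable identifier for one call from its time, service, and numbers.

    Assumes the combination of these four values does not repeat across calls.
    """
    key = "|".join([start_time, service, calling, called])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def normalize(raw_lines, call_id, fields=("time", "msg_type", "content")):
    """Tag every line with the call ID and fill empty fields with a placeholder."""
    records = []
    for line in raw_lines:
        values = line.split(",")
        record = {"call_id": call_id}
        for i, name in enumerate(fields):
            value = values[i].strip() if i < len(values) else ""
            record[name] = value or "-"
        records.append(record)
    return records


call_id = make_call_id("2021-11-04T10:00:00", "voice", "13800000001", "02112345678")
records = normalize(["10:00:00,INVITE,setup", "10:00:05,BYE,"], call_id)
```

Because every record of the call carries the same `call_id`, all the records of one call can later be retrieved with a single indexed lookup.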

The big data cache device is responsible for the temporary storage and relay of log data. It interacts with the log data collection device, receives the log data that device reports, and stores the log data in real time. It also interacts with the data filtering confirmation device, which reads log data from the cache; after log data has been read successfully, the big data cache device releases its stored copy of that data. The big data cache device can use a subscription model and serve data to multiple subscribers at the same time. A record is removed from the cache queue only after every subscriber to that record has "consumed" it, which safeguards the log data. After the data filtering confirmation device obtains log data from the cache, it converts the data into the format of the big data storage device and filters out useless information.

The data filtering confirmation device is configured to interact with the big data cache device, obtain log data from it, filter, reorganize, and confirm the obtained log data, and deliver the new log data generated after confirmation to the big data storage device.

The big data storage device is the core device of the system. It interacts with the data filtering confirmation device and persistently stores the log data received from it. It also interacts with the log data analysis device, which queries it for data. The big data storage device indexes the data and stores it in a distributed manner. For example, it indexes the unique identifier of each call in the log data and classifies the data as hot or cold according to time, storing the two classes separately. Recent and frequently used data is treated as hot data and stored on the high-performance first distributed storage device. All data is also stored as cold data on the second distributed storage device, whose storage performance is ordinary. After cold data has been accessed, it is transferred to the first distributed storage device as hot data. This addresses both the real-time requirement and the cost of storing massive data.

The log data analysis device is configured to interact with the big data storage device, obtain real-time log data, analyze and summarize the log data, and send the result set to the analysis result display device.

The analysis result display device is configured to interact with the log data analysis device, send query requests to it, obtain the result set it returns, and present the result set visually in graphical or tabular form.

FIG. 2 is a schematic diagram of the log data generating device. The log data generating device consists of a log data sorting unit and a log data file unit, wherein:

The log data sorting unit is configured to interact with each service point according to a standard protocol, organize the communication message data provided by each service point, and generate log data containing the time, service type, automaton number, unique call code, host number, called number, message data, and other content;

The log data file unit is configured to write the log data generated by the log data sorting unit to files in a form suitable for reading. The log data file unit can roll files over by time period, such as per minute, per hour, or per day, and can produce files with various suffixes such as txt and log. It also supports data sources in multiple formats, so that multiple services can use the log data for real-time analysis at the same time.
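Rolling files over by time period can be illustrated with a small naming helper. The `<service>_<timestamp>.<suffix>` pattern and the function name are hypothetical; the source specifies only the periods (minute, hour, day) and the suffixes (txt, log).

```python
from datetime import datetime

# strftime patterns that truncate a timestamp to the chosen period.
PERIOD_FORMATS = {"minute": "%Y%m%d%H%M", "hour": "%Y%m%d%H", "day": "%Y%m%d"}


def log_file_name(service, when, period="hour", suffix="log"):
    """Build the file name for the period that contains the given time."""
    stamp = when.strftime(PERIOD_FORMATS[period])
    return f"{service}_{stamp}.{suffix}"


hourly = log_file_name("voice", datetime(2021, 11, 4, 10, 30), "hour")
daily = log_file_name("voice", datetime(2021, 11, 4, 10, 30), "day", suffix="txt")
```

All records written within the same period map to the same file name, so a new file starts whenever the period changes.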

FIG. 3 is a schematic diagram of the log data collection device. The log data collection device consists of a log data collection unit, a log data aggregation unit, and a first log data output unit, wherein:

The log data aggregation unit is configured to process the data collected by the log data collection unit and transmit it to the first log data output unit at least once, which ensures that no data is lost. If the first log data output unit is blocked and has not acknowledged all transmitted events, the log data aggregation unit keeps trying to send data to the first log data output unit until the first log data output unit acknowledges that the events have been received. By using data filtering and organizing techniques, it can be configured to collect files in multiple formats and to merge multi-line data.

The first log data output unit is configured to interact with the big data cache device, obtain data from the log data aggregation unit, and transmit the obtained data to the big data cache device. On receiving an acknowledgement event from the big data cache device, it returns the acknowledgement to the log data aggregation unit, which completes one data transfer.
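The resend-until-acknowledged behaviour can be sketched as below. `FlakyOutput` is a hypothetical stand-in for an output unit that is blocked for its first few sends, and the `max_attempts` bound is an assumption added so that the sketch always terminates; the source describes retrying until an acknowledgement arrives.

```python
class FlakyOutput:
    """Hypothetical output unit that is blocked for its first few sends."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, event):
        if self.failures > 0:
            self.failures -= 1
            return False           # blocked: no acknowledgement
        self.delivered.append(event)
        return True                # acknowledged


def deliver_at_least_once(events, output, max_attempts=10):
    """Resend each event until it is acknowledged; return the total number of sends."""
    sends = 0
    for event in events:
        for _ in range(max_attempts):
            sends += 1
            if output.send(event):
                break
        else:
            # Assumed safeguard: give up after max_attempts unacknowledged sends.
            raise RuntimeError(f"no acknowledgement for {event!r}")
    return sends


output = FlakyOutput(failures=2)
sends = deliver_at_least_once(["e1", "e2"], output)
```

A production sender would normally also wait between attempts, and the receiver would need to tolerate duplicates, since an event whose acknowledgement is lost is sent again.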

FIG. 4 is a schematic diagram of the big data cache device. The big data cache device consists of a producer unit, a data queue unit, a consumer unit, and a first cluster management unit, wherein:

The producer unit is configured to interact with the first log data output unit of the log data collection device, obtain log data, and publish the data to a data queue unit. Log data is published to the data queue unit that corresponds to its type. For example, the log data of different services is classified under different keywords so that each is transmitted and stored independently.

The data queue unit is configured to receive and store the data sent by the producer unit, interact with the consumer unit, and read data from the queue; the same data can be used by multiple consumer units;

The first cluster management unit: the big data cache device is a distributed, partitioned system, and the cluster management unit handles cluster management of the producer unit, the data queue unit, and the consumer unit.

In this way, the big data cache device stores and manages log data in a producer-consumer mode: the log data collection device acts as a producer and stores log data in the big data cache device, and the data filtering confirmation device acts as a consumer and consumes data from the big data cache device.

FIG. 5 is a schematic diagram of the data filtering confirmation device. The data filtering confirmation device performs the final tidying of log data before persistent storage. It obtains log data from the consumer unit of the big data cache device, organizes, filters, and confirms it according to a predetermined format, and then passes it to the big data storage device used for persistent storage. The data filtering confirmation device consists of a log data input unit, a log data filtering unit, and a second log data output unit, wherein:

The log data input unit is configured to interact with the big data cache device and obtain data of all kinds from it as a continuous stream.

The log data filtering unit is configured to process and transform data: it parses each event, identifies named fields to build structure, and converts the events into a common format.
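Parsing events into named fields might be sketched with a regular expression that uses named groups. The space-separated line layout and the field names are assumptions for illustration; lines that do not match are treated as unusable and filtered out.

```python
import re

# Assumed line layout: "<time> <service> <call_id> <calling> <called> <message>"
LINE_PATTERN = re.compile(
    r"(?P<time>\S+) (?P<service>\S+) (?P<call_id>\S+) "
    r"(?P<calling>\d+) (?P<called>\d+) (?P<message>.*)"
)


def to_common_format(line):
    """Parse one event line into a dict of named fields, or None if it does not match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None  # filtered out as unusable
    return match.groupdict()


def filter_events(lines):
    """Keep only the lines that parse, in the common structured form."""
    parsed = (to_common_format(line) for line in lines)
    return [event for event in parsed if event is not None]


events = filter_events([
    "2021-11-04T10:00:00 voice c1 13800000001 02112345678 INVITE sent",
    "corrupted line",
])
```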

第二日志数据输出单元，用于将日志数据过滤单元生成是数据输出到大数据存储装置中。The second log data output unit is configured to output the data generated by the log data filtering unit to the big data storage device.

Referring to FIG. 6, FIG. 6 is a schematic diagram of the big data storage device. The big data storage device stores log data persistently and provides search and analysis functions. The big data storage device consists of a first data interface unit, a search analysis unit, and a data storage unit, where:

The first data interface unit is used for data input and storage to, and query output from, the big data storage device.

The search analysis unit is used to obtain the corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit.

The data storage unit is used to store the log data. Preferably, the data is classified and summarized so that, when it is persisted, all data in the same group is given one globally unique label, which greatly improves query speed.

Where the big data storage device is a cluster of multiple nodes, it further includes a second cluster management unit, which manages the nodes and provides the search and analysis functions on all of them.
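The idea of giving every record in a group the same globally unique label can be illustrated with a small in-memory store. This is a sketch, not the storage device of the application: the class `LabeledStore`, the use of a UUID as the label, and the `group_key` argument are all assumptions.

```python
import uuid
from collections import defaultdict


class LabeledStore:
    """Toy in-memory store that tags every record of a group with one shared label."""

    def __init__(self):
        self._label_by_group = {}                    # group key -> label
        self._records_by_label = defaultdict(list)   # label -> records

    def put(self, group_key, record):
        label = self._label_by_group.get(group_key)
        if label is None:
            # Assumption: a random UUID serves as the globally unique label.
            label = uuid.uuid4().hex
            self._label_by_group[group_key] = label
        self._records_by_label[label].append(record)
        return label

    def get_group(self, group_key):
        """Fetch a whole group by label lookup instead of scanning all records."""
        label = self._label_by_group.get(group_key)
        if label is None:
            return []
        return list(self._records_by_label[label])
```

Looking a group up by its label touches only that group's records, which is the reason such a label can speed up queries.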

Referring to FIG. 7, FIG. 7 is a schematic diagram of the log data analysis device. This device arranges the data query conditions, searches the big data storage device using the optimal conditions, and processes the search results as required. The log data analysis device consists of a second data interface unit, a data analysis unit, and a query interface unit, where:

The second data interface unit interacts with the big data storage device, provides a unified read interface for querying data in the big data storage device, and obtains the data that the data analysis unit needs for its calculations.

The data analysis unit interacts with the second data interface unit and the query interface unit, and provides real-time computing and data mining capabilities.

The query interface unit interacts with the analysis result display device: it analyzes the query conditions, forms a suitable query statement, passes the statement to the data analysis unit for querying, organizes the query results into a result set in a specified format, and returns the result set to the analysis result display device.

Referring to FIG. 8, FIG. 8 is a schematic diagram of the analysis result display device. The analysis result display device is the control center of the system: it is responsible for starting and stopping the query process and for displaying query results in graphical and tabular form. The analysis result display device consists of a query control unit, a query request unit, a data processing unit, a result display unit, and so on, where:

The query control unit is used for starting and stopping log data query tasks, entering query conditions, and checking that the conditions conform to the required format.

The query request unit receives query requests from the query control unit, interacts with the log data analysis device to obtain the data that the data processing unit needs for its calculations, and interacts with the data processing unit to hand over the queried data for processing.

The data processing unit converts and processes the data from the query request unit according to the requirements of the query conditions.

The result display unit displays the log data organized by the data processing unit in graphical, tabular, or similar form.
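The conformity check performed by the query control unit could look like the sketch below. The application does not define the rules, so everything specific here is an assumption: the condition keys (`caller`, `callee`, `start`, `end`, `service_type`), the requirement that numbers contain only digits, and the use of ISO-8601 time strings.

```python
from datetime import datetime

KNOWN_KEYS = ("caller", "callee", "start", "end", "service_type")  # hypothetical keys


def _parse_time(value):
    """Return a datetime for an ISO-8601 string, or None if it cannot be parsed."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def validate_conditions(cond):
    """Return a list of error messages; an empty list means the check passed."""
    if not any(cond.get(k) for k in KNOWN_KEYS):
        return ["at least one query condition is required"]
    errors = []
    for key in ("caller", "callee"):
        value = cond.get(key)
        if value is not None and not str(value).isdigit():
            errors.append(f"{key} must contain digits only")
    times = {}
    for key in ("start", "end"):
        value = cond.get(key)
        if value is not None:
            times[key] = _parse_time(value)
            if times[key] is None:
                errors.append(f"{key} is not a valid ISO-8601 time")
    if times.get("start") and times.get("end") and times["start"] > times["end"]:
        errors.append("start must not be later than end")
    return errors
```

A non-empty list corresponds to the case in which the system outputs a prompt instead of running the query.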

The above system implements the following functions.

1. Real-time log data collection

According to the operator's analysis needs, all or part of the log data generated during communication is transmitted to the system, where the communication is analyzed and processed.

The log data collection device performs, in real time and according to a preset strategy, a preliminary cleanup of the communication information, filters out invalid information, and pushes the log data to the big data cache device for storage.

2. Data caching

For the massive parallel data arriving from the log data collection device, safeguards are needed against data loss caused by data not being processed in time. A data cache device is therefore added ahead of data processing. This device is a distributed, subscription-based message queue system offering fast persistence, high throughput, load balancing, and similar functions.

3. Data conversion and filtering

Before log data is written to the database for persistent storage, it must be processed and converted so that it is in the format best suited to querying. The data conversion and filtering function can add fields, remove fields, split data with regular expressions, and apply different processing to different data based on conditional checks.
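The four kinds of operation named above (adding a field, removing a field, splitting with a regular expression, and conditional handling) can be shown together in one small function. The field names `route`, `debug`, `source`, `status`, and `severity` are invented for this sketch and are not taken from the application.

```python
import re


def transform(event):
    """Return a converted copy of one structured log event."""
    out = dict(event)  # work on a copy so the input is left untouched

    # Split data with a regular expression: "A -> B->C" becomes ["A", "B", "C"].
    if "route" in out:
        out["hops"] = re.split(r"\s*->\s*", out.pop("route"))

    # Remove a field that is not needed for querying.
    out.pop("debug", None)

    # Add a field.
    out["source"] = "call-log"

    # Apply different processing depending on a condition.
    if out.get("status") == "failed":
        out["severity"] = "high"
    else:
        out["severity"] = "normal"
    return out
```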

4. Real-time log data analysis

As soon as log data is generated, it is collected and transmitted by the log data collection device and finally stored in the big data storage device. The data required by a query can therefore be obtained from the big data storage device in real time and analyzed according to the set analysis strategy, for example by identifying and analyzing records along multiple dimensions such as calling number, called number, a specified time period, and service type. The analysis results are provided to the analysis result display device.
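A multi-dimensional analysis of this kind can be sketched as a filter over structured records followed by a simple aggregation. The record keys (`caller`, `callee`, `time`, `service_type`) and the choice of counting records per service type are assumptions for illustration; times are assumed to be ISO-8601 strings of identical layout, so that string comparison matches chronological order.

```python
from collections import Counter


def analyze(records, caller=None, callee=None, start=None, end=None, service_type=None):
    """Select records matching every given dimension and summarize them."""

    def matches(r):
        return (
            (caller is None or r["caller"] == caller)
            and (callee is None or r["callee"] == callee)
            and (start is None or r["time"] >= start)
            and (end is None or r["time"] <= end)
            and (service_type is None or r["service_type"] == service_type)
        )

    selected = [r for r in records if matches(r)]
    return {
        "total": len(selected),
        "by_service": dict(Counter(r["service_type"] for r in selected)),
    }
```

Any dimension left as `None` is ignored, so the same function serves queries on a single number, a time window, or any combination.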

5. Analysis result display

Once the analysis results are obtained, the analysis result display device can present them graphically or as a table, according to the selected mode. The graphical display is a time-sequenced two-dimensional flow chart in which each step of the flow can show the detailed information for that step.

This embodiment collects log data online in real time and analyzes the logs with big data technology, giving high analysis efficiency. It uses a cache mechanism so that, when the volume of concurrently collected log data exceeds the rate at which it can be written to storage, the logs are cached first to prevent loss. Data that has already been persisted is classified into hot data and cold data, which reduces the amount of data scanned per query and helps improve query speed.

Referring to FIG. 9, FIG. 9 is a schematic flowchart of the real-time analysis method based on massive logs of the present application. The method includes:

Step 901: perform a conformity check on the input query conditions; if the check passes, execute step 902; otherwise, output a prompt.

Step 902: according to the rules for the unique identifiers of the log data (for example, the different identifiers used for classification), combine the query conditions into keywords and search the data stored in the big data storage device.

Hot data has the higher search priority; that is, hot data is searched first.

If nothing is found in the hot data, the cold data is then searched.

If the final search result is empty, the empty query result is sent directly.

If the final search result is not empty, a result set is obtained and step 903 is executed to analyze and organize the search results.

Step 903: organize the result set, then display the results graphically or as a table. The graphical display is a time-sequenced two-dimensional flow chart in which each step of the flow can show the detailed information for that step; the tabular display lists the detailed contents of the result set in a paginated table.
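Steps 901 to 903 can be summarized in a short sketch. It rests on several assumptions that are not in the application: the hot and cold stores are modeled as plain dictionaries keyed by keyword, the keyword is built by joining sorted `key=value` pairs, the conformity check is reduced to "conditions must not be empty", and organizing the result set is reduced to sorting by a `time` field.

```python
def build_keyword(conditions):
    """Combine query conditions into one keyword (hypothetical rule: sorted key=value pairs)."""
    return "|".join(f"{key}={conditions[key]}" for key in sorted(conditions))


def search(keyword, hot, cold):
    """Search hot data first; fall back to cold data only when hot data has no match."""
    if keyword in hot:
        return hot[keyword]
    return cold.get(keyword, [])


def run_query(conditions, hot, cold):
    # Step 901: conformity check (simplified), otherwise output a prompt.
    if not conditions:
        return {"ok": False, "message": "no query conditions were entered"}
    # Step 902: combine the conditions into a keyword and search.
    results = search(build_keyword(conditions), hot, cold)
    if not results:
        return {"ok": True, "rows": []}  # empty result is sent directly
    # Step 903: organize the result set in time order for display.
    return {"ok": True, "rows": sorted(results, key=lambda row: row["time"])}
```

The time-ordered rows are what a time-sequenced flow chart or a paginated table would then be drawn from.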

In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A real-time analysis method based on massive log data, characterized by comprising the following steps,

acquiring input query conditions, combining the query conditions into keywords,

searching the stored big data according to the keywords,

analyzing the searched result set according to a set analysis strategy to obtain an analysis result;

wherein,

the big data is obtained as follows:

recording the data associated with the service in each service process of each service point to generate the log data of each service point,

collecting the log data of each service point in real time and caching it,

filtering and structuring the cached log data to obtain big data for persistent storage,

and storing the big data.

2. The real-time analysis method of claim 1, wherein the big data is stored in a distributed cluster, wherein hot data is stored in a first distributed storage device, all data is stored in a second distributed storage device as cold data, and the cold data is transferred to the first distributed storage device as hot data when accessed, wherein the performance of the first distributed storage device is higher than the performance of the second distributed storage device,

and wherein searching the stored big data according to the keywords comprises,

preferentially searching the hot data,

and, in the case that nothing is found in the hot data, then searching the cold data.

3. The real-time analysis method of claim 1, further comprising displaying the analysis results graphically or in a table, wherein the graphical display is a time-sequenced two-dimensional flow chart, and each step of the flow displays the detailed information of that step.

4. The real-time analysis method of claim 1, wherein the collecting log data of each service point in real-time and buffering comprises,

collecting log data of each service point from each service point,

according to the type of the log data, caching the log data in corresponding queues in a distributed cluster mode, managing the cached data in a mode of a producer and a consumer, and deleting the cached data after the cached data is used by all consumers.

5. A real-time analysis system based on massive log data, characterized in that the system comprises,

a log data generating device for recording the data associated with the service in each service process of each service point to generate the log data of each service point,

a log data acquisition device for acquiring log data of each service point in real time,

the big data caching device is used for caching the collected log data of each service point,

the data filtering and confirming device is used for filtering and structuring the cached log data to obtain big data for persistent storage,

a big data storage device for storing big data,

and the log data analysis device is used for acquiring input query conditions, combining the query conditions into keywords, searching according to the keywords based on the stored big data, and analyzing the searched result set according to a set analysis strategy to obtain an analysis result.

6. The real-time analysis system of claim 5, further comprising,

and the analysis result display device is used for carrying out graphic display or table display on the analysis result, wherein the graphic display is a time-sequenced two-dimensional flow chart, and each step of the flow displays the detailed information of the flow.

7. The real-time analysis system of claim 6, wherein the log data generation means comprises,

the log data sorting unit is used for interacting with each service point according to a standard protocol, sorting the data provided by each service point and generating log data;

the log data file unit is used for performing file processing on the log data generated by the log data sorting unit to form a file suitable for reading;

the log data acquisition device comprises,

the log data collection unit is used for tracking the log file and providing the event data for the log data aggregation unit to use;

the log data aggregation unit is used for processing the data collected by the log data collection unit and transmitting the processed data to the first log data output unit at least once, and, in the case that the first log data output unit is blocked and not all transmitted events have been confirmed, continuing to try to send data to the first log data output unit until the first log data output unit confirms the received events;

and the first log data output unit is used for interacting with the big data cache device, acquiring data from the log data aggregation unit, transmitting the acquired data to the big data cache device, and returning a confirmation to the log data aggregation unit upon receiving a confirmation event from the big data cache device.

8. The real-time analysis system of claim 7, wherein the big data caching means comprises,

the producer unit is used for interacting with the first log data output unit, acquiring log data and issuing the data to the corresponding data queue unit according to types;

the data queue unit is used for receiving and storing the data sent by the producer unit, interacting with the consumer unit, reading the data from the queue and providing the data to the consumer unit;

the consumer unit is used for interacting with the data queue unit, reading data from the queue and transmitting the data to the data filtering and confirming device;

the data filtering and confirming device comprises,

the log data input unit is used for interacting with the big data caching device and acquiring data from the big data caching device in a continuous stream transmission mode;

the log data filtering unit is used for carrying out one or any combination of processing of filtering, recombining and confirming the data and converting the universal format;

a second log data output unit for outputting the data generated by the log data filtering unit to the big data storage device;

the big data storage device comprises,

the first data interface unit is used for data input and storage to, and query output from, the big data storage device;

the search analysis unit is used for acquiring corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit;

and the data storage unit is used for storing the log data.

9. The real-time analysis system of claim 8, wherein the log data analysis means comprises,

the second data interface unit is used for interacting with the big data storage device, providing a uniform reading interface for inquiring data in the big data storage device and acquiring data required by the calculation of the data analysis unit;

the data analysis unit is used for interacting with the second data interface unit and the query interface unit and providing real-time calculation and mining calculation capacity;

the query interface unit is used for interacting with the analysis result display device, analyzing the query conditions to form a proper query statement, submitting the query statement to the data analysis unit for query, sorting the query result into a result set with a specified format, and returning the result set to the analysis result display device;

the analysis result display device comprises,

the query control unit is used for starting and stopping a log data query task, entering query conditions, and checking the conformity of the query conditions;

the query request unit is used for receiving a query request of the query control unit, interacting with the log data analysis device and acquiring data required by the calculation of the data processing unit; and interacting with the data processing unit and handing the queried data to the data processing unit for processing;

the data processing unit is used for converting and/or processing the data from the query request unit according to the requirements of the query conditions;

and the result display unit is used for displaying the log data organized by the data processing unit in a graphical or tabular form.

10. The real-time analysis system of claim 9, wherein the big data caching apparatus is a distributed cluster caching apparatus, the apparatus further comprising,

the first cluster management unit is used for cluster management of the producer unit, the data queue unit and the consumer unit;

the big data storage device is a distributed cluster storage device, the device further comprises,

and the second cluster management unit is used for managing each node and providing a search analysis function on all the nodes.