CN108170830A

CN108170830A - Group event data visualization method and system

Info

Publication number: CN108170830A
Application number: CN201810022368.6A
Authority: CN
Inventors: 徐葳; 孙娇; 姚期智
Original assignee: Tsinghua University
Current assignee: Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2018-06-15
Anticipated expiration: 2038-01-10
Also published as: CN108170830B

Abstract

The present application provides a method and system for visualizing group event data, applied in a fraud event detection system. The method includes the following steps: acquiring a data set of a group, where the data features in the data set include at least an event type and time information associated with the event type; creating a first time axis and a second time axis; based on an encoding of the data features, displaying the first time axis with a first shape as its nodes, to represent the type and number of events occurring in the group within each time granularity of the first time axis; displaying a second shape to represent the total number of each event type occurring within the time interval of the second time axis; displaying the second time axis, associating each event type represented in the second shape with the time granularities of the second time axis, and representing the distribution of each event type on the second time axis with a third shape; and displaying a fourth shape to represent the type and number of events occurring in the group within each time granularity of the second time axis.

Description

Group event data visualization method and system

Technical Field

The present application relates to the field of computer processing technology, and in particular to a method and system for visualizing group event data.

Background Art

Online fraud is a well-known dark side of today's Internet and causes immeasurable losses worldwide every year. In 2015, the Internet Crime Complaint Center received complaints about fraud on the order of a million from around the world, and online fraud causes billions in economic losses worldwide every year. Fraudulent users are usually paid to promote a specific product or to spread spam. In Internet finance, fraudulent users apply for loans under false identities, buy goods with stolen credit cards, and even engage in illegal activities such as money laundering. Finding suitable anti-fraud algorithms has therefore become increasingly critical in Internet business scenarios, and the demand keeps growing.

Although many methods now exist for identifying fraud on the Internet, the limitations of existing fraud event detection systems mean that the data flagged as belonging to fraud suspects still requires extensive manual verification afterwards; for example, platform supervisors must check and verify the cases one by one. As a result, tasks in a fraud event detection system such as revising algorithm parameters, designing data feature priorities, and selecting algorithm models require not only software design by algorithm experts but also the participation of domain experts. Improving the transparency of fraud identification algorithms can therefore effectively improve the accuracy of fraud event detection, and how to visualize the data is an urgent problem to be solved in this field.

Summary of the Invention

In view of the above shortcomings of the prior art, the purpose of the present application is to provide a group event data visualization method and system that solve the problem of visualizing fraud identification algorithms in the prior art.

To achieve the above and other related purposes, a first aspect of the present application provides a group data visualization method applied to a fraud event detection system, including the following steps: acquiring a data set of a group, where the data features in the data set include at least an event type and time information associated with the event type; creating a first time axis and a second time axis; based on an encoding of the data features, displaying the first time axis with a first shape as its nodes, to represent the type and number of events occurring in the group within each time granularity of the first time axis; displaying a second shape to represent the total number of each event type occurring within the time interval of the second time axis; displaying the second time axis, associating the event types represented in the second shape with the time granularities of the second time axis, and representing the distribution of each event type on the second time axis with a third shape; and displaying a fourth shape to represent the type and number of events occurring in the group within each time granularity of the second time axis.

A second aspect of the present application provides a computer device, including: one or more processors; and a presentation engine executed on the one or more processors, the presentation engine being configured to execute the group data visualization method according to the first aspect of the present application.

A third aspect of the present application provides a group data visualization system, including: an acquisition module that acquires a data set of a group over a network, where the data features in the data set include at least an event type and time information associated with the event type; a processing module that creates a first time axis and a second time axis and encodes the data features; and a display module that, through a display device, displays the first and second time axes and the first, second, third, and fourth shapes in one interface. The first shape serves as a node of the first time axis and represents the type and number of events occurring in the group within each time granularity of the first time axis; the second shape represents the total number of each event type occurring within the time interval of the second time axis; the third shape represents the distribution, on the second time axis, of the event types represented in the second shape; and the fourth shape represents the type and number of events occurring in the group within each time granularity of the second time axis.

A fourth aspect of the present application provides a client connected to a server over a network; by sending a request to log in to the server, the client executes the steps of the group data visualization method according to the first aspect of the present application.

A fifth aspect of the present application provides a server connected to a client over a network; based on the operation requested by the client, the server sends the process of the group data visualization method according to the first aspect of the present application to the client, and the execution result is displayed through the client.

A sixth aspect of the present application provides a browser connected to a server over a network; by sending a request to log in to the server, the browser executes the steps of the group data visualization method according to the first aspect of the present application.

A seventh aspect of the present application provides a computer-readable storage medium storing a data visualization computer program which, when executed, implements the steps of the group data visualization method according to the first aspect of the present application.

As described above, the group data visualization method and system of the present application present the data sets of the groups determined during fraud event detection by means of time axes, type distributions, categorized lists, and the like, so that the data features of the groups formed during fraud event detection are displayed through multiple relational interfaces. This helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud event detection system.

Brief Description of the Drawings

FIG. 1 is a flowchart of a group data visualization method in an embodiment of the present application.

FIG. 2 is a flowchart of the step of acquiring a group data set in an embodiment of the present application.

FIG. 3 shows an interface containing multiple groups displayed in an embodiment of the present application.

FIG. 4 is a schematic diagram of a display interface for group data visualization in an embodiment of the present application.

FIG. 5 is a schematic diagram of a display interface for group data visualization in another embodiment of the present application.

FIGS. 6a-6d are schematic diagrams of the interface in several states displayed when the visualization method of the present application is used.

FIG. 7 is a schematic diagram of a list interface showing the data set of a group in an embodiment of the present application.

FIG. 8 is a flowchart of displaying an interface of the feature distribution of a group data set in an embodiment of the present application.

FIG. 9 shows an interface with a histogram and a comparison chart of the distribution of the registration-time feature within a group, displayed in an embodiment of the present application.

FIG. 10 is a flowchart of the steps of displaying the distribution of multiple groups in a cluster in an embodiment of the present application.

FIG. 11 is a schematic diagram of an interface displaying the distribution of multiple groups in a cluster in an embodiment of the present application.

FIG. 12 is a schematic diagram of the module structure of a computer device provided in an embodiment of the present application.

FIG. 13 is a schematic diagram of the module structure of a group data visualization system provided in an embodiment of the present application.

Detailed Description of the Embodiments

The embodiments of the present application are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the present application from the content disclosed in this specification.

In the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the present application. It is to be understood that other embodiments may be utilized, and mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description should not be considered limiting, and the scope of the embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for describing particular embodiments only and is not intended to limit the present application.

Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "include" and "comprise" indicate the presence of the stated features, steps, operations, elements, components, items, species, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition arises only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.

In fraud event detection technology, domain experts contribute experience in data classification and requirements for the accuracy of classification results to the core fraud identification technology, but they are not familiar with the algorithm architecture itself or its parameters. Because domain experts cannot see how the data was classified during detection, when the fraud event detection system produces detection results they have no way to judge the accuracy of those results other than verifying them one by one. To improve the accuracy of the fraud event detection system, the present application provides a group data visualization method applied to the fraud event detection system, which visually presents the groups obtained by classification in the fraud event detection system, together with their data sets, to algorithm experts and domain experts. Different users (such as domain experts or algorithm experts) can thus explore various fraudulent behaviors through a variety of interactive means, to whatever depth their needs require.

The group data visualization method is mainly performed by a computer device. The computer device may be any suitable computer device, such as a handheld computer device, a tablet computer device, a notebook computer, a desktop computer, or a server. A computer device includes a display, input devices, input/output (I/O) ports, one or more processors, memory, a non-volatile storage device, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible, non-transitory computer-readable medium storing instructions), or a combination of hardware and software elements. Note also that the various components may be combined into fewer components or separated into additional components; for example, the memory and the non-volatile storage device may be included in a single component. The computer device can execute the visualization method alone or in cooperation with other computer devices. In some embodiments, the computer device executes the visualization method and displays the corresponding visualization interface. For example, the computer device includes a processor and a display, and a presentation engine (or display engine) executed on the processor executes the group data visualization method and presents the result through the display; here, the presentation engine includes, but is not limited to, software and hardware capable of parsing interface-display code developed in programming languages, such as XML, HTML and other scripting languages, or the C language. In still other embodiments, one computer device executes the visualization method and provides the corresponding visualization interface to another computer device for display. For example, in response to a user's request operation, the client sends a request to the server and logs in to it; the server executes the visualization method to generate the corresponding interface data and returns that interface data to the client; and the client's browser or a customized application program displays the corresponding graphics according to the interface data.

The visualization method is mainly performed by the fraud event detection system, which may comprise software and hardware on one or more computer devices. The method aims to show users how a fraudulent group behaves over different time periods, thereby answering the domain experts' question "What did this group do as a fraudulent group?" and the algorithm experts' question "Do the users in the same group all share the same behavior patterns?". To this end, the present application provides a visualization method organized along a time axis. Please refer to FIG. 1, which is a flowchart of a group data visualization method in an embodiment of the present application. As shown in the figure, the group data visualization method includes the following steps:

In step S11, a data set of a group is acquired. The data features in the data set include at least event types and time information associated with the event types. In some embodiments, a group is determined as described below. Please refer to FIG. 2, which is a flowchart of acquiring a group data set in an embodiment provided by the present application. As shown in the figure, step S11 further includes:

Step S111: acquire the operation logs of a cluster composed of multiple network users. In different embodiments, the cluster consists of all network users that can be obtained; the network users in the cluster may come from the same website or from different websites, or from different network channels, such as the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination thereof, or a mobile communication network for mobile phones.

Step S112: determine at least one data feature from the operation logs of the multiple network users, and analyze the similarity of at least one set of data features in the operation logs to determine the group. In a specific embodiment, because online fraud inevitably leaves characteristic traces in users' usage data on the network, the fraud event detection system collects the operation logs of multiple network users from at least one website and groups the users who produced those logs by analyzing the similarity of at least one data feature in the logs, thereby obtaining the groups and each group's data set in the operation logs.
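Step S112 leaves the similarity analysis open. As a minimal sketch only, assuming that "similar" is reduced to "shares the same value of one data feature" (for example, the same IP address), users can be merged into groups with a union-find structure. This is a swapped-in stand-in for illustration, not the unsupervised detection algorithm used by the fraud event detection system; the function name and the dict-based log layout are hypothetical.

```python
from collections import defaultdict


def group_by_shared_feature(logs, feature):
    """Group users whose log records share a value of `feature`.

    Hypothetical simplification of step S112: two users are linked when they
    share the same feature value (e.g. an IP address); users connected through
    any chain of such links end up in the same group.
    `logs` is a list of dicts with a "user" key and a `feature` key.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_user_with_value = {}
    for record in logs:
        user, value = record["user"], record[feature]
        find(user)  # register every user, even one that shares no value
        if value in first_user_with_value:
            union(first_user_with_value[value], user)
        else:
            first_user_with_value[value] = user

    groups = defaultdict(set)
    for user in list(parent):
        groups[find(user)].add(user)
    # Largest groups first; ties broken by member names for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

For example, if u1 and u2 share one IP address and u2 and u4 share another, all three land in one group even though u1 and u4 never share an address; a real detector would weigh many features at once rather than a single exact match.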

In some embodiments, the data set of a group includes at least two of the following data features, without being limited to them: user information, IP address, event type, event originator, event responder, and event occurrence time. The user information includes, for example, a mobile phone number, an email address, a user ID, an identity card number, gender, the number of the user device used by the user, and the registration time. The same user information may correspond to at least one event type, and each event corresponds to an event originator, an event responder, and an event occurrence time. The event types include, but are not limited to, at least one of: social behaviors between network users such as following, liking, commenting, and gifting (also called sending gifts), or operations by network users such as logging in, logging out, updating status, registering, and modifying information. For example, the same user information may correspond to multiple like events, each with its own event originator, event responder, and event occurrence time.
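One way to hold such records is sketched below, assuming a flat record per event; the class name, field names, and the helper are hypothetical and only mirror the data features listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """One event from a group's data set (hypothetical field names)."""
    user_id: str        # user information, e.g. an account ID
    ip: str             # IP address the event came from
    event_type: str     # e.g. "follow", "like", "comment", "gift", "login"
    originator: str     # event originator
    responder: str      # event responder (target of the event)
    time: datetime      # event occurrence time


def events_by_user(records):
    """Map each user to that user's events, ordered by occurrence time."""
    by_user = defaultdict(list)
    for r in records:
        by_user[r.user_id].append(r)
    return {u: sorted(evts, key=lambda r: r.time) for u, evts in by_user.items()}
```

This reflects the one-to-many relation in the text: a single user can own several like events, each carrying its own originator, responder, and time.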

Step S113: acquire the data set of the group. In some embodiments, the data set may be obtained from a database that stores the groups and their data sets; the database may be configured, for example, on a remote storage server or in a storage device of a local computer device, and the data set of a group is then extracted from the database based on the user's input operation. For example, the fraud event detection system obtains multiple groups using an unsupervised detection algorithm, the user selects one of the groups through a selection interface, and the data set of that group is acquired.

Specifically, the fraud event detection system first calculates, for each type of data feature, the similarity of all data in the operation logs, where the similarity can be measured by information entropy. For example, the fraud event detection system uses user information to calculate the information entropy of the IP-usage or maximum-IP-usage dimension, uses event types to calculate the information entropy of the operation-type dimension, and uses the registration-time dimension or the operation times to calculate the information entropy of the bad-operation dimension. On the basis of these calculations, an unsupervised detection method is applied to the resulting information entropies to divide the users into multiple groups. Examples of such unsupervised detection methods include dense-subgraph-based algorithms and vector-space-based algorithms. The groups presented by the visualization method of the present application reflect the shared resources, user relationships, and so on involved in fraudulent events, so that users of the fraud event detection system can determine more clearly whether the classification strategy in the unsupervised detection algorithm is reasonable. The shared resources include, but are not limited to, shared IP addresses and email addresses; the user relationships include, but are not limited to, follow relationships and interaction relationships.
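The text does not give the entropy formula. Assuming it refers to the usual Shannon entropy, H = -Σ p_i log2 p_i over the observed values of one feature, a per-group entropy profile could be computed as sketched below; the function names and the dict-based records are illustrative assumptions.

```python
import math
from collections import Counter


def feature_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return max(0.0, h)  # entropy is non-negative; this also avoids printing -0.0


def entropy_profile(records, features):
    """Entropy of each named feature across a list of dict records."""
    return {f: feature_entropy(r[f] for r in records) for f in features}
```

Under this assumption, a group whose members all act from one IP address has an IP entropy of 0, which would be consistent with the kind of shared resource the groups are meant to reveal, while four events from four distinct addresses give 2 bits.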

In one embodiment, the visualization method further includes a step of displaying at least one group interface, in which the size of each group is represented by the size of the displayed geometric figure. Please refer to FIG. 3, which shows an interface containing multiple groups displayed in an embodiment of the present application. As shown in the figure, 11 groups are displayed in the interface, each represented by a circle. All 11 groups lie inside the largest dashed circle, which represents, for example, a cluster of N network users. The group labeled 0 is, for example, a normal group, while the 10 groups of different sizes labeled 1-10, which lie inside a smaller dashed circle, are, for example, abnormal groups. The size of each circle is proportional to the number of members of the group: a large circle indicates a group with many members and a small circle a group with few members. In different embodiments, the geometric figure of a group may be of any shape. The color of a geometric figure may be set randomly or related to the number of groups or the number of members of a group. For example, N colors are preset and the fraud event detection system randomly assigns different colors to the geometric figures representing the groups. As another example, the fraud event detection system assigns colors in a preset color order to the geometric figures representing the groups, in ascending order of member count.
When the user operates the display interface to select a geometric figure, the fraud event detection system acquires the data set of the corresponding group.
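The sizing and coloring rules described for FIG. 3 can be sketched as follows. The text only says that circle size is proportional to member count; this sketch assumes that circle area is the proportional quantity (so the radius grows with the square root of the count), and the palette and maximum radius are hypothetical parameters.

```python
import math


def layout_groups(member_counts, palette, max_radius=50.0):
    """Return {group_id: (radius, color)} for the group-overview circles.

    Assumption: circle area is proportional to member count, so the radius
    scales with the square root of the count and the largest group gets
    max_radius. Colors are taken from `palette` in ascending order of member
    count, cycling through the palette if there are more groups than colors.
    """
    if not member_counts:
        return {}
    largest = max(member_counts.values())
    ascending = sorted(member_counts, key=lambda g: (member_counts[g], g))
    layout = {}
    for rank, group_id in enumerate(ascending):
        radius = max_radius * math.sqrt(member_counts[group_id] / largest)
        layout[group_id] = (radius, palette[rank % len(palette)])
    return layout
```

For groups of 400, 100, and 25 members, the radii come out as 50, 25, and 12.5, and the smallest group takes the first palette color. Scaling area rather than radius keeps a group with four times the members from looking sixteen times larger.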

In a preferred embodiment, the interface displaying at least one group may further include an information bar showing group information. When the user selects a group in the group interface, the basic information of that group is displayed on one side of the interface in a window or text box, for example: the group code, the number of members, the most preferred data feature used to determine the group, and the group attribute (such as normal group or abnormal group).

In step S12, a first time axis and a second time axis are created. Both axes are created according to the time information in the data set; for example, if the maximum time span across the time information in the data set is 10 days, the maximum time interval of the first or second time axis is 10 days. In one embodiment, the first and second time axes are created with the same time interval and time granularity; in another embodiment, they are created with different time intervals and time granularities, as detailed later.
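Deriving an axis from the data set's time information can be sketched as below, assuming the axis starts at midnight of the earliest event and uses a granularity that divides a day evenly (for example one hour or one day); the function names and the 2017 dates in the example are hypothetical.

```python
from datetime import datetime, timedelta


def build_axis(times, granularity):
    """Return the start time of every bin on a time axis covering `times`.

    The axis starts at midnight of the earliest timestamp (this assumes the
    granularity divides a day evenly, e.g. 1 hour or 1 day) and adds bins
    until the latest timestamp falls inside the last one.
    """
    earliest, latest = min(times), max(times)
    start = earliest.replace(hour=0, minute=0, second=0, microsecond=0)
    n_bins = (latest - start) // granularity + 1
    return [start + i * granularity for i in range(n_bins)]


def bin_index(t, axis_start, granularity):
    """Index of the bin on the axis that contains timestamp `t`."""
    return (t - axis_start) // granularity
```

With events spread from August 1 to August 10 and a one-day granularity, this yields the 10 daily bins of the FIG. 4 example; calling it again on a single day with a one-hour granularity would give a finer second axis with 24 bins.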

In step S13, based on the encoding of the data features, the first time axis is displayed with a first shape as its nodes, to represent the type and number of events occurring in the group within each time granularity of the first time axis. The fraud event detection system counts the number of events of each type in the data set at the time granularity of the first time axis, encodes the counts into graphics of a preset first shape, and presents the encoded first shapes, in time order, as the nodes of the first time axis. From the nodes displayed on the first time axis, domain experts can clearly see how the distribution or number of event types changes over time. The first shape includes, but is not limited to, a pie shape or a bar (column) shape. In some implementation examples, the fraud event detection system may encode the percentage share of each event type within a time granularity into a graphic of the first shape and display it on the first time axis, with regions representing the same event type drawn in the same color. Please refer to FIG. 4, which is a schematic diagram of a display interface for group data visualization in an embodiment of the present application. As shown in the figure, the first time axis T1 is located in the lower area of the display interface and covers a 10-day interval from August 1 to August 10 at a granularity of one day. The percentage distribution of the event types counted each day is encoded into a pie chart and displayed as a node on the first time axis T1. The colors in each pie chart represent event types: for example, the regions marked "yellow" represent follow events, those marked "red" represent gift events, and those marked "blue" represent like events. For example, in the pie-chart node for August 7 on the first time axis T1, follow events account for the largest share of that day's events, gift events for a smaller share, and like events for the smallest share.
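The per-node encoding can be sketched as follows, assuming events arrive as (timestamp, event_type) pairs and using the color assignment from the FIG. 4 description; the function name, the fallback color, and the example dates are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Colors from the FIG. 4 description; any other event type falls back to gray.
EVENT_COLORS = {"follow": "yellow", "gift": "red", "like": "blue"}


def pie_nodes(events, axis_start, granularity):
    """Build one pie-chart node per non-empty bin of the first time axis.

    `events` is an iterable of (timestamp, event_type) pairs. Returns
    {bin_index: [(event_type, share, color), ...]} with slices largest first.
    """
    per_bin = defaultdict(Counter)
    for t, event_type in events:
        per_bin[(t - axis_start) // granularity][event_type] += 1
    nodes = {}
    for idx in sorted(per_bin):
        counts = per_bin[idx]
        total = sum(counts.values())
        nodes[idx] = [
            (etype, n / total, EVENT_COLORS.get(etype, "gray"))
            for etype, n in counts.most_common()
        ]
    return nodes
```

With five follow, three gift, and two like events on August 7, the node at bin 6 would hold slices of 50%, 30%, and 20%, matching the ordering described for that day. Storing shares rather than raw counts is what lets every pie node have the same overall size.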

In step S13, based on the encoding of the data features, a second shape is displayed. It represents the total number of events of each type that occur within the time interval of the second time axis. The fraud event detection system sums the events of each type in the data set over the time interval of the second time axis and encodes the totals into a graphic of a preset second shape. This graphic shows the total number of events of each type within that time interval. The second shape includes, but is not limited to, a histogram, a bar chart or a line chart. Because all totals cover the same time interval of the second time axis, they show how the event types compare in volume. When that interval is one day or one week, the user can compare the totals of the "red", "yellow" and "blue" event types by the lengths of the corresponding bars. The bars may also encode this comparison through thickness, transparency and similar attributes.

Referring again to FIG. 3, a horizontal histogram is displayed next to the first time axis T1 (on its right side in the figure). From top to bottom it shows three bars: "red", "yellow" and "blue". The length of each bar represents the total number of events of that type generated within the time interval of the second time axis. In this second shape, the "yellow" bar (follow events) is the longest, the "red" bar (gift events) comes next, and the "blue" bar (like events) is the shortest.
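A minimal sketch of the second-shape totals follows. It assumes the same `(event_type, datetime)` pairs as above and a half-open interval `[start, end)`. The function name and labels are hypothetical.

```python
from collections import Counter
from datetime import datetime


def interval_totals(events, start, end):
    """Total number of events per type with start <= timestamp < end,
    ordered from the longest bar to the shortest (second shape)."""
    totals = Counter(t for t, ts in events if start <= ts < end)
    return totals.most_common()


events = (
    [("follow", datetime(2017, 8, 1, 1))] * 3
    + [("gift", datetime(2017, 8, 2))] * 2
    + [("like", datetime(2017, 8, 3)), ("like", datetime(2017, 8, 20))]
)
bars = interval_totals(events, datetime(2017, 8, 1), datetime(2017, 8, 11))
```

The half-open interval guarantees that an event falling exactly on a boundary is counted in only one interval.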

Displaying the total number of events of each type within the time interval of the second time axis gives domain experts a second view of how event-type counts change over time. To show the relationship between the first and second time axes more clearly, step S13 also displays the second time axis, based on the encoding of the data features. Each event type represented in the second shape is linked to the time granularities of the second time axis at which it occurs, and third shapes represent the distribution of each event type over that axis. The second time axis is drawn as an axis whose nodes are its time granularities. The third shapes connect the event types distributed over these nodes to the second shape, so the user can clearly see how the second shape relates to each time granularity of the second time axis. A third shape may be a line whose color matches that of the corresponding event type in the second shape, which makes each event type easy to pick out.

Referring again to FIG. 3, the third shapes link the second shape to the second time axis. Here the third shape is an arc. Based on the time information of each event type in the data set, the arcs fan out to the time-granularity nodes of the second time axis. In the figure:

- a dotted line (the first type of broken line) links the gift events of the "red" bar to the corresponding time nodes (time granularities) on the second time axis;
- a continuous line links the follow events of the "yellow" bar to the corresponding time nodes;
- a dash-dot line (the second type of broken line) links the like events of the "blue" bar to the corresponding time nodes.

In different implementations, the third shape may use line thickness or transparency to indicate how many events of a type occur within the corresponding time-granularity interval. This makes high-frequency periods and recurring patterns easier to see.
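The arc construction can be sketched as follows, assuming line thickness encodes the per-node count. The linear width scale, the `LINE_STYLE` labels and the function name are illustrative assumptions; the description only states that thickness or transparency may reflect the count.

```python
from collections import Counter
from datetime import datetime, timedelta

# Line styles as described for FIG. 3; type labels are hypothetical.
LINE_STYLE = {"gift": "dotted", "follow": "solid", "like": "dashdot"}


def build_arcs(events, start, granularity, n_nodes, min_width=0.5, max_width=4.0):
    """One arc per (event type, node) pair that has events; stroke width grows
    linearly with the count so busy periods stand out."""
    counts = Counter()
    for event_type, ts in events:
        node = (ts - start) // granularity  # timedelta // timedelta -> int
        if 0 <= node < n_nodes:
            counts[(event_type, node)] += 1
    if not counts:
        return []
    peak = max(counts.values())
    return [
        {
            "type": t,
            "node": node,
            "count": n,
            "style": LINE_STYLE.get(t, "solid"),
            "width": min_width + (max_width - min_width) * n / peak,
        }
        for (t, node), n in sorted(counts.items())
    ]


events = [
    ("like", datetime(2017, 8, 1, 3)),
    ("like", datetime(2017, 8, 1, 5)),
    ("like", datetime(2017, 8, 2, 1)),
    ("gift", datetime(2017, 9, 1)),  # outside the 10-day axis, so no arc
]
arcs = build_arcs(events, datetime(2017, 8, 1), timedelta(days=1), 10)
```

A non-zero `min_width` keeps arcs for rare events visible. Mapping the same ratio to opacity instead of width would implement the transparency variant.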

To show more intuitively which types of events, and how many, occur in each time-granularity interval of the second time axis, step S13 also displays a fourth shape. The fourth shape represents the types and numbers of events that occur in the group within each time granularity of the second time axis. The fraud event detection system sums, or computes the distribution of, the events of each type in the data set over the time interval of the second time axis. It encodes the sums or distribution into a graphic of a preset fourth shape and presents the encoded fourth shapes, in chronological order, as the nodes of the second time axis. The fourth shape for each time granularity of the created second time axis is displayed where the third shapes lead. From these nodes, the user gets yet another view of how event-type counts change over time. The fourth shape includes, but is not limited to, a pie shape or a bar shape, and is chosen to differ from the first shape. In some implementation examples, the fraud event detection system encodes the per-type totals within each time granularity of the second time axis into a graphic of the fourth shape and displays it on the second time axis. The total for each event type uses the same color as that type in the third and second shapes.
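One possible fourth-shape encoding is sketched below: per-granularity counts become stacked bar segments scaled to the busiest node. The stacking order, the `COLORS` mapping and the 100-unit height are hypothetical choices; the description allows pies or bars.

```python
from collections import Counter

COLORS = {"follow": "yellow", "gift": "red", "like": "blue"}  # same colors as the second shape
ORDER = ["gift", "follow", "like"]  # hypothetical stacking order; other types are skipped


def stacked_bars(bins, max_height=100.0):
    """Encode per-granularity Counters as stacked bar segments (fourth shape),
    scaled so that the busiest node reaches max_height."""
    peak = max((sum(c.values()) for c in bins), default=0)
    bars = []
    for i, counts in enumerate(bins):
        y = 0.0
        segments = []
        for t in ORDER:
            n = counts.get(t, 0)
            if n == 0:
                continue  # also guarantees no division when peak == 0
            h = max_height * n / peak
            segments.append({"type": t, "color": COLORS[t], "y0": y, "height": h})
            y += h
        bars.append({"node": i, "segments": segments})
    return bars


bars = stacked_bars([Counter(follow=2, gift=2), Counter(), Counter(like=1)])
```

All nodes share one scale, so bar heights stay comparable along the second time axis.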

Time axes are used to present group data because both domain experts and algorithm experts need to understand the concentrated behavior of users within a period of time. Step S13 therefore combines the first and second time axes to describe this concentrated behavior.

Referring to FIG. 3, each pie chart on the first time axis T1 shows the proportions of the different event types in one time granularity (for example, one day). Example event types are following a user or sending a user a gift online. Each event type is encoded as a different color, and the counts are encoded as follows:

- the number of events of each type within a unit time granularity of T1 is encoded as the area share of its region in a pie chart;
- the number of events of each type within the time interval of the second time axis T2 is encoded as bar length, giving a bar chart with one bar per event type (the second shape);
- the number of events of each type within a unit time granularity of T2 is encoded as bar length, giving a separate bar chart per granularity (the fourth shape).

When the user selects a pie chart on T1, arcs colored by event type (the third shape) run from the bars of the second shape to the fourth shapes at the corresponding time granularities on T2. This clearly shows the user how each event type in a group data set is distributed along the time axes.

In one embodiment, the first and second time axes are created with the same time interval and time granularity. For example, the fraud event detection system preloads first and second time axes that share the same time granularity. It then maps each event type onto each axis according to the time information in the data set and the time granularity, obtaining at least one time interval for each axis. As another example, the system sorts the time information in the data set to determine the time intervals of the preset first and second time axes. It then maps each event type onto each axis according to the time information and the time granularity. In the interface shown in FIG. 3, both the first time axis T1 and the second time axis T2 use one day as the time granularity and 10 days as the time interval. By executing the foregoing steps, the fraud event detection system displays the data features of the data set on T1 and T2 according to their time information. Because T2 uses one day as its granularity, the system can encode the daily total of each event type as a bar graphic and display it as a node on T2.
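Mapping timestamps onto axis nodes can be sketched with a small axis object, assuming an axis is fully described by a start time, a granularity and a node count. The `TimeAxis` class and `axis_from_data` helper are hypothetical. Anchoring the axis at midnight of the earliest timestamp is an assumption that only suits day-level granularity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeAxis:
    start: datetime
    granularity: timedelta
    n_nodes: int

    @property
    def end(self):
        return self.start + self.granularity * self.n_nodes

    def node_of(self, ts):
        """Index of the node whose granularity interval contains ts, or None
        if ts falls outside the axis's time interval."""
        if not (self.start <= ts < self.end):
            return None
        return (ts - self.start) // self.granularity


def axis_from_data(timestamps, granularity, n_nodes):
    """Derive the axis interval from the data: the earliest timestamp,
    floored to midnight, anchors the first node."""
    first = min(timestamps)
    return TimeAxis(datetime.combine(first.date(), datetime.min.time()), granularity, n_nodes)


stamps = [datetime(2017, 8, 3, 15), datetime(2017, 8, 1, 9)]
t1 = t2 = axis_from_data(stamps, timedelta(days=1), 10)  # same interval and granularity
```

Because both axes share one definition in this case, a timestamp lands on the same node index on T1 and T2.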

In another implementation example, the first and second time axes are created with different time intervals and time granularities. The time interval of the second time axis equals the time granularity of the first time axis. For example, the two axes are preset with different granularities and a correspondence between those granularities. The fraud event detection system then maps each event type onto each axis according to the time information in the data set. FIG. 5 shows an interface containing the first time axis T1 and the second time axis T2:

- T1 has a time interval of 10 days and a granularity of one day;
- T2 has a time interval of one day and a granularity of one hour.

By executing the subsequent steps, the fraud event detection system displays the data features of the data set on T1 and T2 according to their time information. For example, in interface C2 shown in FIG. 5, T2 uses one hour as its granularity, so the system can encode the hourly total of each event type as a bar graphic and display it as a node on T2.

The number of events of each type within a time granularity of the first time axis T1 is encoded as the area share of each region of a pie chart. When the user selects a pie chart on T1, three further encodings are displayed:

- the number of events of each type within the time interval of the second time axis T2 (that is, the event types of the selected pie chart) is encoded as bar length, giving a bar chart with one bar per event type (the second shape);
- the number of events of each type within each unit time granularity of T2 is encoded as bar length, giving a separate bar chart (the fourth shape);
- arcs colored by event type (the third shape) run from the second shape for each event type to the fourth shapes at the corresponding time granularities on T2.

This clearly shows the user how each event type in a group data set is distributed along the time axes.

As shown in interface C1 of FIG. 4, each pie chart on the first time axis T1 shows the proportions of the different event types in one time granularity (for example, one day). Example event types are following a user or sending a user a gift online. Each event type is encoded as a different color: regions marked "yellow" represent follow events, regions marked "red" represent gift events, and regions marked "blue" represent like events. For example, in the pie-chart node for August 7 on T1, follow events account for the largest share, gift events for a smaller share, and like events for the smallest share. When the user selects the August 7 node on T1, the second time axis T2 shows, for each of the 24 hours of August 7, which types of events occurred and how many events of each type.
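The drill-down from a selected day on T1 to hourly slots on T2 can be sketched as follows. It assumes T2's interval is exactly the selected node of T1 and T2's granularity divides T1's granularity evenly. The function name and parameters are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def drill_down(events, t1_start, t1_granularity, node, t2_granularity=timedelta(hours=1)):
    """Return one Counter per T2 slot for the T1 node the user selected.
    T2's interval is that node, e.g. one day split into 24 hourly slots."""
    day_start = t1_start + t1_granularity * node
    n_slots = t1_granularity // t2_granularity  # e.g. 1 day // 1 hour == 24
    slots = [Counter() for _ in range(n_slots)]
    for event_type, ts in events:
        offset = ts - day_start
        if timedelta(0) <= offset < t1_granularity:
            slots[offset // t2_granularity][event_type] += 1
    return slots


events = [
    ("follow", datetime(2017, 8, 7, 9, 15)),
    ("gift", datetime(2017, 8, 7, 9, 40)),
    ("like", datetime(2017, 8, 7, 23, 59)),
    ("follow", datetime(2017, 8, 8, 0, 1)),  # belongs to the next T1 node
]
# Selecting the August 7 pie (node 6 on an axis starting August 1).
hourly = drill_down(events, datetime(2017, 8, 1), timedelta(days=1), node=6)
```

The resulting per-hour counts are what the bars and arcs on T2 would be built from after a selection.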

Note that the time intervals and time granularities of the first and second time axes are not limited to those in the above examples. In different embodiments, the user can set them to suit the actual situation, for example to weeks, months, quarters or even years.

Users can use this presentation and the displayed statistics to examine the groups classified by the fraud event detection system. The visual interface also lets domain experts discover or correct deficiencies in the detection algorithm. To make the relationship between the two time axes even clearer, the visualization method further includes the following step. When a first shape is selected, the third shapes show how the event types occurring within that first shape's time granularity are distributed over the second time axis. They are displayed in an animated manner, a highlighted manner, or both. For example, in interface C1 of FIG. 4, when the user selects a pie chart on T1, the third shapes connected to the corresponding bar charts on T2 are highlighted or flash for a few seconds or longer. When the user then selects another pie chart on T1, the previously flashing or highlighted third shapes return to their original shape and color. The third shapes connected to the bar charts for the newly selected pie chart then flash for a few seconds and are highlighted.
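The select-restore-emphasize behavior can be modeled as a small piece of state, sketched below. The `LinkedHighlight` class and the arc ids are hypothetical, and a real renderer would apply the flashing or highlighting itself.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedHighlight:
    """Tracks which third shapes (arcs) are emphasized for the selected first shape."""

    arcs_by_node: dict  # T1 node index -> ids of arcs leading to that node's bars on T2
    highlighted: set = field(default_factory=set)
    selected: object = None

    def select(self, node):
        """Select a pie chart on T1. Returns (restore, emphasize): arcs to return
        to their original style, and arcs to flash and/or highlight."""
        previous = self.highlighted
        self.highlighted = set(self.arcs_by_node.get(node, ()))
        self.selected = node
        return previous - self.highlighted, set(self.highlighted)


state = LinkedHighlight({6: ["a", "b"], 9: ["c"]})
first = state.select(6)   # nothing to restore yet
second = state.select(9)  # arcs a and b revert, arc c is emphasized
```

Returning only the difference lets the renderer restyle just the arcs whose state actually changed.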

In some embodiments, when the user selects a first shape on the first time axis, the visualization method may also display the selected first shape enlarged. This makes it easier to compare the numbers of events of each type that the shape represents. In one specific example, the selected first shape is enlarged to one side of the first time axis, for example above it, as in interface C3 shown in FIG. 6a. In another specific example, the selected first shape is enlarged within the first time axis, about the same center, as in interface C4 shown in FIG. 6b.

In another specific example, when the user selects a pie chart on the first time axis T1, the selected first shape is enlarged to one side of T1. At the same time, the third shapes connected to the corresponding bar charts on the second time axis T2 flash for a few seconds or longer. In interface C5 shown in FIG. 6c, when the user selects the pie chart for August 7 on T1, that pie chart is enlarged to one side of T1, and the lines connected to the August 7 bar chart on T2 flash for a few seconds or longer. Similarly, in interface C6 shown in FIG. 6d, when the user selects the pie chart for August 10 on T1, that pie chart is enlarged to one side of T1, and the lines connected to the August 10 bar chart on T2 are highlighted.

In some embodiments, users care not only about how each event type in a group data set changes along the time axes, but also about whether the assigned groups are reasonable. To judge this, users need to see the detailed data features of each group and the priority order of the data features used to classify groups. The visualization method may therefore include a step of displaying an interface for the data set of a group. The data set is displayed as a list, giving the user detailed information on the data features within the same group. To help assess classification accuracy, the list presents a group's data features column by column, in the classification priority used by the fraud event detection system.

For example, FIG. 7 is a schematic diagram of a list interface displaying the data set of a group in an embodiment of the present application. In this interface, the group's records are sorted by the similarity of their data features, in descending order of priority. When records tie on the first-priority feature, they are sorted by the second-priority feature, and so on. In the embodiment shown in FIG. 7, the priorities from high to low are: IP address, event source (source), event responder (target), event type (event_type) and event time (timestamp).

In this embodiment, the table header encodes the importance of each column: the more concentrated a feature's values, the more important the feature. In one embodiment of the present application, the fraud event detection system captures this by computing the information entropy of each feature. Lower information entropy means higher consistency. The system sorts the features in order of increasing information entropy, so that low-entropy columns come first and draw the user's attention. In other implementations, the column headers can also be color-rendered. For example, the header of the lowest-entropy column is rendered in the darkest color to signal that its data feature is the most important, and the other headers are shaded accordingly. This yields the data set list interface shown in the figure. The list interface may be displayed after the step of displaying multiple groups, after step S13, or when the user selects it.
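The entropy-based column ordering can be sketched as follows. It assumes each record is a dict keyed by column name and uses Shannon entropy in bits. The linear shade scale and the plain lexicographic row sort (a stand-in for "sort by similarity in priority order") are assumptions, as are the function names and sample records.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of one column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def order_columns(rows, columns):
    """Sort columns by increasing entropy (most concentrated first), assign each
    header a shade in [0, 1] (1 = darkest = lowest entropy), and sort rows
    lexicographically in that column priority so identical values group together."""
    ent = {c: column_entropy([r[c] for r in rows]) for c in columns}
    ordered = sorted(columns, key=lambda c: ent[c])
    lo, hi = ent[ordered[0]], ent[ordered[-1]]
    shade = {c: 1.0 if hi == lo else (hi - ent[c]) / (hi - lo) for c in ordered}
    sorted_rows = sorted(rows, key=lambda r: tuple(r[c] for c in ordered))
    return ordered, shade, sorted_rows


rows = [
    {"ip": "1.1.1.1", "event_type": "gift", "target": "u1"},
    {"ip": "1.1.1.1", "event_type": "like", "target": "u2"},
    {"ip": "1.1.1.1", "event_type": "gift", "target": "u3"},
    {"ip": "1.1.1.1", "event_type": "gift", "target": "u4"},
]
ordered, shade, sorted_rows = order_columns(rows, ["target", "event_type", "ip"])
```

A group whose members all share one IP yields a zero-entropy `ip` column, which is why that column moves to the front and gets the darkest header.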

In some embodiments, the group data set is also presented along other dimensions to further check whether it reflects the characteristics of fraud events. For example, comparing the group data set with the network operation data of normal users can further confirm that the detected fraud events are accurate. To this end, the visualization method further includes a step of displaying an interface showing the feature distribution of the group's data set. This interface can show how each data type is distributed over the overall network, where "overall network" is relative. For example, if multiple network users form a cluster, the interface can show the distribution of a given data feature of one group within that cluster. In FIG. 2, the largest dashed circle represents a cluster of network users containing 11 groups, numbered 0 to 10, from which one group is selected for display.

In some embodiments, the feature distribution interface can display data types such as the following:

- the information entropy of the average operation interval (average operation interval entropy);
- the information entropy of IP address usage (IP used amount entropy);
- the information entropy of gender (sex entropy);
- the information entropy of e-mail (email entropy);
- the information entropy of registration time (reg time entropy);
- the information entropy of the number of operations (operation times entropy);
- the information entropy of the number of devices (device amount entropy);
- the information entropy of the operation type (operation type entropy);
- the information entropy of the maximum number of times the IPs a user uses are also used by others (max IP used be used amount entropy).

In the embodiment shown in FIG. 7, the registration-time entropy is used as the example data feature. That is, FIG. 7 shows how the information entropy of the registration time (registration period) of a group is distributed within the network cluster. FIG. 8 is a flowchart of displaying the interface that shows the feature distribution of the group's data set, which lets the user compare the feature distribution of the acquired group data set with that of normal users' network operation data. As shown in the figure, it includes the following steps:

In step S211, one of the groups is selected, and at least one data feature is determined from that group's data set. In one embodiment, for example, the group numbered 2 in FIG. 3 is selected, and a user-information data feature, such as registration time, is determined from its data set.

In step S212, the distribution of each determined data feature is computed both within the group and within the cluster. In this embodiment, the distribution of the registration-time feature is computed within the group and within the entire cluster.

In step S213, a histogram of the feature distribution is displayed, together with a chart comparing it with the corresponding histogram for the entire cluster. In this embodiment, based on the encoding of the data features, two histograms are displayed: the registration-time distribution within the group and the registration-time distribution within the entire cluster. FIG. 9 shows, for an embodiment of the present application, an interface D with the histogram and comparison chart of the registration-time distribution in a group. Its panels are:

- panel (a): a thumbnail of the registration-time distribution of the selected group numbered 2;
- panel (d): the enlarged version of that thumbnail, at the bottom of interface D. Over the month from August 1 to August 31, the group members' registrations are concentrated on five days: August 5, 6, 11, 12 and 16;
- panel (c): a histogram of when the cluster's registered users registered during August. Registrations across the cluster follow a regular pattern;
- panel (b): an overlay of panels (d) and (c), showing how the registration-time feature differs between the entire cluster and the selected group.

To help users see the differences and connections between features, this embodiment presents the histogram in three layers. When the user clicks a thumbnail, the page scrolls to the normalized distribution comparison chart. In a specific application there may be multiple thumbnails, each representing a different data feature.
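Steps S212 and S213 can be sketched as two normalized histograms over the same daily bins. Normalizing makes a small group comparable with a large cluster in the overlay. The total variation distance returned here is an added summary and an assumption; the description itself only overlays the histograms. Function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def normalized_hist(values, bins):
    """Fraction of values falling in each bin (values outside the bins are ignored)."""
    counts = Counter(values)
    total = sum(counts[b] for b in bins)
    return [counts[b] / total if total else 0.0 for b in bins]


def compare(group_values, cluster_values, bins):
    """Normalized group and cluster histograms plus their total variation
    distance (0 = same shape, 1 = no overlap)."""
    g = normalized_hist(group_values, bins)
    c = normalized_hist(cluster_values, bins)
    tvd = 0.5 * sum(abs(a - b) for a, b in zip(g, c))
    return g, c, tvd


august = [date(2017, 8, d) for d in range(1, 32)]
group = [date(2017, 8, 5)] * 4  # every member registered on the same day
cluster = list(august)          # one registration per day across the cluster
g, c, tvd = compare(group, cluster, august)
```

A distance close to 1 means the group's registrations are concentrated on days when the cluster as a whole was not unusually active.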

In some embodiments, the histograms may also be color-rendered, or displayed dynamically (for example, by flashing), to distinguish or emphasize how a given data feature is distributed within the group and across the entire cluster.

In some embodiments, the group data visualization method also includes a step of displaying an interface showing the feature distributions of the data sets of multiple groups, to analyze the differences among groups in a network cluster. FIG. 10 is a flowchart of the steps for displaying the distribution of multiple groups in a cluster in one embodiment of the present application. FIG. 11 shows interface E, which displays the distribution of multiple groups in the cluster in one embodiment. As shown, the steps include:

In step S311, multiple groups are determined within a cluster composed of multiple network users, and the groups are distinguished by different shapes, icons, labels, and/or colors. In one embodiment, for example, the three groups labeled 0, 1, and 2 in Fig. 3 are selected: group 0 is shown in green, group 1 in red, and group 2 in blue.

In step S312, at least one data feature is determined from the data sets of the multiple groups. In this embodiment, one data feature, such as the IP address, is determined from the data sets of the three groups.

In step S313, based on the at least one data feature, the relative entropy between every pair of network users in each group is computed as a measure of how similar the two users are. In this embodiment, based on the IP address, the relative entropy of the IP-usage dimension (IP used amount entropy) is computed for every pair of network users in groups 0, 1, and 2. For example, the dimensionality-reduction method t-SNE (t-distributed stochastic neighbor embedding) is applied, with the relative entropy between two users used as the distance between them.
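Step S313 can be sketched as follows, under stated assumptions. Relative entropy (KL divergence) is undefined when one user never used an IP the other did, so this sketch adds a small smoothing constant `eps`, and because KL divergence is asymmetric while an embedding needs a symmetric distance, it averages both directions. Both choices, and all names, are assumptions rather than details given in the source.

```python
import math
from collections import Counter

def ip_distribution(ips, vocab, eps=1e-6):
    """Smoothed probability of each IP in `vocab` for one user's events."""
    counts = Counter(ips)
    total = len(ips) + eps * len(vocab)
    return [(counts[ip] + eps) / total for ip in vocab]

def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def pairwise_distance(users):
    """Symmetrized relative entropy between every pair of users.

    `users` maps a user id to the list of IPs seen in that user's events.
    Returns the sorted ids and a symmetric matrix with a zero diagonal.
    """
    vocab = sorted({ip for ips in users.values() for ip in ips})
    ids = sorted(users)
    dists = {u: ip_distribution(users[u], vocab) for u in ids}
    matrix = [[0.0] * len(ids) for _ in ids]
    for i, a in enumerate(ids):
        for j, b in enumerate(ids):
            if i < j:
                d = 0.5 * (kl(dists[a], dists[b]) + kl(dists[b], dists[a]))
                matrix[i][j] = matrix[j][i] = d
    return ids, matrix
```

The resulting matrix could then be handed to an embedding that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`, to obtain the 2-D layout of interface E.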

In step S314, a display interface is output in which network users are represented by shapes, icons, and/or labels, the groups are distinguished by different colors, and the displayed distance between two network users in a group represents how similar they are. In this embodiment, interface E shown in Fig. 11 is presented: each dot is a network user, green denotes group 0, red denotes group 1, and blue denotes group 2. The users of group 2 (blue) lie close together, so the group forms a tight cluster; the users of group 1 (red) also lie close together and likewise form a cluster. Green shows the distribution of randomly sampled normal users, which lie farther apart and are more scattered. Hence, the denser a group's cluster, the more likely the group is to be a fraud group. In the embodiment of Fig. 11, the green group is relatively scattered, indicating that it is a normal group and that the users shown as green dots are normal users. Conversely, the red group (group 1) and the blue group (group 2) are clustered, indicating that they are abnormal groups and that the users shown as red and blue dots are abnormal users. In one embodiment, a user of the visualization system can hover the mouse over a dot to view the specific information and feature values of that user in each group.
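The judgment above is made visually. As a hypothetical numerical proxy (not part of the source), one could compare each group's mean pairwise distance in the 2-D embedding with that of the randomly sampled normal users; the `ratio` threshold below is an illustrative tuning parameter, not a value from the source.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Average Euclidean distance between all pairs of 2-D points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def flag_dense_groups(embedding, baseline_label, ratio=0.5):
    """Labels of groups markedly tighter than the baseline (normal) sample.

    `embedding` maps a group label to its users' 2-D coordinates; `ratio`
    is a hypothetical threshold, not a value given in the source.
    """
    baseline = mean_pairwise_distance(embedding[baseline_label])
    return sorted(
        label for label, pts in embedding.items()
        if label != baseline_label
        and mean_pairwise_distance(pts) < ratio * baseline
    )
```

Such a score would only supplement, not replace, the expert's inspection of interface E.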

In other embodiments, network users in the output interface may be represented by other shapes, icons, and/or labels: for example, geometric figures such as triangles or rectangles; icons such as smiling or crying faces, skulls, or robber avatars; or labels consisting of text or other clearly distinguishable symbols.

By presenting the data sets of the groups identified during fraud detection along a timeline, as type distributions, as categorized lists, and so on, the group data visualization method of this application displays the data features of those groups through multiple relational interfaces. This helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud event detection system.

This application also provides a computer device, which may be any suitable computer device, such as a handheld device, a tablet, a notebook computer, a desktop computer, or a server. The computer device includes a display, input devices, input/output (I/O) ports, one or more processors, memory, non-volatile storage, a network interface, a power supply, and so on. These components may include hardware elements (such as chips and circuits), software elements (such as a tangible, non-transitory computer-readable medium storing instructions), or a combination of the two. The components may also be combined into fewer components or separated into additional ones; for example, the memory and the non-volatile storage may be included in a single component. The computer device may execute the visualization method alone or in cooperation with other computer devices.

Referring to Fig. 12, a schematic diagram of the architecture of the computer device in one embodiment of this application, the computer device 1 includes one or more processors and a presentation engine running on them, which executes the visualization method described above and displays the corresponding visualization interfaces. For example, the computer device includes a processor, a display, and a presentation engine (or display engine) running on the processor; the presentation engine executes the group data visualization method described in the above embodiments and shows the result on the display. For the implementation of that method, see the description of Figs. 1 to 11. In a specific implementation, the presentation engine is stored, for example, in the memory of the local computer device or on a remote storage server, and includes but is not limited to software and hardware capable of interpreting interface-display programs written in markup or scripting languages such as XML and HTML, or in languages such as C. In still other embodiments, one computer device executes the visualization method and provides the resulting interface to another computer device for display. For example, a client, in response to a user operation, sends a request to a server and logs in; the server executes the visualization method to generate the corresponding interface data and returns it to the client, whose browser or custom application then renders the corresponding charts from that data.
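The client/server split described above can be sketched with the standard `json` module. The payload layout and function names below are hypothetical; the source does not define a format for the interface data.

```python
import json

def build_interface_data(group_id, daily_counts):
    """Server side: package one group's timeline data for the client.

    `daily_counts` maps an ISO date string to {event_type: count}; the
    payload field names are hypothetical, not defined by the source.
    """
    payload = {
        "group": group_id,
        "timeline": [
            {"date": day, "counts": counts}
            for day, counts in sorted(daily_counts.items())
        ],
    }
    return json.dumps(payload, ensure_ascii=False)

def parse_interface_data(text):
    """Client side: decode the payload before passing it to a chart renderer."""
    return json.loads(text)
```

Keeping the server's output as plain data leaves the rendering to the client's browser or custom application, matching the division of labor described above.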

This application also provides a client connected to a server through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. The web client logs in to the web server by sending a web service request, executes the group data visualization method described in the above embodiments, and shows the result on a display. For the implementation of that method, see the description of Figs. 1 to 11.

This application also provides a server connected to a client through a network. In this embodiment, the client is, for example, a web client, and the server is, for example, a web server. In response to an operation requested by the web client, the web server executes the group data visualization method described in the above embodiments and sends the result to the client for display. For the implementation of that method, see the description of Figs. 1 to 11.

This application also provides a browser connected to a server through a network. The browser logs in to the server by sending a request, executes the group data visualization method described in the above embodiments, and shows the result on a display. For the implementation of that method, see the description of Figs. 1 to 11. In this embodiment, the browser is, for example, a web browser, including but not limited to QQ Browser, Internet Explorer, Firefox, Safari, Opera, Google Chrome, Baidu Browser, Sogou Browser, Cheetah Browser, 360 Browser, UC Browser, Maxthon, and TheWorld Browser.

This application also provides a group data visualization system, which may comprise software and hardware in one or more computer devices. The system shows users how a fraud group behaves over different time periods, answering the question posed by domain experts, "What did this group do as a fraud group?", and the question posed by algorithm experts, "Do users in the same group share the same behavior patterns?" To this end, this application provides a group data visualization system organized along the time axis. Fig. 13 is a schematic diagram of the module structure of the group data visualization system provided by this application. As shown, the group data visualization system 3 includes an acquisition module 31, a processing module 32, and a display module 33.

The acquisition module 31 acquires the data set of a group. The data features in the data set include at least event types and the time information associated with each event type.

In some embodiments, the acquisition module 31 acquires the operation logs of a cluster composed of multiple network users. In different embodiments, the cluster consists of all network users that can be obtained; these users may come from the same website or from different websites, or from different network channels, such as the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination thereof, or a mobile telephone communication network.

The acquisition module 31 passes the acquired operation logs to the processing module 32, which determines at least one data feature from the operation logs of the multiple network users and analyzes the similarity of at least one set of data features in the logs to determine the groups. In a specific embodiment, because online fraud inevitably leaves traces in users' usage data, the group data visualization system collects the operation logs of multiple network users from at least one website. By analyzing the similarity of at least one data feature in these logs, the processing module 32 groups the users who generated the logs, yielding the groups and each group's data set within the logs.

In some embodiments, the data set of a group includes at least two of the following data features: user information, IP address, event type, event source, event responder, and event time. User information includes, for example, a mobile phone number, email address, ID number, national identity card number, gender, the identifier of the device the user uses, and the registration time. The same user information may correspond to at least one event type, and each event has its own event source, event responder, and event time. Event types include but are not limited to social behaviors between network users, such as following, liking, commenting, and gifting, and operations by network users such as logging in, logging out, updating status, registering, and modifying information. For example, the same user information may correspond to multiple like events, each with its own event source, event responder, and event time.
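One event record with the fields listed above could be modeled as follows. This is a minimal sketch; the class, field, and enum names are hypothetical, and `user_id` stands in for whichever piece of user information identifies the user.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EventType(Enum):
    FOLLOW = "follow"
    LIKE = "like"
    COMMENT = "comment"
    GIFT = "gift"
    LOGIN = "login"
    LOGOUT = "logout"
    UPDATE_STATUS = "update_status"
    REGISTER = "register"
    MODIFY_INFO = "modify_info"

@dataclass(frozen=True)
class Event:
    user_id: str          # user information, e.g. phone number or account ID
    ip: str               # IP address the event came from
    event_type: EventType
    source: str           # event source (initiator)
    responder: str        # event responder (target)
    time: datetime        # event time

def events_of(events, user_id, event_type):
    """All events of one type recorded under one user's information."""
    return [e for e in events if e.user_id == user_id and e.event_type is event_type]
```

A frozen dataclass keeps records immutable and hashable, which is convenient when the same events are shared across several views.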

The processing module 32 may store the resulting data sets of the groups in a database. In some embodiments, the data set is obtained from a database that stores the groups and their data sets; the database resides, for example, on a remote storage server or in a storage device of the local computer device, and the acquisition module 31 extracts the data set from it based on the user's input. For example, the processing module 32 uses an unsupervised detection algorithm to obtain multiple groups; when the user selects one of them through a selection interface, the data set of that group is acquired.
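Storing detector output and fetching the selected group can be sketched with Python's built-in `sqlite3` module. The single-table schema and function names are assumptions for illustration; the source does not specify a database layout.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS group_event (
    group_id   INTEGER NOT NULL,
    user_id    TEXT    NOT NULL,
    event_type TEXT    NOT NULL,
    event_time TEXT    NOT NULL  -- ISO 8601 timestamp
)
"""

def store_groups(conn, groups):
    """Persist detector output: `groups` maps group_id -> [(user, type, time)]."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO group_event VALUES (?, ?, ?, ?)",
        [(gid, u, t, ts) for gid, rows in groups.items() for (u, t, ts) in rows],
    )
    conn.commit()

def load_group(conn, group_id):
    """Fetch the data set of the group the user selected, in time order."""
    cur = conn.execute(
        "SELECT user_id, event_type, event_time FROM group_event "
        "WHERE group_id = ? ORDER BY event_time",
        (group_id,),
    )
    return cur.fetchall()
```

Storing timestamps as ISO 8601 strings keeps `ORDER BY event_time` chronological, since such strings sort lexicographically in time order.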

Specifically, the processing module 32 first computes, for each kind of data feature, the similarity across all data in the operation logs; this similarity can be measured with information entropy. For example, the processing module 32 computes the entropy of the IP-usage or maximum-IP-usage dimension from the user information, the entropy of the operation-type dimension from the event types, and the entropy of the registration-time dimension, or the entropy of the bad-operation dimension from the operation time. From these entropies, the processing module 32 then applies an unsupervised detection method to divide the users into multiple groups; examples of such methods include dense-subgraph-based algorithms and vector-space-based algorithms. The groups presented by the visualization method of this application reflect the shared resources, user relationships, and so on exploited by fraud events, so that users of the group data visualization system 3 can judge more clearly whether the classification strategy of the unsupervised detection algorithm is reasonable. Shared resources include but are not limited to shared IP addresses and email addresses; user relationships include but are not limited to following and interaction relationships.
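The source does not give an exact entropy formula for each dimension. A minimal sketch, assuming Shannon entropy of the empirical distribution of one log field per user (field and key names are hypothetical), might look like this:

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def per_user_entropy(log, field):
    """Entropy of one log field (e.g. 'ip' or 'event_type') for each user."""
    by_user = defaultdict(list)
    for record in log:
        by_user[record["user"]].append(record[field])
    return {user: entropy(vals) for user, vals in by_user.items()}
```

Per-user entropies computed over several fields could form the feature vectors consumed by a vector-space-based detector; a dense-subgraph detector would instead work on a user-resource graph.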

In one embodiment, the display module 33 of the group data visualization system 3 displays at least one group interface, in which the size of each group is represented by the size of the displayed geometric figure. Fig. 3 shows an interface containing multiple groups: 11 groups are displayed, each represented by a circle. All 11 groups lie within the largest dashed circle, which represents, for example, a cluster of N network users. The group labeled 0 is, for example, a normal group, while a smaller dashed circle contains 10 groups of different sizes, labeled 1 to 10, which are, for example, abnormal groups. The size of each circle is proportional to the number of members in the group: larger circles represent groups with more members, and smaller circles groups with fewer. In different embodiments, the geometric figure representing a group may have any shape. Its color may be set randomly or be related to the number of groups or to the number of group members. For example, with N preset colors, the processing module 32 randomly assigns different colors to the figures representing the groups, and the display module 33 shows them on the display device. As another example, the processing module 32 assigns colors from a preset color sequence to the figures in ascending order of member count, and the display module 33 shows them on the display device. When the user selects a geometric figure by operating the display interface, the acquisition module 31 acquires the data set of the corresponding group.
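The size and color encodings of the group interface can be sketched as below. The source says circle size is proportional to member count without saying whether radius or area is meant; this sketch assumes area, so the radius grows with the square root. The palette values are hypothetical.

```python
import math

# Hypothetical preset color sequence; it wraps if there are more groups than colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

def circle_radius(members, scale=1.0):
    """Radius whose circle area is proportional to the member count."""
    return scale * math.sqrt(members)

def assign_colors(group_sizes, palette=PALETTE):
    """Color groups from `palette` in ascending order of member count."""
    order = sorted(group_sizes, key=lambda g: group_sizes[g])
    return {g: palette[i % len(palette)] for i, g in enumerate(order)}
```

Scaling by area rather than radius avoids visually exaggerating large groups, since a doubled radius would otherwise read as four times the membership.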

In a preferred embodiment, the group interface displayed by the display module 33 may further include an information bar. When the user selects a group in the group interface, the group's basic information is displayed on one side of the interface in a window or text box, such as the group code, the number of members, the data features that best characterize the group, and the group attribute (for example, normal or abnormal group). The display module includes, for example, a display.

To present the group data visualization system 3's analysis of an acquired group data set along a timeline, the processing module 32 creates a first time axis and a second time axis and encodes the data features. Through the display device, the display module 33 shows, in one interface, the first and second time axes together with first, second, third, and fourth shapes. The first shapes serve as nodes of the first time axis and represent the types and number of events of the group within each time granularity of the first time axis; the second shape represents the total number of events of each type within the time interval of the second time axis; the third shapes represent the distribution, along the second time axis, of the event types represented in the second shape; and the fourth shapes represent the types and number of events of the group within each time granularity of the second time axis. The display device may consist of a display screen external to or integrated with the computer device, the screen's driver, and a presentation engine dedicated to processing display data; the presentation engine includes but is not limited to an image processing chip and the display program running on it.

The first and second time axes are created from the time information in the data set. For example, if the largest time span in the data set's time information is 10 days, the maximum time interval of the first or second time axis is 10 days. In one embodiment, the two axes are created with the same time interval and time granularity; in another, they are created with different time intervals and granularities, as detailed later.
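Deriving an axis from the span of the data set's timestamps can be sketched as follows. Aligning the first bin to midnight of the earliest day is an assumption; the source only says the axes are derived from the time information.

```python
from datetime import datetime, timedelta

def build_axis(times, granularity):
    """Bin start times covering the span of `times` at `granularity`.

    The axis starts at midnight of the earliest day (an assumption) and
    adds bins until the latest timestamp is covered.
    """
    start = min(times).replace(hour=0, minute=0, second=0, microsecond=0)
    end = max(times)
    bins = []
    t = start
    while t <= end:
        bins.append(t)
        t += granularity
    return bins
```

For the 10-day example above, `build_axis(times, timedelta(days=1))` yields ten daily nodes; passing `timedelta(hours=1)` instead gives a finer axis, which is one way the two axes could use different granularities.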

The processing module 32 encodes all data to be presented, such as data features, event types, and event counts, as graphical patterns, so that the resulting interface is clear and visually appealing. Here, the processing module 32 counts the events of each type in the data set at the time granularity of the first time axis and encodes the counts into graphics of a preset first shape, which the display module 33 presents in chronological order as nodes on the first time axis. From these nodes, domain experts can clearly see how the distribution or number of event types changes over time. The first shape includes but is not limited to a pie shape or a bar shape. In some examples, the processing module 32 encodes the percentage share of each event type within one time granularity into a first-shape graphic, which the display module 33 shows on the first time axis, with each event type drawn in the same color in every graphic. Fig. 4 is a schematic diagram of the display of the group data visualization system in one embodiment. As shown, the first time axis T1 lies in the lower area of the interface and spans the 10 days from August 1 to August 10 at a granularity of one day. The percentage distribution of the event types counted each day is encoded as a pie chart and shown as a node on T1, and the colors in the pie chart represent event types: yellow denotes follow events, red denotes gift events, and blue denotes like events. For example, in the pie chart for August 7 on T1, follow events account for the largest share, gift events for a smaller share, and like events for the smallest.
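The per-day pie nodes on T1 can be derived with a short sketch, assuming events are given as `(datetime, event_type)` pairs; the function name is hypothetical. Multiplying a share by 360 gives the sector angle in degrees.

```python
from collections import Counter, defaultdict
from datetime import datetime

def daily_type_shares(events):
    """Per day, the fraction of that day's events belonging to each type.

    `events` is an iterable of (datetime, event_type) pairs; each day's
    fractions become the sector sizes of one pie node on axis T1.
    """
    per_day = defaultdict(Counter)
    for when, etype in events:
        per_day[when.date()][etype] += 1
    shares = {}
    for day, counts in sorted(per_day.items()):
        total = sum(counts.values())
        shares[day] = {etype: n / total for etype, n in counts.items()}
    return shares
```

Because shares are normalized per day, each pie shows composition rather than volume; the absolute counts appear in the second and fourth shapes.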

In addition, the processing module 32 sums the number of events of each type in the data set over the time interval of the second time axis and encodes the totals into graphics of a preset second shape, and the display module 33 shows the total number of events of each type within that interval. The second shape includes but is not limited to a histogram, a bar chart, or a line chart. Depending on the time interval of the second time axis, the displayed totals show how the event types compare in number within the same interval. When the interval of the second time axis is one day or one week, the user can compare the totals of the three event types (red, yellow, and blue) by the lengths of the corresponding bars; the bars may also encode the comparison through thickness, transparency, and so on. Referring again to Fig. 3, a horizontal histogram is displayed beside the first time axis T1 (on the right in the figure), containing three bars, red, yellow, and blue, from top to bottom. The length of each bar represents the total number of events of that type within the interval of the second time axis. The second shape shows that, within this interval, follow events (yellow bar) are the most numerous, followed by gift events (red bar), with like events (blue bar) the fewest.
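The second shape can be sketched as interval totals scaled to bar lengths. Linear scaling against the largest total and the `max_px` width are assumptions for illustration; any comparable timestamps (datetimes or plain numbers) work for `start` and `end`.

```python
from collections import Counter

def type_totals(events, start, end):
    """Total events of each type with start <= time < end (the T2 interval)."""
    return Counter(etype for when, etype in events if start <= when < end)

def bar_lengths(totals, max_px=200):
    """Scale totals linearly so the largest bar is `max_px` pixels long."""
    peak = max(totals.values(), default=0)
    if peak == 0:
        return {etype: 0.0 for etype in totals}
    return {etype: max_px * n / peak for etype, n in totals.items()}
```

Linear scaling preserves the ratios between event types, which is what the reader compares when judging that follow events outnumber gift and like events.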

By displaying the total number of events of each type within the interval of the second time axis, domain experts can see from another perspective how event counts change over time. To show the relationship between the first and second time axes more clearly, the processing module 32, based on the encoding of the data features, has the display module 33 display the second time axis. The processing module 32 then associates each event type represented in the second shape with the time granularities of the second time axis at which that type occurs, and the display module 33 displays, as third shapes, the distribution of each event type along the second time axis. The second time axis is drawn as an axis whose nodes are the corresponding time granularities, and the third shapes link the event types distributed over the nodes to the second shape, so that the user can clearly see how the second shape relates to each time granularity of the second time axis. The third shapes may be lines whose colors match those of the corresponding event types in the second shape, so that the user can readily identify each event type across the views.
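The association behind the third shapes can be sketched as a list of links, one per event type and non-empty bin of the second axis. The function name and tuple layout are hypothetical; the linear scan over bins keeps the sketch short, and a `bisect` lookup would scale better for many bins.

```python
from collections import Counter

def link_segments(events, bins, granularity):
    """One (event_type, bin_start, count) link per non-empty bin on axis T2.

    Each tuple would become a line (third shape), drawn in the event type's
    color from that type's bar in the second shape to the bin on the axis.
    """
    counts = Counter()
    for when, etype in events:
        for start in bins:
            if start <= when < start + granularity:
                counts[(etype, start)] += 1
                break
    return sorted((etype, start, n) for (etype, start), n in counts.items())
```

Empty bins produce no link, so the fan of lines from each bar directly shows where in time that event type occurred.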

To show more directly the types and numbers of events that occur in each time-granularity interval of the second time axis, the display module 33, under the control of the processing module 32, also displays a fourth shape. The fourth shape represents the types and numbers of events the group produces within each time granularity of the second time axis. The processing module 32 sums the number of events of each type in the data set, or computes their distribution, per time interval of the second time axis and encodes the result as a graph of a preset fourth shape. The display module 33 then places the encoded fourth shapes on the second time axis in time order, as its nodes. Guided by the third shape, the processing module 32 controls the display module 33 to display the fourth shape that corresponds to each time granularity of the created second time axis. The nodes on the second time axis thus give the user another view of how the count of each event type changes over time. The fourth shape includes but is not limited to a pie shape or a bar (columnar) shape, and is chosen to differ from the first shape. In some implementation examples, the processing module 32 separately encodes the accumulated count of each event type within one time granularity of the second time axis as a fourth-shape graph, which the display module 33 displays on the second time axis; the accumulated count of a given event type uses the same color as that event type in the third shape and the second shape.
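The per-granularity accumulation behind the fourth shape amounts to bucketing timestamps. The sketch below is a hypothetical illustration under the same assumption as before (events are `(event_type, datetime)` tuples); `fourth_shape_counts` is not a name from the source.

```python
from collections import Counter
from datetime import datetime, timedelta


def fourth_shape_counts(events, start, granularity, n_buckets):
    """Count each event type per time-granularity bucket of the second axis.

    Returns one Counter per bucket, in time order; bucket i covers
    [start + i * granularity, start + (i + 1) * granularity).
    Events outside the axis are ignored.
    """
    buckets = [Counter() for _ in range(n_buckets)]
    for etype, ts in events:
        if ts < start:
            continue
        idx = (ts - start) // granularity  # timedelta // timedelta -> int
        if idx < n_buckets:
            buckets[idx][etype] += 1
    return buckets


events = [
    ("like", datetime(2018, 1, 1, 1)),
    ("like", datetime(2018, 1, 1, 23)),
    ("login", datetime(2018, 1, 3, 5)),
    ("like", datetime(2018, 1, 5)),  # beyond the 3-day axis below
]
daily = fourth_shape_counts(events, datetime(2018, 1, 1), timedelta(days=1), 3)
```

Each Counter in `daily` would be rendered as one fourth-shape node, with every event type drawn in its shared color.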

The time axis is used to present group data because, for domain experts and algorithm experts alike, it is critical to understand how users' behavior concentrates within a period of time. The first time axis and the second time axis are therefore combined to describe this concentrated behavior.

Referring to FIG. 3, each pie chart on the first time axis T1 shows the proportion of each event type (for example, following a user or sending a user a gift online) within one time granularity (for example, one day). The processing module 32 encodes each event type as a distinct color. It encodes the number of events of each type within a unit time granularity of T1 as the area proportions of the regions of a pie chart; it encodes the number of events of each type within the time interval of T2 as bar lengths, forming a bar chart with one bar per event type (the second shape); and it encodes the number of events of each type within a unit time granularity of T2 as bar lengths, forming a separate bar chart (the fourth shape). When the user selects a pie chart on T1, arcs colored by event type (the third shape) are drawn from the second shape of each event type to the fourth shapes at the corresponding time granularities on T2. This clearly presents to the user how the event types in a group's data set relate along the time axes.
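Encoding counts as pie-chart area proportions reduces to converting each count into an angular sweep, because a sector's area is proportional to its angle. A minimal sketch, assuming the day's counts are already available as a `Counter` and that a fixed event-type `order` keeps slices in a stable position; `pie_slices` is a hypothetical name.

```python
from collections import Counter


def pie_slices(day_counts, order):
    """Turn one T1 node's event-type counts into (type, start_deg, sweep_deg) slices.

    A slice's share of the 360-degree circle equals that type's share of the
    day's events, so its area is proportional to the count. Types with no
    events get no slice.
    """
    total = sum(day_counts.values())
    slices, angle = [], 0.0
    if total == 0:
        return slices
    for etype in order:
        n = day_counts.get(etype, 0)
        if n == 0:
            continue
        sweep = 360.0 * n / total
        slices.append((etype, angle, sweep))
        angle += sweep
    return slices


day = pie_slices(Counter({"follow": 3, "gift": 1}), ["follow", "gift", "login"])
```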

In one embodiment, the first time axis and the second time axis are created with the same time interval and time granularity. For example, the processing module 32 preloads a first time axis and a second time axis that share the same time granularity, and the display module 33 maps each event type onto each axis according to the time information in the data set and the time granularity, yielding at least one time interval per axis. As another example, the processing module 32 determines the time intervals of the preset first and second time axes from the sorted time information in the data set, and the display module 33 maps each event type onto each axis according to that time information and the time granularity. FIG. 3 shows an interface containing the first time axis T1 and the second time axis T2; both axes use one day as the time granularity and 10 days as the time interval, and the processing module 32 displays the data features of the data set on T1 and T2 according to the time information in the data set. As shown in FIG. 3, because T2 has a granularity of one day, the processing module 32 can encode the daily total of each event type as a bar, which the display module 33 displays as a node on T2.
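Deriving a shared interval from the sorted time information can be sketched as below. This is an assumption-laden illustration: it starts the axes at midnight of the earliest timestamp (which only makes sense for day granularity), uses the 10-day interval of FIG. 3 as a default, and the name `build_axes` is hypothetical.

```python
from datetime import datetime, timedelta


def build_axes(timestamps, granularity=timedelta(days=1), span=10):
    """Create T1 and T2 with the same time interval and time granularity.

    The interval starts at the earliest timestamp truncated to midnight and
    covers `span` granularity steps; both axes get identical node lists.
    """
    first = min(timestamps)
    start = datetime(first.year, first.month, first.day)  # truncate to midnight
    nodes = [start + i * granularity for i in range(span)]
    return {"T1": nodes, "T2": list(nodes)}


axes = build_axes([datetime(2018, 1, 3, 15), datetime(2018, 1, 1, 8)])
```

With identical node lists, the same bucket index addresses a pie chart on T1 and the bars for that day on T2.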

In another implementation example, the first time axis and the second time axis are created with different time intervals and time granularities, and the time interval of the second time axis equals the time granularity of the first time axis. For example, the processing module 32 presets different time granularities for the two axes, together with the correspondence between them, and the display module 33 maps each event type onto each axis according to the time information in the data set. FIG. 5 shows an interface containing the first time axis T1 and the second time axis T2: T1 has a time interval of 10 days and a granularity of one day, while T2 has a time interval of one day and a granularity of one hour. The display module 33 displays the data features of the data set on T1 and T2 according to the time information in the data set. In interface C2 of FIG. 5, for example, T2 has a granularity of one hour, so the processing module 32 can encode the hourly total of the event types as bars, which the display module 33 displays as nodes on T2.
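When the interval of T2 is exactly one granularity step of T1, selecting a day on T1 amounts to re-bucketing that day's events at the finer T2 granularity. The following is a hypothetical sketch using the day/hour pairing of FIG. 5; the name `drill_down` and the tuple-based event format are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def drill_down(events, node_start, t1_granularity=timedelta(days=1),
               t2_granularity=timedelta(hours=1)):
    """Per-T2-granularity event-type counts inside one T1 node.

    The T2 interval equals one T1 granularity step, so T2 has
    t1_granularity // t2_granularity nodes (24 for day/hour).
    """
    n = t1_granularity // t2_granularity
    nodes = [Counter() for _ in range(n)]
    for etype, ts in events:
        offset = ts - node_start
        if timedelta(0) <= offset < t1_granularity:
            nodes[offset // t2_granularity][etype] += 1
    return nodes


events = [
    ("gift", datetime(2018, 1, 2, 0, 30)),
    ("gift", datetime(2018, 1, 2, 0, 45)),
    ("follow", datetime(2018, 1, 2, 23, 59)),
    ("follow", datetime(2018, 1, 3, 0, 0)),  # belongs to the next T1 node
]
hourly = drill_down(events, datetime(2018, 1, 2))
```

Passing other `timedelta` pairs (for example a week and a day) covers the other unit combinations the description allows, as long as the coarse unit divides evenly by the fine one.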

The processing module 32 encodes the number of events of each type within one time granularity of T1 as the area proportions of the regions of a pie chart. When the user selects a pie chart on T1, the processing module 32 encodes the number of events of each type within the time interval of T2 (that is, the event types in the selected pie chart) as bar lengths, forming a bar chart with one bar per event type (the second shape); encodes the number of events of each type within a unit time granularity of T2 as bar lengths, forming a separate bar chart (the fourth shape); and draws arcs colored by event type (the third shape) from the second shape of each event type to the fourth shapes at the corresponding time granularities on T2. This clearly presents to the user how the event types in a group's data set relate along the time axes.

It should be noted that the time intervals and time granularities of the first and second time axes are not limited to the examples above. In different embodiments, the user can set them to suit the actual situation, for example using weeks, months, quarters, or even years as the time unit.

Users can use this presentation and the displayed statistics to examine the groups classified by the group data visualization system, and the visual interface lets domain experts discover or correct shortcomings in the detection algorithm. In addition, to show the association between the two time axes more clearly, the group data visualization system further includes a first detection module (not shown). When the first detection module detects that the user has selected a first shape, the third shape shows, dynamically, highlighted, or both, the distribution on the second time axis of the event types that occur within the time granularity represented by that first shape. For example, in interface C1 shown in FIG. 4, when the user selects a pie chart on T1, each third shape connected to the bars on T2 that correspond to the selected pie chart flashes for a few seconds or longer, or is highlighted. When the user then selects another pie chart on T1, the previously flashing and highlighted third shapes return to their original shape and color, and the third shapes connected to the bars on T2 that correspond to the newly selected pie chart flash for a few seconds and are highlighted.
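The select-and-restore behavior described above can be modeled as a small piece of state. This sketch is hypothetical: it assumes each third-shape arc has an id and that a map from T1 node index to the ids of the arcs reaching that node's T2 bars has been precomputed; the renderer would flash or highlight the returned ids.

```python
class TimelineSelection:
    """Tracks the selected T1 pie chart and which third-shape arcs are highlighted.

    arcs_by_node is a hypothetical precomputed map from a T1 node index to the
    ids of the arcs that connect to that node's bars on T2.
    """

    def __init__(self, arcs_by_node):
        self.arcs_by_node = arcs_by_node
        self.selected = None
        self.highlighted = set()

    def select(self, node_index):
        """Select a pie chart; return (arc ids to restore, arc ids to flash/highlight)."""
        new = set(self.arcs_by_node.get(node_index, ()))
        restore = self.highlighted - new  # previously highlighted arcs not reused
        self.selected = node_index
        self.highlighted = new
        return restore, set(new)


sel = TimelineSelection({0: ["a0", "b0"], 1: ["a1"]})
first = sel.select(0)
second = sel.select(1)
```

Returning the arcs to restore separately keeps the renderer from redrawing arcs that stay highlighted across two selections.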

In some embodiments, the group data visualization system further includes a second detection module. When the second detection module detects that the user has selected a first shape, the display module 33 displays that first shape enlarged, so that the user can compare the numbers of the event types it represents more clearly. In one specific example, the selected first shape is enlarged on one side of the first time axis; for example, it is enlarged above the first time axis, as in interface C3 shown in FIG. 6a. In another specific example, the selected first shape is enlarged within the first time axis; for example, it is enlarged about its own center on the first time axis, as in interface C4 shown in FIG. 6b.

In some embodiments, users care not only about how each event type in a group's data set changes along the time axis but also about whether the group assignment is reasonable. This requires that the user be able to view the detailed data features of each group and the priority order of the data features used to form the groups. The display module 33 therefore also displays an interface for a group's data set. The data set is shown as a list, presenting the details of the data features within one group. To help improve the accuracy of the group classification, the list may show a group's data features column by column according to the classification priority that the group data visualization system used. For example, FIG. 6 shows a schematic diagram of the list interface for a group's data set. In this list interface, a group's data set is sorted by the similarity of the data features in order of priority from high to low: rows that are equally similar on the first-priority feature are then sorted by the second-priority feature. In the embodiment shown in FIG. 7, the priorities from high to low are: IP address, event source (source), event responder (target), event type (event_type), and event time (timestamp). In this embodiment, the processing module 32 encodes the importance of each column in the table header: the more concentrated the values of a feature, the more important that feature. In one embodiment of the present application, the group data visualization system represents this property by computing the information entropy of each feature; the lower the entropy, the higher the consistency. The processing module 32 then sorts the features in ascending order of information entropy, and the display module 33 places the low-entropy column headers first to draw the user's attention. In other implementations, the column headers of the displayed table may instead be color-rendered, for example rendering the header of the lowest-entropy column in the darkest color to signal that the feature it represents is the most important, and rendering the other columns correspondingly, which yields the data set list interface shown in the figure. The list interface may be displayed after the multi-group interface or the time-axis interface, or in response to the user selecting it.
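The entropy-based column ordering can be sketched with Shannon entropy over each column's value frequencies. The description does not give the entropy base or the row-sorting rule; this sketch assumes base-2 entropy and approximates "sorted by similarity, ties broken by the next priority" with a plain lexicographic sort, which places identical values next to each other. Rows are assumed to be dicts keyed by column name, and all function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (bits) of one column's value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def order_columns(rows, columns):
    """Sort column names by increasing entropy: most concentrated column first."""
    return sorted(columns, key=lambda col: column_entropy([r[col] for r in rows]))


def sort_rows(rows, priority):
    """Sort rows by the priority columns; ties on one column fall to the next."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in priority))


rows = [
    {"ip": "1.1.1.1", "event_type": "like", "target": "u1"},
    {"ip": "1.1.1.1", "event_type": "gift", "target": "u2"},
    {"ip": "1.1.1.1", "event_type": "like", "target": "u3"},
    {"ip": "1.1.1.1", "event_type": "like", "target": "u4"},
]
header = order_columns(rows, ["target", "event_type", "ip"])
table = sort_rows(rows, header)
```

A shared IP gives that column zero entropy, so it moves to the front of the header, which is exactly the kind of concentration the list interface is meant to surface.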

In some embodiments, to show further whether the acquired group data set reflects the characteristics of fraud events, the data must also be displayed along other dimensions. For example, comparing the network operation data of normal users with the group data set can further confirm the accuracy of the detected fraud events. To this end, the display module 33 also displays an interface for the feature distribution of a group's data set, consisting of a histogram of the feature distribution and a chart comparing that histogram with the histogram of the entire cluster. This feature distribution interface can show the distribution of each data type within the overall network, where "overall" is relative: if multiple network users form a cluster, for example, the interface can show the distribution of one data feature for one group within that cluster. Referring to FIG. 2, the largest dashed circle represents a cluster formed by multiple network users; the cluster contains 11 groups, numbered 0 to 10, from which one group is selected for display.

In some embodiments, the data types that the feature distribution interface can display include, for example, the information entropy of: the average operation interval (average operation interval entropy), IP address usage (IP used amount entropy), gender (sex entropy), email (email entropy), registration time (reg time entropy), the number of operations (operation times entropy), the number of devices (device amount entropy), the operation type (operation type entropy), and the maximum number of times the IPs used have been used by others (max IP used be used amount entropy). The embodiment shown in FIG. 7 takes the registration-time entropy as the example data feature; that is, FIG. 7 shows the distribution, within the network cluster, of one group's registration-time entropy (registration period). To compare the feature distribution of the acquired group data set with that of normal users' network operation data, the processing module 32 performs the steps shown in FIG. 8 to obtain the data for the feature distribution histogram and for the chart comparing it with the histogram of the entire cluster, and the display module 33 then displays them.
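The individual steps of FIG. 8 are not listed in this text. As a hypothetical sketch of the comparison they lead to, the group's values and the whole cluster's values for one feature can be binned on the same edges and normalized to fractions, so that a small group and a large cluster become directly comparable. The bin edges, the registration-hour example, and the function names are assumptions.

```python
def normalized_histogram(values, bin_edges):
    """Fraction of values in each [edge_i, edge_i+1) bin; the last bin is closed.

    Values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            last = i == len(counts) - 1
            if lo <= v < hi or (last and v == hi):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def compare_group_to_cluster(group_values, cluster_values, bin_edges):
    """Side-by-side normalized histograms of one feature: group vs. whole cluster."""
    return {
        "group": normalized_histogram(group_values, bin_edges),
        "cluster": normalized_histogram(cluster_values, bin_edges),
    }


# Hypothetical feature: registration hour of day, binned into 6-hour periods.
chart = compare_group_to_cluster([2, 3, 3, 4], [2, 8, 13, 20], [0, 6, 12, 18, 24])
```

In this toy input every group member registered in the same six-hour period while the cluster is spread evenly, the kind of contrast the comparison chart is meant to expose.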

In some embodiments, the display module 33 may also color-render the histograms, or display them dynamically (for example, by flashing), to distinguish or emphasize the distribution of a given data feature in the group and in the entire cluster.

In some embodiments, to further analyze the differences among multiple groups in a network cluster, the display module 33 also displays an interface for the feature distributions of the data sets of multiple groups. FIG. 10 shows the steps for displaying the distribution of multiple groups in a cluster in one embodiment of the present application, and FIG. 11 shows the corresponding interface E. The processing module 32 performs the steps shown in FIG. 10, and the display module 33 displays the interface shown in FIG. 11.

In step S314, a display interface is output in which network users are represented by shapes, icons, and/or labels, the groups are distinguished by different colors, and the displayed distance represents the degree of similarity between two network users in each group. In this embodiment, interface E shown in FIG. 11 represents network users as dots: green marks the group labeled 0, red the group labeled 1, and blue the group labeled 2. The users of the blue group (label 2) lie close together, so the group forms a cluster; the users of the red group (label 1) also lie close together and form a cluster. The green group shows the distribution of randomly sampled normal users, who lie farther apart in a more dispersed pattern. It follows that the denser a group's cluster, the more likely the group is a fraud group. In the embodiment shown in FIG. 11, the green group is relatively dispersed, indicating a normal group whose green dots represent normal users. Conversely, the red group (label 1) and the blue group (label 2) are clustered, indicating abnormal groups whose red and blue dots represent abnormal users. In one embodiment, users of the visualization system can hover the mouse over the display to interactively view the specific information and feature values of the users in each group.
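Claim 6 names the relative information entropy between every two network users as the similarity measure behind this layout; how the distances are projected to two dimensions is not specified. The sketch below computes only the distances, under stated assumptions: each user is summarized as smoothed event-type frequencies, the relative entropy is symmetrized so it can serve as a distance, and a group's mean pairwise distance serves as a density score (lower means a tighter cluster). All names are hypothetical.

```python
import math
from itertools import combinations


def to_distribution(counts, categories, eps=1e-6):
    """Smoothed probability distribution over a fixed category list.

    The small eps avoids zero probabilities, which would make the relative
    entropy infinite.
    """
    raw = [counts.get(c, 0) + eps for c in categories]
    total = sum(raw)
    return [x / total for x in raw]


def kl(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def distance(p, q):
    """Symmetrized relative entropy, used here as a dissimilarity."""
    return kl(p, q) + kl(q, p)


def mean_pairwise_distance(users, categories):
    """Average distance over all user pairs in one group; lower means denser."""
    dists = [to_distribution(u, categories) for u in users]
    pairs = list(combinations(dists, 2))
    if not pairs:
        return 0.0
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


cats = ["like", "gift", "login"]
suspect = [{"like": 10}, {"like": 9}, {"like": 11}]
normal = [{"like": 5, "login": 2}, {"gift": 4}, {"login": 6, "like": 1}]
dense = mean_pairwise_distance(suspect, cats)
spread = mean_pairwise_distance(normal, cats)
```

Users who do nearly the same thing end up almost on top of one another, reproducing the tight red and blue clusters, while varied normal users stay far apart.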

In other embodiments, the output interface may represent network users by other shapes, icons, and/or labels: for example, geometric shapes such as triangles or rectangles, icons such as smiling or crying faces, or labels made of text or clearly distinguishable symbols.

It should be noted that all modules of the group data visualization system may be deployed on a single computer device. Alternatively, the modules may be distributed between a client on the user side and a server on the network side, with the client connected to the server over a network. For example, the acquisition module and the processing module are installed on the server and the display module on the client; the client sends a request to log in to the server, the server runs the group data visualization system for the client in response to the requested operation, and the client displays the corresponding interface. The client includes, but is not limited to, the interface of a browser or dedicated client software on a user terminal, and the hardware that executes the display interface program.

It should also be noted that, from the description of the above embodiments, those skilled in the art will clearly understand that part or all of the present application may be implemented by software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may include one or more machine-readable media storing machine-executable instructions which, when executed by one or more machines such as a computer, a computer network, or other electronic devices, cause the one or more machines to perform operations according to embodiments of the present application. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions.

The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

It should be noted that, as those skilled in the art will understand, some of the above components may be programmable logic devices, including one or more of programmable array logic (PAL), generic array logic (GAL), field-programmable gate arrays (FPGA), and complex programmable logic devices (CPLD); the present application does not specifically limit this.

In summary, the present application presents the data sets of the groups formed during fraud event detection through time-axis, type-distribution, classification-list, and other views, so that the data features of those groups are shown in multiple relational interfaces. This helps domain experts and algorithm experts evaluate and revise the detection algorithm of the fraud event detection system.

The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present application. Therefore, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.

Claims

1. A group data visualization method, applied in a fraud event detection system, characterized by comprising the following steps:

obtaining a data set of a group, wherein the data features in the data set include at least an event type and time information associated with the event type;

creating a first time axis and a second time axis;

based on an encoding of the data features, displaying the first time axis with a first shape as its nodes, to represent the types and quantities of events occurring for the group within each time granularity of the first time axis;

displaying a second shape to represent the total number of each event type occurring within the time interval of the second time axis;

displaying the second time axis, associating each event type represented in the second shape with the time granularities of that event type on the second time axis, and representing the distribution of each event type on the second time axis by a third shape; and

displaying a fourth shape to represent the types and quantities of events occurring for the group within each time granularity of the second time axis.

2. The group data visualization method according to claim 1, wherein the step of obtaining a data set of a group comprises:

obtaining the operation logs of a cluster composed of multiple network users;

determining at least one data feature from the operation logs of the multiple network users, and analyzing the similarity of at least one set of data features in the operation logs to determine the group; and

obtaining the data set of the group.

3. The group data visualization method according to claim 1 or 2, further comprising the step of displaying at least one group interface, wherein the size of each group in the group interface is represented by the size of a displayed geometric figure.

4. The group data visualization method according to claim 1 or 2, further comprising the step of displaying an interface of a group data set, wherein the data features of the group data set include at least two of user information, IP address, event type, event source, event responder, and event occurrence time, and wherein, in the interface of the group data set, the group data sets are grouped and then sorted for display.

5. The group data visualization method according to claim 2, further comprising the step of displaying an interface of the feature distribution of the data set of the group, comprising:

selecting one of the groups, and determining at least one data feature from the data set of that group;

computing the feature distribution of the determined at least one data feature in the group and in the cluster; and

displaying a histogram of the feature distribution and a chart comparing that histogram with the histogram of the entire cluster.

6. The group data visualization method according to claim 1, further comprising the step of displaying an interface of the feature distributions of the data sets of a plurality of groups, comprising:

determining a plurality of groups in a cluster composed of a plurality of network users, and representing the differences among the plurality of groups by different shapes, icons, labels, and/or colors;

determining at least one data characteristic from the data sets of the plurality of groups;

analyzing, based on the at least one data feature, the relative information entropy between every two network users in each group as a measure of the similarity between those two network users; and

outputting a display interface in which network users are represented by shapes, icons and/or labels, the plurality of groups are distinguished by different colors, and the degree of similarity between two network users in each group is represented by the displayed distance between them.
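Relative information entropy is the Kullback-Leibler divergence D(p||q). The sketch below applies it to each user's distribution over event types, which is an assumption, since the claim leaves the data feature open. Two further choices are ours: a tiny additive smoothing term keeps every probability positive, and the divergence is summed in both directions because D(p||q) alone is not symmetric. All names and sample data are hypothetical.

```python
import math
from collections import Counter


def smoothed_distribution(events, support, eps=1e-9):
    """Probability of each event type in `support` for one user, with additive
    smoothing so that no probability is zero (KL divergence needs q(t) > 0)."""
    counts = Counter(events)
    total = len(events) + eps * len(support)
    return {t: (counts[t] + eps) / total for t in support}


def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) = sum_t p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def pairwise_divergence(user_events):
    """Symmetrized relative entropy D(p||q) + D(q||p) for every pair of users.
    Smaller values mean more similar behavior; 0.0 means identical distributions."""
    support = sorted({t for evs in user_events.values() for t in evs})
    dist = {u: smoothed_distribution(evs, support) for u, evs in user_events.items()}
    users = sorted(user_events)
    return {
        (u, v): relative_entropy(dist[u], dist[v]) + relative_entropy(dist[v], dist[u])
        for i, u in enumerate(users)
        for v in users[i + 1:]
    }


# Hypothetical event-type sequences for three users in one group.
user_events = {
    "a": ["like", "like", "login"],
    "b": ["like", "like", "login"],
    "c": ["follow", "follow", "follow"],
}
d = pairwise_divergence(user_events)
```

To make displayed distance reflect similarity, these pairwise values could be fed to a layout method such as multidimensional scaling or a force-directed layout; which layout the described system uses is not stated.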

7. The group data visualization method according to claim 1, further comprising the step of enlarging the first shape when it is selected, comprising:

displaying the selected first shape enlarged at one side of the first time axis; or

displaying the selected first shape enlarged within the first time axis.

8. The group data visualization method according to claim 1, wherein the event types include at least one of the following actions by network users: following, liking, commenting, gifting, logging in, logging out, updating status, registering, and modifying information.

9. The group data visualization method according to claim 1, wherein, in the step of creating the first time axis and the second time axis, the two time axes are created with the same time interval and time granularity.

10. The group data visualization method according to claim 1, wherein, in the step of creating the first time axis and the second time axis, the two time axes are created with different time intervals and time granularities, the time interval of the second time axis being the time granularity of the first time axis.
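The drill-down relationship in claim 10, where the second axis spans exactly one granularity unit of the first, can be sketched as follows. The concrete units (a one-week first axis at daily granularity, an hourly second axis) and the function names are illustrative assumptions. Under claim 9 the second axis would simply reuse the first axis's interval and granularity.

```python
from datetime import datetime, timedelta


def axis_ticks(start: datetime, interval: timedelta, granularity: timedelta) -> list:
    """Start time of every granularity bucket in [start, start + interval).

    Any remainder shorter than one granularity is dropped in this sketch.
    """
    return [start + i * granularity for i in range(interval // granularity)]


def second_axis_for(first_ticks: list, first_granularity: timedelta,
                    selected: int, second_granularity: timedelta) -> list:
    """Second axis whose time interval equals one first-axis granularity,
    starting at the selected first-axis bucket (the arrangement of claim 10)."""
    return axis_ticks(first_ticks[selected], first_granularity, second_granularity)


first = axis_ticks(datetime(2018, 1, 1), timedelta(days=7), timedelta(days=1))
second = second_axis_for(first, timedelta(days=1), 2, timedelta(hours=1))
```

With these values, the first axis has seven daily ticks and the second axis expands the third day (2018-01-03) into 24 hourly ticks.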

11. The group data visualization method according to claim 9 or 10, further comprising, when the first shape is selected, dynamically displaying and/or highlighting, by means of the third shape, the distribution on the second time axis of the event types occurring within the time granularity represented by the first shape.

12. A computer device, characterized by comprising:

one or more processors; and

a presentation engine executed on the one or more processors, the presentation engine being configured to execute the group data visualization method according to any one of claims 1-11.

13. A group data visualization system, comprising:

an acquisition module, configured to acquire a data set of a group over a network, the data features in the data set including at least an event type and time information associated with the event type;

a processing module, configured to create a first time axis and a second time axis and to encode the data features; and

a display module, configured to display, via a display device and in a single interface, the first and second time axes and the first, second, third and fourth shapes, wherein the first shape serves as a node of the first time axis and represents the types and numbers of events of the group occurring in each time granularity of the first time axis; the second shape represents the total number of each event type occurring in the time interval of the second time axis; the third shape represents the distribution, on the second time axis, of the event types represented in the second shape; and the fourth shape represents the types and numbers of events of the group occurring in each time granularity of the second time axis.
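The four shapes are all drawn from counts of events per type and per time bucket. The sketch below shows one way those counts might be aggregated, assuming events arrive as `(timestamp, event_type)` pairs; the `Axis` class, the `encode_shapes` function and the sample data are hypothetical, and how the counts are rendered is left to the display layer.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Axis:
    """A time axis covering [start, start + interval) split into granularity-sized buckets."""
    start: datetime
    interval: timedelta
    granularity: timedelta

    def bucket(self, ts: datetime):
        """Index of the bucket containing ts, or None if ts lies outside the axis."""
        if self.start <= ts < self.start + self.interval:
            return (ts - self.start) // self.granularity
        return None


def encode_shapes(events, first: Axis, second: Axis):
    """Aggregate (timestamp, event_type) pairs into the counts behind the four shapes."""
    first_shape = defaultdict(Counter)   # first-axis bucket -> counts per event type
    second_shape = Counter()             # event type -> total inside the second axis
    third_shape = defaultdict(Counter)   # event type -> counts per second-axis bucket
    fourth_shape = defaultdict(Counter)  # second-axis bucket -> counts per event type
    for ts, etype in events:
        b1 = first.bucket(ts)
        if b1 is not None:
            first_shape[b1][etype] += 1
        b2 = second.bucket(ts)
        if b2 is not None:
            second_shape[etype] += 1
            third_shape[etype][b2] += 1
            fourth_shape[b2][etype] += 1
    return first_shape, second_shape, third_shape, fourth_shape


day0 = datetime(2018, 1, 1)
week = Axis(day0, timedelta(days=7), timedelta(days=1))  # hypothetical first axis
day = Axis(day0, timedelta(days=1), timedelta(hours=1))  # hypothetical second axis
events = [
    (datetime(2018, 1, 1, 9, 0), "login"),
    (datetime(2018, 1, 1, 9, 30), "like"),
    (datetime(2018, 1, 2, 10, 0), "login"),
]
s1, s2, s3, s4 = encode_shapes(events, week, day)
```

The third and fourth shapes hold the same per-type, per-bucket counts indexed in opposite orders. They differ in what they emphasize: how one event type spreads across the second axis versus the mix of event types inside one time bucket.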

14. The group data visualization system according to claim 13, wherein the group is determined by the processing module analyzing the similarity of at least one set of data features in the operation logs of a plurality of network users acquired by the acquisition module.

15. The group data visualization system according to claim 13, wherein the display module is further configured to display at least one group interface, in which the size of each group is represented by the size of a displayed geometric figure.

16. The group data visualization system according to claim 13, wherein the display module is further configured to display an interface of the data sets of the groups, the data features of each group's data set including at least two of user information, IP address, event type, event source, event responder, and event occurrence time, and wherein, in the interface, the data sets are grouped and then sorted for display.

17. The group data visualization system according to claim 13, wherein the display module is further configured to display an interface of the feature distribution of a group's data set, comprising a histogram of the feature distribution and a comparison chart showing the corresponding histogram for the whole cluster.

18. The group data visualization system according to claim 13, wherein the display module is further configured to display an interface in which network users are represented by shapes, icons and/or labels, the plurality of groups are distinguished by different colors, and the degree of similarity between two network users in each group is represented by the displayed distance between them.

19. The group data visualization system according to claim 13, further comprising a detection module, wherein, when the detection module detects that the user has selected the first shape, the first shape displayed by the display module is displayed enlarged at one side of the first time axis or enlarged within the first time axis.

20. The group data visualization system according to claim 13, wherein the first time axis and the second time axis created by the processing module have the same time interval and time granularity.

21. The group data visualization system according to claim 13, wherein the processing module creates the first time axis and the second time axis with different time intervals and time granularities, the time interval of the second time axis being the time granularity of the first time axis.

22. The group data visualization system according to claim 20 or 21, further comprising a detection module, wherein, when the detection module detects that the user has selected the first shape, the third shape dynamically displays and/or highlights the distribution, on the second time axis, of the event types occurring within the time granularity represented by the first shape.

23. The group data visualization system according to claim 13, characterized in that the event types include at least one of the following actions by network users: following, liking, commenting, gifting, logging in, logging out, updating status, registering, and modifying information.

24. A client, connected to a server through a network, characterized in that the client executes the steps of the group data visualization method according to any one of claims 1-11 upon sending a request to log in to the server.

25. A server, connected to a client through a network, characterized in that, in response to an operation requested by the client, the server sends to the client the procedure of the group data visualization method according to any one of claims 1-11, and the execution result is displayed through the client.

26. A browser, connected to a server through a network, characterized in that the browser executes the steps of the group data visualization method according to any one of claims 1-11 upon sending a request to log in to the server.

27. A computer-readable storage medium storing a data visualization computer program, characterized in that, when the data visualization computer program is executed, the steps of the group data visualization method according to any one of claims 1-11 are implemented.