CN118797629A - User portrait construction method and device

Info

Publication number: CN118797629A
Application number: CN202410443072.7A
Authority: CN
Inventors: 武星宇; 苏昭玉; 刘佳; 谢懿; 胡俊; 杜雪涛; 陈敏时; 徐世权; 许勇; 张晨; 杜刚; 王郁含; 王倩; 于少中; 王�华; 郝明诗; 涂文峰
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Design Institute Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Design Institute Co Ltd
Priority date: 2024-04-12
Filing date: 2024-04-12
Publication date: 2024-10-18

Abstract

The application relates to the technical field of computers and provides a user portrait construction method and device. The method comprises: obtaining a target operation activity of a user according to the operation time, operation frequency and sensitivity level with which the user executes various types of operation instructions on each system; calculating the target weight of the instruction category labels of the operation instructions executed by the user on each system; and constructing the portrait of the user according to the target operation activity and the target weight. Because the user portrait is constructed from two dimensions, the target operation activity of the user and the target weight of the various types of operation instructions, it reflects user operation behavior more comprehensively, which improves the accuracy of user portrait construction. Because the sensitivity level of the operation instructions is also taken into account during construction, the method and device are suited to user operation behavior analysis scenarios involving data security, which further improves the accuracy of user portrait construction.

Description

User portrait construction method and device

Technical Field

The present application relates to the field of computer technology, and in particular to a method and device for constructing a user portrait.

Background Art

As the volume of business data in business systems grows, users operate those systems more and more frequently. To maintain data security in the business systems, the operations of each user need to undergo security assessment. Constructing a portrait of a user's operations makes it possible to analyze the user's operation behavior and perform the security assessment quickly and effectively.

In enterprise marketing and product promotion, the RFM model is generally used: users are evaluated and analyzed by the time of their most recent purchase (Recency), purchase frequency (Frequency) and purchase amount (Monetary), a user portrait is constructed, and users are classified and targeted for marketing on that basis. However, this method considers only the amount, frequency and time of user transactions and ignores the sensitivity level of different user operation behaviors. It can only construct user portraits from simple indicator statistics, so in user operation behavior analysis scenarios involving data security the portraits it constructs have low accuracy.

Summary of the Invention

The embodiments of the present application provide a user portrait construction method and device to solve the technical problem that traditional user portrait construction methods consider only the amount, frequency and time of user transactions and ignore the sensitivity level of different user operation behaviors, so that the user portraits they construct have low accuracy in user operation behavior analysis scenarios involving data security.

In a first aspect, an embodiment of the present application provides a user portrait construction method, comprising:

obtaining a target operation activity of a user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system;

calculating the target weight of the instruction category labels of the various types of operation instructions executed by the user on each system;

constructing a portrait of the user according to the target operation activity and the target weight.

In one embodiment, obtaining the target operation activity of the user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system comprises:

obtaining, from multiple target operation logs, the operation time, operation frequency and sensitivity level of the user's execution of any one type of operation instruction on any one system;

performing dimensionless conversion on the operation time to obtain a time conversion value;

performing dimensionless conversion on the operation frequency to obtain a frequency conversion value;

performing a weighted summation of the maximum time conversion value, the frequency conversion value and the sensitivity level to obtain the operation activity of the user's execution of that type of operation instruction on that system;

calculating the average of the operation activities of the user's execution of the various types of operation instructions on each system to obtain the target operation activity of the user.

In one embodiment, calculating the target weight of the instruction category labels of the various types of operation instructions executed by the user on each system comprises:

calculating, over multiple target operation logs, the ratio of the number of occurrences of any one instruction category label corresponding to the user to the total number of occurrences of all instruction category labels corresponding to the user, to obtain a first ratio;

calculating, over the multiple target operation logs, the ratio of the total number of occurrences of all instruction category labels corresponding to all users to the total number of occurrences of that instruction category label corresponding to all users, to obtain a second ratio;

calculating the product of the first ratio and the second ratio to obtain an initial weight of that instruction category label;

obtaining a target weight of that instruction category label according to its initial weight and the instruction category to which it belongs.

In one embodiment, obtaining the target weight of the instruction category label according to its initial weight and the instruction category to which it belongs comprises:

if the instruction category to which the instruction category label belongs is the operation and maintenance deployment category, obtaining the target weight of the instruction category label according to the initial weight and a target time difference, where the target time difference is the difference between the current time and the time at which the instruction category label appears in the latest target operation log;

if the instruction category to which the instruction category label belongs is the non-sensitive query, export and login category, determining the initial weight as the target weight of the instruction category label;

if the instruction category to which the instruction category label belongs is the sensitive query, export and modification category, obtaining the target weight of the instruction category label according to the initial weight and a target count, where the target count is the number of occurrences of the instruction category label in the period between an initial time and the operation time in the latest target operation log.

In one embodiment, the target operation log is obtained as follows:

if a sensitivity level is present in an operation log, determining the operation log as an operation log to be processed;

if no sensitivity level is present in the operation log, inputting the operation log into a sensitivity level recognition model to obtain the sensitivity level output by the sensitivity level recognition model;

adding the sensitivity level to the operation log to obtain an operation log to be processed;

normalizing the operation log to be processed to obtain the target operation log;

where the sensitivity level recognition model is obtained by training any classification model on historical operation logs and their corresponding sensitivity level labels.

In one embodiment, the sensitivity level recognition model is determined as follows:

inputting the system data, operation instruction data and sensitivity level data of a first historical operation log set into any classification model to obtain the predicted sensitivity level, output by the classification model, for each first historical operation log;

adding a second historical operation log set to the first historical operation log set and returning to the step of inputting the system data, operation instruction data and sensitivity level data of the first historical operation log set into the classification model, until the overall accuracy of the predicted sensitivity levels obtained in two consecutive rounds of training is greater than or equal to an accuracy threshold, and determining the classification model at that point as the sensitivity level recognition model.

In one embodiment, performing dimensionless conversion on the operation time to obtain a time conversion value comprises:

if the operation time falls within the preset working hours of a working day, converting the operation time into a first time conversion value;

if the operation time falls outside the preset working hours of a working day, converting the operation time into a second time conversion value;

if the operation time falls on a weekend, converting the operation time into a third time conversion value;

if the operation time falls on a holiday, converting the operation time into a fourth time conversion value;

where the first time conversion value is less than the second time conversion value, the second time conversion value is less than the third time conversion value, and the third time conversion value is less than the fourth time conversion value.
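By way of non-limiting illustration, the time conversion of this embodiment may be sketched in Python as follows. The conversion values 1 to 4, the 09:00 to 18:00 working hours, the holiday set and the rule that a holiday takes precedence over a weekend are assumptions made for the example; the text only requires that the four conversion values be strictly increasing.

```python
from datetime import date, datetime, time

# Hypothetical conversion values; the embodiment only requires
# first < second < third < fourth.
WORK_HOURS, OFF_HOURS, WEEKEND, HOLIDAY = 1, 2, 3, 4


def time_conversion_value(moment, holidays=frozenset(),
                          work_start=time(9, 0), work_end=time(18, 0)):
    """Map an operation time to a dimensionless value by calendar context."""
    if moment.date() in holidays:       # assumption: holidays take precedence
        return HOLIDAY
    if moment.weekday() >= 5:           # Monday is 0, so 5 and 6 are the weekend
        return WEEKEND
    if work_start <= moment.time() < work_end:
        return WORK_HOURS
    return OFF_HOURS
```

Under these assumptions, an operation at 10:00 on a working day maps to the smallest value and an operation on a listed holiday maps to the largest.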

In one embodiment, performing dimensionless conversion on the operation frequency to obtain a frequency conversion value comprises:

if the operation frequency is less than or equal to a first frequency threshold, converting the operation frequency into a first frequency conversion value;

if the operation frequency is greater than the first frequency threshold and less than a second frequency threshold, converting the operation frequency into a second frequency conversion value;

if the operation frequency is greater than or equal to the second frequency threshold, converting the operation frequency into a third frequency conversion value;

where the first frequency threshold is less than the second frequency threshold, the first frequency conversion value is less than the second frequency conversion value, and the second frequency conversion value is less than the third frequency conversion value.
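By way of non-limiting illustration, the frequency conversion of this embodiment may be sketched in Python as follows. The thresholds 10 and 100 and the conversion values 1 to 3 are assumptions made for the example; the text only fixes their ordering and which boundary belongs to which bucket.

```python
def frequency_conversion_value(count, first_threshold=10, second_threshold=100):
    """Bucket an operation frequency into one of three dimensionless values."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be less than second threshold")
    if count <= first_threshold:        # less than or equal to the first threshold
        return 1
    if count < second_threshold:        # strictly between the two thresholds
        return 2
    return 3                            # greater than or equal to the second threshold
```

The boundary handling follows the embodiment: a frequency equal to the first threshold falls in the first bucket, and a frequency equal to the second threshold falls in the third.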

In a second aspect, an embodiment of the present application provides a user portrait construction device, comprising:

a target operation activity calculation module, configured to obtain a target operation activity of a user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system;

a target weight calculation module, configured to calculate the target weight of the instruction category labels of the various types of operation instructions executed by the user on each system;

a user portrait construction module, configured to construct a portrait of the user according to the target operation activity and the target weight.

In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a computer program, wherein the processor, when executing the program, implements the steps of the user portrait construction method of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the user portrait construction method of the first aspect.

In a fifth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium comprising a computer program which, when executed by a processor, implements the steps of the user portrait construction method of the first aspect.

The user portrait construction method and device provided in the present application obtain the target operation activity of a user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system, calculate the target weight of the instruction category labels of those operation instructions, and construct the user's portrait according to the target operation activity and the target weight. On the one hand, the present application constructs the user portrait from two dimensions, the user's target operation activity and the target weight of the various types of operation instructions, which reflects the user's operation behavior more comprehensively and improves the accuracy of user portrait construction. On the other hand, the user's target operation activity is obtained from the user's operation time, operation frequency and the sensitivity level of the operation instructions, so the sensitivity level is taken into account during construction; this suits user operation behavior analysis scenarios involving data security and further improves the accuracy of user portrait construction.

Brief Description of the Drawings

To describe the technical solutions of the present application or the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is the first schematic flow chart of the user portrait construction method provided in an embodiment of the present application;

FIG. 2 is the second schematic flow chart of the user portrait construction method provided in an embodiment of the present application;

FIG. 3 is the third schematic flow chart of the user portrait construction method provided in an embodiment of the present application;

FIG. 4 is the fourth schematic flow chart of the user portrait construction method provided in an embodiment of the present application;

FIG. 5 is a schematic structural diagram of the user portrait construction device provided in an embodiment of the present application;

FIG. 6 is a schematic structural diagram of the electronic device provided in an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.

FIG. 1 is the first schematic flow chart of the user portrait construction method provided in an embodiment of the present application. Referring to FIG. 1, an embodiment of the present application provides a user portrait construction method, which may include:

101. Obtain the target operation activity of a user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system;

102. Calculate the target weight of the instruction category labels of the various types of operation instructions executed by the user on each system;

103. Construct a portrait of the user according to the target operation activity and the target weight.

In step 102, the operation instructions that a single user executes on all systems fall into different categories. Classifying and labeling these operation instructions yields instruction category labels, and a weight is set for each category to represent the importance of that operation instruction category to the user's operation behavior.

In practical applications, there is no strict ordering between step 101 and step 102: they may be executed simultaneously, or either may be executed first, depending on actual needs; this is not limited here.

The user portrait construction method provided in this embodiment obtains the target operation activity of a user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system, calculates the target weight of the instruction category labels of those operation instructions, and constructs the user's portrait according to the target operation activity and the target weight. On the one hand, this embodiment constructs the user portrait from two dimensions, the user's target operation activity and the target weight of the various types of operation instructions, which reflects the user's operation behavior more comprehensively and improves the accuracy of user portrait construction. On the other hand, the user's target operation activity is obtained from the user's operation time, operation frequency and the sensitivity level of the operation instructions, so the sensitivity level is taken into account during construction; this suits user operation behavior analysis scenarios involving data security and further improves the accuracy of user portrait construction.

FIG. 2 is the second schematic flow chart of the user portrait construction method provided in an embodiment of the present application. Referring to FIG. 2, in one embodiment, obtaining the target operation activity of the user according to the operation time, operation frequency and sensitivity level of the user's execution of various types of operation instructions on each system may include:

201. Obtain, from multiple target operation logs, the operation time, operation frequency and sensitivity level of the user's execution of any one type of operation instruction on any one system;

202. Perform dimensionless conversion on the operation time to obtain a time conversion value;

203. Perform dimensionless conversion on the operation frequency to obtain a frequency conversion value;

204. Perform a weighted summation of the maximum time conversion value, the frequency conversion value and the sensitivity level to obtain the operation activity of the user's execution of that type of operation instruction on that system;

205. Calculate the average of the operation activities of the user's execution of the various types of operation instructions on each system to obtain the target operation activity of the user.

In step 201, each target operation log may include fields such as user, operation time, system, operation instruction, sensitivity level and instruction category label. From multiple target operation logs, the multiple operation times at which a single user executed any one type of operation instruction on any one system can be obtained.

The operation frequency can be determined from the number of target operation logs, among the multiple target operation logs, that record that user executing that type of operation instruction on that system.

The sensitivity level depends on the system and on the operation instruction category. Executing different categories of operation instructions on the same system, or the same category of operation instruction on different systems, may yield different sensitivity levels, but for a given category of operation instruction executed on a given system the sensitivity level is unique and fixed.

In steps 202 and 203, dimensionless conversion turns the operation time and the operation frequency into dimensionless values of the same order of magnitude, which facilitates subsequent calculation and analysis.

In step 204, because the sensitivity level is usually also expressed as a dimensionless number, it can be combined in one calculation with the converted operation time and operation frequency. Specifically, the operation activity A of any user's execution of any one type of operation instruction on any one system can be calculated according to the following formula:

A = α*T + β*F + θ*S;

where T is the maximum time conversion value, F is the frequency conversion value, S is the sensitivity level, and α, β and θ are weight coefficients satisfying θ > α > β and α + β + θ = 1. In this embodiment, α, β and θ can be set to 0.3, 0.2 and 0.5 respectively.
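By way of non-limiting illustration, the weighted summation of step 204 may be sketched in Python as follows. The function name and the input checks are assumptions made for the example; the default coefficients are the values 0.3, 0.2 and 0.5 given in this embodiment.

```python
def operation_activity(t_max, f, s, alpha=0.3, beta=0.2, theta=0.5):
    """Compute A = alpha*T + beta*F + theta*S for one (system, category) pair.

    t_max: maximum time conversion value (T)
    f:     frequency conversion value (F)
    s:     sensitivity level (S)
    """
    # Enforce the constraints stated for the weight coefficients.
    if abs(alpha + beta + theta - 1.0) > 1e-9:
        raise ValueError("alpha + beta + theta must equal 1")
    if not theta > alpha > beta:
        raise ValueError("coefficients must satisfy theta > alpha > beta")
    return alpha * t_max + beta * f + theta * s
```

For example, with T = 4, F = 2 and S = 3, the formula gives A = 0.3*4 + 0.2*2 + 0.5*3 = 3.1, up to floating-point rounding in the code.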

In step 205, the target operation activity of every user can be calculated by the same method, and all users can then be classified according to their target operation activity.

In practical applications, there is no strict ordering between step 202 and step 203: they may be executed simultaneously, or either may be executed first, depending on actual needs; this is not limited here.

This embodiment brings the operation time, operation frequency and sensitivity level to the same order of magnitude so that the three are comparable, which makes the calculated operation activity of a user's execution of any one type of operation instruction on any one system more accurate. In addition, taking the average of the operation activities of the user's execution of the various types of operation instructions on each system as the user's target operation activity balances the influence of all categories of operation instructions executed on all systems and yields a more accurate target operation activity.
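By way of non-limiting illustration, steps 201 to 205 may be sketched together in Python as follows. The log layout (dictionaries with the keys `user`, `system`, `label`, `time_value` and `sensitivity`), the assumption that each log already carries its time conversion value, and the `freq_value` callable that performs the frequency conversion are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean


def target_operation_activity(logs, user, freq_value,
                              alpha=0.3, beta=0.2, theta=0.5):
    """Average the per-(system, label) operation activity of one user."""
    # Step 201: group this user's logs by system and instruction category label.
    groups = defaultdict(list)
    for log in logs:
        if log["user"] == user:
            groups[(log["system"], log["label"])].append(log)
    if not groups:
        raise ValueError(f"no target operation logs for user {user!r}")

    activities = []
    for entries in groups.values():
        t_max = max(e["time_value"] for e in entries)   # step 202, maximum value
        f = freq_value(len(entries))                    # step 203, log count as frequency
        s = entries[0]["sensitivity"]                   # fixed per (system, label)
        activities.append(alpha * t_max + beta * f + theta * s)  # step 204
    return mean(activities)                             # step 205
```

Grouping by system and label before averaging gives every (system, category) pair equal influence on the result, regardless of how many logs each pair contributes.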

FIG. 3 is the third schematic flow chart of the user portrait construction method provided in an embodiment of the present application. Referring to FIG. 3, in one embodiment, calculating the target weight of the instruction category labels of the various types of operation instructions executed by the user on each system may include:

301. Calculate, over multiple target operation logs, the ratio of the number of occurrences of any one instruction category label corresponding to the user to the total number of occurrences of all instruction category labels corresponding to the user, to obtain a first ratio;

302. Calculate, over the multiple target operation logs, the ratio of the total number of occurrences of all instruction category labels corresponding to all users to the total number of occurrences of that instruction category label corresponding to all users, to obtain a second ratio;

303. Calculate the product of the first ratio and the second ratio to obtain the initial weight of the instruction category label;

304. Obtain the target weight of the instruction category label according to its initial weight and the instruction category to which it belongs.

In step 301, each target operation log has a corresponding instruction category label indicating the category of the operation instruction recorded in that log. Specifically, the first ratio TF(p,L) can be calculated according to the following formula:

TF(p,L) = n(p,L) / Σ_{L_i∈L_p} n(p,L_i);

where n(p,L) is the number of occurrences of instruction category label L corresponding to user p, Σ_{L_i∈L_p} n(p,L_i) is the total number of occurrences of all instruction category labels corresponding to user p, L_i is any instruction category label corresponding to user p, and L_p is the set of instruction category labels corresponding to user p. The larger the value of TF(p,L), the closer the relationship between instruction category label L and user p.

In step 302, the second ratio IDF(p,L) can be calculated according to the following formula:

IDF(p,L) = [Σ_{p_j} Σ_{L_i∈L_{p_j}} n(p_j,L_i)] / [Σ_{p_j} n(p_j,L)];

where p_j is any user, Σ_{p_j} Σ_{L_i∈L_{p_j}} n(p_j,L_i) is the total number of occurrences of all instruction category labels corresponding to all users, and Σ_{p_j} n(p_j,L) is the total number of occurrences of instruction category label L corresponding to all users. The larger IDF(p,L) is, the scarcer instruction category label L is.

In step 303, the initial weight W(p,L) of instruction category label L can be calculated according to the following formula:

W(p,L) = TF(p,L)*IDF(p,L);
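By way of non-limiting illustration, steps 301 to 303 may be sketched in Python as follows. The input layout (an iterable of `(user, label)` pairs, one per target operation log) and the function name are assumptions made for the example. The second ratio is computed as the plain ratio described in step 302, without a logarithm.

```python
from collections import Counter


def initial_weights(logs):
    """Return {(user, label): W(p, L)} from (user, label) pairs, one per log."""
    per_user_label = Counter()   # n(p, L)
    per_user = Counter()         # occurrences of all labels for user p
    per_label = Counter()        # occurrences of label L over all users
    total = 0                    # occurrences of all labels over all users
    for user, label in logs:
        per_user_label[(user, label)] += 1
        per_user[user] += 1
        per_label[label] += 1
        total += 1

    weights = {}
    for (user, label), n in per_user_label.items():
        tf = n / per_user[user]            # first ratio, step 301
        idf = total / per_label[label]     # second ratio, step 302
        weights[(user, label)] = tf * idf  # initial weight, step 303
    return weights
```

In a small log set where user `a` has two `login` logs and one `deploy` log and user `b` has one `login` log, the rare `deploy` label receives a larger initial weight for `a` than the common `login` label does, even though `a` used `login` more often.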

In step 304, the target weight of instruction category label L is obtained as follows:

1. If the instruction category to which instruction category label L belongs is the operation and maintenance deployment category, the target weight of instruction category label L is obtained according to the initial weight and the target time difference;

The target time difference is the difference between the current time and the time at which instruction category label L appears in the latest target operation log.

An instruction category of operation and maintenance deployment indicates that the corresponding operation instructions are necessary behavior within a certain period. Such operations decay over time, that is, they cool as the target time difference grows, so the target weight W_f(p,L) can be calculated according to the following formula:

W_f(p,L) = W(p,L)*exp(-μ*Δt);

where μ is the time decay coefficient, with 0 < μ < 1, which can be set according to actual conditions (in this embodiment, μ can be set to 0.5), and Δt is the target time difference.

It should be noted that, as time passes, the latest target operation log is continually updated and Δt changes accordingly, so the target weight can be continually updated to obtain a more accurate user portrait.

2. If the instruction category to which label L belongs is the non-sensitive query, export and login category, the initial weight is taken as the target weight of label L.

Non-sensitive query, export and login operations do not decay over time, so no update is needed and the initial weight can be used directly as the target weight of instruction category label L.

3. If the instruction category to which label L belongs is the sensitive query, export and modification category, the target weight of label L is obtained from the initial weight and the target count.

The target count is the number of occurrences of instruction category label L in the period between the initial time and the operation time in the latest target operation log.

Operations in the sensitive query, export and modification category have a relatively high sensitivity level, so it is necessary to check further whether the user has performed such sensitive operations repeatedly. If so, the target weight of the corresponding instruction category label L increases with the number of times the user performs them, and the target weight W_f(p,L) can be calculated according to the following formula:

W_f(p,L) = W(p,L) * (1+δ*Δd);

where δ is the thermal expansion coefficient, with 0<δ<1, which can be set according to actual conditions (in this embodiment, δ may be set to 0.5), and Δd is the target count.

It should be noted that the latest target operation log is updated continuously over time, so Δd changes as well; the target weight can be updated accordingly to obtain a more accurate user portrait.
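The three cases of step 304 can be sketched as a single Python function. This is only an illustration under stated assumptions: the category constants and the function name `target_weight` are hypothetical, and the unit of Δt is not specified in the text, so `delta_t` is simply treated as a non-negative number in whatever unit the deployment chooses.

```python
import math

# Hypothetical identifiers for the three instruction categories of step 304.
DEPLOY = "ops_deployment"            # O&M deployment: weight cools down over time
NON_SENSITIVE = "non_sensitive"      # non-sensitive query/export/login: unchanged
SENSITIVE = "sensitive"              # sensitive query/export/modify: weight heats up


def target_weight(initial, category, *, delta_t=0.0, delta_d=0, mu=0.5, delta=0.5):
    """Return W_f(p, L) from the initial weight W(p, L).

    delta_t: target time difference (unit is an assumption, not given in the text).
    delta_d: target count of the label since the initial time.
    mu, delta: decay and expansion coefficients, both expected in (0, 1).
    """
    if not (0 < mu < 1 and 0 < delta < 1):
        raise ValueError("mu and delta must lie strictly between 0 and 1")
    if category == DEPLOY:
        return initial * math.exp(-mu * delta_t)   # W * exp(-mu * delta_t)
    if category == SENSITIVE:
        return initial * (1 + delta * delta_d)     # W * (1 + delta * delta_d)
    if category == NON_SENSITIVE:
        return initial                             # no time or count adjustment
    raise ValueError(f"unknown instruction category: {category!r}")


cooled = target_weight(1.0, DEPLOY, delta_t=2)     # exp(-0.5 * 2) = exp(-1)
heated = target_weight(2.0, SENSITIVE, delta_d=4)  # 2.0 * (1 + 0.5 * 4)
```

With μ = 0.5, a deployment label last seen two time units ago keeps about 37% of its initial weight, while four sensitive operations with δ = 0.5 triple the initial weight.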

In practice, there is no strict ordering between step 301 and step 302: they may be executed simultaneously, or either may be executed first, depending on actual needs; no limitation is imposed here.

This embodiment obtains the target weight of each instruction category label from how closely the label is associated with the user and how scarce the label is, combined with the effect of time on the user's various operations and the sensitivity of the instruction category. This can greatly improve the precision of the target weights and, in turn, the accuracy of the subsequent user portrait construction.

FIG. 4 is the fourth schematic flowchart of the user portrait construction method provided in an embodiment of the present application. Referring to FIG. 4, in one embodiment, the target operation log can be obtained as follows:

401. If the operation log contains a sensitivity level, determine the operation log as a pending operation log;

402. If the operation log does not contain a sensitivity level, input the operation log into the sensitivity level identification model to obtain the sensitivity level output by the model;

403. Add the sensitivity level to the operation log to obtain a pending operation log;

404. Normalize the pending operation log to obtain the target operation log.

The sensitivity level identification model is obtained by training any classification model on historical operation logs and their corresponding sensitivity level labels.

Operation logs normally contain a sensitivity level, but the level is sometimes missing. In conventional data preprocessing, such logs may be filtered out as dirty data, and the log features learned from the filtered logs are then incomplete.

In practice, there is no strict ordering between step 401 and step 402: they may be executed simultaneously, or either may be executed first, depending on actual needs; no limitation is imposed here. However, steps 403 and 404 must be executed in sequence after step 402.

When the sensitivity level of an operation log is missing, this embodiment uses the sensitivity level identification model to identify it, completes the log, and then normalizes it so that the data in the operation log are on the same comparable order of magnitude. The user's target operation activity can then be calculated conveniently from complete log data, which enables user classification and better mining of the characteristics of user operation behavior.
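The flow of steps 401 to 404 for a single log can be sketched in Python as follows. The dict layout of a log, the key name `sensitivity`, and the two demo callables are assumptions made for illustration; in particular, `demo_classifier` is a toy stand-in for the trained sensitivity level identification model, and `demo_normalize` is an identity placeholder for the normalization step.

```python
def to_target_log(log, classify, normalize):
    """Turn one raw operation log (a dict) into a target operation log.

    classify:  callable(log) -> sensitivity level, standing in for the
               sensitivity level identification model (step 402).
    normalize: callable(log) -> normalized log (step 404).
    """
    pending = dict(log)  # copy, so the caller's record is left untouched
    if pending.get("sensitivity") is None:
        # Steps 402 and 403: predict the missing level and add it to the log.
        pending["sensitivity"] = classify(pending)
    # Step 401 needs no work: a log that already has a level is pending as-is.
    return normalize(pending)  # step 404


def demo_classifier(log):
    """Toy rule used only for this example, not a trained model."""
    return 3 if "export" in log["instruction"] else 1


def demo_normalize(log):
    """Identity placeholder; a real pipeline would rescale the fields here."""
    return log


raw = {"system": "crm", "instruction": "export users", "sensitivity": None}
completed = to_target_log(raw, demo_classifier, demo_normalize)
```

Logs with a missing level are completed rather than discarded, which is the point the embodiment makes against filtering them out as dirty data.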

In one embodiment, the sensitivity level identification model is determined as follows:

The system data, operation instruction data and sensitivity level data in the first historical operation log set are input into any classification model to obtain the predicted sensitivity level, output by that model, for each first historical operation log. A second historical operation log set is then added to the first historical operation log set, and the process returns to the step of inputting the system data, operation instruction data and sensitivity level data of the first historical operation log set into the classification model. This repeats until the overall accuracy of the predicted sensitivity levels is greater than or equal to the accuracy threshold in two consecutive training rounds, at which point the classification model is determined as the sensitivity level identification model.

Assume that the first historical operation log set is C₁ = {log 1, log 2, …, log m}, the corresponding set of actual sensitivity levels is S₁ = {S1, S2, …, Sm}, and the corresponding set of predicted sensitivity levels is X₁ = {X1, X2, …, Xm}. The similarity comparison between S₁ and X₁ then gives:

According to the above formula, the overall accuracy σ of the predicted sensitivity levels obtained from the first training round on the first historical operation log set is:

After the second historical operation log set is added to the first historical operation log set, the classification model is trained a second time and the overall accuracy of the predicted sensitivity levels from the second round is calculated. If the accuracy in both rounds is greater than or equal to the accuracy threshold, the classification model identifies sensitivity levels well and training can stop. If the overall accuracy in any round is below the accuracy threshold, a new historical operation log set is added to the training set before the next round, and training continues until the overall accuracy is greater than or equal to the accuracy threshold in two consecutive rounds, which yields the sensitivity level identification model.

It should be noted that the accuracy threshold can be set according to actual conditions and is not limited here; in this embodiment, it may be set to 90%.
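The stopping rule described above (grow the training set until two consecutive rounds reach the accuracy threshold) can be sketched in Python. Several things here are assumptions: the accuracy formula is not reproduced in this text, so the sketch takes σ to be the fraction of logs whose predicted level equals the actual level; the `fit`/`predict` interface and the `LookupModel` toy classifier are hypothetical stand-ins for "any classification model".

```python
from collections import Counter, defaultdict


def accuracy(actual, predicted):
    """Assumed form of sigma: share of logs whose predicted level matches."""
    matches = sum(1 for s, x in zip(actual, predicted) if s == x)
    return matches / len(actual)


class LookupModel:
    """Toy classifier: predicts the most common level seen per feature tuple."""

    def fit(self, features, levels):
        votes = defaultdict(Counter)
        for f, s in zip(features, levels):
            votes[f][s] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in votes.items()}

    def predict(self, features):
        return [self.table[f] for f in features]


def train_until_stable(model, first_set, extra_sets, threshold=0.9):
    """Train round by round; stop after two consecutive rounds >= threshold.

    first_set and each element of extra_sets are lists of (features, level).
    Returns the per-round accuracies; raises if the extra sets run out first.
    """
    training = list(first_set)
    extra = iter(extra_sets)
    history = []
    while True:
        features = [f for f, _ in training]
        levels = [s for _, s in training]
        model.fit(features, levels)
        history.append(accuracy(levels, model.predict(features)))
        if len(history) >= 2 and min(history[-2:]) >= threshold:
            return history
        batch = next(extra, None)  # add a new historical log set, then retrain
        if batch is None:
            raise RuntimeError("ran out of historical logs before converging")
        training.extend(batch)


first = [(("crm", "export"), 3), (("crm", "login"), 1)]
second = [(("oa", "query"), 2)]
history = train_until_stable(LookupModel(), first, [second])
```

As in the description, accuracy is measured on the logs used for training; a production setup would more likely evaluate on held-out logs, which the text does not discuss.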

This embodiment trains the classification model on the system data and operation instruction data in the historical operation log set, which are strongly correlated with the sensitivity level, together with the corresponding sensitivity level data, and takes the model as the sensitivity level identification model once the overall accuracy of the predicted sensitivity levels is high in two consecutive training rounds. While keeping training efficient, this gives the sensitivity level identification model strong identification capability, so that it can accurately identify the sensitivity level of operation logs in which the level is missing.

In one embodiment, performing dimensionless conversion on the operation time to obtain the time conversion value may include:

if the operation time falls within the preset working period of a working day, converting the operation time into a first time conversion value;

if the operation time falls outside the preset working period of a working day, converting the operation time into a second time conversion value;

if the operation time falls on a weekend, converting the operation time into a third time conversion value;

if the operation time falls on a holiday, converting the operation time into a fourth time conversion value;

wherein the first time conversion value is smaller than the second, the second is smaller than the third, and the third is smaller than the fourth.

The first, second, third and fourth time conversion values can be set according to actual conditions and are not limited here; in this embodiment, they may be set to 1, 2, 3 and 4 respectively.

It should also be noted that if an operation time falls into overlapping periods, the larger conversion value is used. For example, if an operation time falls on both a weekend and a holiday, it is converted into the fourth time conversion value, because the holiday has the higher conversion value.
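The time conversion rules, including the overlap rule, can be sketched in Python. The working period of 09:00 to 18:00, the `holidays` set and the function name are assumptions for illustration; the text only says that the working period is preset. The sketch also treats Saturday and Sunday as the weekend and does not model make-up working days.

```python
from datetime import date, datetime, time


def time_conversion_value(moment, holidays=frozenset(),
                          work_start=time(9, 0), work_end=time(18, 0)):
    """Map an operation time to 1 (work hours), 2 (off hours), 3 (weekend), 4 (holiday)."""
    if moment.weekday() >= 5:                        # Saturday = 5, Sunday = 6
        value = 3
    elif work_start <= moment.time() < work_end:     # inside the preset working period
        value = 1
    else:                                            # working day, outside that period
        value = 2
    if moment.date() in holidays:
        value = max(value, 4)                        # overlap: the larger value wins
    return value


# Hypothetical holiday calendar for the example.
holidays = frozenset({date(2024, 5, 1), date(2024, 5, 4)})

friday_morning = time_conversion_value(datetime(2024, 4, 12, 10, 0), holidays)
friday_night = time_conversion_value(datetime(2024, 4, 12, 22, 0), holidays)
saturday = time_conversion_value(datetime(2024, 4, 13, 10, 0), holidays)
holiday_weekday = time_conversion_value(datetime(2024, 5, 1, 10, 0), holidays)
holiday_weekend = time_conversion_value(datetime(2024, 5, 4, 10, 0), holidays)
```

Applying the holiday check last with `max` is one simple way to implement the rule that the larger conversion value is used when periods overlap.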

Further, performing dimensionless conversion on the operation frequency to obtain the frequency conversion value may include:

if the operation frequency is less than or equal to a first frequency threshold, converting the operation frequency into a first frequency conversion value;

if the operation frequency is greater than the first frequency threshold and less than a second frequency threshold, converting the operation frequency into a second frequency conversion value;

if the operation frequency is greater than or equal to the second frequency threshold, converting the operation frequency into a third frequency conversion value;

wherein the first frequency threshold is smaller than the second frequency threshold, the first frequency conversion value is smaller than the second, and the second is smaller than the third.

The first frequency threshold, the second frequency threshold, and the first, second and third frequency conversion values can be set according to actual conditions and are not limited here; in this embodiment, the first frequency threshold may be set to 10, the second frequency threshold to 50, and the first, second and third frequency conversion values to 1, 2 and 3 respectively.
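The frequency conversion is a three-way threshold split, sketched below in Python with the example values from this embodiment (thresholds 10 and 50, conversion values 1, 2 and 3). The function name is hypothetical.

```python
def frequency_conversion_value(frequency, first_threshold=10, second_threshold=50):
    """Map an operation frequency to the conversion value 1, 2 or 3."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be smaller than second threshold")
    if frequency <= first_threshold:
        return 1      # at or below the first threshold
    if frequency < second_threshold:
        return 2      # strictly between the two thresholds
    return 3          # at or above the second threshold


converted = [frequency_conversion_value(f) for f in (3, 10, 11, 49, 50, 200)]
```

Note the boundary handling: a frequency equal to the first threshold maps to the first value, while a frequency equal to the second threshold maps to the third value.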

Further, the sensitivity levels can be set according to actual conditions and are not limited here; in this embodiment, four levels, 1, 2, 3 and 4, may be used, ranging from non-sensitive to extremely sensitive.

This embodiment converts the operation time according to the period in which it falls and converts the operation frequency according to threshold ranges, so that both are brought to the same order of magnitude as the sensitivity level. The three quantities are then directly comparable, which helps the subsequent calculation and analysis.

The user portrait construction device provided in an embodiment of the present application is described below. The device described below and the user portrait construction method described above may be read with reference to each other.

FIG. 5 is a schematic structural diagram of the user portrait construction device provided in an embodiment of the present application. Referring to FIG. 5, an embodiment of the present application provides a user portrait construction device, which may include:

a target operation activity calculation module 501, configured to obtain the target operation activity of a user according to the operation time, operation frequency and sensitivity level with which the user executes various operation instructions on each system;

a target weight calculation module 502, configured to calculate the target weight of the instruction category labels of the various operation instructions executed by the user on each system;

a user portrait construction module 503, configured to construct the portrait of the user according to the target operation activity and the target weight.

The user portrait construction device provided in this embodiment obtains the target operation activity of a user according to the operation time, operation frequency and sensitivity level with which the user executes various operation instructions on each system, calculates the target weight of the instruction category labels of those operation instructions, and constructs the user's portrait from the target operation activity and the target weight. On the one hand, the portrait is built from two dimensions, the user's target operation activity and the target weights of the various operation instructions, which reflects user operation behavior more comprehensively and improves the accuracy of user portrait construction. On the other hand, because the target operation activity is derived from the operation time, the operation frequency and the sensitivity level of the operation instructions, the sensitivity level is taken into account when the portrait is built. This makes the approach suitable for analyzing user operation behavior in data security scenarios and further improves the accuracy of user portrait construction.

In one embodiment, the target operation activity calculation module 501 is specifically configured to:

In one embodiment, the target weight calculation module 502 is specifically configured to:

In one embodiment, the device further includes a target operation log acquisition module (not shown in the figure), configured to:

In one embodiment, the device further includes a sensitivity level identification model construction module (not shown in the figure), configured to:

FIG. 6 is a schematic structural diagram of an electronic device. As shown in FIG. 6, the electronic device may include a processor 610, a communication interface 620, a memory 630 and a communication bus 640, where the processor 610, the communication interface 620 and the memory 630 communicate with one another through the communication bus 640. The processor 610 may call a computer program in the memory 630 to execute the steps of the user portrait construction method, for example, including:

In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In another aspect, an embodiment of the present application further provides a computer program product. The computer program product includes a computer program that may be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the steps of the user portrait construction method provided in the above embodiments, for example, including:

In another aspect, an embodiment of the present application further provides a processor-readable storage medium that stores a computer program, the computer program being used to cause a processor to execute the steps of the user portrait construction method provided in the above embodiments, for example, including:

The processor-readable storage medium may be any available medium or data storage device that the processor can access, including but not limited to magnetic storage (such as floppy disks, hard disks, magnetic tapes and magneto-optical (MO) disks), optical storage (such as CD, DVD, BD and HVD), and semiconductor storage (such as ROM, EPROM, EEPROM, non-volatile memory (NAND flash) and solid-state drives (SSD)).

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A user portrait construction method, characterized by comprising the following steps:

obtaining target operation activity of a user according to operation time, operation frequency and sensitivity level of the user for executing various operation instructions on each system;

calculating target weight of instruction class labels of various operation instructions executed by the user on each system;

and constructing the portrait of the user according to the target operation activity and the target weight.

2. The user portrait construction method according to claim 1, wherein obtaining the target operation activity of the user according to the operation time, operation frequency and sensitivity level of the user for executing various operation instructions on each system comprises:

obtaining, from a plurality of target operation logs, the operation time, the operation frequency and the sensitivity level of the user for executing any type of operation instruction on any system;

performing dimensionless conversion on the operation time to obtain a time conversion value;

performing dimensionless conversion on the operation frequency to obtain a frequency conversion value;

performing a weighted summation of the maximum values of the time conversion value, the frequency conversion value and the sensitivity level to obtain the operation activity of the user for executing said any type of operation instruction on said any system;

and calculating the average value of the operation activities of the user for executing various operation instructions on each system to obtain the target operation activity of the user.

3. The user portrait construction method according to claim 1, wherein calculating the target weight of the instruction class labels of the various operation instructions executed by the user on each system comprises:

calculating, in a plurality of target operation logs, the ratio of the number of occurrences of any instruction class label corresponding to the user to the sum of the numbers of occurrences of all instruction class labels corresponding to the user, to obtain a first ratio;

calculating, in the plurality of target operation logs, the ratio of the sum of the numbers of occurrences of all instruction class labels corresponding to all users to the sum of the numbers of occurrences of said any instruction class label corresponding to all users, to obtain a second ratio;

calculating the product of the first ratio and the second ratio to obtain the initial weight of said any instruction class label;

and obtaining the target weight of said any instruction class label according to the initial weight of said any instruction class label and the instruction category to which it belongs.

4. The user portrait construction method according to claim 3, wherein obtaining the target weight of said any instruction class label according to the initial weight of said any instruction class label and the instruction category to which it belongs comprises:

if the instruction category to which said any instruction class label belongs is the operation and maintenance deployment category, obtaining the target weight of said any instruction class label according to the initial weight and a target time difference, the target time difference being the time difference between the current time and the occurrence time of said any instruction class label in the latest target operation log;

if the instruction category to which said any instruction class label belongs is the non-sensitive query, export and login category, determining the initial weight as the target weight of said any instruction class label;

if the instruction category to which said any instruction class label belongs is the sensitive query, export and modification category, obtaining the target weight of said any instruction class label according to the initial weight and a target count, the target count being the number of occurrences of said any instruction class label in the period between an initial time and the operation time in the latest target operation log.

5. The user portrait construction method according to claim 2 or 3, characterized in that the target operation log is obtained as follows:

if the operation log contains a sensitivity level, determining the operation log as a pending operation log;

if the operation log does not contain a sensitivity level, inputting the operation log into a sensitivity level identification model to obtain the sensitivity level output by the sensitivity level identification model;

adding the sensitivity level to the operation log to obtain a pending operation log;

normalizing the pending operation log to obtain the target operation log;

wherein the sensitivity level identification model is obtained by training any classification model on historical operation logs and their corresponding sensitivity level labels.

6. The user portrait construction method according to claim 5, wherein the sensitivity level identification model is determined as follows:

inputting system data, operation instruction data and sensitivity level data in a first historical operation log set into any classification model to obtain the predicted sensitivity level, output by said classification model, corresponding to each first historical operation log;

adding a second historical operation log set to the first historical operation log set, and returning to the step of inputting the system data, the operation instruction data and the sensitivity level data in the first historical operation log set into said classification model, until the overall accuracy of the predicted sensitivity levels obtained in two consecutive training rounds is greater than or equal to an accuracy threshold, and determining said classification model at that point as the sensitivity level identification model.

7. The user portrait construction method according to claim 2, wherein performing dimensionless conversion on the operation time to obtain a time conversion value comprises:

if the operation time is within a preset working period of a working day, converting the operation time into a first time conversion value;

if the operation time is outside the preset working period of a working day, converting the operation time into a second time conversion value;

if the operation time is on a weekend, converting the operation time into a third time conversion value;

if the operation time is on a holiday, converting the operation time into a fourth time conversion value;

wherein the first time conversion value is smaller than the second time conversion value, the second time conversion value is smaller than the third time conversion value, and the third time conversion value is smaller than the fourth time conversion value.

8. The user portrait construction method according to claim 2, wherein said dimensionless converting the operation frequency to obtain a frequency conversion value includes:

if the operation frequency is less than or equal to a first frequency threshold, converting the operation frequency into a first frequency conversion value;

if the operation frequency is greater than the first frequency threshold and less than a second frequency threshold, converting the operation frequency into a second frequency conversion value;

if the operation frequency is greater than or equal to the second frequency threshold, converting the operation frequency into a third frequency conversion value;

wherein the first frequency threshold is smaller than the second frequency threshold, the first frequency conversion value is smaller than the second frequency conversion value, and the second frequency conversion value is smaller than the third frequency conversion value.
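The three-band frequency conversion can be sketched in the same way. The thresholds 10 and 100 and the conversion values 1, 2 and 3 are placeholder assumptions; the claim fixes only the boundary conditions and the ordering of the thresholds and values.

```python
def frequency_conversion_value(frequency, first_threshold=10,
                               second_threshold=100, values=(1, 2, 3)):
    """Map an operation frequency to a dimensionless frequency conversion value.

    The default thresholds and values are illustrative assumptions.
    """
    if not first_threshold < second_threshold:
        raise ValueError("first threshold must be smaller than second threshold")
    if not values[0] < values[1] < values[2]:
        raise ValueError("conversion values must be strictly increasing")
    if frequency <= first_threshold:
        return values[0]               # at or below the first threshold
    if frequency < second_threshold:
        return values[1]               # strictly between the two thresholds
    return values[2]                   # at or above the second threshold
```

Note that the boundaries are asymmetric: a frequency equal to the first threshold falls in the lowest band, while a frequency equal to the second threshold falls in the highest band.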

9. A user portrait construction device, comprising:

a target operation activity calculation module, configured to obtain a target operation activity of a user according to the operation time, operation frequency and sensitivity level of the various operation instructions executed by the user on each system;

a target weight calculation module, configured to calculate a target weight of the instruction class labels of the various operation instructions executed by the user on each system; and

a user portrait construction module, configured to construct the portrait of the user according to the target operation activity and the target weight.

10. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the user portrait construction method according to any one of claims 1 to 8.

11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the user portrait construction method according to any one of claims 1 to 8.

12. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the user portrait construction method according to any one of claims 1 to 8.