WO2025118131A1

WO2025118131A1 - Intelligent recommendation method based on user behavior features

Info

Publication number: WO2025118131A1
Application number: PCT/CN2023/136354
Authority: WO
Inventors: 蔡洪斌; 卢光辉; 赵晨; 胡耀东; 艾鑫; 晏小虎; 段雅俊
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2025-06-12
Anticipated expiration: 2026-06-05

Abstract

An intelligent recommendation method based on user behavior features. The method comprises data collection, data organization, construction of a logistic regression (LR)-based click-through rate (CTR) model, construction of a compressed interaction network (CIN)-based CTR model, and model selection and recommendation. Building on the design of CTR recommendation models, the method takes into account user behavior features such as viewing, purchasing and favoriting, so that recommendations reflect the behavior of user groups. Large volumes of user data are cleaned and reshaped so that only valid data is retained, which speeds up training. To improve prediction efficiency and accuracy, an algorithm that explicitly models high-order feature interactions is additionally introduced. The method improves on the workflow of conventional recommendation algorithms and thereby increases recommendation accuracy.

Description

An intelligent recommendation method based on user behavior characteristics

Technical Field

The present invention belongs to the technical field of big data analysis and relates in particular to an intelligent recommendation method based on user behavior characteristics.

Background Art

With the rapid development of the Internet, cloud computing and artificial intelligence in recent years, intelligent recommendation technology has permeated daily life and has great commercial potential. Major Internet companies therefore attach great importance to it and have carried out extensive related research. In this fast-moving industry, existing intelligent recommendation systems can readily analyze massive amounts of user data and offer each user the suggestions that best match their behavioral tendencies, based on personality, behavioral habits and similar attributes.

Intelligent recommendation systems have been improved over several stages, each bringing its own advances. Early recommendation algorithms were usually built on collaborative filtering models. Later, with the rise of machine learning, researchers began applying machine learning models to recommendation systems. Eventually, because recommendation models based on CTR (Click Through Rate) prediction performed well, they gradually replaced the classic machine learning models and became the current mainstream. A CTR recommendation model collects various features of users and of the recommended content, predicts the probability that a user will click each recommended item, and screens the recommendations according to the predicted click-through rate.

Summary of the Invention

To address the shortcomings of the prior art, the present invention provides an intelligent recommendation method based on user behavior characteristics, comprising the following main steps:

Step 1: Collect user data. The user data includes the user's basic information, implicit information and behavior information. This step includes:

Step 1.1, collect basic user information: when a user registers, obtain the user's basic descriptive features, such as gender, age and education level.

Step 1.2, collect implicit user information: the user completes the required assessments, and the user's implicit information, such as ability scores from the various assessments and personality tendencies, is extracted from the assessment results.

Step 1.3, collect user behavior information: when the user enters the recommendation page, classify and label the user's feedback behavior on each piece of recommended content, with behavior types such as ignore, view, favorite and purchase, and record the time of each operation.

Step 2: Reshape the user data. This step includes:

Step 2.1, clean the data: remove invalid entries from the user data by replacing them with the average value or marking them with a special value.

Step 2.2, combination construction: construct different combinations of the data collected in Step 1, according to the requirements of the CTR models used in the subsequent screening.

Step 2.3, data preprocessing: apply the corresponding encoding conversion and embedding operations to the categorical and continuous feature data collected in Step 1, so that they can serve as input to the CTR models.

Step 3: Construct an LR-based CTR model, that is, a CTR model based on the logistic regression algorithm. This step includes:

Step 3.1, input information features: select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input.

Step 3.2, combine information features: combine these features using a first-order linear function.

Step 3.3, perform training calculation: compute the cross-entropy between the result of the linear calculation and the actual click outcome, and train the model on it.

Step 4: Construct a CIN-based CTR model, that is, a CTR model based on the cross network principle. This step includes:

Step 4.1, input information features: select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input.

Step 4.2, construct the first-layer vectors: construct the output vectors of the first layer of the compressed interaction network (CIN). This step includes:

Step 4.2.1, compute the Hadamard products of the embedded input vectors with themselves to obtain multiple two-dimensional matrices.

Step 4.2.2, multiply the two-dimensional matrices by the corresponding coefficient matrix to obtain one feature map vector.

Step 4.2.3, repeat Step 4.2.2 as needed to obtain multiple feature map vectors, which together form the first CIN layer.

Step 4.3, construct multi-layer vectors: apply the calculation of Step 4.2 to the vectors of the previous CIN layer and the input vectors to construct the next CIN layer, and continue layer by layer.

Step 4.4, compress the results: compress the output of each CIN layer by random-value pooling with a given coefficient, yielding the output of the CIN part.

Step 4.5, joint training: construct the DNN part and the linear part of the CTR model and train them jointly with the CIN part.

Step 4.6, calculate the cross-entropy: combine the outputs of the three parts through a sigmoid function and compute the cross-entropy as the output of the entire model.

Step 4.7, regularize the objective function: in the deep neural network part, add Batch Normalization layers to improve generalization, apply Dropout to neurons to prevent overfitting, and use L2 regularization; the Embedding layer and the CIN part use only L2 regularization.

Step 5: Apply the models to screen recommendations. A two-stage recommendation model is used to screen the candidates and obtain the recommendation results. This step includes:

Step 5.1, preliminary screening: according to the category selected by the user, feed the corresponding candidate content into the LR-based CTR model for preliminary screening.

Step 5.2, secondary screening: feed the results of the preliminary screening into the CIN-based CTR model for final screening, sort them by output probability, and take the top 20 results as the final output.

The beneficial effects of the present invention are as follows: it provides an intelligent recommendation method based on user behavior characteristics that makes reasonable use of users' basic information features, implicit information features and behavior features, greatly reduces the cost of training the recommendation model, and improves recommendation accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of the intelligent recommendation method based on user behavior characteristics of the present invention;

FIG. 2 shows how the intelligent recommendation method based on user behavior characteristics of the present invention is used;

FIG. 3 shows the specific structure of the core algorithm model of the intelligent recommendation method of the present invention;

FIG. 4 shows how the model input data of the present invention is collected and converted;

FIG. 5 and FIG. 6 show the calculation of the feature maps in the CIN part of the present invention.

DETAILED DESCRIPTION

Preferred embodiments of the present invention are further described below with reference to the accompanying drawings and examples.

The flowchart in FIG. 1 shows the basic process of implementing the present invention, and the network structure diagram in FIG. 2 shows the structure of the entire model of the present invention:

Step 1: Collect user data. Collect the user's input feature data for the CTR models. This step includes:

Step 1.1. Collect basic user information. When a user registers, obtain the user's basic descriptive features, such as gender, age and education level.

Step 1.2. Collect implicit user information. The user completes the required assessments, and an assessment form is used to extract the user's implicit information, such as ability scores from the various assessments and personality tendencies, from the assessment results.

Step 1.3. Collect user behavior information. When a user enters the recommendation page, classify and label the user's behavior toward each piece of recommended content, with behavior types such as ignore, view, favorite and purchase, and record the time of each operation so that the user's behavior information is updated in real time.
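A behavior log of the kind Step 1.3 describes could be held as timestamped events. The sketch below is a hypothetical schema, not the patent's: the names `Behavior`, `BehaviorEvent` and `latest_behavior`, and the rule that any non-ignore behavior counts as a click, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Behavior(Enum):
    """Behavior types named in Step 1.3."""
    IGNORE = "ignore"
    VIEW = "view"
    FAVORITE = "favorite"
    PURCHASE = "purchase"


@dataclass
class BehaviorEvent:
    """One labeled interaction of a user with a recommended item (hypothetical schema)."""
    user_id: str
    item_id: str
    behavior: Behavior
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def clicked(self) -> int:
        """Assumed click label: any behavior other than IGNORE counts as a click."""
        return 0 if self.behavior is Behavior.IGNORE else 1


def latest_behavior(events):
    """Keep only the most recent event per (user, item) pair, as a real-time update would."""
    latest = {}
    for ev in events:
        key = (ev.user_id, ev.item_id)
        if key not in latest or ev.timestamp > latest[key].timestamp:
            latest[key] = ev
    return latest
```

Keeping only the latest event per pair is one simple way to realize "update in real time"; a production system might instead keep the full history and weight events by recency.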

Step 2: Reshape the user data. Reshape the data obtained in Step 1 so that it meets the requirements of the subsequent steps. This step includes:

Step 2.1. Clean the data. Remove invalid entries from the data collected in Step 1 by replacing them with the average value or marking them with a special value.
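The two cleaning options of Step 2.1 can be sketched as follows. Which columns get mean imputation and which get a special mark is not specified in the text; this sketch assumes numeric columns are mean-imputed and categorical columns are marked, and the marker value `-1` is hypothetical.

```python
import math

MISSING_MARK = -1  # hypothetical special marker for invalid categorical values


def is_invalid(value):
    """Treat None and float NaN as invalid entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_continuous(values):
    """Replace invalid entries in a numeric column with the mean of the valid entries."""
    valid = [v for v in values if not is_invalid(v)]
    mean = sum(valid) / len(valid) if valid else 0.0
    return [mean if is_invalid(v) else v for v in values]


def clean_categorical(values, mark=MISSING_MARK):
    """Replace invalid or empty entries in a categorical column with a special marker."""
    return [mark if is_invalid(v) or v == "" else v for v in values]
```

For example, `clean_continuous([1.0, None, 3.0])` fills the gap with 2.0, the mean of the two valid values.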

Step 2.2. Combination construction. Construct different combinations of the data collected in Step 1, according to the requirements of the CTR models used in the subsequent screening.

Step 2.3. Data preprocessing. Apply the corresponding encoding conversion and embedding operations to the categorical and continuous feature data collected in Step 1, so that they can serve as input to the CTR models. This step includes:

Step 2.3.1. Apply simple normalization to the continuous data and label encoding to the categorical data.

Step 2.3.2. For the LR model, normalize the label-encoded categorical data from Step 2.3.1, and train the model jointly on it and on the normalized continuous data from Step 2.3.1.
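Steps 2.3.1 and 2.3.2 can be sketched as below. The text says only "simple normalization"; min-max scaling to [0, 1] is an assumption, as is the column-oriented input layout of the hypothetical helper `lr_input_rows`.

```python
def min_max_normalize(values):
    """Simple normalization (Step 2.3.1), assumed here to be min-max scaling to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(values):
    """Label encoding (Step 2.3.1): map each distinct category to an integer code."""
    vocab = {}
    codes = []
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
        codes.append(vocab[v])
    return codes, vocab


def lr_input_rows(categorical_cols, continuous_cols):
    """Step 2.3.2 (assumed form): normalize the label codes, then place them next to
    the normalized continuous columns, producing one feature row per sample."""
    columns = [min_max_normalize(label_encode(col)[0]) for col in categorical_cols]
    columns += [min_max_normalize(col) for col in continuous_cols]
    return [list(row) for row in zip(*columns)]
```

With one categorical column `["m", "f", "m"]` and one continuous column `[10, 20, 30]`, `lr_input_rows` yields the rows `[0.0, 0.0]`, `[1.0, 0.5]` and `[0.0, 1.0]`.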

Step 2.3.3. For the CIN model, convert the label-encoded categorical data from Step 2.3.1 into sparse vectors using one-hot encoding, then apply embedding to these vectors and to the normalized continuous data from Step 2.3.1 to obtain a dense vector for each, and train the model on both kinds of dense vectors.
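The one-hot-then-embed path of Step 2.3.3 can be sketched with a per-field embedding table. The class name `FieldEmbedding`, the random initialization, and the handling of continuous fields (scaling one learned vector by the normalized value, a common convention) are assumptions; the patent does not specify them.

```python
import random


def one_hot(code, vocab_size):
    """Sparse one-hot vector for a label-encoded category."""
    vec = [0.0] * vocab_size
    vec[code] = 1.0
    return vec


def vec_times_table(vec, table):
    """Compute vec @ table, where table is a (vocab_size x dim) list of rows."""
    dim = len(table[0])
    return [sum(vec[i] * table[i][d] for i in range(len(vec))) for d in range(dim)]


class FieldEmbedding:
    """Hypothetical embedding table for one field, producing D-dimensional dense vectors."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(vocab_size)]

    def categorical(self, code):
        # one-hot vector times the table: equivalent to selecting row `code`
        return vec_times_table(one_hot(code, len(self.table)), self.table)

    def continuous(self, value):
        # assumed convention: scale a single learned vector by the normalized value
        return [value * w for w in self.table[0]]
```

Multiplying a one-hot vector by the table is mathematically a row lookup; real implementations skip the sparse vector and index the table directly.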

Step 3: Construct an LR-based CTR model, that is, a CTR model based on the logistic regression algorithm. This step includes:

Step 3.1. Input information features. Select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input.

Step 3.2. Combine information features. Combine these features using a first-order linear function.

Step 3.3. Perform training calculation. Compute the cross-entropy between the result of the linear calculation and the actual click outcome, and train the model on it.
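Steps 3.2 and 3.3 together describe ordinary logistic regression: a first-order linear combination of features, squashed to a click probability, trained on cross-entropy. The minimal sketch below assumes plain batch gradient descent; the learning rate and epoch count are illustrative, not values from the patent.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_predict(w, b, x):
    """First-order linear combination of features (Step 3.2), mapped to a click probability."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy for one sample (Step 3.3)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train_lr(rows, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the mean cross-entropy."""
    w, b = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = lr_predict(w, b, x) - y  # d(loss)/d(logit) for sigmoid + cross-entropy
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

The gradient `p - y` is why sigmoid and cross-entropy pair so well: the update needs no derivative of the sigmoid itself.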

Step 4: Construct a CIN-based CTR model, that is, a CTR model based on the cross network principle. This step includes:

Step 4.1. Input information features. Select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input.

Step 4.2. Construct the first-layer vectors. Construct the output vectors of the first layer of the compressed interaction network (CIN). This step includes:

Step 4.2.1. Compute the Hadamard products of the embedded input vectors with themselves to obtain multiple two-dimensional matrices.

Step 4.2.2. Multiply the matrices by the corresponding coefficient matrix to obtain one feature map vector.

Step 4.2.3. Repeat Step 4.2.2 as required to obtain multiple feature map vectors, which together form the first CIN layer. The calculation process is illustrated in FIG. 5 and FIG. 6, and the calculation formula is as follows:
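For reference, the standard CIN recurrence introduced with the xDeepFM model is given below; that the patent's own formula matches it exactly is an assumption. Here $X^0 \in \mathbb{R}^{m \times D}$ stacks the $m$ embedded field vectors of dimension $D$, $H_k$ is the number of feature maps in layer $k$ (with $H_0 = m$), $\circ$ is the Hadamard product, and $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the coefficient matrix of feature map $h$ in layer $k$. The first layer is the case $k = 1$, where $X^{k-1} = X^0$.

```latex
Z^{k}_{i,j,*} = X^{k-1}_{i,*} \circ X^{0}_{j,*} \in \mathbb{R}^{D},
\qquad 1 \le i \le H_{k-1},\; 1 \le j \le m

X^{k}_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij}\, Z^{k}_{i,j,*},
\qquad 1 \le h \le H_k
```

The $H_{k-1} \times m$ slices of $Z^k$ are the intermediate matrices of Step 4.2.1, and each choice of $h$ yields one feature map vector as in Step 4.2.2.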

Step 4.3. Construct multi-layer vectors. Apply the calculation of Step 4.2 to the vectors of the previous CIN layer and the input vectors to construct the next CIN layer, and continue layer by layer. The calculation formula is as follows:
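The layer computation of Steps 4.2 and 4.3 can be sketched in plain Python. This follows the standard CIN recurrence (each feature map is a weighted sum of Hadamard products between a previous-layer vector and an input vector); the function names and the nested-list weight layout are assumptions for illustration.

```python
def hadamard(a, b):
    """Element-wise product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def cin_layer(prev, x0, weights):
    """One CIN layer.

    prev:    H_{k-1} x D list of vectors (x0 itself for the first layer)
    x0:      m x D embedded input vectors
    weights: H_k x H_{k-1} x m coefficient matrices, one per output feature map
    returns: H_k x D list of feature map vectors
    """
    dim = len(x0[0])
    # z[i][j] = prev[i] ∘ x0[j]: the pairwise Hadamard products (Step 4.2.1)
    z = [[hadamard(p, e) for e in x0] for p in prev]
    out = []
    for w in weights:  # one coefficient matrix per feature map (Steps 4.2.2 and 4.2.3)
        fmap = [0.0] * dim
        for i, row in enumerate(z):
            for j, vec in enumerate(row):
                fmap = [f + w[i][j] * v for f, v in zip(fmap, vec)]
        out.append(fmap)
    return out


def cin_forward(x0, layer_weights):
    """Stack CIN layers (Step 4.3): each layer interacts the previous layer with x0."""
    layers, prev = [], x0
    for weights in layer_weights:
        prev = cin_layer(prev, x0, weights)
        layers.append(prev)
    return layers
```

Because every layer multiplies by $X^0$ again, layer $k$ captures feature interactions of order $k + 1$ explicitly, which is the "high-order explicit feature interaction" the abstract refers to.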

Step 4.4. Compress the results. Compress the output of each CIN layer by random-value pooling with a given coefficient, yielding the output of the CIN part.
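The text does not define "random-value pooling with a coefficient". The standard CIN uses sum pooling over the embedding dimension, shown first for contrast. The `random_pool` function is only one hypothetical reading: sample `ceil(coef * D)` embedding positions at random and average them. Both the interpretation and the names are assumptions.

```python
import math
import random


def sum_pool(feature_maps):
    """Standard CIN pooling: sum each feature map over the embedding dimension."""
    return [sum(fmap) for fmap in feature_maps]


def random_pool(feature_maps, coef, rng):
    """Hypothetical random-value pooling: average ceil(coef * D) randomly chosen positions,
    using the same sampled positions for every feature map in the layer."""
    dim = len(feature_maps[0])
    k = max(1, math.ceil(coef * dim))
    idx = rng.sample(range(dim), k)
    return [sum(fmap[d] for d in idx) / k for fmap in feature_maps]


def cin_output(layers, coef, seed=0):
    """Concatenate the pooled outputs of every CIN layer into the CIN part's output."""
    rng = random.Random(seed)
    pooled = []
    for fmaps in layers:
        pooled.extend(random_pool(fmaps, coef, rng))
    return pooled
```

With `coef = 1.0` every position is sampled, so `random_pool` reduces to mean pooling; smaller coefficients trade fidelity for a stochastic, dropout-like effect.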

Step 4.5. Joint training. Construct the DNN part and the linear part of the CTR model and train them jointly with the CIN part.

Step 4.6. Cross-entropy calculation. Combine the outputs of the three parts through a sigmoid function and compute the cross-entropy as the output of the entire model. The calculation formula is as follows:
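Assuming the combination follows the usual xDeepFM-style output, the three parts contribute additive logits that share a single sigmoid, $\hat{y} = \sigma(w_{lin}^\top a + w_{dnn}^\top x_{dnn} + w_{cin}^\top p^{+} + b)$, and the model is trained on the mean binary cross-entropy. The sketch below shows that form; the argument names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def joint_predict(linear_in, dnn_out, cin_pooled, w_lin, w_dnn, w_cin, bias):
    """Sum the logits of the linear, DNN and CIN parts, then apply one shared sigmoid."""
    logit = dot(w_lin, linear_in) + dot(w_dnn, dnn_out) + dot(w_cin, cin_pooled) + bias
    return sigmoid(logit)


def mean_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over a batch; probabilities are clipped for stability."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

Summing logits before the sigmoid, rather than averaging three probabilities, lets the gradient of one loss flow into all three parts, which is what makes the training joint.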

Step 4.7. Regularize the objective function. In the deep neural network part, add Batch Normalization layers to improve generalization, apply Dropout to neurons to prevent overfitting, and use L2 regularization; the Embedding layer and the CIN part use only L2 regularization.
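The L2 and Dropout parts of Step 4.7 can be sketched as below (Batch Normalization is omitted for brevity). The grouping of parameters by name and the per-group coefficients are assumptions; the patent states only which parts receive L2 and that the DNN also uses Dropout.

```python
import random


def l2_penalty(params, lam):
    """lam * sum of squared weights for one parameter group."""
    return lam * sum(w * w for w in params)


def regularized_objective(data_loss, groups):
    """Data loss plus an L2 term for each parameter group.

    groups maps a hypothetical group name (e.g. 'embedding', 'cin', 'dnn')
    to a (flat parameter list, lambda) pair.
    """
    return data_loss + sum(l2_penalty(p, lam) for p, lam in groups.values())


def dropout(activations, rate, rng, training=True):
    """Inverted dropout for DNN neurons: zero each unit with probability `rate`
    and rescale survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Inverted dropout does the rescaling during training, so inference can use the activations unchanged (`training=False`).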

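Step 5: Apply the models to screen recommendations. A two-stage recommendation model is used to screen the candidates and obtain the recommendation results. This step includes:

Step 5.1. Preliminary screening. According to the category selected by the user, feed the corresponding candidate content into the LR-based CTR model for preliminary screening.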

Step 5.2. Secondary screening. Feed the results of the preliminary screening into the CIN-based CTR model for final screening, sort them by output probability, and take the top 20 results as the final output.
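The two-stage screening of Step 5 can be sketched as a cheap ranking pass followed by an expensive one. The candidate tuple layout, the scoring callables and the shortlist size `pre_k` are assumptions; only the final cut of 20 comes from the text.

```python
import heapq


def two_stage_recommend(candidates, category, lr_score, cin_score, pre_k=200, final_k=20):
    """Two-stage screening sketch.

    candidates: iterable of (item_id, item_category, features) tuples
    lr_score / cin_score: callables mapping features to a click probability (assumed)
    pre_k: number of items kept after preliminary screening (hypothetical value)
    """
    # Step 5.1: keep items in the user's chosen category and rank them with the LR model
    in_category = [c for c in candidates if c[1] == category]
    shortlist = heapq.nlargest(pre_k, in_category, key=lambda c: lr_score(c[2]))
    # Step 5.2: rescore the shortlist with the CIN model and keep the top final_k
    final = heapq.nlargest(final_k, shortlist, key=lambda c: cin_score(c[2]))
    return [item_id for item_id, _, _ in final]
```

The design lets the inexpensive LR model prune most candidates so that the costlier CIN model only scores a short list.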

Claims

An intelligent recommendation method based on user behavior characteristics, characterized by comprising the following steps:

Step 1: Collect user data; the user data includes basic information, implicit information, and behavior information of the user;

Step 2: Reshape the user data;

Step 3: Construct an LR-based CTR model, that is, a CTR model based on the logistic regression algorithm;

Step 4: Construct a CIN-based CTR model, that is, a CTR model based on the cross network principle;

Step 5: Apply the models to screen the recommendations; use the two-layer recommendation model to screen the candidates and obtain the recommendation results.

An intelligent recommendation method based on user behavior characteristics, characterized in that in Step 3, an LR-based CTR model is constructed on the basis of the logistic regression algorithm, and Step 3 further includes:

Step 3.1, input information features: select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input;

Step 3.2, combine information features: combine these features using a first-order linear relationship function;

Step 3.3, perform training calculation: compute the cross-entropy between the result of the linear calculation and the actual click outcome, and train the model on it.

An intelligent recommendation method based on user behavior characteristics, characterized in that in Step 4, a CIN-based CTR model is constructed on the basis of the cross network principle, and Step 4 further includes:

Step 4.1, input information features: select the user's basic information features, the user's implicit information features, the click label (whether the item was clicked) from the user's behavior information features, and the product information as model input;

Step 4.2, construct the first-layer vectors: construct the output vectors of the first layer of the compressed interaction network (CIN); this step includes:

Step 4.2.1, calculate the Hadamard product of the embedded input vector with itself to obtain multiple two-dimensional matrices;

Step 4.2.2, multiply multiple two-dimensional matrices by the corresponding coefficient matrix to obtain a feature map vector;

Step 4.2.3, repeat step 4.2.2 as needed to obtain multiple feature map vectors to form the first layer of CIN network;

Step 4.3, construct multi-layer vectors: apply the calculation of Step 4.2 to the vectors of the previous CIN layer and the input vectors to construct the next CIN layer, and continue layer by layer;

Step 4.4, compress the results: compress the output of each CIN layer by random-value pooling with a given coefficient to obtain the output of the CIN part;

Step 4.5, joint training: construct the DNN part and the linear part of the CTR model, and perform joint training with the CIN part;

Step 4.6, calculate the cross-entropy: combine the outputs of the three parts through a sigmoid function and compute the cross-entropy as the output of the entire model;

Step 4.7, regularize the objective function: add Batch Normalization layers to the deep neural network part to improve generalization, apply Dropout to neurons to prevent overfitting during training, and use L2 regularization; the Embedding layer and the CIN part use only L2 regularization.

An intelligent recommendation method based on user behavior characteristics, characterized in that in Step 5, the models are applied to screen the recommendations, a two-layer recommendation model being used to screen the candidates and obtain the recommendation results, and Step 5 further includes:

Step 5.1, preliminary screening: according to the category selected by the user, the corresponding recommended content is put into the LR-based CTR model for preliminary screening;

Step 5.2, secondary screening: feed the results of the preliminary screening into the CIN-based CTR model for final screening, sort them by output probability, and take the top 20 results as the final output.