CN109741114A - User purchase prediction method in a big data financial scenario - Google Patents
- Publication number
- CN109741114A (application CN201910021428.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- training
- data
- features
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention belongs to the field of user purchase prediction in financial scenarios, and specifically relates to a user purchase prediction method in a big data financial scenario. In a big data financial scenario, the method preprocesses the historical consumption behavior data of a financial platform APP, divides the data sets, performs feature engineering and builds algorithm models, in order to predict whether a customer will purchase a coupon on the credit card platform APP within the next ten days. The invention uses a model integration scheme that improves feature correlation to predict accurately whether the customer will purchase a coupon on the credit card platform APP within the next ten days. This helps the credit card center, while it continuously expands services and scenarios, to actively capture user value information and consumption demand through data accumulation and data-driven methods, to exploit the value of the data, and to provide users with more accurate services.
Description
Technical Field
The invention relates to the field of user purchase prediction in financial scenarios, and in particular to a user purchase prediction method in a big data financial scenario.
Background
User purchase prediction has long been an important and active research field. Accurately predicting users' future purchasing behavior is of great significance for enterprises seeking to capture user value information and consumption demand.
On the one hand, the prior art offers no effective solution for predicting whether a user will purchase products in a financial scenario. Existing work mostly covers mobile purchase prediction, mining of high-potential users, user reputation prediction and the like. With the rapid development of big data, however, user purchase prediction in a financial context has become very important: it can promote the continuous expansion of services and scenarios for financial credit cards and move beyond the traditional service model;
on the other hand, conventional purchase prediction methods are not accurate enough and need further improvement.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a user purchase prediction method in a big data financial scenario. The method can also be extended to risk prediction for defaulting users in a big data financial scenario and to the identification of bonus-hunting ("wool party") user groups in a financial scenario.
The user purchase prediction method in a big data financial scenario provided by the invention comprises the following steps:
step 101: acquiring historical behavior data of financial users from a financial platform APP, and preprocessing the data;
step 102: dividing the preprocessed historical behavior data into a plurality of overlapped extended training sets and a plurality of non-overlapped extended training sets;
step 103: performing feature engineering on each extended training set to construct features of different categories;
step 104: balancing each extended training set with a training scheme for imbalanced data, so as to obtain a series of balanced training subsets;
step 105: grafting the balanced training subsets together to form a balanced sample set, outputting test results from a trained model, and grafting the test results judged to be reliable onto the balanced sample set to form the training set used for prediction;
step 106: constructing a model integration scheme that improves feature correlation by building a plurality of models and forming a fusion structure; with the fusion structure, predicting from the user's historical behavior data in the financial consumption scenario whether the user will purchase a coupon on the financial platform APP within a given number of future days, that is, applying a set threshold to the predicted probability and outputting whether the coupon is purchased; if the predicted probability is greater than or equal to the set threshold, the customer is considered likely to purchase a coupon on the financial platform APP within those future days.
Further, the data preprocessing comprises abnormal value processing, missing value processing and repeated value processing. The abnormal value processing comprises filling by linear interpolation or replacement by the mode. The missing value processing is multi-dimensional: the number of missing values is counted per column and divided by the total number of entries in the column to obtain the missing ratio of each column, and the missing ratio is added to the feature system; that is, the missing values keep their original Not a Number (NaN) type, and the missing ratio is constructed to represent the degree of missingness. The repeated value processing simplifies user information that has the same meaning by removing the redundant characters.
Here, Not a Number (NaN) is a value of a numeric data type in computer science that represents an undefined or unrepresentable value.
Further, for the overlapped extended training sets, the label interval is set to N days and the feature interval window slides forward by [...] days each time, so that the periods of the training sets overlap; for the non-overlapped extended training sets, the label interval is set to N days and the feature interval window slides forward by N days each time, so that the periods of the training sets do not overlap.
Further, performing feature engineering on each extended training set to construct features of different categories comprises constructing user information features, user consumption business features, financial APP operation behavior log features and granularity features.
Further, for the user information features, desensitized data are combined by polynomial construction, while non-desensitized data are discretized with one-hot encoding, after which the discretized result is min-max normalized and scaled up by a factor of one hundred to serve as a normalized feature;
the user consumption business features comprise, from the user's historical behavior data, the number of user loans, the order amount, the order count, the user's loan credit level ranking, the user's loan amount and the user's loan rate;
the financial APP operation behavior log features include user discrete features and user time features;
the granularity features comprise features extracted at different day granularities and features extracted at different hour granularities.
Further, balancing each extended training set with a training scheme for imbalanced data to obtain a series of balanced training subsets comprises: determining a reasonable ratio between the large-class training subset and the small-class training subset according to the requirement of cost-sensitive learning; and combining disjoint large-class training subsets with the small-class training subset to form a series of balanced training subsets.
Further, step 105 specifically comprises running a CatBoost model on the test set; taking the highest-accuracy data in the test results as real and reliable samples; and grafting them onto the balanced sample set, the result of the grafting being the training set used for prediction.
Further, step 106 comprises constructing a plurality of models, namely two gradient boosting models, two random forest models and one long short-term memory (LSTM) neural network model; constructing a four-layer fusion structure; and obtaining, by means of a fusion formula applied to the fusion structure, the final result of whether the user purchases a coupon on the financial platform APP.
Further, the output of each layer in the fusion structure serves either as a result to be fused or as a training feature for the next layer;
wherein,
in the first layer, the multi-dimensional features are used to train the first random forest model, and its output is taken as a new feature column;
in the second layer, the two gradient boosting models are trained on the multi-dimensional features and the new feature column, and the output of the second gradient boosting model is kept as the first result to be fused in the fourth layer;
in the third layer, the second random forest model is trained on the output of the first gradient boosting model together with the original multi-dimensional features, giving the second result to be fused, and the long short-term memory neural network model is trained on the original multi-dimensional features, giving the third result to be fused; the final result is then obtained from the results to be fused according to the fusion formula.
Preferably, the fusion formula is expressed as:
answer=0.25×RF_2+0.4×LSTM+0.35×CatBoost_2
where answer denotes the final result after fusion; RF_2 denotes the output of the second random forest (RF) model; LSTM denotes the output of the LSTM layer; and CatBoost_2 denotes the output of the second CatBoost layer.
The invention provides a user purchase prediction method in a big data financial scenario. Against the background of big data finance, credit card centers are making extensive attempts and innovations in financial technology, such as big data risk control and big data consumption, and are building integrated big data platforms that span data collection, data cleaning, data mining and commercial application. While continuously expanding services and scenarios, the credit card center actively captures user value information and consumption demand through data accumulation and data-driven methods, exploits the value of the data, and provides users with more accurate services.
The user purchase prediction scheme for the financial scenario uses big data analysis and machine learning algorithms. Its technical innovations are as follows:
In the data division part, the traditional sliding-window data division is improved, and two division methods are proposed: the overlapped extended training set and the non-overlapped extended training set. They cover more complete user information, increase the diversity of the feature space of the training samples, and improve the accuracy of model prediction.
In a financial scenario, user purchasing behavior is usually imbalanced: the large class of users shows no consumption behavior, while the small class of users does. To keep training from being skewed toward the negative class, a method for constructing balanced classification subsets is proposed, in which a reasonable sample ratio is set according to cost-sensitive learning. In addition, to address the similarity among user samples in the financial field, a data grafting method is designed to improve sample diversity.
Finally, the invention constructs a model integration method that improves feature correlation. It is a fusion structure in which the output of each layer serves as a result to be fused or as a training feature for the next layer, which strengthens feature correlation, yields a better integrated result, and identifies the purchasing user group accurately.
Based on the above, the beneficial effects of the invention are as follows:
The user purchase prediction method based on a big data financial scenario ensures the effectiveness of the model by adopting model integration that improves feature correlation. The final output is the predicted probability that the user makes a purchase within the next 10 days, and a threshold applied to this probability determines whether the user is predicted to purchase a coupon. This serves the purposes of capturing user value information and consumption demand in the consumer finance scenario, exploiting the value of the data, and providing users with more accurate services. The purchase prediction accuracy is better than that of the prior art, and the user's purchasing tendency is mined by combining data from the financial platform APP, such as the user information of the credit card center and the operation log information of the credit card platform APP, so that purchase prediction is carried out accurately and the credit card center can provide more accurate services to users.
Drawings
FIG. 1 is a flow chart of a user purchase prediction method based on big data financial scenarios provided by an embodiment of the present invention;
FIG. 2 is a diagram of the overlapped extended training set scheme provided by an embodiment of the present invention;
FIG. 3 is a diagram of the non-overlapped extended training set scheme provided by an embodiment of the present invention;
FIG. 4 is a diagram of an example of a balanced classification subset provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating an example of data grafting operations for sample diversity according to an embodiment of the present invention;
FIG. 6 is a flowchart of model integration for improving feature correlation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
As an alternative, the data source of the invention is as follows: a credit card center provides the customers' personal attributes, credit card consumption data, and operation behavior log data accumulated over 60 days (day 1 to day 60) on the credit card platform APP of a certain bank. The task is to predict whether the user will purchase coupons (including meal tickets, movie tickets and the like) on the financial platform APP, that is, the credit card platform APP, within the next 10 days (day 61 to day 70); the prediction is output as the probability that the user purchases a coupon.
The flow chart of the user purchase prediction method based on a big data financial scenario is shown in FIG. 1. The method comprises the following steps:
Step 101, preprocessing the historical behavior data of the user, specifically comprises the following:
Abnormal value processing: the data contain unknown abnormal values, such as the null-value string 'NAN', the abnormal numeric value '-999', garbled characters such as '@ # 5', and an age of '190' that deviates from any realistic age. Null-value strings, abnormal numeric values and garbled characters are handled with a linear interpolation filling scheme, that is, the two nearest numeric values are selected and the gap is filled by a linear fit, as shown in formula (1):
α = (y2 - y1) / (x2 - x1),  y = y1 + α × (x - x1)    (1)
where (x1, y1) and (x2, y2) are the two sample points nearest to the current sample, x is the position of the current sample, α is the slope of the linear interpolation fit, and y is the computed fill value for the abnormal value.
Abnormal values that deviate from a realistic age are replaced with the mode of all ages.
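The two outlier rules above can be illustrated with a minimal Python sketch. The function names and the `max_age` cutoff of 120 are assumptions made for illustration; the source does not state how an implausible age is detected.

```python
from statistics import mode


def linear_fill(x1, y1, x2, y2, x):
    """Fill position x from the straight line through (x1, y1) and (x2, y2)."""
    alpha = (y2 - y1) / (x2 - x1)  # slope of the linear fit
    return y1 + alpha * (x - x1)


def replace_implausible_ages(ages, max_age=120):
    """Replace ages outside [0, max_age] with the mode of the plausible ages.

    max_age is a hypothetical cutoff, not a value given in the source.
    """
    plausible = [a for a in ages if 0 <= a <= max_age]
    most_common = mode(plausible)
    return [a if 0 <= a <= max_age else most_common for a in ages]


# Two nearest valid points are (1, 10.0) and (2, 12.0); fill position 3.
filled = linear_fill(1, 10.0, 2, 12.0, 3)
cleaned = replace_implausible_ages([30, 30, 190, 45])
```

In practice the two nearest valid neighbours would be looked up per record; the sketch takes them as arguments to keep the arithmetic visible.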
Missing value processing: in the field of credit card credit investigation, the completeness of a user's information affects the user's credit rating. Missing values in the credit card user information are therefore processed in multiple dimensions (across the different columns). The number of missing values is counted per column (attribute) and divided by the total number of samples to obtain the missing ratio of each column, and the missing ratio is added to the feature system. It is calculated as shown in formula (2):
MissRate_i = x_i / Count    (2)
where x_i is the number of missing values in attribute column i of the data set, Count is the number of samples in the data set, and MissRate_i is the missing rate of attribute column i;
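As a rough sketch of the per-column missing ratio, assuming records are held as dictionaries and that both `None` and NaN count as missing (the record layout and helper names are illustrative, not from the source):

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing; other values are kept as-is."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_rates(rows, columns):
    """Return {column: missing count / number of samples} for each column."""
    count = len(rows)
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / count
        for col in columns
    }


rows = [
    {"age": 30, "income": None},
    {"age": float("nan"), "income": None},
    {"age": 41, "income": 5.0},
    {"age": 28, "income": 7.0},
]
rates = missing_rates(rows, ["age", "income"])
```

The missing values themselves are left untouched, matching the description that NaN values are kept and only the ratio is added as a feature.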
Repeated value processing: values with the same meaning are reduced to a single main form; for example, "man" and "male" in the gender field are both replaced by "man", which reduces data redundancy;
as an optional addition, this embodiment also processes amounts: an amount may contain a thousands-separator comma, so both "1,230" and "1230" may occur in the data. After the data are read, values containing a comma are treated as strings by default; the amount is therefore processed by removing the comma and forcibly converting the result to a numeric value.
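A small sketch of these two clean-up rules follows. The synonym table is hypothetical beyond the single gender example given in the text, and the function names are illustrative.

```python
# Hypothetical synonym table: the source only gives the gender example.
GENDER_SYNONYMS = {"man": "man", "male": "man"}


def normalize_gender(raw):
    """Map values with the same meaning to one canonical form."""
    key = raw.strip().lower()
    return GENDER_SYNONYMS.get(key, key)


def parse_amount(raw):
    """Convert an amount such as '1,230' or '1230' to a float."""
    if isinstance(raw, (int, float)):
        return float(raw)
    # Remove thousands separators, then force a numeric conversion.
    return float(raw.replace(",", "").strip())


amounts = [parse_amount(v) for v in ["1,230", "1230", 1230]]
```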
Step 102, dividing the historical behavior data into overlapped extended training sets and non-overlapped extended training sets, specifically comprises the following. As shown in FIG. 2, for the overlapped extended training sets the label interval is 10 days and the feature interval window slides forward by 1 day each time, so the periods of the training sets overlap. This covers more complete user information and improves the model's predictions, at the cost of higher similarity between some samples. For the non-overlapped extended training sets, the label interval is 10 days and the feature interval window slides forward by 10 days each time, so the periods of the training sets do not overlap. This increases the diversity of the feature space of the training samples and improves the generalization ability of the model. Six overlapped extended training sets and two non-overlapped extended training sets are constructed, for a total of 8 training sets.
In this embodiment, the overlapped extended training sets use a label interval of 10 days and a forward slide of 1 day per step, giving 6 training sets with day periods from [1-45,46-55] to [1-50,51-60]; the test set is constructed as [1-60,61-70];
in this embodiment, the non-overlapped extended training sets are shown in FIG. 3: the label interval is 10 days, the feature interval window slides forward by 10 days each time, and the periods of the training sets do not overlap, giving the 2 training sets [1-50,51-60] and [1-40,41-50]; the test set is constructed as above.
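The two division schemes differ only in the step by which the window moves, which a short Python sketch can make concrete. It assumes that every label interval is exactly `label_days` long and that the feature interval always starts at day 1; the function name and the tuple layout are illustrative.

```python
def build_windows(last_day, label_days, step, n_sets):
    """Return [((feat_start, feat_end), (label_start, label_end)), ...].

    The label interval ends at last_day for the first set and moves
    back by `step` days for each further set.
    """
    windows = []
    for k in range(n_sets):
        label_end = last_day - k * step
        label_start = label_end - label_days + 1
        windows.append(((1, label_start - 1), (label_start, label_end)))
    return windows


# Overlapped: step of 1 day, 6 sets. Non-overlapped: step of 10 days, 2 sets.
overlapped = build_windows(last_day=60, label_days=10, step=1, n_sets=6)
non_overlapped = build_windows(last_day=60, label_days=10, step=10, n_sets=2)
```

With a step of 1 day the label intervals of consecutive sets share 9 days; with a step of 10 days they share none.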
Step 103, performing feature engineering on the user's historical data, specifically comprises the following:
the feature engineering mainly covers the following four aspects:
1) User information features: numeric attribute features are placed directly into the feature system; these numeric attributes are desensitized data, and simple combined features are constructed from them with addition, subtraction, multiplication and division. For discrete, non-desensitized attribute features, namely user reputation grade, gender, age, marital status and delivery address area, one-hot discretization is applied, and the result is min-max normalized and scaled up by a factor of one hundred to serve as features, so as to increase the differences among features.
2) User consumption business features: these mainly strengthen the representation of the user's business activity and include, from the user's historical consumption data, the number of user loans (extracted over the last 60, 30, 20 and 10 days), the order amount (summarized by mean, variance, kurtosis, skewness and mode), the order count (extracted by the granularities morning, noon, evening, weekday, weekend and week), the user's loan credit level ranking, the user's loan amount and the user's loan rate.
3) Financial APP operation behavior log features. User discrete features: the EVT_LBL attribute column of the APP click module represents the three levels of the clicked module; the three levels are split apart and discretized according to their occurrence counts. User time features: the number of days from the prediction day to the last behavior on the module with which the user interacted most, the number of days from the prediction day to the first behavior on that module, the user's maximum number of consecutive active days, the number of days on which the user was active, the longest/shortest interval between the user's behaviors, and the number of days from the prediction day to the user's first/last behavior.
4) Granularity features: these mainly process the data in the EVT_LBL column of the financial platform APP click module. Features are extracted at different day granularities (the last 60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2 and 1 days), counting how many times each feature occurred in total and how many different LBL/LBL_0/LBL_1/LBL_2/LBL_3 values were interacted with; the same two statistics are also computed at different hour granularities (the last 24, 21, 18, 12, 6 and 1 hours).
Specifically, 178 dimensions of credit card personal information features are constructed. For the 30-dimensional numeric attribute features, combined features are constructed with addition, subtraction, multiplication and division, as shown in formula (3):
F_new_i = F_i ∘ F_j,  ∘ ∈ {+, -, ×, ÷}    (3)
where F_i and F_j are different attribute columns of the data set and F_new_i is the feature obtained by combining them with addition, subtraction, multiplication or division.
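The pairwise combination can be sketched as below. The feature naming scheme and the zero-division guard are assumptions; the source does not say which column pairs are combined or how a zero denominator is handled.

```python
from itertools import combinations


def pairwise_features(row, columns):
    """Combine each pair of numeric columns with +, -, * and /."""
    out = {}
    for a, b in combinations(columns, 2):
        fa, fb = row[a], row[b]
        out[f"{a}_plus_{b}"] = fa + fb
        out[f"{a}_minus_{b}"] = fa - fb
        out[f"{a}_times_{b}"] = fa * fb
        # Assumed guard: a zero denominator yields 0.0 instead of an error.
        out[f"{a}_div_{b}"] = fa / fb if fb != 0 else 0.0
    return out


combined = pairwise_features({"f1": 6.0, "f2": 3.0}, ["f1", "f2"])
```

For k numeric columns this produces 4 × k(k-1)/2 new features, so in practice only a selected subset of pairs is likely to be kept.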
For the discrete, non-desensitized attribute features, namely user reputation grade, gender, age, marital status and delivery address area, one-hot discretization is applied, and the result is min-max normalized and scaled up by a factor of one hundred to serve as features, as shown in formula (4):
x_new = 100 × (x - x_min) / (x_max - x_min)    (4)
where x, x_min and x_max are the current sample's feature value, the minimum value and the maximum value, respectively, and x_new is the final feature result;
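A minimal sketch of one-hot encoding followed by min-max normalization scaled by one hundred. The category list is illustrative, and returning zeros for a constant column is an assumption the source does not address.

```python
def one_hot(value, categories):
    """Encode a discrete value as a 0/1 vector over the given categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def scaled_min_max(values, factor=100.0):
    """Min-max normalize a column, then multiply by `factor`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed handling of a constant column: all zeros.
        return [0.0 for _ in values]
    return [factor * (v - lo) / (hi - lo) for v in values]


encoded = one_hot("married", ["single", "married", "divorced"])
scaled = scaled_min_max([20, 30, 40])
```

Applied to a one-hot column, the scaling maps 0 and 1 to 0 and 100, which widens the gap between feature values as the text describes.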
135 dimensions of credit card consumption business features are constructed, including, from the user's historical consumption data, the number of user loans (extracted over the last 60, 30, 20 and 10 days), the order amount (summarized by mean, variance, kurtosis, skewness and mode), the order count (extracted by the granularities morning, noon, evening, weekday, weekend and week), the user's loan credit level ranking, the user's loan amount and the user's loan rate;
221 dimensions of credit card APP operation behavior log features are constructed, including the user discrete features: the EVT_LBL attribute column of the APP click module represents the three levels of the clicked module, which are discretized according to their occurrence counts;
the user time features comprise the number of days from the prediction day to the last behavior on the module with which the user interacted most, the number of days from the prediction day to the first behavior on that module, the user's maximum number of consecutive active days, the number of days on which the user was active, the longest/shortest interval between the user's behaviors, and the number of days from the prediction day to the user's first/last behavior;
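Several of the user time features can be derived from the list of days on which a user was active, as in this sketch. The input layout (integer day indices) and the feature names are assumptions for illustration.

```python
def time_features(active_days, prediction_day):
    """Derive time features from the day indices on which a user was active."""
    days = sorted(set(active_days))
    if not days:
        return None  # no behavior recorded for this user
    longest_run = run = 1
    gaps = []
    for prev, cur in zip(days, days[1:]):
        gaps.append(cur - prev)
        run = run + 1 if cur - prev == 1 else 1
        longest_run = max(longest_run, run)
    return {
        "active_day_count": len(days),
        "max_consecutive_days": longest_run,
        "longest_gap": max(gaps) if gaps else 0,
        "shortest_gap": min(gaps) if gaps else 0,
        "days_since_first": prediction_day - days[0],
        "days_since_last": prediction_day - days[-1],
    }


features = time_features([3, 4, 5, 9, 20, 21], prediction_day=61)
```

The per-module variants in the text would apply the same function to the activity of the single module the user interacted with most.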
221 dimensions are extracted from the behavior of each module at different granularities: at different day granularities (the last 60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2 and 1 days), counting how many times each feature occurred in total and how many different LBL/LBL_0/LBL_1/LBL_2/LBL_3 values were interacted with; and at different hour granularities (the last 24, 21, 18, 12, 6 and 1 hours), counting the same two statistics.
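The day-granularity statistics amount to counting events and distinct labels inside a set of look-back windows. The sketch below assumes each log event is a `(days_ago, label)` pair counted back from the prediction day; that layout and the feature names are illustrative.

```python
DAY_WINDOWS = (60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2, 1)


def granularity_features(events, windows=DAY_WINDOWS):
    """Count events and distinct labels within each look-back window.

    events: list of (days_ago, label) pairs, with days_ago >= 1.
    """
    feats = {}
    for w in windows:
        recent = [label for days_ago, label in events if days_ago <= w]
        feats[f"count_last_{w}d"] = len(recent)
        feats[f"distinct_last_{w}d"] = len(set(recent))
    return feats


events = [(1, "A"), (3, "A"), (6, "B"), (40, "C")]
feats = granularity_features(events)
```

The hour-granularity features follow the same pattern with `(hours_ago, label)` pairs and the windows 24, 21, 18, 12, 6 and 1.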
Step 104 constructs balanced classification subsets as a solution to imbalanced training. Step 102 produced a combination of 8 training sets in total, and step 103 constructed the feature engineering for them. In a financial scenario, however, user purchasing behavior is usually imbalanced: the large class of users shows no consumption behavior, while the small class of users does. Imbalanced data skew the training or prediction of an algorithm heavily toward the negative class. The balanced classification subsets are constructed as follows: a reasonable ratio between the large-class training subset and the small-class training subset is determined according to the requirement of cost-sensitive learning; disjoint large-class training subsets are then combined with the small-class training subset to form a series of balanced training subsets.
The specific steps are as follows. An example of the construction of the balanced classification subsets is shown in FIG. 4. According to the cost-sensitive learning requirement value of 9.7, a reasonable positive-to-negative sample ratio of 1:2.5 is learned (that is, the ratio of the small-class training subset to the large-class training subset). Each time, a disjoint large-class training subset (large-class sample subset) containing 2.5 times as many samples as the small-class training subset (small-class sample set) is drawn and combined with the small-class training subset, forming a series of balanced training subsets, namely balanced sample subset 1 to balanced sample subset 25.
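The subset construction can be sketched as follows, assuming the large class is shuffled once and cut into disjoint chunks, with any incomplete final chunk dropped. The shuffling, the seed and the handling of leftovers are assumptions; the source only fixes the 1:2.5 ratio and the disjointness.

```python
import random


def balanced_subsets(large, small, ratio=2.5, seed=0):
    """Pair disjoint chunks of the large class with the full small class."""
    chunk = int(len(small) * ratio)  # large-class samples per subset
    shuffled = list(large)
    random.Random(seed).shuffle(shuffled)
    subsets = []
    # Only full chunks are used; an incomplete final chunk is dropped.
    for start in range(0, len(shuffled) - chunk + 1, chunk):
        subsets.append(shuffled[start:start + chunk] + list(small))
    return subsets


large = [("neg", i) for i in range(100)]
small = [("pos", i) for i in range(4)]
subsets = balanced_subsets(large, small)
```

Every subset reuses all small-class samples, while each large-class sample appears in at most one subset.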
Step 105 performs the data grafting operation that addresses sample diversity. It comprises grafting of the training samples and grafting of the test results. Grafting of the training samples means joining the balanced training subsets generated in step 104; grafting of the test results means adding the top-100 data by accuracy in the test results. The training set obtained when grafting is complete is the training set used for prediction. The data grafting operation for sample diversity is shown in FIG. 5:
first, the balanced training subsets are grafted together; in this embodiment, a CatBoost model is then run on the test set, and the top-100 data by accuracy in the test results are taken as real and reliable training samples; in the second step, the balanced samples are grafted with these test set results, and the training set obtained when grafting is complete is used as the training set for prediction.
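The grafting of test results resembles pseudo-labelling, and can be sketched as below. Ranking by distance of the predicted probability from 0.5 is an assumed reading of "top-100 by accuracy", and the 0.5 labelling cutoff is likewise an assumption; the probabilities would come from the CatBoost model mentioned in the text.

```python
def graft(balanced_samples, test_features, test_probs, top_k=100):
    """Append the top_k most confident test predictions as labelled samples.

    Confidence is taken as |p - 0.5|, an assumed interpretation of the
    source's "accuracy top100".
    """
    ranked = sorted(
        zip(test_features, test_probs),
        key=lambda pair: abs(pair[1] - 0.5),
        reverse=True,
    )
    pseudo = [(x, 1 if p >= 0.5 else 0) for x, p in ranked[:top_k]]
    return list(balanced_samples) + pseudo


balanced = [("a", 1), ("b", 0)]
grafted = graft(balanced, ["t1", "t2", "t3"], [0.97, 0.55, 0.02], top_k=2)
```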
The step 106 of constructing a model integration scheme for improving the feature correlation, and predicting whether the customer will purchase the coupon on the credit card platform APP within ten days in the future in a financial consumption scene according to the historical behavior data of the customer specifically comprises the following steps: the model integration flow chart for improving the characteristic correlation is shown in fig. 6, the scheme constructs 5 models in total, including a Long-Short-Term neural network (LSTM), a Short-Term neural network (calco) for calco _1 and calco _2, a Random Forest (RF _1 and RF _ 2), and a local-Term neural network (LSTM), and the models include a gradient lifting algorithm for a characteristic of a structural class type, a Random Forest (RF) for LSTM, and the LSTM is a neural network model, and the last two types can be regarded as tree models, so that the model heterogeneity is satisfied. And constructing a model integration scheme for improving feature correlation, constructing 5 models, namely Catboost _1 and Catboost _2, RF _1 and RF _2 and LSTM, constructing a four-layer fusion structure, and outputting each layer as a fusion result or training feature of the next layer. And finally, outputting a prediction probability value representing the probability value of the user purchasing in the future 10 days, setting the probability to be higher than 0.95 when the user purchases the user group with high probability, setting the probability to be the total possible users when the probability is higher than 0.8, and setting the user as not purchasing the user group when the probability is lower than 0.6.
In the four-layer fusion structure, each layer outputs either a result to be fused or training features for the next layer. Specifically, in the first layer, RF_1 is trained on the 785-dimensional features and its output is appended as one new feature column, giving 785+1 features. In the second layer, CatBoost_1 is trained on the 785+1 features and its output is appended as a further feature column, giving 785+2 features, while the output of CatBoost_2 is kept as a result to be fused in the fourth layer. In the third layer, the RF_2 model is trained on the 785+2 features together with the original 785-dimensional features and its output is a result to be fused, and the LSTM neural network is trained on the original 785-dimensional features and its output is also a result to be fused. Finally, in the fourth layer, the results are fused according to formula (4) to obtain the final result:
answer=0.25×RF_2+0.4×LSTM+0.35×CatBoost_2 (4)
The model integration scheme for improving feature correlation borrows the idea of stacking: the outputs of earlier models are passed to later models as features, which promotes correlation learning between models and yields a better prediction result.
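The data flow of the four layers can be sketched as below. This is a hedged illustration rather than the patented implementation: `fit_stand_in` is a hypothetical toy learner used in place of the real random forest, CatBoost, and LSTM models so that the example runs with the standard library alone; `append_column` and `four_layer_fusion` are likewise made-up names. The sketch also assumes that CatBoost_2 is trained on the 785+1 feature set, which the text does not state explicitly. Only the weights 0.25, 0.4, and 0.35 come from formula (4).

```python
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Sequence[float]], float]
Fit = Callable[[List[Row], List[int]], Predictor]


def fit_stand_in(X: List[Row], y: List[int]) -> Predictor:
    """Hypothetical stand-in learner (NOT a random forest, CatBoost, or LSTM).

    Scores a row by how close its mean is to the positive-class mean
    relative to the negative-class mean. Needs both classes in y.
    """
    def row_mean(row: Sequence[float]) -> float:
        return sum(row) / len(row)

    pos = [row_mean(r) for r, t in zip(X, y) if t == 1]
    neg = [row_mean(r) for r, t in zip(X, y) if t == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)

    def predict(row: Sequence[float]) -> float:
        d_pos = abs(row_mean(row) - mean_pos)
        d_neg = abs(row_mean(row) - mean_neg)
        return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

    return predict


def append_column(X: List[Row], column: List[float]) -> List[Row]:
    """Return a copy of X with one extra feature column."""
    return [list(row) + [value] for row, value in zip(X, column)]


def four_layer_fusion(
    X: List[Row],
    y: List[int],
    fit_rf: Fit = fit_stand_in,
    fit_catboost: Fit = fit_stand_in,
    fit_lstm: Fit = fit_stand_in,
) -> List[float]:
    # Layer 1: RF_1 on the original features; output becomes a new column (d + 1).
    rf_1 = fit_rf(X, y)
    X1 = append_column(X, [rf_1(r) for r in X])
    # Layer 2: CatBoost_1 adds another column (d + 2); CatBoost_2 gives a result to fuse.
    catboost_1 = fit_catboost(X1, y)
    X2 = append_column(X1, [catboost_1(r) for r in X1])
    catboost_2 = fit_catboost(X1, y)  # assumed input: the d + 1 features
    cb2_out = [catboost_2(r) for r in X1]
    # Layer 3: RF_2 on the d + 2 features; LSTM on the original features.
    rf_2 = fit_rf(X2, y)
    rf2_out = [rf_2(r) for r in X2]
    lstm = fit_lstm(X, y)
    lstm_out = [lstm(r) for r in X]
    # Layer 4: weighted fusion according to formula (4).
    return [
        0.25 * a + 0.4 * b + 0.35 * c
        for a, b, c in zip(rf2_out, lstm_out, cb2_out)
    ]
```

For brevity the sketch predicts on the same rows it was fitted on. In stacking, out-of-fold predictions are normally used for the appended columns so that later layers do not learn from leaked labels.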
Finally, a predicted probability is output, representing the probability that the user purchases a coupon on the credit card platform APP within the next 10 days. A user whose probability is greater than 0.95 belongs to the high-probability purchasing group, a user whose probability is greater than 0.8 belongs to the possible purchasing group, and a user whose probability is lower than 0.6 is assigned to the non-purchasing group. That is, when the probability is greater than or equal to the threshold 0.6, the output is that the user will purchase a coupon on the credit card platform APP within the next 10 days; when the probability is less than the threshold 0.6, the output is that the user will not. This helps the credit card center continue to expand its services and scenarios; at the same time, through data accumulation and a data-driven approach, the credit card center is expected to actively capture user value information and consumption needs, realize the value of its data, and provide more accurate services to users.
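The threshold rules above can be written as a small function. This is a minimal sketch: the function name `classify_user` and the group labels are hypothetical, and the label `"unnamed_band"` is an assumption for probabilities from 0.6 up to 0.8, which the text counts as "will purchase" without naming a group.

```python
from typing import Tuple


def classify_user(probability: float) -> Tuple[str, bool]:
    """Map a fused probability to (user group, will-purchase decision)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Binary decision: purchase if the probability reaches the 0.6 threshold.
    will_purchase = probability >= 0.6
    if probability > 0.95:
        group = "high_probability"
    elif probability > 0.8:
        group = "possible"
    elif probability >= 0.6:
        group = "unnamed_band"  # hypothetical label; the source names no group here
    else:
        group = "non_purchasing"
    return group, will_purchase
```

For example, `classify_user(0.97)` falls in the high-probability group, while `classify_user(0.59)` is assigned to the non-purchasing group with a negative decision.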
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The above embodiments further illustrate the objects, technical solutions, and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A user purchase prediction method in a big data financial scenario, characterized by comprising the following steps:
step 101: acquiring historical behavior data of a financial user from a financial platform APP, and preprocessing the data;
step 102: dividing the preprocessed historical behavior data into a plurality of overlapping expanded training sets and a plurality of non-overlapping expanded training sets;
step 103: performing a feature engineering operation on each expanded training set to construct features of different categories;
step 104: balancing each expanded training set using an imbalanced-data training approach to obtain a series of balanced training subsets;
step 105: grafting the balanced training subsets to form a balanced sample set, outputting a test result through a trained model, and grafting the test results determined to be reliable onto the balanced sample set to form the training set to be used for prediction;
step 106: constructing a model integration scheme that improves feature correlation, namely constructing a plurality of models and forming a fusion structure; and, with the fusion structure, predicting the probability that a user purchases a coupon on the financial platform APP in a financial consumption scenario according to the user's historical behavior data.
2. The method of claim 1, wherein the preprocessing of the data comprises outlier processing, missing value processing, and duplicate value processing; the outlier processing comprises filling by linear interpolation or replacement with the mode; the missing value processing comprises multidimensional processing, namely counting the number of missing values in each column and dividing it by the total number of entries in the column to obtain the missing ratio of each column, that is, the missing values are kept as their original not-a-number (NaN) values and the missing ratio is constructed to represent the degree of missingness; the duplicate value processing comprises simplifying user information that has the same meaning by removing redundant characters.
3. The method of claim 1, wherein for the overlapping expanded training sets, the label interval is set to N days and the feature-interval window slides forward by a set number of days each time, such that the periods of the training sets overlap; and for the non-overlapping expanded training sets, the label interval is set to N days and the feature-interval window slides forward by N days each time, such that the periods of the training sets do not overlap.
4. The method of claim 1, wherein performing the feature engineering operation on each expanded training set to construct features of different categories comprises constructing user information features, user consumption business features, financial APP operation behavior log features, and granularity features.
5. The method of claim 4, wherein:
the user information features comprise combining desensitized data by polynomial construction, and discretizing non-desensitized data with the one-hot feature extraction method, then applying min-max normalization to the discretized result and scaling it by a factor of one hundred to serve as normalized features;
the user consumption business features are used to strengthen the representation of the user's business behavior, and comprise statistics of the number of loans taken by the user in the historical behavior data, statistics of order amounts in historical orders, count statistics of order amounts in different time periods, ranking statistics of the user's loan credit level among all users, and statistics of the user's loanable amount and loan rate;
the financial APP operation behavior log features comprise user discrete features and user time features, wherein the user discrete features are discrete values calculated from the number of times the user clicks each level of the APP click modules, and the user time features are statistics of the various intervals, in days, between the user's operations on the APP click modules;
the granularity features comprise features extracted at a granularity of different numbers of days and features extracted at a granularity of different numbers of hours.
6. The method of claim 1, wherein balancing each expanded training set using the imbalanced-data training approach to obtain a series of balanced training subsets comprises: determining a reasonable ratio of the large-class (majority) training subset to the small-class (minority) training subset according to the requirements of cost-sensitive learning; and combining disjoint large-class training subsets with the small-class training subset to form a series of balanced training subsets.
7. The method of claim 1, wherein step 104 specifically comprises: training a CatBoost model and applying it to the test set; regarding the data with higher accuracy in the test result as a true and reliable sample set; and grafting the balanced sample set with these test results to obtain the training set to be used for prediction.
8. The method of claim 1, wherein step 106 comprises: constructing a plurality of models, including two gradient boosting algorithm models, two random forest models, and a long short-term memory neural network model; and constructing a four-layer fusion structure and, according to the fusion structure, using a fusion formula to obtain the final result of whether the user purchases a coupon on the financial platform APP.
9. The method of claim 8, wherein each layer in the fusion structure outputs either a result to be fused or training features for the next layer;
wherein,
in the first layer, the first random forest model is trained on the multi-dimensional features, and its output is taken as a new feature column;
in the second layer, the two gradient boosting algorithm models are each trained on the multi-dimensional features and the new feature column, wherein the output of the second gradient boosting algorithm model is taken as the first result to be fused in the fourth layer;
in the third layer, the second random forest model is trained on the output of the first gradient boosting algorithm model together with the original multi-dimensional features, and its output is the second result to be fused; the long short-term memory neural network model is trained on the original multi-dimensional features, and its output is the third result to be fused; and the final result is obtained from the results to be fused according to the fusion formula.
10. The method of claim 8, wherein the fusion formula is expressed as:
answer=0.25×RF_2+0.4×LSTM+0.35×CatBoost_2
wherein answer represents the final result after fusion; RF_2 represents the output of the second random forest model; LSTM represents the output of the long short-term memory neural network model; and CatBoost_2 represents the output of the second gradient boosting algorithm model.
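The data preparation described in claims 3 and 6 can be illustrated with the following sketch. It is written under several assumptions and is not the patent's implementation: days are represented as integer indices with exclusive end points, the function names `expanded_windows` and `balanced_subsets` are hypothetical, `label_days` plays the role of N, and the integer `ratio` parameter stands in for the majority-to-minority ratio that the claims say is chosen according to cost-sensitive learning.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
# (feature_start, feature_end, label_start, label_end); end indices are exclusive.
Window = Tuple[int, int, int, int]


def expanded_windows(
    total_days: int, feature_days: int, label_days: int, stride: int
) -> List[Window]:
    """Slide a feature interval forward, each followed by a label interval.

    Feature intervals overlap when stride < feature_days and are
    disjoint when stride >= feature_days.
    """
    if min(total_days, feature_days, label_days, stride) <= 0:
        raise ValueError("all arguments must be positive")
    windows: List[Window] = []
    start = 0
    while start + feature_days + label_days <= total_days:
        split = start + feature_days
        windows.append((start, split, split, split + label_days))
        start += stride
    return windows


def balanced_subsets(
    majority: Sequence[T], minority: Sequence[T], ratio: int = 1
) -> List[List[T]]:
    """Pair disjoint majority-class chunks with the full minority class.

    Each chunk holds ratio * len(minority) majority samples; the last
    chunk may be smaller if the majority class does not divide evenly.
    """
    chunk = ratio * len(minority)
    if chunk <= 0:
        raise ValueError("minority must be non-empty and ratio positive")
    return [
        list(majority[i:i + chunk]) + list(minority)
        for i in range(0, len(majority), chunk)
    ]
```

With 40 days of history, a 10-day feature interval, and N = 10, a stride of 10 gives three sets whose feature intervals do not overlap, while a stride of 5 gives five sets whose feature intervals overlap.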
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910021428.7A CN109741114A (en) | 2019-01-10 | 2019-01-10 | A kind of user under big data financial scenario buys prediction technique |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910021428.7A CN109741114A (en) | 2019-01-10 | 2019-01-10 | A kind of user under big data financial scenario buys prediction technique |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109741114A true CN109741114A (en) | 2019-05-10 |
Family
ID=66364256
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910021428.7A Pending CN109741114A (en) | 2019-01-10 | 2019-01-10 | A kind of user under big data financial scenario buys prediction technique |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109741114A (en) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110415039A (en) * | 2019-07-31 | 2019-11-05 | 北京三快在线科技有限公司 | Method and device for business processing |
| CN110956528A (en) * | 2019-10-14 | 2020-04-03 | 广东工业大学 | Recommendation method and system for e-commerce platform |
| CN110956497A (en) * | 2019-11-27 | 2020-04-03 | 桂林电子科技大学 | A method for predicting repeated purchase behavior of e-commerce platform users |
| CN110956209A (en) * | 2019-11-28 | 2020-04-03 | 上海风秩科技有限公司 | Model training and predicting method, device, electronic equipment and storage medium |
| CN111538873A (en) * | 2019-12-23 | 2020-08-14 | 浙江大学 | Telecommunication customer churn probability prediction method and system based on end-to-end model |
| CN111583016A (en) * | 2020-04-09 | 2020-08-25 | 上海淇毓信息科技有限公司 | GBST-based user recommendation method and device and electronic equipment |
| CN111914927A (en) * | 2020-07-30 | 2020-11-10 | 北京智能工场科技有限公司 | Mobile app user gender identification method and system for optimizing data imbalance state |
| CN111913940A (en) * | 2020-06-20 | 2020-11-10 | 武汉海云健康科技股份有限公司 | Temperature member label prediction method and device, electronic equipment and storage medium |
| CN112529624A (en) * | 2020-12-15 | 2021-03-19 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating business prediction model |
| CN112861064A (en) * | 2021-01-20 | 2021-05-28 | 重庆第二师范学院 | Social credit evaluation source data processing method, system, terminal and medium |
| CN113052327A (en) * | 2021-03-30 | 2021-06-29 | 北京骑胜科技有限公司 | Data processing method and device, readable storage medium and electronic equipment |
| CN113128739A (en) * | 2019-12-31 | 2021-07-16 | 马上消费金融股份有限公司 | Prediction method of user touch time, prediction model training method and related device |
| CN114548489A (en) * | 2022-01-11 | 2022-05-27 | 山东锋士信息技术有限公司 | Crop pest and disease damage prediction method and system |
| CN115439079A (en) * | 2022-07-27 | 2022-12-06 | 中银金融科技有限公司 | Item classification method and device |
| CN115860411A (en) * | 2022-12-20 | 2023-03-28 | 广西电网有限责任公司 | Method for predicting user demand based on power user behavior |
| CN116362791A (en) * | 2023-04-04 | 2023-06-30 | 平安银行股份有限公司 | Credit card equity sale price adjustment method, system, equipment and storage medium |
| CN116843377A (en) * | 2023-07-25 | 2023-10-03 | 河北鑫考科技股份有限公司 | Consumption behavior prediction method, device, equipment and medium based on big data |
| CN119338530A (en) * | 2024-10-21 | 2025-01-21 | 广州钛动科技股份有限公司 | Advertisement crowd expansion method, device, equipment and medium based on CatBoost model |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015189768A1 (en) * | 2014-06-10 | 2015-12-17 | Berengueres Jose Oriol Lopez | Method and system for forecasting activities of passengers in an airline loyalty program |
| CN107301562A (en) * | 2017-05-16 | 2017-10-27 | 重庆邮电大学 | A kind of O2O reward vouchers use big data Forecasting Methodology |
| CN107316108A (en) * | 2017-06-19 | 2017-11-03 | 华南理工大学 | A kind of citizens' activities public bus network chooses sliding window multiple features Forecasting Methodology |
| CN107909433A (en) * | 2017-11-14 | 2018-04-13 | 重庆邮电大学 | A kind of Method of Commodity Recommendation based on big data mobile e-business |
| CN107944913A (en) * | 2017-11-21 | 2018-04-20 | 重庆邮电大学 | High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis |
| CN109034658A (en) * | 2018-08-22 | 2018-12-18 | 重庆邮电大学 | A kind of promise breaking consumer's risk prediction technique based on big data finance |
Non-Patent Citations (2)
| Title |
|---|
| KEYPIG_ZZ: "XGBoost+LightGBM+LSTM:一次机器学习比赛中的高分模型方案", 《HTTPS://BLOG.CSDN.NET/KEYPIG_ZZ/ARTICLE/DETAILS/82819558》 * |
| 蒋良孝,李超群: "《贝叶斯网络分类器算法与应用》", 31 December 2015 * |
Cited By (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110415039A (en) * | 2019-07-31 | 2019-11-05 | 北京三快在线科技有限公司 | Method and device for business processing |
| CN110956528A (en) * | 2019-10-14 | 2020-04-03 | 广东工业大学 | Recommendation method and system for e-commerce platform |
| CN110956528B (en) * | 2019-10-14 | 2022-11-04 | 广东工业大学 | Recommendation method and system for e-commerce platform |
| CN110956497A (en) * | 2019-11-27 | 2020-04-03 | 桂林电子科技大学 | A method for predicting repeated purchase behavior of e-commerce platform users |
| CN110956209A (en) * | 2019-11-28 | 2020-04-03 | 上海风秩科技有限公司 | Model training and predicting method, device, electronic equipment and storage medium |
| CN110956209B (en) * | 2019-11-28 | 2024-03-26 | 上海秒针网络科技有限公司 | Model training and predicting method and device, electronic equipment and storage medium |
| CN111538873A (en) * | 2019-12-23 | 2020-08-14 | 浙江大学 | Telecommunication customer churn probability prediction method and system based on end-to-end model |
| CN113128739A (en) * | 2019-12-31 | 2021-07-16 | 马上消费金融股份有限公司 | Prediction method of user touch time, prediction model training method and related device |
| CN111583016A (en) * | 2020-04-09 | 2020-08-25 | 上海淇毓信息科技有限公司 | GBST-based user recommendation method and device and electronic equipment |
| CN111913940A (en) * | 2020-06-20 | 2020-11-10 | 武汉海云健康科技股份有限公司 | Temperature member label prediction method and device, electronic equipment and storage medium |
| CN111913940B (en) * | 2020-06-20 | 2024-04-26 | 武汉海云健康科技股份有限公司 | Temperature membership tag prediction method and device, electronic equipment and storage medium |
| CN111914927A (en) * | 2020-07-30 | 2020-11-10 | 北京智能工场科技有限公司 | Mobile app user gender identification method and system for optimizing data imbalance state |
| CN112529624A (en) * | 2020-12-15 | 2021-03-19 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating business prediction model |
| CN112529624B (en) * | 2020-12-15 | 2024-01-09 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating business prediction model |
| CN112861064A (en) * | 2021-01-20 | 2021-05-28 | 重庆第二师范学院 | Social credit evaluation source data processing method, system, terminal and medium |
| CN112861064B (en) * | 2021-01-20 | 2023-02-03 | 重庆第二师范学院 | A social credit evaluation source data processing method, system, terminal and medium |
| CN113052327A (en) * | 2021-03-30 | 2021-06-29 | 北京骑胜科技有限公司 | Data processing method and device, readable storage medium and electronic equipment |
| CN113052327B (en) * | 2021-03-30 | 2024-04-19 | 北京骑胜科技有限公司 | Data processing method, device, readable storage medium and electronic equipment |
| CN114548489A (en) * | 2022-01-11 | 2022-05-27 | 山东锋士信息技术有限公司 | Crop pest and disease damage prediction method and system |
| CN115439079A (en) * | 2022-07-27 | 2022-12-06 | 中银金融科技有限公司 | Item classification method and device |
| CN115860411A (en) * | 2022-12-20 | 2023-03-28 | 广西电网有限责任公司 | Method for predicting user demand based on power user behavior |
| CN116362791A (en) * | 2023-04-04 | 2023-06-30 | 平安银行股份有限公司 | Credit card equity sale price adjustment method, system, equipment and storage medium |
| CN116843377A (en) * | 2023-07-25 | 2023-10-03 | 河北鑫考科技股份有限公司 | Consumption behavior prediction method, device, equipment and medium based on big data |
| CN119338530A (en) * | 2024-10-21 | 2025-01-21 | 广州钛动科技股份有限公司 | Advertisement crowd expansion method, device, equipment and medium based on CatBoost model |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109741114A (en) | A kind of user under big data financial scenario buys prediction technique | |
| US11995112B2 (en) | System and method for information recommendation | |
| CN109509033B (en) | Big data prediction method for user purchasing behavior in consumption financial scene | |
| Berry et al. | Data mining techniques | |
| CN112541817A (en) | Marketing response processing method and system for potential customers of personal consumption loan | |
| US20120036085A1 (en) | Social media variable analytical system | |
| US20230222536A1 (en) | Campaign management platform | |
| US20180285748A1 (en) | Performance metric prediction for delivery of electronic media content items | |
| US20240370898A1 (en) | Self-learning systems and methods for digital content selection and generation using generative ai | |
| CN110704706B (en) | Training method and classification method of classification model, related equipment and classification system | |
| CN114493686A (en) | A method and device for generating and pushing operation content | |
| CN111429214B (en) | Transaction data-based buyer and seller matching method and device | |
| CN114331495B (en) | Multimedia data processing method, device, equipment and storage medium | |
| CN113935780B (en) | Customer loss risk prediction method based on survival analysis and related equipment thereof | |
| CN116800831A (en) | Service data pushing method, device, storage medium and processor | |
| CN112330373A (en) | User behavior analysis method, device and computer-readable storage medium | |
| US20240346577A1 (en) | Generating dynamic base limit value user interface elements determined from a base limit value model | |
| Leventhal | Predictive Analytics for Marketers: Using Data Mining for Business Advantage | |
| TWI837066B (en) | Information processing devices, methods and program products | |
| JP2022154885A (en) | Learning method, learning model, prediction method, prediction device and computer program | |
| US20210365994A1 (en) | System and Method for Predicting an Anticipated Transaction | |
| US20240070722A1 (en) | System and method for providing people-based audience planning | |
| Krishna et al. | Cultivating customer purchase intent: Leveraging machine learning for precise predictions | |
| CN115098684B (en) | 5G user identification network model establishment method, device and storage medium | |
| CN117196698A (en) | Data processing method, device, terminal equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190510 |