CN109741114A

CN109741114A - User purchase prediction method in a big data financial scenario

Info

Publication number: CN109741114A
Application number: CN201910021428.7A
Authority: CN
Inventors: 童毅; 周波依
Original assignee: Bola Network Co Ltd
Current assignee: Bola Network Co Ltd
Priority date: 2019-01-10
Filing date: 2019-01-10
Publication date: 2019-05-10

Abstract

The invention belongs to the field of user purchase prediction in financial scenarios, and in particular relates to a user purchase prediction method in a big data financial scenario. In a big data financial scenario, the method preprocesses the historical consumer behavior data of a financial platform APP, divides the data set, performs feature engineering and builds algorithm models to predict whether a customer will purchase a coupon on the credit card platform APP within the next ten days. The invention uses a model integration that improves feature correlation to accurately predict whether a customer will purchase a coupon on the credit card platform APP within the next ten days. This helps the credit card business, while continuously expanding its services and scenarios, to actively capture user value information and consumption demand through data accumulation and data-driven methods, realize the value of the data, and provide users with more accurate service.

Description

User purchase prediction method in big data financial scene

Technical Field

The invention relates to the field of user purchase prediction in a financial scene, in particular to a user purchase prediction method in a big data financial scene.

Background

User purchase prediction has long been an important and active research field. Accurately predicting users' future purchasing behavior is of great significance for enterprises seeking to capture user value information and consumption demand.

On the one hand, the prior art offers no effective solution for predicting whether a user will purchase products in a financial scenario. Existing work mostly covers mobile purchase prediction, high-potential user mining, user reputation prediction and the like. However, with the rapid development of big data, user purchase prediction in a financial context has become very important: it can promote the continuous expansion of the services and scenarios of financial credit cards and move beyond the traditional service model.

On the other hand, conventional purchase prediction methods are not accurate enough and require further improvement.

Disclosure of Invention

Based on the problems in the prior art, the invention provides a user purchase prediction method in a big data financial scenario. The method can also be extended to risk prediction for defaulting users in a big data financial scenario and to the identification of "wool party" (bonus-hunting) user groups in a financial scenario.

The invention provides a user purchase prediction method under a big data financial scene, which comprises the following steps:

step 101: acquiring historical behavior data of financial users from a financial platform APP, and preprocessing the data;

step 102: dividing the preprocessed historical behavior data into a plurality of overlapped extended training sets and a plurality of non-overlapped extended training sets;

step 103: performing feature engineering on each extended training set to construct features of different categories;

step 104: applying an imbalanced-data training strategy to each extended training set to obtain a series of balanced training subsets;

step 105: grafting the balanced training subsets to form a balanced sample set, outputting a test result through a trained model, and grafting the test results that are determined to be reliable onto the balanced sample set to form the training set used for prediction;

step 106: constructing a model integration scheme that improves feature correlation, namely constructing a plurality of models and forming a fusion structure; in the fusion structure, whether a user will purchase a coupon on the financial platform APP within a number of future days is predicted from the user's historical behavior data in a financial consumption scenario, that is, a threshold is applied to the predicted probability to output whether the coupon will be purchased; if the predicted probability is greater than or equal to the set threshold, the customer has a high probability of purchasing a coupon on the financial platform APP within those future days.

Further, preprocessing the data comprises abnormal value processing, missing value processing and repeated value processing. The abnormal value processing comprises linear interpolation filling or mode replacement. The missing value processing is multi-dimensional: the number of missing values is counted per column and divided by the total number of samples to obtain the missing ratio of each column, and the missing ratio is added to the feature system; that is, the missing values keep their original Not a Number (NaN) value, and the missing ratio is constructed to represent the degree of missingness. The repeated value processing simplifies user information that has the same meaning and removes redundant characters.

Here, Not a Number (NaN) is a value of a numeric data type in computer science that represents an undefined or unrepresentable value.

Further, in the overlapped extended training sets, the label interval is set to N days and the feature-interval window slides forward by a fixed number of days each time, such that the periods of successive training sets overlap; in the non-overlapped extended training sets, the label interval is set to N days and the feature-interval window slides forward N days each time, such that the periods of successive training sets do not overlap.

Further, performing feature engineering on each extended training set to construct features of different categories includes constructing user information features, user consumption business features, financial APP operation behavior log features, and granularity features.

Further, the user information features are constructed as follows: desensitized data are combined by polynomial construction; non-desensitized data are discretized by one-hot encoding, and the discrete result is then min-max normalized and multiplied by one hundred to serve as a normalized feature;

the user consumption business characteristics comprise user loan times, order amount, order count, user loan credit level ranking, user loan amount and user loan rate in the user historical behavior data;

the financial APP operation behavior log features include user discrete features and user time features;

the granularity features comprise features extracted at different day granularities and features extracted at different hour granularities.

Further, applying the imbalanced-data training strategy to each extended training set to obtain a series of balanced training subsets comprises: determining a reasonable ratio between the large-class training subset and the small-class training subset according to the requirement of cost-sensitive learning; and combining disjoint large-class training subsets with the small-class training subset to form a series of balanced training subsets.

Further, the step 105 specifically includes running a CatBoost model on the test set; the data with the highest accuracy in the test result are regarded as real and reliable, the balanced sample set and these test set results are grafted together, and the training set obtained once grafting is complete is the training set used for prediction.

Further, the step 106 includes constructing a plurality of models, including two gradient boosting models, two random forest models and one long short-term memory (LSTM) neural network model; and constructing a four-layer fusion structure, from which the final result, namely whether the user purchases a coupon on the financial platform APP, is obtained with a fusion formula.

Further, the output of each layer in the fusion structure serves as a fusion result or as training features for the next layer;

wherein,

in the first layer, a first random forest model is trained on the multi-dimensional features, and its output is used as a new feature column;

in the second layer, two gradient boosting models are trained on the multi-dimensional features and the new feature column, and the output of the second gradient boosting model is used as the first result to be fused in the fourth layer;

in the third layer, a second random forest model is trained on the output of the first gradient boosting model and the original multi-dimensional features to give the second result to be fused, and a long short-term memory neural network model is trained on the original multi-dimensional features to give the third result to be fused; the final result is then obtained from the results to be fused according to the fusion formula.

Preferably, the fusion formula is expressed as:

answer＝0.25×RF_2+0.4×LSTM+0.35×CatBoost_2

where answer represents the final result after fusion; RF_2 represents the output of the second RF; LSTM represents the output of the LSTM layer; CatBoost_2 represents the output of the second CatBoost layer.

The invention provides a user purchase prediction method in a big data financial scenario. Against the background of big data finance, credit card centers are currently making every effort to experiment and innovate in financial technologies such as big data risk control and big data consumption, and have built an integrated big data platform spanning data collection, data cleaning, data mining and commercial application. While continuously expanding its services and scenarios, the credit card business actively captures user value information and consumption demand through data accumulation and data-driven methods, realizes the value of the data, and provides users with more accurate service.

The user purchase prediction scheme in the financial scene utilizes big data analysis and machine learning algorithm, wherein the technical innovation comprises the following contents:

In the data division part, the traditional sliding-window data division is improved, and two division methods are proposed, the overlapped extended training set and the non-overlapped extended training set, so that more complete user information is covered, the diversity of the training samples' feature space is improved, and the accuracy of model prediction is greatly enhanced.

In a financial scenario, users' purchasing behavior is often imbalanced: the large class of users has no consumption behavior, while the small class does. To prevent the training cost from skewing toward the negative class, a method for constructing balanced classification subsets is proposed, in which a reasonable sample ratio is set according to cost-sensitive learning. Meanwhile, to counter the similarity between user samples in the financial field, a data grafting method is designed to improve sample diversity.

Finally, the invention also innovates by constructing a model integration method that improves feature correlation: a fusion structure in which each layer's output serves as a fusion result or as training features for the next layer. This greatly enhances feature correlation, integrates a better result, and accurately identifies the group of purchasing users.

Based on the above description, the beneficial effects of the invention are as follows:

The user purchase prediction method based on a big data financial scenario ensures the effectiveness of the model by adopting a model integration that improves feature correlation. The final output is the predicted probability that the user makes a purchase within the next 10 days, and whether the user purchases a coupon is predicted by applying a threshold to this probability. This serves the purposes of capturing user value information and consumption demand in a consumer finance scenario, realizing the value of the data, and providing users with more accurate service. The purchase prediction accuracy of the method is superior to that of the prior art. By combining information from the financial platform APP, such as the user information of the credit card center and the operation log information of the credit card platform APP, the method fully mines users' purchasing tendencies and predicts their purchases accurately, enabling the financial credit card center to provide users with more accurate service.

Drawings

FIG. 1 is a flow chart of a user purchase prediction method based on big data financial scenarios provided by an embodiment of the present invention;

FIG. 2 is a diagram of the overlapped extended training set scheme provided by an embodiment of the present invention;

FIG. 3 is a diagram of the non-overlapped extended training set scheme provided by an embodiment of the present invention;

FIG. 4 is a diagram of an example of a balanced classification subset provided by an embodiment of the present invention;

FIG. 5 is a diagram illustrating an example of data grafting operations for sample diversity according to an embodiment of the present invention;

FIG. 6 is a flowchart of the model integration for improving feature correlation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clearly and completely apparent, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.

As an optional implementation, the data source of the invention is as follows: the credit card center provides customers' personal attributes, credit card consumption data, and the operation behavior log data accumulated over 60 days (day 1 to day 60) on the credit card platform APP of a certain bank. The task is to predict whether a user will purchase coupons (including meal vouchers, movie tickets and the like) on the financial platform APP, namely the credit card platform APP, within the next 10 days (day 61 to day 70); the prediction is output as the probability that the user purchases a coupon.

A flow chart of a user purchase prediction method based on big data financial scenario is shown in fig. 1, which includes the following steps:

Step 101, preprocessing the user's historical behavior data, specifically comprises the following:

Abnormal value processing: the data contain unknown abnormal values, such as the null-value string 'NAN', the abnormal numerical value '-999', garbled characters such as '@ # 5', and values that deviate from a realistic age, such as '190'. Null-value strings, abnormal numerical values and garbled characters are handled with a linear interpolation filling scheme, that is, the 2 most recent values are selected and the fill value is obtained by linear fitting, as shown in formula (1):

$$\alpha = \frac{y_2 - y_1}{x_2 - x_1}, \qquad y = y_1 + \alpha\,(x - x_1) \tag{1}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the two sample points closest to the current sample, $x$ is the position of the current sample, $\alpha$ is the slope of the linear interpolation fit, and $y$ is the calculated value used to fill the abnormal value.

Abnormal values that deviate from a realistic age are replaced with the mode of all ages.
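The two repairs described above can be sketched as follows. This is a minimal illustration that assumes the standard two-point form of linear interpolation; the function names and the 120-year plausibility cutoff are hypothetical and do not come from the source.

```python
from statistics import mode


def linear_fill(x1, y1, x2, y2, x):
    """Fill the value at position x from the two nearest valid points."""
    alpha = (y2 - y1) / (x2 - x1)  # slope of the fitted line
    return y1 + alpha * (x - x1)


def replace_implausible_ages(ages, max_age=120):
    """Replace implausible ages with the mode of the plausible ones.

    max_age is an assumed cutoff, not a value given in the source.
    """
    plausible = [a for a in ages if 0 <= a <= max_age]
    most_common = mode(plausible)
    return [a if 0 <= a <= max_age else most_common for a in ages]


# The two most recent valid values sit at positions 1 and 2; fill position 3.
filled = linear_fill(1, 10.0, 2, 14.0, 3)
ages = replace_implausible_ages([25, 31, 25, 190, 40])
```

In this toy input, the age 190 is replaced by 25, the most frequent plausible age.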

Missing value processing: in the field of credit card credit investigation, the completeness of a user's information affects the user's credit rating. The missing values in the credit card user information are therefore processed in multiple dimensions (different column dimensions). The number of missing values is counted per column (attribute) and divided by the total number of samples to obtain the missing ratio of each column, and the missing ratio is added to the feature system. The calculation is shown in formula (2):

$$MissRate_i = \frac{x_i}{Count} \tag{2}$$

where $x_i$ is the number of missing values in attribute column $i$ of the data set, $Count$ is the number of samples in the data set, and $MissRate_i$ is the missing rate of attribute column $i$;
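As a hedged sketch of the missing-ratio feature, the snippet below counts missing entries per column and divides by the number of samples. Representing rows as dictionaries and treating both `None` and NaN as missing are assumptions made for the example.

```python
import math


def missing_rates(rows, columns):
    """Return {column: missing_count / sample_count} for each column."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    count = len(rows)
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / count
        for col in columns
    }


rows = [
    {"age": 30, "income": float("nan")},
    {"age": None, "income": 5200.0},
    {"age": 41, "income": float("nan")},
    {"age": 28, "income": 4100.0},
]
rates = missing_rates(rows, ["age", "income"])
```

The missing values themselves are left untouched here, in line with the text's statement that they keep their NaN value while the ratio is added as an extra feature.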

Repeated value processing: the main content of repeated values is retained; for example, "man" and "male" in the gender field are both replaced with a single value, which reduces data redundancy;

As an optional way, this embodiment further adds amount processing. The data may contain both "1,230" and "1230" for the same amount, and a value containing a comma is recognized as a string by default when the data are read. The amount is therefore processed by removing the comma "," from the data and forcibly converting the result into a numerical value.
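The amount processing can be illustrated with a short sketch; the helper name `parse_amount` is hypothetical.

```python
def parse_amount(raw):
    """Convert an amount such as "1,230" or 1230 to a float."""
    if isinstance(raw, str):
        raw = raw.replace(",", "")  # drop thousands separators
    return float(raw)


amounts = [parse_amount(v) for v in ["1,230", "1230", 87.5]]
```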

Step 102, dividing the historical behavior data into overlapped extended training sets and non-overlapped extended training sets, specifically comprises the following. As shown in FIG. 2, the overlapped extended training sets have a label interval of 10 days and a feature-interval window that slides forward 1 day each time, so the periods of successive training sets overlap. This covers more complete user information and strengthens the model's predictions, at the cost of higher similarity between some samples. The non-overlapped extended training sets have a label interval of 10 days and a feature-interval window that slides forward 10 days each time, so the periods of successive training sets do not overlap. This increases the diversity of the training samples' feature space and enhances the generalization ability of the model. 6 overlapped extended training sets and 2 non-overlapped extended training sets are constructed, 8 training sets in total.

In this embodiment, the overlapped extended training sets have a label interval of 10 days and a feature-interval window that slides forward 1 day each time; 6 training sets in total are constructed, with day periods from [1-45, 46-55] to [1-50, 51-60], and the test set is constructed as [1-60, 61-70];

in this embodiment, the non-overlapped extended training sets are shown in FIG. 3; the label interval is 10 days, the feature-interval window slides forward 10 days each time, and the periods of successive training sets do not overlap, so 2 training sets, [1-50, 51-60] and [1-40, 41-50], are constructed, and the test set is constructed as above.
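The two division schemes can be sketched as one helper that generates (feature period, label period) pairs. This sketch assumes that every label period spans exactly 10 days, so the earliest overlapped set is taken as [1-45, 46-55], and that the feature period always starts at day 1; the function name and its parameters are hypothetical.

```python
def extended_windows(first_label_start, last_label_start, step, label_days=10):
    """Return (feature_period, label_period) pairs as inclusive day ranges.

    The feature period runs from day 1 to the day before the label period.
    """
    windows = []
    for start in range(last_label_start, first_label_start - 1, -step):
        feature_period = (1, start - 1)
        label_period = (start, start + label_days - 1)
        windows.append((feature_period, label_period))
    return windows


overlapped = extended_windows(46, 51, step=1)       # label periods overlap
non_overlapped = extended_windows(41, 51, step=10)  # label periods are disjoint
test_set = ((1, 60), (61, 70))
```

With these arguments the helper yields 6 overlapped sets and 2 non-overlapped sets, matching the 8 training sets of the embodiment.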

Step 103, performing feature engineering on the user's historical data, specifically comprises the following:

The construction of the feature engineering mainly covers the following four aspects.

1) User information features: numerical attribute features, which are desensitized data, are put directly into the feature system, and simple combined features are constructed from them with addition, subtraction, multiplication and division polynomials. Discrete, non-desensitized attributes (user reputation grade, gender, age, marital status and delivery address area) are one-hot encoded, and the result is min-max normalized and multiplied by one hundred to serve as features, so as to enhance the differences among features.

2) User consumption business features: these mainly reinforce the representation of the user's business behavior and include, from the user's historical consumption data, the number of user loans (extracted over the last 60, 30, 20 and 10 days), the order amount (summarized by mean, variance, kurtosis, skewness and mode), the order count (extracted at the granularity of morning, noon, evening, weekday, weekend and each week), the user loan credit level ranking feature, the user loan amount and the user loan rate.

3) Financial APP operation behavior log features, comprising user discrete features and user time features. User discrete features: the EVT_LBL attribute column of the APP click module represents the three levels of the clicked module; the three levels are split and discretized according to their occurrence counts. User time features: the number of days between the prediction day and the last behavior on the module with which the user interacted most; the number of days between the prediction day and the first behavior on that module; the user's maximum number of consecutive active days; the number of days on which the user was active; the longest/shortest gap between user behaviors; and the number of days between the prediction day and the user's first/last behavior.

4) Granularity features: these mainly process the data in the EVT_LBL click module of the financial platform APP. At different day granularities (last 60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2, 1 days), the features count how many times the behavior occurred in total and how many distinct LBL/LBL_0/LBL_1/LBL_2/LBL_3 values were interacted with; the same statistics are computed at different hour granularities (last 24, 21, 18, 12, 6, 1 hours).

Specifically, 178 dimensions of credit card personal information features are constructed. For the 30-dimensional numerical attribute features, combined features are constructed with the addition, subtraction, multiplication and division polynomial method, as shown in formula 3:

$$F_{\text{new}_i} = F_i \circ F_j, \qquad \circ \in \{+, -, \times, \div\} \tag{3}$$

where $F_i$ and $F_j$ are different attribute columns of the data set, and $F_{\text{new}_i}$ is the feature obtained with the addition, subtraction, multiplication and division polynomial method.
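A minimal sketch of the pairwise combination features follows. The naming scheme for the new columns and the choice to return 0.0 when the divisor is zero are assumptions for illustration; the source does not say how division by zero is handled.

```python
from itertools import combinations


def combine_features(row):
    """Build pairwise +, -, *, / combinations of numeric attribute columns."""
    combined = {}
    for a, b in combinations(sorted(row), 2):
        combined[f"{a}_add_{b}"] = row[a] + row[b]
        combined[f"{a}_sub_{b}"] = row[a] - row[b]
        combined[f"{a}_mul_{b}"] = row[a] * row[b]
        # Assumed convention: a zero divisor yields 0.0 instead of an error.
        combined[f"{a}_div_{b}"] = row[a] / row[b] if row[b] != 0 else 0.0
    return combined


features = combine_features({"f1": 6.0, "f2": 3.0})
```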

The discrete, non-desensitized attribute features, namely user reputation grade, gender, age, marital status and delivery address area, are one-hot encoded, and the result is min-max normalized and multiplied by one hundred to serve as features. The calculation is shown in formula 4:

$$x_{new} = 100 \times \frac{x - x_{min}}{x_{max} - x_{min}} \tag{4}$$

where $x$, $x_{min}$ and $x_{max}$ are the current sample's feature value, the minimum value and the maximum value respectively, and $x_{new}$ is the final feature result;
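The encoding and scaling step can be sketched as below, assuming the usual definitions of one-hot encoding and min-max normalization. The helper names are hypothetical, and returning 0.0 for a constant column is an assumed convention.

```python
def one_hot(values):
    """Return (encoded_rows, categories) with one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def min_max_x100(values):
    """Min-max normalize to [0, 1], then multiply by one hundred."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumed convention for a constant column
        return [0.0 for _ in values]
    return [100 * (v - lo) / (hi - lo) for v in values]


encoded, categories = one_hot(["single", "married", "single"])
married_column = min_max_x100([row[0] for row in encoded])
scaled_ages = min_max_x100([20, 30, 60])
```

Applied to a one-hot column, the scaling simply maps 0/1 to 0/100, which widens the numeric gap between categories.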

135 dimensions of credit card consumption business features are constructed, including, from the user's historical consumption data, the number of user loans (extracted over the last 60, 30, 20 and 10 days), the order amount (summarized by mean, variance, kurtosis, skewness and mode), the order count (extracted at the granularity of morning, noon, evening, weekday, weekend and each week), the user loan credit level ranking feature, the user loan amount and the user loan rate;

221 dimensions of credit card APP operation behavior log features are constructed, including the user discrete features: the EVT_LBL attribute column of the APP click module represents the three levels of the clicked module, which are discretized according to their occurrence counts;

the user time features comprise the number of days between the prediction day and the last behavior on the module with which the user interacted most, the number of days between the prediction day and the first behavior on that module, the user's maximum number of consecutive active days, the number of days on which the user was active, the longest/shortest gap between user behaviors, and the number of days between the prediction day and the user's first/last behavior;

221 dimensions are extracted from the behavior of each module at different granularities: at different day granularities (last 60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2 and 1 days), the features count how many times the behavior occurred in total and how many distinct LBL/LBL_0/LBL_1/LBL_2/LBL_3 values were interacted with; at different hour granularities (last 24, 21, 18, 12, 6 and 1 hours), the same two statistics are computed.
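The day-granularity statistics can be sketched as follows. The event representation (a list of `(day, label)` pairs), the feature names and the convention that "last w days" means the w days immediately before the prediction day are all assumptions for this example.

```python
DAY_WINDOWS = (60, 45, 31, 21, 18, 14, 10, 7, 5, 4, 3, 2, 1)


def day_granularity_features(events, prediction_day, windows=DAY_WINDOWS):
    """For each window, count total events and distinct labels seen in it."""
    features = {}
    for w in windows:
        recent = [
            label for day, label in events
            if prediction_day - w <= day < prediction_day
        ]
        features[f"count_last_{w}d"] = len(recent)
        features[f"distinct_last_{w}d"] = len(set(recent))
    return features


events = [(20, "F-G-H"), (58, "A-B-C"), (60, "A-B-C"), (60, "A-D-E")]
feats = day_granularity_features(events, prediction_day=61)
```

The hour-granularity features would follow the same pattern with timestamps in hours and the windows 24, 21, 18, 12, 6 and 1.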

Step 104 is the construction of balanced classification subsets as a solution to imbalanced training. Step 102 constructs a combination of 8 training sets in total and step 103 performs feature engineering on them, but in a financial scenario users' purchasing behavior is often imbalanced: the large class of users has no consumption behavior, while the small class does. Imbalanced data skews the cost of training or prediction heavily toward the negative class. The balanced classification subsets are therefore constructed as follows: a reasonable ratio between the large-class training subset and the small-class training subset is determined according to the requirement of cost-sensitive learning; and disjoint large-class training subsets are combined with the small-class training subset to form a series of balanced training subsets.

The specific steps are as follows. An example of the construction of the balanced classification subsets is shown in FIG. 4. According to the cost-sensitive learning requirement value of 9.7, a reasonable positive-to-negative sample ratio of 1:2.5 is learned (i.e., the ratio of the small-class training subset to the large-class training subset). Each time, a disjoint large-class training subset (large-class sample subset) containing 2.5 times as many samples as the small-class training subset (small-class sample set) is drawn and combined with the small-class training subset, forming a series of balanced training subsets, namely balanced sample subset 1 to balanced sample subset 25.
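A minimal sketch of the subset construction is given below. It assumes the large class is shuffled and cut into disjoint chunks, each 2.5 times the size of the small class, with any leftover samples discarded; the function name, the seed and the leftover handling are assumptions rather than details from the source.

```python
import random


def balanced_subsets(majority, minority, ratio=2.5, seed=0):
    """Pair disjoint majority chunks of size ratio * len(minority) with the minority class."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    chunk = int(ratio * len(minority))
    subsets = []
    # Leftover majority samples that do not fill a whole chunk are dropped.
    for start in range(0, len(shuffled) - chunk + 1, chunk):
        subsets.append(shuffled[start:start + chunk] + minority)
    return subsets


minority = [("pos", i) for i in range(4)]
majority = [("neg", i) for i in range(30)]
subsets = balanced_subsets(majority, minority)
```

In this toy example, 30 negative samples and 4 positive samples give 3 subsets of 10 negatives plus 4 positives each; the embodiment obtains 25 such subsets from its own data.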

Step 105 is the data grafting operation, which addresses sample diversity. It comprises grafting of training samples and grafting of test results. Grafting of training samples means grafting the balanced training subsets generated in step 104; grafting of test results means grafting the top-100 data by accuracy in the test result. The training set obtained once grafting is complete is the training set used for prediction. The data grafting operation for sample diversity is shown in FIG. 5:

First, the balanced training subsets are grafted. Second, in this embodiment, a CatBoost model is run on the test set, and the top-100 data by accuracy in the test result are regarded as real and reliable training samples. Finally, the balanced samples and these test set results are grafted together, and the resulting training set is the training set used for prediction.
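The grafting step resembles pseudo-labelling, and a hedged sketch is given below. The source does not define how "accuracy" is measured on unlabelled test data, so this example assumes it means prediction confidence, measured as the distance of the predicted probability from 0.5; the function name, the 0.5 labelling cutoff and the data layout are likewise assumptions.

```python
def graft(balanced_subsets, test_rows, test_probs, top_k=100, cutoff=0.5):
    """Merge balanced subsets, then append the top_k most confident test rows."""
    grafted = [row for subset in balanced_subsets for row in subset]
    # Assumed confidence measure: distance of the probability from 0.5.
    ranked = sorted(
        zip(test_rows, test_probs),
        key=lambda pair: abs(pair[1] - 0.5),
        reverse=True,
    )
    for features, prob in ranked[:top_k]:
        grafted.append((features, 1 if prob >= cutoff else 0))
    return grafted


subsets = [[("a", 1), ("b", 0)], [("a", 1), ("c", 0)]]
grafted = graft(subsets, ["t1", "t2", "t3"], [0.97, 0.55, 0.02], top_k=2)
```

In a real pipeline the probabilities would come from the CatBoost model named in the text; here they are supplied directly to keep the sketch free of external dependencies.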

Step 106, constructing a model integration scheme that improves feature correlation and predicting from the customer's historical behavior data whether the customer will purchase a coupon on the credit card platform APP within the next ten days in a financial consumption scenario, specifically comprises the following. The model integration flowchart for improving feature correlation is shown in FIG. 6. The scheme constructs 5 models in total: two CatBoost models (CatBoost_1 and CatBoost_2), two random forest models (RF_1 and RF_2), and one long short-term memory (LSTM) network. CatBoost is a gradient boosting algorithm for categorical features, RF is a random forest, and LSTM is a neural network model; the first two types can be regarded as tree models, so the models are heterogeneous. With these 5 models, a four-layer fusion structure is constructed in which each layer's output serves as a fusion result or as training features for the next layer. Finally, a predicted probability is output, representing the probability that the user makes a purchase within the next 10 days: users with a probability greater than 0.95 form the high-probability purchasing group, users with a probability greater than 0.8 form the likely purchasing group, and users with a probability lower than 0.6 are determined to be the non-purchasing group.

The four-layer fusion structure works as follows. In the first layer, RF_1 is trained on the 785-dimensional features and its output is appended as a new feature column, giving 785+1 features. In the second layer, CatBoost_1 is trained on the 785+1 features and its output is appended as a further feature column, giving 785+2 features, while the output of CatBoost_2 is kept as a result to be fused in the fourth layer. In the third layer, the RF_2 model is trained on the 785+2 features together with the original 785-dimensional features, and its output is a result to be fused; the LSTM neural network is trained on the original 785-dimensional features, and its output is also a result to be fused. Finally, the results are fused according to formula 4 to obtain the final result:

answer＝0.25×RF_2+0.4×LSTM+0.35×CatBoost_2 (4)

The model integration scheme for improving feature correlation draws on the idea of stacking and obtains a better prediction result by having the models promote the learning of correlations.
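The wiring of the four layers and the weighted fusion can be sketched as follows. Each "model" is a stand-in callable that maps feature rows to probabilities, so no training takes place; in practice these would be fitted CatBoost, random forest and LSTM models. The input given to CatBoost_2 is an assumption, since the text does not state it explicitly, and all function names are hypothetical.

```python
def fuse(rf_2, lstm, catboost_2):
    """Layer 4: weighted fusion with the weights stated in the text."""
    return [0.25 * r + 0.4 * l + 0.35 * c
            for r, l, c in zip(rf_2, lstm, catboost_2)]


def append_column(rows, column):
    """Return new rows with one extra feature column appended."""
    return [row + [value] for row, value in zip(rows, column)]


def stacked_predict(features, models):
    """models maps a name to a callable: rows -> list of probabilities."""
    rf_1 = models["RF_1"](features)            # layer 1
    plus_1 = append_column(features, rf_1)     # original features + 1 column
    cb_1 = models["CatBoost_1"](plus_1)        # layer 2
    cb_2 = models["CatBoost_2"](plus_1)        # assumed input for CatBoost_2
    plus_2 = append_column(plus_1, cb_1)       # original features + 2 columns
    rf_2 = models["RF_2"](plus_2)              # layer 3
    lstm = models["LSTM"](features)            # layer 3, original features only
    return fuse(rf_2, lstm, cb_2)              # layer 4


def constant(p):
    """Toy stand-in model that predicts the same probability for every row."""
    return lambda rows: [p for _ in rows]


toy_models = {
    "RF_1": constant(0.5), "CatBoost_1": constant(0.6),
    "CatBoost_2": constant(1.0), "RF_2": constant(0.8), "LSTM": constant(0.9),
}
answer = stacked_predict([[0.1, 0.2], [0.3, 0.4]], toy_models)
```

With the toy outputs above, each fused value is 0.25 × 0.8 + 0.4 × 0.9 + 0.35 × 1.0, which is approximately 0.91.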

Finally, a predicted probability is output, representing the probability that the user purchases a coupon on the credit card platform APP within the next 10 days. Users with a probability greater than 0.95 form the high-probability purchasing group, users with a probability greater than 0.8 form the likely purchasing group, and users with a probability lower than 0.6 are determined to be the non-purchasing group. That is, when the probability is greater than or equal to the threshold 0.6, the output is that the user will purchase a coupon on the credit card platform APP within the next 10 days; when the probability is less than the threshold 0.6, the output is that the user will not. This helps the credit card center, while continuously expanding its services and scenarios, to actively capture user value information and consumption demand through data accumulation and data-driven methods, realize the value of the data, and provide users with more accurate service.
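The thresholding can be sketched as a small helper. The tier names are illustrative; in particular, the text does not name the band between 0.6 and 0.8, so the label "purchaser" for that band is an assumption.

```python
def classify(probability, threshold=0.6):
    """Return (will_purchase, tier) for a predicted purchase probability."""
    will_purchase = probability >= threshold
    if probability > 0.95:
        tier = "high-probability purchaser"
    elif probability > 0.8:
        tier = "likely purchaser"
    elif will_purchase:
        tier = "purchaser"  # assumed name for the unnamed 0.6 to 0.8 band
    else:
        tier = "non-purchaser"
    return will_purchase, tier


results = [classify(p) for p in (0.97, 0.85, 0.6, 0.4)]
```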

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.

The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A user purchase prediction method under a big data financial scene is characterized by comprising the following steps:

step 106: constructing a model integration scheme for improving feature correlation, namely constructing a plurality of models and forming a fusion structure; in the fusion structure, the probability that a user purchases a coupon on a financial platform APP in a financial consumption scene is predicted according to the historical behavior data of the user.

2. The method for predicting the purchase of the user in the big data financial scene as claimed in claim 1, wherein the preprocessing of the data comprises outlier processing, missing value processing and duplicate value processing; the outlier processing comprises filling by linear interpolation or replacement by the mode; the missing value processing comprises multidimensional processing, namely counting the number of missing values in each column and dividing that number by the total count of the column to obtain the missing ratio of each column, that is, the missing values are kept as the original not-a-number (NaN) values and the missing ratio is constructed to represent the degree of missingness; the duplicate value processing comprises simplifying user information that has the same meaning by eliminating redundant characters.
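The per-column missing ratio described in claim 2 can be sketched as follows, assuming numeric columns in which missing entries are stored as NaN and every column is non-empty. The function name `missing_ratios` and the sample columns are hypothetical.

```python
import math
from typing import Dict, List


def missing_ratios(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Return, for each column, the share of its entries that are NaN."""
    ratios = {}
    for name, values in columns.items():
        missing = sum(1 for v in values if math.isnan(v))
        ratios[name] = missing / len(values)
    return ratios


nan = float("nan")
# Hypothetical columns; the NaN entries themselves are left unchanged.
data = {
    "age": [31.0, nan, 45.0, nan],
    "limit": [5000.0, 8000.0, nan, 12000.0],
}
ratios = missing_ratios(data)
```

The ratios are computed alongside the data rather than replacing it, which matches keeping the missing values as NaN.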

3. The method of claim 1, wherein the overlapping expanded training set comprises setting the label interval to N days and sliding the window of the feature interval forward by […] days each time, so that the periods of the training sets have overlapping areas; the non-overlapping expanded training set comprises setting the label interval to N days and sliding the window of the feature interval forward by N days each time, so that the periods of the training sets have no overlapping areas.
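A minimal sketch of the two sliding-window schemes follows. Several points are assumptions rather than details from the source: days are integer indices, the feature interval has a fixed length `feature_len` and immediately precedes the label interval, overlap is checked on the label intervals, and the overlapping scheme uses a step smaller than N (the step of 5 below is purely illustrative).

```python
from typing import List, Tuple

# (feature_start, feature_end, label_start, label_end), half-open day ranges
Window = Tuple[int, int, int, int]


def sliding_sets(total_days: int, feature_len: int,
                 label_len: int, step: int) -> List[Window]:
    """Enumerate training sets by sliding the window forward by `step` days."""
    sets = []
    start = 0
    while start + feature_len + label_len <= total_days:
        label_start = start + feature_len
        sets.append((start, label_start, label_start, label_start + label_len))
        start += step
    return sets


N = 10  # label interval in days
non_overlapping = sliding_sets(total_days=60, feature_len=30, label_len=N, step=N)
overlapping = sliding_sets(total_days=60, feature_len=30, label_len=N, step=5)
```

With the same history, the smaller step yields more training sets at the cost of shared days between consecutive label intervals.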

4. The method of claim 1, wherein the feature engineering operation is performed on each extended training set, and the constructing of different types of features comprises constructing user information features, user consumption business features, financial APP operation behavior log features, and granularity features.

5. The user purchase prediction method in a big data financial scene according to claim 4, wherein:

the user information features comprise combining desensitized data by polynomial construction, and discretizing non-desensitized data by one-hot encoding, then applying min-max normalization to the discretized result and scaling it by a factor of one hundred to serve as normalized features;

the user consumption business features are used to strengthen the representation of the user's business behavior, and comprise statistics of the number of loans taken by the user in the historical behavior data, statistics of order amounts in historical orders, count statistics of order amounts in different time periods, ranking statistics of loan credit grades across all users, and statistics of the user's loanable amount and loan rate;

the financial APP operation behavior log features comprise user discrete features and user time features; the user discrete features are discrete values computed from the number of times the user clicks each level of the APP click modules; the user time features are statistics of the various day intervals between the user's operations on the APP click modules.

6. The method of claim 1, wherein performing balanced training on each expanded training set, to address the class imbalance and obtain a series of balanced training subsets, comprises: determining a reasonable ratio of the majority-class training subset to the minority-class training subset according to the needs of cost-sensitive learning; and combining the disjoint majority-class training subsets with the minority-class training subset to form a series of balanced training subsets.
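The construction in claim 6 can be sketched as follows, assuming the majority class is partitioned into disjoint chunks and each chunk is paired with the full minority class. The function name `balanced_subsets`, the `ratio` parameter (majority samples per minority sample) and the sample IDs are hypothetical, and the minority class is assumed to be non-empty.

```python
from typing import List, Sequence


def balanced_subsets(majority: Sequence[str], minority: Sequence[str],
                     ratio: int = 1) -> List[List[str]]:
    """Pair disjoint majority-class chunks with the whole minority class."""
    chunk = ratio * len(minority)
    subsets = []
    # Majority samples that cannot fill a whole chunk are left out.
    for i in range(0, len(majority) - chunk + 1, chunk):
        subsets.append(list(majority[i:i + chunk]) + list(minority))
    return subsets


# Hypothetical sample IDs: 9 non-purchasers (majority), 3 purchasers (minority).
majority = [f"neg{i}" for i in range(9)]
minority = [f"pos{i}" for i in range(3)]
subsets = balanced_subsets(majority, minority, ratio=1)
```

Each subset can then be used to train one model, so that every majority sample is used at most once while the minority samples are reused in every subset.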

7. The method according to claim 1, wherein step 104 specifically comprises: applying the CatBoost model to the test set; treating the data with higher accuracy in the test results as a real and reliable balanced sample set; and grafting the balanced sample set with the test set results, the completed grafting yielding the training set to be used for prediction.

8. The method of claim 1, wherein step 106 comprises constructing a plurality of models, including two gradient boosting algorithm models, two random forest models, and a long short-term memory (LSTM) neural network model; and constructing a four-layer fusion structure, and obtaining, by using a fusion formula according to the fusion structure, a final result of whether the user purchases the coupon on the financial platform APP.

9. The method according to claim 8, wherein each layer in the fusion structure outputs a fusion result or a training feature for the next layer;

wherein,

10. The method of claim 8, wherein the fusion formula is expressed as:

answer＝0.25×RF_2+0.4×LSTM+0.35×CatBoost_2

wherein answer represents the final result after fusion; RF_2 represents the output of the second random forest model; LSTM represents the output of the long short-term memory neural network model; and CatBoost_2 represents the output of the second gradient boosting algorithm model.