CN117633647A

CN117633647A - Information classification method and device, storage medium and electronic equipment

Info

Publication number: CN117633647A
Application number: CN202311370351.7A
Authority: CN
Inventors: 杨倩; 刘姝岑
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-10-20
Filing date: 2023-10-20
Publication date: 2024-03-01

Abstract

The application discloses an information classification method and device, a storage medium, and electronic equipment, relating to the fields of artificial intelligence, financial technology, and other related fields. The method comprises: determining N target objects to be classified; obtaining target information of each target object in a financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution, and transaction information of transactions conducted by the target object in the financial institution; determining a function curve set according to the M target information sets; and inputting the function curve set into a target classification model for classification processing and outputting category information corresponding to each target object, wherein the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm. The method and device address the low efficiency of classifying multiple clients of a financial institution in the related art.

Description

Information classification method and device, storage medium and electronic equipment

Technical Field

The present application relates to the fields of artificial intelligence, financial technology, and other related fields, and in particular to an information classification method and apparatus, a storage medium, and an electronic device.

Background

At present, with the advent of the internet finance era, customer data is growing explosively. Financial institutions are competing to transform through digital technology, and contending for customer resources, seeking high-quality customers, and mining potential customers have become the competitive trend among financial institutions. However, conventional customer classification methods struggle to extract hidden value and patterns from large volumes of data, so data mining algorithms, represented by clustering, have become a new tool for customer segmentation.

In addition, in the related art, cluster analysis of financial institution customer data is generally performed after dimension reduction, for example by applying K-means (a clustering algorithm) to the reduced data. However, this two-step approach performs the discretization and clustering steps separately, which may lose discriminative information and lower the efficiency of customer segmentation.

No effective solution has yet been proposed for the low efficiency of classifying multiple customers of a financial institution in the related art.

Disclosure of Invention

The main objective of the present application is to provide an information classification method and apparatus, a storage medium, and an electronic device, so as to solve the problem of low efficiency in classifying multiple clients of a financial institution in the related art.

To achieve the above object, according to one aspect of the present application, an information classification method is provided. The method comprises the following steps: determining N target objects to be classified, wherein the target objects are objects served by a financial institution, and N is a positive integer greater than 1; obtaining target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution, and transaction information of transactions conducted by the target object in the financial institution, and M is a positive integer greater than or equal to N; determining a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each corresponding to one target information set; and inputting the function curve set into a target classification model for classification processing, and outputting category information corresponding to each target object, wherein the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

Further, the target classification model comprises at least a dimension reduction module, determined based on the dimension reduction algorithm, and a clustering processing module, determined based on the clustering algorithm. Inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object comprises: inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves, wherein S is a positive integer smaller than M; and inputting the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object.

Further, if the dimension reduction algorithm is a principal component analysis (PCA) algorithm, inputting the M function curves in the function curve set into the dimension reduction module for dimension reduction processing to obtain the S function curves comprises: determining an importance degree of each of the M function curves using the principal component analysis algorithm; screening out, from the M function curves, the function curves whose importance degree is higher than a preset importance degree; and taking the screened function curves as the S function curves.
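The patent does not specify how the importance degree is computed. One plausible reading, sketched below in Python with NumPy, treats each function curve as a row of values on a shared grid, runs PCA through a singular value decomposition, and uses each principal component's explained-variance ratio as its importance degree; components above a threshold (the hypothetical `min_ratio`) are retained as the S output curves. The function name `pca_screen` and the threshold value are assumptions for illustration only.

```python
import numpy as np

def pca_screen(curves, min_ratio=0.05):
    """Hypothetical PCA screening step.

    curves: array of shape (M, T), one curve per row, sampled on a shared grid.
    Returns component scores, the retained principal components (discretized
    eigen-curves), and their explained-variance ratios ("importance degree").
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the principal directions of the centered curves.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    keep = ratio > min_ratio            # importance above the preset threshold
    components = vt[keep]               # shape (S, T)
    scores = centered @ components.T    # shape (M, S)
    return scores, components, ratio[keep]

# Usage: 30 noisy, randomly scaled copies of one sine shape.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 20)
curves = rng.normal(size=(30, 1)) * np.sin(t) + 0.01 * rng.normal(size=(30, 20))
scores, components, ratio = pca_screen(curves)
```

Because all curves share one underlying shape, a single component carries almost all of the variance and is the only one kept.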

Further, inputting the S function curves into the clustering processing module for clustering processing to obtain the category information corresponding to each target object comprises: adjusting parameter information of the clustering processing module using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module; obtaining a preset cluster number, wherein the cluster number is the number of categories after clustering; performing cluster analysis on the S function curves based on the preset cluster number and the adjusted clustering processing module to obtain a clustering result of the S function curves; and obtaining the category information corresponding to each target object according to the clustering result of the S function curves.
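The EM parameter adjustment and the preset cluster number can be illustrated with a plain Gaussian mixture. The sketch below fits a diagonal-covariance Gaussian mixture with a preset number of clusters `k` by expectation-maximization and assigns each object to its most probable component. It is a simplified stand-in, not the funHDDC model referred to in the drawings (which fits class-specific subspaces); the function name, the farthest-point initialization, and the tolerances are assumptions.

```python
import numpy as np

def gmm_em(x, k, n_iter=200, tol=1e-6, seed=0):
    """Diagonal-covariance Gaussian mixture fitted by EM (illustrative only).

    x: array of shape (n, d), e.g. principal component scores per object.
    k: preset number of clusters.
    Returns hard cluster labels and the component means.
    """
    x = np.asarray(x, dtype=float)
    n, _ = x.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialization of the k means.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - x[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    means = x[idx].copy()
    var = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) for each (object, component).
        log_p = (np.log(weights)[None]
                 - 0.5 * np.log(2 * np.pi * var).sum(-1)[None]
                 - 0.5 * (((x[:, None, :] - means[None]) ** 2) / var[None]).sum(-1))
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
        ll = log_norm.sum()
        if ll - prev_ll < tol:   # stop once the log-likelihood stops improving
            break
        prev_ll = ll
    return resp.argmax(axis=1), means

# Usage: two well-separated groups of reduced scores, preset cluster number 2.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
labels, means = gmm_em(x, k=2)
```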

Further, determining a function curve set according to the M target information sets comprises: obtaining a preset basis function, wherein the preset basis function comprises at least one of the following: a Fourier function, a B-spline function, and a polynomial function; and performing data fitting on the M target information sets based on the preset basis function to obtain the function curve set.
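A least-squares fit is one common way to realize the basis-function fitting step. The sketch below builds a Fourier or polynomial design matrix with NumPy and solves for one coefficient vector per information set; a B-spline basis would plug in the same way but is omitted to keep the example dependency-free. The function names and the assumption that all sets share the same observation times are illustrative.

```python
import numpy as np

def fourier_basis(t, n_harmonics, period):
    """Design matrix with columns 1, sin(2*pi*j*t/T), cos(2*pi*j*t/T), j = 1..n_harmonics."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for j in range(1, n_harmonics + 1):
        w = 2 * np.pi * j / period
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)

def polynomial_basis(t, degree):
    """Design matrix with columns t**0, t**1, ..., t**degree."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

def fit_curves(observations, basis):
    """Least-squares basis coefficients, one row per information set.

    observations: shape (M, T); basis: shape (T, n_basis).
    """
    coef, *_ = np.linalg.lstsq(basis, np.asarray(observations, dtype=float).T, rcond=None)
    return coef.T   # shape (M, n_basis)

# Usage: twelve monthly observations for two information sets.
t = np.arange(12.0)
y = 3.0 + 2.0 * np.sin(2 * np.pi * t / 12)
coef = fit_curves(np.vstack([y, 2 * y]), fourier_basis(t, n_harmonics=2, period=12.0))
```

Each fitted curve is then represented by its coefficient row, which can be evaluated on any time grid by multiplying with the corresponding basis matrix.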

Further, obtaining the target information of each target object in the financial institution within the preset time period to obtain the M target information sets comprises: acquiring the target information of each target object in the financial institution within the preset time period to obtain M original information sets; performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets; and performing data normalization on the M cleaned original information sets to obtain the M target information sets.

Further, performing data cleaning on the information in the M original information sets to obtain the M cleaned original information sets comprises: determining a target extremum based on the information in the M original information sets; determining abnormal information in the M original information sets according to the target extremum, wherein the value corresponding to the abnormal information is greater than the target extremum; converting the value corresponding to the abnormal information into the target extremum to obtain converted abnormal information; and obtaining the M cleaned original information sets based on the converted abnormal information.
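The extremum-capping step can be sketched as follows. The patent leaves the choice of target extremum open; here it is assumed, for illustration, to be a high percentile of each variable (the hypothetical `percentile` parameter), and values above it are replaced by it.

```python
import numpy as np

def cap_upper_outliers(values, percentile=99.0):
    """Replace values above a per-column target extremum with that extremum.

    The extremum is assumed here to be the given percentile of each column.
    Returns the cleaned array and a mask of the entries that were abnormal.
    """
    values = np.asarray(values, dtype=float)
    extremum = np.nanpercentile(values, percentile, axis=0)
    is_abnormal = values > extremum
    cleaned = np.where(is_abnormal, extremum, values)
    return cleaned, is_abnormal

# Usage: one extreme balance among otherwise small values.
cleaned, mask = cap_upper_outliers([1, 2, 3, 4, 100], percentile=75)
```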

Further, performing data cleaning on the information in the M original information sets to obtain the M cleaned original information sets comprises: acquiring missing information in the M original information sets; determining the missing degree of the missing information, and judging whether it is greater than a preset missing degree; if the missing degree is greater than the preset missing degree, deleting the missing information from the M original information sets to obtain M deleted original information sets, and taking them as the M cleaned original information sets; and if the missing degree is not greater than the preset missing degree, performing interpolation on the missing information to obtain interpolated missing information, and obtaining the M cleaned original information sets based on the interpolated missing information.
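The missing-value branch can be sketched in the same way. Here the missing degree is assumed to be the fraction of missing entries per variable, with rows as consecutive observation times; variables above the hypothetical `max_missing_ratio` are dropped and the remaining gaps are filled by linear interpolation over time. Both the per-variable granularity and the interpolation scheme are assumptions, since the patent names neither.

```python
import numpy as np

def handle_missing(data, max_missing_ratio=0.3):
    """Drop variables with too many gaps; interpolate the rest over time.

    data: array of shape (T, V) with np.nan marking missing entries,
    rows ordered in time, one column per variable.
    """
    data = np.asarray(data, dtype=float)
    missing_ratio = np.isnan(data).mean(axis=0)          # missing degree per variable
    kept = data[:, missing_ratio <= max_missing_ratio]   # boolean indexing returns a copy
    steps = np.arange(kept.shape[0])
    for j in range(kept.shape[1]):
        col = kept[:, j]                                 # view into `kept`
        gaps = np.isnan(col)
        if gaps.any() and not gaps.all():
            col[gaps] = np.interp(steps[gaps], steps[~gaps], col[~gaps])
    return kept, missing_ratio

# Usage: the second variable is 75% missing and is dropped.
nan = np.nan
kept, ratio = handle_missing([[1, nan], [nan, nan], [3, nan], [4, 5]])
```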

Further, after the function curve set is input into the target classification model for classification processing and the category information corresponding to each target object is output, the method further comprises: determining a financial product push scheme according to the category information corresponding to each target object; and pushing financial products in a financial product set to each target object based on the financial product push scheme, wherein the financial product set comprises at least financial products of the financial institution.

To achieve the above object, according to another aspect of the present application, an information classification apparatus is provided. The apparatus comprises: a first determining unit, configured to determine N target objects to be classified, wherein the target objects are objects served by a financial institution, and N is a positive integer greater than 1; a first acquisition unit, configured to acquire target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution, and transaction information of transactions conducted by the target object in the financial institution, and M is a positive integer greater than or equal to N; a second determining unit, configured to determine a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each corresponding to one target information set; and a first processing unit, configured to input the function curve set into a target classification model for classification processing and output category information corresponding to each target object, wherein the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

Further, the target classification model comprises at least a dimension reduction module, determined based on the dimension reduction algorithm, and a clustering processing module, determined based on the clustering algorithm, and the first processing unit comprises: a first processing module, configured to input the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves, wherein S is a positive integer smaller than M; and a second processing module, configured to input the S function curves into the clustering processing module of the target classification model for clustering processing to obtain category information corresponding to each target object.

Further, if the dimension reduction algorithm is a principal component analysis algorithm, the first processing module comprises: a first determining submodule, configured to determine the importance degree of each of the M function curves using the principal component analysis algorithm; a first screening submodule, configured to screen out, from the M function curves, the function curves whose importance degree is higher than a preset importance degree; and a second determining submodule, configured to take the screened function curves as the S function curves.

Further, the second processing module comprises: a first adjustment submodule, configured to adjust parameter information of the clustering processing module using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module; a first acquisition submodule, configured to acquire a preset cluster number, wherein the cluster number is the number of categories after clustering; a first analysis submodule, configured to perform cluster analysis on the S function curves based on the preset cluster number and the adjusted clustering processing module to obtain a clustering result of the S function curves; and a third determining submodule, configured to obtain the category information corresponding to each target object according to the clustering result of the S function curves.

Further, the second determining unit comprises: a first acquisition module, configured to acquire a preset basis function, wherein the preset basis function comprises at least one of the following: a Fourier function, a B-spline function, and a polynomial function; and a first fitting module, configured to perform data fitting on the M target information sets based on the preset basis function to obtain the function curve set.

Further, the first acquisition unit comprises: a second acquisition module, configured to acquire target information of each target object in the financial institution within a preset time period to obtain M original information sets; a third processing module, configured to perform data cleaning on the information in the M original information sets to obtain M cleaned original information sets; and a fourth processing module, configured to perform data normalization on the M cleaned original information sets to obtain the M target information sets.

Further, the third processing module comprises: a fourth determining submodule, configured to determine a target extremum based on the information in the M original information sets; a fifth determining submodule, configured to determine abnormal information in the M original information sets according to the target extremum, wherein the value corresponding to the abnormal information is greater than the target extremum; a first conversion submodule, configured to convert the value corresponding to the abnormal information into the target extremum to obtain converted abnormal information; and a sixth determining submodule, configured to obtain the M cleaned original information sets based on the converted abnormal information.

Further, the third processing module comprises: a second acquisition submodule, configured to acquire missing information in the M original information sets; a first processing submodule, configured to determine the missing degree of the missing information and judge whether it is greater than a preset missing degree; a first deleting submodule, configured to, if the missing degree is greater than the preset missing degree, delete the missing information from the M original information sets to obtain M deleted original information sets and take them as the M cleaned original information sets; and a second processing submodule, configured to, if the missing degree is not greater than the preset missing degree, perform interpolation on the missing information to obtain interpolated missing information, and obtain the M cleaned original information sets based on the interpolated missing information.

Further, the apparatus further comprises: a third determining unit, configured to determine a financial product push scheme according to the category information corresponding to each target object after the function curve set is input into the target classification model for classification processing and the category information is output; and a first pushing unit, configured to push financial products in a financial product set to each target object based on the financial product push scheme, wherein the financial product set comprises at least financial products of the financial institution.

To achieve the above object, according to another aspect of the present application, a computer-readable storage medium is provided, storing a program, wherein the program, when run, performs any one of the information classification methods described above.

To achieve the above object, according to another aspect of the present application, an electronic device is provided, including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the information classification methods described above.

Through the present application, the following steps are adopted: determining N target objects to be classified, wherein the target objects are objects served by a financial institution, and N is a positive integer greater than 1; obtaining target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution, and transaction information of transactions conducted by the target object in the financial institution, and M is a positive integer greater than or equal to N; determining a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each corresponding to one target information set; and inputting the function curve set into a target classification model, constructed based on a dimension reduction algorithm and a clustering algorithm, for classification processing, and outputting category information corresponding to each target object. This solves the problem of low efficiency in classifying multiple clients of a financial institution in the related art.

Specifically, the target information of each target object to be classified is obtained within a preset time period to form the target information sets, a function curve set is determined from these sets, and the function curve set is input into the target classification model, which outputs the category information of each target object. Because the target classification model is built on both a dimension reduction algorithm and a clustering algorithm, dimension reduction and clustering can be performed together by one model when cluster analysis is carried out on the data of multiple clients in the financial institution, thereby improving the efficiency of classifying multiple clients.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 is a flowchart of an information classification method provided according to an embodiment of the present application;

FIG. 2 is a flowchart of an alternative information classification method provided according to an embodiment of the present application;

FIG. 3 is a flowchart of the refinement steps of the data processing module in an embodiment of the present application;

FIG. 4 is a flowchart of the refinement steps of the functional data fitting module in an embodiment of the present application;

FIG. 5 is a flowchart of the refinement steps of the functional data dimension reduction module in an embodiment of the present application;

FIG. 6 is a flowchart of the refinement steps of the funHDDC algorithm clustering processing module in an embodiment of the present application;

FIG. 7 is a schematic diagram of an information classification apparatus provided according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an electronic device provided according to an embodiment of the present application.

Detailed Description

It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other provided there is no conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the information (including but not limited to user equipment information and user personal information) and data (including but not limited to data for presentation and analyzed data) involved in the present disclosure are authorized by the user or fully authorized by all parties. For example, an interface is provided between the system and the relevant user or institution; before acquiring the relevant information, the system sends an acquisition request to the user or institution through the interface, and acquires the relevant information only after receiving consent fed back by the user or institution.

The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of an information classification method provided according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step S101, determining N target objects to be classified, wherein the target objects are objects served by the financial institution, and N is a positive integer greater than 1.

For example, the N target objects may be customers of a financial institution, and the customers of the financial institution may be categorized.

Step S102, obtaining target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution, and transaction information of transactions conducted by the target object in the financial institution, and M is a positive integer greater than or equal to N.

For example, the target information may be basic attribute information of the customer, customer value information, customer transaction and account activity information, and the like. The preset time period may be, for example, January to March, in which case the basic attribute information, value information, and transaction and account activity information of each of the customers to be classified within January to March may be acquired. When the target information is customer value information, the target information sets may include the value information of customer A and the value information of customer B, among the customers to be classified, within January to March. When the target information is both basic attribute information and value information, the target information sets may include the basic attribute information and the value information of customer A within January to March, and the basic attribute information and the value information of customer B within January to March.
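One way to assemble the target information sets is to group raw records by customer and by kind of information within the preset time period. In the sketch below, the record layout `(customer_id, info_kind, day, value)` and the function name are hypothetical. Each (customer, kind of information) pair yields one time-ordered set, which is one possible reason M can exceed N when several kinds of information are collected per customer.

```python
from collections import defaultdict
from datetime import date

def build_information_sets(records, start, end):
    """Group (customer_id, info_kind, day, value) records inside [start, end].

    Returns {(customer_id, info_kind): [(day, value), ...]} sorted by day,
    i.e. one target information set per customer and kind of information.
    """
    sets = defaultdict(list)
    for customer_id, info_kind, day, value in records:
        if start <= day <= end:
            sets[(customer_id, info_kind)].append((day, value))
    return {key: sorted(series) for key, series in sets.items()}

# Usage: two customers; the April record falls outside the window.
records = [
    ("A", "value", date(2023, 2, 1), 120.0),
    ("A", "value", date(2023, 1, 1), 100.0),
    ("A", "assets", date(2023, 3, 1), 9.0),
    ("B", "value", date(2023, 1, 15), 50.0),
    ("B", "value", date(2023, 4, 2), 70.0),
]
sets = build_information_sets(records, date(2023, 1, 1), date(2023, 3, 31))
```

Here N = 2 customers produce M = 3 information sets.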

Step S103, determining a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each corresponding to one target information set.

For example, when the target information sets include the value information of customer A and the value information of customer B within January to March, the value information of customer A may be fitted into one function curve using the basis function, the value information of customer B may be fitted into another function curve using the basis function, and the two fitted curves are included in the function curve set.

Step S104, inputting the function curve set into a target classification model for classification processing, and outputting category information corresponding to each target object, wherein the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

For example, a classification model for classifying multiple customers of a financial institution (the target classification model described above) may first be constructed using a dimension reduction algorithm and a clustering algorithm. The fitted function curves may then be input into the constructed model, which outputs the category to which each customer belongs. For example, customers may be classified into the following categories: customers with very large assets, customers with moderate assets, and customers with very small assets.
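Clustering itself only returns anonymous cluster labels. A simple way to turn them into descriptive categories like those above is to rank the clusters by the mean asset value of their members. The sketch below assumes one name per cluster and a per-customer asset figure; the category names and the function name are illustrative.

```python
import numpy as np

def name_clusters_by_assets(labels, assets,
                            names=("small assets", "moderate assets", "large assets")):
    """Map cluster labels to descriptive names ordered by mean asset value."""
    labels = np.asarray(labels)
    assets = np.asarray(assets, dtype=float)
    clusters = np.unique(labels)
    if len(clusters) != len(names):
        raise ValueError("need exactly one name per cluster")
    mean_assets = np.array([assets[labels == c].mean() for c in clusters])
    ordered = clusters[np.argsort(mean_assets)]   # cluster ids, lowest mean first
    mapping = {c.item(): name for c, name in zip(ordered, names)}
    return [mapping[c.item()] for c in labels]

# Usage: three clusters with clearly different asset levels.
categories = name_clusters_by_assets([0, 0, 1, 1, 2], [1e6, 2e6, 10, 20, 5000])
```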

It should be noted that the information classification method provided in the embodiment of the present application may be applied to a financial scenario.

Through the steps S101 to S104, a plurality of target information sets are obtained by determining a plurality of target objects to be classified and obtaining target information of each target object in a financial institution within a preset period of time, where the target information includes at least one of the following: asset information of target objects in a financial institution and transaction information of the target objects in the financial institution are transacted, a function curve set is determined according to a plurality of target information sets, the function curve set is input into a target classification model for classification processing, category information corresponding to each target object is output, the target classification model is a model constructed based on a dimension reduction algorithm and a clustering algorithm, and therefore dimension reduction and clustering processing can be simultaneously executed through the target classification model when clustering analysis is carried out on a plurality of client information data in the financial institution, and the effect of improving the efficiency of classifying a plurality of clients is achieved.

Optionally, in the information classification method provided in the embodiment of the present application, obtaining target information of each target object in a financial institution in a preset time period, where obtaining M target information sets includes: acquiring target information of each target object in a financial institution within a preset time period to obtain M original information sets; carrying out data cleaning treatment on the information in the M original information sets to obtain M cleaned original information sets; and carrying out data normalization processing on the M cleaned original information sets to obtain M target information sets.

For example, an initial sample set of information from a plurality of customers in a financial institution may be acquired first, and then the initial sample set may be subjected to data preprocessing to obtain a target sample. And the information of the plurality of clients can be pre-collected client basic attribute information, client value information, client transaction and account moving information and the like. In addition, the method of data preprocessing may include data cleansing processing, data normalization processing, and the like.
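The patent does not say which normalization is used. Min-max scaling per variable is one common choice and is sketched below; the function name is hypothetical, and constant columns are mapped to zero to avoid dividing by zero.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column (variable) to [0, 1]; constant columns become 0."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against zero range
    return (data - lo) / span

# Usage: the second variable is constant.
scaled = min_max_normalize([[0, 5], [5, 5], [10, 5]])
```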

With this scheme, the initial sample set of customer information can be preprocessed quickly and accurately, which helps ensure the accuracy of the data in the sample set.

Optionally, in the information classification method provided in the embodiment of the present application, performing data cleaning on the information in the M original information sets to obtain the M cleaned original information sets includes: determining a target extremum based on the information in the M original information sets; determining abnormal information in the M original information sets according to the target extremum, where the value corresponding to the abnormal information is greater than the target extremum; converting the value corresponding to the abnormal information into the target extremum to obtain converted abnormal information; and obtaining the M cleaned original information sets based on the converted abnormal information.

For example, data cleaning may include outlier processing. Outliers may be detected using percentile distributions and box plots; for the outlier points, an extreme value (the target extremum) is defined for each variable according to the statistical characteristics of the variable index and its business meaning, and outliers beyond that extreme value are converted into the defined extreme value.
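As a minimal sketch of the capping rule above, the following pure-Python helpers replace every value above the target extremum with the extremum itself. It assumes that, when no business-defined extremum is supplied, the upper whisker of a box plot (Q3 + 1.5 × IQR) serves as the target extremum; the names `boxplot_upper_fence` and `cap_outliers` are hypothetical and not part of the original method.

```python
from statistics import quantiles


def boxplot_upper_fence(values):
    """Upper whisker of a box plot, Q3 + 1.5 * IQR (assumed default extremum)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def cap_outliers(values, extremum=None):
    """Convert every value above the target extremum into the extremum.

    `extremum` would normally come from index statistics and business meaning;
    if it is omitted, the box-plot upper fence is used instead.
    """
    if extremum is None:
        extremum = boxplot_upper_fence(values)
    return [min(v, extremum) for v in values]
```

For example, `cap_outliers([1, 2, 3, 4, 100])` caps the last value at the fence 7.0, while `cap_outliers([5, 50], extremum=10)` applies a business-defined limit of 10.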

With this scheme, outliers in the initial sample set of customer information can be handled quickly and conveniently.

Optionally, in the information classification method provided in the embodiment of the present application, performing data cleaning on the information in the M original information sets to obtain the M cleaned original information sets includes: acquiring missing information in the M original information sets; determining the degree of missingness of the missing information and judging whether it is greater than a preset degree of missingness; if it is greater than the preset degree, deleting the missing information from the M original information sets and taking the resulting M original information sets as the M cleaned original information sets; if it is not greater than the preset degree, performing interpolation on the missing information and obtaining the M cleaned original information sets based on the interpolated information.

For example, data cleaning may include missing value processing. Missing values may be handled according to their distribution: when the missing rate of an index is high (> 99%, where 99% serves as the preset degree of missingness), the index data are deleted directly; when the missing rate is not high (≤ 99%), the missing data may be imputed by multiple imputation or left unprocessed (defaulting to a null value or 0).
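The delete-or-fill rule above can be sketched as follows, assuming each index is stored as a list in which missing entries are `None`. The helper `handle_missing` is hypothetical; it implements only the "leave unprocessed, default to 0" branch, and multiple imputation would replace the fill step in a fuller implementation.

```python
def handle_missing(columns, max_missing_rate=0.99, fill_value=0.0):
    """Drop indices whose missing rate exceeds the threshold; fill the rest.

    `columns` maps an index name to its values, with None marking a missing
    entry. The fill step uses a constant default (the "null or 0" option).
    """
    cleaned = {}
    for name, values in columns.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate > max_missing_rate:
            continue  # missing rate above the preset degree: delete the index
        cleaned[name] = [fill_value if v is None else v for v in values]
    return cleaned
```

An index that is 99.5% empty is removed, while an index with a single gap is kept and filled.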

With this scheme, missing values in the initial sample set of customer information can be handled quickly and conveniently.

Optionally, in the information classification method provided in the embodiment of the present application, determining the function curve set according to the M target information sets includes: acquiring preset basis functions, where the preset basis functions include at least one of the following: Fourier functions, B-spline functions, and polynomial functions; and performing data fitting on the M target information sets based on the preset basis functions to obtain the function curve set.

For example, because the target samples obtained are discrete and finite, the discrete customer index data need to be fitted to function curves to facilitate subsequent cluster analysis. A set of basis functions (the preset basis functions), such as a Fourier basis, a B-spline basis, or a polynomial basis, may therefore be selected in advance, and the discrete target samples may then be fitted with these basis functions to generate functional data (function curves).
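A minimal sketch of basis function fitting, assuming ordinary least squares and a three-term polynomial basis (a Fourier or B-spline basis would be passed in the same way, as a list of callables). The names `fit_basis`, `make_curve`, and `POLY_BASIS` are hypothetical, and the small Gaussian elimination solver stands in for a numerical library.

```python
def fit_basis(ts, ys, basis):
    """Least-squares coefficients minimizing sum_k (y_k - sum_j c_j * b_j(t_k))^2."""
    m = len(basis)
    design = [[b(t) for b in basis] for t in ts]
    # Normal equations (B^T B) c = B^T y.
    a = [[sum(row[p] * row[q] for row in design) for q in range(m)] for p in range(m)]
    rhs = [sum(row[p] * y for row, y in zip(design, ys)) for p in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (rhs[r] - sum(a[r][c] * coef[c] for c in range(r + 1, m))) / a[r][r]
    return coef


def make_curve(coef, basis):
    """Return the fitted function x(t) = sum_j c_j * b_j(t)."""
    return lambda t: sum(c * b(t) for c, b in zip(coef, basis))


# Polynomial basis 1, t, t^2 (the default argument binds each power).
POLY_BASIS = [lambda t, p=p: t ** p for p in range(3)]
```

In practice each customer's index series would be fitted separately, giving one curve per target information set.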

With this scheme, discrete customer index data can be fitted to function curves quickly and accurately, which facilitates subsequent cluster analysis.

Optionally, in the information classification method provided in the embodiment of the present application, the target classification model includes at least a dimension reduction module and a clustering module, where the dimension reduction module is determined based on the dimension reduction algorithm and the clustering module is determined based on the clustering algorithm. Inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object includes: inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction to obtain S function curves, where S is a positive integer smaller than M; and inputting the S function curves into the clustering module of the target classification model for clustering to obtain the category information corresponding to each target object.

For example, when the fitted functional data (function curves) are classified with a pre-constructed classification model (the target classification model), the funHDDC algorithm (the clustering algorithm used in this embodiment) may project the fitted functional data into a low-dimensional subspace for cluster analysis, yielding a customer cluster analysis result and a number of customer segment types. The funHDDC algorithm performs both dimension reduction by functional principal component analysis and clustering of the functional principal component scores.

In summary, the pre-constructed classification model performs dimension reduction and clustering together, which can improve the efficiency of customer segmentation.

Optionally, in the information classification method provided in the embodiment of the present application, if the dimension reduction algorithm is a principal component analysis algorithm, inputting the M function curves in the function curve set into the dimension reduction module of the target classification model to obtain the S function curves includes: determining the importance of each of the M function curves using the principal component analysis algorithm; screening out, according to that importance, the function curves whose importance is higher than a preset importance; and taking the screened function curves as the S function curves.

For example, if the dimension reduction method is functional principal component analysis, the functional data are reduced based on functional principal component analysis to obtain principal component scores. The function curves are first reduced with functional principal component analysis, which yields the cumulative variance contribution rate of the principal components; the number of principal components is then chosen as the number at which the cumulative variance contribution rate exceeds 85% (the preset importance), completing the dimension reduction and yielding the principal component scores.
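The 85% cumulative variance rule above can be sketched as a small helper, assuming the eigenvalues (variances) of the principal components have already been computed; `components_needed` is a hypothetical name.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative variance share reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Largest eigenvalues first, as principal components are ordered by variance.
    for s, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return s
    return len(eigenvalues)
```

With eigenvalues 5, 3, 1, 1 the cumulative shares are 50%, 80%, and 90%, so three components would be kept.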

With this scheme, functional principal component analysis can reduce the dimension of the functional data quickly and accurately.

Optionally, in the information classification method provided in the embodiment of the present application, inputting the S function curves into the clustering module of the target classification model for clustering to obtain the category information corresponding to each target object includes: adjusting parameter information of the clustering module with the expectation-maximization algorithm to obtain an adjusted clustering module; obtaining a preset number of clusters, which is the number of categories after clustering; performing cluster analysis on the S function curves based on the preset number of clusters and the adjusted clustering module to obtain a clustering result of the S function curves; and obtaining the category information corresponding to each target object according to that clustering result.

For example, when the principal component scores are clustered with the funHDDC algorithm to generate a number of customer segment types, the EM (expectation-maximization) algorithm may be used to estimate the parameters of the clustering module in the classification model (the target classification model); different preset numbers of clusters may then be tried and the number of clusters k selected according to the BIC (Bayesian Information Criterion), after which the principal component scores are clustered with the funHDDC algorithm to generate k customer segment types.

With this scheme, the principal component scores can be clustered quickly and accurately, so that the segment types of multiple customers of a financial institution can be determined.

Optionally, in the information classification method provided in the embodiment of the present application, after the function curve set is input into the target classification model for classification processing and the category information corresponding to each target object is output, the method further includes: determining a financial product push scheme according to the category information corresponding to each target object; and pushing financial products in a financial product set to each target object based on the push scheme, where the financial product set includes at least financial products of the financial institution.

For example, after the customer segmentation device obtains the revenue-related characteristics of the k customer groups, a corresponding marketing strategy (the financial product push scheme) may be pushed to each customer of the financial institution according to the revenue-related characteristics of that customer's group.

This scheme helps the financial institution gain a comprehensive understanding of its customers' characteristics and carry out personalized services and marketing targeted at different customer groups.

For example, the method provided by the embodiment of the present application belongs to the technical field of big data processing and relates in particular to a customer segmentation method and device based on the funHDDC clustering algorithm.

Customer segmentation is an important component of customer relationship management: an enterprise classifies customers according to their intrinsic and extrinsic attributes and their consumption behavior, and then provides targeted products and services to each class of customers. For financial institutions, it is also an important lever for increasing the competitiveness, profit margin, and market share of their small and medium-sized enterprise credit business. Combining the functional data perspective with cluster analysis can further mine information from traditional structured data, making it a powerful tool for customer segmentation.

For example, this embodiment segments financial institution customers based on the funHDDC clustering algorithm, analyzing multiple variables of each customer to identify high-quality customer groups and discover customer preferences in various respects. This helps customer managers market to and develop customers more precisely, saves labor, and improves service efficiency and quality.

The main steps of this embodiment may include:

Step 1: acquire an initial sample set composed of customer information;

Step 2: preprocess the initial sample set to obtain target samples;

Step 3: fit the target sample data with basis functions;

Step 4: project the fitted functional data into a low-dimensional subspace with the funHDDC algorithm for cluster analysis, obtaining a customer cluster analysis result and generating a number of customer segment types. Based on functional principal component analysis, the funHDDC algorithm performs dimension reduction and clustering of the functional principal component scores simultaneously.

1. Data acquisition

This embodiment may randomly extract, year by year, all transaction records and related information of a subset of customers from the financial institution's database and its CRM (Customer Relationship Management) system, using the customer ID (identifier) as the unique key. The customer's AUM (assets under management, here the monthly average of daily financial assets) is the main object of study, and the income the customer brings to the financial institution is the research target.

To achieve this research objective, this embodiment mainly mines and collects data along three dimensions: basic customer attribute information, customer value information, and customer transaction and account activity information. Table 1 lists the specific data indices of the original customer data in the financial institution.

TABLE 1

2. Data preprocessing

Data preprocessing is the most fundamental part of cluster analysis: the original data set is processed in several respects to provide a high-quality data set for subsequent analysis. Common data preprocessing methods include data cleaning, data transformation, and data normalization.

(1) Data cleansing

Because the collected original data may suffer from missing values, inconsistencies, noise, and similar problems that affect the customer segmentation model and the results of statistical inference, data cleaning is an indispensable step before the model is built.

Data cleaning mainly consists of outlier processing and missing value processing, and also covers duplicate data, noisy data, and the like in the original data. For outliers, percentile distributions and box plots are mainly used for detection; for the outlier points, an extreme value is specified for each variable according to the statistical characteristics of the variable index and its business meaning, and outliers beyond the extreme value range are converted into the specified extreme value. Missing values are handled according to their distribution: when the missing rate is high (> 99%), the index data are deleted directly; when the missing rate is not high, the missing data may be imputed by multiple imputation or left unprocessed (defaulting to null or 0).

(2) Data normalization

For example, after data cleaning, data normalization is required. Because the indices in the basic customer attribute information, customer value information, and customer transaction and account activity information have different units and scales, this embodiment normalizes the indices to remove the incomparability between variables that differing scales would otherwise cause.
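The original does not name the normalization formula; as one common choice, assumed here, min-max scaling maps each index to the interval [0, 1]. `min_max_normalize` is a hypothetical helper, and a z-score transform would serve the same purpose of removing unit differences.

```python
def min_max_normalize(columns):
    """Scale each index to [0, 1]; an index with a single constant value maps to zeros."""
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        normalized[name] = [0.0 if span == 0 else (v - lo) / span for v in values]
    return normalized
```

After scaling, an asset index measured in currency and a transaction count become directly comparable.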

3. Customer segmentation method based on the funHDDC clustering algorithm

Cluster analysis of customer information data in a financial institution is usually performed by first reducing the dimension and then clustering, for example with K-means. Because dimension reduction and clustering are then carried out as two separate steps, discriminative information may be lost. The funHDDC algorithm used in this embodiment instead applies functional principal component analysis and performs dimension reduction and clustering simultaneously while exploring the data set. Applied to the cluster analysis of financial institution customer data, it allows more information to be mined from the customer data.

(1) Functional data fitting

Functional data are smooth curves, surfaces, or hypersurfaces defined over an interval (e.g., a period of time or a range of temperatures); they are essentially infinite-dimensional but can only be observed in discrete, finite form.

Since this embodiment can only collect discrete, finite data, the discrete customer data must be fitted to function curves; the most common fitting method is basis function expansion.

(2) Functional data dimension reduction

Because the fitted functional data lie in an infinite-dimensional space, they must be reduced from infinite to finite dimension in preparation for the subsequent cluster analysis. The dimension reduction method used in this embodiment, functional principal component analysis, is described next.

In traditional data analysis, principal component analysis is mainly used to solve the dimension reduction problem for high-dimensional data. Functional principal component analysis generalizes multivariate principal component analysis, so the basic ideas of the two methods are the same and they differ only in form. A traditional principal component is a linear combination of the original variables; in functional principal component analysis, the discrete variable index is replaced by continuous time t and the linear combination is replaced by an inner product. That is, a functional principal component is expressed as:

$$f = \int_0^T \varphi(t)\,x(t)\,dt$$

where $\varphi(t)$ is a weight function and $f$ is called the principal component. Substituting a random sample $x_i(t)$ into the above formula gives:

$$f_i = \int_0^T \varphi(t)\,x_i(t)\,dt, \qquad i = 1, \dots, n$$

where $f_i$ is the principal component score of the i-th customer. Assuming the sample mean function is zero, the constrained optimization problem for the first principal component becomes:

$$\max_{\varphi_1}\ \frac{1}{n}\sum_{i=1}^{n} f_{i1}^{2} \quad \text{s.t.} \quad \int_0^T \varphi_1^{2}(t)\,dt = 1$$

where $f_{i1} = \int_0^T \varphi_1(t)\,x_i(t)\,dt$ is the first principal component score of the i-th customer. Similarly, the constrained optimization problem for the j-th principal component becomes:

$$\max_{\varphi_j}\ \frac{1}{n}\sum_{i=1}^{n} f_{ij}^{2} \quad \text{s.t.} \quad \int_0^T \varphi_j^{2}(t)\,dt = 1, \quad \int_0^T \varphi_l(t)\,\varphi_j(t)\,dt = 0,\ \ l = 1, \dots, j-1$$

where $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$, and $f_{ij} = \int_0^T \varphi_j(t)\,x_i(t)\,dt$ is the j-th principal component score of the i-th customer.
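Under the zero-mean assumption, the first functional principal component can be approximated numerically by sampling the curves on a common, equally spaced grid, replacing the integrals with Riemann sums, and running power iteration on the discretized covariance operator. This is an illustrative approximation, not the embodiment's implementation; the name `first_fpc` and the grid spacing parameter `dt` are assumptions.

```python
import math


def first_fpc(curves, dt, n_iter=200):
    """Approximate the first functional principal component of curves on a common grid.

    Integrals become Riemann sums with spacing dt, and the leading eigenfunction
    of the discretized covariance operator is found by power iteration.
    Returns (eigenvalue, sampled weight function phi_1, scores f_i1).
    """
    n, p = len(curves), len(curves[0])
    # Center the curves so the sample mean function is zero, as the derivation assumes.
    mean = [sum(c[k] for c in curves) / n for k in range(p)]
    centered = [[c[k] - mean[k] for k in range(p)] for c in curves]
    # Discretized covariance function C(s, t) = (1/n) * sum_i x_i(s) x_i(t).
    cov = [[sum(c[s] * c[t] for c in centered) / n for t in range(p)] for s in range(p)]
    phi = [1.0 + 0.01 * k for k in range(p)]  # arbitrary starting vector
    lam = 0.0
    for _ in range(n_iter):
        # Apply the integral operator: (C phi)(s) = sum_t C(s, t) phi(t) dt.
        new = [sum(cov[s][t] * phi[t] for t in range(p)) * dt for s in range(p)]
        norm = math.sqrt(sum(v * v for v in new) * dt)
        if norm == 0.0:
            break  # all curves identical: no variation to explain
        # Rescale so that the integral of phi^2 equals 1 (the constraint above).
        phi = [v / norm for v in new]
        lam = norm
    # Scores f_i1 = integral of phi_1(t) x_i(t) dt.
    scores = [sum(phi[k] * c[k] for k in range(p)) * dt for c in centered]
    return lam, phi, scores
```

Further components could be obtained the same way after deflating the covariance by the components already found, which enforces the orthogonality constraint.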

Functional principal component analysis thus represents the customer index data by a small number of principal component scores while retaining most of the information in the original customer data. Cluster analysis can then be performed on the customers' principal component scores with the funHDDC clustering algorithm.

(3) Clustering based on funHDDC algorithm

Consider n index curves $\{x_1, \dots, x_n\}$, where $x_i = \{x_i(t)\}_{t \in [0,T]}$ and $1 \le i \le n$, to be clustered into K groups. A distributional assumption is placed on the model coefficients $\gamma$, and the marginal distribution of the coefficients is then a Gaussian mixture model, namely:

$$p(\gamma) = \sum_{k=1}^{K} \pi_k\, \phi(\gamma;\, \mu_k, \Sigma_k)$$

where $\phi$ is a Gaussian density function with mean $\mu_k$ and covariance $\Sigma_k$ for cluster k, and $\pi_k = p(Z_k = 1)$ is the prior probability of cluster k.
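For intuition, the mixture density and the posterior probability $P(Z_k = 1 \mid \gamma)$ can be evaluated for a scalar coefficient. This one-dimensional sketch is a simplification assumed for illustration: funHDDC itself works with multivariate coefficients and cluster-specific low-dimensional subspaces. The function names are hypothetical.

```python
import math


def gaussian_density(x, mean, var):
    """Univariate Gaussian density phi(x; mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Marginal density p(x) = sum_k pi_k * phi(x; mu_k, sigma_k^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("prior probabilities pi_k must sum to 1")
    return sum(w * gaussian_density(x, m, v) for w, m, v in zip(weights, means, variances))


def posterior_probabilities(x, weights, means, variances):
    """P(Z_k = 1 | x) by Bayes' rule: pi_k * phi_k(x) / p(x)."""
    joint = [w * gaussian_density(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]
```

A point midway between two equally weighted, equally wide components gets posterior 0.5 for each; a point near one component is assigned almost entirely to it.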

The funHDDC algorithm relies mainly on the EM (expectation-maximization) algorithm to estimate the model parameters, as described next.

(I) Model parameter estimation

The EM (expectation-maximization) algorithm is a classical parameter estimation method that alternates between two steps: expectation (E) and maximization (M). In the E-step, the conditional posterior probability that $x_{ij}$ (the j-th index of the i-th customer) belongs to cluster k is computed; in the M-step, given these conditional posterior probabilities, the conditional expectation of the complete-data log-likelihood of the funHDDC model is maximized.
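The E-step/M-step alternation can be illustrated with a one-dimensional Gaussian mixture fitted to principal component scores. This is a simplified stand-in assumed for illustration: it is a plain Gaussian mixture, not the funHDDC model with its cluster-specific subspaces. The name `em_gmm_1d`, its deterministic initialization, and the variance floor are all assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(scores, k, n_iter=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to `scores` by EM."""
    n = len(scores)
    ordered = sorted(scores)
    # Deterministic initialization: means spread over the sorted scores.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(scores) / n
    var0 = max(sum((x - mu) ** 2 for x in scores) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that score i belongs to cluster j.
        resp = []
        for x in scores:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([q / total for q in joint] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means and variances from the posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, scores)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, scores)) / nj,
                var_floor,
            )
    # Log-likelihood at the final parameters (a tiny floor avoids log(0)).
    loglik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in scores
    )
    # Hard labels taken from the posteriors of the last E-step.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, loglik, labels
```

On two well-separated groups of scores the fitted means settle near the group centres, and the returned log-likelihood is the quantity a BIC comparison would use.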

(II) Selection of the number of clusters

This embodiment treats the selection of the number of clusters k as a model selection problem. In the context of mixture models, it selects the number of clusters with the BIC (Bayesian Information Criterion), calculated as follows:

$$\mathrm{BIC} = \hat{\ell} - \frac{m}{2}\log n$$

where $\hat{\ell}$ is the maximum log-likelihood value, m is the number of model parameters, and n is the number of individuals. The criterion penalizes the log-likelihood by the complexity of the model, and the model that maximizes the criterion is selected.
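BIC-based selection of k can be sketched as below, assuming the maximized log-likelihood for each candidate k is already available (for instance from an EM fit) and assuming a one-dimensional Gaussian mixture, whose parameter count is 3k − 1 (k − 1 free mixing weights, k means, k variances). `bic` and `select_k` are hypothetical names, and the log-likelihood values in the usage line are made-up inputs; the funHDDC parameter count differs and depends on its submodel.

```python
import math


def bic(loglik, n_params, n):
    """BIC = loglik - (n_params / 2) * log(n); larger is better under this sign convention."""
    return loglik - 0.5 * n_params * math.log(n)


def select_k(logliks, n):
    """Pick the k maximizing BIC for a 1-D Gaussian mixture with 3k - 1 free parameters."""
    table = {k: bic(ll, 3 * k - 1, n) for k, ll in logliks.items()}
    return max(table, key=table.get), table


# Hypothetical log-likelihoods for k = 1, 2, 3 on n = 100 customers.
best_k, table = select_k({1: -300.0, 2: -250.0, 3: -248.0}, n=100)
```

Here the small gain in log-likelihood from k = 2 to k = 3 is outweighed by the extra penalty, so k = 2 is chosen.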

Because the funHDDC model requires the number of clusters k to be chosen first, this embodiment can directly try different numbers of clusters and select k according to the BIC.

(4) Clustering result analysis

In this embodiment, basis function expansion and functional principal component analysis are used to fit and reduce the dimension of large volumes of multidimensional financial institution customer data, yielding target function curves that compress the data volume without changing the structure of the data's characteristics. The funHDDC clustering algorithm is then applied to these function curves: its parameters are estimated with the EM algorithm, the number of clusters k is selected with the BIC, and dimension reduction and clustering are performed simultaneously. This greatly reduces the arbitrariness of traditional clustering methods and increases the efficiency of customer segmentation.

The cluster analysis yields segmentation results for different customers with distinct characteristic differences. Customer managers can distinguish the segments by their attributes and adopt a different, targeted marketing strategy for each, which benefits follow-up marketing activities. The segmentation results also help the financial institution understand the consumption trends and behavior of each type of customer more clearly, so that it can identify high-quality customer groups with strong spending power and high expected revenue. This supports differentiated operations targeted at specific customer needs, saving cost, improving efficiency, and ultimately increasing the institution's profit.

4. Customer segmentation device based on the funHDDC clustering algorithm

To make the objectives, technical solutions, and effects of this embodiment clearer, the customer segmentation device is described in detail below. It mainly comprises a data processing module, a functional data fitting module, a functional data dimension reduction module, and a functional data clustering module. For example, Fig. 2 is a flowchart of an optional information classification method provided according to an embodiment of the present application. As shown in Fig. 2, the method includes the following steps: first, an initial sample set composed of customer information is obtained, and data cleaning and data normalization are performed on it to obtain target samples; data fitting is then performed on the target samples to generate functional data (function curves); next, the functional data are reduced in dimension based on functional principal component analysis to obtain principal component scores; finally, the principal component scores are clustered with the funHDDC algorithm to generate a number of customer segment types.

For example, Fig. 3 is a flowchart of the detailed steps of the data processing module in the embodiment of the present application. As shown in Fig. 3, this module obtains an initial sample set composed of customer information and performs data cleaning and data normalization on it to obtain target samples.

For example, because the target samples are discrete and finite, the discrete customer index data must be fitted to function curves to facilitate subsequent cluster analysis, so data fitting is performed on the target samples to generate functional data (function curves). Fig. 4 is a flowchart of the detailed steps of the functional data fitting module in the embodiment of the present application; as shown in Fig. 4, data fitting may be performed on the target samples to generate functional data (function curves).

For example, because the fitted functional data lie in an infinite-dimensional space, they must be reduced from infinite to finite dimension, so the functional data are reduced based on functional principal component analysis to obtain principal component scores. Fig. 5 is a flowchart of the detailed steps of the functional data dimension reduction module in the embodiment of the present application; as shown in Fig. 5, the functional data may be reduced based on functional principal component analysis to obtain principal component scores.

For example, Fig. 6 is a flowchart of the detailed steps of the funHDDC clustering module in the embodiment of the present application. As shown in Fig. 6, the principal component scores may be clustered based on the funHDDC algorithm to generate a number of customer segment types.

In summary, building on big data, basis function expansion, and functional principal component analysis, this embodiment takes as input the original data on basic customer attributes, customer value, and customer transactions and account activity, comprehensively evaluates customer assets, income, and related conditions with a specific clustering algorithm, forms a customer segmentation model, and designs a customer segmentation device. This effectively addresses the heavy workload of manually managing and maintaining customers and the difficulty of acquiring sufficient data, and achieves efficient customer segmentation.

Moreover, the method provided by the embodiment of the present application performs dimension reduction by functional principal component analysis and clustering of the functional principal component scores simultaneously, and estimates the model parameters with the EM algorithm. This handles the large data volume and noisy data of financial institution customers, avoids blind or subjective assignment of the number of clusters, and thereby achieves effective segmentation of financial institution customers. The customer segmentation device obtains the revenue-related characteristics of the k customer groups and pushes the corresponding marketing strategies. Because funHDDC cluster analysis captures the behavior of each customer subgroup, the financial institution can understand customer characteristics comprehensively and provide personalized services and marketing targeted at different customer groups.

In summary, according to the information classification method provided by the embodiment of the present application, N target objects to be classified are determined, where the target objects are objects that provide services for the financial institution and N is a positive integer greater than 1; target information of each target object in the financial institution within a preset time period is obtained to form M target information sets, where the target information includes at least one of the following: asset information of the target object in the financial institution and transaction information of transactions the target object conducts in the financial institution, and M is a positive integer greater than or equal to N; a function curve set is determined according to the M target information sets, where the function curve set includes at least M function curves, each corresponding to one target information set; and the function curve set is input into a target classification model for classification processing, which outputs the category information corresponding to each target object, where the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm. Because the target classification model performs dimension reduction and clustering together during cluster analysis of the information of multiple customers of the financial institution, this solves the problem in the related art that classifying multiple customers of a financial institution is inefficient, and improves classification efficiency.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

The embodiment of the present application also provides an information classification device, which can be used to execute the information classification method provided by the embodiment of the present application. The information classification device is described below.

Fig. 7 is a schematic diagram of an information classification device provided according to an embodiment of the present application. As shown in Fig. 7, the device includes a first determining unit 701, a first obtaining unit 702, a second determining unit 703, and a first processing unit 704.

Specifically, the first determining unit 701 is configured to determine N target objects to be classified, where the target objects are objects served by a financial institution, and N is a positive integer greater than 1;

the first obtaining unit 702 is configured to obtain target information of each target object at the financial institution within a preset time period to obtain M target information sets, where the target information includes at least one of the following: asset information of the target object at the financial institution and transaction information of transactions conducted by the target object at the financial institution, and M is a positive integer greater than or equal to N;

the second determining unit 703 is configured to determine a function curve set from the M target information sets, where the function curve set includes at least M function curves, each corresponding to one target information set;

the first processing unit 704 is configured to input the function curve set into a target classification model for classification processing and output the category information corresponding to each target object, where the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

In summary, in the information classification device provided in the embodiments of the present application, the first determining unit 701 determines N target objects to be classified, where the target objects are objects served by a financial institution and N is a positive integer greater than 1; the first obtaining unit 702 obtains target information of each target object at the financial institution within a preset time period to obtain M target information sets, where the target information includes at least one of asset information of the target object at the financial institution and transaction information of transactions conducted by the target object at the financial institution, and M is a positive integer greater than or equal to N; the second determining unit 703 determines from the M target information sets a function curve set including at least M function curves, each corresponding to one target information set; and the first processing unit 704 inputs the function curve set into the target classification model, which is constructed based on a dimension reduction algorithm and a clustering algorithm, for classification processing and outputs the category information corresponding to each target object. This solves the problem in the related art that classifying multiple clients of a financial institution is inefficient.

Optionally, in the information classification device provided in the embodiments of the present application, the target classification model includes at least a dimension reduction module determined based on the dimension reduction algorithm and a clustering processing module determined based on the clustering algorithm, and the first processing unit 704 includes: a first processing module, configured to input the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves, where S is a positive integer smaller than M; and a second processing module, configured to input the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object.
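
The two-stage structure described above can be pictured as a composition of two pluggable modules. The sketch below is a minimal illustration, not the applicant's implementation: the class `TargetClassificationModel` and its `reduce` and `cluster` fields are hypothetical names, and it assumes the dimension reduction module returns the kept curves together with their indices among the original M curves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

import numpy as np


@dataclass
class TargetClassificationModel:
    """Hypothetical two-stage model: dimension reduction, then clustering."""

    # (M, T) curves -> ((S, T) kept curves, (S,) indices of kept curves)
    reduce: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]
    # (S, T) kept curves -> (S,) integer category labels
    cluster: Callable[[np.ndarray], np.ndarray]

    def classify(self, curves: np.ndarray) -> Dict[int, int]:
        kept, kept_idx = self.reduce(curves)   # dimension reduction module
        labels = self.cluster(kept)            # clustering processing module
        # Category of each kept curve, keyed by its index in the original set.
        return {int(i): int(c) for i, c in zip(kept_idx, labels)}
```

Keying the output by curve index leaves the mapping from curves back to target objects (M may exceed N) to the caller, which is an assumption of this sketch.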

Optionally, in the information classification device provided in the embodiments of the present application, if the dimension reduction algorithm is a principal component analysis (PCA) algorithm, the first processing module includes: a first determining submodule, configured to determine the importance degree of each of the M function curves using the principal component analysis algorithm; a first screening submodule, configured to screen out, from the M function curves, the function curves whose importance degrees are higher than a preset importance degree; and a second determining submodule, configured to take the screened function curves as the S function curves.
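
The patent does not specify how the importance degree of a curve is computed from principal component analysis. One plausible reading, assumed here, samples every curve on a shared time grid, runs PCA on the resulting M × T matrix, and scores each curve by its contribution to the leading components, weighted by their explained-variance ratios. The function names and the `n_components` and `threshold` parameters are hypothetical.

```python
import numpy as np


def pca_curve_importance(curves: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Importance of each curve (row of an (M, T) array) under PCA.

    Assumed definition: sum over the leading components of
    (explained-variance ratio) * (share of that component's score variance
    contributed by the curve).
    """
    centered = curves - curves.mean(axis=0)        # centre each time point
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    explained = s**2 / np.sum(s**2)                # explained-variance ratios
    # Scores are u * s, so a curve's share of component j is u[i, j] ** 2
    # (each column of u has unit norm).
    contributions = u[:, :k] ** 2
    return contributions @ explained[:k]


def screen_curves(curves: np.ndarray, threshold: float, n_components: int = 2):
    """Keep the curves whose importance exceeds the preset threshold."""
    importance = pca_curve_importance(curves, n_components)
    keep = importance > threshold
    return curves[keep], np.flatnonzero(keep)
```

In this reading the importance degrees sum to at most 1, so the preset importance degree can be chosen as a fraction; a threshold low enough to keep every curve would violate S < M.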

Optionally, in the information classification device provided in the embodiments of the present application, the second processing module includes: a first adjustment submodule, configured to adjust the parameter information of the clustering processing module using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module; a first obtaining submodule, configured to obtain a preset cluster number, which is the number of categories produced by the clustering; a first analysis submodule, configured to perform cluster analysis on the S function curves based on the preset cluster number and the adjusted clustering processing module to obtain a clustering result of the S function curves; and a third determining submodule, configured to obtain the category information corresponding to each target object from the clustering result of the S function curves.
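
The EM step can be illustrated with a Gaussian mixture model, a common clustering module whose parameters are estimated by EM. The patent does not state which mixture it uses, so the diagonal-covariance Gaussian mixture, the farthest-point initialisation, and the function name `em_gmm_cluster` are assumptions. Each row of `X` is one of the S curves represented as a vector (for example sampled values or basis coefficients), and `n_clusters` plays the role of the preset cluster number.

```python
import numpy as np


def em_gmm_cluster(X, n_clusters, n_iter=200, tol=1e-8, seed=0):
    """Cluster the rows of X with a diagonal Gaussian mixture fitted by EM."""
    X = np.asarray(X, float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: each new mean is the row farthest
    # from the means chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        sq_dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(sq_dist.min(axis=1).argmax()))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_clusters, 1))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log p(x, component) for every row/component pair.
        diff = X[:, None, :] - means[None, :, :]                 # (n, K, d)
        log_pdf = -0.5 * ((diff**2 / variances).sum(axis=2)
                          + np.log(2 * np.pi * variances).sum(axis=1))
        log_joint = log_pdf + np.log(weights)
        peak = log_joint.max(axis=1, keepdims=True)              # log-sum-exp
        log_norm = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)                      # responsibilities
        ll = float(log_norm.sum())                               # log-likelihood
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X**2 / nk[:, None] - means**2, 0.0) + 1e-6
        if ll - prev_ll < tol:                                   # converged
            break
        prev_ll = ll
    # Labels come from the responsibilities of the last E-step.
    return resp.argmax(axis=1), means, weights
```

scikit-learn provides the same kind of model as `sklearn.mixture.GaussianMixture`; the hand-written version only makes the E-step and M-step explicit.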

Optionally, in the information classification device provided in the embodiments of the present application, the second determining unit 703 includes: a first obtaining module, configured to obtain a preset basis function, where the preset basis function includes at least one of the following: a Fourier basis, a B-spline basis, and a polynomial basis; and a first fitting module, configured to fit the M target information sets using the preset basis function to obtain the function curve set.
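
As an illustration of fitting with a preset basis function, the sketch below uses a Fourier basis and ordinary least squares, assuming the M information sets are sampled at common time points `t` and that the period is known (for example 12 for monthly data over a year). The helper names are hypothetical; a B-spline or polynomial basis would only change the design matrix.

```python
import numpy as np


def fourier_basis(t, n_harmonics, period):
    """Design matrix with columns [1, sin(wt), cos(wt), ..., sin(Kwt), cos(Kwt)]."""
    w = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(k * w * t), np.cos(k * w * t)]
    return np.column_stack(cols)


def fit_curves(t, values, n_harmonics, period):
    """Least-squares fit of each row of `values` (M, T) to a Fourier basis.

    Returns the (M, 2K + 1) coefficient matrix and a function that evaluates
    all M fitted curves at new time points.
    """
    t = np.asarray(t, float)
    values = np.atleast_2d(np.asarray(values, float))
    basis = fourier_basis(t, n_harmonics, period)              # (T, 2K + 1)
    coef = np.linalg.lstsq(basis, values.T, rcond=None)[0].T   # (M, 2K + 1)

    def evaluate(t_new):
        t_new = np.asarray(t_new, float)
        return (fourier_basis(t_new, n_harmonics, period) @ coef.T).T

    return coef, evaluate
```

The fit needs at least 2K + 1 time points for K harmonics, and the coefficient rows can also serve directly as the vector representation of each curve in later steps.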

Optionally, in the information classification device provided in the embodiments of the present application, the first obtaining unit 702 includes: a second obtaining module, configured to obtain target information of each target object at the financial institution within the preset time period to obtain M original information sets; a third processing module, configured to perform data cleaning on the information in the M original information sets to obtain M cleaned original information sets; and a fourth processing module, configured to normalize the M cleaned original information sets to obtain the M target information sets.
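
The normalization method is not specified; min-max scaling is assumed in the sketch below, and the function name is hypothetical. The `axis` argument reflects a design choice: `axis=1` scales each information set on its own, so clustering compares curve shapes, whereas the default global scaling preserves level differences, such as asset size, between target objects.

```python
import numpy as np


def min_max_normalize(sets, axis=None):
    """Min-max scale an (M, T) array to [0, 1].

    axis=None scales with the global minimum and maximum; axis=1 scales
    each row (one cleaned information set) independently.
    """
    x = np.asarray(sets, float)
    lo = x.min(axis=axis, keepdims=True)
    span = x.max(axis=axis, keepdims=True) - lo
    span[span == 0] = 1.0   # constant data maps to 0 instead of dividing by zero
    return (x - lo) / span
```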

Optionally, in the information classification device provided in the embodiments of the present application, the third processing module includes: a fourth determining submodule, configured to determine a target extremum based on the information in the M original information sets; a fifth determining submodule, configured to determine abnormal information in the M original information sets according to the target extremum, where the value corresponding to the abnormal information is greater than the target extremum; a first conversion submodule, configured to replace the value corresponding to the abnormal information with the target extremum to obtain converted abnormal information; and a sixth determining submodule, configured to obtain the M cleaned original information sets based on the converted abnormal information.
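
A minimal sketch of the extremum-based cleaning, assuming the target extremum is an upper percentile of all values in the M original information sets (the patent leaves the rule open). Values above it are replaced by the extremum, a technique usually called capping or winsorizing, and missing values (NaN) are left untouched for the missing-information step. The function name and the `percentile` parameter are hypothetical.

```python
import numpy as np


def cap_outliers(sets, percentile=99.0):
    """Replace values above an assumed percentile-based extremum with it.

    Returns the capped (M, T) array and the extremum that was used.
    """
    x = np.asarray(sets, float)
    extremum = float(np.nanpercentile(x, percentile))  # ignores NaN entries
    # np.minimum propagates NaN, so missing entries stay missing.
    return np.minimum(x, extremum), extremum
```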

Optionally, in the information classification device provided in the embodiments of the present application, the third processing module includes: a second obtaining submodule, configured to obtain missing information in the M original information sets; a first processing submodule, configured to determine the missing degree of the missing information and judge whether it is greater than a preset missing degree; a first deleting submodule, configured to, if the missing degree of the missing information is greater than the preset missing degree, delete the missing information from the M original information sets to obtain M deleted original information sets and take them as the M cleaned original information sets; and a second processing submodule, configured to, if the missing degree of the missing information is not greater than the preset missing degree, interpolate the missing information and obtain the M cleaned original information sets based on the interpolated information.
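
The missing-information branch can be sketched as follows, assuming the information sets form an M × T array with NaN for missing entries and that the missing degree is the fraction of sets missing a given time point. Because deleting a time point removes it from every set, M information sets remain, which matches the description. Linear interpolation and the names `handle_missing` and `max_missing_rate` are assumptions.

```python
import numpy as np


def handle_missing(sets, t, max_missing_rate=0.3):
    """Drop time points that are missing too often, interpolate the rest.

    sets: (M, T) array with NaN marking missing values; t: (T,) time points.
    Returns the cleaned (M, T') array and the retained time points.
    """
    x = np.asarray(sets, float)
    t = np.asarray(t, float)
    rate = np.isnan(x).mean(axis=0)        # missing degree of each time point
    keep = rate <= max_missing_rate
    x, t = x[:, keep], t[keep]             # boolean indexing returns a copy
    for row in x:                          # each row is a view into x
        gaps = np.isnan(row)
        if gaps.any() and not gaps.all():
            row[gaps] = np.interp(t[gaps], t[~gaps], row[~gaps])
    return x, t
```

`np.interp` holds the nearest observed value constant beyond the ends of a row, so leading or trailing gaps are filled by carrying values backward or forward rather than by true interpolation.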

Optionally, in the information classification device provided in the embodiments of the present application, the device further includes: a third determining unit, configured to determine a financial product pushing scheme according to the category information corresponding to each target object after the function curve set has been input into the target classification model for classification processing and the category information corresponding to each target object has been output; and a first pushing unit, configured to push financial products in a financial product set to each target object based on the financial product pushing scheme, where the financial product set includes at least the financial products of the financial institution.

The information classification device includes a processor and a memory. The first determining unit 701, the first obtaining unit 702, the second determining unit 703, the first processing unit 704, and the like are stored in the memory as program units, and the processor executes these program units to implement the corresponding functions.

The processor includes a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and the efficiency of classifying multiple clients is improved by adjusting the kernel parameters.

The memory may include, among the forms of computer-readable media, volatile memory such as random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.

An embodiment of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the information classification method.

An embodiment of the present invention provides a processor configured to run a program, where the program, when run, executes the information classification method.

As shown in Fig. 8, an embodiment of the present invention provides an electronic device including a processor, a memory, and a program stored in the memory and executable on the processor; when executing the program, the processor implements the following steps: determining N target objects to be classified, where the target objects are objects served by a financial institution and N is a positive integer greater than 1; obtaining target information of each target object at the financial institution within a preset time period to obtain M target information sets, where the target information includes at least one of the following: asset information of the target object at the financial institution and transaction information of transactions conducted by the target object at the financial institution, and M is a positive integer greater than or equal to N; determining a function curve set from the M target information sets, where the function curve set includes at least M function curves, each corresponding to one target information set; and inputting the function curve set into a target classification model for classification processing and outputting the category information corresponding to each target object, where the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

The processor further implements the following steps when executing the program: the target classification model includes at least a dimension reduction module determined based on the dimension reduction algorithm and a clustering processing module determined based on the clustering algorithm, and inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object includes: inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves, where S is a positive integer smaller than M; and inputting the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object.

The processor further implements the following steps when executing the program: if the dimension reduction algorithm is a principal component analysis algorithm, inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves includes: determining the importance degree of each of the M function curves using the principal component analysis algorithm; screening out, from the M function curves, the function curves whose importance degrees are higher than a preset importance degree; and taking the screened function curves as the S function curves.

The processor further implements the following steps when executing the program: inputting the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object includes: adjusting the parameter information of the clustering processing module using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module; obtaining a preset cluster number, which is the number of categories produced by the clustering; performing cluster analysis on the S function curves based on the preset cluster number and the adjusted clustering processing module to obtain a clustering result of the S function curves; and obtaining the category information corresponding to each target object from the clustering result of the S function curves.

The processor further implements the following steps when executing the program: determining the function curve set from the M target information sets includes: obtaining a preset basis function, where the preset basis function includes at least one of the following: a Fourier basis, a B-spline basis, and a polynomial basis; and fitting the M target information sets using the preset basis function to obtain the function curve set.

The processor further implements the following steps when executing the program: obtaining target information of each target object at the financial institution within a preset time period to obtain M target information sets includes: obtaining target information of each target object at the financial institution within the preset time period to obtain M original information sets; performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets; and normalizing the M cleaned original information sets to obtain the M target information sets.

The processor further implements the following steps when executing the program: performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets includes: determining a target extremum based on the information in the M original information sets; determining abnormal information in the M original information sets according to the target extremum, where the value corresponding to the abnormal information is greater than the target extremum; replacing the value corresponding to the abnormal information with the target extremum to obtain converted abnormal information; and obtaining the M cleaned original information sets based on the converted abnormal information.

The processor further implements the following steps when executing the program: performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets includes: obtaining missing information in the M original information sets; determining the missing degree of the missing information and judging whether it is greater than a preset missing degree; if the missing degree of the missing information is greater than the preset missing degree, deleting the missing information from the M original information sets to obtain M deleted original information sets and taking them as the M cleaned original information sets; and if the missing degree of the missing information is not greater than the preset missing degree, interpolating the missing information and obtaining the M cleaned original information sets based on the interpolated information.

The processor further implements the following steps when executing the program: after inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object, the method further includes: determining a financial product pushing scheme according to the category information corresponding to each target object; and pushing financial products in a financial product set to each target object based on the financial product pushing scheme, where the financial product set includes at least the financial products of the financial institution.

The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: determining N target objects to be classified, where the target objects are objects served by a financial institution and N is a positive integer greater than 1; obtaining target information of each target object at the financial institution within a preset time period to obtain M target information sets, where the target information includes at least one of the following: asset information of the target object at the financial institution and transaction information of transactions conducted by the target object at the financial institution, and M is a positive integer greater than or equal to N; determining a function curve set from the M target information sets, where the function curve set includes at least M function curves, each corresponding to one target information set; and inputting the function curve set into a target classification model for classification processing and outputting the category information corresponding to each target object, where the target classification model is constructed based on a dimension reduction algorithm and a clustering algorithm.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: the target classification model includes at least a dimension reduction module determined based on the dimension reduction algorithm and a clustering processing module determined based on the clustering algorithm, and inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object includes: inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves, where S is a positive integer smaller than M; and inputting the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: if the dimension reduction algorithm is a principal component analysis algorithm, inputting the M function curves in the function curve set into the dimension reduction module of the target classification model for dimension reduction processing to obtain S function curves includes: determining the importance degree of each of the M function curves using the principal component analysis algorithm; screening out, from the M function curves, the function curves whose importance degrees are higher than a preset importance degree; and taking the screened function curves as the S function curves.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: inputting the S function curves into the clustering processing module of the target classification model for clustering processing to obtain the category information corresponding to each target object includes: adjusting the parameter information of the clustering processing module using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module; obtaining a preset cluster number, which is the number of categories produced by the clustering; performing cluster analysis on the S function curves based on the preset cluster number and the adjusted clustering processing module to obtain a clustering result of the S function curves; and obtaining the category information corresponding to each target object from the clustering result of the S function curves.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: determining the function curve set from the M target information sets includes: obtaining a preset basis function, where the preset basis function includes at least one of the following: a Fourier basis, a B-spline basis, and a polynomial basis; and fitting the M target information sets using the preset basis function to obtain the function curve set.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: obtaining target information of each target object at the financial institution within a preset time period to obtain M target information sets includes: obtaining target information of each target object at the financial institution within the preset time period to obtain M original information sets; performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets; and normalizing the M cleaned original information sets to obtain the M target information sets.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets includes: determining a target extremum based on the information in the M original information sets; determining abnormal information in the M original information sets according to the target extremum, where the value corresponding to the abnormal information is greater than the target extremum; replacing the value corresponding to the abnormal information with the target extremum to obtain converted abnormal information; and obtaining the M cleaned original information sets based on the converted abnormal information.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: performing data cleaning on the information in the M original information sets to obtain M cleaned original information sets includes: obtaining missing information in the M original information sets; determining the missing degree of the missing information and judging whether it is greater than a preset missing degree; if the missing degree of the missing information is greater than the preset missing degree, deleting the missing information from the M original information sets to obtain M deleted original information sets and taking them as the M cleaned original information sets; and if the missing degree of the missing information is not greater than the preset missing degree, interpolating the missing information and obtaining the M cleaned original information sets based on the interpolated information.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: after inputting the function curve set into the target classification model for classification processing and outputting the category information corresponding to each target object, the method further includes: determining a financial product pushing scheme according to the category information corresponding to each target object; and pushing financial products in a financial product set to each target object based on the financial product pushing scheme, where the financial product set includes at least the financial products of the financial institution.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A method of classifying information, comprising:

determining N target objects to be classified, wherein the target objects are objects to which a financial institution provides services, and N is a positive integer greater than 1;

obtaining target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution and transaction information of transactions conducted by the target object in the financial institution, wherein M is a positive integer greater than or equal to N;

determining a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each function curve corresponding to one of the target information sets;

and inputting the function curve set into a target classification model for classification processing, and outputting class information corresponding to each target object, wherein the target classification model is a model constructed based on a dimension reduction algorithm and a clustering algorithm.

2. The method according to claim 1, wherein the target classification model comprises at least a dimension reduction module determined based on the dimension reduction algorithm and a clustering processing module determined based on the clustering algorithm, and inputting the function curve set into the target classification model for classification processing and outputting class information corresponding to each target object comprises:

inputting the M function curves in the function curve set into the dimension reduction module in the target classification model to perform dimension reduction processing to obtain S function curves, wherein S is a positive integer smaller than M;

and inputting the S function curves into the clustering processing module in the target classification model to perform clustering processing to obtain class information corresponding to each target object.

3. The method of claim 2, wherein, if the dimension reduction algorithm is a principal component analysis algorithm, inputting the M function curves in the function curve set into the dimension reduction module in the target classification model for dimension reduction processing to obtain the S function curves comprises:

determining, using the principal component analysis algorithm, the importance degree of each of the M function curves;

screening out, from the M function curves and according to the importance degree of each of them, the function curves whose importance degrees are higher than a preset importance degree;

and taking the screened-out function curves as the S function curves.
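Claim 3 does not state how principal component analysis yields an importance degree for each individual curve. The following is a minimal sketch under stated assumptions: each function curve is evaluated on a shared time grid and treated as one PCA variable, and its importance is taken as its squared loadings on the leading principal axes, weighted by each axis's explained-variance ratio. The function name `select_curves_by_pca`, the `n_components` parameter, and the use of `threshold` for the preset importance degree are illustrative assumptions, not the applicant's implementation.

```python
import numpy as np


def select_curves_by_pca(curve_values, n_components, threshold):
    """Return (indices of selected curves, importance of every curve).

    curve_values: (T, M) array; column j holds function curve j evaluated
    on a shared time grid of T points.
    Assumption: importance = explained-variance-weighted squared loadings
    on the top `n_components` principal axes.
    """
    centered = curve_values - curve_values.mean(axis=0)
    # SVD of the centred data: rows of vt are principal axes in curve space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # explained-variance ratio per axis
    k = min(n_components, vt.shape[0])
    # Importance of curve j: sum over top-k axes of ratio_k * loading_kj^2.
    importance = explained[:k] @ (vt[:k] ** 2)
    # Keep the curves whose importance exceeds the preset importance degree.
    return np.flatnonzero(importance > threshold), importance
```

With all components retained, this score reduces to each curve's share of the total variance, so the sketch mainly discards near-constant curves; a different importance definition would be equally consistent with the claim wording.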

4. The method of claim 2, wherein inputting the S function curves into the clustering processing module in the target classification model for clustering processing to obtain the class information corresponding to each target object comprises:

adjusting parameter information of the clustering processing module by using an expectation-maximization (EM) algorithm to obtain an adjusted clustering processing module;

obtaining a preset number of clusters, wherein the number of clusters is the number of categories obtained after the clustering processing;

performing cluster analysis on the S function curves based on the preset number of clusters and the adjusted clustering processing module to obtain a clustering result of the S function curves;

and obtaining the category information corresponding to each target object according to the clustering result of the S function curves.
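Claim 4 names the EM algorithm but not the clustering model whose parameters it adjusts. A minimal sketch, assuming a Gaussian mixture with diagonal covariances fitted by EM on a fixed-length representation of each curve (for example, its basis coefficients), with the preset number of clusters passed as `n_clusters`. The function name, the farthest-point initialisation, and the `n_iter`/`eps` parameters are illustrative choices, not taken from the source.

```python
import numpy as np


def em_gmm_cluster(features, n_clusters, n_iter=100, eps=1e-6):
    """Cluster the rows of `features` (one row per function curve).

    Assumption: the clustering model is a diagonal-covariance Gaussian
    mixture whose weights, means and variances are adjusted by EM.
    Returns one integer cluster label per row.
    """
    features = np.asarray(features, dtype=float)
    # Deterministic farthest-point initialisation of the component means.
    idx = [0]
    for _ in range(1, n_clusters):
        d2 = ((features[:, None, :] - features[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(d2.min(axis=1).argmax()))
    means = features[idx].copy()
    variances = np.tile(features.var(axis=0) + eps, (n_clusters, 1))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each curve.
        diff = features[:, None, :] - means[None, :, :]            # (n, k, d)
        log_p = -0.5 * ((diff ** 2 / variances).sum(axis=2)
                        + np.log(2 * np.pi * variances).sum(axis=1))
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)                  # avoid underflow
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = resp.T @ features / nk[:, None]
        diff = features[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    # Hard assignment from the last E-step: most responsible component wins.
    return resp.argmax(axis=1)
```

The returned label of each curve can then be mapped back to the target object whose information set produced that curve, giving the category information of the final step.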

5. The method of claim 1, wherein determining a function curve set according to the M target information sets comprises:

obtaining a preset basis function, wherein the preset basis function comprises at least one of the following: Fourier functions, B-spline functions, and polynomial functions;

and performing data fitting on the M target information sets based on the preset basis function to obtain the function curve set.
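As an illustration of the data fitting in claim 5, the sketch below fits each target information set, sampled at common times, onto a Fourier basis by ordinary least squares; the resulting coefficient vector defines one function curve. The choice of the Fourier option (rather than B-splines or polynomials), the least-squares criterion, and the names `fourier_basis` and `fit_function_curves` are assumptions made for this example.

```python
import numpy as np


def fourier_basis(t, n_harmonics, period):
    """Design matrix [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...], w = 2*pi/period."""
    cols = [np.ones_like(t, dtype=float)]
    for h in range(1, n_harmonics + 1):
        w = 2 * np.pi * h / period
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)


def fit_function_curves(t, series, n_harmonics, period):
    """Least-squares fit of each row of `series` (M x T) onto the Fourier basis.

    Assumption: the preset basis function is a truncated Fourier series.
    Returns an (M, 2 * n_harmonics + 1) matrix: one curve's coefficients per row.
    """
    basis = fourier_basis(np.asarray(t, dtype=float), n_harmonics, period)  # (T, 2H+1)
    coef, *_ = np.linalg.lstsq(basis, np.asarray(series, dtype=float).T, rcond=None)
    return coef.T
```

A fitted curve can be evaluated at arbitrary times with `fourier_basis(t_new, n_harmonics, period) @ coef[i]`, which gives every target object a smooth curve on a common footing even when the raw observations are discrete.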

6. The method of claim 1, wherein obtaining target information of each target object in the financial institution within a preset time period to obtain M target information sets comprises:

acquiring target information of each target object in the financial institution within a preset time period to obtain M original information sets;

performing data cleaning processing on the information in the M original information sets to obtain M cleaned original information sets;

and performing data normalization processing on the M cleaned original information sets to obtain the M target information sets.
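Claim 6 does not fix the normalization formula. A minimal sketch, assuming min-max scaling with a single minimum and maximum shared across all cleaned information sets; a shared scale keeps the relative asset or transaction magnitudes of different target objects comparable, which per-set scaling would erase. The function name `min_max_normalize` is hypothetical.

```python
import numpy as np


def min_max_normalize(cleaned_sets):
    """Scale all cleaned information sets onto [0, 1] with one shared min and max.

    Assumption: normalization is global min-max scaling across the M sets.
    """
    arrays = [np.asarray(s, dtype=float) for s in cleaned_sets]
    lo = min(a.min() for a in arrays)
    hi = max(a.max() for a in arrays)
    span = hi - lo
    if span == 0:                      # constant data: map everything to 0
        return [np.zeros_like(a) for a in arrays]
    return [(a - lo) / span for a in arrays]
```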

7. The method of claim 6, wherein performing data cleaning processing on the information in the M original information sets to obtain M cleaned original information sets comprises:

determining a target extremum based on information in the M sets of original information;

determining abnormal information in the M original information sets according to the target extremum, wherein the numerical value corresponding to the abnormal information is larger than the target extremum;

converting the numerical value corresponding to the abnormal information into the target extremum to obtain converted abnormal information;

and obtaining M cleaned original information sets based on the converted abnormal information.
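Claim 7 states only that the target extremum is determined from the information in the M original information sets. The sketch below assumes it is a high percentile of all pooled values and caps every value above it at that extremum (an upper-tail winsorization). The function name and the `percentile` parameter are assumptions; missing values (NaN) are ignored when computing the extremum and left untouched for the step in claim 8.

```python
import numpy as np


def cap_at_extremum(raw_sets, percentile=99.0):
    """Replace values above the target extremum with the extremum itself.

    Assumption: target extremum = given percentile of all pooled values.
    Returns (cleaned sets, extremum).
    """
    arrays = [np.asarray(s, dtype=float) for s in raw_sets]
    extremum = np.nanpercentile(np.concatenate(arrays), percentile)
    # np.minimum keeps NaN entries as NaN, so missing data survives unchanged.
    cleaned = [np.minimum(a, extremum) for a in arrays]
    return cleaned, extremum
```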

8. The method of claim 6, wherein performing data cleaning processing on the information in the M original information sets to obtain M cleaned original information sets comprises:

acquiring missing information in the M original information sets;

determining the missing degree of the missing information, and judging whether the missing degree of the missing information is greater than a preset missing degree;

if the missing degree of the missing information is greater than the preset missing degree, deleting the missing information from the M original information sets to obtain M deleted original information sets, and taking the M deleted original information sets as the M cleaned original information sets;

if the missing degree of the missing information is not greater than the preset missing degree, performing interpolation processing on the missing information to obtain interpolated missing information, and obtaining the M cleaned original information sets based on the interpolated missing information.
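A minimal sketch of the branch in claim 8, under assumptions not fixed by the claim: the missing degree is measured as the fraction of NaN entries in one object's series, a series above the preset missing degree (`max_missing_ratio`) is deleted, and otherwise the gaps are filled by linear interpolation over the time index. The dictionary keyed by object identifier and the function name `handle_missing` are hypothetical.

```python
import numpy as np


def handle_missing(raw_sets, max_missing_ratio=0.3):
    """raw_sets: {object_id: sequence of values with NaN for missing entries}.

    Assumption: missing degree = share of NaNs in a series. Series above the
    threshold are deleted; the rest are linearly interpolated.
    """
    cleaned = {}
    for key, series in raw_sets.items():
        y = np.asarray(series, dtype=float)
        missing = np.isnan(y)
        if missing.mean() > max_missing_ratio or missing.all():
            continue                      # missing degree too high: delete
        if missing.any():
            idx = np.arange(y.size)
            y = y.copy()
            # np.interp holds the nearest observed value beyond either end.
            y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
        cleaned[key] = y
    return cleaned
```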

9. The method according to claim 1, wherein after inputting the function curve set into a target classification model for classification processing and outputting class information corresponding to each target object, the method further comprises:

determining a financial product pushing scheme according to the category information corresponding to each target object;

pushing financial products in a financial product set to each target object based on the financial product pushing scheme, wherein the financial product set at least comprises financial products in the financial institution.
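Claim 9 leaves the form of the push scheme open. Purely as an illustration, the sketch below assumes the scheme is a mapping from class label to a list of product identifiers, and that only products actually contained in the financial product set are pushed. All names (`build_push_plan`, `categories`, `scheme`, `product_set`) and the product identifiers in any usage are hypothetical.

```python
def build_push_plan(categories, scheme, product_set):
    """Assemble the products to push to each target object.

    categories: {object_id: class label} output by the classification model.
    scheme: {class label: [product ids]}  (assumed form of the push scheme).
    product_set: product ids currently in the financial product set.
    """
    available = set(product_set)
    plan = {}
    for obj_id, label in categories.items():
        # Push only scheme products that exist in the product set; a label
        # with no scheme entry receives no products.
        plan[obj_id] = [p for p in scheme.get(label, []) if p in available]
    return plan
```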

10. An apparatus for classifying information, comprising:

a first determining unit, configured to determine N target objects to be classified, wherein the target objects are objects to which a financial institution provides services, and N is a positive integer greater than 1;

a first acquisition unit, configured to acquire target information of each target object in the financial institution within a preset time period to obtain M target information sets, wherein the target information comprises at least one of the following: asset information of the target object in the financial institution and transaction information of transactions conducted by the target object in the financial institution, wherein M is a positive integer greater than or equal to N;

a second determining unit, configured to determine a function curve set according to the M target information sets, wherein the function curve set comprises at least M function curves, each function curve corresponding to one of the target information sets;

a first processing unit, configured to input the function curve set into a target classification model for classification processing and output class information corresponding to each target object, wherein the target classification model is a model constructed based on a dimension reduction algorithm and a clustering algorithm.

11. A computer-readable storage medium storing a program, wherein the program, when executed, performs the method of classifying information according to any one of claims 1 to 9.

12. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of classifying information of any of claims 1 to 9.