CN120580052A - Data processing method, apparatus, medium and product

Info

Publication number: CN120580052A
Application number: CN202510710002.8A
Authority: CN
Inventors: 张园; 苏新锋; 孔亮; 张超莉
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2025-05-29
Filing date: 2025-05-29
Publication date: 2025-09-02

Abstract

Embodiments of the present invention disclose a data processing method, apparatus, medium, and product. The method includes: obtaining a first data set and a second data set, wherein the first data set includes multiple pieces of non-performing loan data, each piece including multiple loan data items, and the second data set includes multiple preset non-performing loan indicator items for characterizing non-performing loans; calculating the maximum information coefficient between each preset non-performing loan indicator item and each loan data item, and determining multiple first data items among the loan data items based on the maximum information coefficients; constructing a first matrix based on the multiple first data items, and calculating the eigenvectors of the first matrix; and calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, based on the vector distance, a second data item among the loan data items that is associated with the preset non-performing loan indicator item. The technical solution of the embodiments can accurately retain the data items associated with the non-performing loan indicators, thereby improving analysis efficiency and accuracy.

Description

Data processing method, apparatus, medium and product

Technical Field

Embodiments of the present invention relate to the field of computer technologies, and in particular, to a data processing method, apparatus, medium, and product.

Background

In the financial field, identifying bad loans is critical to risk control and credit policy development. At present, the industry generally analyzes bad loan data with traditional data analysis methods, which are easily limited by subjectivity, linearity assumptions, and the like. Because many factors influence whether a loan becomes a bad loan, such analysis tends to produce inaccurate results.

Disclosure of Invention

Embodiments of the present invention provide a data processing method, apparatus, medium, and product, which can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.

In a first aspect, an embodiment of the present invention provides a data processing method, including:

Acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;

Calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient;

Constructing a first matrix based on the plurality of first data items, and calculating eigenvectors of the first matrix;

Calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining a second data item which is associated with a preset bad loan index item in the loan data items according to the vector distance.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:

The data acquisition module is used for acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;

The maximum information number calculation module is used for calculating the maximum information coefficient between each preset bad loan index item and each loan data item respectively, and determining a plurality of first data items in the loan data items according to the maximum information coefficient;

The matrix construction module is used for constructing a first matrix based on the plurality of first data items and calculating the eigenvectors of the first matrix;

The data item determining module is used for calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining a second data item which is associated with the preset bad loan index item in the loan data items according to the vector distance.

In a third aspect, an embodiment of the present invention further provides a computer device, including:

One or more processors;

a memory for storing one or more programs;

when the one or more programs are executed by one or more processors, the one or more processors are caused to implement the data processing method as provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present invention.

In a fifth aspect, embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present invention.

The embodiments of the above invention have the following advantages or benefits:

A first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined in the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and a second data item associated with the preset bad loan index item is determined in the loan data items according to the vector distance. The technical scheme of the embodiments addresses the low efficiency and inaccurate results of current bad loan data analysis: it reduces the influence of human factors on the analysis results, reduces the influence of noise and outliers in the bad loan data, accurately retains the data items associated with bad loan indexes, and improves analysis efficiency and accuracy.

Drawings

FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of yet another data processing method provided by an embodiment of the present invention;

FIG. 3 is a flow chart of yet another data processing method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, where the embodiment is applicable to a scenario of performing data processing and analysis on poor loan data. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.

As shown in fig. 1, the data processing method of the present embodiment includes the steps of:

S110, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.

The first data set may be a bad loan data set, that is, bad-loan-related data collected from a financial institution. A bad loan is a loan that cannot be recovered as scheduled after the lending institution issues it, including overdue loans, sluggish loans, and bad debts. Bad loan data covers a variety of categories, such as loan type, purpose, amount, overdue status, regional distribution, and industry sector. Each piece of bad loan data may include a plurality of loan data items, for example basic loan information, borrower information, repayment records, loan status, risk classification, guarantee information, financial information, and credit scores.

The second data set may be a set of bad loan indexes, including a plurality of preset bad loan index items for characterizing bad loans. A preset bad loan index item may be the bad loan rate, bad loan balance, bad loan duration, bad loan provision coverage ratio, bad loan migration rate, or another related index.

In addition, after the first data set is acquired, the bad loan data may be preprocessed. For example, the collected bad loan data is standardized, including error removal, duplicate removal, missing-value filling, and the like, to ensure the accuracy and integrity of the data.
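The preprocessing described above could be sketched roughly as follows. This is only an illustration using the Python standard library: the field names (`loan_id`, `amount`, `overdue_days`), the rule that negative numeric values count as errors, and mean imputation for missing values are all assumptions, since the embodiment does not specify them.

```python
from statistics import mean

def preprocess(records, numeric_fields):
    """Clean raw loan records: drop invalid rows, remove duplicates, fill gaps."""
    # Error removal (assumed rule): drop records with a negative numeric value.
    valid = [
        r for r in records
        if all(r.get(f) is None or r[f] >= 0 for f in numeric_fields)
    ]
    # Duplicate removal: keep the first record seen for each loan_id.
    seen, unique = set(), []
    for r in valid:
        if r["loan_id"] not in seen:
            seen.add(r["loan_id"])
            unique.append(dict(r))
    # Missing-value filling (assumed rule): replace None with the field mean.
    for f in numeric_fields:
        observed = [r[f] for r in unique if r.get(f) is not None]
        fill = mean(observed) if observed else 0.0
        for r in unique:
            if r.get(f) is None:
                r[f] = fill
    return unique

# Hypothetical sample records, for illustration only.
raw = [
    {"loan_id": "A1", "amount": 100.0, "overdue_days": 30},
    {"loan_id": "A1", "amount": 100.0, "overdue_days": 30},  # duplicate
    {"loan_id": "A2", "amount": -5.0, "overdue_days": 10},   # invalid amount
    {"loan_id": "A3", "amount": None, "overdue_days": 90},   # missing amount
    {"loan_id": "A4", "amount": 300.0, "overdue_days": 60},
]
clean = preprocess(raw, ["amount", "overdue_days"])
```

In practice the error rules and the fill strategy would depend on each data item's meaning; the sketch only shows the order of the three operations.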

S120, calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient.

The maximum information coefficient (Maximal Information Coefficient, MIC) is a statistic that measures the degree of association between two variables and can capture both linear and nonlinear relationships. MIC has the advantages of generality and equitability: it captures many types of association, not just linear ones.

The core of the calculation is to quantify the mutual information (Mutual Information, MI) between variables using information entropy and to map the result to the interval [0, 1] through normalization, so that associations of different types and scales are measured uniformly. The closer the MIC is to 1, the stronger the relationship between the two variables, whether that relationship is linear, nonlinear, or periodic. Conversely, an MIC close to 0 means that the two variables are almost independent.

When calculating the maximum information coefficient between each preset bad loan index item and each loan data item, the mutual information (MI) is calculated first. Mutual information measures the information shared by two random variables and can be expressed as:

$$\mathrm{MI}(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

where p(x, y) is the joint probability distribution and p(x) and p(y) are the marginal probability distributions. X and Y are the two random variables: a preset bad loan index item and the corresponding loan data item. Next, grid partitioning is performed and the maximum mutual information is determined. The two-dimensional data (X, Y) is partitioned into an n × m grid, where n and m are the numbers of grid rows and columns and n × m ≤ B. B is a preset upper limit on the total number of grid cells, usually taken as B = N^β, where N is the sample size and β is a tuning parameter, usually 0.6. For all admissible partitioning schemes, the corresponding mutual information MI_{n,m}(X, Y) is calculated and the maximum is taken:

$$\max \mathrm{MI}(X,Y) = \max_{n,m:\; n \times m \le B} \mathrm{MI}_{n,m}(X,Y)$$

Finally, the MIC is obtained by normalization, dividing by the maximum possible mutual information (reached when X and Y are fully correlated), which for an n × m grid is log min(n, m):

$$\mathrm{MIC}(X,Y) = \max_{n,m:\; n \times m \le B} \frac{\mathrm{MI}_{n,m}(X,Y)}{\log \min(n,m)}$$

The final result lies in the range [0, 1].
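The grid search and normalization described above can be illustrated with a simplified sketch. It is an approximation under stated assumptions, not the full MIC algorithm: each grid uses equal-frequency bins instead of optimizing the cell boundaries, ties are split by position, and the function names are hypothetical. The budget B = N^β with β = 0.6 follows the text.

```python
import math
from collections import Counter

def equal_frequency_bins(values, bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels

def mutual_information(xs, ys):
    """MI in bits of two discrete label sequences, from empirical frequencies."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def approx_mic(x, y, beta=0.6):
    """Best normalized MI over equal-frequency grids with rows * cols <= N**beta."""
    budget = len(x) ** beta  # B = N^beta
    best = 0.0
    rows = 2
    while rows * 2 <= budget:
        cols = 2
        while rows * cols <= budget:
            mi = mutual_information(
                equal_frequency_bins(x, rows), equal_frequency_bins(y, cols)
            )
            # Normalize by the largest MI an n x m grid can carry.
            best = max(best, mi / math.log2(min(rows, cols)))
            cols += 1
        rows += 1
    return best

# Hypothetical data: a perfectly monotone relationship between two variables.
x = list(range(64))
linear_score = approx_mic(x, [2 * v + 1 for v in x])
```

Because the bin edges are fixed rather than optimized, this sketch can only underestimate the true MIC; a production system would more likely call an established MIC implementation.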

Assuming the first data set has 10 loan data items and the second data set has 3 preset bad loan index items, 10 × 3 = 30 maximum information coefficients are calculated. The MIC quantifies the nonlinear relationship between each loan data item and each preset bad loan index item, so the data variables relevant to each index can be accurately retained. Then, according to the values of the maximum information coefficients, the loan data items corresponding to the larger coefficients may be selected as the plurality of first data items. Alternatively, other filtering rules based on the maximum information coefficient may be used to determine, among all loan data items, the first data items that have an impact on the bad loan data. Each bad loan index item may correspond to a plurality of first data items.

S130, constructing a first matrix based on a plurality of first data items, and calculating eigenvectors of the first matrix.

Using each first data item as a matrix variable, the Pearson correlation coefficient may be calculated pairwise between the first data items to construct the first matrix, i.e. a correlation matrix. For example, if the first data items include three data items X, Y, and Z, the pairwise calculation yields the Pearson correlation coefficients for XX, XY, XZ, YX, YY, YZ, ZX, ZY, and ZZ, which form a 3 × 3 correlation matrix. The eigenvectors and eigenvalues of the correlation matrix are then calculated.
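A minimal sketch of this step, using only the standard library, is given below. The sample values for X, Y, and Z are hypothetical. The text does not say which eigenvector is used, so the sketch assumes the dominant one (largest eigenvalue) and finds it by power iteration, which in turn assumes the all-ones start vector is not orthogonal to it.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients: entry [i][j] compares columns i and j."""
    return [[pearson(a, b) for b in columns] for a in columns]

def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    size = len(matrix)
    vec = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in nxt))
        vec = [c / norm for c in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
        for i in range(size)
    )
    return value, vec

# Hypothetical observations of three first data items.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
Z = [5.0, 3.0, 4.0, 1.0, 2.0]
R = correlation_matrix([X, Y, Z])
eigenvalue, eigenvector = dominant_eigenpair(R)
```

With a numerical library available, a symmetric eigendecomposition routine would return all eigenvectors at once; power iteration is used here only to keep the sketch dependency-free.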

S140, calculating the vector distance between the vector corresponding to each first data item in the first matrix and the feature vector of the first matrix, and determining a second data item which is related to the preset bad loan index item in the loan data item according to the vector distance.

The vector distance between the vector corresponding to each first data item and the eigenvector of the first matrix may be characterized by the standard deviation and variance between the two vectors.

The smaller the standard deviation and variance between the vector corresponding to a first data item and the eigenvector, the closer the two vectors are, which means that the first data item is more strongly associated with the corresponding bad loan index item.

In addition, a weighted sum of the vector corresponding to each first data item, the corresponding variance, and the corresponding distance may be calculated to obtain an influence factor of that variable on the index. Sorting the influence factors gives the order in which the first data items influence the corresponding bad loan index item, and thus the characteristics of the index data.
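One possible reading of this step is sketched below. The text does not define the statistics precisely, so the sketch assumes that the "standard deviation and variance between" two vectors means the population standard deviation and variance of their element-wise differences, and that items are ranked by ascending standard deviation. The matrix and eigenvector values are illustrative placeholders rather than computed results, and the function names are hypothetical.

```python
import math
from statistics import pstdev, pvariance

def distance_profile(row, eigenvector):
    """Distance statistics between one matrix row and the eigenvector."""
    diffs = [r - e for r, e in zip(row, eigenvector)]
    return {
        "distance": math.sqrt(sum(d * d for d in diffs)),  # Euclidean distance
        "variance": pvariance(diffs),  # assumed: variance of the differences
        "std": pstdev(diffs),          # assumed: std of the differences
    }

def rank_items(names, matrix, eigenvector):
    """Order data items from closest to farthest from the eigenvector."""
    profiles = {
        name: distance_profile(row, eigenvector)
        for name, row in zip(names, matrix)
    }
    ranking = sorted(names, key=lambda name: profiles[name]["std"])
    return ranking, profiles

# Illustrative placeholder values (not derived from real loan data).
names = ["X", "Y", "Z"]
matrix = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
eigenvector = [0.68, 0.70, 0.22]
ranking, profiles = rank_items(names, matrix, eigenvector)
```

Note that an eigenvector is only defined up to sign, so any real implementation would need a convention for orienting it (for example, making the sum of its components positive) before measuring distances.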

According to the technical scheme of this embodiment, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined in the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and a second data item associated with the preset bad loan index item is determined in the loan data items according to the vector distance. The technical scheme of this embodiment addresses the low efficiency and inaccurate results of current bad loan data analysis: it reduces the influence of human factors on the analysis results, reduces the influence of noise and outliers in the bad loan data, accurately retains the data items associated with bad loan indexes, and improves analysis efficiency and accuracy.

Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention, where the data processing method in this embodiment and the data processing method in the foregoing embodiments belong to the same inventive concept, and further describes a process of calculating a maximum information coefficient. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.

As shown in fig. 2, the data processing method of the present embodiment includes the steps of:

S210, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.

S220, calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient.

The maximum information coefficient between each preset bad loan index item and each loan data item may be calculated by a function call: a maximum information coefficient calculation function is called with the preset bad loan index item and the corresponding loan data item as its input data, and returns the corresponding maximum information coefficient.

Further, the plurality of first data items may be determined in either of two ways. In the first, the maximum information coefficients are sorted from largest to smallest and the loan data items corresponding to the first preset number of largest coefficients are selected. In the second, for each preset bad loan index item, the maximum information coefficients corresponding to that index item are sorted from largest to smallest and the loan data items corresponding to the second preset number of largest coefficients are selected.
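The two selection rules can be sketched as follows. The dictionary layout (keys of the form `(index item, loan data item)`), the item names, and the coefficient values are hypothetical and serve only to show how the rules differ.

```python
def top_k_overall(mic, k):
    """Rule 1: take the k largest coefficients overall and keep their data items."""
    ranked = sorted(mic.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for (_indicator, item), _value in ranked[:k]:
        if item not in selected:  # one item may rank high for several indexes
            selected.append(item)
    return selected

def top_k_per_indicator(mic, k):
    """Rule 2: for each index item, keep the items with its k largest coefficients."""
    by_indicator = {}
    for (indicator, item), value in mic.items():
        by_indicator.setdefault(indicator, []).append((value, item))
    return {
        indicator: [item for _value, item in sorted(pairs, reverse=True)[:k]]
        for indicator, pairs in by_indicator.items()
    }

# Hypothetical coefficients for 2 index items x 3 loan data items.
mic = {
    ("npl_rate", "loan_amount"): 0.82,
    ("npl_rate", "credit_score"): 0.64,
    ("npl_rate", "region"): 0.15,
    ("npl_balance", "loan_amount"): 0.91,
    ("npl_balance", "credit_score"): 0.33,
    ("npl_balance", "region"): 0.47,
}
overall = top_k_overall(mic, 3)
per_indicator = top_k_per_indicator(mic, 2)
```

The first rule can leave an index item with no selected data items if its coefficients are all comparatively small, whereas the second guarantees the same number of items per index item.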

S230, constructing a first matrix based on the plurality of first data items, and calculating eigenvectors of the first matrix.

S240, calculating the vector distance between the vector corresponding to each first data item in the first matrix and the feature vector of the first matrix, and determining a second data item which is related to the preset bad loan index item in the loan data item according to the vector distance.

S250, adjusting, according to a preset bad loan index item adjustment rule and based on the second data item and the corresponding vector distance, the corresponding preset bad loan index item or the index reference value corresponding to the second data item.

The preset bad loan index item adjustment rule may be a preset policy for adjusting risk early warning. According to the second data item and the data characteristics of the corresponding vector distance, a risk management strategy may be adjusted, for example by adjusting interest rates, controlling loan amounts, or adjusting loan terms. Alternatively, the index reference value corresponding to the preset bad loan index item may be adjusted.

For a new loan, the financial institution can then better identify whether it is likely to become a bad loan and warn the account manager in time, which improves credit management and asset quality, reduces credit fund risk, and improves the efficiency with which credit funds are used.

According to the technical scheme of this embodiment, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined in the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and a second data item associated with the preset bad loan index item is determined in the loan data items according to the vector distance. The index reference value of the corresponding preset bad loan index item is then adjusted according to the preset bad loan index item adjustment rule, the second data item, and the corresponding vector distance. The technical scheme of this embodiment addresses the low efficiency and inaccurate results of current bad loan data analysis: it reduces the influence of human factors on the analysis results, reduces the influence of noise and outliers in the bad loan data, accurately retains the data items associated with bad loan indexes, and improves analysis efficiency and accuracy.

Fig. 3 is a flowchart of a data processing method according to an embodiment of the present invention, where the data processing method in this embodiment and the data processing method in the foregoing embodiments belong to the same inventive concept, and further describes a process of constructing a correlation matrix and performing analysis. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.

As shown in fig. 3, the data processing method of the present embodiment includes the steps of:

S310, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.

S320, constructing a two-dimensional data variable data set of each preset bad loan index item and corresponding to each loan data item based on the first data set and the second data set.

Each bad loan index item in the second data set may be paired with each loan data item in the first data set to construct a two-dimensional variable data set. The second data set contains a plurality of bad loan indexes, such as the bad loan rate, bad loan balance, and bad loan duration. One index, for example the bad loan balance, is selected from the second data set as the index variable and paired with a loan data item in the first data set, for example the total loan amount, to construct a two-dimensional variable data set. The other loan data variables are paired with the selected index in the same way, and the resulting two-dimensional variable data sets are denoted D1, D2, D3, ..., Dn.
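As a rough illustration of this pairing, the sketch below builds one two-dimensional variable data set per loan data item for a single selected index. It assumes that the index values and the loan data item values are aligned observation by observation (for example per reporting period), which the text does not state; all names and numbers are hypothetical.

```python
def build_two_dimensional_sets(indicator_values, loan_items):
    """Pair one index's observations with each loan data item's observations."""
    datasets = {}
    for name, values in loan_items.items():
        if len(values) != len(indicator_values):
            raise ValueError(f"{name}: expected {len(indicator_values)} observations")
        # Each data set is a list of (index value, loan data item value) points.
        datasets[name] = list(zip(indicator_values, values))
    return datasets

# Hypothetical aligned observations.
npl_balance = [12.0, 15.5, 9.8, 20.1]
loan_items = {
    "total_loan_amount": [100.0, 140.0, 90.0, 180.0],
    "credit_score": [640.0, 610.0, 700.0, 580.0],
}
datasets = build_two_dimensional_sets(npl_balance, loan_items)
D1, D2 = datasets["total_loan_amount"], datasets["credit_score"]
```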

S330, calculating the maximum information coefficient corresponding to each two-dimensional data variable data set.

Each two-dimensional variable data set D1, D2, D3, ..., Dn is input into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.

The calculation performed by the preset maximum information coefficient calculation component may follow the formula

$$\mathrm{MIC}(D) = \max_{k \times m < B(n)} \{ M(D)_{k,m} \}, \qquad M(D)_{k,m} = \frac{\mathrm{MI}(D, k, m)}{\log \min(k, m)}$$

where MI(D, k, m) is the mutual information value of the two-dimensional variable data set D partitioned by an integer grid (k, m), with k and m the numbers of grid rows and columns, and M(D)_{k,m} is that value normalized to [0, 1]. The values of k and m at which the maximum is reached are found by exhaustive search, subject to k × m < B(n), where B is a function of the data capacity n of the first data set.

S340, calculating correlation coefficients among the plurality of first data items, constructing a correlation matrix based on the correlation coefficients, and calculating eigenvectors of the correlation matrix.

The correlation coefficient may be the Pearson correlation coefficient, a statistical index that measures the strength and direction of the linear relationship between two variables. Its value lies between -1 and 1: 1 represents a perfect positive linear correlation, -1 a perfect negative linear correlation, and 0 the absence of a linear correlation (which does not by itself imply independence). Because the Pearson correlation coefficient is calculated from the means and standard deviations of the two variables, attention should be paid to data normalization when applying it.
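For reference, the standard sample form of the Pearson correlation coefficient is written out below. The symbols are introduced here for illustration and do not appear in the original text: x_i and y_i are the n paired observations of two first data items, and the barred symbols are their means.

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Entry (i, j) of the correlation matrix is this coefficient evaluated for the i-th and j-th first data items, so the matrix is symmetric with ones on its diagonal.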

Correlation matrices are a widely used mathematical tool in statistics and data analysis to represent correlations between variables in a dataset and can be used to analyze a multi-variable dataset. By analyzing the correlation matrix, the correlation between the variables can be revealed, so that the degree and direction of the correlation between the variables can be known, and the correlation matrix can be used for identifying the most important variables so as to select the variables or reduce the dimension. Correlation matrices are a very useful statistical tool that can help us understand the underlying structure and associations in complex data.

S350, calculating the standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix, and determining, according to the standard deviation and variance, the second data item associated with the preset bad loan index item in the loan data items.

According to the technical scheme of this embodiment, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans. A two-dimensional data variable data set pairing each preset bad loan index item with each loan data item is built based on the first data set and the second data set, and the maximum information coefficient corresponding to each two-dimensional data variable data set is calculated. The correlation coefficients among the plurality of first data items are calculated, a correlation matrix is built based on the correlation coefficients, and the eigenvectors of the correlation matrix are calculated. The standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix are calculated, and the second data item associated with the preset bad loan index item is determined in the loan data items according to the standard deviation and variance. The technical scheme of this embodiment addresses the low efficiency and inaccurate results of current bad loan data analysis: it reduces the influence of human factors on the analysis results, reduces the influence of noise and outliers in the bad loan data, accurately retains the data items associated with bad loan indexes, and improves analysis efficiency and accuracy.

Fig. 4 is a schematic structural diagram of a data processing device according to an embodiment of the present invention, where the embodiment is applicable to a scenario of performing data processing on bad loan data. The data processing device can be realized by means of software and/or hardware, and is integrated in a computer terminal device with application development functions.

As shown in fig. 4, the data processing apparatus includes a data acquisition module 410, a maximum information number calculation module 420, a matrix construction module 430, and a data item determination module 440.

The data acquisition module 410 is configured to obtain a first data set and a second data set, where the first data set includes a plurality of bad loan data, each bad loan data includes a plurality of loan data items, and the second data set includes a plurality of preset bad loan index items for characterizing a bad loan. The maximum information number calculation module 420 is configured to calculate a maximum information coefficient between each preset bad loan index item and each loan data item, and to determine a plurality of first data items in the loan data items according to the maximum information coefficient. The matrix construction module 430 is configured to construct a first matrix based on the plurality of first data items and to calculate an eigenvector of the first matrix. The data item determination module 440 is configured to calculate a vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and to determine a second data item associated with the preset bad loan index item in the loan data items according to the vector distance.
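The way the four modules hand data to one another could be modeled as below. This is a structural sketch only: the class name, field names, and call signatures are hypothetical, and the stand-in lambdas return fixed illustrative values instead of performing the real calculations.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataProcessingApparatus:
    acquire: Callable          # data acquisition module 410
    select_by_mic: Callable    # maximum information number calculation module 420
    build_matrix: Callable     # matrix construction module 430
    determine_items: Callable  # data item determination module 440

    def run(self):
        """Chain the modules in the order described for the apparatus."""
        first_set, second_set = self.acquire()
        first_items = self.select_by_mic(first_set, second_set)
        matrix, eigenvector = self.build_matrix(first_items)
        return self.determine_items(matrix, eigenvector, first_items)

# Stand-in modules with fixed, illustrative outputs.
apparatus = DataProcessingApparatus(
    acquire=lambda: ({"a": [1, 2], "b": [3, 4]}, {"npl_rate": [0.1, 0.2]}),
    select_by_mic=lambda first, second: ["a", "b"],
    build_matrix=lambda items: ([[1.0, 0.5], [0.5, 1.0]], [0.7, 0.7]),
    determine_items=lambda matrix, vec, items: items[:1],
)
result = apparatus.run()
```

Keeping each module behind a plain callable makes it straightforward to swap in software or hardware implementations, as the embodiment allows.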

According to the technical scheme, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, the second data set comprises a plurality of preset bad loan index items used for representing bad loans, the maximum information coefficient between each preset bad loan index item and each loan data item is calculated, a plurality of first data items are determined in the loan data items according to the maximum information coefficient, a first matrix is built based on the plurality of first data items, the feature vector of the first matrix is calculated, the vector distance between the vector corresponding to each first data item in the first matrix and the feature vector of the first matrix is calculated, and the second data item related to the preset bad loan index item in the loan data items is determined according to the vector distance. The technical scheme of the embodiment of the invention solves the problems of low analysis efficiency and inaccurate result of the bad loan data at present, can reduce the influence of human factors on the result in the analysis of the bad loan data, reduces the influence of noise and abnormal values in the bad loan data on the result, accurately reserves data items associated with bad loan indexes, and improves the analysis efficiency and accuracy.

In an alternative embodiment, the data processing apparatus further comprises an index data adjustment module for:

adjusting, according to a preset bad loan index item adjustment rule and based on the second data item and the corresponding vector distance, the index reference value corresponding to the relevant preset bad loan index item.

In an alternative embodiment, the maximum information coefficient calculating module 420 is specifically configured to:

construct, based on the first data set and the second data set, a two-dimensional data variable data set pairing each preset bad loan index item with each loan data item; and

calculate the maximum information coefficient corresponding to each two-dimensional data variable data set,

where each two-dimensional data variable data set may be input into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.
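As an illustration of what a "maximum information coefficient calculation component" might compute, the following is a minimal sketch in Python. It is a simplified approximation, not the component used in the embodiment: it fixes equal-frequency bins instead of searching for the optimal grid as the full MIC algorithm does, so it can only underestimate the true coefficient. The function names and the `alpha=0.6` grid budget exponent are assumptions made for this example.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Map each value to a bin index by rank; tied values share a bin."""
    first_rank = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)
    n = len(values)
    return [min(first_rank[v] * n_bins // n, n_bins - 1) for v in values]


def mutual_information(xb, yb):
    """Mutual information (in nats) of two equally long bin-index lists."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC estimate for one two-dimensional data variable data set.

    Tries every grid nx * ny <= n**alpha and keeps the largest mutual
    information normalised by log(min(nx, ny)). Assumption: equal-frequency
    bins are used instead of an optimised grid.
    """
    n = len(x)
    budget = max(4, int(n ** alpha))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        xb = equal_frequency_bins(x, nx)
        for ny in range(2, budget // nx + 1):
            yb = equal_frequency_bins(y, ny)
            best = max(best, mutual_information(xb, yb) / math.log(min(nx, ny)))
    return best


# Hypothetical data: one loan data item and one bad loan index item.
loan_item = list(range(64))
index_item = [2 * v + 1 for v in loan_item]
score = approx_mic(loan_item, index_item)
```

A value near 1 indicates a strong (not necessarily linear) dependence between the pair, and a value near 0 indicates little dependence. In practice an existing MIC library would likely replace this sketch.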

In an alternative embodiment, the maximum information coefficient calculating module 420 may also be configured to:

select, in descending order of the maximum information coefficient values, the loan data items corresponding to a first preset number of maximum information coefficients to obtain the plurality of first data items; or

select, among the plurality of maximum information coefficients corresponding to each preset bad loan index item respectively, the loan data items corresponding to a second preset number of maximum information coefficients in descending order to obtain the plurality of first data items.
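The two selection strategies can be sketched as follows. This is an illustrative example only: the dictionary layout keyed by `(index_item, loan_item)`, the function names, and the choice to rank each loan data item by its best coefficient in the global strategy are assumptions, since the text does not specify how duplicates across index items are handled.

```python
def select_global_top(mic, first_preset_number):
    """Rank loan data items by their best MIC over all index items (assumed rule)."""
    best_per_item = {}
    for (_, item), value in mic.items():
        best_per_item[item] = max(value, best_per_item.get(item, 0.0))
    ranked = sorted(best_per_item, key=lambda it: best_per_item[it], reverse=True)
    return ranked[:first_preset_number]


def select_top_per_index(mic, second_preset_number):
    """Take the top loan data items for each index item, then merge without repeats."""
    by_index = {}
    for (index_item, item), value in mic.items():
        by_index.setdefault(index_item, []).append((value, item))
    selected = []
    for pairs in by_index.values():
        for _, item in sorted(pairs, reverse=True)[:second_preset_number]:
            if item not in selected:
                selected.append(item)
    return selected


# Hypothetical MIC values for two index items and three loan data items.
mic = {
    ("overdue_days", "income"): 0.8,
    ("overdue_days", "age"): 0.2,
    ("overdue_days", "balance"): 0.5,
    ("default_flag", "income"): 0.3,
    ("default_flag", "age"): 0.6,
    ("default_flag", "balance"): 0.1,
}
first_items_global = select_global_top(mic, 2)
first_items_per_index = select_top_per_index(mic, 2)
```

The per-index strategy guarantees that every preset bad loan index item contributes candidates, whereas the global strategy may be dominated by a single index item with uniformly high coefficients.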

In an alternative embodiment, the matrix construction module 430 is specifically configured to:

calculating correlation coefficients among a plurality of first data items, and constructing a correlation matrix based on the correlation coefficients;

and calculating the eigenvectors of the correlation matrix.
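A minimal sketch of this step is given below, assuming the Pearson correlation coefficient and computing only the dominant eigenvector by power iteration. Both choices are assumptions for illustration: the text does not name the correlation coefficient or say which eigenvector is used, and a numerical library would normally perform the eigendecomposition.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long columns (fails on constant columns)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Build the square matrix of pairwise correlations between first data items."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration; returns (eigenvalue, unit eigenvector).

    The non-uniform start vector is an arbitrary choice; it must not be
    orthogonal to the dominant eigenvector.
    """
    size = len(matrix)
    v = [1.0 / (i + 1) for i in range(size)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(vi * mvi for vi, mvi in zip(v, mv))
    return eigenvalue, v


# Hypothetical values of three first data items over four bad loan records.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
corr = correlation_matrix(columns)
eigenvalue, eigenvector = dominant_eigenvector(corr)
```

Each row of `corr` is the vector corresponding to one first data item, which is what the subsequent distance calculation compares against the eigenvector.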

In an alternative embodiment, the data item determination module 440 is specifically configured to:

calculating the standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix.
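One possible reading of this distance calculation is sketched below: for each first data item, the element-wise differences between its row in the correlation matrix and the eigenvector are summarised by their population variance and standard deviation, with the Euclidean distance added for comparison. The exact metric, the function names, and the ascending ranking are assumptions; the text does not state whether a smaller or larger distance marks a data item as associated.

```python
import math
import statistics


def row_distances(matrix, eigenvector, names):
    """Compare each row of the matrix with the eigenvector, element by element."""
    result = {}
    for name, row in zip(names, matrix):
        diffs = [r - e for r, e in zip(row, eigenvector)]
        result[name] = {
            "euclidean": math.sqrt(sum(d * d for d in diffs)),
            "variance": statistics.pvariance(diffs),
            "std": statistics.pstdev(diffs),
        }
    return result


def rank_by_distance(distances, key="euclidean"):
    """Order data item names by the chosen distance, smallest first (assumed rule)."""
    return sorted(distances, key=lambda name: distances[name][key])


# Hypothetical 2 x 2 matrix and eigenvector for two first data items.
distances = row_distances([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], ["a", "b"])
ordering = rank_by_distance(distances)
```

The second data items could then be taken from either end of `ordering`, or by a threshold on the distance, depending on the rule adopted in a concrete implementation.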

The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing power, such as an intelligent controller, a server, a mobile phone, and the like.

As shown in FIG. 5, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including system memory 28 and processing units 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, artificial intelligence systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a data processing method provided by the present embodiment, the method including:

The embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a data processing method as provided by any embodiment of the present invention, the method comprising:

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, Python, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present application.

In implementing the computer program product, the computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, Python, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A data processing method, comprising:

obtaining a first data set and a second data set, wherein the first data set includes a plurality of pieces of non-performing loan data, each of which includes a plurality of loan data items, and the second data set includes a plurality of preset non-performing loan indicator items for characterizing non-performing loans;

calculating a maximum information coefficient between each of the preset non-performing loan indicator items and each of the loan data items, and determining a plurality of first data items among the loan data items according to the maximum information coefficients;

constructing a first matrix based on the plurality of first data items, and calculating an eigenvector of the first matrix; and

calculating a vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, based on the vector distance, a second data item among the loan data items that is associated with the preset non-performing loan indicator item.

2. The method according to claim 1, further comprising:

adjusting, according to a preset non-performing loan indicator item adjustment rule and based on the second data item and the corresponding vector distance, the corresponding preset non-performing loan indicator item or the indicator reference value corresponding to the second data item.

3. The method according to claim 1 or 2, wherein the step of calculating the maximum information coefficient between each of the preset non-performing loan indicator items and each of the loan data items comprises:

constructing, based on the first data set and the second data set, a two-dimensional data variable data set pairing each of the preset non-performing loan indicator items with each of the loan data items; and

calculating the maximum information coefficient corresponding to each of the two-dimensional data variable data sets.

4. The method according to claim 3, wherein calculating the maximum information coefficient corresponding to each of the two-dimensional data variable data sets comprises:

inputting each of the two-dimensional data variable data sets into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.

5. The method according to claim 1, wherein determining a plurality of first data items in the loan data items according to the maximum information coefficient comprises:

selecting, in descending order of the maximum information coefficient values, the loan data items corresponding to a first preset number of maximum information coefficients to obtain the plurality of first data items; or

selecting, among the plurality of maximum information coefficients corresponding to each of the preset non-performing loan indicator items, the loan data items corresponding to a second preset number of maximum information coefficients in descending order to obtain the plurality of first data items.

6. The method according to claim 1, wherein constructing a first matrix based on the plurality of first data items and calculating an eigenvector of the first matrix comprises:

calculating correlation coefficients between the plurality of first data items, and constructing a correlation matrix based on the correlation coefficients; and

calculating the eigenvectors of the correlation matrix.

7. The method according to claim 6, wherein calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix comprises:

calculating the standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix.

8. A computer device, characterized in that the computer device comprises:

one or more processors;

a memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of claims 1 to 7.

9. A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the data processing method according to any one of claims 1 to 7 is implemented.

10. A computer program product, comprising a computer program, wherein when the computer program is executed by a processor, the computer program implements the data processing method according to any one of claims 1 to 7.