CN120580052A - Data processing method, apparatus, medium and product - Google Patents

Data processing method, apparatus, medium and product

Info

Publication number
CN120580052A
Authority
CN
China
Prior art keywords
data
loan
items
matrix
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510710002.8A
Other languages
Chinese (zh)
Inventor
张园
苏新锋
孔亮
张超莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202510710002.8A
Publication of CN120580052A

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

Embodiments of the present invention disclose a data processing method, device, medium, and product. The method includes: obtaining a first data set and a second data set, wherein the first data set includes multiple pieces of non-performing loan data, and the second data set includes multiple preset non-performing loan indicator items for characterizing non-performing loans; calculating the maximum information coefficient between each preset non-performing loan indicator item and each loan data item, and determining multiple first data items among the loan data items based on the maximum information coefficient; constructing a first matrix based on the multiple first data items, and calculating the eigenvectors of the first matrix; calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, based on the vector distance, a second data item among the loan data items that is associated with the preset non-performing loan indicator item. The technical solution of the embodiments of the present invention can accurately retain the data items associated with the non-performing loan indicators, thereby improving analysis efficiency and accuracy.

Description

Data processing method, apparatus, medium and product
Technical Field
Embodiments of the present invention relate to the field of computer technologies, and in particular, to a data processing method, apparatus, medium, and product.
Background
In the financial field, identifying bad loans is critical to risk control and credit policy development. Currently, the industry generally analyzes bad loan data with traditional data analysis methods, which are easily limited by subjectivity, linearity assumptions, and the like. Because many factors influence bad loan outcomes, the resulting analysis is often inaccurate.
Disclosure of Invention
The embodiments of the invention provide a data processing method, device, medium, and product that can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
Acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;
Calculating the maximum information coefficient between each preset poor loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient;
Constructing a first matrix based on a plurality of first data items, and calculating eigenvectors of the first matrix;
Calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, a second data item among the loan data items that is associated with a preset bad loan index item.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
The data acquisition module is used for acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;
The maximum information number calculation module is used for calculating the maximum information coefficient between each preset bad loan index item and each loan data item respectively, and determining a plurality of first data items in the loan data items according to the maximum information coefficient;
the matrix construction module is used for constructing a first matrix based on a plurality of first data items and calculating the eigenvectors of the first matrix;
The data item determining module is used for calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, a second data item among the loan data items that is associated with the preset bad loan index item.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
One or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by one or more processors, the one or more processors are caused to implement the data processing method as provided by any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present invention.
In a fifth aspect, embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present invention.
The embodiments of the above invention have the following advantages or benefits:
A first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items used for representing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined among the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and second data items associated with the preset bad loan index items are determined among the loan data items according to the vector distance. The technical scheme of the embodiments of the invention addresses the current problems of low analysis efficiency and inaccurate results for bad loan data: it can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.
Drawings
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of yet another data processing method provided by an embodiment of the present invention;
FIG. 3 is a flow chart of yet another data processing method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, where the embodiment is applicable to a scenario of performing data processing and analysis on poor loan data. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.
As shown in fig. 1, the data processing method of the present embodiment includes the steps of:
S110, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.
The first data set may be a bad loan data set, that is, bad-loan-related data collected from a financial institution. A bad loan is a loan that cannot be recovered as expected after the lending institution issues it, including overdue loans, sluggish loans, bad debts, and the like. Bad loan data covers a variety of categories, such as loan type, purpose, amount, overdue status, regional distribution, and industry sector. Each piece of bad loan data may include a plurality of loan data items, for example basic loan information, borrower information, repayment records, loan status, risk classification, guarantee information, financial information, and credit score.
The second data set may be a set of bad loan indicators, including a plurality of preset bad loan indicators for characterizing bad loans. A preset bad loan indicator may be the bad loan rate, the bad loan balance, the bad loan duration, the bad loan reserve coverage, the bad loan migration rate, or another associated indicator.
In addition, after the first data set is acquired, the bad loan data may be preprocessed. For example, the collected bad loan data is standardized, including error removal, duplicate removal, and default value filling, to ensure the accuracy and integrity of the data.
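The preprocessing described above could be sketched as follows. This is a minimal illustration only: the field names (`loan_id`, `loan_amount`, `overdue_days`, `credit_score`), the validity rule, and the default values are hypothetical and do not come from the patent.

```python
def preprocess(records, defaults):
    """Drop invalid and duplicate records, then fill missing fields with defaults."""
    cleaned = []
    seen_ids = set()
    for rec in records:
        # Error removal (assumed rule): the loan amount must be present and positive.
        amount = rec.get("loan_amount")
        if amount is None or amount <= 0:
            continue
        # Duplicate removal: keep only the first record for each loan_id.
        if rec["loan_id"] in seen_ids:
            continue
        seen_ids.add(rec["loan_id"])
        # Default value filling: absent or None fields take the configured default.
        present = {k: v for k, v in rec.items() if v is not None}
        cleaned.append({**defaults, **present})
    return cleaned


raw = [
    {"loan_id": "A1", "loan_amount": 500_000, "overdue_days": 120, "credit_score": None},
    {"loan_id": "A1", "loan_amount": 500_000, "overdue_days": 120, "credit_score": None},
    {"loan_id": "B2", "loan_amount": -10, "overdue_days": 30, "credit_score": 610},
    {"loan_id": "C3", "loan_amount": 80_000, "overdue_days": 95},
]
clean = preprocess(raw, defaults={"credit_score": 600, "overdue_days": 0})
```

Here the duplicate of `A1` and the invalid record `B2` are dropped, and the missing credit scores are filled from `defaults`.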
S120, calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient.
The maximum information coefficient (Maximal Information Coefficient, MIC) is a statistic used to measure the degree of association between two variables. Its advantages are generality and equitability: it can capture many types of association, both linear and nonlinear, rather than linear relationships only.
The core of the MIC calculation is to quantify the mutual information (Mutual Information, MI) between variables through information entropy and to map the result to the interval [0,1] through normalization, so that association patterns of different types and scales can be compared uniformly. The closer the MIC is to 1, the stronger the relationship between the two variables, whether that relationship is linear, nonlinear, or complex and periodic. Conversely, an MIC close to 0 means that the two variables are almost independent.
When calculating the maximum information coefficient between each preset bad loan index item and each loan data item, the mutual information (MI) is calculated first. Mutual information measures the information shared by two random variables, and can be expressed as:
$$I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x, y) is the joint probability distribution and p(x) and p(y) are the marginal probability distributions. X and Y are two random variables representing a preset bad loan index item and a corresponding loan data item, respectively.
Next, grid partitioning is performed and the maximum mutual information is determined. The two-dimensional data (X, Y) is partitioned into an n × m grid, where n and m are the numbers of grid rows and grid columns and n × m ≤ B. B is a preset upper limit on the total number of grid cells, usually taken as B = N^β, where N is the sample size and β is a tuning parameter, usually 0.6. For all possible partitioning schemes, the corresponding mutual information MI_{n,m}(X, Y) is calculated and the maximum is taken:
$$\max MI(X,Y) = \max_{n,m:\; n \times m \le B} MI_{n,m}(X,Y)$$
Finally, the MIC is obtained by normalization, that is, by dividing by the maximum possible mutual information (the value reached when X and Y are fully correlated), which for an n × m grid is log min(n, m):
$$MIC(X,Y) = \max_{n,m:\; n \times m \le B} \frac{MI_{n,m}(X,Y)}{\log \min(n,m)}$$
The final result lies in the interval [0,1].
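The grid search described above can be illustrated with a simplified sketch. It is not the patent's implementation and not the full MIC algorithm: it assumes equal-frequency bins on each axis instead of optimizing the position of every grid line, so it only approximates (and may underestimate) the MIC. The function names and the sample data are hypothetical, and tied values are not handled specially.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of roughly equal size, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(xb, yb):
    """Mutual information, in bits, of two discretized sequences."""
    n = len(xb)
    pxy, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mic_approx(x, y, beta=0.6):
    """Approximate MIC: best normalized MI over grids with nx * ny <= N ** beta."""
    budget = len(x) ** beta
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        ny = 2
        while nx * ny <= budget:
            mi = mutual_information(equal_frequency_bins(x, nx),
                                    equal_frequency_bins(y, ny))
            # Normalize by the largest MI an nx-by-ny grid can reach.
            best = max(best, mi / math.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best


x = list(range(100))
linear = [2 * v + 1 for v in x]          # linear dependence
parabola = [(v - 49.3) ** 2 for v in x]  # nonlinear, non-monotonic dependence
rng = random.Random(42)
noise = [rng.random() for _ in x]        # no dependence on x

mic_linear = mic_approx(x, linear)
mic_parabola = mic_approx(x, parabola)
mic_noise = mic_approx(x, noise)
```

On this sample data the linear and the parabolic relationship both score close to 1, while the noise series scores much lower, which is the behavior the text attributes to the MIC.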
Assuming there are 10 loan data items in the first data set and 3 preset bad loan index items in the second data set, 30 maximum information coefficients can be calculated. The maximum information coefficient (MIC) quantifies the relationship, including nonlinear relationships, between each loan data item and a preset bad loan index item, so the data variables relevant to the index can be retained accurately. Then, according to the value of each maximum information coefficient, the loan data items corresponding to the larger coefficients can be selected as the plurality of first data items. The first data items may also be determined among all loan data items by other filtering rules based on the maximum information coefficient; they are the data items that have an impact on the bad loan data. Each bad loan index item may correspond to a plurality of first data items.
S130, constructing a first matrix based on a plurality of first data items, and calculating eigenvectors of the first matrix.
Taking each first data item as a matrix variable, the Pearson correlation coefficient may be calculated pairwise between the first data items to construct the first matrix, that is, a correlation matrix. For example, if the first data items are the three data items X, Y, and Z, the Pearson correlation coefficients of the pairs XX, XY, XZ, YX, YY, YZ, ZX, ZY, and ZZ are calculated to obtain a 3 × 3 correlation matrix, and the eigenvectors and eigenvalues of the correlation matrix are then calculated.
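As an illustration of this step, the following sketch builds a Pearson correlation matrix for three hypothetical first data items and approximates its leading eigenvector. The patent does not say how the eigenvectors are computed; power iteration is used here only because it needs no external library, and it yields just the eigenvector of the largest eigenvalue. The column names and values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    dev_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (dev_a * dev_b)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> values."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    size = len(matrix)
    # Asymmetric start vector, to reduce the chance of starting orthogonal
    # to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(size)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)
    )
    return eigenvalue, v


first_items = {
    "loan_amount": [10, 20, 30, 40, 50],
    "overdue_days": [12, 25, 29, 41, 52],
    "credit_score": [700, 640, 610, 580, 500],
}
corr = correlation_matrix(first_items)
lam, vec = leading_eigenpair(corr)
```

For a full eigendecomposition, a linear algebra library would normally be used instead of power iteration.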
S140, calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, a second data item among the loan data items that is associated with the preset bad loan index item.
The vector distance between the vector corresponding to each first data item and the eigenvector of the first matrix may be characterized by the standard deviation and variance between the two vectors.
The smaller the standard deviation and variance between the vector corresponding to a first data item and the eigenvector, the closer the two vectors are, and the more relevant the first data item is to the corresponding bad loan index item.
In addition, a weighted sum of the vector corresponding to each first data item, the corresponding variance, and the corresponding distance can be calculated to obtain the influence factor of the variable on the index. Sorting by influence factor gives the order of influence of the first data items on the corresponding bad loan index item, which yields the characteristics of the index data.
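The text does not give a formula for the "standard deviation and variance" between a row vector and the eigenvector. One possible reading, sketched below, treats the variance as the mean squared element-wise difference between the two vectors and the standard deviation as its square root, then ranks the first data items from closest to farthest. This interpretation, the function names, and the numeric values (which are illustrative, not computed from real data) are all assumptions.

```python
import math


def deviation_from_eigenvector(row, eigenvector):
    """Mean squared difference (variance) and its square root (standard deviation)."""
    squared = [(r - e) ** 2 for r, e in zip(row, eigenvector)]
    variance = sum(squared) / len(squared)
    return variance, math.sqrt(variance)


def rank_by_distance(rows, eigenvector):
    """Return item names ordered from smallest to largest standard deviation."""
    scored = {
        name: deviation_from_eigenvector(row, eigenvector)[1]
        for name, row in rows.items()
    }
    return sorted(scored, key=scored.get)


# Rows of a hypothetical 3 x 3 correlation matrix, keyed by first data item.
rows = {
    "loan_amount": [1.0, 0.99, -0.98],
    "overdue_days": [0.99, 1.0, -0.97],
    "credit_score": [-0.98, -0.97, 1.0],
}
eigenvector = [0.58, 0.58, -0.58]  # illustrative leading eigenvector

ranking = rank_by_distance(rows, eigenvector)
```

Note that an eigenvector is only defined up to sign, so under this reading the ranking depends on the sign convention chosen for the eigenvector; a real implementation would need to fix that convention explicitly.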
According to the technical scheme of this embodiment, a first data set is obtained, wherein the first data set comprises a plurality of pieces of bad loan data, and each piece of bad loan data comprises a plurality of preset bad loan index items and loan data items. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined among the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and second data items associated with the preset bad loan index items are determined among the loan data items according to the vector distance. The technical scheme of the embodiment of the invention addresses the current problems of low analysis efficiency and inaccurate results for bad loan data: it can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention, where the data processing method in this embodiment and the data processing method in the foregoing embodiments belong to the same inventive concept, and further describes a process of calculating a maximum information coefficient. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.
As shown in fig. 2, the data processing method of the present embodiment includes the steps of:
S210, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.
S220, calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items in the loan data items according to the maximum information coefficient.
The maximum information coefficient between each preset bad loan index item and each loan data item can be calculated by means of a function call: a maximum information coefficient calculation function is called with the preset bad loan index item and the corresponding loan data item as its input data, and the function returns the corresponding maximum information coefficient.
Further, the plurality of first data items may be determined among the loan data items according to the maximum information coefficient in either of two ways. In the first, all maximum information coefficients are sorted from largest to smallest, and the loan data items corresponding to the first preset number of coefficients are selected as the first data items. In the second, for each preset bad loan index item, the maximum information coefficients corresponding to that index item are sorted from largest to smallest, and the loan data items corresponding to the second preset number of coefficients are selected as the first data items.
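The two selection rules can be sketched as follows, assuming the maximum information coefficients have already been computed and stored in a dictionary keyed by (index item, loan data item). The names, the coefficient values, and the choice to deduplicate items in the first rule are hypothetical.

```python
def top_k_overall(mic, k):
    """Rule 1: loan data items behind the k largest coefficients overall (deduplicated)."""
    ranked = sorted(mic.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for (_, item), _ in ranked:
        if item not in selected:
            selected.append(item)
        if len(selected) == k:
            break
    return selected


def top_k_per_indicator(mic, k):
    """Rule 2: for each index item, the k loan data items with the largest coefficients."""
    by_indicator = {}
    for (indicator, item), value in mic.items():
        by_indicator.setdefault(indicator, []).append((value, item))
    return {
        indicator: [item for _, item in sorted(pairs, reverse=True)[:k]]
        for indicator, pairs in by_indicator.items()
    }


# Hypothetical coefficients: (index item, loan data item) -> MIC value.
mic = {
    ("npl_ratio", "overdue_days"): 0.82,
    ("npl_ratio", "credit_score"): 0.64,
    ("npl_ratio", "loan_term"): 0.21,
    ("npl_balance", "loan_amount"): 0.77,
    ("npl_balance", "overdue_days"): 0.58,
    ("npl_balance", "loan_term"): 0.12,
}
overall = top_k_overall(mic, 3)
per_indicator = top_k_per_indicator(mic, 2)
```

With these values, the first rule keeps `overdue_days`, `loan_amount`, and `credit_score`, while the second keeps the two strongest items for each index item separately.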
S230, constructing a first matrix based on the plurality of first data items, and calculating eigenvectors of the first matrix.
S240, calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, a second data item among the loan data items that is associated with the preset bad loan index item.
S250, adjusting, according to a preset bad loan index item adjustment rule and based on the second data item and the corresponding vector distance, the corresponding preset bad loan index item or the index reference value corresponding to the second data item.
The preset bad loan index item adjustment rule may be a preset policy for adjusting risk early warning. Based on the second data item and the data characteristics of the corresponding vector distance, a risk management strategy can be adjusted, for example by adjusting interest rates, controlling loan amounts, or adjusting loan terms. Alternatively, the index reference value corresponding to the preset bad loan index item can be adjusted.
For a new loan, the financial institution can thus better identify whether it is a bad loan and warn the account manager in time, which improves the level of credit management and asset quality, reduces credit fund risk, and improves the efficiency of fund use.
According to the technical scheme of this embodiment, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items used for representing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined among the loan data items according to the maximum information coefficient. A first matrix is constructed based on the plurality of first data items, and the eigenvectors of the first matrix are calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is calculated, and second data items associated with the preset bad loan index items are determined among the loan data items according to the vector distance. The index reference value of the corresponding preset bad loan index item is then adjusted according to the preset bad loan index item adjustment rule, the second data item, and the corresponding vector distance. The technical scheme of the embodiment of the invention addresses the current problems of low analysis efficiency and inaccurate results for bad loan data: it can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present invention, where the data processing method in this embodiment and the data processing method in the foregoing embodiments belong to the same inventive concept, and further describes a process of constructing a correlation matrix and performing analysis. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware, integrated in a computer device with application development functionality.
As shown in fig. 3, the data processing method of the present embodiment includes the steps of:
S310, acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans.
S320, constructing a two-dimensional data variable data set of each preset bad loan index item and corresponding to each loan data item based on the first data set and the second data set.
Each bad loan index item in the second data set can be paired with the loan data items in the first data set to construct two-dimensional variable data sets. The second data set contains a plurality of bad loan indexes, such as the bad loan rate, the bad loan balance, and the bad loan duration. One index, such as the bad loan balance, is selected from the second data set as the index variable, and a two-dimensional variable data set is constructed from it and a loan data item in the first data set, for example from the bad loan balance and the total loan amount. Two-dimensional variable data sets are likewise constructed from the selected index and each of the other bad loan data variables, and are recorded as D1, D2, D3, ..., Dn.
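Constructing the two-dimensional variable data sets amounts to pairing one index variable with each loan data item in turn. A minimal sketch, with hypothetical names and values, assuming both are stored as equal-length lists aligned by record:

```python
def build_two_dim_datasets(indicator_values, loan_items):
    """Pair one index variable with every loan data item: name -> [(index, item), ...]."""
    datasets = {}
    for name, values in loan_items.items():
        if len(values) != len(indicator_values):
            raise ValueError(f"length mismatch for loan data item {name!r}")
        datasets[name] = list(zip(indicator_values, values))
    return datasets


# Selected index variable (hypothetical values, one per record).
bad_loan_balance = [120.0, 80.0, 45.5]

# Loan data items from the first data set (hypothetical).
loan_items = {
    "total_loan_amount": [500.0, 300.0, 100.0],
    "overdue_days": [180, 95, 60],
}

# D1 pairs the balance with total_loan_amount, D2 with overdue_days, and so on.
datasets = build_two_dim_datasets(bad_loan_balance, loan_items)
```

Each resulting list of pairs is one data set Di that can then be passed to the maximum information coefficient calculation.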
S330, calculating the maximum information coefficient corresponding to each two-dimensional data variable data set.
Each two-dimensional variable data set D1, D2, D3, ..., Dn is input into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.
The calculation performed by the preset maximum information coefficient calculation component may follow the formula
$$MIC(D) = \max_{k \times m < B(n)} \{ M(D)_{k,m} \}$$
where M(D)_{k,m} is derived from MI(D, k, m), the mutual information value of the two-dimensional variable data set D when partitioned by the integer pair (k, m), with k and m representing the numbers of rows and columns of the grid. The values of k and m at which the mutual information value is maximal are found by exhaustive search, subject to k × m being smaller than B(n), where B is a function of the data size n of the first data set.
S340, calculating correlation coefficients among the plurality of first data items, constructing a correlation matrix based on the correlation coefficients, and calculating eigenvectors of the correlation matrix.
The correlation coefficient may be the Pearson correlation coefficient, a statistical index that measures the strength and direction of the linear relationship between two variables. Its value lies between -1 and 1: 1 represents a perfect positive correlation between the two variables, -1 represents a perfect negative correlation, and 0 represents no linear correlation. Because the Pearson correlation coefficient is calculated from the mean and standard deviation of the two variables, attention must be paid to data normalization when applying it.
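For reference, the standard definition of the Pearson correlation coefficient for two variables observed n times can be written as follows. The symbols x_i, y_i, and the sample means are introduced here for illustration and do not appear in the original text.

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The numerator is the (unnormalized) covariance and the denominator is the product of the (unnormalized) standard deviations, which is why the coefficient is unaffected by the scale of either variable.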
Correlation matrices are a widely used mathematical tool in statistics and data analysis to represent correlations between variables in a dataset and can be used to analyze a multi-variable dataset. By analyzing the correlation matrix, the correlation between the variables can be revealed, so that the degree and direction of the correlation between the variables can be known, and the correlation matrix can be used for identifying the most important variables so as to select the variables or reduce the dimension. Correlation matrices are a very useful statistical tool that can help us understand the underlying structure and associations in complex data.
S350, calculating the standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix, and determining, according to the standard deviation and variance, second data items among the loan data items that are associated with the preset bad loan index items.
According to the technical scheme of this embodiment, a first data set and a second data set are obtained, wherein the first data set comprises a plurality of pieces of bad loan data, each piece of bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items used for representing bad loans. A two-dimensional variable data set of each preset bad loan index item with each loan data item is constructed based on the first data set and the second data set, and the maximum information coefficient corresponding to each two-dimensional variable data set is calculated. The correlation coefficients among the plurality of first data items are calculated, a correlation matrix is constructed based on the correlation coefficients, and the eigenvectors of the correlation matrix are calculated. The standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix are calculated, and second data items associated with the preset bad loan index items are determined among the loan data items according to the standard deviation and variance. The technical scheme of the embodiment of the invention addresses the current problems of low analysis efficiency and inaccurate results for bad loan data: it can reduce the influence of human factors on the results of bad loan data analysis, reduce the influence of noise and outliers in the bad loan data on the results, accurately retain the data items associated with bad loan indexes, and improve analysis efficiency and accuracy.
Fig. 4 is a schematic structural diagram of a data processing device according to an embodiment of the present invention, where the embodiment is applicable to a scenario of performing data processing on bad loan data. The data processing device can be realized by means of software and/or hardware, and is integrated in a computer terminal device with application development functions.
As shown in fig. 4, the data processing apparatus includes a data acquisition module 410, a maximum information number calculation module 420, a matrix construction module 430, and a data item determination module 440.
The data obtaining module 410 is configured to obtain a first data set and a second data set, where the first data set includes a plurality of bad loan data, each bad loan data includes a plurality of loan data items, the second data set includes a plurality of preset bad loan index items for characterizing a bad loan, the maximum information number calculating module 420 is configured to calculate a maximum information coefficient between each preset bad loan index item and each loan data item, and determine a plurality of first data items in the loan data items according to the maximum information coefficient, the matrix constructing module 430 is configured to construct a first matrix based on the plurality of first data items, and calculate a feature vector of the first matrix, and the data item determining module 440 is configured to calculate a vector distance between a vector corresponding to each first data item in the first matrix and the feature vector of the first matrix, and determine a second data item associated with the preset bad loan index item in the loan data items according to the vector distance.
According to the technical scheme of this embodiment, a first data set and a second data set are obtained. The first data set comprises a plurality of bad loan data, each of which comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items used for characterizing bad loans. The maximum information coefficient between each preset bad loan index item and each loan data item is calculated, and a plurality of first data items are determined among the loan data items according to the maximum information coefficients. A first matrix is built based on the plurality of first data items, and the eigenvector of the first matrix is calculated. The vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix is then calculated, and the second data items associated with the preset bad loan index items are determined among the loan data items according to the vector distance. The technical scheme of the embodiment of the invention addresses the low efficiency and inaccurate results of current bad loan data analysis: it reduces the influence of human factors on the result, reduces the influence of noise and outliers in the bad loan data, accurately retains the data items associated with the bad loan indexes, and improves analysis efficiency and accuracy.
In an alternative embodiment, the data processing apparatus further comprises an index data adjustment module for:
adjusting, according to a preset bad loan index item adjustment rule, the index reference value corresponding to the relevant preset bad loan index item based on the second data item and the corresponding vector distance.
In an alternative embodiment, the maximum information coefficient calculation module 420 is specifically configured to:
Constructing, based on the first data set and the second data set, a two-dimensional data variable data set for each pairing of a preset bad loan index item with a loan data item;
Calculating the maximum information coefficient corresponding to each two-dimensional data variable data set.
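A minimal Python sketch of how the two-dimensional data variable data sets might be assembled follows. It assumes, purely for illustration, that every loan record has a matching observation of each index item; the function name `build_pair_datasets` and the field names in the usage note are hypothetical, and the text does not state how the two data sets are aligned.

```python
def build_pair_datasets(loan_records, indicator_records, item_names, indicator_names):
    """Return {(indicator, item): [(item_value, indicator_value), ...]}.

    Assumption: loan_records[i] and indicator_records[i] describe the
    same observation, so their values can be paired position by position.
    """
    pairs = {}
    for indicator in indicator_names:
        for item in item_names:
            pairs[(indicator, item)] = [
                (loan[item], observed[indicator])
                for loan, observed in zip(loan_records, indicator_records)
            ]
    return pairs
```

For example, with two loan data items (`overdue_days`, `loan_amount`) and one index item (`npl_ratio`), the result holds two data sets, one per pairing, each ready to be scored independently.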
In an alternative embodiment, the maximum information coefficient calculation module 420 is specifically configured to:
Inputting each two-dimensional data variable data set into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.
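The text does not say how the preset calculation component works internally. The Python sketch below is a simplified stand-in, not the full maximal information coefficient algorithm: it bins both variables into equal-frequency grids, computes the mutual information for each grid, normalizes by log2 of the smaller grid dimension, and keeps the largest value. The published algorithm additionally optimizes where the grid lines are placed; the function names and the cell budget `max(4, n ** 0.6)` used here are assumptions.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so bins hold roughly equal counts.

    Note: tied values may fall into different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def approx_mic(xs, ys):
    """Simplified MIC-style score in [0, 1] for paired samples xs, ys."""
    n = len(xs)
    max_cells = max(4, int(n ** 0.6))  # assumed budget on grid cells
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            bx = _equal_frequency_bins(xs, nx)
            by = _equal_frequency_bins(ys, ny)
            joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
            # Mutual information in bits over the occupied grid cells.
            mi = sum(
                (c / n) * math.log2(c * n / (px[i] * py[j]))
                for (i, j), c in joint.items()
            )
            best = max(best, mi / math.log2(min(nx, ny)))
    return best
```

A perfectly monotonic pair such as `xs = range(16)`, `ys = 2 * x + 1` scores 1.0 under this sketch, while a pair whose bins are statistically independent scores 0.0.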
In an alternative embodiment, the maximum information coefficient calculation module 420 may also be configured to:
Selecting, in descending order of the maximum information coefficient values, the loan data items corresponding to a first preset number of maximum information coefficients to obtain a plurality of first data items, or
Selecting, for each preset bad loan index item separately, the loan data items corresponding to a second preset number of maximum information coefficients in descending order among the maximum information coefficients of that index item, to obtain a plurality of first data items.
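The two selection strategies can be contrasted in a short Python sketch. The function names and the `(indicator, item) -> score` dictionary layout are hypothetical. For the global variant, the sketch assumes that a loan data item is ranked by its best score across all index items, which is one possible reading of the text.

```python
def select_top_overall(mic_scores, k):
    """Strategy 1: rank all scores together and keep the top-k items."""
    best_per_item = {}
    for (_indicator, item), score in mic_scores.items():
        best_per_item[item] = max(score, best_per_item.get(item, 0.0))
    ranked = sorted(best_per_item, key=lambda it: (-best_per_item[it], it))
    return ranked[:k]


def select_top_per_indicator(mic_scores, k):
    """Strategy 2: keep the top-k items for each index item, then merge."""
    by_indicator = {}
    for (indicator, item), score in mic_scores.items():
        by_indicator.setdefault(indicator, []).append((score, item))
    selected = []
    for indicator in sorted(by_indicator):
        top = sorted(by_indicator[indicator], key=lambda p: (-p[0], p[1]))[:k]
        for _score, item in top:
            if item not in selected:  # avoid duplicates across indicators
                selected.append(item)
    return selected
```

The per-indicator variant guarantees that every index item contributes at least one loan data item, whereas the global variant can be dominated by a single index item with uniformly high scores.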
In an alternative embodiment, the matrix construction module 430 is specifically configured to:
calculating correlation coefficients among a plurality of first data items, and constructing a correlation matrix based on the correlation coefficients;
and calculating the eigenvectors of the correlation matrix.
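As a rough illustration, the correlation matrix and one of its eigenvectors can be computed in plain Python. The sketch assumes Pearson correlation and returns only the dominant eigenvector, found by power iteration; the text specifies neither the correlation measure nor which eigenvector is used, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns: one list of values per first data item."""
    return [[pearson(a, b) for b in columns] for a in columns]


def dominant_eigenvector(matrix, iterations=200):
    """Unit-length eigenvector of the largest eigenvalue via power iteration."""
    n = len(matrix)
    # Uneven start vector so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

For three items where the second is a multiple of the first and the third runs in the opposite direction, the dominant eigenvector is proportional to (1, 1, -1), reflecting that the first two items move together and the third moves against them.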
In an alternative embodiment, the data item determination module 440 is specifically configured to:
and calculating standard deviation and variance of the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix.
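The text does not define exactly how the standard deviation and variance are taken between an item's vector and the eigenvector. One possible reading, sketched below in Python, treats the eigenvector as a reference and computes the mean squared element-wise difference (the "variance") and its square root (the "standard deviation") for each row of the matrix. The function name and this interpretation are assumptions.

```python
import math


def deviation_from_eigenvector(matrix, eigenvector, item_names):
    """Return {item: (variance, standard_deviation)} relative to the eigenvector.

    Assumed reading: variance is the mean squared difference between the
    item's row vector and the eigenvector, element by element.
    """
    result = {}
    for name, row in zip(item_names, matrix):
        squared = [(r - e) ** 2 for r, e in zip(row, eigenvector)]
        variance = sum(squared) / len(squared)
        result[name] = (variance, math.sqrt(variance))
    return result
```

Under this reading, an item whose row coincides with the eigenvector has variance 0, and larger values indicate a vector farther from the eigenvector. How these values are thresholded to pick the second data items is left to the preset rule.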
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing power, such as an intelligent controller, a server, a mobile phone, and the like.
As shown in FIG. 5, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including system memory 28 and processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, artificial intelligence systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a data processing method provided by the present embodiment, the method including:
Acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;
Calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items among the loan data items according to the maximum information coefficients;
Constructing a first matrix based on the plurality of first data items, and calculating an eigenvector of the first matrix;
Calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, the second data items among the loan data items that are associated with the preset bad loan index items.
The embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a data processing method as provided by any embodiment of the present invention, the method comprising:
Acquiring a first data set and a second data set, wherein the first data set comprises a plurality of bad loan data, each bad loan data comprises a plurality of loan data items, and the second data set comprises a plurality of preset bad loan index items for representing bad loans;
Calculating the maximum information coefficient between each preset bad loan index item and each loan data item, and determining a plurality of first data items among the loan data items according to the maximum information coefficients;
Constructing a first matrix based on the plurality of first data items, and calculating an eigenvector of the first matrix;
Calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, according to the vector distance, the second data items among the loan data items that are associated with the preset bad loan index items.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, Python, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as provided by any of the embodiments of the present application.
In implementing the computer program product, the computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, Python, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A data processing method, comprising:
obtaining a first data set and a second data set, wherein the first data set includes a plurality of non-performing loan data, each of which includes a plurality of loan data items, and the second data set includes a plurality of preset non-performing loan indicator items for characterizing non-performing loans;
calculating the maximum information coefficient between each of the preset non-performing loan indicator items and each of the loan data items, and determining a plurality of first data items among the loan data items according to the maximum information coefficients;
constructing a first matrix based on the plurality of first data items, and calculating an eigenvector of the first matrix; and
calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix, and determining, based on the vector distance, a second data item among the loan data items that is associated with the preset non-performing loan indicator items.

2. The method according to claim 1, further comprising:
adjusting, according to a preset non-performing loan indicator item adjustment rule and based on the second data item and the corresponding vector distance, the corresponding preset non-performing loan indicator item or the indicator reference value corresponding to the second data item.

3. The method according to claim 1 or 2, wherein calculating the maximum information coefficient between each of the preset non-performing loan indicator items and each of the loan data items comprises:
constructing, based on the first data set and the second data set, a two-dimensional data variable data set corresponding to each of the preset non-performing loan indicator items and each of the loan data items; and
calculating the maximum information coefficient corresponding to each of the two-dimensional data variable data sets.

4. The method according to claim 3, wherein calculating the maximum information coefficient corresponding to each of the two-dimensional data variable data sets comprises:
inputting each of the two-dimensional data variable data sets into a preset maximum information coefficient calculation component to obtain the corresponding maximum information coefficient.

5. The method according to claim 1, wherein determining a plurality of first data items among the loan data items according to the maximum information coefficients comprises:
selecting, in descending order of the maximum information coefficient values, the loan data items corresponding to a first preset number of maximum information coefficients to obtain the plurality of first data items; or
selecting, among the maximum information coefficients corresponding to each of the preset non-performing loan indicator items respectively, the loan data items corresponding to a second preset number of maximum information coefficients in descending order to obtain the plurality of first data items.

6. The method according to claim 1, wherein constructing a first matrix based on the plurality of first data items and calculating an eigenvector of the first matrix comprises:
calculating correlation coefficients between the plurality of first data items, and constructing a correlation matrix based on the correlation coefficients; and
calculating the eigenvector of the correlation matrix.

7. The method according to claim 6, wherein calculating the vector distance between the vector corresponding to each first data item in the first matrix and the eigenvector of the first matrix comprises:
calculating the standard deviation and variance between the vector corresponding to each first data item in the first matrix and the eigenvector of the correlation matrix.

8. A computer device, comprising:
one or more processors; and
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of claims 1 to 7.

9. A computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the data processing method according to any one of claims 1 to 7 is implemented.

10. A computer program product, comprising a computer program, wherein, when the computer program is executed by a processor, the data processing method according to any one of claims 1 to 7 is implemented.
CN202510710002.8A 2025-05-29 2025-05-29 Data processing method, apparatus, medium and product Pending CN120580052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510710002.8A CN120580052A (en) 2025-05-29 2025-05-29 Data processing method, apparatus, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510710002.8A CN120580052A (en) 2025-05-29 2025-05-29 Data processing method, apparatus, medium and product

Publications (1)

Publication Number Publication Date
CN120580052A true CN120580052A (en) 2025-09-02

Family

ID=96857263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510710002.8A Pending CN120580052A (en) 2025-05-29 2025-05-29 Data processing method, apparatus, medium and product

Country Status (1)

Country Link
CN (1) CN120580052A (en)

Similar Documents

Publication Publication Date Title
AU2016328959A1 (en) Updating attribute data structures to indicate trends in attribute data provided to automated modeling systems
CA2368931A1 (en) Risk management system, distributed framework and method
CN113034046A (en) Data risk metering method and device, electronic equipment and storage medium
CN114862291A (en) A data asset value evaluation system and method, device and medium
CN114781937A (en) Method and device for pre-paid card enterprise risk early warning and storage medium
CN117593115A (en) Feature value determining method, device, equipment and medium of credit risk assessment model
CN119577697A (en) Method, device and equipment for constructing elastic network regression model of battery retention rate
CN114707733A (en) Risk indicator prediction method and device, electronic equipment and storage medium
CN114186605A (en) Minority sample processing method, device, equipment and storage medium
Hamidieh Estimating the tail shape parameter from option prices
CN114331682A (en) Method and device for acquiring relevance of deposit amount between objects
CN118822722A (en) Early warning analysis method, system and readable storage medium for financial data
CN117787808A (en) Borrowed resource allocation method and device, computer equipment and storage medium
CN113935574A (en) Abnormal transaction monitoring method and device, computer equipment and storage medium
CN120580052A (en) Data processing method, apparatus, medium and product
CN117764617A (en) Property running cost prediction method and device based on big data analysis
CN115409607A (en) Method and device for determining credit granting data and electronic equipment
CN115796585A (en) Enterprise operation risk assessment method and system
CN119962996B (en) Asset screening and management method and system based on automated rules
CN110852392A (en) User grouping method, device, equipment and medium
TWI882743B (en) Danp expert integration calculating system and method thereof
CN119691528B (en) Data verification method, device, equipment, storage medium and product
CN114861768B (en) Anomaly detection method and related device
CN120578589A (en) Performance test case acquisition method, device, electronic device and storage medium
CN116468531A (en) Account information processing method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination