Disclosure of Invention
In view of the shortcomings of the prior art and the need for improvement, the invention provides a user portrait construction method and a user portrait construction system suitable for Ethereum, with the aim of constructing user portraits in Ethereum.
To achieve the above object, according to a first aspect of the present invention, there is provided a user portrait construction method suitable for Ethereum, including:
(1) after the Ethereum data have been synchronized, collecting the Ethereum data and parsing external transaction information, internal transaction information and account category information from the Ethereum data;
(2) collecting data on accounts whose account classification has already been completed, as verification data for the corresponding account category;
(3) based on the information parsed from the Ethereum data, performing a primary classification of the accounts in Ethereum using a heuristic classification algorithm to obtain a plurality of primary account categories;
(4) for each account under a primary account category, extracting a feature vector representing the account's neighbor information in Ethereum, performing a secondary classification of the accounts under the primary account category according to the extracted feature vectors using a machine learning algorithm, and taking the categories obtained by the secondary classification as target account categories;
(5) comparing the data of each target account category with the verification data of the corresponding account category to obtain the accuracy of the secondary classification; if the preset accuracy requirement is met, taking the current target account categories as the final account categories and proceeding to step (6); otherwise, adjusting the parameters of the heuristic classification algorithm and the machine learning algorithm to improve the accuracy of the account classification and, after the parameter adjustment is completed, returning to step (3);
(6) for each account in Ethereum, inferring the behavior characteristics of the account from the account category to which it belongs, so as to construct a user portrait corresponding to the account according to those behavior characteristics.
Further, in the user portrait construction method suitable for Ethereum according to the first aspect of the present invention, step (1) further includes:
before collecting the Ethereum data, detecting whether the system has synchronized data from the Ethereum nodes; if so, collecting the Ethereum data directly; if not, synchronizing the Ethereum data in full-node mode and then collecting the Ethereum data.
Further, in the user portrait construction method suitable for Ethereum according to the first aspect of the present invention, step (1) further includes:
after the Ethereum data have been parsed, storing the parsed information in an external database so as to speed up subsequent data queries.
Further, the user portrait construction method suitable for Ethereum provided by the first aspect of the present invention further includes:
for any account category, analyzing the behavior characteristics of each account under the account category to obtain the behavior tendency of the accounts in the blockchain network.
Further, step (2) includes:
for an account whose account classification has been completed, crawling data on the account from the Internet as verification data for the corresponding account category;
or, for an account whose account classification has been completed, conducting a transaction with the account of the corresponding account category to obtain the account address of the account as verification data for the corresponding account category.
Further, in step (3), the accounts in Ethereum are preliminarily classified using a heuristic classification algorithm, and the obtained primary account categories include: exchange, mining pool, independent miner, joint miner, and oracle contract.
Further, in step (4), extracting a feature vector representing the account's neighbor information in Ethereum includes:
calculating a probability matrix of the neighbor node types of the node where the account is located in Ethereum, and extracting a feature vector representing the neighbor information of the account from the probability matrix.
According to a second aspect of the present invention, there is provided a user portrait construction system suitable for Ethereum, comprising: a classification data collection module, a verification data collection module, a primary classification module, a secondary classification module, a verification module and a user portrait construction module;
the classification data collection module is used for collecting the Ethereum data and parsing external transaction information, internal transaction information and account category information from the Ethereum data after the Ethereum data have been synchronized;
the verification data collection module is used for collecting data on accounts whose account classification has already been completed, as verification data for the corresponding account category;
the primary classification module is used for performing a primary classification of the accounts in Ethereum using a heuristic classification algorithm, based on the information parsed from the Ethereum data by the classification data collection module, to obtain a plurality of primary account categories, and for triggering the secondary classification module after the primary classification is finished;
the secondary classification module is used for extracting, for each account under a primary account category, a feature vector representing the account's neighbor information in Ethereum, performing a secondary classification of the accounts under the primary account category according to the extracted feature vectors using a machine learning algorithm, and taking the categories obtained by the secondary classification as target account categories;
the verification module is used for comparing the data of each target account category with the verification data of the corresponding account category to obtain the accuracy of the secondary classification; if the preset accuracy requirement is met, the current target account categories are taken as the final account categories and the user portrait construction module is triggered; if the preset accuracy requirement is not met, the parameters of the heuristic classification algorithm and the machine learning algorithm are adjusted to improve the accuracy of the account classification, and the primary classification module is triggered after the parameter adjustment is completed;
and the user portrait construction module is used for inferring, for each account in Ethereum, the behavior characteristics of the account from the account category to which it belongs, so as to construct the user portrait corresponding to the account according to those behavior characteristics.
In general, the above technical solution conceived by the present invention achieves the following beneficial effects:
(1) according to the invention, after the Ethereum data are collected and parsed, the accounts in Ethereum are first classified using a heuristic classification algorithm, and then, on the basis of this primary classification, the accounts are classified a second time using a machine learning algorithm based on feature vectors of the account neighbor information, so that the account classification can be completed quickly; the accuracy of the classification result is verified against information on accounts whose classification is already known, so that the accuracy of the Ethereum account classification can be ensured; and after the classification is finished, a user portrait corresponding to each account is constructed from the behavior characteristics of the account based on the account classification result, thereby realizing user portrait construction in Ethereum.
(2) After the user portraits in Ethereum have been constructed, the behavior tendency in the blockchain network of the accounts in a specific category, that is, the account behavior characteristics under that account category, can be analyzed based on the user portraits, so that the real-time behavior of accounts in Ethereum can be monitored more accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In order to construct user portraits in Ethereum, the user portrait construction method suitable for Ethereum provided by the invention, as shown in fig. 1, includes:
(1) after the Ethereum data have been synchronized, collecting the Ethereum data and parsing external transaction information, internal transaction information and account category information from the Ethereum data;
in an optional embodiment, in order to ensure the accuracy of the account classification, step (1) may further include:
before collecting the Ethereum data, detecting whether the system has synchronized data from the Ethereum nodes; if so, collecting the Ethereum data directly; if not, synchronizing the Ethereum data in full-node mode and then collecting the Ethereum data;
in an optional embodiment, in order to speed up information queries, step (1) may further include:
after the Ethereum data have been parsed, storing the parsed information in an external database so as to speed up subsequent data queries;
optionally, the collected Ethereum data include, but are not limited to, block header data, state data, transaction data, receipt data, and the like.
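The storage part of step (1) can be illustrated with a minimal sketch. It assumes SQLite as the external database and a hypothetical two-table schema; the table names, column names and the `is_contract` flag standing in for the account category information are illustrative assumptions, not details given in this description.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open the database and create the (hypothetical) schema if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS accounts (
            address     TEXT PRIMARY KEY,
            is_contract INTEGER NOT NULL   -- assumed stand-in for account category info
        );
        CREATE TABLE IF NOT EXISTS transactions (
            tx_hash      TEXT NOT NULL,
            kind         TEXT NOT NULL CHECK (kind IN ('external', 'internal')),
            sender       TEXT NOT NULL,
            receiver     TEXT,             -- NULL for contract creation
            value_wei    TEXT NOT NULL,    -- kept as text: wei amounts can exceed 64 bits
            block_number INTEGER NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_tx_sender ON transactions(sender);
        CREATE INDEX IF NOT EXISTS idx_tx_receiver ON transactions(receiver);
        """
    )
    return conn


def store_parsed(conn, accounts, transactions):
    """Persist parsed accounts and transactions in a single database transaction."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO accounts VALUES (?, ?)", accounts)
        conn.executemany(
            "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", transactions
        )
```

The indexes on `sender` and `receiver` are what make later per-account lookups cheap, which is the stated purpose of storing the parsed information externally.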
(2) collecting data on accounts whose account classification has already been completed, as verification data for the corresponding account category;
in an optional embodiment, step (2) includes:
for an account whose account classification has been completed, crawling data on the account from the Internet as verification data for the corresponding account category;
or, for an account whose account classification has been completed, conducting a transaction with the account of the corresponding account category (for example, depositing cryptocurrency into an exchange or withdrawing cryptocurrency from an exchange) to obtain the account address of the account as verification data for the corresponding account category;
it is easy to understand that the verification data collected here are used to verify the accuracy of the subsequent account classification result; to ensure that the verification is valid, enough verification data should be collected, for example, data accumulated over more than three years of analysis;
(3) based on the information parsed from the Ethereum data, performing a primary classification of the accounts in Ethereum using a heuristic classification algorithm to obtain a plurality of primary account categories;
when the heuristic classification algorithm is used for the primary classification of Ethereum accounts, rules need to be designed specifically, and the data to be classified are filtered according to those rules; for example, for parsing results already stored in the database, the relevant rules can be expressed as SQL statements that filter the account data in the database;
when the primary classification is carried out, the specific primary account categories can be determined according to the characteristics of the actual application; in an optional embodiment, in step (3), the accounts in Ethereum are preliminarily classified using a heuristic classification algorithm, and the obtained primary account categories include: exchange, mining pool, independent miner, joint miner, and oracle contract; these five account categories cover substantially all users in Ethereum; it should be noted that the primary account categories set forth here are merely exemplary and should not be construed as the only limitation on the invention.
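A heuristic rule expressed as an SQL statement could look like the following sketch. The rule itself (an address funded by many distinct senders is an exchange candidate), the threshold parameter `min_senders` and the simplified two-column `transactions` table are hypothetical choices made for illustration; the description does not specify the actual rules.

```python
import sqlite3

# Hypothetical rule: an address that receives funds from at least
# `min_senders` distinct senders is treated as an exchange candidate.
EXCHANGE_CANDIDATE_RULE = """
    SELECT receiver
    FROM transactions
    GROUP BY receiver
    HAVING COUNT(DISTINCT sender) >= :min_senders
"""


def load_transfers(rows):
    """Build a small in-memory table of (sender, receiver) pairs for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (sender TEXT, receiver TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    return conn


def exchange_candidates(conn, min_senders):
    """Apply the SQL rule and return the set of matching addresses."""
    cursor = conn.execute(EXCHANGE_CANDIDATE_RULE, {"min_senders": min_senders})
    return {row[0] for row in cursor}
```

Because the threshold is a bound parameter rather than part of the SQL text, it is one of the heuristic parameters that could be adjusted when the accuracy requirement of step (5) is not met.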
(4) for each account under a primary account category, extracting a feature vector representing the account's neighbor information in Ethereum, performing a secondary classification of the accounts under the primary account category according to the extracted feature vectors using a machine learning algorithm, and taking the categories obtained by the secondary classification as target account categories;
in an optional embodiment, in step (4), extracting a feature vector representing the account's neighbor information in Ethereum includes:
calculating a probability matrix of the neighbor node types of the node where the account is located in Ethereum, and extracting a feature vector representing the neighbor information of the account from the probability matrix;
performing the secondary classification with a machine learning algorithm on top of the primary classification by the heuristic classification algorithm improves the coverage of the classification; optionally, the machine learning algorithm used for the secondary classification may be any machine learning algorithm, such as an algorithm based on maximum likelihood estimation, a graph convolutional network, or another machine learning algorithm, which are not listed one by one here.
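As a rough illustration of the neighbor-information feature, the sketch below computes, for one account, the empirical distribution of the primary categories of its transaction neighbors, which corresponds to one row of a neighbor-type probability matrix. The edge-list input format and the nearest-centroid rule are assumptions; in particular, the nearest-centroid classifier is only a simple stand-in for the maximum likelihood or graph convolutional approaches mentioned above, not the method of this description.

```python
from collections import Counter

# Primary categories named in the description, used as feature dimensions.
CATEGORIES = [
    "exchange",
    "mining_pool",
    "independent_miner",
    "joint_miner",
    "oracle_contract",
]


def neighbor_type_vector(account, edges, labels):
    """Return the distribution of neighbor categories for one account.

    edges:  iterable of (sender, receiver) pairs
    labels: dict mapping already-classified addresses to a primary category
    """
    counts = Counter()
    for src, dst in edges:
        if src == account and dst in labels:
            counts[labels[dst]] += 1
        elif dst == account and src in labels:
            counts[labels[src]] += 1
    total = sum(counts.values())
    if total == 0:
        # No classified neighbors: return an all-zero vector.
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def nearest_centroid(vector, centroids):
    """Assign the target category whose centroid is closest (stand-in classifier)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda name: squared_distance(vector, centroids[name]))
```

Neighbors without a primary category are ignored, so the vector always sums to one whenever at least one neighbor is classified.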
(5) comparing the data of each target account category with the verification data of the corresponding account category to obtain the accuracy of the secondary classification; if the preset accuracy requirement is met, taking the current target account categories as the final account categories and proceeding to step (6); otherwise, adjusting the parameters of the heuristic classification algorithm and the machine learning algorithm to improve the accuracy of the account classification and, after the parameter adjustment is completed, returning to step (3);
(6) for each account in Ethereum, inferring the behavior characteristics of the account from the account category to which it belongs, so as to construct a user portrait corresponding to the account according to those behavior characteristics;
optionally, when the behavior characteristics of an account are inferred from its account category, a transaction tracking analysis method may be used; for example, by tracking outgoing and incoming transfers between different exchanges, arbitrage behavior of an account is easily discovered.
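One possible reading of the transaction tracking example is sketched below: an account that withdraws from one exchange and soon afterwards deposits into a different exchange is flagged as a possible arbitrage account. The tuple layout of the transfers, the `max_block_gap` window and the flagging rule are hypothetical; such a pattern is only an indicator, not proof of arbitrage.

```python
def find_cross_exchange_transfers(transfers, exchanges, max_block_gap):
    """Flag accounts that move funds from one exchange to a different one.

    transfers:     iterable of (block_number, sender, receiver)
    exchanges:     set of addresses classified as exchanges
    max_block_gap: largest allowed distance (in blocks) between the
                   withdrawal and the following deposit
    Returns a list of (account, from_exchange, to_exchange,
    withdrawal_block, deposit_block).
    """
    last_withdrawal = {}  # account -> (block_number, exchange)
    hits = []
    for block, sender, receiver in sorted(transfers):
        if sender in exchanges and receiver not in exchanges:
            # Withdrawal: remember the most recent one per account.
            last_withdrawal[receiver] = (block, sender)
        elif receiver in exchanges and sender not in exchanges:
            # Deposit: compare with the account's latest withdrawal.
            previous = last_withdrawal.get(sender)
            if (
                previous is not None
                and previous[1] != receiver
                and block - previous[0] <= max_block_gap
            ):
                hits.append((sender, previous[1], receiver, previous[0], block))
    return hits
```

The set of exchange addresses is exactly what the account classification of steps (3) to (5) provides, which is why the behavior inference depends on the final account categories.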
according to the user portrait construction method suitable for Ethereum, after the Ethereum data are collected and parsed, the accounts in Ethereum are first classified using a heuristic classification algorithm, and then, on the basis of this primary classification, the accounts are classified a second time using a machine learning algorithm based on feature vectors of the account neighbor information, so that the account classification can be completed quickly; the accuracy of the classification result is verified against information on accounts whose classification is already known, so that the accuracy of the Ethereum account classification can be ensured; and after the classification is finished, a user portrait corresponding to each account is constructed from the behavior characteristics of the account based on the account classification result, thereby realizing user portrait construction in Ethereum.
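The verify-and-retune loop formed by steps (3) to (5) can be sketched as follows. The sketch assumes that parameter adjustment is modelled as trying a caller-supplied sequence of candidate parameter sets and that `classify` is a hypothetical callable wrapping both the primary and the secondary classification; the description does not prescribe how parameters are adjusted.

```python
def accuracy(predicted, verified):
    """Fraction of verified accounts whose predicted category matches."""
    shared = [account for account in verified if account in predicted]
    if not shared:
        return 0.0
    correct = sum(1 for account in shared if predicted[account] == verified[account])
    return correct / len(shared)


def classify_until_accurate(classify, params_schedule, verified, required_accuracy):
    """Re-run classification with new parameters until the accuracy is sufficient.

    classify:        callable taking a parameter set and returning
                     a dict of account -> category (steps (3) and (4))
    params_schedule: iterable of candidate parameter sets (assumed tuning strategy)
    verified:        dict of account -> known category (verification data)
    """
    for params in params_schedule:
        predicted = classify(params)
        score = accuracy(predicted, verified)
        if score >= required_accuracy:
            # Requirement met: these become the final account categories.
            return predicted, params, score
    raise RuntimeError("no parameter setting met the accuracy requirement")
```

Only accounts present in the verification data contribute to the score, so the accuracy estimate is only as reliable as the verification set is large and representative.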
In order to accurately monitor the users in Ethereum in real time based on the constructed user portraits, the user portrait construction method suitable for Ethereum may further include:
for any account category, analyzing the behavior characteristics of each account under the account category to obtain the behavior tendency of the accounts in the blockchain network; for example, some accounts tend to move funds in and out of exchanges for arbitrage, which greatly increases network congestion;
it should be noted that accounts of different categories behave differently, and when the behavior tendency of the accounts under a certain account category is analyzed, the analysis should be carried out in combination with the constructed user portraits.
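A behavior tendency per account category could, for instance, be summarized as the share of accounts in that category showing each behavior. The sketch below assumes that behaviors are already available as per-account tags (such as a hypothetical "arbitrage" tag); both input formats are illustrative assumptions.

```python
from collections import Counter, defaultdict


def behavior_tendencies(categories, behaviors):
    """Return, per category, the share of its accounts that show each behavior.

    categories: dict of account -> final account category
    behaviors:  dict of account -> iterable of behavior tags (hypothetical input)
    """
    totals = Counter(categories.values())
    counts = defaultdict(Counter)
    for account, category in categories.items():
        # Count each behavior at most once per account.
        for tag in set(behaviors.get(account, ())):
            counts[category][tag] += 1
    return {
        category: {tag: n / totals[category] for tag, n in counts[category].items()}
        for category in totals
    }
```

A category with no observed behaviors maps to an empty dictionary, which keeps every category visible in the result.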
The invention also provides a user portrait construction system suitable for Ethereum, as shown in fig. 2, comprising: a classification data collection module, a verification data collection module, a primary classification module, a secondary classification module, a verification module and a user portrait construction module;
the classification data collection module is used for collecting the Ethereum data and parsing external transaction information, internal transaction information and account category information from the Ethereum data after the Ethereum data have been synchronized;
the verification data collection module is used for collecting data on accounts whose account classification has already been completed, as verification data for the corresponding account category;
the primary classification module is used for performing a primary classification of the accounts in Ethereum using a heuristic classification algorithm, based on the information parsed from the Ethereum data by the classification data collection module, to obtain a plurality of primary account categories, and for triggering the secondary classification module after the primary classification is finished;
the secondary classification module is used for extracting, for each account under a primary account category, a feature vector representing the account's neighbor information in Ethereum, performing a secondary classification of the accounts under the primary account category according to the extracted feature vectors using a machine learning algorithm, and taking the categories obtained by the secondary classification as target account categories;
the verification module is used for comparing the data of each target account category with the verification data of the corresponding account category to obtain the accuracy of the secondary classification; if the preset accuracy requirement is met, the current target account categories are taken as the final account categories and the user portrait construction module is triggered; if the preset accuracy requirement is not met, the parameters of the heuristic classification algorithm and the machine learning algorithm are adjusted to improve the accuracy of the account classification, and the primary classification module is triggered after the parameter adjustment is completed;
the user portrait construction module is used for inferring, for each account in Ethereum, the behavior characteristics of the account from the account category to which it belongs, so as to construct a user portrait corresponding to the account according to those behavior characteristics;
in the embodiment of the present invention, for the detailed implementation of each module, reference may be made to the description of the method embodiment above, which is not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.