CN111984898A - Tag push method, device, electronic device and storage medium based on big data - Google Patents
- Publication number
- CN111984898A (application CN202010610771.8A)
- Authority
- CN
- China
- Prior art keywords
- node
- data
- label
- initial
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9562—Bookmark management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/55—Push-based network services
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of big data and proposes a big data-based tag push method, device, electronic device and storage medium. The method includes: collecting raw data from multiple preset data sources and cleaning it according to a preset data cleaning strategy to obtain sample data; extracting multi-dimensional target features from the sample data of each node and classifying them with a preset classification model to obtain the initial tags of each node; performing cluster analysis on the initial tags of each node to form tag systems for different objects; and, when a target node is detected to be triggered, pushing the initial tags in the tag system of the target node. By cleaning the raw data and extracting multi-dimensional target features to obtain the initial tags of each node, and by clustering them into tag systems for different objects, the invention improves the accuracy of tag recommendation. The invention also relates to the technical field of blockchain: the initial tags are stored in a blockchain node.
Description
Technical Field
The present invention relates to the technical field of big data, and in particular to a big data-based tag push method, device, electronic device and storage medium.
Background
Traditional tagging technology extracts from data sources: it obtains user behavior data by means such as event tracking and applies all kinds of tags according to the user's behavior habits and basic information. Traditional tags are also defined from a business or product perspective, with dimension combinations and thresholds set by experience, so the vast majority of tags go unused.
The tag library of an existing enforcement business system is built around the enforcement business process. Its tags involve a large amount of manual work, and most of them come from simple collection and collation of data; the data of each business node is not cleaned, and valuable information is not extracted, when the tag library is created. As a result, users cannot quickly obtain the materials and data they want from the recommended tags, must consult a large amount of material and data before deciding each step, and the accuracy of the recommended tags is low.
Summary of the Invention
In view of the above, it is necessary to propose a big data-based tag push method, device, electronic device and storage medium that obtain the initial tags of each node by cleaning raw data and extracting multi-dimensional target features, and cluster them into tag systems for different objects, thereby improving the accuracy of tag recommendation.
A first aspect of the present invention provides a big data-based tag push method, the method including:
collecting a plurality of raw data items from a plurality of preset data sources, wherein each raw data item has a corresponding node identifier;
performing data cleaning on each raw data item according to a preset data cleaning strategy to obtain sample data;
extracting multi-dimensional target features from the sample data corresponding to each node, and classifying the multi-dimensional target features according to a preset classification model to obtain the initial tags of each node;
performing cluster analysis on the initial tags of each node to form tag systems for different objects;
when it is detected that a target node among the plurality of nodes is triggered, pushing the initial tags in the tag system corresponding to the target node.
Preferably, performing cluster analysis on the initial tags of each node to form tag systems for different objects includes:
clustering the initial tags of each node according to the k-means clustering algorithm to obtain a plurality of objects;
taking any one of the plurality of objects as a target object, and setting the target object and the initial tags corresponding to the target object as the tag system corresponding to the target object.
Preferably, after performing cluster analysis on the initial tags of each node to form tag systems for different objects, the method further includes:
monitoring, in real time, the click-through rate and conversion rate of each initial tag within a preset period;
judging whether the click-through rate of each initial tag is greater than the corresponding click-through rate threshold, and whether the conversion rate of each initial tag is greater than the corresponding conversion rate threshold;
when the click-through rate of an initial tag is greater than or equal to the corresponding click-through rate threshold and its conversion rate is greater than or equal to the corresponding conversion rate threshold, classifying the initial tag as a popular tag;
when the click-through rate of an initial tag is less than the corresponding click-through rate threshold, or its conversion rate is less than the corresponding conversion rate threshold, classifying the initial tag as a useless tag.
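The popular/useless rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_tag`, the result strings and the example thresholds are hypothetical, and the comparison uses "greater than or equal to", following the two classification steps.

```python
def classify_tag(ctr, cvr, ctr_threshold, cvr_threshold):
    """Classify one initial tag from its monitored rates.

    ctr / cvr are the click-through and conversion rates measured over the
    preset period; both thresholds are per-tag values chosen by the operator.
    """
    # Popular only when BOTH rates reach their thresholds.
    if ctr >= ctr_threshold and cvr >= cvr_threshold:
        return "popular"
    # Either rate below its threshold makes the tag useless.
    return "useless"


# Example with made-up rates and thresholds.
print(classify_tag(ctr=0.12, cvr=0.03, ctr_threshold=0.10, cvr_threshold=0.02))
```

The two branches are exhaustive: a tag that fails the "both at or above threshold" test necessarily has at least one rate below its threshold.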
Preferably, performing data cleaning on the raw data according to a preset data cleaning strategy to obtain sample data includes:
identifying the node identifier of each raw data item;
obtaining the preset data cleaning strategy corresponding to the node identifier;
cleaning the raw data corresponding to the node identifier according to the preset data cleaning strategy;
converting the cleaned raw data into structured data of a preset type;
classifying the structured data by node identifier to obtain sample data, and storing the sample data in a preset database.
Preferably, extracting multi-dimensional target features from the sample data corresponding to each node and classifying the multi-dimensional target features according to a preset classification model to obtain the initial tags of each node includes:
reading the sample data of each node from the preset database according to the node identifier of each node and the syntax rules of the query language HQL oriented to the node identifier;
extracting multi-dimensional target features from the read sample data of each node according to a preset algorithm;
inputting the multi-dimensional target features into the preset classification model for classification to obtain the initial tags of each node, wherein the initial tags are stored in a blockchain node.
Preferably, extracting multi-dimensional target features from the read sample data of each node according to a preset algorithm includes:
extracting first features from the read sample data of each node according to preset feature dimensions;
processing the read sample data of each node with a trained model to obtain second features;
merging the first features and the second features to obtain the multi-dimensional target features.
Preferably, after pushing the initial tags in the tag system corresponding to the target node, the method further includes:
when a user's reprocessing instruction for a pushed initial tag is detected, parsing the reprocessing instruction to obtain the user's reprocessing conditions;
inputting the reprocessing conditions into the preset classification model to obtain a new tag, and combining the new tag with the pushed initial tag to obtain an advanced tag;
pushing the advanced tag.
A second aspect of the present invention provides a big data-based tag push device, the device including:
a collection module, configured to collect a plurality of raw data items from a plurality of preset data sources, wherein each raw data item has a corresponding node identifier;
a cleaning module, configured to perform data cleaning on each raw data item according to a preset data cleaning strategy to obtain sample data;
a classification module, configured to extract multi-dimensional target features from the sample data corresponding to each node and classify the multi-dimensional target features according to a preset classification model to obtain the initial tags of each node;
an analysis module, configured to perform cluster analysis on the initial tags of each node to form tag systems for different objects;
a push module, configured to push the initial tags in the tag system corresponding to a target node when it is detected that the target node is triggered.
A third aspect of the present invention provides an electronic device including a processor, the processor being configured to implement the big data-based tag push method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the big data-based tag push method.
In summary, the big data-based tag push method, device, terminal and storage medium of the present invention, on the one hand, clean the raw data collected from different data sources through a preset data cleaning strategy to obtain the sample data of each node and delete problem data, which ensures the consistency and integrity of the sample data and improves its quality. On the other hand, they extract the multi-dimensional target features from the sample data of each node and input them into the preset classification model to obtain the initial tags of each node, which improves the efficiency of computing the initial tags; the initial tags of each node are also clustered into tag systems for different objects, which improves the accuracy of tag recommendation.
In addition, by monitoring the click-through rate and conversion rate of each initial tag in real time within a preset period, deleting useless tags, and continuously optimizing the whole tag system through training and learning, the timeliness of the initial tags in the tag system is ensured and the accuracy of the recommended tags is improved.
Brief Description of the Drawings
FIG. 1 is a flowchart of the big data-based tag push method provided in Embodiment 1 of the present invention.
FIG. 2 is a structural diagram of the big data-based tag push device provided in Embodiment 2 of the present invention.
FIG. 3 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present invention.
Detailed Description
So that the above objects, features and advantages of the present invention can be understood more clearly, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with each other.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in this description are for the purpose of describing specific embodiments only and are not intended to limit the present invention.
Embodiment 1
FIG. 1 is a flowchart of the big data-based tag push method provided in Embodiment 1 of the present invention.
In this embodiment, the big data-based tag push method can be applied in an electronic device. For an electronic device that needs big data-based tag push, the tag push function provided by the method of the present invention can be integrated directly on the electronic device, or run on the electronic device in the form of a Software Development Kit (SDK).
As shown in FIG. 1, the big data-based tag push method specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11: Collect a plurality of raw data items from a plurality of preset data sources, wherein each raw data item has a corresponding node identifier.
In this embodiment, the raw data includes basic information about the person subject to enforcement, basic information about the case, enforcement subject information, property information, and so on. The person subject to enforcement is a judgment defaulter ("laolai"), and the information about that person mainly includes name, ID number, age, gender, occupation and employer. Enforcement subject information mainly includes case number, identity information of the person subject to enforcement, user information, process steps involved, return status, operation time, and so on. Property information covers all property in the name of the person subject to enforcement, for example bank deposits, real estate and vehicles; for real estate, this includes the province and city where the property is located, its floor, orientation, area, and so on. The data source may be an enforcement business system, with raw data collected from each process node of that system.
S12: Perform data cleaning on each raw data item according to a preset data cleaning strategy to obtain sample data.
In this embodiment, a data cleaning strategy can be set in advance according to the cleaning conditions of the tags corresponding to each node. The preset data cleaning strategy may cover missing-value cleaning, format and content cleaning, logic-error cleaning and unneeded-data cleaning. After the raw data is collected, it is cleaned according to the preset data cleaning strategy to obtain sample data.
In this embodiment, the preset data cleaning strategy for missing-value cleaning is either to delete data records with missing values directly or to fill in data records with missing values.
For example, when the target tags of the data records with missing values are concentrated in one or a few classes, deleting those records would cause the data samples of those classes to lose a large amount of feature information, leading to model overfitting or inaccurate classification; in that case the preset data cleaning strategy is to fill in the data records with missing values.
In this embodiment, the preset data cleaning strategy for format and content cleaning is to clean data whose display format (time, date, numeric value, full-width versus half-width characters, and so on) is inconsistent, whose content contains characters that should not be there, or whose content does not match what the field should contain.
For example, when display formats such as time, date, numeric value and full-width versus half-width characters are inconsistent, the preset data cleaning strategy is to convert them to one consistent format. When the content contains characters that should not be there, for example Chinese characters in an ID number, the preset data cleaning strategy is to find possible problems through a combination of semi-automatic checking and manual review and to remove the unwanted characters.
In this embodiment, the preset data cleaning strategy for logic-error cleaning is to deduplicate, remove unreasonable values and correct contradictory content.
For example, the preset data cleaning strategy for deduplication is to delete duplicated fields and keep only one. The preset data cleaning strategy for unreasonable values is to delete the value, for example an age of 200. The preset data cleaning strategy for contradictory content is to decide, from the data source of each field, which field provides the more reliable information and to remove or rebuild the unreliable field. For instance, if the ID number is 1101031980XXXXXXXX but the age is entered as 18, it is necessary to judge whether the ID number or the age is more reliable and then rebuild or delete the contradictory content.
In this embodiment, the preset data cleaning strategy for unneeded-data cleaning is to delete fields that are not needed.
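The format and logic-error rules above can be illustrated with a small Python sketch. Everything named here is an assumption made for illustration: the field names `age` and `id_number`, the `MAX_AGE` bound, and the choice to trust the ID number over the entered age (the source only says the more reliable field should win). The sketch relies on the 18-character resident ID layout, in which characters 7 to 10 hold the birth year, and computes age as a plain year difference.

```python
import re
import unicodedata
from datetime import date

ID_PATTERN = re.compile(r"[0-9]{17}[0-9X]")  # 18-character resident ID layout
MAX_AGE = 120  # assumed plausibility bound, not taken from the source


def normalize_text(value):
    """Format cleaning: fold full-width characters to half-width, trim spaces."""
    return unicodedata.normalize("NFKC", value).strip()


def clean_record(record, today):
    """Apply format and logic-error cleaning to one record (a dict)."""
    cleaned = {
        key: normalize_text(val) if isinstance(val, str) else val
        for key, val in record.items()
    }

    # Logic-error cleaning: remove an unreasonable value such as age 200.
    age = cleaned.get("age")
    if age is not None and not 0 <= age <= MAX_AGE:
        cleaned["age"] = None

    id_number = cleaned.get("id_number")
    if id_number is not None:
        if ID_PATTERN.fullmatch(id_number):
            # Contradiction repair (assumed policy): rebuild the age from the
            # birth year in the ID number, using the year difference only.
            cleaned["age"] = today.year - int(id_number[6:10])
        else:
            # Format cleaning: e.g. Chinese characters inside an ID number.
            cleaned["id_number"] = None
    return cleaned


# Made-up record: full-width digits in the ID and a contradictory age of 18.
raw = {"name": " 张三 ", "id_number": "110103" + "198001010010", "age": 18}
print(clean_record(raw, date(2020, 6, 30)))
```

Missing-value filling and deduplication are left out to keep the sketch short; they would slot in as further steps on the `cleaned` dictionary.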
Preferably, performing data cleaning on the raw data according to a preset data cleaning strategy to obtain sample data includes:
identifying the node identifier of each raw data item;
obtaining the preset data cleaning strategy corresponding to the node identifier;
cleaning the raw data corresponding to the node identifier according to the preset data cleaning strategy;
converting the cleaned raw data into structured data of a preset type;
classifying the structured data by node identifier to obtain sample data, and storing the sample data in a preset database.
In this embodiment, the preset database may be a Hive database. Hive is a data warehouse tool based on Hadoop that can store structured data, provides full SQL query functionality, and can convert SQL statements into MapReduce jobs for execution. The raw data is cleaned through the preset data cleaning strategy and converted into structured data of a preset type; the structured data is then classified by node identifier to obtain sample data, and the sample data is stored in the preset database.
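The per-node flow above (look up the strategy by node identifier, clean, then group by node identifier) might look like the following Python sketch. The node names, the two cleaning steps, the `node_id`/`payload` layout of a raw item and the plain dictionary standing in for the Hive database are all hypothetical.

```python
from collections import defaultdict


def strip_strings(record):
    """Format cleaning step: trim surrounding whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_unneeded(record):
    """Unneeded-data cleaning step: remove a field the tags never use."""
    return {k: v for k, v in record.items() if k != "debug_note"}


# Hypothetical mapping from node identifier to its preset cleaning strategy.
STRATEGIES = {
    "case_filing": [strip_strings],
    "property_check": [strip_strings, drop_unneeded],
}


def clean_and_group(raw_items):
    """Clean each raw item with its node's strategy and group by node id."""
    samples = defaultdict(list)
    for item in raw_items:
        node_id = item["node_id"]               # identify the node identifier
        record = dict(item["payload"])          # structured copy of the data
        for step in STRATEGIES.get(node_id, []):
            record = step(record)               # apply the preset strategy
        samples[node_id].append(record)         # classify by node identifier
    return dict(samples)                        # stand-in for the database


raw_items = [
    {"node_id": "case_filing", "payload": {"case_no": " A-001 "}},
    {"node_id": "property_check", "payload": {"city": "Beijing ", "debug_note": "x"}},
]
print(clean_and_group(raw_items))
```

Keeping the strategies as lists of small functions makes it straightforward to give each node its own combination of cleaning steps.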
Further, the method also includes:
placing problem data in the raw data that does not conform to the preset data cleaning strategy into a problem database;
ending the processing of the problem data when no re-cleaning instruction is received within a preset time period;
and deleting the problem data at the same time.
In this embodiment, if problem data appears while data is being cleaned through the preset data cleaning strategy, the problem data can be stored in the problem database; if no re-cleaning instruction is received within the preset time period, the problem data is determined to be deletable.
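The problem-data handling above amounts to a holding area with an expiry. A minimal Python sketch follows, assuming an in-memory store and caller-supplied timestamps in seconds; the class and method names are hypothetical and do not come from the source.

```python
class ProblemStore:
    """In-memory stand-in for the problem database."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._items = {}  # key -> (record, time the record was stored)

    def add(self, key, record, now):
        """Park a record that failed the preset cleaning strategy."""
        self._items[key] = (record, now)

    def take_for_recleaning(self, key):
        """A re-cleaning instruction arrived: hand the record back."""
        record, _stored_at = self._items.pop(key)
        return record

    def purge_expired(self, now):
        """Delete records that waited longer than the preset period."""
        expired = [
            key
            for key, (_record, stored_at) in self._items.items()
            if now - stored_at > self.retention_seconds
        ]
        for key in expired:
            del self._items[key]
        return expired


store = ProblemStore(retention_seconds=60)
store.add("rec-1", {"age": 200}, now=0)
print(store.purge_expired(now=61))
```

Passing `now` explicitly, instead of reading the clock inside the class, keeps the expiry rule easy to test.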
In this embodiment, cleaning the raw data through the preset data cleaning strategy and deleting problem data ensures the consistency and integrity of the resulting sample data and improves its quality.
S13: Extract multi-dimensional target features from the sample data corresponding to each node, and classify the multi-dimensional target features according to a preset classification model to obtain the initial tags of each node.
In this embodiment, different nodes carry different data and therefore correspond to different sample data. Multi-dimensional target features are extracted from the sample data corresponding to each node and fed into the pre-trained classification model to obtain the initial tags of each node, and the initial tags of the nodes are organized into a tag library.
Preferably, extracting multi-dimensional target features from the sample data corresponding to each node and classifying the multi-dimensional target features according to a preset classification model to obtain the initial tags of each node includes:
reading the sample data of each node from the preset database according to the node identifier of each node and the syntax rules of the query language HQL oriented to the node identifier;
extracting multi-dimensional target features from the read sample data of each node according to a preset algorithm;
inputting the multi-dimensional target features into the preset classification model for classification to obtain the initial tags of each node, wherein the initial tags are stored in a blockchain node.
In this embodiment, the sample data differs from node to node. The sample data of the corresponding node is read from the preset database using the syntax rules of the query language HQL, and multi-dimensional target features are extracted from the read sample data of each node using the preset algorithm. The preset algorithm is prior art and is not described in detail here.
It should be emphasized that, to further ensure the privacy and security of the initial tags, the initial tags can also be stored in a node of a blockchain.
In this embodiment, obtaining the initial tags of each node by inputting the multi-dimensional target features into the preset classification model improves the efficiency of computing the initial tags.
Further, extracting multi-dimensional target features from the read sample data of each node according to a preset algorithm includes:
extracting first features from the read sample data of each node according to preset feature dimensions;
processing the read sample data of each node with a trained model to obtain second features;
merging the first features and the second features to obtain the multi-dimensional target features.
In this embodiment, the multi-dimensional target features include basic features and behavioral features. Basic features describe the natural attributes of the person subject to enforcement, for example gender and age; behavioral features are produced by that person's behavior, for example no property, no house, no car.
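A short Python sketch of the "first features plus second features" merge described above. The source does not specify the trained model, so `toy_model` below is a hand-written stand-in, and the field and feature names are hypothetical.

```python
def extract_target_features(sample, preset_dimensions, model):
    """Merge preset-dimension features with model-derived features."""
    # First features: values read directly along the preset dimensions.
    first = {dim: sample.get(dim) for dim in preset_dimensions}
    # Second features: whatever the trained model derives from the sample.
    second = model(sample)
    # Merge into one multi-dimensional feature dictionary.
    return {**first, **second}


def toy_model(sample):
    """Stand-in for the trained model: derives simple behavioral flags."""
    return {
        "no_house": len(sample.get("houses", [])) == 0,
        "no_car": len(sample.get("vehicles", [])) == 0,
    }


sample = {"gender": "F", "age": 40, "houses": [], "vehicles": ["sedan"]}
print(extract_target_features(sample, ["gender", "age"], toy_model))
```

If a model feature shares a name with a preset dimension, the model's value wins in this sketch; a real pipeline would more likely keep the two namespaces apart.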
In this embodiment, unlike a traditional tag system, the tag system is built by analyzing the raw data of each node in the business process, cleaning that raw data to obtain the sample data of each node, and extracting the multi-dimensional target features from the sample data of each node to obtain its initial tags, which ensures the accuracy of the initial tags.
S14:对所述每个节点的初始标签进行聚类分析形成不同对象的标签体系。S14: Perform cluster analysis on the initial labels of each node to form label systems of different objects.
本实施例中,所述对象可以根据初始标签所属的主体维度来分,例如,可以分为案件标签、被执行人标签、财产信息标签等;所述对象也可以根据初始标签的应用的策略模型来分,例如,可以分为财产控制模型标签等。In this embodiment, the objects can be classified according to the subject dimension to which the initial label belongs, for example, it can be classified into a case label, an enforcee label, a property information label, etc.; the objects can also be classified according to the strategy model of the application of the initial label To points, for example, can be divided into property control model tags and so on.
优选的,所述对所述每个节点的初始标签进行聚类分析形成不同对象的标签体系包括:Preferably, the cluster analysis of the initial labels of each node to form a label system for different objects includes:
根据k均值聚类算法对所述每个节点的初始标签进行聚类,获得多个对象;According to the k-means clustering algorithm, the initial label of each node is clustered to obtain a plurality of objects;
以所述多个对象中的任一对象作为目标对象,将所述目标对象及所述目标对象对应的初始标签设置为所述目标对象对应的标签体系。Taking any one of the multiple objects as the target object, the target object and the initial label corresponding to the target object are set as the label system corresponding to the target object.
本实施例中,所述k均值聚类算法是一种迭代求解的聚类分析算法,其步骤是随机选取K个对象作为初始的聚类中心,然后计算每个对象与各个种子聚类中心之间的距离,把每个对象分配给距离它最近的聚类中心;聚类中心以及分配给它们的对象就代表一个聚类;每分配一个样本,聚类的聚类中心会根据聚类中现有的对象被重新计算;这个过程将不断重复直到满足预设终止条件,其中,预设终止条件可以是没有对象被重新分配给不同的聚类,没有聚类中心再发生变化,误差平方和局部最小。In this embodiment, the k-means clustering algorithm is an iterative clustering analysis algorithm. The steps are to randomly select K objects as the initial cluster centers, and then calculate the difference between each object and each seed cluster center. The distance between each object is assigned to the cluster center closest to it; the cluster center and the objects assigned to them represent a cluster; each time a sample is assigned, the cluster center of the cluster will be Some objects are recalculated; this process is repeated until a preset termination condition is met, where the preset termination condition can be that no objects are reassigned to different clusters, no cluster centers change again, and the sum of squared errors is local. minimum.
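The iteration described above can be sketched as follows. This is an illustrative sketch only: it assumes the initial labels have already been encoded as numeric vectors (the encoding is not specified here), and it uses the common batch form of k-means, which recomputes the centers after each full assignment pass rather than after every single assignment. The `initial_centers` parameter is a hypothetical convenience for reproducible runs.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(
    points: List[Point],
    k: int,
    initial_centers: Optional[List[Point]] = None,
    max_iter: int = 100,
) -> Tuple[List[int], List[Point]]:
    # Randomly pick K objects as initial centers unless the caller fixes them.
    centers = list(initial_centers) if initial_centers else random.sample(points, k)
    assignment: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every object to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Termination condition: no object changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center from the objects currently assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = centroid(members)
    return assignment, centers


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
assignment, centers = k_means(points, k=2, initial_centers=[points[0], points[3]])
```

With the two seeds fixed as above, the first three vectors fall into cluster 0 and the last three into cluster 1; with random seeds the cluster indices may be swapped.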
In this embodiment, clustering the initial labels with the k-means clustering algorithm to obtain a plurality of objects improves the accuracy of the resulting label systems for different objects.
Further, after performing cluster analysis on the initial labels of each node to form label systems for different objects, the method further includes:
Monitoring, in real time, the click-through rate and conversion rate of each initial label within a preset period;
Determining whether the click-through rate of each initial label is greater than the corresponding click-through rate threshold, and determining whether the conversion rate of each initial label is greater than the corresponding conversion rate threshold;
When the click-through rate of an initial label is greater than or equal to the corresponding click-through rate threshold and its conversion rate is greater than or equal to the corresponding conversion rate threshold, classifying the initial label as a popular label;
When the click-through rate of an initial label is less than the corresponding click-through rate threshold, or its conversion rate is less than the corresponding conversion rate threshold, classifying the initial label as a useless label.
Further, the method also includes:
When the initial label is a popular label, retaining the initial label;
When the initial label is a useless label, deleting the initial label.
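The classification and pruning rules can be sketched as below. This is a minimal sketch that assumes the monitored rates and the per-label thresholds are already available as plain numbers; the label names and threshold values are made up for illustration. It follows the "greater than or equal to" rule of the classification step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelStats:
    name: str
    click_rate: float            # observed within the preset period
    conversion_rate: float       # share of pushes converted into advanced labels
    click_threshold: float       # threshold corresponding to this label
    conversion_threshold: float  # threshold corresponding to this label


def classify(stats: LabelStats) -> str:
    """Popular only when both rates reach their thresholds, otherwise useless."""
    if (stats.click_rate >= stats.click_threshold
            and stats.conversion_rate >= stats.conversion_threshold):
        return "popular"
    return "useless"


def prune(label_stats: List[LabelStats]) -> List[str]:
    """Retain popular labels and drop useless ones."""
    return [s.name for s in label_stats if classify(s) == "popular"]


observed = [
    LabelStats("bank_deposit", 0.30, 0.12, 0.10, 0.05),
    LabelStats("local_real_estate", 0.10, 0.05, 0.10, 0.05),  # exactly at both thresholds
    LabelStats("vehicle", 0.25, 0.01, 0.10, 0.05),            # conversion rate too low
    LabelStats("orientation", 0.02, 0.20, 0.10, 0.05),        # click-through rate too low
]
kept = prune(observed)
```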
In this embodiment, the conversion rate refers to the rate at which initial labels are converted into advanced labels.
In this embodiment, monitoring the click-through rate and conversion rate of each initial label in real time within a preset period, deleting useless labels, and continuously optimizing the whole label system through training ensures the timeliness of the initial labels in the label system and improves the accuracy of the recommended labels.
S15: When it is detected that a target node is triggered, pushing the initial labels in the label system corresponding to the target node.
For example, when it is detected that the property control node has been triggered, the property control model labels in the label system corresponding to the property control model are pushed to help the user decide quickly on the priority of property control, such as whether to first freeze the bank deposits of the person subject to enforcement or first seize that person's local real estate, which improves the user's case-handling efficiency.
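One way to picture the trigger-and-push step is a lookup from node identifier to label system followed by a delivery callback. This is a sketch under assumptions: the class `LabelPusher`, the `send` callback, and the label names are hypothetical, and how node triggers are actually detected is left out.

```python
from typing import Callable, Dict, List, Tuple


class LabelPusher:
    """Pushes the labels of the label system mapped to a triggered node."""

    def __init__(
        self,
        label_systems: Dict[str, List[str]],
        send: Callable[[str, List[str]], None],
    ) -> None:
        self.label_systems = label_systems
        self.send = send

    def on_node_triggered(self, node_id: str) -> List[str]:
        labels = self.label_systems.get(node_id, [])
        if labels:  # nothing is pushed for nodes without a label system
            self.send(node_id, labels)
        return labels


outbox: List[Tuple[str, List[str]]] = []
pusher = LabelPusher(
    {"property_control": ["freeze_bank_deposit_first", "seize_local_real_estate_first"]},
    lambda node_id, labels: outbox.append((node_id, labels)),
)
pusher.on_node_triggered("property_control")
pusher.on_node_triggered("unknown_node")
```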
In this embodiment, labels are supplied directly to the different application models, which reduces the strategy computation the models have to perform.
Further, after pushing the initial labels in the label system corresponding to the target node, the method further includes:
When a reprocessing instruction from the user for a pushed initial label is detected, parsing the reprocessing instruction to obtain the user's reprocessing conditions;
Inputting the reprocessing conditions into the preset classification model to obtain a new label, and combining the new label with the pushed initial label to obtain an advanced label;
Pushing the advanced label.
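The reprocessing flow might be sketched as follows. Everything concrete here is an assumption made for illustration: the `key=value;key=value` instruction format, the function `toy_classifier` standing in for the preset classification model, and the choice to represent the "combination" as a simple pair of labels, none of which are specified in the text.

```python
from typing import Callable, Dict, Tuple


def parse_reprocessing_instruction(instruction: str) -> Dict[str, str]:
    """Parse a hypothetical 'key=value;key=value' instruction into conditions."""
    conditions: Dict[str, str] = {}
    for part in instruction.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            conditions[key.strip()] = value.strip()
    return conditions


def build_advanced_label(
    initial_label: str,
    instruction: str,
    classify: Callable[[Dict[str, str]], str],
) -> Tuple[str, str]:
    """Derive a new label from the conditions and pair it with the pushed label."""
    conditions = parse_reprocessing_instruction(instruction)
    new_label = classify(conditions)
    return (initial_label, new_label)


def toy_classifier(conditions: Dict[str, str]) -> str:
    # Hypothetical stand-in for the preset classification model.
    if conditions.get("region") == "local" and conditions.get("asset") == "real_estate":
        return "local_real_estate"
    return "other"


advanced = build_advanced_label(
    "property_control", "region=local; asset=real_estate", toy_classifier
)
```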
In this embodiment, the reprocessing instruction fed back by the user is parsed, the reprocessing conditions are input into the preset classification model to obtain a new label, and the new label is combined with the pushed initial label to obtain an advanced label. This responds to the user promptly, makes label recommendation more timely, and improves case-handling efficiency.
In summary, the big-data-based label pushing method of this embodiment collects raw data from a plurality of preset data sources, where the raw data carries corresponding node identifiers; cleans the raw data according to a preset data cleaning strategy to obtain sample data for a plurality of nodes; extracts multi-dimensional target features from the sample data of each node and classifies them according to a preset classification model to obtain the initial labels of each node; performs cluster analysis on the initial labels of each node to form label systems for different objects; and, when it is detected that a target node among the plurality of nodes is triggered, pushes the initial labels in the label system corresponding to the target node.
On the one hand, the big-data-based label pushing method of this embodiment cleans the raw data collected from different data sources with a preset data cleaning strategy to obtain the sample data of each node and deletes problem data, which ensures the consistency and completeness of the sample data and improves its quality. On the other hand, it extracts the multi-dimensional target features from the sample data of each node and inputs them into a preset classification model to obtain the initial labels of each node, which improves the efficiency of computing the initial labels; it also clusters the initial labels of each node into label systems for different objects, which improves the accuracy of label recommendation.
In addition, monitoring the click-through rate and conversion rate of each initial label in real time within a preset period, deleting useless labels, and continuously optimizing the whole label system through training ensures the timeliness of the initial labels in the label system and improves the accuracy of the recommended labels.
Embodiment 2
FIG. 2 is a structural diagram of the big-data-based label pushing device provided in Embodiment 2 of the present invention.
In some embodiments, the big-data-based label pushing device 20 may include a plurality of functional modules composed of program code segments. The program code of each segment in the big-data-based label pushing device 20 may be stored in the memory of an electronic device and executed by the at least one processor to perform the big-data-based label pushing (see the description of FIG. 1 for details).
In this embodiment, the big-data-based label pushing device 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a collection module 201, a cleaning module 202, a classification module 203, an analysis module 204, a monitoring module 205, a judgment module 206, and a pushing module 207. A module in the present invention refers to a series of computer program segments that can be executed by at least one processor, that perform a fixed function, and that are stored in a memory. In this embodiment, the functions of each module are detailed in the subsequent embodiments.
Collection module 201: configured to collect a plurality of raw data items from a plurality of preset data sources, where each raw data item has a corresponding node identifier.
In this embodiment, the raw data includes basic information on the party subject to enforcement, basic information on the case, enforcing-entity information, property information, and so on. The person subject to enforcement refers to a judgment defaulter (colloquially, "laolai"), and the information on that person mainly includes name, ID number, age, gender, occupation, and employer. The enforcing-entity information mainly includes case number, identity information of the person subject to enforcement, user information, process steps involved, return status, operation time, and so on. Property information refers to all property held in the name of the person subject to enforcement, for example bank deposits, real estate, and vehicles; for real estate, it includes the province and city where the property is located, its floor, orientation, area, and so on. The data source may be an enforcement business system, with raw data collected from each process node of that system.
Cleaning module 202: configured to clean each raw data item according to a preset data cleaning strategy to obtain sample data.
In this embodiment, a data cleaning strategy may be set in advance according to the cleaning conditions of the labels corresponding to each node. The preset data cleaning strategy may cover missing-value cleaning, format and content cleaning, logical-error cleaning, and non-required-data cleaning. After the raw data is collected, it is cleaned according to the preset data cleaning strategy to obtain sample data.
In this embodiment, the preset data cleaning strategy for missing-value cleaning is either to delete data records with missing values directly or to complete them.
For example, when the target labels of the data records with missing values are concentrated in one or a few classes, deleting those records would cause the data samples of the corresponding classes to lose a large amount of feature information, leading to model overfitting or inaccurate classification; in that case the preset data cleaning strategy is to complete the records with missing values.
In this embodiment, the preset data cleaning strategy for format and content cleaning is to clean data whose display format (time, date, numeric value, full-width or half-width characters, and so on) is inconsistent, whose content contains characters that should not be present, or whose content does not match what the field should contain.
For example, when display formats such as time, date, numeric value, and full-width or half-width characters are inconsistent, the preset data cleaning strategy is to convert them to a consistent format; when the content contains characters that should not be present, for example Chinese characters in an ID number, the preset data cleaning strategy is to locate possible problems through a combination of semi-automatic checking and manual review and to remove the unwanted characters.
In this embodiment, the preset data cleaning strategy for logical-error cleaning is deduplication, removal of unreasonable values, and correction of contradictory content.
For example, the preset strategy for deduplication is to delete duplicate fields and keep only one. The preset strategy for removing unreasonable values is, for an age of 200, to delete the unreasonable age value. The preset strategy for correcting contradictory content is to determine, from the data source of each field, which field provides the more reliable information, and to remove or reconstruct the unreliable field; for example, if the ID number is 1101031980XXXXXXXX but the age is given as 18, it is necessary to judge whether the ID number or the age is more reliable and then reconstruct or delete the contradictory content.
In this embodiment, the preset data cleaning strategy for non-required-data cleaning is to delete fields that are not needed.
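The four cleaning strategies can be illustrated on a single record as follows. This is a sketch under stated assumptions, not the cleaning strategy itself: the field names (`id_number`, `age`, `filed_on`), the accepted date layouts, the 150-year age ceiling, and the rule that the ID number is trusted over the age field are all illustrative choices. Age is approximated from the birth year only, which occupies characters 7 to 10 of an 18-character ID number.

```python
import unicodedata
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional

Record = Dict[str, object]
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")  # assumed input layouts


def to_half_width(text: str) -> str:
    """Format cleaning: NFKC folds full-width letters and digits to half-width."""
    return unicodedata.normalize("NFKC", text).strip()


def normalize_date(text: str) -> Optional[str]:
    """Format cleaning: coerce the assumed date layouts to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def age_from_id(id_number: str, today: date) -> Optional[int]:
    """Approximate age from the birth year in an 18-character ID number."""
    year = id_number[6:10]
    if len(id_number) == 18 and year.isdigit():
        return today.year - int(year)
    return None


def clean_record(record: Record, keep: Iterable[str], today: date, max_age: int = 150) -> Record:
    # Non-required-data cleaning: keep only the fields that are needed.
    out: Record = {key: record.get(key) for key in keep}
    # Format cleaning on every string field, then on the date field.
    for key in list(out):
        value = out[key]
        if isinstance(value, str):
            out[key] = to_half_width(value)
    filed_on = out.get("filed_on")
    if isinstance(filed_on, str):
        out["filed_on"] = normalize_date(filed_on)
    # Logical-error cleaning: discard unreasonable ages such as 200.
    age = out.get("age")
    if not isinstance(age, int) or not 0 <= age <= max_age:
        age = None
    # Contradiction and missing-value handling. Assumption: the ID number is
    # the more reliable source, so it overrides or completes the age field.
    id_age = age_from_id(str(out.get("id_number") or ""), today)
    if id_age is not None:
        age = id_age
    out["age"] = age
    return out


def deduplicate(records: List[Record], key: str = "id_number") -> List[Record]:
    """Logical-error cleaning: keep the first record for each key value."""
    seen, unique = set(), []
    for rec in records:
        if rec.get(key) not in seen:
            seen.add(rec.get(key))
            unique.append(rec)
    return unique


raw = {"id_number": "１１０１０３１９８０XXXXXXXX", "age": 18, "filed_on": "2020/06/30", "debug_note": "x"}
cleaned = clean_record(raw, ["id_number", "age", "filed_on"], today=date(2020, 6, 30))
```

In this run the full-width digits are folded to half-width, the date is rewritten in ISO form, the unneeded `debug_note` field is dropped, and the contradictory age of 18 is replaced by the value implied by the birth year 1980.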
Preferably, the cleaning of the raw data by the cleaning module 202 according to the preset data cleaning strategy to obtain sample data includes:
Identifying the node identifier of each raw data item;
Obtaining the preset data cleaning strategy corresponding to the node identifier;
Cleaning the raw data corresponding to the node identifier according to the preset data cleaning strategy;
Converting the cleaned raw data into structured data of a preset type;
Classifying the structured data by node identifier to obtain sample data, and storing the sample data in a preset database.
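The steps above can be sketched as a small pipeline. This is a minimal sketch assuming each raw item is a dictionary carrying a `node_id` key and each strategy is a function that returns the cleaned item or `None` for problem data; the node name `filing`, the column list, and the strategy `require_case_number` are hypothetical. Writing the result to the preset database is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Raw = Dict[str, object]
Strategy = Callable[[Raw], Optional[Raw]]
Row = Tuple[object, ...]


def clean_by_node(
    raw_items: Sequence[Raw],
    strategies: Dict[str, Strategy],
    columns: Sequence[str],
) -> Tuple[Dict[str, List[Row]], List[Raw]]:
    samples: Dict[str, List[Row]] = defaultdict(list)
    problems: List[Raw] = []
    for item in raw_items:
        # Identify the node identifier and look up its cleaning strategy.
        node_id = str(item.get("node_id"))
        strategy = strategies.get(node_id)
        cleaned = strategy(item) if strategy is not None else None
        if cleaned is None:
            problems.append(item)  # set aside as problem data
            continue
        # Convert to a fixed-column row (the structured form), grouped by node.
        samples[node_id].append(tuple(cleaned.get(col) for col in columns))
    return dict(samples), problems


def require_case_number(item: Raw) -> Optional[Raw]:
    # Hypothetical strategy: a record without a case number is problem data.
    return item if item.get("case_number") else None


raw_items = [
    {"node_id": "filing", "case_number": "A-1", "amount": 100},
    {"node_id": "filing", "case_number": "", "amount": 5},
    {"node_id": "unknown", "case_number": "B-2", "amount": 7},
]
samples, problems = clean_by_node(
    raw_items, {"filing": require_case_number}, ["case_number", "amount"]
)
```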
In this embodiment, the preset database may be a Hive database. Hive is a data warehouse tool based on Hadoop that can store structured data, provides full SQL query functionality, and can convert SQL statements into MapReduce jobs for execution. The raw data is cleaned by the preset data cleaning strategy and converted into structured data of a preset type, the structured data is classified by node identifier to obtain sample data, and the sample data is stored in the preset database.
Further, during data cleaning, problem data in the raw data that does not conform to the preset data cleaning strategy is placed in a problem database; when no re-cleaning instruction is received within a preset time period, processing of the problem data ends and the problem data is deleted.
In this embodiment, if problem data appears while data is being cleaned by the preset data cleaning strategy, the problem data may be stored in the problem database; if no re-cleaning instruction is received within the preset time period, the problem data is determined to be deletable.
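The handling of problem data can be pictured as a holding area with an expiry time. This is an illustrative sketch only: the class `ProblemStore`, its in-memory dictionary (standing in for the problem database), and the use of plain numeric timestamps are assumptions.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


class ProblemStore:
    """Holds problem data until it is re-cleaned or the waiting period expires."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._items: Dict[str, Tuple[Record, float]] = {}

    def add(self, key: str, record: Record, now: float) -> None:
        self._items[key] = (record, now)

    def take_for_recleaning(self, key: str) -> Record:
        """A re-cleaning instruction arrived: hand the record back for another pass."""
        record, _ = self._items.pop(key)
        return record

    def purge_expired(self, now: float) -> List[str]:
        """Delete entries for which no re-cleaning instruction came in time."""
        expired = [
            key for key, (_, stored_at) in self._items.items()
            if now - stored_at >= self.ttl_seconds
        ]
        for key in expired:
            del self._items[key]
        return expired


store = ProblemStore(ttl_seconds=60)
store.add("a", {"age": 200}, now=0)
store.add("b", {"id_number": "abc"}, now=30)
deleted = store.purge_expired(now=60)       # only "a" has waited the full period
recovered = store.take_for_recleaning("b")  # "b" is re-cleaned before it expires
```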
In this embodiment, cleaning the raw data with the preset data cleaning strategy and deleting problem data ensures the consistency and completeness of the resulting sample data and improves its quality.
Classification module 203: configured to extract multi-dimensional target features from the sample data corresponding to each node, and to classify the multi-dimensional target features according to a preset classification model to obtain the initial labels of each node.
In this embodiment, because different nodes correspond to different data and therefore to different sample data, multi-dimensional target features are extracted from the sample data corresponding to each node and fed through the pre-trained classification model to obtain the initial labels of each node, and the initial labels of the plurality of nodes are organized into a label library.
Preferably, the extraction of multi-dimensional target features from the sample data corresponding to each node and their classification according to the preset classification model by the classification module 203 to obtain the initial labels of each node includes:
Reading the sample data of each node from the preset database according to the node identifier of each node and the syntax rules of the query language HQL for that node identifier;
Extracting multi-dimensional target features from the read sample data of each node according to a preset algorithm;
Inputting the multi-dimensional target features into the preset classification model for classification to obtain the initial labels of each node, where the initial labels are stored in a blockchain node.
In this embodiment, the sample data differs from node to node. The sample data of the corresponding node is read from the preset database using the syntax rules of the query language HQL, and a preset algorithm is used to extract multi-dimensional target features from the read sample data of each node. The preset algorithm is prior art and is not described in detail here.
It should be emphasized that, to further ensure the privacy and security of the initial labels, the initial labels may also be stored in a node of a blockchain.
In this embodiment, inputting the multi-dimensional target features into the preset classification model to obtain the initial labels of each node improves the efficiency of computing the initial labels.
Further, extracting multi-dimensional target features from the read sample data of each node according to the preset algorithm includes:
Extracting a first feature from the read sample data of each node according to preset feature dimensions;
Processing the read sample data of each node with a trained model to obtain a second feature;
Merging the first feature and the second feature to obtain the multi-dimensional target features.
In this embodiment, the multi-dimensional target features include basic features and behavioral features. The basic features describe the natural attributes of the party subject to enforcement, for example, gender and age; the behavioral features are derived from that party's behavior, for example, no property, no house, or no car.
In this embodiment, unlike a traditional label system, the label system is built by analyzing the raw data of each node in the business process, cleaning the raw data to obtain the sample data of each node, and extracting the multi-dimensional target features from each node's sample data to obtain the initial labels of each node, which ensures the accuracy of the initial labels.
Analysis module 204: configured to perform cluster analysis on the initial labels of each node to form label systems for different objects.
In this embodiment, the objects may be divided according to the subject dimension to which an initial label belongs, for example, into case labels, labels for the person subject to enforcement, property information labels, and so on; the objects may also be divided according to the strategy model in which the initial label is applied, for example, into property control model labels.
Preferably, the cluster analysis performed by the analysis module 204 on the initial labels of each node to form label systems for different objects includes:
Clustering the initial labels of each node according to the k-means clustering algorithm to obtain a plurality of objects;
Taking any one of the plurality of objects as a target object, and setting the target object together with its corresponding initial labels as the label system of that target object.
In this embodiment, the k-means clustering algorithm is an iteratively solved cluster analysis algorithm. It randomly selects K objects as the initial cluster centers, computes the distance between each object and each seed cluster center, and assigns each object to the nearest cluster center; a cluster center together with the objects assigned to it represents one cluster. Each time a sample is assigned, the center of the cluster is recomputed from the objects currently in the cluster. This process repeats until a preset termination condition is met, where the preset termination condition may be that no object is reassigned to a different cluster, that no cluster center changes any more, or that the sum of squared errors reaches a local minimum.
In this embodiment, clustering the initial labels with the k-means clustering algorithm to obtain a plurality of objects improves the accuracy of the resulting label systems for different objects.
Further, after the analysis module 204 performs cluster analysis on the initial labels of each node to form label systems for different objects, the monitoring module 205 is configured to monitor, in real time, the click-through rate and conversion rate of each initial label within a preset period.
Judgment module 206: configured to determine whether the click-through rate of each initial label is greater than the corresponding click-through rate threshold, and to determine whether the conversion rate of each initial label is greater than the corresponding conversion rate threshold.
In this embodiment, when the click-through rate of an initial label is greater than or equal to the corresponding click-through rate threshold and its conversion rate is greater than or equal to the corresponding conversion rate threshold, the initial label is classified as a popular label.
In this embodiment, when the click-through rate of an initial label is less than the corresponding click-through rate threshold, or its conversion rate is less than the corresponding conversion rate threshold, the initial label is classified as a useless label.
Further, after the initial labels are divided into useless labels and popular labels, the type of each initial label is determined: when the initial label is a popular label, the initial label is retained; when the initial label is a useless label, the initial label is deleted.
In this embodiment, the conversion rate refers to the rate at which initial labels are converted into advanced labels.
In this embodiment, monitoring the click-through rate and conversion rate of each initial label in real time within a preset period, deleting useless labels, and continuously optimizing the whole label system through training ensures the timeliness of the initial labels in the label system and improves the accuracy of the recommended labels.
Pushing module 207: configured to push the initial labels in the label system corresponding to a target node when it is detected that the target node is triggered.
For example, when it is detected that the property control node has been triggered, the property control model labels in the label system corresponding to the property control model are pushed to help the user decide quickly on the priority of property control, such as whether to first freeze the bank deposits of the person subject to enforcement or first seize that person's local real estate, which improves the user's case-handling efficiency.
In this embodiment, labels are supplied directly to the different application models, which reduces the strategy computation the models have to perform.
Further, after the pushing module 207 pushes the initial labels in the label system corresponding to the target node, when a reprocessing instruction from the user for a pushed initial label is detected, the reprocessing instruction is parsed to obtain the user's reprocessing conditions; the reprocessing conditions are input into the preset classification model to obtain a new label, and the new label is combined with the pushed initial label to obtain an advanced label; and the advanced label is pushed.
In this embodiment, the reprocessing instruction fed back by the user is parsed, the reprocessing conditions are input into the preset classification model to obtain a new label, and the new label is combined with the pushed initial label to obtain an advanced label. This responds to the user promptly, makes label recommendation more timely, and improves case-handling efficiency.
In summary, the big-data-based label pushing device of this embodiment collects raw data from a plurality of preset data sources, where the raw data carries corresponding node identifiers; cleans the raw data according to a preset data cleaning strategy to obtain sample data for a plurality of nodes; extracts multi-dimensional target features from the sample data of each node and classifies them according to a preset classification model to obtain the initial labels of each node; performs cluster analysis on the initial labels of each node to form label systems for different objects; and, when it is detected that a target node among the plurality of nodes is triggered, pushes the initial labels in the label system corresponding to the target node.
On the one hand, the big-data-based label pushing method of this embodiment cleans the raw data collected from different data sources with a preset data cleaning strategy to obtain the sample data of each node and deletes problem data, which ensures the consistency and completeness of the sample data and improves its quality. On the other hand, it extracts the multi-dimensional target features from the sample data of each node and inputs them into a preset classification model to obtain the initial labels of each node, which improves the efficiency of computing the initial labels; it also clusters the initial labels of each node into label systems for different objects, which improves the accuracy of label recommendation.
In addition, monitoring the click-through rate and conversion rate of each initial label in real time within a preset period, deleting useless labels, and continuously optimizing the whole label system through training ensures the timeliness of the initial labels in the label system and improves the accuracy of the recommended labels.
实施例三Embodiment 3
参阅图3所示,为本发明实施例三提供的电子设备的结构示意图。在本发明较佳实施例中,所述电子设备3包括存储器31、至少一个处理器32、至少一条通信总线33及收发器34。Referring to FIG. 3 , it is a schematic structural diagram of an electronic device according to Embodiment 3 of the present invention. In a preferred embodiment of the present invention, the electronic device 3 includes a
本领域技术人员应该了解,图3示出的电子设备的结构并不构成本发明实施例的限定,既可以是总线型结构,也可以是星形结构,所述电子设备3还可以包括比图示更多或更少的其他硬件或者软件,或者不同的部件布置。Those skilled in the art should understand that the structure of the electronic device shown in FIG. 3 does not constitute a limitation of the embodiments of the present invention, and may be a bus-type structure or a star-shaped structure, and the electronic device 3 may also include a ratio more or less other hardware or software, or a different arrangement of components is shown.
在一些实施例中,所述电子设备3是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的电子设备,其硬件包括但不限于微处理器、专用集成电路、可编程门阵列、数字处理器及嵌入式设备等。所述电子设备3还可包括客户设备,所述客户设备包括但不限于任何一种可与客户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互的电子产品,例如,个人计算机、平板电脑、智能手机、数码相机等。In some embodiments, the electronic device 3 is an electronic device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits , programmable gate arrays, digital processors and embedded devices. The electronic device 3 may also include a client device, which includes but is not limited to any electronic product that can perform human-computer interaction with a client through a keyboard, a mouse, a remote control, a touchpad, or a voice-activated device, for example, Personal computers, tablets, smartphones, digital cameras, etc.
It should be noted that the electronic device 3 is only an example. Other existing or future electronic products that can be adapted to the present invention should also fall within the protection scope of the present invention and are incorporated herein by reference.
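In some embodiments, the memory 31 is used to store program code and various data, for example the big-data-based tag push device 20 installed in the electronic device 3, and to provide high-speed, automatic access to programs or data while the electronic device 3 is running. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.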
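In some embodiments, the at least one processor 32 may be composed of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is the control unit of the electronic device 3. It connects the components of the electronic device 3 through various interfaces and lines, and it performs the functions of the electronic device 3 and processes data, for example the big-data-based tag push function, by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31.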
In some embodiments, the at least one communication bus 33 is configured to enable connection and communication between the memory 31, the at least one processor 32, and other components.
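Although not shown, the electronic device 3 may also include a power source (such as a battery) that supplies power to the components. Preferably, the power source is logically connected to the at least one processor 32 through a power management device, so that charging, discharging, and power consumption are managed through the power management device. The power source may also include any of one or more DC or AC power supplies, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components. The electronic device 3 may also include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described further here.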
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The integrated units described above, when implemented in the form of software functional modules, may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, or the like) or a processor to execute part of the methods described in the embodiments of the present invention.
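In a further embodiment, with reference to FIG. 2, the at least one processor 32 may execute the operating device of the electronic device 3 as well as the installed applications (such as the big-data-based tag push device 20), program code, and the like, for example the modules described above.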
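The memory 31 stores program code, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the modules described in FIG. 2 are program code stored in the memory 31 and executed by the at least one processor 32, thereby implementing the functions of those modules to achieve big-data-based tag pushing.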
In one embodiment of the present invention, the memory 31 stores a plurality of instructions, and the instructions are executed by the at least one processor 32 to implement big-data-based tag pushing.
Specifically, for how the at least one processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1; the details are not repeated here.
The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
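The idea of data blocks linked by cryptographic methods can be illustrated with a small hash-chain sketch in Python. It covers only the linking and validity check mentioned above; it has no network, consensus mechanism, or encryption, and the function names (`block_hash`, `append_block`, `is_valid`), the block layout, and the choice of SHA-256 are assumptions for illustration rather than details taken from the patent.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, transactions: List[Any]) -> str:
    """SHA-256 digest over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], transactions: List[Any]) -> Dict[str, Any]:
    """Create a block holding one batch of transactions and link it to the chain tail."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check that every block links to its predecessor and that its hash matches its contents."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_block(chain, [{"tag": "car-insurance", "action": "push"}])
append_block(chain, [{"tag": "car-insurance", "action": "click"}])
valid_before = is_valid(chain)

# Tampering with an earlier block breaks the stored hash, so validation fails.
chain[0]["transactions"] = [{"tag": "forged", "action": "push"}]
valid_after = is_valid(chain)
```

Because each block stores the hash of its predecessor, changing any earlier batch of transactions invalidates the chain from that point on, which is the anti-counterfeiting property the paragraph refers to.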
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalents of the claims are intended to be included in the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units, and the singular does not exclude the plural. Several units or devices recited in the device claims may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.
Finally, it should be noted that the embodiments above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010610771.8A CN111984898A (en) | 2020-06-29 | 2020-06-29 | Tag push method, device, electronic device and storage medium based on big data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111984898A (en) | 2020-11-24 |
Family
ID=73437640
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010610771.8A (Pending; CN111984898A) | 2020-06-29 | 2020-06-29 | Tag push method, device, electronic device and storage medium based on big data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111984898A (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112532750A (en) * | 2021-01-18 | 2021-03-19 | 深圳博士创新技术转移有限公司 | Big data push processing method and system and cloud platform |
| CN112860675A (en) * | 2021-02-06 | 2021-05-28 | 高云 | Big data processing method under online cloud service environment and cloud computing server |
| CN114003591A (en) * | 2021-10-29 | 2022-02-01 | 广州华多网络科技有限公司 | Commodity data multi-mode cleaning method and device, equipment, medium and product thereof |
| CN114638654A (en) * | 2022-04-07 | 2022-06-17 | 中国工商银行股份有限公司 | Target object determination method, device and electronic device |
| CN114756149A (en) * | 2022-05-12 | 2022-07-15 | 北京达佳互联信息技术有限公司 | Method and device for presenting data label, electronic equipment and storage medium |
| CN114791915A (en) * | 2022-06-22 | 2022-07-26 | 深圳高灯计算机科技有限公司 | Data aggregation method and device, computer equipment and storage medium |
| CN115455259A (en) * | 2022-09-15 | 2022-12-09 | 深圳壹账通智能科技有限公司 | Service feature label generation method, device, equipment and storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180129710A1 (en) * | 2016-11-10 | 2018-05-10 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and non-transitory computer readable recording medium |
| CN111062750A (en) * | 2019-12-13 | 2020-04-24 | 中国平安财产保险股份有限公司 | User portrait label modeling and analysis method, device, equipment and storage medium |
| CN111177129A (en) * | 2019-12-16 | 2020-05-19 | 中国平安财产保险股份有限公司 | Label system construction method, device, equipment and storage medium |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112532750A (en) * | 2021-01-18 | 2021-03-19 | 深圳博士创新技术转移有限公司 | Big data push processing method and system and cloud platform |
| CN112860675A (en) * | 2021-02-06 | 2021-05-28 | 高云 | Big data processing method under online cloud service environment and cloud computing server |
| CN114003591A (en) * | 2021-10-29 | 2022-02-01 | 广州华多网络科技有限公司 | Commodity data multi-mode cleaning method and device, equipment, medium and product thereof |
| CN114003591B (en) * | 2021-10-29 | 2025-01-28 | 广州华多网络科技有限公司 | Commodity data multimodal cleaning method and its device, equipment, medium, and product |
| CN114638654A (en) * | 2022-04-07 | 2022-06-17 | 中国工商银行股份有限公司 | Target object determination method, device and electronic device |
| CN114756149A (en) * | 2022-05-12 | 2022-07-15 | 北京达佳互联信息技术有限公司 | Method and device for presenting data label, electronic equipment and storage medium |
| CN114756149B (en) * | 2022-05-12 | 2023-12-29 | 北京达佳互联信息技术有限公司 | Method, device, electronic equipment and storage medium for presenting data tag |
| CN114791915A (en) * | 2022-06-22 | 2022-07-26 | 深圳高灯计算机科技有限公司 | Data aggregation method and device, computer equipment and storage medium |
| CN115455259A (en) * | 2022-09-15 | 2022-12-09 | 深圳壹账通智能科技有限公司 | Service feature label generation method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201124 |