CN115858603A - An intelligent retrieval method for business start-up standard matters that integrates multiple attributes - Google Patents

An intelligent retrieval method for business start-up standard matters that integrates multiple attributes

Info

Publication number
CN115858603A
Authority
CN
China
Prior art keywords
enterprise
standard
attribute
starting
items
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211684820.8A
Other languages
Chinese (zh)
Other versions
CN115858603B (en)
Inventor
李炎
钟涛
苏旷宇
李家豪
淳林
方明
禹勇平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Original Assignee
HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Priority to CN202211684820.8A
Publication of CN115858603A
Application granted
Publication of CN115858603B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses an intelligent retrieval method for standard business start-up items that integrates multiple attributes. The method comprises the following steps: S1, preprocessing and statistically analyzing historical enterprise registration data to obtain the number of occurrences of each industry classification; S2, generating a catalog of standard business start-up items; S3, constructing multi-attribute circle layers for the standard items, forming a complete multi-attribute knowledge base of business start-up items; S4, outputting intelligent retrieval results based on that knowledge base. The application addresses both the incompleteness of current business start-up item catalogs and the difficulty applicants face in choosing among a large number of available items. It can subsequently also address two further problems: recommending a business scope, and handling everything required to start a business in a single pass. It maps the varied search terms an applicant might enter onto the precise start-up item to be handled, giving the public a more convenient way to handle these matters.

Description

Intelligent retrieval method for standard business start-up items integrating multiple attributes

Technical Field

This application relates to the technical field of information processing, and in particular to an intelligent retrieval method for standard business start-up items that integrates multiple attributes.

Background

In recent years, the State Administration for Market Regulation has required that starting a business be handled as a single matter in a single pass, with measures taken to cut the time needed to within 4 working days. Local governments have begun rolling out online business start-up platforms, which places higher demands on both the public and government staff. When the process was handled offline, applicants and staff could discuss the specifics of starting a business face to face. Online, applicants must already have some understanding of those specifics, and the platform must provide a complete catalog of selectable start-up items as guidance; most current catalogs, however, are incomplete. Choosing the correct start-up item name is the first step toward completing business registration quickly and accurately. In practice, an applicant's understanding of an item often deviates from its standard definition, so the search results frequently have no obvious connection to the keywords the applicant entered.

Summary of the Invention

In view of the above technical problem, one aspect of this application provides an intelligent retrieval method for standard business start-up items that integrates multiple attributes.

The technical solution adopted by this application is as follows:

An intelligent retrieval method for standard business start-up items that integrates multiple attributes, comprising the steps of:

S1. Preprocessing and statistical analysis of historical enterprise registration data: read the historical registration data of individual businesses and enterprises, clean and preprocess the data, then perform statistical analysis on the processed data to obtain the number of occurrences of each industry classification;

S2. Generation of the catalog of standard business start-up items: taking the National Economic Industry Classification as the basis and combining it with the historical registration data of individual businesses and enterprises, generate the catalog of standard business start-up items;

S3. Construction of multi-attribute circle layers for the standard items: define the concept of multi-attribute circle layers for the standard business start-up items and, for each layer, obtain start-up-related knowledge through natural language processing, forming a complete multi-attribute knowledge base of business start-up items;

S4. Output of intelligent retrieval results based on the multi-attribute knowledge base: using natural language processing and a synonym model, derive a keyword list from the search keyword, determine the matching score of each item according to the attribute type that each keyword hits, and thereby recommend intelligent retrieval results for business start-up items.

Further, step S1 specifically comprises:

Step S11. Missing-value handling: read the historical registration data of individual businesses and enterprises, and delete every record that lacks a trade name, that is, an enterprise name or an individual business name;

Step S12. Grouped statistics by industry classification: group the data by the industry classification field in the original data and count each group to obtain the number of occurrences of each industry classification.
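Steps S11 and S12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `trade_name` and `industry_code`, the sample records, and the placeholder codes `A01` and `B02` are hypothetical and do not come from the application.

```python
from collections import Counter

def preprocess_and_count(records, name_field="trade_name", industry_field="industry_code"):
    """S11: drop records with no trade name. S12: count records per industry code.

    The field names are hypothetical; the application does not name them.
    """
    # S11: keep only records whose trade name is present and non-blank.
    cleaned = [r for r in records if (r.get(name_field) or "").strip()]
    # S12: group by the industry classification field and count each group.
    counts = Counter(r[industry_field] for r in cleaned if r.get(industry_field))
    return cleaned, counts

# Hypothetical sample data; "A01" and "B02" are placeholder industry codes.
records = [
    {"trade_name": "Sunrise Noodle Shop", "industry_code": "A01"},
    {"trade_name": "", "industry_code": "A01"},
    {"trade_name": None, "industry_code": "B02"},
    {"trade_name": "Riverbank Grocery", "industry_code": "B02"},
    {"trade_name": "Old Town Noodles", "industry_code": "A01"},
]
cleaned, counts = preprocess_and_count(records)
print(len(cleaned), counts.most_common())
```

The per-code counts are the occurrence counts that the later catalog adjustment in step S22 draws on.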

Further, step S2 specifically comprises:

Step S21. Forming a preliminary catalog from the national standard: starting from the fine-grained industry classes in the National Economic Industry Classification table, and taking into account the specific meaning of each class and business knowledge about starting a business, merge classes to obtain a preliminary version of the catalog of standard business start-up items;

Step S22. Adjusting the catalog for regional characteristics: to reflect the temporal and spatial characteristics of business start-ups in the region, adjust the preliminary catalog obtained in step S21 using the statistical analysis of one year of the region's historical registration data for individual businesses and enterprises, forming the complete catalog of standard business start-up items.

Further, step S3 specifically comprises:

Step S31. Expansion of the industry form circle: first expand industry forms using the content of business scope entries. Using the correspondence between business scope entries and industry classifications in the Catalog of Standardized Expressions for Business Scope Registration, match the activity content of the business scope entries that correspond to each industry classification to obtain industry forms. Then, using the correspondence between industry codes and standard business start-up items, associate the industry forms with the standard items, thereby expanding the industry form circle of each standard item;

Step S32. Expansion of the item expression circle: based on the historical registration data of individual businesses and enterprises, segment the registered names into words to obtain commonly used industry expressions, that is, the colloquial expressions found in registered names. Using the correspondence between industry codes and standard business start-up items, associate these industry expressions with the standard items, thereby expanding the industry category circle of each standard item;

Step S33. Expansion of the brand circle: combining web crawling with natural language processing, obtain brand data for each industry from the content of websites including 中国品牌网 (China Brand Network), 中国加盟网 (China Franchise Network), and 买购网 (Maigou), and associate the brand data with the standard business start-up items, forming the complete multi-attribute knowledge base of business start-up items.
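The word segmentation and code-based association of step S32 might look like the sketch below. The application does not say which segmentation algorithm is used; forward maximum matching over a small vocabulary is substituted here purely for illustration. The vocabulary, the industry codes, the item names, and the registered names are all hypothetical.

```python
from collections import Counter, defaultdict

def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching; unknown characters become 1-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

def build_expression_circle(registrations, vocab, code_to_item):
    """Link vocabulary terms found in registered names to standard items via industry code."""
    circle = defaultdict(Counter)
    for name, code in registrations:
        item = code_to_item.get(code)
        if item is None:
            continue  # no standard item is mapped to this industry code
        for token in forward_max_match(name, vocab):
            if token in vocab:
                circle[item][token] += 1
    return circle

# Hypothetical vocabulary, code-to-item mapping, and registered names.
vocab = {"米粉", "便利店", "超市"}
code_to_item = {"A01": "餐饮服务", "B02": "百货零售"}
registrations = [("老王米粉店", "A01"), ("幸福米粉", "A01"), ("小李便利店", "B02")]
circle = build_expression_circle(registrations, vocab, code_to_item)
print({item: dict(terms) for item, terms in circle.items()})
```

Keeping a count per expression, rather than a plain set, is a design choice of this sketch: it would let rare expressions be filtered out before they enter the knowledge base.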

Further, step S33 specifically comprises:

S331. Brand data acquisition: obtain brand data from 中国品牌网 (China Brand Network), 中国加盟网 (China Franchise Network), and 买购网 (Maigou), and convert the data into a structured form;

S332. Matching brand data to business start-up items: use natural language processing to match brand data to business start-up items automatically, then manually verify the matched data and manually associate the unmatched data. This completes the mapping from brands to standard business start-up items and forms the complete multi-attribute knowledge base of business start-up items.
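The automatic-match-then-manual-review flow of S332 can be illustrated as follows. The application only says "natural language processing"; plain string similarity from Python's standard `difflib` module is used here as a stand-in, and the threshold, brand names, categories, and item names are hypothetical.

```python
from difflib import SequenceMatcher

def match_brands(brands, item_names, threshold=0.6):
    """Match each brand's category text to the most similar standard item name.

    Brands whose best similarity falls below the threshold are queued for
    manual association, mirroring the manual step described in S332.
    """
    matched, needs_review = {}, []
    for brand, category in brands:
        best_item, best_score = None, 0.0
        for item in item_names:
            score = SequenceMatcher(None, category, item).ratio()
            if score > best_score:
                best_item, best_score = item, score
        if best_score >= threshold:
            matched[brand] = best_item
        else:
            needs_review.append(brand)
    return matched, needs_review

# Hypothetical brand records (brand name, category text) and item names.
brands = [("BrandA", "coffee shop"), ("BrandB", "quantum widgets")]
item_names = ["coffee shop service", "grocery retail"]
matched, needs_review = match_brands(brands, item_names)
print(matched, needs_review)
```

In this sketch the automatically matched pairs would still go to a reviewer for verification, as the step requires; only the unmatched queue is shown separately.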

Further, step S4 specifically comprises:

S41. First run a full-text matching query of the input search keyword against the industry forms, item expressions, and brand terms in the multi-attribute knowledge base. If an exact match is found, directly recommend the standard business start-up item associated with that keyword;

S42. If no exact match is found, segment the search keyword using natural language processing to obtain a group of search keywords, apply an error-correction model to obtain the corrected search keywords, then use the synonym model to obtain a group of synonyms similar to the search keywords;

S43. Using the results of step S42 as search keywords, search the standard business start-up items and their multi-attribute circle layers. Based on the degree of match of each search keyword and the configured weight ordering of search keywords and attribute fields, return the N highest-scoring standard items as the recommended results of the intelligent retrieval.
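The control flow of S41 and S42 can be sketched as below. The segmenter, error-correction model, and synonym model are not specified in the application, so they are passed in as plain callables and lookup tables; every name and sample value here is hypothetical.

```python
def expand_query(query, kb_index, segment, correct, synonyms):
    """S41: try an exact match first. S42: otherwise segment, correct, add synonyms."""
    # S41: an exact hit on a knowledge-base term short-circuits the rest.
    if query in kb_index:
        return {"exact": True, "items": sorted(kb_index[query])}
    # S42: segment the query, correct each token, then look up synonyms.
    tokens = [correct(tok) for tok in segment(query)]
    syns = [s for tok in tokens for s in synonyms.get(tok, [])]
    # Keep the source of each term (1 = original, 2 = segmented, 3 = synonym)
    # so that a later scoring step can weight the sources differently.
    return {"exact": False, "terms": {1: [query], 2: tokens, 3: syns}}

# Hypothetical knowledge-base index, typo table, and synonym table.
kb_index = {"noodle shop": {"catering service"}}
typo_fixes = {"nodle": "noodle"}
synonyms = {"noodle": ["pasta"]}

def correct(token):
    """Stand-in for the error-correction model: a fixed typo lookup."""
    return typo_fixes.get(token, token)

exact = expand_query("noodle shop", kb_index, str.split, correct, synonyms)
fuzzy = expand_query("nodle stall", kb_index, str.split, correct, synonyms)
print(exact)
print(fuzzy)
```

Whitespace splitting stands in for the segmenter only because the sample queries are in English; a Chinese query would need a real word segmenter.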

Further, step S43 specifically comprises:

S431. When searching the catalog of standard business start-up items, assign weights. The search keyword weights Wt_i are assigned as follows: the original search keyword has weight Wt_1, the segmented keywords obtained by the word segmentation algorithm in step S42 have weight Wt_2, and the synonym keywords obtained from the synonym model have weight Wt_3, with Wt_1 > Wt_2 > Wt_3. The weights Ws_j of the search targets, namely the catalog of standard items and its multiple attributes, are assigned as follows: the standard item name obtained in step S2 has weight Ws_1, the industry form attribute obtained in step S31 has weight Ws_2, the item expression attribute obtained in step S32 has weight Ws_3, and the brand attribute obtained in step S33 has weight Ws_4, with Ws_1 > Ws_2 > Ws_3 > Ws_4;

S432. The score of a search term hitting a standard item is computed with conventional TF-IDF and is denoted S_ij. Here i indicates the source of the search term (i=1: the input search keyword; i=2: a segmentation result of the input search keyword; i=3: a synonym obtained from the synonym model), and j indicates the attribute circle layer that the term hits (j=1: standard item name; j=2: industry form attribute; j=3: item expression attribute; j=4: brand attribute). For example, S_11 is the score of the input search keyword hitting a standard item name; S_21 is the score of a segmented keyword hitting a standard item name; S_32 is the score of a synonym keyword hitting the industry form attribute; S_34 is the score of a synonym keyword hitting the brand attribute;

The final score of a standard item hit by the search is computed as:

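[Formula image BDA0004019361450000051 was not extracted. From the definitions of Wt_i, Ws_j, and S_ij in the surrounding text, the final score presumably takes the form $\mathrm{Score} = \sum_{i=1}^{3} \sum_{j=1}^{4} Wt_i \cdot Ws_j \cdot S_{ij}$.]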

where Wt_i is the weight of the search keyword, Ws_j is the weight of the catalog of standard business start-up items and its multiple attributes, and S_ij is the score of the search term hitting the standard item. The standard items are sorted by their final scores, and the N highest-scoring items are returned as the recommended results of the intelligent retrieval.
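A minimal sketch of the weighted scoring and top-N selection follows. The formula itself survives only as an image reference in this text, so the double weighted sum over sources i and attribute layers j is an assumption inferred from the definitions of Wt_i, Ws_j, and S_ij. The numeric weights are hypothetical and chosen only to respect the stated orderings, and the S_ij values are supplied directly rather than computed by TF-IDF.

```python
# Hypothetical weights that satisfy Wt_1 > Wt_2 > Wt_3 and Ws_1 > Ws_2 > Ws_3 > Ws_4.
WT = {1: 1.0, 2: 0.6, 3: 0.3}          # by term source: original, segmented, synonym
WS = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4}  # by attribute: name, industry form, expression, brand

def final_score(hits):
    """Assumed form: sum over (i, j) of Wt_i * Ws_j * S_ij.

    `hits` maps (i, j) to S_ij, the hit score for term source i on attribute j.
    """
    return sum(WT[i] * WS[j] * s for (i, j), s in hits.items())

def top_n(hits_by_item, n):
    """Rank standard items by final score and return the n best as (item, score) pairs."""
    scored = {item: final_score(hits) for item, hits in hits_by_item.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical per-item hit scores.
hits_by_item = {
    "item A": {(1, 1): 2.0},               # original keyword hits the item name
    "item B": {(2, 2): 2.0, (3, 4): 5.0},  # segmented hit on form, synonym hit on brand
    "item C": {(3, 3): 1.0},               # synonym hits an item expression
}
ranking = top_n(hits_by_item, 2)
print(ranking)
```

With these weights a direct hit on the item name by the original keyword outranks larger raw scores from synonym or brand hits, which is the effect the weight orderings in S431 are meant to produce.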

In another aspect, this application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the intelligent retrieval method for standard business start-up items that integrates multiple attributes.

In another aspect, this application provides a storage medium comprising a stored program, wherein the program, when run, controls the device on which the storage medium resides to perform the steps of the intelligent retrieval method for standard business start-up items that integrates multiple attributes.

Compared with the prior art, this application has the following beneficial effects:

This application provides an intelligent retrieval method for standard business start-up items that integrates multiple attributes. The method comprises the steps of: S1. Preprocessing and statistical analysis of historical enterprise registration data: read the historical registration data of individual businesses and enterprises, clean and preprocess the data, then perform statistical analysis on the processed data to obtain the number of occurrences of each industry classification; S2. Generation of the catalog of standard business start-up items: taking the National Economic Industry Classification as the basis and combining it with the historical registration data of individual businesses and enterprises, generate the catalog of standard business start-up items; S3. Construction of multi-attribute circle layers for the standard items: define the concept of multi-attribute circle layers for the standard business start-up items and, for each layer, obtain start-up-related knowledge through natural language processing, forming a complete multi-attribute knowledge base of business start-up items; S4. Output of intelligent retrieval results based on the multi-attribute knowledge base: using natural language processing and a synonym model, derive a keyword list from the search keyword, determine the matching score of each item according to the attribute type that each keyword hits, and thereby recommend intelligent retrieval results for business start-up items.

Starting from the industry classification of the national standard and combining it with historical local business registration data, this application can establish a complete catalog of standard business start-up items that reflects regional characteristics. Natural language processing is then used to expand and refine the multiple attributes of the start-up items, building a multi-attribute knowledge base. Finally, an intelligent retrieval and recommendation method for business start-up items is proposed on the basis of that knowledge base, so that applicants receive intelligent retrieval results for standard business start-up items quickly and accurately.

This application mainly solves the following technical problems. It combines the national industry classification standard with historical local business registration data to establish a complete catalog of standard business start-up items. It defines the concept of multi-attribute circle layers for the items and, for each layer, obtains an expanded vocabulary through natural language processing. Based on the constructed multi-attribute circle layers, it uses intelligent retrieval to locate items from search terms quickly, accurately, and from every angle. The application thus addresses both the incompleteness of current business start-up item catalogs and the difficulty applicants face in choosing among a large number of available items. It can subsequently also address two further problems: recommending a business scope, and handling everything required to start a business in a single pass. It maps the varied search terms an applicant might enter onto the precise start-up item to be handled, giving the public a more convenient way to handle these matters.

In addition to the objects, features, and advantages described above, this application has other objects, features, and advantages. The application is described in further detail below with reference to the accompanying drawings.

Brief Description of the Drawings

The accompanying drawings, which form part of this application, provide a further understanding of the application. The illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of the intelligent retrieval method for standard business start-up items that integrates multiple attributes, according to a preferred embodiment of this application.

Fig. 2 is a flowchart of the sub-steps of step S1 in the preferred embodiment of this application.

Fig. 3 is a flowchart of the sub-steps of step S2 in the preferred embodiment of this application.

Fig. 4 is a flowchart of the sub-steps of step S3 in the preferred embodiment of this application.

Fig. 5 is a flowchart of the sub-steps of step S33 in the preferred embodiment of this application.

Fig. 6 is a flowchart of the sub-steps of step S4 in the preferred embodiment of this application.

Fig. 7 is a schematic block diagram of the electronic device in the preferred embodiment of this application.

Fig. 8 is a diagram of the internal structure of the computer device in the preferred embodiment of this application.

Detailed Description

It should be noted that, where no conflict arises, the embodiments of this application and the features of those embodiments may be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Referring to Fig. 1, one embodiment of the present invention provides an intelligent retrieval method for standard business start-up items that integrates multiple attributes. The embodiment is based on the National Economic Industry Classification specification that took effect on October 1, 2017, and uses the registration data of individual businesses and enterprises in XX Province for the year from October 15, 2021 to October 15, 2022. Following the flow shown in Fig. 1, the historical enterprise registration data is first preprocessed and statistically analyzed. A preliminary catalog of business start-up items is then derived from the National Economic Industry Classification specification and adjusted for the regional characteristics reflected in the historical data, yielding a standard catalog of 693 business start-up items. Next, web crawling and natural language processing are used to generate the knowledge base of business start-up items. Finally, this knowledge base is synchronized to an ES database, and intelligent retrieval and recommendation of standard business start-up items is implemented on the multi-attribute knowledge base. The method specifically comprises the steps of:

S1. Preprocessing and statistical analysis of historical enterprise registration data: read the historical registration data of individual businesses and enterprises, clean and preprocess the data, then perform statistical analysis on the processed data to obtain the number of occurrences of each industry classification;

S2. Generation of the catalog of standard business start-up items: taking the National Economic Industry Classification as the basis and combining it with the historical registration data of individual businesses and enterprises, generate the catalog of standard business start-up items;

S3. Construction of multi-attribute circle layers for the standard items: define the concept of multi-attribute circle layers for the standard business start-up items and, for each layer, obtain start-up-related knowledge through natural language processing, forming a complete multi-attribute knowledge base of business start-up items;

S4. Output of intelligent retrieval results based on the multi-attribute knowledge base: using natural language processing and a synonym model, derive a keyword list from the search keyword, determine the matching score of each item according to the attribute type that each keyword hits, and thereby recommend intelligent retrieval results for business start-up items.

Starting from the industry classification of the national standard and combining it with historical local business registration data, this embodiment can establish a complete catalog of standard business start-up items that reflects regional characteristics. Natural language processing is then used to expand and refine the multiple attributes of the start-up items, building a multi-attribute knowledge base. Finally, an intelligent retrieval and recommendation method for business start-up items is proposed on the basis of that knowledge base, so that applicants receive intelligent retrieval results for standard business start-up items quickly and accurately.

This embodiment mainly solves the following technical problems. It combines the national industry classification standard with historical local business registration data to establish a complete catalog of standard business start-up items. It defines the concept of multi-attribute circle layers for the items and, for each layer, obtains an expanded vocabulary through natural language processing. Based on the constructed multi-attribute circle layers, it uses intelligent retrieval to locate items from search terms quickly, accurately, and from every angle. The embodiment thus addresses both the incompleteness of current business start-up item catalogs and the difficulty applicants face in choosing among a large number of available items. It can subsequently also address two further problems: recommending a business scope, and handling everything required to start a business in a single pass. It maps the varied search terms an applicant might enter onto the precise start-up item to be handled, giving the public a more convenient way to handle these matters.

In one embodiment of the present invention, the method is based on the National Economic Industry Classification specification that took effect on October 1, 2017, and uses the registration data of individual businesses and enterprises in XX Province for the year from October 15, 2021 to October 15, 2022. Following the flow shown in Fig. 1, the historical enterprise registration data is first preprocessed and statistically analyzed. A preliminary catalog of business start-up items is then derived from the National Economic Industry Classification specification and adjusted for the regional characteristics reflected in the historical data, yielding a standard catalog of 693 business start-up items. Next, web crawling and natural language processing are used to generate the knowledge base of business start-up items. Finally, this knowledge base is synchronized to an ES database, and intelligent retrieval and recommendation of standard business start-up items is implemented on the multi-attribute knowledge base.

Preferably, as shown in Figure 2, step S1 specifically includes the following steps:

Step S11, missing value processing: read the historical registration data of individuals and enterprises, and delete records that lack a trade name, that is, records missing the enterprise name or the individual name. This embodiment reads one year of individual and enterprise registration data from XX Province (October 15, 2021 to October 15, 2022), comprising 1,191,164 registration records of individual industrial and commercial households and 356,139 enterprise registration records. Of the individual industrial and commercial household records, 212,984 lack a trade name; no enterprise record lacks one. After processing, 1,334,319 records remain;

Step S12, grouped statistical analysis by industry classification: group the data by the industry classification field in the original data and count the records in each group, obtaining the registration frequency of individuals or enterprises under each industry classification.
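The two preprocessing steps can be sketched as follows. This is a minimal illustration rather than the embodiment's actual implementation: the record layout (`name`, `industry_code`), the function names, the classification codes, and the toy records are all assumptions. Only the three record counts used at the end come from the text.

```python
from collections import Counter


def drop_missing_names(records):
    """Step S11 (sketch): keep only records that have a non-blank trade name."""
    return [r for r in records if (r.get("name") or "").strip()]


def industry_frequencies(records):
    """Step S12 (sketch): count registrations per industry classification."""
    return Counter(r["industry_code"] for r in records)


# Hypothetical toy records; the field names and codes are made up.
records = [
    {"name": "某某粉面馆", "industry_code": "X01"},
    {"name": "", "industry_code": "X01"},      # blank name, dropped
    {"name": None, "industry_code": "X02"},    # missing name, dropped
    {"name": "某某早餐店", "industry_code": "X01"},
    {"name": "某某信息技术有限公司", "industry_code": "X03"},
]
cleaned = drop_missing_names(records)
freq = industry_frequencies(cleaned)

# Record counts reported in the embodiment: individual-household records,
# enterprise records, and individual-household records lacking a trade name.
remaining = 1_191_164 + 356_139 - 212_984
```

The last line reproduces the arithmetic behind the figure of 1,334,319 remaining records.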

Preferably, as shown in Figure 3, step S2 specifically includes the following steps:

Step S21, forming a preliminary catalog of business start-up standard items from the national standard: starting from the fine-grained industry categories in the national economic industry classification table, and taking into account the specific meaning of each category and business knowledge related to starting a business, the categories are merged into 600 business start-up standard items, which form the preliminary version of the catalog;

Step S22, adjusting the catalog according to regional characteristics: to reflect the temporal and regional characteristics of business start-ups in the region, the preliminary catalog obtained in step S21 is adjusted using the statistical analysis of one year of historical registration data of individuals and enterprises in the region, forming the complete catalog of business start-up standard items. In this embodiment, according to the statistics obtained in step S3, every national economic industry classification whose occurrence count is greater than or equal to one thousandth of the total data volume (that is, frequency ≥ 1334) is treated as a standalone standard item, finally yielding 693 business start-up standard items.
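The frequency rule in step S22 can be illustrated with a short sketch. The function names, the classification codes, and the frequencies are hypothetical. Rounding the threshold down with integer division is an assumption, chosen because it reproduces the "frequency ≥ 1334" figure for 1,334,319 records.

```python
def frequency_threshold(total_records, per_mille=1):
    """Smallest qualifying count, using integer (floor) arithmetic."""
    return total_records * per_mille // 1000


def standalone_categories(freq, total_records, per_mille=1):
    """Industry classifications frequent enough to become their own item."""
    threshold = frequency_threshold(total_records, per_mille)
    return sorted(code for code, count in freq.items() if count >= threshold)


# Hypothetical frequencies keyed by made-up classification codes.
freq = {"X01": 50_000, "X02": 1_334, "X03": 1_333}
selected = standalone_categories(freq, total_records=1_334_319)
```

In this toy input, `X02` sits exactly on the threshold and is selected, while `X03` falls one record short.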

Preferably, as shown in Figure 4, step S3 specifically includes:

Step S31, expansion of the industry form circle: first, industry forms are expanded by drawing on the content of business scope entries. Using the correspondence between business scope entries and industry classifications in the catalog of standardized expressions for business scope registration, the activities described by the business scope entries of each industry classification are matched to obtain industry forms. Then, using the correspondence between industry codes and business start-up standard items, the industry forms are associated with the standard items, expanding the industry form circle of every business start-up standard item;

Step S32, expansion of the item expression circle: based on the historical registration data, the registered names of individuals and enterprises are segmented into words to obtain commonly used industry expressions, that is, the colloquial expressions found in registered names. Using the correspondence between industry codes and business start-up standard items, the industry expressions are associated with the standard items, expanding the industry category circle of every business start-up standard item;

Specifically, when segmenting trade names in this step, natural language processing is used to segment the registered name of the individual or enterprise into a keyword list. For example, 湖南科创信息技术股份有限公司 (Hunan Creator Information Technologies Co., Ltd.) is segmented into [“湖南”, “科创”, “信息技术”, “股份有限公司”] ("Hunan", "Kechuang", "information technology", "joint-stock limited company");

When extracting industry expressions, this step computes the similarity between each keyword and every industry expression in a previously accumulated industry expression dictionary and takes the best match. For the enterprise name in the segmentation example above, the extracted industry expression is “信息技术” (information technology);

Finally, using the correspondence between industry classifications and business start-up standard items, this step associates the industry expressions with the standard items, expanding the industry category circle of every business start-up standard item;
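The dictionary lookup in step S32 can be sketched as below. The text does not say which similarity measure is used, so this sketch substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in. The function name, the `min_ratio` cutoff, and the dictionary contents are assumptions, and the tokens are taken as already segmented.

```python
from difflib import SequenceMatcher


def best_industry_expression(tokens, dictionary, min_ratio=0.6):
    """Return the dictionary entry most similar to any token, or None.

    Similarity here is difflib's ratio, used only as a placeholder for the
    unspecified similarity measure of the embodiment.
    """
    best, best_score = None, 0.0
    for token in tokens:
        for expression in dictionary:
            score = SequenceMatcher(None, token, expression).ratio()
            if score > best_score:
                best, best_score = expression, score
    return best if best_score >= min_ratio else None


# Segmentation result quoted in the text; the dictionary is a toy example.
tokens = ["湖南", "科创", "信息技术", "股份有限公司"]
dictionary = ["信息技术", "餐饮", "服装"]
expression = best_industry_expression(tokens, dictionary)
```

A cutoff such as `min_ratio` keeps place names and legal suffixes, which resemble no dictionary entry, from being linked to an unrelated industry expression.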

Step S33, expansion of the brand circle: combining crawler and natural language processing techniques, brand data for each industry is obtained from the content of websites including 中国品牌网 (China Brand Network), 中国加盟网 (China Franchise Network), and 买购网 (Maigou), and the brand data is associated with business start-up standard items to form the complete multi-attribute knowledge base of business start-up items.

Preferably, as shown in Figure 5, step S33 specifically includes the following steps:

S331, brand data acquisition: obtain brand data from China Brand Network, China Franchise Network, and Maigou, and convert it into structured form;

S332, matching brand data to business start-up items: natural language processing is used to match brand data to business start-up items automatically; the matched data is then verified manually and the unmatched data is associated manually. This completes the mapping from brands to business start-up standard items and forms the complete multi-attribute knowledge base, which is then synchronized to the search engine.

Preferably, as shown in Figure 6, step S4 specifically includes the following steps:

S41, first run a full-text matching query of the input search keyword against the industry forms, item expressions, and brand terms in the multi-attribute knowledge base. If an exact match is found, the business start-up standard item associated with that keyword is recommended directly. For example, the input “茶颜悦色” hits the brand circle attribute; the business start-up item associated with this brand is “我要开饮品店” ("I want to open a beverage shop"), so the final recommendation is “我要开饮品店”;

S42, if no exact match is found, natural language processing is used to segment the search keyword into a group of search keywords, an error correction model is used to obtain the correct input search keyword, and a synonym model is used to obtain groups of synonyms for the search keywords. For example, the input search term “桂林米线” (Guilin rice noodles) is segmented into [“桂林”, “米线”]; the synonym model finds no synonym for “桂林”, but returns the synonym group [“米粉”, “米面”, “粉面”...] for “米线”;

S43, the results of step S42 are used as search keywords to search the business start-up standard items and their multi-attribute circle layers. For example, the original input “桂林米线”, the segmentation result [“桂林”, “米线”], and the synonym group [“米粉”, “米面”, “粉面”...] are all used as search keywords and matched simultaneously against the standard item names and their industry form, item expression, and brand attributes. The synonym “粉面” matches the standard item name “我要开粉面馆” ("I want to open a noodle shop"), and the synonym “米粉” matches the item expression attribute of the standard item “我要开早餐店” ("I want to open a breakfast shop"). According to the matching degree of each search keyword and the configured weight order of search keywords and attribute fields, the N highest-scoring business start-up items are returned as the recommendation of the intelligent retrieval; for example, the final results are returned in the order [“我要开粉面馆”, “我要开早餐店”, “我要开快餐店”, “我要开米粉加工厂”...] (noodle shop, breakfast shop, fast food restaurant, rice noodle processing plant).
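The control flow of steps S41 to S43 can be sketched as follows, leaving scoring aside. This is a simplified illustration under stated assumptions: the knowledge base layout, the function names, the substring matching rule, and the lookup-table tokenizer are all hypothetical, and the error correction model of step S42 is omitted. The strings come from the examples in the text.

```python
def exact_match(query, kb):
    """Step S41 (sketch): items with an attribute term equal to the query."""
    return [item for item, attrs in kb.items()
            if any(query in terms for terms in attrs.values())]


def expand_query(query, segment, synonyms):
    """Step S42 (sketch): the query, its tokens, and the tokens' synonyms."""
    tokens = segment(query)
    syns = [s for t in tokens for s in synonyms.get(t, [])]
    return [query] + tokens + syns


def candidate_items(terms, kb):
    """Step S43, matching only: items whose name or attributes contain a term."""
    hits = []
    for item, attrs in kb.items():
        fields = [item] + [t for ts in attrs.values() for t in ts]
        if any(term in field for term in terms for field in fields):
            hits.append(item)
    return hits


# Toy knowledge base; empty lists are placeholders, not real data.
kb = {
    "我要开饮品店": {"industry_form": [], "expression": [], "brand": ["茶颜悦色"]},
    "我要开粉面馆": {"industry_form": [], "expression": [], "brand": []},
    "我要开早餐店": {"industry_form": [], "expression": ["米粉"], "brand": []},
}
synonyms = {"米线": ["米粉", "米面", "粉面"]}


def segment(query):
    """Stand-in tokenizer: a lookup table instead of a real segmenter."""
    return {"桂林米线": ["桂林", "米线"]}.get(query, [query])


def retrieve(query):
    """Exact match first; otherwise expand the query and collect candidates."""
    direct = exact_match(query, kb)
    if direct:
        return direct
    return candidate_items(expand_query(query, segment, synonyms), kb)
```

With this toy data, `retrieve("茶颜悦色")` takes the exact-match branch through the brand attribute, while `retrieve("桂林米线")` falls through to the expanded search and reaches its two candidates through the synonyms “粉面” and “米粉”.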

Preferably, step S43 specifically includes the following steps:

S431, weights are assigned when searching the catalog of business start-up standard items. The search keyword weights $Wt_i$ are assigned as follows: the original search keyword has weight $Wt_1$, the segmented keywords obtained by the word segmentation algorithm in step S42 have weight $Wt_2$, and the synonymous keywords obtained from the synonym model have weight $Wt_3$, with $Wt_1 > Wt_2 > Wt_3$ during retrieval. The weights $Ws_j$ of the retrieval targets, namely the standard item catalog and its multiple attributes, are assigned as follows: the standard item name obtained in step S2 has weight $Ws_1$, the industry form attribute obtained in step S31 has weight $Ws_2$, the item expression attribute obtained in step S32 has weight $Ws_3$, and the brand attribute obtained in step S33 has weight $Ws_4$, with $Ws_1 > Ws_2 > Ws_3 > Ws_4$ during retrieval. For example, the synonym “粉面” has weight $Wt_3$ as a search keyword, and the standard item name “我要开粉面馆” that it matches has weight $Ws_1$; the synonym “米粉” also has weight $Wt_3$ as a search keyword, and the attribute it hits is the item expression attribute, so the attribute weight is $Ws_3$.

S432, the score of a search term hitting a standard item is computed with the traditional TF-IDF algorithm and denoted $S_{ij}$. Here $i$ indicates the source of the search term ($i=1$: the input search keyword; $i=2$: a segmentation result of the input search keyword; $i=3$: a synonym obtained from the synonym model), and $j$ indicates the attribute circle layer that the term hits ($j=1$: the standard item name; $j=2$: the industry form attribute; $j=3$: the item expression attribute; $j=4$: the brand attribute). For example, $S_{11}$ is the score of the input search keyword hitting a standard item name; $S_{21}$ is the score of a segmented keyword hitting a standard item name; $S_{32}$ is the score of a synonymous keyword hitting the industry form attribute; and $S_{34}$ is the score of a synonymous keyword hitting the brand attribute. For instance, the standard item “我要开粉面馆” is found because the synonymous keyword “粉面” matches the item name, so the score of the search term “粉面” for that item is $S_{31}$.
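For readers unfamiliar with TF-IDF, the per-hit score $S_{ij}$ can be pictured with the sketch below. The text only names "traditional TF-IDF", so the exact variant here is an assumption: relative term frequency multiplied by a smoothed inverse document frequency. The function name and the pre-tokenized toy corpus are hypothetical.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one tokenized document, relative to `corpus`.

    Uses relative term frequency and the smoothed inverse document
    frequency log(N / (1 + df)) + 1; other variants exist.
    """
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / (1 + df)) + 1.0
    return tf * idf


# Hypothetical tokenization of three standard item names.
corpus = [
    ["我要", "开", "粉面", "馆"],
    ["我要", "开", "早餐", "店"],
    ["我要", "开", "快餐", "店"],
]
rare = tf_idf("粉面", corpus[0], corpus)    # term found in one document
common = tf_idf("开", corpus[0], corpus)    # term found in every document
absent = tf_idf("粉面", corpus[1], corpus)  # term not in this document
```

A distinctive term such as “粉面” scores higher than a term that appears in every item name, which is the property that lets TF-IDF separate otherwise similar items.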

The final score of a standard item hit by the retrieval is computed as:

[Formula image BDA0004019361450000131: final score of a standard item, expressed in terms of $Wt_i$, $Ws_j$ and $S_{ij}$]

Here $Wt_i$ is the weight of the search keyword, $Ws_j$ is the weight of the standard item catalog and its multiple attributes, and $S_{ij}$ is the score of the search term hitting the standard item. The standard items are sorted by final score, and the N highest-scoring business start-up standard items are returned as the recommendation of the intelligent retrieval.
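The formula itself survives only as an image reference in this text, so its exact form cannot be confirmed here. One reading consistent with the definitions of $Wt_i$, $Ws_j$, and $S_{ij}$ is a weighted sum over all hits, $\sum_i \sum_j Wt_i \cdot Ws_j \cdot S_{ij}$; the sketch below assumes that form. The numeric weights, the hit scores, and the function names are hypothetical, chosen only to respect $Wt_1 > Wt_2 > Wt_3$ and $Ws_1 > Ws_2 > Ws_3 > Ws_4$.

```python
def final_score(hits, wt, ws):
    """Assumed final score: sum of Wt_i * Ws_j * S_ij over an item's hits.

    `hits` is a list of (i, j, s_ij) tuples, where i is the keyword source
    and j is the attribute circle layer that was hit.
    """
    return sum(wt[i] * ws[j] * s for i, j, s in hits)


def top_n(item_hits, wt, ws, n):
    """Rank items by assumed final score and keep the best n."""
    scores = {item: final_score(hits, wt, ws) for item, hits in item_hits.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical weights that satisfy the orderings required by step S431.
wt = {1: 3.0, 2: 2.0, 3: 1.0}
ws = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}

# Hits from the running example, each with a made-up TF-IDF score of 1.0:
# "粉面" (synonym, i=3) hits an item name (j=1); "米粉" (synonym, i=3)
# hits an item expression attribute (j=3).
item_hits = {
    "我要开早餐店": [(3, 3, 1.0)],
    "我要开粉面馆": [(3, 1, 1.0)],
}
ranking = top_n(item_hits, wt, ws, n=2)
```

Under these assumed weights, the name hit outranks the item expression hit, which matches the order of the first two results in the example of step S43.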

As shown in Figure 7, a preferred embodiment of the present application further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the multi-attribute intelligent retrieval method for business start-up standard items of the above embodiment.

As shown in Figure 8, a preferred embodiment of the present application further provides a computer device, which may be a terminal or a liveness detection server and whose internal structure may be as shown in Figure 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides the environment in which the operating system and the computer program in the non-volatile storage medium run. The network interface is used to communicate with other external computer devices over a network connection. When executed by the processor, the computer program implements the steps of the above multi-attribute intelligent retrieval method for business start-up standard items.

Those skilled in the art will understand that the structure shown in Figure 8 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A preferred embodiment of the present application further provides a storage medium that includes a stored program; when the program runs, it controls the device on which the storage medium resides to perform the steps of the multi-attribute intelligent retrieval method for business start-up standard items of the above embodiment.

It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

If the functions of the method of this embodiment are implemented as software functional units and sold or used as an independent product, they may be stored in one or more storage media readable by a computing device. On this understanding, the part of the embodiments of the present application that contributes over the prior art, or a part of the technical solution, may be embodied as a software product that is stored in a storage medium and includes instructions causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code. The solutions in the embodiments of the present application may be implemented in various computer languages, for example the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.

Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims (9)

1. A multi-attribute intelligent retrieval method for business start-up standard items, characterized by comprising the following steps:
S1, preprocessing and statistical analysis of historical enterprise registration data: reading the historical registration data of individuals and enterprises, cleaning and preprocessing the data, and then performing statistical analysis on the processed data to obtain the occurrence frequency of each industry classification;
S2, generating a catalog of business start-up standard items: generating the catalog based on the national economic industry classification, combined with the historical registration data of individuals and enterprises;
S3, constructing the multi-attribute circle layers of the business start-up standard items: defining the concept of multi-attribute circle layers for the business start-up standard items, and for each circle layer obtaining knowledge related to starting a business through natural language processing, forming a complete multi-attribute knowledge base of business start-up items;
S4, outputting intelligent retrieval results based on the multi-attribute knowledge base of business start-up items: obtaining a keyword list for the search keyword by means of natural language processing and a synonym model, and determining the matching score of each item according to the attribute type hit by each keyword, thereby recommending business start-up items as the intelligent retrieval result.
2. The multi-attribute intelligent retrieval method for business start-up standard items according to claim 1, characterized in that step S1 specifically comprises the steps of:
S11, missing value processing: reading the historical registration data of individuals and enterprises, and deleting records that lack a trade name, that is, records missing the enterprise name or the individual name;
S12, grouped statistical analysis by industry classification: grouping the data by the industry classification field in the original data and counting each group to obtain the occurrence frequency of each industry classification.
3. The multi-attribute intelligent retrieval method for business start-up standard items according to claim 1, characterized in that step S2 specifically comprises the steps of:
S21, forming a preliminary catalog of business start-up standard items from the national standard: starting from the fine-grained industry categories in the national economic industry classification table, and taking into account the specific meaning of each category and business knowledge related to starting a business, obtaining a preliminary version of the catalog of business start-up standard items;
S22, adjusting the catalog of business start-up standard items according to regional characteristics: to reflect the temporal and regional characteristics of business start-ups in the region, adjusting the preliminary version of the catalog obtained in step S21 using the statistical analysis of one year of historical registration data of individuals and enterprises in the region, forming the complete catalog of business start-up standard items.
4. The multi-attribute intelligent retrieval method for business start-up standard items according to claim 2, characterized in that step S3 specifically comprises:
S31, expansion of the industry form circle: first expanding industry forms by drawing on the content of business scope entries, that is, using the correspondence between business scope entries and industry classifications in the catalog of standardized expressions for business scope registration to match the activities described by the business scope entries of each industry classification and obtain industry forms; then, using the correspondence between industry codes and business start-up standard items, associating the industry forms with the standard items, thereby expanding the industry form circle of every business start-up standard item;
S32, expansion of the item expression circle: based on the historical registration data of individuals and enterprises, segmenting the registered names of individuals and enterprises into words to obtain commonly used industry expressions, that is, the colloquial expressions found in registered names; and, using the correspondence between industry codes and business start-up standard items, associating the industry expressions with the standard items, thereby expanding the item expression circle of every business start-up standard item;
S33, expansion of the brand circle: combining crawler and natural language processing techniques, obtaining brand data for each industry from the content of websites including China Brand Network, China Franchise Network, and Maigou, and associating the brand data with business start-up standard items to form the complete multi-attribute knowledge base of business start-up items.
5. The multi-attribute intelligent retrieval method for business start-up standard items according to claim 4, characterized in that step S33 specifically comprises the steps of:
S331, brand data acquisition: obtaining brand data from China Brand Network, China Franchise Network, and Maigou, and converting it into structured form;
S332, matching brand data to business start-up items: using natural language processing to match brand data to business start-up items automatically, verifying the matched data manually, and associating the unmatched data, thereby completing the mapping from brands to business start-up standard items and forming the complete multi-attribute knowledge base of business start-up items.
6. The multi-attribute intelligent retrieval method for business start-up standard items according to claim 1, characterized in that step S4 specifically comprises the steps of:
S41, first running a full-text matching query of the input search keyword against the industry forms, item expressions, and brand terms in the multi-attribute knowledge base of business start-up items, and, if an exact match is found, directly recommending the business start-up standard item associated with the keyword;
S42, if no exact match is found, segmenting the search keyword using natural language processing to obtain a group of search keywords, obtaining the correct input search keyword using an error correction model, and obtaining groups of synonyms for the search keywords using a synonym model;
S43, using the results of step S42 as search keywords to search the business start-up standard items and their multi-attribute circle layers, and, according to the matching degree of the search keywords and the configured weight order of search keywords and attribute fields, returning the N highest-scoring business start-up standard items as the recommendation of the intelligent retrieval.
7. The intelligent retrieval method for enterprise starting standard items integrating multiple attributes as claimed in claim 6, wherein step S43 specifically comprises the steps of:
S431, assigning weights when searching the enterprise starting standard item catalog. The weights Wt_i of the search keywords are assigned as follows: the original search keyword has weight Wt_1, the segmented keyword group obtained by the word segmentation algorithm in step S42 has weight Wt_2, and the synonymous keyword group obtained from the synonym model has weight Wt_3; during retrieval the search keyword weights are set so that Wt_1 > Wt_2 > Wt_3. The weights Ws_j of the searched enterprise starting standard item catalog and its multiple attributes are assigned as follows: the standard item name obtained in step S2 has weight Ws_1, the industry form attribute obtained in step S31 has weight Ws_2, the item expression attribute obtained in step S32 has weight Ws_3, and the brand attribute obtained in step S33 has weight Ws_4; during retrieval the attribute weights of the retrieval target are set so that Ws_1 > Ws_2 > Ws_3 > Ws_4;
S432, the score with which a search word hits a standard item is calculated using the traditional TF-IDF algorithm and is denoted S_ij, where i represents the source of the search word: i = 1 means the search word is the input search keyword, i = 2 means the search word is a segmentation result of the input search keyword, and i = 3 means the search word is a synonym obtained from the synonym model; j represents the attribute circle layer hit by the search word: j = 1 means the hit is on the standard item name, j = 2 means the hit is on the industry form attribute, j = 3 means the hit is on the item expression attribute, and j = 4 means the hit is on the brand attribute. For example, S_11 is the score of the input search keyword hitting the standard item name; S_21 is the score of a segmented keyword hitting the standard item name; S_32 is the score of a synonymous keyword hitting the industry form attribute; S_34 is the score of a synonymous keyword hitting the brand attribute;
calculating the final score of each standard item hit by the retrieval:
Figure FDA0004019361440000041
wherein Wt_i is the weight of the search keyword, Ws_j is the weight of the enterprise starting standard item catalog or of one of its multiple attributes, and S_ij is the score with which the search word hits the standard item; the retrieved standard items are sorted by final score, and the N enterprise starting standard items with the highest scores are returned as the recommendation results of the intelligent search.
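The ranking step can be sketched in Python. The formula itself survives here only as an image reference, so the sketch assumes the final score is the double sum of Wt_i × Ws_j × S_ij over all hits, which is the most direct reading of the symbol definitions but is not confirmed by the text. The weight values, the item names, and the function names `final_score` and `top_n` are hypothetical, and the S_ij values are taken as already computed TF-IDF scores.

```python
# Hypothetical weights satisfying Wt_1 > Wt_2 > Wt_3 and Ws_1 > Ws_2 > Ws_3 > Ws_4.
WT = {1: 1.0, 2: 0.5, 3: 0.25}             # keyword source weights
WS = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.125}   # attribute layer weights


def final_score(hits, wt=WT, ws=WS):
    """Combine per-hit scores S_ij into one score for a standard item.

    hits: dict of (i, j) -> S_ij, where i is the keyword source and j the
    attribute layer. ASSUMPTION: the final score is sum(Wt_i * Ws_j * S_ij).
    """
    return sum(wt[i] * ws[j] * s for (i, j), s in hits.items())


def top_n(hits_by_item, n):
    """Return the n items with the highest final scores, best first."""
    scored = [(final_score(hits), item) for item, hits in hits_by_item.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item, score) for score, item in scored[:n]]


sample_hits = {
    "Item A": {(1, 1): 2.0},               # input keyword hits the item name
    "Item B": {(2, 1): 2.0, (3, 4): 8.0},  # segment hits name, synonym hits brand
    "Item C": {(3, 3): 4.0},               # synonym hits the item expression
}
ranking = top_n(sample_hits, 2)
```

With these sample weights a direct hit of the original keyword on the item name outranks larger raw scores that arrive through synonyms or lower-priority attributes, which is the effect the two weight orderings are meant to produce.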
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor,
wherein the processor, when executing the program, implements the steps of the intelligent retrieval method for enterprise starting standard items integrating multiple attributes as claimed in any one of claims 1 to 7.
9. A storage medium including a stored program, characterized in that,
when the program runs, a device on which the storage medium is located is controlled to execute the steps of the intelligent retrieval method for enterprise starting standard items integrating multiple attributes as claimed in any one of claims 1 to 7.
CN202211684820.8A 2022-12-27 2022-12-27 Intelligent retrieval method for enterprise establishment standard items integrating multiple attributes Active CN115858603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211684820.8A CN115858603B (en) 2022-12-27 2022-12-27 Intelligent retrieval method for enterprise establishment standard items integrating multiple attributes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211684820.8A CN115858603B (en) 2022-12-27 2022-12-27 Intelligent retrieval method for enterprise establishment standard items integrating multiple attributes

Publications (2)

Publication Number Publication Date
CN115858603A (en) 2023-03-28
CN115858603B (en) 2025-08-05

Family

ID=85653505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211684820.8A Active CN115858603B (en) 2022-12-27 2022-12-27 Intelligent retrieval method for enterprise establishment standard items integrating multiple attributes

Country Status (1)

Country Link
CN (1) CN115858603B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026361A1 (en) * 2017-07-24 2019-01-24 Mycelebs Co., Ltd. Method and apparatus for providing information by using degree of association between reserved word and attribute language
CN111694878A (en) * 2020-05-11 2020-09-22 电子科技大学 Government affair subject matter co-processing method and system based on matter association network

Also Published As

Publication number Publication date
CN115858603B (en) 2025-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant