CN116579825A - Category prediction method, device, equipment and storage medium - Google Patents
- Publication number
- CN116579825A (application number CN202310615330.0A)
- Authority
- CN
- China
- Prior art keywords
- category
- commodity
- determining
- vocabulary
- keyword
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Electronic shopping [e-shopping] by investigating goods or services
- G06Q30/0625—Electronic shopping [e-shopping] by investigating goods or services by formulating product or service queries, e.g. using keywords or predefined options
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Description
Technical field
The present invention relates to the technical field of category prediction, and more specifically to a category prediction method, device, equipment and storage medium.
Background art
At present, after a supplier creates a commodity on an e-commerce platform, the supplier needs to check whether users can find that commodity by searching the platform. Normally, the keyword entered by a user can be matched to the corresponding category. For example, if the user enters the keyword [computer chair], it is matched to the [office] category, so the recalled commodities are all related to computer chairs; in this case the search quality is good.
In many cases, however, the keyword entered by the user cannot be matched to the corresponding category, so the corresponding commodity cannot be found, which degrades the user experience and prevents the transaction from being completed. For example, if the user enters the keyword [computer] and it is matched to the [office] category instead of the [digital] category, the recalled commodities are computer desks, computer accessories and the like. Such a search scheme has poor search quality and a poor user experience, and lowers the probability of completing a transaction.
Therefore, how to predict categories accurately is a problem that those skilled in the art need to solve.
Summary of the invention
The object of the present invention is to provide a category prediction method, device, equipment and storage medium that improve the accuracy of category prediction.
To achieve the above object, the present invention provides a category prediction method, including:
determining the target keyword input by the user;
matching the target keyword against the category correspondence to determine the predicted category;
wherein the category correspondence includes: a first category correspondence between keywords and categories determined from historical click behavior data, and/or a second category correspondence between the vocabulary of commodities in the commodity library and categories, and/or a third category correspondence between the vocabulary of commodities successfully traded within a predetermined period and categories.
Preferably, before determining the target keyword input by the user, the method further includes:
obtaining the historical click behavior data of each keyword under the first category of each recalled commodity;
determining, according to the historical click behavior data, the number of clicks of each keyword under each first category and the total number of clicks of each keyword;
calculating the first weight value of each keyword from the number of clicks of each keyword under each first category and the total number of clicks of each keyword, and generating the first category correspondence.
Preferably, before determining the number of clicks of each keyword under each first category according to the historical click behavior data, the method further includes:
deleting invalid keywords from the historical click behavior data; the invalid keywords include special-character keywords and keywords of length 1.
Preferably, after calculating the first weight value of each keyword, the method further includes:
detecting whether the weight value of each keyword under each first category is less than a predetermined threshold;
if so, deleting the correspondence between the keyword and the first category whose weight value is less than the predetermined threshold.
Preferably, before determining the target keyword input by the user, the method further includes:
obtaining the first basic information of each commodity in the commodity library; the first basic information includes the commodity title and attributes;
splitting each piece of first basic information into a plurality of first commodity words;
determining the number of times each first commodity word and each second category in the first basic information co-occur in the first basic information of the same commodity;
determining the total number of occurrences of each first commodity word;
calculating the second weight value of each first commodity word under each second category from the co-occurrence count of each first commodity word with each second category and the total number of occurrences, and generating the second category correspondence.
Preferably, before determining the target keyword input by the user, the method further includes:
obtaining the second basic information of commodities successfully traded within a predetermined period; the second basic information includes the commodity title and attributes;
splitting each piece of second basic information into a plurality of second commodity words;
determining the number of times each second commodity word and each third category in the second basic information co-occur in the first basic information of the same commodity;
determining the total number of occurrences of each second commodity word;
calculating the third weight value of each second commodity word under each third category from the co-occurrence count of each second commodity word with each third category and the total number of occurrences, and generating the third category correspondence.
Preferably, determining the target keyword input by the user includes:
obtaining the search term entered by the user;
performing query rewriting on the search term to generate a plurality of target keywords.
To achieve the above object, the present invention further provides a category prediction device, including:
a determining module, configured to determine the target keyword input by the user;
a matching module, configured to match the target keyword against the category correspondence to determine the predicted category;
wherein the category correspondence includes: a first category correspondence between keywords and categories determined from historical click behavior data, and/or a second category correspondence between the vocabulary of commodities in the commodity library and categories, and/or a third category correspondence between the vocabulary of commodities successfully traded within a predetermined period and categories.
To achieve the above object, the present invention further provides an electronic device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above category prediction method when executing the computer program.
To achieve the above object, the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above category prediction method are implemented.
As can be seen from the above, the embodiments of the present invention provide a category prediction method, device, equipment and storage medium. In this scheme, the category correspondence is determined in advance. It includes a first category correspondence between keywords and categories determined from historical click behavior data, and/or a second category correspondence between the vocabulary of commodities in the commodity library and categories, and/or a third category correspondence between the vocabulary of commodities successfully traded within a predetermined period and categories. Once the target keyword input by the user is obtained, it is matched against the category correspondence to obtain the predicted category. Category prediction in this scheme therefore relies not only on the first category correspondence determined from historical click behavior data, but also on the correspondence between categories and the vocabulary of commodities in the commodity library and of commodities successfully traded within a predetermined period. As a result, categories can be predicted accurately whether or not click behavior has occurred in a category, which improves the accuracy of commodity recall.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a category prediction device in a traditional scheme;
Fig. 2 is a schematic flowchart of a category prediction method disclosed in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the first category correspondence disclosed in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the second category correspondence disclosed in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the overall category prediction flow disclosed in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a category prediction device disclosed in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present invention.
Detailed description
To describe this scheme clearly, the technical terms it involves are explained here:
Category: commodity classification means that, for a given management purpose and to meet all or part of the needs of commodity production, circulation and consumption, the whole set of commodities within the scope of management is divided step by step, using suitably chosen basic characteristics of the commodities as classification marks, into several subsets (categories) of narrower scope and more uniform characteristics. On an e-commerce platform, commodities are organized neatly by category, and a commodity must be published level by level through the category hierarchy. The main categories include digital, home, office, clothing, beauty, vehicles and so on. For example, computers are classified under the [digital] category, computer chairs under the [office] category, and computer desk cloths under the [home] category.
Long-tail word: a relatively long word among the keywords entered in search. Generally, a word with a length greater than 8 is called a long-tail word. Because long-tail words are longer, they carry more information and usually express a clearer target.
Recall: using the search keywords, long-tail words and so on given by the user, the search engine selects from its whole index a batch of commodities that meet the user's filtering conditions.
Fig. 1 is a schematic structural diagram of the category prediction device in the traditional scheme. As Fig. 1 shows, the traditional scheme obtains a search click log from search-click tracking points and selects from that log the medium- and high-frequency queries (search terms) within a period of time (for example, queries searched at least 5 times in the last 15 days). The category prediction click model inside a Job produces the category prediction result for each query, and the result is stored in ES (Elasticsearch, used as the database). When a user searches for a query, the category prediction interface fetches the category prediction result directly from ES and returns it.
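The frequency filter of the traditional scheme can be sketched as follows. This is a minimal illustration only: the function name `select_frequent_queries`, the `(query, search_date)` log shape and the inclusive window boundary are assumptions, and the 15-day and 5-search values are simply the example figures quoted above.

```python
from collections import Counter
from datetime import date, timedelta


def select_frequent_queries(search_log, today, window_days=15, min_searches=5):
    """Return the queries searched at least `min_searches` times in the window.

    search_log: iterable of (query, search_date) pairs (hypothetical log shape).
    """
    cutoff = today - timedelta(days=window_days)
    # Count only the searches that fall inside the look-back window.
    counts = Counter(q for q, d in search_log if cutoff <= d <= today)
    return {q for q, n in counts.items() if n >= min_searches}


today = date(2023, 5, 29)
log = [("computer chair", today - timedelta(days=i)) for i in range(6)]
log += [("computer", today - timedelta(days=1))] * 2
log += [("old query", today - timedelta(days=40))] * 9
frequent = select_frequent_queries(log, today)
```

Queries that fall below the cut-off never reach the prediction model, which is why rarely searched or rarely clicked terms get no category in the traditional scheme.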
Thus, in the traditional approach, the category prediction model records only the high-frequency queries that produced search clicks into an offline dictionary. Categories that were never clicked, or rarely clicked, are therefore absent from the offline dictionary, and the related categories are not returned at search time. The embodiments of the present invention therefore disclose a category prediction method, device, equipment and storage medium. This scheme can predict categories for Top Query popular search terms, can also predict categories for long-tail words, and takes user behavior preferences into account during category prediction, thereby improving the accuracy of category prediction and, in turn, the recall rate of search.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 2, a schematic flowchart of a category prediction method provided by an embodiment of the present invention, the method includes:
S101: determine the target keyword input by the user;
In this embodiment, determining the target keyword input by the user requires obtaining the search term entered by the user and performing query rewriting on it to generate a plurality of target keywords.
Specifically, the search term is what the user types when searching. To improve search quality, this scheme rewrites the search term into a plurality of target keywords, and each target keyword is matched against the category correspondence to obtain its own predicted category. Query rewrite (QR) technology analyzes and processes the search term entered by the user, including word segmentation and preprocessing, to obtain target keywords that better reflect the user's actual intent. Therefore, even if the search term entered by the user is a long-tail word, query rewriting can segment it and generate target keywords whose categories are easier to predict, which improves the search hit rate of long-tail words.
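The source does not specify how query rewriting is implemented, so the following is only a hedged sketch of the segmentation step, using forward maximum matching against a known vocabulary. The function name `rewrite_query`, the `vocabulary` set and the `max_len` limit are all assumptions made for illustration; a production system would use a real segmenter and further rewriting rules.

```python
def rewrite_query(search_term, vocabulary, max_len=8):
    """Split a search term into target keywords by forward maximum matching.

    vocabulary: set of known words (hypothetical; e.g. the keys of the
    category correspondence). Characters that match nothing are skipped.
    """
    text = "".join(search_term.split())  # drop whitespace (simple preprocessing)
    keywords, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a known word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                break
        if piece in vocabulary and piece not in keywords:
            keywords.append(piece)
        i += size
    # Fall back to the whole term when no known word was found.
    return keywords or ([text] if text else [])


# "美的" = Midea (brand), "空调" = air conditioner
vocab = {"美的", "空调", "KFR-72LW"}
targets = rewrite_query("美的空调 KFR-72LW", vocab)
```

Each returned keyword would then be matched against the category correspondence separately, so a long search term can still hit entries that only exist for its shorter parts.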
S102: match the target keyword against the category correspondence to determine the predicted category; wherein the category correspondence includes: a first category correspondence between keywords and categories determined from historical click behavior data, and/or a second category correspondence between the vocabulary of commodities in the commodity library and categories, and/or a third category correspondence between the vocabulary of commodities successfully traded within a predetermined period and categories.
In this embodiment, the category correspondence is established in advance and can be determined in several ways. The first category correspondence, between keywords and categories determined from historical click behavior data, captures the relationship between keywords and the categories in which clicks occurred; if the target keyword matches a keyword in the first category correspondence, the category corresponding to the matched keyword is the predicted category of the target keyword. The second category correspondence, between the vocabulary of commodities in the commodity library and categories, captures the relationship between keywords and categories with no or few clicks. It is built mainly from the relationship between commodity vocabulary and categories; if the target keyword matches a commodity word in the second category correspondence, the category corresponding to the matched commodity word is the predicted category of the target keyword. The third category correspondence, between the vocabulary of commodities successfully traded within a predetermined period and categories, provides category predictions based on the positive correlation between the user's search query and the commodity finally traded.
In summary, this scheme predicts categories from multiple perspectives. It predicts categories not only from the category correspondence determined from historical click behavior data, but also from the correspondence between categories and the vocabulary of commodities in the commodity library and of commodities successfully traded within a predetermined period. Categories can therefore be predicted accurately whether or not click behavior has occurred in a category, which improves the accuracy of commodity recall.
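Step S102 can be sketched as a lookup over whichever correspondences are available. The source allows any "and/or" combination of the three correspondences and does not say how they are combined, so the ordered fallback below (first table that contains the keyword wins) is an assumption, as are the function name `predict_categories` and the `{term: {category: weight}}` table shape.

```python
def predict_categories(target_keywords, correspondences):
    """Match each target keyword against the category correspondences.

    correspondences: list of tables shaped {term: {category: weight}},
    tried in order (assumed priority: click-based, catalog-based, trade-based).
    Returns {keyword: [(category, weight), ...]} sorted by descending weight.
    """
    predictions = {}
    for keyword in target_keywords:
        for table in correspondences:
            if keyword in table:
                ranked = sorted(table[keyword].items(),
                                key=lambda item: item[1], reverse=True)
                predictions[keyword] = ranked
                break  # stop at the first table that knows this keyword
    return predictions


click_table = {"television": {"TV stand": 0.21, "flat-panel TV": 0.44}}
catalog_table = {"Midea": {"air conditioner": 0.15, "rice cooker": 0.11}}
result = predict_categories(["television", "Midea", "unknown"],
                            [click_table, catalog_table])
```

A keyword with no click history ("Midea" here) still receives a prediction from the catalog-based table, which is the gap in the traditional scheme that the second and third correspondences are meant to close.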
Based on the above method embodiment, this embodiment discloses the specific process of generating the first category correspondence, which mainly includes the following steps:
obtaining the historical click behavior data of each keyword under the first category of each recalled commodity;
determining, according to the historical click behavior data, the number of clicks of each keyword under each first category and the total number of clicks of each keyword;
calculating the first weight value of each keyword from the number of clicks of each keyword under each first category and the total number of clicks of each keyword, and generating the first category correspondence.
Specifically, this embodiment can generate the first category correspondence with a click mining module. When generating it, a click model counts, for each query, the historical click behavior data under the category of each recalled commodity over a past period of time; this data reveals the correlation between keywords and categories as expressed by user behavior. For example, if the keyword entered by the user is "television", the first categories of the recalled commodities include "flat-panel TV", "TV stand", "laser TV" and so on. If the user clicks the "flat-panel TV" category within the predetermined time, that category is correlated with "television", and a correspondence between "television" and "flat-panel TV" is established.
To improve keyword quality, keyword preprocessing can be performed after the historical click behavior data is obtained and before the number of clicks of each keyword under each first category is determined. The preprocessing deletes invalid keywords from the historical click behavior data; invalid keywords include special-character keywords and keywords of length 1. This scheme then computes category relevance from the number of clicks in each category and expresses the relevance as a weight value. Specifically, the weight value of a query and a category (cate_id) is computed as a conditional probability, that is, the probability that the user clicks on the corresponding category given the query:
cate_score1 = P(cate_id | query) = qc / qq;
where cate_score1 is the first weight value, qc is the number of clicks of the keyword query on the category cate_id, and qq is the total number of clicks of the keyword query.
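The preprocessing and the cate_score1 formula above can be sketched as follows. The names `is_valid_keyword` and `first_weights` and the `(query, cate_id)` click-log shape are assumptions, and "special-character keyword" is interpreted here as a keyword containing no letter or digit at all, which is one possible reading of the source.

```python
from collections import Counter, defaultdict


def is_valid_keyword(query):
    """Reject keywords of length 1 and keywords made only of special characters."""
    return len(query) > 1 and any(ch.isalnum() for ch in query)


def first_weights(click_log):
    """Compute cate_score1 = qc / qq for every (query, cate_id) pair.

    click_log: iterable of (query, cate_id) pairs, one entry per click.
    Returns {query: {cate_id: weight}}.
    """
    qc = defaultdict(Counter)  # clicks of each query in each category
    for query, cate_id in click_log:
        if is_valid_keyword(query):
            qc[query][cate_id] += 1
    weights = {}
    for query, per_category in qc.items():
        qq = sum(per_category.values())  # total clicks of this query
        weights[query] = {c: n / qq for c, n in per_category.items()}
    return weights


clicks = [("television", "flat-panel TV")] * 3 + [("television", "TV stand")]
clicks += [("a", "flat-panel TV"), ("##", "TV stand")]
scores = first_weights(clicks)
```

Because qq is the sum of the per-category click counts, the weights of one query always add up to 1, so they can be compared directly across categories.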
After the first weight value of each keyword is calculated, the first weight values can be sorted by magnitude, and the scheme checks whether the weight value of each keyword under each first category is less than a predetermined threshold; if so, the correspondence between that keyword and that first category is deleted. In this way, low-frequency query terms can be excluded.
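The sorting and threshold check can be sketched as below. The source does not state the threshold value, so the default of 0.05 is purely hypothetical, as is the function name `prune_and_sort`.

```python
def prune_and_sort(weights, threshold=0.05):
    """Drop (keyword, category) pairs below the threshold and sort the rest.

    weights: {query: {cate_id: weight}}.
    Returns {query: [(cate_id, weight), ...]} in descending weight order;
    a query whose categories are all pruned is removed entirely.
    """
    pruned = {}
    for query, per_category in weights.items():
        kept = sorted(
            ((c, w) for c, w in per_category.items() if w >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
        if kept:
            pruned[query] = kept
    return pruned


raw = {"television": {"TV stand": 0.21, "flat-panel TV": 0.44, "remote": 0.01}}
clean = prune_and_sort(raw)
```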
Fig. 3 is a schematic diagram of the first category correspondence provided by an embodiment of the present invention. As the figure shows, the first category correspondence contains not only the correspondence between each keyword and its categories but also the weight value of each keyword-category pair. For example, for the keyword "television", the categories recorded in the first category correspondence include "flat-panel TV" with a first weight value of 0.44, "TV stand" with a first weight value of 0.21, "laser TV" with a first weight value of 0.12, and so on.
The process described above, in which user search and click behavior data is analyzed, filtered and stored to produce <query,[cate_id,weight]> dictionary data, can be understood as offline dictionary construction. Once the offline dictionary data is built, it must be persisted and exposed through an interface so that consumers can query it. The persistence layer uses elasticsearch.
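One possible shape for the persisted <query,[cate_id,weight]> records is sketched below. The field names `query`, `categories`, `cate_id` and `weight`, and the helper names, are assumptions; the in-memory `lookup` merely stands in for the query interface, and no Elasticsearch client call is shown because the source gives no index or mapping details.

```python
import json


def to_documents(dictionary):
    """Turn {query: [(cate_id, weight), ...]} into one JSON-ready record per query."""
    return [
        {"query": query,
         "categories": [{"cate_id": c, "weight": w} for c, w in pairs]}
        for query, pairs in dictionary.items()
    ]


def lookup(documents, query):
    """Stand-in for the query interface: return [(cate_id, weight), ...] or []."""
    for doc in documents:
        if doc["query"] == query:
            return [(c["cate_id"], c["weight"]) for c in doc["categories"]]
    return []


docs = to_documents({"television": [("flat-panel TV", 0.44),
                                    ("TV stand", 0.21),
                                    ("laser TV", 0.12)]})
serialized = json.dumps(docs, ensure_ascii=False)  # what would be persisted
restored = json.loads(serialized)
```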
Based on the above method embodiment, this embodiment discloses the specific process of generating the second category correspondence, which mainly includes the following steps:
obtaining the first basic information of each commodity in the commodity library; the first basic information includes the commodity title and attributes;
splitting each piece of first basic information into a plurality of first commodity words;
determining the number of times each first commodity word and each second category in the first basic information co-occur in the first basic information of the same commodity;
determining the total number of occurrences of each first commodity word;
calculating the second weight value of each first commodity word under each second category from the co-occurrence count of each first commodity word with each second category and the total number of occurrences, and generating the second category correspondence.
Specifically, this embodiment can generate the second category correspondence with a text semantic mining module. The basic information of a commodity includes the commodity title and attributes, and the attributes include the model, brand and so on. When the title and attributes are split into individual commodity words, each commodity word relates to categories differently; in other words, each commodity word is related to a limited set of categories with unequal weights. This scheme therefore calculates a weight value for each commodity word and category, generally within the range [0,1], where 0 means unrelated and 1 means highly related. The process amounts to feature selection: the features are the commodity words, and the important ones are selected. The importance of a commodity word is determined by how concentrated it is across categories: if a commodity word appears in only a few categories, it is more discriminative and therefore more important.
In this embodiment, <commodity title word/brand-model word, category> pairs are formed and the co-occurrence relationship between commodity words and categories is counted. Common methods for calculating the relevance score between a word and a category include conditional probability, pointwise mutual information (PMI), chi-square, and tf-dc (term frequency-distribution concentration, a term weighting method). Based on tests with actual data, this solution adopts conditional probability, that is, it calculates the probability that a category (cate_id) co-occurs with a given first commodity word (token):
cate_score2 = P(cate_id | token) = tc / tt
where cate_score2 is the second weight value, tc is the number of times the first commodity word (token) co-occurs with a given category (cate_id), and tt is the total number of occurrences of that token.
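The conditional-probability weighting can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_category_weights`, the input shape (one `(tokens, cate_id)` pair per commodity), and the choice to count a token once per commodity are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_category_weights(items):
    """Return {token: {cate_id: tc / tt}} from (tokens, cate_id) pairs.

    Each pair is assumed to describe one commodity: its already-segmented
    title/attribute words and the category it is filed under.
    """
    token_total = Counter()             # tt: total occurrences of each token
    token_cate = defaultdict(Counter)   # tc: token/category co-occurrences
    for tokens, cate_id in items:
        for token in set(tokens):       # assumption: count once per commodity
            token_total[token] += 1
            token_cate[token][cate_id] += 1
    return {
        token: {cate: tc / token_total[token] for cate, tc in cates.items()}
        for token, cates in token_cate.items()
    }


# Hypothetical miniature commodity library.
catalog = [
    (["Midea", "air conditioner", "KFR-72LW"], "air conditioner"),
    (["Midea", "rice cooker"], "rice cooker"),
    (["Midea", "air conditioner"], "air conditioner"),
    (["Gree", "air conditioner"], "air conditioner"),
]
weights = build_category_weights(catalog)
```

Because every weight is a ratio tc / tt with tc no larger than tt, the values fall in [0,1] without further normalization, which matches the value range described above.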
Refer to FIG. 4, a schematic diagram of the second category correspondence provided by an embodiment of the present invention. As the figure shows, the second category correspondence includes not only the mapping between each commodity word and its categories but also the weight value of each word under each category. For example, suppose the first basic information is "Midea air conditioner KFR-72LW/DY-PA400". Splitting it yields the commodity words "Midea", "air conditioner", "KFR-72LW", and "DY-PA400". Take "Midea" as the first commodity word, and suppose the first basic information of the commodities includes the following second categories: "air conditioner", "rice cooker", "electric fan", "electric water heater", and "electric kettle". Counting the co-occurrences of the word "Midea" with these second categories shows that "Midea" occurs 555 times in total: 85 times with the "air conditioner" category, 59 times with "rice cooker", 42 times with "electric fan", 36 times with "electric water heater", 26 times with "fan heater/heater", 20 times with "electric kettle", and so on. The calculation described above then gives: "air conditioner": 0.15, "rice cooker": 0.11, "electric fan": 0.08, "electric water heater": 0.06, "fan heater/heater": 0.05, "electric kettle": 0.04.
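The figures in this example can be checked directly against the formula cate_score2 = tc / tt. The short Python check below uses the counts stated in the text; the variable names are illustrative only, and rounding to two decimal places is inferred from the reported values.

```python
# Total occurrences of the word "Midea" (tt) and its co-occurrence
# counts with each category (tc), as stated in the example.
total_occurrences = 555
co_occurrences = {
    "air conditioner": 85,
    "rice cooker": 59,
    "electric fan": 42,
    "electric water heater": 36,
    "fan heater/heater": 26,
    "electric kettle": 20,
}

# cate_score2 = tc / tt, rounded to two decimal places.
scores = {
    category: round(tc / total_occurrences, 2)
    for category, tc in co_occurrences.items()
}
```

The listed counts sum to 268 rather than 555, which is consistent with the "and so on" in the text: the remaining occurrences fall under categories the example does not enumerate.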
Note that when not enough search-click behavior data can be collected, this solution can build an online dictionary from the second category correspondence. The online dictionary supplements the model's predictive ability for long-tail queries: by training on a large number of existing commodity titles, it provides prediction capability for the long-tail scenario. A long-tail query does not appear in the historical search-click behavior data; it arrives as a new term. The online dictionary segments and preprocesses the long-tail query and then feeds it to the model to obtain the category prediction result. In addition, to reduce the overall response time of prediction, the dictionary is cached in memory, because a given query under a given model always yields the same prediction result.
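The memory-level cache works because prediction is deterministic for a fixed query and a fixed model. A minimal sketch using the standard library's `functools.lru_cache` follows. The dictionary contents, the whitespace `segment` placeholder (a real system would use a Chinese word segmenter), and the max-based aggregation of token weights are all assumptions for illustration; the source does not specify them.

```python
from functools import lru_cache

# Hypothetical online dictionary: token -> {category: weight}.
TOKEN_CATEGORY_WEIGHTS = {
    "midea": {"air conditioner": 0.15, "rice cooker": 0.11},
    "kettle": {"electric kettle": 0.9},
}


def segment(query):
    """Placeholder segmentation and preprocessing (assumption)."""
    return query.strip().lower().split()


@lru_cache(maxsize=10_000)
def predict_categories(query):
    """Return ((category, weight), ...) sorted by descending weight."""
    scores = {}
    for token in segment(query):
        for category, weight in TOKEN_CATEGORY_WEIGHTS.get(token, {}).items():
            # Assumption: keep the strongest evidence per category.
            scores[category] = max(scores.get(category, 0.0), weight)
    # A tuple is returned so that cached results cannot be mutated by callers.
    return tuple(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))


first = predict_categories("Midea kettle")    # computed
second = predict_categories("Midea kettle")   # served from the cache
```

If the dictionary or model is rebuilt, the cache has to be invalidated (here, `predict_categories.cache_clear()`), since the cached answers are only valid for the model that produced them.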
Building on the method embodiments above, this embodiment discloses the specific process for generating the third category correspondence, which mainly includes the following steps:
Acquire the second basic information of commodities successfully traded within a predetermined period; the second basic information includes commodity titles and attributes;
Split each item of second basic information into a plurality of second commodity words;
Determine, for each second commodity word and each third category found in the second basic information, the number of times the two co-occur in the second basic information of the same commodity;
Determine the total number of occurrences of each second commodity word;
Using the co-occurrence count of each second commodity word with each third category, together with the word's total occurrence count, calculate the third weight value of each second commodity word under each third category and generate the third category correspondence.
Specifically, in this embodiment the third category correspondence can be generated by the category fusion module. Because a user's search query is positively correlated with the commodity that is finally purchased, the basic information of commodities whose transactions did not fail within the most recent period is segmented into words. To keep the terms distinct, the basic information used to determine the second category correspondence is called the first basic information, and the basic information used to determine the third category correspondence is called the second basic information. The second basic information likewise includes commodity titles and attributes, and the attributes include information such as brand and model. Once the second basic information is determined, it can be split into a plurality of second commodity words. Similarly, the commodity words obtained when determining the second category correspondence are called first commodity words, and those obtained when determining the third category correspondence are called second commodity words. The categories found in the historical click behavior data are called first categories, the categories in the first basic information are called second categories, and the categories in the second basic information are called third categories. The weight values in the first, second, and third category correspondences are called the first, second, and third weight values, respectively.
When determining the third weight value of each second commodity word under a third category, this solution can again use conditional probability, that is, it calculates the probability that a category (cate_id) co-occurs with a given second commodity word (token):
cate_score3 = P(cate_id | token) = tc / tt
where cate_score3 is the third weight value, tc is the number of times the second commodity word (token) co-occurs with a given category (cate_id), and tt is the total number of occurrences of that token.
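The third weight value uses the same ratio as the second, but only over commodities that were successfully traded within the predetermined period. The sketch below makes that filtering step explicit. The record layout (`status`, `date`, `tokens`, `cate_id`), the `"success"` status string, and the 30-day default window are hypothetical; the source only says "predetermined period".

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def third_category_weights(orders, today, period_days=30):
    """Return {token: {cate_id: tc / tt}} over recent successful orders."""
    cutoff = today - timedelta(days=period_days)
    token_total = Counter()             # tt
    token_cate = defaultdict(Counter)   # tc
    for order in orders:
        # Keep only successful transactions inside the period.
        if order["status"] != "success" or order["date"] < cutoff:
            continue
        for token in set(order["tokens"]):
            token_total[token] += 1
            token_cate[token][order["cate_id"]] += 1
    return {
        token: {cate: tc / token_total[token] for cate, tc in cates.items()}
        for token, cates in token_cate.items()
    }


# Hypothetical transaction records.
orders = [
    {"status": "success", "date": date(2023, 5, 20),
     "tokens": ["Midea", "electric fan"], "cate_id": "electric fan"},
    {"status": "success", "date": date(2023, 5, 25),
     "tokens": ["Midea", "air conditioner"], "cate_id": "air conditioner"},
    {"status": "failed", "date": date(2023, 5, 26),
     "tokens": ["Midea", "rice cooker"], "cate_id": "rice cooker"},
    {"status": "success", "date": date(2023, 1, 1),
     "tokens": ["Midea", "electric kettle"], "cate_id": "electric kettle"},
]
recent_weights = third_category_weights(orders, today=date(2023, 5, 29))
```

Recomputing this table on a schedule is one way the correspondence could track recent purchasing behavior, since old orders drop out of the window as new ones enter.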
Refer to FIG. 5, a schematic diagram of the overall category prediction flow provided by an embodiment of the present invention. As FIG. 5 shows, the click mining module uses offline computation to establish the correspondence between search keywords and categories from click-tracking data. The text semantic mining module uses online computation to establish the correspondence between commodities in the commodity library and categories from the commodity dimension table data. The category fusion module establishes the probabilistic correspondence between the words of recently successfully traded commodities and categories. After the user enters a search query, prediction is performed through the category prediction interface: the target keywords are first determined by query rewriting, and the category correspondences established by the modules above are then used to obtain the category prediction result. Once the result is returned, the predicted categories take part in commodity recall for the search. Note that after the predicted category is determined from a category correspondence, its weight value can be determined as well. The returned prediction result can therefore include the weight value of each predicted category and which category correspondence produced it, for reference during commodity recall.
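The prediction flow of FIG. 5 can be sketched as a lookup over the three correspondences, returning each predicted category together with its weight and the correspondence it came from. Everything concrete here is an assumption: the `rewrite_query` placeholder, the table names `click`, `text`, and `transaction`, and the decision to return all matches sorted by weight. The source does not specify how results from the three correspondences are merged.

```python
def rewrite_query(query):
    """Placeholder query rewriting: normalize and split into keywords."""
    return [word for word in query.strip().lower().split() if word]


def predict(query, correspondences):
    """Match each target keyword against every category correspondence.

    correspondences: {source_name: {keyword: {category: weight}}}
    Returns a list of dicts sorted by descending weight.
    """
    results = []
    for keyword in rewrite_query(query):
        for source, table in correspondences.items():
            for category, weight in table.get(keyword, {}).items():
                results.append({
                    "keyword": keyword,
                    "category": category,
                    "weight": weight,
                    "source": source,   # which correspondence matched
                })
    results.sort(key=lambda r: -r["weight"])
    return results


# Hypothetical correspondences built by the three modules.
correspondences = {
    "click": {"midea": {"air conditioner": 0.6}},
    "text": {"midea": {"air conditioner": 0.15, "rice cooker": 0.11}},
    "transaction": {},
}
predictions = predict("Midea", correspondences)
```

Returning the source alongside the weight lets the recall stage decide how much to trust each prediction, for example by preferring click-derived categories when they exist.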
In summary, this solution stores the high-frequency queries that led to search clicks, together with the categories corresponding to those queries, and forms a keyword-to-category correspondence in the offline dictionary. This creates a positive feedback loop that improves the accuracy of category prediction. For long-tail words that are not in the offline dictionary, categories that have never been clicked, and categories with few clicks, the solution provides the text semantic mining module, which allows long-tail words and keywords missing from the offline dictionary to hit the corresponding categories as well. Through the category fusion module, the solution combines the click mining module and the text semantic mining module, so that a matching relationship is established between the words of recently successfully traded commodities and the corresponding commodity categories. This correspondence can also strengthen and evolve step by step, with the matching relationship reinforced through use. With this solution, category prediction results can therefore be matched separately for each word of the segmented query and returned together with their weight values, which markedly improves the category accuracy of commodity recall.
The category prediction apparatus, device, and storage medium provided by embodiments of the present invention are described below. The apparatus, device, and storage medium described below and the category prediction method described above may be cross-referenced.
Refer to FIG. 6, a schematic structural diagram of a category prediction apparatus provided by an embodiment of the present invention, which includes:
A determining module 11, configured to determine the target keyword input by the user;
A matching module 12, configured to match the target keyword against a category correspondence to determine the predicted category;
The category correspondence includes: a first category correspondence between keywords and categories determined from historical click behavior data, and/or a second category correspondence between the words of commodities in the commodity library and categories, and/or a third category correspondence between the words of commodities successfully traded within a predetermined period and categories.
In another embodiment of the present invention, the apparatus further includes a click mining module;
The click mining module includes:
A first acquiring unit, configured to acquire the historical click behavior data of each keyword under the first category of each recalled commodity;
A first determining unit, configured to determine, from the historical click behavior data, the number of clicks of each keyword under each first category and the total number of clicks of each keyword;
A first calculating unit, configured to calculate the first weight value of each keyword from the number of clicks of each keyword under each first category and the total number of clicks of each keyword, and to generate the first category correspondence.
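The three units of the click mining module can be sketched as a single counting pass. This passage names the inputs to the first weight value (clicks under each first category and total clicks per keyword) but not the formula itself. The sketch assumes the weight is their ratio, by analogy with the conditional probability used for the second and third weight values. The function name and the `(keyword, category)` log format are hypothetical.

```python
from collections import Counter, defaultdict


def first_category_weights(click_log):
    """Return {keyword: {category: clicks_in_category / total_clicks}}.

    click_log: iterable of (keyword, category) pairs, one per click on a
    recalled commodity. The ratio is an assumption; see the lead-in.
    """
    clicks_per_category = defaultdict(Counter)
    total_clicks = Counter()
    for keyword, category in click_log:
        clicks_per_category[keyword][category] += 1
        total_clicks[keyword] += 1
    return {
        keyword: {
            category: clicks / total_clicks[keyword]
            for category, clicks in categories.items()
        }
        for keyword, categories in clicks_per_category.items()
    }


# Hypothetical click log: three clicks in one category, one in another.
click_log = [
    ("kettle", "electric kettle"),
    ("kettle", "electric kettle"),
    ("kettle", "electric kettle"),
    ("kettle", "teapot"),
]
click_weights = first_category_weights(click_log)
```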
In another embodiment of the present invention, the click mining module further includes:
A first deleting unit, configured to delete invalid keywords from the historical click behavior data; the invalid keywords include keywords containing special characters and keywords of length 1.
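A possible form of this cleaning step is shown below. The source does not define "special characters", so the sketch assumes that anything other than Unicode word characters and spaces counts as special; a production system would likely need a more permissive rule (for example, to keep model numbers containing hyphens). The function names are illustrative.

```python
import re

# Assumption: a keyword is "clean" if it consists only of Unicode word
# characters (letters, digits, underscore, CJK ideographs) and spaces.
_ALLOWED = re.compile(r"[\w ]+")


def is_valid_keyword(keyword):
    """Reject keywords of length 1 and keywords with special characters."""
    keyword = keyword.strip()
    return len(keyword) > 1 and _ALLOWED.fullmatch(keyword) is not None


def drop_invalid_keywords(click_log):
    """Filter (keyword, category) click records down to valid keywords."""
    return [(kw, cate) for kw, cate in click_log if is_valid_keyword(kw)]


raw_log = [
    ("空调", "air conditioner"),
    ("a", "air conditioner"),
    ("@#!", "rice cooker"),
    ("rice cooker", "rice cooker"),
]
clean_log = drop_invalid_keywords(raw_log)
```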
In another embodiment of the present invention, the click mining module further includes:
A detecting unit, configured to detect whether the weight value of each keyword under each first category is less than a predetermined threshold and, if so, to trigger the second deleting unit;
A second deleting unit, configured to delete the correspondence between a keyword and a first category when its weight value is less than the predetermined threshold.
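The detecting and second deleting units together amount to pruning low-weight entries from the first category correspondence. A minimal sketch follows; the threshold of 0.05 is a placeholder, since the source only says "predetermined threshold", and dropping a keyword entirely once all of its categories are pruned is a design choice made for the example.

```python
def prune_low_weights(weights, threshold=0.05):
    """Remove keyword/category pairs whose weight is below the threshold.

    weights: {keyword: {category: weight}}
    Keywords left with no categories are dropped as well (assumption).
    """
    pruned = {}
    for keyword, categories in weights.items():
        kept = {c: w for c, w in categories.items() if w >= threshold}
        if kept:
            pruned[keyword] = kept
    return pruned


# Hypothetical first category correspondence before pruning.
raw_weights = {
    "kettle": {"electric kettle": 0.92, "teapot": 0.06, "water cup": 0.02},
    "misc": {"stationery": 0.01},
}
pruned_weights = prune_low_weights(raw_weights)
```

Pruning removes pairs that are most likely accidental clicks, at the cost of no longer summing to 1 per keyword; whether to renormalize afterwards is left open here because the source does not say.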
In another embodiment of the present invention, the apparatus further includes a text semantic mining module;
The text semantic mining module includes:
A second acquiring unit, configured to acquire the first basic information of each commodity in the commodity library; the first basic information includes commodity titles and attributes;
A first splitting unit, configured to split each item of first basic information into a plurality of first commodity words;
A second determining unit, configured to determine, for each first commodity word and each second category found in the first basic information, the number of times the two co-occur in the first basic information of the same commodity;
A third determining unit, configured to determine the total number of occurrences of each first commodity word;
A second calculating unit, configured to calculate the second weight value of each first commodity word under each second category from the co-occurrence count of each first commodity word with each second category and the word's total occurrence count, and to generate the second category correspondence.
In another embodiment of the present invention, the apparatus further includes a category fusion module;
The category fusion module includes:
A third acquiring unit, configured to acquire the second basic information of commodities successfully traded within a predetermined period; the second basic information includes commodity titles and attributes;
A second splitting unit, configured to split each item of second basic information into a plurality of second commodity words;
A fourth determining unit, configured to determine, for each second commodity word and each third category found in the second basic information, the number of times the two co-occur in the second basic information of the same commodity;
A fifth determining unit, configured to determine the total number of occurrences of each second commodity word;
A third calculating unit, configured to calculate the third weight value of each second commodity word under each third category from the co-occurrence count of each second commodity word with each third category and the word's total occurrence count, and to generate the third category correspondence.
In another embodiment of the present invention, the determining module is specifically configured to acquire the search term input by the user and to generate a plurality of target keywords by applying query rewriting to the search term.
Refer to FIG. 7, a schematic structural diagram of an electronic device provided by an embodiment of the present invention, which includes:
A memory 21, configured to store a computer program;
A processor 22, configured to implement the steps of the category prediction method described in any of the method embodiments above when executing the computer program.
In this embodiment, the device may be a terminal device or a server. Terminal devices include PCs (personal computers), smartphones, tablet computers, palmtop computers, portable computers, and the like.
The device may include a memory 21, a processor 22, and a bus 23.
The memory 21 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments the memory 21 may be an internal storage unit of the device, such as the device's hard disk. In other embodiments the memory 21 may be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device. Further, the memory 21 may include both an internal storage unit and an external storage device. The memory 21 can be used not only to store application software installed on the device and various kinds of data, such as the program code that executes the category prediction method, but also to temporarily store data that has been output or is about to be output.
In some embodiments the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, used to run the program code stored in the memory 21 or to process data, for example to execute the program code of the category prediction method.
The bus 23 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 7 shows the bus as a single thick line, but this does not mean that there is only one bus or one type of bus.
Further, the device may include a network interface 24. The network interface 24 may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface) and is typically used to establish a communication connection between the device and other electronic devices.
Optionally, the device may also include a user interface 25. The user interface 25 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display may also be called a display screen or display unit as appropriate, and is used to display the information processed in the device and to display a visual user interface.
FIG. 7 shows only a device with components 21-25. Those skilled in the art will understand that the structure shown in FIG. 7 does not limit the device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the category prediction method described in any of the method embodiments above are implemented.
The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments can be cross-referenced.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The present invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310615330.0A CN116579825A (en) | 2023-05-29 | 2023-05-29 | Category prediction method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310615330.0A CN116579825A (en) | 2023-05-29 | 2023-05-29 | Category prediction method, device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116579825A true CN116579825A (en) | 2023-08-11 |
Family
ID=87533930
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310615330.0A Pending CN116579825A (en) | 2023-05-29 | 2023-05-29 | Category prediction method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116579825A (en) |
-
2023
- 2023-05-29 CN CN202310615330.0A patent/CN116579825A/en active Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China. Address after: 5th Floor, Zone 2, Building 1, Science and Technology Economic Block 9, Zhuantang Street, Xihu District, Hangzhou City, Zhejiang Province 310024. Applicant after: Zhengcai Cloud Co.,Ltd. Address before: 5 / F, area 2, building 1, No.9, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province, 310000. Applicant before: ZHENGCAIYUN Co.,Ltd. Country or region before: China |