WO2019184217A1 - Hotspot event classification method and apparatus, and storage medium - Google Patents
Hotspot event classification method and apparatus, and storage medium
- Publication number
- WO2019184217A1 (PCT/CN2018/102083)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- information
- word
- preset
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
This application claims priority to Chinese Patent Application No. 201810252849.6, entitled "Hotspot Event Classification Method, Apparatus and Storage Medium", filed with the Chinese Patent Office on March 26, 2018, the entire contents of which are incorporated herein by reference.
The present application relates to the field of information technology, and in particular to a hotspot event classification method, an apparatus, and a computer-readable storage medium.
With the development of network technology, social media is used ever more widely, and the number of events on social media grows by the day. Faced with this surge, managers find it difficult to quickly identify the types of events on social media, understand the fields and hot topics that social media users care about, and make decisions accordingly.
Existing methods for classifying hotspot events on social media are inadequate. A classification method is needed that can accurately and quickly determine the event type of a hotspot event at an early stage of its development.
Summary of the invention
In view of the above, the present application provides a hotspot event classification method, an apparatus, and a computer-readable storage medium, whose main purpose is to improve the speed and accuracy of hotspot event classification on social media.
To achieve the above objective, the present application provides a hotspot event classification method, the method comprising:
Obtaining step: obtaining, in real time from a predetermined server, information texts published by a first preset number of users;
Word segmentation step: segmenting the information texts using a predetermined word segmentation rule to obtain the word segments corresponding to each information text;
Determining step: extracting the preset feature words from the word segments, and determining the event topic corresponding to each feature word using a predetermined probability algorithm;
Calculating step: calculating the hotspot event indicator value corresponding to the feature word according to a preset calculation formula;
Classification step: determining whether the hotspot event indicator value is greater than a preset threshold and, if it is, obtaining an information vector of the information text corresponding to the feature word using a preset vectorization method, and inputting the information vector into a pre-trained event classification model to determine the event type corresponding to the information text.
In addition, the present application further provides an electronic device comprising a memory and a processor, wherein a hotspot event classification program is stored in the memory and, when executed by the processor, implements the following steps:
Obtaining step: obtaining, in real time from a predetermined server, information texts published by a first preset number of users;
Word segmentation step: segmenting the information texts using a predetermined word segmentation rule to obtain the word segments corresponding to each information text;
Determining step: extracting the preset feature words from the word segments, and determining the event topic corresponding to each feature word using a predetermined probability algorithm;
Calculating step: calculating the hotspot event indicator value corresponding to the feature word according to a preset calculation formula;
Classification step: determining whether the hotspot event indicator value is greater than a preset threshold and, if it is, obtaining an information vector of the information text corresponding to the feature word using a preset vectorization method, and inputting the information vector into a pre-trained event classification model to determine the event type corresponding to the information text.
In addition, to achieve the above objective, the present application further provides a computer-readable storage medium that includes a hotspot event classification program which, when executed by a processor, implements any step of the hotspot event classification method described above.
The hotspot event classification method, electronic device, and computer-readable storage medium proposed in the present application obtain information texts published by social accounts from a server, segment the information texts, extract the feature words, calculate the most probable event topic for each feature word, and calculate the event indicator value corresponding to each feature word using the preset calculation formula. Finally, the information texts corresponding to feature words whose event indicator value exceeds the preset threshold are vectorized and input into the event classification model, so that the event type of each information text is determined accurately and event classification is accelerated.
FIG. 1 is a schematic diagram of a preferred embodiment of the electronic device of the present application;
FIG. 2 is a module diagram of a preferred embodiment of the hotspot event classification program of FIG. 1;
FIG. 3 is a flowchart of a preferred embodiment of the hotspot event classification method of the present application;
FIG. 4 is a flowchart of the training of the event classification model of the present application.
The implementation, functional features, and advantages of the present application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
FIG. 1 is a schematic diagram of a preferred embodiment of the electronic device 1 of the present application.
In this embodiment, the electronic device 1 may be a server, a smartphone, a tablet computer, a personal computer, a portable computer, or another electronic device with computing capability.
The electronic device 1 includes a memory 11, a processor 12, a network interface 13, a communication bus 14, and a lexicon 15. The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 14 provides the connections and communication between these components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage unit of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
In this embodiment, the memory 11 can be used to store the application software installed on the electronic device 1 and various kinds of data, such as the hotspot event classification program 10 and the lexicon 15. The lexicon 15 stores all the characters and words involved in word segmentation, together with the labeled feature words.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip. It runs the program code stored in the memory 11 or processes data, for example executing the computer program code of the hotspot event classification program 10 or training the event classification model.
FIG. 1 shows only the electronic device 1 with the components 11-15 and the hotspot event classification program 10. It should be understood, however, that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a display, which may also be called a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display is used to show the information processed in the electronic device 1 and a visual working interface, for example the event type of an information text.
Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard and a voice output device such as a speaker or headphones; optionally, it may also include a standard wired interface and a wireless interface.
The electronic device 1 may further include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described further here.
In the embodiment of the electronic device 1 shown in FIG. 1, the memory 11, as a computer storage medium, stores the program code of the hotspot event classification program 10. When the processor 12 executes this program code, the following steps are implemented:
Obtaining step: obtaining, in real time from a predetermined server, information texts published by a first preset number of users;
Word segmentation step: segmenting the information texts using a predetermined word segmentation rule to obtain the word segments corresponding to each information text;
Determining step: extracting the preset feature words from the word segments, and determining the event topic corresponding to each feature word using a predetermined probability algorithm;
Calculating step: calculating the hotspot event indicator value corresponding to the feature word according to a preset calculation formula;
Classification step: determining whether the hotspot event indicator value is greater than a preset threshold and, if it is, obtaining an information vector of the information text corresponding to the feature word using a preset vectorization method, and inputting the information vector into a pre-trained event classification model to determine the event type corresponding to the information text.
For the specific principles, refer to the description below of FIG. 2, the module diagram of a preferred embodiment of the hotspot event classification program 10, and of FIG. 3, the flowchart of a preferred embodiment of the hotspot event classification method.
FIG. 2 is a module diagram of a preferred embodiment of the hotspot event classification program 10 of FIG. 1. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function.
In this embodiment, the hotspot event classification program 10 includes an obtaining module 110, a word segmentation module 120, a determining module 130, a calculation module 140, a judgment module 150, and a classification module 160. The functions or operation steps implemented by the modules 110-160 are similar to those described above and are not detailed again here. By way of example:
The obtaining module 110 is configured to obtain, in real time from a predetermined server, information texts published by a first preset number of users. The predetermined server may be a social server such as a WeChat server, a Weibo server, or a QQ server. A user here means a social account on the social server; the first preset number of users may be some or all of the social accounts on the social server.
The word segmentation module 120 is configured to segment the information texts using a predetermined word segmentation rule to obtain the word segments corresponding to each information text. The predetermined word segmentation rule includes: splitting each obtained information text into short sentences at preset types of punctuation marks, such as ",", "。", "!", ";", and "?"; and segmenting each short sentence against the words stored in the lexicon 15 using the long-word-first principle. The long-word-first principle means finding in the lexicon 15 the longest word that matches the short sentence and taking it as one word segment of that short sentence.
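The segmentation rule above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `split_short_sentences` and `segment` are hypothetical, the ASCII punctuation in the pattern is an added assumption, and falling back to a single character when no lexicon entry matches is also an assumption, since the description does not say how unknown characters are handled.

```python
import re

# Preset punctuation: the full-width marks listed in the description,
# plus ASCII equivalents (the ASCII set is an assumption).
SPLIT_PATTERN = re.compile(r"[,。!;?,.!;?]+")


def split_short_sentences(text):
    """Split an information text into short sentences at preset punctuation."""
    return [part.strip() for part in SPLIT_PATTERN.split(text) if part.strip()]


def segment(sentence, lexicon):
    """Long-word-first segmentation: at each position take the longest lexicon entry."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # Assumption: an unmatched single character becomes its own token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a lexicon containing "B1", "成功", "研制", "C1", and "产品", `segment("B1成功研制出了C1产品", lexicon)` yields the seven tokens used as the example in step S20.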
The determining module 130 is configured to extract the preset feature words from the word segments and to determine the event topic corresponding to each feature word using a predetermined probability algorithm. The feature words are labeled in advance and stored in the lexicon 15. The predetermined probability algorithm calculates a final probability P3 from a first selection probability P1 and a second selection probability P2. A second preset number of implicit event topics are added between the feature words and the event topic texts; the implicit event topics are virtual and carry no real meaning.
The first selection probability P1 is calculated as follows: according to a predetermined mapping between implicit event topics and feature words, determine the first number X1 of feature words contained in each implicit event topic and the second number X2 of implicit event topics to which each feature word belongs; from X1 and X2, the first selection probability of each feature word for each implicit event topic is P1 = 1/(X1 × X2).
The second selection probability P2 is calculated as follows: according to a predetermined mapping between implicit event topics and event topics, determine the third number X3 of implicit event topics contained in each event topic and the fourth number X4 of event topics to which each implicit event topic belongs; from X3 and X4, the second selection probability of each implicit event topic for each event topic is P2 = 1/(X3 × X4).
P1 and P2 are then substituted into the predetermined probability formula P3 = P1 × P2 to obtain the final probability P3 of each feature word for each event topic.
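The probability calculation can be sketched as below. The two dictionaries stand in for the predetermined mappings, and the names `final_probabilities` and `most_probable_topic` are hypothetical. The description does not state how to combine several implicit topics that link the same feature word to the same event topic, so this sketch keeps the largest P3 over those paths; that choice is an assumption.

```python
from collections import defaultdict


def final_probabilities(word_to_implicit, implicit_to_event):
    """Return {feature_word: {event_topic: P3}} with P3 = P1 * P2."""
    # Invert both mappings so members per implicit topic / event topic can be counted.
    implicit_words = defaultdict(set)
    for word, implicits in word_to_implicit.items():
        for topic in implicits:
            implicit_words[topic].add(word)
    event_implicits = defaultdict(set)
    for topic, events in implicit_to_event.items():
        for event in events:
            event_implicits[event].add(topic)

    result = {}
    for word, implicits in word_to_implicit.items():
        scores = defaultdict(float)
        for topic in implicits:
            x1 = len(implicit_words[topic])  # feature words in this implicit topic
            x2 = len(implicits)              # implicit topics this word belongs to
            p1 = 1.0 / (x1 * x2)
            for event in implicit_to_event.get(topic, ()):
                x3 = len(event_implicits[event])    # implicit topics in this event topic
                x4 = len(implicit_to_event[topic])  # event topics this implicit topic belongs to
                p2 = 1.0 / (x3 * x4)
                # Assumption: keep the best path when several implicit topics connect the pair.
                scores[event] = max(scores[event], p1 * p2)
        result[word] = dict(scores)
    return result


def most_probable_topic(topic_scores):
    """Pick the event topic with the largest final probability."""
    return max(topic_scores, key=topic_scores.get)
```

Summing P3 over the paths instead of taking the maximum would be an equally plausible reading; only the line that updates `scores[event]` would change.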
The calculation module 140 is configured to calculate the hotspot event indicator value corresponding to the feature word according to a preset calculation formula. The preset calculation formula is as follows:
where v denotes the speed of event development; a denotes the hotspot event indicator value, that is, the "acceleration" of event development; t denotes a time point; T denotes a time interval; i is an integer; t_i denotes the time point at which the i-th feature word appears; and X_i denotes the number of times the i-th feature word appears.
The judgment module 150 is configured to determine whether the hotspot event indicator value is greater than a preset threshold. The threshold is set in advance. A hotspot event indicator value above the threshold indicates that the "acceleration" of the event's development for that event topic has gone beyond a certain range, and the type of the event should be analyzed immediately.
The classification module 160 is configured to, when the hotspot event indicator value is greater than the preset threshold, obtain an information vector of the information text corresponding to the feature word using a preset vectorization method, and input the information vector into the pre-trained event classification model to determine the event type corresponding to the information text. The preset vectorization method includes: encoding the user information of the information text with an autoencoder to generate a user information vector; encoding the information text into word vectors with a predetermined word vector model to generate a text information vector for the information text; and concatenating the user information vector and the text information vector to generate the information vector corresponding to the information text.
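A minimal sketch of the concatenation step follows. The autoencoder and the word vector model are not specified in the description, so they are passed in as the hypothetical callables `encode_user` and `embed_word`. Averaging the word vectors into a single text vector is an assumption made only to keep the example short; a sequence model such as the LSTM mentioned in the text could equally consume the word vectors one by one.

```python
def build_information_vector(user_info, tokens, encode_user, embed_word):
    """Concatenate a user-information vector with a text-information vector."""
    user_vector = list(encode_user(user_info))
    word_vectors = [list(embed_word(token)) for token in tokens]
    if word_vectors:
        # Assumption: average the word vectors element-wise to get one text vector.
        count = len(word_vectors)
        text_vector = [sum(column) / count for column in zip(*word_vectors)]
    else:
        text_vector = []
    # The information vector is the user vector followed by the text vector.
    return user_vector + text_vector
```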
The event classification model is a long short-term memory (LSTM) network model. FIG. 4 is a flowchart of the training of the event classification model of the present application. The training steps of the event classification model are as follows:
Obtain a third preset number of information texts and generate the information vector corresponding to each information text; determine the event type corresponding to each information vector according to a predetermined mapping between information texts and event types; and use the mapping data between information vectors and event types as sample data;
Divide the sample data into a training set of a first proportion and a validation set of a second proportion, where the first proportion is greater than the second proportion;
Train the event classification model with the sample data in the training set and, once training is finished, verify the accuracy of the event classification model with the sample data in the validation set;
If the accuracy is greater than a preset value, training is complete. If the accuracy is less than or equal to the preset value, increase the amount of sample data and return to the step of dividing the sample data into a training set and a validation set.
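The training loop described in these steps can be sketched independently of any particular LSTM library. In the sketch below, `fetch_samples`, `fit`, and `accuracy` are hypothetical callables supplied by the caller; the `max_rounds` guard, the shuffling, and the default 80/20 split are assumptions added so that the example terminates and runs.

```python
import random


def train_with_validation(fetch_samples, fit, accuracy, preset_accuracy,
                          initial_count, extra_count,
                          train_ratio=0.8, max_rounds=10, seed=0):
    """Train, validate, and add sample data until accuracy exceeds the preset value."""
    if not 0.5 < train_ratio < 1.0:
        # The first (training) proportion must be greater than the second (validation).
        raise ValueError("train_ratio must be between 0.5 and 1.0")
    rng = random.Random(seed)
    samples = list(fetch_samples(initial_count))
    for _ in range(max_rounds):
        rng.shuffle(samples)
        cut = int(len(samples) * train_ratio)
        training_set, validation_set = samples[:cut], samples[cut:]
        model = fit(training_set)
        if accuracy(model, validation_set) > preset_accuracy:
            return model
        # Accuracy is less than or equal to the preset value: add data and split again.
        samples.extend(fetch_samples(extra_count))
    raise RuntimeError("preset accuracy not reached within max_rounds")
```

In practice `fit` would wrap the LSTM training routine and `accuracy` would compare predicted event types against the labels in the validation set.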
FIG. 3 is a flowchart of a preferred embodiment of the hotspot event classification method of the present application.
In this embodiment, when the processor 12 executes the computer program of the hotspot event classification program 10 stored in the memory 11, the hotspot event classification method is implemented, including steps S10 to S60:
Step S10: the acquisition module 110 obtains, in real time from a predetermined server, the information texts published by a first preset number of users. The predetermined server may be a social server such as a WeChat server, a Weibo server, or a QQ server. A user is a social account on the social server, and the first preset number of users may be some or all of the social accounts on the social server. For example, the information texts published in WeChat Moments or in group chats by the WeChat account of salesperson A1 are obtained from the WeChat server.
Step S20: the word segmentation module 120 segments the acquired information texts using a predetermined word segmentation rule to obtain the segments corresponding to each information text. Word segmentation means dividing an information text into characters or words. For example, the information text "B1成功研制出了C1产品" ("B1 successfully developed the C1 product") is segmented into "B1", "成功", "研制", "出", "了", "C1", "产品", where B1 may be a company or department and C1 may be a product name. The predetermined word segmentation rule includes: splitting each acquired information text into short sentences at preset-type punctuation marks such as ",", "。", "!", ";", and "?". For example, the text from the start position of the information text (the first character) to the first preset-type punctuation mark is one short sentence, the text between the first and second preset-type punctuation marks is one short sentence, and so on: the text between every two adjacent preset-type punctuation marks is one short sentence, until the whole information text has been split into short sentences. It should be understood that if there is no preset-type punctuation mark at the end of the information text, the text from the last preset-type punctuation mark to the end position (the last character) is one short sentence. Each short sentence is then segmented according to the words stored in the lexicon 15 using the longest-word-first principle, which means finding in the lexicon 15 the longest word that matches the beginning of the remaining short sentence and taking it as one segment. Suppose the first character of the short sentence $T_1$ to be segmented is a. Starting from a, the longest word $R_1$ that begins with a and matches the corresponding part of $T_1$ is found in the lexicon 15. $R_1$ is then removed from $T_1$, leaving the remainder $T_2$, and the same method is applied to $T_2$, and so on, until all characters and words of $T_1$ have been found in the lexicon 15. The result is "$R_1$/$R_2$/...".
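The two-stage rule of step S20 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the exact punctuation set, and the fallback of emitting a single character when no lexicon word matches are assumptions made for the example.

```python
import re


def split_short_sentences(text):
    """Split text into short sentences at each preset-type punctuation mark.

    The punctuation set (full-width and ASCII forms) is an assumed example.
    """
    parts = re.split(r"[,。!;?,.!;?]", text)
    # A trailing fragment without closing punctuation is still a short sentence.
    return [p.strip() for p in parts if p.strip()]


def segment_longest_first(sentence, lexicon):
    """Longest-word-first segmentation (forward maximum matching).

    At each position, take the longest lexicon word starting there;
    if none matches, fall back to a single character (an assumption).
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens, pos = [], 0
    while pos < len(sentence):
        for length in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                pos += length
                break
    return tokens
```

With a hypothetical lexicon containing both "研制" and "研制出", the longer entry wins, which is the behaviour the longest-word-first principle describes.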
Step S30: if the segments of an information text contain a feature word stored in the lexicon 15, the determination module 130 determines the event topic corresponding to that feature word using a predetermined probability algorithm. It should be understood that the segments of an information text may contain no feature word, or may contain one or more feature words. The feature words are labeled in advance and stored in the lexicon 15.
The predetermined probability algorithm includes: adding a second preset number of latent event topics between the feature words and the event topic texts. The latent event topics are virtual and have no real meaning. For example, 50 latent event topics $k_1, k_2, \ldots, k_{50}$ are added between the feature words and the event topic texts. According to a predetermined mapping between latent event topics and feature words, a first number $X_1$ of feature words contained in each latent event topic and a second number $X_2$ of latent event topics to which each feature word belongs are determined, and the first selection probability of each feature word for each latent event topic is determined from them as $P_1 = 1/(X_1 X_2)$. For example, if the second number of latent event topics to which the feature word Y belongs is 5, and one of those latent event topics, $k_7$, contains a first number of 7 feature words, then the first selection probability of the feature word Y for the latent event topic $k_7$ is 1/35. According to a predetermined mapping between latent event topics and event topics, a third number $X_3$ of latent event topics contained in each event topic and a fourth number $X_4$ of event topics to which each latent event topic belongs are determined, and the second selection probability of each latent event topic for each event topic is determined from them as $P_2 = 1/(X_3 X_4)$. For example, if the fourth number of event topics to which the latent event topic $k_7$ belongs is 4, and one of those event topics, Z, contains a third number of 5 latent event topics, then the second selection probability of the latent event topic $k_7$ for the event topic Z is 1/20. The first selection probability $P_1$ and the second selection probability $P_2$ are substituted into a predetermined probability formula to calculate the distribution of the final probability $P_3$ of each feature word over the event topics. The predetermined probability formula is $P_3 = P_1 \cdot P_2$. For example, if the first selection probability $P_1$ of the feature word Y for the latent event topic $k_7$ is 1/35 and the second selection probability $P_2$ of the latent event topic $k_7$ for the event topic text Z is 1/20, then the final probability $P_3$ of the feature word Y for the event topic text Z is 1/700. In the same way, the final probabilities $P_3$ of the feature word Y for the other event topic texts, and the final probabilities $P_3$ of the other feature words of the information text for each event topic text, are calculated. Finally, for each feature word, the event topic with the maximum probability is taken as the event topic corresponding to that feature word.
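The arithmetic of the probability algorithm can be checked with a short sketch. The function names are hypothetical, and the second topic "W" with its counts is invented purely to illustrate the final maximum-probability selection. Exact fractions are used so that the 1/35, 1/20, and 1/700 values from the example are reproduced without rounding.

```python
from fractions import Fraction


def selection_probability(count_a, count_b):
    """Selection probability of the form 1 / (X_a * X_b) used in the text."""
    return Fraction(1, count_a * count_b)


def final_probability(x1, x2, x3, x4):
    """P3 = P1 * P2 with P1 = 1/(X1*X2) and P2 = 1/(X3*X4)."""
    p1 = selection_probability(x1, x2)  # feature word -> latent event topic
    p2 = selection_probability(x3, x4)  # latent event topic -> event topic
    return p1 * p2


def most_probable_topic(final_probabilities):
    """Pick the event topic with the largest final probability P3.

    final_probabilities maps an event topic name to its P3 value.
    """
    return max(final_probabilities, key=final_probabilities.get)


# Worked example from the description: X1=7, X2=5, X3=5, X4=4.
p3_z = final_probability(7, 5, 5, 4)
# Hypothetical second topic "W", invented for illustration only.
p3_w = final_probability(7, 5, 2, 4)
chosen = most_probable_topic({"Z": p3_z, "W": p3_w})
```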
Step S40: the calculation module 140 calculates the hotspot event indicator value corresponding to each feature word according to a preset calculation formula. The preset calculation formula is as follows:
where v denotes the speed of event development, a denotes the hotspot event indicator value, that is, the "acceleration" of event development, t denotes a time point, T denotes the time interval, i is an integer, $t_i$ denotes the time point at which the i-th feature word appears, and $X_i$ denotes the number of occurrences of the i-th feature word. In this way the hotspot event indicator values of the event topics corresponding to all feature words are calculated. The larger the indicator value, the faster the event of that topic is developing.
Step S50: the judgment module 150 determines whether the hotspot event indicator value is greater than a preset threshold. The threshold is set in advance. When the hotspot event indicator value is greater than the preset threshold, the "acceleration" of the development of the event of that topic has exceeded a certain range, and the type of the event should be analyzed immediately.
Step S60: if the hotspot event indicator value is greater than the preset threshold, the classification module 160 obtains the information vector of the information text corresponding to the feature word using a preset vectorization method, inputs the information vector into the pre-trained event classification model, and determines the event type corresponding to the information text. The preset vectorization method includes: first, encoding the user information of the information text with an autoencoder (Auto-Encoder) to generate a user information vector. An autoencoder is an unsupervised learning algorithm mainly used for dimensionality reduction or feature extraction. Next, word vector encoding is performed on the information text with a predetermined word vector model to generate a text information vector of the information text. The predetermined word vector model may be a Word2Vec model or a Doc2Vec model. For example, the information text is encoded with the Word2Vec model to generate its text information vector. Finally, the user information vector and the text information vector are concatenated to generate the information vector corresponding to the information text.
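The concatenation at the end of step S60 can be illustrated with a dependency-free sketch. Everything here is a stand-in chosen for the example: a small dictionary plays the role of a trained Word2Vec lookup, averaging the word vectors is one assumed way to get a single text vector (the application does not specify the pooling), and a fixed weight matrix plays the role of the encoder half of a trained autoencoder.

```python
def encode_text(tokens, word_vectors, dim):
    """Average the vectors of known tokens (stand-in for a Word2Vec lookup)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def encode_user(features, weights):
    """Project raw user features to a shorter code with fixed weights
    (stand-in for the encoder half of a trained autoencoder)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def information_vector(user_vector, text_vector):
    """Concatenate the user information vector and the text information vector."""
    return list(user_vector) + list(text_vector)


# Toy data, invented for illustration only.
toy_word_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
text_vec = encode_text(["a", "b", "c"], toy_word_vectors, dim=2)
user_vec = encode_user([2.0, 4.0, 6.0], [[0.5, 0.0, 0.0], [0.0, 0.25, 0.5]])
info_vec = information_vector(user_vec, text_vec)
```

The resulting vector has the length of the user code plus the length of the text vector, which is the input shape the classification model would then be trained on.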
The event classification model is an LSTM model. FIG. 4 is a flowchart of the training of the event classification model of the present application. The training steps of the event classification model are as follows:
A third preset number of information texts is obtained and the information vector corresponding to each information text is generated. The event type corresponding to each information vector is determined according to a predetermined mapping between information texts and event types, and the mapping data between information vectors and event types is used as sample data. For example, 100,000 information texts are obtained from the Weibo server and labeled with their event types, 100,000 corresponding information vectors are generated from the information texts, the event type of each information text is determined according to the predetermined mapping between information texts and event types, and the mapping between each information vector and its event type is used as sample data.
The sample data is divided into a training set of a first proportion and a validation set of a second proportion, wherein the first proportion is greater than the second proportion. For example, 80% of the sample data (80,000 samples) is randomly selected as the training set, and the remaining 20% (20,000 samples) is used as the validation set.
The event classification model is trained with the sample data in the training set and, after training, its accuracy is verified with the sample data in the validation set. For example, the 80,000 samples in the training set are input into the LSTM model for training to generate the event classification model, and the 20,000 samples in the validation set are input into the generated event classification model to verify its accuracy.
If the accuracy is greater than a preset value, the training is complete. If the accuracy is less than or equal to the preset value, the amount of sample data is increased, and the procedure returns to the step of dividing the sample data into a training set and a validation set. Suppose the preset value is 98%: if the validation accuracy is greater than 98%, the training is complete; if the accuracy is less than or equal to 98%, 20,000 samples are added, and the procedure returns to the step of dividing the sample data into a training set and a validation set.
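The control flow of the training procedure (split, train, validate, add samples, repeat) can be sketched independently of any deep learning library. In this sketch the LSTM training and the accuracy measurement are passed in as hypothetical callables `train_fn` and `accuracy_fn`, `fetch_more` is a hypothetical source of additional labeled samples, and the `max_rounds` cap is an added safeguard that the application does not mention.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle and split into a larger training set and a smaller validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def train_until_accurate(samples, train_fn, accuracy_fn, fetch_more,
                         threshold=0.98, max_rounds=10):
    """Repeat split -> train -> validate, adding samples while accuracy <= threshold."""
    samples = list(samples)
    for _ in range(max_rounds):
        train_set, validation_set = split_samples(samples)
        model = train_fn(train_set)
        accuracy = accuracy_fn(model, validation_set)
        if accuracy > threshold:
            return model, accuracy
        # Accuracy is less than or equal to the preset value: enlarge the sample set.
        samples.extend(fetch_more())
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```

Re-splitting after each enlargement mirrors the "return to the dividing step" wording, so newly added samples can land in either the training set or the validation set.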
In the hotspot event classification method proposed in the above embodiment, information texts published by users are obtained from a server and segmented, and the feature words among the segments are extracted. A predetermined probability algorithm is then used to find the event topic with the maximum probability for each feature word, and a preset calculation formula is used to calculate the hotspot event indicator value of each feature word. The information texts corresponding to feature words whose hotspot event indicator value is greater than the preset threshold are vectorized and input into the event classification model to determine the event type. This improves the efficiency of event classification and shortens the analysis time.
In addition, an embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium includes a hotspot event classification program 10, and when the hotspot event classification program 10 is executed by a processor, the following operations are implemented:
Acquisition step: obtaining, in real time from a predetermined server, the information texts published by a first preset number of users;
Word segmentation step: segmenting the information texts using a predetermined word segmentation rule to obtain the segments corresponding to each information text;
Determination step: extracting the preset feature words from the segments, and determining the event topic corresponding to each feature word using a predetermined probability algorithm;
Calculation step: calculating the hotspot event indicator value corresponding to the feature word according to a preset calculation formula;
Classification step: determining whether the hotspot event indicator value is greater than a preset threshold and, if it is, obtaining the information vector of the information text corresponding to the feature word using a preset vectorization method, inputting the information vector into a pre-trained event classification model, and determining the event type corresponding to the information text.
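How the five steps chain together, and in particular how the threshold gates the comparatively expensive classification, can be sketched as a skeleton. All parameters are hypothetical callables or containers supplied by the caller. The sketch fixes only the order of the steps and is not the program 10 itself.

```python
def classify_hot_events(texts, segment, feature_words, topic_of, indicator,
                        threshold, vectorize, classify):
    """Run segmentation, topic lookup, indicator check, and classification.

    Returns one record per (text, feature word) pair whose indicator value
    exceeds the threshold.
    """
    results = []
    for text in texts:                      # acquisition step supplies `texts`
        for word in segment(text):          # word segmentation step
            if word not in feature_words:
                continue
            # Classification runs only when the indicator exceeds the threshold.
            if indicator(word) > threshold:  # calculation step + threshold check
                results.append({
                    "text": text,
                    "feature_word": word,
                    "topic": topic_of(word),                  # determination step
                    "event_type": classify(vectorize(text)),  # classification step
                })
    return results
```

Texts without any feature word, or whose feature words stay at or below the threshold, are never vectorized, which is where the claimed saving in analysis time would come from.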
Preferably, the preset calculation formula is as follows:
where v denotes the speed of event development, a denotes the hotspot event indicator value, t denotes a time point, T denotes the time interval, i is an integer, $t_i$ denotes the time point at which the i-th feature word appears, and $X_i$ denotes the number of occurrences of the i-th feature word.
Preferably, the predetermined word segmentation rule includes:
splitting each acquired information text into short sentences at preset-type punctuation marks;
segmenting each short sentence according to the words stored in the lexicon, using the longest-word-first principle.
Preferably, the predetermined probability algorithm includes:
adding a second preset number of latent event topics between the feature words and the event topic texts;
determining, according to a predetermined mapping between latent event topics and feature words, a first number $X_1$ of feature words contained in each latent event topic and a second number $X_2$ of latent event topics to which each feature word belongs, and determining from them the first selection probability of each feature word for each latent event topic, $P_1 = 1/(X_1 X_2)$;
determining, according to a predetermined mapping between latent event topics and event topics, a third number $X_3$ of latent event topics contained in each event topic and a fourth number $X_4$ of event topics to which each latent event topic belongs, and determining from them the second selection probability of each latent event topic for each event topic, $P_2 = 1/(X_3 X_4)$;
substituting the first selection probability $P_1$ and the second selection probability $P_2$ into a predetermined probability formula to calculate the distribution of the final probability $P_3$ of each feature word over the event topics.
Preferably, the predetermined probability formula is as follows:
$P_3 = P_1 \cdot P_2$
where $P_1$ denotes the first selection probability, $P_2$ denotes the second selection probability, and $P_3$ denotes the final probability.
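As a worked check, substituting the definitions of the two selection probabilities into this formula, with the counts used in the example of the description ($X_1 = 7$, $X_2 = 5$, $X_3 = 5$, $X_4 = 4$), gives:

```latex
P_3 = P_1 \cdot P_2
    = \frac{1}{X_1 X_2} \cdot \frac{1}{X_3 X_4}
    = \frac{1}{7 \cdot 5} \cdot \frac{1}{5 \cdot 4}
    = \frac{1}{35} \cdot \frac{1}{20}
    = \frac{1}{700}
```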
Preferably, the preset vectorization method includes:
encoding the user information of the information text with an autoencoder to generate a user information vector;
performing word vector encoding on the information text with a predetermined word vector model to generate a text information vector of the information text;
concatenating the user information vector and the text information vector to generate the information vector corresponding to the information text.
Preferably, the event classification model is a long short-term memory (LSTM) network model, and the training steps of the event classification model are as follows:
obtaining a third preset number of information texts and generating the information vector corresponding to each information text, determining the event type corresponding to each information vector according to a predetermined mapping between information texts and event types, and using the mapping data between information vectors and event types as sample data;
dividing the sample data into a training set of a first proportion and a validation set of a second proportion, wherein the first proportion is greater than the second proportion;
training the event classification model with the sample data in the training set and, after training, verifying the accuracy of the event classification model with the sample data in the validation set;
if the accuracy is greater than a preset value, the training is complete; if the accuracy is less than or equal to the preset value, increasing the amount of sample data and returning to the step of dividing the sample data into a training set and a validation set.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the hotspot event classification method described above, and is not repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810252849.6A CN108595519A (en) | 2018-03-26 | 2018-03-26 | Focus incident sorting technique, device and storage medium |
| CN201810252849.6 | 2018-03-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019184217A1 (en) | 2019-10-03 |
Family
ID=63623682
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/102083 Ceased WO2019184217A1 (en) | 2018-03-26 | 2018-08-24 | Hotspot event classification method and apparatus, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108595519A (en) |
| WO (1) | WO2019184217A1 (en) |
Cited By (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111222032A (en) * | 2019-12-17 | 2020-06-02 | 中国平安人寿保险股份有限公司 | Public opinion analysis method and related equipment |
| CN111274782A (en) * | 2020-02-25 | 2020-06-12 | 平安科技(深圳)有限公司 | Text auditing method and device, computer equipment and readable storage medium |
| CN111291562A (en) * | 2020-01-17 | 2020-06-16 | 中国石油集团安全环保技术研究院有限公司 | Intelligent Semantic Recognition Method Based on HSE |
| CN111324811A (en) * | 2020-02-20 | 2020-06-23 | 北京奇艺世纪科技有限公司 | Hot content confirmation method and device |
| CN111506727A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Text content category acquisition method and device, computer equipment and storage medium |
| CN111552790A (en) * | 2020-04-27 | 2020-08-18 | 北京学之途网络科技有限公司 | Method and device for identifying article list brushing |
| CN111858725A (en) * | 2020-04-30 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Method and system for determining event attributes |
| CN111967601A (en) * | 2020-06-30 | 2020-11-20 | 北京百度网讯科技有限公司 | Event relation generation method, event relation rule generation method and device |
| CN112135334A (en) * | 2020-10-27 | 2020-12-25 | 上海连尚网络科技有限公司 | A method and device for determining a hotspot type of a wireless access point |
| CN112667791A (en) * | 2020-12-23 | 2021-04-16 | 深圳壹账通智能科技有限公司 | Latent event prediction method, device, equipment and storage medium |
| CN112765349A (en) * | 2021-01-12 | 2021-05-07 | 深圳前海微众银行股份有限公司 | Industry classification method, apparatus, system and computer readable storage medium |
| CN112926308A (en) * | 2021-02-25 | 2021-06-08 | 北京百度网讯科技有限公司 | Method, apparatus, device, storage medium and program product for matching text |
| CN113127576A (en) * | 2021-04-15 | 2021-07-16 | 微梦创科网络科技(中国)有限公司 | Hotspot discovery method and system based on user content consumption analysis |
| CN113220999A (en) * | 2021-05-14 | 2021-08-06 | 北京百度网讯科技有限公司 | User feature generation method and device, electronic equipment and storage medium |
| CN113392213A (en) * | 2021-04-19 | 2021-09-14 | 合肥讯飞数码科技有限公司 | Event extraction method, electronic device and storage device |
| CN113822069A (en) * | 2021-09-17 | 2021-12-21 | 国家计算机网络与信息安全管理中心 | Emergency early warning method and device based on meta-knowledge and electronic device |
| CN114297099A (en) * | 2021-12-29 | 2022-04-08 | 中国电信股份有限公司 | Data cache optimization method and device, nonvolatile storage medium and electronic equipment |
| CN114386394A (en) * | 2020-10-16 | 2022-04-22 | 电科云(北京)科技有限公司 | Prediction model training method, prediction method and prediction device for platform public opinion data theme |
| CN114461948A (en) * | 2021-12-24 | 2022-05-10 | 天翼云科技有限公司 | Web cache setting optimization method and electronic device |
| CN114492926A (en) * | 2021-12-20 | 2022-05-13 | 华能煤炭技术研究有限公司 | A method and system for text analysis and prediction of coal mine safety hazards |
| CN114764440A (en) * | 2022-04-15 | 2022-07-19 | 中南林业科技大学 | Main event duplicate removal method based on graph node selection and optimization |
| CN114792096A (en) * | 2021-01-26 | 2022-07-26 | 腾讯科技(深圳)有限公司 | A kind of classification method and device of content publishing subject |
| CN114861805A (en) * | 2022-05-18 | 2022-08-05 | 湖南快乐阳光互动娱乐传媒有限公司 | Hot event classification model construction method, hot event classification method and device |
| CN115409105A (en) * | 2022-08-26 | 2022-11-29 | 中国人民解放军战略支援部队信息工程大学 | Method for constructing user portrait based on Android external storage space file operation |
| WO2023125589A1 (en) * | 2021-12-29 | 2023-07-06 | 北京辰安科技股份有限公司 | Emergency monitoring method and apparatus |
| CN116542238A (en) * | 2023-07-07 | 2023-08-04 | 和元达信息科技有限公司 | Event heat trend determining method and system based on small program |
| CN117271857A (en) * | 2023-09-22 | 2023-12-22 | 中国工商银行股份有限公司 | Information display methods, devices, equipment, storage media and program products |
| CN118041707A (en) * | 2024-04-15 | 2024-05-14 | 深圳市奇兔软件技术有限公司 | Identity verification method based on computer network |
| CN118474427A (en) * | 2024-07-09 | 2024-08-09 | 中译文娱科技(青岛)有限公司 | Network public opinion detection method and system |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110232149B (en) * | 2019-05-09 | 2022-03-01 | 北京邮电大学 | Hot event detection method and system |
| CN110414006B (en) * | 2019-07-31 | 2023-09-08 | 京东方科技集团股份有限公司 | Text subject annotation method, device, electronic equipment and storage medium |
| CN110458296B (en) * | 2019-08-02 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Method and device for marking target event, storage medium and electronic device |
| CN110956021B (en) * | 2019-11-14 | 2025-01-10 | 微民保险代理有限公司 | A method, device, system and server for generating original articles |
| CN111078883A (en) * | 2019-12-13 | 2020-04-28 | 北京明略软件系统有限公司 | Risk index analysis method and device, electronic equipment and storage medium |
| CN111177319B (en) * | 2019-12-24 | 2024-08-27 | 中国建设银行股份有限公司 | Method and device for determining risk event, electronic equipment and storage medium |
| CN113065329A (en) * | 2020-01-02 | 2021-07-02 | 广州越秀金融科技有限公司 | Data processing method and device |
| CN111275327B (en) * | 2020-01-19 | 2024-06-07 | 深圳前海微众银行股份有限公司 | Resource allocation method, device, equipment and storage medium |
| CN111369148A (en) * | 2020-03-05 | 2020-07-03 | 广州快盈信息技术服务有限公司 | Object index monitoring method, electronic device and storage medium |
| CN112100374B (en) * | 2020-08-28 | 2024-12-27 | 清华大学 | Text clustering method, device, electronic device and storage medium |
| CN113342979B (en) * | 2021-06-24 | 2023-12-05 | 中国平安人寿保险股份有限公司 | Hot topic identification method, computer device and storage medium |
| CN113434273B (en) * | 2021-06-29 | 2022-12-23 | 平安科技(深圳)有限公司 | Data processing method, device, system and storage medium |
| CN113743746B (en) * | 2021-08-17 | 2024-11-19 | 携程旅游网络技术(上海)有限公司 | Model training method, event dispatching processing method, equipment and medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104965867A (en) * | 2015-06-08 | 2015-10-07 | 南京师范大学 | Text event classification method based on CHI feature selection |
| CN105335476A (en) * | 2015-10-08 | 2016-02-17 | 北京邮电大学 | Method and device for classifying hot event |
| US20160071024A1 (en) * | 2014-02-25 | 2016-03-10 | Sri International | Dynamic hybrid models for multimodal analysis |
| CN106570164A (en) * | 2016-11-07 | 2017-04-19 | 中国农业大学 | Integrated foodstuff safety text classification method based on deep learning |
| CN107797983A (en) * | 2017-04-07 | 2018-03-13 | 平安科技(深圳)有限公司 | Microblog data processing method, device, computer equipment and storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106095928B (en) * | 2016-06-12 | 2019-10-29 | 国家计算机网络与信息安全管理中心 | Event type recognition method and device |
| CN107220648B (en) * | 2017-04-11 | 2018-06-22 | 平安科技(深圳)有限公司 | Character recognition method and server for claim settlement documents |
| CN107644012B (en) * | 2017-08-29 | 2019-03-01 | 平安科技(深圳)有限公司 | Electronic device, problem identification confirmation method and computer readable storage medium |
2018
- 2018-03-26: CN application CN201810252849.6A, published as CN108595519A (status: Pending)
- 2018-08-24: WO application PCT/CN2018/102083, published as WO2019184217A1 (status: Ceased)
Cited By (43)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111222032A (en) * | 2019-12-17 | 2020-06-02 | 中国平安人寿保险股份有限公司 | Public opinion analysis method and related equipment |
| CN111222032B (en) * | 2019-12-17 | 2024-04-30 | 中国平安人寿保险股份有限公司 | Public opinion analysis method and related equipment |
| CN111291562A (en) * | 2020-01-17 | 2020-06-16 | 中国石油集团安全环保技术研究院有限公司 | Intelligent Semantic Recognition Method Based on HSE |
| CN111291562B (en) * | 2020-01-17 | 2024-05-03 | 中国石油天然气集团有限公司 | Intelligent semantic recognition method based on HSE |
| CN111324811A (en) * | 2020-02-20 | 2020-06-23 | 北京奇艺世纪科技有限公司 | Hot content confirmation method and device |
| CN111324811B (en) * | 2020-02-20 | 2024-04-12 | 北京奇艺世纪科技有限公司 | Hot content confirmation method and device |
| CN111274782A (en) * | 2020-02-25 | 2020-06-12 | 平安科技(深圳)有限公司 | Text auditing method and device, computer equipment and readable storage medium |
| CN111274782B (en) * | 2020-02-25 | 2023-10-20 | 平安科技(深圳)有限公司 | Text auditing method and device, computer equipment and readable storage medium |
| CN111506727A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Text content category acquisition method and device, computer equipment and storage medium |
| CN111506727B (en) * | 2020-04-16 | 2023-10-03 | 腾讯科技(深圳)有限公司 | Text content category acquisition method, apparatus, computer device and storage medium |
| CN111552790A (en) * | 2020-04-27 | 2020-08-18 | 北京学之途网络科技有限公司 | Method and device for identifying article list brushing |
| CN111552790B (en) * | 2020-04-27 | 2024-03-08 | 北京明略昭辉科技有限公司 | Method and device for identifying article form |
| CN111858725A (en) * | 2020-04-30 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Method and system for determining event attributes |
| CN111967601A (en) * | 2020-06-30 | 2020-11-20 | 北京百度网讯科技有限公司 | Event relation generation method, event relation rule generation method and device |
| CN111967601B (en) * | 2020-06-30 | 2024-02-20 | 北京百度网讯科技有限公司 | Method for generating event relationships, methods and devices for generating event relationship rules |
| CN114386394A (en) * | 2020-10-16 | 2022-04-22 | 电科云(北京)科技有限公司 | Prediction model training method, prediction method and prediction device for platform public opinion data theme |
| CN112135334A (en) * | 2020-10-27 | 2020-12-25 | 上海连尚网络科技有限公司 | A method and device for determining a hotspot type of a wireless access point |
| CN112667791A (en) * | 2020-12-23 | 2021-04-16 | 深圳壹账通智能科技有限公司 | Latent event prediction method, device, equipment and storage medium |
| CN112765349A (en) * | 2021-01-12 | 2021-05-07 | 深圳前海微众银行股份有限公司 | Industry classification method, apparatus, system and computer readable storage medium |
| CN114792096A (en) * | 2021-01-26 | 2022-07-26 | 腾讯科技(深圳)有限公司 | Method and device for classifying content publishing subjects |
| CN114792096B (en) * | 2021-01-26 | 2025-07-08 | 腾讯科技(深圳)有限公司 | Content release main body classification method and device |
| CN112926308A (en) * | 2021-02-25 | 2021-06-08 | 北京百度网讯科技有限公司 | Method, apparatus, device, storage medium and program product for matching text |
| CN112926308B (en) * | 2021-02-25 | 2024-01-12 | 北京百度网讯科技有限公司 | Methods, devices, equipment, storage media and program products for matching text |
| CN113127576A (en) * | 2021-04-15 | 2021-07-16 | 微梦创科网络科技(中国)有限公司 | Hotspot discovery method and system based on user content consumption analysis |
| CN113127576B (en) * | 2021-04-15 | 2024-05-24 | 微梦创科网络科技(中国)有限公司 | Hot spot discovery method and system based on user content consumption analysis |
| CN113392213B (en) * | 2021-04-19 | 2024-05-31 | 合肥讯飞数码科技有限公司 | Event extraction method, electronic equipment and storage device |
| CN113392213A (en) * | 2021-04-19 | 2021-09-14 | 合肥讯飞数码科技有限公司 | Event extraction method, electronic device and storage device |
| CN113220999A (en) * | 2021-05-14 | 2021-08-06 | 北京百度网讯科技有限公司 | User feature generation method and device, electronic equipment and storage medium |
| CN113822069A (en) * | 2021-09-17 | 2021-12-21 | 国家计算机网络与信息安全管理中心 | Emergency early warning method and device based on meta-knowledge and electronic device |
| CN113822069B (en) * | 2021-09-17 | 2024-03-12 | 国家计算机网络与信息安全管理中心 | Sudden event early warning method and device based on meta-knowledge and electronic device |
| CN114492926A (en) * | 2021-12-20 | 2022-05-13 | 华能煤炭技术研究有限公司 | A method and system for text analysis and prediction of coal mine safety hazards |
| CN114461948B (en) * | 2021-12-24 | 2024-12-10 | 天翼云科技有限公司 | Web cache setting optimization method and electronic device |
| CN114461948A (en) * | 2021-12-24 | 2022-05-10 | 天翼云科技有限公司 | Web cache setting optimization method and electronic device |
| CN114297099A (en) * | 2021-12-29 | 2022-04-08 | 中国电信股份有限公司 | Data cache optimization method and device, nonvolatile storage medium and electronic equipment |
| WO2023125589A1 (en) * | 2021-12-29 | 2023-07-06 | 北京辰安科技股份有限公司 | Emergency monitoring method and apparatus |
| CN114764440A (en) * | 2022-04-15 | 2022-07-19 | 中南林业科技大学 | Main event duplicate removal method based on graph node selection and optimization |
| CN114861805A (en) * | 2022-05-18 | 2022-08-05 | 湖南快乐阳光互动娱乐传媒有限公司 | Hot event classification model construction method, hot event classification method and device |
| CN115409105A (en) * | 2022-08-26 | 2022-11-29 | 中国人民解放军战略支援部队信息工程大学 | Method for constructing user portrait based on Android external storage space file operation |
| CN116542238A (en) * | 2023-07-07 | 2023-08-04 | 和元达信息科技有限公司 | Event heat trend determining method and system based on small program |
| CN116542238B (en) * | 2023-07-07 | 2024-03-15 | 和元达信息科技有限公司 | Event heat trend determining method and system based on small program |
| CN117271857A (en) * | 2023-09-22 | 2023-12-22 | 中国工商银行股份有限公司 | Information display methods, devices, equipment, storage media and program products |
| CN118041707A (en) * | 2024-04-15 | 2024-05-14 | 深圳市奇兔软件技术有限公司 | Identity verification method based on computer network |
| CN118474427A (en) * | 2024-07-09 | 2024-08-09 | 中译文娱科技(青岛)有限公司 | Network public opinion detection method and system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108595519A (en) | 2018-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019184217A1 (en) | | Hotspot event classification method and apparatus, and storage medium |
| CN111814465B (en) | | Machine learning-based information extraction method, device, computer equipment, and medium |
| US8543375B2 (en) | | Multi-mode input method editor |
| CN110457680B (en) | | Entity disambiguation method, device, computer equipment and storage medium |
| WO2019214149A1 (en) | | Text key information identification method, electronic device, and readable storage medium |
| US8983826B2 (en) | | Method and system for extracting shadow entities from emails |
| CN112395391A (en) | | Concept graph construction method and device, computer equipment and storage medium |
| CN113094478B (en) | | Expression reply method, device, equipment and storage medium |
| CN113127621A (en) | | Dialogue module pushing method, device, equipment and storage medium |
| CN110795942B (en) | | Keyword determination method and device based on semantic recognition and storage medium |
| CN114416976A (en) | | Text annotation method, device and electronic equipment |
| CN111538830B (en) | | French searching method, device, computer equipment and storage medium |
| CN115481599A (en) | | Document processing method and device, electronic equipment and storage medium |
| CN109753646B (en) | | Article attribute identification method and electronic equipment |
| CN110807322B (en) | | Method, device, server and storage medium for identifying new words based on information entropy |
| CN117556050B (en) | | Data classification and classification method and device, electronic equipment and storage medium |
| CN114138951B (en) | | Question and answer processing method, device, electronic device and storage medium |
| CN114842982B (en) | | Knowledge expression method, device and system for medical information system |
| CN114780678B (en) | | Text retrieval method, device, equipment and storage medium |
| CN117743577A (en) | | Text classification method, device, electronic device and storage medium |
| CN111783447B (en) | | Sensitive word detection method, device and equipment based on ngram distance and storage medium |
| CN115563515A (en) | | Text similarity detection method, device and equipment and storage medium |
| KR20220024251A (en) | | Method and apparatus for building event library, electronic device, and computer-readable medium |
| CN114461771A (en) | | Question and answer method, apparatus, electronic device and readable storage medium |
| CN115879452A (en) | | New word discovery method, device, equipment and computer readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 03.02.2021) |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18912789; Country of ref document: EP; Kind code of ref document: A1 |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 18912789; Country of ref document: EP; Kind code of ref document: A1 |