Summary of the invention
In view of the above technical problems, it is necessary to provide an enterprise full name and abbreviation matching method, apparatus, computer device and storage medium capable of improving matching accuracy.
An enterprise full name and abbreviation matching method, the method comprising:
performing abbreviation identification on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset contracted-abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
obtaining the enterprise full name corresponding to the contracted abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, before performing abbreviation identification on the text containing the abbreviation to be identified to obtain the candidate abbreviation set, the method further comprises:
obtaining multiple sample data containing enterprise abbreviations;
labeling each sample data with its corresponding known abbreviation, to obtain a sample data set carrying abbreviation labels;
training a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation identification.
In one embodiment, obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set in the preset text library, and determining the target abbreviation according to the word frequency of each candidate abbreviation, comprises:
when the text containing the abbreviation to be identified contains multiple classes of candidate abbreviations, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
obtaining the word frequency of each candidate abbreviation of each class in the preset text library;
determining the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
In one embodiment, before traversing the preset contracted-abbreviation library according to the target abbreviation to obtain the contracted abbreviation matching the target abbreviation, the method further comprises:
obtaining an enterprise full name library, and classifying the enterprise full names in the enterprise full name library according to their composition patterns;
contracting each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern, to obtain a contracted-abbreviation set corresponding to each enterprise full name;
constructing, from the contracted-abbreviation sets, the preset contracted-abbreviation library corresponding to the enterprise full name library.
In one embodiment, before contracting each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern to obtain the contracted-abbreviation set corresponding to each enterprise full name, the method further comprises:
obtaining sample data containing matching relationships between enterprise full names and abbreviations;
analyzing the composition pattern of each enterprise full name in the sample data, and determining the preset contraction rule corresponding to the composition pattern according to the enterprise abbreviation in the sample data.
In one embodiment, after determining that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found, the method further comprises:
updating the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
In one embodiment, after updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database, the method further comprises:
searching, according to a preset keyword, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation;
extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
An enterprise full name and abbreviation matching apparatus, the apparatus comprising:
a candidate abbreviation set acquisition module, configured to perform abbreviation identification on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
a target abbreviation determination module, configured to obtain the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library, and to determine a target abbreviation according to the word frequency of each candidate abbreviation;
a contracted abbreviation acquisition module, configured to traverse a preset contracted-abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
an enterprise full name acquisition module, configured to obtain the enterprise full name corresponding to the contracted abbreviation;
a matching result determination module, configured to determine that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program:
performing abbreviation identification on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset contracted-abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
obtaining the enterprise full name corresponding to the contracted abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
A computer-readable storage medium, on which a computer program is stored, wherein the computer program implements the following steps when executed by a processor:
performing abbreviation identification on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset contracted-abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
obtaining the enterprise full name corresponding to the contracted abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
In the above enterprise full name and abbreviation matching method, apparatus, computer device and storage medium, abbreviation identification is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is obtained according to the word frequencies of the candidate abbreviations. A preset contracted-abbreviation library corresponding to enterprise abbreviations is traversed to obtain the enterprise full name matching the target abbreviation, and a text search confirms whether the target abbreviation and the enterprise full name co-occur in the same text, which in turn confirms whether the enterprise full name and the abbreviation are successfully matched. Across the whole scheme, screening the identified abbreviations improves data accuracy at the abbreviation identification stage, and, once the enterprise full name corresponding to the target abbreviation has been obtained, confirming whether the two co-occur in the same text before declaring a match improves the accuracy of the matching result.
Specific embodiment
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
The enterprise full name and abbreviation matching method provided by the present application can be applied in the application environment shown in Fig. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 performs abbreviation identification on a text containing an abbreviation to be identified to obtain a candidate abbreviation set; obtains the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library and determines a target abbreviation according to those word frequencies; traverses a preset contracted-abbreviation library according to the target abbreviation to obtain a contracted abbreviation matching the target abbreviation; obtains the enterprise full name corresponding to the contracted abbreviation; and, when a text in which the target abbreviation and the enterprise full name co-occur is found, determines that the enterprise full name and the target abbreviation are successfully matched and pushes the successfully matched enterprise full name and target abbreviation to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet computer or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, an enterprise full name and abbreviation matching method is provided. Taking the method as applied to the server in Fig. 1 as an example, it comprises the following steps:
Step S200: perform abbreviation identification on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set.
An abbreviation is a brief word form compressed from a long, complex name. For proper nouns, such as the abbreviation of a specific enterprise, the abbreviation is also an officially recognized designation. For the sake of concise expression, enterprises with long full names are generally referred to by their abbreviations, especially in public opinion text with strict word limits such as news headlines, where enterprises are often recorded by abbreviation. The text containing the abbreviation to be identified can be obtained by using a web crawler algorithm to acquire public opinion text to be identified and splitting that public opinion text into sentences. In some embodiments, the text containing the abbreviation to be identified may be the title of a news-type public opinion text that includes the abbreviation, or a sentence in the lead of a news item that includes the abbreviation. Abbreviation identification is the process of using a named entity recognition model to perform feature vector extraction and named entity recognition on the text containing the abbreviation to be identified, yielding the multiple candidate abbreviations that the text may contain. The named entity recognition model is trained on a sample data set carrying abbreviation labels; based on the feature vector of the text, the named entities it recognizes are the abbreviations in the text. Because text containing abbreviations to be identified is concisely worded, multiple abbreviations may be identified. For example, for the text "Space Dynamic: plans to transfer 70.94% equity of west boat Aluminum by public listing", the candidate abbreviations obtained by abbreviation identification may include "space flight", "power", "Space Dynamic", "west boat Aluminum", "west boat aluminium", "west boat", and so on. In an embodiment, abbreviation identification uses the named entity recognition model as follows: the text containing the abbreviation to be identified is segmented into words to obtain its word sequence; a feature vector is generated from that word sequence; the feature vector is input into the pre-trained named entity recognition model; and the multiple abbreviations that the text may contain are identified and form the candidate abbreviation set.
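The last stage of step S200, turning model output into a candidate abbreviation set, can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes the recognition model emits one or more BIO-style tag sequences per text (the `B-ABBR`/`I-ABBR`/`O` tag names, the idea of merging several tag sequences, and the function names are all hypothetical), and the tokens are placeholder names.

```python
def decode_spans(tokens, tags, sep=" "):
    """Collect runs of tokens tagged B-ABBR / I-ABBR into abbreviation strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ABBR":
            if current:
                spans.append(sep.join(current))
            current = [token]
        elif tag == "I-ABBR" and current:
            current.append(token)
        else:
            # An "O" tag (or a stray I-ABBR with no open span) closes any open span.
            if current:
                spans.append(sep.join(current))
            current = []
    if current:
        spans.append(sep.join(current))
    return spans


def candidate_set(tokens, tag_sequences, sep=" "):
    """Union of the abbreviations decoded from several plausible tag sequences."""
    candidates = set()
    for tags in tag_sequences:
        candidates.update(decode_spans(tokens, tags, sep))
    return candidates


# Hypothetical example: two competing taggings of the same headline.
tokens = ["Acme", "Robotics", "sells", "stake", "in", "Globex", "Energy"]
tag_sequences = [
    ["B-ABBR", "I-ABBR", "O", "O", "O", "B-ABBR", "I-ABBR"],
    ["B-ABBR", "O", "O", "O", "O", "B-ABBR", "O"],
]
candidates = candidate_set(tokens, tag_sequences)
```

Merging several taggings is one way to end up with overlapping candidates such as a short and a long form of the same name, which the later word-frequency step then has to choose between.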
Step S300: obtain the word frequency of each candidate abbreviation in the candidate abbreviation set in a preset text library, and determine a target abbreviation according to the word frequency of each candidate abbreviation.
The multiple candidate abbreviations in the candidate abbreviation set may have been extracted from the same text, such as "refreshing kindling electricity", "south mind kindling electricity" and "refreshing kindling", of which only one is correct. A preset text library related to the candidate abbreviations is obtained, and the word frequency of each candidate abbreviation in the preset text library is obtained. When the word frequencies of different candidate abbreviations from the same text are identical or close, the candidate abbreviation with the longest string length is taken as the target abbreviation; a candidate with a low word frequency is unlikely to be the target abbreviation. For example, if "refreshing kindling electricity" has the same word frequency as "kindling electricity" and "south mind kindling electricity" has a lower word frequency, then "refreshing kindling electricity" is taken as the target abbreviation.
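The selection rule in step S300 can be sketched as below. This is only one reading of the text: "close" is interpreted here as "within a relative tolerance of the highest frequency", and both the `tolerance` parameter and the function name `pick_target` are assumptions introduced for illustration. The candidate strings are placeholders.

```python
def pick_target(frequencies, tolerance=0.1):
    """Pick the target abbreviation from {candidate: word frequency}.

    Candidates whose frequency is within `tolerance` (relative) of the
    highest frequency are treated as "identical or close"; among those,
    the longest string wins. Low-frequency candidates are never selected.
    """
    if not frequencies:
        return None
    top = max(frequencies.values())
    close = [c for c, f in frequencies.items() if f >= top * (1 - tolerance)]
    # Longest string first; frequency only breaks ties between equal lengths.
    return max(close, key=lambda c: (len(c), frequencies[c]))


# Hypothetical counts from a preset text library.
target = pick_target({"XYZ Power": 120, "XYZ": 118, "N XYZ Power": 15})
```

Here "N XYZ Power" is the longest string but is dropped because its frequency is far below the others, which mirrors the example in the text where the over-long candidate has a lower word frequency.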
Step S400: traverse a preset contracted-abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation.
The preset contracted-abbreviation library is a database made up of the contracted abbreviations obtained by applying set contraction rules to existing enterprise full name data. In an embodiment, the contraction of enterprise full names may be implemented either by set contraction rules or by an abbreviation model. When traversing the preset contracted-abbreviation library yields a contracted abbreviation identical to the target abbreviation, the target abbreviation identified from the text containing the abbreviation to be identified is matched with a contracted abbreviation derived from an enterprise full name, which associates the enterprise full name with the enterprise abbreviation.
Step S500: obtain the enterprise full name corresponding to the contracted abbreviation.
The enterprise full name library associated with the preset contracted-abbreviation library is obtained. Based on the mapping between enterprise full names and contracted abbreviations, the enterprise full name corresponding to the determined contracted abbreviation can be determined. In an embodiment, enterprise full names may be collected from the business registration data of each enterprise, and the enterprise full name library is constructed from the enterprise full name data of each enterprise.
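Steps S400 and S500 together amount to a lookup from a target abbreviation to the full names whose contracted abbreviations contain it. The sketch below assumes the preset contracted-abbreviation library is held as a mapping from full name to a set of contracted abbreviations; that data layout, the function name and the company names are hypothetical.

```python
def find_full_names(target, library):
    """Traverse the library and return every full name whose contracted
    abbreviations include the target abbreviation."""
    return [full for full, contracted in library.items() if target in contracted]


# Hypothetical preset contracted-abbreviation library.
library = {
    "Springfield Acme Robotics Co., Ltd.": {"Acme", "Acme Robotics", "Springfield Acme"},
    "Shelbyville Globex Energy Group": {"Globex", "Globex Energy"},
    "Acme Foods Co., Ltd.": {"Acme", "Acme Foods"},
}
matches = find_full_names("Acme", library)
```

A single abbreviation can map to more than one full name, as "Acme" does here, which is why the co-occurrence check in step S600 is still needed before a match is accepted.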
Step S600: when a text in which the target abbreviation and the enterprise full name co-occur is found, determine that the enterprise full name and the target abbreviation are successfully matched.
Co-occurrence refers to feature words appearing together; here the feature words are the target abbreviation and the enterprise full name. Public opinion data is searched with the target abbreviation and the enterprise full name as search targets. When a text containing both the target abbreviation and the enterprise full name is obtained, the target abbreviation and the enterprise full name are determined to be successfully matched. Conversely, if no text contains both the target abbreviation and the enterprise full name, the match fails.
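The co-occurrence test in step S600 can be sketched as a simple search over texts. One detail is an assumption added here rather than something the text specifies: because an abbreviation is often a substring of its own full name, the sketch only counts an occurrence of the abbreviation that lies outside the full name. The function name and sample texts are hypothetical.

```python
def cooccurs(target, full_name, texts):
    """Return True if some text contains the full name and, separately,
    the target abbreviation."""
    for text in texts:
        if full_name not in text:
            continue
        # Blank out the full name so that the abbreviation embedded in it
        # does not count as an independent occurrence.
        remainder = text.replace(full_name, " ")
        if target in remainder:
            return True
    return False


full_name = "Acme Robotics Co., Ltd."
texts = [
    "Acme shares rose on Monday.",
    "Acme Robotics Co., Ltd. said that Acme will open a new plant.",
]
matched = cooccurs("Acme", full_name, texts)
```

If the stricter substring handling is not wanted, the loop body reduces to checking that both strings are contained in the same text.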
In the above enterprise full name and abbreviation matching method, abbreviation identification is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is obtained according to the word frequencies of the candidate abbreviations. A preset contracted-abbreviation library corresponding to enterprise abbreviations is traversed to obtain the enterprise full name matching the target abbreviation, and a text search confirms whether the target abbreviation and the enterprise full name co-occur in the same text, which in turn confirms whether the enterprise full name and the abbreviation are successfully matched. Across the whole scheme, screening the identified abbreviations improves data accuracy at the abbreviation identification stage. In addition, once the enterprise full name corresponding to the target abbreviation has been obtained, confirming whether the two co-occur in the same text before declaring a match avoids the errors introduced by generating abbreviations solely from character vectors or word vectors of enterprise full names and directly matching an enterprise full name library against an enterprise abbreviation library, and so improves the accuracy of the matching result.
In some embodiments, the above method can also be applied to legally constituted organs, institutions, enterprises, associations and other legally established units, and may cover the matching of full names and abbreviations of government departments, research institutions, colleges and universities, companies, international organizations, and so on.
In one embodiment, as shown in Fig. 3, before step S200 of performing abbreviation identification on the text containing the abbreviation to be identified to obtain the candidate abbreviation set, the method further comprises:
Step S120: obtain multiple sample data containing enterprise abbreviations.
Step S140: label each sample data with its corresponding known abbreviation, to obtain a sample data set carrying abbreviation labels.
Step S160: train a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation identification.
Sample data containing enterprise abbreviations is text whose abbreviations are known. Abbreviation labeling means segmenting the sample data into words and labeling the sample data with the known abbreviations. The labeled sample data is trained into word vectors that carry abbreviation labels, and the word vectors corresponding to the multiple sample data are used as input data to train the named entity recognition model. The named entity recognition model is a Bi-LSTM+CRF model, in which the CRF is used to obtain the globally optimal output sequence, which amounts to reusing the information from the LSTM. Bi-LSTM, or bidirectional LSTM, considers both past features (extracted by the forward pass) and future features (extracted by the backward pass). It is equivalent to two LSTMs, one reading the input sequence forward and one reading it in reverse, whose outputs are combined as the final result. The word vectors can be trained with tools such as gensim word2vec or GloVe. The named entity recognition model is trained on the input data. After training, accuracy is used as the evaluation metric for the model; when the accuracy does not reach the set threshold range, the model parameters are adjusted to optimize the named entity recognition model. The named entity recognition model takes as input the word vectors of a text containing an abbreviation to be identified, identifies the abbreviations that the text may contain, and outputs the candidate abbreviations that may be present, which form the candidate abbreviation set.
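The labeling in step S140 can be illustrated with a small sketch that tags a segmented sample with its known abbreviation. The BIO tag scheme (`B-ABBR`, `I-ABBR`, `O`) and the function name are assumptions, since the text only says that the sample data is labeled with the known abbreviation; word-vector training and the Bi-LSTM+CRF model itself are outside the scope of this sketch.

```python
def label_sample(tokens, known_abbreviation):
    """Tag every occurrence of the known abbreviation (a list of tokens)
    inside the segmented sample with B-ABBR / I-ABBR; all else is O."""
    tags = ["O"] * len(tokens)
    n = len(known_abbreviation)
    i = 0
    while n and i <= len(tokens) - n:
        if tokens[i:i + n] == known_abbreviation:
            tags[i] = "B-ABBR"
            tags[i + 1:i + n] = ["I-ABBR"] * (n - 1)
            i += n  # skip past the labeled span
        else:
            i += 1
    return tags


# Hypothetical segmented sample and its known abbreviation.
tokens = ["Acme", "Robotics", "wins", "award", ";", "Acme", "Robotics", "expands"]
tags = label_sample(tokens, ["Acme", "Robotics"])
```

Pairs of `tokens` and `tags` produced this way are the kind of labeled sequences a sequence-labeling model is typically trained on.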
In one embodiment, as shown in Fig. 3, step S300 of obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set in the preset text library and determining the target abbreviation according to the word frequency of each candidate abbreviation comprises:
Step S320: when the text containing the abbreviation to be identified contains multiple classes of candidate abbreviations, classify the candidate abbreviations in the candidate abbreviation set according to their word sequences.
Step S330: obtain the word frequency of each candidate abbreviation of each class in the preset text library.
Step S340: determine the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
A word sequence refers to the words that make up a term and the associations among those words. The word sequence can be determined by a sequence labeling method, and the candidate abbreviations are classified according to their word sequences. Preliminary processing by the named entity recognition model inevitably produces some noise, so the resulting candidate abbreviations contain some noisy data. To remove it, cross denoising is performed against the texts of the preset text library. Taking one class of candidate abbreviations as an example, the word frequency of each candidate abbreviation in that class is obtained over multiple texts in the preset text library, and the target abbreviation of each class is filtered out according to word sequence length and word frequency. This step is the denoising process. In one embodiment, similar words are first grouped into one class: words that contain the same word sequence are classified together, for example "refreshing kindling electricity", "south mind kindling electricity" and "refreshing kindling" are classified as one class. The word frequency of each word in the class is then counted over multiple news texts. When the word frequencies of different words from the same news text are identical or close, the word with the longest word sequence is taken as the target abbreviation; a word with a low word frequency is unlikely to be the target abbreviation. For candidate abbreviations with low word frequency, the abbreviation data and word frequency information may be retained.
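The grouping of similar candidates in step S320 can be sketched as follows. The text does not fix how "containing the same word sequence" is tested; this sketch assumes a plain substring test against the longest member of each group, and the function name and candidate strings are hypothetical.

```python
def group_candidates(candidates):
    """Group candidate abbreviations so that each group holds one longest
    candidate plus the shorter candidates contained in it."""
    groups = []
    # Longest first, so every group is seeded by its longest member.
    for cand in sorted(set(candidates), key=lambda c: (-len(c), c)):
        for group in groups:
            if cand in group[0]:
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups


# Hypothetical candidates extracted from one headline.
groups = group_candidates(["AB", "ABC", "XABC", "PQ", "PQR"])
```

Each resulting group can then be passed through the word-frequency rule separately, giving one target abbreviation per class.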
In one embodiment, as shown in Fig. 3, before step S400 of traversing the preset contracted-abbreviation library according to the target abbreviation to obtain the contracted abbreviation matching the target abbreviation, the method further comprises:
Step S360: obtain an enterprise full name library, and classify the enterprise full names in the enterprise full name library according to their composition patterns.
Step S370: contract each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern, to obtain a contracted-abbreviation set corresponding to each enterprise full name.
Step S380: construct, from the contracted-abbreviation sets, the preset contracted-abbreviation library corresponding to the enterprise full name library.
In an embodiment, enterprise full names are collected from the business registration data of each enterprise, and the enterprise full name library is constructed from the enterprise full name data of each enterprise. The composition patterns of enterprise full names can be divided into several classes. The first class is "place + name + industry category + company attribute", for example "Shenzhen Tencent Computer Systems Co., Ltd." and "Jiangsu Yabang Dyestuff Co., Ltd."; the second class is "name + industry category + company attribute"; there are also "name + place + company attribute", "name + company attribute", and so on. When contracted abbreviations are generated from full name data, the length may be limited to five characters or fewer. The contraction rules by which candidate abbreviations are generated directly from such full names usually fall into several classes: name only, such as "Tencent" and "Yabang"; name + industry, such as "Tencent Computer", "Yabang Dyestuff" and "Haitong Securities"; place or place abbreviation + name, such as "China Ping An"; and name + company attribute abbreviation, such as "Tencent Holdings" and "Apple Inc.". In general, when the name, industry attribute, etc. exceed four characters, they are themselves contracted, either by taking the first two characters or by taking characters at intervals, as in "Sinopec" (China Petrochemical Corporation). Following this generation logic, a series of generated candidate abbreviations can be produced for one full name. For example, "Chang'an Ford Motor Co., Ltd." yields "Ford", "Ford Motor" and "Chang'an Ford", and these abbreviations form the contracted-abbreviation set corresponding to that enterprise full name. Since the enterprise full name library contains multiple enterprise full names, the preset contracted-abbreviation library corresponding to the enterprise full name library is constructed from the contracted-abbreviation sets of the enterprise full names, and a mapping relationship exists between the preset contracted-abbreviation library and the enterprise full name library.
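The rule-based contraction of step S370 can be sketched for the "place + name + industry category + company attribute" pattern. The sketch assumes the full name has already been split into its components; the `NameParts` class, the `contract` function, the `sep` parameter and the sample values are hypothetical. The default `max_len=5` follows the five-character limit mentioned in the text, which counts characters and therefore suits languages without spaces; the English example passes a larger limit.

```python
from dataclasses import dataclass


@dataclass
class NameParts:
    place: str = ""
    name: str = ""
    industry: str = ""
    attribute: str = ""  # short form of the company attribute


def contract(parts, max_len=5, sep=""):
    """Apply the four rule classes from the text: name only, name + industry,
    place + name, name + company attribute."""
    templates = [
        (parts.name,),
        (parts.name, parts.industry),
        (parts.place, parts.name),
        (parts.name, parts.attribute),
    ]
    result = []
    for pieces in templates:
        if not all(pieces):
            continue  # the rule needs a component this full name lacks
        abbreviation = sep.join(pieces)
        if len(abbreviation) <= max_len and abbreviation not in result:
            result.append(abbreviation)
    return result


parts = NameParts(place="Springfield", name="Acme", industry="Robotics", attribute="Group")
abbreviations = contract(parts, max_len=20, sep=" ")
```

Running `contract` over every entry of the enterprise full name library and storing the result keyed by full name gives a mapping of the kind the preset contracted-abbreviation library is described as holding.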
In one embodiment, before step S370 of contracting each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern to obtain the contracted-abbreviation set corresponding to each enterprise full name, the method further comprises:
obtaining sample data containing matching relationships between enterprise full names and abbreviations;
analyzing the composition pattern of each enterprise full name in the sample data, and determining the preset contraction rule corresponding to the composition pattern according to the enterprise abbreviation in the sample data.
From sample data of known enterprise abbreviations and enterprise full names, the matching relationship between enterprise abbreviation and enterprise full name can be obtained. From the composition pattern of an enterprise full name and its corresponding enterprise abbreviation, the contraction rule applied to that enterprise full name in the sample data can be derived. By counting the contraction rules applied to enterprise full names across multiple sample data, the preset contraction rules for enterprise full names are determined. In some embodiments, one enterprise full name may have multiple corresponding preset contraction rules.
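Deriving contraction rules from samples can be sketched as counting, per composition pattern, which combination of components reproduces the known abbreviation. This is an illustrative reading only: it assumes each sample full name is already split into named components, and it ignores rules that shorten a component internally. The function name, component keys and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def infer_rules(samples, sep=" "):
    """samples: list of (components, abbreviation), where components maps a
    component type to its text in composition order.
    Returns {composition pattern: Counter of matching contraction rules}."""
    rules = defaultdict(Counter)
    for components, abbreviation in samples:
        keys = list(components)
        pattern = "+".join(keys)
        for size in range(1, len(keys) + 1):
            for combo in combinations(keys, size):
                if sep.join(components[k] for k in combo) == abbreviation:
                    rules[pattern]["+".join(combo)] += 1
    return rules


samples = [
    ({"place": "Springfield", "name": "Acme", "industry": "Robotics", "attribute": "Ltd"}, "Acme Robotics"),
    ({"place": "Shelbyville", "name": "Globex", "industry": "Energy", "attribute": "Ltd"}, "Globex"),
    ({"place": "Austin", "name": "Initech", "industry": "Software", "attribute": "Ltd"}, "Initech Software"),
]
rules = infer_rules(samples)
```

Because the counts are kept per pattern, a pattern can end up with several rules, which matches the statement that one enterprise full name may have multiple corresponding preset contraction rules.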
In one embodiment, as shown in Fig. 3, after step S600 of determining that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found, the method further comprises:
Step S720: update the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
Updating the successfully matched enterprise full name and enterprise abbreviation to the preset enterprise full name and abbreviation matching database makes it possible, when performing public opinion analysis on all kinds of data containing abbreviations, to quickly determine the enterprise full name corresponding to an enterprise abbreviation, which improves the efficiency of public opinion analysis.
In one embodiment, as shown in Fig. 3, after step S720 of updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database, the method further comprises:
Step S740: search, according to a preset keyword, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation.
Step S760: extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The text may be public opinion text such as news, and the preset keyword may be a word used to identify an enterprise's full name and abbreviation, such as "... abbreviated as ...". In an embodiment, news documents, especially news headlines, are scanned and full name and abbreviation matching results, such as "A abbreviated as B", are extracted directly by preset rules; the entities corresponding to such data are updated to the preset enterprise full name and abbreviation matching database. When a search faces a large amount of public opinion data and a large amount of enterprise full name data, and the full name corresponding to an enterprise abbreviation in a text cannot otherwise be found, looking up the preset enterprise full name and abbreviation matching database avoids cases in which one abbreviation corresponds to multiple full names, in which the correspondence between abbreviation and full name is ambiguous, or in which the full name and abbreviation are entirely unrelated, and thus improves matching accuracy.
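The keyword-based extraction of steps S740 and S760 can be sketched with a regular expression. The surface pattern used here, a full name followed by "(abbreviated as ...)", is a hypothetical stand-in for the preset keyword rule, and the function name and sample sentence are likewise assumptions.

```python
import re

# Hypothetical preset rule: "<full name> (abbreviated as <abbreviation>)".
PATTERN = re.compile(r"(?P<full>[^();]+?)\s*\(abbreviated as (?P<abbr>[^()]+)\)")


def extract_pairs(text):
    """Return {full name: abbreviation} for every match of the rule."""
    return {
        m.group("full").strip(): m.group("abbr").strip()
        for m in PATTERN.finditer(text)
    }


database = {}  # stands in for the preset full name and abbreviation database
text = (
    "Acme Robotics Co., Ltd. (abbreviated as Acme) signed a contract; "
    "Globex Energy Group (abbreviated as Globex) did not."
)
database.update(extract_pairs(text))
```

The pattern takes everything back to the previous semicolon or parenthesis as the full name, so a real rule set would need tighter boundaries when the name is preceded by other words in the same clause.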
It should be understood that although the steps in the flowcharts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, an apparatus for matching enterprise full names with abbreviations is provided, comprising:
a candidate abbreviation set obtaining module 200, configured to perform abbreviation recognition on text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
a target abbreviation determining module 300, configured to obtain the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and to determine a target abbreviation according to the word frequency of each candidate abbreviation;
a contracted abbreviation obtaining module 400, configured to traverse a preset contracted abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
an enterprise full name obtaining module 500, configured to obtain the enterprise full name corresponding to the contracted abbreviation;
a matching result determining module 600, configured to determine that the enterprise full name and the target abbreviation are successfully matched when text in which the target abbreviation and the enterprise full name co-occur is found.
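The cooperation of modules 300 to 600 can be sketched as follows. This is a simplified illustration under stated assumptions: the preset text library is a list of strings, word frequency is approximated by substring counts, and the preset contracted abbreviation library is a list of (contracted abbreviation, full name) pairs. The names `pick_target`, `cooccurs` and `match_full_name` and all sample data are hypothetical.

```python
def pick_target(candidates, text_library):
    """Module 300: choose the candidate that occurs most often in the text library."""
    freq = {c: sum(doc.count(c) for doc in text_library) for c in candidates}
    return max(candidates, key=lambda c: freq[c])


def cooccurs(target, full_name, text_library):
    """Module 600: True if some text holds the full name and, separately, the target."""
    for doc in text_library:
        # Remove the full name first so an abbreviation embedded in it is not counted.
        if full_name in doc and target in doc.replace(full_name, ""):
            return True
    return False


def match_full_name(candidates, text_library, contracted_library):
    """Modules 300-600 chained together; returns (target, full_name or None)."""
    target = pick_target(candidates, text_library)
    for contracted, full_name in contracted_library:  # module 400: traverse the library
        if contracted == target and cooccurs(target, full_name, text_library):
            return target, full_name  # module 500 result, confirmed by module 600
    return target, None


text_library = [
    "Example Widget Manufacturing Co., Ltd. announced that Example Widget will expand.",
    "Analysts expect Example Widget to grow.",
]
contracted_library = [
    ("Example Widget", "Example Widget Trading Co., Ltd."),
    ("Example Widget", "Example Widget Manufacturing Co., Ltd."),
]
result = match_full_name(["ExWidget", "Example Widget"], text_library, contracted_library)
```

Here the co-occurrence check is what separates the two full names that share one contracted abbreviation: only the manufacturing company appears in the same text as the target abbreviation.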
In one embodiment, the apparatus for matching enterprise full names with abbreviations further includes a named entity recognition model training module, configured to obtain multiple pieces of sample data containing enterprise abbreviations, annotate the abbreviation in each piece of sample data according to the known abbreviation corresponding to that sample data to obtain a sample data set carrying abbreviation annotations, and train a named entity recognition model on the sample data set, the named entity recognition model being used to perform abbreviation recognition.
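The annotation step can be illustrated with a short sketch. The embodiment does not name a tagging scheme; the character-level BIO scheme, the label `ABBR`, and the function name `bio_tag` below are assumptions chosen because BIO tags are a common input format for named entity recognition training.

```python
def bio_tag(sentence, known_abbreviation, label="ABBR"):
    """Tag each character of the sentence; occurrences of the known abbreviation get B-/I- tags."""
    tags = ["O"] * len(sentence)
    if not known_abbreviation:
        return list(zip(sentence, tags))  # nothing to annotate
    start = sentence.find(known_abbreviation)
    while start != -1:
        end = start + len(known_abbreviation)
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
        start = sentence.find(known_abbreviation, end)
    return list(zip(sentence, tags))


# One annotated sample; a sample data set is a list of such tagged sequences.
sample = bio_tag("ABC up", "ABC")
```

The resulting tagged sequences would then be passed to whichever sequence labelling model is used for training; that choice is outside the scope of this sketch.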
In one embodiment, the target abbreviation determining module 300 is further configured to, when the text containing the abbreviation to be identified contains multiple classes of candidate abbreviations, classify the candidate abbreviations in the candidate abbreviation set according to the word sequences of the candidate abbreviations, obtain the word frequency of each candidate abbreviation of each class in the preset text library, and determine the target abbreviation of each class according to the word frequencies of the candidate abbreviations of that class.
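The per-class selection can be sketched as below. The embodiment does not spell out the classification criterion in this passage, so the sketch takes it as a caller-supplied function `class_of`; the example criterion (the first word of the candidate), the frequency table, and the function name `targets_by_class` are hypothetical.

```python
from collections import defaultdict


def targets_by_class(candidates, word_freq, class_of):
    """Group candidates by class, then keep the most frequent candidate of each class."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[class_of(candidate)].append(candidate)
    return {
        key: max(members, key=lambda c: word_freq.get(c, 0))
        for key, members in groups.items()
    }


# Hypothetical word frequencies looked up in the preset text library.
word_freq = {"Acme Bank": 40, "Acme Bk": 3, "Zenith Steel": 5, "Zenith Stl": 12}
candidates = ["Acme Bank", "Acme Bk", "Zenith Steel", "Zenith Stl"]
targets = targets_by_class(candidates, word_freq, class_of=lambda c: c.split()[0])
```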
In one embodiment, the apparatus for matching enterprise full names with abbreviations further includes a preset contracted abbreviation library building module, configured to obtain an enterprise full name library, classify the enterprise full names in the library according to their composition patterns, contract each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern to obtain a set of contracted abbreviations corresponding to the enterprise full names, and build, from the set of contracted abbreviations, the preset contracted abbreviation library corresponding to the enterprise full name library.
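A minimal sketch of building such a library follows. It assumes each full name has already been split into typed components (region, brand, industry, organisational form) and that a contraction rule is a tuple of component types to keep; the component types, the `RULES` table, and the names `composition_pattern` and `build_library` are illustrative assumptions, not the rules of the embodiment.

```python
def composition_pattern(parts):
    """parts is an ordered list of (component_type, text); the pattern is the type sequence."""
    return tuple(kind for kind, _ in parts)


# Hypothetical contraction rules: for each composition pattern, which components to keep.
RULES = {
    ("region", "brand", "industry", "org"): [("brand",), ("brand", "industry"), ("region", "brand")],
    ("brand", "industry", "org"): [("brand",), ("brand", "industry")],
}


def build_library(full_name_library):
    """Map every generated contracted abbreviation to the full names it may stand for."""
    library = {}
    for full_name, parts in full_name_library.items():
        by_kind = dict(parts)
        for keep in RULES.get(composition_pattern(parts), []):
            contracted = " ".join(by_kind[kind] for kind in keep)
            library.setdefault(contracted, set()).add(full_name)
    return library


full_name_library = {
    "Shenzhen Acme Technology Co., Ltd.": [
        ("region", "Shenzhen"), ("brand", "Acme"),
        ("industry", "Technology"), ("org", "Co., Ltd."),
    ],
}
library = build_library(full_name_library)
```

Storing a set of full names per contracted abbreviation keeps the one-to-many case visible, which is the case the later co-occurrence check is meant to resolve.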
In one embodiment, the preset contracted abbreviation library building module is further configured to obtain sample data containing matching relationships between enterprise full names and abbreviations, analyze the composition patterns of the enterprise full names in the sample data, and determine, according to the enterprise abbreviations in the sample data, the preset contraction rule corresponding to each composition pattern.
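One way to derive contraction rules from matched samples is sketched below, under the same assumption that full names are already split into typed components. For each sample, the sketch looks for the ordered subset of components whose texts, joined by spaces, reproduce the known abbreviation. The function name `derive_rules` and the sample data are hypothetical.

```python
from itertools import combinations


def derive_rules(samples):
    """samples: list of (parts, known_abbreviation); returns pattern -> set of kept-type tuples."""
    rules = {}
    for parts, abbreviation in samples:
        pattern = tuple(kind for kind, _ in parts)
        for size in range(1, len(parts) + 1):
            # combinations() preserves the original component order.
            for subset in combinations(parts, size):
                if " ".join(text for _, text in subset) == abbreviation:
                    kept = tuple(kind for kind, _ in subset)
                    rules.setdefault(pattern, set()).add(kept)
    return rules


samples = [
    (
        [("region", "Shenzhen"), ("brand", "Acme"),
         ("industry", "Technology"), ("org", "Co., Ltd.")],
        "Acme Technology",
    ),
]
rules = derive_rules(samples)
```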
In one embodiment, the apparatus for matching enterprise full names with abbreviations further includes an enterprise full name and abbreviation matching database update module, configured to update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the enterprise full name and abbreviation matching database update module is further configured to search, according to a preset keyword, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation, extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The above apparatus for matching enterprise full names with abbreviations performs abbreviation recognition on text containing an abbreviation to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name matching the target abbreviation, and searches for text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match succeeds, which improves the accuracy of the matching result.
For specific limitations on the apparatus for matching enterprise full names with abbreviations, reference may be made to the limitations on the method for matching enterprise full names with abbreviations above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a method for matching enterprise full names with abbreviations. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the scheme of the present application and does not limit the computer device to which the scheme is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:
performing abbreviation recognition on text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset contracted abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
obtaining the enterprise full name corresponding to the contracted abbreviation;
when text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, the processor further implements the following steps when executing the computer program:
obtaining multiple pieces of sample data containing enterprise abbreviations;
annotating the abbreviation in each piece of sample data according to the known abbreviation corresponding to that sample data, to obtain a sample data set carrying abbreviation annotations;
training a named entity recognition model on the sample data set, the named entity recognition model being used to perform abbreviation recognition.
In one embodiment, the processor further implements the following steps when executing the computer program:
when the text containing the abbreviation to be identified contains multiple classes of candidate abbreviations, classifying the candidate abbreviations in the candidate abbreviation set according to the word sequences of the candidate abbreviations;
obtaining the word frequency of each candidate abbreviation of each class in the preset text library;
determining the target abbreviation of each class according to the word frequencies of the candidate abbreviations of that class.
In one embodiment, the processor further implements the following steps when executing the computer program:
obtaining an enterprise full name library, and classifying the enterprise full names in the enterprise full name library according to their composition patterns;
contracting each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern, to obtain a set of contracted abbreviations corresponding to the enterprise full names;
building, from the set of contracted abbreviations, the preset contracted abbreviation library corresponding to the enterprise full name library.
In one embodiment, the processor further implements the following steps when executing the computer program:
obtaining sample data containing matching relationships between enterprise full names and abbreviations;
analyzing the composition patterns of the enterprise full names in the sample data, and determining, according to the enterprise abbreviations in the sample data, the preset contraction rule corresponding to each composition pattern.
In one embodiment, the processor further implements the following step when executing the computer program:
updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the processor further implements the following steps when executing the computer program:
searching, according to a preset keyword, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation;
extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
The above computer device for implementing the method for matching enterprise full names with abbreviations performs abbreviation recognition on text containing an abbreviation to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name matching the target abbreviation, and searches for text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match succeeds, which improves the accuracy of the matching result.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
performing abbreviation recognition on text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset contracted abbreviation library according to the target abbreviation, to obtain a contracted abbreviation matching the target abbreviation;
obtaining the enterprise full name corresponding to the contracted abbreviation;
when text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, the computer program further implements the following steps when executed by the processor:
obtaining multiple pieces of sample data containing enterprise abbreviations;
annotating the abbreviation in each piece of sample data according to the known abbreviation corresponding to that sample data, to obtain a sample data set carrying abbreviation annotations;
training a named entity recognition model on the sample data set, the named entity recognition model being used to perform abbreviation recognition.
In one embodiment, the computer program further implements the following steps when executed by the processor:
when the text containing the abbreviation to be identified contains multiple classes of candidate abbreviations, classifying the candidate abbreviations in the candidate abbreviation set according to the word sequences of the candidate abbreviations;
obtaining the word frequency of each candidate abbreviation of each class in the preset text library;
determining the target abbreviation of each class according to the word frequencies of the candidate abbreviations of that class.
In one embodiment, the computer program further implements the following steps when executed by the processor:
obtaining an enterprise full name library, and classifying the enterprise full names in the enterprise full name library according to their composition patterns;
contracting each class of enterprise full names according to the preset contraction rule corresponding to its composition pattern, to obtain a set of contracted abbreviations corresponding to the enterprise full names;
building, from the set of contracted abbreviations, the preset contracted abbreviation library corresponding to the enterprise full name library.
In one embodiment, the computer program further implements the following steps when executed by the processor:
obtaining sample data containing matching relationships between enterprise full names and abbreviations;
analyzing the composition patterns of the enterprise full names in the sample data, and determining, according to the enterprise abbreviations in the sample data, the preset contraction rule corresponding to each composition pattern.
In one embodiment, the computer program further implements the following step when executed by the processor:
updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the computer program further implements the following steps when executed by the processor:
searching, according to a preset keyword, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation;
extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
The above computer-readable storage medium for implementing the method for matching enterprise full names with abbreviations performs abbreviation recognition on text containing an abbreviation to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name matching the target abbreviation, and searches for text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match succeeds, which improves the accuracy of the matching result.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.