CN108986790A - Method and apparatus for voice recognition of contacts

Info

Publication number: CN108986790A
Application number: CN201811148211.4A
Authority: CN
Inventors: 张腾飞; 宋晔; 欧阳能钧
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-09-29
Filing date: 2018-09-29
Publication date: 2018-12-11

Abstract

Embodiments of the present application disclose a method and apparatus for voice recognition of contacts. One specific embodiment of the method includes: performing speech recognition on a received voice query, and extracting from the recognition result the phoneme sequence corresponding to a target identifier that identifies the queried target contact; matching the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in a preset contact set; and determining, according to the matching result, the target contact queried by the voice query from the preset contact set. This embodiment improves the efficiency and precision of offline speech recognition.

Description

Method and apparatus for voice recognition of contacts

Technical field

Embodiments of the present application relate to the field of computer technology, specifically to the field of voice technology, and more particularly to a method and apparatus for voice recognition of contacts.

Background

Speech recognition on a mobile terminal usually relies on the computing power of a server; in some scenarios, offline speech recognition can provide users with a more precise voice service. In offline voice recognition of contacts, when the operating system has no network access, an offline speech recognition application compares the name in the voice information uttered by the user with the contact names stored locally, and obtains the contact name that best matches what the user wants to look up.

In the above offline contact recognition technique, taking Chinese as an example, the local speech recognition application recognizes the user's speech and returns a combination of common Chinese characters whose pronunciation is similar to the user's. It then converts that character combination into its full pinyin and compares it one by one with the full pinyin of each contact to obtain the matching result.

Summary of the invention

Embodiments of the present application propose a method and apparatus for voice recognition of contacts.

In a first aspect, an embodiment of the present application provides a method for voice recognition of contacts, comprising: performing speech recognition on a received voice query, and extracting from the recognition result the phoneme sequence corresponding to a target identifier that identifies the queried target contact; and matching the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in a preset contact set, and determining, according to the matching result, the target contact queried by the voice query from the preset contact set.

In some embodiments, the method further includes: determining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

In some embodiments, determining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set includes: according to the correspondence between single characters and phonemes in a character library, decomposing the contact identifier of each preset contact in the preset contact set into phonemes character by character, to obtain the phoneme combination corresponding to the contact identifier of each preset contact in the preset contact set.

In some embodiments, performing speech recognition on the received voice query and extracting from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact includes: decoding the voice query based on an acoustic model to obtain the phoneme sequence corresponding to the voice query; converting the phoneme sequence corresponding to the voice query into a corresponding text recognition result based on a language model; matching the text recognition result against preset instruction templates, and extracting from the text recognition result the instruction text segment that matches a preset instruction template; and removing the phoneme sequence corresponding to the instruction text segment from the phoneme sequence corresponding to the voice query, to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In some embodiments, performing speech recognition on the received voice query and extracting from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact includes: inputting the voice query into a trained person-identifier phoneme extraction model to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In a second aspect, an embodiment of the present application provides an apparatus for voice recognition of contacts, comprising: a recognition unit configured to perform speech recognition on a received voice query and to extract from the recognition result the phoneme sequence corresponding to a target identifier that identifies the queried target contact; and a matching unit configured to match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in a preset contact set, and to determine, according to the matching result, the target contact queried by the voice query from the preset contact set.

In some embodiments, the apparatus further includes: a determination unit configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

In some embodiments, the determination unit is further configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set as follows: according to the correspondence between single characters and phonemes in a character library, decomposing the contact identifier of each preset contact in the preset contact set into phonemes character by character, to obtain the phoneme combination corresponding to the contact identifier of each preset contact in the preset contact set.

In some embodiments, the recognition unit is further configured to perform speech recognition on the received voice query, and to extract from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact, as follows: decoding the voice query based on an acoustic model to obtain the phoneme sequence corresponding to the voice query; converting the phoneme sequence corresponding to the voice query into a corresponding text recognition result based on a language model; matching the text recognition result against preset instruction templates, and extracting from the text recognition result the instruction text segment that matches a preset instruction template; and removing the phoneme sequence corresponding to the instruction text segment from the phoneme sequence corresponding to the voice query, to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In some embodiments, the recognition unit is further configured to perform speech recognition on the received voice query, and to extract from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact, as follows: inputting the voice query into a trained person-identifier phoneme extraction model to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for voice recognition of contacts provided in the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method for voice recognition of contacts provided in the first aspect.

The method and apparatus for voice recognition of contacts of the above embodiments perform speech recognition on a received voice query and extract from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact. They then match that phoneme sequence against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in a preset contact set, and determine, according to the matching result, the target contact queried by the voice query from the preset contact set. This streamlines the process of voice recognition of contacts: it eliminates the steps of converting the phonemes of the voice query into Chinese characters and then converting those characters back into pinyin, and so can improve contact matching efficiency.

Further, because a phoneme is a smaller phonetic unit than a pinyin syllable, a contact recognition method based on phoneme matching is better at distinguishing phoneme sequences and phoneme combinations with similar pronunciations, so the method for voice recognition of contacts of the above embodiments can also improve recognition accuracy.

Brief description of the drawings

Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Fig. 1 is a diagram of an exemplary system architecture to which embodiments of the present application may be applied;

Fig. 2 is a flowchart of one embodiment of the method for voice recognition of contacts according to the present application;

Fig. 3 is a flowchart of another embodiment of the method for voice recognition of contacts according to the present application;

Fig. 4 is a structural schematic diagram of one embodiment of the apparatus for voice recognition of contacts of the present application;

Fig. 5 is a structural schematic diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

Detailed description

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which the method or the apparatus for voice recognition of contacts of the present application may be applied.

As shown in Fig. 1, the system architecture 100 may include terminal devices 101 and 102, a network, and a server 103. The network serves as the medium providing communication links between the terminal devices 101, 102 and the server 103. The network may include various connection types, such as wired or wireless communication links or fiber-optic cables.

The terminal devices 101, 102 may interact with the server 103 over the network to receive or send messages and the like. Various voice interaction applications may be installed on the terminal devices 101, 102, such as voice assistant applications, information search applications, map applications, social platform applications, and audio and video playback applications.

The terminal devices 101, 102 may be devices with an audio signal sampling function, and may be any of various electronic devices that have a microphone and support internet access, including but not limited to in-vehicle terminals, smart speakers, smartphones, tablet computers, smart watches, notebook computers, laptops, and e-book readers.

The server 103 may be a server that provides audio signal processing, such as a speech recognition server. When the network communication quality is good, the server 103 may decode the audio signal sent by the terminal devices 101, 102 and recognize the text corresponding to the audio signal. The server 103 may feed the recognition result of the voice signal back to the terminal devices 101, 102 over the network.

The terminal devices 101, 102 may also, when the network communication quality is poor or the network is unavailable, parse the collected audio signal of the user 110, determine the user's intent, and respond. For example, when the user utters the audio signal "call XXX", the terminal devices 101, 102 may look up, in the offline address book, the name of the contact the user wishes to reach and place the call. The terminal devices 101, 102 may include components for performing computation (for example, a processor such as a GPU). The method for voice recognition of contacts provided by the embodiments of the present application may be executed by the terminal devices 101, 102; correspondingly, the apparatus for voice recognition of contacts may be provided in the terminal devices 101, 102.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires. Moreover, in the embodiments of the present application, the above system architecture may omit the network and the server.

With continued reference to Fig. 2, a flow 200 of one embodiment of the method for voice recognition of contacts according to the present application is shown. The method for voice recognition of contacts comprises the following steps:

Step 201, perform speech recognition on the received voice query, and extract from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In this embodiment, the execution body of the method for voice recognition of contacts may receive a voice query. The voice query may be generated from a voice query request issued by the user. Specifically, the voice query may be a voice signal generated by encoding the voice query request issued by the user, and may include the speech encoding of the content the request asks to look up.

In practice, the user may issue to the execution body a voice query request asking to look up a target contact, for example the voice request "call Zhang San". The execution body may generate the corresponding voice query from the user's voice request.

The execution body may recognize the received voice query locally. During recognition, it may extract acoustic features of the voice query, such as fundamental-frequency features and Mel-frequency cepstral coefficient features, and parse each speech frame of the voice query based on the extracted acoustic features to obtain the phoneme corresponding to each frame. It then merges consecutive identical phonemes to form the phoneme sequence corresponding to the voice query.
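The merging of consecutive identical per-frame phonemes can be illustrated with a minimal Python sketch. The function name `collapse_frames` and the sample frame labels are hypothetical, and the sketch assumes the acoustic front end has already produced one phoneme label per speech frame; it does not cover feature extraction or handling of silence frames.

```python
from itertools import groupby

def collapse_frames(frame_phonemes):
    """Merge each run of identical consecutive frame labels into one phoneme."""
    return [label for label, _run in groupby(frame_phonemes)]

# Hypothetical per-frame labels for a short utterance.
frames = ["n", "n", "n", "i", "i", "h", "h", "ao", "ao", "ao"]
print(collapse_frames(frames))
```

Note that a phoneme genuinely repeated in the utterance is only preserved if some other label separates the two runs.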

Then, based on the phoneme corresponding to each speech frame, a language model is used to search for the optimal decoding path. During the search, the single characters or words corresponding to the voice query may be output one by one, and for each output character or word it can be judged whether it is one used to identify a contact. Specifically, a key-character library and a keyword library may be built from libraries of common person identifiers (such as names and forms of address); for example, a keyword library containing common surnames may be built. During the search for the optimal path, it can be judged whether the character or word currently decoded is in the key-character library or the keyword library. If so, the currently decoded character or word can be determined to be a character or word in the target identifier that identifies the queried target contact. Optionally, if the currently decoded character or word is in the key-character library or the keyword library, its context may additionally be used to determine whether it is a character or word in the target identifier that identifies the queried target contact. Taking Chinese as an example, if the decoded character is "刘" (Liu), it matches "刘" in the common surname library, and the character is determined to be a character in the target identifier that identifies the target contact. If the decoded character is "司" (Si) and the character after it is "马" (Ma), the surname "司马" (Sima) is matched in the common surname library, and "司" and "马" can likewise be determined to be characters in the target identifier that identifies the queried target contact.
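The surname lookup with one character of look-ahead can be sketched as follows. This is only an illustration under stated assumptions: the `SURNAMES` set is a tiny hypothetical keyword library, `match_surname` is an invented helper name, and a real decoder would run this check on characters as they are emitted rather than on a finished string.

```python
# Hypothetical, deliberately tiny surname keyword library.
SURNAMES = {"刘", "张", "司马"}

def match_surname(chars, start, surnames=SURNAMES, max_len=2):
    """Return the longest surname that begins at chars[start], or None.

    Longer candidates are tried first so that a compound surname such as
    "司马" wins over any single-character entry.
    """
    for length in range(max_len, 0, -1):
        candidate = "".join(chars[start:start + length])
        if len(candidate) == length and candidate in surnames:
            return candidate
    return None

print(match_surname("给司马光打电话", 1))
print(match_surname("给刘备打电话", 1))
```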

During the search for the optimal decoding path, if one or more characters or words are determined to be the target identifier that identifies the target contact, the phoneme sequence corresponding to that target identifier can be extracted from the phoneme sequence corresponding to the voice query, yielding the phoneme sequence corresponding to the target identifier. Here, the target identifier may be a form of address for the target contact, which may be expressed by the target contact's name, job title, social relationship to the local user (for example, kinship terms such as "cousin" or "second uncle"), and so on.

In some optional implementations of this embodiment, the voice query issued by the user may be a combination of a contact identifier and a corresponding operation instruction, such as "make a phone call to XX", where "make a phone call to ..." is the operation instruction and "XX" is the contact identifier of the queried target contact. In this case, the phoneme sequence corresponding to the target identifier that identifies the queried target contact can be extracted by separating the contact identifier from the operation instruction in the voice query. Speech recognition may be performed on the received voice query, and the phoneme sequence corresponding to the target identifier extracted from the recognition result, as follows: first, decode the voice query based on an acoustic model to obtain the phoneme sequence corresponding to the voice query; then convert the phoneme sequence corresponding to the voice query into a corresponding text recognition result based on a language model; next, match the text recognition result against preset instruction templates and extract from the text recognition result the instruction text segment that matches a preset instruction template; finally, remove the phoneme sequence corresponding to the instruction text segment from the phoneme sequence corresponding to the voice query, to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

Specifically, acoustic features of the voice query may be extracted first, and the acoustic features then decoded based on the acoustic model to obtain the phoneme sequence corresponding to the voice query. The phoneme sequence is then decoded based on the language model, searching for the optimal decoding path, to obtain the text recognition result of the voice query. After that, the text recognition result corresponding to the voice query may be matched against the preset instruction templates using fuzzy matching or exact matching. A preset instruction template may be a template of an instruction indicating that a preset operation is to be executed, such as "make a phone call to ...", "send a WeChat message to ...", or "call ...". The text segment that matches a preset instruction template may be extracted from the text recognition result, and the remaining text segments of the text recognition result, other than the segment matching the preset instruction template, taken as the target identifier that identifies the target contact. Finally, the phonemes corresponding to the text segment that matches the preset instruction template may be located in the phoneme sequence corresponding to the voice query and removed, which yields the phoneme sequence corresponding to the target identifier.
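The template-based separation can be illustrated with a short Python sketch. Everything named here is an assumption for illustration: the templates are written as regular expressions with a hypothetical `name` group, `extract_target_phonemes` is an invented helper, and the recognizer is assumed to return an alignment in which each entry pairs exactly one recognized character with its phonemes. Keeping only the phonemes in the `name` slot is equivalent to removing the phonemes of the instruction text segment.

```python
import re

# Hypothetical exact-match instruction templates; "name" marks the contact slot.
TEMPLATES = [r"给(?P<name>.+)打电话", r"呼叫(?P<name>.+)"]

def extract_target_phonemes(aligned, templates=TEMPLATES):
    """aligned: list of (character, [phonemes]) pairs, one character per entry.

    Returns the phonemes of the contact identifier, or None if no
    instruction template matches the recognized text.
    """
    text = "".join(ch for ch, _phones in aligned)
    for pattern in templates:
        match = re.fullmatch(pattern, text)
        if match:
            start, end = match.span("name")
            # Character index equals alignment index (one character per entry).
            return [p for _ch, phones in aligned[start:end] for p in phones]
    return None

aligned = [("给", ["g", "ei"]), ("张", ["zh", "ang"]), ("三", ["s", "an"]),
           ("打", ["d", "a"]), ("电", ["d", "ian"]), ("话", ["h", "ua"])]
print(extract_target_phonemes(aligned))
```

Fuzzy template matching, which the text also allows, is not covered by this sketch.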

In other optional implementations of this embodiment, speech recognition may be performed on the received voice query, and the phoneme sequence corresponding to the target identifier that identifies the queried target contact extracted from the recognition result, as follows: input the voice query into a trained person-identifier phoneme extraction model to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

Specifically, the person-identifier phoneme extraction model may be trained in advance. During training, voice data corresponding to instruction texts that contain person identifiers may be used as sample voice data; the standard pronunciation of the person identifier in each such instruction text is annotated and converted into the corresponding phoneme sequence. During training, the parameters of the person-identifier phoneme extraction model being trained are adjusted so that the model's prediction of the phoneme sequence corresponding to the person identifier contained in the sample voice data converges toward the annotation. Once the person-identifier phoneme extraction model has been trained on a large amount of sample voice data, the received voice query can be input into the trained model to extract the phoneme sequence corresponding to the person identifier in it. This yields, from the recognition of the voice query, the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

Step 202, match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in the preset contact set, and determine, according to the matching result, the target contact queried by the voice query from the preset contact set.

In this embodiment, the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set may be obtained, where a contact identifier identifies a preset contact. The phoneme sequence corresponding to the target identifier extracted in step 201 is then matched against those phoneme combinations. The preset contact set may be a set of local contacts, for example the set of all contacts in the address book stored by the execution body. The contact identifier of a preset contact may be a form of address for that contact, such as a name or a job title. Various string matching methods may be used to match the phoneme sequence of the target identifier against the phoneme combinations of the contact identifiers, for example by computing a similarity or a dissimilarity. Specifically, the distance between the two strings may be computed using, for example, the edit distance (Levenshtein distance), and a matching score determined from the computed distance. The smaller the distance, the more similar the two strings being matched, and the higher the matching score.
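A minimal Python sketch of edit-distance matching on phoneme sequences follows. The distance is the standard Levenshtein distance computed over phoneme symbols rather than letters; the normalization in `match_score` is an assumption made for illustration, since the text does not specify how the distance is mapped to a score.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (lists of phoneme symbols)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def match_score(a, b):
    """Assumed normalization: 1.0 for identical sequences, lower as distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

query = ["zh", "ang", "s", "an"]
print(levenshtein(query, ["zh", "ang", "sh", "an"]))
print(match_score(query, ["zh", "ang", "sh", "an"]))
```

Treating each phoneme as one symbol means that confusing "s" with "sh" costs a single substitution, whereas a letter-level comparison of the pinyin strings would count an insertion.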

The phoneme combination corresponding to the contact identifier of each local preset contact may be scored for its match against the phoneme sequence corresponding to the target identifier, and the preset contact with the highest matching score then determined to be the target contact identified by the target identifier. In this way, the target contact queried by the user is identified from the user's voice query.

Optionally, at least one matching result may be provided in order of matching score; that is, at least one candidate for the target contact is determined, and the matching results are sorted by matching score.
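Returning several candidates in score order can be sketched as below. The helper `top_matches` and the placeholder scorer `position_overlap` are hypothetical; the scorer is a crude stand-in so the example is self-contained, and any matching-score function (for instance one based on edit distance) could be passed in its place.

```python
import heapq

def position_overlap(a, b):
    """Placeholder score: fraction of positions holding the same phoneme."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)

def top_matches(target, contacts, score_fn=position_overlap, k=3):
    """Return up to k (score, contact_name) pairs, highest score first.

    contacts maps a contact name to its phoneme combination.
    """
    scored = ((score_fn(target, phones), name) for name, phones in contacts.items())
    return heapq.nlargest(k, scored)

contacts = {
    "zhangsan": ["zh", "ang", "s", "an"],
    "zhangshan": ["zh", "ang", "sh", "an"],
    "lisi": ["l", "i", "s", "i"],
}
print(top_matches(["zh", "ang", "s", "an"], contacts, k=2))
```

Taking the first element of the returned list corresponds to selecting the single highest-scoring contact.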

It should be noted that the contact identifier of each preset contact corresponds to at least one phoneme combination. In some scenarios, the contact identifier of a preset contact contains polyphonic characters, for example when a preset contact name contains polyphonic characters such as "都", "乐", or "重". In that case the contact identifier may correspond to as many phoneme combinations as there are combinations of readings of the polyphonic characters it contains.
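Enumerating one phoneme combination per combination of readings can be sketched with a Cartesian product. The `READINGS` table is a hypothetical, two-entry stand-in for a real pronunciation lexicon, and the toneless initial/final symbols are an assumed phoneme inventory.

```python
from itertools import product

# Hypothetical lexicon: each character maps to all of its readings.
READINGS = {
    "乐": [["l", "e"], ["y", "ue"]],  # polyphonic: le / yue
    "天": [["t", "ian"]],
}

def phoneme_combinations(name, readings=READINGS):
    """Return every phoneme combination for name, one per choice of readings."""
    per_char = [readings[ch] for ch in name]
    return [[p for reading in choice for p in reading]
            for choice in product(*per_char)]

for combo in phoneme_combinations("乐天"):
    print(combo)
```

A name with two polyphonic characters of two readings each would yield four combinations, so a contact can be matched under any of its plausible pronunciations.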

The method for voice recognition of contacts of the above embodiment performs speech recognition on the received voice query and extracts from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact. It then matches that phoneme sequence against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in the preset contact set, and determines, according to the matching result, the target contact queried by the voice query from the preset contact set. This streamlines the process of offline voice recognition of contacts: it eliminates the steps of converting the phonemes of the voice query into Chinese characters and then converting those characters back into pinyin, and so can improve contact matching efficiency.

With continued reference to Fig. 3, a flowchart of another embodiment of the method for voice recognition of contacts according to the present application is shown. As shown in Fig. 3, the flow 300 of the method for voice recognition of contacts of this embodiment comprises the following steps:

Step 301, perform speech recognition on the received voice query, and extract from the recognition result the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In this embodiment, the execution body of the method for voice recognition of contacts may receive the voice query generated from the user's request to look up a contact by voice, and may then recognize the voice query. Specifically, acoustic features of the voice query may be extracted, and an acoustic model then used to convert the voice query into the corresponding phoneme sequence. A language model may then be used to decode the phoneme sequence and output, in order, the characters or words in the voice query. While the language model decodes the phoneme sequence, each currently decoded character or word can be checked for whether it is a character in a preset key-character library, or a word in a preset keyword library, built from the characters or words contained in the contact identifiers of a library of common contact identifiers. If so, the phonemes corresponding to the currently decoded character or word can be extracted, and the phoneme sequence corresponding to the target identifier that identifies the target contact thereby obtained.

In some optional implementations of the present embodiment, acoustic model can be primarily based on, speech polling formula is carried out Decoding, obtains the corresponding aligned phoneme sequence of speech polling formula；Language model is then based on by the corresponding aligned phoneme sequence of speech polling formula It is converted into corresponding text identification result；Text identification result is matched with preset instruction template later, is known from text It is extracted in other result and the matched instruction text section of preset instruction template；It is picked from the corresponding aligned phoneme sequence of speech polling formula Except aligned phoneme sequence corresponding with instruction text section, the corresponding sound of target identification for identifying inquired object contact person is obtained Prime sequences.Wherein, preset instruction template can be instruction execute predetermined registration operation instruction template, such as " phoning ", " is given ... to send out wechat ", " calling " etc..

In other optional implementations of this embodiment, the voice query may be input into a trained person-identifier phoneme extraction model to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact. Here, the trained person-identifier phoneme extraction model may be used to extract, from the input voice data, the phoneme sequence corresponding to a person identifier.

Step 302: determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

In this embodiment, the preset contact set may be obtained, and the contact identifiers of the preset contacts in the preset contact set may then be converted into corresponding phoneme combinations. The preset contact set may be the set of contacts in the address book of the user who issued the voice query request, and may be obtained from the address book stored locally on the execution body. The contact identifier of a preset contact may be a form of address for that contact, such as a name, a job title, or a social-relationship term.

Specifically, the pinyin or pronunciation corresponding to the contact identifier of each preset contact may be annotated according to a Chinese pinyin dictionary or a pronunciation dictionary of another language. The annotated pinyin, or the pronunciation in the other language, is then decomposed into phonemes according to the corresponding language, yielding the phoneme combination corresponding to the contact identifier of the preset contact.
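As a rough illustration of the phoneme decomposition of annotated pinyin, the sketch below splits each toneless pinyin syllable into an initial and a final. This phoneme inventory is an assumption made for the example; the application does not specify which phoneme set is used, and the names `split_syllable` and `to_phoneme_combination` are hypothetical.

```python
from typing import List

# Pinyin initials, with two-letter initials listed first so that "zh" is
# tried before "z". "y" and "w" are treated as initials here for simplicity.
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syllable: str) -> List[str]:
    """Split one toneless pinyin syllable into [initial, final]."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            return [initial, final] if final else [initial]
    return [syllable]  # zero-initial syllable such as "an" or "er"

def to_phoneme_combination(pinyin_syllables: List[str]) -> List[str]:
    """Concatenate the phonemes of each syllable in order."""
    phonemes: List[str] = []
    for syllable in pinyin_syllables:
        phonemes.extend(split_syllable(syllable))
    return phonemes
```

Under these assumptions, the annotated pinyin `["zhang", "san"]` becomes the phoneme combination `["zh", "ang", "s", "an"]`.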

In some optional implementations of this embodiment, the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set may be determined by the following step 3021:

Step 3021: according to the correspondence between characters and phonemes in a character library, perform phoneme decomposition on the contact identifiers of the preset contacts in the preset contact set according to the characters they contain, obtaining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

Specifically, in step 3021, each character or word in the contact identifier of a preset contact may be converted into its corresponding phonemes according to a pre-built mapping between phonemes and the characters in a basic character library, and the phonemes may then be combined in order to form the phoneme combination corresponding to the contact identifier of the preset contact. In this way, the phoneme combination corresponding to the contact identifier of a preset contact can be determined quickly, directly from the mapping between phonemes and characters.
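Step 3021 amounts to a table lookup followed by ordered concatenation, which might be sketched as below. The tiny `CHAR_TO_PHONEMES` table and both function names are hypothetical stand-ins for the pre-built character library; a real table would cover the full basic character library.

```python
from typing import Dict, Iterable, List

# Hypothetical excerpt of a pre-built character-to-phoneme table.
CHAR_TO_PHONEMES: Dict[str, List[str]] = {
    "张": ["zh", "ang"],
    "三": ["s", "an"],
    "李": ["l", "i"],
    "四": ["s", "i"],
}

def contact_phonemes(identifier: str, table: Dict[str, List[str]]) -> List[str]:
    """Look up each character in order and concatenate its phonemes."""
    phonemes: List[str] = []
    for char in identifier:
        if char not in table:
            # How to handle characters missing from the table is not
            # specified in the text; this sketch simply reports them.
            raise KeyError(f"no phoneme entry for character {char!r}")
        phonemes.extend(table[char])
    return phonemes

def build_phoneme_index(
    identifiers: Iterable[str], table: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Map every contact identifier to its phoneme combination."""
    return {name: contact_phonemes(name, table) for name in identifiers}
```

Because the index is just a dictionary rebuilt from the current address book, it can be regenerated whenever the address book changes.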

Step 303: match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in the preset contact set, and determine, according to the matching result, the target contact queried by the voice query from the preset contact set.

In this embodiment, the phoneme sequence corresponding to the target identifier extracted in step 301 may be matched against the phoneme combinations corresponding to the contact identifiers of the preset contacts determined in step 302. Various string matching methods may be used to match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set; for example, a similarity or a dissimilarity may be calculated. As an example, the edit distance (Levenshtein distance) between the two strings may be calculated, and a matching score may be determined from the calculated distance. The smaller the distance, the more similar the two strings being matched and the higher the matching score.
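The edit-distance example can be sketched as follows, treating each phoneme as one symbol. The Levenshtein computation is the standard dynamic-programming algorithm; normalizing the distance into a score between 0 and 1 is an assumption of this sketch, since the text only states that a smaller distance gives a higher score. The function names are illustrative.

```python
from typing import Dict, List, Sequence

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        previous = current
    return previous[-1]

def match_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the distance to [0, 1]; 1.0 means identical (assumed scoring)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def best_contact(target: Sequence[str], index: Dict[str, List[str]]) -> str:
    """Return the contact whose phoneme combination scores highest."""
    return max(index, key=lambda name: match_score(target, index[name]))
```

A practical implementation would probably also apply a minimum score threshold so that a query matching no contact well is rejected rather than resolved to the least bad candidate; the text does not specify such a threshold.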

Steps 301 and 303 of the method flow of this embodiment are consistent with steps 201 and 202 of the foregoing embodiment, respectively. For the specific implementation of steps 301 and 303, refer to the descriptions of steps 201 and 202, which are not repeated here.

By adding the step of determining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set, the method of this embodiment can construct, quickly and in real time while offline, the phoneme combinations against which the target contact queried by the user is matched. Because a phoneme combination can be determined for each contact in the local address book when these combinations are constructed, the method avoids the effect on matching results of predetermined phoneme combinations that were not updated in time after the address book changed, and can thus further improve the recognition accuracy of offline contact voice recognition.

With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for recognizing a contact by voice. This apparatus embodiment corresponds to the method embodiments shown in Fig. 2 and Fig. 3, and the apparatus may be applied in various electronic devices.

As shown in Fig. 4, the apparatus 400 for recognizing a contact by voice of this embodiment includes a recognition unit 401 and a matching unit 402. The recognition unit 401 may be configured to perform speech recognition on a received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact. The matching unit 402 may be configured to match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in the preset contact set, and to determine, according to the matching result, the target contact queried by the voice query from the preset contact set.
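The division of the apparatus into a recognition unit and a matching unit might be mirrored in software roughly as below. This is a structural sketch only: the class names, the `extract` callable that stands in for the acoustic model and identifier extraction, and the use of `difflib.SequenceMatcher` as the matching method are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List

@dataclass
class RecognitionUnit:
    # Hypothetical callable standing in for speech recognition plus
    # extraction of the target identifier's phoneme sequence.
    extract: Callable[[bytes], List[str]]

    def recognize(self, voice_query: bytes) -> List[str]:
        return self.extract(voice_query)

@dataclass
class MatchingUnit:
    # Phoneme combination of each preset contact, keyed by contact identifier.
    # Assumed non-empty; max() below raises ValueError on an empty mapping.
    contact_phonemes: Dict[str, List[str]]

    def match(self, target: List[str]) -> str:
        def similarity(name: str) -> float:
            return SequenceMatcher(None, target, self.contact_phonemes[name]).ratio()
        return max(self.contact_phonemes, key=similarity)

@dataclass
class ContactRecognitionApparatus:
    recognition_unit: RecognitionUnit
    matching_unit: MatchingUnit

    def handle(self, voice_query: bytes) -> str:
        """Recognize the query, then match it against the preset contacts."""
        return self.matching_unit.match(self.recognition_unit.recognize(voice_query))
```

Keeping the two units behind separate interfaces lets either the recognizer or the matching method be swapped without touching the other.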

In some embodiments, the apparatus 400 may further include a determination unit configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

In some embodiments, the determination unit may be further configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set as follows: according to the correspondence between characters and phonemes in a character library, perform phoneme decomposition on the contact identifiers of the preset contacts in the preset contact set according to the characters they contain, obtaining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

In some embodiments, the recognition unit 401 may be further configured to perform speech recognition on the received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact as follows: decode the voice query based on an acoustic model to obtain the phoneme sequence corresponding to the voice query; convert the phoneme sequence corresponding to the voice query into a corresponding text recognition result based on a language model; match the text recognition result against preset instruction templates and extract, from the text recognition result, the instruction text segment that matches a preset instruction template; and remove the phoneme sequence corresponding to the instruction text segment from the phoneme sequence corresponding to the voice query, obtaining the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

In some embodiments, the recognition unit 401 may be further configured to perform speech recognition on the received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact as follows: input the voice query into a trained person-identifier phoneme extraction model to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

It should be understood that the units recorded in the apparatus 400 correspond to the respective steps of the methods described with reference to Fig. 2 and Fig. 3. The operations and features described above for the methods therefore apply equally to the apparatus 400 and the units it contains, and are not repeated here.

The apparatus 400 for recognizing a contact by voice of the above embodiment of the present application performs contact matching using the phoneme sequence, obtained by recognizing the voice query, that corresponds to the target identifier identifying the queried target contact. This streamlines the process of recognizing a contact by voice: it eliminates the steps of converting the phonemes of the voice query into Chinese characters and then converting those characters back into pinyin, and can thereby improve contact matching efficiency. In addition, because a phoneme is a smaller phonetic unit than a pinyin syllable, an apparatus that matches contacts on phonemes is better able to distinguish phoneme sequences and phoneme combinations with similar pronunciations, so the apparatus can also improve recognition accuracy.

Referring now to Fig. 5, a schematic structural diagram of a computer system 500 of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a recognition unit and a matching unit. The names of these units do not, in some cases, constitute a limitation on the units themselves. For example, the recognition unit may also be described as "a unit that performs speech recognition on a received voice query and extracts, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact".

In another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the apparatus described in the above embodiments, or it may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: perform speech recognition on a received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact; and match the phoneme sequence corresponding to the target identifier against the phoneme combinations corresponding to the contact identifiers that identify the preset contacts in the preset contact set, and determine, according to the matching result, the target contact queried by the voice query from the preset contact set.

The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims

1. A method for recognizing a contact by voice, comprising:

performing speech recognition on a received voice query, and extracting, from the recognition result, a phoneme sequence corresponding to a target identifier that identifies the queried target contact;

matching the phoneme sequence corresponding to the target identifier against phoneme combinations corresponding to contact identifiers that identify preset contacts in a preset contact set, and determining, according to the matching result, the target contact queried by the voice query from the preset contact set.

2. The method according to claim 1, wherein the method further comprises:

determining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

3. The method according to claim 2, wherein the determining the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set comprises:

according to the correspondence between characters and phonemes in a character library, performing phoneme decomposition on the contact identifiers of the preset contacts in the preset contact set according to the characters they contain, to obtain the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

4. The method according to any one of claims 1-3, wherein the performing speech recognition on the received voice query, and extracting, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact comprises:

decoding the voice query based on an acoustic model to obtain a phoneme sequence corresponding to the voice query;

converting the phoneme sequence corresponding to the voice query into a corresponding text recognition result based on a language model;

matching the text recognition result against a preset instruction template, and extracting, from the text recognition result, an instruction text segment that matches the preset instruction template;

removing the phoneme sequence corresponding to the instruction text segment from the phoneme sequence corresponding to the voice query, to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

5. The method according to any one of claims 1-3, wherein the performing speech recognition on the received voice query, and extracting, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact comprises:

inputting the voice query into a trained person-identifier phoneme extraction model, to obtain the phoneme sequence corresponding to the target identifier that identifies the queried target contact.

6. An apparatus for recognizing a contact by voice, comprising:

a recognition unit, configured to perform speech recognition on a received voice query, and extract, from the recognition result, a phoneme sequence corresponding to a target identifier that identifies the queried target contact;

a matching unit, configured to match the phoneme sequence corresponding to the target identifier against phoneme combinations corresponding to contact identifiers that identify preset contacts in a preset contact set, and determine, according to the matching result, the target contact queried by the voice query from the preset contact set.

7. The apparatus according to claim 6, wherein the apparatus further comprises:

a determination unit, configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set.

8. The apparatus according to claim 7, wherein the determination unit is further configured to determine the phoneme combinations corresponding to the contact identifiers of the preset contacts in the preset contact set as follows:

9. The apparatus according to any one of claims 6-8, wherein the recognition unit is further configured to perform speech recognition on the received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact as follows:

10. The apparatus according to any one of claims 6-8, wherein the recognition unit is further configured to perform speech recognition on the received voice query and extract, from the recognition result, the phoneme sequence corresponding to the target identifier that identifies the queried target contact as follows:

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.

12. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.