
CN104766604B - Method and device for labeling voice data - Google Patents

Method and device for labeling voice data

Info

Publication number: CN104766604B
Authority: CN (China)
Prior art keywords: voice data, information, characteristic information, extracted, preset
Legal status: Active (granted)
Application number: CN201510154477.XA
Other languages: Chinese (zh)
Other versions: CN104766604A
Inventor: 王彦文
Current Assignee: Nubia Technology Co Ltd
Original Assignee: Nubia Technology Co Ltd

Events
  • Application filed by Nubia Technology Co Ltd
  • Priority to CN201510154477.XA
  • Publication of CN104766604A
  • Application granted
  • Publication of CN104766604B


Landscapes

  • Telephone Function (AREA)

Abstract

The invention discloses a method for labeling voice data, comprising the steps of: after voice recording is started, monitoring the recorded voice data; extracting characteristic information from the monitored voice data; and, when the extracted characteristic information meets a marking condition, setting marking information for the voice data corresponding to the extracted characteristic information. The invention also discloses a device for labeling voice data. The invention reduces the marking operations required and improves the efficiency of labeling voice data during recording.

Description

Method and device for labeling voice data
Technical field
The present invention relates to the technical field of voice data processing, and in particular to a method and device for labeling voice data.
Background technique
As terminals become more versatile and intelligent, they carry more and more functions to meet the demands of different users — for example, voice calls and sound recording. To make wanted content easy to find later, a user can add marks (marking information) to a recording while it is in progress so that parts of the recording can be distinguished. The current ways of adding such marks are very limited: when marking a recording, the user has to light up the terminal screen quickly and may even have to unlock it, or has to add the mark on the lock-screen interface, which still requires lighting the screen manually.
In summary, the existing procedure for marking recordings is cumbersome, which makes the marking of recordings inefficient.
Summary of the invention
A primary object of the present invention is to provide a method and device for labeling voice data, aiming to solve the problem that the existing procedure for marking recordings is cumbersome and therefore inefficient.
To achieve the above object, the present invention provides a method for labeling voice data, comprising the steps of:
after voice recording is started, monitoring the recorded voice data;
extracting characteristic information from the monitored voice data;
when the extracted characteristic information meets a marking condition, setting marking information for the voice data corresponding to the extracted characteristic information.
Preferably, the characteristic information includes at least one of voiceprint information, speech-rate information, loudness information, or tone information.
Preferably, the marking condition includes one of the following cases:
the extracted characteristic information is consistent with preset characteristic information;
or, the change in the extracted characteristic information exceeds a specific threshold.
Preferably, the step of setting marking information for the voice data corresponding to the extracted characteristic information, when the extracted characteristic information meets the marking condition, includes:
determining the voice data that meets the marking condition;
obtaining preset marking information;
adding the preset marking information to the voice data that meets the marking condition.
Preferably, the preset marking information corresponds to the characteristic information.
In addition, to achieve the above object, the present invention also provides a device for labeling voice data, comprising:
a monitoring module, configured to monitor the recorded voice data after voice recording is started;
an extraction module, configured to extract characteristic information from the monitored voice data;
a setup module, configured to set marking information for the voice data corresponding to the extracted characteristic information when the extracted characteristic information meets a marking condition.
Preferably, the characteristic information includes at least one of voiceprint information, speech-rate information, loudness information, or tone information.
Preferably, the marking condition includes one of the following cases:
the extracted characteristic information is consistent with preset characteristic information;
or, the change in the extracted characteristic information exceeds a specific threshold.
Preferably, the setup module includes a determination unit, an acquiring unit, and a setting unit, wherein:
the determination unit is configured to determine the voice data that meets the marking condition;
the acquiring unit is configured to obtain preset marking information;
the setting unit is configured to add the preset marking information to the voice data that meets the marking condition.
Preferably, the preset marking information corresponds to the characteristic information.
In the labeling scheme for voice data proposed by the present invention, characteristic information is extracted from the monitored voice data and it is judged whether the extracted characteristic information meets a marking condition; when the marking condition is met, the voice data that meets the condition is marked automatically. This reduces the marking operations and improves the efficiency of labeling voice during recording.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware configuration of a mobile terminal for implementing the embodiments of the present invention;
Fig. 2 is a schematic diagram of a wireless communication system for the mobile terminal shown in Fig. 1;
Fig. 3 is a schematic flowchart of a preferred embodiment of the method for labeling voice data according to the present invention;
Fig. 4 is a schematic flowchart of an embodiment of judging whether the extracted characteristic information meets the marking condition according to the present invention;
Fig. 5 is a schematic flowchart of an embodiment of judging whether the extracted characteristic information is consistent with the preset characteristic information according to the present invention;
Fig. 6 is a schematic flowchart of another embodiment of judging whether the extracted characteristic information is consistent with the preset characteristic information according to the present invention;
Fig. 7 is a functional block diagram of a preferred embodiment of the device for labeling voice data according to the present invention;
Fig. 8 is a refined functional block diagram of an embodiment of the setup module in Fig. 7.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
The mobile terminal of each embodiment of the present invention will now be described with reference to the drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the explanation of the present invention and have no specific meaning in themselves; therefore, "module" and "component" may be used interchangeably.
Mobile terminals can be implemented in a variety of forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following it is assumed that the terminal is a mobile terminal; however, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to terminals of the fixed type.
Fig. 1 is a schematic diagram of the hardware configuration of a mobile terminal for implementing the embodiments of the present invention.
Mobile terminal 100 may include wireless communication unit 110, A/V (audio/video) input unit 120, user's input Unit 130, sensing unit 140, output unit 150, memory 160, interface unit 170, controller 180 and power supply unit 190 Etc..Fig. 1 shows the mobile terminal with various assemblies, it should be understood that being not required for implementing all groups shown Part.More or fewer components can alternatively be implemented.The element of mobile terminal will be discussed in more detail below.
Wireless communication unit 110 generally includes one or more components, allows mobile terminal 100 and wireless communication system Or the radio communication between network.For example, wireless communication unit may include broadcasting reception module 111, mobile communication module 112, at least one of wireless Internet module 113, short range communication module 114 and location information module 115.
Broadcasting reception module 111 receives broadcast singal and/or broadcast from external broadcast management server via broadcast channel Relevant information.Broadcast channel may include satellite channel and/or terrestrial channel.Broadcast management server, which can be, to be generated and sent The broadcast singal and/or broadcast related information generated before the server or reception of broadcast singal and/or broadcast related information And send it to the server of terminal.Broadcast singal may include TV broadcast singal, radio signals, data broadcasting Signal etc..Moreover, broadcast singal may further include the broadcast singal combined with TV or radio signals.Broadcast phase Closing information can also provide via mobile communications network, and in this case, broadcast related information can be by mobile communication mould Block 112 receives.Broadcast singal can exist in a variety of manners, for example, it can be with the electronics of digital multimedia broadcasting (DMB) Program guide (EPG), digital video broadcast-handheld (DVB-H) electronic service guidebooks (ESG) etc. form and exist.Broadcast Receiving module 111 can receive signal broadcast by using various types of broadcast systems.Particularly, broadcasting reception module 111 It can be wide by using such as multimedia broadcasting-ground (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video It broadcasts-holds (DVB-H), forward link media (MediaFLO@) Radio Data System, received terrestrial digital broadcasting integrated service (ISDB-T) etc. digit broadcasting system receives digital broadcasting.Broadcasting reception module 111, which may be constructed such that, to be adapted to provide for extensively Broadcast the various broadcast systems and above-mentioned digit broadcasting system of signal.Via the received broadcast singal of broadcasting reception module 111 and/ Or broadcast related information can store in memory 160 (or other types of storage medium).
Mobile communication module 112 sends radio signals to base station (for example, access point, node B etc.), exterior terminal And at least one of server and/or receive from it radio signal.Such radio signal may include that voice is logical Talk about signal, video calling signal or according to text and/or Multimedia Message transmission and/or received various types of data.
The Wi-Fi (Wireless Internet Access) of the support mobile terminal of wireless Internet module 113.The module can be internally or externally It is couple to terminal.Wi-Fi (Wireless Internet Access) technology involved in the module may include WLAN (Wireless LAN) (Wi-Fi), Wibro (WiMAX), Wimax (worldwide interoperability for microwave accesses), HSDPA (high-speed downlink packet access) etc..
Short range communication module 114 is the module for supporting short range communication.Some examples of short-range communication technology include indigo plant ToothTM, radio frequency identification (RFID), Infrared Data Association (IrDA), ultra wide band (UWB), purple honeybeeTMEtc..
Location information module 115 is the module for checking or obtaining the location information of mobile terminal.Location information module Typical case be GPS (global positioning system).According to current technology, GPS module 115, which calculates, comes from three or more satellites Range information and correct time information and the Information application triangulation for calculating, thus according to longitude, latitude Highly accurately calculate three-dimensional current location information.Currently, it is defended for the method for calculating position and temporal information using three Star and the error that calculated position and temporal information are corrected by using an other satellite.In addition, GPS module 115 It can be by Continuous plus current location information in real time come calculating speed information.
A/V input unit 120 is for receiving audio or video signal.A/V input unit 120 may include 121 He of camera Microphone 1220, camera 121 is to the static map obtained in video acquisition mode or image capture mode by image capture apparatus The image data of piece or video is handled.Treated, and picture frame may be displayed on display unit 151.At camera 121 Picture frame after reason can store in memory 160 (or other storage mediums) or carry out via wireless communication unit 110 It sends, two or more cameras 1210 can be provided according to the construction of mobile terminal.Microphone 122 can be in telephone relation mould Sound (audio data) is received via microphone in formula, logging mode, speech recognition mode etc. operational mode, and can be incited somebody to action Such acoustic processing is audio data.Audio that treated (voice) data can be converted in the case where telephone calling model For the format output that can be sent to mobile communication base station via mobile communication module 112.Various types can be implemented in microphone 122 Noise eliminate (or inhibit) algorithm with eliminate noise that (or inhibition) generates during sending and receiving audio signal or Person's interference.
The order that user input unit 130 can be inputted according to user generates key input data to control each of mobile terminal Kind operation.User input unit 130 allows user to input various types of information, and may include keyboard, metal dome, touch Plate (for example, the sensitive component of detection due to the variation of resistance, pressure, capacitor etc. caused by being contacted), idler wheel, rocking bar etc. Deng.Particularly, when touch tablet is superimposed upon in the form of layer on display unit 151, touch screen can be formed.
Sensing unit 140 detects the current state of mobile terminal 100, (for example, mobile terminal 100 opens or closes shape State), the position of mobile terminal 100, user is for the presence or absence of contact (that is, touch input) of mobile terminal 100, mobile terminal 100 orientation, the acceleration or deceleration movement of mobile terminal 100 and direction etc., and generate for controlling mobile terminal 100 The order of operation or signal.For example, sensing unit 140 can sense when mobile terminal 100 is embodied as sliding-type mobile phone The sliding-type phone is to open or close.In addition, sensing unit 140 be able to detect power supply unit 190 whether provide electric power or Whether person's interface unit 170 couples with external device (ED).Sensing unit 140 may include that proximity sensor 1410 will combine below Touch screen is described this.
Interface unit 170 be used as at least one external device (ED) connect with mobile terminal 100 can by interface.For example, External device (ED) may include wired or wireless headphone port, external power supply (or battery charger) port, wired or nothing Line data port, memory card port, the port for connecting the device with identification module, audio input/output (I/O) end Mouth, video i/o port, ear port etc..Identification module can be storage and use each of mobile terminal 100 for verifying user It plants information and may include subscriber identification module (UIM), client identification module (SIM), Universal Subscriber identification module (USIM) Etc..In addition, the device (hereinafter referred to as " identification device ") with identification module can take the form of smart card, therefore, know Other device can be connect via port or other attachment devices with mobile terminal 100.Interface unit 170, which can be used for receiving, to be come from The input (for example, data information, electric power etc.) of external device (ED) and the input received is transferred in mobile terminal 100 One or more elements can be used for transmitting data between mobile terminal and external device (ED).
In addition, when mobile terminal 100 is connect with external base, interface unit 170 may be used as allowing will be electric by it Power, which is provided from pedestal to the path or may be used as of mobile terminal 100, allows the various command signals inputted from pedestal to pass through it It is transferred to the path of mobile terminal.The various command signals or electric power inputted from pedestal, which may be used as mobile terminal for identification, is The no signal being accurately fitted on pedestal.Output unit 150 is configured to provide with vision, audio and/or tactile manner defeated Signal (for example, audio signal, vision signal, alarm signal, vibration signal etc.) out.Output unit 150 may include display Unit 151, audio output module 152, alarm unit 153 etc..
Display unit 151 may be displayed on the information handled in mobile terminal 100.For example, when mobile terminal 100 is in electricity When talking about call mode, display unit 151 can show and converse or other communicate (for example, text messaging, multimedia file Downloading etc.) relevant user interface (UI) or graphic user interface GUI).When mobile terminal 100 be in video calling mode or When person's image capture mode, display unit 151 can show captured image and/or received image, show video or image And UI or GUI of correlation function etc..
Meanwhile when display unit 151 and touch tablet in the form of layer it is superposed on one another to form touch screen when, display unit 151 may be used as input unit and output device.Display unit 151 may include liquid crystal display (LCD), thin film transistor (TFT) In LCD (TFT-LCD), Organic Light Emitting Diode (OLED) display, flexible display, three-dimensional (3D) display etc. at least It is a kind of.Some in these displays may be constructed such that transparence to allow user to watch from outside, this is properly termed as transparent Display, typical transparent display can be, for example, TOLED (transparent organic light emitting diode) display etc..According to specific Desired embodiment, mobile terminal 100 may include two or more display units (or other display devices), for example, moving Dynamic terminal may include outernal display unit (not shown) and inner display unit (not shown).Touch screen can be used for detecting touch Input pressure and touch input position and touch input area.
Audio output module 152 can mobile terminal be in call signal reception pattern, call mode, logging mode, It is when under the isotypes such as speech recognition mode, broadcast reception mode, wireless communication unit 110 is received or in memory 160 The audio data transducing audio signal of middle storage and to export be sound.Moreover, audio output module 152 can provide and movement The relevant audio output of specific function (for example, call signal receives sound, message sink sound etc.) that terminal 100 executes. Audio output module 152 may include loudspeaker, buzzer etc..
Alarm unit 153 can provide output notifying event to mobile terminal 100.Typical event can be with Including calling reception, message sink, key signals input, touch input etc..Other than audio or video output, alarm unit 153 can provide output in different ways with the generation of notification event.For example, alarm unit 153 can be in the form of vibration Output is provided, when receiving calling, message or some other entrance communications (incoming communication), alarm list Member 153 can provide tactile output (that is, vibration) to notify to user.By providing such tactile output, even if When the mobile phone of user is in the pocket of user, user also can recognize that the generation of various events.Alarm unit 153 The output of the generation of notification event can be provided via display unit 151 or audio output module 152.
Memory 160 can store the software program etc. of the processing and control operation that are executed by controller 180, Huo Zheke Temporarily to store oneself data (for example, telephone directory, message, still image, video etc.) through exporting or will export.And And memory 160 can store about the vibrations of various modes and audio signal exported when touching and being applied to touch screen Data.
Memory 160 may include the storage medium of at least one type, and the storage medium includes flash memory, hard disk, more Media card, card-type memory (for example, SD or DX memory etc.), random access storage device (RAM), static random-access storage Device (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read only memory (PROM), magnetic storage, disk, CD etc..Moreover, mobile terminal 100 can execute memory with by network connection The network storage device of 160 store function cooperates.
The overall operation of the usually control mobile terminal of controller 180.For example, controller 180 executes and voice communication, data Communication, video calling etc. relevant control and processing.In addition, controller 180 may include for reproducing (or playback) more matchmakers The multi-media module 1810 of volume data, multi-media module 1810 can construct in controller 180, or can be structured as and control Device 180 processed separates.Controller 180 can be with execution pattern identifying processing, by the handwriting input executed on the touchscreen or figure Piece draws input and is identified as character or image.
Power supply unit 190 receives external power or internal power under the control of controller 180 and provides operation each member Electric power appropriate needed for part and component.
Various embodiments described herein can be to use the calculating of such as computer software, hardware or any combination thereof Machine readable medium is implemented.Hardware is implemented, embodiment described herein can be by using application-specific IC (ASIC), digital signal processor (DSP), digital signal processing device (DSPD), programmable logic device (PLD), scene can Programming gate array (FPGA), controller, microcontroller, microprocessor, is designed to execute function described herein processor At least one of electronic unit is implemented, and in some cases, such embodiment can be implemented in controller 180. For software implementation, the embodiment of such as process or function can with allow to execute the individual of at least one functions or operations Software module is implemented.Software code can by the software application (or program) write with any programming language appropriate Lai Implement, software code can store in memory 160 and be executed by controller 180.
So far, oneself is through describing mobile terminal according to its function.In the following, for the sake of brevity, will description such as folded form, Slide type mobile terminal in various types of mobile terminals of board-type, oscillating-type, slide type mobile terminal etc., which is used as, to be shown Example.Therefore, the present invention can be applied to any kind of mobile terminal, and be not limited to slide type mobile terminal.
Mobile terminal 100 as shown in Figure 1 may be constructed such that using via frame or grouping send data it is all if any Line and wireless communication system and satellite-based communication system operate.
Referring now to Fig. 2 description communication system that wherein mobile terminal according to the present invention can operate.
Different air interface and/or physical layer can be used in such communication system.For example, used by communication system Air interface includes such as frequency division multiple access (FDMA), time division multiple acess (TDMA), CDMA (CDMA) and universal mobile communications system System (UMTS) (particularly, long term evolution (LTE)), global system for mobile communications (GSM) etc..As non-limiting example, under The description in face is related to cdma communication system, but such introduction is equally applicable to other types of system.
With reference to Fig. 2, cdma wireless communication system may include multiple mobile terminals 100, multiple base stations (BS) 270, base station Controller (BSC) 275 and mobile switching centre (MSC) 2800MSC280 are configured to and Public Switched Telephony Network (PSTN) 290 form interface.MSC280 is also structured to form interface with the BSC275 that can be couple to base station 270 via back haul link. Back haul link can be constructed according to any in several known interfaces, and the interface includes such as E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL or xDSL.It will be appreciated that system may include multiple BSC2750 as shown in Figure 2.
Each BS270 can service one or more subregions (or region), by multidirectional antenna or the day of direction specific direction Each subregion of line covering is radially far from BS270.Alternatively, each subregion can be by two or more for diversity reception Antenna covering.Each BS270, which may be constructed such that, supports multiple frequency distribution, and the distribution of each frequency has specific frequency spectrum (for example, 1.25MHz, 5MHz etc.).
What subregion and frequency were distributed, which intersects, can be referred to as CDMA Channel.BS270 can also be referred to as base station transceiver System (BTS) or other equivalent terms.In this case, term " base station " can be used for broadly indicating single BSC275 and at least one BS270.Base station can also be referred to as " cellular station ".Alternatively, each subregion of specific BS270 can be claimed For multiple cellular stations.
As shown in Figure 2, broadcast singal is sent to the mobile terminal operated in system by broadcsting transmitter (BT) 295 100.Broadcasting reception module 111 as shown in Figure 1 is arranged at mobile terminal 100 to receive the broadcast sent by BT295 Signal.In fig. 2 it is shown that several global positioning system (GPS) satellites 300.The help of satellite 300 positions multiple mobile terminals At least one of 100.
In Fig. 2, multiple satellites 300 are depicted, it is understood that, it is useful to can use any number of satellite acquisition Location information.GPS module 115 as shown in Figure 1 is generally configured to cooperate with satellite 300 to obtain desired positioning and believe Breath.It substitutes GPS tracking technique or except GPS tracking technique, the other of the position that can track mobile terminal can be used Technology.In addition, at least one 300 property of can choose of GPS satellite or extraly processing satellite dmb transmission.
As a typical operation of wireless communication system, BS270 receives the reverse link from various mobile terminals 100 Signal.Mobile terminal 100 usually participates in call, information receiving and transmitting and other types of communication.Certain base station 270 is received each anti- It is handled in specific BS270 to link signal.The data of acquisition are forwarded to relevant BSC275.BSC provides call The mobile management function of resource allocation and the coordination including the soft switching process between BS270.The number that BSC275 will also be received According to MSC280 is routed to, the additional route service for forming interface with PSTN290 is provided.Similarly, PSTN290 with MSC280 forms interface, and MSC and BSC275 form interface, and BSC275 controls BS270 correspondingly with by forward link signals It is sent to mobile terminal 100.
Based on the above-mentioned mobile terminal hardware configuration and communication system, the embodiments of the method for labeling voice data according to the present invention are proposed.
As shown in Fig. 3, an embodiment of the invention proposes a method for labeling voice data, comprising the steps of:
Step S10: after voice recording is started, monitor the recorded voice data.
A user may be talking through voice-call software, chatting through instant-messaging software, or having a face-to-face conversation. If the user needs to look up the content of the conversation later, the current conversation needs to be recorded through a terminal — for example, by recording it with a mobile phone or with a dedicated recording device. After voice recording is started and the conversation is being recorded, the recorded voice data is monitored.
Step S20: extract characteristic information from the monitored voice data.
The characteristic information includes, but is not limited to, at least one of voiceprint information, speech-rate information, loudness information, or tone information. After the recorded voice data is monitored, characteristic information is extracted from the monitored voice data, and it is judged whether the extracted characteristic information meets a marking condition. Meeting the marking condition may mean that the extracted characteristic information is consistent with preset characteristic information; the marking condition may also be based on at least one of the user's gender, a keyword, a change of language type, and so on. The preset characteristic information includes, but is not limited to, voiceprint, speech rate, dialect, voice intensity, voice frequency, stress, a special tone, etc. The marking condition is set in advance: a setting instruction for the marking condition is received, and the marking condition corresponding to the setting instruction is set. After the marking condition has been set and the recorded voice data is being monitored, it is judged whether the monitored voice data meets the marking condition. The cases in which the marking condition is met include: 1) the extracted characteristic information is consistent with the preset characteristic information; 2) the change in the extracted characteristic information exceeds a specific threshold.
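To make steps S10–S20 concrete, the following is a minimal sketch (in Python, not part of the patent) of how the two marking conditions just listed might be tested. Exact equality for condition 1 and the per-characteristic history for condition 2 are simplifying assumptions; the patent does not prescribe a particular extraction or comparison algorithm.

```python
# Minimal sketch of the "extract features, then test the marking condition"
# step. Exact equality for condition 1 is a simplification; the 50% default
# threshold for condition 2 follows the example given later in the text.
from typing import Dict, Optional


def meets_marking_condition(features: Dict[str, float],
                            preset: Dict[str, float],
                            previous: Optional[Dict[str, float]],
                            threshold: float = 0.5) -> bool:
    """Return True if either marking condition described above holds."""
    # Condition 1: extracted characteristic information is consistent with
    # the preset characteristic information (all preset items must match).
    if preset and all(features.get(name) == value for name, value in preset.items()):
        return True
    # Condition 2: the change in a characteristic value exceeds the threshold.
    if previous:
        for name, value in features.items():
            old = previous.get(name)
            if old and abs(value - old) / abs(old) > threshold:
                return True
    return False
```

A recording loop would call something like `meets_marking_condition(extract_features(chunk), preset, previous)` for each monitored chunk, where `extract_features` is a hypothetical helper standing in for whatever extraction the terminal actually performs.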
Taking case 1) above as an example: specifically, with reference to Fig. 4, when the marking condition is preset characteristic information, the process of judging whether the extracted characteristic information meets the marking condition may include:
Step S21: judge whether the extracted characteristic information is consistent with the preset characteristic information;
Step S22: when the extracted characteristic information is consistent with the preset characteristic information, judge that the extracted characteristic information meets the marking condition.
The marking condition is set in advance; that is, the characteristic information of the voice data is preset. The preset characteristic information includes at least one of the items listed above — for example, it may be set to a voiceprint, to a speech rate, or to both a voiceprint and a speech rate. When the recorded voice data is monitored, characteristic information is extracted from it — for example, its voiceprint, speech rate, voice frequency, voice intensity, dialect, stress, and/or a special tone — and it is judged whether the extracted characteristic information is consistent with the preset characteristic information. For example, it is judged whether the frequency of the voice in the voice data is consistent with a preset frequency; when it is, the extracted characteristic information is judged to be consistent with the preset characteristic information, and the extracted characteristic information is therefore judged to meet the marking condition. When several characteristic items are preset, the extracted characteristic information is judged to be consistent with the preset characteristic information only when it is consistent with all of the preset items; if even one item is inconsistent, the extracted characteristic information is judged to be inconsistent with the preset characteristic information and therefore not to meet the marking condition.
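The "all preset items must match, one mismatch fails" rule just described can be sketched as follows; the per-characteristic comparators and their tolerances are illustrative assumptions, since the patent only requires the items to be "consistent".

```python
# Sketch of the rule that every preset characteristic must match before the
# marking condition is met. Comparators and tolerances are assumptions.
from typing import Any, Callable, Dict

COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "speech_rate_wpm": lambda got, want: want > 0 and abs(got - want) / want <= 0.10,
    "loudness_db":     lambda got, want: abs(got - want) <= 3.0,
}


def consistent_with_preset(extracted: Dict[str, Any], preset: Dict[str, Any]) -> bool:
    """True only if every preset item matches; a single mismatch is enough to fail."""
    for name, wanted in preset.items():
        compare = COMPARATORS.get(name, lambda got, want: got == want)  # default: exact match
        if name not in extracted or not compare(extracted[name], wanted):
            return False
    return True
```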
Specifically, with reference to Fig. 5, when the preset characteristic information is preset voiceprint information, the process of judging whether the extracted characteristic information is consistent with the preset characteristic information may include:
Step S221: judge whether the extracted characteristic information is consistent with the preset voiceprint information;
Step S222: when the extracted characteristic information is consistent with the preset voiceprint information, judge that the extracted characteristic information is consistent with the preset characteristic information.
A voiceprint is the spectrum of sound waves carrying verbal information, displayed with an electro-acoustic instrument. Modern scientific research shows that a voiceprint is not only specific to a person but also relatively stable: each user's voiceprint is unique and differs from everyone else's, so different users can be distinguished by their voiceprints. The voiceprint information of the user to be marked is set in advance — for example, the voiceprint information of user A. After the recorded voice data is monitored, the voiceprint information of the monitored voice data is extracted; when that voiceprint information is consistent with user A's voiceprint information, the extracted voiceprint information is judged to be consistent with the preset voiceprint information, and the extracted characteristic information is judged to be consistent with the preset characteristic information. The voiceprint information can thus be used to judge whether the speaker in the recorded voice data has changed — for example, whether the speaker has changed from user A to user B.
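The patent does not say how two voiceprints are compared. One common approach, given here purely as an assumption, is to reduce each utterance to a fixed-length speaker embedding and compare embeddings by cosine similarity; the 0.75 threshold is illustrative.

```python
# Hedged sketch: voiceprint consistency via cosine similarity of speaker
# embeddings. The embedding method and the threshold are assumptions.
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_matches(extracted: Sequence[float], preset: Sequence[float],
                       threshold: float = 0.75) -> bool:
    """True if the monitored voiceprint is consistent with the preset one (e.g. user A's)."""
    return cosine_similarity(extracted, preset) >= threshold
```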
Specifically, with reference to Fig. 6, when the preset characteristic information is preset speech-rate information, the process of judging whether the extracted characteristic information is consistent with the preset characteristic information may include:
Step S223: judge whether the extracted characteristic information is consistent with the preset speech-rate information;
Step S224: when the extracted characteristic information is consistent with the preset speech-rate information, judge that the extracted characteristic information is consistent with the preset characteristic information.
The speech-rate information of the user to be marked is set in advance — for example, the speech rate of user A. After the recorded voice data is monitored, the speech-rate information of the monitored voice data is extracted; when the extracted speech-rate information is consistent with user A's speech rate, the extracted speech-rate information is judged to be consistent with the preset speech-rate information, and the extracted characteristic information is therefore judged to be consistent with the preset characteristic information.
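A correspondingly small sketch of the speech-rate comparison, assuming the rate (e.g. words per minute) has already been estimated by an upstream recognizer; the 10% tolerance is an assumption, since the patent only requires the rates to be consistent.

```python
# Sketch of the speech-rate consistency check; the tolerance is an assumption
# and the rate estimate itself comes from an unspecified upstream component.
def speech_rate_matches(extracted_wpm: float, preset_wpm: float,
                        tolerance: float = 0.10) -> bool:
    """True if the monitored speech rate is within tolerance of the preset rate."""
    if preset_wpm <= 0:
        return False
    return abs(extracted_wpm - preset_wpm) / preset_wpm <= tolerance
```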
As for case 2), the specific threshold includes, but is not limited to, a 50% change in speech rate, a 50% change in tone, and so on. For example, if the same user's speech rate speeds up by 50%, marking information is set for the sped-up part; if the same user's tone changes by 50%, marking information is set for the changed part. In other words, when the voiceprint information stays the same but the speech rate differs, the part with the different speech rate is marked.
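Case 2) therefore marks the segment where a characteristic of the same speaker changes by more than a threshold. A sketch of that relative-change test, keeping the last seen value per characteristic; the history dictionary is an assumed data layout.

```python
# Sketch of case 2): mark when a characteristic of the same speaker changes
# by more than a relative threshold (50% in the example above).
from typing import Dict, Optional


def exceeds_change_threshold(name: str, new_value: float,
                             history: Dict[str, float],
                             threshold: float = 0.5) -> bool:
    """Compare the new value with the last seen value, then update the history."""
    old_value: Optional[float] = history.get(name)
    history[name] = new_value
    if not old_value:
        return False
    return abs(new_value - old_value) / abs(old_value) > threshold
```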
In other embodiments of the present invention, whether the extracted characteristic information is consistent with the preset characteristic information may also be judged by stress, by a special tone, or by a change of language type; the specific judgment process is similar to that for voiceprint information and dialect information in the above embodiments and is not repeated here. Likewise, whether the marking condition is met may be judged by a keyword or by the user's gender — for example, when a preset keyword (such as "record" or "extract") appears in the voice data, the voice data that meets the condition is marked. Several characteristic items may also be combined in the judgment. For example, voiceprint and stress may be judged together: to improve accuracy, the extracted characteristic information is judged to be consistent with the preset characteristic information only when both are consistent; alternatively, one match may be set as sufficient — for example, either the voiceprint information or the stress is consistent. It is also possible, once the voice data has been judged by voiceprint to come from the same user, to complete the marking by speech rate or voice intensity — for example, when the same user's speech rate and/or voice intensity changes, the changed voice data is marked. The changed part may also be marked when the language being spoken changes.
Step S30: when the extracted characteristic information meets the marking condition, set marking information for the voice data that meets the marking condition.
When the extracted characteristic information meets the marking condition, the voice data that meets the marking condition is determined — that is, the part that meets the marking condition is determined within the monitored voice data — and marking information is set for the determined voice data. The specific process is: determine the voice data that meets the marking condition; obtain preset marking information; and add the preset marking information to the voice data that meets the marking condition. The marking information includes, but is not limited to, character strings, digits, and so on. For example, the voice data that meets one marking condition is given marking information 1 and the voice data that meets another marking condition is given marking information 2; or the same user's voice is marked 1 and other users' voices are marked 2; or user A is marked 1, user B is marked 2, user C is marked 3, and so on. Because the voice data that meets the marking condition is marked automatically during voice recording, the voice data does not have to be marked with the screen lit: the marking process is automated and no manual marking operation is required.
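As an illustration of step S30 — determine the qualifying segment, obtain the preset marking information, and add it — the sketch below stores marks as (start, end, label) records alongside the recording. That representation is an assumption; the patent only says the marking information is added to the voice data that meets the condition.

```python
# Sketch of step S30: attach preset marking information to the voice data
# that met the marking condition. The (start, end, label) record is an
# assumed representation of a mark within the recording.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mark:
    start_s: float   # segment start within the recording, in seconds
    end_s: float     # segment end, in seconds
    label: str       # the preset marking information, e.g. "1", "A", "record"


@dataclass
class Recording:
    marks: List[Mark] = field(default_factory=list)

    def add_mark(self, start_s: float, end_s: float, label: str) -> None:
        """Add the preset marking information to the segment that met the condition."""
        self.marks.append(Mark(start_s, end_s, label))
```

For example, `rec = Recording(); rec.add_mark(12.0, 18.5, "1")` would mark the segment between 12 s and 18.5 s with the preset marking information "1".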
In the above process of setting marking information, as described in 1) and 2), different marking information may be set for different characteristic information or for different change values. For example, voiceprint information, speech-rate information, and tone information correspond to different kinds of marking information: voiceprint information corresponds to digit marks, speech-rate information corresponds to character-string marks, and tone information corresponds to letter marks. Different values of the same characteristic also correspond to different marking information — for example, different speech rates correspond to different marks, and different voiceprints correspond to different marks.
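The mapping just described — different kinds of characteristic information produce different kinds of marking information — can be kept in a small lookup structure. The concrete forms below (digits for voiceprints, letters for tone, strings for speech rate) follow the examples in the text; how a label is allocated to each distinct speaker or value is an assumption.

```python
# Sketch of the mapping from characteristic type to the form of the marking
# information (digits, letters, strings). Label allocation is an assumption.
from itertools import count
from string import ascii_uppercase


class MarkAllocator:
    """Hand out marking information whose form depends on the characteristic type."""

    def __init__(self) -> None:
        self._digits = count(1)                 # voiceprint -> digit marks: 1, 2, 3, ...
        self._letters = iter(ascii_uppercase)   # tone -> letter marks: A, B, C, ... (26 at most)
        self._known: dict = {}

    def mark_for(self, feature_type: str, value_key: str) -> str:
        """Return the mark for this (type, distinct value) pair, allocating one if new."""
        key = (feature_type, value_key)
        if key not in self._known:
            if feature_type == "voiceprint":
                self._known[key] = str(next(self._digits))
            elif feature_type == "tone":
                self._known[key] = next(self._letters)
            else:  # e.g. speech rate -> character-string marks
                self._known[key] = f"{feature_type}:{value_key}"
        return self._known[key]
```

For instance, `alloc = MarkAllocator(); alloc.mark_for("voiceprint", "speaker-1")` returns "1", and the next distinct voiceprint would get "2".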
To better describe the process of the method for labeling voice data according to the present invention, three different scenarios are described:
Scenario one: user A interviews respondent B; A is a woman and B is a man. The two converse in a question-and-answer fashion, and every time A speaks a mark is added, which makes it easy to locate the questions; whether it is user A who is speaking is determined by voiceprint information or speech-rate information.
Scenario two: user C starts a discussion group of five people that is having a heated discussion on one topic. The recording can be divided according to the different voiceprints, so that different voices are recognized and the same voice is always given the same mark, which makes it convenient to sort the different participants' viewpoints by category.
Scenario three: user D presets "record" as the mark keyword. During recording the keyword is recognized in real time, and as soon as the word "record" is recognized a mark is added immediately.
Beyond the above scenarios, different voices can be recognized and marked in the recording scenarios of other voice data, which facilitates later locating and organizing operations.
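Scenario three keys the mark to a spoken keyword, which in practice needs real-time speech recognition that the patent does not specify. The sketch below therefore assumes a hypothetical `transcribe(audio)` helper and simply scans its output for the preset keyword.

```python
# Sketch of scenario three: mark every chunk in which the preset keyword
# (e.g. "record") is heard. transcribe() is a hypothetical speech-to-text
# helper; the patent does not name a recognition method.
from typing import Callable, Iterable, List, Tuple


def keyword_marks(chunks: Iterable[Tuple[float, float, bytes]],
                  transcribe: Callable[[bytes], str],
                  keyword: str = "record") -> List[Tuple[float, float, str]]:
    """Return (start, end, keyword) marks for chunks whose transcript contains the keyword."""
    marks = []
    for start_s, end_s, audio in chunks:
        if keyword in transcribe(audio):
            marks.append((start_s, end_s, keyword))
    return marks
```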
In this embodiment, characteristic information is extracted from the monitored voice data and it is judged whether the extracted characteristic information meets the marking condition; when the marking condition is met, the voice data that meets the condition is marked automatically. This reduces the marking operations and improves the efficiency of labeling voice during recording.
The present invention further provides a device for labeling voice data.
Referring to Fig. 7, Fig. 7 is a functional block diagram of a preferred embodiment of the device for labeling voice data according to the present invention.
The device for labeling voice data of this embodiment includes: a monitoring module 10, an extraction module 20, and a setup module 30.
The monitoring module 10 is configured to monitor the recorded voice data after voice recording is started.
A user may be talking through voice-call software, chatting through instant-messaging software, or having a face-to-face conversation. If the user needs to look up the content of the conversation later, the current conversation needs to be recorded through a terminal — for example, by recording it with a mobile phone or with a dedicated recording device. After voice recording is started and the conversation is being recorded, the recorded voice data is monitored.
The extraction module 20 is configured to extract characteristic information from the monitored voice data.
The characteristic information includes, but is not limited to, at least one of voiceprint information, speech-rate information, loudness information, or tone information. After the recorded voice data is monitored, characteristic information is extracted from the monitored voice data, and it is judged whether the extracted characteristic information meets a marking condition. Meeting the marking condition may mean that the extracted characteristic information is consistent with preset characteristic information; the marking condition may also be based on at least one of the user's gender, a keyword, a change of language type, and so on. The preset characteristic information includes, but is not limited to, voiceprint, speech rate, dialect, voice intensity, voice frequency, stress, a special tone, etc. The marking condition is set in advance: a setting instruction for the marking condition is received, and the marking condition corresponding to the setting instruction is set. After the marking condition has been set and the recorded voice data is being monitored, it is judged whether the monitored voice data meets the marking condition. The cases in which the marking condition is met include: 1) the extracted characteristic information is consistent with the preset characteristic information; 2) the change in the extracted characteristic information exceeds a specific threshold.
Taking case 1) above as an example: specifically, when the marking condition is preset characteristic information, the process of judging whether the extracted characteristic information meets the marking condition may include: judging whether the extracted characteristic information is consistent with the preset characteristic information; and, when the extracted characteristic information is consistent with the preset characteristic information, judging that the extracted characteristic information meets the marking condition.
The marking condition is set in advance; that is, the characteristic information of the voice data is preset. The preset characteristic information includes at least one of the items listed above — for example, it may be set to a voiceprint, to a speech rate, or to both a voiceprint and a speech rate. When the recorded voice data is monitored, characteristic information is extracted from it — for example, its voiceprint, speech rate, voice frequency, voice intensity, dialect, stress, and/or a special tone — and it is judged whether the extracted characteristic information is consistent with the preset characteristic information. For example, it is judged whether the frequency of the voice in the voice data is consistent with a preset frequency; when it is, the extracted characteristic information is judged to be consistent with the preset characteristic information, and the extracted characteristic information is therefore judged to meet the marking condition. When several characteristic items are preset, the extracted characteristic information is judged to be consistent with the preset characteristic information only when it is consistent with all of the preset items; if even one item is inconsistent, the extracted characteristic information is judged to be inconsistent with the preset characteristic information and therefore not to meet the marking condition.
Specifically, when the preset characteristic information is preset voiceprint information, the process of judging whether the extracted characteristic information is consistent with the preset characteristic information may include: judging whether the extracted characteristic information is consistent with the preset voiceprint information; and, when the extracted characteristic information is consistent with the preset voiceprint information, judging that the extracted characteristic information is consistent with the preset characteristic information.
A voiceprint is the spectrum of sound waves carrying verbal information, displayed with an electro-acoustic instrument. Modern scientific research shows that a voiceprint is not only specific to a person but also relatively stable: each user's voiceprint is unique and differs from everyone else's, so different users can be distinguished by their voiceprints. The voiceprint information of the user to be marked is set in advance — for example, the voiceprint information of user A. After the recorded voice data is monitored, the voiceprint information of the monitored voice data is extracted; when that voiceprint information is consistent with user A's voiceprint information, the extracted voiceprint information is judged to be consistent with the preset voiceprint information, and the extracted characteristic information is judged to be consistent with the preset characteristic information. The voiceprint information can thus be used to judge whether the speaker in the recorded voice data has changed — for example, whether the speaker has changed from user A to user B.
Specifically, when the preset characteristic information is preset speech-rate information, the process of judging whether the extracted characteristic information is consistent with the preset characteristic information may include: judging whether the extracted characteristic information is consistent with the preset speech-rate information; and, when the extracted characteristic information is consistent with the preset speech-rate information, judging that the extracted characteristic information is consistent with the preset characteristic information.
The speech-rate information of the user to be marked is set in advance — for example, the speech rate of user A. After the recorded voice data is monitored, the speech-rate information of the monitored voice data is extracted; when the extracted speech-rate information is consistent with user A's speech rate, the extracted speech-rate information is judged to be consistent with the preset speech-rate information, and the extracted characteristic information is therefore judged to be consistent with the preset characteristic information.
As for case 2), the specific threshold includes, but is not limited to, a 50% change in speech rate, a 50% change in tone, and so on. For example, if the same user's speech rate speeds up by 50%, marking information is set for the sped-up part; if the same user's tone changes by 50%, marking information is set for the changed part. In other words, when the voiceprint information stays the same but the speech rate differs, the part with the different speech rate is marked.
In other embodiments of the present invention, whether the extracted characteristic information is consistent with the preset characteristic information may also be judged by stress, by a special tone, or by a change of language type; the specific judgment process is similar to that for voiceprint information and dialect information in the above embodiments and is not repeated here. Likewise, whether the marking condition is met may be judged by a keyword or by the user's gender — for example, when a preset keyword (such as "record" or "extract") appears in the voice data, the voice data that meets the condition is marked. Several characteristic items may also be combined in the judgment. For example, voiceprint and stress may be judged together: to improve accuracy, the extracted characteristic information is judged to be consistent with the preset characteristic information only when both are consistent; alternatively, one match may be set as sufficient — for example, either the voiceprint information or the stress is consistent. It is also possible, once the voice data has been judged by voiceprint to come from the same user, to complete the marking by speech rate or voice intensity — for example, when the same user's speech rate and/or voice intensity changes, the changed voice data is marked. The changed part may also be marked when the language being spoken changes.
The setup module 30 is configured to set marking information for the voice data that meets the marking condition when the extracted characteristic information meets the marking condition.
Specifically, with reference to Fig. 8, the setup module 30 includes a determination unit 31, an acquiring unit 32, and a setting unit 33.
When the extracted characteristic information meets the marking condition, the voice data that meets the marking condition is determined — that is, the part that meets the marking condition is determined within the monitored voice data — and marking information is set for the determined voice data. Specifically, the determination unit 31 is configured to determine the voice data that meets the marking condition; the acquiring unit 32 is configured to obtain preset marking information; and the setting unit 33 is configured to add the preset marking information to the voice data that meets the marking condition. The marking information includes, but is not limited to, character strings, digits, and so on. For example, the voice data that meets one marking condition is given marking information 1 and the voice data that meets another marking condition is given marking information 2; or the same user's voice is marked 1 and other users' voices are marked 2; or user A is marked 1, user B is marked 2, user C is marked 3, and so on. Because the voice data that meets the marking condition is marked automatically during voice recording, the voice data does not have to be marked with the screen lit: the marking process is automated and no manual marking operation is required.
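Purely as an illustrative sketch of Figs. 7 and 8, the functional blocks could be wired together as below. The class and method names are assumptions made for the example, and the `recording` object is assumed to expose an `add_mark(start, end, label)` method like the one sketched for step S30.

```python
# Illustrative sketch of the device of Figs. 7 and 8: extraction module 20,
# setup module 30 (determination / acquiring / setting units), and the glue
# that the monitoring module 10 would feed with recorded chunks.
class ExtractionModule:
    """Extraction module 20: pulls characteristic information out of an audio chunk."""

    def extract(self, chunk) -> dict:
        raise NotImplementedError  # the patent does not prescribe an extraction algorithm


class SetupModule:
    """Setup module 30 with its determination, acquiring and setting units."""

    def __init__(self, preset_mark: str, condition):
        self._preset_mark = preset_mark   # acquiring unit 32: the preset marking information
        self._condition = condition       # predicate used by the determination unit 31

    def apply(self, recording, segment, features) -> None:
        if self._condition(features):                          # determination unit 31
            recording.add_mark(*segment, self._preset_mark)    # setting unit 33


class LabellingDevice:
    """Wires the modules together; the monitoring module feeds chunks to on_chunk()."""

    def __init__(self, extraction: ExtractionModule, setup: SetupModule):
        self._extraction = extraction
        self._setup = setup

    def on_chunk(self, recording, segment, chunk) -> None:
        features = self._extraction.extract(chunk)
        self._setup.apply(recording, segment, features)
```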
In the above process of setting marking information, as described in 1) and 2), different marking information may be set for different characteristic information or for different change values. For example, voiceprint information, speech-rate information, and tone information correspond to different kinds of marking information: voiceprint information corresponds to digit marks, speech-rate information corresponds to character-string marks, and tone information corresponds to letter marks. Different values of the same characteristic also correspond to different marking information — for example, different speech rates correspond to different marks, and different voiceprints correspond to different marks.
To better describe the process of labeling voice data according to the present invention, three different scenarios are described:
Scenario one: user A interviews respondent B; A is a woman and B is a man. The two converse in a question-and-answer fashion, and every time A speaks a mark is added, which makes it easy to locate the questions; whether it is user A who is speaking is determined by voiceprint information or speech-rate information.
Scenario two: user C starts a discussion group of five people that is having a heated discussion on one topic. The recording can be divided according to the different voiceprints, so that different voices are recognized and the same voice is always given the same mark, which makes it convenient to sort the different participants' viewpoints by category.
Scenario three: user D presets "record" as the mark keyword. During recording the keyword is recognized in real time, and as soon as the word "record" is recognized a mark is added immediately.
Beyond the above scenarios, different voices can be recognized and marked in the recording scenarios of other voice data, which facilitates later locating and organizing operations.
In this embodiment, characteristic information is extracted from the monitored voice data and it is judged whether the extracted characteristic information meets the marking condition; when the marking condition is met, the voice data that meets the condition is marked automatically. This reduces the marking operations and improves the efficiency of labeling voice during recording.
It should be noted that, in this document, the terms "include", "comprise" or its any other variant are intended to non-row His property includes, so that the process, method, article or the device that include a series of elements not only include those elements, and And further include other elements that are not explicitly listed, or further include for this process, method, article or device institute it is intrinsic Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including being somebody's turn to do There is also other identical elements in the process, method of element, article or device.
The serial numbers of the above embodiments of the present invention are merely for description and do not represent the superiority or inferiority of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structural or process transformation made by using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, shall likewise fall within the patent protection scope of the present invention.

Claims (8)

1. A labeling method of voice data, characterized by comprising the steps of:
after voice recording is started, monitoring the recorded voice data;
extracting characteristic information from the monitored voice data;
when the extracted characteristic information meets a flag condition, setting flag information for the voice data corresponding to the extracted characteristic information;
wherein the flag condition comprises:
first, judging that the extracted characteristic information is consistent with preset characteristic information; and
then, judging that a change value of the extracted characteristic information exceeds a specific threshold.
2. The labeling method of voice data according to claim 1, characterized in that the characteristic information comprises at least one of voiceprint information, speech rate information, loudness information or tone information.
3. The labeling method of voice data according to any one of claims 1 to 2, characterized in that the step of, when the extracted characteristic information meets the flag condition, setting flag information for the voice data corresponding to the extracted characteristic information comprises:
determining the voice data that meets the flag condition;
obtaining preset flag information; and
adding the preset flag information to the voice data that meets the flag condition.
4. The labeling method of voice data according to claim 3, characterized in that the preset flag information corresponds to the characteristic information.
5. A labeling apparatus of voice data, characterized by comprising:
a monitoring module, configured to monitor the recorded voice data after voice recording is started;
an extraction module, configured to extract characteristic information from the monitored voice data;
a setting module, configured to set, when the extracted characteristic information meets a flag condition, flag information for the voice data corresponding to the extracted characteristic information;
wherein the flag condition comprises:
first, judging that the extracted characteristic information is consistent with preset characteristic information; and
then, judging that a change value of the extracted characteristic information exceeds a specific threshold.
6. The labeling apparatus of voice data according to claim 5, characterized in that the characteristic information comprises at least one of voiceprint information, speech rate information, loudness information or tone information.
7. The labeling apparatus of voice data according to any one of claims 5 to 6, characterized in that the setting module comprises a determination unit, an acquiring unit and a setting unit,
the determination unit being configured to determine the voice data that meets the flag condition;
the acquiring unit being configured to obtain the preset flag information; and
the setting unit being configured to add the preset flag information to the voice data that meets the flag condition.
8. The labeling apparatus of voice data according to claim 7, characterized in that the preset flag information corresponds to the characteristic information.

Publications (2)

Publication Number Publication Date
CN104766604A CN104766604A (en) 2015-07-08
CN104766604B (en) 2019-01-08
