WO2008035829A1

WO2008035829A1 - Apparatus and method for playback speed altering with preservation of tone signal

Info

Publication number: WO2008035829A1
Application number: PCT/KR2006/003770
Authority: WO
Inventors: Seok Bong Kang; Hwa Jung Jun; Sung Hwan Choi
Original assignee: I-WARE Inc Ltd
Current assignee: I-WARE Inc Ltd
Priority date: 2006-09-22
Filing date: 2006-09-22
Publication date: 2008-03-27
Anticipated expiration: 2009-03-22

Abstract

Provided is an apparatus and method for altering a playback speed while preserving a voice signal. The apparatus includes: a memory for storing compressed audio files; a file buffer for buffering the audio files stored in the memory; a playback speed controller for storing audio file information of the file buffer and generating playback speed information according to a playback speed request from a user; a decoder for restoring the audio files transferred from the file buffer into PCM data; a data buffer for buffering the PCM data from the decoder; a PCM data processor for finding a voiceless period in the PCM data from the data buffer and controlling the playback speed according to the playback speed information; and a CODEC for converting the PCM data from the PCM data processor into an analog audio signal.

Description

APPARATUS AND METHOD FOR PLAYBACK SPEED ALTERING WITH PRESERVATION OF TONE SIGNAL

Technical Field

[1] The present invention relates to an apparatus and method for altering a playback speed while preserving a voice signal and, more particularly, to an apparatus and method for dynamically altering the playback speed of a meaningful voice signal, using voice density variation, without deteriorating the tone color and voice quality of the voice signal.

Background Art

[2] In general, slowing down the playback speed of a speech signal gives the listener's brain more time to grasp the meaning of the reproduced speech after hearing it, which improves speech recognition. However, if the playback speed is simply slowed down for this purpose, the voice signal may be deteriorated, which in turn may degrade speech recognition.

[3] In a conventional method for altering the playback speed of a voice signal, a non-linear filter is used to compensate for the voice signal deterioration caused by the playback speed change. Fig. 1 shows an original sound reproducing block of an MP3 player.

[4] Since the conventional method of altering the playback speed uses a speech non-linear filter, that filter must be designed according to the frequency characteristics of the corresponding speech.

Disclosure of Invention

Technical Problem

[5] It is, therefore, an object of the present invention to provide a playback speed altering apparatus and method that alter the playback speed without deteriorating meaningful voice signals, by dynamically changing the playback speed of non-meaningful voice periods.

Technical Solution

[6] In accordance with one aspect of the present invention, there is provided an apparatus for altering a playback speed while preserving a speech signal, including: a memory for storing compressed audio files; a file buffer for buffering the audio files stored in the memory; a playback speed controller for storing audio file information of the file buffer and generating playback speed information according to a playback speed request from a user; a decoder for restoring the audio files transferred from the file buffer into PCM data; a data buffer for buffering the PCM data from the decoder; a PCM data processor for finding a voiceless period in the PCM data from the data buffer and controlling the playback speed according to the playback speed information; and a CODEC for converting the PCM data from the PCM data processor into an analog audio signal.

[7] The memory may store at least one of MPEG Audio Layer 3 (MP3) files, Windows Media Audio (WMA) files, and Ogg Vorbis (Ogg) files, and the decoder may be at least one of an MP3 decoder, a WMA decoder, and an Ogg decoder for decoding the audio files in the memory.

[8] The playback speed controller may include: a header data analyzer for storing audio file information of the file buffer; a playback speed generator for setting up playback speed information of a currently processed audio file in the file buffer according to a playback speed request from a user; a PCM data controller for controlling the PCM data processor to reproduce the audio file according to the playback speed information; and an operation state controller for controlling the playback speed generator to set up playback speed information according to a sensed playback speed request from a user.

[9] The playback speed controller may include an operation state display for displaying operation state information of an audio file in the file buffer.

[10] The PCM data processor may include: a voice filter for attenuating sound signals other than voice in the PCM data transferred from the data buffer; a sound quantity measuring unit for setting up a reference to find voiced periods and voiceless periods in the PCM data, and generating voiceless period information according to the set reference; a PCM data adder for adding voiceless intervals to the voiceless period according to the playback speed information; and a PCM data attenuator for removing voiceless intervals from the voiceless period according to the playback speed information.

[11] In accordance with another aspect of the present invention, there is provided a method of altering a playback speed while preserving a voice signal in reproducing audio data using audio data reproducing information separated from an audio file, including the steps of: a) setting up a reference sound pressure defining a playback speed variable period of the audio file; b) setting up a period in a predetermined range from the reference sound pressure as a variable period; c) setting up playback speed control information based on a ratio of the variable period and other periods; d) receiving a playback speed from a user; and e) reproducing the audio file using the playback speed control information and the playback speed.

[12] In step a), the reference sound pressure may be set after reproducing a predetermined part of the audio file and attenuating audio signals other than voice.

[13] In step a), the reference sound pressure may be set as the average of the absolute values of sound pressures obtained by sampling a predetermined portion of the audio file at least four times at intervals of about 24/1000 sec.

[14] In the step b), a period having a sound pressure less than 30% of the reference sound pressure may be set as a variable period.

[15] In the step e), the audio file may be reproduced according to an equation:

[16] $$X = \frac{\text{user's input speed} - \text{sound pressure occupying ratio out of variable period}}{\text{sound pressure occupying ratio in variable period}}$$

[17] where X denotes the ratio by which voiceless intervals are added to or removed from a voiceless period to control the playback speed.
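The equation can be read as a duration balance. Write s for the user's input speed, R_out for the sound pressure occupying ratio out of the variable period and R_in for the ratio in the variable period (R_in + R_out = 1). Assuming, as an interpretation rather than something the text states, that X is the factor by which the variable periods are lengthened or shortened, the processed length of a unit period works out to exactly s:

$$R_{\text{out}} + X\,R_{\text{in}} = R_{\text{out}} + \frac{s - R_{\text{out}}}{R_{\text{in}}}\,R_{\text{in}} = s$$

For example, with R_in = 0.3 and R_out = 0.7:

$$s = 1.2:\quad X = \frac{1.2 - 0.7}{0.3} \approx 1.67, \qquad s = 1.0:\quad X = \frac{1.0 - 0.7}{0.3} = 1$$

Under this reading, s behaves as a target duration ratio (s > 1 lengthens the unit period, i.e. slows playback), only the variable periods change length, and X stays non-negative only while s ≥ R_out.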

Advantageous Effects

[18] The apparatus and method for altering a playback speed while preserving a voice signal according to the present invention control the playback speed by dynamically changing the playback speed of less meaningful voice periods. Therefore, the playback speed can be controlled without deteriorating meaningful voice and without a voice non-linear filter to compensate for voice deterioration.

Brief Description of the Drawings

[19] The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:

[20] Fig. 1 shows an original sound reproducing block of an MP3 player;

[21] Fig. 2 is a block diagram illustrating an apparatus for altering a playback speed while preserving a voice signal without deteriorating the speech signal according to an exemplary embodiment of the present invention;

[22] Fig. 3 is a block diagram illustrating a playback speed controller of the playback speed altering apparatus of Fig. 2;

[23] Fig. 4 is a block diagram illustrating a PCM data processor of the apparatus of Fig. 2 for controlling a playback speed while preserving a tone signal; and

[24] Fig. 5 is a flowchart illustrating a method for altering a playback speed while preserving a voice signal according to an embodiment of the present invention.

Best Mode for Carrying Out the Invention

[25] Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter.

[26] Fig. 2 is a block diagram illustrating an apparatus for altering a playback speed while preserving a voice signal without deteriorating the speech signal according to an exemplary embodiment of the present invention.

[27] As shown in Fig. 2, the playback speed altering apparatus according to the present embodiment includes a memory 110, a file buffer 120, a decoder 130, a data buffer 140, a CODEC 150, a playback speed controller 160, and a PCM data processor 170.

[28] The playback speed altering apparatus according to the present embodiment can be used as a playback-speed-changing functional block of an original sound reproducing device, such as an MP3 player. The playback speed altering apparatus can be embodied as a combination of software and hardware.

[29] The memory 110 stores audio files. The audio files preferably include MPEG Audio Layer 3 (MP3) files, Windows Media Audio (WMA) files, and Ogg Vorbis (Ogg) files.

[30] The decoder 130 decodes the audio files stored in the memory 110 into PCM data.

The playback speed altering apparatus according to the present embodiment may include one or more decoders; preferably, it has a separate decoder for each type of audio file. For example, the decoder 130 may include an MP3 decoder, a WMA decoder, and an Ogg decoder.

[31] The CODEC 150 receives the PCM data from the decoder 130 through the PCM data processor 170 and converts the PCM data into an analog audio signal. The CODEC 150 can be connected to an earphone 180 that converts the analog audio signal into sound.

[32] It is preferable that the file buffer 120 is interposed between the memory 110 and the decoder 130, and the data buffer 140 is interposed between the decoder 130 and the CODEC 150.

[33] The playback speed controller 160 stores information about the file to be reproduced and provides the user with the playback state of the file currently being reproduced. The playback speed controller 160 also creates playback speed information according to the user's request to control the playback speed, and controls and drives the PCM data processor 170.

[34] The PCM data processor 170 is driven under the control of the playback speed controller 160. The PCM data processor 170 attenuates sounds other than voice in the PCM data from the data buffer 140 and finds voiceless periods in that data. The PCM data processor 170 then changes the playback speed of the voiceless periods according to the playback speed information.

[35] In the present embodiment, although the playback speed controller 160 uses the audio file information of the file buffer 120, it does not affect restoration of the compressed file, because the audio file itself is not modified.

[36] When the playback speed is not controlled, the PCM data processor 170 transfers the PCM data stored in the data buffer 140 to the CODEC 150 without processing it. Therefore, the compressed file can be processed as in a conventional MP3 player.

[37] Fig. 3 is a block diagram illustrating a playback speed controller of a playback speed altering apparatus of Fig. 2.

[38] As shown in Fig. 3, the playback speed controller 160 includes a header data analyzer 162, an operation state display 164, a playback speed generator 166, a PCM data controller 168, and an operation state controller 167.

[39] The header data analyzer 162 stores information about the file to be processed. Here, the file information includes the file type, version, sample rate, samples per channel, packed information, required bits, free format, and the like. The file information may further include additional information depending on the file type.
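Several fields listed in [39] (version, sample rate, samples per channel, free format) come straight from an MPEG audio frame header. As an illustration only, and not the header data analyzer 162 described in the patent, the Python sketch below decodes the standard 4-byte MPEG-1 Layer III frame header; its error strings loosely mirror the MP3 error states named in [41]. The names `parse_mp3_header` and `FrameInfo` are hypothetical, and the sketch deliberately rejects MPEG-2/2.5 and other layers.

```python
from dataclasses import dataclass

# MPEG-1 Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 is "free format"; index 15 is forbidden (None).
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
# MPEG-1 sample rates indexed by the 2-bit field; index 3 is reserved (None).
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


@dataclass
class FrameInfo:
    layer: int
    bitrate_kbps: int  # 0 means free format
    sample_rate: int
    padding: bool
    channels: int
    samples_per_frame: int = SAMPLES_PER_FRAME


def parse_mp3_header(header: bytes) -> FrameInfo:
    """Decode the 4-byte header of an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("broken frame: header too short")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:  # 11-bit frame sync must be all ones
        raise ValueError("broken frame: no sync word")
    version_id = (word >> 19) & 0b11  # 0b11 = MPEG-1
    layer_bits = (word >> 17) & 0b11  # 0b01 = Layer III
    if version_id != 0b11 or layer_bits != 0b01:
        raise ValueError("unsupported layer or MPEG version in this sketch")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    if bitrate is None:
        raise ValueError("forbidden bit rate")
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if sample_rate is None:
        raise ValueError("reserved sample rate")
    channel_mode = (word >> 6) & 0b11  # 0b11 = single channel (mono)
    return FrameInfo(
        layer=3,
        bitrate_kbps=bitrate,
        sample_rate=sample_rate,
        padding=bool((word >> 9) & 1),
        channels=1 if channel_mode == 0b11 else 2,
    )
```

For example, the common header bytes `FF FB 90 64` decode to a 128 kbps, 44.1 kHz, two-channel MPEG-1 Layer III frame.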

[40] The operation state display 164 stores information about the operation state of the file currently being processed and displays the file's processing time or error states as text or icons.

[41] Here, the error states can be used as state information by the operation state controller 167. For example, in the case of MP3, the error states may include a broken frame, data overflow, an unsupported layer, a forbidden bit rate, a wrong MPEG build, and the like. In the case of WMA, the error states may include a bad ASF header, a bad packet header, a bad weighting mode, a bad packet, and the like.

[42] The playback speed generator 166 assigns a playback speed to the file currently being reproduced according to the user's playback speed control request.

[43] The PCM data controller 168 controls the PCM data processor 170. That is, the PCM data controller 168 controls the voice filter 172 and the sound quantity measuring unit 174. By controlling the PCM data processor 170, the PCM data controller 168 can also add or remove the voiceless intervals required to control the playback speed while preserving the voice signal, according to the playback speed rate generated by the playback speed generator 166.

[44] The playback speed altering method according to the present embodiment works by adding voiceless intervals to (or removing them from) periods with no voice. The words forming a sentence in an audio file differ in length, so the voiceless period between one word and the next is not constant. Since voiceless periods occur irregularly, it is preferable to add or remove voiceless intervals adaptively according to the voice signal.

[45] The operation state controller 167 drives constitutional elements of the playback speed controller 160 such as the header data analyzer 162, the operation state display 164, the playback speed generator 166, and the PCM data controller 168.

[46] The operation state controller 167 is driven when a timer of the original sound reproducing apparatus generates an interrupt, or when a management program of the original sound reproducing apparatus reads a new file. The management program may be a program that reproduces music, displays MP3 information, performs menu functions, and receives keyboard input.

[47] When the timer generates the interrupt, the operation state controller 167 checks whether the contents of the operation state display 164 have been modified. If they have, the operation state controller 167 informs the operation state display 164 of the modified contents.

[48] If the modified contents are about the change of a playback speed, the operation state controller 167 controls the playback speed generator 166 to generate a playback speed according to the modified playback speed. The operation state controller 167 informs the PCM data controller 168 of the modified contents, thereby driving the PCM data processor 170.

[49] If the contents of the operation state display 164 have not been modified, the operation state controller 167 terminates the related operation and returns control to the process that was running right before the interrupt was generated.
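The interrupt handling in [46] to [49] amounts to a small decision procedure: compare the current state with what is displayed, return immediately if nothing changed, otherwise update the display and, if the speed changed, regenerate it and drive the PCM data controller. The Python sketch below models that flow under simplifying assumptions: the two callbacks are hypothetical stand-ins for the playback speed generator 166 and the PCM data controller 168, and the displayed state is reduced to a speed and an error string.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class OperationState:
    """What the operation state display 164 currently shows (simplified)."""
    playback_speed: float = 1.0
    error: Optional[str] = None


@dataclass
class OperationStateController:
    # Hypothetical hooks standing in for the playback speed generator 166
    # and the PCM data controller 168.
    generate_speed: Callable[[float], float]
    notify_pcm_controller: Callable[[float], None]
    displayed: OperationState = field(default_factory=OperationState)

    def on_timer_interrupt(self, current: OperationState) -> bool:
        """Handle one timer interrupt; return True if anything was propagated."""
        if current == self.displayed:
            # Nothing modified: end here and return to the interrupted process.
            return False
        speed_changed = current.playback_speed != self.displayed.playback_speed
        # Inform the display of the modified contents (store a copy).
        self.displayed = OperationState(current.playback_speed, current.error)
        if speed_changed:
            # Regenerate the playback speed and drive the PCM data processor.
            self.notify_pcm_controller(self.generate_speed(current.playback_speed))
        return True
```

A change that only affects the error state updates the display without touching the PCM path, which matches [48]: only speed changes reach the playback speed generator.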

[50] Fig. 4 is a block diagram illustrating a PCM data processor of the apparatus of Fig. 2 for controlling a playback speed while preserving a tone signal.

[51] As shown in Fig. 4, the PCM data processor 170 includes a voice filter 172, a sound quantity measuring unit 174, a PCM data adder 176, and a PCM data attenuator 177.

[52] The voice filter 172 receives PCM data from the data buffer 140 and attenuates sound signals other than voice. This is because a voiceless period cannot be found if the PCM buffer 178 contains sound signals other than voice.

[53] The PCM data processor 170 according to the present embodiment may include two or more voice filters 172; preferably, separate voice filters dedicated to male and female voices are used. Since voice characteristics differ between men and women, male and female speakers can be distinguished by voice alone. Therefore, attenuation accuracy can be improved by using voice filters 172 dedicated to female and male voices, respectively.
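The text does not say how the voice filter 172 is built. One simple choice, assumed here purely for illustration, is a band-pass filter around the speech band. The sketch below uses a single second-order band-pass biquad with coefficients from the widely used RBJ "Audio EQ Cookbook"; the male and female band edges in `VOICE_BANDS` and the name `voice_filter` are hypothetical, not taken from the patent.

```python
import math

# Hypothetical speech bands in Hz; the source does not specify cutoffs.
VOICE_BANDS = {"male": (85.0, 3000.0), "female": (165.0, 4000.0)}


def voice_filter(samples, sample_rate, band):
    """Band-pass filter PCM samples (RBJ biquad, 0 dB peak gain)."""
    lo, hi = band
    f0 = math.sqrt(lo * hi)  # geometric centre of the band
    q = f0 / (hi - lo)  # Q = centre frequency / bandwidth
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Coefficients normalised by a0.
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A single biquad only rolls off gently outside the band (a 15 kHz tone is reduced to roughly a tenth of its amplitude with the male band at 44.1 kHz); a production filter would likely cascade several sections.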

[54] The sound quantity measuring unit 174 sets a reference to find a voiced period and a voiceless period in the PCM buffer 178, and creates information about a voiceless period according to the set reference.
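A minimal sketch of what the sound quantity measuring unit 174 might do once a reference has been set: measure each frame's mean absolute amplitude and report runs of frames below the reference as voiceless periods. The frame-based measurement and the function names `frame_levels` and `voiceless_periods` are assumptions for illustration.

```python
def frame_levels(samples, frame_len):
    """Mean absolute amplitude of each consecutive frame (last frame may be short)."""
    levels = []
    for i in range(0, len(samples), frame_len):
        chunk = samples[i:i + frame_len]
        levels.append(sum(abs(v) for v in chunk) / len(chunk))
    return levels


def voiceless_periods(levels, reference):
    """Return (start, end) frame index pairs, end exclusive, for runs below reference."""
    runs, start = [], None
    for i, level in enumerate(levels):
        if level < reference:
            if start is None:
                start = i  # a quiet run begins
        elif start is not None:
            runs.append((start, i))  # a quiet run ends at a voiced frame
            start = None
    if start is not None:
        runs.append((start, len(levels)))  # run extends to the end of the buffer
    return runs
```

For example, `voiceless_periods([5, 0.1, 0.2, 5, 5, 0.0], 1.0)` reports two quiet runs, frames 1 to 2 and frame 5.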

[55] The PCM data adder 176 slows down the playback speed while preserving a voice signal. The PCM data adder 176 adds voiceless intervals to voiceless periods, at a predetermined interval assigned by the playback speed generator 166, using the voiceless period information from the sound quantity measuring unit 174.

[56] The PCM data attenuator 177 increases the playback speed while preserving a voice signal. The PCM data attenuator 177 removes voiceless intervals from the voiceless periods reported by the sound quantity measuring unit 174, at a predetermined time interval defined by the playback speed generator 166.
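The adder and attenuator can be pictured as resizing each voiceless run while copying voiced samples untouched. The sketch below is one hypothetical way to do it, with `rescale_voiceless` an illustrative name: lengthening inserts digital silence in the middle of a pause, and shortening drops samples from the middle so the pause edges, where speech fades in and out, are kept. The patent does not specify where within a pause samples are added or removed, and it takes runs as sample indices (a frame-based run converts by multiplying by the frame length).

```python
def rescale_voiceless(samples, runs, x):
    """Resize each voiceless run to about x times its length.

    runs: sorted, non-overlapping (start, end) sample indices, end exclusive.
    x > 1 acts as the PCM data adder, x < 1 as the PCM data attenuator.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    out, pos = [], 0
    for start, end in runs:
        out.extend(samples[pos:start])  # voiced part: copied unchanged
        run = list(samples[start:end])
        target = round(len(run) * x)
        if target >= len(run):
            # Adder: insert silence in the middle of the pause.
            mid = len(run) // 2
            out.extend(run[:mid] + [0] * (target - len(run)) + run[mid:])
        else:
            # Attenuator: keep the pause edges, drop samples from the middle.
            head = target // 2
            out.extend(run[:head] + run[len(run) - (target - head):])
        pos = end
    out.extend(samples[pos:])
    return out
```

With `s = [9, 9, 1, 1, 1, 1, 9]` and one run `(2, 6)`, a factor of 1.5 yields `[9, 9, 1, 1, 0, 0, 1, 1, 9]` and a factor of 0.5 yields `[9, 9, 1, 1, 9]`; the voiced samples (9) are never altered.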

[57] Fig. 5 is a flowchart illustrating a method for altering a playback speed while preserving a voice signal according to an embodiment of the present invention.

[58] Referring to Fig. 5, the method of altering a playback speed while preserving a voice signal according to the present invention includes a playback sound pressure threshold setup step S100, a variable period setup step S200, a playback speed setup step S300, a speed ratio receiving step S400, and a playback step S500.

[59] In the method of altering a playback speed while preserving a voice signal according to an embodiment of the present invention, a compressed digital audio signal is separated into sound components and sound playback data for reproducing the sound components; the sound components and the sound playback data are stored; and the sound components are reproduced using the sound playback data.

[60] At the playback sound pressure threshold setup step S100, the sound quantity measuring unit 174 sets up a playback sound pressure threshold Tp, which serves as the reference for determining the periods in which the sound source playback speed is controlled.

[61] At step S100, the playback speed generator 166 reproduces a sound at a level so low that the user cannot hear it, or reproduces a sound recorded for a very short time (for example, shorter than 30/1000 second), and passes the reproduced sound through the voice filter 172. The voice filter 172 extracts only the voice signal from the reproduced audio signal by attenuating the other signals, and the sound quantity measuring unit 174 uses the extracted voice signal to set up the playback sound pressure threshold Tp, which distinguishes voiced sound from voiceless sound.

[62] For example, the playback sound pressure threshold may be set up using values obtained by sampling the recorded sound at least four times at intervals of 24/1000 sec.

[63] That is, the average of the sampled values can be set up as the playback sound pressure threshold. It is preferable to use the average of the absolute sampled values, because the sampled sound pressures are signed deviations from the reference level and would otherwise partly cancel out.
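Paragraphs [62] and [63] leave the exact sampling scheme open. The sketch below takes one reading literally, as an assumption: pick n ≥ 4 PCM values spaced 24 ms apart from the (voice-filtered) portion and average their absolute values. Another plausible reading averages absolute amplitudes over four 24 ms windows. The function name `playback_threshold` is hypothetical.

```python
def playback_threshold(samples, sample_rate, n=4, period_sec=0.024):
    """Tp as the mean of |sample| taken n times, period_sec apart (one reading of [62]-[63])."""
    if n < 4:
        raise ValueError("the text calls for at least four samplings")
    step = max(1, round(sample_rate * period_sec))  # samples per 24 ms
    if (n - 1) * step >= len(samples):
        raise ValueError("audio portion too short for the requested sampling")
    picks = [samples[k * step] for k in range(n)]
    # Absolute values: PCM samples swing above and below zero, so raw
    # averaging would let positive and negative pressures cancel.
    return sum(abs(v) for v in picks) / n
```

At a 1 kHz sample rate, the picks fall on indices 0, 24, 48, and 72; values 2, -4, 6, -8 there give Tp = 5.0, whereas a signed average would give -1.0.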

[64] To emphasize a particular sound band, the playback sound pressure threshold can instead be set by applying a predetermined weight to a predetermined signal band among the sampled signals and averaging the result.

[65] It will be obvious to those skilled in the art that various methods of defining the playback sound pressure threshold from the statistical characteristics of the sampled values can be used, depending on the playback environment, the characteristics of the audio file to be reproduced, and the user's selection.

[66] At the variable period setting step S200, the playback speed generator 166 sets a predetermined period as an active variable period using the playback sound pressure threshold.

[67] In the present embodiment, a voiceless period is not a period containing absolutely no voice, but a period with comparatively little voice relative to the sound pressure threshold.

[68] That is, in the present embodiment, a period that has little influence on the user's auditory recognition, judged against the sound pressure threshold, is set up as the variable period. Such a variable period is extended or shortened in the present embodiment.

[69] For example, a period whose sound pressure is less than about 30% of the set sound pressure threshold can be set as a variable period. This can be expressed as Eq. 1.

[70] Eq. 1

[71] $$\text{variable period:}\quad \text{sound pressure} < T_p - 0.7\,T_p = 0.3\,T_p$$

[72] A period with a sound pressure lower than the sound pressure threshold could itself be set as a variable period. However, by setting only a period with a sound pressure less than 30% of the sound pressure threshold as a variable period, as shown in Eq. 1, the influence on auditory recognition is reduced further when the period is extended or shortened.

[73] At the playback speed setup step S300, the playback speed generator 166 sets the variable period defined at the variable period setup step S200 as the playback speed setup period. The playback speed generator 166 also sets playback speed control information based on the ratio of the variable period to the other periods.

[74] The playback speed control information is information for controlling a playback speed according to a requested playback speed from a user.

[75] At the speed ratio receiving step S400, the operation state controller 167 receives from the user a speed at which to extend or shorten the sound.

[76] The speed input by the user is information about the desired playback speed of the current audio file. Its format may vary depending on the user interface; for example, a relative speed ratio with respect to the current playback speed can be input.

[77] At the step S500 for reproducing the audio file according to the speed ratio, the PCM data controller 168 reproduces a predetermined audio file based on the sound reproducing information, the playback speed control information, and the speed input by the user.

[78] At step S500, when the playback speed is to be slowed down, voiceless intervals are added, by controlling the PCM data adder 176, to periods whose sound pressure is lower than the threshold by a predetermined amount, according to the playback speed control information. When the playback speed is to be increased, voiceless intervals are removed from such periods, according to the playback speed control information, by controlling the PCM data attenuator 177.

[79] For example, within a variable period, that is, a period whose sound pressure falls below the playback sound pressure threshold range, the sound may be reproduced at the speed ratio input by the user according to Eq. 2.

[80] $$X = \frac{\text{user's input speed} - \text{sound pressure occupying ratio out of variable period}}{\text{sound pressure occupying ratio in variable period}}$$

[81] In Eq. 2, X denotes the ratio by which voiceless intervals are added to or removed from a voiceless period to control the playback speed. Eq. 2 means that, whenever a voiceless period is found, the playback speed can be changed dynamically only in periods with less meaningful voice, by adding or removing voiceless intervals according to the length of the voiceless period multiplied by X.
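Eq. 2 itself is a one-line computation. The sketch below evaluates it under the assumption that the user's input speed acts as a target duration ratio for the buffer (so s = 1 leaves pauses unchanged). The clamp at zero and the error for a buffer with no variable period are additions of this sketch, not rules from the text, and `stretch_ratio` is a hypothetical name.

```python
def stretch_ratio(user_speed, ratio_in, ratio_out=None):
    """X = (s - R_out) / R_in for one buffer (s read as a target duration ratio).

    ratio_in / ratio_out: fractions of the buffer inside / outside variable periods.
    """
    if ratio_out is None:
        ratio_out = 1.0 - ratio_in  # the two ratios cover the whole buffer
    if ratio_in <= 0:
        raise ValueError("no variable period in this buffer; speed cannot be changed here")
    x = (user_speed - ratio_out) / ratio_in
    # Clamp: a pause cannot be shortened below zero length (sketch's own rule).
    return max(x, 0.0)
```

With R_in = 0.3, a request of s = 1.2 gives X of about 1.67, so each pause in that buffer would be stretched to roughly 1.67 times its length while voiced audio is left alone; s = 1.0 gives X = 1 (no change).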

[82] The sound pressure occupying ratio in the variable period denotes the proportion of variable periods among all periods within one unit period, that is, one buffer's worth of audio that is stored and processed at a time during audio signal processing.

[83] That is, if a period having a sound pressure less than 30% of the sound pressure threshold is set as a variable period, the sound pressure occupying ratio is the proportion, among all periods, of variable periods whose sound pressure is less than 30% of the sound pressure threshold.

[84] The sound pressure occupying ratio out of the variable period is the proportion of periods that are not variable periods among all periods.
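The two occupying ratios in [82] to [84] can be sketched by counting, within one buffer, the frames whose level falls below 30% of the threshold (the rule of Eq. 1). Measuring the ratios over equal-length frames, i.e. by duration, and the name `occupying_ratios` are assumptions of this sketch.

```python
def occupying_ratios(levels, threshold, fraction=0.3):
    """Return (ratio_in, ratio_out) for one buffer of per-frame sound levels.

    A frame counts as variable period when its level is below
    fraction * threshold (Eq. 1 with the 30 % rule).
    """
    if not levels:
        raise ValueError("empty buffer")
    limit = fraction * threshold
    n_in = sum(1 for level in levels if level < limit)
    ratio_in = n_in / len(levels)
    return ratio_in, 1.0 - ratio_in
```

For ten frames of which three sit below 0.3 × Tp, the function returns ratios of 0.3 in and 0.7 out, the values used in the worked example for Eq. 2.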

[85] Therefore, the playback speed altering method according to the present invention can reproduce original sound at a playback speed requested by a user without deteriorating the meaningful voices.

[86] While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims

[1] An apparatus for altering a playback speed while preserving a voice signal, comprising: a memory for storing compressed audio files; a file buffer for buffering the audio files stored in the memory; a playback speed controller for storing audio file information of the file buffer and generating playback speed information according to a playback speed request from a user; a decoder for restoring audio files transferred from the file buffer into PCM data; a data buffer for buffering PCM data from the decoder; a PCM data processor for finding a voiceless period in the PCM data from the data buffer and controlling a playback speed according to the playback speed information; and a CODEC for transforming the PCM data from the PCM data processor into an analog audio signal.

[2] The apparatus of claim 1, wherein the memory stores at least one of MPEG Audio Layer 3 (MP3) files, Windows Media Audio (WMA) files, and Ogg Vorbis (Ogg) files, and the decoder is at least one of an MP3 decoder, a WMA decoder, and an Ogg decoder for decoding the audio files in the memory.

[3] The apparatus of claim 1, wherein the playback speed controller includes: a header data analyzer for storing audio file information of the file buffer; a playback speed generator for setting up playback speed information of a currently processed audio file in the file buffer according to a playback speed request from a user; a PCM data controller for controlling the PCM data processor to reproduce the audio file according to the playback speed information; and an operation state controller for controlling the playback speed generator to set up playback speed information according to a sensed playback speed request from a user.

[4] The apparatus of claim 1, wherein the playback speed controller includes an operation state display for displaying operation state information of an audio file in the file buffer.

[5] The apparatus of claim 1, wherein the PCM data processor includes: a voice filter for attenuating sound signals other than voice in the PCM data transferred from the data buffer; a sound quantity measuring unit for setting up a reference to find a voiced period and a voiceless period in the PCM data, and generating voiceless period information according to the set reference; a PCM data adder for adding voiceless intervals to the voiceless period according to the playback speed information; and a PCM data attenuator for removing voiceless intervals from the voiceless period according to the playback speed information.

[6] A method of altering a playback speed while preserving a voice signal in reproducing audio data using audio data reproducing information separated from an audio file, comprising the steps of: a) setting up a reference sound pressure defining a playback speed variable period of the audio file; b) setting up a period in a predetermined range from the reference sound pressure as a variable period; c) setting up playback speed control information based on a ratio of the variable period and other periods; d) receiving a playback speed from a user; and e) reproducing the audio file using the playback speed control information and the playback speed.

[7] The method of claim 6, wherein in the step a), the reference sound pressure is set after reproducing a predetermined part of the audio file and attenuating audio signals other than voice.

[8] The method of claim 7, wherein in the step a), the reference sound pressure is set as the average of the absolute values of sampled sound pressures obtained by sampling a predetermined portion of the audio file at least four times at intervals of about 24/1000 sec.

[9] The method of claim 8, wherein in the step b), a period having a sound pressure less than 30% of the reference sound pressure is set as a variable period.

[10] The method of claim 6, wherein in the step e), the audio file is reproduced according to an equation:

$$X = \frac{\text{user's input speed} - \text{sound pressure occupying ratio out of variable period}}{\text{sound pressure occupying ratio in variable period}}$$

where X denotes a ratio of adding or removing voiceless intervals to/from a voiceless period to control a playback speed.