
WO2006111041A1 - Procede et dispositif de sous-titrage - Google Patents

Procede et dispositif de sous-titrage (Method and device for subtitling)

Info

Publication number
WO2006111041A1
WO2006111041A1 (application PCT/CN2005/000535)
Authority
WO
WIPO (PCT)
Prior art keywords
time
character
sound file
graphic
display
Prior art date
Application number
PCT/CN2005/000535
Other languages
English (en)
Chinese (zh)
Inventor
Rong Yi
Original Assignee
Rong Yi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rong Yi filed Critical Rong Yi
Priority to PCT/CN2005/000535 priority Critical patent/WO2006111041A1/fr
Publication of WO2006111041A1 publication Critical patent/WO2006111041A1/fr

Links

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B 27/34 Indicating arrangements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • The invention relates to a method and a device for subtitle editing and production.
  • Existing systems allow the producer to observe a "position pointer" (the red line in the waveform map of Figure 1) while listening to the music, and to use its position in the waveform to determine the time range of each word.
  • This kind of system represents great progress over the editing mode of dividing time by hearing alone: it gives the operator auxiliary information that reduces the difficulty of time determination and improves calibration accuracy.
  • Here, "words" refers to Chinese characters or, for Western languages, whole words.
  • As Figure 1 shows, however, the information contained in a waveform is very sparse: the operator cannot obtain intuitive cues from it and must still rely on listening to the music to set the times. This demands intense concentration, imposes a large workload, causes fatigue easily, and yields low efficiency. It is very difficult for an operator to distinguish words from instrumental accompaniment, and for alphabetic scripts such as English and French it is very difficult to push the accuracy of time division down to a single letter or phoneme.
  • The object of the present invention is to provide a subtitle editing method and device that greatly reduce the difficulty of calibrating character-region times, making it easy to reach single-word accuracy for Chinese characters and syllable (letter or phoneme) accuracy for Western scripts.
  • A method for subtitle editing and production, comprising the following steps:
  • Step 3-1) includes the following sub-steps.
  • The feature values are logarithmically normalized against a set maximum value, and then output graphically in the form of a color gradient or a height map.
  • A data structure is used that records, at minimum, the start and end positions within the character set of the string corresponding to each character region, together with the region's play start and stop times.
  • The start time of the next character region can be determined from the stop time of the previous character region.
  • The start/stop time difference of another, already calibrated character region can be copied as the play time of the region being calibrated.
  • The graphic is displayed segment by segment, or scrolled continuously, along the time axis in a display window that has an operation interface; the time span shown in the window is configurable.
  • When the graphic is displayed in segments, an indicator mark moving along the time axis shows the current position in the synchronously played sound file; when the graphic scrolls continuously, a fixed-position indicator mark shows that position instead.
  • The moving speed of the indicator mark (or the continuous scrolling speed of the graphic) and the playback speed of the synchronously played sound file can be set lower than the sound file's original playback speed.
  • A subtitle editing and production apparatus, including: a data storage device for storing a sound file and a character set as source materials;
  • a data processing device configured to convert the sound file into feature values with time and frequency as two-dimensional variables, and to divide the character set into a number of load segments, each load segment containing one or more character regions;
  • a graphic display device configured to display the converted sound file in graphical form;
  • an instruction receiving device configured to receive editing instructions issued by a user and to convert them into instruction signals recognizable by the instruction execution device;
  • and an instruction execution device configured to change the range of characters included in a character region according to the instruction signal, and to calibrate the start and stop times of each character region.
  • The beneficial technical effects of the present invention are as follows: converting the sound-file data into feature values with time and frequency as two-dimensional variables, and displaying them graphically, greatly enriches the visual information available to the producer. In many cases the start and end positions of a character region can be observed directly from the graphic, which greatly reduces the difficulty and intensity of the producer's work, improves the accuracy of time calibration, and makes subtitle editing a light-hearted task.
  • FIG. 1 is a screenshot of the waveform output window in the operation interface of an existing subtitle editing system.
  • FIG. 2 is a block diagram showing the circuit configuration of the subtitle editing and production apparatus provided by the present invention.
  • FIG. 3 is a flowchart of the subtitle editing and production method provided by the present invention.
  • Figure 4 is a two-dimensional grayscale spectrogram of a lyric.
  • Figure 5 is a screenshot of a display window in which the spectrogram is displayed as a green gradient.
  • Figure 6 is a screenshot of the operation interface during time calibration of a character region.
  • Figure 7 is a spectrogram displayed in three dimensions.
  • Embodiment 1. A subtitle editing and production apparatus which, with reference to the circuit block diagram of FIG. 2, includes: a storage device 1 for storing a sound file and a character set as source materials; a data processing device 2 for converting the sound file into feature values with time and frequency as two-dimensional variables, and for dividing the character set into a number of load segments, each containing one or more character regions; a graphic display device 3 for displaying the converted sound file in graphical form; an instruction receiving device 4 for receiving editing commands issued by the user and converting them into command signals recognizable by the instruction execution device; and an instruction execution device 5 for changing the range of characters included in a character region according to the command signal and for calibrating the start and stop times of each character region.
  • The data processing device 2 and the instruction execution device 5 may be realized by a computer's microprocessor reading and executing a processing program stored on a temporary or fixed storage device.
  • The graphic display device 3 is any device that can provide a display output window for the processing results, such as a monitor or a projector.
  • The instruction receiving device 4 can generally be any device capable of sending identifiable commands to the microprocessor, for example a keyboard, mouse, or trackball.
  • Embodiment 2. A subtitle editing and production method which, with reference to the flowchart of FIG. 3, includes the following steps:
  • Each of the load segments comprises one or more character regions (Regions).
  • A load segment of the character set typically corresponds to one line displayed in the editing interface, usually a natural sentence of the text (for example, delimited by an Enter symbol in text editing).
  • Regions are divided by specific rules. For example, a space character can serve as the separator (the usual choice for Western languages, yielding one region per word), whereas Chinese text is usually divided into single characters, each character forming one region.
  • Each Region can contain one or more characters, and the user can expand or reduce the range of characters a Region covers through specific operations (for example, by entering merge or split instructions through an input device).
  • The start and stop times of each character region are calibrated against the sound file. This is the core of the whole subtitle production process and consumes the most time and effort. The following process accomplishes this step easily, intuitively, and with high accuracy.
  • The feature values are normalized with 255 as the maximum value, and a two-dimensional plane is established with time and frequency as the coordinate axes: the horizontal axis represents time, the vertical axis represents frequency, and each point on the plane corresponds to one feature value.
  • The feature values are displayed as a color gradient: the value at each point of the plane is converted into an RGB color value, for example by mapping the 256 levels directly to the green component, producing a color or monochrome two-dimensional image.
  • Figure 4 shows the 256-level grayscale spectrogram of the lyric "Happy birthday to you" from the song "Happy Birthday".
  • The resulting spectrogram is stored and, according to the editing instructions, displayed segment by segment or scrolled continuously in time-axis order in a display window with an operation interface; the time span shown in the window is settable.
  • Figure 5 shows a screenshot of the display window with a window span of 8000 ms, in which the spectrogram is displayed with a 256-level green gradient.
  • Here the spectrogram uses segment display: an indicator mark (the white vertical line in Figure 5) moving along the time axis indicates the corresponding time position in the synchronously played sound file.
  • The playback speed of the synchronously played sound file is set to 0.5 times its original speed.
  • Step 3-2): the start and stop times of each character region are calibrated against the displayed graphic and the synchronously played sound file.
  • Figure 6 is a screenshot of the operation interface during time calibration of a character region.
  • The program advances the time at the left edge of the window by 4 s (that is, it moves the spectrogram shown in the right half into the left half), reads in the spectrogram of the next 4000 ms, and returns the indicator to the middle of the window to continue moving; this repeats until playback ends.
  • The operator listens to the music played at slow speed while watching the movement of the indicator mark (the white vertical line in Fig. 6). Upon identifying the start or stop time of the character region currently being calibrated (the character underlined in red in the figure), the operator pauses playback, stopping the indicator, and then uses an input device (mouse, keyboard, etc.) in the display window to calibrate the start or stop time of the current character region.
  • The area marked by the yellow line in Figure 6 is the play interval of the corresponding character;
  • the white line indicates the current playback time point;
  • the red vertical line is the time label currently being edited.
  • By inputting a confirmation signal, the current playback time point can be set as the start (or stop) time of the character region currently being calibrated.
  • The operator can also adjust the range of characters a region covers, for example expanding it to a phrase or narrowing it to a single phoneme or letter.
  • The start and stop time labels can still be changed afterwards.
  • The start time of the next character region can be determined from the stop time of the previous one; for example, the program can be configured so that, unless set otherwise, the stop time of the previous region is used as the start time of the next, saving the operator editing steps.
  • The play time of an already calibrated character region can be copied to a region to be calibrated that has the same melody, which greatly saves editing time and improves editing efficiency.
  • Embodiment 3. Another subtitle editing and production method, essentially the same as Embodiment 2, except that in step 3-1-2) the feature values are displayed as a height map rather than a gradient.
  • Figure 7 shows the lyric "Happy birthday to you" from the song "Happy Birthday" converted into a 256-level height map, with the elevation of each point rendered in a different color. A three-dimensional display provides more stereoscopic visual cues, expressing the information more richly and completely.
  • The subtitle editing method and device provided by the invention can be used not only for song subtitles but also for pure-speech subtitles, such as movie and TV subtitles, and as an auxiliary tool for foreign-language learning. Because the method greatly reduces the difficulty of subtitle production and improves the accuracy of time-label editing, it can make "subtitle DIY" a new form of entertainment for ordinary, non-professional users.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Studio Circuits (AREA)

Abstract

The invention concerns a subtitling method. The method comprises: storing the sound file and the character set in a memory bank; dividing the character set into load segments, each load segment comprising one or more character regions; converting the audio file into feature values consisting of two-dimensional variables of time and frequency, then outputting and displaying those feature values as graphics; marking the start time and end time of each character region according to the displayed graphics; and storing the marked character regions. The beneficial technical effect of the invention is that the audio-file data are converted into feature values with time and frequency as two-dimensional variables and then output and displayed as graphics. The visual information available to the editor is thereby considerably enriched, so that in many cases the start and end positions of the character regions can be observed directly from the graphics. The invention reduces operating intensity and complexity and improves the accuracy of time marking, making subtitling an easy and pleasant task. (Translated from the French abstract.)
PCT/CN2005/000535 2005-04-19 2005-04-19 Procede et dispositif de sous-titrage WO2006111041A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2005/000535 WO2006111041A1 (fr) 2005-04-19 2005-04-19 Procede et dispositif de sous-titrage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2005/000535 WO2006111041A1 (fr) 2005-04-19 2005-04-19 Procede et dispositif de sous-titrage

Publications (1)

Publication Number Publication Date
WO2006111041A1 true WO2006111041A1 (fr) 2006-10-26

Family

ID=37114689

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2005/000535 WO2006111041A1 (fr) 2005-04-19 2005-04-19 Procede et dispositif de sous-titrage

Country Status (1)

Country Link
WO (1) WO2006111041A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118764674A (zh) * 2024-09-09 2024-10-11 腾讯科技(深圳)有限公司 字幕渲染方法、装置、电子设备、存储介质及程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5194683A (en) * 1991-01-01 1993-03-16 Ricos Co., Ltd. Karaoke lyric position display device
US5243582A (en) * 1990-07-06 1993-09-07 Pioneer Electronic Corporation Apparatus for reproducing digital audio information related to musical accompaniments
JPH08234775A (ja) * 1995-02-24 1996-09-13 Victor Co Of Japan Ltd 音楽再生装置
US5997308A (en) * 1996-08-02 1999-12-07 Yamaha Corporation Apparatus for displaying words in a karaoke system
US20020193895A1 (en) * 2001-06-18 2002-12-19 Ziqiang Qian Enhanced encoder for synchronizing multimedia files into an audio bit stream



Similar Documents

Publication Publication Date Title
US6424944B1 (en) Singing apparatus capable of synthesizing vocal sounds for given text data and a related recording medium
EP2680254B1 (fr) Procédé et appareil de synthèse de sons
KR101274961B1 (ko) 클라이언트단말기를 이용한 음악 컨텐츠 제작시스템
JP6465136B2 (ja) 電子楽器、方法、及びプログラム
US20170011725A1 (en) Systems and methods for the creation and playback of animated, interpretive, musical notation and audio synchronized with the recorded performance of an original artist
EP1512140B1 (fr) Systeme de notation musicale
US12183319B2 (en) Electronic musical instrument, method, and storage medium
JP7259817B2 (ja) 電子楽器、方法及びプログラム
EP3975167A1 (fr) Instrument de musique électronique, procédé de commande pour instrument de musique électronique, et support de stockage
US20220044662A1 (en) Audio Information Playback Method, Audio Information Playback Device, Audio Information Generation Method and Audio Information Generation Device
CN115273806A (zh) 歌曲合成模型的训练方法和装置、歌曲合成方法和装置
CN114550690B (zh) 歌曲合成方法及装置
US5806039A (en) Data processing method and apparatus for generating sound signals representing music and speech in a multimedia apparatus
JP2008020621A (ja) コンテンツオーサリングシステム
WO2006111041A1 (fr) Procede et dispositif de sous-titrage
KR100710600B1 (ko) 음성합성기를 이용한 영상, 텍스트, 입술 모양의 자동동기 생성/재생 방법 및 그 장치
JP2001134283A (ja) 音声合成装置および音声合成方法
JP2001125599A (ja) 音声データ同期装置及び音声データ作成装置
JP2580565B2 (ja) 音声情報辞書作成装置
JP2008020622A (ja) オーサリングシステムおよびプログラム
JP3620423B2 (ja) 楽曲情報入力編集装置
JP2005309173A (ja) 音声合成制御装置、その方法、そのプログラムおよび音声合成用データ生成装置
KR101427666B1 (ko) 악보 편집 서비스 제공 방법 및 장치
JP4161714B2 (ja) カラオケ装置
Bodo et al. Web Sonification with synesthesia tools

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

NENP Non-entry into the national phase

Ref country code: RU

WWW Wipo information: withdrawn in national office

Country of ref document: RU

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC - FORM EPO 1205A DATED 11-04-2008

122 Ep: pct application non-entry in european phase

Ref document number: 05743435

Country of ref document: EP

Kind code of ref document: A1