
CN105898556A - Plug-in subtitle automatic synchronization method and device - Google Patents

Plug-in subtitle automatic synchronization method and device

Info

Publication number
CN105898556A
CN105898556A (application number CN201511018280.XA)
Authority
CN
China
Prior art keywords
audio
time
plug
short sentence
start time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511018280.XA
Other languages
Chinese (zh)
Inventor
蔡炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201511018280.XA
Publication of CN105898556A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of video playback and discloses a method and device for automatically synchronizing external subtitles. In the present invention, the audio part of a video file is first extracted and decoded to obtain pulse code modulation data; the pulse code modulation data is then split into audio segments, which are classified; the segments classified as speech are divided into short sentences, and the start time and end time of each short sentence are determined. Based on the determined start and end times of a short sentence, a matching item is searched for in the external subtitle file; the start time of the matching item is changed to the playback timestamp (PTS) of the current video and, according to that playback timestamp, the start time of every item in the external subtitle file whose start time is later than the start time of the matching item is updated. The invention makes the display time of the subtitle file consistent with the audio/video playback time, thereby synchronizing external subtitles automatically and improving the user's viewing experience.

Description

A Method and Device for Automatic Synchronization of External Subtitles

Technical Field

The present invention relates to the technical field of video playback, and in particular to a method and device for automatically synchronizing external subtitles.

Background Art

Subtitles refer to non-image content, such as dialogue in television, film, and stage works, displayed in text form, and more generally to text added to audiovisual works in post-production. When producing a video work such as a film, the video file and the subtitle file can be integrated so that the subtitles cannot be changed or removed during playback; these are called embedded subtitles. In other works, the video file and subtitle file exist independently, and the desired version of the subtitle file can be imported at playback time; this type of subtitle file is called an external subtitle. Compared with embedded subtitles, external subtitles are flexible, easy to import, and do not degrade video quality.

External subtitles are generally produced with dedicated subtitle software. A person first listens to the complete dialogue and types the full transcript into an electronic text; then, using the subtitle software, the operator listens to the dialogue while manually marking sentence boundaries to determine the start time and duration of each line, the so-called "timeline". When all subtitles are finished, the software outputs external subtitle files in one or more formats. A playback system that recognizes and supports external subtitles can load these files during video playback. However, because of how external subtitle files are produced, their timestamps are often inaccurate, resulting in poor synchronization with the audio and video during playback; manually adjusting the subtitle timing is cumbersome for users and seriously disrupts normal viewing.

Summary of the Invention

The purpose of the present invention is to provide a method and device for automatically synchronizing external subtitles, so that the display time of the subtitle file is consistent with the audio/video playback time, thereby synchronizing external subtitles automatically and improving the user's viewing experience.

To solve the above technical problem, an embodiment of the present invention provides a method for automatically synchronizing external subtitles, comprising the following steps: extracting the audio part of a video file and decoding it to obtain pulse code modulation data; splitting the pulse code modulation data into audio segments and classifying the audio segments, where the classification categories comprise silence, speech, and non-speech; dividing the audio segments classified as speech into short sentences and determining the start time and end time of each short sentence; searching for a matching item in the external subtitle file according to the determined start time and end time of the short sentence; and changing the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, updating the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

An embodiment of the present invention further provides a device for automatically synchronizing external subtitles, comprising an extraction module, a segmentation module, a division module, a search module, and an update module. The extraction module is configured to extract the audio part of a video file and decode it to obtain pulse code modulation data. The segmentation module is configured to split the pulse code modulation data into audio segments and classify them, where the classification categories comprise silence, speech, and non-speech. The division module is configured to divide the audio segments classified as speech into short sentences and determine the start time and end time of each short sentence. The search module is configured to search for a matching item in the external subtitle file according to the determined start time and end time of the short sentence. The update module is configured to change the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, update the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

Compared with the prior art, the embodiments of the present invention extract the audio part of a video file, decode it into pulse code modulation data, split that data into audio segments, and classify the segments as speech, silence, or non-speech. The segments classified as speech are then divided into short sentences, and the start time and end time of each short sentence are determined. Based on these times, a matching item is searched for in the external subtitle file; its start time is changed to the playback timestamp (PTS) of the current video, and, according to that timestamp, the start time of every item in the subtitle file whose start time is later than that of the matching item is updated. The display time of the subtitle dialogue is thus synchronized automatically with video playback, improving the user's viewing experience.

Preferably, the step of searching for a matching item in the external subtitle file according to the determined start time and end time of the short sentence comprises the following sub-steps: finding candidate items in the external subtitle file within a preset duration before and after the start time; among the candidates found, selecting all items whose dialogue duration differs from that of the short sentence by no more than an allowed error; and, if more than one item is selected, comparing the record preceding the determined short sentence with the record preceding each selected item until the most similar item is found and taken as the match. This improves both the efficiency and the accuracy of matching subtitles to audio and video.

Preferably, in the step of dividing the audio segments into short sentences, the division is performed according to speech pauses, where a speech pause comprises at least a first preset number of audio segments. This improves the efficiency of sentence division.

Preferably, the first preset number is 2, so that brief accompanying sounds can be ignored and the integrity of a sentence is better preserved.

Preferably, a short sentence comprises at least a second preset number of audio segments, the second preset number being 3, so that short-lived invalid information in the audio can be filtered out and the efficiency of sentence division improved.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method for automatically synchronizing external subtitles according to the first embodiment of the present invention;

Fig. 2 is a schematic diagram of the algorithm for matching short sentences to subtitle items according to the first embodiment of the present invention;

Fig. 3 is a structural block diagram of the device for automatically synchronizing external subtitles according to the second embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader better understand the present application; however, the technical solutions claimed in the claims of the present application can be realized even without these details and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a method for automatically synchronizing external subtitles. The specific flow, shown in Fig. 1, comprises the following steps:

Step 10: extract the audio part of the video file and decode it to obtain pulse code modulation data.

A video file is a composite of a video stream and an audio stream. When a video is played online, the audio stream is first extracted from the video file. The open-source library ffmpeg can be used to extract the audio part, which is then decoded into PCM (Pulse Code Modulation) data by the corresponding decoder.
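As a concrete illustration, audio extraction and PCM decoding can be done in one pass with the ffmpeg command-line tool. This is a minimal sketch, not the patent's implementation; the exact flags, sample rate, and file names are assumptions.

```python
# Hypothetical sketch: use the ffmpeg CLI to drop the video stream and decode
# the audio stream to raw signed 16-bit little-endian PCM in a single pass.
import subprocess

def build_ffmpeg_pcm_command(video_path, pcm_path, sample_rate=16000):
    """Return an ffmpeg command that decodes the audio stream to raw PCM."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-f", "s16le",            # raw PCM, signed 16-bit little-endian
        pcm_path,
    ]

def extract_pcm(video_path, pcm_path):
    """Run ffmpeg to produce the PCM file (requires ffmpeg on PATH)."""
    subprocess.run(build_ffmpeg_pcm_command(video_path, pcm_path), check=True)
```

In a player, the same decode would more likely happen in-process through the libavcodec API; the CLI form is shown only because it is compact.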

Step 11: split the pulse code modulation data into audio segments and classify the audio segments.

In this embodiment, the Marsyas software can be used to classify the extracted audio (the pulse code modulation data); for example, Marsyas can label the audio data as silence, speech, or non-speech. Through the interface provided by Marsyas, the audio frame length can be set to 32 ms, with 5 audio frames forming one audio segment, i.e., a segment length of 0.16 s. During classification, the audio segment can be used as the unit of classification to improve efficiency. This embodiment places no specific restriction on the classification method, as long as speech can be distinguished from non-speech. The classification in this step yields the start and end times of the speech portions of the audio, laying the foundation for extracting spoken sentences from it.
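The framing scheme above (32 ms frames, 5 frames per 0.16 s segment) can be sketched as follows, with a toy energy-threshold classifier standing in for Marsyas. The thresholds and function names are illustrative assumptions; a real classifier would use spectral features, as a bare energy threshold cannot truly separate speech from music.

```python
# Framing constants from the description: 32 ms frames, 5 frames per segment.
FRAME_MS = 32
FRAMES_PER_SEGMENT = 5
SEGMENT_MS = FRAME_MS * FRAMES_PER_SEGMENT  # 160 ms per segment

def classify_segment(samples, silence_thresh=100.0, speech_thresh=1000.0):
    """Label one PCM segment as 'silence', 'speech', or 'non-speech'.
    Toy stand-in: mean energy vs. two assumed thresholds."""
    if not samples:
        return "silence"
    energy = sum(s * s for s in samples) / len(samples)
    if energy < silence_thresh:
        return "silence"
    # Marsyas-style classification would inspect spectral features here.
    return "speech" if energy < speech_thresh else "non-speech"

def segment_times(num_segments):
    """Start time (in seconds) of each 0.16 s segment."""
    return [i * SEGMENT_MS / 1000.0 for i in range(num_segments)]
```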

Step 12: divide the audio segments classified as speech into short sentences and determine the start time and end time of each short sentence. The classification in step 11 determines the start and end times of speech, non-speech, silence, etc., so the speech can then be divided into short sentences according to speech pauses.

In this embodiment, detecting the start and end of a sentence is the key to short-sentence division: only with sufficiently accurate endpoint detection can the length and number of sentences be controlled purposefully. Based on the classification information obtained in step 11, this step extracts speech units (i.e., short sentences) from the audio with a preset segmentation algorithm. Specifically, the following strategy can be used: the time point of the silence or non-speech segment immediately before a continuous speech run is taken as the start time of a sentence, and the time point of the last speech segment at the end of the run is taken as its end time. Splitting the audio this way yields semantically fairly complete "sentence-like" units bounded by speech pauses of a certain length, i.e., the short sentences of this embodiment.

However, detecting sentence endpoints with the above strategy can produce some extreme cases: for example, extremely short sentences only one or two audio segments long. Such sentences usually contain only one or two words, or even no valid speech information at all, so they must be filtered out rather than displayed as valid subtitle sentences.

To improve segmentation efficiency, the segmentation strategy requires a speech pause to comprise at least a first preset number of audio segments; preferably, the first preset number is, for example, 2. Setting a minimum pause length allows brief accompanying sounds, such as a speaker's momentary intake of breath, to be ignored, protecting the integrity of a sentence.

Further, each extracted short sentence comprises at least a second preset number of audio segments; preferably, the second preset number can be, for example, 3, i.e., speech units with a total length of less than 0.48 seconds are ignored. Limiting the minimum sentence length filters out short-lived invalid information in the audio, such as a speaker's slight cough.

It should be understood that this embodiment places no restriction on the specific values of the first or second preset number; in practice, they can be adjusted to the characteristics of the language so as to determine the start and end times of sentence units more accurately and efficiently.

Through step 12, the extracted audio is split into individual sentences whose start and end times are known, from which the playback duration of each sentence can be determined.
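The pause-based splitting described above can be sketched as follows, under the stated assumptions: one label per 0.16 s segment, a pause of at least 2 segments ends a sentence, and sentences shorter than 3 segments (0.48 s) are discarded. Names and data layout are illustrative.

```python
SEGMENT_SEC = 0.16
MIN_PAUSE_SEGMENTS = 2     # shorter gaps (e.g. a quick breath) are ignored
MIN_SENTENCE_SEGMENTS = 3  # sentences under 0.48 s are filtered out

def split_sentences(labels):
    """labels: list of 'speech' / 'silence' / 'non-speech', one per segment.
    Returns (start_sec, end_sec) pairs for each detected short sentence."""
    sentences = []
    start = None   # index of first segment of the current sentence
    gap = 0        # consecutive non-speech segments since the last speech one
    # pad with silence so a sentence at the very end is flushed too
    for i, label in enumerate(labels + ["silence"] * MIN_PAUSE_SEGMENTS):
        if label == "speech":
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= MIN_PAUSE_SEGMENTS:  # pause long enough: close sentence
                end = i - gap + 1          # one past the last speech segment
                if end - start >= MIN_SENTENCE_SEGMENTS:
                    sentences.append((start * SEGMENT_SEC, end * SEGMENT_SEC))
                start, gap = None, 0
    return sentences
```

A one-segment gap inside a run does not split the sentence, matching the minimum-pause rule, and a two-segment run of speech is dropped, matching the minimum-sentence rule.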

Step 13: search for a matching item in the external subtitle file according to the determined start time and end time of the short sentence.

An external subtitle file usually contains information such as start times and dialogue durations. In this embodiment, the external subtitle file is obtained at playback time and a data structure datastruct1 of <start time, dialogue duration> pairs is created from it, so that the start time and duration of each line of dialogue can be looked up conveniently. A matching item is then sought in datastruct1 according to the start and end times of the short sentences (i.e., the dialogue in the video) obtained in step 12.

Specifically, step 13 comprises the following sub-steps:

Sub-step 130: find candidate items in the external subtitle file within a preset duration before and after the start time.

Ideally, the start and end times of each line of dialogue in the audio (the short sentences of this embodiment) would coincide with those of the corresponding item in the subtitle file. Owing to how subtitle files are produced in the prior art, however, the start and end times of items in the subtitle file deviate from those of the dialogue in the audio. This step therefore looks for candidate items in the external subtitles within a preset duration (the largest plausible difference between a subtitle item's start time and the audio dialogue's start time). In this embodiment the preset duration can be 1 minute, i.e., candidate items are sought in the external subtitles within 1 minute before and after the start time of the short sentence extracted from the video file. It should be understood that the preset duration can be set according to the actual characteristics of the subtitle file; this embodiment places no restriction on its specific size.

Sub-step 131: among the candidates found, select all items whose dialogue duration differs from that of the short sentence by no more than an allowed error.

For example, within 1 minute before and after the start time of the short sentence, all items in datastruct1 whose dialogue duration is within 3 seconds' error of the short sentence's duration are selected. Say the short sentence's dialogue lasts 4 seconds: if, within that 1-minute window, 3 subtitle items with dialogue durations between 2.5 and 5.5 seconds are found, those 3 items are selected. It should be understood that the specific numerical values for the allowed error are given only for ease of understanding and do not limit the protection scope of the present invention.

Sub-step 132: determine whether more than one item was selected. If exactly one item was selected, it is taken as the match for the audio, and execution continues with step 14. If more than one item was selected, the closest match must be filtered out further, so execution continues with sub-step 133.

Sub-step 133: compare the record preceding the determined short sentence with the record preceding each selected item until the most similar item is found and taken as the match.

An example, shown in Fig. 2: suppose that in sub-step 131 short sentence P matches 3 subtitle items in datastruct1 (subtitle items A, B, and C). The record preceding P, short sentence P-1, is then compared against the items preceding A, B, and C (subtitle items A-1, B-1, and C-1, respectively); the matching algorithm can compare start times, dialogue durations, and so on. If P-1 still matches two or more subtitle items, the record preceding P-1, short sentence P-2, is compared against the records preceding those remaining items, and so on, until the subtitle item matching the short sentence is found.
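Sub-steps 130-133 can be sketched as follows. The (start, duration) pairs stand in for datastruct1; the window, duration tolerance (1.5 s, matching the 2.5-5.5 s example above), and the duration-based similarity used for tie-breaking are illustrative assumptions, since the patent leaves the exact similarity measure open.

```python
WINDOW_SEC = 60.0   # search +/- 1 minute around the sentence start time
DURATION_TOL = 1.5  # allowed dialogue-duration mismatch, in seconds

def find_match(sentences, idx, subtitle_items):
    """sentences: list of (start, end) pairs; subtitle_items: list of
    (start, duration) pairs built from the external subtitle file.
    Returns the index of the best-matching subtitle item, or None."""
    start, end = sentences[idx]
    duration = end - start
    # Sub-steps 130/131: filter by time window, then by duration error.
    candidates = [i for i, (s, d) in enumerate(subtitle_items)
                  if abs(s - start) <= WINDOW_SEC
                  and abs(d - duration) <= DURATION_TOL]
    # Sub-step 132: a unique candidate is the match.
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Sub-step 133: break ties by comparing the previous sentence with each
    # candidate's previous subtitle item (duration similarity as a stand-in).
    if idx == 0:
        return candidates[0]
    prev_start, prev_end = sentences[idx - 1]
    prev_duration = prev_end - prev_start

    def prev_mismatch(i):
        if i == 0:
            return float("inf")  # no previous item to compare against
        return abs(subtitle_items[i - 1][1] - prev_duration)

    return min(candidates, key=prev_mismatch)
```

A fuller implementation would recurse further back (P-2, P-3, ...) when the previous records still tie, exactly as the Fig. 2 example describes.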

Step 14: change the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, update the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

Specifically, the start time of the matching item is first changed to the playback timestamp of the current video, the PTS (Presentation Time Stamp), and the start time of every item in the external subtitle file whose start time is later than that of the matching item can be updated with the following formula:

start_time_2 = start_time_1 - (item.start_time - video.pts)

Here, item.start_time is the start time of the current matching item and video.pts is the time of the current video frame, so (item.start_time - video.pts) is the time offset between the current matching item and the audio/video. start_time_1 is the start time of a subtitle item in datastruct1 before correction, and start_time_2 is its start time after correction.
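The step-14 update can be sketched directly from the formula above; representing subtitle items as (start, duration) pairs is an assumption.

```python
def resync(subtitle_items, match_idx, video_pts):
    """Shift the matched item's start to the current PTS and apply the same
    offset to every item whose start time is not earlier than the match's."""
    match_start = subtitle_items[match_idx][0]
    offset = match_start - video_pts  # (item.start_time - video.pts)
    return [(s - offset, d) if s >= match_start else (s, d)
            for s, d in subtitle_items]
```

Items before the matched one are left untouched, mirroring the rule that only items with later start times are updated.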

This embodiment can be embedded in playback software. During playback, it is executed at the start of the video and at predetermined intervals thereafter (for example, every 10 minutes): audio data of a certain duration is acquired and decoded into pulse code modulation data; that audio is classified and split into short sentences; the matching item for a short sentence is found in the subtitle file; and the matching item, together with every subtitle whose playback time falls after it, is updated. Alternatively, the start times of all dialogue in the audio data can be matched, so that the external subtitles are fully synchronized with the audio and video for an even better viewing experience.

Compared with the prior art, this embodiment extracts the audio part of the video file and decodes it into pulse code modulation data, so that the speech information in the audio can be analyzed. The data is split into audio segments, which can be classified as speech, silence, or non-speech; the segments classified as speech are further divided into short sentences, whose start and end times are determined relative to the playback timestamp (PTS) of the current video frame. A matching item is then sought in the external subtitle file according to the determined start and end times; its start time is changed to the current video's PTS, and, according to that playback timestamp, the start time of every item in the subtitle file whose start time is later than that of the matching item is updated. Through these steps, this embodiment automatically corrects the display time of subtitle items according to the dialogue times, making subtitle display consistent with audio/video playback, so that external subtitles are synchronized automatically, yielding a better viewing effect and an improved user experience.

The division of steps in the methods above is only for clarity of description; in an implementation, steps may be merged into one or split into several. As long as the same logical relationship is preserved, such variants fall within the scope of this patent. Likewise, adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design remains within the scope of this patent.

The second embodiment of the present invention relates to an automatic synchronization device for external subtitles. As shown in FIG. 3, it comprises an extraction module, a segmentation module, a division module, a search module, and an update module.

The extraction module extracts the audio part of a video file and decodes it to obtain pulse-code-modulation (PCM) data.

The segmentation module cuts the PCM data into audio segments and classifies them; the classification categories are silence, speech, and non-speech.
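The classification step can be sketched with simple short-time features. The patent does not specify which features or thresholds the segmentation module uses, so the energy/zero-crossing heuristic below, along with `silence_thresh` and `zcr_thresh`, are purely illustrative assumptions:

```python
def classify_segment(samples, silence_thresh=500, zcr_thresh=0.25):
    """Label one PCM audio segment as 'silence', 'speech' or 'non-speech'.

    samples -- a list of signed 16-bit mono PCM sample values.
    """
    if not samples:
        return "silence"
    # Short-time energy: mean absolute amplitude of the segment.
    energy = sum(abs(s) for s in samples) / len(samples)
    if energy < silence_thresh:
        return "silence"
    # Zero-crossing rate: voiced speech tends to have a moderate ZCR,
    # while noise or music ("non-speech") tends to be higher.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / len(samples)
    return "speech" if zcr < zcr_thresh else "non-speech"
```

A real implementation would tune these thresholds (or use a trained classifier) against the actual audio material.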

The division module divides the audio segments classified as speech into short sentences and determines the start time and end time of each short sentence. Specifically, the division module divides sentences at speech pauses, where a speech pause contains at least a first preset number of audio segments and each short sentence contains at least a second preset number of audio segments. It should be understood that the first and second preset numbers, and the segment durations they correspond to, can be set according to the characteristics of the audio data and the subtitle file; this embodiment does not limit their specific values.
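The pause-based division can be sketched as follows. This is an illustrative pure-Python sketch, not the patent's implementation; the defaults `min_pause=2` and `min_len=3` follow the example preset numbers given in the dependent claims:

```python
def split_sentences(labels, min_pause=2, min_len=3):
    """Group a sequence of per-segment labels into short sentences.

    labels    -- list of 'speech'/'silence'/'non-speech' labels, one per segment
    min_pause -- a pause must span at least this many non-speech segments
    min_len   -- a sentence must span at least this many segments
    Returns a list of (start_index, end_index) pairs, end exclusive.
    """
    sentences, start, gap = [], None, 0
    for i, lab in enumerate(labels):
        if lab == "speech":
            if start is None:
                start = i          # a continuous speech run begins here
            gap = 0                # short embedded pauses do not split
        elif start is not None:
            gap += 1
            if gap >= min_pause:   # pause long enough: close the sentence
                end = i - gap + 1
                if end - start >= min_len:
                    sentences.append((start, end))
                start, gap = None, 0
    if start is not None and len(labels) - start >= min_len:
        sentences.append((start, len(labels)))
    return sentences
```

Multiplying the returned segment indices by the segment duration, offset from the current video frame's PTS, would give the start and end times the module reports.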

The search module further comprises an initial-matching submodule, a dialogue-matching submodule, and a comparison-matching submodule, which together search the external subtitle file for an entry matching the determined start time and end time of a short sentence. The initial-matching submodule finds the entries in the external subtitle file whose start time lies within a preset duration before or after the detected start time. The dialogue-matching submodule then keeps, among those entries, all entries whose dialogue duration matches that of the short sentence within the allowed error. If more than one entry remains, the comparison-matching submodule compares the previous record of the determined short sentence with the previous record of each remaining entry until the most similar one is found as the match.
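The three-stage search can be sketched as below. The parameters `window` and `tol` are hypothetical stand-ins for the "preset duration" and "allowed error", and the previous-record comparison of the third stage is simplified here to a duration comparison; the patent does not define the similarity measure:

```python
def find_match(subtitles, start, duration, window=10.0, tol=0.5, history=None):
    """Three-stage lookup mirroring the search module's sub-modules.

    subtitles -- list of dicts with 'start' and 'end' times in seconds
    start     -- detected start time of the spoken short sentence
    duration  -- detected duration of the short sentence
    history   -- duration of the previous detected sentence, used as a
                 tie-breaker when several candidates remain
    Returns the index of the matched entry, or None.
    """
    # Stage 1: entries whose start time lies within the preset window.
    cand = [i for i, s in enumerate(subtitles)
            if abs(s["start"] - start) <= window]
    # Stage 2: keep entries whose dialogue duration is within the error.
    cand = [i for i in cand
            if abs((subtitles[i]["end"] - subtitles[i]["start"]) - duration) <= tol]
    if not cand:
        return None
    if len(cand) == 1 or history is None:
        return cand[0]
    # Stage 3: break ties by comparing each candidate's previous record.
    def prev_duration(i):
        if i == 0:
            return float("inf")
        p = subtitles[i - 1]
        return p["end"] - p["start"]
    return min(cand, key=lambda i: abs(prev_duration(i) - history))
```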

The update module changes the start time of the matching entry to the presentation timestamp (PTS) of the current video and, according to that playback timestamp, updates the start time of every entry in the external subtitle file whose start time is later than that of the matching entry.
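A minimal sketch of the update step, assuming subtitle entries are dictionaries with `start`/`end` times in seconds (the patent does not specify the in-memory representation):

```python
def resync(subtitles, match_idx, pts):
    """Shift the matched entry to the current PTS and propagate the same
    offset to every later entry, as the update module does."""
    anchor = subtitles[match_idx]["start"]  # capture before mutating
    offset = pts - anchor
    for item in subtitles:
        # Shift the matching entry and every entry that starts at or
        # after it; earlier entries are already past and left alone.
        if item["start"] >= anchor:
            item["start"] += offset
            item["end"] += offset
    return subtitles
```

Applying a single offset to all later entries assumes a constant subtitle lag; re-running the match at the predetermined interval, as described above, would correct any drift.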

Compared with the prior art, this embodiment extracts the audio data from the video file, classifies it, and cuts it into sentences, thereby obtaining precise sentence start and end times. Based on these, a matching entry is found in the subtitle file and its start time is modified accordingly, so that the subtitle file is synchronized with the audio and video. The user therefore no longer needs to adjust the external subtitles manually: they are synchronized with the audio and video automatically, yielding a better viewing experience.

It is easy to see that this embodiment is the device embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The technical details described in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; conversely, the technical details described in this embodiment also apply to the first embodiment.

It is worth mentioning that the modules involved in this embodiment are logical modules. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

Those of ordinary skill in the art will understand that the embodiments above are specific examples of carrying out the present invention and that, in practice, various changes may be made to them in form and detail without departing from the spirit and scope of the invention.

Claims (11)

1. An automatic synchronization method for external subtitles, characterized by comprising the steps of:
extracting the audio part of a video file and decoding the audio part to obtain pulse code modulation data;
cutting the pulse code modulation data into audio segments and classifying the audio segments, wherein the classification categories comprise silence, speech and non-speech;
dividing the audio segments classified as speech into short sentences, and determining the start time and end time of the short sentences;
searching an external subtitle file for a matching entry according to the determined start time and end time of a short sentence; and
changing the start time of the matching entry to the presentation timestamp PTS of the current video, and, according to the presentation timestamp, updating the start time of every entry in the external subtitle file whose start time is greater than the start time of the matching entry.
2. The automatic synchronization method for external subtitles according to claim 1, characterized in that the step of searching an external subtitle file for a matching entry according to the determined start time and end time of a short sentence comprises the sub-steps of:
finding corresponding entries in the external subtitle file within a preset duration before and after the start time;
finding, among the corresponding entries, all entries whose dialogue duration is within an allowed error of that of the short sentence; and
if more than one entry is found, comparing the previous record of the determined short sentence with the previous record of each found entry until the most similar one is found as the matching entry.
3. The automatic synchronization method for external subtitles according to claim 1 or 2, characterized in that in the step of dividing the audio segments into short sentences, the division is performed according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments.
4. The automatic synchronization method for external subtitles according to claim 3, characterized in that the first preset number is 2.
5. The automatic synchronization method for external subtitles according to claim 3, characterized in that a short sentence comprises at least a second preset number of audio segments.
6. The automatic synchronization method for external subtitles according to claim 5, characterized in that the second preset number is 3.
7. The automatic synchronization method for external subtitles according to claim 1, characterized in that in the step of determining the start time and end time of the short sentences, the time point of the silence segment or non-speech segment immediately preceding a continuous speech section is taken as the start time of a sentence, and the time point of the last speech segment ending the continuous speech section is taken as the end time of the sentence.
8. An automatic synchronization device for external subtitles, characterized by comprising an extraction module, a segmentation module, a division module, a search module and an update module, wherein:
the extraction module is configured to extract the audio part of a video file and decode the audio part to obtain pulse code modulation data;
the segmentation module is configured to cut the pulse code modulation data into audio segments and classify the audio segments, wherein the classification categories comprise silence, speech and non-speech;
the division module is configured to divide the audio segments classified as speech into short sentences and determine the start time and end time of the short sentences;
the search module is configured to search an external subtitle file for a matching entry according to the determined start time and end time of a short sentence; and
the update module is configured to change the start time of the matching entry to the presentation timestamp PTS of the current video and, according to the presentation timestamp, update the start time of every entry in the external subtitle file whose start time is greater than the start time of the matching entry.
9. The automatic synchronization device for external subtitles according to claim 8, characterized in that the search module comprises an initial matching submodule, a dialogue matching submodule and a comparison matching submodule, wherein:
the initial matching submodule is configured to find corresponding entries in the external subtitle file within a preset duration before and after the start time;
the dialogue matching submodule is configured to find, among the corresponding entries found by the initial matching submodule, all entries whose dialogue duration is within an allowed error of that of the short sentence; and
the comparison matching submodule is configured to, when the dialogue matching submodule finds more than one entry, compare the previous record of the determined short sentence with the previous record of each found entry until the most similar one is found as the matching entry.
10. The automatic synchronization device for external subtitles according to claim 8 or 9, characterized in that the division module is further configured to perform the division according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments.
11. The automatic synchronization device for external subtitles according to claim 10, characterized in that the division module is further configured to divide the audio segments into short sentences comprising at least a second preset number of audio segments.
CN201511018280.XA 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device Pending CN105898556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511018280.XA CN105898556A (en) 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device

Publications (1)

Publication Number Publication Date
CN105898556A true CN105898556A (en) 2016-08-24

Family

ID=57002208

Country Status (1)

Country Link
CN (1) CN105898556A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
US20090213924A1 (en) * 2008-02-22 2009-08-27 Sheng-Nan Sun Method and Related Device for Converting Transport Stream into File
CN103647909A (en) * 2013-12-16 2014-03-19 宇龙计算机通信科技(深圳)有限公司 Caption adjusting method and caption adjusting device

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504773A (en) * 2016-11-08 2017-03-15 上海贝生医疗设备有限公司 A kind of wearable device and voice and activities monitoring system
CN109413475A (en) * 2017-05-09 2019-03-01 北京嘀嘀无限科技发展有限公司 Method of adjustment, device and the server of subtitle in a kind of video
CN109005444A (en) * 2017-06-07 2018-12-14 纳宝株式会社 Content providing server, content providing terminal and content providing
CN107562737A (en) * 2017-09-05 2018-01-09 语联网(武汉)信息技术有限公司 A kind of methods of video segmentation and its system for being used to translate
CN107402530A (en) * 2017-09-20 2017-11-28 淮安市维达科技有限公司 Control system of one computer using lines captions as core coordination linkage stage equipment
US11538456B2 (en) 2017-11-06 2022-12-27 Tencent Technology (Shenzhen) Company Limited Audio file processing method, electronic device, and storage medium
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
WO2019086044A1 (en) * 2017-11-06 2019-05-09 腾讯科技(深圳)有限公司 Audio file processing method, electronic device and storage medium
CN108924664A (en) * 2018-07-26 2018-11-30 青岛海信电器股份有限公司 A kind of synchronous display method and terminal of program credits
CN108924664B (en) * 2018-07-26 2021-06-08 海信视像科技股份有限公司 Synchronous display method and terminal for program subtitles
CN110781649A (en) * 2019-10-30 2020-02-11 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN110781649B (en) * 2019-10-30 2023-09-15 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN111050201A (en) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Data processing method, device, electronic device and storage medium
CN111050201B (en) * 2019-12-10 2022-06-14 Oppo广东移动通信有限公司 Data processing method, device, electronic device and storage medium
WO2023015416A1 (en) * 2021-08-09 2023-02-16 深圳Tcl新技术有限公司 Subtitle processing method and apparatus, and storage medium
CN113992940B (en) * 2021-12-27 2022-03-29 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN113992940A (en) * 2021-12-27 2022-01-28 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN114640874A (en) * 2022-03-09 2022-06-17 湖南国科微电子股份有限公司 Subtitle synchronization method, device, set-top box, and computer-readable storage medium
WO2023169240A1 (en) * 2022-03-09 2023-09-14 湖南国科微电子股份有限公司 Subtitle synchronization method and apparatus, set-top box and computer readable storage medium
CN118158464A (en) * 2024-04-10 2024-06-07 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105898556A (en) Plug-in subtitle automatic synchronization method and device
US11887578B2 (en) Automatic dubbing method and apparatus
US8281231B2 (en) Timeline alignment for closed-caption text using speech recognition transcripts
CN103226947B (en) Audio processing method and device based on mobile terminal
US8958013B2 (en) Aligning video clips to closed caption files
CN108604455B (en) Automatic determination of timing window for speech captions in an audio stream
US9305552B2 (en) Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment
US11190855B2 (en) Automatic generation of descriptive video service tracks
US9418650B2 (en) Training speech recognition using captions
US20080195386A1 (en) Method and a Device For Performing an Automatic Dubbing on a Multimedia Signal
CN105979347A (en) Video play method and device
CN117201889A (en) Automatic speech translation dubbing of pre-recorded video
US8564721B1 (en) Timeline alignment and coordination for closed-caption text using speech recognition transcripts
KR20150057591A (en) Method and apparatus for controlling playing video
CN105635782A (en) Subtitle output method and device
Federico et al. An automatic caption alignment mechanism for off-the-shelf speech recognition technologies
US9020817B2 (en) Using speech to text for detecting commercials and aligning edited episodes with transcripts
WO2013043984A1 (en) Systems and methods for extracting and processing intelligent structured data from media files
US9905221B2 (en) Automatic generation of a database for speech recognition from video captions
KR101618777B1 (en) A server and method for extracting text after uploading a file to synchronize between video and audio
Bordel et al. Automatic Subtitling of the Basque Parliament Plenary Sessions Videos.
Mocanu et al. Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people
CN112218142A (en) Method and device for separating voice from video with subtitles, storage medium and electronic equipment
JP2006510304A (en) Method and apparatus for selectable rate playback without speech distortion
WO2004100164A1 (en) Voice script system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160824

WD01 Invention patent application deemed withdrawn after publication