
CN105898556A - Plug-in subtitle automatic synchronization method and device - Google Patents

Plug-in subtitle automatic synchronization method and device

Info

Publication number
CN105898556A
CN105898556A (application number CN201511018280.XA)
Authority
CN
China
Prior art keywords
audio
time
plug
short sentence
start time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511018280.XA
Other languages
Chinese (zh)
Inventor
蔡炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201511018280.XA
Publication of CN105898556A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of video playback and discloses a method and device for automatically synchronizing external subtitles. In the present invention, the audio part of a video file is first extracted and decoded to obtain pulse code modulation data; the pulse code modulation data is then split into audio segments, which are classified; the segments classified as speech are divided into short sentences, and the start time and end time of each short sentence are determined. Based on the determined start and end times of a short sentence, a matching item is searched for in the external subtitle file; the start time of the matching item is changed to the playback timestamp (PTS) of the current video and, according to that playback timestamp, the start time of every item in the external subtitle file whose start time is later than the start time of the matching item is updated. The invention makes the display time of the subtitle file consistent with the audio/video playback time, thereby synchronizing external subtitles automatically and improving the user's viewing experience.

Description

A Method and Device for Automatic Synchronization of External Subtitles

Technical Field

The present invention relates to the technical field of video playback, and in particular to a method and device for automatically synchronizing external subtitles.

Background Art

Subtitles refer to non-image content, such as dialogue in television, film, and stage works, displayed in text form, and more generally to text added to audiovisual works in post-production. When producing a video work such as a film, the video file and the subtitle file can be integrated so that the subtitles cannot be changed or removed during playback; these are called embedded subtitles. In other works, the video file and subtitle file exist independently, and the desired version of the subtitle file can be imported at playback time; this type of subtitle file is called an external subtitle. Compared with embedded subtitles, external subtitles are flexible, easy to import, and do not degrade video quality.

External subtitles are generally produced with dedicated subtitle software. A person first listens to the complete dialogue and types the full transcript into an electronic text; then, using the subtitle software, the operator listens to the dialogue while manually marking sentence boundaries to determine the start time and duration of each line, the so-called "timeline". When all subtitles are finished, the software outputs external subtitle files in one or more formats. A playback system that recognizes and supports external subtitles can load these files during video playback. However, because of how external subtitle files are produced, their timestamps are often inaccurate, resulting in poor synchronization with the audio and video during playback; manually adjusting the subtitle timing is cumbersome for users and seriously disrupts normal viewing.

Summary of the Invention

The purpose of the present invention is to provide a method and device for automatically synchronizing external subtitles, so that the display time of the subtitle file is consistent with the audio/video playback time, thereby synchronizing external subtitles automatically and improving the user's viewing experience.

To solve the above technical problem, an embodiment of the present invention provides a method for automatically synchronizing external subtitles, comprising the following steps: extracting the audio part of a video file and decoding it to obtain pulse code modulation data; splitting the pulse code modulation data into audio segments and classifying the audio segments, where the classification categories comprise silence, speech, and non-speech; dividing the audio segments classified as speech into short sentences and determining the start time and end time of each short sentence; searching for a matching item in the external subtitle file according to the determined start time and end time of the short sentence; and changing the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, updating the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

An embodiment of the present invention further provides a device for automatically synchronizing external subtitles, comprising an extraction module, a segmentation module, a division module, a search module, and an update module. The extraction module is configured to extract the audio part of a video file and decode it to obtain pulse code modulation data. The segmentation module is configured to split the pulse code modulation data into audio segments and classify them, where the classification categories comprise silence, speech, and non-speech. The division module is configured to divide the audio segments classified as speech into short sentences and determine the start time and end time of each short sentence. The search module is configured to search for a matching item in the external subtitle file according to the determined start time and end time of the short sentence. The update module is configured to change the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, update the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

Compared with the prior art, the embodiments of the present invention extract the audio part of a video file, decode it into pulse code modulation data, split that data into audio segments, and classify the segments as speech, silence, or non-speech. The segments classified as speech are then divided into short sentences, and the start time and end time of each short sentence are determined. Based on these times, a matching item is searched for in the external subtitle file; its start time is changed to the playback timestamp (PTS) of the current video, and, according to that timestamp, the start time of every item in the subtitle file whose start time is later than that of the matching item is updated. The display time of the subtitle dialogue is thus synchronized automatically with video playback, improving the user's viewing experience.

Preferably, the step of searching for a matching item in the external subtitle file according to the determined start time and end time of the short sentence comprises the following sub-steps: finding candidate items in the external subtitle file within a preset duration before and after the start time; among the candidates found, selecting all items whose dialogue duration differs from that of the short sentence by no more than an allowed error; and, if more than one item is selected, comparing the record preceding the determined short sentence with the record preceding each selected item until the most similar item is found and taken as the match. This improves both the efficiency and the accuracy of matching subtitles to audio and video.

Preferably, in the step of dividing the audio segments into short sentences, the division is performed according to speech pauses, where a speech pause comprises at least a first preset number of audio segments. This improves the efficiency of sentence division.

Preferably, the first preset number is 2, so that brief accompanying sounds can be ignored and the integrity of a sentence is better preserved.

Preferably, a short sentence comprises at least a second preset number of audio segments, the second preset number being 3, so that short-lived invalid information in the audio can be filtered out and the efficiency of sentence division improved.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method for automatically synchronizing external subtitles according to the first embodiment of the present invention;

Fig. 2 is a schematic diagram of the algorithm for matching short sentences to subtitle items according to the first embodiment of the present invention;

Fig. 3 is a structural block diagram of the device for automatically synchronizing external subtitles according to the second embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader better understand the present application; however, the technical solutions claimed in the claims of the present application can be realized even without these details and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a method for automatically synchronizing external subtitles. The specific flow, shown in Fig. 1, comprises the following steps:

Step 10: extract the audio part of the video file and decode it to obtain pulse code modulation data.

A video file is a composite of a video stream and an audio stream. When a video is played online, the audio stream is first extracted from the video file. The open-source library ffmpeg can be used to extract the audio part, which is then decoded into PCM (Pulse Code Modulation) data by the corresponding decoder.
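As a concrete illustration, audio extraction and PCM decoding can be done in one pass with the ffmpeg command-line tool. This is a minimal sketch, not the patent's implementation; the exact flags, sample rate, and file names are assumptions.

```python
# Hypothetical sketch: use the ffmpeg CLI to drop the video stream and decode
# the audio stream to raw signed 16-bit little-endian PCM in a single pass.
import subprocess

def build_ffmpeg_pcm_command(video_path, pcm_path, sample_rate=16000):
    """Return an ffmpeg command that decodes the audio stream to raw PCM."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-f", "s16le",            # raw PCM, signed 16-bit little-endian
        pcm_path,
    ]

def extract_pcm(video_path, pcm_path):
    """Run ffmpeg to produce the PCM file (requires ffmpeg on PATH)."""
    subprocess.run(build_ffmpeg_pcm_command(video_path, pcm_path), check=True)
```

In a player, the same decode would more likely happen in-process through the libavcodec API; the CLI form is shown only because it is compact.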

Step 11: split the pulse code modulation data into audio segments and classify the audio segments.

In this embodiment, the Marsyas software can be used to classify the extracted audio (the pulse code modulation data); for example, Marsyas can label the audio data as silence, speech, or non-speech. Through the interface provided by Marsyas, the audio frame length can be set to 32 ms, with 5 audio frames forming one audio segment, i.e., a segment length of 0.16 s. During classification, the audio segment can be used as the unit of classification to improve efficiency. This embodiment places no specific restriction on the classification method, as long as speech can be distinguished from non-speech. The classification in this step yields the start and end times of the speech portions of the audio, laying the foundation for extracting spoken sentences from it.
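The framing scheme above (32 ms frames, 5 frames per 0.16 s segment) can be sketched as follows, with a toy energy-threshold classifier standing in for Marsyas. The thresholds and function names are illustrative assumptions; a real classifier would use spectral features, as a bare energy threshold cannot truly separate speech from music.

```python
# Framing constants from the description: 32 ms frames, 5 frames per segment.
FRAME_MS = 32
FRAMES_PER_SEGMENT = 5
SEGMENT_MS = FRAME_MS * FRAMES_PER_SEGMENT  # 160 ms per segment

def classify_segment(samples, silence_thresh=100.0, speech_thresh=1000.0):
    """Label one PCM segment as 'silence', 'speech', or 'non-speech'.
    Toy stand-in: mean energy vs. two assumed thresholds."""
    if not samples:
        return "silence"
    energy = sum(s * s for s in samples) / len(samples)
    if energy < silence_thresh:
        return "silence"
    # Marsyas-style classification would inspect spectral features here.
    return "speech" if energy < speech_thresh else "non-speech"

def segment_times(num_segments):
    """Start time (in seconds) of each 0.16 s segment."""
    return [i * SEGMENT_MS / 1000.0 for i in range(num_segments)]
```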

Step 12: divide the audio segments classified as speech into short sentences and determine the start time and end time of each short sentence. The classification in step 11 determines the start and end times of speech, non-speech, silence, etc., so the speech can then be divided into short sentences according to speech pauses.

In this embodiment, detecting the start and end of a sentence is the key to short-sentence division: only with sufficiently accurate endpoint detection can the length and number of sentences be controlled purposefully. Based on the classification information obtained in step 11, this step extracts speech units (i.e., short sentences) from the audio with a preset segmentation algorithm. Specifically, the following strategy can be used: the time point of the silence or non-speech segment immediately before a continuous speech run is taken as the start time of a sentence, and the time point of the last speech segment at the end of the run is taken as its end time. Splitting the audio this way yields semantically fairly complete "sentence-like" units bounded by speech pauses of a certain length, i.e., the short sentences of this embodiment.

However, detecting sentence endpoints with the above strategy can produce some extreme cases: for example, extremely short sentences only one or two audio segments long. Such sentences usually contain only one or two words, or even no valid speech information at all, so they must be filtered out rather than displayed as valid subtitle sentences.

To improve segmentation efficiency, the segmentation strategy requires a speech pause to comprise at least a first preset number of audio segments; preferably, the first preset number is, for example, 2. Setting a minimum pause length allows brief accompanying sounds, such as a speaker's momentary intake of breath, to be ignored, protecting the integrity of a sentence.

Further, each extracted short sentence comprises at least a second preset number of audio segments; preferably, the second preset number can be, for example, 3, i.e., speech units with a total length of less than 0.48 seconds are ignored. Limiting the minimum sentence length filters out short-lived invalid information in the audio, such as a speaker's slight cough.

It should be understood that this embodiment places no restriction on the specific values of the first or second preset number; in practice, they can be adjusted to the characteristics of the language so as to determine the start and end times of sentence units more accurately and efficiently.

Through step 12, the extracted audio is split into individual sentences whose start and end times are known, from which the playback duration of each sentence can be determined.
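The pause-based splitting described above can be sketched as follows, under the stated assumptions: one label per 0.16 s segment, a pause of at least 2 segments ends a sentence, and sentences shorter than 3 segments (0.48 s) are discarded. Names and data layout are illustrative.

```python
SEGMENT_SEC = 0.16
MIN_PAUSE_SEGMENTS = 2     # shorter gaps (e.g. a quick breath) are ignored
MIN_SENTENCE_SEGMENTS = 3  # sentences under 0.48 s are filtered out

def split_sentences(labels):
    """labels: list of 'speech' / 'silence' / 'non-speech', one per segment.
    Returns (start_sec, end_sec) pairs for each detected short sentence."""
    sentences = []
    start = None   # index of first segment of the current sentence
    gap = 0        # consecutive non-speech segments since the last speech one
    # pad with silence so a sentence at the very end is flushed too
    for i, label in enumerate(labels + ["silence"] * MIN_PAUSE_SEGMENTS):
        if label == "speech":
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= MIN_PAUSE_SEGMENTS:  # pause long enough: close sentence
                end = i - gap + 1          # one past the last speech segment
                if end - start >= MIN_SENTENCE_SEGMENTS:
                    sentences.append((start * SEGMENT_SEC, end * SEGMENT_SEC))
                start, gap = None, 0
    return sentences
```

A one-segment gap inside a run does not split the sentence, matching the minimum-pause rule, and a two-segment run of speech is dropped, matching the minimum-sentence rule.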

Step 13: search for a matching item in the external subtitle file according to the determined start time and end time of the short sentence.

An external subtitle file usually contains information such as start times and dialogue durations. In this embodiment, the external subtitle file is obtained at playback time and a data structure datastruct1 of <start time, dialogue duration> pairs is created from it, so that the start time and duration of each line of dialogue can be looked up conveniently. A matching item is then sought in datastruct1 according to the start and end times of the short sentences (i.e., the dialogue in the video) obtained in step 12.

Specifically, step 13 comprises the following sub-steps:

Sub-step 130: find candidate items in the external subtitle file within a preset duration before and after the start time.

Ideally, the start and end times of each line of dialogue in the audio (the short sentences of this embodiment) would coincide with those of the corresponding item in the subtitle file. Owing to how subtitle files are produced in the prior art, however, the start and end times of items in the subtitle file deviate from those of the dialogue in the audio. This step therefore looks for candidate items in the external subtitles within a preset duration (the largest plausible difference between a subtitle item's start time and the audio dialogue's start time). In this embodiment the preset duration can be 1 minute, i.e., candidate items are sought in the external subtitles within 1 minute before and after the start time of the short sentence extracted from the video file. It should be understood that the preset duration can be set according to the actual characteristics of the subtitle file; this embodiment places no restriction on its specific size.

Sub-step 131: among the candidates found, select all items whose dialogue duration differs from that of the short sentence by no more than an allowed error.

For example, within 1 minute before and after the start time of the short sentence, all items in datastruct1 whose dialogue duration is within 3 seconds' error of the short sentence's duration are selected. Say the short sentence's dialogue lasts 4 seconds: if, within that 1-minute window, 3 subtitle items with dialogue durations between 2.5 and 5.5 seconds are found, those 3 items are selected. It should be understood that the specific numerical values for the allowed error are given only for ease of understanding and do not limit the protection scope of the present invention.

Sub-step 132: determine whether more than one item was selected. If exactly one item was selected, it is taken as the match for the audio, and execution continues with step 14. If more than one item was selected, the closest match must be filtered out further, so execution continues with sub-step 133.

Sub-step 133: compare the record preceding the determined short sentence with the record preceding each selected item until the most similar item is found and taken as the match.

An example, shown in Fig. 2: suppose that in sub-step 131 short sentence P matches 3 subtitle items in datastruct1 (subtitle items A, B, and C). The record preceding P, short sentence P-1, is then compared against the items preceding A, B, and C (subtitle items A-1, B-1, and C-1, respectively); the matching algorithm can compare start times, dialogue durations, and so on. If P-1 still matches two or more subtitle items, the record preceding P-1, short sentence P-2, is compared against the records preceding those remaining items, and so on, until the subtitle item matching the short sentence is found.
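Sub-steps 130-133 can be sketched as follows. The (start, duration) pairs stand in for datastruct1; the window, duration tolerance (1.5 s, matching the 2.5-5.5 s example above), and the duration-based similarity used for tie-breaking are illustrative assumptions, since the patent leaves the exact similarity measure open.

```python
WINDOW_SEC = 60.0   # search +/- 1 minute around the sentence start time
DURATION_TOL = 1.5  # allowed dialogue-duration mismatch, in seconds

def find_match(sentences, idx, subtitle_items):
    """sentences: list of (start, end) pairs; subtitle_items: list of
    (start, duration) pairs built from the external subtitle file.
    Returns the index of the best-matching subtitle item, or None."""
    start, end = sentences[idx]
    duration = end - start
    # Sub-steps 130/131: filter by time window, then by duration error.
    candidates = [i for i, (s, d) in enumerate(subtitle_items)
                  if abs(s - start) <= WINDOW_SEC
                  and abs(d - duration) <= DURATION_TOL]
    # Sub-step 132: a unique candidate is the match.
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Sub-step 133: break ties by comparing the previous sentence with each
    # candidate's previous subtitle item (duration similarity as a stand-in).
    if idx == 0:
        return candidates[0]
    prev_start, prev_end = sentences[idx - 1]
    prev_duration = prev_end - prev_start

    def prev_mismatch(i):
        if i == 0:
            return float("inf")  # no previous item to compare against
        return abs(subtitle_items[i - 1][1] - prev_duration)

    return min(candidates, key=prev_mismatch)
```

A fuller implementation would recurse further back (P-2, P-3, ...) when the previous records still tie, exactly as the Fig. 2 example describes.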

Step 14: change the start time of the matching item to the playback timestamp (PTS) of the current video and, according to that playback timestamp, update the start time of every item in the external subtitle file whose start time is later than the start time of the matching item.

Specifically, the start time of the matching item is first changed to the playback timestamp of the current video, the PTS (Presentation Time Stamp), and the start time of every item in the external subtitle file whose start time is later than that of the matching item can be updated with the following formula:

start_time_2 = start_time_1 - (item.start_time - video.pts)

Here, item.start_time is the start time of the current matching item and video.pts is the time of the current video frame, so (item.start_time - video.pts) is the time offset between the current matching item and the audio/video. start_time_1 is the start time of a subtitle item in datastruct1 before correction, and start_time_2 is its start time after correction.
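The step-14 update can be sketched directly from the formula above; representing subtitle items as (start, duration) pairs is an assumption.

```python
def resync(subtitle_items, match_idx, video_pts):
    """Shift the matched item's start to the current PTS and apply the same
    offset to every item whose start time is not earlier than the match's."""
    match_start = subtitle_items[match_idx][0]
    offset = match_start - video_pts  # (item.start_time - video.pts)
    return [(s - offset, d) if s >= match_start else (s, d)
            for s, d in subtitle_items]
```

Items before the matched one are left untouched, mirroring the rule that only items with later start times are updated.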

This embodiment can be embedded in playback software. During playback, it is executed at the start of the video and at predetermined intervals thereafter (for example, every 10 minutes): audio data of a certain duration is acquired and decoded into pulse code modulation data; that audio is classified and split into short sentences; the matching item for a short sentence is found in the subtitle file; and the matching item, together with every subtitle whose playback time falls after it, is updated. Alternatively, the start times of all dialogue in the audio data can be matched, so that the external subtitles are fully synchronized with the audio and video for an even better viewing experience.

Compared with the prior art, this embodiment extracts the audio part of the video file and decodes it into pulse code modulation data, so that the speech information in the audio can be analyzed. The data is split into audio segments, which can be classified as speech, silence, or non-speech; the segments classified as speech are further divided into short sentences, whose start and end times are determined relative to the playback timestamp (PTS) of the current video frame. A matching item is then sought in the external subtitle file according to the determined start and end times; its start time is changed to the current video's PTS, and, according to that playback timestamp, the start time of every item in the subtitle file whose start time is later than that of the matching item is updated. Through these steps, this embodiment automatically corrects the display time of subtitle items according to the dialogue times, making subtitle display consistent with audio/video playback, so that external subtitles are synchronized automatically, yielding a better viewing effect and an improved user experience.

The division of steps in the methods above is only for clarity of description; in an implementation, steps may be merged into one or split into several. As long as the same logical relationship is preserved, such variants fall within the scope of this patent. Likewise, adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design remains within the scope of this patent.

The second embodiment of the present invention relates to an automatic synchronization device for external subtitles. As shown in FIG. 3, it comprises an extraction module, a segmentation module, a division module, a search module, and an update module.

The extraction module extracts the audio part of a video file and decodes it to obtain pulse-code-modulation (PCM) data.

The segmentation module cuts the PCM data into audio segments and classifies them; the classification categories are silence, speech, and non-speech.
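The classification step can be sketched with simple short-time features. The patent does not specify which features or thresholds the segmentation module uses, so the energy/zero-crossing heuristic below, along with `silence_thresh` and `zcr_thresh`, are purely illustrative assumptions:

```python
def classify_segment(samples, silence_thresh=500, zcr_thresh=0.25):
    """Label one PCM audio segment as 'silence', 'speech' or 'non-speech'.

    samples -- a list of signed 16-bit mono PCM sample values.
    """
    if not samples:
        return "silence"
    # Short-time energy: mean absolute amplitude of the segment.
    energy = sum(abs(s) for s in samples) / len(samples)
    if energy < silence_thresh:
        return "silence"
    # Zero-crossing rate: voiced speech tends to have a moderate ZCR,
    # while noise or music ("non-speech") tends to be higher.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / len(samples)
    return "speech" if zcr < zcr_thresh else "non-speech"
```

A real implementation would tune these thresholds (or use a trained classifier) against the actual audio material.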

The division module divides the audio segments classified as speech into short sentences and determines the start time and end time of each short sentence. Specifically, the division module divides sentences at speech pauses, where a speech pause contains at least a first preset number of audio segments and each short sentence contains at least a second preset number of audio segments. It should be understood that the first and second preset numbers, and the segment durations they correspond to, can be set according to the characteristics of the audio data and the subtitle file; this embodiment does not limit their specific values.
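The pause-based division can be sketched as follows. This is an illustrative pure-Python sketch, not the patent's implementation; the defaults `min_pause=2` and `min_len=3` follow the example preset numbers given in the dependent claims:

```python
def split_sentences(labels, min_pause=2, min_len=3):
    """Group a sequence of per-segment labels into short sentences.

    labels    -- list of 'speech'/'silence'/'non-speech' labels, one per segment
    min_pause -- a pause must span at least this many non-speech segments
    min_len   -- a sentence must span at least this many segments
    Returns a list of (start_index, end_index) pairs, end exclusive.
    """
    sentences, start, gap = [], None, 0
    for i, lab in enumerate(labels):
        if lab == "speech":
            if start is None:
                start = i          # a continuous speech run begins here
            gap = 0                # short embedded pauses do not split
        elif start is not None:
            gap += 1
            if gap >= min_pause:   # pause long enough: close the sentence
                end = i - gap + 1
                if end - start >= min_len:
                    sentences.append((start, end))
                start, gap = None, 0
    if start is not None and len(labels) - start >= min_len:
        sentences.append((start, len(labels)))
    return sentences
```

Multiplying the returned segment indices by the segment duration, offset from the current video frame's PTS, would give the start and end times the module reports.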

The search module further comprises an initial-matching submodule, a dialogue-matching submodule, and a comparison-matching submodule, which together search the external subtitle file for an entry matching the determined start time and end time of a short sentence. The initial-matching submodule finds the entries in the external subtitle file whose start time lies within a preset duration before or after the detected start time. The dialogue-matching submodule then keeps, among those entries, all entries whose dialogue duration matches that of the short sentence within the allowed error. If more than one entry remains, the comparison-matching submodule compares the previous record of the determined short sentence with the previous record of each remaining entry until the most similar one is found as the match.
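The three-stage search can be sketched as below. The parameters `window` and `tol` are hypothetical stand-ins for the "preset duration" and "allowed error", and the previous-record comparison of the third stage is simplified here to a duration comparison; the patent does not define the similarity measure:

```python
def find_match(subtitles, start, duration, window=10.0, tol=0.5, history=None):
    """Three-stage lookup mirroring the search module's sub-modules.

    subtitles -- list of dicts with 'start' and 'end' times in seconds
    start     -- detected start time of the spoken short sentence
    duration  -- detected duration of the short sentence
    history   -- duration of the previous detected sentence, used as a
                 tie-breaker when several candidates remain
    Returns the index of the matched entry, or None.
    """
    # Stage 1: entries whose start time lies within the preset window.
    cand = [i for i, s in enumerate(subtitles)
            if abs(s["start"] - start) <= window]
    # Stage 2: keep entries whose dialogue duration is within the error.
    cand = [i for i in cand
            if abs((subtitles[i]["end"] - subtitles[i]["start"]) - duration) <= tol]
    if not cand:
        return None
    if len(cand) == 1 or history is None:
        return cand[0]
    # Stage 3: break ties by comparing each candidate's previous record.
    def prev_duration(i):
        if i == 0:
            return float("inf")
        p = subtitles[i - 1]
        return p["end"] - p["start"]
    return min(cand, key=lambda i: abs(prev_duration(i) - history))
```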

The update module changes the start time of the matching entry to the presentation timestamp (PTS) of the current video and, according to that playback timestamp, updates the start time of every entry in the external subtitle file whose start time is later than that of the matching entry.
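A minimal sketch of the update step, assuming subtitle entries are dictionaries with `start`/`end` times in seconds (the patent does not specify the in-memory representation):

```python
def resync(subtitles, match_idx, pts):
    """Shift the matched entry to the current PTS and propagate the same
    offset to every later entry, as the update module does."""
    anchor = subtitles[match_idx]["start"]  # capture before mutating
    offset = pts - anchor
    for item in subtitles:
        # Shift the matching entry and every entry that starts at or
        # after it; earlier entries are already past and left alone.
        if item["start"] >= anchor:
            item["start"] += offset
            item["end"] += offset
    return subtitles
```

Applying a single offset to all later entries assumes a constant subtitle lag; re-running the match at the predetermined interval, as described above, would correct any drift.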

Compared with the prior art, this embodiment extracts the audio data from the video file, classifies it, and cuts it into sentences, thereby obtaining precise sentence start and end times. Based on these, a matching entry is found in the subtitle file and its start time is modified accordingly, so that the subtitle file is synchronized with the audio and video. The user therefore no longer needs to adjust the external subtitles manually: they are synchronized with the audio and video automatically, yielding a better viewing experience.

It is easy to see that this embodiment is the device embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The technical details described in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; conversely, the technical details described in this embodiment also apply to the first embodiment.

It is worth mentioning that the modules involved in this embodiment are logical modules. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

Those of ordinary skill in the art will understand that the embodiments above are specific examples of carrying out the present invention and that, in practice, various changes may be made to them in form and detail without departing from the spirit and scope of the invention.

Claims (11)

1. An automatic synchronization method for external subtitles, characterized by comprising the steps of:
extracting the audio part of a video file and decoding the audio part to obtain pulse code modulation data;
cutting the pulse code modulation data into audio segments and classifying the audio segments, wherein the classification categories comprise silence, speech and non-speech;
dividing the audio segments classified as speech into short sentences, and determining the start time and end time of the short sentences;
searching an external subtitle file for a matching entry according to the determined start time and end time of a short sentence; and
changing the start time of the matching entry to the presentation timestamp PTS of the current video, and, according to the presentation timestamp, updating the start time of every entry in the external subtitle file whose start time is greater than the start time of the matching entry.
2. The automatic synchronization method for external subtitles according to claim 1, characterized in that the step of searching an external subtitle file for a matching entry according to the determined start time and end time of a short sentence comprises the sub-steps of:
finding corresponding entries in the external subtitle file within a preset duration before and after the start time;
finding, among the corresponding entries, all entries whose dialogue duration is within an allowed error of that of the short sentence; and
if more than one entry is found, comparing the previous record of the determined short sentence with the previous record of each found entry until the most similar one is found as the matching entry.
3. The automatic synchronization method for external subtitles according to claim 1 or 2, characterized in that in the step of dividing the audio segments into short sentences, the division is performed according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments.
4. The automatic synchronization method for external subtitles according to claim 3, characterized in that the first preset number is 2.
5. The automatic synchronization method for external subtitles according to claim 3, characterized in that a short sentence comprises at least a second preset number of audio segments.
6. The automatic synchronization method for external subtitles according to claim 5, characterized in that the second preset number is 3.
7. The automatic synchronization method for external subtitles according to claim 1, characterized in that in the step of determining the start time and end time of the short sentences, the time point of the silence segment or non-speech segment immediately preceding a continuous speech section is taken as the start time of a sentence, and the time point of the last speech segment ending the continuous speech section is taken as the end time of the sentence.
8. An automatic synchronization device for external subtitles, characterized by comprising an extraction module, a segmentation module, a division module, a search module and an update module, wherein:
the extraction module is configured to extract the audio part of a video file and decode the audio part to obtain pulse code modulation data;
the segmentation module is configured to cut the pulse code modulation data into audio segments and classify the audio segments, wherein the classification categories comprise silence, speech and non-speech;
the division module is configured to divide the audio segments classified as speech into short sentences and determine the start time and end time of the short sentences;
the search module is configured to search an external subtitle file for a matching entry according to the determined start time and end time of a short sentence; and
the update module is configured to change the start time of the matching entry to the presentation timestamp PTS of the current video and, according to the presentation timestamp, update the start time of every entry in the external subtitle file whose start time is greater than the start time of the matching entry.
9. The automatic synchronization device for external subtitles according to claim 8, characterized in that the search module comprises an initial matching submodule, a dialogue matching submodule and a comparison matching submodule, wherein:
the initial matching submodule is configured to find corresponding entries in the external subtitle file within a preset duration before and after the start time;
the dialogue matching submodule is configured to find, among the corresponding entries found by the initial matching submodule, all entries whose dialogue duration is within an allowed error of that of the short sentence; and
the comparison matching submodule is configured to, when the dialogue matching submodule finds more than one entry, compare the previous record of the determined short sentence with the previous record of each found entry until the most similar one is found as the matching entry.
10. The automatic synchronization device for external subtitles according to claim 8 or 9, characterized in that the division module is further configured to perform the division according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments.
11. The automatic synchronization device for external subtitles according to claim 10, characterized in that the division module is further configured to divide the audio segments into short sentences comprising at least a second preset number of audio segments.
CN201511018280.XA 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device Pending CN105898556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511018280.XA CN105898556A (en) 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device

Publications (1)

Publication Number Publication Date
CN105898556A true CN105898556A (en) 2016-08-24

Family

ID=57002208

Country Status (1)

Country Link
CN (1) CN105898556A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
US20090213924A1 (en) * 2008-02-22 2009-08-27 Sheng-Nan Sun Method and Related Device for Converting Transport Stream into File
CN103647909A (en) * 2013-12-16 2014-03-19 宇龙计算机通信科技(深圳)有限公司 Caption adjusting method and caption adjusting device

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504773A (en) * 2016-11-08 2017-03-15 上海贝生医疗设备有限公司 A kind of wearable device and voice and activities monitoring system
CN109413475A (en) * 2017-05-09 2019-03-01 北京嘀嘀无限科技发展有限公司 Method of adjustment, device and the server of subtitle in a kind of video
CN109005444A (en) * 2017-06-07 2018-12-14 纳宝株式会社 Content providing server, content providing terminal and content providing
CN107562737A (en) * 2017-09-05 2018-01-09 语联网(武汉)信息技术有限公司 A kind of methods of video segmentation and its system for being used to translate
CN107402530A (en) * 2017-09-20 2017-11-28 淮安市维达科技有限公司 Control system of one computer using lines captions as core coordination linkage stage equipment
US11538456B2 (en) 2017-11-06 2022-12-27 Tencent Technology (Shenzhen) Company Limited Audio file processing method, electronic device, and storage medium
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
WO2019086044A1 (en) * 2017-11-06 2019-05-09 腾讯科技(深圳)有限公司 Audio file processing method, electronic device and storage medium
CN108924664A (en) * 2018-07-26 2018-11-30 青岛海信电器股份有限公司 A kind of synchronous display method and terminal of program credits
CN108924664B (en) * 2018-07-26 2021-06-08 海信视像科技股份有限公司 Synchronous display method and terminal for program subtitles
CN110781649A (en) * 2019-10-30 2020-02-11 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN110781649B (en) * 2019-10-30 2023-09-15 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN111050201A (en) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Data processing method, device, electronic device and storage medium
CN111050201B (en) * 2019-12-10 2022-06-14 Oppo广东移动通信有限公司 Data processing method, device, electronic device and storage medium
WO2023015416A1 (en) * 2021-08-09 2023-02-16 深圳Tcl新技术有限公司 Subtitle processing method and apparatus, and storage medium
CN113992940B (en) * 2021-12-27 2022-03-29 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN113992940A (en) * 2021-12-27 2022-01-28 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN114640874A (en) * 2022-03-09 2022-06-17 湖南国科微电子股份有限公司 Subtitle synchronization method, device, set-top box, and computer-readable storage medium
WO2023169240A1 (en) * 2022-03-09 2023-09-14 湖南国科微电子股份有限公司 Subtitle synchronization method and apparatus, set-top box and computer readable storage medium
CN118158464A (en) * 2024-04-10 2024-06-07 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105898556A (en) Plug-in subtitle automatic synchronization method and device
US11887578B2 (en) Automatic dubbing method and apparatus
US8281231B2 (en) Timeline alignment for closed-caption text using speech recognition transcripts
CN103226947B (en) Audio processing method and device based on mobile terminal
US8958013B2 (en) Aligning video clips to closed caption files
CN108604455B (en) Automatic determination of timing window for speech captions in an audio stream
US9305552B2 (en) Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment
US11190855B2 (en) Automatic generation of descriptive video service tracks
US9418650B2 (en) Training speech recognition using captions
US20080195386A1 (en) Method and a Device For Performing an Automatic Dubbing on a Multimedia Signal
CN105979347A (en) Video play method and device
CN117201889A (en) Automatic speech translation dubbing of pre-recorded video
US8564721B1 (en) Timeline alignment and coordination for closed-caption text using speech recognition transcripts
KR20150057591A (en) Method and apparatus for controlling playing video
CN105635782A (en) Subtitle output method and device
Federico et al. An automatic caption alignment mechanism for off-the-shelf speech recognition technologies
US9020817B2 (en) Using speech to text for detecting commercials and aligning edited episodes with transcripts
WO2013043984A1 (en) Systems and methods for extracting and processing intelligent structured data from media files
US9905221B2 (en) Automatic generation of a database for speech recognition from video captions
KR101618777B1 (en) A server and method for extracting text after uploading a file to synchronize between video and audio
Bordel et al. Automatic Subtitling of the Basque Parliament Plenary Sessions Videos.
Mocanu et al. Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people
CN112218142A (en) Method and device for separating voice from video with subtitles, storage medium and electronic equipment
JP2006510304A (en) Method and apparatus for selectable rate playback without speech distortion
WO2004100164A1 (en) Voice script system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160824

WD01 Invention patent application deemed withdrawn after publication