
WO2017054488A1 - Television playback control method, server and television playback control system - Google Patents

Television playback control method, server and television playback control system

Info

Publication number
WO2017054488A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
audio
data
television terminal
time segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/084461
Other languages
English (en)
Chinese (zh)
Inventor
戚炎兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Publication of WO2017054488A1
Legal status: Ceased

Classifications

    • H04N21/233: Processing of audio elementary streams
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/485: End-user interface for client configuration
    • H04N21/4852: End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • H04N21/4856: End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • The present invention relates to the field of television technologies, and in particular to a television broadcast control method, a server, and a television broadcast control system.
  • Most video files provide only one audio track, while two or more subtitle tracks are provided.
  • The user can only listen to the default audio track provided in the video file; when the user does not understand the default language, he or she can follow the character dialogue and plot only by reading the subtitles, which degrades the audiovisual experience.
  • The main objective of the present invention is to provide a television broadcast control method, a server, and a television broadcast control system designed to supply, according to the language requirements of different users, audio that the user can understand, avoiding the drawback of relying on subtitles to follow the character dialogue and plot, and thereby improving the user's television viewing experience.
  • The present invention provides a television broadcast control method that includes the following steps: the server receives first audio data and subtitle data sent by a television terminal; recognition processing is performed on the first audio data and the subtitle data to generate a role list and sample audio parameters; the role list and the sample audio parameters are sent to the television terminal and, upon receiving user setting parameters fed back by the television terminal according to the role list and the sample audio parameters, the first audio data is synthesized into second audio data; and the second audio data is sent to the television terminal to control the second audio data and the subtitle data to be played at the television terminal.
  • the present invention further provides a server, where the server includes:
  • a first receiving module configured to receive first audio data and subtitle data sent by the television terminal;
  • a generating processing module configured to perform recognition processing on the first audio data and the subtitle data and to generate a role list and sample audio parameters;
  • a synthesis processing module configured to send the role list and sample audio parameters to the television terminal and, when receiving the user setting parameters reported by the television terminal according to the role list and the sample audio parameters, to synthesize the first audio data into the second audio data;
  • a first sending module configured to send the second audio data to the television terminal to control the second audio data and the subtitle data to be played at the television terminal.
  • the present invention further provides a television broadcast control system, the television broadcast control system comprising a television terminal and a server as described above, the television terminal comprising:
  • a second sending module configured to send first audio data and subtitle data to the server;
  • a second receiving module configured to receive the role list and sample audio parameters generated by the server after recognition processing of the first audio data and the subtitle data;
  • a feedback module configured to generate user setting parameters according to the role list and the sample audio parameters and to feed the user setting parameters back to the server;
  • an acquiring module configured to acquire the second audio data that the server synthesizes from the first audio data upon receiving the user setting parameters;
  • a synchronous play module configured to synchronously play the second audio data, the video data, and the subtitle data.
  • the television terminal extracts the video data, the first audio data, and the subtitle data from a video file.
  • In the television broadcast control method, server, and television broadcast control system provided by the present invention, the server first receives the first audio data and subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and the sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal to control playback of the second audio data and the subtitle data at the television terminal.
  • In this way, audio the user can understand is provided according to the language requirements of different users, and the user's personalized preferences for character dialogue are satisfied; this avoids the drawback of following the character dialogue and plot only through subtitles and improves the user's television viewing experience.
  • FIG. 1 is a schematic flow chart of an embodiment of a television broadcast control method according to the present invention.
  • FIG. 2 is a schematic diagram of the refinement of the step in FIG. 1 of performing recognition processing on the first audio data and generating a role list and sample audio parameters;
  • FIG. 3 is a waveform diagram of a subtitle timestamp and the first audio data;
  • FIG. 4 is a schematic diagram of the refinement of the step in FIG. 2 of performing spectrum analysis on the first audio data in the time segment and performing classification to generate a role list;
  • FIG. 5 is a schematic diagram of the refinement of the step in FIG. 2 of generating sample audio parameters corresponding to the role list by using speech synthesis technology;
  • FIG. 6 is a schematic diagram of the refinement of the step in FIG. 1 of sending the role list and sample audio parameters to the television terminal and, when receiving the user setting parameters fed back by the television terminal, synthesizing the first audio data into the second audio data according to the user setting parameters;
  • FIG. 7 is a schematic diagram of a synthesized waveform of the second audio data;
  • FIG. 8 is a schematic diagram of functional modules of an embodiment of a server according to the present invention.
  • FIG. 9 is a schematic diagram of a refinement function module of the generation processing module in FIG. 8;
  • FIG. 10 is a schematic diagram of a refinement function module of the categorizing unit of FIG. 9;
  • FIG. 11 is a schematic diagram of a refinement function module of the generating unit in FIG. 9;
  • FIG. 12 is a schematic diagram of a refinement function module of the synthesis processing module of FIG. 8;
  • FIG. 13 is a schematic diagram of functional modules of an embodiment of a television broadcast control system according to the present invention.
  • FIG. 14 is a schematic diagram of the refinement function modules of the television terminal of FIG. 13.
  • Referring to FIG. 1, in an embodiment the television broadcast control method applied to the television terminal includes the following steps:
  • Step S10 the server receives the first audio data and the caption data sent by the television terminal;
  • The audio and video playback of the television is completed by the television terminal in cooperation with the server.
  • The television terminal completes the collation and transmission of audio, subtitle, and other data, and provides a user interface for the user to set parameters.
  • The server receives the audio, subtitle, and other data sent by the television terminal and completes the processing of the audio and subtitle data, so as to synthesize the audio data and transmit it to the television terminal for playback.
  • When the user turns on the dubbing setting function through the remote controller of the television terminal, the television terminal quickly decodes the video file, extracts the audio track selected by the user (or the default audio track) and the subtitle track selected by the user (or the default subtitle track), packages the audio data and the subtitle data, and sends them to the server, as sketched below.
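  • As a rough illustration of this terminal-side step, the following minimal sketch extracts one audio track and one subtitle track and uploads them; the server URL, file names, and stream indices are hypothetical and assume ffmpeg is installed, none of which is specified by the patent.
```python
# Minimal sketch of the terminal-side packaging step (assumes ffmpeg is
# installed; the endpoint URL, file names, and stream indices are examples).
import subprocess
import requests

VIDEO = "movie.mkv"
SERVER_URL = "http://dubbing-server.example/api/upload"  # hypothetical

# Extract the user-selected (or default) audio and subtitle tracks.
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-map", "0:a:0", "first_audio.wav"],
    check=True)
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-map", "0:s:0", "subtitles.srt"],
    check=True)

# Package both tracks and send them to the server for recognition processing.
with open("first_audio.wav", "rb") as audio, open("subtitles.srt", "rb") as subs:
    response = requests.post(SERVER_URL, files={"audio": audio, "subtitles": subs})
response.raise_for_status()
```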
  • Step S20 performing recognition processing on the first audio data and the subtitle data, and generating a role list and sample audio parameters;
  • The server performs recognition processing on the first audio data and the subtitle data to generate the role list and sample audio parameters.
  • To generate the role list, a predetermined number of timestamps may first be selected, for example three; the audio data within those timestamps is then identified and analyzed separately, and voices that sound similar across timestamps are classified into one type, namely role 1, role 2, and so on.
  • The specific classification may distinguish voices statistically according to the audio spectrum.
  • The sample audio is preset, fixed audio; it can include audio of different genders and sample audio covering different frequency ranges for each gender, such as male high-pitch, male mid-range, male low-pitch, female high-pitch, female mid-range, and female low-pitch audio.
  • The frequency ranges can be subdivided further and are not limited to the high, middle, and low ranges of this embodiment.
  • The sample audio can also be the voice of a famous or professional dubbing artist.
  • Step S30 sending the role list and the sample audio parameters to the television terminal, and synthesizing the first audio data into the second audio data when receiving the user setting parameter fed back by the television terminal;
  • The server sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen through which the user selects entries from the role list and the sample audio parameters to generate the user setting parameters, and the television terminal then feeds the user setting parameters back to the server.
  • The server synthesizes the first audio data into the second audio data according to the user setting parameters; this synthesis requires a text-to-speech engine and a vocal elimination program.
  • Specifically, the text-to-speech engine generates new audio data corresponding to the subtitle data (according to the user setting parameters), the vocal elimination program removes the vocals from the first audio data, and the new audio data and the vocal-removed first audio data are then combined into the second audio data, as sketched below.
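  • A minimal sketch of this combination step follows, assuming the vocal-reduced track and one TTS-generated line already exist as WAV files; the simple additive mix and the file names are assumptions, and a real engine would place each generated line at its subtitle timestamp.
```python
# Minimal sketch: mix a TTS-generated dialogue line into the vocal-reduced
# track at a given offset. File names and the additive mix are assumptions.
import numpy as np
from scipy.io import wavfile

def mix_tracks(background_path, dialogue_path, out_path, offset_s=0.0):
    rate_bg, bg = wavfile.read(background_path)
    rate_dl, dl = wavfile.read(dialogue_path)
    assert rate_bg == rate_dl, "resample first if the sample rates differ"
    bg = bg.astype(np.float64)
    dl = dl.astype(np.float64)
    if bg.ndim > 1:                       # fold to mono for this sketch
        bg = bg.mean(axis=1)
    if dl.ndim > 1:
        dl = dl.mean(axis=1)
    start = int(offset_s * rate_bg)       # subtitle timestamp of the line
    mixed = bg.copy()
    end = min(len(mixed), start + len(dl))
    mixed[start:end] += dl[: end - start]
    # Normalize to the 16-bit range to avoid clipping, then write PCM output.
    mixed *= 32767.0 / max(np.max(np.abs(mixed)), 1.0)
    wavfile.write(out_path, rate_bg, mixed.astype(np.int16))

mix_tracks("background.wav", "line_0.wav", "second_audio.wav", offset_s=1.5)
```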
  • Step S40 Send the second audio data to the television terminal to control the second audio data and the subtitle data to be played in the television terminal.
  • The server sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to the audio data and subtitle data, the television terminal also extracts the video data from the video file; upon receiving the second audio data, the television terminal therefore synchronizes the video data with the second audio data and the subtitle data and finally plays them.
  • The television broadcast control method provided by the present invention first receives, at the server, the first audio data and subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and the sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal to control playback of the second audio data and the subtitle data at the television terminal.
  • In this way, audio the user can understand is provided according to the language requirements of different users, and the user's personalized preferences for character dialogue are satisfied; this avoids the drawback of following the character dialogue and plot only through subtitles and improves the user's television viewing experience.
  • the step S20 includes:
  • Step S201 the server extracts a subtitle timestamp from the subtitle data
  • Step S202 searching for a time segment in which the first audio data appears according to the subtitle timestamp
  • The server extracts the subtitle timestamps from the subtitle data, locates the time segments in which character dubbing appears according to those timestamps, invokes the speech recognition module to perform recognition processing, and identifies by statistics the several voices that appear most frequently in those time segments for the user to select; a timestamp-parsing sketch follows.
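  • The following minimal sketch shows one way to extract subtitle timestamps, assuming SRT-formatted subtitle data; the file name is an example, and the patent does not specify a subtitle format.
```python
# Minimal sketch: parse SRT-style subtitle data into (start, end) time
# segments in seconds, where character dubbing is expected to appear.
import re

TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def subtitle_segments(path):
    """Return the (start, end) pairs of every subtitle cue in the file."""
    segments = []
    with open(path, encoding="utf-8") as srt:
        for line in srt:
            match = TIMESTAMP.search(line)
            if match:
                g = match.groups()
                segments.append((to_seconds(*g[:4]), to_seconds(*g[4:])))
    return segments

print(subtitle_segments("subtitles.srt")[:3])   # first three time segments
```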
  • Step S203 performing spectrum analysis on the first audio data in the time segment, and performing classification to generate a role list
  • the step S203 may specifically include:
  • Step S2031 respectively acquiring first audio data in the first time segment and the second time segment
  • Step S2032 determining whether the spectrum range and the spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent;
  • The server acquires the first audio data in the first time segment and in the second time segment, analyzes the spectral range and spectral amplitude of the first audio data in each segment, and determines whether the spectral range and spectral amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.
  • Step S2033 if yes, classifying the first audio data in the first time segment and the second time segment into the same role;
  • Step S2034 if no, classifying the first audio data in the first time segment and the second time segment into different roles.
  • If the server determines that the spectra are consistent, it classifies the first audio data in the two time segments into the same role; if not, it classifies the first audio data in the two segments into different roles.
  • When judging whether the spectral range and spectral amplitude of the first audio data in two time segments are consistent, the two are determined to be consistent if their similarity is greater than or equal to 90%.
  • The similarity threshold is not limited to this embodiment and may be chosen reasonably according to actual needs.
  • The audio spectrum of one time segment is taken as a reference and defined as role 1, and is then compared with the audio spectrum in each subsequent time segment. If the features of the two spectra are judged to be close, the audio in both time segments is classified as role 1; if the features do not match, the audio in the later time segment is classified as role 2, and so on, until the audio spectrum in every time segment has been recognized. Finally, the number of occurrences of each role is counted, and the roles that occur most often are taken as the main personas.
  • This embodiment mainly analyzes the audio spectrum of the audio data within the timestamps. Each persona's pronunciation differs in spectrum: for example, the pronunciation spectrum of a male voice is concentrated mainly in the middle and low frequency regions, while that of a female voice is concentrated in the middle and high frequency regions; moreover, the spectral amplitudes at individual frequency points also differ between personas. Combining the spectral range with the spectral amplitude therefore distinguishes the voices of different personas; a classification sketch follows.
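  • A minimal sketch of this classification idea follows; the cosine similarity over magnitude spectra is a stand-in for the patent's "spectral range and spectral amplitude" comparison, the 90% threshold follows this embodiment, and everything else (feature choice, file names, segment boundaries) is an assumption.
```python
# Minimal sketch: group subtitle time segments into roles by comparing the
# magnitude spectrum of each segment against previously seen roles.
import numpy as np
from numpy.fft import rfft
from scipy.io import wavfile

def spectrum(samples):
    """Normalized magnitude spectrum of one time segment."""
    mag = np.abs(rfft(samples))
    return mag / (np.linalg.norm(mag) + 1e-12)

def similar(a, b, threshold=0.90):
    """Cosine similarity of two spectra; 90% follows this embodiment."""
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)))   # pad the shorter spectrum
    b = np.pad(b, (0, n - len(b)))
    return float(np.dot(a, b)) >= threshold

def classify_roles(segments, rate, audio):
    """Assign each (start, end) segment a role id: 1, 2, 3, ..."""
    references = []                  # (reference spectrum, role id)
    labels = []
    for start, end in segments:
        seg = audio[int(start * rate):int(end * rate)]
        spec = spectrum(seg)
        for ref, role_id in references:
            if similar(ref, spec):
                labels.append(role_id)        # matches an existing role
                break
        else:
            references.append((spec, len(references) + 1))
            labels.append(len(references))    # new role
    return labels

rate, audio = wavfile.read("first_audio.wav")
audio = audio.astype(np.float64)
if audio.ndim > 1:                   # fold stereo to mono for analysis
    audio = audio.mean(axis=1)
print(classify_roles([(1.0, 2.5), (3.0, 4.2), (5.0, 6.1)], rate, audio))
```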
  • Step S204 generating sample audio parameters corresponding to the role list by using speech synthesis technology.
  • The subtitle timestamps indicate that the audio data in the corresponding time segments contains persona audio, so speech recognition is performed on the audio data in each such time segment to recognize the audio of one of the personas.
  • The sample audio is preset, fixed audio; it may be audio of different genders and sample audio covering different frequency ranges for each gender. For example, the user may be prompted with a specific audio sample asking whether the voice of the segment should be selected as a character's dubbing, while male high-pitch, male mid-range, male low-pitch, female high-pitch, female mid-range, female low-pitch and other sample audio is provided. Of course, in other embodiments the frequency ranges may be subdivided further and are not limited to the high, middle, and low ranges of this embodiment.
  • A user interface is popped up for the user to make selections, where the role list is the result of the role classification described above, and the sample parameters refer to the timestamp parameters in each role classification together with the sample audio that the user can choose to preview.
  • The timestamp parameters allow the user to preview the original dubbing as well as the sample audio.
  • the step S204 includes:
  • Step S2041 Extract, for each role in the role list, a predetermined number of subtitle timestamps from the subtitle data;
  • Step S2042 Generate, by the text-to-speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to send to the television terminal for preview selection.
  • the user can preview the original voice of the character and the selected sample audio.
  • The television terminal transmits the corresponding parameters to the server, and the server uses the text-to-speech engine to generate a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, which are sent to the television terminal for preview selection.
  • the server provides three time stamps for each role classification, and simultaneously sends the generated sample audio to the television terminal.
  • the user may select the provided time stamp for each role categorization to play the audio of the corresponding time at the television terminal, so that the user recognizes the person represented by the role classification.
  • The user can preview and audition the sample audio produced by the text-to-speech engine in order to select and determine the appropriate sample audio parameters; a sample-generation sketch follows.
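  • As a rough illustration, the sketch below generates preview samples with the offline pyttsx3 text-to-speech engine; the engine choice, the example subtitle lines, and the speaking rate are assumptions, not the patent's own engine.
```python
# Minimal sketch: render one preview sample per extracted subtitle line
# using the offline pyttsx3 engine (steps S2041/S2042 in spirit).
import pyttsx3

engine = pyttsx3.init()

def make_sample(text, out_path, words_per_minute=150):
    """Synthesize one subtitle line to a WAV file for preview selection."""
    engine.setProperty("rate", words_per_minute)
    engine.save_to_file(text, out_path)
    engine.runAndWait()

# Example subtitle lines pulled from three timestamps of one role.
for i, line in enumerate(["How are you?", "Fine, thank you.", "Goodbye."]):
    make_sample(line, f"sample_{i}.wav")
```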
  • the step S30 includes:
  • Step S301 the generated role list and sample audio parameters are sent to the television terminal
  • Step S302 receiving user setting parameters fed back by the television terminal
  • The server sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen through which the user selects entries from the role list and the sample audio parameters to generate the user setting parameters, and the television terminal then feeds the user setting parameters back to the server.
  • Step S303 performing audio filtering on the first audio data, and synthesizing the second audio data corresponding to the role list by using a text-to-speech engine in combination with the user setting parameters.
  • Specifically, new audio data corresponding to the subtitle data may be generated by the text-to-speech engine (according to the user's setting parameters), vocal elimination may be performed on the first audio data, and the new audio data and the vocal-removed first audio data may then be synthesized into the second audio data corresponding to the role list.
  • The existing vocal elimination method mainly exploits the fact that the vocal is pronounced identically in the left and right channels: subtracting one channel from the other removes the parts the two channels share. However, this method not only causes a large loss to the background sound (especially in the low-frequency part), but also fails to eliminate the vocal well when the vocal pronunciation differs between the two channels.
  • In this embodiment a band-pass filter is used instead: within the filter's frequency band, only the amplitude of the original vocal is reduced, which does not affect the intelligibility of the synthesized audio, so the low-frequency and high-frequency portions are better preserved; a filtering sketch follows.
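  • A minimal sketch of this band-based vocal reduction follows; the Butterworth design, the 300-3400 Hz speech band, and the attenuation factor are assumptions, not values taken from the patent.
```python
# Minimal sketch: attenuate only the typical speech band instead of
# subtracting channels, so low and high frequencies are preserved.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def reduce_vocals(path_in, path_out, low_hz=300.0, high_hz=3400.0, keep=0.2):
    rate, audio = wavfile.read(path_in)
    audio = audio.astype(np.float64)
    # Isolate the speech band with a band-pass Butterworth filter.
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=rate, output="sos")
    speech_band = sosfiltfilt(sos, audio, axis=0)
    # Reduce the in-band amplitude to `keep` of its original level; the
    # out-of-band background (bass, highs) passes through untouched.
    processed = audio - (1.0 - keep) * speech_band
    wavfile.write(path_out, rate, processed.astype(np.int16))

reduce_vocals("first_audio.wav", "background.wav")
```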
  • the server 1 includes:
  • the first receiving module 10 is configured to receive first audio data and subtitle data sent by the television terminal;
  • The audio and video playback of the television is completed by the television terminal in cooperation with the server 1.
  • The television terminal completes the collation and transmission of audio and subtitle data, and provides a user interface for the user to set parameters.
  • The server 1 receives the audio, subtitle, and other data sent by the television terminal and completes the processing of the audio and subtitle data, so as to synthesize the audio data and transmit it to the television terminal for playback.
  • When the user turns on the dubbing setting function through the remote controller of the television terminal, the television terminal quickly decodes the video file, extracts the audio track selected by the user (or the default audio track) and the subtitle track selected by the user (or the default subtitle track), packages the audio data and the subtitle data, and sends them to the server 1.
  • the generating processing module 20 is configured to perform recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters;
  • The server 1 performs recognition processing on the first audio data and the subtitle data to generate the role list and sample audio parameters.
  • To generate the role list, a predetermined number of timestamps may first be selected, for example three; the audio data within those timestamps is then identified and analyzed separately, and voices that sound similar across timestamps are classified into one type, namely role 1, role 2, and so on.
  • The specific classification may distinguish voices statistically according to the audio spectrum.
  • The sample audio is preset, fixed audio; it can include audio of different genders and sample audio covering different frequency ranges for each gender, such as male high-pitch, male mid-range, male low-pitch, female high-pitch, female mid-range, and female low-pitch audio.
  • The frequency ranges can be subdivided further and are not limited to the high, middle, and low ranges of this embodiment.
  • The sample audio can also be the voice of a famous or professional dubbing artist.
  • a synthesis processing module 30 configured to send the role list and sample audio parameters to the television terminal and, when receiving the user setting parameters fed back by the television terminal according to the role list and the sample audio parameters, to synthesize the first audio data into second audio data;
  • The server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen through which the user selects entries from the role list and the sample audio parameters to generate the user setting parameters, and the television terminal then feeds the user setting parameters back to the server 1.
  • The server 1 synthesizes the first audio data into the second audio data according to the user setting parameters. The synthesis requires a text-to-speech engine and a vocal elimination program: the text-to-speech engine generates new audio data corresponding to the subtitle data (specifically, according to the user setting parameters), the vocal elimination program removes the vocals from the first audio data, and the new audio data and the vocal-removed first audio data are then combined into the second audio data.
  • the first sending module 40 is configured to send the second audio data to the television terminal to control the second audio data and the caption data to be played in the television terminal.
  • The server 1 sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to the audio data and subtitle data, the television terminal also extracts the video data from the video file; upon receiving the second audio data, the television terminal therefore synchronizes the video data with the second audio data and the subtitle data and finally plays them.
  • The server 1 provided by the present invention first receives the first audio data and subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and the sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal to control playback of the second audio data and the subtitle data at the television terminal.
  • In this way, audio the user can understand is provided according to the language requirements of different users, and the user's personalized preferences for character dialogue are satisfied; this avoids the drawback of following the character dialogue and plot only through subtitles and improves the user's television viewing experience.
  • the generation processing module 20 includes:
  • An obtaining unit 201 configured to extract a subtitle timestamp from the subtitle data
  • the searching unit 202 is configured to search, according to the subtitle timestamp, a time segment in which the first audio data appears;
  • The server 1 extracts the subtitle timestamps from the subtitle data, locates the time segments in which character dubbing appears according to those timestamps, invokes the speech recognition module to perform recognition processing, and identifies by statistics the several voices that appear most frequently in those time segments for the user to select.
  • the categorizing unit 203 is configured to perform spectrum analysis on the first audio data in the time segment, and perform categorization to generate a role list.
  • the categorizing unit 203 includes:
  • the obtaining subunit 2031 is configured to respectively acquire first audio data in the first time segment and the second time segment;
  • a determining subunit 2032 configured to determine whether a spectrum range and a spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent;
  • The server acquires the first audio data in the first time segment and in the second time segment, analyzes the spectral range and spectral amplitude of the first audio data in each segment, and determines whether the spectral range and spectral amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.
  • a first categorization subunit 2033 configured to classify the first audio data in the first time segment and the second time segment into the same role upon determining that the spectral range and spectral amplitude of the first audio data in the two segments are consistent;
  • a second categorization subunit 2034 configured to classify the first audio data in the first time segment and the second time segment into different roles upon determining that the spectral range and/or spectral amplitude of the first audio data in the two segments are inconsistent.
  • the generating unit 204 is configured to generate a sample audio parameter corresponding to the character list by using a voice synthesis technology.
  • the generating unit 204 includes:
  • an extracting subunit 2041 configured to extract, for each role in the role list, a predetermined number of subtitle timestamps from the subtitle data;
  • the generating subunit 2042 is configured to generate, by the text speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to send to the television terminal for preview selection.
  • the synthesis processing module 30 includes:
  • the sending unit 301 is configured to send the generated role list and sample audio parameters to the television terminal;
  • the receiving unit 302 is configured to receive the user setting parameters fed back by the television terminal according to the role list and the sample audio parameters.
  • The server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen through which the user selects entries from the role list and the sample audio parameters to generate the user setting parameters, and the television terminal then feeds the user setting parameters back to the server 1.
  • the synthesizing unit 303 is configured to perform audio filtering on the first audio data and to synthesize the second audio data corresponding to the role list by using a text-to-speech engine in combination with the user setting parameters.
  • Specifically, new audio data corresponding to the subtitle data may be generated by the text-to-speech engine (according to the user's setting parameters), vocal elimination may be performed on the first audio data according to the vocal elimination program, and the new audio data and the vocal-removed first audio data may then be synthesized into the second audio data corresponding to the role list.
  • The existing vocal elimination method mainly exploits the fact that the vocal is pronounced identically in the left and right channels: subtracting one channel from the other removes the parts the two channels share. However, this method not only causes a large loss to the background sound (especially in the low-frequency part), but also fails to eliminate the vocal well when the vocal pronunciation differs between the two channels.
  • In this embodiment a band-pass filter is used instead: within the filter's frequency band, only the amplitude of the original vocal is reduced, which does not affect the intelligibility of the synthesized audio, so the low-frequency and high-frequency portions are better preserved.
  • the present invention also provides a television broadcast control system 100.
  • the television broadcast control system 100 includes a television terminal 2 and a server 1 as described above.
  • the television terminal 2 includes:
  • a second sending module 50 configured to send first audio data and caption data to the server 1;
  • the second receiving module 60 is configured to receive a role list and sample audio parameters generated by the server 1 after the first audio data and the caption data are identified and processed;
  • the feedback module 70 is configured to generate a user setting parameter according to the role list and the sample audio parameter, and feed back the user setting parameter to the server 1;
  • the obtaining module 80 is configured to acquire the second audio data that the server 1 synthesizes from the first audio data upon receiving the user setting parameters;
  • the synchronous play module 90 is configured to synchronously play the second audio data, the video data, and the caption data.
  • When receiving the second audio data corresponding to the role list synthesized by the server 1, the television terminal 2 synchronizes the second audio data with the video data and the subtitle data and then plays them. In this way, the audio of the video file is pre-processed by the server 1 and synthesized into a language the user can understand, which enhances the user's viewing experience; in addition, a variety of character audio options can be provided to the user, further enhancing the viewing experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention provides a television playback control method comprising the following steps: a server receives first audio data and subtitle data transmitted by a television terminal; recognition processing is performed on the first audio data and the subtitle data to generate a role list and sample audio parameters; the role list and the sample audio parameters are transmitted to the television terminal and, upon reception of a user setting parameter returned by the television terminal according to the role list and the sample audio parameters, the first audio data is synthesized into second audio data; and the second audio data is transmitted to the television terminal so as to control playback of the second audio data and the subtitle data on the television terminal. The invention also provides a server and a television playback control system. The present invention can provide, according to the language requirements of different users, audio that the user can understand, avoiding the defect whereby character dialogue and plots can be understood only through subtitles, thereby improving the user experience when watching television.
PCT/CN2016/084461 2015-09-29 2016-06-02 Television playback control method, server and television playback control system Ceased WO2017054488A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510633934.3A 2015-09-29 2015-09-29 Television playback control method, server and television playback control system
CN201510633934.3 2015-09-29

Publications (1)

Publication Number Publication Date
WO2017054488A1 true WO2017054488A1 (fr) 2017-04-06

Family

ID=54996603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/084461 Ceased WO2017054488A1 (fr) 2016-06-02 Television playback control method, server and television playback control system

Country Status (2)

Country Link
CN (1) CN105227966A (fr)
WO (1) WO2017054488A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112714348A (zh) * 2020-12-28 2021-04-27 深圳市亿联智能有限公司 Intelligent audio and video synchronization method

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227966A (zh) * 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television playback control method, server and television playback control system
CN107659850B (zh) * 2016-11-24 2019-09-17 腾讯科技(北京)有限公司 Media information processing method and device
CN107172449A (zh) * 2017-06-19 2017-09-15 微鲸科技有限公司 Multimedia playback method and device, and multimedia storage method
CN107484016A (zh) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubbing switching method, television set, and computer-readable storage medium
CN109242802B (zh) * 2018-09-28 2021-06-15 Oppo广东移动通信有限公司 Image processing method and device, electronic device, and computer-readable medium
CN110366032B (zh) * 2019-08-09 2020-12-15 腾讯科技(深圳)有限公司 Video data processing method and device, and video playback method and device
CN113766288B (zh) * 2021-08-04 2023-05-23 深圳Tcl新技术有限公司 Battery level prompting method and device, and computer-readable storage medium
CN114554285B (zh) * 2022-02-25 2024-08-02 京东方科技集团股份有限公司 Video frame interpolation processing method and device, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1774715A (zh) * 2003-04-14 2006-05-17 皇家飞利浦电子股份有限公司 System and method for performing automatic dubbing on an audio-visual stream
CN101189657A (zh) * 2005-05-31 2008-05-28 皇家飞利浦电子股份有限公司 Method and apparatus for performing automatic dubbing on a multimedia signal
CN101359473A (zh) * 2007-07-30 2009-02-04 国际商业机器公司 Method and apparatus for automatic speech conversion
US20120105719A1 (en) * 2010-10-29 2012-05-03 Lsi Corporation Speech substitution of a real-time multimedia presentation
US20120259630A1 (en) * 2011-04-11 2012-10-11 Samsung Electronics Co., Ltd. Display apparatus and voice conversion method thereof
CN105227966A (zh) * 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television playback control method, server and television playback control system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1534595A (zh) * 2003-03-28 2004-10-06 中颖电子(上海)有限公司 Speech conversion and synthesis apparatus and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1774715A (zh) * 2003-04-14 2006-05-17 皇家飞利浦电子股份有限公司 System and method for performing automatic dubbing on an audio-visual stream
CN101189657A (zh) * 2005-05-31 2008-05-28 皇家飞利浦电子股份有限公司 Method and apparatus for performing automatic dubbing on a multimedia signal
CN101359473A (zh) * 2007-07-30 2009-02-04 国际商业机器公司 Method and apparatus for automatic speech conversion
US20120105719A1 (en) * 2010-10-29 2012-05-03 Lsi Corporation Speech substitution of a real-time multimedia presentation
US20120259630A1 (en) * 2011-04-11 2012-10-11 Samsung Electronics Co., Ltd. Display apparatus and voice conversion method thereof
CN105227966A (zh) * 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television playback control method, server and television playback control system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112714348A (zh) * 2020-12-28 2021-04-27 深圳市亿联智能有限公司 Intelligent audio and video synchronization method

Also Published As

Publication number Publication date
CN105227966A (zh) 2016-01-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16850113

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16850113

Country of ref document: EP

Kind code of ref document: A1