WO2025108378A1 - Method and apparatus for processing conference data, method and apparatus for dividing media content, method and apparatus for generating a summary, electronic device, and computer-readable medium
- Publication number
- WO2025108378A1 (application PCT/CN2024/133540)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- conference
- sub
- data
- media content
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Definitions
- Embodiments of the present disclosure relate to a method and device for processing conference data, a method and device for dividing media content, a method and device for generating a summary, an electronic device, and a computer-readable medium.
- users need to quickly understand specific content in many scenarios. For example, in a video playback scenario, users may need to understand the general content of a video to decide whether to continue watching. For another example, in a meeting scenario, after the meeting is over, users may need to review the meeting content to understand what was discussed.
- a web conference, also known as an online conference, refers to a conference conducted over the Internet. Participants can initiate and join a web conference through the Internet, and can use a client that provides web conference services to automatically record the content of the web conference and generate conference data for it. After the meeting, participants can use the conference data to review the meeting content.
- An embodiment of the present disclosure provides a method for processing conference data, including: acquiring conference data of a network conference, wherein the conference data includes voice data of the network conference; determining the conference type of the network conference based on the conference data; dividing the network conference based on the conference type and the conference content of the network conference to obtain conference segments of the network conference.
- An embodiment of the present disclosure provides a method for dividing media content, including: acquiring data of media content, wherein the media content includes content of at least two content dimensions; determining a target segmentation point of the media content based on the content dimensions and the data of the media content; and determining sub-content of the media content based on the target segmentation point.
- An embodiment of the present disclosure provides a summary generation method, comprising: obtaining content data of each sub-content included in media content, wherein the sub-content is obtained by dividing the media content; determining a summary of the sub-content based on the content data of each sub-content; and fusing the summaries of each sub-content based on the weight of each sub-content to obtain a summary of the media content, wherein the weight of the sub-content is used to indicate the importance of the sub-content in the media content.
- An embodiment of the present disclosure provides a device for processing conference data, including: a first acquisition unit, configured to acquire conference data of a network conference, wherein the conference data includes voice data of the network conference; a first determination unit, configured to determine the conference type of the network conference based on the conference data; and a first division unit, configured to divide the network conference based on the conference type and the conference content of the network conference to obtain conference segments of the network conference.
- An embodiment of the present disclosure provides a device for dividing media content, including: a second acquisition unit, configured to obtain data of media content, wherein the media content includes content of at least two content dimensions; a second determination unit, configured to determine a target segmentation point of the media content based on the content dimension and the data of the media content; and a second division unit, configured to determine sub-content of the media content based on the target segmentation point.
- An embodiment of the present disclosure provides a summary generation device, including: a third acquisition unit, configured to acquire content data of each sub-content included in the media content, wherein the sub-content is obtained by dividing the media content; a third determination unit, configured to determine the summary of the sub-content based on the content data of each sub-content; and a generation unit, configured to merge the summaries of each sub-content based on the weight of each sub-content to obtain the summary of the media content, wherein the weight of the sub-content is used to represent the importance of the sub-content in the media content.
- An embodiment of the present disclosure provides an electronic device, comprising: at least one processor; a storage device, on which at least one program is stored, and when the at least one program is executed by the at least one processor, the at least one processor implements the conference data processing method provided by any embodiment of the present disclosure, the media content division method provided by any embodiment of the present disclosure, or the summary generation method provided by any embodiment of the present disclosure.
- An embodiment of the present disclosure provides a computer-readable medium having a computer program stored thereon, wherein when the program is executed by a processor, the conference data processing method provided by any embodiment of the present disclosure, the media content division method provided by any embodiment of the present disclosure, or the summary generation method provided by any embodiment of the present disclosure is implemented.
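- To make the weighted fusion step of the summary generation method above concrete, the following is a minimal Python sketch. The SubContent structure and the duration-based importance weighting are illustrative assumptions, not the method defined by the disclosure.

```python
from dataclasses import dataclass

@dataclass
class SubContent:
    summary: str        # summary text of this sub-content
    duration_s: float   # duration, used here as a proxy for importance weight

def fuse_summaries(subs: list[SubContent], max_parts: int = 5) -> str:
    """Fuse per-sub-content summaries into one media-content summary,
    keeping the highest-weight sub-contents in chronological order."""
    total = sum(s.duration_s for s in subs) or 1.0
    scored = [(i, s.duration_s / total, s.summary) for i, s in enumerate(subs)]
    top = sorted(scored, key=lambda t: t[1], reverse=True)[:max_parts]
    top.sort(key=lambda t: t[0])  # restore original (chronological) order
    return " ".join(summary for _, _, summary in top)

# Example: the longer, heavier-weighted sub-contents dominate the fused summary.
print(fuse_summaries([SubContent("Opening remarks.", 60),
                      SubContent("Deep dive on topic A.", 1800),
                      SubContent("Q&A on topic A.", 900)], max_parts=2))
```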
- FIG1 is a flow chart of a method for processing conference data provided by an embodiment of the present disclosure
- FIG2 is a schematic diagram of the structure of a conference data processing device provided by an embodiment of the present disclosure.
- FIG3 is a flow chart of a method for dividing media content provided by an embodiment of the present disclosure
- FIG4 is a schematic structural diagram of a device for dividing media content provided by an embodiment of the present disclosure.
- FIG5 is a flowchart of a summary generation method provided by an embodiment of the present disclosure.
- FIG6 is a schematic diagram of the structure of a summary generation device provided by an embodiment of the present disclosure.
- FIG7 is a schematic diagram of the basic structure of an electronic device provided by an embodiment of the present disclosure.
- during a conference, conference data for recording the content of the conference can be generated.
- the conference data includes, for example, voice data of the online conference. Users with permission to view the conference content can view the conference data to understand what was discussed. However, longer meetings contain more content, and it is inconvenient for users to sift through a large amount of conference content.
- conference data of a network conference is obtained.
- the conference data includes voice data of the network conference.
- the conference type of the network conference is determined.
- the network conference is then divided based on the conference type and the conference content of the network conference to obtain conference segments of the network conference.
- a method for processing conference data can be applied to electronic devices with conference data processing capabilities.
- the electronic device, for example, can be a server or a terminal.
- the terminal includes but is not limited to a smart phone, a tablet computer, a laptop computer, a personal digital assistant (PDA) or a smart wearable device.
- the server can be a cloud server, such as a central server in a central computing cluster, or an edge server in an edge computing cluster.
- the server can also be a server in a local data center.
- a local data center refers to a data center directly controlled by a user.
- the electronic device obtains conference data of a network conference including voice data, and determines the conference type of the network conference by analyzing the conference data.
- the network conference is divided based on the conference type to obtain conference segments of the network conference including different conference contents.
- the conference segments obtained by dividing according to the conference type and the conference content are more reasonable. In this way, the user can understand the conference content included in each conference segment, which makes it convenient for the user to view the conference content and improves the user experience.
- the method for processing conference data may include S101-S103:
- S101 Acquire conference data of a network conference, where the conference data includes voice data of the network conference.
- the conference data of a network conference is data generated during the network conference and used to record the content of the conference.
- the disclosed embodiments do not limit the source of the conference data of the network conference.
- the conference data of the network conference is generated by a participant of the network conference triggering a recording function of the network conference.
- the conference data of the network conference is data generated during the network conference and used to implement conference interaction between participants.
- the conference data of the online conference includes voice data.
- the voice data is generated based on the voices of the participants of the online conference.
- the conference data of the online conference also includes other types of data such as video data, shared content display data, and conference reservation data.
- Video data is generated based on the videos of the participants of the online conference.
- Shared content display data is generated based on the content shared by the participants in the online conference.
- the shared content is, for example, a device screen, a file, a video, a picture, etc.
- the conference reservation data is the data of the online conference that the participants have reserved to join.
- the conference reservation data includes relevant information about the online conference input by the participants.
- the type of data included in the conference data of the online conference is determined by the interaction method adopted by the participants in the online conference.
- S102 Determine the conference type of the network conference based on the conference data.
- the conference type can be a predefined type based on the need for segmenting the web conference.
- the disclosed embodiments do not limit the way of dividing conference types.
- network conferences are divided into interview types, sharing types, multi-topic types and general types.
- an interview type network conference, for example, can be a meeting attended by two participants, where one party asks questions and the other party answers.
- a sharing type network conference, for example, can be a meeting attended by at least two participants, where one participant gives explanations.
- a sharing type network conference can also be a meeting attended by at least two participants, where one participant shares and explains content, and other participants ask questions.
- a multi-topic type network conference is a meeting that discusses multiple conference topics.
- a multi-topic type network conference is usually attended by multiple participants.
- a general type network conference is a meeting that does not belong to the interview type, sharing type and multi-topic type.
- the above method of dividing conference types is only used as an example, and does not limit the specific method of dividing the conference types of network conferences.
- the conference types of network conferences can be divided according to the number of participants.
- Network conferences of different conference types have different characteristics of conference content.
- the conference data of the network conference is analyzed to determine the conference type of the network conference.
- the embodiments of the present disclosure provide two specific implementations of analyzing conference data to obtain the conference type of the network conference, which are described in detail below.
- S103 Divide the online conference based on the conference type and the conference content of the online conference to obtain conference segments of the online conference.
- the embodiments of the present disclosure do not limit the manner of dividing network conferences based on conference type and conference content.
- the segmentation rules corresponding to the conference type are pre-configured.
- the segmentation rules corresponding to the conference type can be pre-set based on the conference type and segmentation needs. Taking the interview type as an example, the segmentation rule corresponding to the interview type is that one question and its answer are regarded as one conference segment. Alternatively, the segmentation rule corresponding to the interview type is that questions and answers belonging to the same question are regarded as one conference segment; in other words, the initial question, follow-up questions, and the answers all belong to one conference segment. A sketch of the first rule is given below.
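- As a concrete illustration of the first rule above (one question and its answer form one conference segment), here is a minimal Python sketch. The Utterance structure and the question-mark heuristic for question detection are illustrative assumptions; a real system would use a trained question detector.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    text: str

def is_question(text: str) -> bool:
    # Naive stand-in for real question detection (e.g., an NLU model).
    return text.rstrip().endswith("?")

def segment_interview(utterances: list[Utterance]) -> list[list[Utterance]]:
    """Group utterances so that one question plus the answers that follow
    it form one conference segment."""
    segments: list[list[Utterance]] = []
    current: list[Utterance] = []
    for utt in utterances:
        # A question arriving after an answer opens a new conference segment.
        if current and is_question(utt.text) and not is_question(current[-1].text):
            segments.append(current)
            current = []
        current.append(utt)
    if current:
        segments.append(current)
    return segments
```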
- the conference data of the network conference is first processed to obtain the conference content of the network conference, and then the network conference is divided according to the segmentation rules corresponding to the conference type based on the conference content.
- the conference content of the network conference can be represented by conference text data obtained by performing speech recognition processing on the voice data of the network conference.
- a segmentation model corresponding to the conference type is pre-trained.
- the segmentation model can identify the conference content of the online conference based on the conference data of the online conference, and segment the online conference based on the conference type.
- the segmentation model corresponding to the conference type can be trained using the conference data of the conference type and the segmentation results of the online conference.
- the conference data of the online conference is processed using the segmentation model corresponding to the conference type to obtain the conference segmentation determined by the segmentation model. It should be noted that for voice data of general types of online conferences, a general segmentation model can be used to divide the conference segments.
- the disclosed embodiments do not limit the method of identifying different conference segments.
- for example, a segmentation time point is determined.
- the segmentation time point identifies that the voice data before it and the voice data after it belong to different conference segments.
- the following embodiments of the present disclosure provide two possible implementation methods for determining the conference type of a network conference based on conference data.
- The first type: analyze the conference data to obtain the communication information of the network conference.
- the communication information of the network conference includes information related to the participants participating in the network conference and information related to the communication process of the participants in the network conference.
- the communication information of the network conference at least includes information representing the number of participants, information representing the communication form and information representing the communication content.
- the information representing the number of participants is used to indicate the number of participants, that is, the number of participants participating in this network conference.
- the information representing the communication form is used to indicate the communication form, which is the form of communication between participants.
- the communication form includes, for example, a question-and-answer form, an explanation form, a discussion form, etc.
- the information representing the communication content indicates the communication content, which is the specific content of the communication between the participants.
- the information representing the communication form and the information representing the communication content can be determined based on the behavior information of the participants.
- the behavior information of the participants includes the speaking time period of the participants and the speech content of the participants.
- the speaking time period of the participants is the time period for each participant to speak each time in the network conference.
- the speech content of the participants is the speech content of each participant in each speech in the network conference.
- the behavior information of the participants can be obtained based on the analysis of the conference data.
- the conference data also includes the shared content display data of the network conference.
- the communication form and the communication content can also be determined based on the shared content display data. For example, based on the shared content display data, it can be determined that the communication form includes the explanation form, and that the communication content includes the shared content.
- the conference type of the network conference is determined based on one or more of the information characterizing the number of participants, the information characterizing the communication form, and the information characterizing the communication content included in the communication information.
- the determination conditions of the conference type are determined in advance based on the characteristics of each conference type. The determination conditions of the conference type that can be satisfied by the communication information are determined, and the conference type that satisfies the conditions is determined as the conference type of the network conference.
- the determination condition of the interview type may be, for example, that the number of participants is two and the communication form is a question-and-answer form. Based on the information characterizing the number of participants and the information characterizing the communication form included in the communication information, it is possible to determine whether the online conference is an interview type. For example, the meeting type of an online conference with two participants and a question-and-answer form is determined to be an interview type.
- the conditions for determining the interview type may alternatively be: the number of participants is two, the communication form is that the ratio of the speaking time of each participant to the total duration of the online conference is less than 20%, and the communication content is that the proportion of questions in the speech of one of the two participants is greater than 80%.
- the determination condition of the sharing type may be that the communication form includes the explanation form. Based on the information characterizing the communication form included in the communication information, it is possible to determine whether the online conference is a sharing type. For example, the conference type of an online conference whose communication form includes the explanation form may be determined as a sharing type.
- the conditions for determining the sharing type may be: the number of participants is greater than or equal to two, the communication form is that the ratio of the speaking time of the first participant among the participants to the total duration of the online conference is greater than 60%, the communication content is that the proportion of questions in the speech of the second participant is greater than 80%, and the speech of the first participant includes shared content.
- the determination condition of the multi-topic type may be that the communication content includes multiple conference topics. Based on the information characterizing the communication content included in the communication information, it is possible to determine whether the network conference is of the multi-topic type. For example, the conference type of a network conference whose communication content includes multiple conference topics may be determined as a multi-topic type.
- the conditions for determining the multi-topic type are: the number of participants is greater than or equal to two, the communication form is that there are participants among multiple participants whose ratio of speaking time to the total time of the online meeting is greater than 30%, and the communication content is that the correlation of the participants' speech content in different speaking time periods is less than 20%.
- the speaking time of the participant is the length of time included in the speaking time period of the participant.
- the proportion of question content in the speech content of the participant is the ratio of question sentences included in the speech content.
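- The example determination conditions above can be read as a simple rule-based classifier, sketched below in Python. The ConferenceStats fields are assumptions about what the communication-information analysis produces, and the thresholds mirror the examples in the text.

```python
from dataclasses import dataclass, field

@dataclass
class ConferenceStats:
    num_participants: int
    speak_ratio: dict[str, float]     # speaking time / total conference duration
    question_ratio: dict[str, float]  # share of questions in each one's speech
    shares_content: set[str] = field(default_factory=set)  # who shared content
    min_topic_correlation: float = 1.0  # across different speaking time periods

def classify(s: ConferenceStats) -> str:
    # Interview: two participants, short individual speaking shares,
    # one side asking mostly questions (example thresholds from the text).
    if (s.num_participants == 2
            and all(r < 0.20 for r in s.speak_ratio.values())
            and max(s.question_ratio.values(), default=0.0) > 0.80):
        return "interview"
    # Sharing: a dominant speaker who shares content, another mostly asking.
    for p, r in s.speak_ratio.items():
        if (s.num_participants >= 2 and r > 0.60 and p in s.shares_content
                and any(q > 0.80 for w, q in s.question_ratio.items() if w != p)):
            return "sharing"
    # Multi-topic: substantial speakers whose topics correlate weakly.
    if (s.num_participants >= 2
            and any(r > 0.30 for r in s.speak_ratio.values())
            and s.min_topic_correlation < 0.20):
        return "multi-topic"
    return "general"
```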
- the conference type of the network conference can be determined based on the communication information.
- an artificial intelligence model for determining the conference type of a network conference is also pre-trained.
- the training data of the artificial intelligence model is, for example, training communication information obtained by analyzing the training conference data of the training network conference and a label corresponding to the training communication information.
- the training communication information includes one or more of information representing the number of participants, information representing the form of communication, and information representing the content of communication.
- the label corresponding to the training communication information is the conference type of the training network conference.
- the communication information is input into the trained artificial intelligence model to obtain the conference type of the network conference output by the artificial intelligence model.
- the second type: the conference data also includes conference reservation data for the online conference.
- the conference reservation data includes the conference reservation type.
- the conference reservation type can be determined by the information of the scheduled conference pre-entered by the participants. For example, the conference reservation type is determined based on the title of the scheduled conference. For example, the title of the scheduled conference is "xx's sharing conference". Based on "xx's sharing conference", the conference reservation type is determined to be a sharing type.
- the conference reservation type is determined based on the meeting theme of the scheduled conference. For example, the meeting theme of the scheduled conference is "Discussion on business a, business b, and business c". Based on "Discussion on business a, business b, and business c", the conference reservation type is determined to be a multi-topic type.
- the disclosed embodiments do not limit the method for determining the conference reservation type.
- the conference reservation type can be determined by analyzing the semantics of the conference reservation data.
- the conference reservation type can be determined by identifying keywords included in the text of the conference reservation data.
- the conference reservation type can be used as reference information for determining the conference type of the network conference.
- the conference type of the network conference is determined based on the communication information and the conference reservation type.
- Based on the communication information, it is determined whether the communication information satisfies the determination condition of the conference type corresponding to the conference reservation type. If so, the conference type of the network conference is determined to be that conference type, that is, the conference reservation type. If not, it is determined which other conference type's determination condition the communication information satisfies.
- an artificial intelligence model for determining the conference type of a network conference is pre-trained.
- the training data of the artificial intelligence model is, for example, training communication information obtained by analyzing the training conference data of a training network conference, a training conference reservation type, and a conference type label.
- the training communication information includes one or more of information representing the number of participants, information representing the communication form, and information representing the communication content.
- the conference type label is the conference type of the training network conference.
- the communication information and the conference reservation type are input into the trained artificial intelligence model to obtain the conference type of the network conference output by the artificial intelligence model.
- a web conference may include parts of different conference types.
- for example, part of the web conference is a question-and-answer session between the interviewer and the interviewee, and another part is a process in which the interviewee explains answers to the interview test questions.
- the present disclosure provides a possible implementation method for dividing a network conference based on the conference type and the conference content of the network conference, including: dividing the network conference into multiple sub-conferences according to conference type, and then dividing each sub-conference based on the conference type and conference content of the sub-conference.
- a sub-conference is a segment of a web conference that belongs to the same conference type. For example, the first 50% of the web conference belongs to the interview type, and the last 50% is the sharing type. The first 50% and the last 50% of the web conference are divided into two sub-conferences.
- the sub-conference is divided to achieve the division of the network conference.
- the method of segmenting the sub-conference is similar to the method of segmenting the network conference in S103 above, and will not be described in detail here.
- the conference segments of each sub-conference obtained after the segmentation can be used as the conference segments of the network conference.
- the divided conference segments can also be merged. For example, when the number of conference segments of a web conference is large, the divided conference segments are merged. As an example, determine whether the number of conference segments of a web conference exceeds a quantity threshold.
- the quantity threshold is the maximum number of conference segments of a web conference that is preset. As an example, the quantity threshold is 12 segments per hour. In other words, if the ratio of the number of conference segments to the number of hours of the web conference is greater than 12, it means that the number of conference segments exceeds the quantity threshold. For example, a web conference with a total duration of 2 hours has 30 conference segments, which exceeds the quantity threshold of 24.
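- The quantity threshold check above reduces to comparing the segment count against a per-hour limit, as in this small sketch (12 segments per hour is the example threshold from the text):

```python
def exceeds_quantity_threshold(num_segments: int, duration_hours: float,
                               per_hour: int = 12) -> bool:
    # True when the segments-per-hour ratio is above the configured limit.
    return num_segments > per_hour * duration_hours

# Example from the text: 30 segments in a 2-hour conference exceeds 24.
assert exceeds_quantity_threshold(30, 2.0)
```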
- text data is first extracted from the conference data of each conference segment, and the text data is input into a semantic extraction model to obtain the semantics of the conference segment output by the semantic extraction model.
- text data is first extracted from the conference data of each conference segment, keywords are extracted from the text data, and the keywords are used to represent the semantics of the conference segment.
- the present disclosure embodiment also provides a conference data processing device, which will be described below in conjunction with the accompanying drawings.
- FIG2 is a schematic diagram of the structure of a conference data processing device provided by an embodiment of the present disclosure.
- the conference data processing device includes:
- a first acquisition unit 201 is configured to acquire conference data of a network conference, where the conference data includes voice data of the network conference;
- a first determining unit 202 is configured to determine a conference type of the network conference based on the conference data
- the first dividing unit 203 is configured to divide the network conference based on the conference type and the conference content of the network conference to obtain conference segments of the network conference.
- the first determining unit 202 is specifically configured to obtain communication information of the network conference based on the conference data; and determine the conference type of the network conference based on the communication information.
- the communication information includes one or more of the following information: information representing the number of participants, information representing the communication form, and information representing the communication content.
- the conference data further includes shared content display data of the network conference, and the shared content display data is used to determine information representing the communication form and information representing the communication content.
- the communication information includes information representing a communication form
- the first determining unit 202 is configured to determine the conference type of the network conference based on the communication information, including:
- the first determining unit 202 is configured to determine the conference type of the network conference as a sharing type if the information representing the communication form indicates that the communication form includes the explanation form.
- the communication information includes information representing the number of participants and information representing the communication form.
- the first determination unit 202 is configured to determine the conference type of the network conference based on the communication information, including:
- the first determining unit 202 is configured to determine the conference type of the network conference as an interview type if the information representing the number of participants indicates that the number of participants is two and the information representing the communication form indicates that the communication form is a question-and-answer form.
- the communication information includes information representing the communication content
- the first determination unit 202 is configured to determine the conference type of the network conference based on the communication information, including:
- the first determining unit 202 is configured to determine the conference type of the network conference as a multi-topic type if the information representing the communication content indicates that the communication content includes multiple conference topics.
- the conference data further includes a conference reservation type of the network conference
- the first determining unit 202 is specifically configured to obtain communication information and a conference reservation type of the network conference based on the conference data, and to determine the conference type of the network conference based on the communication information and the conference reservation type.
- the first division unit 203 is configured to divide the network conference based on the conference type and the conference content of the network conference, including:
- the first division unit 203 is configured to divide the network conference into multiple sub-conferences according to the conference type, where a sub-conference is a conference segment of the network conference belonging to the same conference type; and divide the sub-conference based on the conference type and conference content of the sub-conference.
- the first division unit 203 is configured to divide the network conference based on the conference type and the conference content of the network conference, including:
- the first division unit 203 is configured to adopt a segmentation rule corresponding to the conference type and divide the network conference according to the conference content of the network conference.
- the first division unit 203 is configured to divide the network conference based on the conference type and the conference content of the network conference, including:
- the first segmentation unit 203 is configured to process the conference data of the network conference by using the segmentation model corresponding to the conference type, where the segmentation model is used to segment the network conference.
- the conference data processing device further includes:
- the merging unit is configured to determine the semantics of each conference segment; and merge the conference segments that are adjacent in time periods in the network conference and whose semantic similarity is greater than or equal to a similarity threshold.
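- A minimal Python sketch of this merging step follows. difflib's SequenceMatcher ratio is used only as a crude stand-in for a real semantic similarity measure, and the Segment structure is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Segment:
    start_s: float
    end_s: float
    semantics: str  # e.g., extracted keywords or model-produced semantics

def similarity(a: str, b: str) -> float:
    # Crude textual similarity; a real system would compare semantic vectors.
    return SequenceMatcher(None, a, b).ratio()

def merge_similar(segments: list[Segment], threshold: float = 0.6) -> list[Segment]:
    """Merge time-adjacent segments whose semantic similarity >= threshold."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        if merged and similarity(merged[-1].semantics, seg.semantics) >= threshold:
            prev = merged[-1]
            merged[-1] = Segment(prev.start_s, seg.end_s,
                                 prev.semantics + " " + seg.semantics)
        else:
            merged.append(seg)
    return merged
```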
- Media content is content expressed through various means of communication. Media content can be transmitted through the Internet, making it easy for users to view online. Some media content contains a large amount of content. If a user is interested in only part of the media content, the user may need to view other parts of the media content to locate the part of interest before viewing it, which results in a poor viewing experience.
- the embodiments of the present disclosure provide a method, apparatus, device and medium for dividing media content.
- data of the media content is obtained, where the media content includes content of at least two content dimensions; based on the content dimensions and the data of the media content, target segmentation points of the media content are determined, and sub-content of the media content is determined using the target segmentation points.
- a method for dividing media content provided by an embodiment of the present disclosure can be applied to electronic devices with data processing capabilities.
- the electronic device, for example, can be a server or a terminal.
- the terminal includes but is not limited to a smart phone, a tablet computer, a laptop computer, a personal digital assistant (PDA) or a smart wearable device.
- the server can be a cloud server, such as a central server in a central computing cluster, or an edge server in an edge computing cluster.
- the server can also be a server in a local data center.
- a local data center refers to a data center directly controlled by a user.
- the electronic device obtains data of media content, where the media content includes content of at least two content dimensions; based on the content dimensions and the data of the media content, a target segmentation point of the media content is determined, and sub-content of the media content is determined using the target segmentation point.
- the method for dividing media content may include S301-S303:
- S301 Acquire data of media content, where the media content includes content of at least two content dimensions.
- Media content is content generated based on a communication method.
- the embodiments of the present disclosure do not limit the specific type of media content.
- the media content is film and television video content, or the media content is live video content, or the media content is conference content.
- the data of the media content can include one or more types of data among video data, audio data, text data and image data.
- the data of the media content is specifically determined based on the type of the media content.
- the media content includes content of multiple content dimensions.
- the embodiments of the present disclosure do not limit the way of dividing the content dimensions.
- different content dimensions are divided based on different content generation methods.
- different content dimensions are divided based on different data types.
- the content dimensions included in the media content include one or more of the following content dimensions:
- Voice content dimension, shared screen content dimension, shared document content dimension, stage time content dimension, and stage content dimension.
- S302 Determine target segmentation points of the media content based on content dimensions and data of the media content.
- Contents of different content dimensions have different characteristics. According to the characteristics of the various content dimensions included in the media content, the data of the media content is processed separately using the division methods corresponding to each content dimension to determine the segmentation points used to divide the media content.
- the segmentation points of the media content are used to identify the division positions for dividing the media content. Taking the example that the data of the media content includes audio data or video data, the segmentation points of the media content are the moments in the timeline of the media content where the media content needs to be divided. Taking the example that the data of the media content includes text data, the segmentation points of the media content are the delimiters for dividing text in the text data.
- the embodiments of the present disclosure respectively provide implementation methods for determining segmentation points of the media content based on five content dimensions: the voice content dimension, the shared screen content dimension, the shared document content dimension, the stage time content dimension, and the stage content dimension. Please see below for details.
- the embodiments of the present disclosure do not limit possible implementation methods for determining the target segmentation point.
- each segmentation point determined by multiple content dimensions is used as a target segmentation point of the media content.
- the segmentation points determined based on each content dimension are all used as candidate segmentation points of the media content, and a target segmentation point is selected from the candidate segmentation points of the media content.
- the embodiments of the present disclosure also do not limit the implementation method of selecting the target segmentation point from the candidate segmentation points.
- multiple candidate segmentation points that meet the merging condition are merged into one target segmentation point, and candidate segmentation points that do not meet the merging condition are used as the target segmentation point.
- the merging condition is, for example, that the distance between the division positions of the media content identified by the candidate segmentation points is less than a threshold.
- the division positions indicated by the multiple candidate segmentation points that meet the merging condition are close to one another, which indicates that the accuracy of dividing the media content at that position is relatively high.
- the merging condition is, for example, that the time interval between the candidate segmentation points is less than 5 minutes. The disclosed embodiment does not limit the way to merge the candidate segmentation points.
- any candidate segmentation point among the candidate segmentation points that meet the merging condition is selected as the target segmentation point obtained by merging the candidate segmentation points that meet the merging condition.
- the candidate segmentation points that do not meet the merging condition are used as target segmentation points. In this way, the number of target segmentation points can be reduced and the number of sub-contents can be reduced on the basis of ensuring the accuracy of dividing the media content, so as to facilitate user viewing.
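- Merging candidate segmentation points can be sketched as grouping timestamps that fall within the merge window and keeping one representative per group; the 5-minute window is the example merging condition from the text.

```python
def merge_candidates(points_s: list[float], window_s: float = 300.0) -> list[float]:
    """Collapse candidate segmentation points (in seconds) that are less
    than window_s apart into a single target segmentation point."""
    targets: list[float] = []
    for p in sorted(points_s):
        if targets and p - targets[-1] < window_s:
            continue  # within the merge window: keep the earlier representative
        targets.append(p)
    return targets

# e.g. merge_candidates([10, 120, 900, 1000, 2000]) -> [10, 900, 2000]
```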
- the segmentation confidence of each candidate segmentation point is determined.
- the segmentation confidence is used to measure the segmentation accuracy of the candidate segmentation point, and also represents the reliability of the media content division from the candidate segmentation point.
- the segment confidence can be determined by a pre-set segment confidence configuration rule.
- the segmentation confidence is determined based on the division method of the candidate segmentation point.
- the segmentation confidence is determined by the dimension type of the content dimension corresponding to the candidate segmentation point. Taking the above five content dimensions as an example, based on the voice content dimension and the shared content dimension, that is, the shared screen content dimension and the shared document content dimension, the segmentation confidence of the determined candidate segmentation point is relatively high. Based on the description content dimension, that is, the stage time content dimension and the stage content dimension, the segmentation confidence of the determined candidate segmentation point is relatively low. For another example, the segmentation confidence is determined based on the accuracy of the candidate segmentation point determined by the division method.
- the accuracy of the candidate segmentation point determined based on the description content dimension is determined by the similarity of the content before and after the candidate segmentation point. If the similarity is high, the accuracy of the candidate segmentation point is low and the segmentation confidence is low. If the similarity is low, the accuracy of the candidate segmentation point is high and the segmentation confidence is high.
- the segmentation confidence is determined based on the segmentation granularity of the candidate segmentation point.
- Segmentation granularity refers to the granularity at which the candidate segmentation point divides the media content.
- Segmentation granularity can be determined based on the type of content dimension. For example, taking the above five content dimensions as an example, based on the voice content dimension and the shared content dimension, that is, the shared screen content dimension and the shared document content dimension, the determined candidate segmentation points are fine-grained candidate segmentation points. Based on the description content dimension, that is, the stage time content dimension and the stage content dimension, the determined candidate segmentation points are coarse-grained candidate segmentation points. Coarse-grained candidate segmentation points may have the problem of inaccurate division, and the segmentation confidence of coarse-grained candidate segmentation points is low. The accuracy of the division of fine-grained candidate segmentation points is higher, and the segmentation confidence is higher.
- when a candidate segmentation point has multiple segmentation confidences determined in different ways, a weighted sum of these segmentation confidences can be calculated as the segmentation confidence of the candidate segmentation point.
- the weight of each segmentation confidence can be determined by the method of determining the segmentation confidence. For example, the weight of the segmentation confidence determined according to the division method is greater than the weight of the segmentation confidence determined according to the segmentation granularity.
- After determining the segmentation confidence of each candidate segmentation point, sort the candidate segmentation points by segmentation confidence from high to low, and select the candidate segmentation points ranked before a preset sequence number as target segmentation points. Alternatively, select the candidate segmentation points whose segmentation confidence is greater than a confidence threshold as high-priority candidate segmentation points and those whose segmentation confidence is less than or equal to the confidence threshold as low-priority candidate segmentation points, and select the high-priority candidate segmentation points as target segmentation points. Both strategies are sketched below.
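- Both selection strategies above (top-K by confidence rank, or a high/low-priority split around a confidence threshold) are shown in this Python sketch; the Candidate structure is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    time_s: float
    confidence: float

def top_k(cands: list[Candidate], k: int) -> list[Candidate]:
    # Keep the K highest-confidence candidates, returned in timeline order.
    best = sorted(cands, key=lambda c: c.confidence, reverse=True)[:k]
    return sorted(best, key=lambda c: c.time_s)

def high_priority(cands: list[Candidate], threshold: float) -> list[Candidate]:
    # Keep only candidates whose confidence exceeds the threshold.
    return sorted((c for c in cands if c.confidence > threshold),
                  key=lambda c: c.time_s)
```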
- S303 Determine sub-content of the media content based on the target segmentation point.
- the media content is divided according to the target segmentation point to obtain sub-content of the media content.
- the media content is the conference content
- the sub-content is the conference segment content.
- the target segmentation points of the media content determined according to the content dimensions divide the media content based on the characteristics of each content dimension; the resulting sub-contents have high content relevance, and the division of the media content is more accurate.
- the user can understand the content of the sub-content by viewing the sub-content, and quickly browse the media content without carefully viewing the complete media content, thereby improving the user experience.
- the following introduces possible implementation methods of determining segmentation points of media content using five content dimensions, namely, voice content dimension, shared screen content dimension, shared document content dimension, stage time content dimension, and stage content dimension, provided in the embodiments of the present disclosure.
- the data of the media content includes voice data.
- the media content includes voice content, and at least two content dimensions involved in the media content include a voice content dimension.
- the first segmentation point is determined in the following way according to the voice content dimension:
- A1 Based on the voice data, the speech content of the participant with the process guidance identity is obtained.
- the voice data included in the media content may be the voice data of the person who participated in the generation of the media content.
- the media content is the content of the meeting.
- the person is the participant of the meeting.
- the person may include a person with a process guide identity.
- the person with a process guide identity is responsible for guiding the communication process of the content.
- the person with a process guide identity is the host of the meeting, or the organizer of the meeting.
- the embodiments of the present disclosure do not limit the method of determining the person whose identity information is the process guide identity.
- for example, the person with the process guide identity can be pre-set.
- the person serving as the process guide is determined based on the person's name or person type. For example, a person whose person type is "host" is determined to be the process guide.
- the speech content of the person with the process guide identity has the characteristic of guiding the process.
- the voice data included in the data of the media content is analyzed to determine the person with the process guide identity.
- the speech content of the person in the media content is analyzed.
- the person with the process guidance identity is determined by detecting the statements related to the process guidance in the speech content, or the statements with similar semantics to the statements related to the process guidance.
- the statements related to the process guidance can be pre-set reference statements. Reference statements are, for example, "I will preside over the process of today's meeting", "First, discuss", etc.
- the speech content of the person whose speech order is within the sequence threshold is analyzed.
- the sequence threshold can be determined based on the number of people included in the media content. For example, for media content including 5 people, the sequence threshold is, for example, 3. In this way, the scope of detecting people with the process guidance identity can be narrowed and costs can be reduced.
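- A minimal sketch of finding the process-guide participant by matching early speech against reference statements follows. The (speaker, text) utterance representation, the fuzzy-matching cutoff, and the sequence threshold of 3 from the example above are illustrative assumptions.

```python
from difflib import SequenceMatcher

REFERENCE_STATEMENTS = [
    "I will preside over the process of today's meeting",
    "First, discuss",
]

def is_guide_statement(text: str, cutoff: float = 0.6) -> bool:
    # Crude fuzzy match against pre-set reference statements.
    return any(SequenceMatcher(None, text.lower(), ref.lower()).ratio() >= cutoff
               for ref in REFERENCE_STATEMENTS)

def find_process_guide(utterances: list[tuple[str, str]],
                       order_threshold: int = 3) -> str | None:
    """utterances: (speaker, text) pairs in speaking order; only speakers
    whose speaking order is within the sequence threshold are examined."""
    examined: list[str] = []
    for speaker, text in utterances:
        if speaker not in examined:
            if len(examined) >= order_threshold:
                continue  # speaker beyond the sequence threshold: skip
            examined.append(speaker)
        if is_guide_statement(text):
            return speaker
    return None
```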
- a recognition model capable of identifying the identity of a person in media content is pre-trained.
- the recognition model can use input voice data, or text data obtained by recognizing the voice data, to determine the identity of a person in the media content.
- the voice data included in the data, or the text data obtained by recognizing the voice data is input into the recognition model, and the process guide identity of the person output by the recognition model is obtained.
- A2 Determine the first segmentation point of the media content based on different process stages indicated by the speech content.
- the speech content of the person with the process guide identity includes content indicating different process stages of the media content.
- semantic recognition is performed on the speech content of the person with the process guide identity to determine the sentences indicating different process stages.
- the sentences indicating different process stages are, for example, process sentences and summary sentences.
- the first segmentation point for dividing the media content into the different process stages is determined.
- the switching moment of different process stages is used as the first segmentation point of the media content.
- the first segmentation point is used to determine the target segmentation point.
- the above method for determining the first segmentation point based on the voice content dimension is only an example, and the embodiments of the present disclosure are not limited thereto.
- the first segmentation point can be determined based on the moment when a person pauses in the voice data.
- the data of the media content includes shared data.
- the media content includes shared content, and at least two content dimensions involved in the media content include shared content dimensions.
- the shared data includes one or more of shared screen data and shared document data.
- Shared screen data is, for example, video data generated by a shared device screen or an interface window displayed on a shared device screen.
- Shared screen data can reflect the screen content shared and displayed by a person. Different screen contents can reflect different content stages in media content.
- Shared document data is video data generated by a shared document.
- Shared document data can reflect shared document content. Different document contents displayed can reflect different content stages in media content.
- the shared content dimension includes a shared screen content dimension.
- the shared content dimension includes a shared document content dimension.
- the second segmentation point is determined for the shared screen content dimension in the following manner:
- the switching of the content of the shared screen can reflect the change of the content stage included in the media content.
- the embodiments of the present disclosure do not limit the method for determining the switching of the content of the shared screen.
- the images displayed on the shared screen in different time periods are obtained from the shared screen data. Then, by determining the similarity of the images displayed on the screens in adjacent time periods, it is determined whether the content of the shared screen has changed. The moment separating two time periods whose image similarity is less than a similarity threshold is used as the moment of switching the content of the shared screen.
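- The image-similarity comparison above can be sketched as a simple per-pixel difference between frames sampled from adjacent time periods; decoding frames from the shared screen data and stronger perceptual similarity measures are left aside as assumptions.

```python
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Grayscale frames of equal shape; 1.0 for identical, lower as they diverge.
    diff = np.abs(a.astype(np.float32) - b.astype(np.float32))
    return 1.0 - float(diff.mean()) / 255.0

def switch_moments(frames: list[np.ndarray], times_s: list[float],
                   threshold: float = 0.85) -> list[float]:
    """Return the moments separating adjacent sampled frames whose similarity
    falls below the threshold, i.e., shared-screen content switches."""
    return [times_s[i + 1]
            for i in range(len(frames) - 1)
            if frame_similarity(frames[i], frames[i + 1]) < threshold]
```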
- the operation of the operating cursor in the shared screen is detected to determine the moment of switching the content of the shared screen.
- the moment when the operating cursor clicks the button for switching pages is used as the moment of switching the content of the shared screen.
- the moment of starting the shared screen and the moment of ending the shared screen are used as the moments of switching the content of the shared screen.
- B2 Determine a second segmentation point of the media content based on the moment when the content of the shared screen is switched.
- for example, each moment when the content of the shared screen is switched is determined as a second segmentation point for dividing the media content.
- alternatively, only some of the moments when the content of the shared screen is switched are determined as second segmentation points for dividing the media content.
- the second segmentation point is used to determine the target segmentation point.
- the third segmentation point is determined for the shared document content dimension in the following way:
- the shared document title can reflect the different contents involved in the process of sharing the document.
- the shared document title can be determined based on the processing of the shared document data.
- the position of the operating cursor in the display area of the document is detected based on the shared document data to determine the content of the currently shared document. For example, when it is detected that the operating cursor selects a document title, is in the display area of a document title, or moves to the display area of the content corresponding to a different document title, it is determined that the document content has changed, and this moment is used as the switching moment of the document content.
- alternatively, the shared document uses a special display method to show the document content currently being shared. Based on the shared document data, the moment when the document title to which the specially displayed content belongs changes is used as the switching moment of the document content.
- C2 Determine the third segmentation point of the media content based on the switching time of the document content.
- the determined switching moment of the document content is used as the third segmentation point of the media content.
- some switching moments are selected from the determined switching moment of the document content as the third segmentation point of the media content.
- the third segmentation point is used to determine the target segmentation point.
- the media content includes description content.
- the at least two content dimensions involved in the media content include the description content dimension.
- the description content is content describing the media content.
- the data of the media content includes description data.
- the description data is, for example, text data.
- the media content is conference content, and the description content is conference agenda content.
- the description data includes one or more of stage time data and stage content data.
- the stage time data includes time information of different content stages in the media content.
- the stage content data includes information of main content of different content stages in the media content.
- the stage time data includes the time information of different content stages in the meeting content, for example: the first half hour of the meeting discusses topic a, and the second half discusses topic b. Another example is: 4:00-5:00 discuss issue x, and 5:00-5:30 discuss issue y.
- the stage content data includes the information of the main content of different content stages in the meeting content, for example: the meeting discusses: 1. Topic a; 2. Topic b.
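- by way of illustration only, a minimal Python sketch of extracting stage time data from agenda-style description text such as the example above; the line format and regular expression are assumptions:

```python
# Illustrative sketch only: parse stage time data out of agenda-style
# description text such as "4:00-5:00 discuss issue x".
import re

STAGE = re.compile(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})\s+(.+)")

def parse_stage_times(description_lines):
    stages = []
    for line in description_lines:
        m = STAGE.match(line.strip())
        if m:
            start, end, topic = m.groups()
            stages.append({"start": start, "end": end, "topic": topic})
    return stages

print(parse_stage_times(["4:00-5:00 discuss issue x",
                         "5:00-5:30 discuss issue y"]))
```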
- the description content dimension includes a stage time content dimension.
- the description content dimension includes a stage content dimension.
- the fourth segmentation point is determined in the following way for the stage time content dimension:
- a fourth segmentation point of the media content is determined.
- each moment indicated by the stage time content is used as the fourth segmentation point of the media content.
- the main content of the media content in the time period before, and in the time period after, each moment indicated by the stage time content is determined.
- the durations of the preceding and following time periods may be preset. If the content similarity between the main content in the preceding period and the main content in the following period is less than or equal to a threshold, the moment is used as a fourth segmentation point of the media content; if the similarity is greater than the threshold, the moment is discarded.
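- by way of illustration only, a minimal Python sketch of this similarity check, with `similarity` standing in for any text-similarity measure (e.g., embedding cosine); window and threshold values are assumptions:

```python
# Illustrative sketch only: a moment from the stage time data is kept as a
# fourth segmentation point only if the main content before and after it is
# sufficiently dissimilar.

def filter_fourth_points(moments, content_in_window, similarity,
                         window=300.0, threshold=0.5):
    """moments: candidate timestamps (seconds) from the stage time data.
    content_in_window(start, end): main content text within a time span.
    window: preset duration of the preceding/following periods."""
    points = []
    for t in moments:
        before = content_in_window(t - window, t)
        after = content_in_window(t, t + window)
        if similarity(before, after) <= threshold:
            points.append(t)  # content changed: keep as segmentation point
        # otherwise the moment is discarded
    return points
```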
- the fifth segmentation point is determined in the following way for the stage content dimension:
- the media content is clustered based on the stage content, and the fifth segmentation point of the media content is determined based on different clustered contents.
- the stage content data indicates the main content of each content stage included in the media content.
- the media content is clustered.
- the data of the media content includes voice data, and the voice data is converted into text data.
- the sentences included in the text data are semantically clustered to achieve clustering of the media content.
- the clustering categories are the different main content categories indicated by the stage content data.
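- by way of illustration only, a minimal Python sketch of such semantic clustering using off-the-shelf sentence embeddings and k-means; the embedding model and clustering choice are assumptions:

```python
# Illustrative sketch only: semantically cluster transcript sentences and
# place fifth segmentation points where the cluster label changes.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def fifth_points(sentences, timestamps, num_stages):
    """sentences: text converted from the voice data; timestamps: start
    time of each sentence; num_stages: number of main-content categories
    indicated by the stage content data."""
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    labels = KMeans(n_clusters=num_stages, n_init=10).fit_predict(embeddings)
    # a boundary between differently-clustered sentences is a candidate point
    return [timestamps[i] for i in range(1, len(labels))
            if labels[i] != labels[i - 1]]
```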
- the above are five possible implementations of determining segmentation points based on content dimensions, as provided in the embodiments of the present disclosure.
- these implementations are examples only and are not intended to limit the way segmentation points are determined based on content dimensions.
- the embodiment of the present disclosure further provides a media content division device, which will be described below in conjunction with the accompanying drawings.
- FIG. 4 is a schematic diagram of the structure of a media content division device provided by an embodiment of the present disclosure.
- the media content division device includes:
- the second acquisition unit 401 is configured to acquire data of media content, where the media content includes content of at least two content dimensions;
- a second determination unit 402 is configured to determine a target segmentation point of the media content based on the content dimension and the data of the media content;
- the second segmentation unit 403 is configured to determine the sub-content of the media content based on the target segmentation point.
- the second determination unit 402 is specifically configured to determine candidate segmentation points of the media content under the content dimension based on the division method corresponding to the content dimension and the data of the media content; and determine the target segmentation points of the media content based on the candidate segmentation points.
- the second determining unit 402 is configured to determine a target segmentation point of the media content based on the candidate segmentation points, including:
- the second determination unit 402 is configured to determine a target segmentation point of the media content based on the segmentation confidence of the candidate segmentation point, where the segmentation confidence is used to measure the accuracy of the segmentation of the candidate segmentation point.
- the segmentation confidence is determined based on a division method for determining candidate segmentation points.
- the segmentation confidence is determined based on the segmentation granularity of the candidate segmentation point, where the segmentation granularity indicates the degree of refinement of the media content divided by the candidate segmentation point.
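- by way of illustration only, a minimal Python sketch of selecting target segmentation points from pooled candidates using their segmentation confidence; the thresholds are assumptions:

```python
# Illustrative sketch only: pool candidate segmentation points from several
# content dimensions and keep those with sufficient segmentation confidence,
# collapsing near-coincident candidates.

def target_points(candidates, min_confidence=0.6, merge_window=5.0):
    """candidates: (timestamp, confidence) pairs pooled from the first to
    fifth segmentation points determined under each content dimension."""
    kept = sorted(t for t, c in candidates if c >= min_confidence)
    targets = []
    for t in kept:
        if not targets or t - targets[-1] > merge_window:
            targets.append(t)  # otherwise merged with the previous point
    return targets

print(target_points([(60.0, 0.9), (62.0, 0.7), (300.0, 0.4), (480.0, 0.8)]))
# [60.0, 480.0]  (62.0 merged into 60.0; 300.0 below the confidence floor)
```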
- the media content includes voice content.
- the content dimension includes a voice content dimension.
- the first segmentation point is determined in the following manner for the voice content dimension, where the first segmentation point is used to determine the target segmentation point:
- the media content includes shared content.
- the shared content includes shared screen content.
- the content dimension includes a shared screen content dimension.
- the second segmentation point is determined in the following manner for the shared screen content dimension, where the second segmentation point is used to determine the target segmentation point:
- a second segmentation point of the media content is determined.
- a third segmentation point of the media content is determined.
- the media content includes description content.
- the description content includes stage time content, which is a stage time content dimension.
- the fourth segmentation point is determined in the following manner for the stage time content dimension, and the fourth segmentation point is used to determine the target segmentation point:
- a fourth segmentation point of the media content is determined.
- the description content includes stage content, which is a stage content dimension.
- the fifth segmentation point is determined in the following manner for the stage content dimension, and the fifth segmentation point is used to determine the target segmentation point:
- the media content is clustered based on the stage content, and the fifth segmentation point of the media content is determined based on different clustered contents.
- the media content is conference content.
- the sub-content of the media content is conference segment content.
- Media content refers to content expressed through various communication methods.
- media content includes one or more of video content, image content, audio content, and text content.
- Media content can be transmitted through the Internet, which is convenient for users to view through the Internet.
- media content includes increasingly rich content.
- a summary of the main content of the media content can be provided to the user. By viewing the summary, the user can quickly understand the specific content included in the media content, which is convenient for the user to choose to view the required media content, or enhance the understanding of the media content.
- however, current summaries of media content often fail to accurately describe the main content included in the media content.
- the embodiments of the present disclosure provide a summary generation method, apparatus, device and medium.
- content data of each sub-content included in the media content is obtained, and a summary of each sub-content is obtained based on the content data of each sub-content.
- the summary of the sub-content can describe the main content of the sub-content. Extracting the summary of the sub-content first can reduce the difficulty of generating the summary of the media content. Based on the weight of each sub-content, the summary of each sub-content is fused to obtain the summary of the media content. Among them, the weight of the sub-content can reflect the importance of the sub-content in the media content.
- fusing the summaries of the sub-contents can reduce the omission of important content and avoid excessive description of unimportant content.
- the resulting summary of the media content can therefore describe the main content of the media content more accurately, which makes it convenient for users to understand the media content through the summary.
- the electronic device may be, for example, a server or a terminal.
- the terminal includes but is not limited to a smart phone, a tablet computer, a laptop computer, a personal digital assistant (PDA) or a smart wearable device.
- the server may be a cloud server, such as a central server in a central computing cluster, or an edge server in an edge computing cluster.
- the server may also be a server in a local data center.
- a local data center refers to a data center directly controlled by a user.
- the electronic device obtains content data of each sub-content included in the media content, obtains a summary of each sub-content based on the content data of each sub-content, and then fuses the summaries of each sub-content based on the weight of each sub-content to obtain a summary of the media content.
- the weight of the sub-content is used to indicate the importance of the sub-content in the media content.
- the method for generating a summary may include S501-S503:
- S501 Acquire content data of each sub-content included in the media content.
- the media content includes, for example, one or more of video content, image content, audio content, and text content, which is not limited in the embodiments of the present disclosure.
- Sub-content is a portion of media content obtained by dividing the media content.
- the media content includes at least two sub-contents.
- the embodiments of the present disclosure do not limit the manner of dividing sub-contents.
- the media content is evenly divided according to duration to obtain the sub-contents. For example, if the media content is video content, the video content is divided into 20-minute segments to obtain a plurality of sub-contents.
- alternatively, the content included in the media content is analyzed and clustered, and the media content is divided so that content of the same cluster falls into the same part. In this way, portions of content with high relevance to one another can be grouped into a single sub-content.
- sub-content obtained by such division conforms to the actual content structure of the media content, allowing a more accurate summary of each sub-content to be extracted and thereby generating a summary that more accurately summarizes the media content. A sketch of the simpler duration-based division follows.
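- by way of illustration only, a minimal Python sketch of the duration-based division mentioned above:

```python
# Illustrative sketch only: even division by duration, using the 20-minute
# example from the text above.

def even_division(total_duration, segment_len=20 * 60):
    """Return (start, end) spans in seconds covering [0, total_duration]."""
    spans, start = [], 0.0
    while start < total_duration:
        end = min(start + segment_len, total_duration)
        spans.append((start, end))
        start = end
    return spans

# A 70-minute video yields four sub-contents: 20 + 20 + 20 + 10 minutes.
print(even_division(70 * 60))
```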
- the media content is the content of a conference
- the sub-content is the content of a sub-conference obtained by dividing the conference.
- the disclosed embodiment does not limit the method of dividing a conference into sub-conferences.
- a meeting is divided into sub-meetings based on the meeting type and the meeting content of the meeting.
- the meeting type of the meeting can be determined based on the communication information of the meeting.
- the communication information of the meeting includes, for example, one or more of information representing the number of participants, information representing the form of communication, and information representing the content of communication.
- the meeting is divided into sub-meetings using a division model corresponding to the meeting type.
- the meeting is divided into sub-meetings using a division rule corresponding to the meeting type.
- the content of the meeting includes content of at least two content dimensions.
- the meeting is divided based on the content dimension to determine sub-conferences.
- a division method corresponding to the content dimension is used to determine candidate segmentation points of the meeting and the segmentation confidence of the candidate segmentation points.
- the target segmentation points are determined according to the segmentation confidence, and the meeting is divided using the target segmentation points to obtain sub-conferences.
- the meeting is divided using a division method corresponding to the content dimension to obtain meeting segments.
- the qualified meeting segments are then merged to obtain sub-conferences.
- Qualified meeting segments are, for example, meeting segments whose time periods in the meeting are adjacent and whose content similarity is greater than a similarity threshold.
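- by way of illustration only, a minimal Python sketch of merging qualified meeting segments; `similarity` stands in for any content-similarity measure, and the threshold is an assumption:

```python
# Illustrative sketch only: merge qualified meeting segments, i.e. segments
# adjacent in time whose content similarity exceeds a threshold, into
# sub-conferences.

def merge_segments(segments, similarity, threshold=0.8):
    """segments: time-ordered dicts {"start", "end", "text"}."""
    if not segments:
        return []
    merged = [dict(segments[0])]
    for seg in segments[1:]:
        last = merged[-1]
        if similarity(last["text"], seg["text"]) > threshold:
            last["end"] = seg["end"]              # extend the sub-conference
            last["text"] += " " + seg["text"]
        else:
            merged.append(dict(seg))
    return merged
```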
- the meeting is a recurring scheduled meeting.
- the sub-content is the content of at least one scheduled meeting included in the recurring scheduled meeting.
- the meeting is a regular meeting held every Monday afternoon.
- the sub-content is at least one Monday afternoon meeting included in the regular meeting.
- the content data of the sub-content is data related to the sub-content. Taking a sub-conference as an example, the content data of the sub-conference is conference data. The embodiments of the present disclosure do not limit the specific type of the content data of the sub-content; as an example, it is one or more of audio data, video data, image data, and text data.
- S502 Determine a summary of each sub-content based on the content data of each sub-content.
- the summary of the sub-content is used to describe the main content of the sub-content.
- the disclosed embodiment does not limit the implementation method of determining the summary of each sub-content based on the content data of each sub-content.
- the keywords of the sub-content are determined by analyzing the content data of the sub-content.
- the summary of the sub-content is generated based on the keywords.
- the content data of the sub-content is processed using a second language processing model to obtain a summary of the sub-content.
- the second language processing model has a natural language processing function.
- the second language processing model can analyze the input content data and output a summary.
- the amount of content data of each sub-content is relatively small compared with the media content as a whole, which facilitates processing the content data of a sub-content to obtain its summary.
- as a result, the cost of generating a summary of the sub-content is low and the accuracy of the sub-content summary is high, thereby improving the accuracy of the summary of the media content. A minimal sketch of model-based sub-content summarization follows.
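- by way of illustration only, a minimal Python sketch of prompting a language model for a sub-content summary; the prompt wording and the `language_model` callable are assumptions:

```python
# Illustrative sketch only: obtain a sub-content summary with a language
# model. Any model with a natural language processing function could play
# the role of `language_model` here.

def summarize_sub_content(text, language_model):
    prompt = ("Summarize the main content of the following meeting segment "
              "in a few sentences:\n" + text)
    return language_model(prompt)

# Toy stand-in model for demonstration: echoes the first 60 characters.
toy_model = lambda prompt: prompt.splitlines()[-1][:60] + "..."
print(summarize_sub_content("transcript of the sub-conference ...", toy_model))
```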
- S503 The summaries of the sub-contents are integrated based on the weights of the sub-contents to obtain a summary of the media content.
- the embodiment of the present disclosure does not limit the method for determining the weight of the sub-content.
- the weight of the sub-content may be set by, for example, the producer of the media content.
- the weight of each sub-content in the media content may be set by, for example, the organizer, host, or other person with conference management authority of the web conference.
- the weight of the sub-content is determined based on the content information of the sub-content.
- the content information of the sub-content is determined by the time information of the sub-content in the media content and the content data.
- the content information of the sub-content includes the time information of the sub-content in the media content and the relevant information of the specific content determined by the content data.
- the time information of the sub-content in the media content can reflect the importance of the sub-content in the media content to a certain extent.
- the sub-content at the beginning stage of the media content is usually the introduction part and has a lower importance.
- the sub-content at the end stage of the media content is usually the summary part and has a higher importance. Analyzing the content data can obtain the specific content of the sub-content, and then determine the importance of the sub-content and the weight of the sub-content.
- an artificial intelligence model for determining the weight of a sub-content is pre-trained.
- the artificial intelligence model is trained using training data including training content information and a label of the training content information.
- the label of the training content information is the weight of the training content information.
- the trained artificial intelligence model can determine the weight of the sub-content based on the content information of the input sub-content.
- the label of the training content information is the importance value of the training content information.
- the importance value is used to measure the importance of the training content information.
- the trained artificial intelligence model can determine the importance value of the sub-content based on the content information of the input sub-content. Then, the weight of the sub-content is determined based on the weight corresponding to the importance value of the sub-content.
- the content information of the sub-content is analyzed to obtain sub-information of at least one dimension that can determine the weight of the sub-content.
- the embodiment of the present disclosure does not limit the division method of the dimensions included in the content information.
- the sub-information included in the content information is one or more of: the duration of the sub-content, the number of persons involved in the sub-content, and the position of the sub-content's time period within the time period of the media content.
- the duration of the sub-content and the position of the time period of the sub-content in the time period of the media content can be determined by analyzing the time information of the sub-content in the media content.
- the number of persons involved in the sub-content can be obtained by analyzing the content data of the sub-content.
- when the persons involved in the sub-content are meeting participants, the sub-information also includes the meeting identity of each participant in the meeting.
- the meeting identity can be determined from speech characteristics included in the content data of the sub-content, such as a participant's speech style and speech frequency. For example, a participant with a speech style of summarizing events and a high speech frequency is determined to be the speaker, while a participant with a speech style of driving the process forward and a high speech frequency is determined to be the host. A heuristic sketch of this classification follows.
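- by way of illustration only, a minimal Python sketch of this heuristic; the feature names and the frequency threshold are assumptions:

```python
# Illustrative sketch only: classify a participant's meeting identity from
# speech frequency and speech style features.

def meeting_identity(speech_frequency, summarizes_events, promotes_process):
    """speech_frequency: utterances per minute; the booleans come from a
    style analysis of the participant's speech in the content data."""
    if speech_frequency >= 2.0 and summarizes_events:
        return "speaker"  # summarizing style + high speech frequency
    if speech_frequency >= 2.0 and promotes_process:
        return "host"     # process-driving style + high speech frequency
    return "participant"

print(meeting_identity(3.1, summarizes_events=True, promotes_process=False))
```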
- Each sub-information has a corresponding sub-weight.
- the sub-weight of each sub-information can be determined based on a preset sub-weight determination rule. For example, the sub-weight of a sub-content in the initial stage of the media content has a lower value, and the sub-weight of a sub-content in the final stage has a higher value.
- the weight of the sub-content is determined based on the sub-weight of each sub-information.
- the statistical value of the sub-weight of each sub-information included in the sub-content is calculated as the weight of the sub-content.
- the statistical value is, for example, a value obtained by a data statistics method, such as an average value, a weighted average value, or a median. A sketch of such a rule-based weight computation follows.
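- by way of illustration only, a minimal Python sketch of computing a sub-content weight as the average of rule-based sub-weights; the rules and constants are assumptions:

```python
# Illustrative sketch only: a sub-content weight computed as the average of
# rule-based sub-weights for duration, person count, and time position.
from statistics import mean

def sub_content_weight(duration_min, num_persons, relative_position):
    """relative_position: midpoint of the sub-content's time period divided
    by the total duration of the media content (0 = start, 1 = end)."""
    duration_w = min(duration_min / 30.0, 1.0)   # longer -> heavier
    persons_w = min(num_persons / 5.0, 1.0)      # more persons -> heavier
    # initial stage lower, final stage higher, per the rule above
    if relative_position < 0.2:
        position_w = 0.3
    elif relative_position > 0.8:
        position_w = 0.9
    else:
        position_w = 0.6
    return mean([duration_w, persons_w, position_w])

print(round(sub_content_weight(25, 3, 0.9), 3))  # 0.778
```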
- the weight of the sub-content can affect the proportion of the summary of the sub-content in the summary of the media content. Based on the weight of each sub-content, the summary of each sub-content is merged to obtain the summary of the media content.
- the embodiments of the present disclosure do not limit the implementation method of obtaining a summary of the media content by fusing the summaries of the sub-contents based on the weights of the sub-contents.
- the weights of the sub-contents and the summaries of the sub-contents are processed by a first language processing model to obtain a summary of the media content.
- the first language processing model has the ability of natural language processing.
- the first language processing model fuses the input summaries according to their weights and outputs the summary of the media content.
- alternatively, rules for fusing summaries are preset. These rules define how summaries of different weights are fused; based on them, the summaries of the sub-contents are processed to obtain a summary of the media content. A sketch of model-based fusion follows.
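- by way of illustration only, a minimal Python sketch of model-based fusion; the prompt format and the `language_model` callable are assumptions:

```python
# Illustrative sketch only: fuse sub-content summaries into a media-content
# summary via a first language processing model, weighting each summary.

def fuse_summaries(summaries, weights, language_model):
    lines = [f"(weight {w:.2f}) {s}" for s, w in zip(summaries, weights)]
    prompt = ("Merge the following weighted segment summaries into a single "
              "summary of the whole meeting, giving more space to higher-"
              "weighted segments and omitting minor repeated points:\n"
              + "\n".join(lines))
    return language_model(prompt)
```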
- extracting the summary of the sub-content first and then fusing the summary of the sub-content to obtain the summary of the media content can reduce the difficulty of generating the summary of the media content and facilitate the generation of the summary.
- fusing the summary of the sub-content can reduce the omission of important content and avoid excessive description of non-important content.
- the summary of the media content obtained can more accurately summarize the main content of the media content, meet the needs of users to view the summary of the media content, and improve the user experience.
- the embodiment of the present disclosure further provides a summary generation device, which will be described below in conjunction with the accompanying drawings.
- FIG. 6 is a schematic diagram of the structure of a summary generation device provided by an embodiment of the present disclosure.
- the summary generation device includes:
- the third acquisition unit 601 is configured to acquire content data of each sub-content included in the media content, where the sub-content is obtained by dividing the media content;
- the third determining unit 602 is configured to determine the summary of the sub-content based on the content data of each sub-content;
- the generating unit 603 is configured to merge the summaries of the sub-contents based on the weights of the sub-contents to obtain the summary of the media content.
- the weights of the sub-contents are used to indicate the importance of the sub-contents in the media content.
- the weight of the sub-content is determined according to content information of the sub-content, and the content information is determined based on time information of the sub-content in the media content and content data of the sub-content.
- the weight of the sub-content is determined based on the sub-weight corresponding to the sub-information included in the content information of the sub-content.
- the content information includes one or more of the following sub-information: the duration of the sub-content, the number of persons involved in the sub-content, and the position of the sub-content's time period within the time period of the media content.
- the weight of the sub-content is determined based on an artificial intelligence model, and the artificial intelligence model is used to output the weight based on the input content information.
- the generating unit 603 is specifically configured to process the weights of each sub-content and the summaries of each sub-content based on the first language processing model to obtain the summary of the media content.
- the content data is text data.
- the third determining unit 602 is configured to process the content data of each sub-content respectively based on the second language processing model to obtain a summary of each sub-content.
- the media content is the content of a meeting.
- the sub-content is the content of a sub-meeting obtained by dividing the meeting, or the meeting is a recurring scheduled meeting
- the sub-content is the content of at least one scheduled meeting included in the recurring scheduled meeting.
- the sub-conferences are obtained by dividing the conference in the following manner: dividing the conference based on the conference type and the conference content to obtain the sub-conferences.
- the sub-conferences are obtained by dividing the conference in the following manner: the conference is divided based on at least two content dimensions of the conference to obtain sub-conferences.
- the present disclosure also provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the conference data processing method, media content division method, or summary generation method provided in any of the above embodiments.
- FIG. 7 shows a schematic diagram of the structure of an electronic device 700 suitable for implementing the embodiments of the present disclosure.
- the terminal device in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable android devices), PMPs (Portable Media Players), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs (televisions), desktop computers, etc.
- the electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 to a random access memory (RAM) 703.
- in the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored.
- the processing device 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704.
- An input/output (I/O) interface 705 is also connected to the bus 704.
- the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 708 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 709.
- the communication device 709 may allow the electronic device 700 to communicate wirelessly or wired with other devices to exchange data.
- although FIG. 7 shows an electronic device 700 with various devices, it should be understood that not all of the devices shown are required to be implemented or included; more or fewer devices may alternatively be implemented or included.
- an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the method shown in the above flowchart.
- the computer program can be downloaded and installed from the network through the communication device 709, or installed from the storage device 708, or installed from the ROM 702.
- when the computer program is executed by the processing device 701, the above functions defined in the method for processing conference data, the method for dividing media content, or the method for generating a summary provided in the embodiments of the present disclosure are executed.
- the electronic device provided by the embodiment of the present disclosure and the conference data processing method, media content division method or summary generation method provided by the above-mentioned embodiments belong to the same inventive concept.
- for technical details not fully described in this embodiment, reference may be made to the above-mentioned embodiments, and this embodiment has the same beneficial effects as the above-mentioned embodiments.
- the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored, wherein when the program is executed by a processor, the conference data processing method, media content division method or summary generation method provided in any of the above embodiments is implemented.
- the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
- This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
- the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
- the program code contained on the computer readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- the client and server may communicate using any currently known or future developed network protocol such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network).
- Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internet (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
- the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
- the computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to execute the conference data processing method, media content division method, or summary generation method provided in any of the above embodiments.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including, but not limited to, object-oriented programming languages, such as Java, Smalltalk, C++, and conventional procedural programming languages, such as "C" or similar programming languages.
- the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
- each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function.
- the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, may be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
- the name of a unit/module does not, in some cases, constitute a limitation on the unit itself.
- for example, a voice data acquisition module may also be described as a "data acquisition module".
- exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
- a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- "At least one (item)" means one or more, and "plurality" means two or more.
- "And/or" is used to describe the association relationship of associated objects, indicating that three relationships may exist.
- for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
- the character "/" generally indicates that the objects associated before and after it are in an "or" relationship.
- "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items.
- for example, "at least one of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
Abstract
The embodiments of the present disclosure relate to a processing method and apparatus for conference data, a division method and apparatus for media content, a summary generation method and apparatus, an electronic device, and a computer-readable medium. The processing method for conference data comprises: acquiring conference data of a network conference, the conference data comprising voice data of the network conference; determining the conference type of the network conference on the basis of the conference data; and dividing the network conference on the basis of the conference type and the conference content of the network conference to obtain conference segments of the network conference. The division method for media content comprises: acquiring data of the media content, the media content comprising content of at least two content dimensions; determining target segmentation points of the media content on the basis of the content dimensions and the data of the media content; and determining sub-content of the media content on the basis of the target segmentation points. The summary generation method comprises: acquiring content data of each sub-content comprised in media content, the sub-content being obtained by dividing the media content; determining a summary of each sub-content on the basis of the content data of each sub-content; and fusing the summaries of the sub-contents on the basis of a weight of each sub-content to obtain a summary of the media content, the weights of the sub-contents being used to represent the degree of importance of the sub-contents in the media content.
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311569394.8 | 2023-11-22 | ||
| CN202311569394.8A CN117609491A (zh) | 2023-11-22 | 2023-11-22 | 一种摘要生成方法、装置、设备及介质 |
| CN202311569183.4A CN117478444A (zh) | 2023-11-22 | 2023-11-22 | 一种会议数据的处理方法、装置、设备及介质 |
| CN202311569948.4 | 2023-11-22 | ||
| CN202311569948.4A CN117615227A (zh) | 2023-11-22 | 2023-11-22 | 一种媒体内容的划分方法、装置、设备及介质 |
| CN202311569183.4 | 2023-11-22 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025108378A1 true WO2025108378A1 (fr) | 2025-05-30 |
Family
ID=95826064
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/133540 Pending WO2025108378A1 (fr) | 2023-11-22 | 2024-11-21 | Procédé et appareil de traitement pour données de conférence, procédé et appareil de division pour contenu multimédia, procédé et appareil de génération de condensé, dispositif électronique et support lisible par ordinateur |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025108378A1 (fr) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106021226A (zh) * | 2016-05-16 | 2016-10-12 | 中国建设银行股份有限公司 | 一种文本摘要生成方法及装置 |
| CN107211058A (zh) * | 2015-02-03 | 2017-09-26 | 杜比实验室特许公司 | 基于会话动态的会议分段 |
| CN111918145A (zh) * | 2019-05-07 | 2020-11-10 | 华为技术有限公司 | 视频分段方法和视频分段装置 |
| CN113096687A (zh) * | 2021-03-30 | 2021-07-09 | 中国建设银行股份有限公司 | 音视频处理方法、装置、计算机设备及存储介质 |
| CN116319697A (zh) * | 2023-04-11 | 2023-06-23 | 北京百度网讯科技有限公司 | 一种在线会议实现方法、装置、电子设备及存储介质 |
| JP2023113052A (ja) * | 2022-02-02 | 2023-08-15 | 株式会社日立製作所 | ウェブ会議を支援するシステム及び方法 |
| CN117478444A (zh) * | 2023-11-22 | 2024-01-30 | 北京字跳网络技术有限公司 | 一种会议数据的处理方法、装置、设备及介质 |
| CN117615227A (zh) * | 2023-11-22 | 2024-02-27 | 北京字跳网络技术有限公司 | 一种媒体内容的划分方法、装置、设备及介质 |
| CN117609491A (zh) * | 2023-11-22 | 2024-02-27 | 北京字跳网络技术有限公司 | 一种摘要生成方法、装置、设备及介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24893533 Country of ref document: EP Kind code of ref document: A1 |