WO2014071152A1 - Teleconferencing for participants at different locations - Google Patents
- Publication number
- WO2014071152A1 (PCT/US2013/068000)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- segments
- audio data
- location
- audio
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1827—Network arrangements for conference optimisation or adaptation
Definitions
- Participation schedules need not be precisely coordinated: a person can join the teleconference late yet still participate and hear all of the discussion. This can be achieved, for example, by recording each participant's contribution for later reproduction by any participant including those who join late.
- Interruptions, such as a call on another phone or another distraction, also do not cause any of the discussion to be missed by any person.
- the person being distracted can hear the other participants' contributions later during the teleconference if the contributions are recorded.
- the other participants thus do not have to wait for the distracted person; they can continue the discussion, or they can listen to earlier recorded contributions if desired.
- the distracted person can hear all the communications other than his own in the same order as everyone else.
- a speaker can pause briefly without being interrupted by another speaker who starts speaking at the pause - both speakers can speak at the same time.
- a moderator is not needed.
- a moderator can help establish priorities but is not needed to achieve the previously mentioned benefits.
- Some embodiments do not need a moderator to prioritize speakers.
- Muting is automatically applied to reduce noise from locations where no one is speaking to make a contribution.
- [0017] An example is a meeting among several people of an organization when those people are in different locations. Some of them might be traveling and some working in home offices.
- An example of a kind of meeting that might benefit significantly is a brainstorming session because each person can contribute at the time his idea occurs. Long pauses in the discussion are less burdensome because if multiple participants speak at the same time, then each participant can listen to other participants' audio during a pause.
- Figure 1 illustrates audio data flow in some embodiments of the present invention.
- Figure 2 illustrates operations performed by teleconferencing equipment according to some embodiments of the present invention.
- Figure 3 illustrates a user interface according to some embodiments of the present invention.
- Figure 4 illustrates limits imposed in a teleconference according to some embodiments of the present invention.
- Figure 5 illustrates priorities used in teleconferencing according to some embodiments of the present invention.
- Figure 6 illustrates an audio segment according to some embodiments of the present invention.
- Figure 7 illustrates data flow in serialization according to some embodiments of the present invention.
- Figure 8 is a flowchart of segmentation according to some embodiments of the present invention.
- Figure 9 is a block diagram of a teleconferencing system according to some embodiments of the present invention.
- Figure 10 is a block diagram of a central computing system used in some embodiments of the present invention.
- Figure 11 is a block diagram of a teleconferencing participant's system according to some embodiments of the present invention.
- Figure 1 illustrates a conference (i.e. teleconference) of four participants at respective four locations 110A, 110B, 110C, 110D for some embodiments of the present invention.
- each location has a microphone 120 and a speaker device 130.
- the microphone converts the audio signals (i.e. air oscillation signals) into electrical signals
- the speaker device performs the opposite transformation, as known in the art.
- the number of participants and other details are not limiting.
- Figure 2 illustrates exemplary operations. These operations can be performed by any suitable systems, which may include one or more computer processors executing computer instructions (which can be called software) that are stored on computer readable media (e.g. disks, tapes, semiconductor memories, and other types of storage). Alternatively or in addition, parts or all of one or more operations can be executed by non-software-programmable systems, e.g. systems that include non-software circuits and computer storage.
- the operations are:
- At step 1010, each participant's contribution is captured as audio data, which are physical signals, e.g. electrical. This process can be performed by microphone 120 and possibly other equipment as known in the art. This description often refers to "spoken" contribution, but other types of audio can also be present (e.g. music, touch-tone digital codes, etc.). Audio data can be digital data.
- Break up the audio data from each participant (each location 110) into segments (step 1020).
- the segments are shown in Figure 1 as 140A, 140B, ... for the audio data from respective locations 110A, 110B, ....
- a segment 140 (i.e. 140A or 140B, etc.) can be defined by audio data that records one person's continuously spoken contribution.
- a location 110 may generate multiple segments or no segments, as any participant might contribute more than one segment at any given time in the discussion, but a segment 140 contains an uninterrupted contribution from one person. However, in some cases where several participants share a single conference room at a location 110 (possibly with a single microphone 120), one segment 140 might contain contributions from more than one of the participants in the conference room.
- the start and end of each segment 140 are determined as shown at step 1020 of Figure 2 (steps 1010 and 1020 can be combined). For example, the end of a segment can be automatically determined by a minimum length pause.
- At step 1030, the segments from all participants are placed in a single sequence (e.g. sequence 150 in Figure 1).
- This process is called serialization herein; the segments are serialized.
- the sequence could be created on a single storage device or system (e.g. 264 in Figure 9) by a central computing device such as a computer server (e.g. 260 in Figure 9).
- reference number 150 is used to refer both to the sequence of serialized segments and to an individual segment 140 in the sequence.
- the serialized segments 140 are played for each participant (e.g. by respective speaker devices 130) in the order of sequence 150 when each participant is ready to hear it.
- Sequence 150 is thus organized as a queue (first-in-first-out). Every participant hears the same sequence 150, but not necessarily at the same time. An exception is that a participant might or might not hear the segments 140 created from his own contributions. Also, in some embodiments, a participant may rewind to an earlier point in the conference, or issue a command to skip a segment, or issue some other command to play back the segments in a different order.
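As a rough illustration of the queue behavior described above (one shared first-in-first-out sequence, with each participant playing it back at his own pace and optionally skipping his own segments), the following Python sketch is purely illustrative; the class and method names are assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Segment:
    index: int            # serialization index within sequence 150
    participant_id: str   # contributing participant/location (cf. ID 520)
    audio: bytes          # the segment's audio data

@dataclass
class Conference:
    sequence: List[Segment] = field(default_factory=list)  # shared queue 150 (FIFO)
    cursors: Dict[str, int] = field(default_factory=dict)  # per-participant playback position

    def append(self, segment: Segment) -> None:
        # the serializer appends segments in the order of sequence 150
        self.sequence.append(segment)

    def next_for(self, participant_id: str, skip_own: bool = True) -> Optional[Segment]:
        """Return the next unplayed segment for this participant, in queue order."""
        pos = self.cursors.get(participant_id, 0)
        while pos < len(self.sequence):
            seg = self.sequence[pos]
            pos += 1
            if skip_own and seg.participant_id == participant_id:
                continue  # optionally skip the participant's own contributions
            self.cursors[participant_id] = pos
            return seg
        self.cursors[participant_id] = pos
        return None  # caught up: nothing left to play for now
```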
- When a participant speaks, the audio output is paused for him at his speaker device 130 until he finishes speaking, when the output is resumed for him.
- step 1030 (defining a segment's place in sequence 150) can be performed before the segment's end is determined at step 1020, or even before the entire segment's audio is captured at step 1010; a segment can be played while its audio is still being generated by a participant.
- the playback step 1040 can also begin before the segment's audio is all captured.
- In some embodiments, step 1030 is performed by a central computer system.
- some embodiments have no central computer system at all: all audio is transmitted to all locations 110 other than the location at which the audio was captured, and the serialization 1030 is performed at each location 110.
- Segmentation 1020 can be performed at the playback locations 110, or at the location at which the audio is captured (at 1010), or by a central computer or computers.
- steps 1020 and 1030 are distributed, i.e. are performed in concert at locations 110 or at one or more locations 110 and a central computer or computers.
- the playback is performed in the same segment order for all the participants (except perhaps that each participant is not played the participant's own contribution). Therefore, it may be simpler to perform serialization by a central computer system. On the other hand, it may be more efficient for at least some of the segmentation work to be performed at locations 110. Such segmentation work may include, for example, elimination of long pauses.
- a user interface at each location 110 can provide the participant (or participants) at that location with information 610 (Figure 3) about the discussion.
- the user interface may include video devices such as light indicators, computer monitors, etc., and may include audio devices (e.g. audio alarms).
- the information 610 can be defined by data obtained by conferencing equipment at location 110.
- Information 610 may include any combination of:
- Playback latency 610c. This is the time required to hear all of the remaining discussion not yet heard (not yet played) at location 110 but already recorded by other participants.
- the remaining discussion may include just the serialized segments 150, or may possibly include all or some of the audio not yet serialized but already captured (at step 1010).
- Information 610d shows, to a speaking participant, whether his speaking has been recognized and is being stored for a segment 140.
- the participant's speaking may be rejected if it violates one or more limits discussed below (see Figure 4), or may be rejected due to a technical malfunction.
- the feature 610d may include an audible alarm produced at the location 110.
- Other possible indicators include: a red indicator if a limit is exceeded; a yellow indicator if a limit is about to be exceeded (e.g. 20 seconds in advance of an estimated time when the limit will be exceeded); a green indicator otherwise. If the participant's contribution is not recognized and thus is no longer captured, then playing of serialized segments 150 may resume if it was paused to capture a contribution.
- Feature 610e shows the amount of additional speaking time that the participant can contribute at this point in time without violating a limit.
- the limits can change over time, as exemplified by limit 630a (Figure 4) defined by the proportion of the participant's contribution in the overall discussion.
- Feature 610f provides some or all of the limits in effect for the participants at location 110.
- the limits shown in Figure 4 can be displayed.
- Feature 610g illustrates the segment priority of the audio being captured; segment priorities affect placing segments into queue 150 as discussed below (note priority 640g in Figure 5).
- Feature 610h shows the maximum time remaining for the participant's current segment if the segment's duration is limited (e.g. by limit 630c in Figure 4).
- Feature 610i shows whether the current contribution will likely be broken into multiple segments (e.g. because at least some of the current contribution exceeds some limit and thus will be placed in a separate segment 140 and assigned a lower segment priority (640g) as explained below).
- User interface at a location 110 can include input features 620 as follows (these features may be implemented by any suitable input devices, e.g. buttons, a computer keyboard or mouse, a voice recognition system, and maybe others):
- Pause feature 620a allows a participant at location 110 to pause the playing of the discussion. This may be useful when the participant needs to refer to another source of information, answer a phone call, or attend to some other need.
- Rewind feature 620b allows the participant to rewind the discussion if the participant needs to hear part of the discussion again to better understand it.
- Feature 620c allows the participant to skip ahead after Rewind to the first point not yet heard by the participant (not yet played at this location).
- Feature 620d allows the participant to skip ahead to the end of a segment 140 contributed by the same participant.
- the system may be configured to automatically skip the participant's own segments.
- Feature 620e allows the participant to continue the playback after a pause, and maybe also after “rewind” or “skip”.
- Commands “rewind” (620b) and “skip” (620c, 620d) change the point at which playback is positioned.
- the playback does not proceed after “rewind” and/or “skip” until “continue” is issued via 620e. In effect, “rewind” and/or “skip” end in a pause. In other embodiments, the playback proceeds automatically after “rewind” and/or “skip” even if no “continue” is issued.
- Sometimes it is desirable to control the proportion of each person's contribution to the discussion so that no person inappropriately dominates. At the same time, it is desirable for each participant to hear all of the other participants. Therefore, some embodiments can place limits 630 on one or more of the participants as follows (any one of these limits may or may not apply to all participants at all locations 110; the limits may have different values for different participants):
- Limit 630a limits the proportion of the discussion time (speaking time) available to one participant.
- Limit 630b limits the length (in time) of discussion unheard by a participant who is contributing new audio. This helps to ensure that, when a participant makes a contribution, he is aware of most of what others have said, and this limit discourages a participant who wants to contribute from falling behind due to interruptions, rewinding, or listening to his own segments.
- Limit 630c is a time limit on the length of any one segment 140.
- Limit 630d is a time limit on the total contribution of any one participant (or any one location 110) in the entire discussion.
- Limits can be stored as data in computer readable storage.
- At least some of the limits can be applied each time a participant begins to speak to determine whether his speech will be stored as a new segment. As the participant speaks, limits can be applied to determine a cut-off time for the contribution. Once the participant is cut off, his subsequent contribution may be discarded until the limits are met, or may be given a lower priority in serialization (step 1030), i.e. the limit-offending segments can be placed farther behind in sequence 150. (Note description below of priority parameter 640g in Figure 5). For example, in some embodiments, when a limit is exceeded, the participant's current segment 140 is terminated, and a new segment is started with a lower priority to capture the limit-offending portion of the contribution. When the participant is speaking, the priority of the segment being created can be displayed to the participant as shown at 610g in Figure 3.
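A minimal sketch of the cut-off policy just described (terminate the current segment when the speaking allowance is exhausted and continue the subsequent audio in a new, lower-priority segment). The names, the single numeric priority field, and the simple time-based check are illustrative assumptions, not the disclosed implementation.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 1
DEMOTED_PRIORITY = 0  # lower serialization priority for limit-offending audio (cf. 640g)

@dataclass
class OpenSegment:
    participant_id: str
    start_time: float     # cf. time stamp 530
    priority: int
    audio: bytearray

def on_audio_chunk(current: OpenSegment, chunk: bytes, now: float,
                   allowed_seconds: float):
    """Append audio to the segment being captured; if the speaking allowance is
    exhausted, close the segment and continue in a new, demoted segment.
    Returns (closed_segment_or_None, segment_now_being_captured)."""
    if now - current.start_time <= allowed_seconds:
        current.audio.extend(chunk)
        return None, current
    # limit exceeded: terminate the current segment, demote the rest of the audio
    demoted = OpenSegment(current.participant_id, start_time=now,
                          priority=DEMOTED_PRIORITY, audio=bytearray(chunk))
    return current, demoted
```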
- the limits 630 can depend on the participant.
- the limits for a lecturer, for example, can be much larger than for a student if the lecturer contributes a much larger proportion of the discussion.
- the conference equipment is configured so that an initial portion of the discussion is not subject to individual limits 630. In other words, any contribution within an initial portion, e.g. the first five minutes of the discussion, is not counted towards any limits for any participants or some of the participants. This can be used to encourage the participants to start the discussion.
- a time allowed to a participant to speak at any point in the discussion (shown as parameter 610e in Figure 3) is calculated from parameters including the following:
- [0079] I is the initial allowance for the participant (five minutes in the example above).
- the p and I parameters may differ for different participants. Some participants might have no limit, and this preferentially encourages them to begin the discussion. Different participants may be associated with different types of limits and different formulas for computing the allowed time 610e.
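The formula for parameter 610e is not reproduced in this text, so the following is only a plausible reconstruction under stated assumptions: p is taken to be the participant's allowed proportion of the discussion time (cf. limit 630a) and I is his initial allowance.

```python
def allowed_speaking_time(p: float,
                          initial_allowance: float,
                          discussion_elapsed: float,
                          already_contributed: float) -> float:
    """Hypothetical reconstruction of parameter 610e (all times in seconds).

    p                   -- allowed proportion of discussion time (cf. limit 630a)
    initial_allowance   -- I, e.g. 300 seconds in the example above
    discussion_elapsed  -- total discussion time so far
    already_contributed -- this participant's speaking time so far
    """
    allowance = initial_allowance + p * discussion_elapsed
    return max(0.0, allowance - already_contributed)

# Example: p = 0.25, I = 300 s, 40 minutes of discussion, 8 minutes already spoken:
# allowed_speaking_time(0.25, 300, 2400, 480) -> 420.0
```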
- FIG. 6 illustrates exemplary contents of a segment 140 structure received or formed by serializer 1030.
- Each segment 140 may (or may not) contain the following information (this information is defined by computer data transmitted over a network and/or stored in computer storage):
- Participant ID 520. This ID can be, for example, an ID of the corresponding location 110 or the microphone 120 or other equipment, e.g. an ID of the client 210 in Figure 9 discussed below.
- Time stamp 530 identifying the absolute time when the participant began speaking.
- the time stamp can be encoded according to a standard such as Unix time, which encodes time as the number of seconds after January 1, 1970, or in some other way.
- Index 550 of a related segment which can be the segment played by the participant's speaker 130 at the instant the participant began speaking or otherwise began contributing audio. If the participant began during a pause between two segments, the index can be the index of the last segment played to the participant. If no segment of the discussion had yet been played to the participant, then the related segment's index 550 can indicate this as a predefined value, e.g. 0 if the serialization indices are positive.
- Time stamp 560 showing the time when the participant began speaking relative to the beginning of the related segment (shown by index 550) if any.
- the time stamp 560 can be encoded by the number of seconds or other time units from the start of the related segment.
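For concreteness, the segment fields discussed here (together with the sound data 510 and length 534 mentioned elsewhere in this text) could be gathered in a record along the following lines; the field types and defaults are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentRecord:
    sound: bytes                     # 510: the segment's audio data
    participant_id: str              # 520: contributing participant/location/client
    start_time: float                # 530: absolute time speaking began (e.g. Unix time)
    length: Optional[float]          # 534: duration; may still be unknown when serialized
    related_index: Optional[int]     # 550: index of the segment playing when speech began
                                     #      (a predefined value such as 0 if none was played)
    related_offset: Optional[float]  # 560: offset into the related segment, in seconds
    serial_index: Optional[int] = None  # assigned later by serializer 1030 (sequence 150)
```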
- the serializer may place the segments into sequence 150 using pertinent priorities.
- the priorities may include one or more of the parameters 640 shown in Figure 5 (the term "parameter” as used herein may indicate any collection of data).
- the pertinent priorities can be provided to serializer 1030 together with each segment 140 (e.g. they can be appended to the segment's structure of Figure 6 if they are not part of the structure).
- the priorities of Figure 5 are as follows:
- Priority parameter 640a indicates the priority and/or status of the participant and/or location (based on participant ID 520 shown in Figure 6).
- the participant's priority may be different for different participants; for example, a manager may be given a higher priority than lower level employees, or vice versa.
- a lecturer or other presenter may be given a higher priority than the audience (e.g. students).
- the lecturer's segments 140 are placed alternately in sequence 150 since it is natural for the lecturer to answer each question from students and, when he asks a question, respond to each answer from a student.
- the lecturer's segments 140 are placed alternately with the students' segments; the students' segments are ordered among themselves using other factors, e.g. other priorities described below.
- the alternate placement may be indicated by the lecturer's and/or students' priorities in data 640a.
- the parameter 640a may indicate status information and may be displayed as one of the features 610 ( Figure 3).
- Priority 640b is the segment's starting time (time stamp 530). In some embodiments, the earlier starting time gives higher priority.
- Priority 640c is the segment's end time (based on time stamp 530 and segment length 534).
- the earlier end time gives higher priority.
- the students' segments could be prioritized entirely by their end times 640c, and placed alternately with the lecturer's segments.
- the segments' length 534 may still be unknown when the segments arrive at serializer 1030 or even when the segments' playback begins at step 1040, so priority 640c may be undefined.
- In some embodiments, the unserialized segment with the earliest known end-time is serialized next (i.e. assigned an index); if no unserialized segment has an end-time, then the segment with the earliest start-time is serialized next.
- Priority 640d is the segment's length, i.e. duration (based on segment length 534). In some embodiments, shorter segments are given higher priority. This may be desirable to encourage brevity, and/or to give priority to questions since questions tend to be short.
- the segment length may be unknown at the time when the segment is serialized (at 1030) or even when the segment's playback begins.
- Priority 640e is the playback latency at the time 530, i.e. the same as 610c (Figure 3) measured at the time 530.
- the lower playback latency 640e gives higher priority to encourage the participants to listen to the other contributors before speaking. For example, in a lecture, the playback latency 640e can be used to define priorities for the students' segments but not the lecturer's segments.
- Priority 640f is the total length of the participant's contribution in proportion to the total discussion time, measured at the segment's starting time 530. (This is the same value as 610n.)
- Segment priority 640g could be some default priority (e.g. same as 640a) except when the segment priority is reduced due to violation of one or more limits 630 or possibly other limits.
- parameter 640g merely indicates whether or not the segment priority was reduced, and possibly by how much.
- Priority 640h refers to the related segment as indicated by index 550 ( Figure 6). Priority 640h could be the value of index 550, or could be the starting and end times of the related segment which can be obtained from the fields 530, 534 of the related segment (except that the related segment's length 534 may still be undefined as noted above). Parameter 640h can be used to place related segments close to each other.
- a segment is assigned a priority class (this can be done by serializer 1030 or at capture step 1010 or segmentation step 1020 or at some other step).
- the lecturer's segments can be one priority class, and the students' segments can be another priority class.
- the conference includes a panel discussion with the moderator, the panelists, and the audience; there could be a separate class for the moderator, another class for the panelists, and still another class for the audience.
- a segment's priority class can be determined from priority parameter 640a ( Figure 5) and/or Participant ID 520 ( Figure 6). First, each priority class is serialized separately.
- the moderator segments receive the highest priority, and are always moved to queue 150 first as long as the queue 150.1 is not empty. If queue 150.1 is empty, then the panelists' segments alternate with the audience's segments; in other words, one segment is taken from queue 150.2 and the next segment is taken from queue 150.3. If only one of queues 150.1, 150.2, 150.3 is not empty, then the segments are moved from that queue to queue 150.
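A possible realization of this merge policy (moderator queue 150.1 drained first, panelist queue 150.2 alternating with audience queue 150.3 otherwise) is sketched below; the function name is hypothetical and the sketch is simplified to operate on already-populated queues rather than a live stream of arriving segments.

```python
from collections import deque
from itertools import cycle
from typing import Deque, List

def merge_priority_classes(moderator: Deque, panelists: Deque, audience: Deque) -> List:
    """Merge per-class queues 150.1 (moderator), 150.2 (panelists), 150.3 (audience)
    into the common queue 150: moderator segments whenever available, otherwise
    panelist and audience segments alternating, a lone non-empty queue drained."""
    out: List = []
    turn = cycle(["panelists", "audience"])
    next_turn = next(turn)
    while moderator or panelists or audience:
        if moderator:
            out.append(moderator.popleft())  # highest-priority class goes first
            continue
        if next_turn == "panelists" and panelists:
            out.append(panelists.popleft())
        elif audience:
            out.append(audience.popleft())
        else:
            out.append(panelists.popleft())  # audience empty, fall back to panelists
        next_turn = next(turn)
    return out

# Example (segments represented by strings):
# merge_priority_classes(deque(["m1"]), deque(["p1", "p2"]), deque(["a1"]))
# -> ["m1", "p1", "a1", "p2"]
```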
- the priority class of the segment being captured can be provided to the speaking participant as feature 610j ( Figure 3). Also, feature 610k can provide the priority class that the segment will have if one or more limits 630 become exceeded (see e.g. the limits in Figure 4).
- Feature 610l provides, for each priority class, the estimated time until the current contribution would appear in the discussion if the current contribution were in that class. This parameter may be provided even if the participant is not currently speaking; this parameter will then refer to a contribution that the participant could start.
- the time estimate 610l could be estimated by the conferencing equipment as the sum of:
- the latter estimate (ii) can include information on audio captured at other locations 110 even if such audio has not yet been provided to the participant's location 110.
- the participant's location 110 can obtain such information by querying other locations 110 and/or the central computer if one is used.
- Time estimate 610l can also be provided if there are no priority classes (in other words, if all the segments are in the same class).
- The system portion performing segmentation will be referred to as "segmenter 1020". This portion may overlap with serializer 1030 and/or other parts of the system.
- the conferencing system can be configured as to how segments are defined, and the segment definitions may be different for different participants.
- a segment end can be defined by a minimal length pause ("MLP") and/or by a maximum segment length ("MSL") 630c ( Figure 4), and these two parameters can be different for different participants and/or different discussions.
- the conferencing system may store discussion configuration data (e.g. 268 in Figure 9) which define these parameters and other configuration parameters selected for a particular discussion (i.e. particular conference).
- discussion configuration data may include the limits 630, and may define what to do if a limit is exceeded (e.g. if a participant attempts to speak beyond his current limit or limits 630). The following three kinds of limits are of particular interest in the segmentation examples below:
- the configuration data may specify any one of the following possibilities for the audio generated when a limit is exceeded:
- segment priority 640g may be reduced for the segment.
- each segment 140 is defined primarily by pauses in the audio. But if a participant does not provide adequate pauses, the maximum segment length (MSL) 630c can be used. A participant's successive segments can be interrupted by other participants' contributions, so breaking a segment at the MSL may degrade audio clarity (for example, if the segment is broken in the middle of a word, and the two successive segments containing this word are played back at different times, with other segments intervening). The participant can use indicators 610h and 610i (Figure 3) to ensure segment termination at pauses. User interface 610 also informs the participant of delays of his contributions being heard (feature 610l).
- Segmentation example 1 performed by segmenter 1020 for one location:
- [00128] 1. Determine the start of a new segment (step 810). This may be the start of the first audio received from the location, or may be based on the end of the previous segment as described below.
- At step 820, scan the audio of the new segment for a minimum length pause MLP1 (i.e. a pause of a length at least MLP1) or the maximum segment length 630c (MSL), whichever occurs first. Both MLP1 and MSL may differ from location to location. MLP1 and limits 630 can be defined by configuration data as explained above.
- If the minimum length pause MLP1 is found first, terminate the current segment at the pause; then return to step 820 to find the end of the new segment, or to step 810 if the new segment has not begun (i.e. no audio has been received after MLP1).
- If the maximum segment length MSL is detected before the minimum length pause MLP1 (step 840), conduct a backward search of the audio from the MSL point to the segment start to find a shorter pause, of another minimum length MLP2 (i.e. a pause of at least MLP2, where MLP2 is possibly defined by configuration data).
- MLP2 is smaller than MLP1; for example, MLP1 can be 5 seconds and MLP2 can be 1 second. Other values are also possible.
- If a shorter pause MLP2 (i.e. of a length at least MLP2) is found, terminate the current segment at the start of, or during, that pause (the latest such pause if there are multiple shorter pauses), and place the subsequent audio into a new segment. The new segment will start at the end of the previous segment (i.e. sometime during the shorter pause).
- If no such pause is found (step 860), terminate the current segment at the maximum length MSL, and start a new segment some time before the previous segment's end (i.e. before the MSL point) so that the new segment will overlap with the previous segment. For example, if the audio is received in network packets according to some protocol, then the new segment can be started at the start of the packet containing the MSL point.
- the overlap between the new and previous segments can be determined as starting at a fixed time (e.g. 0.5 seconds) before the MSL point. How the overlap is defined can be specified by the configuration data. The overlap will cause duplication of some of the audio in the playback so as to help the listener to understand the audio.
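The segmentation example above can be summarized by the following sketch, which operates on a per-frame voiced/silent classification; the specific MSL and overlap values, the frame representation, and the helper name are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

def segment_boundaries(voiced: List[bool], frame_sec: float,
                       mlp1_sec: float = 5.0,    # minimum length pause MLP1
                       mlp2_sec: float = 1.0,    # shorter pause MLP2 for the backward search
                       msl_sec: float = 60.0,    # maximum segment length 630c (assumed value)
                       overlap_sec: float = 0.5  # assumed overlap when cutting at the MSL point
                       ) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs (end exclusive) for one location's audio,
    given a per-frame voiced/silent classification."""
    mlp1, mlp2 = int(mlp1_sec / frame_sec), int(mlp2_sec / frame_sec)
    msl, overlap = int(msl_sec / frame_sec), int(overlap_sec / frame_sec)
    segments, i, n = [], 0, len(voiced)
    while i < n:
        while i < n and not voiced[i]:   # step 810: segment starts at the next voiced frame
            i += 1
        if i >= n:
            break
        start, silent_run, end, next_start = i, 0, None, None
        while i < n:                      # step 820: scan for MLP1 or MSL
            silent_run = silent_run + 1 if not voiced[i] else 0
            i += 1
            if silent_run >= mlp1:        # pause MLP1 found: normal segment end
                end, next_start = i - silent_run, i
                break
            if i - start >= msl:          # MSL reached first (step 840): backward search
                cut = _latest_pause(voiced, start, i, mlp2)
                if cut is not None:       # cut inside the latest MLP2 pause
                    end, next_start = cut, cut
                else:                     # no pause at all (step 860): cut at MSL with overlap
                    end, next_start = i, max(start, i - overlap)
                break
        if end is None:                   # audio stream ended inside the segment
            end, next_start = n, n
        segments.append((start, end))
        i = next_start
    return segments

def _latest_pause(voiced: List[bool], start: int, stop: int, min_len: int) -> Optional[int]:
    """Backward search: a frame index inside the latest run of >= min_len silent frames."""
    run = 0
    for j in range(stop - 1, start - 1, -1):
        run = run + 1 if not voiced[j] else 0
        if run >= min_len:
            return j
    return None
```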
- When the segmenter 1020 determines a segment's start or end, the start and end information (as defined by data 530 or 534 in Figure 6 for example) is passed to serializer 1030.
- the segmenter 1020 determines pertinent priority information 640 (and/or other priority information) for each segment, and passes such information to serializer 1030.
- Feedback can be provided to the participant using the features 610 described above.
- the participant can be provided with indicator 610i when his audio contribution might be cut to form a new segment.
- the indicator 610i could be any of these:
- the serializer 1030 may encourage participants to pause speaking through a policy giving higher priority to the soonest-ending segments (as specified by priority 640c in Figure 5).
- a type of feedback that can encourage a participant to pause is the number of other participants with segments not yet completed because each participant would want to end his segment before the other participants stop speaking. That information can be displayed by user interface 610 (as feature 610m for example). Alternatively, feature 610m can be used to encourage the participants to pause.
- the output of the serialized discussion sequence can include pauses of a desired length between the segments 140. Pauses between the segments provide a participant who wishes to make a contribution with an obvious and convenient moment to do so. This is especially useful in embodiments that do not allow a participant to start a new segment during a playback of another segment. Such embodiments are useful where the participant's computing device is not powerful enough for the speech processing needed to separate the segment being output from the new speech.
- Video can accompany the audio segments 140, and can be played in the same sequence as the respective segments 140 for participants with video display capabilities.
- the video segments can be produced by a speaking participant if the participant has a means to produce video like a web-cam.
- if a presentation includes visual media, such as Microsoft PowerPoint slides, the presenter can provide the video, and the video can be stored with the serialized discussion sequence 150.
- when an audience member contributes a segment, the segment can be associated with a point in time of the video provided by the presenter. Then, when the audience's segment is played, the segment's audio can be accompanied by the associated video from the presenter for providing the context for the question. If the video originally comes from the presenter's computer, the presenter has the option to take control back and switch the video in real-time, but that only affects what will be seen by participants who have not yet played back the segment.
- segmentation 1020 and serialization 1030 are performed by a central computing system 260 ( Figure 9).
- Audio capture 1010 and playback 1040 are performed by the conferencing equipment at each location 110.
- Such conferencing equipment is shown as "discussion clients" 210 in Figure 9.
- Figure 1 shows the segments 150 are streamed to locations 110. The streaming to each location 110 can occur independently from the other locations.
- FIG. 9 shows examples of conferencing equipment 210 at locations 110A, 110B, 110C, 110D.
- Each of the four locations has a discussion client 210 which captures participants' contributions at that location and plays segments 150.
- Each discussion client 210 includes computer storage for storing the serialized segments 150.
- Each discussion client 210 also includes a network interface which allows the discussion client to access a telecommunications network (e.g. the Internet).
- Each location 110 in the figure is set up differently:
- Location 110A has only audio devices, namely an audio speaker 130 and a microphone 120. Location 110A cannot generate or view video. (Microphone 120 and speaker 130 are shown as separate from the discussion clients, but the term “discussion client” may include the microphone and the speaker.)
- Location 110B has a speaker device 130 and a microphone 120, and in addition has video devices including a web cam 220 and a display screen 240 for video. (The discussion client may include the web cam and the display screen.)
- Location 110C is a mobile phone.
- the phone can be used for audio as the phone includes a microphone and a speaker device (not shown), but in some embodiments the phone could also be used for video since the phone includes a screen and may or may not include a camera.
- the phone includes a computer (not shown) including a computer processor and memory and other associated circuitry (e.g. a network interface card).
- Location 110D is a laptop computer which may be able to participate fully in the discussion, i.e. to provide both audio and video capture and display.
- the discussion client is the laptop computer's processor and memory and other associated circuitry (e.g. network interface card, etc.).
- Each of these discussion clients 210 at locations 110A-110D communicates with central computing system 260 that stores the serialized discussion segments 150 and performs serialization 1030 to assign to each segment an index, e.g. a whole number greater than zero, which is used to identify the segment.
- the central computing system 260 can be a computer on the Internet or can be a network of computers. Segmentation 1020 is also performed by central computing system 260.
- the central computing system 260 also stores, in non-transitory computer storage (e.g. disks, tapes, semiconductor memory, etc.) configuration data 268 and the status 272 of the discussion in progress, and provides such information to clients 210.
- Configuration data 268 includes limits 630 and possibly other data as described above (e.g. lengths of pauses for segmentation 1020).
- Discussion status 272 includes information that can be used by a discussion client to obtain the UI features 610.
- Figure 10 shows a block diagram of an exemplary central computing system 260.
- Newly captured audio is directed from discussion clients 210 to Speech processing unit 310 ( Figure 10) within central computing system 260.
- Discussion clients 210 may also periodically time-stamp different points in the audio to allow the speech processing unit 310 to later determine the segments' start 530 and length 534 ( Figure 6) when segmentation is performed.
- a discussion client 210 may include no time stamp, and speech processing unit 310 assumes that the audio is generated at the time it is received by the speech processing unit.
- Different discussion clients can operate differently from each other in the same discussion.
- the discussion client 210 also sends the related segment's index "RSI" (for defining the field 550) and the time stamp TS within the related segment (for field 560).
- In some embodiments, speech processing unit 310 cleans up the sound (the audio) upon receipt.
- the speech processing unit might remove noise, but an important part of the cleanup is to remove sound from the related segment.
- the related segment's sound might be picked up by the microphone 120, and should preferably be removed to make the contributor's presentation clearer. If the contributor uses a headphone for microphone 120 or starts in a pause, the unwanted sound might be minor, but otherwise, it can be significant.
- This processing is similar to echo cancellation and prior art might be used to implement it. Spectral subtraction, also prior art, might also be used.
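As a very rough illustration of the spectral-subtraction idea mentioned above, the sketch below subtracts the magnitude spectrum of the played-back related segment from the microphone signal frame by frame. It is not the disclosed implementation: a real system would first align the reference in time and model the acoustic path (as echo cancellers do), and would typically use overlapped, windowed frames.

```python
import numpy as np

def spectral_subtract(mic: np.ndarray, reference: np.ndarray,
                      frame: int = 512, alpha: float = 1.0,
                      floor: float = 0.05) -> np.ndarray:
    """Subtract the magnitude spectrum of the played-back reference from the
    microphone signal, frame by frame (rectangular, non-overlapping frames
    for brevity)."""
    n = (min(len(mic), len(reference)) // frame) * frame
    out = np.zeros(n)
    for start in range(0, n, frame):
        m = np.asarray(mic[start:start + frame], dtype=float)
        r = np.asarray(reference[start:start + frame], dtype=float)
        M, R = np.fft.rfft(m), np.fft.rfft(r)
        mag = np.abs(M) - alpha * np.abs(R)        # subtract the reference magnitude
        mag = np.maximum(mag, floor * np.abs(M))   # spectral floor to limit artifacts
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(M)), frame)
    return out
```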
- speech processing unit 310 may use the related segment index RSI and the time stamp TS received from the discussion client 210 with the new audio.
- speech processing unit 310 supplies RSI and TS to Serialized Segment Server 320 (described below) which stores segments 140.
- Serialized Segment Server 320 returns the pertinent portion of the related segment's sound 510.
- only a short portion of the new audio is processed for removal of the related segment's sound because the discussion client 210 pauses playback of the related segment as soon as the participant begins speaking.
- At least some sound cleanup is performed by the discussion client 210, and additional cleanup may or may not be performed by the speech processing unit 310.
- some discussion clients 210 clean up the sound while others do not, and speech processing unit 310 may perform cleanup for some but not other discussion clients, and different types of cleanup may be performed for different discussion clients.
- Both the speech processing unit 310 and the discussion clients 210 have access to the related segment data 510.
- the central speech processing unit 310 might perform these functions because it may have more powerful processing capabilities, but the client system 210 might perform these functions because it can optimize the processing for the acoustic environment and because it does not need to do processing for other participants.
- speech processing unit 310 performs segmentation 1020 and provides new segments 140 and each segment's pertinent priorities 640 to segment serializer 330.
- segmentation is performed by some Discussion Clients 210 but not all the Discussion Clients; speech processing unit 310 performs segmentation for those clients 210 which do not perform segmentation.
- a client 210 may remove long pauses from the audio data, but the remaining segmentation work is performed by speech processing unit 310.
- audio may arrive simultaneously from different Discussion Clients 210, and can be processed simultaneously.
- Segment serializer 330 performs serialization 1030, placing each segment 140 into queue 150 and assigning a segment index to the segment.
- the segment index is a number that is greater for segments later in the sequence 150.
- Any suitable serialization algorithm can be used including those described above. Another possible algorithm is to simply assign increasing indexes to segments 140 in the order that new segments initially arrive at the serializer 330. Of note, a new segment 140 may arrive over time as it is being created and the serializer 330 can wait until it has received all of the new segment before assigning an index to the new segment, especially since some possible rules for assigning the index use the segment length 534 ( Figure 6) or the segment's end time (530 plus 534).
- the serialization algorithm is defined by serialization rules 342 which are part of configuration data 268 stored in computer storage in discussion configuration unit 340.
- Discussion configuration unit 340 provides the serialization rules to serializer 330.
- the serializer 330 can work as follows:
- Serializer 330 checks the serialization rules 342 for privileges of the contributing participant.
- the serialization rules may indicate that the new segment's contributor (identified by ID 520 and/or priority 640a) might have the privilege that his segments 140 appear alternately in the sequence (for example, if the contributor is a lecturer). If so, and if the last serialized segment 140 was contributed by another participant, then the segment 140 is immediately assigned the next index (possibly even before the segment's end is determined).
- the new segment 140 is placed in a group of unassigned segments to which other rules 342 are applied. If enough Discussion Clients 210 are waiting for segments because the Discussion Clients have already played all serialized segments, then the serializer 330 can pick an incomplete new segment 140 and assign to it the next index so the segment can become available to the waiting clients 210. Serializer 330 can apply a rule such as picking a segment 140 based on its earliest time stamp 530. Serializer 330 can also consider the priority 640a of the contributor. The serializer can obtain current information about waiting clients from the Serialized Segment Server 320 described below.
- Immediately upon assigning an index to a new segment, the serializer 330 starts transmitting the segment to the Serialized Segment Server 320. (The serializer 330 transmits the whole segment or, possibly, stores the segment and transmits its address to server 320.)
- the Serialized Segment Server 320 does the following:
- When possible, Serialized Segment Server 320 streams the segment 150 to the Discussion Clients 210 directly from the serializer 330 (server 320 stores the segment in storage 264 at the same time). Otherwise, Serialized Segment Server 320 streams the segment 150 to the client 210 from the storage 264 where the segment 150 has been previously stored by the segment server.
- Status 272 can include:
- the segment serializer 330 can stream new segments 140 into the Serialized Segment Server 320 to begin storing each new segment 140 before the segment is assigned an index.
- the segment serializer 330 can later send the segment's serialization index to the Serialized Segment Server 320.
- Before being assigned an index, a new segment 140 can be assigned a temporary index that the serializer 330 and Serialized Segment Server 320 can use to reference the segment, or the segment can be referenced by its position (e.g. starting address) in storage 264 or possibly by its participant ID 520 (Figure 6), or in some other way.
- Discussion configuration unit 340 stores configuration data 268 that other parts of the system, such as the Segment Serializer 330, the Discussion Clients 210, and unit 310 doing the segmentation, can access.
- configuration data 268 may include the limits 630 for each participant, and may also include other information, for example privilege information on participants (e.g. same as 640a and maybe other information).
- the privilege information may include:
- Configuration data 268 may include the length of pauses to be inserted by the Discussion Client 210 between the playing of consecutive segments 150, or the lengths of pauses used in segmentation as described above.
- the term "participant" for these purposes can mean all of persons sharing one Discussion Client 210 since the system does not distinguish those persons from each other.
- Figure 11 is a block diagram of a Discussion Client 210.
- the major functions of a typical Discussion Client are capture 1010 and playback 1040.
- the Discussion Client of Figure 11 has a video buffer 410 and an audio buffer 420 (implemented in main memory or some other computer storage) to capture and store the raw data from any video capture device such as a web cam 220 and audio capture device 120 such as a microphone so that the data is not lost before it can be processed.
- User interface 430 may include the user interface features 610 and 620 ( Figure 3), and may display discussion status 272 described above and may accept commands from the participant in one form or another such as a voice command or touch of a button. (User interface 430 may be combined with screen 240 ( Figure 9) to display both status 272 and video segments, and/or user interface 430 can be combined with user interfaces of other devices, such as 120 or 220.)
- VAD (Voice Activity Detection) unit 440 detects the start of new speech. The detection is performed based on the signal from audio capture device 120. When VAD 440 detects new speech, VAD 440 alerts New Segment Control unit 450 to capture the new audio. VAD unit 440 can use algorithms from prior art, such as counting zero crossings, to detect the start of new speech. The detection can err toward inferring start of speech when there is none because another unit, the speech processing unit 460, can compensate. In this approach, VAD 440 can make a quick judgment and the more complex analysis is only performed when VAD 440 detects start of speech.
- VAD 440 is likely to produce false positives (false signals indicating start of speech when there is none), but speech processing unit 460 can subtract the feedback (the played sound) as described below, and can re-run the VAD algorithm.
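A crude zero-crossing-based voice activity check of the kind referred to above might look like the following sketch; the thresholds and the combination with short-time energy are illustrative assumptions, and the bias toward false positives is acceptable here because the decision is re-checked by speech processing unit 460.

```python
import numpy as np

def simple_vad(frame: np.ndarray,
               energy_threshold: float = 1e-4,
               zcr_threshold: float = 0.25) -> bool:
    """Decide whether one frame of float samples (roughly in [-1, 1]) contains speech,
    using short-time energy plus the zero-crossing rate."""
    frame = np.asarray(frame, dtype=float)
    if frame.size < 2:
        return False
    energy = float(np.mean(frame ** 2))
    signs = np.sign(frame)
    signs[signs == 0] = 1.0
    zcr = float(np.mean(signs[1:] != signs[:-1]))  # fraction of adjacent sign changes
    # loud enough, and not dominated by the rapid sign changes typical of broadband noise
    return energy >= energy_threshold and zcr <= zcr_threshold
```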
- Speech processing unit 460 is very similar to the speech processing unit 310 in the Central computing system 260. As stated before, only one or both of the two speech processing units 460, 310 may perform sound cleanup. However, regardless of the functionality of unit 310, the unit 460 in the Discussion Client 210 may also perform the following tasks:
- unit 460 may use information received from the New Segment Control Unit 450 described below.
- Unit 460 transmits the video stream from capture device 220 as well as the audio stream to central computer 260. If unit 460 is directed by the New Segment Control Unit 450 not to transmit the audio as described below (due to limit violations for example), then unit 460 may also block transmission of the associated video.
- If speech processing unit 460 performs reduction of noise and of sound from a related segment, then speech processing 460 can have these special features:
- Speech processing unit 460 can augment the VAD algorithm because speech processing unit 460 uses information about the sound from a simultaneously played segment and about how the input audio is affected by the simultaneously played segment. After removal of noise and sound of the simultaneously played segment from the audio input, speech processing 460 can test more accurately for the start of new speech. If unit 460 determines that new speech has not occurred (VAD was triggered by noise or playback), unit 460 signals the New Segment Control 450 that there is no new audio to transmit, and speech processing 460 does not transmit the audio.
- New Segment Control unit 450 directs transmission of newly captured audio to the central computer 260 as follows:
- a. User interface 430, which might provide a participant's command to indicate creation of a new segment (via voice command, button touch, or other human interface). To the participant, this is a "record" command.
- [00193] b. VAD unit 440.
- New Segment Control 450 applies rules based on (i) participant status 640a received from the Segment Player 470, (ii) discussion status 272 and discussion configuration 268 from the Central computing system 260, to determine whether the new audio should be allowed (accepted). Possible rules are described above in connection with limits 630a, 630b, 630d ( Figure 4). For example, New Segment Control 450 can use the rules to compute the parameter 610e ( Figure 3) indicating how much additional time this participant can contribute to new segments based on current discussion status 272. New Segment Control 450 might use this information to enforce the rules. New Segment Control 450 can additionally transmit the information to user interface 430 for display.
- If New Segment Control 450 determines that the new audio should be accepted, New Segment Control 450 signals the Segment Player 470 to pause any currently playing segments and signals the Speech processing unit 460 to transmit the new audio. At the same time New Segment Control 450 sends information to the Speech processing 460 to be used in segmentation and/or serialization. Such information may include:
- New Segment Control 450 accepts any signal from the Speech processing unit 460 to stop the new audio and responds by signaling the Segment Player 470 to resume any playback.
- Audio segment buffer 482 and video segment buffer 484 monitor the Central computing system 260 for serialized segments 150 and the associated video segments and buffer the audio and video segments as the segments become available.
- a video segment contains the video data for the corresponding audio segment 150.
- Audio buffer 482 includes, for each segment 150 it stores, all segment information as described above including the serialized segment index.
- the speech processing unit 460 can access audio segment buffer 482 for a related segment 150 played during capture of new audio and can remove from the new audio the sound played during the new audio capture.
- Segment Player 470 performs the following functions:
- [00204] a. Provides the index of the currently playing segment 150 to the New Segment Control Unit 450.
- the index can be used as the related segment index (note field 550 in Figure 6).
- b. Provides information used by New Segment Control unit 450 in determining compliance with limit 630b.
- New Segment Control unit 450 computes this estimate from other participants' status data 272 (obtained from Segment Player 470) and configuration data 268. Such data may be provided by the Serialized Segment Server 320 and passed to unit 450 via the Segment Player 470.
- the total time of not yet serialized segments 140 of a given priority can be obtained by the Segment Player 470 from the Serialized Segment Server 320.
- Other information for display can be obtained in one of these ways:
- from the Serialized Segment Server 320, which has most of the information except the playback latency.
- the Serialized Segment Server 320 can provide:
- If the Segment Serializer 330 also passes segment priorities 640g to Serialized Segment Server 320, then the priority 640g for any segment 140 the participant is currently creating can also be provided.
- Information from the Serialized Segment Server 320 and the Segment Player 470 is passed from the Segment Player 470 to the New Segment Control Unit 450 as the participant status (included in 640a). Therefore, the New Segment Control Unit 450 can provide to user interface 430 all of the above information for display or alerts.
- the information 610i to determine when the participant's contribution might be cut can be computed using discussion status 272 that the Segment Player 470 uses to create participant status 640a.
- If the Serialized Segment Server 320 stores not only serialized segments but also segments not yet serialized, it can provide the needed information to the Segment Player 470 - the starting time 530 and duration 534 of the participant's most recent segment 140 (duration 534 can be provided when it becomes defined in segmentation 1020). If that segment does not include the current time, this is assumed to mean that the participant is not speaking and thus is not close to exceeding the limit 630c.
- the Segment Player 470 can pass the start time 530 to the New Segment Control Unit 450 which can compute the current length of the new segment being created and how much remains before the limit 630c is reached.
- a cut-off might also be enforced due to the participant falling behind in playback (limit 630b), and the new segment control unit 450 can also estimate how soon that condition might occur from the participant status 640a and can cause the user interface 430 to alert the participant, for example by showing yellow, then red, when the user is close to falling too far behind in playback.
- In some embodiments, there are one or more Discussion Clients 210 that are not trusted to apply limits 630 or other limitations, or to honestly provide information on playback latency (so as to affect the limit 630b) or other information.
- In such embodiments, the central computing system 260 would enforce the limitations, e.g. by authenticating the Discussion Clients or the conferencing software used by the Discussion Clients.
- the central computing system 260 could never know for sure what segments have been played.
- all Discussion Clients would have some credential that the server could authenticate.
- Some embodiments provide a teleconferencing method comprising performing, by a teleconferencing system, operations including, during a teleconference conducted by participants located at two or more locations interconnected by a telecommunications network, obtaining segments of audio data representing audio signals generated by the participants.
- the segments can be segments 140 obtained by serializer 1030.
- Each segment contains audio data from a respective one of the locations.
- the audio data in each segment are associated with a time at which the audio signals are assumed to have been generated.
- the "time" can be starting time 530, or the end time, or some other time associated with the segment's audio (e.g. the time of the middle of the audio).
- the time may also be a time interval, i.e. may include more than one time points.
- the time can be absolute or relative for example.
- the method further comprises serializing the segments received from two or more of the locations. Serialization can be performed on segments from all the locations or just some of the locations. For example, if the serialization is performed at each location, the serialization may omit the segments generated at that location.
- Serialization establishes an order of the audio data (e.g. the order in sequence 150).
- the order is established based on one or more predefined rules for establishing the order, the serializing being performed even if audio signals of two or more of the segments overlap, the serializing thus allowing the audio signals overlapping in time to be reproduced (played) from the audio data separately rather than mixed;
- the method further comprises processing the segments taking the order into account.
- processing may include playing all or some of the segments, or sending all or some of the segments to one or more locations (e.g. by a central computer).
- the teleconference is conducted by three or more participants.
- the order takes into account one or more of:
- processing the segments comprises sending the segments' audio data and information on the segments' order over the telecommunications network to at least two of the locations.
- Some embodiments provide a teleconferencing method comprising executing teleconferencing operations by a teleconferencing system located at a first location which is one of two or more locations interconnected by a telecommunications network. The teleconferencing operations are executed during a teleconference conducted by participants located at the two or more locations, the teleconferencing operations comprising:
- the teleconferencing system is operable to play the audio data
- Some embodiments provide a teleconferencing method in which the audio data represent the audio signals from each location separately, without mixing the audio signals from different locations, regardless of whether or not the audio signals or audio data are generated simultaneously at two or more locations.
- audio data is typically digital data.
- the audio data represent the audio captured at the participant's location, even though this audio includes the related audio originally captured at a different location and now re-captured at the participant's location.
- the sound represented by audio data from different locations is not mixed in the audio data representation.
Abstract
In a teleconference, audio data represent audio signals (air oscillations) from different participants without mixing the audio signals from different locations even if participants speak simultaneously; each participant's audio is not obscured by other participants. All participants' audio data are queued in a common queue (150) based on the time the audio was generated, and/or on the participants' priorities, and/or other information. The audio is played at each location in the queue's order. Other features are also provided.
Description
TELECONFERENCING FOR PARTICIPANTS AT DIFFERENT LOCATIONS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority and benefit of U.S. provisional application no. 61/721,032 filed November 1, 2012, incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] The present invention relates to telecommunications networks, and more particularly to teleconferencing.
[0003] Teleconferencing allows people to communicate remotely over a
telecommunications network as if they were talking face to face. Teleconferencing equipment receives the audio signals from each person (each teleconference participant), mixes the audio, and sends the mixed audio to each participant. Additionally, if video teleconferencing is available, then a participant may receive images (e.g. photographs or computer screen images) from one or more other participants. The images can be displayed on a computer monitor.
[0004] Improved teleconferencing facilities are desirable.
SUMMARY
[0005] This section summarizes some features of the invention. Other features may be described in the subsequent sections. The invention is defined by the appended claims, which are incorporated into this section by reference.
[0006] Some teleconferencing embodiments of the present invention step away from imitating face-to-face interaction between participants, and such embodiments enhance a teleconference with features not available in face-to-face interaction. In particular, in some embodiments, the participants' audio is not mixed and hence not obscured by other participants. Thus, some embodiments allow people at different locations to have a discussion (i.e. a teleconference) using a telecommunications network (possibly voice and/or data network) in such a way that the following benefits and conveniences are provided:
[0007] 1. No person's spoken contribution is missed by any other person. To be heard in the discussion, a speaker makes very little effort. In some embodiments, the speaker may start to speak as soon as the words come to mind. In some embodiments, the speaker can start speaking at any pause in the discussion and his contribution will be heard regardless of what other participants do - other participants may start speaking at the same time and/or be distracted without missing any participant's contribution.
[0008] 2. Even if two or more participants speak at the same time, they are not heard as
speaking at the same time; their contributions to the discussion can be heard one at a time, in sequence.
[0009] 3. Participation schedules need not be precisely coordinated: a person can join the teleconference late yet still participate and hear all of the discussion. This can be achieved, for example, by recording each participant's contribution for later reproduction by any participant including those who join late.
[0010] 4. Interruptions, such as a call on another phone or other distraction, also do not cause any of the discussion to be missed by any person. The person being distracted can hear the other participants' contributions later during the teleconference if the contributions are recorded. The other participants thus do not have to wait for the distracted person; they can continue the discussion, or they can listen to earlier recorded contributions if desired. The distracted person can hear all the communications other than his own in the same order as everyone else.
[0011] 5. A speaker can pause briefly without being interrupted by another speaker who starts speaking at the pause - both speakers can speak at the same time.
[0012] 6. A moderator is not needed. A moderator can help establish priorities but is not needed to achieve the previously mentioned benefits.
[0013] 7. Some embodiments do not need a moderator to prioritize speakers.
[0014] 8. If one person is doing a presentation including video, questions can be asked at any time but can be heard by other participants later; any visual context shown by the presenter at the instant of the question can be automatically provided to each participant when the question is played to the participant.
[0015] 9. Muting is automatically applied to reduce noise from locations where no one is speaking to make a contribution.
[0016] There are many situations in which some embodiments are useful. Some of these situations are as follows:
[0017] 1. A meeting among several people of an organization when those people are in different locations. Some of them might be traveling and some working in home offices. An example of a kind of meeting that might benefit significantly is a brainstorming session because each person can contribute at the time his idea occurs. Long pauses in the discussion are less burdensome because if multiple participants speak at the same time, then each participant can listen to other participants' audio during a pause.
[0018] 2. A lecture given over the internet. Students will ask questions. A student can ask a question at any time and the question will be placed in a queue. The lecturer might ask questions to be answered by the students, and the students' answers can be placed in a queue.
[0019] 3. Other kinds of presentations to an audience when questions are asked by the presenter or the audience.
[0020] 4. A chat among sport fans about a game in progress. In cases like this, the system might not provide any video; the users (chat participants) are presumed to each use their own favorite means of observing the game - TV or internet.
[0021] 5. Persons reporting about and coordinating response to an emergency such as one caused by an earthquake or hurricane.
[0022] 6. Any other discussion in which the participants are at different locations and participate at approximately, but not necessarily exactly, the same time. For example, a person may join the discussion near the end and still hear the whole discussion and participate in the discussion (by adding comments for example). If the person does not participate, he can still hear the whole recording.
[0023] In this document, an individual participant is sometimes referred to as "he" meaning "he or she".
[0024] The invention is not limited to the features and advantages described above except as defined by the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 illustrates audio data flow in some embodiments of the present invention.
[0026] Figure 2 illustrates operations performed by teleconferencing equipment according to some embodiments of the present invention.
[0027] Figure 3 illustrates user interface according to some embodiments of the present invention.
[0028] Figure 4 illustrates limits imposed in a teleconference according to some embodiments of the present invention.
[0029] Figure 5 illustrates priorities used in teleconferencing according to some embodiments of the present invention.
[0030] Figure 6 illustrates an audio segment according to some embodiments of the present invention.
[0031] Figure 7 illustrates data flow in serialization according to some embodiments of the present invention.
[0032] Figure 8 is a flowchart of segmentation according to some embodiments of the present invention.
[0033] Figure 9 is a block diagram of a teleconferencing system according to some
embodiments of the present invention.
[0034] Figure 10 is a block diagram of a central computing system used in
teleconferencing according to some embodiments of the present invention.
[0035] Figure 11 is a block diagram of a teleconferencing participant's system according to some embodiments of the present invention.
DESCRIPTION OF SOME EMBODIMENTS
[0036] The embodiments described in this section illustrate but do not limit the invention. The invention is defined by the appended claims.
[0037] Figure 1 illustrates a conference (i.e. teleconference) of four participants at respective four locations 110A, 110B, 110C, 110D for some embodiments of the present invention. In this schematic illustration, each location has a microphone 120 and a speaker device 130. The microphone converts the audio signals (i.e. air oscillation signals) into electrical signals, and the speaker device performs the opposite transformation, as known in the art. The number of participants and other details are not limiting. Figure 2 illustrates exemplary operations. These operations can be performed by any suitable systems, which may include one or more computer processors executing computer instructions (which can be called software) that are stored on computer readable media (e.g. disks, tapes, semiconductor memories, and other types of storage). Alternatively or in addition, parts or all of one or more operations can be executed by non-software-programmable systems, e.g. systems that include non-software circuits and computer storage. The operations are:
[0038] 1. Capture each person's spoken contribution when it occurs (step 1010 in Figure 2), and convert the spoken contribution to data ("audio data", which are physical signals, e.g. electrical). This process can be performed by microphone 120 and possibly other equipment as known in the art. This description often refers to "spoken" contribution, but other types of audio can also be present (e.g. music, touch-tone digital codes, etc.). Audio data can be digital data.
[0039] 2. Break up the audio data from each participant (each location 110) into segments (step 1020). The segments are shown in Figure 1 as 140A, 140B, ... for the audio data from respective locations 110A, 110B, .... A segment 140 (i.e. 140A or 140B etc.) can be defined by audio data that records one person's continuously spoken contribution. A location 110 may generate multiple segments or no segments, as any participant might contribute more than one segment at any given time in the discussion, but a segment 140 contains an uninterrupted contribution from one person. However, in some cases, where several participants share a single conference room at each location 110 (possibly with a single microphone 120), one
segment 140 might contain contributions from more than one of the participants in the conference room. The start and end of each segment 140 are determined as shown at step 1020 of Figure 2 (steps 1010 and 1020 can be combined). For example, the end of a segment can be automatically determined by a minimum length pause.
[0040] 3. At step 1030 (Figure 2), the segments from all participants are placed in a single sequence (e.g. sequence 150 in Figure 1). This process is called serialization herein; the segments are serialized. The sequence could be created on a single storage device or system (e.g. 264 in Figure 9) by a central computing device such as a computer server (e.g. 260 in Figure 9). Below, reference number 150 is used to refer both to the sequence of serialized segments and to an individual segment 140 in the sequence.
[0041] 4. At step 1040, the serialized segments 140 are played for each participant (e.g. by respective speaker devices 130) in the order of sequence 150 when each participant is ready to hear it. Sequence 150 is thus organized as a queue (first-in-first-out). Every participant hears the same sequence 150, but not necessarily at the same time. An exception is that a participant might or might not hear the segments 140 created from his own contributions. Also, in some embodiments, a participant may rewind to an earlier point in the conference, or issue a command to skip a segment, or issue some other command to play back the segments in a different order. In some embodiments, when a participant is speaking to create a new segment 140 (step 1010), the audio output (playback) is paused for him at his speaker device 130 until he finishes speaking, when the output is resumed for him.
[0042] The steps of Figure 2 do not have to be performed in the order shown. For example, step 1030 (defining a segment's place in sequence 150) can be performed before the segment's end is determined at step 1020, or even before the entire segment's audio is captured at step 1010; a segment can be played while its audio is still being generated by a participant. The playback step 1040 can also begin before the segment's audio is all captured.
[0043] Also, while the example above mentions step 1030 as performed by a central computer system, some embodiments have no central computer system at all: all audio is transmitted to all locations 110 other than the location at which the audio was captured, and the serialization 1030 is performed at each location 110. Segmentation 1020 can be performed at the playback locations 110, or at the location at which the audio is captured (at 1010), or by a central computer or computers. In some embodiments, one or both of steps 1020 and 1030 are distributed, i.e. are performed in concert at locations 110 or at one or more locations 110 and a central computer or computers.
[0044] In some embodiments, the playback is performed in the same segment order for all
the participants (except perhaps that each participant is not played the participant's own contribution). Therefore, it may be simpler to perform serialization by a central computer system. On the other hand, it may be more efficient for at least some of the segmentation work to be performed at locations 110. Such segmentation work may include, for example, elimination of long pauses.
[0045] A user interface at each location 110 can provide the participant (or participants) at that location with information 610 (Figure 3) about the discussion. The user interface may include video devices such as light indicators, computer monitors, etc., and may include audio devices (e.g. audio alarms). The information 610 can be defined by data obtained by conferencing equipment at location 110. Information 610 may include any combination of:
[0046] - The total number 610a of current participants (or of locations 110).
[0047] - Identity 610b of the participant who contributed the segment 140 currently being played at the location 110.
[0048] - Playback latency 610c. This is the time required to hear all of the remaining discussion not yet heard (not yet played) at location 110 but already recorded by other participants. The remaining discussion may include just the serialized segments 150, or may possibly include all or some of the audio not yet serialized but already captured (at step 1010).
[0049] - Information 610d shows, to a speaking participant, whether his speaking has been recognized and is being stored for a segment 140. The participant's speaking may be rejected if it violates one or more limits discussed below (see Figure 4), or may be rejected due to a technical malfunction. In the case where the participant's speaking is rejected due to a limit violation, the feature 610d may include an audible alarm produced at the location 110. Other possible indicators include: a red indicator if a limit is exceeded; a yellow indicator if a limit is about to be exceeded (e.g. 20 seconds in advance of an estimated time when the limit will be exceeded); a green indicator otherwise. If the participant's contribution is not recognized and thus is no longer captured, then playing of serialized segments 150 may resume if it was paused to capture a contribution.
[0050] - Feature 610e shows the amount of additional speaking time that the participant can contribute at this point in time without violating a limit. (The limits can change over time, as exemplified by limit 630a (Figure 4) defined by the proportion of the participant's contribution in the overall discussion.)
[0051] - Feature 610f provides some or all of the limits in effect for the participants at location 110. For example, the limits shown in Figure 4 can be displayed.
[0052] - Feature 610g illustrates the segment priority of the audio being captured; segment
priorities affect placing segments into queue 150 as discussed below (note priority 640g in Figure 5).
[0053] - Feature 610h shows maximum time remaining for the participant's current segment if the segment's duration is limited (e.g. by limit 630c in Figure 4).
[0054] - Feature 610i shows whether the current contribution will likely be broken into multiple segments (e.g. because at least some of the current contribution exceeds some limit and thus will be placed in a separate segment 140 and assigned a lower segment priority (640g) as explained below).
[0055] - Features 610j, 610k, 610l are discussed below in connection with serialization 1030.
[0056] - Feature 610m shows the number of currently speaking participants.
[0057] - Feature 610n shows the proportion of the participant's contribution so far in the overall discussion. This proportion may be limited by the limit 630a (Figure 4) discussed below.
[0058] User interface at a location 110 can include input features 620 as follows (these features may be implemented by any suitable input devices, e.g. buttons, a computer keyboard or mouse, a voice recognition system, and maybe others):
[0059] - Pause feature 620a allows a participant at location 110 to pause the playing of the discussion. This may be useful when the participant needs to refer to another source of information, answer a phone call, or attend to some other need.
[0060] - Rewind feature 620b allows the participant to rewind the discussion if the participant needs to hear part of the discussion again to better understand it.
[0061] - Feature 620c allows the participant to skip ahead after Rewind to the first point not yet heard by the participant (not yet played at this location).
[0062] - Feature 620d allows the participant to skip ahead to the end of a segment 140 contributed by the same participant. In some embodiments, though, the system may be configured to automatically skip the participant's own segments.
[0063] - Feature 620e allows the participant to continue the playback after a pause, and maybe also after "rewind" or "skip". Commands "rewind" (620b) and "skip" (620c, 620d) change the point at which playback is positioned. In some embodiments, the playback does not proceed after "rewind" and/or "skip" until "continue" is issued via 620e. In effect, "rewind" and/or "skip" end in a pause. In other embodiments, the playback proceeds automatically after "rewind" and/or "skip" even if no "continue" is issued.
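The sketch below illustrates, under assumptions not stated in the text, one way a Discussion Client might track its playback position in queue 150 in response to the input features 620a-620e. The class and method names are hypothetical, and only the position bookkeeping is shown; audio output itself is omitted.

```python
# Illustrative sketch only: playback-position bookkeeping for features 620a-620e.
class PlaybackControl:
    def __init__(self):
        self.position = 0      # index in queue 150 of the next segment to play
        self.furthest = 0      # first index not yet heard at this location
        self.paused = False

    def pause(self):            # feature 620a
        self.paused = True

    def rewind(self, n=1):      # feature 620b: move back n segments, then pause
        self.position = max(0, self.position - n)
        self.paused = True

    def skip_to_unheard(self):  # feature 620c: jump to the first point not yet heard
        self.position = self.furthest
        self.paused = True

    def resume(self):           # feature 620e: continue after pause/rewind/skip
        self.paused = False

    def segment_finished(self): # called by the player when a segment ends
        self.position += 1
        self.furthest = max(self.furthest, self.position)
```

In this sketch, rewind and skip leave playback paused until resume is issued, matching the embodiments in which playback does not proceed until "continue" (620e) is given.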
[0064] Turning now to limits (see Figure 4), sometimes it is desirable to control the
proportion of each person's contribution to the discussion so that no person inappropriately dominates. At the same time, it is desirable for each participant to hear all of the other participants. Therefore, some embodiments can place limits 630 on one or more of the participants as follows (any one of these limits may or may not apply to all participants, at all locations 110; the limits may have different values for different participants):
[0065] - Limit 630a limits the proportion of the discussion time (speaking time) available to one participant.
[0066] - Limit 630b limits the length (in time) of discussion unheard by a participant who is contributing new audio. This helps to ensure that, when a participant makes a contribution, he is aware of most of what others have said, and this limit discourages a participant who wants to contribute from falling behind due to interruptions, rewinding, or listening to his own segments.
[0067] - Limit 630c is a time limit on the length of any one segment 140.
[0068] - Limit 630d is a time limit on the total contribution of any one participant (or any one location 110) in the entire discussion.
[0069] Other limits are possible. Limits can be stored as data in computer readable storage.
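As an illustration only (the text does not prescribe a storage format), the limits 630 could be stored as per-participant configuration data roughly as follows; the field names and default values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Limits630:
    # None can mean "no limit of this kind for this participant".
    proportion_630a: Optional[float] = 0.25   # max share of the total discussion time
    unheard_630b: Optional[float] = 120.0     # max unheard discussion, in seconds
    segment_len_630c: Optional[float] = 60.0  # max length of a single segment (MSL), seconds
    total_time_630d: Optional[float] = 900.0  # max total contribution, seconds
```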
[0070] At least some of the limits (e.g. 630a, 630d) can be applied each time a participant begins to speak to determine whether his speech will be stored as a new segment. As the participant speaks, limits can be applied to determine a cut-off time for the contribution. Once the participant is cut off, his subsequent contribution may be discarded until the limits are met, or may be given a lower priority in serialization (step 1030), i.e. the limit-offending segments can be placed farther behind in sequence 150. (Note description below of priority parameter 640g in Figure 5). For example, in some embodiments, when a limit is exceeded, the participant's current segment 140 is terminated, and a new segment is started with a lower priority to capture the limit-offending portion of the contribution. When the participant is speaking, the priority of the segment being created can be displayed to the participant as shown at 610g in Figure 3.
[0071] For some discussions, such as a lecture, the limits 630 can depend on the participant. The limits for a lecturer, for example, can be much larger than for a student if the lecturer contributes a much larger proportion of the discussion.
[0072] In some embodiments, the conference equipment is configured so that an initial portion of the discussion is not subject to individual limits 630. In other words, any contribution within an initial portion, e.g. the first five minutes of the discussion, is not counted towards any limits for any participants or some of the participants. This can be used to
encourage the participants to start the discussion.
[0073] For example, in some embodiments, the time allowed to a participant to speak at any point in the discussion (shown as parameter 610e in Figure 3) is calculated as:
[0074] ((p x T) + I) - Y     (1)
[0075] where:
[0076] - Y is the total time that the participant has contributed (limited by 630d);
[0077] - T is the total time of all audio in the discussion so far;
[0078] - p is the proportion of time the participant is allowed (given by limit 630a);
[0079] - I is the initial allowance for the participant (five minutes in the example above).
[0080] The p and I parameters may differ for different participants. Some participants might have no limit, and this preferentially encourages them to begin the discussion. Different participants may be associated with different types of limits and different formulas for computing the allowed time 610e.
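A direct implementation of formula (1) is sketched below; the function name and the clamping to zero are assumptions added for illustration.

```python
def allowed_speaking_time_610e(Y, T, p, I):
    """Formula (1): remaining speaking time available to a participant.
    Y - total time the participant has contributed so far (seconds)
    T - total time of all audio in the discussion so far (seconds)
    p - proportion of time allowed to the participant (limit 630a)
    I - initial allowance for the participant (seconds)"""
    return max(0.0, (p * T + I) - Y)

# Example: after 10 minutes of discussion, a participant who has spoken for
# 3 minutes, is allowed 25% of the discussion, and has a 5-minute initial
# allowance may still speak for 270 seconds.
print(allowed_speaking_time_610e(Y=180, T=600, p=0.25, I=300))  # 270.0
```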
[0081] Serialization (step 1030)
[0082] Below, a system portion that performs the serialization step 1030 will be called
"serializer 1030".
[0083] Figure 6 illustrates exemplary contents of a segment 140 structure received or formed by serializer 1030. Each segment 140 may (or may not) contain the following information (this information is defined by computer data transmitted over a network and/or stored in computer storage):
[0084] - The audio data 510 that represents the spoken contribution of a participant.
[0085] - Identity 520 of the participant who made the contribution. This ID 520 can be, for example, an ID of the corresponding location 110 or the microphone 120 or other equipment, e.g. an ID of the client 210 in Figure 9 discussed below.
[0086] - Time stamp 530 identifying the absolute time when the participant began speaking. (For example, the time stamp can be encoded according to a standard such as Unix time which encodes time as the number of seconds after January 1, 1970, or in some other way.)
[0087] - Segment length (duration) 534.
[0088] - Indicator 540 that indicates whether or not the participant began speaking while another segment was being played at his location.
[0089] - Index 550 of a related segment, which can be the segment played by the participant's speaker 130 at the instant the participant began speaking or otherwise began contributing audio. If the participant began during a pause between two segments, the index
can be the index of the last segment played to the participant. If no segment of the discussion had yet been played to the participant, then the related segment's index 550 can indicate this as a predefined value, e.g. 0 if the serialization indices are positive.
[0090] - Time stamp 560 showing the time when the participant began speaking relative to the beginning of the related segment (shown by index 550) if any. The time stamp 560 can be encoded by the number of seconds or other time units from the start of the related segment.
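For illustration, the segment structure of Figure 6 could be represented as a record such as the following; the names and types are hypothetical and other encodings are equally possible.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment140:
    audio_510: bytes                         # audio data of the spoken contribution
    participant_520: str                     # ID of the contributing participant/location/client
    start_time_530: float                    # absolute start time, e.g. Unix time
    length_534: Optional[float]              # duration in seconds; may be unknown at first
    started_during_playback_540: bool        # indicator 540
    related_index_550: int                   # index of the related segment; 0 if none
    offset_in_related_560: Optional[float]   # seconds from the start of the related segment
```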
[0091] When segments 140 arrive at serializer 1030, the serializer may place the segments into sequence 150 using pertinent priorities. The priorities may include one or more of the parameters 640 shown in Figure 5 (the term "parameter" as used herein may indicate any collection of data). The pertinent priorities can be provided to serializer 1030 together with each segment 140 (e.g. they can be appended to the segment's structure of Figure 6 if they are not part of the structure). The priorities of Figure 5 are as follows:
[0092] - Priority parameter 640a indicates the priority and/or status of the participant and/or location (based on participant ID 520 shown in Figure 6). The participant's priority may be different for different participants; for example, a manager may be given a higher priority than lower level employees, or vice versa. A lecturer or other presenter may be given a higher priority than the audience (e.g. students). In some embodiments, the lecturer's segments 140 are placed alternately in sequence 150 since it is natural for the lecturer to answer each question from students and, when he asks a question, respond to each answer from a student. Thus, in some embodiments, the lecturer's segments 140 are placed alternately with the students' segments; the students' segments are ordered among themselves using other factors, e.g. other priorities described below. The alternate placement may be indicated by the lecturer's and/or students' priorities in data 640a. In addition or in the alternative, the parameter 640a may indicate status information and may be displayed as one of the features 610 (Figure 3).
[0093] - Priority 640b is the segment's starting time (time stamp 530). In some embodiments, the earlier starting time gives higher priority.
[0094] - Priority 640c is the segment's end time (based on time stamp 530 and segment length 534). In some embodiments, the earlier end time (completion time) gives higher priority. For example, in a lecture, the students' segments could be prioritized entirely by their end times 640c, and placed alternately with the lecturer's segments. However, the segments' length 534 may still be unknown when the segments arrive at serializer 1030 or even when the segments' playback begins at step 1040, so priority 640c may be undefined. In some embodiments, if only one unserialized segment has an end-time then this segment is serialized
next (i.e. assigned an index); if no unserialized segment has an end-time, then the segment with the earliest start-time is serialized next.
[0095] - Priority 640d is the segment's length, i.e. duration (based on segment length 534). In some embodiments, shorter segments are given higher priority. This may be desirable to encourage brevity, and/or to give priority to questions since questions tend to be short.
However, as noted above, the segment length may be unknown at the time when the segment is serialized (at 1030) or even when the segment's playback begins.
[0096] - Playback latency 640e at the time 530. Same as 610c (Figure 3) at the time 530. In some embodiments, the lower playback latency 640e gives higher priority to encourage the participants to listen to the other contributors before speaking. For example, in a lecture, the playback latency 640e can be used to define priorities for the students' segments but not the lecturer's segments.
[0097] - Priority 640f is the total length of the participant's contribution in proportion to the total discussion time, measured at the segment's starting time 530. (This is the same value as 610n.)
[0098] - Segment priority 640g could be some default priority (e.g. same as 640a) except when the segment priority is reduced due to violation of one or more limits 630 or possibly other limits. In some embodiments, parameter 640g merely indicates whether or not the segment priority was reduced, and possibly by how much.
[0099] - Priority 640h refers to the related segment as indicated by index 550 (Figure 6). Priority 640h could be the value of index 550, or could be the starting and end times of the related segment which can be obtained from the fields 530, 534 of the related segment (except that the related segment's length 534 may still be undefined as noted above). Parameter 640h can be used to place related segments close to each other.
[00100] Serialization Example 1 (to emphasize promptness and brevity of contributions).
[00101] 1. Serialize the segments in the order completed (i.e. in the order of increasing parameters 640c). (This means that a segment's serialization is delayed until the segment is completed.)
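A minimal sketch of Serialization Example 1, assuming the illustrative segment record shown earlier: segments are held until complete and then ordered by completion time (640c).

```python
def serialize_by_completion(completed_segments):
    """Assign queue order by completion time; only segments whose length_534
    is already known (i.e. completed segments) are passed in."""
    return sorted(completed_segments,
                  key=lambda s: s.start_time_530 + s.length_534)
```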
[00102] Serialization Example 2 (using priority classes).
[00103] In this example, before a segment is serialized, it is assigned a priority class (this can be done by serializer 1030 or at capture step 1010 or segmentation step 1020 or at some other step). For instance, in a lecture, the lecturer's segments can be one priority class, and the students' segments can be another priority class. In a different example, the conference includes a panel discussion with the moderator, the panelists, and the audience; there could be a separate class for the moderator, another class for the panelists, and still another class for the
audience. A segment's priority class can be determined from priority parameter 640a (Figure 5) and/or Participant ID 520 (Figure 6). First, each priority class is serialized separately. See Figure 7, showing an example with three priority classes 650.1, 650.2, 650.3 whose segments are serialized in respective queues 150.1, 150.2, 150.3. Any serialization algorithm or algorithms, including those in Serialization Example 1, may be used for this purpose. Then another algorithm is applied to move the segments from queues 150.1, 150.2, 150.3 to queue 150. For example, in one moderator/panelists/audience embodiment, the moderator's segments are in queue 150.1, the panelists' segments are in queue 150.2, and the audience's segments are in queue 150.3. In each of queues 150.1, 150.2, 150.3, the segments are ordered based on the segment starting times 530 (priorities 640b). In moving the segments to queue 150, the moderator segments (queue 150.1) receive the highest priority, and are always moved to queue 150 first as long as the queue 150.1 is not empty. If queue 150.1 is empty, then the panelists' segments alternate with the audience's segments; in other words, one segment is taken from queue 150.2 and the next segment is taken from queue 150.3. If only one of queues 150.1, 150.2, 150.3 is not empty, then the segments are moved from that queue to queue 150.
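A sketch of the moderator/panelists/audience embodiment just described, assuming each per-class queue is already ordered by starting time 530; the class name is hypothetical.

```python
from collections import deque

class ClassMerger:
    """Moves segments from per-class queues 150.1 (moderator), 150.2 (panelists),
    150.3 (audience) into the common queue 150."""
    def __init__(self):
        self.take_panelist_next = True

    def next_segment(self, moderator_q, panelist_q, audience_q):
        if moderator_q:                    # moderator segments always go first
            return moderator_q.popleft()
        if panelist_q and audience_q:      # otherwise alternate the two classes
            q = panelist_q if self.take_panelist_next else audience_q
            self.take_panelist_next = not self.take_panelist_next
            return q.popleft()
        if panelist_q:                     # only one non-empty queue remains
            return panelist_q.popleft()
        if audience_q:
            return audience_q.popleft()
        return None                        # nothing ready to serialize

# Usage: the per-class queues are deques already serialized within their class.
moderator_q, panelist_q, audience_q = deque(), deque(), deque()
merger = ClassMerger()
segment = merger.next_segment(moderator_q, panelist_q, audience_q)  # None while empty
```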
[00104] Other embodiments are also possible, e.g. taking into account the segments' other priorities 640 or other data. For example, in some embodiments, a separate class is defined for segments whose priority has been lowered as indicated by parameter 640g, and this class receives the lowest priority for moving the segments to queue 150.
[00105] The priority class of the segment being captured can be provided to the speaking participant as feature 610j (Figure 3). Also, feature 610k can provide the priority class that the segment will have if one or more limits 630 become exceeded (see e.g. the limits in Figure 4).
[00106] Feature 610l provides, for each priority class, the estimated time until the current contribution would appear in the discussion if the current contribution were in that class. This parameter may be provided even if the participant is not currently speaking; this parameter will then refer to a contribution that the participant could start. The time estimate 610l can be computed by the conferencing equipment as the sum of:
[00107] i. The playback latency (610c) of the speaking participant (i.e. the time to hear all the serialized segments or segment portions 150 not yet played, possibly excluding the participant's own segments).
[00108] ii. The total time to hear all segments 140 which have not yet been serialized but would be ahead of the participant's contribution in queue 150 (taking priorities into account).
[00109] The latter estimate (ii) can include information on audio captured at other locations 110 even if such audio has not yet been provided to the participant's location 110. The
participant's location 110 can obtain such information by querying other locations 110 and/or the central computer if one is used.
[00110] Time estimate 610l can also be provided if there are no priority classes (in other words, if all the segments are in the same class).
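A rough sketch of how the time estimate 610l could be computed as the sum of (i) and (ii) above; the argument names and the predicate for deciding which unserialized audio would be ordered ahead are assumptions.

```python
def estimate_610l(serialized, played_upto_index, unserialized, would_be_ahead):
    """serialized: list of (index, duration) for serialized segments 150
    played_upto_index: highest index already played at this location
    unserialized: list of (segment, duration) for captured but unserialized audio
    would_be_ahead: predicate deciding whether an unserialized segment would be
                    placed ahead of the participant's contribution in queue 150"""
    latency_610c = sum(d for i, d in serialized if i > played_upto_index)   # part (i)
    ahead = sum(d for s, d in unserialized if would_be_ahead(s))            # part (ii)
    return latency_610c + ahead
```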
[00111] Segmentation 1020
[00112] The system portion performing segmentation will be referred to as "segmenter 1020". This portion may overlap with serializer 1030 and/or other parts of the system.
[00113] In some embodiments, the conferencing system can be configured to control how segments are defined, and the segment definitions may be different for different participants. For example, a segment end can be defined by a minimal length pause ("MLP") and/or by a maximum segment length ("MSL") 630c (Figure 4), and these two parameters can be different for different participants and/or different discussions. The conferencing system may store discussion configuration data (e.g. 268 in Figure 9) which define these parameters and other configuration parameters selected for a particular discussion (i.e. particular conference). For example, discussion configuration data may include the limits 630, and may define what to do if a limit is exceeded (e.g. if a participant attempts to speak beyond his current limit or limits 630). The following three kinds of limits are of particular interest in the segmentation examples below:
[00114] 1. Limit 630a on total participation as a proportion of the entire discussion.
[00115] 2. Limit 630c on the duration of a single segment 140 from that participant (i.e. MSL).
[00116] 3. Limit 630b on playback latency.
[00117] If a participant exceeds any of these limits, or possibly other limits, the
configuration data may specify:
[00118] 1. What to do with the participant's audio violating the limits.
[00119] 2. If such audio is discarded, then when subsequent audio will be accepted.
[00120] 3. How to segment the audio in such cases.
[00121] In some embodiments, the configuration data may specify any one of the following possibilities for the audio generated when a limit is exceeded:
[00122] 1. The limit is ignored for the purposes of segmentation, i.e. the audio is segmented as if no limit has been exceeded. However, segment priority 640g may be reduced for the segment.
[00123] 2. The current segment is ended, and the limit-offending audio is placed in a new segment 140. Segment priority 640g can be reduced for the new segment.
[00124] 3. The limit-offending audio is discarded. This option may be useful to save hardware resources.
[00125] In some embodiments, the start and end of each segment 140 is defined primarily by pauses in the audio. But if a participant does not provide adequate pauses, the maximum segment length (MSL) 630c can be used. A participant's successive segments can be interrupted by other participants' contributions, so breaking a segment at the MSL may degrade audio clarity (for example, if the segment is broken in the middle of a word, and the two successive segments containing this word are played back at different times, with other segments intervening). The participant can use indicators 610h and 610i (Figure 3) to ensure segment termination at pauses. User interface 610 also informs the participant of delays of his contributions being heard (feature 610l).
[00126] The following example, illustrated in Figure 8, describes segmentation for audio captured at a single location 110 (the same or different segmentation algorithms may be used for different locations 110):
[00127] Segmentation example 1 performed by segmenter 1020 for one location:
[00128] 1. Determine the start of a new segment (step 810). This may be the start of the first audio received from the location, or may be based on the end of the previous segment as described below.
[00129] 2. At step 820, scan the audio of the new segment for a minimum length pause MLP1 (i.e. a pause of a length at least MLP1) or the maximum segment length 630c (MSL), whichever occurs first. Both MLP1 and MSL may differ from location to location. MLP1 and limits 630 can be defined by configuration data as explained above.
[00130] 3. If the minimum length pause (MLP1) is detected before the maximum segment length (MSL), terminate the segment at the start or during the pause (step 830). In a variation, the segment is terminated even if MLP1 starts at MSL. The subsequent audio from the same location will be in another (new) segment. The new segment will start at the end of the previous segment or at the end of the pause or at some other time during the pause. (If one or more segments represent a participant's contribution played continuously without intervening segments from other participants, then the pauses in the participant's contribution may or may not be reproduced in the playback, and thus may or may not be included in segments' audio data 510; in some embodiments, the pauses below some configurable limit (e.g. 10 seconds) are reproduced, but longer pauses are not; the configurable limit can be part of configuration data 268). Go to step 820 to find the end of the new segment, or to step 810 if the new segment has not begun (i.e. no audio has been received after MLP1).
[00131] 4. If the maximum segment length MSL is detected before the minimum length pause MLP1 (step 840), conduct a backward search of the audio from the MSL point to the segment start to find a shorter pause, of another minimum length MLP2 (i.e. a pause of at least MLP2, where MLP2 is possibly defined by configuration data). MLP2 is smaller than MLP1. For example, MLP1 can be 5 seconds, and MLP2 can be 1 second. Other values are also possible. If a shorter pause MLP2 (i.e. of a length at least MLP2) is found (step 850), then terminate the current segment at the start or during the shorter pause MLP2 (the latest shorter pause MLP2 if there are multiple shorter pauses), and place the subsequent audio into a new segment. The new segment will start at the end of the previous segment (i.e. sometime during the shorter pause). Go to step 820 to find the end of the new segment, or to step 810 if the new segment has not begun (i.e. no audio has been received after MLP2).
[00132] 5. If a shorter pause MLP2 is not found (step 860), terminate the current segment at the maximum length MSL, and start a new segment some time before the previous segment's end (i.e. before the MSL point) so that the new segment will overlap with the previous segment. For example, if the audio is received in network packets according to some protocol, then the new segment can be started at the start of the packet containing the MSL point.
Alternatively, the overlap between the new and previous segments can be determined as starting at a fixed time (e.g. 0.5 seconds) before the MSL point. How the overlap is defined can be specified by the configuration data. The overlap will cause duplication of some of the audio in the playback so as to help the listener to understand the audio.
[00133] Go to step 820 to find the end of the new segment.
[00134] End of segmentation example.
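The following sketch condenses segmentation example 1 (steps 820-860), under the simplifying assumptions that pauses have already been detected and are supplied as time intervals sorted by start time, and that the overlap of step 860 is a fixed value; function and parameter names are illustrative.

```python
def find_segment_end(seg_start, pauses, MLP1, MLP2, MSL, overlap=0.5):
    """Return (current_segment_end, next_segment_start), all in seconds.
    pauses: list of (pause_start, pause_end) sorted by pause_start."""
    msl_point = seg_start + MSL
    # Steps 820-830: a pause of at least MLP1 found before the MSL point
    # terminates the segment; the next segment starts after the pause.
    for p_start, p_end in pauses:
        if seg_start <= p_start <= msl_point and p_end - p_start >= MLP1:
            return p_start, p_end
    # Steps 840-850: MSL reached first; search backwards for the latest
    # shorter pause of at least MLP2 within the segment.
    shorter = [(ps, pe) for ps, pe in pauses
               if ps >= seg_start and pe <= msl_point and pe - ps >= MLP2]
    if shorter:
        p_start, p_end = shorter[-1]
        return p_start, p_end
    # Step 860: no usable pause; cut at MSL and start the next segment slightly
    # earlier so the two segments overlap.
    return msl_point, msl_point - overlap
```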
[00135] In some embodiments, whenever the segmenter 1020 determines a segment's start or end, the start and end information (as defined by data 530 or 534 in Figure 6 for example) is passed to serializer 1030. The segmenter 1020 determines pertinent priority information 640 (and/or other priority information) for each segment, and passes such information to serializer 1030.
[00136] Feedback can be provided to the participant using the features 610 described above. In particular, the participant can be provided with indicator 610i when his audio contribution might be cut to form a new segment. The indicator 610i could be any of these:
[00137] - A green light that turns red, possibly turning yellow before turning red. It can turn green again when the limit 630c expires or a new segment is started.
[00138] - An audible alarm, possibly changing as the limit 630c is reached.
[00139] - If any audio is discarded, discarding can be indicated by resuming playback as
noted above.
[00140] As noted above, the policy of the serializer 1030 may encourage participants to pause speaking through a policy giving higher priority to the soonest ending segments (as specified by priority 640c in Figure 5). In this case, a type of feedback that can encourage a participant to pause is the number of other participants with segments not yet completed because each participant would want to end his segment before the other participants stop speaking. That information can be displayed by user interface 610 (as feature 610m for example). Alternatively, feature 610m can be used to encourage the participants to pause.
[00141] Playback 1040
[00142] Turning now to playback step 1040, the output of the serialized discussion sequence can include pauses of a desired length between the segments 140. Pauses between the segments provide a participant who wishes to make a contribution with an obvious and convenient moment to do so. This is especially useful in embodiments that do not allow a participant to start a new segment during a playback of another segment. Such embodiments are useful where the participant's computing device is not powerful enough for the speech processing needed to separate the segment being output from the new speech.
[00143] Video can accompany the audio segments 140, and can be played in the same sequence as the respective segments 140 for participants with video display capabilities. The video segments can be produced by a speaking participant if the participant has a means to produce video like a web-cam.
[00144] When a presentation includes visual media, such as Microsoft PowerPoint slides, the presenter can provide the video and the video can be stored with the serialized discussion sequence 150. When a new segment 140 is created from a question from the audience, the segment can be associated with a point in time of the video provided by the presenter. Then, when the audience's segment is played, the segment's audio can be accompanied by the associated video from the presenter for providing the context for the question. If the video originally comes from the presenter's computer, the presenter has the option to take control back and switch the video in real-time, but that only affects what will be seen by participants who have not yet played back the segment.
[00145] Now exemplary embodiments will be described in more detail in connection with Figures 9-11. In these embodiments, segmentation 1020 and serialization 1030 are performed by a central computing system 260 (Figure 9). Audio capture 1010 and playback 1040 are performed by the conferencing equipment at each location 110. Such conferencing equipment is shown as "discussion clients" 210 in Figure 9.
[00146] As noted above, Figure 1 shows that the segments 150 are streamed to locations 110. The streaming to each location 110 can occur independently from the other locations.
[00147] Figure 9 shows examples of conferencing equipment 210 at locations 110A, 110B, 110C, 110D. Each of the four locations has a discussion client 210 which captures participants' contributions at that location and plays segments 150. Each discussion client 210 includes computer storage for storing the serialized segments 150. Each discussion client 210 also includes a network interface which allows the discussion client to access a
telecommunication network interconnecting the discussion client with central computer 260. Each location 110 in the figure is set up differently:
[00148] 1. Location 110A has only audio devices, namely an audio speaker 130 and a microphone 120. Location 110A cannot generate or view video. (Microphone 120 and speaker 130 are shown as separate from the discussion clients, but the term "discussion client" may include the microphone and the speaker.)
[00149] 2. Location 110B has a speaker device 130 and a microphone 120, and in addition has video devices including a web cam 220 and a display screen 240 for video. (The term "discussion client" may include the web cam and the display screen.)
[00150] 3. Location 110C is a mobile phone. The phone can be used for audio as the phone includes a microphone and a speaker device (not shown); in some embodiments the phone could also be used for video since the phone includes a screen, and the phone may or may not include a camera. The phone includes a computer (not shown) including a computer processor and memory and other associated circuitry (e.g. a network interface card).
[00151] 4. Location 110D is a laptop computer which may be able to participate fully in the discussion, i.e. to provide both audio and video capture and display. The discussion client is the laptop computer's processor and memory and other associated circuitry (e.g. network interface card, etc.).
[00152] Each of these discussion clients 210 at locations 110A-110D communicates with central computing system 260 that stores the serialized discussion segments 150 and performs serialization 1030 to assign to each segment an index, e.g. a whole number greater than zero, which is used to identify the segment. The central computing system 260 can be a computer on the Internet or can be a network of computers. Segmentation 1020 is also performed by central computing system 260. The central computing system 260 also stores, in non-transitory computer storage (e.g. disks, tapes, semiconductor memory, etc.) configuration data 268 and the status 272 of the discussion in progress, and provides such information to clients 210. Configuration data 268 includes limits 630 and possibly other data as described above (e.g.
lengths of pauses for segmentation 1020). Discussion status 272 includes information that can be used by a discussion client to obtain the UI features 610.
[00153] Figure 10 shows a block diagram of an exemplary central computing system 260. Newly captured audio is directed from discussion clients 210 to Speech processing unit 310 (Figure 10) within central computing system 260. Discussion clients 210 may also periodically time-stamp different points in the audio to allow the speech processing unit 310 to later determine the segments' start 530 and length 534 (Figure 6) when segmentation is performed. Alternatively, a discussion client 210 may include no time stamp, and speech processing unit 310 assumes that the audio is generated at the time it is received by the speech processing unit. Different discussion clients can operate differently from each other in the same discussion. If the audio was captured during playback of a segment 150, the discussion client 210 also sends the segment's index "RSI" (for defining the field 550) and the time stamp TS within the related segment (for field 560). These fields, as well as the other fields of Figure 6, will be defined by speech processing unit 310 in segmentation 1020.
[00154] In some embodiments, speech processing unit 310 cleans up the sound (the audio) upon receipt. The speech processing unit might remove noise, but an important part of the cleanup is to remove sound from the related segment. The related segment's sound might be picked up by the microphone 120, and should preferably be removed to make the contributor's presentation clearer. If the contributor uses a headphone together with microphone 120, or starts in a pause, the unwanted sound might be minor, but otherwise it can be significant. This processing is similar to echo cancellation and prior art might be used to implement it. Spectral subtraction, also prior art, might also be used. In removing sound from the related segment, speech processing unit 310 may use the related segment index RSI and the time stamp TS received from the discussion client 210 with the new audio. More particularly, speech processing unit 310 supplies RSI and TS to Serialized Segment Server 320 (described below) which stores segments 140. Serialized Segment Server 320 returns the pertinent portion of the related segment's sound 510. In some embodiments, only a short portion of the new audio is processed for removal of the related segment's sound because the discussion client 210 pauses playback of the related segment as soon as the participant begins speaking.
[00155] In some embodiments, at least some sound cleanup is performed by the discussion client 210, and additional cleanup may or may not be performed by the speech processing unit 310. In some embodiments, some discussion clients 210 clean up the sound while others do not, and speech processing unit 310 may perform cleanup for some but not other discussion clients, and different types of cleanup may be performed for different discussion clients. Both the
speech processing unit 310 and the discussion clients 210 have access to the related segment data 510. The central speech processing unit 310 might perform these functions because it may have more powerful processing capabilities, but the client system 210 might perform these functions because it can optimize the processing for the acoustic environment and because it does not need to do processing for other participants.
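Since spectral subtraction is mentioned above as one prior-art option, here is a deliberately rough numpy sketch of attenuating the played-back (related-segment) sound in the newly captured audio. It assumes the two signals are already time-aligned; a practical system would additionally estimate the echo path (delay, gain, room response), which is omitted here.

```python
import numpy as np

def subtract_played_audio(captured, played, frame=1024, floor=0.1):
    """Frame-by-frame magnitude spectral subtraction (rough illustration only).
    captured, played: 1-D float arrays of equal length, assumed time-aligned."""
    hop = frame // 2
    window = np.hanning(frame)
    out = np.zeros(len(captured))
    for start in range(0, len(captured) - frame + 1, hop):
        c = np.fft.rfft(captured[start:start + frame] * window)
        p = np.fft.rfft(played[start:start + frame] * window)
        # Subtract the played-back magnitude, keeping a small spectral floor.
        mag = np.maximum(np.abs(c) - np.abs(p), floor * np.abs(c))
        out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(c)), n=frame)
    return out
```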
[00156] Either way, speech processing unit 310 performs segmentation 1020 and provides new segments 140 and each segment's pertinent priorities 640 to segment serializer 330. (In some embodiments, segmentation is performed by some Discussion Clients 210 but not all the Discussion Clients; speech processing unit 310 performs segmentation for those clients 210 which do not perform segmentation. In other embodiments, a client 210 may remove long pauses from the audio data, but the remaining segmentation work is performed by speech processing unit 310.)
[00157] It should be noted that audio may arrive simultaneously from different Discussion Clients 210, and can be processed simultaneously.
[00158] Segment serializer 330 performs serialization 1030, placing each segment 140 into queue 150 and assigning a segment index to the segment. In the embodiment being described, the segment index is a number that is greater for segments later in the sequence 150. Any suitable serialization algorithm can be used including those described above. Another possible algorithm is to simply assign increasing indexes to segments 140 in the order that new segments initially arrive at the serializer 330. Of note, a new segment 140 may arrive over time as it is being created and the serializer 330 can wait until it has received all of the new segment before assigning an index to the new segment, especially since some possible rules for assigning the index use the segment length 534 (Figure 6) or the segment's end time (530 plus 534). The serialization algorithm is defined by serialization rules 342 which are part of configuration data 268 stored in computer storage in discussion configuration unit 340.
Discussion configuration unit 340 provides the serialization rules to serializer 330. For example, the serializer 330 can work as follows:
[00159] 1. When audio begins to arrive for a new segment 140 or the segment start is determined in previously obtained audio, serialization can start even before the new segment's end is determined. Serializer 330 checks the serialization rules 342 for privileges of the contributing participant. For example, the serialization rules may indicate that the new segment's contributor (identified by ID 520 and/or priority 640a) might have the privilege that his segments 140 appear alternately in the sequence (for example, if the contributor is a lecturer). If so, and if the last serialized segment 140 was contributed by another participant,
then the segment 140 is immediately assigned the next index (possibly even before the segment's end is determined).
[00160] 2. Otherwise, the new segment 140 is placed in a group of unassigned segments to which other rules 342 are applied. If enough Discussion Clients 210 are waiting for segments because the Discussion Clients have already played all serialized segments, then the serializer 330 can pick an incomplete new segment 140 and assign to it the next index so the segment can become available to the waiting clients 210. Serializer 330 can apply a rule such as picking a segment 140 based on its earliest time stamp 530. Serializer 330 can also consider the priority 640a of the contributor. The serializer can obtain current information about waiting clients from the Serialized Segment Server 320 described below.
[00161] 3. Otherwise, if the current demand from clients 210 is low, other serialization rules 342 can be applied. Also, as each of these segments 140 is completed, other serialization rules 342 can be considered. For example, if a completed new segment 140 is short enough it can be assigned an index immediately. The segment can be deemed short enough if all other unassigned new segments are already longer. Alternately, the serializer can simply choose the earliest completed new segment 140 to be assigned the next index.
[00162] 4. Immediately upon assigning an index to a new segment, the serializer 330 starts transmitting the segment to the Serialized Segment Server 320. (The serializer 330 transmits the whole segment or, possibly, stores the segment and transmits its address to server 320.)
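A condensed sketch of the decision just described for choosing which unassigned segment receives the next index; "rules.alternates" is a hypothetical accessor into serialization rules 342, and the segment fields follow the earlier illustrative record.

```python
def pick_next_for_index(unassigned, last_contributor, rules, clients_waiting):
    """unassigned: segments (possibly still incomplete) not yet given an index."""
    if not unassigned:
        return None
    # Rule: an "alternating" privileged contributor (e.g. a lecturer) is
    # serialized immediately if the last serialized segment was someone else's.
    for seg in unassigned:
        if rules.alternates(seg.participant_520) and seg.participant_520 != last_contributor:
            return seg
    # If clients have already played everything serialized, do not keep them
    # waiting: pick the (possibly incomplete) segment with the earliest start.
    if clients_waiting:
        return min(unassigned, key=lambda s: s.start_time_530)
    # Otherwise prefer completed segments, earliest completion first.
    completed = [s for s in unassigned if s.length_534 is not None]
    if completed:
        return min(completed, key=lambda s: s.start_time_530 + s.length_534)
    return None
```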
[00163] Such serialization algorithms can also be used with other embodiments discussed above in connection with Figure 2.
[00164] The Serialized Segment Server 320 does the following:
[00165] 1. Receives newly serialized segments 140 from the segment serializer 330 and stores them in computer storage 264 with all of the associated attributes mentioned above (e.g. those shown in Figure 6 and, possibly, some or all priorities 640). The storing can begin while the segment 140 is still being transmitted from the serializer 330.
[00166] 2. Handles requests for serialized segments 150 from Discussion Clients 210. A request can come while a serialized segment 150 has not been fully transmitted from the serializer 330, in which case the Serialized Segment Server 320 streams the segment 150 to the Discussion Clients 210 directly from the serializer 330 (server 320 stores the segment in storage 264 at the same time). Otherwise, Serialized Segment Server 320 streams the segment 150 to the client 210 from the storage 264 where the segment 150 has been previously stored by the segment server.
[00167] 3. Handles requests from the Speech processing unit 310 for related speech
segments. These might also be streamed directly from serializer 330 or retrieved from the storage 264.
[00168] 4. Responds to requests from Discussion Clients 210 for discussion status 272. Status 272 can include:
[00169] - a. What serialization indices have been assigned.
[00170] - b. The total time of the audio data in each serialized segment 150 (for those segments which have not yet been completely received from serializer 330, the total time up to current time).
[00171] - c. The total time of the audio data of all serialized segments 150 (i.e. total time of the sound represented by the audio data) from the beginning of the conference, or from a time included in the request received from client 210.
[00172] - d. The total time of the audio data of all serialized segments 150 not yet transmitted to the Discussion Client 210 issuing the request.
[00173] - e. The participant identity 520 for a segment whose index is in the request.
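For illustration, a status-272 reply to a Discussion Client might be assembled roughly as follows; the dictionary keys and the client-state bookkeeping are hypothetical.

```python
def discussion_status_272(segments, last_transmitted_index, requested_index=None):
    """segments: dict mapping serialization index -> segment record (length_534
    holds the duration received so far for segments still being streamed)."""
    assigned = sorted(segments)                                             # item a
    per_segment = {i: (segments[i].length_534 or 0.0) for i in assigned}    # item b
    status = {
        "assigned_indices": assigned,
        "per_segment_time": per_segment,
        "total_time": sum(per_segment.values()),                            # item c
        "time_not_yet_transmitted": sum(                                    # item d
            per_segment[i] for i in assigned if i > last_transmitted_index),
    }
    if requested_index is not None:                                         # item e
        status["participant_520"] = segments[requested_index].participant_520
    return status
```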
[00174] Of note, in some embodiments, the segment serializer 330 can stream new segments 140 into the Serialized Segment Server 320 to begin storing each new segment 140 before the segment is assigned an index. The segment serializer 330 can later send the segment's serialization index to the Serialized Segment Server 320. Before being assigned an index, a new segment 140 can be assigned a temporary index that the serializer 330 and Serialized Segment Server 320 can use to reference the segment, or the segment can be referenced by its position (e.g. starting address) in storage 264 or possibly by its participant ID 520 (Figure 6), or in some other way.
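The per-segment attributes referred to above might be collected in a record such as the following minimal sketch; the field names are illustrative assumptions, and the numeric labels refer to the attributes of Figure 6.

```python
# Hypothetical per-segment record; None marks values not yet known.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentRecord:
    participant_id: str                   # participant identity 520
    start_time: float                     # time stamp 530
    duration: Optional[float]             # duration 534, set once segmentation completes
    related_segment_index: Optional[int]  # related segment index 550, if any
    related_time_stamp: Optional[float]   # time stamp 560 within the related segment
    serialization_index: Optional[int]    # None until assigned; a temporary index or
                                          # a storage address can be used in the meantime
    audio: bytes = b""                    # audio data received so far
```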
[00175] Discussion configuration unit 340 stores configuration data 268 that other parts of the system, such as the Segment Serializer 330, the Discussion Clients 210, and unit 310 doing the segmentation, can access. As noted above, configuration data 268 may include the limits 630 for each participant, and may also include other information, for example privilege information on participants (e.g. same as 640a and maybe other information). The privilege information may include:
[00176] — Information indicating that a presenter's contributions are alternated with other presenters' contributions.
[00177] — Information on other priorities for sequencing of contributions.
[00178] Configuration data 268 may include the length of pauses to be inserted by the Discussion Client 210 between the playing of consecutive segments 150, or the lengths of pauses used in segmentation as described above.
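As an illustrative assumption (the exact keys, units, and their mapping to the limits 630 and privileges 640a are not prescribed here), configuration data 268 might be represented as follows:

```python
# Hypothetical representation of configuration data 268; all keys and values
# are assumptions made for illustration.
config_268 = {
    "limits_630": {
        "participant-17": {
            "max_segment_seconds": 60,            # cap on the length of one segment
            "max_share_of_discussion": 0.4,       # cap on this participant's share of total audio
            "max_unplayed_backlog_seconds": 120,  # cap on audio received but not yet played
        },
    },
    "privileges_640a": {
        "participant-17": {
            "presenter": True,
            "alternate_with_other_presenters": True,
            "sequencing_priority": 2,
        },
    },
    "pause_between_segments_seconds": 0.5,  # inserted by Discussion Client 210 at playback
    "segmentation_pause_seconds": 1.0,      # minimum-length pause used in segmentation
}
```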
[00179] The term "participant" for these purposes can mean all persons sharing one Discussion Client 210, since the system does not distinguish those persons from each other.
[00180] Figure 11 is a block diagram of a Discussion Client 210. The major functions of a typical Discussion Client are capture 1010 and playback 1040.
[00181] The Discussion Client of Figure 11 has a video buffer 410 and an audio buffer 420 (implemented in main memory or some other computer storage) to capture and store the raw data from any video capture device, such as a web cam 220, and from an audio capture device 120, such as a microphone, so that the data is not lost before it can be processed.
[00182] User interface 430 may include the user interface features 610 and 620 (Figure 3), and may display discussion status 272 described above and may accept commands from the participant in one form or another such as a voice command or touch of a button. (User interface 430 may be combined with screen 240 (Figure 9) to display both status 272 and video segments, and/or user interface 430 can be combined with user interfaces of other devices, such as 120 or 220.)
[00183] VAD (Voice Activity Detection) unit 440 detects the start of new speech. The detection is performed based on the signal from audio capture device 120. When VAD 440 detects new speech, VAD 440 alerts New Segment Control unit 450 to capture the new audio. VAD unit 440 can use algorithms from prior art, such as counting zero crossings, to detect the start of new speech. The detection can err toward inferring start of speech when there is none because another unit, the speech processing unit 460, can compensate. In this approach, VAD 440 can make a quick judgment, and the more complex analysis is performed only when VAD 440 detects start of speech. This design is useful when the hardware is not sufficiently powerful, or when the playing of segments by Segment Player 470 feeds sound back to microphone 120 and confuses the VAD algorithm. If sound is fed back to microphone 120, VAD 440 is likely to produce false positives (false signals indicating start of speech when there is none), but speech processing unit 460 can subtract the feedback (the played sound) as described below, and can re-run the VAD algorithm.
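A minimal zero-crossing/energy test of the kind mentioned above could look like the sketch below; the thresholds are assumptions to be tuned per capture device, and the test deliberately errs toward false positives, which speech processing unit 460 can later reject.

```python
# Hypothetical cheap first-pass VAD; thresholds are illustrative.
import numpy as np

def vad_detect(frame: np.ndarray,
               energy_threshold: float = 1e-4,
               zcr_threshold: float = 0.25) -> bool:
    """Return True if the frame plausibly contains the start of new speech."""
    if frame.size < 2:
        return False
    samples = frame.astype(np.float64)
    energy = float(np.mean(samples ** 2))
    # Fraction of adjacent-sample sign changes (zero-crossing rate).
    zcr = float(np.mean(np.signbit(samples[:-1]) != np.signbit(samples[1:])))
    # Trigger on enough energy or a high zero-crossing rate (unvoiced sounds);
    # over-triggering is acceptable because unit 460 re-checks later.
    return energy > energy_threshold or zcr > zcr_threshold
```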
[00184] Speech processing unit 460 is very similar to the speech processing unit 310 in the Central computing system 260. As stated before, either one or both of the two speech processing units 460, 310 may perform sound cleanup. However, regardless of the functionality of unit 310, the unit 460 in the Discussion Client 210 may also perform the following tasks:
[00185] 1. It provides time stamps and related segment information for use in segmentation 1020 as described above. To provide such information, unit 460 may use information received from the New Segment Control Unit 450 described below.
[00186] 2. Unit 460 transmits the video stream from capture device 220, as well as the audio stream, to the central computer 260. If unit 460 is directed by the New Segment Control Unit 450 not to transmit the audio as described below (due to limit violations for example), then unit 460 may also block transmission of the associated video.
[00187] If speech processing unit 460 performs reduction of noise and of sound from a related segment, then speech processing 460 can have these special features:
[00188] 1. It can learn parameters that describe how the acoustic environment affects the audio of the new segment. How ambient noise and sound from simultaneously played segments affect the audio of the new segment depends largely on the environment. So speech processing unit 460 might do a better job in reducing noise and sound from a related segment than the corresponding unit 310 in the Central computing system 260.
[00189] 2. Speech processing unit 460 can augment the VAD algorithm because speech processing unit 460 uses information about the sound from a simultaneously played segment and about how the input audio is affected by the simultaneously played segment. After removal of noise and sound of the simultaneously played segment from the audio input, speech processing 460 can test more accurately for the start of new speech. If unit 460 determines that new speech has not occurred (VAD was triggered by noise or playback), unit 460 signals the New Segment Control 450 that there is no new audio to transmit, and speech processing 460 does not transmit the audio.
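One conventional way unit 460 could remove the played-back sound before re-testing for speech is a normalized least-mean-squares (NLMS) adaptive filter, sketched below under the assumption that the microphone signal and the locally played signal are available as time-aligned sample arrays; the filter length and step size are illustrative choices.

```python
# Hypothetical echo-reduction sketch using an NLMS adaptive filter.
import numpy as np

def nlms_cancel(mic: np.ndarray, played: np.ndarray,
                taps: int = 256, mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Return the microphone signal with an adaptive estimate of the played sound removed."""
    w = np.zeros(taps)               # filter weights modelling the acoustic path
    x_buf = np.zeros(taps)           # most recent played samples (newest first)
    cleaned = np.array(mic, dtype=np.float64)
    n = min(len(mic), len(played))
    for i in range(n):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = played[i]
        echo_estimate = float(w @ x_buf)
        e = float(mic[i]) - echo_estimate      # residual: local speech plus noise
        cleaned[i] = e
        w += (mu * e / (x_buf @ x_buf + eps)) * x_buf
    return cleaned
```

After such cancellation, the VAD test can be re-run on the cleaned signal; if it no longer fires, unit 460 signals New Segment Control 450 that there is no new audio to transmit.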
[00190] New Segment Control unit 450 directs transmission of newly captured audio to the central computer 260 as follows:
[00191] 1. It waits for indications of new audio from two sources:
[00192] - a. User interface 430 which might provide a participant's command to indicate creation of a new segment (via voice command, button touch, or other human interface). To the participant, this is a "record" command.
[00193] - b. VAD unit 440.
[00194] 2. When New Segment Control 450 has an indication of new audio, New Segment Control 450 applies rules based on (i) participant status 640a received from the Segment Player 470 and (ii) discussion status 272 and discussion configuration 268 from the Central computing system 260, to determine whether the new audio should be allowed (accepted). Possible rules are described above in connection with limits 630a, 630b, 630d (Figure 4). For example, New Segment Control 450 can use the rules to compute the parameter 610e (Figure 3) indicating how much additional time this participant can contribute to new segments based on current discussion status 272. New Segment Control 450 might use this information to enforce the rules. New Segment Control 450 can additionally transmit the information to user interface 430 for display.
[00195] 3. If New Segment Control 450 determines that the new audio should be accepted, New Segment Control 450 signals the Segment Player 470 to pause any currently playing segments and signals the Speech processing unit 460 to transmit the new audio. At the same time New Segment Control 450 sends information to the Speech processing 460 to be used in segmentation and/or serialization. Such information may include:
[00196] - a. Identity of the participant.
[00197] - b. Status of playback at the time and information about any related segment being played at the time (such as related segment index 550 and time stamp 560 within the related segment).
[00198] 4. New Segment Control 450 accepts any signal from the Speech processing unit 460 to stop the new audio and responds by signaling the Segment Player 470 to resume any playback.
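For illustration, the accept/reject decision in step 2 above might reduce to checks like the following; the parameter names and their mapping to limits 630a and 630b are assumptions, since the actual rules depend on configuration 268.

```python
# Hypothetical limit checks applied by New Segment Control 450 before accepting new audio.
def allow_new_audio(participant_seconds: float,
                    discussion_seconds: float,
                    unplayed_backlog_seconds: float,
                    max_share: float,
                    max_backlog_seconds: float) -> bool:
    """Decide whether newly detected audio may be captured and transmitted."""
    # Proportion-style limit: the participant's share of all contributed audio.
    if discussion_seconds > 0 and participant_seconds / discussion_seconds > max_share:
        return False
    # Fall-behind limit: too much received audio has not yet been played here.
    if unplayed_backlog_seconds > max_backlog_seconds:
        return False
    return True
```

If the audio is allowed, playback is paused and Speech processing 460 is signalled to transmit; otherwise the remaining allowance can be reported to user interface 430.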
[00199] Audio segment buffer 482 and video segment buffer 484 monitor the Central computing system 260 for serialized segments 150 and the associated video segments and buffer the audio and video segments as the segments become available. A video segment contains the video data for the corresponding audio segment 150. Audio buffer 482 includes, for each segment 150 it stores, all segment information as described above including the serialized segment index. The speech processing unit 460 can access audio segment buffer 482 for a related segment 150 played during capture of new audio and can remove from the new audio the sound played during the new audio capture.
[00200] Segment Player 470 performs the following functions:
[00201] 1. Accepts pause, rewind, skip, and continue commands from the user interface 430 via New Segment Control unit 450 and pause-playback and resume-playback commands from the New Segment Control unit 450.
[00202] 2. In accordance with those commands, transmits data from the audio segment buffer 482 and video segment buffer 484 (if any) to the audio and video playback devices 130, 240.
[00203] 3. Obtains discussion status 272 from the Central computing system 260, tracks the activity of the participant using this client 210, and keeps status information for user interface 430 (e.g. for features 610) and New Segment Control unit 450. For example, in some embodiments, Segment Player 470:
[00204] - a. Provides the index of the currently playing segment 150 to the New Segment Control Unit 450. The index can be used as the related segment index (note field 550 in Figure 6).
[00205] - b. Provides the identity of the participant that created the currently playing segment 140 to the user interface 430 via New Segment Control unit 450 for display (see feature 610b in Figure 3).
[00206] - c. Obtains the playback latency for display as feature 610c and for use by the New
Segment Control unit 450 in determining compliance with limit 630b.
[00207] - d. Provides the total discussion time contributed by this participant and by all participants, so far, for display and use by the New Segment Control unit 450.
[00208] - e. Provides the total number of participants for display as feature 610a.
[00209] - f. Provides estimated time until the current contribution will be heard (feature 610f). In some embodiments, New Segment Control unit 450 computes this estimate from other participants' status data 272 (obtained from Segment Player 470) and configuration data 268. Such data may be provided by the Serialized Segment Server 320 and passed to unit 450 via Segment Player 470.
[00210] With respect to the playback latency and the feature 610f, the total time of not yet serialized segments 140 of a given priority can be obtained by the Segment Player 470 from the Serialized Segment Server 320. Other information for display can be obtained in one of these ways:
[00211] 1. Computed by the New Segment Control Unit 450 which has knowledge of discussion rules such as limits 630.
[00212] 2. Computed by the Segment Player 470 which has information about what has been played.
[00213] 3. Obtained from the Serialized Segment Server 320 which has most of the information except the playback latency. In particular the Serialized Segment Server 320 can provide:
[00214] - Information on all serialized segments 150 including their lengths in units of time.
[00215] - The number of participants whose segments are being currently created assuming that the Segment Serializer 330 registers segments with Serialized Segment Server 320 before they are completed or serialized.
[00216] - If the Segment Serializer 330 also passes segment priorities 640g to Serialized Segment Server 320, then the priority 640g for any segment 140 the participant is currently creating can also be provided.
[00217] Information from the Serialized Segment Server 320 and the Segment Player 470 is passed from the Segment Player 470 to the New Segment Control Unit 450 as the participant status (included in 640a). Therefore, the New Segment Control Unit 450 can provide to user interface 430 all of the above information for display or alerts.
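One possible way to estimate feature 610f from the quantities described above is sketched below; the simple queueing model (unplayed serialized audio, plus not-yet-serialized audio expected to be ordered ahead of the contribution, plus inserted pauses) is an assumption.

```python
# Hypothetical estimate of the time until the current contribution will be heard.
def estimate_time_until_heard(unplayed_serialized_seconds: float,
                              unserialized_ahead_seconds: float,
                              segments_ahead: int,
                              inter_segment_pause: float) -> float:
    """Rough number of seconds before other participants reach this contribution."""
    return (unplayed_serialized_seconds
            + unserialized_ahead_seconds
            + segments_ahead * inter_segment_pause)
```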
[00218] The information 610i to determine when the participant's contribution might be cut can be computed using discussion status 272 that the Segment Player 470 uses to create participant status 640a. In the mode in which the Serialized Segment Server 320 stores not only serialized segments but also segments not yet serialized, it can provide the needed information to the Segment Player 470: the starting time 530 and duration 534 of the participant's most recent segment 140 (duration 534 can be provided when it becomes defined in segmentation 1020). If that segment does not include the current time, this is assumed to mean that the participant is not speaking and thus is not close to exceeding the limit 630c. Otherwise the Segment Player 470 can pass the start time 530 to the New Segment Control Unit 450, which can compute the current length of the new segment being created and how much remains before the limit 630c is reached. A cut-off might also be enforced due to the participant falling behind in playback (limit 630b), and the New Segment Control unit 450 can also estimate how soon that condition might occur from the participant status 640a and can cause the user interface 430 to alert the participant, for example by showing yellow, then red, when the user is close to falling too far behind in playback.
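The remaining-time and warning computations described above might look like the following sketch; the yellow/red thresholds and the exact interpretation of limits 630b and 630c are assumptions.

```python
# Hypothetical cut-off and fall-behind warnings for user interface 430.
def seconds_before_length_cutoff(segment_start_time: float,
                                 now: float,
                                 max_segment_seconds: float) -> float:
    """Time left before the segment being created reaches the length limit (630c-style)."""
    return max(0.0, max_segment_seconds - (now - segment_start_time))

def backlog_warning(unplayed_backlog_seconds: float,
                    max_backlog_seconds: float) -> str:
    """UI hint as the participant approaches the fall-behind limit (630b-style)."""
    if max_backlog_seconds <= 0:
        return "ok"
    ratio = unplayed_backlog_seconds / max_backlog_seconds
    if ratio >= 1.0:
        return "red"      # limit reached: new contributions may be cut off
    if ratio >= 0.75:
        return "yellow"   # approaching the limit
    return "ok"
```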
[00219] In some embodiments, there are one or more Discussion Clients 210 that are not trusted to apply limits 630 or other limitations, or to honestly provide information on playback latency (so as to affect the limit 630b) or other information. In those cases, the central computing system 260 would need to enforce the limitations itself, e.g. by authenticating the Discussion Clients or the conferencing software used by the Discussion Clients. However, if a Discussion Client 210 is deceptive, the central computing system 260 could never know for sure what segments have been played. Ideally, all Discussion Clients would have some credential that the server could authenticate.
[00220] Anything described herein as a "unit" can be implemented in hardware with or without the use of software (hardware can include a software-programmed computer). In some cases, hardware implementations are more expensive and less practical.
[00221] All of the systems, sub-systems, and units described herein can be parts of a system that serves many concurrent discussions. This may require suitable scaling, distribution of data and processing, and routing of requests and responses according to the discussions involved.
[00222] The invention is not limited to the embodiments described above. For example, any or all configuration data can be non-configurable.
[00223] Some embodiments provide a teleconferencing method comprising performing, by a teleconferencing system, operations including, during a teleconference conducted by participants located at two or more locations interconnected by a telecommunications network, obtaining segments of audio data representing audio signals generated by the participants. For example, the segments can be segments 140 obtained by serializer 1030. Each segment contains audio data from a respective one of the locations. The audio data in each segment are associated with a time at which the audio signals are assumed to have been generated. For example, the "time" can be starting time 530, or the end time, or some other time associated with the segment's audio (e.g. the time of the middle of the audio). The time may also be a time interval, i.e. may include more than one time point. The time can be absolute or relative, for example.
[00224] The method further comprises serializing the segments received from two or more of the locations. Serialization can be performed on segments from all the locations or just some of the locations. For example, if the serialization is performed at each location, the
serialization may omit the segments generated at the location.
[00225] Serialization establishes an order of the audio data (e.g. the order in sequence 150). The order is established based on one or more predefined rules for establishing the order, the serializing being performed even if audio signals of two or more of the segments overlap in time, the serializing thus allowing the audio signals overlapping in time to be reproduced (played) from the audio data separately rather than mixed.
[00226] The method further comprises processing the segments taking the order into account. For example, processing may include playing all or some of the segments, or sending all or some of the segments to one or more locations (e.g. by a central computer).
[00227] In some embodiments, the teleconference is conducted by three or more participants.
[00228] In some embodiments, the order takes into account one or more of:
[00229] time when the audio signals were generated;
[00230] information on a priority and/or privilege of at least one segment's participant;
[00231] the segments' lengths.
[00232] In some embodiments, processing the segments comprises sending the segments' audio data and information on the segments' order over the telecommunications network to at least two of the locations.
[00233] Some embodiments provide a teleconferencing method comprising executing teleconferencing operations by a teleconferencing system located at a first location which is one of two or more locations interconnected by a telecommunications network. The teleconferencing operations are executed during a teleconference conducted by participants located at the two or more locations, the teleconferencing operations comprising:
[00234] (1) obtaining, by the teleconferencing system, audio data representing audio signals generated at one or more other locations, the audio data representing the audio signals from each location separately without mixing the audio signals from different locations regardless of whether or not the audio signals at different locations were generated simultaneously;
[00235] wherein the teleconferencing system is operable to play the audio data;
[00236] (2) wherein the method further comprises recording audio signals generated at the first location to generate audio data representing the audio signals, and sending such audio data over the telecommunications network for use at the other locations.
[00237] Some embodiments provide a teleconferencing method in which, regardless of whether or not the audio signals or audio data are generated simultaneously at two or more locations, the audio data represent the audio signals from each location separately, without mixing the audio signals from different locations.
[00238] With respect to mixing of audio signals, if a participant speaks while related audio (e.g. a related segment) is played at his location, the participant's speech and the related audio can become mixed before being captured and converted to audio data (audio data is typically digital data). The audio data represent the audio captured at the participant's location, even though this audio includes the related audio originally captured at a different location and now re-captured at the participant's location. However, once audio is captured at any location and converted to audio data, the sound represented by audio data from different locations is not mixed in the audio data representation.
[00239] Other embodiments and variations are within the scope of the invention, as defined by the appended claims.
Claims
1. A teleconferencing method comprising performing, by a teleconferencing system, operations of:
during a teleconference conducted by participants located at two or more locations interconnected by a telecommunications network, obtaining segments of audio data representing audio signals generated by the participants, each segment containing audio data from a respective one of the locations, the audio data in each segment being associated with a time at which the audio signals are assumed to have been generated;
serializing the segments received from two or more of the locations to establish an order of the audio data, the order being established based on one or more predefined rules for establishing the order, the serializing being performed even if audio signals of two or more of the segments overlap in time, the serializing thus allowing the audio signals overlapping in time to be reproduced from the audio data separately rather than mixed; and
processing the segments taking the order into account.
2. The method of claim 1 wherein processing the segments comprises sending the segments' audio data over the telecommunications network to the locations.
3. The method of claim 2 further comprising sending to a location, for each segment sent to the location, information regarding an identity of the participant who contributed to the audio data currently being played by the teleconferencing system.
4. The method of claim 2 further comprising sending, to the locations, the total number of participants and/or locations.
5. The method of claim 2 further comprising sending, to the locations, the total time contributed by all participants.
6. The method of claim 1 wherein processing the segments comprises sending the segments' audio data over the telecommunications network to at least two of the locations, wherein each location is being sent at least the segments not generated at the location, wherein the segments sent to both of the locations are sent in the same order.
7. The method of claim 6 wherein the teleconference is conducted by three or more participants.
8. The method of claim 6 wherein the order takes into account one or more of: time when the audio signals were generated;
information on a priority and/or privilege of at least one segment's participant;
the segments' lengths.
9. The method of claim 6 wherein the order takes into account that segments obtained from one of the locations are to be alternated in the serialization with segments obtained from one or more other locations.
10. The method of claim 6 wherein at least one of the segments is serialized while the segment is still incomplete.
11. The method of claim 6 wherein processing the segments comprises sending the segments' audio data over the telecommunications network to the locations, wherein each location is not being sent the segments generated at the location.
12. The method of claim 6 further comprising performing, by the teleconferencing system, operations of:
obtaining configuration data specifying, for at least one location, information on at least one of:
a privilege of the location's participant;
a limit on the length of each of the location's segments;
a limit on the total proportion of the time taken by audio data allowed to come from the location;
a limit on how much of audio data can be unheard by the location when the location receives an audio signal for the teleconference;
a length of pauses to be inserted at the location between the playing of consecutive segments.
13. The method of claim 12 wherein the configuration data specify, for at least one location, information on the privileges of the location's participant, the information on the
privileges comprising information on sequencing of the location's participant's segments with segments of other participants.
14. The method of claim 12 wherein the configuration data specify, for at least one location, an indication that the participant's segments are to alternate with segments of other participants in said order.
15. The method of claim 6 wherein the locations include a first location and a second location;
wherein the audio data from the first location comprise first audio data associated with first video data from the first location;
wherein the audio data from the second location comprise second audio data which represent audio signals generated in association with the first audio data being played at the second location;
wherein the method further comprises providing the second audio data for being played at one or more locations other than the second location, and providing the first video data for being displayed while playing the second audio data.
16. The method of claim 6 wherein at least the end or the beginning of at least one of the segments of audio data representing audio signals generated by at least one of the participants ("first participant" below) is determined using a minimum- length pause in the audio signals generated by the first participant.
17. The method of claim 6 wherein at least the end or the beginning of at least one of the segments of audio data representing audio signals generated by at least one of the participants ("first participant" below) is determined using a maximum length for segments of the audio signals generated by the first participant.
18. The method of claim 6 wherein at least the end or the beginning of each of a plurality of segments of audio data representing audio signals generated by different participants is determined using maximum lengths for segments of the audio signals generated by said participants, wherein each maximum length depends on the participant.
19. The method of claim 6 wherein processing the segments comprises inserting pauses of a predetermined length between the segments.
20. The method of claim 6 wherein at least one segment ("first segment") represents audio signals generated at one location while playing, at said location, audio data from another location; and
the method further comprises removing the played data from the first segment.
21. The method of claim 6 wherein in obtaining the segments, the audio data for the segments from each location are limited by a proportion of time for audio signals generated at the location relative to the audio signals generated at all the locations.
22. The method of claim 6 wherein in obtaining the segments, the audio data for the segments from each location are limited based on amount of audio data not yet played at the location.
23. A teleconferencing method comprising executing teleconferencing operations by a teleconferencing system located at a first location which is one of two or more locations interconnected by a telecommunications network, the teleconferencing operations being executed during a teleconference conducted by participants located at the two or more locations, the teleconferencing operations comprising:
(1) obtaining, by the teleconferencing system, a sequence of segments of audio data representing audio signals generated at one or more other locations, each segment containing audio data from a respective one of the locations, the audio data representing the audio signals from each location separately without mixing the audio signals from different locations regardless of whether or not the audio signals at different locations were generated
simultaneously;
wherein the teleconferencing system is operable to play the audio data;
(2) wherein the method further comprises recording audio signals generated at the first location to generate audio data representing the audio signals, and sending such audio data over the telecommunications network for use at the other locations.
24. The method of claim 23 wherein the teleconferencing operations are executed during the teleconference conducted by participants located at three or more locations.
25. The method of claim 23 further comprising providing, via user interface of the teleconferencing system, information regarding an identity of the participant who contributed to the audio data currently being played by the teleconferencing system.
26. The method of claim 23 further comprising providing, via user interface of the teleconferencing system, the total number of participants and/or locations.
27. The method of claim 23 further comprising providing, via user interface of the teleconferencing system, the total time contributed by all participants.
28. The method of claim 23 further comprising providing, via user interface of the teleconferencing system, the time contributed at the first location.
29. The method of claim 23 further comprising providing, via user interface of the teleconferencing system, information regarding an amount of time required to hear all of the audio signals that have been recorded in the teleconference but have not been played.
30. The method of claim 23 further comprising:
obtaining, by the teleconferencing system, information on a limit condition imposed on recording of audio signals in operation (2);
(3) detecting an audio signal;
performing or not performing the operation (2) on the audio signal detected in (3) depending on whether or not performing the operation (2) on the audio signal in (3) would violate the limit condition.
31. The method of claim 30 further comprising providing, via user interface of the teleconferencing system, an indication that the operation (2) is not performed on the audio signal in (3) if the operation (2) is not performed on the audio signal in (3) due to violation of the limit condition.
32. The method of claim 30 further comprising providing, via user interface of the teleconferencing system, an additional amount of time allowed by the limit condition for the audio signal (3) if the audio signal is to be processed by operation (2).
33. The method of claim 30 wherein the limit condition includes a limit on a proportion of time for audio signals processed in operation (2).
34. The method of claim 30 wherein the limit condition includes a limit on an amount of audio data not yet played by the teleconferencing system at the time of (3).
35. The method of claim 23 further comprising receiving, by user interface of the teleconferencing system, a command to:
- pause playing of the audio data;
- rewind playing of the audio data;
- skip ahead after a rewind to a beginning of the audio data not yet played;
- skip ahead to an end of at least a portion of audio data recorded by the
teleconferencing system;
- continue playing if the playing has been paused.
36. The method of claim 23 further comprising:
playing first audio data by the teleconferencing system, wherein the first audio data is obtained from another location and is associated with second audio data played at the other location at a time related to the time that the first audio data was produced at the other location, wherein the second audio data is associated with video data;
when playing the first audio data, the teleconferencing system discovering an association between the first audio data and the second audio data and between the second audio data and the video, and in response to this discovering, the teleconferencing system displaying the video data.
37. The method of claim 23 further comprising:
the recording comprises obtaining first audio data representing audio signals generated at the first location, and the method comprises determining that the first audio data is associated with second audio data played by the teleconferencing system at a time associated with the time of obtaining the first audio data; and
operation (2) further comprises sending, over the telecommunications network, information on the second audio data as associated with the first audio data.
38. The method of claim 23 further comprising inserting pauses of a predetermined length between the segments when playing the segments' audio data.
39. The method of claim 23 wherein the recording is performed while playing audio data so that the data generated at the first location is affected by played audio data; and the method further comprises removing the played data from the generated data.
40. The method of claim 23 wherein the teleconferencing system is operable to play the audio data generated at the first location.
41. The method of claim 23 wherein the teleconferencing system pauses the playing of the audio data when the teleconferencing system detects an audio signal not generated by the teleconferencing system.
42. The method of claim 23 wherein the teleconferencing system comprises user interface allowing a participant at the first location to indicate creation of a new segment of audio data to be sent over the telecommunications network for use at the other locations.
43. The method of claim 23 wherein the audio data generated at each location are subdivided into segments and segments from different locations are ordered in a common sequence determined by a central computer system, and at least the end or the beginning of at least one of the segments of audio data representing audio signals generated by at least one of the participants is determined using a minimum-length pause in the audio signals generated by said at least one of the participants.
44. The method of claim 23 wherein the audio data generated at each location are subdivided into segments and segments from different locations are ordered in a common sequence determined by a central computer system, and at least the end or the beginning of at least one of the segments of audio data representing audio signals generated by at least one of the participants ("first participant" below) is determined using a maximum length for segments of the audio signals generated by the first participant.
45. The method of claim 23 wherein the audio data generated at each location are subdivided into segments and segments from different locations are ordered in a common
sequence, and at least the end or the beginning of each of a plurality of segments of audio data representing audio signals generated by different participants is determined using maximum lengths for segments of the audio signals generated by said participants, wherein each maximum length depends on the participant.
46. A teleconferencing method comprising:
during a teleconference conducted by participants located at two or more locations interconnected by a telecommunications network, performing operations of:
at each location, generating audio data representing audio signals generated at the location;
sending the audio data generated at each location to every other location; and receiving, at each location, audio data generated at every other location, and playing the received audio data at the location at which the audio data is received;
wherein regardless of whether or not the audio signals or audio data are generated simultaneously at two or more locations, the audio data represent the audio signals from each location separately without mixing the audio signals from different locations regardless of whether or not the audio signals at different locations were generated simultaneously.
47. A data processing apparatus operable to communicate over a
telecommunications network and perform the method of any one of claims 1 to 46.
48. Software embedded in a non-transitory computer-readable medium and operable to cause a computer to perform the method of any one of claims 1 to 46.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201261721032P | 2012-11-01 | 2012-11-01 | |
| US61/721,032 | 2012-11-01 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014071152A1 (en) | 2014-05-08 |
Family
ID=50628064
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2013/067877 Ceased WO2014071076A1 (en) | 2012-11-01 | 2013-10-31 | Conferencing for participants at different locations |
| PCT/US2013/068000 Ceased WO2014071152A1 (en) | 2012-11-01 | 2013-11-01 | Teleconferencing for participants at different locations |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2013/067877 Ceased WO2014071076A1 (en) | 2012-11-01 | 2013-10-31 | Conferencing for participants at different locations |
Country Status (1)
| Country | Link |
|---|---|
| WO (2) | WO2014071076A1 (en) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8180029B2 (en) * | 2007-06-28 | 2012-05-15 | Voxer Ip Llc | Telecommunication and multimedia management method and apparatus |
- 2013-10-31: WO PCT/US2013/067877 patent/WO2014071076A1/en not_active Ceased
- 2013-11-01: WO PCT/US2013/068000 patent/WO2014071152A1/en not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5765164A (en) * | 1995-12-21 | 1998-06-09 | Intel Corporation | Apparatus and method for management of discontinuous segments of multiple audio, video, and data streams |
| US20020097841A1 (en) * | 2001-01-24 | 2002-07-25 | Weissman Terry R. | Method and apparatus for serializing an asynchronous communication |
| US20090240818A1 (en) * | 2008-03-18 | 2009-09-24 | Nortel Networks Limited | Method and Apparatus for Reconstructing a Communication Session |
| US20110102539A1 (en) * | 2009-11-03 | 2011-05-05 | Bran Ferren | Video Teleconference Systems and Methods for Providing Virtual Round Table Meetings |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112463104A (en) * | 2017-11-02 | 2021-03-09 | 谷歌有限责任公司 | Automatic assistant with conference function |
| CN112463104B (en) * | 2017-11-02 | 2024-05-14 | 谷歌有限责任公司 | Automatic assistant with conference function |
| US12500856B2 (en) | 2022-09-14 | 2025-12-16 | Google Llc | Automated assistants with conference capabilities |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2014071076A1 (en) | 2014-05-08 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13850070; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 13850070; Country of ref document: EP; Kind code of ref document: A1 |