EP1045586A2 - Image processing apparatus - Google Patents
- Publication number
- EP1045586A2 (application EP00302422A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- speaker
- looking
- image data
- person
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Definitions
- the present invention relates to the archiving of image data.
- an apparatus or method in which processing is performed to archive image data from a plurality of cameras which shows people talking.
- the person speaking and the person (or object) at whom he is looking are determined, and a subset of the image data is selected to be archived in dependence thereon.
- the present invention also provides an apparatus or method for selecting image data from among image data recorded by a plurality of cameras which shows people talking, in which the position in three dimensions of at least the head of the person who is speaking and the person (or object) at whom he is looking are determined by processing at least some of the image data, and the selection of image data is made based on the determined positions and the views of the cameras.
- the present invention further provides instructions, including in signal and recorded form, for configuring a programmable processing apparatus to become arranged as an apparatus, or to become operable to perform a method, in such a system.
- a plurality of video cameras (three in the example shown in Figure 1, although this number may be different) 2-1, 2-2, 2-3 and a microphone array 4 are used to record image data and sound data respectively from a meeting taking place between a group of people 6, 8, 10, 12.
- the microphone array 4 comprises an array of microphones arranged such that the direction of any incoming sound can be determined, for example as described in GB-A-2140558, US 4333170 and US 3392392.
- the image data from the video cameras 2-1, 2-2, 2-3 and the sound data from the microphone array 4 are input via cables (not shown) to a computer 20 which processes the received data and stores data in a database to create an archive record of the meeting from which information can be subsequently retrieved.
- Computer 20 comprises a conventional personal computer having a processing apparatus 24 containing, in a conventional manner, one or more processors, memory, sound card etc., together with a display device 26 and user input devices, which, in this embodiment, comprise a keyboard 28 and a mouse 30.
- the processing apparatus 24 is programmed to operate in accordance with programming instructions input, for example, as data stored on a data storage medium, such as disk 32, and/or as a signal 34 input to the processing apparatus 24, for example from a remote database, by transmission over a communication network (not shown) such as the Internet or by transmission through the atmosphere, and/or entered by a user via a user input device such as keyboard 28 or other input device.
- when programmed by the programming instructions, processing apparatus 24 effectively becomes configured into a number of functional units for performing processing operations. Examples of such functional units and their interconnections are shown in Figure 2. The illustrated units and interconnections in Figure 2 are, however, notional and are shown for illustration purposes only, to assist understanding; they do not necessarily represent the exact units and connections into which the processor, memory etc of the processing apparatus 24 become configured.
- a central controller 36 processes inputs from the user input devices 28, 30 and receives data input to the processing apparatus 24 by a user as data stored on a storage device, such as disk 38, or as a signal 40 transmitted to the processing apparatus 24.
- the central controller 36 also provides control and processing for a number of the other functional units.
- Memory 42 is provided for use by central controller 36 and other functional units.
- Head tracker 50 processes the image data received from video cameras 2-1, 2-2, 2-3 to track the position and orientation in three dimensions of the head of each of the participants 6, 8, 10, 12 in the meeting.
- head tracker 50 uses data defining a three-dimensional computer model of the head of each of the participants and data defining features thereof, which is stored in head model store 52, as will be described below.
- Direction processor 53 processes sound data from the microphone array 4 to determine the direction or directions from which the sound recorded by the microphones was received. Such processing is performed in a conventional manner, for example as described in GB-A-2140558, US 4333170 and US 3392392.
- Voice recognition processor 54 processes sound data received from microphone array 4 to generate text data therefrom. More particularly, voice recognition processor 54 operates in accordance with a conventional voice recognition program, such as "Dragon Dictate” or IBM “ViaVoice", to generate text data corresponding to the words spoken by the participants 6, 8, 10, 12. To perform the voice recognition processing, voice recognition processor 54 uses data defining the speech recognition parameters for each participant 6, 8, 10, 12, which is stored in speech recognition parameter store 56. More particularly, the data stored in speech recognition parameter store 56 comprises data defining the voice profile of each participant which is generated by training the voice recognition processor in a conventional manner. For example, the data comprises the data stored in the "user files" of Dragon Dictate after training.
- Archive processor 58 generates data for storage in meeting archive database 60 using data received from head tracker 50, direction processor 53 and voice recognition processor 54. More particularly, as will be described below, video data from cameras 2-1, 2-2 and 2-3 and sound data from microphone array 4 is stored in meeting archive database 60 together with text data from voice recognition processor 54 and data defining at whom each participant in the meeting was looking at a given time.
- Text searcher 62, in conjunction with central controller 36, is used to search the meeting archive database 60 to find and replay the sound and video data for one or more parts of the meeting which meet search criteria specified by a user, as will be described in further detail below.
- Display processor 64, under control of central controller 36, displays information to a user via display device 26 and also replays sound and video data stored in meeting archive database 60.
- Output processor 66 outputs part or all of the data from archive database 60, for example on a storage device such as disk 68 or as a signal 70.
- FIG. 3 shows the processing operations performed by processing apparatus 24 during this initialisation.
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to input the names of each person who will participate in the meeting.
- upon receipt of data defining the names, for example input by the user using keyboard 28, central controller 36 allocates a unique identification number to each participant, and stores data, for example table 80 shown in Figure 4, defining the relationship between the identification numbers and the participants' names in the meeting archive database 60.
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to input the name of each object at which a person may look for a significant amount of time during the meeting, and for which it is desired to store archive data in the meeting archive database 60.
- objects may include, for example, a flip chart, such as the flip chart 14 shown in Figure 1, a whiteboard or blackboard, or a television, etc.
- central controller 36 allocates a unique identification number to each object, and stores data, for example as in table 80 shown in Figure 4, defining the relationship between the identification numbers and the names of the objects in the meeting archive database 60.
- central controller 36 searches the head model store 52 to determine whether data defining a head model is already stored for each participant in the meeting.
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to input data defining a head model of each participant for whom a model is not already stored.
- the user enters data, for example on a storage medium such as disk 38 or by downloading the data as a signal 40 from a connected processing apparatus, defining the required head models.
- head models may be generated in a conventional manner, for example as described in "An Analysis/Synthesis Cooperation for Head Tracking and Video Face Cloning" by Valente et al in Proceedings ECCV '98 Workshop on Perception of Human Action, University of Freiberg, Germany, June 6 1998.
- central controller 36 stores the data input by the user in head model store 52.
- central controller 36 and display processor 64 render each three-dimensional computer head model input by the user to display the model to the user on display device 26, together with a message requesting the user to identify at least seven features in each model.
- the user designates using mouse 30 points in each model which correspond to prominent features on the front, sides and, if possible, the back, of the participant's head, such as the corners of eyes, nostrils, mouth, ears or features on glasses worn by the participant, etc.
- at step S14, data defining the features identified by the user is stored by central controller 36 in head model store 52.
- if a head model is already stored for each participant, steps S8 to S14 are omitted.
- central controller 36 searches speech recognition parameter store 56 to determine whether speech recognition parameters are already stored for each participant.
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to input the speech recognition parameters for each participant for whom the parameters are not already stored.
- the user enters data, for example on a storage medium such as disk 38 or as a signal 40 from a remote processing apparatus, defining the necessary speech recognition parameters.
- these parameters define a profile of the user's speech and are generated by training a voice recognition processor in a conventional manner.
- in the case of a voice recognition processor comprising Dragon Dictate, the speech recognition parameters input by the user correspond to the parameters stored in the "user files" of Dragon Dictate.
- at step S20, data defining the speech recognition parameters input by the user is stored by central controller 36 in the speech recognition parameter store 56.
- if speech recognition parameters are already stored for each participant, steps S18 and S20 are omitted.
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to perform steps to enable the cameras 2-1, 2-2 and 2-3 to be calibrated.
- the user carries out the necessary steps and, at step S24, central controller 36 performs processing to calibrate the cameras 2-1, 2-2 and 2-3. More particularly, in this embodiment, the steps performed by the user and the processing performed by central controller 36 are carried out in a manner such as that described in "Calibrating and 3D Modelling with a Multi-Camera System” by Wiles and Davison in 1999 IEEE Workshop on Multi-View Modelling and Analysis of Visual Scenes, ISBN 0769501109. This generates calibration data defining the position and orientation of each camera 2-1, 2-2 and 2-3 with respect to the meeting room and also the intrinsic parameters of each camera (aspect ratio, focal length, principal point, and first order radial distortion coefficient).
- the camera calibration data is stored, for example in memory 42.
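As an illustration only, a single-camera calibration along these lines can be sketched with OpenCV; this is not the multi-camera procedure of Wiles and Davison cited above, and the function and parameter names are assumptions rather than anything defined in the embodiment.

```python
# Minimal sketch, assuming OpenCV and known 3D calibration points together with
# their detected 2D projections; the cited multi-camera method is not reproduced here.
import cv2

def calibrate_single_camera(object_points, image_points, image_size):
    """object_points: list of (N, 3) float32 arrays of known 3D points
    image_points:  list of (N, 2) float32 arrays of their 2D projections
    image_size:    (width, height) of the camera images"""
    rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    return {
        "reprojection_error": rms,
        "intrinsics": camera_matrix,      # focal lengths and principal point
        "distortion": dist_coeffs,        # includes the first-order radial term
        "rotations": rvecs,               # camera orientation per calibration view
        "translations": tvecs,            # camera position per calibration view
    }
```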
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the user to perform steps to enable the position and orientation of each of the objects for which identification data was stored at step S4 to be determined.
- central controller 36 performs processing to determine the position and orientation of each object. More particularly, in this embodiment, the user places coloured markers at points on the perimeter of the surface(s) of the object at which the participants in the meeting may look, for example the plane of the sheets of paper of flip chart 14. Image data recorded by each of cameras 2-1, 2-2 and 2-3 is then processed by central controller 36 using the camera calibration data stored at step S24 to determine, in a conventional manner, the position in three-dimensions of each of the coloured markers.
- This processing is performed for each camera 2-1, 2-2 and 2-3 to give separate estimates of the position of each coloured marker, and an average is then determined for the position of each marker from the positions calculated using data from each camera 2-1, 2-2 and 2-3.
- using the average position of each marker, central controller 36 calculates in a conventional manner the centre of the object surface and a surface normal which defines the orientation of the object surface.
- the determined position and orientation for each object is stored as object calibration data, for example in memory 42.
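A minimal sketch of this object calibration step is given below, assuming the per-camera 3D marker estimates are already available; taking the plane normal from the direction of least spread of the averaged markers is one conventional way of doing it, not necessarily the one used in the embodiment.

```python
import numpy as np

def object_surface_from_markers(marker_positions_per_camera):
    """marker_positions_per_camera: list of (M, 3) arrays, one per camera, giving
    that camera's estimate of the 3D position of each coloured marker (the markers
    appear in the same order in every array)."""
    markers = np.mean(np.stack(marker_positions_per_camera), axis=0)  # average the estimates

    centre = markers.mean(axis=0)                  # centre of the object surface
    _, _, vt = np.linalg.svd(markers - centre)     # principal directions of the markers
    normal = vt[-1]                                # direction of least spread = plane normal
    return centre, normal / np.linalg.norm(normal)
```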
- central controller 36 causes display processor 64 to display a message on display device 26 requesting the next participant in the meeting (this being the first participant the first time step S27 is performed) to sit down.
- processing apparatus 24 waits for a predetermined period of time to give the requested participant time to sit down, and then, at step S30, central controller 36 processes the respective image data from each camera 2-1, 2-2 and 2-3 to determine an estimate of the position of the seated participant's head for each camera. More particularly, in this embodiment, central controller 36 carries out processing separately for each camera in a conventional manner to identify each portion in a frame of image data from the camera which has a colour corresponding to the colour of the skin of the participant (this colour being determined from the data defining the head model of the participant stored in head model store 52), and then selects the portion which corresponds to the highest position in the meeting room (since it is assumed that the head will be the highest skin-coloured part of the body).
- using the position of the identified portion in the image and the camera calibration parameters determined at step S24, central controller 36 then determines an estimate of the three-dimensional position of the head in a conventional manner. This processing is performed for each camera 2-1, 2-2 and 2-3 to give a separate head position estimate for each camera.
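A crude per-pixel sketch of the head-position estimate for one camera is shown below; it assumes a reference skin colour taken from the stored head model and a hand-picked tolerance, and it picks the highest skin-coloured pixel rather than the connected skin-coloured region described above. The resulting 2D position would still need to be combined with the camera calibration data to obtain the 3D estimate.

```python
import numpy as np

def estimate_head_pixel(frame_rgb, skin_rgb, tolerance=30):
    """frame_rgb: (H, W, 3) uint8 frame from one camera
    skin_rgb:  assumed reference skin colour from the participant's head model
    tolerance: assumed per-channel colour tolerance"""
    diff = np.abs(frame_rgb.astype(int) - np.asarray(skin_rgb, dtype=int))
    mask = np.all(diff < tolerance, axis=-1)       # skin-coloured pixels
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    top = rows.argmin()                            # smallest row = highest point in the image
    return int(rows[top]), int(cols[top])
```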
- central controller 36 determines an estimate of the orientation of the participant's head in three dimensions for each camera 2-1, 2-2 and 2-3. More particularly, in this embodiment, central controller 36 renders the three-dimensional computer model of the participant's head stored in head model store 52 for a plurality of different orientations of the model to produce a respective two-dimensional image of the model for each orientation.
- the computer model of the participant's head is rendered in 108 different orientations to produce 108 respective two-dimensional images, the orientations corresponding to 36 rotations of the head model in 10° steps for each of three head inclinations corresponding to 0° (looking straight ahead), +45° (looking up) and -45° (looking down).
- Each two-dimensional image of the model is then compared by central processor 36 with the part of the video frame from a camera 2-1, 2-2, 2-3 which shows the participant's head, and the orientation for which the image of the model best matches the video image data is selected, this comparison and selection being performed for each camera to give a head orientation estimate for each camera.
- a conventional technique is used, for example as described in "Head Tracking Using a Textured Polygonal Model" by Schödl, Haro & Essa in Proceedings 1998 Workshop on Perceptual User Interfaces.
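The orientation search can be sketched as follows, assuming the 108 renderings (36 yaw steps of 10° at inclinations of -45°, 0° and +45°) have already been produced as image patches the same size as the observed head region; normalised cross-correlation is used here as the matching score, which is an assumption rather than the comparison defined in the cited technique.

```python
import numpy as np

def estimate_head_orientation(rendered_templates, observed_patch):
    """rendered_templates: dict mapping (yaw_deg, pitch_deg) -> 2D grayscale array,
    one entry per rendered orientation, all the same shape as observed_patch.
    Returns the (yaw, pitch) whose rendering best matches the observed patch."""
    def normalise(img):
        img = img.astype(float)
        return (img - img.mean()) / (img.std() + 1e-9)

    target = normalise(observed_patch)
    scores = {pose: float((normalise(template) * target).mean())   # cross-correlation score
              for pose, template in rendered_templates.items()}
    return max(scores, key=scores.get)
```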
- at step S34, the respective estimates of the position of the participant's head generated at step S30 and the respective estimates of the orientation of the participant's head generated at step S32 are input to head tracker 50 and frames of image data received from each of cameras 2-1, 2-2 and 2-3 are processed to track the head of the participant. More particularly, in this embodiment, head tracker 50 performs processing to track the head in a conventional manner, for example as described in "An Analysis/Synthesis Cooperation for Head Tracking and Video Face Cloning" by Valente et al in Proceedings ECCV '98 Workshop on Perception of Human Action, University of Freiberg, Germany, June 6 1998.
- FIG. 5 summarises the processing operations performed by head tracker 50 at step S34.
- Figure 6 shows the processing operations performed at a given one of steps S42-1 to S42-n, the processing operations being the same at each step but being carried out on image data from a different camera.
- head tracker 50 reads the current estimates of the 3D position and orientation of the participant's head, these being the estimates produced at steps S30 and S32 in Figure 3 the first time step S50 is performed.
- head tracker 50 uses the camera calibration data generated at step S24 to render the three-dimensional computer model of the participant's head stored in head model store 52 in accordance with the estimates of position and orientation read at step S50.
- head tracker 50 processes the image data for the current frame of video data received from the camera to extract the image data from each area which surrounds the expected position of one of the head features identified by the user and stored at step S14, the expected positions being determined from the estimates read at step S50 and the camera calibration data generated at step S24.
- head tracker 50 uses the camera image data identified at step S56 which best matches the rendered head model together with the camera calibration data stored at step S24 ( Figure 3) to determine the 3D position and orientation of the participant's head for the current frame of video data.
- head tracker 50 uses the camera image data identified at each of steps S42-1 to S42-n which best matches the rendered head model (identified at step S58 in Figure 6) to determine an average 3D position and orientation of the participant's head for the current frame of video data.
- at step S46, the positions of the head features in the camera image data determined at each of steps S42-1 to S42-n (identified at step S58 in Figure 6) are input into a conventional Kalman filter to generate an estimate of the 3D position and orientation of the participant's head for the next frame of video data.
- Steps S42 to S46 are performed repeatedly for the participant as frames of video data are received from video camera 2-1, 2-2 and 2-3.
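The Kalman filtering step is not specified in detail above; as a hedged illustration only, a constant-velocity filter over the 6-DOF head pose could look like the sketch below. It filters the already-computed pose rather than the raw feature positions used in the embodiment, and the frame rate and noise levels are assumed values.

```python
import numpy as np

class PoseKalmanFilter:
    """Constant-velocity Kalman filter over a 6-DOF pose (x, y, z, roll, pitch, yaw)."""

    def __init__(self, dt=1.0 / 25.0, process_noise=1e-2, measurement_noise=1e-1):
        n = 6
        self.x = np.zeros(2 * n)                          # pose and pose velocity
        self.P = np.eye(2 * n)                            # state covariance
        self.F = np.eye(2 * n)                            # constant-velocity transition
        self.F[:n, n:] = dt * np.eye(n)
        self.H = np.hstack([np.eye(n), np.zeros((n, n))]) # only the pose is observed
        self.Q = process_noise * np.eye(2 * n)
        self.R = measurement_noise * np.eye(n)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:6]                                 # predicted pose for the next frame

    def update(self, measured_pose):
        y = np.asarray(measured_pose, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
```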
- at step S36, central controller 36 determines whether there is another participant in the meeting, and steps S27 to S36 are repeated until processing has been performed for each participant in the manner described above. However, while these steps are performed for each participant, at step S34, head tracker 50 continues to track the head of each participant who has already sat down.
- central controller 36 causes an audible signal to be output from processing apparatus 24 to indicate that the meeting between the participants can begin.
- Figure 7 shows the processing operations performed by processing apparatus 24 as the meeting between the participants takes place.
- at step S70, head tracker 50 continues to track the head of each participant in the meeting.
- the processing performed by head tracker 50 at step S70 is the same as that described above with respect to step S34, and accordingly will not be described again here.
- at step S72, processing is performed to generate and store data in meeting archive database 60.
- FIG. 8 shows the processing operations performed at step S72.
- archive processor 58 generates a so-called "viewing parameter" for each participant defining at which person or which object the participant is looking.
- FIG. 9 shows the processing operations performed at step S80.
- archive processor 58 reads the current three-dimensional position of each participant's head from head tracker 50, this being the average position generated in the processing performed by head tracker 50 at step S44 ( Figure 5).
- archive processor 58 reads the current orientation of the head of the next participant (this being the first participant the first time step S112 is performed) from head tracker 50.
- the orientation read at step S112 is the average orientation generated in the processing performed by head tracker 50 at step S44 ( Figure 5).
- archive processor 58 determines the angle between a ray defining where the participant is looking (a so-called “viewing ray") and each notional line which connects the head of the participant with the centre of the head of another participant.
- Figure 10 illustrates an example of the processing performed at step S114 for one of the participants, namely participant 6 in Figure 1.
- the orientation of the participant's head read at step S112 defines a viewing ray 90 from a point between the centre of the participant's eyes which is perpendicular to the participant's head.
- the positions of all of the participants' heads read at step S110 define notional lines 92, 94, 96 from the point between the centre of the eyes of participant 6 to the centre of the heads of each of the other participants 8, 10, 12.
- archive processor 58 determines the angles 98, 100, 102 between the viewing ray 90 and each of the notional lines 92, 94, 96.
- archive processor 58 selects the angle 98, 100 or 102 which has the smallest value.
- the angle 100 would be selected.
- archive processor 58 determines whether the angle selected at step S116 has a value less than 10°.
- archive processor 58 sets the viewing parameter for the participant to the identification number (allocated at step S2 in Figure 3) of the participant connected by the notional line which makes the smallest angle with the viewing ray.
- the viewing parameter would be set to the identification number of participant 10 since angle 100 is the angle between viewing ray 90 and notional line 94 which connects participant 6 to participant 10.
- archive processor 58 reads the position of each object previously stored at step S26 ( Figure 3).
- archive processor 58 determines whether the viewing ray 90 of the participant intersects the plane of any of the objects.
- archive processor 58 sets the viewing parameter for the participant to the identification number (allocated at step S4 in Figure 3) of the object which is intersected by the viewing ray, this being the nearest intersected object to the participant if more than one object is intersected by the viewing ray 90.
- otherwise, at step S128, archive processor 58 sets the value of the viewing parameter for the participant to "0". This indicates that the participant is determined to be looking at none of the other participants (since the viewing ray 90 is not close enough to any of the notional lines 92, 94, 96) and none of the objects (since the viewing ray 90 does not intersect an object).
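The viewing-parameter logic described above can be summarised in the sketch below. The 10° threshold comes from the description above; representing each object surface by a centre, normal and bounding radius is an assumption standing in for the marker-defined perimeter stored at step S26.

```python
import numpy as np

def viewing_parameter(eye_pos, gaze_dir, other_heads, objects, threshold_deg=10.0):
    """eye_pos:     3D point between the participant's eyes
    gaze_dir:    viewing-ray direction derived from the head orientation
    other_heads: dict id -> 3D head centre of each other participant
    objects:     dict id -> (centre, unit_normal, radius) of each planar object
    Returns the id of the participant or object being looked at, or 0."""
    eye_pos = np.asarray(eye_pos, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    gaze_dir /= np.linalg.norm(gaze_dir)

    # Smallest angle between the viewing ray and the lines to the other heads.
    best_id, best_angle = 0, np.inf
    for pid, head in other_heads.items():
        line = np.asarray(head, dtype=float) - eye_pos
        cosang = np.dot(gaze_dir, line / np.linalg.norm(line))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < best_angle:
            best_id, best_angle = pid, angle
    if best_angle < threshold_deg:
        return best_id

    # Otherwise intersect the viewing ray with each object plane, nearest hit first.
    nearest_id, nearest_t = 0, np.inf
    for oid, (centre, normal, radius) in objects.items():
        centre, normal = np.asarray(centre, float), np.asarray(normal, float)
        denom = np.dot(gaze_dir, normal)
        if abs(denom) < 1e-9:
            continue                                   # ray parallel to the plane
        t = np.dot(centre - eye_pos, normal) / denom   # distance along the ray
        hit = eye_pos + t * gaze_dir
        if 0 < t < nearest_t and np.linalg.norm(hit - centre) <= radius:
            nearest_id, nearest_t = oid, t
    return nearest_id                                  # 0 if nothing is intersected
```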
- central controller 36 and voice recognition processor 54 determine whether any speech data has been received from the microphone array 4 corresponding to the current frame of video data.
- FIG. 12 shows the processing operations performed at step S84.
- direction processor 53 processes the sound data from the microphone array 4 to determine the direction or directions from which the speech is coming. This processing is performed in a conventional manner, for example as described in GB-A-2140558, US 4333170 and US 3392392.
- archive processor 58 reads the position of each participant's head determined by head tracker 50 at step S44 ( Figure 5) for the current frame of image data and determines therefrom which of the participants has a head at a position corresponding to a direction determined at step S140, that is, a direction from which the speech is coming.
- archive processor 58 selects the participant in the direction from which the speech is coming as the speaker for the current frame of image data.
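A minimal sketch of this matching step is shown below, assuming the microphone array reports a unit direction vector and that its own position is known; the angular tolerance is an assumed value. If more than one head fell within the tolerance, the candidates would be the "potential" speaking participants discussed later.

```python
import numpy as np

def select_speaker(head_positions, mic_position, speech_direction, max_angle_deg=15.0):
    """head_positions:   dict participant id -> 3D head position from the tracker
    mic_position:     3D position of the microphone array
    speech_direction: estimated direction of the incoming speech
    Returns the id of the participant closest to that direction, or None."""
    mic_position = np.asarray(mic_position, dtype=float)
    direction = np.asarray(speech_direction, dtype=float)
    direction /= np.linalg.norm(direction)

    best_id, best_angle = None, max_angle_deg
    for pid, head in head_positions.items():
        to_head = np.asarray(head, dtype=float) - mic_position
        cosang = np.dot(direction, to_head / np.linalg.norm(to_head))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < best_angle:
            best_id, best_angle = pid, angle
    return best_id
```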
- archive processor 58 determines which image data is to be stored in the meeting archive database 60, that is, the image data from which of the cameras 2-1, 2-2 and 2-3 is to be stored.
- Figure 13 shows the processing operations performed by archive processor 58 at step S89.
- archive processor 58 uses the information read at step S176 together with information defining the three-dimensional head position and orientation of the speaking participant (determined at step S44 in Figure 5) and the three-dimensional head position and orientation of the participant at whom the speaking participant is looking (determined at step S44 in Figure 5) or the three-dimensional position and orientation of the object being looked at (stored at step S26 in Figure 3) to determine whether the speaking participant and the participant or object at which the speaking participant is looking are both within the field of view of the camera currently being considered (that is, whether the camera currently being considered can see both the speaking participant and the participant or object at which the speaking participant is looking). More particularly, in this embodiment, archive processor 58 evaluates inequalities over two view-quality values Q1 and Q2 and determines that the camera can see both the speaking participant and the participant or object at which the speaking participant is looking if all of the inequalities hold, where:
- Q1 is the corresponding scalar evaluated for the head of the speaking participant, and Q2 is a scalar having a value of -1 if the back of the head of the participant being looked at (or the back of the surface of the object) is directly facing the camera, +1 if the face of the participant (or the front surface of the object) is directly facing the camera, and values therebetween for other orientations of the participant's head or object surface.
- archive processor 58 compares the "worst view” values stored for each of the cameras when processing was performed at step S184 (that is, the value of Q1 or Q2 stored for each camera at step S184) and selects the highest one of these stored values. This highest value represents the "best worst view” and accordingly, at step S188, archive processor 58 selects the camera for which this "best worst view” value was stored at step S184 as a camera from which image data should be stored in the meeting archive database, since this camera has the best view of both the speaking participant and the participant or object at which the speaking participant is looking.
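The "best worst view" selection can be sketched as below. Since the exact expressions for Q1 and Q2 are not given above, the sketch assumes each is the cosine of the angle between the subject's facing direction (or object surface normal) and the direction from the subject to the camera, which matches the -1 to +1 range described above; the field-of-view test is likewise assumed to come from the stored calibration data.

```python
import numpy as np

def select_camera(cameras, speaker, target):
    """cameras: dict camera id -> {'position': 3D point,
                                   'sees': callable(point) -> bool (field-of-view test)}
    speaker, target: {'position': 3D point, 'facing': unit face direction or surface normal}
    Returns the camera id with the best "worst view" of the speaker and the target."""
    def facing_score(subject, cam_pos):                 # assumed form of Q1 / Q2
        to_cam = np.asarray(cam_pos, float) - np.asarray(subject["position"], float)
        to_cam /= np.linalg.norm(to_cam)
        return float(np.dot(np.asarray(subject["facing"], float), to_cam))

    best_cam, best_worst = None, -np.inf
    for cam_id, cam in cameras.items():
        if not (cam["sees"](speaker["position"]) and cam["sees"](target["position"])):
            continue                                    # this camera cannot see both
        worst = min(facing_score(speaker, cam["position"]),   # Q1: view of the speaker
                    facing_score(target, cam["position"]))    # Q2: view of the target
        if worst > best_worst:
            best_cam, best_worst = cam_id, worst        # keep the best "worst view"
    return best_cam
```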
- archive processor 58 encodes the current frame of video data received from the camera or cameras selected at step S89 and the sound data received from microphone array 4 as MPEG 2 data in a conventional manner, and stores the encoded data in meeting archive database 60.
- archive processor 58 stores any text data generated by voice recognition processor 54 at step S88 for the current frame in meeting archive database 60 (indicated at 204 in Figure 15). More particularly, the text data is stored with a link to the corresponding MPEG 2 data, this link being represented in Figure 15 by the text data being stored in the same vertical column as the MPEG 2 data.
- archive processor 58 stores the viewing parameter value generated for the current frame for each participant at step S80 in the meeting archive database 60 (indicated at 212 in Figure 15).
- a viewing parameter value is stored for each participant together with a link to the associated MPEG 2 data 202 and the associated text data 204 (this link being represented in Figure 15 by the viewing parameter values being shown in the same column as the associated MPEG 2 data 202 and associated text data 204).
- the viewing parameter value for participant 1 is 3, indicating that participant 1 is looking at participant 3; the viewing parameter value for participant 2 is 5, indicating that participant 2 is looking at the flip chart 14; the viewing parameter value for participant 3 is 1, indicating that participant 3 is looking at participant 1; and the viewing parameter value for participant 4 is "0", indicating that participant 4 is not looking at any of the other participants (in the example shown in Figure 1, the participant indicated at 12 is looking at her notes rather than any of the other participants).
- central controller 36 and archive processor 58 determine whether one of the participants in the meeting has stopped speaking. In this embodiment, this check is performed by examining the text data 204 to determine whether text data for a given participant was present for the previous time slot, but is not present for the current time slot. If this condition is satisfied for any participant (that is, a participant has stopped speaking), then, at step S98, archive processor 58 processes the viewing parameter values previously stored when step S86 was performed for each participant who has stopped speaking (these viewing parameter values defining at whom or what the participant was looking during the period of speech which has now stopped) to generate data defining a viewing histogram. More particularly, the viewing parameter values for the period in which the participant was speaking are processed to generate data defining the percentage of time during that period that the speaking participant was looking at each of the other participants and objects.
- Figures 16A and 16B show the viewing histograms corresponding to the periods of text 206 and 208 respectively in Figure 15.
- participant 3 was looking at participant 1 for approximately 45% of the time, which is indicated at 320 in Figure 16B, at object 5 (that is, the flip chart 14) for approximately 33% of the time, indicated at 330 in Figure 16B, and at participant 2 for approximately 22% of the time, which is indicated at 340 in Figure 16B.
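The histogram itself is a simple tally of the viewing-parameter values stored over the frames of the speech period, as sketched below; the data layout is an assumption.

```python
from collections import Counter

def viewing_histogram(viewing_values):
    """viewing_values: the viewing-parameter value recorded for the speaker in each
    frame of one period of speech (participant/object ids, with 0 meaning that the
    speaker was looking at nobody). Returns id -> percentage of the period."""
    counts = Counter(viewing_values)
    total = sum(counts.values())
    return {target: 100.0 * count / total for target, count in counts.items()}
```

For the period of text 208, a result such as {1: 45.0, 5: 33.0, 2: 22.0} would correspond to the histogram of Figure 16B.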
- if no participant has stopped speaking, steps S98 and S100 are omitted.
- archive processor 58 corrects data stored in the meeting archive database 60 for the previous frame of video data (that is, the frame preceding the frame for which data has just been generated and stored at steps S80 to S100) and other preceding frames, if such correction is necessary.
- Figure 17 shows the processing operations performed by archive processor 58 at step S102.
- archive processor 58 determines whether any data for a "potential" speaking participant is stored in the meeting archive database 60 for the next preceding frame (this being the frame which immediately precedes the current frame the first time step S190 is performed, that is the "i-1"th frame if the current frame is the "i"th frame).
- if it is determined at step S190 that no data is stored for a "potential" speaking participant for the preceding frame being considered, then it is not necessary to correct any data in the meeting archive database 60.
- archive processor 58 determines whether one of the "potential" speaking participants for which data was stored for the preceding frame is the same as a speaking participant (but not a "potential" speaking participant) identified for the current frame, that is a speaking participant identified at step S146 in Figure 12.
- if it is determined at step S192 that none of the "potential" speaking participants for the preceding frame is the same as a speaking participant identified at step S146 for the current frame, then no correction of the data stored in the meeting archive database 60 for the preceding frame being considered is carried out.
- archive processor 58 deletes the text data 204 for the preceding frame being considered from the meeting archive database 60 for each "potential" speaking participant who is not the same as the speaking participant for the current frame.
- steps S190 to S194 are repeated for the next preceding frame. More particularly, if the current frame is the "i"th frame, then the "i-1"th frame is considered the first time steps S190 to S194 are performed, the "i-2"th frame is considered the second time steps S190 to S194 are performed, etc. Steps S190 to S194 continue to be repeated until it is determined at step S190 that data for "potential" speaking participants is not stored in the preceding frame being considered or it is determined at step S192 that none of the "potential" speaking participants in the preceding frame being considered is the same as a speaking participant unambiguously identified for the current frame. In this way, in cases where "potential" speaking participants were identified for a number of successive frames, the data stored in the meeting archive database is corrected if the actual speaking participant from among the "potential" speaking participants is identified in the next frame.
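The backwards correction of steps S190 to S194 might be sketched as follows, assuming a simple per-frame record with a set of "potential" speaker ids and the text attributed to each of them; this data layout is an assumption, not the database format described above.

```python
def correct_potential_speakers(archive, current_frame, confirmed_speaker):
    """archive: dict frame index -> {'potential_speakers': set of ids,
                                     'text': dict id -> recognised text}
    Walks backwards from the frame before current_frame and, while the confirmed
    speaker is among the stored "potential" speakers, deletes the text wrongly
    attributed to the other candidates."""
    frame = current_frame - 1
    while frame in archive and archive[frame]["potential_speakers"]:
        potentials = archive[frame]["potential_speakers"]
        if confirmed_speaker not in potentials:
            break                                        # ambiguity cannot be resolved
        for pid in list(potentials):
            if pid != confirmed_speaker:
                archive[frame]["text"].pop(pid, None)    # remove wrongly attributed text
        archive[frame]["potential_speakers"] = set()     # frame is now unambiguous
        frame -= 1
```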
- central controller 36 determines whether another frame of video data has been received from the cameras 2-1, 2-2, 2-3.
- Steps S80 to S104 are repeatedly performed while image data is received from the cameras 2-1, 2-2, 2-3.
- when data is stored in meeting archive database 60, the database may be interrogated to retrieve data relating to the meeting.
- Figure 18 shows the processing operations performed to search the meeting archive database 60 to retrieve data relating to each part of the meeting which satisfies search criteria specified by a user.
- the user is requested to enter information defining the part or parts of the meeting which he wishes to find in the meeting archive database 60. More particularly, in this embodiment, the user is requested to enter information 400 defining a participant who was talking, information 410 comprising one or more key words which were said by the participant identified in information 400, and information 420 defining the participant or object at which the participant identified in information 400 was looking when he was talking.
- the user is able to enter time information defining a portion or portions of the meeting for which the search is to be carried out.
- the user can enter information 430 defining a time in the meeting beyond which the search should be discontinued (that is, the period of the meeting before the specified time should be searched), information 440 defining a time in the meeting after which the search should be carried out, and information 450 and 460 defining a start time and end time respectively between which the search is to be carried out.
- information 430, 440, 450 and 460 may be entered either by specifying a time in absolute terms, for example in minutes, or in relative terms by entering a decimal value which indicates a proportion of the total meeting time. For example, entering the value 0.25 as information 430 would restrict the search to the first quarter of the meeting.
- if information 410 and 420 is omitted, then a search is carried out to identify each part of the meeting in which the participant defined in information 400 was talking, irrespective of what was said and to whom. If information 400 is omitted, then a search is carried out to identify each part of the meeting in which any of the participants spoke the key words defined in information 410 while looking at the participant or object defined in information 420. If information 400 and 410 is omitted, then a search is carried out to identify each part of the meeting in which any of the participants spoke to the participant or object defined in information 420. If information 420 is omitted, then a search is carried out to identify each part of the meeting in which the participant defined in information 400 spoke the key words defined in information 410, irrespective of to whom the key words were spoken. Similarly, if information 400 and 420 is omitted, then a search is carried out to identify each part of the meeting in which the key words identified in information 410 were spoken, irrespective of who said the key words and to whom.
- the user may enter all of the time information 430, 440, 450 and 460 or may omit one or more pieces of this information.
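The effect of omitting search criteria can be illustrated with the filter sketched below; the record fields and the representation of time as a fraction of the meeting are assumptions chosen for the example rather than the stored database format.

```python
def search_archive(records, speaker=None, keywords=None, target=None,
                   not_after=None, not_before=None, start=None, end=None):
    """records: iterable of dicts with 'time' (fraction of the meeting, 0..1),
    'speaker' (id of the talking participant), 'target' (viewing parameter while
    talking) and 'text' (recognised words). Criteria left as None are not applied,
    mirroring omitted information 400/410/420/430/440/450/460."""
    hits = []
    for r in records:
        if speaker is not None and r["speaker"] != speaker:
            continue
        if target is not None and r["target"] != target:
            continue
        if keywords and not all(k.lower() in r["text"].lower() for k in keywords):
            continue
        if not_after is not None and r["time"] > not_after:     # information 430
            continue
        if not_before is not None and r["time"] < not_before:   # information 440
            continue
        if start is not None and r["time"] < start:             # information 450
            continue
        if end is not None and r["time"] > end:                 # information 460
            continue
        hits.append(r)
    return hits
```

For example, search_archive(records, speaker=1, keywords=["budget"], not_after=0.25) would correspond to information 400, 410 and 430 being given and the remaining information omitted.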
- central controller 36 and text searcher 62 search each portion of text previously identified on the basis of information 400 and 420 (or all portions of text if information 400 and 420 was not entered) to identify each portion containing the key word(s) identified in information 410. If any time information has been entered by the user, then the searches described above are restricted to the meeting times defined by those limits.
- central controller 36 causes display processor 64 to display a list of relevant speeches identified during the search to the user on display device 26. More particularly, central controller 36 causes information such as that shown in Figure 19B to be displayed to the user. Referring to Figure 19B, a list is produced of each speech which satisfies the search parameters, and information is displayed defining the start time for the speech both in absolute terms and as a proportion of the full meeting time. The user is then able to select one of the speeches for playback, for example by clicking on the required speech in the list using the mouse 30.
- a microphone array 4 is provided on the meeting room table to determine the direction from which received sound has come.
- a respective microphone may be provided for each participant in the meeting (such as a microphone which attaches to the clothing of the participant). In this way, the speaking participant(s) can be readily identified because the sound data for the participants is input into processing apparatus 24 on respective channels.
- at step S34 (Figure 3) and step S70 (Figure 7), the head of each of the participants in the meeting is tracked.
- objects for which data was stored at steps S4 and S26 could also be tracked if they moved (such objects may comprise, for example, notes which are likely to be moved by a participant or an object which is to be passed between the participants).
- at step S168, processing is performed to identify the camera which has the best view of the speaking participant and also of the participant or object at which the speaking participant is looking.
- instead of identifying the camera in the way described in the embodiment above, it is possible for a user to define, during the initialisation of processing apparatus 24, which of the cameras 2-1, 2-2, 2-3 has the best view of each respective pair of the seating positions around the meeting table and/or the best view of each respective seating position and a given object (such as flip chart 14).
- when the speaking participant and the participant at whom he is looking occupy predefined seating positions, the camera defined by the user to have the best view of those predefined seating positions can be selected as a camera from which image data is to be stored.
- similarly, when the speaking participant is in a predefined position and is looking at an object, the camera defined by the user to have the best view of that predefined seating position and object can be selected as the camera from which image data is to be stored.
- alternatively, a default camera may be selected, this being the camera from which image data was stored for the previous frame.
- the default camera may be selected by a user, for example during the initialisation of processing apparatus 24.
- the text data 204 is deleted from meeting archive database 60 for the "potential" speaking participants who have now been identified as actually not being speaking participants.
- the associated viewing histogram data 214 may also be deleted.
- if MPEG 2 data 202 from more than one of the cameras 2-1, 2-2, 2-3 was stored, then the MPEG 2 data related to the "potential" speaking participants may also be deleted.
- in the embodiment above, when it is not possible to uniquely identify a speaking participant, "potential" speaking participants are defined, data is processed and stored in meeting archive database 60 for the potential speaking participants, and subsequently the data stored in the meeting archive database 60 is corrected (step S102 in Figure 8).
- video data received from cameras 2-1, 2-2 and 2-3 and audio data received from microphone array 4 may be stored for subsequent processing and archiving when the speaking participant has been identified from data relating to a future frame.
- image data from the cameras 2-1, 2-2 and 2-3 may be processed to detect lip movements of the participants and to select as the speaking participant the participant in the direction from which the speech is coming whose lips are moving.
- processing is performed to determine the position of each person's head, the orientation of each person's head and a viewing parameter for each person defining at whom or what the person is looking.
- the viewing parameter value for each person is then stored in the meeting archive database 60 for each frame of image data.
- the viewing histogram for a particular portion of text is considered and it is determined that the participant was talking to a further participant or object if the percentage of gaze time for the further participant or object in the viewing histogram is equal to or above a predetermined threshold.
- the participant or object at whom the speaking participant was looking during the period of text may be defined to be the participant or object having the highest percentage gaze value in the viewing histogram (for example participant 3 in Figure 16A, and participant 1 in Figure 16B).
- the MPEG 2 data 202, the text data 204, the viewing parameters 212 and the viewing histograms 214 are generated and stored in the meeting archive database 60 before the database is interrogated to retrieve data for a defined part of the meeting.
- some, or all, of the viewing histogram data 214 may be generated in response to a search of the meeting archive database 60 being requested by the user by processing the data already stored in meeting archive database 60, rather than being generated and stored prior to such a request.
- although the viewing histograms 214 are calculated and stored in real-time at steps S98 and S100 (Figure 8) in the embodiment above, these histograms could instead be calculated in response to a search request being input by the user.
- text data 204 is stored in meeting archive database 60.
- audio data may be stored in the meeting archive database 60 instead of the text data 204.
- the stored audio data would then either itself be searched for key words using voice recognition processing or converted to text using voice recognition processing and the text searched using a conventional text searcher.
- processing apparatus 24 includes functional components for receiving and generating data to be archived (for example, central controller 36, head tracker 50, head model store 52, direction processor 53, voice recognition processor 54, speech recognition parameter store 56 and archive processor 58), functional components for storing the archive data (for example meeting archive database 60), and also functional components for searching the database and retrieving information therefrom (for example central controller 36 and text searcher 62).
- these functional components may be provided in separate apparatus.
- one or more apparatus for generating data to be archived, and one or more apparatus for database searching may be connected to one or more databases via a network, such as the Internet.
- video and sound data from one or more meetings 500, 510, 520 may be input to a data processing and database storage apparatus 530 (which comprises functional components to generate and store the archive data), and one or more database interrogation apparatus 540, 550 may be connected to the data processing and database storage apparatus 530 for interrogating the database to retrieve information therefrom.
- processing is performed by a computer using processing routines defined by programming instructions. However, some, or all, of the processing could be performed using hardware.
- the invention is not limited to this application, and, instead, can be used for other applications, such as to process image and sound data on a film set etc.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
Description
Claims (43)
- Image processing apparatus, comprising: means for receiving image data recorded by a plurality of cameras showing the movements of a plurality of people; speaker identification means for determining which of the people is speaking; means for determining at whom the speaker is looking; means for determining the position of the speaker and the position of the person at whom the speaker is looking; and camera selection means for selecting image data from the received image data on the basis of the determined positions of the speaker and the person at whom the speaker is looking.
- Apparatus according to claim 1, wherein the camera selection means is arranged to select image data in which both the speaker and the person at whom the speaker is looking appear.
- Apparatus according to claim 2, wherein the camera selection means is arranged to generate quality values representing a quality of the views that at least some of the cameras have of the speaker and the person at whom the speaker is looking, and to select the image data on the basis of which camera has the quality value representing the highest quality.
- Apparatus according to claim 3, wherein the camera selection means is arranged to determine which of the cameras have a view of the speaker and the person at whom the speaker is looking, and to generate a respective quality value for each camera which has a view of the speaker and the person at whom the speaker is looking.
- Apparatus according to claim 3 or claim 4, wherein the camera selection means is arranged to generate each quality value in dependence upon the position and orientation of the head of the speaker and the position and orientation of the head of the person at whom the speaker is looking.
- Apparatus according to claim 1 or claim 2, wherein the camera selection means comprises: data storage means for storing data defining a camera from which image data is to be selected for respective pairs of positions; and means for using data stored in the data storage means to select the image data in dependence upon the positions of the speaker and the person at whom the speaker is looking.
- Apparatus according to any preceding claim, wherein the means for determining at whom the speaker is looking and the means for determining the positions of the speaker and the person at whom the speaker is looking comprise image processing means for processing the image data from at least one of the cameras to determine at whom the speaker is looking and the positions.
- Apparatus according to claim 7, wherein the image processing means is arranged to determine the position of each person and at whom each person is looking by processing the image data from the at least one camera.
- Apparatus according to claim 7 or claim 8, wherein the image processing means is arranged to track the position and orientation of each person's head in three dimensions.
- Apparatus according to any preceding claim, wherein the speaker identification means is arranged to receive speech data from a plurality of microphones each of which is allocated to a respective one of the people, and to determine which of the people is speaking on the basis of the microphone from which the speech data was received.
- Apparatus according to any preceding claim, further comprising sound processing means for processing sound data defining words spoken by the people to generate text data therefrom in dependence upon the result of the processing performed by the speaker identification means.
- Apparatus according to claim 11, wherein the sound processing means includes storage means for storing respective voice recognition parameters for each of the people, and means for selecting the voice recognition parameters to be used to process the sound data in dependence upon the person determined to be speaking by the speaker identification means.
- Apparatus according to claim 11 or claim 12, further comprising a database for storing at least some of the received image data, the sound data, the text data produced by the sound processing means and viewing data defining at whom at least the person who is speaking is looking, the database being arranged to store the data such that corresponding text data and viewing data are associated with each other and with the corresponding image data and sound data.
- Apparatus according to claim 13, further comprising means for compressing the image data and the sound data for storage in the database.
- Apparatus according to claim 14, wherein the means for compressing the image data and the sound data comprises means for encoding the image data and the sound data as MPEG data.
- Apparatus according to any of claims 13 to 15, further comprising means for generating data defining, for a predetermined period, the proportion of time spent by a given person looking at each of the other people during the predetermined period, and wherein the database is arranged to store the data so that it is associated with the corresponding image data, sound data, text data and viewing data.
- Apparatus according to claim 16, wherein the predetermined period comprises a period during which the given person was talking.
- Image processing apparatus, comprising: means for receiving image data recorded by a plurality of cameras showing the movements of a plurality of people; speaker identification means for determining which of the people is speaking; means for determining at what the speaker is looking; means for determining the position of the speaker and the position of the object at which the speaker is looking; and camera selection means for selecting image data from the received image data on the basis of the determined positions of the speaker and the object at which the speaker is looking.
- A method of processing image data recorded by a plurality of cameras showing the movements of a plurality of people to select image data for storage, the method comprising: a speaker identification step of determining which of the people is speaking; a step of determining at whom the speaker is looking; a step of determining the position of the speaker and the position of the person at whom the speaker is looking; and a camera selection step of selecting image data on the basis of the determined positions of the speaker and the person at whom the speaker is looking.
- A method according to claim 19, wherein, in the camera selection step, image data is selected in which both the speaker and the person at whom the speaker is looking appear.
- A method according to claim 20, wherein, in the camera selection step, quality values are generated representing a quality of the views that at least some of the cameras have of the speaker and the person at whom the speaker is looking, and the image data is selected on the basis of which camera has the quality value representing the highest quality.
- A method according to claim 21, wherein, in the camera selection step, processing is performed to determine which of the cameras have a view of the speaker and the person at whom the speaker is looking, and to generate a respective quality value for each camera which has a view of the speaker and the person at whom the speaker is looking.
- A method according to claim 21 or claim 22, wherein, in the camera selection step, each quality value is generated in dependence upon the position and orientation of the head of the speaker and the position and orientation of the head of the person at whom the speaker is looking.
- A method according to claim 19 or claim 20, wherein, in the camera selection step, pre-stored data defining a camera from which image data is to be selected for respective pairs of positions is used to select the image data in dependence upon the positions of the speaker and the person at whom the speaker is looking.
- A method according to any of claims 19 to 24, wherein, in the steps of determining at whom the speaker is looking and determining the positions of the speaker and the person at whom the speaker is looking, image data from at least one of the cameras is processed to determine at whom the speaker is looking and the positions.
- A method according to claim 25, wherein the image data from the at least one camera is processed to determine the position of each person and at whom each person is looking.
- A method according to claim 25 or claim 26, wherein image data is processed to track the position and orientation of each person's head in three dimensions.
- A method according to any of claims 19 to 26, wherein speech data is received from a plurality of microphones each of which is allocated to a respective one of the people, and, in the speaker identification step, it is determined which of the people is speaking on the basis of the microphone from which the speech data was received.
- A method according to any of claims 19 to 28, further comprising a sound processing step of processing sound data defining words spoken by the people to generate text data therefrom in dependence upon the result of the processing performed in the speaker identification step.
- A method according to claim 29, wherein the sound processing step includes selecting, from among stored respective voice recognition parameters for each of the people, the voice recognition parameters to be used to process the sound data in dependence upon the person determined to be speaking in the speaker identification step.
- A method according to claim 29 or claim 30, further comprising the step of storing in a database at least some of the received image data, the sound data, the text data produced in the sound processing step and viewing data defining at whom at least the person who is speaking is looking, the data being stored in the database such that corresponding text data and viewing data are associated with each other and with the corresponding image data and sound data.
- A method according to claim 31, wherein the image data and the sound data are stored in the database in compressed form.
- A method according to claim 32, wherein the image data and the sound data are stored as MPEG data.
- A method according to any of claims 31 to 33, further comprising the steps of generating data defining, for a predetermined period, the proportion of time spent by a given person looking at each of the other people during the predetermined period, and storing the data in the database so that it is associated with the corresponding image data, sound data, text data and viewing data.
- A method according to claim 34, wherein the predetermined period comprises a period during which the given person was talking.
- A method according to any of claims 19 to 35, further comprising the step of generating a signal conveying information defining the image data selected in the camera selection step.
- A method according to any of claims 31 to 35, further comprising the step of generating a signal conveying the database with data therein.
- A method according to claim 37, further comprising the step of recording the signal either directly or indirectly to generate a recording thereof.
- A method of processing image data recorded by a plurality of cameras showing the movements of a plurality of people to select image data for storage, the method comprising: a speaker identification step of determining which of the people is speaking; a step of determining at what the speaker is looking; a step of determining the position of the speaker and the position of the object at which the speaker is looking; and a camera selection step of selecting image data on the basis of the determined positions of the speaker and the object at which the speaker is looking (see the pipeline sketch following the claims).
- A storage device storing instructions for causing a programmable processing apparatus to become configured as an apparatus as set out in at least one of claims 1 to 18.
- A storage device storing instructions for causing a programmable processing apparatus to become operable to perform a method as set out in at least one of claims 19 to 39.
- A signal conveying instructions for causing a programmable processing apparatus to become configured as an apparatus as set out in at least one of claims 1 to 18.
- A signal conveying instructions for causing a programmable processing apparatus to become operable to perform a method as set out in at least one of claims 19 to 39.
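Quality-value sketch: the claims do not fix a formula for the quality value recited for camera selection, so the following is only one plausible reading; the facing-direction dot product, the distance penalty and every name in it (camera_quality, head_score, the example camera layout and head poses) are assumptions made purely for illustration.

```python
import numpy as np

def camera_quality(cam_pos, speaker_pos, speaker_dir, listener_pos, listener_dir):
    """Score how well one camera views both the speaker and the person looked at.

    All positions are 3-D points (metres); *_dir are unit vectors giving the
    facing direction of each head. A head contributes more when it faces the
    camera (positive dot product between its facing direction and the
    head-to-camera vector) and when it is close to the camera.
    """
    def head_score(head_pos, head_dir):
        to_cam = cam_pos - head_pos
        dist = np.linalg.norm(to_cam)
        facing = float(np.dot(head_dir, to_cam / dist))  # 1 = facing camera, -1 = facing away
        return max(facing, 0.0) / (1.0 + dist)           # penalise heads far from the camera

    return head_score(speaker_pos, speaker_dir) + head_score(listener_pos, listener_dir)

# Illustrative camera layout and head poses; the camera with the highest value is selected.
cameras = {
    "cam1": np.array([0.0, 0.0, 2.5]),
    "cam2": np.array([4.0, 0.0, 2.5]),
    "cam3": np.array([2.0, 3.0, 2.5]),
}
speaker = (np.array([1.0, 1.0, 1.2]), np.array([0.0, -1.0, 0.0]))    # position, facing direction
listener = (np.array([3.0, 1.0, 1.2]), np.array([-1.0, 0.0, 0.0]))
best = max(cameras, key=lambda c: camera_quality(cameras[c], *speaker, *listener))
print("selected camera:", best)
```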
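Lookup-table sketch: a minimal illustration of selecting a camera from pre-stored data keyed on respective pairs of positions, assuming the room is quantised into coarse zones; the zone layout and the table contents (CAMERA_FOR_ZONE_PAIR, zone_of) are invented for the example and are not taken from the patent.

```python
# Hypothetical pre-stored table: for each (speaker zone, looked-at zone) pair,
# the identifier of the camera whose image data should be selected.
CAMERA_FOR_ZONE_PAIR = {
    ("north", "south"): "cam1",
    ("south", "north"): "cam2",
    ("east", "west"): "cam3",
    ("west", "east"): "cam3",
}

def zone_of(position):
    """Map a 3-D position to a coarse zone of the meeting room (assumed layout)."""
    x, y, _ = position
    if y > 1.5:
        return "north" if x < 2.0 else "east"
    return "south" if x < 2.0 else "west"

def select_camera(speaker_pos, target_pos, default="cam1"):
    """Look up the camera pre-assigned to this pair of positions."""
    return CAMERA_FOR_ZONE_PAIR.get((zone_of(speaker_pos), zone_of(target_pos)), default)

print(select_camera((1.0, 2.0, 1.2), (1.0, 0.5, 1.2)))   # speaker in "north", target in "south" -> cam1
```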
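Microphone-energy sketch: under the simplest possible assumption, the per-person-microphone variant of the speaker identification step reduces to picking the allocated microphone carrying the most energy; the channel-to-person mapping, the threshold and the synthetic test signal below are illustrative only.

```python
import numpy as np

# Hypothetical mapping from microphone channel index to the person it is allocated to.
MIC_TO_PERSON = {0: "Alice", 1: "Bob", 2: "Carol", 3: "Dave"}

def identify_speaker(frames, energy_threshold=0.01):
    """Return the person whose allocated microphone carries the most energy.

    frames: array of shape (n_mics, n_samples) holding one block of audio per
    microphone. If no channel exceeds the threshold, nobody is speaking.
    """
    energies = np.mean(frames ** 2, axis=1)        # mean-square energy per channel
    loudest = int(np.argmax(energies))
    if energies[loudest] < energy_threshold:
        return None
    return MIC_TO_PERSON[loudest]

block = np.zeros((4, 16000))
block[2] = 0.2 * np.sin(np.linspace(0, 2000 * np.pi, 16000))   # activity on Carol's microphone
print(identify_speaker(block))                                  # -> Carol
```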
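Voice-profile sketch: a hedged illustration of selecting stored voice recognition parameters according to the identified speaker; VoiceProfile, the model paths and the recogniser interface are placeholders, not a real speech-recognition API.

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    """Illustrative stand-in for per-person voice recognition parameters."""
    person: str
    acoustic_model: str      # e.g. path to a speaker-adapted acoustic model
    language_model: str

# Hypothetical pre-stored profiles, one per meeting participant.
PROFILES = {
    "Alice": VoiceProfile("Alice", "models/alice.am", "models/meeting.lm"),
    "Bob": VoiceProfile("Bob", "models/bob.am", "models/meeting.lm"),
}

def transcribe(sound_block, speaker, recogniser):
    """Process a block of sound data with the parameters of the identified speaker."""
    profile = PROFILES.get(speaker)
    if profile is None:
        return ""                      # unknown speaker: skip rather than mis-recognise
    recogniser.load(profile.acoustic_model, profile.language_model)
    return recogniser.decode(sound_block)
```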
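Archive-schema sketch: one possible shape for the database that keeps text data and viewing data associated with the corresponding image data and sound data, shown with SQLite purely for illustration; the table and column names are assumptions.

```python
import sqlite3

# Every stored segment ties together the selected image data, the sound data,
# the recognised text and the viewing data ("at whom the speaker was looking")
# for the same span of the meeting.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meeting_archive (
        start_time   REAL,          -- seconds from the start of the meeting
        end_time     REAL,
        camera_id    TEXT,          -- which camera's image data was archived
        video_blob   BLOB,          -- compressed image data for the segment
        audio_blob   BLOB,          -- compressed sound data for the segment
        speaker      TEXT,
        text         TEXT,          -- text generated from the spoken words
        looking_at   TEXT           -- viewing data: at whom the speaker was looking
    )
""")
conn.execute(
    "INSERT INTO meeting_archive VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (12.0, 17.5, "cam2", b"...", b"...", "Alice", "shall we start?", "Bob"),
)
rows = conn.execute(
    "SELECT speaker, text, looking_at FROM meeting_archive WHERE looking_at = ?", ("Bob",)
).fetchall()
print(rows)
```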
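Gaze-proportion sketch: the proportion-of-time data could be derived from sampled viewing data as follows; the (timestamp, target) sample representation and the fixed sampling rate are assumptions.

```python
from collections import Counter

def gaze_proportions(viewing_samples):
    """Proportion of time a given person spent looking at each other person.

    viewing_samples: list of (timestamp, target) pairs taken at a fixed sampling
    rate for that person over the predetermined period, e.g. while they spoke.
    """
    targets = [target for _, target in viewing_samples if target is not None]
    if not targets:
        return {}
    counts = Counter(targets)
    total = len(targets)
    return {person: count / total for person, count in counts.items()}

samples = [(t * 0.1, "Bob") for t in range(30)] + [(3.0 + t * 0.1, "Carol") for t in range(10)]
print(gaze_proportions(samples))   # -> {'Bob': 0.75, 'Carol': 0.25}
```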
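Pipeline sketch: the independent method claim reads as a per-time-slice pipeline of speaker identification, gaze-target determination, position determination and camera selection. The sketch below strings those steps together under the same assumptions as the earlier fragments: the head poses recovered from the image data and the identified speaker are taken as inputs, the gaze target is approximated by ray alignment, and the camera-selection policy is pluggable. Everything in it is illustrative rather than the claimed implementation.

```python
import numpy as np

def archive_time_slice(head_poses, cameras, speaker, select_camera):
    """Illustrative single pass over one time slice of the meeting.

    head_poses:    {person: (position, facing_direction)} recovered from the image data
    cameras:       {camera_id: image frame} captured during this slice
    speaker:       result of the speaker identification step (e.g. loudest microphone)
    select_camera: any policy mapping (speaker position, target position) -> camera id
    """
    spk_pos, spk_dir = head_poses[speaker]

    # Step: determine at what the speaker is looking - here, the other head that is
    # best aligned with the speaker's facing direction.
    def alignment(name):
        offset = np.asarray(head_poses[name][0]) - np.asarray(spk_pos)
        return float(np.dot(offset / np.linalg.norm(offset), spk_dir))
    target = max((name for name in head_poses if name != speaker), key=alignment)

    # Step: camera selection on the basis of the two determined positions.
    camera_id = select_camera(spk_pos, head_poses[target][0])
    return target, camera_id, cameras[camera_id]

poses = {
    "Alice": (np.array([1.0, 1.0, 1.2]), np.array([1.0, 0.0, 0.0])),
    "Bob":   (np.array([3.0, 1.0, 1.2]), np.array([-1.0, 0.0, 0.0])),
    "Carol": (np.array([1.0, 3.0, 1.2]), np.array([0.0, -1.0, 0.0])),
}
frames = {"cam1": "frame-1", "cam2": "frame-2"}
print(archive_time_slice(poses, frames, "Alice", lambda s, t: "cam2"))   # -> ('Bob', 'cam2', 'frame-2')
```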
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GBGB9908545.8A GB9908545D0 (en) | 1999-04-14 | 1999-04-14 | Image processing apparatus |
| GB9908545 | 1999-04-14 | | |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| EP1045586A2 (en) | 2000-10-18 |
| EP1045586A3 (en) | 2002-08-07 |
| EP1045586B1 (en) | 2006-08-16 |
Family
ID=10851529
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP00302422A Expired - Lifetime EP1045586B1 (en) | 1999-04-14 | 2000-03-24 | Image processing apparatus |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US7113201B1 (en) |
| EP (1) | EP1045586B1 (en) |
| JP (1) | JP4697907B2 (en) |
| DE (1) | DE60030027D1 (en) |
| GB (1) | GB9908545D0 (en) |
Families Citing this family (40)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7379978B2 (en) * | 2002-07-19 | 2008-05-27 | Fiserv Incorporated | Electronic item management and archival system and method of operating the same |
| US7428000B2 (en) * | 2003-06-26 | 2008-09-23 | Microsoft Corp. | System and method for distributed meetings |
| GB2404297B (en) * | 2003-07-24 | 2007-12-05 | Hewlett Packard Development Co | Editing multiple camera outputs |
| JP2005122128A (en) * | 2003-09-25 | 2005-05-12 | Fuji Photo Film Co Ltd | Speech recognition system and program |
| US20050228673A1 (en) * | 2004-03-30 | 2005-10-13 | Nefian Ara V | Techniques for separating and evaluating audio and video source data |
| JP2006197115A (en) * | 2005-01-12 | 2006-07-27 | Fuji Photo Film Co Ltd | Imaging device and image output device |
| US8094193B2 (en) * | 2005-10-12 | 2012-01-10 | New Vad, Llc | Presentation video control system |
| EP1850640B1 (en) * | 2006-04-25 | 2009-06-17 | Harman/Becker Automotive Systems GmbH | Vehicle communication system |
| US10410676B2 (en) | 2006-11-27 | 2019-09-10 | Kbport Llc | Portable tablet computer based multiple sensor mount having sensor input integration with real time user controlled commenting and flagging and method of using same |
| US10002539B2 (en) * | 2006-11-27 | 2018-06-19 | Kbport Llc | Method and apparatus for integrated recording and playback of video audio and data inputs |
| US8074581B2 (en) | 2007-10-12 | 2011-12-13 | Steelcase Inc. | Conference table assembly |
| US8537196B2 (en) | 2008-10-06 | 2013-09-17 | Microsoft Corporation | Multi-device capture and spatial browsing of conferences |
| US20140361954A1 (en) | 2013-06-07 | 2014-12-11 | Lewis Epstein | Personal control apparatus and method for sharing information in a collaboration workspace |
| US10631632B2 (en) | 2008-10-13 | 2020-04-28 | Steelcase Inc. | Egalitarian control apparatus and method for sharing information in a collaborative workspace |
| WO2010071928A1 (en) * | 2008-12-22 | 2010-07-01 | Seeing Machines Limited | Automatic calibration of a gaze direction algorithm from user behaviour |
| CN101442654B (en) * | 2008-12-26 | 2012-05-23 | 华为终端有限公司 | Method, apparatus and system for switching video object of video communication |
| US10884607B1 (en) | 2009-05-29 | 2021-01-05 | Steelcase Inc. | Personal control apparatus and method for sharing information in a collaborative workspace |
| US8676581B2 (en) * | 2010-01-22 | 2014-03-18 | Microsoft Corporation | Speech recognition analysis via identification information |
| US8265341B2 (en) * | 2010-01-25 | 2012-09-11 | Microsoft Corporation | Voice-body identity correlation |
| DE102010035834A1 (en) * | 2010-08-30 | 2012-03-01 | Vodafone Holding Gmbh | An imaging system and method for detecting an object |
| US8630854B2 (en) * | 2010-08-31 | 2014-01-14 | Fujitsu Limited | System and method for generating videoconference transcriptions |
| US8791977B2 (en) | 2010-10-05 | 2014-07-29 | Fujitsu Limited | Method and system for presenting metadata during a videoconference |
| US8705812B2 (en) * | 2011-06-10 | 2014-04-22 | Amazon Technologies, Inc. | Enhanced face recognition in video |
| DE102012105784A1 (en) * | 2012-06-29 | 2014-01-02 | Vahit Tas | Video conference system for receiving and bidirectional transmission of video and audio signals, has video camera device with video camera and movement unit which is formed to adjust video camera on certain object |
| US9661041B2 (en) * | 2013-06-28 | 2017-05-23 | Linkedin Corporation | Virtual conference manager |
| US9113036B2 (en) * | 2013-07-17 | 2015-08-18 | Ebay Inc. | Methods, systems, and apparatus for providing video communications |
| JP6415932B2 (en) * | 2014-11-05 | 2018-10-31 | 日本電信電話株式会社 | Estimation apparatus, estimation method, and program |
| JP6528574B2 (en) | 2015-07-14 | 2019-06-12 | 株式会社リコー | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM |
| JP2017028375A (en) | 2015-07-16 | 2017-02-02 | 株式会社リコー | Image processing device and program |
| JP2017028633A (en) | 2015-07-27 | 2017-02-02 | 株式会社リコー | Video distribution terminal, program, and video distribution method |
| US9621795B1 (en) | 2016-01-08 | 2017-04-11 | Microsoft Technology Licensing, Llc | Active speaker location detection |
| US10264213B1 (en) | 2016-12-15 | 2019-04-16 | Steelcase Inc. | Content amplification system and method |
| JP2019121105A (en) * | 2017-12-28 | 2019-07-22 | 富士ゼロックス株式会社 | Control device and control program |
| US10951859B2 (en) | 2018-05-30 | 2021-03-16 | Microsoft Technology Licensing, Llc | Videoconferencing device and method |
| US11227602B2 (en) * | 2019-11-20 | 2022-01-18 | Facebook Technologies, Llc | Speech transcription using multiple data sources |
| KR102184649B1 (en) * | 2019-12-05 | 2020-11-30 | (주)힐링사운드 | Sound control system and method for dental surgery |
| JP6860178B1 (en) * | 2019-12-27 | 2021-04-14 | Necプラットフォームズ株式会社 | Video processing equipment and video processing method |
| US11295543B2 (en) | 2020-03-31 | 2022-04-05 | International Business Machines Corporation | Object detection in an image |
| ES1249310Y (en) * | 2020-05-17 | 2020-10-01 | Bemyvega S L | Device to improve visual and / or auditory monitoring of a presentation given by a speaker |
| US11743428B2 (en) | 2022-01-19 | 2023-08-29 | Ebay Inc. | Detailed videoconference viewpoint generation |
Family Cites Families (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3392392A (en) | 1967-06-05 | 1968-07-09 | Motorola Inc | Bearing measurement system using statistical signal processing by analog techniques |
| US3601530A (en) | 1969-04-29 | 1971-08-24 | Bell Telephone Labor Inc | Video conference system using voice-switched cameras |
| US4333170A (en) | 1977-11-21 | 1982-06-01 | Northrop Corporation | Acoustical detection and tracking system |
| DE3381357D1 (en) | 1982-12-22 | 1990-04-26 | Marconi Co Ltd | ACOUSTIC BEARING SYSTEMS. |
| JPH0771279B2 (en) | 1988-08-17 | 1995-07-31 | 富士通株式会社 | Image processing device for video conference |
| US5231674A (en) | 1989-06-09 | 1993-07-27 | Lc Technologies, Inc. | Eye tracking method and apparatus |
| US5206721A (en) * | 1990-03-08 | 1993-04-27 | Fujitsu Limited | Television conference system |
| JPH04297196A (en) * | 1991-03-26 | 1992-10-21 | Toshiba Corp | Image pickup device for object to be photographed |
| JPH04301976A (en) * | 1991-03-28 | 1992-10-26 | Kyocera Corp | video conference system |
| JPH0759075A (en) * | 1993-08-19 | 1995-03-03 | Jamco Corp | Omnidirectional monitoring device |
| US5347306A (en) | 1993-12-17 | 1994-09-13 | Mitsubishi Electric Research Laboratories, Inc. | Animated electronic meeting place |
| JP3631266B2 (en) | 1994-05-13 | 2005-03-23 | 株式会社応用計測研究所 | Measuring device for moving objects |
| US5508734A (en) | 1994-07-27 | 1996-04-16 | International Business Machines Corporation | Method and apparatus for hemispheric imaging which emphasizes peripheral content |
| US5500671A (en) | 1994-10-25 | 1996-03-19 | At&T Corp. | Video conference system and method of providing parallax correction and a sense of presence |
| JPH08163526A (en) * | 1994-11-30 | 1996-06-21 | Canon Inc | Video selection device |
| JP3272906B2 (en) * | 1995-05-29 | 2002-04-08 | シャープ株式会社 | Gaze direction detecting method and apparatus and man-machine interface apparatus including the same |
| DE19528425C1 (en) | 1995-08-02 | 1996-05-15 | Siemens Ag | Automated stereoscopic camera selection arrangement |
| JPH10145763A (en) * | 1996-11-15 | 1998-05-29 | Mitsubishi Electric Corp | Conference system |
| US5995936A (en) | 1997-02-04 | 1999-11-30 | Brais; Louis | Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations |
| JP3572849B2 (en) * | 1997-02-14 | 2004-10-06 | 富士ゼロックス株式会社 | Sound source position measuring device and camera photographing control device |
| CA2310114A1 (en) * | 1998-02-02 | 1999-08-02 | Steve Mann | Wearable camera system with viewfinder means |
| US6593956B1 (en) | 1998-05-15 | 2003-07-15 | Polycom, Inc. | Locating an audio source |
| GB2342802B (en) | 1998-10-14 | 2003-04-16 | Picturetel Corp | Method and apparatus for indexing conference content |
- 1999
  - 1999-04-14 GB GBGB9908545.8A patent/GB9908545D0/en not_active Ceased
- 2000
  - 2000-03-22 US US09/533,398 patent/US7113201B1/en not_active Expired - Fee Related
  - 2000-03-24 DE DE60030027T patent/DE60030027D1/en not_active Expired - Lifetime
  - 2000-03-24 EP EP00302422A patent/EP1045586B1/en not_active Expired - Lifetime
  - 2000-03-27 JP JP2000086806A patent/JP4697907B2/en not_active Expired - Fee Related
Cited By (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7472134B2 (en) * | 2000-07-03 | 2008-12-30 | Fujifilm Corporation | Image distributing system |
| EP1217608A3 (en) * | 2000-12-19 | 2004-02-11 | Hewlett-Packard Company | Activation of voice-controlled apparatus |
| US6876775B2 (en) | 2001-02-16 | 2005-04-05 | Canesta, Inc. | Technique for removing blurring from a captured image |
| US6690618B2 (en) | 2001-04-03 | 2004-02-10 | Canesta, Inc. | Method and apparatus for approximating a source position of a sound-causing event for determining an input used in operating an electronic device |
| WO2002082249A3 (en) * | 2001-04-03 | 2003-03-20 | Canesta Inc | Method and apparatus for approximating a source position of a sound-causing event |
| US7173230B2 (en) | 2001-09-05 | 2007-02-06 | Canesta, Inc. | Electromagnetic wave detection arrangement with capacitive feedback |
| US10242255B2 (en) | 2002-02-15 | 2019-03-26 | Microsoft Technology Licensing, Llc | Gesture recognition system using depth perceptive sensors |
| US7340077B2 (en) | 2002-02-15 | 2008-03-04 | Canesta, Inc. | Gesture recognition system using depth perceptive sensors |
| US7006236B2 (en) | 2002-05-22 | 2006-02-28 | Canesta, Inc. | Method and apparatus for approximating depth of an object's placement onto a monitored region with applications to virtual interface devices |
| US7050177B2 (en) | 2002-05-22 | 2006-05-23 | Canesta, Inc. | Method and apparatus for approximating depth of an object's placement onto a monitored region with applications to virtual interface devices |
| US7526120B2 (en) | 2002-09-11 | 2009-04-28 | Canesta, Inc. | System and method for providing intelligent airbag deployment |
| US7227566B2 (en) | 2003-09-05 | 2007-06-05 | Sony Corporation | Communication apparatus and TV conference apparatus |
| EP1513345A1 (en) * | 2003-09-05 | 2005-03-09 | Sony Corporation | Communication apparatus and conference apparatus |
| US9165368B2 (en) | 2005-02-08 | 2015-10-20 | Microsoft Technology Licensing, Llc | Method and system to segment depth images and to detect shapes in three-dimensionally acquired data |
| US9311715B2 (en) | 2005-02-08 | 2016-04-12 | Microsoft Technology Licensing, Llc | Method and system to segment depth images and to detect shapes in three-dimensionally acquired data |
| US11153472B2 (en) | 2005-10-17 | 2021-10-19 | Cutting Edge Vision, LLC | Automatic upload of pictures from a camera |
| US11818458B2 (en) | 2005-10-17 | 2023-11-14 | Cutting Edge Vision, LLC | Camera touchpad |
| WO2014096908A1 (en) * | 2012-12-21 | 2014-06-26 | Nokia Corporation | Spatial audio apparatus |
| JP2015152316A (en) * | 2014-02-10 | 2015-08-24 | 株式会社小野測器 | Sound source visualization device |
Also Published As
| Publication number | Publication date |
|---|---|
| EP1045586A3 (en) | 2002-08-07 |
| JP2000350192A (en) | 2000-12-15 |
| EP1045586B1 (en) | 2006-08-16 |
| JP4697907B2 (en) | 2011-06-08 |
| US7113201B1 (en) | 2006-09-26 |
| GB9908545D0 (en) | 1999-06-09 |
| DE60030027D1 (en) | 2006-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP1045586B1 (en) | | Image processing apparatus |
| US7117157B1 (en) | | Processing apparatus for determining which person in a group is speaking |
| US7139767B1 (en) | | Image processing apparatus and database |
| TWI311286B (en) | | |
| EP4075794A1 (en) | | Region of interest based adjustment of camera parameters in a teleconferencing environment |
| JP2000125274A (en) | | Method and apparatus for indexing meeting content |
| JP2004515982A (en) | | Method and apparatus for predicting events in video conferencing and other applications |
| US20080235724A1 (en) | | Face Annotation In Streaming Video |
| CN107820037B (en) | | Audio signal, image processing method, device and system |
| US11477393B2 (en) | | Detecting and tracking a subject of interest in a teleconference |
| JP7388188B2 (en) | | Speaker recognition system, speaker recognition method, and speaker recognition program |
| US20210409645A1 (en) | | Joint upper-body and face detection using multi-task cascaded convolutional networks |
| Zhang et al. | | Boosting-based multimodal speaker detection for distributed meeting videos |
| GB2351628A (en) | | Image and sound processing apparatus |
| GB2351627A (en) | | Image processing apparatus |
| TWI799048B (en) | | Panoramic video conference system and method |
| CN114390267B (en) | | Stereoscopic image data synthesis method, device, electronic device and storage medium |
| US12333807B2 (en) | | Object data generation for remote image processing |
| WO2022006693A1 (en) | | Videoconferencing systems with facial image rectification |
| CN113099158A (en) | | Method, device, equipment and storage medium for controlling pickup device in shooting site |
| KR102735118B1 (en) | | AI studio systems for online lectures and a method for controlling them |
| CN110730378A (en) | | Information processing method and system |
| CN117319594A (en) | | Conference personnel tracking display method, device, equipment and readable storage medium |
| JP2005295133A (en) | | Information distribution system |
| JP2021108411A (en) | | Video processing apparatus and video processing method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| | AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
| | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| | AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| | AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
| | RIC1 | Information provided on ipc code assigned before grant | Free format text: 7H 04N 7/15 A, 7H 04N 7/14 B |
| | 17P | Request for examination filed | Effective date: 20021219 |
| | AKX | Designation fees paid | Designated state(s): CH DE FR GB IT LI |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): CH DE FR GB IT LI |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CH; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20060816; Ref country code: LI; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20060816; Ref country code: IT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.; Effective date: 20060816 |
| | REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
| | REG | Reference to a national code | Ref country code: CH; Ref legal event code: EP |
| | REF | Corresponds to: | Ref document number: 60030027; Country of ref document: DE; Date of ref document: 20060928; Kind code of ref document: P |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20061117 |
| | REG | Reference to a national code | Ref country code: CH; Ref legal event code: PL |
| | EN | Fr: translation not filed | |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | Effective date: 20070518 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20070511 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20060816 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20150316; Year of fee payment: 16 |
| | GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20160324 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20160324 |