
WO2000056070A1 - Videophone with audio source tracking - Google Patents


Info

Publication number
WO2000056070A1
WO2000056070A1 (PCT/US2000/007384)
Authority
WO
WIPO (PCT)
Prior art keywords
user
input device
video
audio input
mouth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2000/007384
Other languages
French (fr)
Inventor
Daisuke Terasawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority to AU40168/00A
Publication of WO2000056070A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display

Definitions

  • the audio input device 106 is a single, highly-directional microphone or other audio input device.
  • the audio input device 106 is mounted so as to pivot in the X and Y directions.
  • Motors 144 control the pivotal position of the audio input device 106.
  • the control signals generated by the image processor 138 are used to control the position of the motors 144 and in turn control the directivity pattern 122 (see FIG. 2) of the audio input device 106.
  • the motors 144 may be low-power motors or solid-state devices that minimize power consumption. If the system 100 is battery powered, the alternative embodiments described above, which use an array of microphones, may be used instead.
  • the system 100 illustrated in FIG. 5 may be line powered.
  • the display 116 may be a conventional video cathode ray tube (CRT) display.
  • the video input device 118 may be contained within the housing 102 or may be a separate device, such as a video camera mounted atop the housing 102, or mounted in another convenient place.
  • the audio output device 104 may also be contained within the housing 102 or a separate device, as illustrated in FIG. 5.
  • the audio input device 106 may also be contained within the housing 102 or externally mounted. An externally mounted audio input device may be particularly useful with an array of microphones.
  • the single microphone and motors 144 may be contained within the housing 102 to minimize exposure to dirt and dust.
  • the system 100 illustrated in FIG. 5 may also be a wireless communication system, as illustrated in the functional block diagram of FIG. 3. However, the system 100 may also include a connector 150 to hardwire the system to a network 152, such as a switched telephone network.
  • the operation of the system 100 advantageously allows the audio input device 106 to track the position of the user's mouth and thereby improve detection of the user's voice. If the user stands up or sits down, the video input device 118 detects the changes in the position of the user's mouth and generates appropriate signals related to such movement. The direction and amount of movement is determined by the image processor 138 and appropriate control signals are sent to the directional control circuit 140 so as to alter the directivity pattern 122 of the audio input device 106 in a corresponding manner. Similarly, if the user moves to the left or right, such movement is detected by the video input device 118 and the amount and direction of movement is determined by the image processor 138. The control signals generated by the image processor 138 are provided to the directional control circuit 140. Thus, the system 100 utilizes video image pattern recognition and image tracking technology to track the position of the user's mouth and generates appropriate control signals whereby the audio input device tracks the user's mouth.
  • FIG. 1 illustrates the use of a stand-alone wireless communication device while FIG. 5 illustrates a line-powered video teleconferencing device.
  • the principles of the present invention may be readily implemented in either embodiment.
  • a multitude of known components may be used for the audio input and output devices and for the video input and output devices.
  • a variety of known technologies may be used to direct the directivity pattern of the audio input device in the direction of the user's mouth. Therefore, the present invention is to be limited only by the appended claims.
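The motor-driven embodiment above pivots a single directional microphone about two axes under control signals from the image processor. A minimal sketch of that control translation, assuming the image processor reports the mouth's drift as a pixel offset and the motors accept angular step commands (the function name, scale factor, and step limit are all hypothetical, not taken from the patent):

```python
def pan_tilt_steps(offset, degrees_per_pixel=0.1, max_step_deg=5.0):
    """Convert the image processor's (dy, dx) pixel offset into clamped
    pan/tilt adjustments, in degrees, for the microphone motors."""
    def clamp(v):
        return max(-max_step_deg, min(max_step_deg, v))
    dy, dx = offset
    return clamp(dx * degrees_per_pixel), clamp(dy * degrees_per_pixel)

# A small drift maps linearly; a large jump is limited to one motor step.
small = pan_tilt_steps((10, -20))   # -> (pan, tilt) = (-2.0, 1.0)
large = pan_tilt_steps((1000, 0))   # tilt clamped to the 5-degree step limit
```

Clamping each update keeps a sudden tracking error (for example, a momentary misdetection) from slewing the microphone violently, at the cost of taking several frames to catch up after a large movement.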

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A video teleconferencing device (100) uses pattern recognition technology and image tracking technology to generate control signals indicative of the relative position of the user's mouth with respect to an audio input device (106). The directionality of the audio input device (106) is altered to compensate for changes in the position of the user's mouth. A self-contained wireless version of the device includes a transmitter and receiver. A desktop version of the device may be wireless or may include a connector to provide a network connection.

Description

VIDEOPHONE WITH AUDIO SOURCE TRACKING
FIELD OF THE INVENTION
The present invention is related generally to a videophone and, more particularly, to a videophone having a tracking audio input device.
BACKGROUND OF THE INVENTION
Wireless communication devices, such as cellular telephones and other personal communication devices are widely used as a supplement to, or replacement for, conventional telephone systems. In addition to functioning as a replacement for a conventional telephone, wireless communication devices offer the advantage of portability, thus enabling the user to establish a wireless communication link between virtually any two locations on earth.
In addition to conventional voice communication, wireless communication devices also provide features such as voicemail. Other, more advanced wireless communication devices offer the opportunity for video teleconferencing. These "videophones" include a small, solid-state camera and a video display that enable the user to conduct a video teleconference via a wireless communication link. Operation of such a videophone requires the device be located at some distance from the user to enable the solid-state video input device to capture an image of the user. At such distances, audio reception becomes more difficult. Therefore, it can be appreciated that there is a significant need for a system and method that will allow satisfactory operation of the audio portion of the system. The present invention provides this and other advantages as will be apparent from the following detailed description and accompanying figures.
SUMMARY OF THE INVENTION
The present invention is directed to an audiovisual communication device and comprises a receiver to receive data, including video data, from a location remote from the communication device. A display is coupled to the receiver to display video images corresponding to the received video data. A video input device is provided to sense a video image of a user and to generate video data corresponding to the sensed video image. An audio input device is also provided to sense speech signals from the user. The audio input device is responsive to control signals to orient a directional sensitivity of the audio input device toward the mouth of the user. The system further comprises an image recognition processor to analyze the generated video data and thereby identify and track the position of the user's mouth. The image recognition processor generates control signals for the audio input device to orient the directional sensitivity of the audio input device toward the mouth of the user.
The received data may also include audio data. The device can include an audio output device to provide audible signals relating to the received audio data. The device may also include a transmitter to transmit electrical signals generated by the audio input device and may also transmit the video data generated by the video input device.
The device may further comprise a connector that couples the device to a network, such as a public switched telephone network. Alternatively, the receiver and transmitter may be wireless devices that receive and transmit data via a wireless connection. In one embodiment, the wireless communication device may be contained within a housing sized to be held in the hand of the user.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a perspective view of a wireless communication device having audio and video capabilities.
FIG. 2 illustrates the operation of the wireless communication device of FIG. 1 to capture a video image of the user and to direct the audio input device toward the user's mouth.
FIG. 3 is a functional block diagram of an exemplary embodiment of the wireless communication device of the present invention.
FIG. 4 illustrates the operation of a directional audio input device that is maneuvered to direct the audio input device toward the user's mouth.
FIG. 5 is a perspective view of a desktop version of the communication device of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention is directed to a technique for adjusting an audio input device so as to track the position of the user's mouth and thereby enhance the detection of the user's voice. The system described herein operates in conjunction with a video input device and uses image processing technology to track the position of the user's mouth and thereby direct the audio input device in a manner that tracks the position of the user's mouth. Although described herein as a cellular telephone, those skilled in the art can appreciate that the present invention is applicable to other forms of communication, such as PCS, mobile radio telephones, mobile radio, and the like.
The present invention is embodied in a wireless communication system 100 illustrated in FIG. 1. The system 100 includes a housing 102 that contains many components, such as an audio output device 104, an audio input device 106, and a keypad 110. A transmitter 128 and receiver 130 (see FIG. 3) are also contained within the housing 102. The transmitter 128 and receiver 130 are coupled to an antenna 112 illustrated in FIG. 1 in the extended or operational position.
The system 100 also includes a display 116 and a video input device 118. In an exemplary embodiment, the display 116 is a liquid crystal display (LCD). Unlike many conventional wireless communication devices, the display 116 is a high resolution color display to allow the display of video images received by the receiver 130 (see FIG. 3). The video input device 118 may be a conventional vidicon or charge-coupled device (CCD) to detect the image of the user. As will be described in greater detail below, image processing technology is used to track the position of the user's mouth within the detected image and to generate control signals related thereto. The control signals are used to direct the audio input device 106.
The principles of operation of the system 100 may be more readily understood with respect to FIG. 2. It should be noted that FIG. 2 is not drawn to scale and does not accurately represent the relative size and position of the user's head with respect to the system 100. However, FIG. 2 is provided merely to illustrate the fundamental principles of operation of the system 100. In normal operation, the system 100 may be conveniently contained in the user's hand and held at arm's length from the user's head. This allows the video input device 118 to have a sufficient area of coverage 120 that includes the user's entire head. When held at arm's length, the user's arm may become fatigued and shake, resulting in a wobbly picture. Known technologies may be readily implemented with the system 100 to provide a stabilized video image. Another drawback of extended use is that the shaking hand of the user causes the position and orientation of the audio input device 106 to vary with respect to the user's mouth, resulting in voice dropout or other unreliable operation of the audio portion of the system.
To overcome this problem, the system 100 tracks the position of the user's mouth and generates control signals related thereto. For example, image processing and pattern recognition technologies can be used to identify the initial location of the user's mouth and to track changes in the position of the user's mouth with respect to the system 100. The image processing system generates control signals that are used to direct the audio input device 106 in the direction of the user's mouth to maximize the sensitivity of the audio input device in the direction of the user's mouth. The audio input device 106 is steerable (either electronically or mechanically) so that it provides a directivity pattern 122, as illustrated in FIG. 2. Various forms of the audio input device 106 are described below. The system 100 is illustrated in the functional block diagram of
FIG. 3 and includes a central processing unit (CPU) 124, which controls operation of the system. A memory 126, which may include both read-only memory (ROM) and random access memory (RAM), provides instructions and data to the CPU 124. A portion of the memory 126 may also include non-volatile random access memory (NVRAM).
Also contained within the housing 102 is the transmitter 128 and receiver 130 to allow transmission and reception of data, such as audio and video communications, between the system 100 and a remote location, such as a cell site controller (not shown). The transmitter 128 and receiver 130 may be combined into a transceiver 132. The antenna 112 is attached to the housing 102 and electrically coupled to the transceiver 132. Although the antenna 112 illustrated in FIG. 1 is extended from the housing 102, this is not necessary for satisfactory operation of the system 100. The antenna 112 may be a fixed antenna extending from the housing 102 or may be contained completely within the housing. The operation of the transmitter 128, receiver 130, and antenna 112 is well known in the art and need not be described herein.
As previously discussed, the audio output device 104 operates in a conventional manner to provide audio signals to the user. The audio output device 104 may be a conventional speaker or, alternatively, may be a headset worn by the user.
The display 116 may be used to display alphanumeric data as well as the video images. The receiver 130 receives data, which may include audio data, alphanumeric digital data, and video data. Those skilled in the art can appreciate that the three forms of data described above may all be transmitted as digitized data. That is, the audio data, video data, and alphanumeric data may each be digitized and transmitted to the system 100 in a well-known fashion. The receiver 130 may conveniently receive and demodulate the data to recover the audio, alphanumeric, and video data. The video data may be further processed for use with the display 116.
The system 100 also generates video data using the video input device 118. The video input device 118 can be any form of video device, such as a vidicon tube, charge coupled device (CCD) or the like. The present invention is not limited by the specific form of the video input device 118.
For operation as a video teleconferencing device, the video signal generated by the video input device 118 may be transmitted by the transmitter 128 to a location remote from the system 100.
In addition, the signals generated by the video input device 118 are analyzed by an image processor 138. The image processor 138 tracks the position of the user's mouth within the image and generates control signals related thereto. As will be discussed in greater detail below, the control signals generated by the image processor 138 are used to control the directionality of the audio input device 106. Image stabilization circuits are used in some conventional video cameras to stabilize the image even when the camera is shaking. For example, a hand-held camera is subject to vibrations due to the user's inability to hold the camera completely still. However, known image stabilization techniques track the position of a primary object within the field of view and adjust the video signal to compensate for variations in the position of the objects with respect to the video camera. Thus, minor vibrations and jitter are overcome to a large extent by the image stabilization techniques.
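The compensation step such stabilization circuits perform can be sketched in a few lines: once the jitter offset of the primary object has been measured, the frame is shifted back by that offset. This is an illustrative Python sketch, not the patent's circuitry; the function name and zero-padding choice are assumptions.

```python
def stabilize(frame, jitter):
    """Shift a frame (list of pixel rows) opposite to the measured
    jitter offset (dy, dx), padding exposed borders with zeros."""
    dy, dx = jitter
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r + dy, c + dx  # sample from the jittered position
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = frame[sr][sc]
    return out

# A bright pixel that jittered down-right by (1, 1) is restored to its
# original position after compensation.
frame = [[0] * 4 for _ in range(4)]
frame[2][3] = 1
stabilized = stabilize(frame, (1, 1))
```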
Similar techniques may be used by the system 100 to track the location of the user's mouth within the field of view even though the user's mouth may move with respect to the video input device 118 and with respect to the audio input device 106. As the position of the user's mouth changes, the change in position is detected by the video input device 118 and the relative change in position of the user's mouth is determined by the image processor 138. The control signals generated by the image processor 138 are coupled to a directional control circuit 140. The directional control circuit 140 directs the directivity pattern 122 (see FIG. 2) to track the user's mouth and thereby provide more reliable detection of the user's voice.
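One simple way to realize the frame-to-frame tracking step is template matching: the image processor searches a small neighbourhood of the mouth's last known position for the best match to a stored mouth template and reports the displacement as the control signal. The sketch below, with invented names and a sum-of-absolute-differences score, illustrates the idea; the patent does not commit to any particular algorithm.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(a - b) for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def crop(frame, top, left, h, w):
    return [row[left:left + w] for row in frame[top:top + h]]

def track_mouth(frame, template, prev_top, prev_left, search=2):
    """Find the best template match within +/-search pixels of the
    previous position; return the new position and the offset
    (the control signal for the directional control circuit)."""
    h, w = len(template), len(template[0])
    best = None
    for dt in range(-search, search + 1):
        for dl in range(-search, search + 1):
            top, left = prev_top + dt, prev_left + dl
            if top < 0 or left < 0 or top + h > len(frame) or left + w > len(frame[0]):
                continue
            score = sad(crop(frame, top, left, h, w), template)
            if best is None or score < best[0]:
                best = (score, top, left)
    _, top, left = best
    return top, left, (top - prev_top, left - prev_left)

# The mouth patch has drifted one pixel down and one pixel right.
frame = [[0] * 8 for _ in range(8)]
for r in (4, 5):
    for c in (5, 6):
        frame[r][c] = 9
template = [[9, 9], [9, 9]]
top, left, offset = track_mouth(frame, template, prev_top=3, prev_left=4)
```

Restricting the search to a small window around the previous position is what makes per-frame tracking cheap: between consecutive video frames the mouth can only have moved a few pixels.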
The various components described above are coupled together by a bus system 142. The bus system 142 may comprise a control bus, address bus, and status bus, as well as a data bus. For the sake of clarity, the various buses are illustrated in FIG. 3 as the bus system 142. Those skilled in the art will appreciate that some of the components illustrated in FIG. 3 may be implemented by the CPU 124 executing instructions from the memory 126. For example, the image processor 138 and directional control circuit 140 may be implemented by the CPU 124. Alternatively, these components may be independent devices. However, FIG. 3 illustrates each of these elements as a separate component since each performs a separate function.
The specific implementation details of the directional control circuit 140 depend on the form of the audio input device 106. For example, the audio input device 106 may comprise an array of microphones having relatively broad directional response. The directional control circuit 140 may comprise a phase delay circuit such that the combination of the audio input device 106 and directional control circuit 140 forms a phased array microphone assembly. The control signals generated by the image processor 138 are used to set the delay times for individual delay lines and thereby focus the directivity pattern 122 (see FIG. 2) of the microphone array toward the mouth of the user. Such phased array microphone systems are known in the art and are described, by way of example, in "The Phased Array Microphone by Charge Transfer Device" by Minoru Murayama et al., 1981.
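The delay computation behind such a phased array can be sketched in a few lines. This is a simplified illustration under assumed conditions (a uniform linear array and plane-wave arrival), not the patent's circuit:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def steering_delays(n_mics, spacing_m, azimuth_deg):
    """Per-channel delays (seconds) that steer a uniform linear array
    of n_mics microphones, spacing_m apart, toward azimuth_deg
    (0 degrees = broadside).

    A plane wave from that angle reaches microphone i with a path
    offset of i * spacing * sin(angle); delaying the channels that
    hear the wave first brings all channels into alignment.
    """
    theta = math.radians(azimuth_deg)
    arrival = [i * spacing_m * math.sin(theta) / SPEED_OF_SOUND
               for i in range(n_mics)]
    latest = max(arrival)
    # Delay each channel so every arrival lines up with the latest one
    return [latest - t for t in arrival]
```

At broadside all delays are zero; steering off-axis assigns the longest delay to the microphone the wavefront reaches first.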
Alternatively, the audio input device 106 may comprise a plurality of microphones that have an enhanced directional selectivity using beamforming technologies, such as described in "A Self-Steering Digital Microphone Array," by Walter Kellermann, IEEE, 1991, and in "Calibration, Optimization and DSP Implementation of Microphone Array for Speech Processing," by A. Wang et al., IEEE, 1996. The image processor 138 generates beamforming control signals that are used to focus the directivity pattern 122 (see FIG. 2) toward the user's mouth and thereby track changes in the position of the user's mouth.
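A delay-and-sum beamformer of the kind described in such references can be illustrated with a minimal sketch. This version rounds delays to whole samples for clarity; practical implementations use fractional-delay filters:

```python
def delay_and_sum(channels, delays_s, sample_rate):
    """Apply per-channel steering delays (rounded to whole samples)
    and average the channels: sound from the steered direction adds
    coherently, while sound from other directions adds incoherently
    and is attenuated."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, delay in zip(channels, delays_s):
        shift = round(delay * sample_rate)
        for i in range(n):
            j = i - shift
            if 0 <= j < n:
                out[i] += ch[j]
    return [v / len(channels) for v in out]
```

With zero delays the output is simply the channel average; a one-sample delay on the leading channel re-aligns an impulse that arrived one sample early.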
In yet another alternative embodiment, illustrated in FIG. 4, the audio input device 106 is a single, highly directional microphone or other audio input device. In this embodiment, the audio input device 106 is mounted so as to pivot in the X and Y directions. Motors 144 control the pivotal position of the audio input device 106. In this embodiment, the control signals generated by the image processor 138 are used to control the position of the motors 144 and in turn control the directivity pattern 120 (see FIG. 2) of the audio input device 106. Those skilled in the art will appreciate that it is desirable to minimize the power consumed by the motors 144 if the system 100 is battery powered. The motors 144 may be low-power motors or solid-state devices that minimize power consumption. Alternatively, if the system 100 is battery powered, one of the embodiments described above that use an array of microphones may be preferred.
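One way to keep such motors frugal with power is to slew the pivot by a bounded amount per update rather than jumping directly to the target angle. The following sketch is an assumption about how that could be done, not the patent's control scheme; the step size is illustrative:

```python
import math

def step_toward(current_deg, target_deg, max_step_deg=2.0):
    """One motor update: move the pivot at most max_step_deg toward
    the target angle, bounding motor activity per update (and hence
    battery drain) while still converging on the target."""
    delta = target_deg - current_deg
    if abs(delta) <= max_step_deg:
        return target_deg
    return current_deg + math.copysign(max_step_deg, delta)
```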
Although described above as a wireless communication device that may be battery operated, the principles of the present invention may be extended to other forms of video teleconferencing devices, such as that illustrated in FIG. 5. The system 100 illustrated in FIG. 5 may be line powered. In this embodiment, the display 116 may be a conventional cathode ray tube (CRT) display. The video input device 118 may be contained within the housing 102 or may be a separate device, such as a video camera mounted atop the housing 102 or in another convenient place. The audio output device 104 may also be contained within the housing 102 or be a separate device, as illustrated in FIG. 5. The audio input device 106 may likewise be contained within the housing 102 or externally mounted. An externally mounted audio input device may be particularly useful with an array of microphones.
Alternatively, the single microphone and motors 144 (see FIG. 4) may be contained within the housing 102 to minimize exposure to dirt and dust. The system 100 illustrated in FIG. 5 may also be a wireless communication system, as illustrated in the functional block diagram of FIG. 3. However, the system 100 may also include a connector 150 to hardwire the system to a network 152, such as a switched telephone network.
The operation of the system 100 advantageously allows the audio input device 106 to track the position of the user's mouth and thereby improve detection of the user's voice. If the user stands up or sits down, the video input device 118 detects the changes in the position of the user's mouth and generates appropriate signals related to such movement. The direction and amount of movement is determined by the image processor 138 and appropriate control signals are sent to the directional control circuit 140 so as to alter the directivity pattern 120 of the audio input device 106 in a corresponding manner. Similarly, if the user moves to the left or right, such movement is detected by the video input device 118 and the amount and direction of movement is determined by the image processor 138. The control signals generated by the image processor 138 are provided to the directional control circuit 140. Thus, the system 100 utilizes video image pattern recognition and image tracking technology to track the position of the user's mouth and generates appropriate control signals whereby the audio input device tracks the user's mouth.
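The behavior described above, with the steering direction continually converging on the mouth as the user moves, can be sketched as a simple proportional update per video frame. The gain and field-of-view values here are illustrative assumptions, not parameters from the patent:

```python
def track_step(offset_norm, steer_deg, fov_deg=60.0, gain=0.5):
    """One iteration of the tracking loop: the image processor reports
    the mouth's normalized horizontal offset (-0.5 .. 0.5) and the
    steering angle is nudged a fraction of the way toward the
    corresponding view angle."""
    target_deg = offset_norm * fov_deg
    return steer_deg + gain * (target_deg - steer_deg)
```

Iterating `track_step` with a fixed offset converges geometrically on the corresponding angle, so the directivity pattern settles on the mouth within a few frames and follows it smoothly as it moves.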
It is to be understood that although various embodiments and advantages of the present invention have been set forth in the foregoing description, the above description is illustrative only, and changes may be made in detail while remaining within the broad principles of the invention. For example, FIG. 1 illustrates the use of a stand-alone wireless communication device while FIG. 5 illustrates a line-powered video teleconferencing device. The principles of the present invention may be readily implemented in either embodiment. In addition, a multitude of known components may be used for the audio input and output devices and for the video input and output devices. Furthermore, a variety of known technologies may be used to direct the directivity pattern of the audio input device in the direction of the user's mouth. Therefore, the present invention is to be limited only by the appended claims.
Claims

What is claimed is:
1. A wireless audiovisual communication device comprising:
a housing sized to be held in the hand of a user;
a receiver to receive data, including video data, from a location remote from the communication device;
a display mounted to the housing to display video images corresponding to the received video data;
a video input device mounted to the housing to sense a video image of the user and to generate video data corresponding to the sensed video image;
an audio input device mounted to the housing to sense speech signals from the user and to generate electrical signals related thereto, the audio input device having a controllable sensitivity in a selected direction; and
a directional control circuit to orient the selected direction toward the mouth of the user.
2. The device of claim 1, further including an image recognition processor to analyze the generated video data to thereby identify and track the position of the user's mouth, the image recognition processor generating control signals for the directional control circuit to orient the selected direction toward the mouth of the user.
3. The device of claim 1 wherein the audio input device is a phased array audio input device and the directional control circuit supplies phase control signals to orient the selected direction toward the mouth of the user.
4. The device of claim 3 wherein the phased array audio input device comprises a plurality of microphones and the directional control circuit generates phase delay control signals to orient the selected direction toward the mouth of the user.
5. The device of claim 1 wherein the audio input device comprises a plurality of microphones and the directional control circuit comprises a beamforming circuit that generates control signals to orient the selected direction toward the mouth of the user.
6. The device of claim 1, further comprising a transmitter to transmit the electrical signals generated by the audio input device.
7. The device of claim 6 wherein the transmitter further transmits the video data generated by the video input device.
8. The device of claim 7 wherein the image recognition processor analyzes the video data generated by the video input device and generates an image stabilized video image and the transmitter transmits the stabilized video image.
9. An audiovisual communication device comprising:
a receiver to receive data, including video data, from a location remote from the communication device;
a display coupled to the receiver to display video images corresponding to the received video data;
a video input device mounted to sense a video image of a user and to generate video data corresponding to the sensed video image;
an audio input device mounted to sense speech signals from the user, the audio input device being responsive to control signals to orient a directional sensitivity of the audio input device toward the mouth of the user; and
an image recognition processor to analyze the generated video data to thereby identify and track the position of the user's mouth, the image recognition processor generating control signals for the audio input device to orient the directional sensitivity toward the mouth of the user.
10. The device of claim 9, further comprising an audio output device coupled to the receiver, the received data also including received audio data that is provided to the audio output device.
11. The device of claim 9, further comprising a transmitter to transmit the electrical signals generated by the audio input device.
12. The device of claim 11 wherein the transmitter further transmits the video data generated by the video input device.
13. The device of claim 9, further comprising a connector to couple the audiovisual communication device to a network, the receiver receiving the video data via the network.
14. The device of claim 13 wherein the network is a switched telephone network.
15. The device of claim 9 wherein the receiver is a wireless receiver and receives the data from the remote location via a wireless connection.
16. A method for communicating with an audiovisual communication device, the method comprising:
receiving data, including video data, from a location remote from the communication device;
displaying video images corresponding to the received video data;
sensing a video image of a user and generating video data corresponding to the sensed video image;
analyzing the generated video data to thereby identify and track the position of the user's mouth and generating control signals relating to the position of the user's mouth;
orienting a directional sensitivity of an audio input device toward the mouth of the user in response to the control signals; and
sensing speech signals from the user using the audio input device and generating electrical signals related thereto.
17. The method of claim 16 wherein the received data also includes audio data, the method further comprising generating audible signals based on the received audio data.
18. The method of claim 16, further comprising transmitting the electrical signals generated by the audio input device.
19. The method of claim 16, further comprising transmitting the generated video data.
20. The method of claim 16, further comprising coupling the audiovisual communication device to a network, the received data being received via the network.
21. The method of claim 20 wherein the network is a switched telephone network.
22. The method of claim 16 wherein the receiver is a wireless receiver and receives the data from the remote location via a wireless connection.
23. The method of claim 16 wherein the audio input device is a phased array audio input device and orienting the directional sensitivity comprises generating phase delay control signals to orient the directional sensitivity toward the mouth of the user.
24. The method of claim 16 wherein the audio input device comprises a plurality of microphones and orienting the directional sensitivity comprises generating beamforming signals to orient the directional sensitivity toward the mouth of the user.
PCT/US2000/007384 1999-03-18 2000-03-20 Videophone with audio source tracking Ceased WO2000056070A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU40168/00A AU4016800A (en) 1999-03-18 2000-03-20 Videophone with audio source tracking

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US27206599A 1999-03-18 1999-03-18
US09/272,065 1999-03-18

Publications (1)

Publication Number Publication Date
WO2000056070A1 (en) 2000-09-21

Family

ID=23038251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/007384 Ceased WO2000056070A1 (en) 1999-03-18 2000-03-20 Videophone with audio source tracking

Country Status (2)

Country Link
AU (1) AU4016800A (en)
WO (1) WO2000056070A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0751473A1 (en) * 1995-06-26 1997-01-02 Lucent Technologies Inc. Locating features in an image
EP0796026A2 (en) * 1996-03-15 1997-09-17 AT&T Corp. Multimedia terminal cover hinge
WO1997043856A1 (en) * 1996-05-16 1997-11-20 Unisearch Limited Compression and coding of audio-visual services
EP0836324A2 (en) * 1996-10-09 1998-04-15 PictureTel Corporation Integrated portable videoconferencing
JPH10164401A (en) * 1996-11-29 1998-06-19 Hitachi Ltd Display device of portable information communication device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN vol. 1998, no. 11 30 September 1998 (1998-09-30) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002041632A1 (en) * 2000-11-16 2002-05-23 Telefonaktiebolaget Lm Ericsson (Publ) Recording of moving images
GB2386023A (en) * 2000-11-16 2003-09-03 Ericsson Telefon Ab L M Recording of moving images
GB2386023B (en) * 2000-11-16 2005-02-02 Ericsson Telefon Ab L M Recording of moving images
US7271827B2 (en) 2000-11-16 2007-09-18 Telefonaktiebolaget Lm Ericsson (Publ) System and method for recording moving images
KR100894010B1 (en) 2000-11-16 2009-04-17 텔레폰악티에볼라겟엘엠에릭슨(펍) Video recording
US8275147B2 (en) 2004-05-05 2012-09-25 Deka Products Limited Partnership Selective shaping of communication signals
CN107067414A (en) * 2008-07-31 2017-08-18 诺基亚技术有限公司 Electronic device directional audio-video capture
CN107067414B (en) * 2008-07-31 2021-05-25 诺基亚技术有限公司 Electronic device directional audio video capture
US10939207B2 (en) 2017-07-14 2021-03-02 Hewlett-Packard Development Company, L.P. Microwave image processing to steer beam direction of microphone array

Also Published As

Publication number Publication date
AU4016800A (en) 2000-10-04

Similar Documents

Publication Publication Date Title
CA2391994C (en) Mobile communications
US20040203537A1 (en) Wireless transmission module
US6788332B1 (en) Wireless imaging device and system
US7852369B2 (en) Integrated design for omni-directional camera and microphone array
US7436427B2 (en) Integrated camera stand with wireless audio conversion and battery charging
WO2001011881A1 (en) Videophone device
JP2005176301A (en) Image processing apparatus, network camera system, image processing method, and program
US9479704B2 (en) Apparatus and method for supporting zoom microphone functional in mobile terminal
US6373516B1 (en) Picture position indicator for picture phone
US20030123069A1 (en) Method and device for forming an image in an electronic device
KR101780969B1 (en) Apparatus and method for supproting zoom microphone functionality in portable terminal
WO2000056070A1 (en) Videophone with audio source tracking
JP3804766B2 (en) Image communication apparatus and portable telephone
US10956122B1 (en) Electronic device that utilizes eye position detection for audio adjustment
US7154448B2 (en) Antenna module
US12021906B2 (en) Electronic device with automatic selection of image capturing devices for video communication
JP2005123932A (en) Wireless tag video system
EP1166199B1 (en) Virtual user interface for mobile telecommunications device
JP2004173095A (en) Portable videophone device
KR100413268B1 (en) Mobile Communication Device including Camera-direction Automatic Control Apparatus using Location of Hands-free-Set
TW202203006A (en) Pickup system and pickup device including a pickup, a controller and a turning adjustment mechanism
US20050122404A1 (en) [portable communication device]
JP2001078160A (en) Videophone device and imaging method
JPH08279711A (en) Antenna device
CA2491177A1 (en) Mobile communications

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)