HK1058425B - Method and apparatus for entering data using a virtual input device - Google Patents
- Publication number: HK1058425B (application HK04101087.8A)
- Authority: HK (Hong Kong)
- Prior art keywords: user, controllable object, finger, virtual, information
Description
This application claims the priority of the applicant's U.S. provisional patent application serial No. 60/163,445, "Method and Devices for 3D Sensing of Input Commands to Electronic Devices," filed on 4 November 1999. That provisional patent application, assigned to Canesta, Inc., assignee of the present application, is incorporated herein by reference.
Reference is additionally made to the applicant's co-pending U.S. patent application serial No. 09/401,059, "CMOS-Compatible Three-Dimensional Image Sensor IC," filed 22 September 1999, which is assigned to the same assignee, Canesta, Inc. That co-pending U.S. patent application is also incorporated herein by reference.
Technical Field
The present invention relates generally to the entry of commands and/or data (collectively referred to herein as "data") into electronic systems, including computer systems. More particularly, the present invention relates to a method and apparatus for inputting data when the form factor of the computing device makes it impractical to use a normally sized input device such as a keyboard, or when the distance between the computing device and the input device makes it inconvenient to use a conventional input device coupled to the computing device through a cable.
Background
Computer systems that receive and process input data are well known in the art. Typically such a system comprises a Central Processing Unit (CPU), a persistent read-only memory (ROM), a Random Access Memory (RAM), at least one bus interconnecting the CPU and the memory, at least one input port to which devices are coupled for inputting data and commands, and an output port to which a monitor is coupled for displaying results. Traditional data entry technologies include keyboards, mice, joysticks, remote controls, electronic pens, touch screens or pads or display screens, switches and knobs, and more recently the use of handwriting recognition and voice recognition.
In recent years, computer systems and computer-like capabilities have found their way into a new generation of electronic devices, including interactive TVs, set-top boxes, electronic cash registers, synthetic music generators, hand-held portable devices such as so-called Personal Digital Assistants (PDAs), and wireless telephones. Conventional input methods and devices are not always suitable or convenient for use with such systems.
For example, some portable computer systems have shrunk to the point where the entire system can fit in a user's hand or pocket. To overcome the difficulty of viewing a small display, commercially available virtual display accessories can be used that clip onto an eyeglass frame worn by the system user. The user views the accessory (which may be a 1-inch VGA display) as if it were a large display measuring perhaps 15 inches diagonally.
Research has shown that a keyboard and/or mouse-like input device may be the most efficient technique for entering or editing data in a companion computer or computer-type system. Unfortunately, the problems associated with small input devices are difficult to overcome, because a smaller input device can significantly slow the rate at which data is entered. For example, some PDA systems have keyboards measuring about 3 inches by 7 inches. Although data and commands can be entered into the PDA through such a keyboard, input is slower and less comfortable than with a standard-size keyboard measuring about 6 inches by 12 inches. Other PDA systems eliminate the keyboard altogether and provide a touch screen on which the user writes alphanumeric characters with a stylus. Handwriting-recognition software within the PDA then attempts to interpret and recognize the alphanumeric characters drawn on the touch screen by the user. Some PDAs display an image of a keyboard on the touch screen and allow the user to enter data by touching the images of individual keys with the stylus. In still other systems, the distance between the user and the computer system makes it inconvenient to use a wire-coupled input device; for example, the distance between a user and a set-top box in a living room makes it impractical to browse using a wire-coupled mouse.
Another approach to inputting data and commands to an electronic device is to capture images of the user's actions and gestures, which are then interpreted and translated into commands for a companion computer system. One such method is described in U.S. Pat. No. 5,767,842 to Korth, "Method and Device for Optical Input of Commands or Data." Korth teaches placing an imaginary or virtual keyboard, such as a keyboard-sized stencil or a piece of paper bearing a printed outline of keys, in front of the computer system user. The template is used to guide the user's fingers in striking the virtual keyboard keys. As the user "types" on the virtual keyboard, a conventional TV (two-dimensional) video camera focused on the virtual keyboard purportedly permits identification, in some fashion, of which virtual key (e.g., which printed outline of a key) the user's finger is touching.
There is an inherent uncertainty in the Korth approach because it relies on relative luminance data and, indeed, on an adequate source of ambient lighting. Although the video signal output by a conventional two-dimensional video camera is in a format well suited to image recognition by the human eye, that output is not well suited to computer recognition of the viewed image. For example, in a Korth-type application, in order to track the position of the user's fingers, a computer-executed program must use changes in pixel luminance in the video camera output signal to determine the profile of each finger. Such tracking and contour determination is difficult to accomplish when the background color or illumination cannot be accurately controlled and may in fact resemble the user's fingers. Further, each pixel of each video frame acquired by Korth, typically at least 100 pixels by 100 pixels, carries only a gray-scale or color-level value (commonly referred to as RGB). Given the limitations of such RGB data, a microprocessor or signal processor in a Korth system can at best detect the outline of a finger against the background image, and then only if ambient lighting conditions are favorable.
The viewing-geometry problems are as significant as the uncertainty in tracking the user's fingers. Because a conventional video camera outputs two-dimensional image data and provides no deterministic information about the actual shape of, and distance to, objects in the video scene, the Korth technique cannot avoid ambiguity. In practice, keystroke motion along the optical axis of the camera lens is difficult to detect from the vantage point of the Korth video camera. Thus, to adequately capture complex keystroke movements, multiple cameras with different vantage points would be required. In addition, as suggested by Korth's FIG. 1, it is difficult to obtain unobstructed views of each finger of both of the user's hands; for example, the right middle finger may block the camera's view of the right index finger, and so on. In short, even with good ambient lighting and a good camera vantage point, the Korth method suffers from a number of drawbacks, including uncertainty as to which row of the virtual keyboard a user's finger is touching.
In an attempt to obtain depth information, the Korth approach might be replicated using multiple two-dimensional video cameras, each aimed at the object of interest from a different viewpoint. This approach sounds simple but is not practical. Multiple cameras are cumbersome and expensive to install, and each camera must be accurately calibrated relative to the object viewed and relative to the other cameras. To obtain adequate accuracy, stereo cameras might have to be placed at the upper left and upper right of the keyboard. Even with such an arrangement, however, some fingers would still be occluded in the field of view of at least one camera. Furthermore, the computation required to generate three-dimensional information from the two-dimensional image output of each camera adds to the processing overhead of the computer system handling the image data. It will be appreciated that the use of multiple cameras greatly complicates Korth's signal-processing requirements. Finally, it is difficult to obtain the camera-to-object distance resolution needed to detect and recognize small object movements, such as the slight movements of a user's fingers during keystrokes.
In summary, it is not practical to use the Korth method of examining luminance-based two-dimensional video images of a user's hands during keystrokes and determining from those images when and which finger touched which key (virtual or otherwise). This drawback remains even when the required two-dimensional video processing is augmented by computerized image pattern recognition as proposed by Korth. Nor is the Korth technique practical for portable use. For example, the image acquisition system and an ambient light source would have to be on at essentially all times, consuming enough operating power to make meaningful battery operation impossible. Even if Korth could reduce the frame rate of data acquisition to save some power, a Korth system would still require an adequate source of ambient lighting.
Beyond power considerations, the two-dimensional imaging system of Korth does not lend itself to a portable implementation for small companion devices such as cellular telephones, because Korth's video camera (or cameras) must be located at a vantage point above the keyboard. This requirement places a lower limit on the physical size of a Korth system, whether the system is operating or is stowed for transport.
What is needed is a method and system by which a user can enter data into a paired computing system using a virtual keyboard or other virtual input device that is not electrically connected to the computing system. The data input interface simulation implemented by such methods and systems should provide meaningful three-dimensional acquisition information as to which finger of the user touches which key (or other symbol) on the virtual input device in what time sequence, preferably without having to use multiple image acquisition devices. Such a system should preferably include signal processing so that the system output can be used directly as input by the companion computing system in scan mode or other format. Finally, such a system should be portable and easy to install and operate.
Disclosure of Invention
The present invention provides such a method and system.
According to an aspect of the present invention, there is provided a method of user interaction with a virtual input device by using a user-controllable object, the method comprising the steps of:
(a) providing a sensor capable of obtaining position coordinate information of a relative position of at least a portion of the user-controllable object with respect to a work surface on which the virtual input device is defined;
(b) processing information obtained by the sensor to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and if so, which function of the virtual input device is associated with the location;
(c) outputting the information processed in step (b) to the pairing device.
According to another aspect of the present invention, there is provided a system for use with a pairing device for receiving digital input provided by a user manipulating a user-controllable object relative to a virtual input device, comprising:
a sensor capable of obtaining positional coordinate information of at least a portion of the user-controllable object relative to a work surface on which the virtual input device is defined such that the user inputs information into the companion device using the user-controllable object;
a processor that processes information obtained by the sensor to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and if so, which function of the virtual input device is associated with the location; and
wherein the processor outputs digital information commensurate with the location of the touch to the pairing device.
According to yet another aspect of the present invention, there is provided a system for allowing a user to interact with a virtual input device by manipulating a user-controllable object, comprising:
a sensor array capable of collecting positional information of a relative position of at least a portion of the user-controllable object with respect to a work surface on which the virtual input device is defined;
a processor that processes information obtained by the sensor array to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and if so, which function of the virtual input device is associated with the location; and
a pairing device coupled to receive digital information output from the processor commensurate with the location of the touch.
The present invention enables a user to input commands and data (collectively referred to as data) from a passive virtual simulation of a manual input device into a companion computer system, which may be a PDA, a wireless telephone, or any electronic system or appliance adapted to receive digital input signals. The present invention includes a three-dimensional sensor imaging system that functions, even in the absence of ambient light, to collect in real time three-dimensional data relating to the placement of the user's fingers on a substrate that bears or displays a template simulating an input device such as a keyboard, a numeric keypad, or a digitizing surface. The substrate is preferably passive and may be a foldable or rollable sheet of paper or plastic containing a printed image of keyboard keys, or simply lines marking the rows and columns where keyboard keys would be located. The substrate may be defined to lie in a horizontal X-Z plane in which the Z-axis defines the rows of template keys, the X-axis defines the columns of template keys, and the Y-axis represents vertical height above the substrate. If desired, instead of a substrate keyboard, the invention may include a projector that uses light to project an image of a grid or keyboard onto the work surface in front of the companion device. The projected pattern may serve as a guide for the user when "typing" on the surface. The projection device is preferably contained within or attached to the companion device.
Alternatively, the substrate used as a typing guide may be eliminated. Instead, as the user types alphanumeric characters on a desktop or other work surface in front of the companion device, the screen of the companion device is used to display those characters. For users who are not touch typists, the present invention may instead (or in addition) display an image representing the keyboard "keys" as the user "presses" or "strikes" them. The "key" perceived to be directly under the user's finger may be highlighted in the displayed image in one color, while a "key" perceived to have actually been triggered may be highlighted in another or a contrasting color. This configuration permits the user to type on a work surface in front of the companion device or on a virtual keyboard. As the user types on the work surface or virtual keyboard, the corresponding text preferably appears in a text field displayed on the companion device.
Thus, various forms of feedback may be used to guide the user's virtual typing. The three-dimensional sensor system determines which fingers of the user's hands strike which virtual keys (or virtual key positions) in what time order. The three-dimensional sensor system preferably includes a signal processor comprising a central processing unit (CPU) and associated read-only memory (ROM) and random-access memory (RAM). Stored in the ROM are software routines executable by the signal-processor CPU such that three-dimensional positional information is received and converted substantially in real time into key-scan data or other data formats directly compatible as input to the companion computer system. The three-dimensional sensor preferably emits light of a specific wavelength and detects the time of flight of the energy reflected back from different surface regions of the objects being scanned, e.g., the user's hands.
At the beginning of a typing session, the user will place his or her fingers near or on the work surface or virtual keyboard (if one is present). The present invention remains in a standby, low-power mode until the user or some other object comes within the imaging range of the three-dimensional sensor. In the standby mode, the repetition rate of the emitted optical pulses is reduced to perhaps one pulse per second to conserve operating power, an important consideration if the present invention is battery powered. Thus, the present invention emits relatively few pulses but can still acquire image data, albeit with coarse or lower Z-axis resolution. Alternatively, other methods of reducing the acquisition frame rate and resolution to conserve energy may be used. Such low-resolution information is at least sufficient to alert the present invention to the presence of an object within the imaging field of view. When an object is indeed within the imaging field of view, the CPU governing operation of the present invention commands entry into a normal operating mode in which a higher pulse rate is used and the system now operates at full power. To conserve operating power, the present invention powers down, returning to the standby mode, when the user's fingers or other potentially relevant objects leave the imaging field of view. Such a power reduction preferably also occurs when the relevant object remains stationary for longer than a threshold period of time.
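The standby/active switching described above can be summarized as a small control loop. The following sketch is illustrative only: the roughly one-pulse-per-second standby rate comes from the description, while the active rate, the idle timeout, and the helper callables (acquire_frame, object_in_view, object_moved) are assumptions, not the patented implementation.

```python
import time

STANDBY_PULSE_HZ = 1.0    # coarse, low-power acquisition rate (per the description)
ACTIVE_PULSE_HZ = 30.0    # full-power acquisition rate (assumed)
IDLE_TIMEOUT_S = 10.0     # power down if the object stays still this long (assumed)

def run_sensor(acquire_frame, object_in_view, object_moved):
    """Alternate between standby and active modes based on what the sensor sees."""
    mode = "standby"
    last_motion = time.monotonic()
    while True:
        rate = STANDBY_PULSE_HZ if mode == "standby" else ACTIVE_PULSE_HZ
        frame = acquire_frame(rate)          # coarse frame when the rate is low
        now = time.monotonic()
        if mode == "standby":
            if object_in_view(frame):        # e.g. user's fingers entered the field
                mode, last_motion = "active", now
        else:
            if not object_in_view(frame):
                mode = "standby"             # object left the imaging field
            elif object_moved(frame):
                last_motion = now
            elif now - last_motion > IDLE_TIMEOUT_S:
                mode = "standby"             # stationary too long: conserve power
        time.sleep(1.0 / rate)
```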
Now assume that the user has placed his or her fingers on the home-row keys of the virtual keyboard (e.g., A, S, D, F, J, K, L, and ";"), or, if no virtual keyboard is present, on the surface in front of the companion device with which the invention is practiced. The present invention, already in full-power mode, now preferably initiates a soft key calibration in which the software assigns positions to the keyboard keys based on user input: the user's fingers rest on certain (intended) keys and, from the exact positions of those fingers, the software assigns positions to the keys of the keyboard.
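A minimal sketch of the soft key calibration step, assuming the eight resting fingertips are reported as (X, Z) coordinates and the template keys sit on a known nominal grid; averaging a single translation offset is an illustrative choice, not necessarily the method of the invention.

```python
# Hypothetical soft-key calibration: shift the nominal key layout so that the
# keys under the user's resting fingers coincide with the home-row keys.
HOME_ROW = ["A", "S", "D", "F", "J", "K", "L", ";"]

def calibrate(nominal_key_centers, resting_finger_xz):
    """nominal_key_centers: dict key -> (x, z) in the template's frame.
    resting_finger_xz: eight (x, z) fingertip positions, left to right."""
    n = len(HOME_ROW)
    dx = sum(fx - nominal_key_centers[k][0]
             for k, (fx, _) in zip(HOME_ROW, resting_finger_xz)) / n
    dz = sum(fz - nominal_key_centers[k][1]
             for k, (_, fz) in zip(HOME_ROW, resting_finger_xz)) / n
    # Translate every key center by the average offset of the home-row fingers.
    return {k: (x + dx, z + dz) for k, (x, z) in nominal_key_centers.items()}
```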
The three-dimensional sensor system observes the user's fingers as the user "strikes" the keys shown on the substrate template, or strikes the regions of the work surface in front of the companion device where the "keys" would be if a real keyboard were present. The sensor system outputs data to the companion computer system in a format functionally indistinguishable from the data output by conventional input devices such as a keyboard or mouse. Software executable by the signal processor CPU (or by a CPU in the companion computer system) preferably processes the incoming three-dimensional information and identifies the position of the user's hands and fingers in three-dimensional space relative to the image of a keyboard on the substrate, or relative to the work surface if no virtual keyboard is present.
The software routine preferably identifies the contour of each of the user's fingers in each frame by examining Z-axis discontinuities. The physical interface between a user's finger and the virtual keyboard or work surface is detected when the finger "taps" a key, or taps the region of the work surface where that key would be if a keyboard (real or virtual) were present. The software routine preferably examines the optically acquired data in successive frames and determines these interface boundaries to compute the Y-axis velocity of each finger. (In other embodiments, lower-frequency energy such as ultrasound may be used instead.) When such vertical finger motion ceases, or, depending on the routine, when the finger touches the substrate, the virtual key being pressed is determined from the (Z, X) coordinates of the finger in question. An appropriate KEYDOWN event command may then be issued. The present invention performs a similar analysis on all fingers, including the thumbs, to accurately determine the order in which different keys are touched (e.g., pressed). In this manner, the software issues appropriate KEYUP, KEYDOWN, and scan-code data commands to the companion computer system.
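A sketch of the per-finger keystroke logic just described, using frame-to-frame differencing of the fingertip Y coordinate as the velocity estimate. The thresholds, the frame format, and the key_at lookup are assumptions for illustration; only the KEYDOWN/KEYUP event names and the contact/velocity idea come from the description.

```python
CONTACT_Y = 0.0          # fingertip at the work surface (assumed units: metres)
PRESS_VELOCITY = -0.05   # downward motion per frame fast enough to count as a press

def detect_keystrokes(prev_frame, frame, key_at, pressed):
    """prev_frame, frame: dict finger_id -> (x, y, z); key_at maps (x, z) to a key.
    pressed: set of finger_ids currently holding a key down (acts as hysteresis)."""
    events = []
    for finger, (x, y, z) in frame.items():
        vy = y - prev_frame[finger][1]        # Y-axis velocity estimate
        touching = y <= CONTACT_Y
        if touching and vy <= PRESS_VELOCITY and finger not in pressed:
            events.append(("KEYDOWN", key_at(x, z)))   # finger struck a virtual key
            pressed.add(finger)
        elif not touching and finger in pressed:
            events.append(("KEYUP", key_at(x, z)))     # finger lifted off the key
            pressed.discard(finger)
    return events
```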
The software routine preferably recognizes and corrects for drift of the user's hands while typing, such as drift across the virtual keyboard. The software routine also provides some hysteresis to reduce errors resulting from the user resting a finger on a virtual key without actually "pressing" it. Measurement error is further reduced by observing that, in a typing application, the frame rate required to track Z values is lower than the frame rate required to track X and Y values; that is, finger motion in the Z direction is generally slower than finger motion along the other axes. The present invention must also distinguish between competing fingers striking the keyboard or other work surface. This distinction is best made by observing the X-axis and Y-axis data values at a sufficiently high frame rate, since it is the Y-dimension timing that must be resolved. It is not necessary to distinguish Z-axis observations between different fingers at the same rate, so the Z frame rate can be governed by the speed at which a single finger moves between different keys in the Z dimension. The software routine provided by the present invention preferably averages the acquired Z-axis data over several frames to reduce noise or jitter. Although the effective frame rate of the Z values is thereby reduced relative to the effective frame rate of the X and Y values, the accuracy of the Z values is improved while a meaningful data-acquisition frame rate is still obtained.
The software routine allows the user to switch the companion computer system from an alphanumeric data-entry mode to a graphics mode simply by "typing" a certain key combination, perhaps pressing the Control and Shift keys simultaneously. In the graphics mode, the template simulates a digitizing tablet, and as the user drags a finger across the template, the (Z, X) trajectory of the touched points can be used to enter a line, a signature, or other graphics into the companion computer system.
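The mode switch and trajectory capture could be handled as in the sketch below; the Control-plus-Shift chord follows the description, while the event format and the way touched (Z, X) points are accumulated are assumptions.

```python
def process_mode(events, touching_zx_points, state):
    """events: iterable of ("KEYDOWN", key) tuples from the keystroke detector.
    touching_zx_points: (z, x) samples of fingers currently touching the surface.
    state: dict with "mode" and "trajectory" entries, updated in place."""
    keys_down = {key for kind, key in events if kind == "KEYDOWN"}
    if {"CTRL", "SHIFT"} <= keys_down:                  # chord toggles the mode
        state["mode"] = "graphics" if state["mode"] == "alphanumeric" else "alphanumeric"
        state["trajectory"] = []
    if state["mode"] == "graphics":
        state["trajectory"].extend(touching_zx_points)  # points trace a line or signature
    return state
```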
The display associated with the companion computer system preferably shows, substantially in real time, the alphanumeric or other data entered by the user. In addition to depicting images of the keyboard keys and fingers, the display of the companion computer system may also provide a block cursor showing the alphanumeric character about to be entered. Another form of input feedback can be achieved by forming resilient raised regions under some or all of the keys to provide tactile feedback when a "key" is touched by a user's finger. If appropriate, the companion device can even speak aloud the name of each "pressed" key, letter by letter; for example, the letters "c"-"a"-"t" are spoken as the user types the word "cat". A simpler form of audible feedback is provided by having the companion device emit an electronic keystroke sound when a user's finger is detected pressing a virtual key.
Drawings
Other features and advantages of the present invention will be apparent from the following description, taken in conjunction with the accompanying drawings, in which preferred embodiments are set forth in detail.
FIG. 1A depicts a three-dimensional sensor system for use with a passive matrix keyboard template according to the present invention;
FIG. 1B depicts a three-dimensional sensor system used in the absence of a base keyboard template according to the present invention;
FIG. 1C depicts a companion device display showing an image of a virtual keyboard and of a user's finger touching a virtual key, in accordance with the present invention;
FIG. 1D depicts the display of FIG. 1C further showing text entered by a user on the virtual keyboard, in accordance with the present invention;
FIG. 2A depicts a passive substrate in a partially folded state according to the present invention;
FIG. 2B depicts a passive substrate, bearing a different character set, in a partially rolled state, in accordance with the present invention;
FIG. 3 is a block diagram of an exemplary implementation of a three-dimensional signal processing and sensor system with which the present invention may be practiced;
FIG. 4 is a block diagram of an exemplary single pixel detector with associated photon pulse detector and high speed counter in a three dimensional sensor system with which the present invention may be practiced;
FIG. 5 depicts contour recognition of a user's finger in accordance with the present invention;
FIG. 6 depicts an application of staggered key positions in identifying a virtual key pressed in accordance with the present invention;
FIGS. 7A-7O depict cluster matrices generated from optically acquired three-dimensional data, used in identifying the positions of a user's fingers in accordance with the present invention.
Detailed Description
FIG. 1A depicts a three-dimensional sensor system 10 including a three-dimensional sensor 20 that is focused substantially on the fingers 30 of a user's hands 40 as the fingers "strike" a substrate 50, shown here lying on a desk or other work surface 60. The substrate 50 preferably bears a printed or projected template 70 that includes lines or indicia representing a data-entry device, such as a keyboard. Thus the template 70 may bear printed images of keyboard keys, as shown, but it is understood that these keys are electronically passive and are merely representations of actual keys. Substrate 50 is defined to lie in a Z-X plane in which different points along the X-axis correspond to left-to-right column positions of keys, different points along the Z-axis correspond to front-to-back row positions of keys, and Y-axis positions correspond to vertical distance above the Z-X plane. The (X, Y, Z) positions form a continuum of vector position points, of which the points indicated in FIG. 1A merely illustrate particular axis positions.
If desired, template 70 may simply contain row and column lines indicating where keys would be present. The substrate 50, with the template 70 printed or otherwise displayed on it, is a virtual input device that, in this illustrated example, simulates a keyboard. As such, substrate 50 and/or template 70 may be referred to herein as a virtual keyboard or a virtual device for entering digital data and/or commands. An advantage of this virtual input device is that it can be printed on paper or flexible plastic and folded, as shown in FIG. 2A. The keys need not be arranged in the rectangular array shown (drawn that way for ease of illustrating several fingers) but may be staggered or offset as on a true QWERTY keyboard. FIG. 2B also shows a substrate on which a different set of keys, here Cyrillic characters, is printed as template 70. If desired, one set of keys may be printed on one side of the template and another set of keys, such as English and Russian characters, printed on the other side.
As described with respect to FIGS. 1B-1D, an image of the virtual keyboard may also be displayed on a screen associated with the companion device. In this embodiment the substrate, and indeed the work surface, may be dispensed with if desired, allowing the user to "type" in thin air. This embodiment is particularly flexible in allowing on-the-fly changes of the "keyboard" in use, such as presenting an English keyboard, a German keyboard, a Russian keyboard, a digitizing tablet, and so on. The different keyboards and key sets are simply displayed on the screen 90 associated with the companion device or appliance. It will be appreciated that great flexibility is obtained by presenting alternative key sets, as displayed images of virtual keys bearing different character sets, on the display of a companion device used with the present invention. Thus, in FIG. 1B the virtual keyboard guide is eliminated altogether, further improving portability and flexibility.
In various embodiments, data (and/or commands) that a user enters from virtual keyboard 50 (as shown in FIG. 1A), or from a work surface without a virtual keyboard (as shown in FIG. 1B), are coupled to a companion computer or other system 80. Without limitation, the companion computer system or similar system may be a PDA, a wireless telephone, a laptop PC, a pen-based computer, or any other electronic system into which data is to be input. The folded or rolled-up virtual keyboard can be made small enough to be stored with the PDA or other companion computer system 80 with which it will be used to enter data and commands. For example, when folded the keyboard may measure about 2.5" by 3", and preferably no more than about 8" by 8". The virtual keyboard for a PDA may have a folded form factor that fits in a pocket on the back of the PDA; in use, the virtual keyboard is opened or unrolled to become a full-size virtual keyboard.
When a user enters data into the companion system 80, the display 90 typically present on system 80 can show in real time the data 100 being entered from the virtual keyboard, e.g., text being entered into a PDA, an e-mail being composed on a wireless telephone, and so on. In one embodiment, a block cursor 102 surrounds the display of the single alphanumeric character that the present invention perceives will next be struck, such as the letter "d" in FIG. 1A. Such a visual feedback feature helps the user confirm the accuracy of data entry and provides guidance for repositioning the user's fingers so that the desired character will be struck. The system 80 may emit audible feedback, such as a "key click," as each virtual key is pressed, to give the user further feedback. If desired, passive protrusions 107 may be formed in the virtual keyboard to provide tactile feedback to the user; such protrusions may be, for example, hemispherical bumps formed under individual "keys" of a simulated keyboard made of resilient plastic.
As mentioned above, visual feedback may also be provided by displaying, on the screen of the companion device, an image of the virtual keyboard (whether it is the substrate or an empty work surface in front of the companion device). As the user types, he or she is guided by the keyboard image, which depicts the user's fingers as they move relative to the virtual keyboard. The image may include highlighting of the key directly under a user's finger, and the key may be highlighted in a different or contrasting color when it is actually pressed. If desired, the screen of the companion device may be split so that, when an alphanumeric character is "typed," the actual character appears in the upper portion of the screen while the image of the virtual keys under the user's fingers appears in the lower portion (or vice versa).
In FIGS. 1A and 1B, the companion system 80 is shown mounted in a cradle 110, to which the three-dimensional sensor 20 may be permanently attached. Alternatively, the sensor 20 may be permanently mounted in the lower portion of the companion device 80. The output from sensor 20 is coupled to a data input port on companion device 80 via path 120. If a cradle or similar device is used, inserting device 80 into cradle 110 may automatically make the connection between the output of sensor 20 and the input of device 80.
As described herein, the configuration of FIG. 1B advantageously allows a user to enter data (e.g., text, graphics, commands) into companion device 80 even in the absence of a printed virtual keyboard like that shown in FIG. 1A. For ease of understanding, grid lines along the X and Z axes are shown on the work surface region 60 in front of device 80. The various software mapping techniques described herein allow the present invention to discern which virtual keys (if any) the user's fingers are intended to strike. While the embodiment of FIG. 1A facilitates tactile feedback from the virtual keyboard, the embodiment of FIG. 1B provides no such tactile feedback. Thus, the display 90 of device 80 preferably displays images to assist the user in typing keystrokes. Of course, as in the embodiment of FIG. 1A, device 80 may emit a keystroke sound when a user's finger presses the work surface 60 at a "key" location.
FIG. 1C depicts one visual aid that may be provided by an appropriate device 80, and that may of course also be used with the embodiment of FIG. 1A. In FIG. 1C, the screen 90 displays at least a portion of an image of the keyboard 115 and an outline or other representation 40' of the user's hands, indicating the position of the hands and fingers relative to the positions of the keys on the real or virtual keyboard. For ease of illustration, FIG. 1C depicts only the position of the user's left hand. When a key is "touched," or the user's finger is close enough to "touch" a key (e.g., the place on the work surface 60 where such a key would exist if a keyboard were present), the device 80 may highlight the image of that key (e.g., display the corresponding "soft key"), and when the key is "pressed" or "struck," the device 80 may display the key in a different or contrasting color. For example, in FIG. 1C the "Y" key is shown highlighted or otherwise contrasted, which may indicate that the key is being touched, is about to be touched, or is being pressed by the user's left index finger.

As shown in FIG. 1D, device 80 may provide a split-screen display, where one portion of the screen depicts an image to guide the placement of the user's fingers over the non-existent keyboard, while another portion of the screen displays data or commands 100 input by the user to device 80. Although FIG. 1D shows text corresponding to the key being struck, such as the letter "Y" completing the spelling of the word "key" on screen 90, data 100 may also be graphical. For example, the user may command device 80 to enter a graphics mode such that movement of a finger over work surface 60 (or virtual keyboard 70) generates a graphic, e.g., the user's signature "written" on work surface 60 with an index finger or a stylus. The user's finger or stylus may be referred to collectively herein as the "user's finger."

Optionally, software associated with the present invention (e.g., software 285 in FIG. 3) may use word context to help reduce "typing" errors. It is assumed that the vocabulary of the language being input, e.g., English, is known in advance. Memory in the companion device holds a dictionary of the most frequently used words in that language, and as the user "types" a word on the virtual keyboard, or indeed in thin air, the companion device software matches the letters typed so far against candidate words from the dictionary. For example, if the user enters "S," all words beginning with the letter "S" are candidates; if the user then types a letter sequence with which no English word begins, there are no matching candidates. As the user types more letters, the set of candidate words that can match the word being typed shrinks to a tractable size. At some threshold point, for example when the number of candidate words falls to 5-10, the software can assign probabilities to the letters the user may type next. For example, if the user has entered "SUBJ," the probability that the next letter is "E" is higher than the probability that it is "W." However, since the letters "E" and "W" are adjacent on a real or virtual keyboard, the user may well press an area near the "W" key. In this case, the companion device software may correct the key input and assume that the user intended to enter the letter "E."
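A minimal sketch of the dictionary-assisted correction described above. The word list, the keyboard-adjacency table, and the candidate-count threshold are placeholders; only the prefix-matching idea and the "SUBJ" example follow the description.

```python
DICTIONARY = ["SUBJECT", "SUBJECTS", "SUBWAY", "SUN", "SUNDAY"]   # placeholder word list
ADJACENT = {"W": {"Q", "E", "S"}, "E": {"W", "R", "D"}}           # partial QWERTY adjacency

def candidates(prefix):
    return [w for w in DICTIONARY if w.startswith(prefix)]

def correct_key(typed_prefix, detected_key, max_candidates=10):
    """If the detected key yields no dictionary candidates but an adjacent key
    does, assume the adjacent key was intended."""
    if len(candidates(typed_prefix)) > max_candidates:
        return detected_key                        # too early to guess
    if candidates(typed_prefix + detected_key):
        return detected_key                        # detected key is plausible as typed
    for alt in ADJACENT.get(detected_key, ()):
        if candidates(typed_prefix + alt):
            return alt                             # adjacent key fits the dictionary
    return detected_key

# Example: the user has typed "SUBJ" and the sensor reports "W" (adjacent to "E").
print(correct_key("SUBJ", "W"))   # -> "E"
```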
Turning now to the operation of the three-dimensional sensor 20, the sensor emits radiation of a known frequency and detects energy returned by object surfaces within its field of view. The emitted radiation is shown as rays 140 in FIGS. 1A and 1B. The sensor 20 is aimed along the Z-axis to determine which of the user's fingertips 30 touch which portions of the template 70, e.g., which virtual keys, in which time order. As shown in FIG. 1B, even if template 70 is absent and the user merely types on the work surface in front of device 80, sensor 20 is still able to function and output meaningful data. In such an embodiment, the screen 90 of the companion device 80 may display an image 100' of a keyboard 105 in which "pressed" or about-to-be-pressed "keys" are highlighted, such as the key 107 for the letter "T".
As shown in FIGS. 1A and 1B, a light or other projector 145 emitting a visible beam 147 may be used, if desired, to project an image of the virtual keyboard to guide the user's typing. For example, a visible light source (possibly a laser operating at visible wavelengths) may be used with a diffractive lens to project an image that guides the user's keystrokes. In such an embodiment, an image of the keyboard (which may be provided in a common graphics file format, e.g., GIF) is used to "etch" the diffraction pattern onto the lens. Although a portion of the projected image may at times fall on the surfaces of the user's fingers, such a projected guide is useful when there is no substrate to strike. Diffractive optics, such as those available from MEMS Optical, LLC of Huntsville, AL 35806, may be used to implement such projection embodiments.
FIG. 3 is a block diagram depicting an exemplary three-dimensional image sensor system 200 preferably fabricated on a single CMOS IC 210. The system 200 may be placed in the same housing as the three-dimensional sensor 20 and used to implement the present invention. As described in more detail in co-pending U.S. patent application serial No. 09/401,059, incorporated herein by reference, such a system advantageously requires no moving parts and relatively few off-chip components, primarily a light-emitting diode (LED) or laser source 220 and an associated optical focusing system; the laser source 220 may be bonded onto IC 210 if suitable shielding is provided. It is to be understood that, while the present invention is described with reference to the three-dimensional sensor 20 disclosed in the above-mentioned co-pending U.S. patent application, the present invention may also be practiced with other three-dimensional sensors.
The system 200 includes an array 230 of pixel detectors 240, each having dedicated circuitry 250 for processing the detected charge output by the associated detector. In a virtual-keyboard recognition application, the array 230 may comprise 15 x 100 pixels and a corresponding 15 x 100 processing circuits 250. Note that this array size is substantially smaller than that required by existing two-dimensional video systems such as the one described by Korth. Korth requires an aspect ratio of about 4:3, or perhaps 2:1 in some cases, whereas the present invention acquires and processes data having a markedly elongated aspect ratio, preferably about 15:2 or even 15:1 (X-extent to Y-extent). Referring to FIGS. 1A and 1B, although a relatively large X-axis extent must be covered, the edge-on placement of the sensor 20 relative to the substrate 50 means that only a relatively small Y-axis extent need be covered.
A high frame rate is required during user keystrokes in order to distinguish the user's individual fingers moving along a row of virtual keys. In practice, however, fingers do not move front-to-back between rows as rapidly as they move downward in a keystroke. The acquisition rate of the Z-axis data may therefore be lower than that of the X-axis and Y-axis data, e.g., 10 frames/second for Z-axis data versus 30 frames/second for X-axis and Y-axis data.
A practical advantage of reducing the Z-axis frame rate is that the present invention draws less current while obtaining finger-position information. With respect to signal processing of the acquired information, the present invention may average the Z-axis information over multiple frames, for example examining Z-axis position information in only one frame in three. The Z-axis values obtained will contain noise or jitter that can be reduced by averaging. For example, the Z values may be averaged over three consecutive frames acquired at 30 frames/second, so that three consecutive image frames share the same processed Z value. Although the effective frame rate of the Z values is thereby reduced to 1/3 of the X-axis and Y-axis acquisition rate, the accuracy of the Z data is improved because noise and jitter are averaged out. The resulting reduced Z-axis frame rate is still fast enough to provide meaningful information. Using a frame rate for the X and Y values different from that used for the Z axis is thus useful to the present invention: reducing the acquisition rate of Z-axis data relative to the X-axis and Y-axis data minimizes current consumption and avoids burdening the signal processor (CPU 260) with excessive signal processing.
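The averaging scheme amounts to a running mean over consecutive Z frames. The three-frame window and the 30 frames/second input rate follow the description; the nested-list frame representation is an assumption.

```python
def averaged_z_frames(z_frames, window=3):
    """z_frames: list of 2-D lists of per-pixel Z distances acquired at 30 fps.
    Yields one averaged frame per `window` input frames (an effective 10 fps)."""
    for start in range(0, len(z_frames) - window + 1, window):
        group = z_frames[start:start + window]
        rows, cols = len(group[0]), len(group[0][0])
        yield [[sum(f[r][c] for f in group) / window for c in range(cols)]
               for r in range(rows)]
```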
Thus, the present invention obtains three-dimensional image data without requiring ambient light, whereas prior Korth-like systems obtain two-dimensional luminosity data only in the presence of ambient light. In essence, the present invention can sense three-dimensional objects, such as fingers and the substrate, somewhat as a person perceives objects by touch. Advantageously, this can be accomplished with relatively little operating power, e.g., about 10 mW from a 3.3 VDC supply, which allows the present invention to be battery powered and built in a relatively small, portable form factor.
Multiple frames per second of three-dimensional image data of the user's hands, fingers, and the substrate may be obtained from the array 230. Using this data, the present invention constructs a three-dimensional image of the hands and fingers relative to the substrate or, if no substrate is present, relative to the region of the work surface in front of the companion device 80 where the virtual keyboard would lie. Exemplary techniques for doing so are described in the applicant's earlier-referenced co-pending U.S. patent application to Bamji. Constructing such a three-dimensional image from time-of-flight data is superior to prior-art approaches, such as Korth's, that attempt to infer spatial relationships from two-dimensional luminosity-based data. It should be noted that time-of-flight methods may include return-pulse time measurement, phase or frequency detection, or high-speed shutter methods, as described in the Bamji patent application. Other methods that do not rely on time of flight can also capture three-dimensional data, including stereoscopic imaging and luminosity-based techniques that infer depth from reflected intensity.
In practice, the array 230 may acquire and process data at a rate of 30 frames per second, a frame rate sufficient to handle virtual typing of 5 characters per second, about 60 words per minute. If the array 230 is rectangular, for example comprising n X-axis pixels and m Y-axis pixels, with n = 100 and m = 15, a grid of 1500 pixels results. For each frame of data, each pixel in array 230 will have a value representing the vector (Z) distance from the sensor 20 to the surface of the object (e.g., a portion of a user's finger, a portion of the substrate, and so on) imaged by that pixel. Such data is far more useful in determining the outline of the user's fingers and their positions on the virtual keyboard than Korth's luminosity-based image data, which provides at best RGB gray-scale or color values for a two-dimensional video frame.
The use of the acquired three-dimensional data enables the software 285 to determine the actual shape of each of the user's fingers (nominally assumed to be somewhat cylindrical) and thereby determine each finger's position relative to the other fingers, relative to locations above or on the substrate, and relative to the three-dimensional sensor 20. For example, in FIG. 1A, when a finger is sensed moving to a position where Y = 0, it may be determined that the finger is probably about to strike a virtual key. If the finger is also sensed approaching the region Z = Z1, the finger may be about to strike a virtual key in the first row of keys on the virtual keyboard. Velocity data may also be taken into account in determining whether a virtual key is about to be pressed; for example, a user's finger detected moving rapidly downward toward Y = 0 is probably about to strike a virtual key.

In FIG. 3, the IC 210 further includes a microprocessor or microcontroller 260 (shown as a CPU), random access memory 270 (RAM), and read-only memory 280 (ROM), a portion of which preferably stores the software routines 285 executable by the CPU to implement the present invention. The controller 260 is preferably a 16-bit RISC microprocessor operating at 50 MHz. Among other functions, the CPU 260 calculates the vector distances to the objects, namely the substrate and the user's hands, as well as object velocities. The IC 210 also includes a high-speed distributable clock 290, various computation and optical-drive input/output (I/O) circuitry 300, and interface data/command input/output (I/O) circuitry 310. Digital keyboard-scan-type data, or digitizer-tablet/mouse-type data, is output from I/O 310, for example from a COM and/or USB port associated with the system 200.
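The lookup from a sensed (X, Z) fingertip position to a particular virtual key, used implicitly above, can be sketched as a simple grid quantization. The key pitch, origin, and row layout below are placeholders, not the mapping actually claimed.

```python
KEY_PITCH_X = 0.019   # metres between key columns (assumed)
KEY_PITCH_Z = 0.019   # metres between key rows (assumed)
LAYOUT = ["QWERTYUIOP", "ASDFGHJKL;", "ZXCVBNM,./"]   # rows, nearest to farthest

def key_at(x, z, x_origin=0.0, z_origin=0.0):
    """Quantize a fingertip (x, z) position to the nearest virtual key, if any."""
    col = int(round((x - x_origin) / KEY_PITCH_X))
    row = int(round((z - z_origin) / KEY_PITCH_Z))
    if 0 <= row < len(LAYOUT) and 0 <= col < len(LAYOUT[row]):
        return LAYOUT[row][col]
    return None   # fingertip is not over any key of the template
```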
The two-dimensional array 230 of pixel sensing detectors is preferably fabricated using standard commercial silicon technology that advantageously allows the circuits 250, 260, 270, 280, 290 and 300 to be fabricated on the same IC 210. It will be appreciated that the ability to fabricate such circuitry on the same IC with a pixel detector array may reduce processing and delay times due to the shortened signal path.
Each pixel detector may be represented as a parallel combination of a current source, an ideal diode, and a shunt impedance and noise current source. Each pixel detector will output a current proportional to the amount of input photonic light energy falling on it. The array of CMOS pixel diodes or photogate detector devices is preferably implemented using CMOS processing. For example, a photodiode can be fabricated using a diffusion-well, or well-substrate junction. The well-substrate photodiode is more sensitive to Infrared (IR) light and has a smaller capacitance, which is preferable.
As shown in FIGS. 3 and 4, a circuit 250 is associated with each pixel detector 240. Each circuit 250 preferably includes a pulse peak detector 315 and a high-speed counter 320, and makes use of the high-speed clock 290. The high-speed clock 290, preferably formed on IC 210, outputs a continuous series of high-frequency clock pulses, preferably at a fixed frequency of 500 MHz with a low duty cycle. Of course, other high-speed clock parameters could be used instead. This pulse train is coupled to the input port of each high-speed interpolating counter 320. Counter 320 preferably performs sub-counting, as described in the Bamji co-pending patent application, and can save about 70% of the time. Each counter 320 preferably also has a port receiving a START signal (e.g., begin counting now), a port receiving a STOP signal (e.g., stop counting now), and a port receiving a CONTROL signal (e.g., reset the accumulated count now). The CONTROL and START signals are obtained from controller 260, the CLOCK signal is obtained from clock 290, and the STOP signal is obtained from pulse peak detector 315.
The term sensor system may be used to refer collectively and include sensor array 230, lens 288 (if present), emitter 220, lens 288' (if present), and electronics to coordinate the timing relationship between emitter 220 and array 230.
The virtual keyboard 50 will typically be placed about 20 cm from the three-dimensional sensor 20, substantially in the same plane as the sensor lens. Since a typical sensor lens angle is 60°, a 20 cm distance ensures optical coverage of the virtual keyboard. In FIG. 3, the separation between the emitted light and the light collected by the sensor 20 is exaggerated for clarity of illustration.
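As a rough plausibility check of that geometry (illustrative arithmetic, not part of the patent disclosure): with a 60° full field of view, the width covered at range R is 2 · R · tan(30°).

```python
import math

def coverage_width(range_m, fov_deg=60.0):
    """Width of the field of view at a given range for a given full lens angle."""
    return 2.0 * range_m * math.tan(math.radians(fov_deg / 2.0))

print(f"{coverage_width(0.20) * 100:.1f} cm")   # ~23.1 cm across at 20 cm range
```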
In general, the system 200 operates as follows. At time t0, microprocessor 260 commands light source 220 to emit a pulse of light of known wavelength, which passes through focusing lens 288' and travels at the speed of light (C), about 300,000 km/sec, toward the objects of interest, such as substrate 50 and the user's fingers 30. If the light source 220 is sufficiently powerful, the lens 288' may be omitted. At the surfaces of the imaged objects, at least some of the light is reflected back toward the system 200 for detection by the detector array. In FIG. 3, the objects of interest are the fingers 30 of the user's hands and, if present, the substrate 50, which may include visual indicia, such as keyboard keys 70 or projected grid lines as previously described, that guide the user's finger placement while typing.
As shown in FIG. 1A, the location of a virtual key 70 (or other user-accessible indicium) on substrate 50 is known in two dimensions in the X-Z plane relative to the locations of the other such keys on the substrate. As the user's fingers move back and forth over the substrate 50 while "typing," various virtual keys 70 are touched. One function of CPU 260 and software routines 285 is to examine the returned light energy in order to identify which virtual keys, if any, are being touched by the user's fingers, and at what times. Once such information is obtained, appropriate KEYUP, KEYDOWN, and key scan codes or other input signals can be provided to the input port 130 of the companion device 80, just as if the data or commands had been generated by an actual keyboard or other input device.
At or just before time t0, each pixel counter 320 in array 230 receives a CONTROL signal from controller 260, which resets any count previously held in the counter to zero. At time t0, controller 260 issues a START command to each counter, whereupon each counter begins counting and accumulating CLOCK pulses from clock 290. During the round-trip time of flight (TOF) of a light pulse, each counter accumulates CLOCK pulses; a greater number of accumulated clock pulses represents a longer TOF, i.e., a greater distance between the light-reflecting point on the imaged object and the system 200.
The essential feature of the focusing lens 288 associated with system 200 is that reflected light from a point on the surface of an imaged object will fall only upon the pixel in the array that is focused on that point. Thus, at time t1, photon light energy reflected from the closest point on the surface of the imaged object passes through lens/filter 288 and falls upon the pixel detector 240 in array 230 focused on that point. A filter associated with lens 288 ensures that only incoming light having the wavelength emitted by light source 220 falls upon the detector array, unattenuated.
Assume that a particular pixel detector 240 within array 230 is focused upon the nearest surface point on the tip of a user's finger 30. The associated pulse detector 300 will detect the voltage output by that pixel detector in response to the photon energy arriving from that object point. The pulse detector 300 is preferably implemented as an amplifying peak detector that senses a small but rapidly changing pixel output current or voltage. When the rapidly changing output voltage is sufficiently large to be detected, logic within the detector 300 (e.g., an SR flip-flop) latches and outputs a pulse, which is provided to the associated counter 320 as the STOP signal. Thus, the number of counts accumulated in the associated counter 320 represents the round-trip TOF to the nearest portion of the fingertip in question, a distance Z1 that can be calculated.
Distance Z1 may be determined according to the following relationship, where C is the speed of light:
Z1=C*(t1-t0)/2
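The round-trip relationship is easy to exercise numerically. The sketch below assumes the 500 MHz counter clock mentioned in the description and treats sub-count interpolation as already applied, so fractional counts are allowed.

```python
C = 3.0e8           # speed of light, m/s
CLOCK_HZ = 500e6    # high-speed counter clock rate from the description

def distance_from_counts(counts):
    """Convert an accumulated (possibly fractional) clock count into a one-way
    distance in metres, using Z = C * TOF / 2."""
    round_trip_tof = counts / CLOCK_HZ
    return C * round_trip_tof / 2.0

# Example: a round-trip time of flight of ~1.33 ns (2/3 of one 2 ns clock period)
# corresponds to an object about 20 cm from the sensor.
print(f"{distance_from_counts(2.0 / 3.0):.3f} m")   # ≈ 0.200 m
```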
At a slightly later time t2, photon energy reaches lens 288 from a somewhat more distant portion of the user's fingertip 30, falls upon array 230, and is detected by another pixel detector. The counter associated with that other detector has been counting CLOCK pulses since time t0, as have all the counters except the one that stopped counting at time t1. At time t2, the pulse detector associated with the pixel that has just received and detected incoming photon energy issues a STOP command to its associated counter. The count accumulated in that counter reflects the round-trip TOF to the intermediate point on the fingertip, at distance Z2. Within IC 210, controller 260, executing software routine 285 stored in memory 280, can calculate the distance associated with the TOF data for each light-reflecting point on the object surfaces. Velocity can be calculated by examining successive frames of acquired data.
In a similar manner, at time t3 yet another pixel detector in the array detects sufficient just-arriving photon energy for its associated pulse detector 300 to issue a STOP command to the associated counter. The count accumulated in that counter represents TOF data for a still more distant point Z3 on the imaged object. Although for ease of illustration FIG. 3 shows only three emitted rays and their reflections, each near one fingertip, in practice essentially all of the substrate and the user's fingers and thumbs will be illuminated by the light source 220 and will reflect at least some energy into the lens 288 associated with the three-dimensional sensor 20.
Of course, some pixels in the array may not receive sufficient reflected light from the object points on which they are focused. Thus, after a predetermined amount of time (which may be programmed into controller 260), the counters associated with such pixels are stopped for lack of pulse detection (or may be assumed to hold a count corresponding to a target at distance Z = infinity).
As noted above, in the present application it suffices for the system 200 to accurately image objects within a range of about 20 cm to 30 cm, i.e., about 20 cm plus the distance separating the nearest and farthest rows of virtual keys on the substrate 50. For each detected reflected light pulse, the TOF distance value computed by the counter of each pixel in the array is determined and preferably stored in a frame buffer in RAM 270. Microprocessor 260 preferably examines consecutive frames stored in RAM to identify objects and object locations in the field of view. Microprocessor 260 can then compute the velocity of object movement, e.g., of a moving finger. In addition to calculating distance and velocity, the microprocessor and associated on-chip circuitry are preferably programmed to recognize the shape or outline of the user's fingers and to distinguish the finger surfaces from the substrate surface. Once the finger outlines are identified, the system 200 can output the relevant digital data and commands to the companion computer system via a COM or USB or other port.
The above example illustrates how three pixel detectors receiving photon energy at three separate times t1, t2, and t3 turn off their associated counters, whose accumulated count values can be used to calculate the distances Z1, Z2, Z3 to the finger surfaces and substrate in the field of view. In practice, the present invention performs not just three such calculations per light pulse but thousands or tens of thousands, depending on the size of the array. Such processing may be performed on the IC chip 210, for example by having the microprocessor 260 execute a routine 285 stored (or storable) in the ROM 280. Each pixel detector in the array has a unique position in the detection array, and the count output by the high-speed counter associated with each pixel detector can be uniquely identified. Thus, the TOF data collected by the two-dimensional detection array 230 can be signal-processed to provide accurate distances to three-dimensional object surfaces, such as the user's fingers and the substrate. It will be appreciated that, if desired, the outputs from the CMOS-compatible detectors 240 may be accessed in random order, which allows the TOF data to be output in any order.
Light source 220 is preferably an LED or a laser that emits energy at a wavelength of about 800 nm, although other wavelengths may be used. Below about 800 nm the emitted light starts to become visible and laser efficiency is reduced. Above 900 nm the efficiency of CMOS sensors decreases rapidly, and in any event about 1100 nm is the upper wavelength limit for devices fabricated on silicon substrates, such as IC 210. As previously described, by emitting pulses of light at a particular wavelength and filtering out incoming light of other wavelengths, the system 200 can operate with or without ambient light. If the substrate 50 carries ridges that define, for example, the outlines of the virtual keys, a user could type in complete darkness and the system 200 would still function properly. This ability to operate independently of ambient light stands in complete contrast to existing schemes such as that described by Korth. As described previously, by providing an image of the virtual keyboard on the display of the companion device 80, the present invention can be used in the dark even by users who have not reached touch-typing proficiency.
As previously described, the lens 288 preferably focuses the filtered incoming light energy onto the sensor array 230 so that each pixel in the array receives light from only one particular point in the field of view (e.g., one point on the object surface). The propagation properties of light allow an ordinary lens 288 to be used to focus the light onto the sensor array. If a lens is also required to focus the emitted light, a single lens could serve as both 288 and 288' if a mirror-type arrangement were used.
In practical applications, the sensor array 230 preferably has sufficient resolution to distinguish target distances on the order of 1 cm, which means that each pixel must be able to resolve time differences of about 70 picoseconds (e.g., 1 cm/C). In terms of the CMOS-implemented system specification, the high-speed counters 320 must be able to resolve time to within about 70 picoseconds, and the peak pulse detectors 315 must be low-noise, high-speed devices that can likewise resolve about 70 picoseconds (after averaging about 100 samples) with a detection sensitivity of about several hundred microvolts (μV). For accurate distance measurement, the pulse detector response time must be subtracted from the total elapsed time. Finally, the CLOCK signal output by circuit 280 should have a period of approximately 2 ns.
As previously mentioned, each interpolating counter 320 preferably can resolve distances on the order of 1 centimeter, which implies resolving time to about 70 picoseconds. Using a 10-bit counter with an effective 70-picosecond cycle time would yield a maximum system detection distance of about 10 meters (e.g., 1024 centimeters). Implementing an ordinary 10-bit counter typically requires a worst-case path of about 40 gates, each gate typically requiring 200 picoseconds, for a total propagation time of about 8 ns. This limits the fastest system clock cycle time to about 10 ns. The counter propagation time can be reduced by using carry look-ahead hardware, but at added cost, and a system cycle time of 2 ns would still be difficult to achieve.
To achieve the required cycle time, a so-called pseudo-random sequence counter (PRSC), sometimes termed a linear shift register (LSR), may be used instead. Details concerning the implementation of high-speed counters, including PRSC devices, may be found in the applicant's aforementioned co-pending patent application.
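The co-pending application holds the actual counter design. Purely for illustration, the sketch below models a 10-bit pseudo-random sequence counter as a linear feedback shift register with a standard maximal-length tap choice (an assumption), showing why such a counter avoids the ripple-carry propagation delay of an ordinary binary counter.

```python
# Illustrative 10-bit pseudo-random sequence counter (PRSC) sketch, modeled
# as a linear feedback shift register. The taps (bit positions 10 and 7, a
# standard maximal-length choice) are an assumption; the patent defers the
# actual implementation to the co-pending application.
def prsc_sequence(seed: int = 0x3FF, length: int = 1023):
    """Yield successive 10-bit LFSR states; the hardware analog needs only
    a shift register plus one XOR gate, avoiding ripple-carry delay."""
    state = seed & 0x3FF
    for _ in range(length):
        yield state
        bit = ((state >> 9) ^ (state >> 6)) & 1   # feedback from bits 10 and 7
        state = ((state << 1) | bit) & 0x3FF

# A small lookup table can convert an LFSR state back to an ordinary count.
state_to_count = {s: i for i, s in enumerate(prsc_sequence())}
```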
Various considerations involved in identifying the contour of a user's finger within the observed light field will now be explained with reference to FIG. 5, which depicts a cross-section of two of the user's fingers. The + symbols represent sub-frame (intra-frame) samples of the vector distance values for each pixel sensor in the array imaging the fingers. In each acquired sample, the noise inherent in the pixel sensors produces a slightly different vector distance to the same point on the imaged finger. The sensor averages the measurements for each pixel to produce an average value for the frame, represented by the O symbols in FIG. 5. The □ symbols in FIG. 5 represent the corrected average obtained when routine 285 uses a stored template of an exemplary finger cross-section, or a set of such stored templates, to interpret the averages.
Data-collection noise affects the minimum frame rate required to identify the user's fingers and determine finger motion and speed. In TOF-based imaging, as used in the present invention, pixel-level noise manifests itself as frame-to-frame variation in the distance value reported by a given pixel, even when the imaged object remains stationary.
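As a simple illustration of the noise-reduction step implied here (and of the averaging described for FIG. 5), the sketch below averages several sub-frame distance captures into one frame; the array shapes and number of sub-frames are assumptions.

```python
import numpy as np

# Illustrative sketch only: intra-frame averaging of several sub-frame
# distance captures to suppress per-pixel TOF noise, plus a crude per-pixel
# noise estimate. Shapes and the number of sub-frames are assumptions.
def average_subframes(subframes: list[np.ndarray]) -> np.ndarray:
    """Average a list of per-pixel distance maps (the O values of FIG. 5)."""
    return np.stack(subframes, axis=0).mean(axis=0)

def pixel_noise(subframes: list[np.ndarray]) -> np.ndarray:
    """Per-pixel standard deviation across sub-frames (the spread of + values)."""
    return np.stack(subframes, axis=0).std(axis=0)
```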
For ease of illustration, the keyboard images depicted in figs. 1A, 2A, and 2B are drawn as a regular matrix, e.g., with uniform rows and columns. In practice, however, as partially shown in fig. 6, a standard QWERTY-type keyboard (and indeed keyboards with other key configurations) is laid out in an offset or staggered configuration. Advantageously, the present invention takes advantage of this staggering of the actual keyboard layout to relax the required Z-axis resolution. Thus, the second row from the top of the keyboard is shifted slightly to the right, the third row (counting from the top) is shifted farther to the right, and so on. This staggering places the keys in each row at positions offset relative to the keys in the adjacent rows. For example, consider the keyboard letter "G" in fig. 6. The dashed rectangle 400 indicates the region within which a contact is accepted as a strike on the letter "G"; e.g., any virtual contact within this rectangular area will be unambiguously understood as the user's finger touching the letter "G". The height of the rectangle (denoted ΔZ) is the maximum error margin allowed when detecting the Z-axis coordinate. Note that this tolerance is greater than the height of a single row R of a QWERTY keyboard. Note further that the recognition region for a given key need not be rectangular; it may have any reasonable shape, for example an ellipse centered on the key.
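The acceptance-region idea for a key such as "G" can be sketched as follows; the key positions, region sizes, and the rectangular (rather than elliptical) shape are illustrative assumptions, not dimensions from the patent.

```python
# Illustrative sketch of per-key acceptance regions: each key has a rectangle
# that is deliberately taller in Z than one key row, as described for the "G"
# key, and a contact point is accepted if it falls inside. All dimensions and
# key positions here are made-up example values.
from dataclasses import dataclass

@dataclass
class KeyZone:
    label: str
    x_center: float          # cm along the keyboard row
    z_center: float          # cm from the sensor (row direction)
    x_halfwidth: float = 0.9
    z_halfheight: float = 1.5   # taller than one row height

    def contains(self, x: float, z: float) -> bool:
        return (abs(x - self.x_center) <= self.x_halfwidth and
                abs(z - self.z_center) <= self.z_halfheight)

# Staggered rows: each lower row is shifted slightly to the right.
zones = [KeyZone("T", 9.5, 22.0), KeyZone("G", 10.0, 24.0), KeyZone("B", 10.7, 26.0)]

def key_for_contact(x: float, z: float) -> str | None:
    for zone in zones:
        if zone.contains(x, z):
            return zone.label
    return None
```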
CPU 270 and routines 285 may use the frames of three-dimensional data to identify the user's fingers within the acquired data. This task is simplified because the data actually comprises a three-dimensional representation of the user's fingers, and fingers have a well-known shape; e.g., viewed edge-on, a finger is roughly cylindrical. As previously described, storing templates of finger shapes, together with finger and hand heuristics, in memory 280 speeds up finger recognition by reducing the CPU time needed to recognize and track finger positions. Such signal processing can quickly reduce the data-collection noise and make it easier to discern the user's fingers within the acquired three-dimensional data. The signal-to-noise ratio can also be improved in the intra-frame state because the scene being imaged is known, e.g., to include the virtual keyboard and the user's hands. Preferably, several hundred data captures are averaged or otherwise used to construct the acquired data for a single frame.
Once the user's fingers have been identified, software routine 285 (or an equivalent routine, possibly executed by a device other than CPU 260) can then determine the position and motion (e.g., the relative change in position per unit time) of the fingers. Since the data representing the fingers is three-dimensional, routine 285 can easily eliminate the background image and focus only on the user's hands. In the Korth two-dimensional imaging scheme, this task is very difficult because the shape and movement of background objects (e.g., the user's sleeves, arms, body, chair contour, etc.) can interfere with the object tracking and recognition software routines.
Using the fingertip contours, routine 285 uses the Z-axis distance measurements to determine the position of each finger relative to the rows of the virtual keyboard, e.g., distance Z1 or Z2 in FIG. 1A. As previously mentioned, the granularity of these Z-axis measurements is substantially coarser than what is depicted in fig. 1A. The X-axis distance measurements provide data relating the fingertip positions to the columns of the virtual keyboard. From the row and column coordinates, software 285 can determine the actual virtual key touched by each finger, e.g., the key "T" in FIG. 1A touched by the left index finger.
To assist the user in placing his or her fingers on a particular virtual input device, such as a keyboard, numeric keypad, or telephone dial, software within the companion device 80 may be used to display a soft keyboard on a screen 90 associated with the device (e.g., a PDA or cellular telephone screen), or on a display terminal coupled to device 80. The soft keyboard image shows the user's finger positions for all keys on (or near) the virtual keyboard 50, for example by highlighting the keys directly beneath the user's fingers. When a key is actually actuated (as perceived from the user's finger movement), the actuated key may be highlighted in a different or contrasting color. If the virtual keys are not in the correct rest position, the user may command the companion device to reposition the virtual keyboard or other input device to the correct starting position. For example, if the user typically begins typing with the right-hand fingers resting over the starting keys J, K, L, and ":" and the left-hand fingers resting over the starting keys F, D, S, and A, the software will shift the keys of the virtual keyboard to those positions.
Vertical Y-axis motion of the user's fingers is detected to determine which virtual keys on the device 50 are being struck. When striking a mechanical keyboard, several fingers may be in motion simultaneously, but barring a two-key combination, such as pressing the CONTROL key and the "P" key at the same time, or a typographical error, typically only one finger strikes a key. In the present invention, the software routine 285 determines finger motion information from successive frames of acquired information. Advantageously, the human hand imposes certain constraints on finger motion, and these constraints are employed in modeling the images of the user's hands and fingers. For example, the connected structure of the hand imposes some coupling between the movements of the fingers, and the degrees of freedom at the finger joints limit how far each finger can move toward or away from its neighbors. Advantageously, routine 285 may employ several heuristics to determine which virtual key is being actuated. For example, a keystroke may be detected as an upward movement of a detected finger followed by a quick downward motion, as sketched below. The user's finger having the minimum Y-axis position or the maximum downward velocity is selected as the typing finger, e.g., the finger that will strike one of the virtual keys of the virtual data input device.
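A minimal sketch of the finger-selection heuristic just described; the data layout and the negative-velocity convention are assumptions made for the example.

```python
# Illustrative sketch of the keystroke heuristic described above: among the
# tracked fingertips, the one with the largest downward velocity (ties broken
# by the lowest tip) is taken as the striking finger. Data structures and
# units are assumptions for illustration.
def select_striking_finger(fingers: dict[str, dict]) -> str | None:
    """fingers maps a finger label to {'y': tip height (cm),
    'vy': vertical velocity (cm per frame, negative = moving down)}."""
    candidates = [(label, d) for label, d in fingers.items() if d["vy"] < 0]
    if not candidates:
        return None
    # Prefer the fastest downward motion; break ties with the lowest tip.
    label, _ = min(candidates, key=lambda item: (item[1]["vy"], item[1]["y"]))
    return label
```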
By intelligently monitoring the movement of the user's fingers, unintended keyboard input can be discerned and rejected. For example, the user may rest his or her fingers on the surface of the substrate 50 without triggering unintended key entries. This is similar to a typist resting his or her fingers on the keys of a mechanical keyboard without pressing any key hard enough to type. A user of the present invention is likewise allowed to slide his or her fingers gently across the virtual keyboard without inadvertently activating any key. Software 285 may calibrate its operation so that only intentional gestures are treated as valid keyboard input for entering data or commands into the companion computer device 80.
When executed by a CPU such as CPU 270, software 285 can implement an algorithm or routine for identifying which virtual keys are being struck by a user of the present invention. The input data for the algorithm is the three-dimensional optical information obtained by the sensor 20. The exemplary algorithm may be considered as having three phases: building and personalizing templates, calibration, and actual tracking of the user as he or she types on a virtual keyboard or work surface. In the following description, normal typing using all fingers is assumed; for the case where only one or two fingers are used, a special case of the algorithm applies.
Templates are predefined models of the typing postures of different users. Such templates are based on analyzing a group of system users and categorizing their various typing styles. Note that a template may be derived from examples of input data (e.g., data collected by observing fingers in the typing position) or from a pre-programmed mathematical description of the geometric characteristics of the tracked object (e.g., a cylindrical description of a finger). The resulting templates may be generated at the time the ROM, and in particular routine 285, is prepared. Since the location and shape of keyboard keys impose a degree of stylistic commonality on users, it will be appreciated that the number of predefined templates need not be large.
Preferably, individual users of the present invention can also build their own specialized templates using a training tool that guides the user through the steps required to build the template. For example, the trainer portion of the software 285 may present a command on the display 90 telling the user to place his or her fingers in the typing position on the virtual keyboard (if present) or on the work surface in front of the companion device 80. The training program then tells the user to repeatedly press the virtual key under each finger. Optically capturing thumb movement may be treated as a special case, because thumb movement differs from that of the other fingers and is generally limited to pressing the space-bar region of the virtual keyboard or work surface.
In creating the template, the categories of objects in the template image are preferably constructed as the different fingers of the user's hands. As described in more detail below, this method step collects information about the physical characteristics of the user's hands for use by a classification program or algorithm routine. During actual typing, the classifier then uses the template to quickly map the images in the acquired frames to the individual user's fingers. As part of template construction, a map of the user's finger positions relative to particular keyboard keys in the rest position is preferably determined. For example, routine 285 and CPU 270 may notify the companion device 80 that, at rest, the user's left-hand fingers touch the "A", "S", "D", and "F" keys and the right-hand fingers touch the "J", "K", "L", and ":" keys. This method step personalizes the virtual keyboard to the style of the particular user. The personalization process is performed only once and need not be repeated unless the user's typing posture changes so significantly that too many wrong keys are identified as having been struck. The calibration procedure according to the invention may be performed as follows. At the start of a typing session, the user signals the companion device 80 by placing the application being run by device 80 into text-entry mode. For example, if device 80 is a PDA, the user may touch a text field displayed on screen 90 with a stylus or finger, thereby setting the input focus of the companion device 80 application to that text field. Other companion devices may be placed into an appropriate text-entry mode using programs associated with those devices.
The user's fingers are then placed in the typing position on the work surface in front of the three-dimensional sensor 20, either on a virtual keyboard or simply on the work surface itself. This step is used to map the user's fingers to the elements of the template and to align the user's fingers with the keys of the virtual keyboard (or work surface) before typing begins, as sketched below.
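A sketch of this alignment step under simple assumptions: the resting fingertip X coordinates of the left hand are compared with nominal home-row key centers and the whole virtual keyboard is shifted by the average difference. The home-row key labels follow the text; everything else is illustrative.

```python
# Illustrative calibration sketch: rest-position fingertip X coordinates of
# one hand are mapped onto the home-row keys so that the virtual keyboard is
# effectively shifted under the user's fingers. Only the key labels come from
# the text; the data layout and averaging rule are assumptions.
HOME_ROW_LEFT = ["A", "S", "D", "F"]

def calibrate_left_hand(rest_x: list[float], nominal_key_x: dict[str, float]) -> float:
    """Return the X offset (cm) to apply to the whole virtual keyboard so the
    nominal home-row key centers line up with the resting fingertips."""
    assert len(rest_x) == len(HOME_ROW_LEFT)
    offsets = [x - nominal_key_x[k] for x, k in zip(rest_x, HOME_ROW_LEFT)]
    return sum(offsets) / len(offsets)   # average shift under the fingers
```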
At this point, the three-dimensional sensor 20 will repeatedly capture the silhouette of the user's finger. The data so captured will be placed by the software 285 into a table or matrix as shown in fig. 7A-7O.
Fig. 7A depicts the user's left hand, as imaged by sensor 20, striking an actual keyboard. The field of view (FOV) of the sensor 20 is deliberately aimed at the work surface, which in this example is an actual keyboard. The five fingers of the left hand are visible in the figure and may be identified as fingers 1 (thumb), 2, 3, 4, and 5 (little finger). The cross-hatched area behind and between the fingers represents regions that are too dark to be considered part of the user's fingers by the present invention. In a real setting the darkness would of course vary continuously, rather than being the uniform dark area drawn here for purposes of explanation.
An overlaid grid matrix or table is shown in fig. 7A, in which the different regions carry quantized numbers representing the normalized vector distances between the relevant surface portions of the user's fingers and the sensor 20. It will be appreciated that these quantized distance values are dynamically calculated by the present invention, e.g., by software 285. In the map shown in FIG. 7A, low values, e.g., 1 or 2, indicate close distances, and higher values, e.g., 7 or 8, indicate greater distances. The value "d" represents a perceived discontinuity. Depending on the technology associated with the sensor 20, the "d" values may oscillate widely and may indicate the absence of a foreground object. In fig. 7A, the quantized distance values indicate that the user's left thumb is farther from the sensor 20 (indicated by the relatively high distance values of 7 and 8) than the user's left index finger, whose distance values are low, e.g., 1. It can further be seen that the user's left little finger is generally farther from the sensor 20 than the user's index finger.
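The quantized map of FIG. 7A can be sketched as follows; the eight-level scale, the near/far limits, and the validity mask used to mark "d" entries are assumptions made only for illustration.

```python
import numpy as np

# Illustrative sketch of the quantized distance map in FIG. 7A: raw per-pixel
# distances are normalized to small integers (1 = near, 8 = far), and pixels
# with no reliable foreground return are marked "d" for discontinuity.
# The 8-level scale and the validity test are assumptions.
def quantize_map(distances: np.ndarray, valid: np.ndarray,
                 z_near: float, z_far: float) -> np.ndarray:
    levels = np.clip(
        np.round(1 + 7 * (distances - z_near) / (z_far - z_near)), 1, 8)
    out = levels.astype(object)
    out[~valid] = "d"          # discontinuity / no foreground object
    return out
```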
The central portion of FIG. 7A is a table or matrix holding the normalized distance values and, where appropriate, "d" entries. Similar tables are shown in FIGS. 7B-7O. The table entries can represent the contours of the user's fingers, and shading has been added to these tables to help indicate potential mappings of the distance data to the finger contours. The arrows pointing from the FOV portion of FIG. 7A to columns in the table indicate how individual data columns can represent the contour of a user's finger position. In the tables shown in figs. 7A-7O, the circled numbers "1", "2", ... "5" mark contours corresponding to the perceived positions of the user's left thumb (finger "1"), index finger, middle finger, ring finger, and little finger (finger "5"), respectively.
As previously mentioned, templates are preferably used in the present invention to help identify the user's finger positions from the data obtained from the sensor 20. The templates can help the classification algorithm (or classifier) 285 distinguish the boundaries between fingers when the discontinuities are not clearly apparent. For example, in FIG. 7A, the user's third and fourth fingers (fingers 3 and 4) are relatively close together.
Shown at the bottom of FIG. 7A is a dynamic display of what the user is typing, based on the present invention's analysis of the sensor-perceived distance values, dynamic velocity values, and heuristics related to the overall task of identifying which keys (real or virtual) are being pressed at which times. Thus, at the moment captured in FIG. 7A, the user's left index finger (finger 2) appears to have just entered the letter "f", perhaps in the sentence "The quick brown fox jumped over the lazy dog", a portion of which phrase 100 may appear on the display 90 of the companion device 80.
The calibration phase of software routine 285 is preferably user friendly. Thus, routine 285 essentially moves or relocates the virtual keyboard to beneath the user's fingers. This may be done by mapping the image obtained from the sensor 20 to the fingers of the template, and then mapping the touched keys to the user's natural finger positions, which were determined during the template-construction phase.
The calibration step determines an initial state or rest position and maps the user's fingers in the rest position to particular keys on the keyboard. As shown in fig. 1B, the "keys" 107 that are touched or very nearly touched (but not pressed) are preferably highlighted on a soft keyboard 105 displayed on the screen 90 of the companion device 80, assuming a screen 90 is present. The rest position is also the position of the user's fingers at the end of a typing session.
During actual typing, routine 285 senses the user's fingers and maps the finger movements to the correct keys on the virtual keyboard. Before this phase of the algorithm begins, the associated companion device 80 application has already been placed into text-entry mode and is ready to accept keyboard events (e.g., KEYUP and KEYDOWN).
Routine 285 (or an equivalent) may be implemented in a number of ways. In a preferred embodiment, routine 285 uses three modules, outlined in the sketch below. A "classifier" module maps the clusters in each frame to the user's fingers. A "tracking program" module tracks the motion of active fingers, searching for a finger that strikes a key and determining the coordinates of the contact point between the user's finger and a location on the virtual keyboard or other work surface. A third "mapping program" module maps the contact point of the user's finger to a particular key on the virtual keyboard and sends a key event to the companion device 80. These exemplary modules are described in more detail below.
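A skeleton of how the three modules might fit together per frame; the function signatures and trivial stub bodies below are assumptions, and the real behavior is what the following paragraphs describe.

```python
# Illustrative skeleton only: how a classifier, tracking program, and mapping
# program module might be chained for each acquired frame. The signatures and
# stub bodies are assumptions made for this sketch.
def classify(frame, templates):
    """Return labeled finger clusters for one frame (stub)."""
    return frame.get("clusters", {})

def track(clusters, history):
    """Return the (x, y, z) tip of a finger judged to have struck a key,
    or None if no keystroke occurred in this frame (stub)."""
    return clusters.get("strike_tip")

def map_to_key(tip_xyz, layout):
    """Map a contact point to a key label via the keyboard layout (stub)."""
    return layout.get(tip_xyz)

def process_frame(frame, history, layout, send_key_event):
    clusters = classify(frame, templates=None)
    tip = track(clusters, history)
    if tip is not None:
        key = map_to_key(tip, layout)
        if key is not None:
            send_key_event("KEYDOWN", key)   # events the companion app accepts
            send_key_event("KEYUP", key)
    history.append(clusters)
```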
The task of the classifier module is to make sense of the contour map of the scene produced by the sensor 20 for each frame of optically acquired data. The classifier identifies clusters of points that share some commonality, such as being part of the same surface. Importantly, the classifier labels each cluster so that the same cluster can be distinguished from other clusters in successive frames of acquired data. The classifier also determines the boundaries of each cluster, and in particular the tip of each cluster, which corresponds to a fingertip. The purpose is not to identify the user's fingers per se, since in practice the user might be holding a stick or stylus to press the virtual keys or key locations. The templates are therefore used primarily to give meaning to these clusters and to help form them.
One method of clustering, or locating the clusters, is to use a nearest-neighbor condition to form nearest-neighbor partitions, where each partition maps to a respective finger of the user. Such a mapping yields 5 partitions for the user's left hand and 5 partitions for the user's right hand, and the left-hand and right-hand partitions can be processed separately.
One partition-forming method is based on the Lloyd algorithm. Details of this algorithm, which is well known in the field of image processing, can be found in the textbook Vector Quantization and Signal Compression by Allen Gersho and Robert Gray (see page 362). For example, let Ct = {ci; i = 1..5} be the set of partition centers for one hand. For each center, a set of points Pi,t = {r : d(r, ci) < d(r, cj); j ≠ i} is determined, where the function d() is a measure of the distance between two points. If d(r, ci) = d(r, cj), the tie can be broken by placing the point in the set with the smaller subscript. For two points a and b, d(a, b) can be defined as (xa − xb)² + (ya − yb)² + (za − zb)², where x, y, and z are the axis measurements obtained from the sensor 20. The function center(Pi,t) can be defined as the center of gravity, or centroid, of the points in Pi,t. Define Ct+1 = {center(Pi,t); i = 1..5}. Using the new centroids, Pi,t+1 can be found as described above. The iteration continues (e.g., under routine 285 or an equivalent) until the membership of the sets Pi remains unchanged over two consecutive iterations. In general, the iteration converges within 3-4 iterations, and the points in the final sets Pi are the clusters of points for the respective fingers. With this approach, the primary purpose of the classification procedure is not to identify the user's fingers per se, but rather to determine which key was struck by a user's finger. This observation allows the classifier to tolerate clustering errors at the keyboard periphery that do not affect system performance.
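A compact sketch of this partitioning loop, using the squared-distance measure and lower-index tie-breaking rule given above; the pure-Python form and the iteration cap are assumptions.

```python
# Illustrative sketch of the nearest-neighbor / Lloyd-style partitioning
# described above: five cluster centers per hand, each point assigned to the
# nearest center (ties go to the lower index), centers moved to the centroid,
# repeated until membership stops changing.
def d(a, b):
    """Squared Euclidean distance between two (x, y, z) points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def lloyd_partition(points, centers, max_iter=20):
    """points, centers: lists of (x, y, z) tuples; returns (partitions, centers)."""
    prev = None
    for _ in range(max_iter):
        partitions = [[] for _ in centers]
        for p in points:
            dists = [d(p, c) for c in centers]
            partitions[dists.index(min(dists))].append(p)   # ties -> lower index
        if partitions == prev:          # membership unchanged: converged
            break
        prev = partitions
        centers = [
            tuple(sum(coord) / len(part) for coord in zip(*part)) if part else c
            for part, c in zip(partitions, centers)
        ]
    return partitions, centers
```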
The tracking program module will be described more fully below with reference to the matrices shown in figs. 7A-7O, where the clusters are shaded to aid visual understanding of the data. The perceived clusters are preferably passed to the tracking program module, which continuously tracks the movement of each cluster. The tracking program module pays particular attention to rapid up-and-down movements and calculates the velocity and direction of the clusters.
Figures 7D-7K depict matrix tables representing a series of images obtained as the user's second finger is raised and then moved downward to strike a (virtual) key located below the fingertip. The cluster tips that are closely monitored by the tracking program module are preferably identified by the classifier module. In a real image the user's other fingers may also move slightly, but in the example shown the classifier determines that the acceleration of the left index finger (finger 2) is significantly greater than the movement of the other fingers.
In these figures, pointing arrows have been added to show the direction of motion and the tip of the perceived cluster (e.g., the user's finger). In figs. 7D-7F the finger cluster is moving upward, and fig. 7F represents the finger's highest position, e.g., the maximum Y-axis position determined from the data obtained by sensor 20. In figs. 7G-7H the finger cluster is moving downward, e.g., toward the virtual keyboard 50 or work surface 60. In FIG. 7I, contact of the user's finger with a virtual key or key location on the work surface is sensed.
Routine 285 (or another routine) may calculate the vertical velocity of a fingertip in several ways. In a preferred embodiment, the tracking program module calculates the vertical velocity of the user's fingertip (identified by the classifier) by dividing the difference between the highest and lowest positions of the fingertip by the number of frames acquired during the sequence. The velocity is thus expressed in units of Y-axis resolution per frame, which is independent of the frame rate in frames per second. For a keystroke to be registered, the calculated Y-axis velocity must equal or exceed a threshold velocity. The threshold velocity is a parameter used by the software 285 and is preferably adjustable by the user during the personalization step, as sketched below.
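A minimal sketch of this velocity rule; the threshold value shown is an assumed placeholder for the user-adjustable parameter.

```python
# Illustrative sketch of the velocity rule above: vertical speed is the
# fingertip's highest-minus-lowest Y position divided by the number of frames
# in the stroke, and a keystroke is registered only if it meets a threshold.
# The threshold value is an assumed, user-adjustable parameter.
def stroke_velocity(y_positions: list[float]) -> float:
    """Y positions (cm) of one fingertip over the frames of a down-stroke."""
    return (max(y_positions) - min(y_positions)) / len(y_positions)

def is_keystroke(y_positions: list[float], threshold: float = 0.4) -> bool:
    return stroke_velocity(y_positions) >= threshold
```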
Figs. 7J-7O depict matrix tables for a more complex sequence in which the user's left index finger (finger 2) moves downward and toward the back of the virtual keyboard. In fig. 7O, the finger motion is shown culminating in a keystroke on a key in the first row of the virtual keyboard (or on the region of the work surface in front of device 80 where such a virtual key would be located).
Turning now to the mapping program module: when the tracking program module determines that a keystroke has been detected, it notifies the mapping program module and passes it the (X, Y, Z) coordinates of the cluster tip. The mapping program module uses the Z-axis value to determine the row location on the virtual keyboard, and the X-axis and Y-axis values to determine the key within that row. For example, referring to fig. 1A, the coordinate (X, Y, Z) location (7, 0, 3) might represent the letter "T" on the virtual keyboard, as sketched below. It will also be appreciated that these modules preferably form part of software routine 285, although other routines may be used, including routines executed by devices other than CPU 270.
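A sketch of the row-then-key lookup just described, under assumed row boundaries and key pitch; none of the numbers or labels below come from the patent.

```python
# Illustrative sketch only: the Z value selects a keyboard row, then the X
# value selects a key within that row. Row boundaries, key pitch, and the
# row contents are made-up example values.
ROW_Z_EDGES = [20.0, 22.0, 24.0, 26.0, 28.0]                # cm from the sensor
ROW_KEYS = ["QWERTYUIOP", "ASDFGHJKL;", "ZXCVBNM,./", " "]  # one string per row

def map_hit_to_key(x_cm: float, z_cm: float,
                   key_pitch: float = 1.9, x_origin: float = 0.0) -> str | None:
    for row_idx, (z_lo, z_hi) in enumerate(zip(ROW_Z_EDGES, ROW_Z_EDGES[1:])):
        if z_lo <= z_cm < z_hi:                  # Z selects the row
            col = int((x_cm - x_origin) // key_pitch)   # X selects the key
            keys = ROW_KEYS[row_idx]
            return keys[col] if 0 <= col < len(keys) else None
    return None
```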
Modifications and variations may be made to the disclosed embodiments without departing from the subject matter and spirit of the invention as defined by the following claims. For example, if desired, more than one sensor may be employed to acquire three-dimensional positional information.
Claims (25)
1. A method for a user to interact with a virtual input device by using a user-controllable object, the method comprising the steps of:
(a) providing a sensor capable of obtaining position coordinate information of a relative position of at least a portion of the user-controllable object with respect to a work surface on which the virtual input device is defined;
(b) processing information obtained by the sensor to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and if so, which function of the virtual input device is associated with the location;
(c) outputting the information processed in step (b) to the pairing device.
2. The method of claim 1, wherein the sensor utilizes at least one of: (i) time of flight from the sensor to a surface portion of the user-controllable object, (ii) data based on luminosity, (iii) a stereographically arranged camera, and (iv) obtaining the information using a solid-state sensor having an x-axis to y-axis aspect ratio greater than 2:1.
3. The method of claim 1, wherein the user-controllable object is selected from the group consisting of (i) a finger on a user's hand, and (ii) a stylus device.
4. The method of claim 1, wherein said work surface is selected from the group consisting of (i) a three-dimensional space, (ii) a physical plane, (iii) a substrate, (iv) a substrate supporting a user-viewable image of an actual keyboard, (v) a substrate on which a user-viewable image of an actual keyboard is projected, (vi) a substrate on which a user-viewable typing guide is projected, (vii) a passive substrate supporting a user-viewable image of an actual keyboard and including passive key-like regions that provide tactile feedback when depressed by said user's fingers, (viii) a substrate that is at least 15.2cm x 30.4cm in size when in use, but is less than 15.2cm x 20.3cm in size when not in use, and (ix) a virtual plane.
5. The method of claim 1, further comprising providing feedback to the user guiding placement of the user-controllable object, the feedback comprising at least one of: (i) tactile feedback simulating a user striking an actual keyboard, (ii) audible feedback, (iii) visual feedback describing an image of at least one keyboard key, (iv) visual feedback wherein a virtual key touched by the user-controllable object is visually distinguishable from other virtual keys, and (v) visual feedback describing data input by the user-controllable object.
6. The method of claim 1, wherein the determination made in step (b) comprises at least one of: (i) a numeric code representing an alphanumeric character, (ii) a numeric code representing a command, and (iii) a numeric code representing a trajectory of a point tracked by the user-controllable object.
7. The method of claim 1, wherein (b) comprises utilizing at least one of: (i) a position of a distal portion of the user-controllable object, (ii) velocity information of the distal portion in at least one direction, (iii) matching the obtained information with a template model of the user-controllable object, (iv) hysteresis information processing, and (v) determining a spatial position of the distal portion of the user-controllable object with respect to a certain position on the work surface using linguistic knowledge of data input by the virtual input device.
8. The method of claim 1, further comprising mapping the position of a tip portion of the user-controllable object to keys on an actual keyboard, and determining which of the keys have been struck if the keys are present on the working surface.
9. The method of claim 1, wherein the user-controllable object comprises a plurality of fingers on the user's hand, wherein data is collected by the sensor in frames such that position coordinate information can be obtained from a single one of the frames.
10. The method of claim 1, wherein the pairing system comprises at least one of: (i) a PDA, (ii) a wireless telephone, (iii) a set-top box, (iv) a computer, and (v) an appliance adapted to accept input data.
11. The method of claim 9, wherein step (b) comprises processing position coordinate information obtained in successive frames to determine at least one of: (i) position coordinate information associated with at least two fingers of the one hand of the user, and (ii) position coordinate information associated with at least two fingers of the one hand of the user, including a vertical velocity component of the at least two fingers.
12. The method of claim 1, wherein at least one of: (i) obtaining the location coordinate information, and (ii) processing the information.
13. A system for use with a companion device that receives digital input provided by a user manipulating a user-controllable object relative to a virtual input device, comprising:
a sensor capable of obtaining positional coordinate information of at least a portion of the user-controllable object relative to a work surface on which the virtual input device is defined such that the user inputs information into the companion device using the user-controllable object;
a processor that processes the information obtained by the sensor to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and, if so, which function of the virtual input device is associated with the location; and
wherein the processor outputs digital information commensurate with the location of the touch to the pairing system.
14. The system of claim 13, wherein the sensor utilizes at least one of: (i) a solid-state sensor having an x-axis to y-axis aspect ratio greater than 2:1, (ii) a stereographically arranged camera, (iii) time of flight from the sensor to a surface portion of the user-controllable object, and (iv) obtaining the information based on data of luminosity.
15. The system of claim 13, wherein the user-controllable object is selected from the group consisting of (i) a finger on the user's hand, and (ii) a stylus device.
16. The system of claim 13, wherein said work surface is selected from the group consisting of (i) a three-dimensional space, (ii) a physical plane, (iii) a substrate, (iv) a substrate supporting a user-viewable image of an actual keyboard, (v) a substrate on which a user-viewable image of an actual keyboard is projected, (vi) a substrate on which a user-viewable typing guide is projected, (vii) a passive substrate supporting a user-viewable image of an actual keyboard and including passive key-like regions that provide tactile feedback when depressed by said user's fingers, (viii) a substrate that is at least 15.2cm x 30.4cm in size when in use, but is less than 15.2cm x 20.3cm in size when not in use, and (ix) a virtual plane.
17. The system of claim 13, wherein the system provides feedback to the user that guides placement of the user-controllable object, the feedback comprising at least one of: (i) tactile feedback simulating a user striking an actual keyboard, (ii) audible feedback, (iii) visual feedback describing an image of at least one keyboard key, (iv) visual feedback wherein keys touched by the user-controllable object are visually distinguishable from other virtual keys, and (v) visual feedback describing information input by the user-controllable object.
18. The system of claim 13, wherein said information includes at least one of: (i) a numeric code representing an alphanumeric character, (ii) a numeric code representing a command, and (iii) a numeric code representing a trajectory of a point tracked by the user-controllable object.
19. The system of claim 13, wherein the processor operates by utilizing at least one of: (i) a three-dimensional position of a tip portion of the user-controllable object, (ii) velocity information of the tip portion in at least one direction, (iii) matching the obtained information to a template model of the user-controllable object, (iv) hysteresis information processing, and (v) determining a spatial position of the tip portion of the user-controllable object relative to a position on the work surface using linguistic knowledge of data being input by the virtual input device.
20. The system of claim 13, wherein the processor maps the position of the tip portion of the user-controllable object to keys on an actual keyboard and determines which of the keys have been struck if the keys are present on the work surface.
21. The system of claim 13, wherein at least one of: (i) obtaining the location coordinate information, and (ii) processing the obtained information.
22. The system of claim 13, further comprising a sensor array capable of obtaining the position coordinate information, wherein the array and the processor are implemented on a single integrated circuit.
23. A system for allowing a user to interact with a virtual input device by manipulating user-controllable objects, comprising:
a sensor array capable of collecting positional information of a relative position of at least a portion of the user-controllable object with respect to a work surface on which the virtual input device is defined;
a processor that processes information obtained by the sensor array to determine, independent of the velocity of the user-controllable object, at least one of: (i) whether a portion of the user-controllable object touches a location of the working surface representing a portion of the virtual input device, and (ii) whether a portion of the user-controllable object touches a location representing a portion of the virtual input device, and, if so, which function of the virtual input device is associated with the location; and
a pairing device coupled to receive digital information output from the processor commensurate with the location of the touch.
24. The system of claim 23, wherein said user-controllable object is selected from the group consisting of (i) a finger on one hand of said user, and (ii) a stylus device, wherein said work surface is selected from the group consisting of (i) a three-dimensional space, (ii) a physical plane, (iii) a substrate, (iv) a substrate bearing a user-viewable image of an actual keyboard, (v) a substrate upon which a user-viewable image of an actual keyboard is projected, (vi) a substrate upon which a user-viewable typing guide is projected, (vii) a passive substrate bearing a user-viewable image of an actual keyboard and comprising passive key-like regions that emit audible sound when depressed by said user's finger, (viii) a substrate that is at least 15.2cm x 30.4cm in size when in use, but is less than 15.2cm x 20.3cm in size when not in use, and (ix) a virtual plane.
25. The system of claim 23, wherein at least one of: (i) obtaining the location coordinate information, and (ii) processing the obtained information.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/502,499 | 2000-02-11 | ||
| US09/502,499 US6614422B1 (en) | 1999-11-04 | 2000-02-11 | Method and apparatus for entering data using a virtual input device |
| PCT/US2001/040090 WO2001059975A2 (en) | 2000-02-11 | 2001-02-12 | Method and apparatus for entering data using a virtual input device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1058425A1 HK1058425A1 (en) | 2004-05-14 |
| HK1058425B true HK1058425B (en) | 2006-08-18 |