WO2008010950A2 - Free-space multi-dimensional absolute pointer using a projection marker system - Google Patents
- Publication number
- WO2008010950A2 (PCT/US2007/015955)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- coordinates
- handheld device
- pointing device
- metadata
- machine
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/038—Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
- G06F3/0383—Signal control means within the pointing device
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/0304—Detection arrangements using opto-electronic means
- G06F3/0325—Detection arrangements using opto-electronic means using a plurality of light emitters or reflectors or a plurality of detectors forming a reference frame from which to derive the orientation of the object, e.g. by triangulation or on the basis of reference deformation in the picked up image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0354—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of 2D relative movements between the device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
- G06F3/03542—Light pens for emitting or receiving light
Definitions
- the present invention relates generally to a data processing system. More particularly, this invention relates to a free-space multi-dimensional absolute pointer using a projection marker system.
- the most effective advertising model to date has been the Google Search model whereby the consumer receives a service, i.e. the ability to find something fast that interests him, while being subsequently exposed to general as well as sponsored search results and hyperlinks that are directly applicable to what the user is looking for.
- This model has the merits of 1) being on-demand, meaning that it is only present when the consumer wants it to be, and 2) being relevant, personalized and targeted to the specific and immediate interests of the consumer. These are the traits that bring users back to the service rather than turn them away.
- This is a model in which both the consumer and the advertiser benefit. Given the success of this model, the challenge and purpose of this invention is to bring these traits to other media or services.
- TV entertainment has traditionally been enjoyed only in the living room or bedroom in front of the CRT TV. This is no longer true and will become even more archaic in the near future. For example, several companies now offer the ability to transport your TV shows directly from your home to your laptop or desktop PC, to be enjoyed as a small inset window or in full-screen mode. It is even possible to watch shows on mobile phones, PDAs, or mobile media players, such as the iconoclastic iPod. Entertainment programming can now easily be downloaded or ported via rewritable DVDs or flash memory sticks.
- multimedia content may just as easily come from hundreds of TV channels from the cable or satellite box, as from PVRs or online websites.
- a presentation system includes, but is not limited to, a projection- based marker apparatus to project one or more optical spots on a display surface for displaying machine generated content capable of being manipulated via a cursor of a pointing device, a handheld device to wirelessly capture the projected optical spots from the display surface, and a control unit communicatively coupled to the projection-based marker apparatus and the handheld device to determine coordinates of the cursor based on characteristics of the captured light spots.
- data having full descriptions and hyperlinks are tagged to specific objects in moving images and the invisible hyperlinks move dynamically to continually track the associated object.
- a pointing device can be used to point to objects in the scene, whether moving or stationary, and, by appropriate action such as clicking or activating a button, substantially immediately recall part or all of the metadata content that pertains to the object.
- Figure 1 is a block diagram illustrating major components of a system which may be used with one embodiment of the invention.
- Figures 2A and 2B are diagrams illustrating certain light spots according to certain embodiments of the invention.
- Figures 3A-3C are diagrams illustrating certain configurations of major components of a system according to one embodiment of the invention.
- Figures 4A-4C are block diagrams illustrating an operating environment according to one embodiment of the invention.
- Figure 5 is a block diagram illustrating a presentation system according to one embodiment.
- Figure 6 is a block diagram illustrating a presentation system according to an alternative embodiment.
- Figures 7A and 7B are block diagrams illustrating a presentation system according to certain alternative embodiments.
- Figure 8 is a block diagram illustrating a data processing system which may be used with one embodiment of the invention.
- Figure 9 shows an example movie scene that will be used to exemplify a video stream that may contain various objects that may or may not be stationary.
- Figure 10 illustrates an example of how the objects in a TV scene may have been uniquely tagged with invisible reference (e.g., "hyperlink") areas, according to certain embodiments.
- Figure 11 illustrates how a user may use a free-space absolute pointing device to easily point to and select an object in a scene. Note that the tags are not directly visible to the user, according to one embodiment.
- Figure 12 shows the results of such a visual search query according to one embodiment.
- Figure 13 shows an example of a click-history that the user can pull up at his convenience at a later time according to one embodiment.
- Figure 14 shows an example file format for metadata according to one embodiment.
- Figure 15 shows another embodiment of the metadata tagging file format.
- Figure 16 shows one embodiment of the metadata.
- this metadata is strictly informative and yields results akin to a visual search query according to one embodiment.
- a free-space absolute pointer provides such a tool by combining simple 3D pointing with a graphical user interface on a large screen monitor/TV.
- the Wavlt is an absolute pointer: where the user points is where the cursor goes. It works on any type of screen (e.g., CRT, DLP, RPTV, LCD, plasma, etc.).
- the Wavlt also tracks other degrees of freedom, such as the absolute angle of rotation of the user's wrist, and the user's absolute distance away from the screen.
- Some versions also track the user's location in the room. All this takes place in real time, and multiple users can use devices at the same time, which is of particular interest for multi-player gaming.
- An embodiment of the invention is to expand on ways in which the Wavlt absolute pointing system may be engineered to work with large front- (and/or rear-) projection screens.
- the techniques described throughout this application focus on how a projection-POD (Photonic Origin Designator) or p-POD may be developed to allow easy usability and setup, primarily for conference room settings.
- the Wavlt multi-dimensional pointer is a high precision tracking system in which the core miniature sensing unit resides inside a handheld remote control device.
- the device tracks multiple degrees of freedom in absolute space, meaning that not only does it sense where the user is pointing, but it also senses whether the user is twisting his wrist, leaning forward, or sitting to the side of the room.
- Functionally, it acts like a localized GPS (global positioning system) device that tracks where you are with respect to the TV screen, a laser pointer that detects where you are pointing, and a tilt sensor that senses how much your wrist is twisted.
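The multi-degree-of-freedom sensing described above can be sketched numerically from the two marker images the handset sees. The sketch below is illustrative only: the marker separation, focal length (in pixels), and spot coordinates are hypothetical values chosen for the example, not figures from this application.

```python
import math

def pose_from_spots(spot1, spot2, marker_sep_cm=15.0, focal_px=400.0):
    """Illustrative sketch: recover wrist roll, distance, and pointing
    direction from two marker images on the handset's sensor.

    spot1/spot2: (x, y) pixel coordinates of the two detected spots.
    marker_sep_cm: assumed known physical separation of the markers.
    focal_px: assumed sensor focal length expressed in pixels.
    """
    dx, dy = spot2[0] - spot1[0], spot2[1] - spot1[1]
    sep_px = math.hypot(dx, dy)
    roll_deg = math.degrees(math.atan2(dy, dx))      # wrist twist
    distance_cm = marker_sep_cm * focal_px / sep_px  # pinhole similar triangles
    midpoint = ((spot1[0] + spot2[0]) / 2, (spot1[1] + spot2[1]) / 2)  # pointing
    return roll_deg, distance_cm, midpoint
```

For example, two level spots 30 pixels apart under these assumed parameters place the user 2 meters from the markers with zero wrist rotation.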
- FIG. 1 is a block diagram illustrating an exemplary system configuration according to one embodiment of the invention.
- system 100 includes a handheld device 101, a display surface 102, one or more emitters 103-104, and a data processing system 105.
- the handheld device 101 hereafter referred to as the Handset, may be used by an end user as a pointing device.
- This device incorporates an optical sensing unit (such as a CMOS camera) that tracks the Optical Beacons transmitted from a photonic origin designator (POD) 106 (having emitters 103-104) and calculates its own multi-dimensional coordinates.
- the Handset 101 then sends various data (including coordinates) to the POD 106 using one of several wireless technologies.
- the handset can also receive RF commands from the POD 106.
- a receiving device 106, hereafter referred to as the POD, receives the data from the Handset using a variety of wireless communication protocols, such as, for example, IR, Bluetooth, or IEEE 802.xx protocols. This device is coupled to a computer via a communication link such as a USB connection.
- the receiver channels the data from the Handset into a data processing system 105.
- the receiver also has the ability to "blast" IR signals to all other Infrared sensitive devices within a predetermined proximity such as a room.
- a sub-section of the POD is dedicated to generating the Optical Beacons, which serve as the optical markers that are tracked by the handset.
- a host computer 105 receives the data from the POD. This is handled by a driver, which communicates with the Handset through the USB device. Based on the data sent, the driver calculates position and pointing coordinates, reads the button presses, and uses this information to control the PC and specific programs or environments.
- a display surface such as a TV screen. This is the screen on which the content, e.g., movies or internet pages, will be displayed. It is also where additional graphical overlays may appear as dictated by a specific user interface.
- the Wavlt multidimensional tracker is based on optical tracking of one or more spots, or marker images on an optical sensor.
- an optical sensor is incorporated into a handset. This arrangement is one of the key aspects of the Wavlt system from which many of its highly desirable features are derived.
- the sensor can be made to see or detect only light of a specific wavelength range, such as, for example, approximately 900 to 1000 nm.
- one or more IR LEDs may be incorporated into a POD unit that is placed near the screen, with the IR LEDs emitting into the room, towards the handset.
- a POD and/or beacons may be incorporated with one or more IR LEDs that emit into the room.
- the beacons may be built into the RF receiver and/or USB chip enclosure.
- the RF receiver may only contain the beacons, and RF reception is handled separately by a separate USB dongle unit with RF receiver.
- Figures 3 A - 3C illustrate certain configurations of devices according to certain embodiments. Other configurations may exist.
- in one embodiment, IR LEDs are placed in the POD.
- these light sources may be seen or detected by the handset as distinct spots.
- a single spot, as seen by the handset's image sensor and microcontroller, is shown in Figure 2A according to one embodiment.
- the image is approximately 13 x 14 pixels in size.
- multiple spots may also be implemented.
- a set of, say, two spots are processed in the handset unit for their pixel coordinates on the sensor and their signal strengths. This information is subsequently sent to the POD receiver which in turn transmits this data to a computer.
- the coordinates are sent by a Zigbee RF 2.4GHz wireless chip.
- the array size is approximately 352 x 288; however, fractional coordinates exist (e.g., a spot can have a location of 251.23, 122.57).
- the resolution is such that it can effectively have approximately 35200 by 28800 pixels. Other camera resolutions can of course be employed.
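Fractional spot coordinates like these typically come from an intensity-weighted centroid computed over the spot's pixel blob. The application does not specify its method, so the following is a hedged sketch of that standard sub-pixel technique; the blob values in the example are hypothetical.

```python
def subpixel_centroid(pixels):
    """Intensity-weighted centroid of a detected spot, yielding fractional
    coordinates (e.g., 251.23, 122.57) from integer pixel locations.

    pixels: dict mapping (x, y) pixel coordinates -> intensity for the
    roughly 13 x 14 pixel blob described in the text.
    """
    total = sum(pixels.values())
    cx = sum(x * v for (x, y), v in pixels.items()) / total
    cy = sum(y * v for (x, y), v in pixels.items()) / total
    return cx, cy
```

Weighting by intensity is what lets a nominally 352 x 288 sensor behave like an effectively much finer grid, as described above.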
- One option is to have a 3-piece system, as shown in Figures 3B and 3C, in which the Handset, the RF receiver/USB dongle, and beacons are separate devices.
- the beacon can now be more portable and adaptable to mounting, such as the spring-loaded battery powered version shown in Figure 3B.
- the portable beacons could also have an RF sensor to listen for RF communication and shut down to conserve power if no RF signals are sensed.
- it could be required that all conference rooms be equipped with an inexpensive set of beacons that are permanently mounted on the wall.
- FIGS 4A-4C are block diagrams illustrating an operating environment according to one embodiment of the invention.
- Figure 4A illustrates a standard projection (front or rear) system.
- Figure 4B illustrates the same system as detected by the sensor in the handset. Because of the optical filtering in the handset, which blocks substantially all visible light, the screen is essentially blank.
- Figure 4C illustrates the system when the IR sources (e.g., lasers) in the projection apparatus are turned on. In this embodiment, two spots, generated by the projected light scattered by the display screen, are detected by the handset's sensor but are invisible to the human eye.
- Figures 4A-4C show a configuration where all elements of the POD (the RF receiver, the beacons, and the USB PC communication) are co-located in a single unit placed on top of the projector, and where the POD projects the two IR spots onto a screen, for example, into the middle of the screen used by the front (or rear) projector.
- the projection beacons originate from inside the projector (e.g., they are built into the projector system).
- the two spots are generated by two collimated 980 nm IR lasers, pointing out of the projection-POD at a slightly diverging angle. It should be noted that it is also possible to project light from IR LEDs onto the screen, according to another embodiment, but care must then be taken to refocus (re-image) the spots whenever the projection-POD is moved so that its distance from the screen changes appreciably. In order to have minimal system dependence (e.g., signal strength and spot size) on the POD location, it is useful to use collimated light sources, and for this reason, lasers are an ideal source.
- the IR spots could originate from individual sources or from a single source that is optically split into multiple beams.
- the handset will now see two spots when pointing in the vicinity of the screen, in a similar way as if a POD had been mounted in the middle of the projection screen.
- the projection-POD has the benefit of not requiring cables extending from the screen to the PC.
- the setup procedure is also relatively simple: simply point the p-POD at the screen.
- one or more guiding visible lasers are used to facilitate placement of the invisible IR spots onto the screen.
- in one embodiment, the p-POD includes a wireless radio device (e.g., Chipcon/TI CC2420 2.4 GHz radio IC), a micro-controller (e.g., Maxim MaxQ2000), and a computer interface device (e.g., Silicon Labs CP2102 USB-UART IC).
- the Chipcon/TI CC2430 combines the wireless radio and microcontroller functions.
- the p-POD also contains other standard components such as power supply and management circuitry (batteries or voltage regulators), switching and control devices (e.g., mechanical switches), and indicators such as LEDs showing on/off state.
- a particular configuration would use the USB computer interface to supply the electrical power for the p-POD, although separate power supplies or batteries may be desirable in some situations (e.g., if the current capacity of the USB interface is exceeded or if remote operation is required).
- An example of the latter case would be where the lasers are housed separately from the remaining components in the p-POD and the projector is physically separated from the computer. In this case, it may be desirable to power the laser portion using batteries.
- the lasers are mounted such that their emitted beams exit the p-POD at an angle with respect to each other.
- the specific angle between the beams is not critical, but the optimal range of angles is determined by the range of distances between the POD/projector and the screen and the desired spot separation on the screen. If the separation is too small, then the accuracy in the distance and rotation sensing of the system is reduced, and if the separation is too large, then the angular pointing range of the Handset is reduced since both spots must be within the vision system's field of view.
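The trade-off above is simple trigonometry, assuming both beams leave the small p-POD from effectively one point. The sketch below is illustrative; the specific angles and distances are hypothetical, not values from this application.

```python
import math

def spot_separation_m(beam_angle_deg, pod_to_screen_m):
    """Screen separation of two beams leaving the p-POD with a full
    divergence angle of beam_angle_deg between them (point-source
    approximation for the POD)."""
    return 2.0 * pod_to_screen_m * math.tan(math.radians(beam_angle_deg) / 2.0)

def required_beam_angle_deg(desired_sep_m, pod_to_screen_m):
    """Inverse relation: full angle needed for a desired spot separation."""
    return 2.0 * math.degrees(math.atan(desired_sep_m / (2.0 * pod_to_screen_m)))
```

A designer would pick the angle so that, over the expected range of POD-to-screen distances, the separation stays above the accuracy floor yet both spots stay inside the Handset's field of view.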
- Figure 6 shows an alternative embodiment in which a single IR laser is used to generate the two beams, and a visible alignment laser is incorporated to aid in the placement of the p-POD.
- Lasers that may be used for this purpose are those used in commercially available laser pointers (e.g., red diode lasers). These are typically low-power (<5 mW) and inexpensive (<$1) devices.
- the alignment laser would be turned on during initial setup of the p-POD and pointing system, and the guide laser would be pointed at, or near, the middle of the screen.
- the visible guide laser beam bisects the two IR beams so that when it is pointed at the middle of the screen, it is known that the invisible IR spots are symmetrically located around the middle of the screen.
- the visible guide laser may be turned off and used only when the p-POD alignment needs to be verified.
- the guide laser may be controlled by the micro-controller or by other means, such as an electro-mechanical switch.
- the two IR beams are generated from a single laser device using optical beamsplitters and mirrors, both standard optical components.
- the beamsplitter divides the incident IR light from the IR laser into two equal power components and transmits one and reflects the other.
- the reflected beam then reflects off of the mirror and exits the POD.
- the beamsplitter and mirror are adjusted to provide the desired beam angle.
- the visible alignment laser may be included in either the single or two-laser embodiments, and that some or all of the receiver/transmitter components may be housed separately from the lasers.
- the optical components of the p-POD are contained in an enclosure that resides near the projector, and the receiver components are contained in a small enclosure that plugs into a computer input port (e.g., a USB dongle device).
- the Handset and receiver communicate with each other, and the p-POD is used only to project the reference markers onto the screen.
- the p-POD would then have its own power source and switch. If it is desirable to communicate and control the lasers in the p-POD remotely, then a micro-controller and wireless chip could be included with the p-POD. This arrangement might be desirable in situations where the projector and computer are located far from each other.
- the laser beam vectors are slightly diverging (as shown in Figures 5 and 6) and the lasers are co-located with the projector.
- the spot separation will scale in proportion with the image size.
- the spot separation can be used to directly calibrate the handset to the screen and no additional pointing and screen calibration, as described in the co-pending applications referenced above, will be required during setup. This is because the ratio of the marker separation and screen size is always the same.
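The reason no recalibration is needed can be shown directly: when the laser beams and the projection cone expand from (approximately) the same origin, both the spot separation and the image width grow linearly with throw distance, so their ratio is constant. The angles in this sketch are hypothetical example values.

```python
import math

def sep_to_image_ratio(beam_angle_deg, projector_throw_angle_deg, distance_m):
    """Ratio of IR spot separation to projected image width when the lasers
    are co-located with the projector. Numerator and denominator both scale
    linearly with distance, so the ratio is distance-independent."""
    sep = 2.0 * distance_m * math.tan(math.radians(beam_angle_deg) / 2.0)
    width = 2.0 * distance_m * math.tan(math.radians(projector_throw_angle_deg) / 2.0)
    return sep / width
```

Moving the projector/p-POD from 2 m to 5 m from the screen leaves the ratio unchanged, which is why a single factory calibration of handset-to-screen mapping can suffice.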
- the lasers may be mounted such that the divergence between the beams is in the vertical plane instead of the horizontal plane, thus producing spots that are arranged vertically on the screen, or any combination of horizontal and vertical displacement between the beams (e.g., spots arranged diagonally on the screen) is also possible.
- Other geometries include ones in which the beams from the two lasers cross and then diverge before reaching the screen or simply converge from the p-POD and do not cross before hitting the screen.
- the infrared spots that are projected onto a normal screen will tend to scatter in the same way that the visible light does. This means that for normal screens there will be near-Lambertian scatter of the incident light from the p-POD, and the spots will therefore be visible from very large angles to the screen.
- many projection screens (rear projection in particular) are designed to have asymmetric scattering of the incident light in order to increase the viewing angle, typically in the horizontal plane. Such screens will also work well with the p-POD system since, in general, similar non-uniform scattering will increase the operational region (both in angle and distance from the screen) for the Wavlt system.
- projected spots are relatively easily tailored without the same physical constraints that govern the design of a regular emissive POD.
- the projected spots can be made bigger without impacting the size of the p-POD.
- Shaped spots such as lines and crosses can be projected into various corners of the screen. Larger and shaped spots may be more easily resolved by the detection system in the Handset and thus provide additional information. Multiple spots could also be arranged in a circle. For example, if the spots are arranged into a large circle, then the aspect ratio of the circle, which can be determined accurately, helps the Wavlt Handset determine its location in the room: a tall oval shape would indicate that the user is positioned to the left or right of the screen and not directly in front, while a flat oval would indicate that he is positioned above or below the screen.
- spots can be used to break the symmetry to allow for discerning whether the user is to the right or left of the screen.
- a four-marker arrangement in which the separation between the markers on the screen is a non-negligible fraction of the user's distance from the screen can permit the distinction between left and right view angle. This can be done using one or more of at least two properties of the markers: their apparent separation (the spot separation on the Handset sensor) and their detected powers. In essence, the pair of markers that are closer to the Handset will have both a larger separation and stronger signal strengths. The side of the screen on which the user is located determines which pair of spot images is stronger and/or farther apart.
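The two cues named above (apparent separation and detected power) can be combined in a simple vote. This is a rough sketch of the idea only, with made-up sensor readings; a real implementation would weight and threshold the cues rather than vote, and would handle the ambiguous head-on case.

```python
def user_side(left_pair, right_pair):
    """Sketch of left/right disambiguation with a four-marker layout.

    Each argument is (separation_px, signal_strength) for the pair of marker
    images on that side of the screen. The pair physically closer to the user
    appears both farther apart on the sensor and stronger.
    Ties (conflicting or equal cues) fall through to "right" here, a
    simplification for illustration.
    """
    votes = 0
    votes += 1 if left_pair[0] > right_pair[0] else -1   # apparent separation
    votes += 1 if left_pair[1] > right_pair[1] else -1   # detected power
    return "left" if votes > 0 else "right"
```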
- the Wavlt handset can be modified to have a very narrowband laser-line band-pass 980 nm filter. This allows light only of very close spectral proximity to 980 nm to pass through to the image sensor.
- a continuous- wave (CW) 980nm laser usually has a bandwidth of much less than 1 nm, in comparison to an LED, whose spectrum normally spans > 10 nm. This means that the system can be made much more optically robust to spurious light sources, such as room lights, essentially increasing the inherent signal to noise ratio of the system.
- the p-POD contains two 35 mW 980 nm diode laser sources that each draw ~70 mA at an operating voltage of 3.2 V. This is well within the power limit of the USB port of a PC, which can supply 500 mA at 5 V.
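The budget claim above is a one-line current comparison. The check below uses only the figures quoted in the text; it deliberately ignores regulator overhead and the draw of the radio and micro-controller, so it is a simplification rather than a full power analysis.

```python
def usb_budget_ok(n_lasers=2, current_per_laser_a=0.070, usb_limit_a=0.500):
    """Compare total laser drive current against the USB port's current limit
    (figures from the text; regulator efficiency and other component draw
    are ignored in this simplified check)."""
    return n_lasers * current_per_laser_a <= usb_limit_a
```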
- the p-POD Wavlt system is compatible with the regular Wavlt POD system in that no modifications need be made to the Wavlt Handset. This also means that all the regular benefits of the Wavlt tracking system apply. For example, it allows for non-interfering, robust multi-user operation, and the Handset remains relatively low-cost, low-power, and robust, with a very small form factor.
- Figures 7A and 7B show a rear projection pointing system using a p-POD according to certain embodiments.
- the side view is shown in Figure 7A and the top view is shown in Figure 7B.
- Figures 7A and 7B illustrate the use of the p-POD in a conventional folded geometry, a standard arrangement for rear projection systems.
- mirrors may be used to direct the expanding cone of light from the projector onto the back of the viewing screen as shown in Figure 7A. Light incident on the screen is scattered in the general direction of the viewer. This is the basic operational principle for many large screen rear projection televisions (RPTVs).
- the p-POD is mounted adjacent to the projector or projection unit and the projected IR marker spots are positioned near the middle of the screen.
- the IR light will be scattered asymmetrically in a manner similar to the visible light, as indicated in Figures 7A and 7B.
- the vision system in the Handset held by the viewer will see two spots located near the center of the screen and displaced horizontally with respect to each other, as shown in Figure 7B.
- the view angle for the system (wide in the horizontal plane and narrower in the vertical plane) will be similar to that for a normal viewer.
- Other positions for the p-POD are possible.
- the p-POD could be mounted just below mirror 2 and pointed directly at the back of the screen at a slight angle such that the IR spots hit near the screen center. This arrangement would reduce any losses incurred upon reflections from the mirrors and obviate the need for mirrors with high reflectivity in the IR.
- the p-POD could be placed behind one of the mirrors with the beams directed at the screen center.
- the mirror behind which the POD is placed would have to be transparent to the IR.
- Other arrangements that produce the desired spot pattern may also be possible. Note that the folding mirrors are not shown in Figure 7B, but the essence of the arrangement regarding the placement of the p-POD and the IR spots is not affected. Other configurations may exist.
- laser-based projection TVs are another, recently-developed, type of display system in which the p-POD may be integrated.
- the main difference between standard RP and laser-based RP displays is the type of light source. Instead of filtered lamp light or, in some cases, visible LEDs, laser-based displays use lasers as the source of illumination.
- the main advantages of laser-based displays are higher efficiency, smaller size and weight, longer lifetime, and superior color gamut compared with conventional projection systems.
- laser-based TVs and displays are anticipated to become more prevalent over the next several years due to the continuing improvements in the quality and cost of the component solid state lasers used as the sources.
- Laser-based projection displays are potentially ideally suited for incorporation of a laser-based p-POD as the source of the reference markers for use with a Wavlt pointing device.
- in many laser-based displays, at least one of the three component visible light sources (red, green, and blue) is derived from an infrared laser.
- the blue light is obtained by frequency doubling of a near-IR laser, although in some cases, the green and/or the red are also derived from an IR source via second-harmonic generation. This is done because of the difficulty and inefficiency in generating shorter wavelength (e.g. blue) laser light directly.
- a typical wavelength range for the blue component is 430-490 nm, which places the fundamental wavelength in the 860-980 nm range.
- This light is typically not used and must be blocked or filtered out of the projection system.
- this range is nearly ideal as a near-IR source of marker light.
- the only additional components necessary to use the IR light may be collimating, beam steering, and beam splitting optics to separate the IR from the visible light, split it into the desired number of beams (one for each marker), shape the beams as needed, and direct them to the screen.
- IR lasers are generally preferred primarily because of their superior optical properties.
- their inherent brightness (or radiance) is typically many orders of magnitude larger than for incoherent light sources such as LEDs and lamps. This fact results in the ability to more efficiently collect the emitted light from the source and project to a target (e.g., a wall or screen).
- Because of the inherently larger radiance associated with lasers, it is not practical for an IRED-based p-POD to produce the same signal level in a Wavlt Handset as a laser-based p-POD with the same optical power. However, it is conceivable, depending on the details of the system, to design a p-POD based on IREDs or other incoherent light sources. The relevant system details include the image detection method in the Handset, the required optical power and beam size, and the distance between the p-POD and the screen.
- the relevant optical quantities are the radiance of the source (e.g., W/cm²·sr) and the irradiance at the target (e.g., W/cm²).
- the required spot size will depend on several factors including the detection method, the sensor resolution, and the operating distance.
- the irradiance at the screen may be increased in one of two ways. Increasing the size of the projection optics (e.g., refractive and/or reflective elements) will decrease the target spot size, thereby increasing the irradiance. However, in many cases, the required optic size would be impractically large. It is also possible to attempt to collimate or focus the light from the LED (or other incoherent light source) to obtain a smaller spot.
- the required optical system would also be impractically large (in length and height).
- the other option is to simply increase the power projected to the target area by adding additional sources and directing their projected beams to the desired location.
- this approach also requires a larger effective size for the p-POD and the addition of more sources which results in higher power requirements and additional cost.
- a specific example of an LED-based p-POD uses, for each marker, three separate LEDs, each of which has a narrow divergence angle (~10° is typically achievable for a 5 mm diameter device).
- the LEDs may be oriented so that their beams have maximum overlap at the target location, which may be 1 to 3 meters away depending on the details of the display system (e.g., front vs. rear projection). Additional optics may be added to the system to further reduce the spot size (subject to the inherent radiance limits of the source). In general, the further the p-POD is from the screen, the larger the optics must be to maintain the same irradiance.
- Using a set of three standard high-power (e.g., 35 mW) IREDs for each marker would result in ~100 mW of total power contained in a spot size of ~10 cm for a screen ~1 meter away.
- the corresponding irradiance of ~1.3 mW/cm² is to be compared with ~7.5 mW/cm² for a 35 mW laser spot of ~2.5 cm diameter.
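- These irradiance figures follow directly from dividing the delivered power by the area of the (roughly circular) spot; the sketch below reproduces the comparison using the power and spot-size values quoted above:

```python
import math

def irradiance_mw_per_cm2(power_mw, spot_diameter_cm):
    """Average irradiance of a circular spot: power divided by spot area."""
    area_cm2 = math.pi * (spot_diameter_cm / 2.0) ** 2
    return power_mw / area_cm2

# Three ~35 mW IREDs per marker: ~100 mW total in a ~10 cm spot.
ired_irradiance = irradiance_mw_per_cm2(100.0, 10.0)   # ~1.3 mW/cm^2
# A single 35 mW laser focused to a ~2.5 cm spot.
laser_irradiance = irradiance_mw_per_cm2(35.0, 2.5)    # ~7 mW/cm^2
```

The laser spot comes out at roughly 7 mW/cm², consistent with the ~7.5 mW/cm² figure above at the stated level of approximation.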
- the larger the acceptable spot size for the marker, the more feasible the LED-based p-POD approach becomes, since more of the available power is used.
- the maximum acceptable spot size will depend on factors such as the detection method and the acceptable separation between the two projected markers.
- the signal on the image sensor is saturated such that the detected image size is much larger than the actual image size. In such cases, broadening the marker size on the screen can be done without significantly affecting the quality of the detected signal. In fact, in some cases, the signal may be improved (larger and/or more stable) by increasing the spot size whether using a laser or LED.
- the upper limit on spot size is ultimately determined by the maximum operating distance of the Handset, since the images on the sensor approach each other as the user moves farther away from the screen. If the spots are too large, then at sufficiently large operating distances their images will be too close together on the sensor to resolve.
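- As a rough illustration of this limit, a pinhole-camera model gives the marker-image separation on the sensor as (focal length in pixels) × (marker separation) / (distance). The focal length, marker spacing, and resolvability threshold below are hypothetical values for illustration, not figures taken from this disclosure:

```python
def sensor_separation_px(marker_sep_cm, distance_cm, focal_px):
    # Pinhole model: the image separation shrinks linearly with distance.
    return focal_px * marker_sep_cm / distance_cm

def max_operating_distance_cm(marker_sep_cm, focal_px, min_resolvable_px):
    # Largest distance at which the two marker images remain at least
    # `min_resolvable_px` apart on the sensor (the floor being set by the
    # detected, possibly saturated, spot image size).
    return focal_px * marker_sep_cm / min_resolvable_px

# Hypothetical numbers: 15 cm marker spacing, 350 px focal length.
sep = sensor_separation_px(15.0, 300.0, 350.0)           # 17.5 px at 3 m
max_dist = max_operating_distance_cm(15.0, 350.0, 10.0)  # 525 cm if the
                                                         # spots blur to ~10 px
```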
- Figure 8 is a block diagram of a digital processing system, which may be used with one embodiment of the invention.
- the system 800 shown in Figure 8 may be used as a computer system described above, such as, for example, a host computer, a projector, a POD, and/or a handheld device, etc.
- While Figure 8 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not germane to the present invention. It will also be appreciated that network computers, handheld computers, cell phones, set-top boxes, digital TVs, and other data processing systems which have fewer components or perhaps more components may also be used with the present invention.
- the computer system of Figure 8 may, for example, be an Apple Macintosh computer or an IBM compatible PC.
- the computer system 800, which is a form of a data processing system, includes a bus 802, which is coupled to a microprocessor 803, a ROM 807, a volatile RAM 805, and a non-volatile memory 806.
- the microprocessor 803, which may be, for example, a PowerPC G4 or PowerPC G5 microprocessor from Motorola, Inc. or IBM, is coupled to cache memory 804 as shown in the example of Figure 8.
- Microprocessor 803 may include multiple processors or multiple core logics (e.g., logical processors).
- the bus 802 interconnects these various components together and also interconnects these components 803, 807, 805, and 806 to a display controller and display device 808, as well as to input/output (I/O) devices 810, which may be mice, keyboards, modems, network interfaces, printers, and other devices which are well-known in the art.
- the input/output devices 810 are coupled to the system through input/output controllers 809.
- the volatile RAM 805 is typically implemented as dynamic RAM (DRAM) which requires power continuously in order to refresh or maintain the data in the memory.
- the non-volatile memory 806 is typically a magnetic hard drive, a magnetic optical drive, an optical drive, or a DVD RAM or other type of memory system which maintains data even after power is removed from the system.
- the non-volatile memory will also be a random access memory, although this is not required.
- the present invention may utilize a non-volatile memory which is remote from the system, such as a network storage device which is coupled to the data processing system through a network interface such as a modem or Ethernet interface.
- the bus 802 may include one or more buses connected to each other through various bridges, controllers, and/or adapters, as is well-known in the art.
- the I/O controller 809 includes a USB (Universal Serial Bus) adapter for controlling USB peripherals.
- I/O controller 809 may include an IEEE-1394 adapter, also known as FireWire adapter, for controlling FireWire devices.
- an on-demand exchange of information allows viewers/consumers to interact in real-time with TV programs (or other media content) in order to gather relevant information about selected objects or images in a video scene.
- a movie scene may present a group of high-society women enjoying coffee on a balcony, when suddenly Brad Pitt brings his red Lamborghini to a screeching halt in front of the appalled women.
- This model of embedding and retrieving data clearly fulfills the two key attributes that define a good advertising model: 1) it is an "on-demand" service that fulfills the consumer's desire to be informed when and where he wants, while being "invisible" and non-invasive when the consumer wants to just enjoy the show, and 2) it is relevant, personalized and targeted to the specific and immediate interests of the consumer, making it an enriching experience as well as a more efficient means of relevant information exchange.
- data having full descriptions and hyperlinks are tagged to specific objects in moving images and the invisible hyperlinks move dynamically to continually track the associated object.
- a pointing device can be used to point to objects in the scene, whether moving or stationary, and, by an appropriate action such as clicking or activating a button, substantially immediately recall part or all of the metadata content that pertains to the object.
- the pointing device is a multi-dimensional free-space pointer where the pointing is direct and absolute in nature, similar to those described in co-pending U.S. Patent Application No. 11/187,435, filed July 21, 2005, co-pending U.S. Patent Application No. 11/187,405, filed July 21, 2005, co-pending U.S. Patent Application No. 11/187,387, filed July 21, 2005, and co-pending U.S. Provisional Patent Application No. 60/831,735, filed July 17, 2006.
- the disclosure of the above-identified applications is incorporated by reference herein in its entirety.
- this metadata is strictly informative and yields results akin to a visual search query such as "What is this that I am pointing at?"
- the data is wholly or partially sponsored and paid for to instantiate an on-demand advertising model.
- the payment is proportional to the frequency of the searches.
- the "point & search" patterns of users are logged for later use in, for example, modifying and tailoring the metadata content.
- Figure 9 shows an example movie scene that will be used to exemplify a video stream that may contain various objects that may or may not be stationary.
- Figure 9 includes three objects (a rectangle, a ball, and a cylinder) that are moving around the screen and may disappear and re-appear at various locations in space at various times.
- Figure 10 illustrates an example of how the objects in a TV scene may have been uniquely tagged with invisible reference (e.g., "hyperlink") areas, according to certain embodiments. Specifically, Figure 10 shows how the TV scene of Figure 9 may be overlaid with invisible "tags" or hyperlink areas that move dynamically with the objects they are associated with.
- Figure 11 illustrates how a user may use a free-space absolute pointing device to easily point to and select an object in a scene; note that the tags are not directly visible to the user, according to one embodiment.
- the user is sitting in front of a large screen TV using a multidimensional absolute pointer, such as the WavIt™ available from ThinkOptics, Inc., described in the above-incorporated co-pending applications, to point to the object of interest.
- a cursor may appear on the screen that changes color and/or shape when a valid tag or hyperlink exists. This feature is similar to that of static hyperlinks that may be embedded in certain web-page images. One difference is that now the tags are dynamically moving with the object, and may grow, shrink, and/or evolve with object size and/or shape, or may disappear and reappear with the object.
- the object that is pointed to may be selected by pressing a button on a remote control or pointing device. This action may subsequently log the "click" for later retrieval, or in the preferred embodiment it substantially immediately brings up on-screen information about the selected object.
- Figure 12 shows the results of such a visual search query according to one embodiment.
- instead of pressing a button on the remote control or pointing device, the object may be selected by pointing within the object area, or within a predefined range around the area, for some predetermined period of time. For example, an object may be selected by pointing at or within a radius of, for example, 50 screen pixels from the center of the object for more than, for example, 2 consecutive seconds.
- the time required for object selection may depend on the pointing location relative to a reference location within the object area. For example, if the pointed-to location is within a certain number, N, of screen pixels of the center of the object, then perhaps only a fraction of a second of continuous pointing within this region may be required for object selection. However, if the pointed-to location remains within a number of pixels larger than N, say 2N, then a longer continuous time, say 1 to 2 seconds, may be required before the object is selected. This approach may have advantages especially for rapidly moving objects. Other actions that do not involve pressing a button may also be used for object selection. For example, circling an object with the cursor may be interpreted as object selection.
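- A minimal sketch of this variable dwell-time selection rule follows; the specific radii and dwell times below are illustrative values following the N / 2N example in the text, not values fixed by this disclosure:

```python
def dwell_required_s(dist_px, n_px):
    """Required continuous pointing time before selection: shorter near the
    object's center (within N pixels), longer in the outer ring (within 2N),
    and never outside the selection region."""
    if dist_px <= n_px:
        return 0.5            # a fraction of a second near the center
    if dist_px <= 2 * n_px:
        return 1.5            # 1 to 2 seconds in the outer ring
    return float("inf")       # outside the region: no selection

class DwellSelector:
    """Accumulates pointing time and reports when an object is selected."""
    def __init__(self, n_px=25):
        self.n_px = n_px
        self.dwell = 0.0

    def update(self, dist_px, dt):
        if dist_px > 2 * self.n_px:
            self.dwell = 0.0  # pointer left the region: start over
            return False
        self.dwell += dt
        return self.dwell >= dwell_required_s(dist_px, self.n_px)
```

Calling `update` once per frame with the pointer-to-object distance and the frame interval yields selection sooner for central pointing than for peripheral pointing, as described above.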
- FIG. 13 shows an example click "Search History" screen that may be called up at the user's convenience. This search history may log the location and timestamp of the click as well as the object description and possible further related actions. It is possible to select any of these to jump back to the time and place of the object in the media file, or to examine related information in more detail.
- the tagging information consists of simple data files that can be specifically generated for different media content.
- this data consists of arrays of numbers arranged according to the rules laid out in Figure 14 and Figure 15.
- Figure 14 shows an example file format for metadata according to one embodiment. It consists of multiple arrays of data for all the potential objects available in the media content. The file contains a first column with incremental timestamps. Corresponding Object columns will contain the specific object's location in space at the corresponding time. If the object is not visible, these columns may contain "-1".
- Figure 15 shows another embodiment of the metadata tagging file format.
- the Object columns contain additional data about the size and shape of the Object at the regular incremental time slots.
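- Assuming the Figure 14 layout (a timestamp column followed by per-object coordinate columns, with -1 marking an object that is not visible) and a simple comma-separated encoding, a reader for such a tagging file might look like the following sketch; the exact on-disk encoding is not specified in this disclosure:

```python
import csv, io

INVISIBLE = -1.0

def parse_tagging_file(text):
    """Parse rows of [timestamp, x1, y1, x2, y2, ...] into a dict mapping
    timestamp -> list of (x, y) locations, with None for invisible objects."""
    frames = {}
    for row in csv.reader(io.StringIO(text)):
        t, *coords = [float(v) for v in row]
        objects = []
        for x, y in zip(coords[0::2], coords[1::2]):
            objects.append(None if x == INVISIBLE else (x, y))
        frames[t] = objects
    return frames

# Two time slots, two objects; object 2 is absent in the first slot.
sample = "0.0,120,80,-1,-1\n0.1,122,81,300,200\n"
frames = parse_tagging_file(sample)
```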
- the location data is generated by using a software program that allows the Service Provider to run the media content one or more times while pointing to the objects of interest. If, for example, the Service Provider simultaneously holds down specific keys on a keyboard that correspond to that object, the object's position is recorded (overwritten) in the corresponding object column. While the object is not visible on the screen, no key will be pressed and hence the default value of -1 will remain in the object column, signifying that the object is not present.
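- The recording pass described above can be sketched as follows: for each incremental time slot, the pointed-to location overwrites the object's column only while that object's key is held, leaving the -1 default otherwise (the function names are illustrative, not from this disclosure):

```python
def record_object_column(num_slots, key_held, pointer_location):
    """Build one object's column for the tagging file. `key_held(i)` reports
    whether the object's key is down at slot i; `pointer_location(i)` returns
    the pointed-to (x, y) at slot i."""
    column = []
    for i in range(num_slots):
        if key_held(i):
            column.append(pointer_location(i))   # overwrite with position
        else:
            column.append((-1, -1))              # default: object not visible
    return column
```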
- Figure 16 shows one embodiment of the metadata.
- this metadata is strictly informative and yields results akin to a visual search query such as "What is this that I am pointing at?".
- the data is wholly or partially sponsored and paid for to instantiate an on-demand advertising model.
- Objects may have multiple sponsors. For example, a generic "soda can" may be sponsored by Coke and Pepsi. Tiered pricing may also be offered to change the search ranking results.
- FIG. 16 shows an example of metadata content.
- This metadata content may apply to different video files.
- the "tagging files” are specific to each file.
- all metadata and tagging data information accompanies the video media, such as the DVD or movie MPEG file.
- all of this data resides on the internet.
- software on the media player may recognize content being viewed, for example from its title, and download the appropriate metadata and tagging file.
- This embodiment permits more flexibility for the Service Provider to modify and update the files as appropriate.
- the metadata relates to direct marketing information and the visual search described in this disclosure can be used as a tool for order generation, voting, subscriptions, coupons, vouchers, direct sales, etc.
- Embodiments of the present invention also relate to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable ROMs (EPROMs), electrically erasable programmable ROMs (EEPROMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
- a machine-readable medium includes read only memory ("ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Abstract
Methods and apparatuses for a free-space multi-dimensional absolute pointer using a projection marker system are described herein. In one embodiment, a presentation system includes, but is not limited to, a projection- based marker apparatus to project one or more optical spots on a display surface for displaying machine generated content capable of being manipulated via a cursor of a pointing device, a handheld device to wirelessly capture the projected optical spots from the display surface, and a control unit communicatively coupled to the projection-based marker apparatus and the handheld device to determine coordinates of the cursor based on characteristics of the captured light spots. Other methods and apparatuses are also described.
Description
FREE-SPACE MULTI-DIMENSIONAL ABSOLUTE POINTER USING A PROJECTION MARKER SYSTEM
RELATED APPLICATIONS
This application is a PCT application claiming the priority of co-pending U.S. Patent Application No. 11/777,073, filed July 12, 2007 and U.S. Patent Application No. 11/777,078, filed July 12, 2007, which claim the benefit of U.S. Provisional Application No. 60/831,735, filed July 17, 2006 and U.S. Provisional Application No. 60/840,881, filed August 28, 2006. The disclosure of the above-referenced applications is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
[0001] The present invention relates generally to a data processing system. More particularly, this invention relates to a free-space multi-dimensional absolute pointer using a projection marker system.
BACKGROUND
[0002] Among the several handheld devices that exist for remotely controlling electronic equipment, the free-space multi-dimensional absolute pointer (as described in the above-incorporated applications) stands to bring ease of use by unifying control of nearly all devices under one simple operational paradigm: point-twist-zoom. In a similar way that the mouse and the graphical user interface brought simplicity and user-friendliness to the PC (personal computer) platform in the early 1970's with their "point-and-click" paradigm, the world of the digital living room is now experiencing a rapid convergence of electronic equipment and technologies that are overwhelming the control
capabilities of traditional interfaces, such as universal IR remote controls, mice, and keyboards.
[0003] This is becoming even more evident with several key consumer trends: 1) strong sales of large screen digital TVs, 2) strong demand for digital video recording functionality (e.g., TiVo) and advanced TV viewing, 3) pervasive entrenchment of the internet in all aspects of human life (e.g., information search, travel, purchase/sales, banking, etc.), 4) nearly complete conversion to digital cameras and camcorders in the USA, and 5) increased demand for gaming for recreational purposes (e.g., on-line games, casual games, multi-player games, etc.). When these trends collide in the digital living room, the user needs a simple device and user paradigm to be able to manage and navigate this flood of content.

[0004] Further, historically, the advertising industry has searched for ways to become more effective, and commercial advertising has become increasingly invasive. Consequently, consumers have, over the years, developed a strong ambivalence toward the advertising industry and its traditional advertising models. On the one hand, consumers do recognize many of the inherent benefits of being exposed to advertising, such as the need to be informed about new products that may interest them. They also acknowledge the indirect benefits of being able to receive free services, such as TV or radio shows, at the cost of being exposed to regular advertisements (e.g., commercials every 10 minutes) or continual ads (e.g., banner ads on the internet). However, all these benefits tend to come at the cost of a veritable flood of advertisements that are mostly intrusive, time consuming, and unwanted. The result has been that consumers are adapting to them by either ignoring them, as background noise, or finding clever ways of avoiding them altogether with such tools as time-shifted recordings and commercial skipping (e.g., personal video recorders or PVRs) when watching TV or pop-up blockers on web browsers.
In response, the advertising industry is anxiously trying to adapt to these changing patterns by finding new ways to advertise more effectively. Unfortunately, this has, in large part, resulted in the advertising industry becoming even more intrusive by increasing the frequency of
the ads, by using clever product placements in shows or by using viral advertising campaigns. The irony in this escalation is that neither side ends up satisfied and the race continues.
[0005] Online advertising is not much different despite the wonderful potential for interactivity offered by the internet. Web advertising has instead borrowed almost entirely the mass media advertising model, with very poor results as evidenced by the poor "click-through" rates of, for example, "banner ads".
[0006] Arguably, the most effective advertising model to date has been the Google Search model, whereby the consumer receives a service, i.e., the ability to find something fast that interests him, while being subsequently exposed to general as well as sponsored search results and hyperlinks that are directly applicable to what the user is looking for. This model has the merits of 1) being on-demand, meaning that it is only present when the consumer wants it to be, and 2) being relevant, personalized and targeted to the specific and immediate interests of the consumer. These are the traits that bring users back to the service rather than turn them away. This is a model in which both the consumer and the advertiser benefit. Given the success of this model, the challenge and purpose of this invention is to bring these traits to other media or services.

[0007] When examining advertising in media today, it is also important to realize how delivery of media entertainment and content is undergoing a rapid transformation. Traditionally, "TV entertainment" has been enjoyed only in the living room or bedroom in front of the CRT TV. This is no longer true and will become even more archaic in the near future. For example, several companies now offer the ability to transport your TV shows directly from your home to your laptop or desktop PC, to be enjoyed as a small inset window or in full-screen mode. It is even possible to watch shows on mobile phones, PDAs, or mobile media players, such as the iconic iPod. Entertainment programming can now easily be downloaded or ported via rewritable DVDs or flash memory sticks. In the digital living room, multimedia content may just as easily come from
hundreds of TV channels from the cable or satellite box, as from PVRs or online websites. With all this content and digital entertainment in all its forms, a need exists for a tool or service that all consumers/viewers would find beneficial, and which shares the ideal advertising traits exemplified by, for example, the Google Search model.
SUMMARY OF THE DESCRIPTION
[0008] Methods and apparatuses for free-space multi-dimensional absolute pointer using a projection marker system are described herein. In one aspect of the invention, a presentation system includes, but is not limited to, a projection- based marker apparatus to project one or more optical spots on a display surface for displaying machine generated content capable of being manipulated via a cursor of a pointing device, a handheld device to wirelessly capture the projected optical spots from the display surface, and a control unit communicatively coupled to the projection-based marker apparatus and the handheld device to determine coordinates of the cursor based on characteristics of the captured light spots.
[0009] According to another aspect of the invention, data having full descriptions and hyperlinks are tagged to specific objects in moving images and the invisible hyperlinks move dynamically to continually track the associated object. In one embodiment, a pointing device can be used to point to objects in the scene, whether moving or stationary, and, by appropriate action such as clicking or activating a button, substantially immediately recall part or all of the metadata content that pertains to the object.
[0010] Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
[0012] Figure 1 is a block diagram illustrating major components of a system which may be used with one embodiment of the invention.
[0013] Figures 2A and 2B are diagrams illustrating certain light spots according to certain embodiments of the invention.
[0014] Figures 3 A-3C are diagrams illustrating certain configurations of major components of a system according to one embodiment of the invention.
[0015] Figures 4A-4C are block diagrams illustrating an operating environment according to one embodiment of the invention.
[0016] Figure 5 is a block diagram illustrating a presentation system according to one embodiment.
[0017] Figure 6 is a block diagram illustrating a presentation system according to an alternative embodiment.
[0018] Figures 7A and 7B are block diagrams illustrating a presentation system according to certain alternative embodiments.
[0019] Figure 8 is a block diagram illustrating a data processing system which may be used with one embodiment of the invention.
[0020] Figure 9 shows an example movie scene that will be used to exemplify a video stream that may contain various objects that may or may not be stationary.
[0021] Figure 10 illustrates an example of how the objects in a TV scene may have been uniquely tagged with invisible reference (e.g., "hyperlink") areas, according to certain embodiments.
[0022] Figure 11 illustrates how a user may use a free-space absolute pointing device to easily point to and select an object in a scene. Note that the tags are not directly visible to the user, according to one embodiment.
[0023] Figure 12 shows the results of such a visual search query according to one embodiment.
[0024] Figure 13 shows an example of a click-history that the user can pull up at his convenience at a later time according to one embodiment.
[0025] Figure 14 shows an example file format for metadata according to one embodiment.
[0026] Figure 15 shows another embodiment of the metadata tagging file format.
[0027] Figure 16 shows one embodiment of the metadata. In one embodiment this metadata is strictly informative and yields results akin to a visual search query according to one embodiment.
DETAILED DESCRIPTION
[0028] Methods and apparatuses for a free-space multi-dimensional absolute pointer using a projection marker system are described herein. In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring embodiments of the present invention. [0029] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification do not necessarily all refer to the same embodiment.
Free-Space Multi-Dimensional Absolute Pointer Using A Projection Marker System
[0030] According to certain embodiments of the invention, a free-space absolute pointer, henceforth referred to as the WavIt, provides such a tool by combining simple 3D pointing with a graphical user interface on a large screen monitor/TV. The WavIt is an absolute pointer: where the user points is where the cursor goes. It works on any type of screen (e.g., CRT, DLP, RPTV, LCD, plasma, etc.). The WavIt also tracks other degrees of freedom, such as the absolute angle of rotation of the user's wrist, and the user's absolute distance away from the screen. Some versions also track the user's location in the room. All this takes place in real time, and multiple users can use devices at the same time, which is of particular interest for multi-player gaming.
[0031] An embodiment of the invention is to expand on ways in which the WavIt absolute pointing system may be engineered to work with large front- (and/or rear-) projection screens. Specifically, the techniques described throughout this application focus on how a projection-POD (Photonic Origin Designator) or p-POD may be developed to allow easy usability and setup, primarily for conference room settings.
[0032] The WavIt multi-dimensional pointer is a high precision tracking system in which the core miniature sensing unit resides inside a handheld remote control device. The device tracks multiple degrees of freedom in absolute space, meaning that not only does it sense where the user is pointing, but it also senses whether the user is twisting his wrist, leaning forward, or sitting to the side of the room. Functionally, it basically acts like a localized GPS (global positioning system) device to track where you are with respect to the TV screen, as well as a laser pointer to detect where you are pointing, and a tilt sensor to know how much your wrist is twisted. These functions, which typically require several distinct sensing and pointing technologies, are achieved using the same underlying optical tracking that is core to the WavIt system.
[0033] Figure 1 is a block diagram illustrating an exemplary system configuration according to one embodiment of the invention. Referring to Figure 1, system 100 includes a handheld device 101, a display surface 102, one or more emitters 103-104, and a data processing system 105. In one embodiment, the handheld device 101, hereafter referred to as the Handset, may be used by an end
user as a pointing device. This device incorporates an optical sensing unit (such as a CMOS camera) that tracks the Optical Beacons transmitted from a photonic origin designator (POD) 106 (having emitters 103-104) and calculates its own multi-dimensional coordinates. The Handset 101 then sends various data (including coordinates) to the POD 106 using one of several wireless technologies. The handset can also receive RF commands from the POD 106. [0034] A receiving device 106, hereafter referred to as the POD, receives the data from the Handset using a variety of wireless communication protocols, such as, for example, IR, Bluetooth, or IEEE 802.xx protocols. This device is coupled to a computer via a communication link such as a USB connection. The receiver channels the data from the Handset into a data processing system 105. The receiver also has the ability to "blast" IR signals to all other infrared sensitive devices within a predetermined proximity such as a room. A sub-section of the POD is dedicated to generating the Optical Beacons, which serve as the optical markers that are tracked by the handset.
[0035] A host computer 105 (or set-top box) receives the data from the POD. This is handled by a driver, which communicates with the Handset using the USB device. The driver will, based on the data sent, calculate position and pointing coordinates, read the button presses, and use this information to control the PC and specific programs or environments.
[0036] All interaction happens via a display surface such as a TV screen. This is the screen on which the content, e.g., movies or internet pages, will be displayed. It is also where additional graphical overlays may appear as dictated by a specific user interface.
[0037] The WavIt multidimensional tracker is based on optical tracking of one or more spots, or marker images, on an optical sensor. In one embodiment, an optical sensor is incorporated into a handset. This arrangement is one of the key aspects of the WavIt system from which many of its highly desirable features are derived. By incorporating specific optical wavelength filtering in the sensor, according to one embodiment, the sensor can be made to only see or detect light of a specific wavelength range, such as, for example, ranging approximately from 900 to 1000 nm.
[0038] There are a number of ways to generate the optical beacons or markers. In one embodiment, one or more IR LEDs may be incorporated into a POD unit that is placed near the screen, with the IR LEDs emitting into the room, towards the handset. However, it is not so limited. Different embodiments of a POD and/or beacons may be incorporated with one or more IR LEDs that emit into the room. For example, the beacons may be built into the RF receiver and/or USB chip enclosure. Alternatively, the beacon unit may contain only the beacons, with RF reception handled by a separate USB dongle unit containing the RF receiver. Figures 3A-3C illustrate certain configurations of devices according to certain embodiments. Other configurations may exist. [0039] With IR LEDs placed in the POD, according to certain embodiments, these light sources may be seen or detected by the handset as distinct spots. A single spot, as seen by the handset's image sensor and microcontroller, is shown in Figure 2A according to one embodiment. In this example as shown in Figure 2A, the image is approximately 13 x 14 pixels in size.
[0040] It will be appreciated that multiple spots may also be implemented. For example, as shown in Figure 2B, a set of, say, two spots are processed in the handset unit for their pixel coordinates on the sensor and their signal strengths. This information is subsequently sent to the POD receiver, which in turn transmits this data to a computer. In this example as shown in Figure 2B, the coordinates are sent by a ZigBee 2.4 GHz RF wireless chip. Note that in this example, the array size is approximately 352 x 288; however, fractional coordinates exist (e.g., a spot can have a location of 251.23, 122.57). The resolution is such that it can effectively have approximately 35200 by 28800 pixels. Other camera resolutions can of course be employed. Certain detailed operations of the system are described in the co-pending U.S. Patent Application Nos. 11/187,435; 11/187,405; and 11/187,387, filed July 21, 2005, the disclosures of which are incorporated by reference herein in their entirety.
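The fractional (sub-pixel) coordinates described above arise naturally from intensity-weighted centroiding of the spot image. The following sketch is not from the patent; it is a minimal illustration of how a spot on a coarse sensor array can yield coordinates to a small fraction of a pixel:

```python
def spot_centroid(pixels, threshold=0):
    """Intensity-weighted centroid of a marker spot.

    `pixels` is a 2-D list of intensity values. The weighted average
    yields fractional (sub-pixel) coordinates, which is how a 352 x 288
    sensor can report spot positions to roughly 1/100 of a pixel, an
    effective grid of about 35200 by 28800. Pixels at or below
    `threshold` are treated as background and ignored.
    """
    total = wx = wy = 0.0
    for y, row in enumerate(pixels):
        for x, v in enumerate(row):
            if v > threshold:
                total += v
                wx += x * v
                wy += y * v
    if total == 0:
        return None  # no spot detected
    return wx / total, wy / total
```

For example, a spot whose intensity is split 3:1 between two adjacent pixels has its centroid a quarter of a pixel from the brighter one.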
[0041] There are situations in which placing a physical POD near a screen is either not very feasible or undesirable. This may be the case, for example, in a conference room scenario. Here the user would need to find a way to mount the POD near a screen and would then have a long USB cable extending back to the user's PC in order to make his system operational. Without prior knowledge of the room and/or time to set up before a meeting, this may be a risky undertaking. [0042] One option is to have a 3-piece system, as shown in Figures 3B and 3C, in which the Handset, the RF receiver/USB dongle, and beacons are separate devices. The beacon can now be more portable and adaptable to mounting, such as the spring-loaded battery powered version shown in Figure 3B. The portable beacons could also have an RF sensor to listen for RF communication and shut down to conserve power if no RF signals are sensed. Alternatively, it could be required that all conference rooms be equipped with an inexpensive set of beacons that are permanently mounted on the wall.
[0043] In one embodiment, which obviates the need for the beacons to be mounted or attached to a wall or screen, a virtual POD is created inside the screen. Figures 4A-4C are block diagrams illustrating an operating environment according to one embodiment of the invention. Figure 4A illustrates a standard projection (front or rear) system. Figure 4B illustrates the same system as detected by the sensor in the handset. Because of the optical filtering in the handset, which blocks substantially all visible light, the screen is essentially blank. Figure 4C illustrates the system when the IR sources (e.g., lasers) in the projection apparatus are turned on. In this embodiment, two spots, generated by the projected light scattered by the display screen, are detected by the handset's sensor but are invisible to the human eye. In this example, Figures 4A-4C show a configuration where all elements of the POD (the RF receiver, the beacons, and the USB PC communication) are co-located in a single unit placed on top of the projector, and where the POD projects the two IR spots onto a screen, for example, into the middle of the screen used by the front (or rear) projector. In one
embodiment the projection beacons originate from inside the projector (e.g., they are built into the projector system).
[0044] In one embodiment, the two spots are generated by two collimated 980 nm IR lasers, pointing out of the projection-POD at a slightly diverging angle. It should be noted that it is also possible to project light from IR LEDs onto the screen, according to another embodiment, but that care must then be taken to refocus (re-image) the spots whenever the projection-POD is moved so that its distance from the screen changes appreciably. In order to have minimal system dependence (e.g., signal strength and spot size) on the POD location, it is useful to use collimated light sources, and for this reason, lasers are an ideal source. The IR spots could originate from individual sources or from a single source that is optically split into multiple beams.
[0045] The handset will now see two spots when pointing in the vicinity of the screen, in a similar way as if a POD had been mounted in the middle of the projection screen. The projection-POD has the benefit of not requiring cables extending from the screen to the PC. The setup procedure is also relatively simple: just point the p-POD at the screen. In one embodiment, one or more guiding visible lasers are used to facilitate placement of the invisible IR spots onto the screen.
[0046] The functional components of a p-POD in which the light source and receiver electronics are integrated into a single unit are shown in Figure 5, according to one embodiment. In addition to the one or more IR lasers, according to one embodiment, the following components may be included: a wireless radio device (e.g., Chipcon/TI, CC2420 2.4 GHz radio IC) for receiving and transmitting data from/to the Handset, a micro-controller (e.g., Maxim, MaxQ2000) that controls the data flow between the Handset and computer as well as the switching of the power to the IR lasers, and a computer interface device (e.g., Silicon Labs, CP2102 USB-UART IC).
[0047] Note that some or all of these functions (wireless Rx/Tx, micro-control, and computer interface) may be integrated into one or two chips. For
example, the Chipcon/TI CC2430 combines the wireless radio and microcontroller functions. Not shown in Figure 6 are other standard components, such as power supply and management circuitry (batteries or voltage regulators), switching and control devices (e.g., mechanical switches), and indicators such as LEDs showing on/off state. A particular configuration would use the USB computer interface to supply the electrical power for the p-POD, although separate power supplies or batteries may be desirable in some situations (e.g., if the current capacity of the USB interface is exceeded or if remote operation is required). An example of the latter case would be where the lasers are housed separately from the remaining components in the p-POD and the projector is physically separated from the computer. In this case, it may be desirable to power the laser portion using batteries.
[0048] Referring to Figure 5, according to one embodiment, the lasers are mounted such that their emitted beams exit the p-POD at an angle with respect to each other. The specific angle between the beams is not critical, but the optimal range of angles is determined by the range of distances between the POD/projector and the screen and the desired spot separation on the screen. If the separation is too small, then the accuracy in the distance and rotation sensing of the system is reduced, and if the separation is too large, then the angular pointing range of the Handset is reduced since both spots must be within the vision system's field of view.
[0049] For example, for typical operating distances of 2 to 5 meters from a screen with a 50 inch diagonal, a projected spot separation of approximately 15 to 25 cm is a relatively good compromise for a vision system with a 50 degree field of view. If the p-POD is placed approximately 2 to 3 meters from the screen, where a typical front projector would be located, then the optimal angle between the IR beams would be approximately 3 to 7 degrees. For other operating conditions and system characteristics, different optimal beam angles will result. In some configurations, this angle is fixed, and in other configurations, the angle is made to be adjustable.
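The angles quoted above follow from simple geometry: two beams leaving a common point at full angle A produce spots separated by roughly 2·d·tan(A/2) at distance d. A sketch (illustrative only, not from the patent; it assumes the beams originate from nearly the same point in the p-POD):

```python
import math

def beam_angle_deg(spot_separation_m, pod_to_screen_m):
    """Full angle between the two diverging IR beams needed to place
    the projected spots `spot_separation_m` apart on a screen
    `pod_to_screen_m` away, assuming both beams leave from (nearly)
    a common point in the p-POD."""
    return math.degrees(
        2 * math.atan(spot_separation_m / (2 * pod_to_screen_m)))
```

Reproducing the figures in the text: a 15 cm separation at 3 m requires about 2.9 degrees, and a 25 cm separation at 2 m requires about 7.2 degrees, consistent with the quoted 3 to 7 degree range.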
[0050] Figure 6 shows an alternative embodiment in which a single IR laser is used to generate the two beams, and a visible alignment laser is incorporated to aid in the placement of the p-POD. Lasers that may be used for this purpose are those used in commercially available laser pointers (e.g., red diode lasers). These are typically low-power (<5 mW) and inexpensive devices (<$1). In one embodiment, the alignment laser would be turned on during initial set-up of the p-POD and pointing system, and the guide laser would be pointed at, or near, the middle of the screen.
[0051] In a particular embodiment, the visible guide laser beam bisects the two IR beams so that when it is pointed at the middle of the screen, it is known that the invisible IR spots are symmetrically located around the middle of the screen. During subsequent operation of the system, the visible guide laser may be turned off and used only when the p-POD alignment needs to be verified. The guide laser may be controlled by the micro-controller or by other means, such as an electro-mechanical switch. The two IR beams are generated from a single laser device using optical beamsplitters and mirrors, both standard optical components. In one embodiment, the beamsplitter divides the incident IR light from the IR laser into two equal power components and transmits one and reflects the other. The reflected beam then reflects off of the mirror and exits the POD. The beamsplitter and mirror are adjusted to provide the desired beam angle. The advantage of this arrangement is that only one IR laser is used, thus saving in cost, component count, and space. However this laser must be more powerful than those used in the two-laser arrangement (approximately twice the power) and additional optical components are needed.
[0052] Note that the visible alignment laser may be included in either the single- or two-laser embodiments, and that some or all of the receiver/transmitter components may be housed separately from the lasers. For example, in an alternative embodiment, the optical components of the p-POD are contained in an enclosure that resides near the projector and the receiver components are contained in a small enclosure that plugs into a computer input port (e.g., a USB
dongle device). In this arrangement, the Handset and receiver communicate with each other, and the p-POD is used only to project the reference markers onto the screen. The p-POD would then have its own power source and switch. If it is desirable to communicate and control the lasers in the p-POD remotely, then a micro-controller and wireless chip could be included with the p-POD. This arrangement might be desirable in situations where the projector and computer are located far from each other.
[0053] In one embodiment, the laser beam vectors are slightly diverging (as shown in Figures 5 and 6) and the lasers are co-located with the projector. In this way, the farther away the projector and p-POD are placed from the screen, the larger the image is and the greater the separation of the projected IR spots will be. Moreover, the spot separation will scale in proportion with the image size. In this case, the spot separation can be used to directly calibrate the handset to the screen, and no additional pointing and screen calibration, as described in the co-pending applications referenced above, will be required during setup. This is because the ratio of the marker separation and screen size is always the same. Ordinarily, using the standard fixed-marker arrangement, some type of initial pointing calibration is needed because there is no a priori knowledge of the screen size and no fixed relationship between the marker separation (i.e., size of the POD) and the screen dimensions. Furthermore, by pointing the IR laser beams to the center of the screen, the Wavlt system's angular operational range will be maximized. This is preferable to placing the spots above or beneath the screen, which is the practical limitation of a normal emission-based POD that is not allowed to block the screen's image.
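The distance-independence of the marker-to-screen ratio can be checked numerically: both the projected image width and the IR spot separation grow linearly with the projector distance, so their ratio is fixed. A sketch (illustrative only; the throw-ratio model of the projector, distance divided by image width, is an assumption introduced here, not a quantity from the patent):

```python
import math

def spot_separation_m(distance_m, beam_angle_deg):
    """Separation of the two projected IR spots on the screen for a
    p-POD whose beams diverge at the given full angle."""
    return 2 * distance_m * math.tan(math.radians(beam_angle_deg) / 2)

def image_width_m(distance_m, throw_ratio):
    """Width of the projected picture for a projector with the given
    throw ratio (distance / image width)."""
    return distance_m / throw_ratio

# The spot-separation-to-screen-width ratio is the same at any
# distance, which is why no per-setup pointing calibration is needed:
r_near = spot_separation_m(2.0, 5.0) / image_width_m(2.0, 1.5)
r_far = spot_separation_m(4.0, 5.0) / image_width_m(4.0, 1.5)
```

Because the distance cancels out of the ratio, a single factory calibration of the handset suffices for any projector placement.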
[0054] Other configurations for the lasers and IR spots are possible. For example, according to one embodiment, the lasers may be mounted such that the divergence between the beams is in the vertical plane instead of the horizontal plane, thus producing spots that are arranged vertically on the screen, or any combination of horizontal and vertical displacement between the beams (e.g., spots arranged diagonally on the screen) is also possible. Other geometries
include ones in which the beams from the two lasers cross and then diverge before reaching the screen or simply converge from the p-POD and do not cross before hitting the screen.
[0055] The infrared spots that are projected onto a normal screen will tend to scatter in the same way that the visible light does. This means that for normal screens there will be near-Lambertian scatter of the incident light from the p-POD, so the spots will be visible from very large angles to the screen. In addition, many projection screens (rear-projection screens in particular) are designed to have asymmetric scattering of the incident light in order to increase the viewing angle, typically in the horizontal plane. Such screens will also work well with the p-POD system since, in general, similar non-uniform scattering will increase the operational region (both in angle and distance from the screen) for the Wavlt system.
[0056] Another benefit of projected spots is that they are relatively easily tailored without the same physical constraints that govern the design of a regular emissive POD. For example, according to certain embodiments, the projected spots can be made bigger without impacting the size of the p-POD. Shaped spots, such as lines and crosses, can be projected into various corners of the screen. Larger and shaped spots may be more easily resolved by the detection system in the Handset and thus provide additional information. Multiple spots could be arranged in a circle. The benefit of this is, for example, that if the spots are arranged into a large circle, then the aspect ratio of the circle, which can be determined more accurately, could help the Wavlt Handset determine its location in the room. For example, a tall oval shape would indicate that the user is positioned to the left or right of the screen and not directly in front. A flat oval would indicate that the user is positioned above or below the screen.
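The circle-aspect-ratio idea above can be sketched with a simple foreshortening model: a circle viewed at angle A off the screen normal appears as an ellipse with minor/major = cos(A). This model and the function name are illustrative assumptions, not from the patent:

```python
import math

def off_axis_angle_deg(minor_axis, major_axis):
    """Viewing angle off the screen normal, inferred from the apparent
    aspect ratio of a circular marker pattern. Foreshortening makes a
    circle viewed at angle A appear as an ellipse whose compressed
    (minor) axis is cos(A) times the major axis."""
    return math.degrees(math.acos(minor_axis / major_axis))
```

A circle seen as a round shape implies a head-on view; an oval whose short axis is half its long axis implies the user is about 60 degrees off axis, with the orientation of the short axis (horizontal or vertical) indicating whether the offset is to the side or above/below the screen.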
[0057] Multiple arrangements of spots (and/or their powers) can be used to break the symmetry to allow for discerning whether the user is to the right or left of the screen. For example, a four-marker arrangement in which the separation between the markers on the screen is a non-negligible fraction of the user's
distance from the screen can permit the distinction between left and right view angles. This can be done using one or more of at least two properties of the markers: their apparent separation (the spot separation on the Handset sensor) and their detected powers. In essence, the pair of markers that are closer to the Handset will have both a larger separation and stronger signal strengths. The side of the screen on which the user is located will determine which pair of spot images is stronger and/or farther apart. The procedures for determining various degrees of freedom using multiple marker configurations are described in detail in the pending applications previously referenced. Returning to the two-spot geometry, it is also now more feasible to place the spots much farther apart (perhaps even the length of the screen). In addition to permitting better view angle resolution as described above (by using the difference in the received power between the pairs of spots), this has the benefit of improving the resolution of the measurement of distance from the Wavlt to the screen, since the Handset uses the spot separation to gauge distance from Handset to the screen. It also improves the resolution of the roll angle (the angle of rotation of the Wavlt about the axis perpendicular to the sensor surface).
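The distance and roll estimates mentioned above follow from a standard pinhole-camera model. A sketch (illustrative only; the focal length expressed in pixels is an assumed camera parameter, and the function names are not from the patent):

```python
import math

def handset_distance_m(marker_sep_m, image_sep_px, focal_length_px):
    """Pinhole-camera range estimate: the spot images on the sensor
    move closer together as the handset moves away, so distance is
    proportional to the physical marker separation divided by the
    measured pixel separation. A larger physical separation makes the
    same pixel error correspond to a smaller distance error."""
    return focal_length_px * marker_sep_m / image_sep_px

def roll_angle_deg(x1, y1, x2, y2):
    """Roll of the handset about its optical axis, taken from the
    tilt of the line joining the two spot images on the sensor."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```

For example, markers 20 cm apart imaged 20 pixels apart by a camera with a 300-pixel focal length imply a range of 3 m; spot images displaced diagonally by equal amounts imply a 45 degree roll.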
[0058] In one embodiment, the Wavlt handset can be modified to have a very narrowband laser-line band-pass 980 nm filter. This allows only light of very close spectral proximity to 980 nm to pass through to the image sensor. A continuous-wave (CW) 980 nm laser usually has a bandwidth of much less than 1 nm, in comparison to an LED, whose spectrum normally spans >10 nm. This means that the system can be made much more optically robust to spurious light sources, such as room lights, essentially increasing the inherent signal-to-noise ratio of the system.
[0059] In one embodiment, the p-POD contains two 35 mW 980 nm diode laser sources that each draw ~70 mA at an operating voltage of 3.2 V. This is well within the power limit of the USB port of a PC, which can supply 500 mA at 5 V.
[0060] While it is noted that an alternative arrangement is possible in which the laser is placed in the Handset and/or the camera is fixed on the table or onto the projector, this "reverse" configuration suffers from a few weaknesses: 1) it does not lend itself well to non-interfering, robust multi-user operation, because it would be difficult to distinguish which spots belong to which user without a much more computationally intensive and expensive image analysis system; 2) it involves placing lasers in a handset, where free hand motion is more likely to direct potentially dangerous laser beams into unsuspecting eyes; 3) the lasers consume more power (~150 mA at 3.3 V, or ~500 mW) and hence would require much more frequent recharging when compared to the <20 mA (or ~70 mW) for a Wavlt optical tracking handset; 4) the lasers are significantly more expensive than an optical sensor; and 5) the space requirement for two lasers is generally greater than that for a single image sensor and would thus add to the size of the Handset.
[0061] According to one embodiment of the invention, the p-POD Wavlt system is compatible with the regular Wavlt POD system in that no modifications need be made to the Wavlt Handset. This also means that all the regular benefits of Wavlt tracking system apply. For example, it allows for non-interfering robust multi-user operation, and the handset remains relatively low cost, low power, and robust with a very small form factor.
[0062] Note that, although we have discussed primarily the use of a p-POD configuration for front projectors, it is clear to those with ordinary skill in the art that the same principle applies equally to a rear-projection TV system, in which the lasers, or other projected IR light sources, are mounted inside the TV and the spots are projected onto the back-side of the TV screen along with the normal projected picture.
[0063] Figures 7A and 7B show a rear projection pointing system using a p- POD according to certain embodiments. The side view is shown in Figure 7A and the top view is shown in Figure 7B. There are several different optical arrangements that can be used for rear projection video systems. Figures 7A and
7B illustrate the use of the p-POD in a conventional folded geometry, a standard arrangement for rear projection systems. In order to preserve space behind the screen, according to one embodiment, mirrors may be used to direct the expanding cone of light from the projector onto the back of the viewing screen as shown in Figure 7A. Light incident on the screen is scattered in the general direction of the viewer. This is the basic operational principle for many large screen rear projection televisions (RPTVs).
[0064] In traditional RP systems, the screen is designed to scatter the light isotropically. More recently, systems are designed so that the front surface of the viewing screen scatters the incident light asymmetrically in the horizontal and vertical directions. This is done in order to make more efficient use of the available light since typical viewers will not be located at large vertical angles with respect to the screen. Advanced technologies such as Fresnel lenses and lenticular arrays are used to produce asymmetrically scattering screens. [0065] There are several different arrangements that could be employed for the incorporation of the p-POD based absolute pointing device into an RP system. The one shown in Figures 7A and 7B is essentially the equivalent of that shown in Figures 5 and 6 for a front projection system. According to one embodiment, the p-POD is mounted adjacent to the projector or projection unit and the projected IR marker spots are positioned near the middle of the screen. In systems that employ asymmetrically scattering screens, the IR light will be scattered asymmetrically in a manner similar to the visible light, as indicated in Figures 7A and 7B.
[0066] The vision system in the Handset held by the viewer will see two spots located near the center of the screen and displaced horizontally with respect to each other, as shown in Figure 7B. The view angle for the system (wide in the horizontal plane and narrower in the vertical plane) will be similar to that for a normal viewer. Other positions for the p-POD are possible. The p-POD could be mounted just below mirror 2 and pointed directly at the back of the screen at a slight angle such that the IR spots hit near the screen center. This arrangement
would reduce any losses incurred upon reflections from the mirrors and obviate the need for mirrors with high reflectivity in the IR. Alternatively, the p-POD could be placed behind one of the mirrors with the beams directed at the screen center. In this case, the mirror behind which the POD is placed would have to be transparent to the IR. Other arrangements that produce the desired spot pattern may also be possible. Note that the folding mirrors are not shown in Figure 7B, but the essence of the arrangement regarding the placement of the p-POD and the IR spots is not affected. Other configurations may exist.
[0067] In addition to standard RPTV systems, laser-based projection TVs are another, recently-developed, type of display system in which the p-POD may be integrated. The main difference between standard RP and laser-based RP displays is the type of light source. Instead of filtered lamp light or, in some cases, visible LEDs, laser-based displays use lasers as the source of illumination. The main advantages of laser-based displays are higher efficiency, smaller size and weight, longer lifetime, and superior color gamut compared with conventional projection systems. Although still in their infancy, laser-based TVs and displays are anticipated to become more prevalent over the next several years due to the continuing improvements in the quality and cost of the component solid state lasers used as the sources.
[0068] Laser-based projection displays (both front and rear) are potentially ideally suited for incorporation of a laser-based p-POD as the source of the reference markers for use with a Wavlt pointing device. In typical laser-based displays, at least one of the three component visible light sources (red, green, and blue) is derived from an IR laser source. For example, typically, the blue light is obtained by frequency doubling of a near-IR laser, although in some cases, the green and/or the red are also derived from an IR source via second-harmonic generation. This is done because of the difficulty and inefficiency in generating shorter wavelength (e.g., blue) laser light directly. A typical wavelength range for the blue component is 430-490 nm, which places the fundamental wavelength in the 860-980 nm range. This light is typically not used and
must be blocked or filtered out of the projection system. However, this range is nearly ideal as a near-IR source of marker light. Furthermore, because of the relatively low conversion efficiency of the frequency-doubling process (<50%), a high-power source of residual ~920 nm laser light is available that is otherwise wasted. The only additional components necessary to use the IR light may be collimating, beam-steering, and beam-splitting optics to separate the IR from the visible light, split it into the desired number of beams (one for each marker), shape the beams as needed, and direct them to the screen. [0069] Note that, although the use of available IR light is ideal in many ways, it may not be practical in some systems. In such cases, an external IR source may be added to the system, as shown in Figures 4 and 7. Finally, note that, although integration with laser-based displays may be a natural application of the p-POD, the system will work equally well in projection displays based on conventional light sources.
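The wavelength arithmetic behind paragraph [0068] is direct: second-harmonic generation doubles the optical frequency, i.e., halves the wavelength, so the near-IR fundamental sits at twice the visible output wavelength. A minimal illustration (not from the patent):

```python
def fundamental_nm(visible_nm):
    """Wavelength of the near-IR fundamental behind a
    frequency-doubled visible line: doubling the frequency halves
    the wavelength, so the fundamental is twice the visible value."""
    return 2 * visible_nm

# Blue output at 430-490 nm implies a fundamental at 860-980 nm,
# the same band used for the projected IR markers.
blue_band = (430, 490)
ir_band = tuple(fundamental_nm(w) for w in blue_band)
```

This is why the residual ~920 nm light in such displays falls naturally into the marker wavelength range.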
[0070] The preferred embodiments described thus far involve the use of one or more IR lasers as the source of the projected marker light in the p-POD. Lasers are generally preferred primarily because of their superior optical properties. In particular, their inherent brightness (or radiance) is typically many orders of magnitude larger than for incoherent light sources such as LEDs and lamps. This fact results in the ability to more efficiently collect the emitted light from the source and project to a target (e.g., a wall or screen). In some cases, however, it may be necessary or desirable to use more conventional sources such as IREDs, either for cost or safety reasons. Because of the inherently larger radiance associated with lasers, it is not practical for an IRED-based p-POD to produce the same signal level in a Wavlt Handset as a laser-based p-POD with the same optical power. However, it is conceivable, depending on the details of the system, to design a p-POD based on IREDs or other incoherent light sources. The relevant system details include the image detection method in the Handset, the required optical power and beam size, and the distance between the p-POD and the screen.
[0071] The radiance of an optical source cannot be increased during transmission through an optical system, and because the inherent radiance of LEDs is much smaller than that of lasers, it is generally difficult to achieve the same amount of detectable optical power scattered off of the screen using LEDs versus using lasers with the same source power. There are a few approaches, however, that may permit the use of LED-based p-PODs in some situations. Note that one of the relevant parameters for determining the signal level is the amount of light from the source that is scattered in the direction of the Handset and not necessarily the radiance of the source. Therefore, even though the radiance of LEDs is much smaller than that of lasers, it is still conceivable that they can be used in some situations. In most cases, what matters more than the radiance (e.g., W/cm2-sr) is the irradiance (e.g., W/cm2) of the projected light at the target. This is because the incident light is typically scattered uniformly at the screen, regardless of the source. Therefore, depending on the required spot size, LEDs may provide sufficient signal strength. The required spot size (or range of spot sizes) will depend on several factors including the detection method, the sensor resolution, and the operating distance.
[0072] In general, because of the radiance limitations, it is impractical to maintain sufficiently small spot sizes when the p-POD is located far from the screen (typically more than ~1 meter) without sacrificing optical power. In such cases, the irradiance at the screen may be increased in one of two ways. Increasing the size of the projection optics (e.g., refractive and/or reflective elements) will decrease the target spot size, thereby increasing the irradiance. However, in many cases, the required optic size would be impractically large. It is also possible to attempt to collimate or focus the light from the LED (or other incoherent light source) to obtain a smaller spot. However, in order to achieve a sufficiently small spot containing a significant fraction of the optical power from the LED, the required optical system would also be impractically large (in length and height). The other option is to simply increase the power projected to the target area by adding additional sources and directing their projected beams to the
desired location. Of course, this approach also requires a larger effective size for the p-POD and the addition of more sources which results in higher power requirements and additional cost.
[0073] A specific example of an LED-based p-POD uses, for each marker, three separate LEDs, each of which has a narrow divergence angle (<10° is typically achievable for a 5 mm diameter device). The LEDs may be oriented so that their beams have maximum overlap at the target location, which may be 1 to 3 meters away depending on the details of the display system (e.g., front vs. rear projection). Additional optics may be added to the system to further reduce the spot size (subject to the inherent radiance limits of the source). In general, the further the p-POD is from the screen, the larger the optics must be to maintain the same irradiance. Using a set of three standard high-power (e.g., 35 mW) IREDs for each marker would result in ~100 mW of total power contained in a spot size of ~10 cm for a screen ~1 meter away. The corresponding irradiance of ~1.3 mW/cm2 is to be compared with ~7.5 mW/cm2 for a 35 mW laser spot of ~2.5 cm diameter. The larger the acceptable spot size for the marker, the more feasible the LED-based p-POD approach becomes since more of the available power is used. The maximum acceptable spot size will depend on factors such as the detection method and the acceptable separation between the two projected markers. In some detection methods, the signal on the image sensor is saturated such that the detected image size is much larger than the actual image size. In such cases, broadening the marker size on the screen can be done without significantly affecting the quality of the detected signal. In fact, in some cases, the signal may be improved (larger and/or more stable) by increasing the spot size whether using a laser or an LED. The upper limit on spot size is ultimately determined by the maximum operating distance of the Handset since the images on the sensor approach each other as the user moves farther away from the screen.
If the spots are too large, their images will lie too close together on the sensor to resolve at sufficiently large operating distances.
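The irradiance figures quoted in paragraph [0073] are simple geometry (power divided by spot area). A sketch using the text's example numbers (illustrative only; it assumes the power is spread uniformly over a circular spot):

```python
import math

def irradiance_mw_per_cm2(power_mw, spot_diameter_cm):
    """Average irradiance of a projected marker, assuming the optical
    power is spread uniformly over a circular spot of the given
    diameter."""
    area_cm2 = math.pi * (spot_diameter_cm / 2.0) ** 2
    return power_mw / area_cm2

# Three ~35 mW IREDs (~100 mW total) in a ~10 cm spot, versus a
# single 35 mW laser in a ~2.5 cm spot:
led = irradiance_mw_per_cm2(100, 10)     # ~1.3 mW/cm^2
laser = irradiance_mw_per_cm2(35, 2.5)   # ~7 mW/cm^2
```

The laser's smaller spot gives it roughly a fivefold irradiance advantage here despite carrying about a third of the total power, which is the core of the radiance argument in the text.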
[0074] Figure 8 is a block diagram of a digital processing system, which may be used with one embodiment of the invention. For example, the system 800 shown in Figure 8 may be used as a computer system described above, such as, for example, a host computer, a projector, a POD, and/or a handheld device, etc. [0075] Note that while Figure 8 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not germane to the present invention. It will also be appreciated that network computers, handheld computers, cell phones, set-top boxes, digital TVs, and other data processing systems which have fewer components or perhaps more components may also be used with the present invention. The computer system of Figure 8 may, for example, be an Apple Macintosh computer or an IBM compatible PC. [0076] As shown in Figure 8, the computer system 800, which is a form of a data processing system, includes a bus 802 which is coupled to a microprocessor 803 and a ROM 807, a volatile RAM 805, and a non-volatile memory 806. The microprocessor 803, which may be, for example, a PowerPC G4 or PowerPC G5 microprocessor from Motorola, Inc. or IBM, is coupled to cache memory 804 as shown in the example of Figure 8. Microprocessor 803 may include multiple processors or multiple core logics (e.g., logical processors). The bus 802 interconnects these various components together and also interconnects these components 803, 807, 805, and 806 to a display controller and display device 808, as well as to input/output (I/O) devices 810, which may be mice, keyboards, modems, network interfaces, printers, and other devices which are well-known in the art.
[0077] Typically, the input/output devices 810 are coupled to the system through input/output controllers 809. The volatile RAM 805 is typically implemented as dynamic RAM (DRAM), which requires power continuously in order to refresh or maintain the data in the memory. The non-volatile memory 806 is typically a magnetic hard drive, a magneto-optical drive, an optical drive, or a DVD RAM or other type of memory system which maintains data even after power is removed from the system. Typically, the non-volatile memory will also be a random access memory, although this is not required.

[0078] While Figure 8 shows that the non-volatile memory is a local device coupled directly to the rest of the components in the data processing system, the present invention may utilize a non-volatile memory which is remote from the system, such as a network storage device coupled to the data processing system through a network interface such as a modem or Ethernet interface. The bus 802 may include one or more buses connected to each other through various bridges, controllers, and/or adapters, as is well-known in the art. In one embodiment, the I/O controller 809 includes a USB (Universal Serial Bus) adapter for controlling USB peripherals. Alternatively, I/O controller 809 may include an IEEE-1394 adapter, also known as a FireWire adapter, for controlling FireWire devices.
Direct-Point On-Demand Information Exchanges
[0079] According to certain embodiments, an on-demand exchange of information is provided that allows viewers/consumers to interact in real-time with TV programs (or other media content) in order to gather relevant information about selected objects or images in a video scene. For example, a movie scene may present a group of high-society women enjoying coffee on a balcony, when suddenly Brad Pitt brings his red Lamborghini to a screeching halt in front of the appalled women. Consider now being able to point immediately to Brad Pitt's watch and have a cursor on the screen change shape and inform you in a call-out box - "Rolex-$300", which upon further clicking instantly brings you to a website with the option to buy this watch, or other potentially useful information such as the company website, local vendors, watch types, the history of clocks, etc. Alternatively, pointing to Brad Pitt's head may call up the metadata - "Brad Pitt" with subsequent biographical data being available in the lower half of the screen. Other viewers may be more inclined to point to one of the women's dresses to be informed that this is a "Pierre Cardin blue dress - $299" and a subsequent click
may show a list of similar dresses, prices, and locations (both local and online) where they may be bought. Optional features may include the pausing of the show during these information-gathering actions.
[0080] This model of embedding and retrieving data clearly fulfills the two key attributes that define a good advertising model: 1) it is an "on-demand" service that fulfills the consumer's desire to be informed when and where he wants, while being "invisible" and non-invasive when the consumer wants to just enjoy the show; and 2) it is relevant, personalized, and targeted to the specific and immediate interests of the consumer, making it an enriching experience as well as a more efficient means of relevant information exchange.
[0081] It is evident with this pointing-based information exchange model that some degree of product placement in the media content may be required. This phenomenon is already becoming widespread. However, it is not an absolute requirement for this model because pointing to a specific object, such as a car, may bring up more generic descriptions of the object that may still lead to sponsored information about similar cars from different vendors as well as more generic information about the object.
[0082] There are several technological factors that have converged to make these concepts viable. For one, digital content can now easily carry with it the simple metadata that would be required. With the standard processing power of content players, this metadata can now easily be made to dynamically associate with various objects on the viewer's screen. Second, direct, accurate, and fast pointing, which is a critical element of the implementation and viability of this model, is starting to become widely available. For example, for PC users watching TV at their desk, the computer mouse lends itself very well to quick pointing. For Mobile devices such as Cell Phones and PDAs, touch screens are becoming ever more common and are natural tools for pointing. And finally, for the digital living room, absolute-pointer remote controls, such as vision-based devices, have become available that make pointing as easy, natural, and fast as
pointing your finger. This is especially true when the content is displayed on a large, high resolution digital TV screen.
[0083] In one embodiment, data having full descriptions and hyperlinks is tagged to specific objects in moving images, and the invisible hyperlinks move dynamically to continually track the associated object. In one embodiment, a pointing device can be used to point to objects in the scene, whether moving or stationary, and by appropriate action such as clicking or activating a button, be able to substantially immediately recall part or all of the metadata content that pertains to the object.
[0084] In one embodiment, the pointing device is a multi-dimensional free-space pointer where the pointing is direct and absolute in nature, similar to those described in co-pending U.S. Patent Application No. 11/187,435, filed July 21, 2005, co-pending U.S. Patent Application No. 11/187,405, filed July 21, 2005, co-pending U.S. Patent Application No. 11/187,387, filed July 21, 2005, and co-pending U.S. Provisional Patent Application No. 60/831,735, filed July 17, 2006. The disclosure of the above-identified applications is incorporated by reference herein in its entirety.
[0085] In one embodiment this metadata is strictly informative and yields results akin to a visual search query such as "What is this that I am pointing at?" In one embodiment, the data is wholly or partially sponsored and paid for to instantiate an on-demand advertising model. In one embodiment, the payment is proportional to the frequency of the searches. In one embodiment, the "point & search" patterns of users are logged for later use in, for example, modifying and tailoring the metadata content.
[0086] Figure 9 shows an example movie scene that will be used to exemplify a video stream that may contain various objects that may or may not be stationary. Figure 9 includes three objects — a rectangle, a ball, and a cylinder that are moving around in the screen and may disappear and re-appear at various locations in space at various times.
[0087] Figure 10 illustrates an example of how the objects in a TV scene may have been uniquely tagged with invisible reference (e.g., "hyperlink") areas, according to certain embodiments. Specifically, Figure 10 shows how the TV scene in Figure 9 may be overlaid with invisible "tags" or hyperlink areas that move dynamically with the objects they are associated with. The purpose of these invisible tags is to enable a user to point to objects of interest on the screen, as illustrated in Figure 11, which illustrates how a user may use a free-space absolute pointing device to easily point to and select an object in a scene. Note that the tags are not directly visible to the user, according to one embodiment. In one preferred embodiment, the user is sitting in front of a large screen TV using a multidimensional absolute pointer, such as the WavIt™ available from ThinkOptics, Inc., described in the above-incorporated co-pending applications, to point to the object of interest.
[0088] In one embodiment, a cursor may appear on the screen that changes color and/or shape when a valid tag or hyperlink exists. This feature is similar to that of static hyperlinks that may be embedded in certain web-page images. One difference is that now the tags are dynamically moving with the object, and may grow, shrink, and/or evolve with object size and/or shape, or may disappear and reappear with the object.
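The per-frame cursor feedback described above can be sketched as a hit test against the current frame's tag set. The `Tag` record, its circular extent, and the use of (-1, -1) to mark an off-screen object are illustrative assumptions here, not a format specified by this disclosure:

```python
from dataclasses import dataclass

@dataclass
class Tag:
    object_id: str
    x: float       # tag center on screen, in pixels, for the current frame
    y: float
    radius: float  # tag extent; may grow/shrink as the object changes size

def active_tag(tags, cursor_x, cursor_y):
    """Return the tag under the cursor for the current frame, or None.

    Coordinates of (-1, -1) mark an object that is not visible this frame.
    """
    for tag in tags:
        if tag.x < 0:  # object absent from this frame
            continue
        if (cursor_x - tag.x) ** 2 + (cursor_y - tag.y) ** 2 <= tag.radius ** 2:
            return tag
    return None

# The on-screen cursor may change color/shape when a valid hyperlink area is hit:
frame_tags = [Tag("watch", 640, 360, 40), Tag("dress", -1, -1, 0)]
hit = active_tag(frame_tags, 650, 355)
cursor_style = "hyperlink" if hit else "normal"
```

Because the tag set is rebuilt every frame from the tagging data, the hyperlink areas automatically move, resize, and disappear with their objects.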
[0089] In one embodiment, the object that is pointed to may be selected by pressing a button on a remote control or pointing device. This action may subsequently log the "click" for later retrieval, or in the preferred embodiment it substantially immediately brings up on-screen information about the selected object. Figure 12 shows the results of such a visual search query according to one embodiment. In an alternative embodiment, instead of pressing a button on the remote control or pointing device, the object may be selected by pointing within the object area or within a predefined range around the area for some predetermined period of time. For example, an object may be selected by pointing at or within a radius of, for example, 50 screen pixels from the center of the object for more than, for example, 2 consecutive seconds. Alternatively, the
time required for object selection may depend on the pointing location relative to a reference location within the object area. For example, if the pointed to location is within a certain number, N, of screen pixels of the center of the object, then perhaps only a fraction of a second of continuous pointing within this region may be required for object selection. However, if the pointed to location remains within a number of pixels larger than N, say 2N, then a longer continuous time, say 1 to 2 seconds, may be required before the object is selected. This approach may have advantages especially for rapidly moving objects. Other actions that do not involve pressing a button may also be used for object selection. For example, circling an object with the cursor may be interpreted as object selection.

[0090] Once an object is selected, some or all of the metadata associated with the object may become immediately visible in, for example, a pop-up graphical representation or menu. Alternatively, the object selection may simply be recorded for later viewing. At this point the user may choose to receive more information about the object by, for example, clicking once more inside the call-out bubble. In one embodiment all "clicks" are logged in a click-history that the user can pull up at his convenience at a later time, as illustrated in Figure 13.

[0091] Figure 13 shows an example click "Search History" screen that may be called up at the user's convenience. This search history may log the location and timestamp of the click as well as the object description and possible further related actions. It is possible to select any of these to jump back to the time and place of the object in the media file, or to examine related information in more detail.
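The distance-dependent dwell selection described above can be sketched as follows. The inner radius of N = 50 pixels and the two dwell times are the example values quoted in the text; the class structure and helper names are illustrative assumptions:

```python
import math

N_PIXELS = 50          # inner radius around the object center (example value from the text)
INNER_DWELL_S = 0.25   # dwell needed when within N pixels of the center
OUTER_DWELL_S = 1.5    # dwell needed when between N and 2N pixels

class DwellSelector:
    """Select an object by continuous pointing, with no button press.

    The closer the pointed-to location is to the object center, the shorter
    the dwell time required -- helpful for rapidly moving objects.
    """
    def __init__(self):
        self.dwell_start = None  # time at which continuous pointing began

    def update(self, px, py, obj_x, obj_y, now):
        """Feed one pointing sample; return True once the object is selected."""
        dist = math.hypot(px - obj_x, py - obj_y)
        if dist > 2 * N_PIXELS:       # outside the selectable region: reset
            self.dwell_start = None
            return False
        if self.dwell_start is None:
            self.dwell_start = now
        required = INNER_DWELL_S if dist <= N_PIXELS else OUTER_DWELL_S
        return (now - self.dwell_start) >= required
```

A selection event produced this way can then trigger the same metadata pop-up, or be appended to the click-history log, exactly as a button press would.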
[0092] Returning now to the metadata content, it is desirable to the Service Provider that the tagging data be easy to generate, although this is irrelevant to the end user, i.e. the consumer of the service. In one embodiment, the tagging information consists of simple data files that can be specifically generated for different media content. In one embodiment this data consists of arrays of numbers arranged according to the rules laid out in Figure 14 and Figure 15.
[0093] Figure 14 shows an example file format for metadata according to one embodiment. It consists of multiple arrays of data for all the potential objects available in the media content. The file contains a first column with incremental timestamps. Corresponding Object columns will contain the specific object's location in space at the corresponding time. If the object is not visible, these columns may contain "-1".
[0094] Figure 15 shows another embodiment of the metadata tagging file format. The Object columns contain additional data about the size and shape of the Object at the regular incremental time slots.
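A parser for a tagging file laid out as in Figures 14 and 15 might look like the sketch below. The patent specifies only the array layout (a timestamp column, per-object columns, and "-1" for absent objects); the CSV container and the "x;y" / "x;y;w;h" cell encoding are assumptions made here for illustration:

```python
import csv

def load_tagging_file(path):
    """Parse a tagging file: one timestamp column followed by per-object
    columns holding either a position ("x;y"), position plus size/shape
    ("x;y;w;h"), or "-1" when the object is not visible at that time.
    Returns {timestamp: {object_name: tuple_or_None}}.
    """
    table = {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        object_names = next(reader)[1:]          # header row: object names
        for row in reader:
            t = float(row[0])
            frame = {}
            for name, cell in zip(object_names, row[1:]):
                if cell.strip() == "-1":         # default value: object absent
                    frame[name] = None
                else:
                    frame[name] = tuple(float(v) for v in cell.split(";"))
            table[t] = frame
    return table
```

At playback time, the player looks up the row nearest the current media timestamp to rebuild the invisible tag set for that frame.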
[0095] In one embodiment, the location data is generated by using a software program that allows the Service Provider to run the media content one or more times while pointing to the objects of interest. If, for example, the Service Provider simultaneously holds down specific keys on a keyboard that correspond to that object, the object's position is recorded (overwritten) in the corresponding object column. While the object is not visible on the screen, no key will be pressed and hence the default value of -1 will remain in the object column, signifying that the object is not present.
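The authoring pass described in paragraph [0095] reduces to a simple loop over the media timeline. In this sketch, `key_pressed` and `pointed_at` are hypothetical stand-ins for the real keyboard and pointer callbacks:

```python
def record_object_positions(frames, key_pressed, pointed_at):
    """Build one object column of the tagging file: while the object's key
    is held down, overwrite the column with the pointed-to position;
    otherwise leave the default -1, signifying the object is not present.
    """
    column = []
    for t in frames:
        if key_pressed(t):
            column.append(pointed_at(t))   # record (overwrite) the position
        else:
            column.append(-1)              # default: object not visible
    return column

# Example pass over a few timestamps (object appears from t = 1 onward):
col = record_object_positions(
    frames=[0, 1, 2],
    key_pressed=lambda t: t >= 1,
    pointed_at=lambda t: (100 + t, 200),
)
# col == [-1, (101, 200), (102, 200)]
```

Running the content more than once, with a different key per object, fills in one column per object of the file format of Figure 14.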
[0096] Having discussed embodiments for how different objects moving around in video content may be easily tagged with time and location stamps and stored in "tagging" files, it is useful to discuss the actual descriptive metadata itself. Figure 16 shows one embodiment of the metadata. In one embodiment this metadata is strictly informative and yields results akin to a visual search query such as "What is this that I am pointing at?". In one embodiment, the data is wholly or partially sponsored and paid for to instantiate an on-demand advertising model. Objects may have multiple sponsors. For example, a generic "soda can" may be sponsored by Coke and Pepsi. Tiered pricing may also be offered to change the search ranking results. In one embodiment, said payment is proportional to the recorded frequency of said visual searches, yielding a "click-model" for advertising.
[0097] Figure 16 shows an example of metadata content. This metadata content may apply to different video files. By contrast the "tagging files" are specific to each file. In one embodiment all metadata and tagging data information accompanies the video media, such as the DVD or movie MPEG file. In one preferred embodiment all of this data resides on the internet. In this case software on the media player may recognize content being viewed, for example from its title, and download the appropriate metadata and tagging file. This embodiment permits more flexibility for the Service Provider to modify and update the files as appropriate. In one embodiment, the metadata relates to direct marketing information and the visual search described in this disclosure can be used as a tool for order generation, voting, subscriptions, coupons, vouchers, direct sales, etc.
[0098] Thus, methods and apparatuses for free-space multi-dimensional absolute pointer using a projection marker system have been described herein. Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

[0099] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the
description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

[00100] Embodiments of the present invention also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable ROMs (EPROMs), electrically erasable programmable ROMs (EEPROMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
[00101] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method operations. The required structure for a variety of these systems will appear from the description below. In addition, embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the invention as described herein.
[00102] A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
For example, a machine-readable medium includes read only memory ("ROM"); random access memory ("RAM"); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.

[00103] In the foregoing specification, embodiments of the invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the invention as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
Claims
1. A machine implemented method, comprising: capturing one or more optical spots projected on, or near, a display surface, the display surface displaying machine generated content capable of being manipulated via a cursor of a pointing device, wherein the one or more optical spots are captured wirelessly using a handheld device representing the pointing device; and the handheld device determining coordinates of the handheld device based on characteristics of the captured optical spots, wherein a position of the cursor of the pointing device displayed on the display surface is determined based on the coordinates of the handheld device.
2. The method of claim 1, wherein the one or more optical spots are projected from a projection apparatus.
3. The method of claim 2, further comprising in response to the coordinates of the handheld device in view of the captured optical spots received from the handheld device, calculating the coordinates of the cursor based on the coordinates of the handheld device in view of the position of the one or more optical spots.
4. The method of claim 3, further comprising displaying a cursor on the display surface at a position based on the calculated coordinates of the cursor.
5. The method of claim 4, wherein the projection apparatus is configured to wirelessly receive the coordinates of the handheld device and to calculate the coordinates of the cursor.
6. The method of claim 4, wherein the projection apparatus is physically separated from wireless receiving electronics that receive the coordinates of the handheld device and calculate the coordinates of the cursor.
7. The method of claim 2, wherein the projection apparatus comprises one or more coherent sources of optical radiation.
8. The method of claim 2, wherein the projection apparatus comprises one or more incoherent sources of optical radiation.
9. The method of claim 1, wherein the one or more optical spots are displayed according to a predetermined optical frequency such that the one or more optical spots are invisible to a user looking at the display surface.
10. The method of claim 1, wherein the one or more optical spots are captured using a camera embedded within the handheld device.
11. The method of claim 1, further comprising calibrating initial coordinates of the handheld device by pointing the handheld device to one or more predetermined locations within or near a display area in view of positions of the one or more optical spots.
12. A machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a method, the method comprising: capturing one or more optical spots projected on, or near, a display surface, the display surface displaying machine generated content capable of being manipulated via a cursor of a pointing device, wherein the one or more optical spots are captured wirelessly using a hand-held device representing the pointing device; and the handheld device determining coordinates of the handheld device based on characteristics of the captured optical spots, wherein a position of the cursor of the pointing device displayed on the display surface is determined based on the coordinates of the handheld device.
13. The machine-readable medium of claim 12, wherein the one or more optical spots are projected from a projection apparatus.
14. The machine-readable medium of claim 13, wherein the method further comprises in response to the coordinates of the handheld device in view of the captured optical spots received from the handheld device, calculating the coordinates of the cursor based on the coordinates of the handheld device in view of the position of the one or more optical spots.
15. The machine-readable medium of claim 14, wherein the method further comprises displaying a cursor on the display surface at a position based on the calculated coordinates of the cursor.
16. The machine-readable medium of claim 15, wherein the projection apparatus is configured to wirelessly receive the coordinates of the handheld device and to calculate the coordinates of the cursor.
17. The machine-readable medium of claim 16, wherein the projection apparatus is physically separated from wireless receiving electronics that receive the coordinates of the handheld device and calculate the coordinates of the cursor.
18. The machine-readable medium of claim 15, wherein the projection apparatus comprises one or more coherent sources of optical radiation.
19. The machine-readable medium of claim 15, wherein the projection apparatus comprises one or more incoherent sources of optical radiation.
20. The machine-readable medium of claim 12, wherein the one or more optical spots are displayed according to a predetermined optical frequency such that the one or more optical spots are invisible to a user looking at the display surface.
21. The machine-readable medium of claim 12, wherein the one or more optical spots are captured using a camera embedded within the handheld device.
22. The machine-readable medium of claim 12, wherein the method further comprises calibrating initial coordinates of the handheld device by pointing the handheld device to one or more predetermined locations within a display area in view of positions of the one or more optical spots.
23. A presentation system comprising: a projection apparatus to project one or more optical spots on a display surface for displaying machine generated content capable of being manipulated via a cursor of a pointing device; a handheld device to wirelessly capture the projected optical spots from the display surface; and a control unit communicatively coupled to the projection apparatus and the handheld device to determine coordinates of the cursor based on characteristics of the captured optical spots.
24. The system of claim 23, wherein the one or more optical spots are projected in a predetermined optical frequency such that the one or more optical spots are invisible to a user of the display surface.
25. The system of claim 24, wherein the handheld device comprises a camera to capture the one or more optical spots.
26. The system of claim 25, wherein the handheld device is configured to determine coordinates of the handheld device based on the captured one or more optical spots, and wherein the coordinates of the cursor are determined based on the coordinates of the handheld device wirelessly received from the handheld device.
27. A computer implemented method, comprising: associating metadata with an object of a media stream having one or more frames, the metadata having information describing the object, including a location of the object within each frame; dynamically tracking a pointed to location of a pointing device having a free-space multi-dimensional absolute pointer when a particular frame of the media stream is displayed; and in response to an activation of the pointing device when the pointed to location of the pointing device is within a predetermined proximity of the object, retrieving and presenting the information from the metadata associated with the object.
28. The method of claim 27, further comprising providing metadata for the object for each frame of the media stream prior to displaying the media stream, wherein the metadata is invisible to a viewer of the media stream.
29. The method of claim 28, wherein the media stream comprises a digital movie.
30. The method of claim 27, wherein the metadata of the object further includes a hyperlink which when the activation of the pointing device is detected, additional information is retrieved and displayed from a remote facility via the hyperlink.
31. The method of claim 30, wherein the metadata of the object further comprises a description about the object and a cost to purchase the object from the remote facility.
32. The method of claim 31, wherein the metadata of the object comprises multiple hyperlinks and wherein different information is retrieved from multiple remote facilities via the hyperlinks to enable a viewer to compare the retrieved information.
33. The method of claim 27, further comprising determining the coordinates of the pointing device based on an orientation and/or location of the pointing device with respect to one or more reference markers located at a fixed location with respect to the display area.
34. The method of claim 33, wherein the pointing device includes a pixelated sensor and a wireless transceiver wirelessly communicating with a receiver that is connected to the display, and wherein the pointing device calculates its orientation and/or location based on information from the pixelated sensor.
35. The method of claim 34, wherein the pointing device wirelessly transmits the calculated orientation and/or location to the receiver to enable a controller coupled to the receiver to determine an absolute location pointed to by the pointing device within the display area.
36. A machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a method, the method comprising: associating metadata with an object of a media stream having one or more frames, the metadata having information describing the object, including a location of the object within each frame; dynamically tracking a pointed to location of a pointing device having a free-space multi-dimensional absolute pointer when a particular frame of the media stream is displayed; and in response to an activation of the pointing device when the pointed to location of the pointing device is within a predetermined proximity of the object, retrieving and presenting the information from the metadata associated with the object.
37. The machine-readable medium of claim 36, wherein the method further comprises providing metadata for the object for each frame of the media stream prior to displaying the media stream, wherein the metadata is invisible to a viewer of the media stream.
38. The machine-readable medium of claim 37, wherein the media stream comprises a digital movie.
39. The machine-readable medium of claim 36, wherein the metadata of the object further includes a hyperlink which when the activation of the pointing device is detected, additional information is retrieved and displayed from a remote facility via the hyperlink.
40. The machine-readable medium of claim 39, wherein the metadata of the object further comprises a description about the object and a cost to purchase the object from the remote facility.
41. The machine-readable medium of claim 40, wherein the metadata of the object comprises multiple hyperlinks and wherein different information is retrieved from multiple remote facilities via the hyperlinks to enable a viewer to compare the retrieved information.
42. The machine-readable medium of claim 36, wherein the method further comprises determining the coordinates of the pointing device based on an orientation and/or location of the pointing device with respect to one or more reference markers located at a fixed location with respect to the display area.
43. The machine-readable medium of claim 42, wherein the pointing device includes a pixelated sensor and a wireless transceiver wirelessly communicating with a receiver that is connected to the display, and wherein the pointing device calculates its orientation and/or location based on information from the pixelated sensor.
44. The machine-readable medium of claim 43, wherein the pointing device wirelessly transmits the calculated orientation and/or location to the receiver to enable a controller coupled to the receiver to determine an absolute location pointed to by the pointing device within the display area.
45. A data processing system, comprising: a processor; and a memory for storing instructions, which when executed from the memory, cause the processor to perform a method, the method including associating metadata with an object of a media stream having one or more frames, the metadata having information describing the object, including a location of the object within each frame, dynamically tracking a pointed to location of a pointing device having a free-space multi-dimensional absolute pointer when a particular frame of the media stream is displayed, and in response to an activation of the pointing device when the pointed to location of the pointing device is within a predetermined proximity of the object, retrieving and presenting the information from the metadata associated with the object.
46. The system of claim 45, wherein the method further comprises providing metadata for the object for each frame of the media stream prior to displaying the media stream, wherein the metadata is invisible to a viewer of the media stream.
Applications Claiming Priority (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US83173506P | 2006-07-17 | 2006-07-17 | |
| US60/831,735 | 2006-07-17 | ||
| US84088106P | 2006-08-28 | 2006-08-28 | |
| US60/840,881 | 2006-08-28 | ||
| US11/777,073 | 2007-07-12 | ||
| US11/777,078 | 2007-07-12 | ||
| US11/777,078 US20080052750A1 (en) | 2006-08-28 | 2007-07-12 | Direct-point on-demand information exchanges |
| US11/777,073 US8913003B2 (en) | 2006-07-17 | 2007-07-12 | Free-space multi-dimensional absolute pointer using a projection marker system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2008010950A2 true WO2008010950A2 (en) | 2008-01-24 |
| WO2008010950A3 WO2008010950A3 (en) | 2008-04-10 |
Family
Family ID: 38957285
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2007/015955 WO2008010950A2 (en) | 2006-07-17 | 2007-07-13 | Free-space multi-dimensional absolute pointer using a projection marker system |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2008010950A2 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8515119B2 (en) | 2008-02-19 | 2013-08-20 | Hisense Beijing Electric Co., Ltd. | Control unit, a video device including said control unit, and a control method |
| US11954778B2 (en) | 2022-01-04 | 2024-04-09 | International Business Machines Corporation | Avatar rendering of presentations |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5926168A (en) * | 1994-09-30 | 1999-07-20 | Fan; Nong-Qiang | Remote pointers for interactive televisions |
| US7102616B1 (en) * | 1999-03-05 | 2006-09-05 | Microsoft Corporation | Remote control device with pointing capacity |
| US20040169639A1 (en) * | 2003-02-28 | 2004-09-02 | Pate Michael A. | Visible pointer tracking with separately detectable pointer tracking signal |
| US20040268393A1 (en) * | 2003-05-08 | 2004-12-30 | Hunleth Frank A. | Control framework with a zoomable graphical user interface for organizing, selecting and launching media items |
| KR100714722B1 (en) * | 2005-06-17 | 2007-05-04 | 삼성전자주식회사 | Apparatus and method for implementing a pointing user interface using a signal from a light source |
- 2007-07-13: WO application PCT/US2007/015955 filed (published as WO2008010950A2), status: active, Application Filing
Also Published As
| Publication number | Publication date |
|---|---|
| WO2008010950A3 (en) | 2008-04-10 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US8913003B2 (en) | Free-space multi-dimensional absolute pointer using a projection marker system | |
| US9262780B2 (en) | Method and apparatus for enabling real-time product and vendor identification | |
| US10345588B2 (en) | Sedentary virtual reality method and systems | |
| US10175769B2 (en) | Interactive system and glasses with gesture recognition function | |
| KR101872018B1 (en) | Systems and methods for navigating a three-dimensional media guidance application | |
| US9952433B2 (en) | Wearable device and method of outputting content thereof | |
| US8638223B2 (en) | Mobile communicator with orientation detector | |
| US8595773B1 (en) | Intelligent TV shopping system and method | |
| US20120278904A1 (en) | Content distribution regulation by viewing user | |
| US20150026718A1 (en) | Systems and methods for displaying a selectable advertisement when video has a background advertisement | |
| US7703926B2 (en) | Projector capable of capturing images and briefing system having the same | |
| KR102431712B1 (en) | Electronic apparatus, method for controlling thereof and computer program product thereof | |
| TW201419041A (en) | Display device and monitoring method for monitoring objectives through transparent display | |
| US20110212777A1 (en) | Game device enabling three-dimensional movement | |
| JP2020120411A5 (en) | ||
| JP2020120410A5 (en) | ||
| US20120297014A1 (en) | Method for compiling information from mobile communicators | |
| WO2009120299A2 (en) | Computer pointing input device | |
| JP2015228009A (en) | Head-mounted type display device, control method of the same, information transmission/reception system and computer program | |
| US20190297390A1 (en) | Method and system for physically tagging objects prior to creation of an interactive video | |
| CN109815409A (en) | A kind of method for pushing of information, device, wearable device and storage medium | |
| US20130100075A1 (en) | Multipoint Source Detection in a Scanned Beam Display | |
| US20250005869A1 (en) | System and method for controlling media playing in augmented reality devices | |
| US20120293394A1 (en) | Information source for mobile communicators | |
| US20120293546A1 (en) | Augmented-reality mobile communicator with orientation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07796840; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | NENP | Non-entry into the national phase | Ref country code: RU |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07796840; Country of ref document: EP; Kind code of ref document: A2 |