US20240038084A1

US20240038084A1 - Systems and methods to interact with discoverable objects by pointing a light beam using a handheld device

Info

Publication number: US20240038084A1
Application number: US18/201,094
Authority: US
Inventors: Lewis James Marggraff; Nelson George Publicover
Original assignee: Kibeam Learning Inc
Current assignee: Kibeam Learning Inc
Priority date: 2022-07-29
Filing date: 2023-05-23
Publication date: 2024-02-01
Also published as: US20240036619A1; US12033531B2

Abstract

Systems and methods are described in which one or more attributes or cues related to a discoverable object are conveyed to a user of handheld device via audible, haptic and/or visual means. Using a light beam generated by the handheld device and pointed in the same direction as a device camera, the user may point toward one or more objects in the user's environment believed to be associated with the one or more attributes or cues. Based on the region being pointed at within camera images, the handheld device may classify objects within images to determine if an object matches one or more templates or classifications of the discoverable object. A match, or mismatch, of objects being pointed at may determine subsequent actions. Systems and methods may provide simple and intuitive methods for human-machine interactions that may be particularly well-suited for learners.

Description

RELATED APPLICATION DATA

The present application claims benefit of co-pending provisional application Ser. No. 63/441,731, filed Jan. 27, 2023, the entire disclosure of which is expressly incorporated by reference herein.

TECHNICAL FIELD

The present application relates generally to systems and methods for an individual to identify a real or virtual object using a light beam projected from a handheld electronic device. Although the handheld device may be used by anyone, it may be particularly well-suited for use by a young child or learner, utilizing simple interactive signaling that lacks requirements for precision manual dexterity and/or understanding screen-based interactive sequences. Systems and methods herein employ techniques within the fields of mechanical design, electronic design, firmware design, computer programming, inertial measurement units (IMUs), optics, computer vision (CV), ergonometric (including child safe) construction, human motor control and human-machine interaction. Systems and methods may provide a user, especially a child or learner, with a familiar machine interface to instinctively and/or confidently indicate the selection of a discoverable object from a plurality of viewable objects.

BACKGROUND

In recent years, the world has become increasingly reliant on portable electronic devices that have become more powerful, sophisticated and useful to a wide range of users. However, although children may rapidly embrace using some aspects of electronics designed for more experienced users, young children may benefit from having access to interactive electronic devices that are small, light-weight, colorful, playful, informative, ergonomically designed for children (including being child safe), and easy to use. The systems and methods disclosed herein make use of advances in the fields of optics that include visible (i.e., frequently referred to as “laser”) pointers, mobile sound and/or vibration generation (employing miniature coils, piezoelectric elements and/or haptic units), portable displays, inertial measurement units (sometimes also referred to as inertial motion units), and telecommunications.
The beam of a visible light pointer (also referred to as a “laser pen”), typically used within business and educational environments, is often generated by a lasing diode with undoped intrinsic (I) semiconductor between p (P) and n (N) type semiconductor regions (i.e., a PIN diode). Within prescribed power levels and when properly operated, such coherent and collimated light sources are generally considered safe. Additionally, if directed at an eye, the corneal reflex (also known as the blink or eyelid reflex) ensures an involuntary aversion to bright light (and foreign bodies).
However, further eye safety may be attained using a non-coherent, light-emitting diode (LED) source. Such non-coherent sources (so-called “point-source” LEDs) may be collimated using precision (e.g., including so-called “pre-collimating”) optics to produce a light beam with minimal and/or controlled divergence. Point-source LEDs may, if desired, also generate a beam composed of a range of spectral frequencies (i.e., compared with the predominantly monochromatic light produced by a single laser).
Speakers associated with televisions, theaters and other stationary venues generally employ one or more electromagnetic moving coils. Within handheld and/or mobile devices, the vibrations of a miniature speaker may be produced using similar electromagnetic coil approaches and/or piezoelectric (sometimes referred to as “buzzer”) designs. Vibrations (e.g., particularly those associated with alerts) may also be generated by a haptic unit (also known as kinesthetic communication). Haptic units generally employ an eccentric (i.e., unbalanced) rotating mass or piezoelectric actuator to produce vibrations (particularly at the low end of the audio spectrum) that can be heard and/or felt.
Visual displays or indicators may be composed of any number of monochromatic or multi-colored, addressable light-sources or pixels. Displays may range from a single light source (e.g., illuminating an orb, transmitted via a waveguide), to those that are capable of displaying a single number (e.g., seven-segment display) or alphanumeric character (e.g., a five-pixel by eight-pixel array), to high-resolution screens with tens of millions of pixels. Regardless of scale, displays are typically implemented as: 1) a two-dimensional array of light sources (most frequently some form of light-emitting diodes (LEDs) including organic LEDs (OLEDs), or 2) two plates of polarized glass that sandwich liquid crystal material (i.e., forming a liquid crystal display, LCD) that responds to an electric current to allow different wavelengths of light from one or more illumination sources (i.e., a backlight) to pass.
Inertial measurement unit (IMU), accelerometer and/or magnetometer tracking may incorporate any or all combinations of: 1) linear accelerometers measuring forces generated during movement (i.e., governed by Newton's second law of motion) in up to three axes or dimensions, 2) gyroscope-based sensing of rotational rates or velocities in up to three rotational axes, 3) magnetometers measuring magnetic field (i.e., magnetic dipole moment) including fields generated by the earth, and/or 4) the gravitational pull of the earth (including gravitational orientation) by measuring forces on an internal mass. The accuracy of IMUs, accelerometers and magnetometers varies widely, depending on size, operating range, compensating hardware that may be used for correction of measurements (affecting cost), environmental factors including thermal gradients, the availability of individual device calibrations, and times required to perform measurements (including integration times for some types of measurements).
Advances in both electronics (i.e., hardware), standardized communications protocols and allocation of dedicated frequencies within the electromagnetic spectrum have led to the development of a wide array of portable devices with abilities to wirelessly communicate with other, nearby devices as well as large-scale communications systems including the World Wide Web and the metaverse. Considerations for which protocols (or combinations of available protocols) to employ within such portable devices include power consumption, communication range (e.g., from a few centimeters to hundreds of meters and beyond), and available bandwidth.
Currently, Wi-Fi (e.g., based on the IEEE 802.11 family of standards) and Bluetooth (managed by the Bluetooth Special Interest Group) are used within many portable devices. Less common and/or older communications protocols within portable devices in household settings include Zigbee, Zwave, and cellular- or mobile phone-based networks. In general (i.e., with many exceptions, particularly considering newer standards), compared with Bluetooth, Wi-Fi offers a greater range, greater bandwidth and a more direct pathway to the internet. On the other hand, Bluetooth, including Bluetooth Low Energy (BLE), offers lower power, a shorter operational range (that may be advantageous in some applications), and less complex circuitry to support communications.
Advances in miniaturization, reduced power consumption and increased sophistication of electronics, including those applied to displays, IMUs, and telecommunications have revolutionized the mobile device industry. Such portable devices have become increasingly sophisticated, allowing users to concurrently communicate, interact, geolocate, monitor exercise, track health, be warned of hazards, capture videos, perform financial transactions, and so on. Systems and methods that facilitate simple and intuitive interactions with a handheld pointing device may be useful.

SUMMARY

In view of the foregoing, systems and methods are provided herein that describe a light-weight, simple-to-use and intuitive handheld device that may be particularly well-suited for machine-based interactions by a child or other learner. Although the device may, in part, be accepted by a child as a toy, the computational flexibility embedded within the device may allow the device to be used as a means for play, embodied learning, emotional support, cognitive development, communications, expressing creativity, developing mindfulness and/or enhancing imagination.
The handheld device may aid in areas related to literacy, mathematics, understanding of science and technology, basic reading, dialogic reading and CROWD (i.e., Completion, Recall, Open-ended, WH-, and Distancing) questioning. The handheld device may enhance a learner's ZPD (i.e., Zone of Proximal Development) by providing continuous (machine-based) educational support. Additionally, a portable, light-weight, “fun” handheld device may motivate physical movement by a child (and adults) including kinetic and kinesthetic activities.
According to one aspect, systems and methods are provided for an individual to identify a real or virtual object, from a plurality of viewable objects, based on one or more audible, haptic and/or visual prompts and/or cues generated by the handheld electronic device. Prompts and/or cues about a discoverable object may be based on one or more distinctions, attributes and/or associations related to the discoverable object. Once prompts and/or cues are provided, the handheld device user may point a light beam emanating from the device toward an object that is viewable in the user's environment that, from the perspective of the device user, best associates with the prompt(s) and/or cue(s).
A handheld device camera pointed in the same direction as the light beam source may acquire one or more images of objects being pointed at with the light beam. One or more classifications of objects pointed at within camera images may then be compared with one or more templates and/or classifications related to the discoverable object to determine whether a match with the prompt(s) and/or cue(s) is present.
The one or more audible, haptic and/or visual prompts or cues may include an aspect that a particular individual might know (e.g., proper names and/or birthdays of family members or pets), or that anyone might more generally associate with the discoverable object (e.g., common names, motions, sounds, functions, pairings with other objects, typical colorations). In the case of a child, particular attention may be paid to the age-appropriateness and/or educational level during presentation of prompts and/or cues by the handheld device.
As examples, audible prompts or cues may include: sounds or sound effects typically produced by a discoverable object, verbal descriptions of activities using the object, an enunciated name (including proper name) of the object, statements about the object, one or more questions about a function of the object, a verbal description of one or more object attributes, a verbal quiz in which the discoverable object is an answer, and so on. Haptic attributes or cues may include vibrations that may be synchronous (or at least at similar frequencies) to motions or sounds typically generated by the discoverable object.
A visual prompt or cue may comprise: displaying a name of the discoverable object, an image of the object, an outline of the object, a caricature of the object, a name of an object category, one or more colors of the object (e.g., displaying a swatch with actual colors and/or displaying one or more words that describe colors), a size of the object (e.g., relative to other viewable objects), a sentence or question about the object, an object absent from a known sequence of objects (e.g., a letter within an alphabet), displaying a fill-in-the-blank phrase in which the object is a missing component, and so on. Visual prompts or cues may be displayed on one or more handheld device displays, projected (including scrolling) within the handheld device beam and/or displayed on one or more separate (digital) screens.
The handheld device may acquire an image that includes a viewable object being pointed at by the handheld device user. The viewable object may be located within the image based on the presence of a (spot or regional) reflection off the viewable object generated by the light beam.
Alternatively, a location of the selected object within a camera image may be based on knowing the beam location as a result of the light beam source and camera being pointed in the same direction. In other words, the location of a beam, may be computed based on geometry (i.e., knowing the origin locations and pointing directions of both the light beam source and camera) even if the light beam were turned off. In this case, the beam simply serves as a visual indicator to the device user of the region within the handheld device camera's field-of-view where an object being pointed at may be compared with a template of the discoverable object.
The selected object may then be classified using CV methods (e.g., neural network, template matching). If a bounding region of a selected object (and, optionally, any other viewable objects within camera images) is sufficiently correlated with a template and/or classification related to the discoverable object, then a match with the discoverable object may be declared.
The presence (or lack) of a match may be indicated to the handheld device user using visual (e.g., using one or more device displays and/or by modulating the light beam source), audible (e.g., using a device speaker) and/or vibrational (e.g., using a device haptic unit) feedback. The handheld device may perform additional actions based on a match (or not) with the discoverable object including transmitting the one or more prompts and/or cues, selection timing, and identity (optionally, including the acquired image) of a discovered object (or, lack of discovery) to one or more remote processors that may enact further actions.
In accordance with an example, a method is provided to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and a device speaker operatively coupled to the device processor, the method comprising: playing, by the device speaker, one or more audible cues related to the discoverable object; acquiring, by the device camera, a camera image when the handheld device is manipulated by the human such that a projected light beam points from the device light beam source to one or more visible objects; isolating, by the device processor, one or more indicated objects at the beam location region within the camera image; and determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.
In accordance with another example, a method is provided to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and a device speaker operatively coupled to the device processor, the method comprising: playing, by the device speaker, one or more audible cues related to the discoverable object; acquiring, by the device camera, a camera image when the handheld device is manipulated by the human such that a projected light beam points from the device light beam source to one or more visible objects; isolating, by the device processor, one or more indicated objects at the beam location region within the camera image; and determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.
In accordance with a further example, a method is provided to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and one or more device displays operatively coupled to the device processor, the method comprising: displaying, by the one or more device displays, one or more visual cues related to the discoverable object; acquiring, by the device camera, a camera image when the handheld device is manipulated by the human such that a projected light beam points from the device light beam source to the one or more visible objects; isolating, by the device processor, one or more indicated objects at the beam location region within the camera image; and determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.
In accordance with yet a further example, a method is provided to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and a device haptic unit operatively coupled to the device processor, the method comprising: producing, by the device haptic unit, sensed haptic vibrations at one or more haptic frequencies related to one or more of motions and sounds associated with the discoverable object; acquiring, by the device camera, a camera image when the handheld device is manipulated such that a projected light beam points from the device light beam source to the one or more visible objects; isolating, by the device processor, one or more indicated objects at the beam location region within the camera image; and determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.
In accordance with yet another example, a method is provided to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, a device speaker operatively coupled to the device processor, and a device inertial measurement unit operatively coupled to the device processor, the method comprising: playing, by the device speaker, one or more audible cues related to the discoverable object; acquiring, by the device camera, one or more camera images when the handheld device is manipulated such that a projected light beam points from the device light beam source to the one or more visible objects; acquiring, by the device inertial measurement unit, inertial measurement data; determining, by the device processor, that the inertial measurement data include one or more of: device movement magnitudes that are less than a predetermined maximum movement threshold for a predetermined minimum time threshold, a predetermined handheld device gesture motion, and a predetermined handheld device orientation; isolating, by the device processor, a most recent camera image from the one or more camera images; isolating, by the device processor, one or more indicated objects at the beam location region within the most recent camera image; and determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.
Other aspects and features including the need for and use of the present invention will become apparent from consideration of the following description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding may be derived by referring to the Detailed Description when considered in connection with the following illustrative figures. In the figures, like-reference numbers refer to like-elements or acts throughout the figures. Presented examples are illustrated in the accompanying drawings, in which:

FIG. 1 illustrates exemplary manipulation of a handheld device by a child to identify and select, based on audible cues, a dog within a scene from a book (containing several additional characters and objects) by pointing a light beam emanating from the handheld device toward the canine form.

FIG. 2 shows exemplary manipulation of a handheld device by a child to identify a cat, based on visual cues, from images displayed on a tablet device by pointing a light beam toward the feline form.

FIG. 3 is a flow diagram illustrating exemplary steps using a handheld device to discover a cat (e.g., stuffed toy, screen image or real cat) as a result of displaying the word “CAT”, by pointing a light beam at the cat.

FIG. 4 is a flow diagram, similar to FIG. 3 , illustrating exemplary steps using a handheld device to discover, based on audible cues, a hammer from among a set of construction tools by pointing a light beam at the hammer.

FIG. 5 is an electronic schematic and ray diagram showing exemplary selected components of the generation, light path and detection by a camera of a discoverable (but camouflaged) object (i.e., a candy cane hidden within the form of a snake) within a book using a light beam emanating from a handheld device.

FIG. 6 is an exploded-view drawing of an exemplary handheld device showing locations, pointing directions, and relative sizing for a light beam source and camera.

FIG. 7 is an exemplary interconnection layout of components within a handheld device (in which some components may not be used during some applications) showing predominant directions for the flow of information relative to an electronic bus structure that forms an electronic circuitry backbone.

FIG. 8 is a flow diagram illustrating exemplary steps following display of a text-based cue (i.e., the word “CAT”), in which beam dwell time measured using an IMU is used to indicate a selection based on a camera image acquired prior to significant movement of the handheld device.

FIG. 9 is a flow diagram illustrating exemplary steps, following an audible prompt to choose an article of clothing, in which handheld device orientation while pointing a light beam is used to indicate selection of a long sleeve shirt.

DETAILED DESCRIPTION

Before the examples are described, it is to be understood that the invention is not limited to particular examples described herein, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular examples only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a compound” includes a plurality of such compounds and reference to “the polymer” includes reference to one or more polymers and equivalents thereof known to those skilled in the art, and so forth.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Certain ranges are presented herein with numerical values being preceded by the term “about.” The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
According to one aspect, devices, systems and methods are provided in which an individual uses a light beam generated by a handheld device to select (i.e., point at, identify, and/or indicate) a real or virtual object, from a plurality of viewable objects in the environment of the individual, in response to one or more audible, haptic and/or visual prompts and/or cues generated by the handheld electronic device. Prompts or cues may include an aspect that a particular individual might know (e.g., the proper names of family members or colleagues), or that anyone might more generally associate with the identity, properties, functions, related objects and/or other attributes of a discoverable object.
In addition to physical objects within the environment of the user, discoverable objects may be displayed on paper (e.g., within a book, brochure, newspaper, handwritten note or magazine) or one or more other surfaces (e.g., on a book cover, box, sign, poster), or on an electronic display (e.g., tablet, television, mobile phone or other screen-based device). Such discoverable objects (and their display media) may be made “interactive” using the handheld device, particularly while reading by, and/or to, a child. For example, the reading of a book may be brought to life by adding queries, questions (for both a parent or guardian, and the child), additional related information, sounds, sound effects, audiovisual presentations of related objects, real-time feedback following discoveries, and so on.
During generation of one or more attributes and/or cues broadcast by the handheld device, attention may be paid to a range of individual considerations including age-appropriateness and/or educational level of the device user, an individual's hobbies and/or interests, the educational and/or entertainment value of the discoverable objects, whether an object is anticipated as a next discovery within a serial sequence (e.g., alphabetical or numerical sequence), and so on. Such considerations may be known by the handheld device and/or by one or more remote processors formulating prompts and/or cues conveyed to the handheld device.
According to further aspects, the handheld device and/or a remote processor may simultaneously perform ongoing, real-time assessments of engagement, language skills, reading abilities and/or comprehension. Assessment metrics may, for example, include the measured times a child spends interacting, success rates in discovering objects based on attributes or cues (particularly within different topic areas and/or potential areas of interest such as sports, science or art), times required to make pointing-based discoveries (e.g., often related to attention and/or interest), rates of overall progression when “discovering” objects within serial content such as a book or magazine, and so on.
Such assessments may be compared with prior interactions by the same individual (e.g., to determine progress in particular topic areas), interactions using the same or similar cues by others (e.g., at the same age, cultural environment or educational level), and/or performance among different groups (e.g., comparing geographic, economic and/or social clusters). Milestone responses demonstrating various aspects of cognitive processing (e.g., first indications involving distinguishing colors, differentiating phonemes and/or words, understanding numbers of objects, performing simple mathematical operations, gesture responses requiring controlled motor functions) may be particularly useful in monitoring childhood development, assessing if more challenging cues (and discoverable objects) might be presented, and/or enhancing engagement. Auditory, tactile and/or visual acuity may also be monitored in an ongoing manner.
Within further examples, audible attributes or cues may include: sounds or sound effects typically produced by the discoverable object, sounds associated with descriptions of activities using the object, an enunciated name (including a proper name) of the object, a portion of a name or even a single letter (e.g., begins with a letter) associated with the object, a statement about the object and/or its function, a verbal description of one or more object attributes, a question about a function and/or object attributes, a musical score related to the object, a quotation or saying completed by the object, a verbal quiz in which the discoverable object is an answer, and so on.
Visual attributes or cues may include: displaying a name of the discoverable object, an image or drawing of the discoverable object or one similar in appearance, an image of an object within a class or category of objects, an outline of the object, a caricature of the object, the name of an object or object category, a portion of a word or letter (e.g., first letter) associated with the spelling of the object, one or more colors of the object (e.g., displaying a swatch with actual colors and/or one or more words that describe colors), a size of the object (e.g., particularly relative to other viewable objects), a sentence or question about the object, a mathematical problem in which the (numeric or symbolic) object is a solution, an object absent from a known sequence of objects (e.g., a letter of an alphabet), a fill-in-the-blank phrase in which the object is a missing component, a question or prompt related to the prompts taught in the acronym CROWD (Completion prompts, Recall prompts, Open-ended prompts, Wh-prompts [where, when, why, what, who], Distancing Prompts) for dialogic reading, and so on. Cues, prompts, or questions may be generated in accordance with the principles of Vygotsky's Zone of Proximal Development to optimize learning for a user.
Haptic attributes or cues may include vibrations that may be synchronous (or at least produced at similar frequencies) to motions or sounds typically generated by the discoverable object. As an example, haptic vibrations pulsing roughly once per second may be associated with an image of a heart. Additionally, any combinations of visual, audible and/or haptic cues may be generated by the handheld device or delivered (e.g., if generated by a remote processor) simultaneously or as a closely timed sequence.
Within further examples herein, the handheld device may acquire a camera-based image that includes a viewable object being pointed at by the individual. The viewable object may be located within the image based on the presence of a reflection off the viewable object generated by the light beam. Alternatively, a location of the selected object within a camera image may be based on knowing the location of the light beam projection when the light source and camera are pointed in the same direction. In other words, the location of a light beam within a camera image may be computed mathematically based on geometry similar to parallax (i.e., knowing the locations of both the light beam source and camera) even if a beam were turned off.
The selected object (and, optionally, any other viewable objects in the region within the camera image) may then be isolated using CV methods (e.g., template matching, neural network classification). Predetermined templates of discoverable objects may include visual properties such as shapes, sizes, profiles, patterns, colors, and/or textures of the discoverable object. If any of the one or more classifications of the selected object (and, optionally, any other viewable objects in the field-of-view of a camera image) matches one or more templates or classifications of the discoverable object, then discovery may be declared.
Databases that include templates of discoverable objects may also contain one or more of the prompts and/or cues associated with each object (just described), one or more alternative (e.g., backup) prompts or cues if an object is not quickly discovered, one or more actions performed if discovery is not attained for a predetermined period, and/or one or more actions (e.g., by the handheld device or by a remote processor) performed upon successfully identifying the discoverable object, as described in greater detail below.
Within further examples herein, prompts and/or cues to initiate and/or guide handheld device interactions toward discovery may include scripted (i.e., pre-established) sequences including combinations of visual displays, audio sequences and/or haptic vibrations. Such sequences may additionally include conditional dependencies (i.e., selects from two or more interaction scenarios), determined based on real-time conditions (e.g., success during prior discoveries, time of day, user age).
Using the symbol “[prompt]” to represent one or more prompts, properties or cues associated with a discoverable object, exemplary scripted prompts (presented visually, aurally and/or haptically) by the handheld device include:
Can you find the [prompt]?
Show me the largest [prompt].
What letter follows [prompt]?
What animal produces this sound [prompt]?
Which musical instrument produces the [prompt] beat you feel?
Further, the handheld device processor may include a “personality” driven by AI (i.e., artificial intelligence personality, AIP), transformer models and/or large language models (e.g., ChatGPT, Cohere, GooseAI). An AIP may enhance user interactions with the handheld device by including a familiar appearance, interactive sequences, physical form, and/or voice that may include personal insights (e.g., likes, dislikes, preferences) about the device user.
Human-machine interactions enhanced by an AIP are more fully described in U.S. Pat. No. 10,915,814, filed Jun. 15, 2020, and U.S. Pat. No. 10,963,816, filed Oct. 23, 2020, the entire disclosures of which are expressly incorporated herein by reference. Determining context from audiovisual content and subsequently generating conversation by a virtual agent based on such context(s) are more fully described in U.S. Pat. No. 11,366,997, filed Apr. 17, 2021, the entire disclosure of which is expressly incorporated herein by reference.
According to further aspects of the systems and methods, network training and other programming of handheld device CV schemes may take advantage of world-wide computing resources (e.g., TensorFlow). The restricted nature of discoverable objects within a template database may greatly simplify both training and classification processes to determine the presence, or not, of a match (i.e., a binary determination). Training of a classification network may be restricted to the predetermined template database, or even a subset of the database if context during a discovery process is known.
Along similar lines, relatively simple classification networks and/or decision trees within the handheld device may be used to identify the presence (or not) of a match. Classifications may, optionally, be performed on a device with confined computing resources and/or without transmitting to remote devices (e.g., to access more substantial computing resources). Such classifications may be performed using neural network (or other CV approaches) using hardware typically found on mobile devices. As examples, MobileNet and EfficientNet Lite are platforms that may be sufficient to determine a match (or not) between the content(s) of a camera image and a discoverable object.
Additionally, compared with universal CV classification schemes, such restricted classifications may be considerably more robust since: 1) images are only being compared to the database of discoverable objects (e.g., not to all possible object in the world), and 2) binary determinations allow match thresholds to be adjusted to help ensure intents of device users are accurately measured. Thresholds for determining matching (or not) of objects being pointed at within camera image may be adjusted based on factors such as hardware (e.g., camera resolution), environment (e.g., lighting, object size), a specific application (e.g., examination-based questioning, the presence of multiple objects similar in appearance), categories of users (e.g., experienced versus novice) or a particular user (e.g., younger versus older, considering prior discovery success rate).
The presence (or lack) of a match may be indicated to the handheld device user using visual (e.g., using one or more device displays), audible (e.g., using a device speaker) and/or vibrational (e.g., using a device haptic unit) feedback. The handheld device may, additionally or alternatively, perform other actions based on matching (or mismatching) of the discoverable object including transmitting the one or more cues, selection occurrence, timing and identity (optionally, including the camera image) of a discovered object (or, lack of discovery) to one or more remote processors.
Particularly when used in entertainment, educational and/or collaborative settings, an ability to transmit the results of finding discoverable objects allows the handheld device to become a component of larger systems. For example, when used by a child, experiences (successfully discovering objects, or not) may be shared, registered, evaluated, and/or simply enjoyed with parents, relatives, friends and/or guardians.
A shared experience (e.g., with a parent or guardian) may involve discovering objects in books and/or magazines. Using a handheld device to control the delivery of such serial content is more fully described in co-pending U.S. application Ser. No. 18/091,274, filed Dec. 29, 2022, the entire disclosure of which is expressly incorporated herein by reference. Sharing the control of advancing to a new page or panel to find discoverable objects when viewing a book or other medium is more fully described in U.S. Pat. No. 11,652,654, filed Nov. 22, 2021, the entire disclosure of which is expressly incorporated herein by reference.
When used in isolation, interactions using a handheld device may eliminate needs for accessories or other devices such as a computer screen, computer mouse, track ball, stylus, tablet or mobile device while making object selections and performing activities; thereby eliminating requirements by a user to understand interactions involving such devices or pointing mechanisms.
Whether used in isolation or as a part of a larger system, a handheld device that is familiar to an individual (e.g., to a child) may be a particularly persuasive element of audible, haptic and/or visual rewards as a result of finding a discoverable object (or, conversely, notifying a user that a match has not been discovered). The handheld device may even be colored and/or decorated to be a child's unique possession. Along similar lines, audible cues (voices, one or more languages, alert tones, overall volume), and/or visual cues (letters, symbols, one or more languages, visual object sizing) may be pre-selected to suit the preferences, accommodations, skills and/or abilities of an individual device user.
According to further aspects of the systems and methods herein, the light beam emanating by the handheld device may be generated using one or more lasing diodes, such as those manufactured by OSRAM and ROHM Semiconductor. Lasing diodes (and lasers in general) produce coherent, collimated and monochromatic sources of light.
Considering the portability of a handheld device in which a close-proximity light source may be pointed in any direction, increased eye safety (especially during use by a child, generally considered an “uncontrolled” environment from a safety perspective) may be attained using a non-coherent source such as a (non-lasing) light-emitting diode (LED). LED so-called point sources, such as those manufactured by Jenoptik and Marktech Optoelectronics may produce non-coherent, highly collimated and (optionally) polychromatic light sources.
Optical components associated with LED point sources may control beam divergence that, in turn, may guide reflected spot size (see, e.g., FIG. 5 ) at typical working distances (e.g., 0.05 to 1.0 meters when the handheld device is used to point at objects within pages of a book). Desirable spot sizes may differ during different applications environments and/or during use by different users. For example, a young child may want only to point toward larger objects within pages of a child's book, whereas an older child or adult may wish to be able to point at objects as small as individual words or symbols on a page or screen (i.e., using a smaller and/or less divergent beam).
Within additional examples, wide spectrum (at least compared to a laser) and/or polychromatic light sources produced by (non-lasing) LEDs may help beam reflections to be seen by those who might be color-blind within one or more regions of the visible spectrum. A polychromatic light source may also be viewed more consistently by all device users when reflected off different surfaces. As an example, a purely green light source may not be seen easily when reflected off a purely red surface (e.g., region of a book page). A polychromatic light source, especially in the red-green portion of the visible spectrum, may help alleviate such issues. More energetic photons within the deep blue end of the visible spectrum may be avoided for reasons related to eye safety.
Within further examples, the device beam source may be operatively coupled to the device processor, allowing the intensity of the beam source to be controlled, including turning the beam on and off. Turning the beam on by the handheld device may, for example, be included as a prompt to indicate to a user that pointing toward a discoverable object is expected (e.g., after providing a prompt or cue).
The beam may subsequently be turned off during the time that a camera image is acquired to avoid beam reflection(s), and/or to avoid pixel saturation at or near (e.g., as a result of camera pixels “bleeding” due to saturation) a beam reflection. The absence of reflections off objects being pointed at (where reflections may be considered “noise” when identifying an object) may reduce requirements and increase accuracy for both machine learning-based training and classification processing.
The beam may also be turned off upon determining a match (e.g., as a component of signaling success to a user). Conversely, leaving the beam turned on during interactions may indicate to the device user that further searching for a discoverable object is expected.
Beam intensity may also be modulated, for example, based on measurements of one or more reflections within images acquired by a handheld device camera. Reflection intensity may be made clearly discernible to a user over background (e.g., considering ambient lighting conditions, to accommodate for the reflectivity of different surfaces, and/or to accommodate for visual impairment), but not overwhelming (e.g., based on user preference). Beam intensity may be modulated by a number of means known in the art including regulating the magnitude of the light beam driving current (e.g., with transistor-based circuitry) and/or using pulse width modulation (i.e., PWM) of the driving circuitry.
As a further aspect of systems and methods herein, one or more illumination patterns may be shown on device displays and/or projected within the device beam. Using the beam to project one or more illumination patterns effectively combines the role of one or more separate displays on the handheld device with beam pointing. Illumination patterns projected within the beam may be formed, for example, using miniature LED arrays, LCD filtering, or DLP (i.e., digital light processing, using an array of microscopic mirrors) techniques, known in the art.
When images and/or symbols (e.g., letters forming words and phrases) are too long and/or complex to be displayed all at once, messages and/or patterns within a beam may be “scrolled”. Scrolled text or graphics may be displayed one segment at a time (e.g., providing an appearance of motion) in a predetermined direction (e.g., up, down, horizontally). During and following the process of discovering an object using the light beam (i.e., when attention may be focused on the beam), messages embedded within the beam (e.g., names of identified objects) may be noticed, effective and/or meaningful.
Illumination patterns generated by a beam source may be used to enhance pointing functions including control of the size and/or shape of a beam viewable by a device user. Within an illumination pattern, the size (e.g., related to the number of illuminated pixels) and relative location (location of illuminated pixels) may be controlled by the handheld device. Different sizes of beams may, for example, be used during different applications such as pointing letters within text (using a narrow beam) versus larger cartoon characters (using a larger beam). The location of a beam may be “nudged” by the handheld device to help direct user attention to a particular (e.g., nearby) object within the camera's field-of-view.
Illumination patterns generated within the beam may also be used to “augment”, add to, or enhance printed or other forms of external (i.e., to the handheld device) content. One or more reflective objects identified within images acquired by the handheld device camera may be augmented by light beam projections. For example, if a beam is directed toward a squirrel when looking for a bird as a discoverable object, the beam may project a pair of wings superimposed on the printed image of a squirrel as a component of a (silly) query whether the object being pointed at is a bird. As a further example, the apparent color of one or more components of a printed object may be altered as a result of illuminating the overall shape of the object (or individual object components) with selected colors within the beam. Such augmentation of visual content (along with added acoustic content) may help bring the static contents of a book or other printed material “to life”.
Additionally, viewable information and/or symbols within beam projections with similar optical performance compared with the device camera (e.g., common depth-of-field, not distorted when viewed normal to a reflective surface) tends to encourage and/or mentally nudge handheld device users to orient and/or position the handheld device such that the information and/or symbols are most viewable (e.g., in focus, not skewed) by both the device user and the device camera. As a consequence, well-positioned and oriented camera-acquired images (i.e., readily viewable by the user) may facilitate computer vision processing (e.g., improved classification reliability and accuracy). A user may be unaware that the ability to readily view and/or identify projected beam patterns also enhances image quality for camera-based processing.
As further aspects of systems and methods herein, construction of the handheld device may include pointing the device camera in the same direction as the light beam, allowing the identification of a location, or at least a confined region, where the beam is pointed within any camera image (e.g., even if the beam were turned off). Because both the beam and camera move at the same time (i.e., both affixed to, or embedded within, the handheld device body), the location (or region) a beam is pointed within any camera image may be known regardless of the physical position, pointing direction, or overall orientation of the handheld device in (three-dimensional) space.
Ideally, construction of the handheld device may place the beam reflection at the center of camera images. However, given a small separation (e.g., as a result of physical construction constraints) between the beam source and the camera sensor, the beam may not appear at the center of camera images at all working distances. At a particular working distance, a location of a reflection may be computed using simple geometry (analogous to the geometry describing parallax) given the direction of beam pointing, the direction of camera image acquisition and the physical separation between the two (see FIG. 6 ).
The beam and camera may be aligned, for example, to project and acquire light rays that are parallel (i.e., non-converging). In this case, a reflection may be offset from the center of camera images by an amount that depends on working distance (i.e., from the handheld device to a reflective surface). The separation between the center of camera images and the center of the beam decreases as working distances increase (e.g., approaching a zero distance at infinity). By keeping the physical distance separating the beam and camera small, the separation may similarly be kept small.
Alternatively, the pointing directions of the beam and camera may be aligned to converge at a preferred working distance. In this case, the beam may be made to appear at the center of camera images (or some other selected camera image location) at preferred working distances. As the distance from the handheld device to a reflective surface varies, the location of the beam may vary over a limited range (generally, in one dimension related to an axis defined by the camera and light beam source). Once again, by keeping the physical separation between the beam and camera small, a beam pointing region within camera images may be kept small over the range of working distances employed during typical applications.
A calibration process to determine a region and/or location of the beam within camera images may include:

- 1. acquiring a baseline camera image (e.g., from a featureless surface) that includes no reflection by the projected light beam,
- 2. acquiring a light beam reflection image (i.e., with the beam turned on) that includes one or more reflections produced by the projected light beam,
- 3. computing a subtracted pixel intensity image based on the baseline image subtracted from the light beam reflection image, and
- 4. assigning to the beam location region, light beam sensed pixels within the subtracted pixel intensity image that exceed a predetermined light intensity threshold.

A beam pointing region may be identified based on measured values of pixels that exceed the threshold, and/or a singular beam pointing location may be determined from a central (e.g., two-dimensional median or average) location of all pixels that exceed the threshold intensity. Calibrations may be performed at different working distances to map the full extent of a beam pointing region.
Within additional examples herein, identifying an object being pointed at within camera-based images may (with the beam turned on) be based on repeatedly identifying the beam reflection as a region of high luminosity (e.g., high intensity within a region of pixel locations). Identifying the location of such beam reflections may also take into account the color of the light beam (i.e., identifying higher intensities only within one or more colors associated with the light beam spectrum). Knowing the relation between pointing location and working distance allows an estimate of the distance from the handheld device to a reflective surface to be computed when camera-based images of beam reflections are available.
Within further examples, based on the spectral sensitivity of a typical human eye, a light beam in the green portion of the visible spectrum may be most readily sensed by most individuals. Many so-called RGB (i.e., red, green, blue) cameras contain twice as many green sensor elements as red or blue elements. Utilizing a light beam in the mid-range of the visible spectrum (e.g., green) may allow beam intensity to be kept low but readily detectable (e.g., by both humans and cameras), improving the reliability of detecting reflections within camera-based images and increasing overall eye safety.
As further aspects of systems and methods herein, the handheld device user may indicate (i.e., to the handheld device) that the beam is pointed toward a selected object. Such indications may be made via a range of interactive means including:

- 1. pressing or releasing a pushbutton (or other contact or proximity sensor) that is a component of the handheld device,
- 2. providing a verbal indication (e.g., saying “now”) sensed by a handheld device microphone and identified by the device processor or a remote processor,
- 3. point the beam at an object (i.e., absent substantial movement) for a predetermined (e.g., based on user preferences) “dwell” time,
- 4. orienting the device in a predetermined direction (e.g., vertically relative to the gravitational pull of the earth) sensed by a handheld device IMU, or
- 5. gesturing or tapping the handheld device (e.g., tipping the device forward), also sensed by a handheld device IMU.
  Within these latter exemplary cases, in which signaling movements of the handheld device by the user (e.g., gesture, tap) may affect image stability within the camera's field-of-view, a stationary image (e.g., showing no substantial movement compared with one or more previously acquired images) isolated (e.g., from a continuously sampled series of images) prior to the motion may be used to identify the viewable object being pointed at.

Within further examples, one method to implement dwell-based methods involves ensuring a number of consecutive images (computed from desired dwell time divided by frame rate) to reveal a substantially stationary viewable object and/or beam reflection. For rapid and/or precise dwell times, this method demands high camera frame rates, and resultant computational and/or power consumption. An alternative method to determine if a sufficient dwell time has elapsed includes using an IMU to assess whether the handheld device remains substantially stationary for a predetermined period. In general, IMU data to assess motion may be acquired with higher temporal resolution compared to processes involving acquisition of full-frame camera images.
Conversion of analog IMU data into a digital form, suitable for processing, may use analog-to-digital (A/D) conversion techniques, well-known in the art. IMU sample rates may generally be in a range from about 10 samples/second to about 10,000 samples/second where (as introduced in the Background section, above) higher IMU sample rates involve trade-offs involving signal noise, cost, power consumption and/or circuit complexity. IMU data streams may include of one or more of:

- 1. up to three channels (i.e., representing three orthogonal spatial dimensions) of accelerometer data,
- 2. up to three channels (i.e., representing rotation around three axes often denoted pitch, roll and yaw) of gyroscope rotational velocities,
- 3. up to three channels (i.e., representing orientation in three orthogonal dimensions) of magnetometer data representing magnetic forces (e.g., including the magnetic pull of the earth), and
- 4. up to three channels (i.e., representing orientation in three orthogonal dimensions) of inertial forces on an internal mass data that may include the gravitational pull of the earth.

Data from three-axis accelerometers may be considered a time-varying vector in three-dimensional space where each axis may be denoted X, Y and Z. Treating accelerometer data as a vector, the magnitudes of accelerations, |A|, may be computed according to
|A|=(X _i −X _b)²+(Y _i −Y _b)²+(Z _i −Z _b)² (eqn. 1)
where X_i, Y_iand Z_irepresent accelerometer samples (i.e., where “i” represents sample index) in each of the three dimensions; and X_b, Y_band Z_brepresent so-called “baseline” values, respectively in each of the same three dimensions.
Baseline values may take into account factors such as electronic offsets and may be determined during “calibration” periods (e.g., by computing average values, reducing the effects of noise) when there is no movement. Three-dimensional acceleration directions (e.g., using spherical, Cartesian and/or polar coordinate systems) may also be computed from such data streams. Analogous approaches may also be made based on multi-dimensional IMU gyroscope data streams, and computing pointing vectors relative to the gravitational and/or magnetic pull of the earth.
Handheld device user indications may include translational motion, rotation, tapping the device, and/or device orientation. User intent(s) may, for example, be signaled by:

- 1. motion of any sort (e.g., |A| above IMU noise levels),
- 2. movement in a particular direction,
- 3. velocity (e.g., in any direction) above a threshold value,
- 4. a gesture using the handheld device (e.g., known movement pattern),
- 5. pointing the handheld device in a predetermined direction,
- 6. tapping the handheld device with a digit of an opposing hand,
- 7. tapping the handheld device with a striking object (e.g., stylus),
- 8. striking the handheld device against a solid object (e.g., desk), and/or
- 9. striking the handheld device against an additional handheld device.

In yet further examples to determine user intent based on IMU data streams, a “tap” of the handheld device may be identified as a result of intentionally moving and subsequently causing an object (i.e., a “striking object”) to hit a location on the surface of a handheld device (i.e., “tap location”) targeted by the user. A computed tap location on a handheld device may be used to convey additional information about intent (i.e., in addition to making an object selection) by the device user. As examples, user confidence in making a selection, indicating a first or last selection that is a part of a group of objects, or a desired to “skip forward” during a sequential interactive sequence may each be signaled based on directional movement(s) and/or tap location(s) on the handheld device.
Characteristics of a tap may be determined when a stationary handheld device is struck by a moving object (e.g., a digit of the hand opposing the hand holding the device), when the handheld device itself is moved to strike another object (e.g., table), or when both the striking object and the handheld device are moved simultaneously prior to contact. IMU data streams prior to and following a tap may help to determine whether a striking object was used to tap a stationary device, the device was forcefully moved toward another object, or both processes occurred simultaneously.
Tap locations may be determined using distinctive “signatures” or waveform patterns (e.g., peak force, acceleration directions) within IMU data streams (i.e., particularly accelerometer and gyroscopic data) that depend on tap location. Determining tap location on the surface of a handheld device based on inertial (i.e., IMU) measurements and subsequent control of activities based on tap location are more fully described in U.S. Pat. No. 11,614,781, filed Jul. 26, 2022, the entire disclosure of which is expressly incorporated herein by reference.
As further aspects of devices and methods herein, one or more actions may be performed by the handheld device processor based on determining a match (or mismatch) with characteristics of discoverable objects. Upon determining a mismatch of one or more objects within a camera image, the handheld device may simply acquire one or more subsequent camera images to continue monitoring whether a match has been found. Alternatively, further prompts and/or cues may be provided. Such prompts and/or cues may involve repeating previous prompts and/or cues, or presenting new prompts and/or cues (e.g., acquired from the template database) to hasten and/or enhance the discovery process.
Upon determining a match, an action enacted by a processor within the handheld device may include transmitting available information related to the discovery process to a remote device where, for example, further action(s) may be enacted. This information may include the audible, haptic and/or visual cues used to initiate the discovery process. Additionally, the camera image(s), an acquisition time of the acquiring camera image(s), the predetermined camera image light beam pointing region, the predetermined template of the discoverable object, and the one or more indicated objects may be included in the transmitted dataset.
Alternatively, or in addition, action(s) involving the user may be enacted within the handheld device itself. For example, one or more sounds may be played on the device speaker. This may include a congratulatory phrase or sentence, a name of the discovered object, a sound produced by the discovered object, a celebratory chime, a sound produced by another object related to the discoverable object, and so on.
Illumination patterns (e.g., broadcast on a display or projected within the beam) may include a congratulatory phrase or sentence, a displayed name of the discovered object, a caricature or drawing of the discovered object, an image of another object related to the discovered object (e.g., the next object within a sequence, such as letters of the alphabet), and so on. Haptic feedback upon discovery may comprise simply acknowledging success with a vibratory pattern, generating vibrations at a frequency associated with motions and/or sounds produced by the discovered object, and so on.
The handheld device may additionally include one or more photodiodes, an optical blood sensor, and/or an electrical heart sensor, each operatively coupled to the device processor. These handheld device components may provide additional elements (i.e., inputs) to help monitor and determine user interactions. For example, a data stream from a heart rate monitor may indicate stress or duress during discovery processes. Based on detected levels, previous interactions, and/or predefined user preferences, object discovery may be limited, delayed or abandoned.
Within additional examples, although not “handheld” in a strict sense, such portable electronic devices may be affixed and/or manipulated by other parts of the human body. A device that interacts with a user to point a light beam toward discoverable objects may, for example, be affixed to an arm, leg, foot or head. Such positioning may be used to address accessibility issues for individuals with restricted upper limb and/or hand movement, individuals lacking sufficient manual dexterity to convey intent, individuals absent a hand, and/or during situations where a hand may be required for other activities.
Interactions using the handheld device may additionally take into account factors associated with accessibility. For example, particular colors and/or color patterns may be avoided within visual cues when devices are used by individuals with different forms of color blindness. The size and/or intensity of cues broadcast on one or more handheld device displays and/or within the beam may accommodate visually impaired individuals. Media containing discoverable objects may be Braille-enhanced (e.g., containing both Braille and images), and/or contain patterns and/or textures with raised edges. Beam intensity may be enhanced and/or the handheld device camera may track pointing by a finger (e.g., within regions containing Braille) to supplement pointing using a light beam.
Along similar lines, if an individual has a hearing loss over one or more ranges of audio frequencies, then those frequencies may be avoided or boosted in intensity (e.g., depending on the type of hearing loss) within audio cues generated by the handheld device. Haptic interactions may also be modulated to account for elevated or suppressed tactile sensitivity of an individual.
During activities that, for example, involve young children or individuals who are cognitively challenged, interactions may involve significant “guessing” and/or needs to guide a device user. Assisting a user during an interaction and/or relaxing the precision of expected responses may be considered a form of “interpretive control”. Interpretive control may include “nudging” (e.g., providing intermediary hints) toward one or more target responses or reactions. For example, a young child may not fully understand how to manipulate a handheld device to effectively point the light beam. During such interactions, auditory instruction may accompany the interactive process (e.g., broadcasting “hold the wand straight up”), guiding the individual toward a selection.
Similarly, a flashing display and/or repeating sound (where frequency may be related to how close a cue or attribute is to a particular selection) may be broadcast as a user approaches a discoverable object and/or anticipated response. On the other hand, a reaction in which there is no apparent attempt to point the beam may be accompanied by a “questioning” indication (e.g., haptic feedback and/or buzzing sound), as a prompt promoting alternative considerations. Further aspects of interpretive control are more fully described in U.S. Pat. No. 11,334,178, filed Aug. 6, 2021, and U.S. Pat. No. 11,409,359 filed Nov. 19, 2021, the entire disclosures of which are expressly incorporated herein by reference.
FIG. 1 shows an exemplary scenario in which a child 11 discovers a viewable object based on barking sounds 16 b played by a speaker at 16 a of a handheld device at 15. The handheld device 15 may additionally play (via its speaker 16 a) instructions 16 b such as “Let's find the dog!”. As further examples of canine cues, the handheld device 15 may project the word “dog” and other descriptions or symbols that might generally be associated with dogs on its one or more device displays at 17 a, 17 b, 17 c and/or scrolled using the device beam.
Additionally, images of the dog at 12 b within the cartoon scene 12 a (e.g., previously captured by the handheld device camera), and/or one or more drawings or other representations of dogs may be projected on the one or more displays at 17 a, 17 b, 17 c. Barking sounds may be accompanied by vibrations (i.e., heard and/or felt by the holding hand 14 of the device user 11) generated using a haptic unit (not viewable) embedded within the handheld device 15, further alerting the user 11 that discovery of a viewable object is expected.
The child 11 may use her right hand at 14 to manipulate a light beam at 10 a projected by the handheld device at 15 to point toward the drawing of a dog at 12 b within the cartoon scene 12 a containing a number of additional viewable objects spread across two pages of the magazine 13 a, 13 b. The beam at 10 a generates a light reflection at 10 b at the location of the dog (i.e., the selected object) at 12 b on the rightmost page 13 b. Images acquired by a camera (not viewable in FIG. 1 , pointed toward the page in the same direction as the beam at 10 a) may classify the object 12 b at the beam location 10 b as a dog (whether a beam reflection is present within camera images or not). Acoustic, haptic and/or visual rewards may be provided to the child 11 upon successfully pointing out the dog at 12 b.
If the classification of the object being pointed at is determined not to be canine in nature, the overall process may be repeated by re-broadcasting the acoustic, haptic and/or visual cues related to the discoverable object; broadcasting additional or alternative cues related to the discoverable object; or broadcasting acoustic, haptic and/or visual cues to indicate that the provided cues do not appear to be associated with the object being pointed at. Additionally, the selection, timing and identity of any discovered object 12 b (or, lack of discovery) may subsequently control actions enacted directly by the handheld device 15 and/or conveyed to one or more remote devices (not shown) to modulate further activities.
FIG. 2 shows another exemplary scenario in which three spherical displays at 27 a, 27 b, 27 c on a handheld device spell the word “CAT”. Additional cues related to the discoverable object may include playing meowing sounds and/or enunciating the word “cat” and/or other related terms using the handheld device speaker at 26 (i.e., distinct from sounds at 23 b emanating from the tablet at 23 a).
The child at 21 may use her right hand at 24 a to direct a light beam at 20 a projected by the handheld device at 25 to point toward a cartoon drawing of a cat at 22 a. The cat at 22 a and a unicorn at 22 b are components of a combined audible 23 b and video 23 a presentation. The beam at 20 a may reflect off the tablet screen at 20 b in the region of the cat 22 a during the audiovisual sequence.
An indication that the object being illuminated at 20 a is a selection may be conveyed by the user 21 by one or more of: 1) vocalizing the indication (e.g., saying “now” or “OK”) sensed by a microphone (not visible) embedded within the handheld device 25, 2) using a thumb at 24 b (or any other digit) to press one (e.g. at 25 a) of the one or more available device pushbuttons at 25 a and 25 b, 3) gesturing 29 a, 29 b in a predetermined movement pattern sensed by an IMU (not visible) embedded within the device 25, and/or 4) orienting the handheld device 25 in a predetermined direction (e.g., relative to the gravitational pull of the earth) to indicate that image processing steps should be initiated.
Analyses of images acquired by a handheld device camera (not viewable in FIG. 2 , pointed in the same direction as the light beam at 20 a) may classify the object 22 a at the location of a beam reflection at 20 b as feline in nature and subsequently generate audible, haptic and/or visual feedback related to the discovery by the child 21. The beam reflection at 20 b may, or may not, be present in images captured by the camera. For example, the beam may be turned off during the period of image acquisition by the camera (e.g., to help ascertain the object's identity absent interference due to beam reflections). The selection, timing and identity of the discovered cat 22 a (including predetermined actions within the discoverable object database) may subsequently be used to control actions on the handheld device 25, tablet 23 a, and/or other remote processors (not shown).
FIG. 3 is an exemplary flow diagram illustrating steps to discover, based on a visual cue (displayed on handheld device displays at 31 c, 31 b and 31 a), a cat (e.g., stuffed toy, feline image within a book or screen, and/or real cat) at 32 b from a collection of viewable objects, including a unicorn at 32 a, using a light beam at 32 c generated by a handheld device 32 d. These steps include:

- 1) at 30 a, as a visual cue related to the discoverable object, displaying on three displays that are components of the handheld device 31 d, letters that spell “C” at 31 c, “A” at 31 b, and “T” at 31 a;
- 2) at 30 b, moving and/or orienting the handheld device 32 d to manually direct the light beam at 32 c toward the viewable cat at 32 b;
- 3) at 30 c, using a camera (not visible) and focusing optics at 33 c (depicted apart from the body of the handheld device for illustration purposes only) within the handheld device 33 d to acquire an image that includes a cat at 33 a in the region of the beam reflection at 33 b (where the beam itself in not shown in 30 c for clarity);
- 4) at 30 d, using neural network 34, template matching and/or other classification scheme(s) to determine if one or more objects within the camera image at 33 a match one or more objects within the discoverable object database (e.g., particularly the discoverable object associated with the “CAT” visual cue 31 c, 31 b, 31 a);
- 5) at 30 e, determining if there is a match between the object(s) classified within the camera image at 33 a and one or more predetermined templates or classifications of the discoverable object at 37 and, if not, returning at 38 to re-broadcast and/or generate new discoverable object cues at 30 a; or
- 6) at 30 f, determining that the object pointed at by the device user is the discoverable object (i.e., a cat at 35); and
- 7) optionally, as indicated by the dashed-line rectangular outline at 30 g, turning off the light beam used to point at viewable objects 36 a, 36 b and/or rewarding success in pointing out the discoverable object by haptic vibrations and/or other acoustic or visual cues generated by the handheld device at 36 c.

FIG. 4 is a flow diagram illustrating exemplary steps employing a handheld device 42 d to discover a hammer at 42 b from a collection of construction tools (that includes a saw at 42 a) using one or more acoustic cues at 41 c played by the handheld device 41 a. In this case at 40 a, the one or more acoustic cues emanating from a handheld device speaker at 41 b may include the sounds of pounding a nail. Such rhythmic sounds may (optionally) be accompanied by vibrations generated by a haptic unit (not viewable) embedded within the handheld device, that may be felt by a hand of the device user at 42 d. Alternatively, or in addition, the word “hammer” and/or question such as “What tool is used to drive or remove nails?” may be played at 41 c by the device speaker at 41 b.
Similar to FIG. 3 , the exemplary steps in the sequence in FIG. 4 include, at 40 b, manually manipulating the handheld device at 42 d to point a light beam at 42 c toward the hammer at 42 b (i.e., being careful not to dwell on other viewable objects such as a saw at 42 a). At 40 c, a camera (not visible, pointed in the same direction and the beam at 42 c) and focusing optics at 43 c within the handheld device at 43 d may be used to acquire an image at 43 a that includes the hammer at 43 b (with or without a beam reflection). At 40 c, neural network, template matching and/or other classification schemes (i.e., CV methods at 44) may be used to classify the selected object 43 b within the camera image at 43 a using a discoverable objects database.
In this exemplary case, the particular type of hammer within descriptions and/or templates of the discoverable object (i.e., a peen hammer at 47) may not exactly match the type of hammer being pointed at within camera images (i.e., a claw hammer at 42 b). A match in classifications at 42 e may be based on one or more predetermined thresholds for classification agreement, allowing any object being pointed at 42 b that is generally regarded as a hammer to generate a reward for finding the discoverable object.
At 40 e, if the intent of the user is classified as indeterminate or disagreeing with the cue(s) provided by the handheld device at step 40 a, then operation is returned at 48 to processes allowing the user to hear a rebroadcast of the original cues and/or any additional acoustic cues at 40 a regarding the discoverable object. The user may then continue aiming the light beam 42 c toward (the same or) other viewable objects.
On the other hand at 40 f, if the indication by the user is determined to agree with the auditory cue(s) provided by the handheld device, then the identified object (i.e., the claw hammer at 45) is designated as the discovered object. At 40 g, optionally (as indicated by the dashed-line rectangular outline) the light beam generated by the handheld device at 46 d used to point at viewable objects 46 b, 46 c may be turned off (e.g., to move on to other activities) and/or success in pointing out the discoverable object may be indicated acoustically at 46 a (e.g., broadcasting “You got it!”) and/or using other visual or haptic means. The object image 43 a, classified identity, and/or timing of object selection by the device user may subsequently govern one or more handheld device activities and/or be transmitted to one or more remote processors (not shown) for further action(s).
FIG. 5 shows components of an electronic schematic and ray diagram illustrating exemplary elements for beam generation at 51 a, 51 b, 51 c, 5ld, beam illumination 50 a and reflected 50c light paths, and detection by a camera at 56 a of a drawing of a candy cane at 59 being pointed at within pages 53 a, 53 b of a book at 57. Both the electronic circuitry at 51 a, 51 b, 51 c, 51 d as well as the camera at 56 a and its associated optics at 54 may be incorporated within the body of a handheld device (not shown).
The drawing of the candy cane at 59 may be discovered within sketches that include a cartoon character 52 a on the leftmost page 53 a, a second cartoon character 52 d on the rightmost page 53 b page, and a cat at 52 c. In this exemplary case, the drawing of the candy cane at 59 is intentionally incorporated (i.e., embedded) within a drawing of a snake at 52 b. Camouflaging such images may challenge a child (and others) to look more carefully at book contents, where such whimsical strategies may make book-related activities more fun, educational and instructive for children (and adults).
Components of the beam generating circuitry may include: 1) a power source at 51 a that typically comprises a rechargeable or replaceable battery within the portable, handheld device, 2) optionally, a switch at 51 b or other electronic control device (e.g., pushbutton, relay, transistor) that may be governed by the handheld device processor and/or the device user to turn the pointing beam on and off, 3) a resistor at 51 c (and/or transistor that may regulate beam intensity) limiting current delivered to the beam source, since LEDs are generally configured in a forward bias (i.e., lower resistance) direction, and 4) a beam source, typically comprising a lasing or light emitting diode at 51 d.
Precision optics (that may comprise multiple optical elements including some encapsulated within the diode-based light source, not shown) may largely collimate the beam 50 a, optionally also providing a small (i.e., designed) degree of beam divergence. As a consequence, beam dimensions emanating from the light source at 58 a may be smaller than at some distance along the light path at 58 b. The illuminating beam 50 a divergence, along with the distance between the light source 51 d and the selected object 59, largely govern the reflected beam spot size at 50 b.
Further, the beam reflected off the selected object 50 c may continue to diverge where, for example, in FIG. 5 , the size of the reflected beam at 58 c is larger than the illuminating beam at 58 b. The size and shape of the illumination spot at 50 b (and its reflection) may also be affected by the location of the beam relative to the reflective surface (e.g., angle relative to a normal to the reflective surface) and/or the shape of the reflective surface(s). The horizontal dimension of the leftmost page 53 a is curved convexly relative to the incident beam 50 a, causing (e.g., a Gaussian profile and/or circular shaped) illumination beam to generate a reflected spot 50 b that may be elliptical in nature (i.e., wider in the horizontal dimension).
Light from the field-of-view of the camera 56 a may be collected by camera optics 54 that focus camera images 55 a, including a focused reflected beam spot 55 b, onto the light-sensing components of the camera at 56 b. Such sensed images may then be digitized using techniques know in the art, and subsequently processed (e.g., using CV techniques) to identify the object being pointed at within the camera's field-of-view.
FIG. 6 is an exploded-view drawing of a handheld device 65 showing exemplary locations for a light beam source at 61 a and a camera at 66 a. Such components may be internalized within the handheld device 65 during final assembly. This view of the handheld device 65 also shows the backsides of three spherical displays at 67 a, 67 b, 67 c attached to the main body of the handheld device 65.
The light beam source may comprise a lasing or non-lasing light-emitting diode at 61 a that may also include embedded and/or external optical components (not viewable in FIG. 6 ) to form, structure and/or collimate the light beam at 60. Beam generation electronics and optics may be housed in a sub-assembly at 61 b that provides electrical contacts for the beam source and precision control over beam aiming.
Along similar lines, the process of image acquisition is achieved by light gathering optics at 64 a incorporated within a threaded housing at 64 b that allows further (optional) optics to be included in the light path for magnification and/or optical filtering (e.g., to reject reflected light emanating from the beam). Optical components are attached to a camera assembly (i.e., including the image-sensing surface) at 66 a that, in turn, is housed in a sub-assembly that provides electrical contacts for the camera and precision control over image detection direction.
An aspect of the exemplary configuration shown in FIG. 6 includes the light beam at 60 and image-acquiring optics of the camera at 64 a pointing in the same direction 62. As a result, the beam reflection off of a viewable object occurs within roughly the same region within camera images, regardless of the overall direction the handheld device is pointed. Depending on relative alignment and separation (i.e., of the beam source and camera at 63), the location of the beam reflection may be offset somewhat from the center of an image. Additionally, small differences in beam location may occur at different distances from the handheld device to a reflective surface due to the (designed to be small) separation at 63 between the beam source 61 a and camera 66 a.
Such differences may be estimated using mathematical techniques analogous to those that describe parallax. As a net result, if the pointing beam is turned off (i.e., absent a beam reflection within camera images) one or more objects being pointed at may still be determined based on where beam optics are pointed within the camera's field-of-view. Conversely, any measured shift in the location of the center (or any other reference) of a light beam reflection within a camera image may be used to estimate a distance from the handheld device (more specifically, the device camera) to the viewable object based on geometry.
FIG. 7 is an exemplary electronic interconnection diagram of a handheld device 75 illustrating components at 72 a, 72 b, 72 c, 72 d, 72 e,72 f, 72 g, 72 h, 72 i, 72 j, 73, 74 and predominant directions for the flow of information during use (i.e., indicated by the directions of arrows relative to an electronic bus structure at 70 that forms a backbone for device circuitry). All electronic components may communicate via this electronic bus 70 and/or by direct pathways (not shown) with one or more processors at 73. Some components may not be required or used during specific applications.
A core of the portable, handheld device may be one or more processors (including microcomputers, microcontrollers, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), etc.) at 73 powered by one or more (typically rechargeable or replaceable) batteries at 74. As shown in FIG. 6 , handheld device elements also include a beam generating component (e.g., typically a lasing or light-emitting diode) at 72 c, and camera at 72 d to detect objects in the region of the beam (that may include a reflection produced by the beam). If embedded within the core of the handheld device 75, both the beam source at 72 c and camera at 72 d may require one or more optical apertures and/or optical transparency (at 71 b and 71 c, respectively) through any handheld device casing 75 or other structures.
During applications that include acoustic cues, a speaker (e.g., electromagnetic coil or piezo-based) at 72 f may be utilized. Similarly, during applications that might include audio-based user interactions, a microphone at 72 e may acquire sounds from the environment of the handheld device. If embedded within the handheld device 75, operation of both the speaker 72 f and the microphone 72 e may be aided by acoustic transparency through the handheld device casing 75 or other structures by, for example, coupling tightly to the device housing and/or including multiple perforations at 71 d (e.g., as further illustrated at 16 a in FIG. 1 ).
During applications that include vibrational and/or rhythmic prompts or cues, and/or to alert a user that a reward (or warning of a mismatch) might be expected, a haptic unit (e.g., eccentric rotating mass or piezoelectric actuator) at 72 a may be employed. One or more haptic units may be mechanically couple to locations on the device housing (e.g., to be felt at specific locations on the device) or may be affixed to internal support structures (e.g., designed to be felt more generally throughout the device surface).
Similarly, during applications that include visual cues regarding a discoverable object, one or more displays at 72 b may be utilized to display, for example, letters (as illustrated), words, images and/or drawings related to discoverable objects, and/or to provide feedback following discovery (or to indicate that an object has been incorrectly pointed at). Such one or more displays may be affixed and/or exterior to the main handheld device body (as shown at 72 b), and/or optical transparency may be employed within a device casing as indicated at 71 a.
During typical interactions, a user may signal to the handheld device at various times such as when ready to discover another object, pointing at a new object, agreement about a previously discovered object, and so on. User signaling may be indicated by verbal feedback (sensed by a microphone, as described above), as well as movement gestures or physical orientation of the handheld device sensed by an IMU at 72 g. Although illustrated as a single device at 72 g, different implementations may involve distributed subcomponents that, for example, separately sense acceleration, gyroscopic motion, magnetic orientation and gravitational pull. Additionally, subcomponents may be located in different regions of a device structure (e.g., distal arms, electrically quiet areas) to, for example, enhance signal-to-noise during sensed motions.
User signaling may also be indicated using one or more switching mechanisms including pushbuttons, toggles, contact switches, capacitive switches, proximity switches, and so on. Such switch-based sensors may require structural components at or near the surface of the handheld device at 71 e to convey forces and/or movements to more internally located circuitry.
Telecommunications to and from the handheld device 75 may be implemented using Wi-Fi 72 i and/or Bluetooth 72 j hardware and protocols (each using different regions of the electromagnetic spectrum). During exemplary scenarios that employ both protocols, shorter-range Bluetooth 72 j may be used, for example, to register a handheld device (e.g., to identify a Wi-Fi network and enter a password) using a mobile phone or tablet. Subsequently, Wi-Fi protocols may be employed to allow the activated handheld device to communicate directly with other, more distant devices and/or the World Wide Web.
FIG. 8 is a flow diagram illustrating exemplary steps following a text-based cue (i.e., the three letters spelling the word “CAT” at 81 c, 81 b, and 81 a, respectively), in which dwell time pointing the handheld device beam (i.e., lack of significant movement of the device) measured using an embedded IMU is used to indicate user selection of a viewable object. Once selected, the handheld device processor (or a remote processor) may determine whether or not the selected object (at 82 b) is feline in nature (e.g., matching a predetermined discoverable object template) and perform resultant actions.
Steps in this process include:

- 1) at 80 a, as a visual cue related to the discoverable object, displaying on three displays that are components of the handheld device at 81 d, letters that spell “C” at 81 c, “A” at 81 b, and “T” at 81 a;
- 2) at 80 b, allowing the user to point, using a light beam on the handheld device at 82 d, toward a selected object (i.e., a cat at 82 b) within a camera's field-of-view that includes other objects, such as a unicorn at 82 a;
- 3) at 80 c, acquiring, using a camera and camera optics at 83 b that are components of the handheld device at 83 c (optics at 83 b shown separated from the device for clarity), one or more camera images at 83 a of the scene in the same direction as the beam;
- 4) at 80 d, acquire IMU data monitoring movement of the handheld device (including acceleration and rotation) that may be represented as one or more vectors in three-dimensional space and labeled X at 84 a, Y at 84 b, and Z at 84 c;
- 5) at 80 e, determine when the magnitude of movements (|A|) of the handheld device (including acceleration and/or rotation) at 85 b remains less than a predetermined movement threshold shown as a dashed horizontal line at 85 c, for a sufficient duration or dell time at 85 a, beginning when |A| is less the threshold 85 c at a time indicated by a vertical dashed line at 85 d;
- 6) at 80 f, as IMU data are sampled, if dwell (i.e. movement is less than the movement threshold at 85 c) time has not yet reached a predetermined threshold time, then return at 89 a to acquiring additional camera images at 80 c and IMU data at 80 d; otherwise, if sufficient time has passed to exceed the dwell time threshold, then proceed at 89 b to processes associated with isolating the object being pointed at;
- 7) optionally, as indicated by a dashed-line rectangle at 80 g, turn the light beam (that was pointed toward viewable objects such as the unicorn at 86 a and cat at 86 b) emanating from the handheld device at 86 c off, as one indication to the user that a selection has been made;
- 8) at 80 h, from the one or more camera images at 87 a of the camera scene in the direction of the beam that includes a cat at 87 c, isolate the most recent camera image at 87 b, prior to significant movement of the handheld device (i.e., during the dwell time);
- 9) at 80 i, further isolate within the most recent camera image, an object within the region where the beam points at 88 a (i.e., in the same direction as the camera);
- 10) at 80 j, determine if the object in the light beam pointing region matches a template and/or classification of the discoverable object (in this case, a cat at 88 b); and
- 11) at 80 k, play at 88 e one or more attributes (e.g., cat meowing sounds) of the discoverable object (i.e., successfully pointed at by the user) on the speaker at 88 d of the handheld device at 88 c.

FIG. 9 is flow diagram in which orientation of the handheld device (i.e., vertically at 95 a, relative to the gravitational pull of the earth at 95 c) is used to indicate, by the device user, that a light beam-based selection is being made following an audible prompt (e.g., question about which clothing item to pack next) regarding a discoverable object at 91 c. The selection process involves a user choosing an article of clothing by pointing a light beam at a long sleeved shirt at 92 b. In this case, discoveries may include real (versus printed or displayed) clothing items.
Exemplary steps in this process include:

- 1) at 90 a, playing at 91 c, one or more audible prompts related to clothing selections on the speaker at 91 b of the handheld device at 91 a;
- 2) at 90 b, pointing a light beam at 92 c emanating from the handheld device at 92 d toward a long sleeved shirt at 92 b (e.g., to pack next), rather than a nearby pair of socks at 92 a;
- 3) at 90 c, by using the handheld device camera at 93 b, acquiring one or more images at 93 of the scene pointed to by the camera (and light beam);
- 4) at 90 d, acquiring IMU data to determine handheld device orientation, typically represented as a vector in three-dimensional space as X at 94 a, Y at 94 b, and Z at 94 c;
- 5) at 90 e, computing by the handheld device at 95 a based on the IMU data, orientations of the device at 95 b relative to the gravitational pull of the earth at 95 c;
- 6) at 90 f, determining if the computed orientation matches (within a predetermined range) a predetermined target orientation (e.g. vertical) to indicate a selection by the device user and, if so, continue at 99 b to isolate the selected object; otherwise (e.g., if not oriented vertically) continue at 99 a to re-acquire camera images at 90 c and IMU data at 90 d;
- 7) optionally, as indicated by the dashed-line rectangle at 90 g, turn off the handheld device at 96 c beam (used to point at viewable objects at 96 a, 96 b);
- 8) at 90 h, isolating the most recent camera image at 97 b from the one or more acquired camera images at 97 a (e.g., that includes a long sleeved shirt at 97 c, pointed at when the handheld device was steady and held vertically);
- 9) at 90 i, isolating an object (i.e., the long sleeved shirt at 98 a) at the light beam pointing region at 98 b within camera images (i.e., when the beam and camera are pointed in the same direction);
- 10) at 90 j, determining if the object being pointed at matches a template and/or characteristics of the discoverable object and, if not, returning at 99 c to re-broadcast the audible prompt(s) and/or broadcast new prompts to help find the discoverable object; otherwise proceed at 99 d to one or more actions based on successful discovery; and
- 11) optionally at 90 k, indicating successful discovery by, for example, displaying a congratulatory message on device displays at 98 c and/or producing haptic vibrations at 98 d felt by the device user.

The foregoing disclosure of the examples has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the examples described herein will be apparent to one of ordinary skill in the art in light of the above disclosure. It will be appreciated that the various components and features described with the particular examples may be added, deleted, and/or substituted with the other examples, depending upon the intended use of the examples.
Further, in describing representative examples, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims.
While the invention is susceptible to various modifications and alternative forms, specific examples thereof have been shown in the drawings and are herein described in detail. It should be understood that the invention is not to be limited to the particular forms or methods disclosed, but to the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the scope of the appended claims.

Claims

We claim:

1. A method to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and a device speaker operatively coupled to the device processor, the method comprising:

playing, by the device speaker, one or more audible cues related to the discoverable object;

acquiring, by the device camera, a camera image when the handheld device is manipulated by the human such that a projected light beam points from the device light beam source to one or more visible objects;

isolating, by the device processor, one or more indicated objects at the beam location region within the camera image; and

determining, by the device processor, whether one or more of the one or more indicated objects match a predetermined template of the discoverable object.

2. The method of claim 1, wherein the device light beam source is one of a light-emitting diode and a lasing diode.

3. The method of claim 1, wherein the device light beam source is operatively coupled to the device processor and wherein an intensity of the device light beam source is controlled by one or more of regulating a magnitude of a light beam driving current and a pulse width modulation of the light beam driving current.

4. The method of claim 1, wherein the projected light beam is one or more of collimated, non-coherent, diverging, and patterned.

5. The method of claim 1, wherein the device light beam source is operatively coupled to the device processor, the method further comprising turning the device light beam source off upon one or both of the acquiring of the camera image and the determining the match with the predetermined template of the discoverable object.

6. The method of claim 1, wherein the beam location region within the camera image is determined by:

acquiring, by the device camera, a baseline image that includes no reflection by the projected light beam;

acquiring, by the device camera, a light beam reflection image that includes one or more reflections produced by the projected light beam;

computing, by the device processor, a subtracted pixel intensity image based at least in part on the baseline image subtracted from the light beam reflection image; and

assigning to the beam location region, by the device processor, light beam sensed pixels within the subtracted pixel intensity image that exceed a predetermined light intensity threshold.

7. The method of claim 1, wherein the one or more audible cues related to the discoverable object include one or more of: one or more sounds generated by the discoverable object, one or more names of the discoverable object, one or more questions about the discoverable object, one or more descriptions of the discoverable object, one or more described functions of the discoverable object, a mathematical problem related to the discoverable object, a musical score related to the discoverable object, and one or more related object descriptions related to the discoverable object.

8. The method of claim 1, wherein the predetermined template of the discoverable object includes one or more of: one or more shapes of the discoverable object, one or more sizes of the discoverable object, one or more colors of the discoverable object, one or more textures of the discoverable object, and one or more patterns within the discoverable object.

9. The method of claim 1, wherein the discoverable object is a printed object within one of a book, a book cover, a brochure, a box, a sign, a newspaper and a magazine.

10. The method of claim 1, further comprising performing an action by the device processor based at least in part on the determining one of the match and a mismatch, with the predetermined template of the discoverable object.

11. The method of claim 10, wherein the action comprises one or more of:

transmitting, to one or more remote processors, one or more of the one or more audible cues, the camera image, an acquisition time of the acquiring the camera image, the predetermined camera image light beam pointing region, the predetermined template of the discoverable object, and the one or more indicated objects;

playing one or more sounds on the device speaker;

displaying one or more illumination patterns on one or more device displays operatively coupled to the device processor; and

activating a device haptic unit operatively coupled to the device processor.

12. The method of claim 1, wherein a device switch is operatively coupled to the device processor, the method further comprising determining the match with the predetermined template of the discoverable object occurring upon detecting, by the device processor, a change in a switch state of the device switch.

13. The method of claim 1, wherein a device microphone is operatively coupled to the device processor, the method further comprising determining the match with the predetermined template of the discoverable object occurring upon identifying, by the device processor, one or more identifiable sounds produced by the human within data acquired by the device microphone.

14. The method of claim 1, wherein a device inertial measurement unit is operatively coupled to the device processor, the method further comprising determining the match with the predetermined template of the discoverable object occurring upon identifying, by the device processor, one of a predetermined handheld device gesture motion and a predetermined handheld device orientation within data acquired by the device inertial measurement unit.

15. A method to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and one or more device displays operatively coupled to the device processor, the method comprising:

displaying, by the one or more device displays, one or more visual cues related to the discoverable object;

acquiring, by the device camera, a camera image when the handheld device is manipulated by the human such that a projected light beam points from the device light beam source to the one or more visible objects;

16. The method of claim 15, further comprising performing an action by the device processor based on the determining one of the match and a mismatch, with the predetermined template of the discoverable object.

17. The method of claim 16, wherein the action comprises one or more of:

transmitting, to one or more remote processors, one or more of: the one or more visual cues, the camera image, an acquisition time of the acquiring the camera image, the predetermined camera image light beam pointing region, the predetermined template of the discoverable object, and the one or more indicated objects;

playing one or more sounds on a device speaker operatively coupled to the device processor;

displaying one or more illumination patterns on the one or more device displays; and

activating a device haptic unit operatively coupled to the device processor.

18. A method to indicate a discoverable object by a human using a handheld device including a device processor, a device light beam source configured to generate a light beam producing one or more light beam reflections off one or more visible objects viewable by the human, a device camera aligned such that a camera field-of-view includes a beam location region of the one or more light beam reflections and operatively coupled to the device processor, and a device haptic unit operatively coupled to the device processor, the method comprising:

producing, by the device haptic unit, sensed haptic vibrations at one or more haptic frequencies related to one or more of motions and sounds associated with the discoverable object;

acquiring, by the device camera, a camera image when the handheld device is manipulated such that a projected light beam points from the device light beam source to the one or more visible objects;

19. The method of claim 18, further comprising performing an action by the device processor based on the determining one of the match and a mismatch, with the predetermined template of the discoverable object.

20. The method of claim 19, wherein the action comprises one or more of:

transmitting, to one or more remote processors, one or more of: the one or more haptic frequencies, the camera image, an acquisition time of acquiring the camera image, the predetermined camera image light beam pointing region, the predetermined template of the discoverable object, and the one or more indicated objects;

activating a device haptic unit operatively coupled to the device processor.