WO2015057845A1 - Eye tracking system and methods for developing content - Google Patents


Info

Publication number
WO2015057845A1
WO2015057845A1 (PCT/US2014/060706)
Authority
WO
WIPO (PCT)
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2014/060706
Other languages
French (fr)
Inventor
Jeff CLUNE
Hod Lipson
Jason Byron YOSINSKI
Nicholas CHENEY
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cornell University
Original Assignee
Cornell University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cornell University filed Critical Cornell University
Publication of WO2015057845A1 publication Critical patent/WO2015057845A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 - Eye tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 - Selection of displayed objects or displayed text elements

Abstract

Content such as objects can be developed hands-free using eye tracking technology that accurately infers what a user is looking at. Content can be created in any form including in three-dimensions (3-D). Employing CPPN-NEAT to encode and evolve the object enables it to be printed using 3-D printing technology.

Description

EYE TRACKING SYSTEM AND METHODS FOR DEVELOPING CONTENT
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application No. 61/892,945 filed October 18, 2013, which is incorporated by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made with government support under DBI-1003220 awarded by the National Science Foundation (NSF). The government has certain rights in the invention.
FIELD OF THE INVENTION
The invention relates to developing content in two dimensions (2-D) or three dimensions (3-D) using any automated design process. More specifically, the invention is a system and methods that enable a user to create an object using eye tracking technology, and that provide for the creation of 3-D objects that can be printed using 3-D printing technology.
BACKGROUND OF THE INVENTION
People are generally good critics, but are often poor at describing exactly what they want, especially for things they have never seen before. While it is difficult to precisely describe something in technical terms, people usually find it much easier to look at a set of options and declare which ones they prefer. This phenomenon is captured by the phrase "I'll know it when I see it" and may be due to a lack of technical knowledge to explain a conceptualized idea or the inability to imagine something never previously encountered. Additionally, some ideas for designs seem preferable before they are viewed, while others seem undesirable, but look surprisingly good once instantiated.
Automated design processes are known that adapt designs to user preferences. One example of an automated design process is known as interactive evolution. Interactive evolution uses this idea to drive design, either of solutions to a particular problem or of open-ended creation where the only objective is aesthetic appeal. Certain evolutionary algorithms present human users with potential solutions, and allow them to show a preference for things they like and discourage things they don't like. The information provided from the user's feedback is used to create novel designs similar to previously preferred solutions, iteratively finding designs more and more preferential to the user.
As one may imagine, interactive evolution is slow and the user becomes fatigued, which can lead to insufficient amounts of time and effort dedicated towards design. There have been numerous efforts to relieve this fatigue by predicting user preferences and offloading some of this interactivity to a machine proxy or substitute for the human user, usually through interweaving human and computer evaluations.
However, few studies have focused on the human-computer interface itself as a way to allow a greater number of trials or less overall fatigue by making each interaction less taxing than one performed with a traditional interface. Although there have been attempts to produce attention-driven evolution using eye tracking technology, these attempts suffer from limited subject data, a tightly confined design space - in which only a single number was evolved, representing the ratio of the side lengths of a rectangle - and no comparison of eye tracking driven evolution to evolution driven by more traditional mouse or keyboard interfaces. Certain other attempts note the promise and importance of such a system, but lack an eye tracking implementation to demonstrate it.
Advances in eye tracking based interactive evolution may be driven by the interaction between various observations including, for example: (1) unique stimuli - in their setup, unique rotational orientations - are likely to cause involuntary shifts in visual attention; (2) novel solutions may be a better driver of evolution than fitness (target-goal) based approaches, which often converge to local optima in complex problems; and (3) objects may vary in any attribute, such as shape, color, or complexity, from other objects on a display device (as well as from objects previously seen by the user). These complementary properties suggest that simply displaying potential designs and measuring attention via an eye tracker may involuntarily draw attention to the most novel or unique designs shown to a user and thus provide a powerful driving force for interactive evolution.
The evolution of interactive manufacturing systems and methods has shown the potential to create amazing object designs in both 2-D and 3-D environments. Despite the ability to fabricate custom-designed products, no easy-to-use, consumer-centric design tool exists. The vast majority of consumers lack the Computer Aided Design (CAD) skills and experience to sit down and create designs. Interactive systems and methods have changed many industries, such as gaming, education and manufacturing.
Thus, there is a need for interactive evolution systems and methods that reduce user fatigue and improve evolutionary success as compared to traditional evolution systems and methods, such as those driven by mouse clicks or keyboard interaction. The invention satisfies this need.
SUMMARY OF THE INVENTION
Consumers are likely to spend more time viewing advertisements from businesses they end up choosing than those from businesses they do not choose, suggesting that using visual attention as a proxy for preference in eye tracking technology is likely to bias selection towards designs preferred by the user. This is further supported by the finding that consumers tend to fixate longer on brands of products they eventually choose, compared to alternative choices. Similarly, it has been found that consumers are more likely to spend more time looking at the types of ads to which they were instructed to pay attention, suggesting that attempts at target-driven evolution are justified in using visual attention as a proxy for intentional preference towards certain designs.
One of the greatest potential advantages of eye tracking technology is that it gathers user feedback on all displayed objects, not just the one or two a user may end up selecting in traditional click-based selection via a mouse or input-based selection via a keyboard. According to the invention, a user paints all of the objects with preference information via the amount of time the user gazes at each object, providing much more data to an interactive evolution algorithm per generation, thereby reducing fatigue and improving performance. For proof-of-concept purposes of this application, the interactive evolution algorithm is a Compositional Pattern-Producing Network (CPPN) as applied to NeuroEvolution of Augmenting Topologies (NEAT), i.e., CPPN-NEAT. However, any evolutionary algorithm and encoding is contemplated.
More importantly, one of the main motivations of the invention is the notion that eye gaze and eye movements represent an inadvertent, subconscious stream of information. While traditional interactive evolution has made users explicitly choose from a number of potential options, the invention requires no such active participation. By simply looking at the screen, a user's preferences can be gathered based on where the user spends time looking. A user's preferences can also be gathered based on the user's eye movement, for example, the user's eyes moving back and forth on the display device over a period of time. Thus, the invention does not require users to perform any action (or even be aware that they are providing information).
Not only does the invention overcome user fatigue, but it also speeds up evaluations and increases evaluation quality by tracking where a user's eyes are looking to gauge interest in content such as objects. For purposes of this application, content is referred to specifically in the form of an object; however, any form of content is contemplated, including any 2-D or 3-D tangible or intangible thing, design, or artifact. According to the invention, eye tracking technology can be used for directed design and free-form design of content.
In addition to its optimization advantages, eye tracking technology is an attractive interface for interactive design, as it can enable participation from new populations of users, such as those with physical disabilities, or those using devices that traditionally do not employ interactivity, such as televisions, but that could easily take advantage of passive or involuntary interactions. Increasingly, consumer devices such as computers and cell phones include the capability to incorporate eye tracking, suggesting the possibility of including passive, preference-driven, customized design as part of everyday technological interactions. The invention not only automates design; coupled with recent innovations in 3-D printing, such an effortless interface greatly increases the usefulness of in-home 3-D printers to the general public.
Eye tracking enables an open-ended exploration of evolved designs. Although the invention is discussed in reference to an eye tracking device, it is contemplated that the invention may incorporate other modes of detecting which object a user is interested in. These include a Kinect or similar device for determining the position of a user's head, which can improve the accuracy of eye tracking and indicate user interest in its own right, as well as brain-scanning technologies such as electroencephalogram (EEG) headsets, with which a user could indicate preferences by controlling a cursor via mind control, and from which a user's emotional state can be inferred to determine their feelings about displayed objects.
According to the invention, natural, interesting objects are evolved and then printed on 3-D printers, enabling the objects to exist in the physical world. In one embodiment, users may interactively evolve 3-D objects with fitness evaluations gleaned by eye tracking technology rather than performed manually. The fitness of each object is based on the fraction of display time the user spends looking at it among the objects presented in a population. The invention transforms interactive evolution from a tool that is rarely used due to user fatigue to one that allows the user to sit back and relax while content is morphed to match their preferences. Although certain aspects of the invention are discussed in reference to an interactive evolution process, the invention is applicable to any automated design process that adapts designs to user preferences.
The future potential of eye tracking for interactive design is enormous - especially when one considers its potential for commercial use. The ability to customize a design without the need to formally describe it may open many doors for distributed design, which couples well with the increase in distributed fabrication as 3-D printers become commonplace appliances. Furthermore, the invention may be embedded within existing programming, such as internet TV, so that users can create product designs as part of their traditional viewing process.
The invention and its attributes and advantages will be further understood and appreciated with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The preferred embodiments of the invention will be described in conjunction with the appended drawings, provided to illustrate and not to limit the invention, where like designations denote like elements, and in which:
FIG. 1 illustrates a block diagram of an exemplary embodiment of the system according to the invention.
FIG. 2 illustrates a flow chart of an exemplary embodiment of the method according to the invention.
DETAILED DESCRIPTION
Eye tracking is the process of measuring either the point-of-regard (where one is looking) or the motion of an eye relative to the head. More specifically, the point-of-regard is a position in rendered content - also referred to as content representations - that the user is presumed to be viewing. The dimensions of the point-of-regard may vary; for example, it may be a point (i.e., a line of sight), a range, or an area.
An exemplary system 100 according to the invention is shown in FIG. 1. The exemplary system 100 as shown may be used to implement the methods according to the invention using one or more processor devices 108. The system 100 includes a display device 102 and an eye tracking device 104 connected to a communication infrastructure 106 - such as a bus - which forwards data from the communication infrastructure 106 to other components of the system 100.
The display device 102 may be, for example, a monitor, touch screen, or any other computer peripheral device, or any combination thereof, capable of entering and/or viewing data. It is also contemplated that the display device 102 may be a web-based interface accessible through the system 100. According to the invention, the system 100 may be a small-sized computer device including, for example, a personal digital assistant (PDA), smart hand-held computing device, cellular telephone, laptop or netbook computer, hand-held console or MP3 player, tablet, or similar hand-held computer device, such as an iPad®, iPod Touch® or iPhone®.
The eye tracking device 104 measures eye positions and eye movement. More specifically, the device 104 incorporates illumination, sensors and processing to track eye movements and gaze point. The use of near-infrared light allows for accurate, continuous tracking regardless of surrounding light conditions. This technology is often referred to as pupil center corneal reflection eye tracking. The eye tracking device 104 may be a remote eye tracker or a mobile eye tracker. It is contemplated that the eye tracking device 104 may be a camera, including a standard webcam.
The eye tracking device 104 can be calibrated using software in which the user focuses on a blue dot as it moves to a variety of different locations on the display device 102. More specifically, the eye tracking device 104 operates by shining infrared light into the eye of the user to create reflections that cause the pupil to appear as a bright, well-defined disc to the eye tracking device 104. The corneal reflection is also generated by the infrared light, appearing as a small but sharp glint outside of the pupil. The point being looked at by the user is then triangulated from the corneal reflection and the pupil center.
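The mapping from pupil-glint geometry to an on-screen point can be illustrated with a short sketch. The Python fragment below is a minimal, illustrative approximation only: it assumes a simple affine model fitted from the calibration dots described above, and the function names, example coordinates, and linear form of the mapping are assumptions of this sketch rather than details of any particular eye tracking device.

```python
import numpy as np

def fit_calibration(pupil_glint_vectors, screen_points):
    """Fit an affine map from pupil-glint difference vectors to screen
    coordinates, using least squares over the calibration dots."""
    v = np.asarray(pupil_glint_vectors, dtype=float)   # shape (n, 2)
    s = np.asarray(screen_points, dtype=float)         # shape (n, 2)
    a = np.hstack([v, np.ones((len(v), 1))])           # homogeneous form
    coeffs, *_ = np.linalg.lstsq(a, s, rcond=None)     # shape (3, 2)
    return coeffs

def estimate_gaze(pupil_center, corneal_glint, coeffs):
    """Triangulate the point-of-regard from the pupil center and the
    corneal glint by mapping their difference vector through the model."""
    vec = np.asarray(pupil_center, float) - np.asarray(corneal_glint, float)
    return np.hstack([vec, 1.0]) @ coeffs              # (x, y) on screen

# Calibration: the user fixates the moving dot at known screen locations,
# and one pupil-glint vector is measured at each location (values invented).
dots = [(100, 100), (860, 100), (480, 300), (100, 500), (860, 500)]
vectors = [(-4.0, -2.5), (3.9, -2.4), (0.1, 0.0), (-4.1, 2.6), (4.0, 2.5)]
model = fit_calibration(vectors, dots)
print(estimate_gaze((312.0, 218.0), (308.0, 220.0), model))
```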
The system 100 includes one or more processor devices 108, which may be a special purpose or a general-purpose digital signal processor device that processes certain information. The system 100 also includes a main memory 110 and/or secondary memory 112. Main memory 110 includes, for example, random access memory (RAM), read-only memory (ROM), mass storage device, or any combination thereof. Secondary memory 112 may include, for example, a hard disk unit, a removable storage unit, or any combination. Main memory 110 and/or secondary memory 112 may each include a database 111 , 113, respectively.
The system 100 may also include a communication interface 114, for example, a modem, a network interface (such as an Ethernet card or Ethernet cable), a communication port, a PCMCIA slot and card, wired or wireless systems (such as Wi-Fi, Bluetooth, Infrared), local area networks, wide area networks, intranets, etc.
It is contemplated that the main memory 110, secondary memory 112, including database 111 and database 113, or a combination thereof, function as a computer usable storage medium, otherwise referred to as a computer readable storage medium, to store and/or access computer software including computer instructions. For example, computer programs or other instructions may be loaded into the system 100 through a removable storage device, for example, a ZIP disk, portable flash drive, optical disk such as a CD, DVD or Blu-ray, or Micro-Electro-Mechanical Systems (MEMS). Computer programs, when executed, enable the system 100, particularly the processor device 108, to implement the methods of the invention according to computer software including instructions. The system 100 may perform any one of, or any combination of, the steps of any of the methods according to the invention.
Communication interface 114 allows software, instructions and data to be transferred between the system 100 and external devices or external networks. Software, instructions, and/or data transferred by the communication interface 114 are typically in the form of signals that may be electronic, electromagnetic, optical or other signals capable of being sent and received by the communication interface 114. Signals may be sent and received using wire or cable, fiber optics, a phone line, a cellular phone link, a Radio Frequency (RF) link, wireless link, or other communication channels.
The system 100 of FIG. 1 is provided only for purposes of illustration, such that the invention is not limited to this specific embodiment. It is appreciated that a person skilled in the relevant art knows how to program and implement the invention using any computer system or network architecture.
FIG. 2 is a flowchart 200 according to one embodiment of a method for developing content by a user. The content the user desires to develop can be created using directed design methods or free-form design methods. According to the invention, directed design methods are those used by the user to develop an object pre-selected by the system, i.e., a target object. Content can also be created without requiring a target object: free-form design methods are those used by the user to develop any object the user desires, i.e., a creative object. As an example, a user's trace of the point-of-regard is processed by the system using a function - stochastic or machine learning - to produce a representation of the one or more preferences of the user. The user can simply sit back and look at the display device while the content changes.
The method is implemented with a user positioned in front of the system 100, specifically the display device 102, as described in reference to FIG. 1. For example, a user may be placed approximately 27 inches in front of the display device. At step 202, a plurality of content representations, or objects, is presented to the user on the display device. Each content representation is displayed according to an attribute type; more specifically, each content representation is displayed according to an attribute value for the attribute type. In one embodiment of the invention, attribute types include, for example, size, color and shape. For the attribute type size, the attribute values may include, for example: small, large. For the attribute type color, the attribute values may include, for example: red, blue, green. For the attribute type shape, the attribute values may include, for example: cone, oval, rectangle. However, any attribute type - texture, length, composition - and values - medium, purple, square - can be used according to the invention. Furthermore, the invention is applicable to multi-part objects such as faces. In one embodiment, the plurality of content representations is shown to the user in the structure of an array, such as a 3 x 5 array, of 3-D objects. In another embodiment, each content representation is shown rotating around its vertical axis.
A user pays more attention to the content representation that most resembles the object - target object or creative object - the user is attempting to develop or create. At step 204, an eye tracking device 104 (FIG. 1) calculates the point-of-regard data over a specified period of time for each content representation. If, for example, the eye tracking device 104 loses the signal because the user's pupils leave its capture range, or because the point-of-regard falls off the display device, the system 100 pauses, resuming only upon the return of a valid, on-screen signal (e.g., the eye tracking device recaptures the user's pupils).
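For illustration, the per-cell accrual of gaze time with this pause-on-signal-loss behavior might be sketched as follows. The 3 x 5 grid geometry matches the embodiment above; the tracker object and its latest_sample() method are hypothetical placeholders for whatever API the eye tracking device 104 exposes, with an invalid sample modeled as None.

```python
import time

GRID_ROWS, GRID_COLS = 3, 5   # the 3 x 5 array of content representations

def cell_for_gaze(gaze, screen_w, screen_h):
    """Map an on-screen gaze point to a cell index of the 3 x 5 array,
    or None when there is no valid, on-screen point-of-regard."""
    if gaze is None:
        return None                                   # pupils lost
    x, y = gaze
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        return None                                   # gaze off the display
    col = int(x / (screen_w / GRID_COLS))
    row = int(y / (screen_h / GRID_ROWS))
    return row * GRID_COLS + col

def sample_dwell_times(tracker, screen_w=1920, screen_h=1080, budget_ms=1000):
    """Accrue per-cell dwell time until one cell reaches the budget.
    While the sample is invalid no cell is credited, which models the
    pause/resume behavior described above."""
    dwell = [0.0] * (GRID_ROWS * GRID_COLS)
    last = time.monotonic()
    while max(dwell) < budget_ms:
        time.sleep(0.01)                              # refresh-loop tick
        now = time.monotonic()
        cell = cell_for_gaze(tracker.latest_sample(), screen_w, screen_h)
        if cell is not None:
            dwell[cell] += (now - last) * 1000.0      # credit elapsed ms
        last = now
    return dwell
```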
At step 206, the processor device 108 (FIG. 1) records the point-of-regard data for each content representation. In one embodiment, point-of-regard data is recorded only within an array of three-dimensional objects.
The processor device accumulates the point-of-regard data for each content representation to obtain accumulated data for each content representation at step 208. The accumulated point-of-regard data is taken as a proxy for the user's affinity for the one or more content representations. In one embodiment, data is accumulated using a non-weighted sum. In an alternative embodiment, data is accumulated using a weighted sum, for example, to discount any early attention paid to surprising or different, yet ultimately uninteresting, objects.
Then, at step 210, the processor device compares the accumulated data for each content representation to a predetermined threshold value. In one embodiment, the predetermined threshold value is one second; however, any value is contemplated.
At step 212, the processor device determines which content representations have accumulated data exceeding the predetermined threshold value, to obtain one or more favored content representations. Of those, the content representation whose accumulated data exceeds the predetermined threshold value by the greatest amount is selected at step 214. The selected content representation is illustrated on the display device at step 216. In one embodiment, the selected content representation is highlighted to designate to the user that it is the content representation the user selected. It is contemplated that the user may communicate to the system 100 (FIG. 1) whether or not the selected content representation illustrated on the display device is correct.
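A compact sketch of steps 208 through 214 follows. It accumulates dwell samples per content representation, applies a weighted sum that discounts early attention (the exponential weighting and half-life parameter are one illustrative choice; the embodiments above require only a weighted or non-weighted sum), thresholds the totals, and selects the representation exceeding the threshold by the greatest amount.

```python
def select_representation(samples, threshold_ms=1000.0, half_life_ms=2000.0):
    """Steps 208-214 in miniature.

    samples: iterable of (representation_id, dwell_ms, t_ms) tuples, where
    t_ms is when the dwell sample was taken within the viewing session."""
    accumulated = {}
    for rep, dwell_ms, t_ms in samples:
        # Weighted sum: later looks count more, discounting early attention
        # paid to surprising but ultimately uninteresting objects.  Use
        # weight = 1.0 instead for the non-weighted embodiment.
        weight = 2.0 ** (t_ms / half_life_ms)
        accumulated[rep] = accumulated.get(rep, 0.0) + dwell_ms * weight
    # Step 212: keep only representations exceeding the threshold.
    favored = {r: v for r, v in accumulated.items() if v > threshold_ms}
    if not favored:
        return None, favored
    # Step 214: pick the one exceeding the threshold by the greatest amount.
    best = max(favored, key=lambda r: favored[r] - threshold_ms)
    return best, favored

samples = [("A", 50, 0), ("A", 50, 500), ("B", 400, 250), ("C", 900, 800)]
best, favored = select_representation(samples, threshold_ms=300.0)
print(best, favored)   # "C" exceeds the 300 ms threshold by the most
```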
In order to print the content representation, such as an object in 3-D, it is produced for presentment on the display device using a Compositional Pattern-Producing Network (CPPN) as applied to NeuroEvolution of Augmenting Topologies (NEAT), i.e., CPPN-NEAT. Although CPPN and NEAT are used for proof-of-concept purposes according to one embodiment of the invention, any evolutionary algorithm and encoding is contemplated.
A CPPN is a way to encode designs in the same way nature encodes its designs (e.g., overlapping chemical gradients during the embryonic development of animals such as jaguars, hawks, or dolphins). A CPPN is similar to a neural network, but its nodes contain multiple math functions, for example: sine, sigmoid, Gaussian, and linear.
The CPPN produces geometric output patterns that are built up from the functions of these nodes. Because the nodes have regular mathematical functions, the output patterns tend to be regular (e.g., a Gaussian function can create symmetry and a sine function can create repetition). These patterns specify phenotypic attributes as a function of their geometric location. According to the invention, each voxel has an x, y, and z coordinate that is input into the network, along with the voxel's distance from center, d. An output of the network, queried at each geometric-coordinate location, specifies whether any material is present at that location. The remaining output nodes are queried once (at the center point d) and specify the red, green, blue (RGB) values that comprise the object's color. By producing a single CPPN representing the functional structure of a design, and iteratively querying it for each voxel, the entire structure of the object at any resolution can be produced.
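A toy CPPN illustrating this querying scheme is sketched below. The fixed one-hidden-layer topology and random weights are simplifications for illustration; in CPPN-NEAT the topology and weights are evolved rather than sampled. The sketch follows the input/output convention described above: inputs x, y, z and distance-from-center d, one output gating material presence, and the remaining outputs queried once at the center for RGB color.

```python
import numpy as np

FUNCS = [np.sin,                               # sine: repetition
         lambda x: 1.0 / (1.0 + np.exp(-x)),   # sigmoid
         lambda x: np.exp(-x * x),             # Gaussian: symmetry
         lambda x: x]                          # linear

class TinyCPPN:
    """A deliberately small stand-in for an evolved CPPN: one hidden layer
    whose nodes each apply one of the four function types named above."""
    def __init__(self, rng, hidden=8):
        self.w_in = rng.normal(size=(4, hidden))    # inputs: x, y, z, d
        self.w_out = rng.normal(size=(hidden, 4))   # outputs: material, R, G, B
        self.funcs = [FUNCS[i % 4] for i in range(hidden)]

    def query(self, x, y, z, d):
        pre = np.array([x, y, z, d]) @ self.w_in
        h = np.array([f(v) for f, v in zip(self.funcs, pre)])
        out = h @ self.w_out
        return out[0] > 0.0, np.clip(out[1:], 0.0, 1.0)  # presence, RGB

rng = np.random.default_rng(0)
net = TinyCPPN(rng)
n = 16                                   # any finite resolution works
grid = np.linspace(-1.0, 1.0, n)
voxels = np.zeros((n, n, n), dtype=bool)
for i, x in enumerate(grid):
    for j, y in enumerate(grid):
        for k, z in enumerate(grid):
            d = np.sqrt(x * x + y * y + z * z)   # distance from center
            voxels[i, j, k], _ = net.query(x, y, z, d)
_, rgb = net.query(0.0, 0.0, 0.0, 0.0)   # color outputs queried once, at center
```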
CPPN-NEAT iteratively queries each voxel within a specified bounding area and produces output values as a function of the coordinates of that voxel. These outputs determine the shape and color of an object. The voxel shape is then smoothed, for example, with a Marching Cubes algorithm to produce the final object. Although the CPPN-NEAT network is queried at some finite resolution, it actually specifies a mathematical representation of the shape and thus, critically for high-quality 3-D printing, it can be queried with arbitrarily high resolution.
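As a concrete example of this smoothing step, the boolean voxel grid from the previous sketch can be passed to the Marching Cubes implementation in scikit-image and the resulting mesh written to an STL file for printing. The library choices and file name here are illustrative assumptions, not part of the embodiment.

```python
import numpy as np
from skimage import measure   # pip install scikit-image
from stl import mesh          # pip install numpy-stl

# Extract a smooth triangle mesh from the boolean voxel grid built above;
# level=0.5 places the isosurface at the material/no-material boundary.
verts, faces, _normals, _values = measure.marching_cubes(
    voxels.astype(float), level=0.5)

# Write the mesh as an STL file, a common input format for 3-D printers.
solid = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
solid.vectors = verts[faces]  # (n_faces, 3, 3): the vertices of each triangle
solid.save('evolved_object.stl')
```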
The CPPN evolves according to the evolutionary algorithm NEAT. In one exemplary embodiment, a population size of 15 is used, such that the entire population is displayed to the user at each generation in a 3 x 5 array. At each refresh loop of the NEAT algorithm and the on-screen representations, if the eye tracking device records the user's point-of-regard within the 3 x 5 array cell corresponding to a given object, that object gains the clock time since the last refresh loop of the algorithm (this value is typically a small fraction of a second). This process lasts until one content representation accumulates one second (1000 milliseconds) of time looked at by the user. At this point, the generation ceases, and each object is assigned a fitness equal to the time it was looked at during that generation (in milliseconds). Thus, the top object at each generation has a fitness of 1000, while all other objects have a fitness between 999 and 1 (the minimum baseline fitness), depending on the time the user spent looking at each content representation during the given generation.
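The fitness assignment described in this paragraph reduces to a small bookkeeping step, sketched below under the same assumptions as the earlier dwell-time sketch: dwell times per object are clamped to the 1000 ms budget at the top and the 1 ms baseline at the bottom.

```python
def assign_fitness(dwell_ms, budget_ms=1000.0, floor_ms=1.0):
    """Convert per-object dwell times (milliseconds) into fitness values:
    the most-viewed object reaches the full 1000 ms budget, and every
    other object keeps its own dwell time, clamped to the 1 ms baseline."""
    return {obj_id: min(max(ms, floor_ms), budget_ms)
            for obj_id, ms in dwell_ms.items()}

# e.g. dwell times from sample_dwell_times() above, keyed by array cell:
dwell = dict(enumerate([1000.0, 312.5, 87.1, 0.0, 45.3]))
print(assign_fitness(dwell))   # {0: 1000.0, 1: 312.5, 2: 87.1, 3: 1.0, 4: 45.3}
```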
The invention allows hands-free design by passively gathering user feedback via eye tracking technology, which accurately infers which object the user is looking at and can use that information to direct successful design sessions. Users can successfully design objects - target and creative - and can develop interesting, novel shapes without touching a keyboard or mouse.
While the disclosure is susceptible to various modifications and alternative forms, specific exemplary embodiments of the invention have been shown by way of example in the drawings and have been described in detail. It should be understood, however, that there is no intent to limit the disclosure to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure as defined by the appended claims.

Claims

CLAIMS:
1. A method for developing content by a user, the method implemented on a computer system including at least a processor device, a display device, and an eye tracking device, the processor device executing the method comprising the steps of:
presenting on the display device a plurality of content representations;
calculating by the eye tracking device point-of-regard data over a specified period of time of the user for each content representation;
recording by the processor device point-of-regard data for each content representation;
accumulating by the processor device the point-of-regard data for each content representation to obtain accumulated data for each content representation;
comparing by the processor device the accumulated data for each content representation to a pre-determined threshold value;
determining by the processor device the accumulated data for each content representation exceeding the pre-determined threshold value to obtain one or more favored content representations;
selecting by the processor device an accumulated data for a favored content representation exceeding the predetermined threshold value by the greatest amount to obtain a selected content representation;
illustrating on the display device the selected content representation.
2. The method for developing content by a user according to claim 1 , wherein each content representation of said presenting step is shown as rotating around a vertical axis of the content representation.
3. The method for developing content by a user according to claim 1 , wherein the plurality of content representations is an array of three-dimensional objects.
4. The method for developing content by a user according to claim 1 , wherein the predetermined threshold value is one second.
5. The method for developing content by a user according to claim 3, wherein the point-of-regard data of said recording step is recorded only within the array of three-dimensional objects.
6. The method for developing content by a user according to claim 1 , wherein at least one content representation of the plurality of content representations is encoded using a CPPN algorithm.
7. The method for developing content by a user according to claim 6, wherein at least one content representation of the plurality of content representations is evolved using a NEAT algorithm.
8. The method for developing content by a user according to claim 1 , wherein each content representation of said presenting step is shown by an attribute type.
9. The method for developing content by a user according to claim 8, wherein the attribute type is selected from the group consisting of: size, color, shape.
10. The method for developing content by a user according to claim 9, wherein each attribute type comprises one or more attribute values.
11. The method for developing content by a user according to claim 10, wherein the one or more attribute values for the attribute type size is selected from the group consisting of: small, large.
12. The method for developing content by a user according to claim 10, wherein the one or more attribute values for the attribute type color is selected from the group consisting of: red, yellow, blue.
13. The method for developing content by a user according to claim 10, wherein the one or more attribute values for the attribute type shape is selected from the group consisting of: cone, oval, rectangle, square.
PCT/US2014/060706 2013-10-18 2014-10-15 Eye tracking system and methods for developing content Ceased WO2015057845A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361892945P 2013-10-18 2013-10-18
US61/892,945 2013-10-18

Publications (1)

Publication Number Publication Date
WO2015057845A1 true WO2015057845A1 (en) 2015-04-23

Family

ID=52828654

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/060706 Ceased WO2015057845A1 (en) 2013-10-18 2014-10-15 Eye tracking system and methods for developing content

Country Status (1)

Country Link
WO (1) WO2015057845A1 (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050024586A1 (en) * 2001-02-09 2005-02-03 Sensomotoric Instruments Gmbh Multidimensional eye tracking and position measurement system for diagnosis and treatment of the eye
US20050073136A1 (en) * 2002-10-15 2005-04-07 Volvo Technology Corporation Method and arrangement for interpreting a subjects head and eye activity
US20050175218A1 (en) * 2003-11-14 2005-08-11 Roel Vertegaal Method and apparatus for calibration-free eye tracking using multiple glints or surface reflections
US20060121436A1 (en) * 2004-12-03 2006-06-08 Elaine Kruse Graphical workspace for idea management
US20100283843A1 (en) * 2007-07-17 2010-11-11 Yang Cai Multiple resolution video network with eye tracking based control
WO2009022924A1 (en) * 2007-08-15 2009-02-19 William Bryan Woodard Image generation system
US20110128223A1 (en) * 2008-08-07 2011-06-02 Koninklijke Phillips Electronics N.V. Method of and system for determining a head-motion/gaze relationship for a user, and an interactive display system
US20100323775A1 (en) * 2009-06-22 2010-12-23 University Of Central Florida Research Foundation, Inc. Systems and Methods for Evolving Content for Computer Games
US20120272179A1 (en) * 2011-04-21 2012-10-25 Sony Computer Entertainment Inc. Gaze-Assisted Computer Interface
US20120290517A1 (en) * 2011-05-11 2012-11-15 Affectivon Ltd. Predictor of affective response baseline values
US20130103624A1 (en) * 2011-10-20 2013-04-25 Gil Thieberger Method and system for estimating response to token instance of interest

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3438810A1 (en) * 2017-08-04 2019-02-06 XYZprinting, Inc. Three-dimensional printing apparatus and three- dimensional printing method
CN109383029A (en) * 2017-08-04 2019-02-26 三纬国际立体列印科技股份有限公司 Three-dimensional printing apparatus and three-dimensional printing method
US10632682B2 (en) 2017-08-04 2020-04-28 Xyzprinting, Inc. Three-dimensional printing apparatus and three-dimensional printing method

Similar Documents

Publication Publication Date Title
US11656677B2 (en) Planar waveguide apparatus with diffraction element(s) and system employing same
CN113383295B (en) Biofeedback method for conditioning digital content to stimulate a larger pupil radius response
US9671566B2 (en) Planar waveguide apparatus with diffraction element(s) and system employing same
KR20220080030A (en) Contextual awareness of user interface menus
KR20210153151A (en) Head mounted display system configured to exchange biometric information
WO2015006784A2 (en) Planar waveguide apparatus with diffraction element(s) and system employing same
US10719193B2 (en) Augmenting search with three-dimensional representations
KR20140011204A (en) Method for providing contents and display apparatus thereof
KR20190066428A (en) Apparatus and method for machine learning based prediction model and quantitative control of virtual reality contents’ cyber sickness
CN107562186A (en) The 3D campuses guide method for carrying out emotion computing is recognized based on notice
WO2015057845A1 (en) Eye tracking system and methods for developing content
US20250298642A1 (en) Command recommendation system and user interface element generator, and methods of use thereof
US20250278902A1 (en) Techniques for coordinating artificial-reality interactions using augmented-reality interaction guides for performing interactions with physical objects within a user's physical surroundings, and systems and methods for using such techniques
Cheney et al. Hands-free evolution of 3d-printable objects via eye tracking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14853764

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14853764

Country of ref document: EP

Kind code of ref document: A1