
WO2001025947A1 - Method for dynamically recommending Web sites and answering queries from users grouped into affinity groups - Google Patents

Info

Publication number
WO2001025947A1
Authority
WO
WIPO (PCT)
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2000/027419
Other languages
English (en)
Inventor
Liad Y. Meidar
George H. Milligan
John C. Day
Monte F. Hancock
Rodney L. Mccormick
Erik Fretheim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Application filed by Individual
Priority to AU78579/00A (AU7857900A)
Publication of WO2001025947A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • The present invention relates generally to Internet search engine methods and technologies, as well as to systems and methods for deriving marketing data from accumulated Internet usage data for groupings of users.
  • The global computer network known as the Internet has emerged as a mass communications and commerce medium that enables millions of people worldwide to share information, build communities among individuals with similar interests, and conduct business electronically.
  • According to International Data Corp. (IDC), a market research firm, the number of Internet users will increase from approximately 97 million at the end of 1998 to 320 million by the end of 2002.
  • This exponential growth of the World Wide Web portion of the Internet (the "Web") has made it increasingly difficult for individual users to derive maximum value from the Web.
  • The explosive growth of the Internet is unprecedented in its magnitude, diffusion, and capabilities.
  • Finding information on the Internet is becoming increasingly difficult as the Internet continues to grow and change exponentially. Web sites have proliferated along with the data available on them, making it more difficult and time-consuming for users to find the information they want. Users spend a substantial portion of their time searching for the specific information, products, or services they desire. According to IDC, over 100 million Web searches are conducted every day. Furthermore, once an Internet user locates a desired site or sites, the user often finds them difficult to navigate. As Web technology has improved, many Web sites have become more complex by adding new features. According to
  • Alexa Internet™ is a free Web navigation service that works with the user's browser and accompanies the user when surfing the Web, providing useful information about the sites currently being viewed and suggesting related sites.
  • A user downloads and installs software from Alexa™.
  • In March 1999, Alexa™ surpassed 2 million downloads of its software, had 150 advertising partners, and received 130 million impressions monthly. The company had $370,000 in annual revenues when it was acquired by Amazon.com™ in April 1999 for $250 million. Alexa™'s technology has also been integrated into Netscape Navigator® 4.5 and Microsoft Internet Explorer® 4.0 and later versions.
  • eTour™ is another recommendation system: it automatically directs people to a different Web site, matched to their unique interests, each time they connect to the Internet, and rewards users for using it. This personalized, ever-changing service has signed up over 325,000 people. eTour™'s recommendation system is powered entirely by humans and does not involve any sophisticated technology. All recommendations are for Web sites that have paid eTour™ to recommend them to a targeted audience.
  • Direct Hit Technologies is a provider of popularity-based recommendation and search engine products. Its engine ranks results according to which sites millions of Internet searchers have found useful. Direct Hit™ helps searchers broaden or narrow their search by displaying additional search topics that others have found helpful for similar searches. The technology also has personalization features that enable users to receive search results tailored to their gender, age, or geographic region. Direct Hit™'s revenues are derived from licensing its technology to clients such as Lycos™, Hotbot™, MSN™, ICQ™, and LookSmart™, among others.
2. Author-Controlled Systems
  • The "author-controlled" search engines, such as Inktomi™, Alta Vista™, Excite™, Infoseek™, and Lycos™, work by comparing the words in the search request with the words in the millions of available Web documents. While these engines can automatically locate a large amount of information, they have not proven reliable at reducing it to a manageable set of documents relevant to the search request.
  • Author-controlled engines essentially empower the authors of documents to control their own ranking through the words they choose to put into their Web site documents. Because a high ranking on heavily trafficked search engines means more traffic to the site owner, site owners have an incentive to obtain the highest possible placement for their site, regardless of its quality relative to other sites or its relevance to any particular search request. This conflict of interest has led many site authors to seek the highest placement possible by tricking the search engines.
  • Inktomi™'s search technology powers many Internet search applications, including regional and global Internet searching, retrieval systems for large text archives, and powerful online search support for publisher archives.
  • This search application is optimized to handle the combination of massive data and large user bases without requiring expensive multiprocessor supercomputers.
  • Inktomi™ scientists recently developed a new technology (Concept Induction Technology), which uses supercomputing techniques to model human conceptual classification of content and projects this intelligence across millions of documents.
  • Alta Vista™ is a deep search spider that indexes all the pages within a Web site.
  • The company uses a ranking algorithm to determine the order in which matching documents are returned on the results page.
  • Each document gets a grade based on how many of the search terms it contains, where the search terms are in the document, and how close to each other the search terms are.
  • Alta Vista™ also uses site popularity to help boost the ranking of a Web site.
  • Excite™ differs from other navigation tools because it uses "concept searching," understanding language well enough to use synonyms.
  • Excite™'s spider summarizes each page using sentences that express its dominant concept. Pages are then reviewed and automatically rated. Keywords are assigned to a page according to what the spider deems to be the page's theme. Excite™ assigns a "confidence rating" according to how closely the queried words match what it considers to be the theme of a site.
  • Search sites such as Yahoo™, LookSmart™, and About.com™ use an editorially created directory of sites. These "editor-controlled" sites each employ a staff of editors to select and catalog Web sites manually. Even the aforementioned author-controlled search engines have added similar editorially created directories above their traditional search result listings. By adding sites to the index slowly, these companies build an index of Web documents that are each carefully reviewed before addition. The amount of labor needed for such a task, however, is quite high. While the quality of this body of data tends to be much higher, this expensive, labor-intensive process cannot keep up with the constant growth and change of the Web.
  • The Internet enables advertisers to target advertising and marketing campaigns using sophisticated databases of information about the users of various sites, and to generate revenue directly from these users through online transactions. As a result, the Internet has become a compelling means to advertise and market products and services. IDC expects global e-commerce revenue to increase from approximately $32 billion in
  • Media Metrix provides Internet audience measurement products and services to leading Internet advertisers, advertising agencies, Internet properties, technology companies, and financial institutions. Media Metrix collects this data by measuring Internet usage on a representative sample, or panel, of personal computers with its proprietary metering system, which is contained in a software application installed on each panelist's personal computer. The meter monitors all communications between the computer's operating system and the software applications and hardware that the operating system controls and monitors.
  • Net Perceptions™ is a developer and supplier of real-time recommendation technology that enables Internet retailers to market to customers on a one-to-one basis.
  • Its real-time technology predicts an individual's preferences and makes specific recommendations accordingly. It does this by learning each individual's preferences through observing relative behavior, recalling past behavior, and asking the individual to rate a number of relevant items.
  • The technology then pools this information with knowledge gained from a community of other individuals who share similar tastes and interests.
  • This technology integrates collaborative filtering, neural networks, fuzzy logic, and genetic algorithms.
  • Artificial Life™ is a provider of intelligent software bots for Internet applications.
  • Artificial Life™ develops intelligent software bots that can multitask across the enterprise through the effective use of natural language.
  • The bots are designed for user convenience and for automating business-related Internet and intranet tasks, including Web navigation, direct marketing, user profiling, information gathering, messaging, knowledge management, sales response, and call center automation.
  • Artificial Life™ is also developing products for data mining, Web page analysis, statistical analysis, and direct marketing to support the functionality of its suite of intelligent software products.
  • Ask Jeeves™ is a provider of natural-language question-answering services on the Internet for consumers and companies.
  • The Ask Jeeves question-answering services allow users to ask a question in plain English and receive a response pointing the user to relevant
  • Inquizit™ Technologies' patented software linguistically interprets the meaning and concepts of plain English. The technology analyzes sentence structure, grammar, word meanings, and content. It incorporates a dictionary containing most common English words and all of their meanings, as well as a dictionary of concepts and common-sense knowledge.
  • IBM's Intelligent Miner for Text™ (IMT) is a software development toolkit. The product can extract patterns from text, organize documents by subject, and search for documents that match a given topic. IMT has text analysis tools and an advanced search engine enhanced with mining functionality and the ability to visualize results. IBM is also currently working on a search technology research project named "Clever."
  • Third Voice™ enables online discussion forums for private, group, or public interaction. The service allows users to express ideas freely and openly at points of reference anywhere on a Web page using a free browser companion.
  • Hypernix™ is an Israeli company dedicated to developing innovative communications tools. With Gooey™, its recently released freeware product, Hypernix™ introduced the concept of Dynamic Roving Communities (DRC), which allows Internet users to interact and communicate anywhere and anytime on the Web. More than 50,000 users have downloaded this hybrid of Web surfing and chat technology since June 14,
  • Web pages are added to the Internet every day. To take full advantage of the Web, users must be able to successfully navigate a network of dispersed Web sites, which are generally not connected in a logical fashion.
  • Users currently rely on Internet search engines or on directories of Web sites and Web pages to locate information and find sites of interest. Search engines typically require consumers to construct keyword or complex search strings that often produce hundreds or thousands of matches. As directories grow larger, they require users to move through large and complex hierarchies of information. As the Internet grows, users of conventional search and directory products are finding it increasingly difficult to locate the information they need.
  • The problem with most search engines is that they return an overwhelmingly large number of Web sites for each search request. For example, one of the major search engines returns over 163,720 possible Web sites for the search request "Boston Car Dealerships." Obviously, it is impossible for a person to look at all of these sites, and many people spend a
  • Although search engines are the most common search method, according to eStats, 53% of users still rely on recommendations from friends and relatives as one of their most common navigation methods. What is needed, then, is a service that complements search engines: a service that points out to the user the most popular sites visited by people within the user's demographic pool that match the user's areas of interest. In addition, it would be desirable for a user to be able to communicate instantly with an appropriate "affinity group" to seek answers to questions or information needs. Affinity groups are developed by the personal navigation system based on common interests and/or backgrounds, as defined by usage patterns and demographics.
  • The invention disclosed herein comprises a personal navigation system that uses advanced artificial intelligence technology to transform the way users navigate, communicate, and find relevant information on the Internet.
  • As a permission-based consumer and business tool, the personal navigation system also changes the way companies reach their prospective customers and provides some of the most revealing information available about consumer behavior on the Internet.
  • The personal navigation system provides an unobtrusive window adjacent to the user's main browser (e.g., Internet Explorer® or Netscape Navigator®), through which it recommends sites that would be of interest to the user.
  • The personal navigation system bases its recommendations on complex pattern-matching algorithms that take into account the user's past navigation behavior and the behavior of others with similar backgrounds who have demonstrated interest in the same concepts, creating groupings based on affinity between users.
  • The personal navigation system combines detailed demographic data with time-stamped Web page content to develop a histogram that is translated into a complex waveform representing the user's usage.
  • The complex waveform of a user is the DNA of that user's Web behavior.
  • It represents the user's interests at their most basic level; i.e., it is a collection of "key words" or "atomic phrases" that represent "the meaning" of the Web pages the user has visited.
  • The user's waveform is matched to other similar waveforms to provide information source recommendations.
  • The time-stamping feature tracks interests as they change over time.
  • The personal navigation system provides a personalized and active approach to recommending sites that others have found useful or interesting.
  • An additional component of the personal navigation system is a dialogue box.
  • The personal navigation system dialogue box allows the user to query on any subject of interest.
  • The personal navigation system instantaneously forms an ad hoc affinity group and transmits the query to it anonymously through e-mail, asking for recommendations from other users who have demonstrated an interest in the topic in question. Recipients can, if they choose, respond anonymously and, if both parties agree, engage in blind or offline open dialogue.
  • The personal navigation system dialogue box allows users to communicate anonymously with peers around the world to get recommendations on sites that other users have found useful or to answer other questions.
  • The personal navigation system can amass valuable marketing data.
  • The personal navigation system uses sophisticated data mining techniques to reveal significant details of consumer behavior on the Internet.
  • The personal navigation system can use this marketing data to generate marketing intelligence reports.
  • The personal navigation system is able to provide answers to such questions as
  • Figure 1 is a schematic representation of a preferred network for implementing the personal navigation system of the present invention over the Internet.
  • Figures 2A and 2B depict a flow process for registering a user and creating the user's waveform within the personal navigation system of the present invention.
  • Figure 2C is a schematic representation of various matching methodologies performed by the personal navigation system of the present invention.
  • Figures 3A and 3B depict a flow process for implementing an anonymous dialogue between users of the personal navigation system of the present invention.
  • Figure 3C is a schematic representation of an anonymous dialogue between users of the personal navigation system of the present invention.
  • Figure 4 depicts a flow process followed by the personal navigation system of the present invention for mining user data and creating marketing reports.
  • FIG. 5 is a schematic diagram of the various functional components of the server end of the personal navigation system of the present invention.
  • Figure 6 is a representation of a universal histogram according to the personal navigation system of the present invention.
  • Figure 7 is a combined representation of a user waveform and histogram according to the personal navigation system of the present invention.
  • FIG. 8 is a schematic diagram of the various functional components of the user end of the personal navigation system of the present invention.
Detailed Description of the Preferred Embodiments of the Invention
  • The most preferred embodiment comprises a personal navigator system for the Internet.
  • The detailed description further discusses methods for generating marketing intelligence reports about consumer behavior on the Internet based on accumulated usage data within the context of the personal navigator system.
  • Figure 1 depicts the overall architecture of the personal navigation system of the present invention, which follows two distinct technical paradigms.
  • A portion of the personal navigation system operates in a thin-client environment through an industry-standard browser on the user's computer 100, accessing the personal navigation system Web site 110, preferably over the Internet 120.
  • The remaining portion executes on a combination of the user's computer 100 (client) and the personal navigation system server 110.
  • Both the client and server execute independently with a point-to-point communication link being established on-demand for exchange of information, when appropriate.
  • Other possible network configurations, for example distributed networks, local area networks, wide area networks, and the like, are well known in the art and may alternatively be used to implement the processes of the invention.
  • Such other communication networks may include intranets, private and public networks, ATM networks, telephony networks, and broadcast, cable, and satellite television.
  • Numerous other information sources are accessible over the Internet and transferred via Internet Protocol packets.
  • Other information sources are available via telephony networks.
  • A further example is the in-band and out-of-band information transmitted in television broadcasts, most notably in the vertical or horizontal blanking intervals.
  • Figures 2A and 2B depict the steps taken by a user to register with and begin using the personal navigation system.
  • A user accesses a Web site hosting the personal navigation system 200, for example the Personal Navigator™ system soon to be available from Personal Navigator, Inc.
  • The user then completes a questionnaire 202 requesting name, address, e-mail address, gender, birthday, occupation, income level, marital status, number of children, and college attended.
  • The user further checks off boxes indicating areas of general interest.
  • The user reads the terms of use and privacy statements 204, chooses whether to receive targeted e-mail information and advertisements 206, and clicks a button indicating acceptance of the conditions 208.
  • The personal navigation system creates an anonymous user ID 210 and automatically downloads the personal navigation system tracking software onto the user's computer 212.
  • The tracking software captures data on the user's behavior 214, including, but not limited to: selected text from Web sites visited (preferably including nouns, but not adjectives); time-of-use data (date, day of week, and time of day) and viewing duration for each page viewed; frequency of hits by all users on each site viewed; and the path taken to reach each page viewed (e.g., by collecting HTTP commands).
  • While the user is connected to the Internet, the personal navigation system performs a periodic (daily, or as often as the user connects to the Internet) "quiet" upload of the usage data 216 collected by the tracking software on the user's desktop. The user is generally unaware of this event.
  • The personal navigation system then employs pattern-matching algorithms 218 to determine: concepts of interest to the user 220; sites matching those concepts that are viewed most commonly by the user's affinity group 222 (developed by the personal navigation system based on common interests and/or backgrounds, as defined by usage patterns and demographics); sites most commonly viewed by the user's affinity group 224; and sites on the Internet containing concepts of interest to the user 226. See Figure 2C.
  • Figures 3A and 3B depict the steps of the ensuing process.
  • The user launches a browser 300 (e.g., Internet Explorer® or Netscape Navigator®).
  • The personal navigation system window opens alongside the user's browser window 302.
  • The personal navigation system window displays a list of recommended sites to visit 304, with links to them, based on the user's current navigation matched against sites with similar concepts historically visited most commonly by members of the user's affinity group, or currently being visited by members of that group.
  • The personal navigation system window may also display products of potential interest to the user, based on concepts of interest to the user and products viewed by members of the user's affinity group.
  • The personal navigation system further opens and displays a dialogue box 306 in which the user can enter any question, comment, or other message of interest 308.
  • The message is broadcast anonymously to the personal navigation system dialogue box of every user who has indicated or demonstrated an interest in the concept(s) contained in the message 310.
  • A message flag appears to indicate to recipients that they have received an anonymous message from another personal navigation system user 312. Users who receive a broadcast message choose whether to respond anonymously to the sender of the original message or to ignore it.
  • For example, a first user interested in Cajun cooking may enter a query in the dialogue box seeking good Cajun recipes 318.
  • A second user in the first user's affinity group, simultaneously connected to the system via the Internet, receives the first user's message broadcast by the system and enters a response 320.
  • Recognizing that both the first and second users are presently connected to the network at the same time, the personal navigation system provides a real-time dialogue thread between them to support ongoing, anonymous communication 322.
  • A further aspect of the personal navigation system is its ability to "mine" data on anonymous consumer behavior on the Internet to create marketing intelligence reports that help e-commerce companies define marketing strategies.
  • Marketing intelligence reports may be used, for example, to optimize the text, colors, and placement of online ads; to understand customer behavior in navigating the Internet; and to send targeted e-mail (to users who have opted to accept e-mail advertising).
  • The personal navigation system collects significant data 400 on each user, including demographic information such as age, gender, zip code, income level, marital status, number of children, and education; Web sites and pages viewed; the time (date, day of week, and time of day) and duration of each page view; the content of each page viewed (e.g., nouns and proper nouns identified on the page), including advertisements but not graphics; the path taken to reach each page viewed; and the frequency of page views.
  • The personal navigation system then applies advanced data mining techniques to analyze the data captured from each user 402. This allows the personal navigation system to draw insightful conclusions about consumer behavior on the Internet 404. These conclusions are summarized in various customized marketing intelligence reports 406. It is estimated that about 50,000 users give the personal navigation system a "critical mass," which translates into highly pertinent Web site recommendations and marketing intelligence reports.
  • Marketing intelligence reports prepared by the personal navigation system may take the form of "syndicated reports" and "customized reports."
  • Syndicated reports comprise general information on Web usage, which may be offered through subscription and distributed periodically (weekly or monthly). These syndicated reports may include, for example, information about: unique visitors to most popular Web sites; time (date, day of week and time of day) and average duration of usage; average unique Web pages visited per day and per month; demographic compositions of Web users; purchasing tendencies of Web users; and other behavioral data, including mass consumer trends.
  • The customized reports offer more in-depth and revealing analysis of the data collected by the personal navigation system. These reports are prepared and sold individually at the customer's request. They answer very specific and difficult questions about consumer behavior on the Internet. Examples of such questions, arranged by topic, that the personal navigation system can answer are the following:
  • The technology comprising the personal navigation system consists generally of modern, state-of-the-art software development tools, software languages, communication protocols, and commercial off-the-shelf Web server and browser technology, augmented with three key technologies. While the state-of-the-art components will certainly play an important role in implementing the personal navigation system, it is the three key technologies that create an advantage over similar systems and methods.
  • The first of these technologies is a natural language parsing tool that identifies and captures the key text within a Web page that collectively represents the "meaning" of the page.
  • The second technology uses a combination of several highly multidimensional clustering algorithms that compare the Web usage of multiple users (i.e., compare the meaning of each Web page), compute a measure of their similarity, and derive affinity groups representing collections of users with similar interests. By then comparing the lists of Web sites visited by members of an affinity group, sites visited by other members of the group can be recommended to each member.
  • The third technology is actually a collection of several algorithms and techniques that perform pattern recognition, data analysis, clustering, classification, inductive learning, and other intelligent information processing tasks. This collection of algorithms enables discovery of sophisticated and complex relationships within Web usage that can provide valuable market intelligence reports to e-commerce and online marketing companies. The following is a brief description of each of these key technologies.
  • The natural language parsing tool is a software scripting language that performs many sophisticated functions, such as pattern matching and manipulating text and relational objects. It has powerful pattern-matching and string manipulation functions for "drilling down" into text files and decomposing them into relational objects. It also has built-in text formatting functions for "rolling up" relational objects, string concatenation, and producing text reports.
  • That natural language parsing tool is called RML and is available from Computer Science Innovations, Inc. (CSI) of Melbourne, Florida.
  • RML was used for machine learning and data mining projects. For example, a system designed completely in RML was used to identify patterns in medical ledger information and subsequently convert medical charge descriptions into coded numbers for a large data warehouse company.
  • the human brain is the most sophisticated pattern-matching machine known to civilization. Viewing Web pages, understanding their meaning, and recognizing similarity or non-similarity between Web pages is a task that can be easily and quickly accomplished by the human mind. Yet, comprehending the "meaning" of a Web page and subsequently deriving a measure of similarity is an extremely difficult task to perform automatically in software.
  • the clustering algorithms comprise part of an Advisor Toolkit, which is also available from CSI.
  • CSI's Advisor Toolkit contains a variety of mathematical algorithms that possess the power to distinguish similarities and differences within the content of Web pages. The most significant of these is a neural paradigm, which, through unsupervised clustering, maps Web page content into multi-dimensional feature space and subsequently computes a measurement of "nearness" based on a variety of mathematical metrics. In effect, these algorithms can build a profile that accurately represents a user's Web behavior. This capability enables recognition of similar Web usage and similar interests among different users and thus, the establishment of affinity groups and recommendations of potential Web sites of interest.
  • these routines collectively comprise the Advisor Toolkit mentioned above.
  • the Advisor Toolkit is a collection of advanced software routines that perform pattern recognition, data analysis, clustering, classification, inductive learning, and other intelligent information processing tasks. These routines enable exploitation of the repository of Web usage information collected by the personal navigation system. Combined into a hybrid system, these powerful routines can perform detailed and in-depth analysis of Web usage data for knowledge discovery and identification of hidden patterns and correlations within the usage patterns. From these analyses, market intelligence reports, customer behavior reports, and predictions of behavior based on the historical behavior and profiles of individual users, and communities of users, can be produced.
  • Atomic phrase: An atomic phrase is defined as a string of text that, if subdivided, loses its semantic associations. For example, "tennis" and "chief operating officer" are both atomic: "tennis" cannot be subdivided, and subdividing "chief operating officer" produces phrases that generally do not preserve its connotation.
  • Semantic association: Two phrases (each consisting of one or more words) have a semantic association when, used alone in separate Web searches, they return URL lists having many items in common.
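The semantic-association test can be read as a simple overlap check between two result lists. The sketch below assumes the two Web searches have already been run and their URL lists are passed in; `min_shared` is a hypothetical stand-in for "many items in common", not a value given in the text.

```python
def shared_url_count(urls_a, urls_b):
    """Number of distinct URLs that appear in both search-result lists."""
    return len(set(urls_a) & set(urls_b))


def are_semantically_associated(urls_a, urls_b, min_shared=3):
    """True when the two result lists share at least `min_shared` URLs.

    `min_shared` is a hypothetical threshold chosen for illustration.
    """
    return shared_url_count(urls_a, urls_b) >= min_shared
```

A relative measure (for example, overlap divided by the size of the shorter list) would make the test less sensitive to how many results each search returns; the text does not say which form is intended.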
  • the software residing on the personal navigation system server performs several primary functions: 1) sign-up, 2) communications, 3) definition of affinity groups, 4) processing of banner advertisements, 5) messaging (i.e., anonymous e-mail), 6) utility routines, and 7) reports. Each of these functions may be implemented as a separate software module, and the modules interact through either internal message passing or event queuing. Figure 5 provides a pictorial view of how these functions interact.
  • the sign-up module 500 enables a user to subscribe to the personal navigation system service.
  • the sign-up module is invoked by connecting to the personal navigation system Web server using an Internet browser (e.g., Internet Explorer® or Netscape Navigator®). After the "Sign-Up" option is selected, the personal navigation system "Terms and Conditions" are displayed within the browser window. The subscriber indicates acceptance or non-acceptance of the Terms and Conditions by pointing and clicking on either an "Accept" or a "Decline" button.
  • Selecting the "Decline" button displays an appropriate completion message and exits the sign-up module.
  • Selecting the "Accept" button triggers several activities that prepare for initiation of the personal navigation system on the user's computer.
  • a formatted data entry panel is displayed within the browser, into which the subscriber enters two pieces of information: the textual name of each user within the household who will be using the computer, and demographic information describing each person. The subscriber indicates completion of entry by, for example, pointing and clicking on a "Next" button. All textual names and demographic information are stored in a data repository 502 on the personal navigation system server. The list of textual user names is also stored on the client computer, which makes it easy to select or change the current user of the system without connecting to the personal navigation system Web server. As the number of users grows, the amount of data in the data repository 502 will also grow substantially.
  • the design and structure of the data repository 502 are preferably upwardly scalable, so that data can continue to be stored in a timely manner as the repository grows.
  • Each user's computer is assigned an identifier that uniquely and anonymously identifies the computer.
  • the identifier is preferably a concatenation of several pieces of information including one of the textual names, a date/time stamp, with time expressed to the millisecond, and a one-up numerical suffix.
  • the one-up numerical suffix is established and controlled by the personal navigation system server, so that each number is issued only once. This guarantees a unique identifier even in the unlikely event that multiple identifiers are created with the same textual name during the same millisecond.
  • This unique identifier is appended to the textual name of each individual user of the computer, thus uniquely identifying each user.
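The identifier scheme (textual name, millisecond timestamp, server-issued one-up suffix, then per-user prefixing) can be sketched as follows. The class name `IdentifierServer`, the `-` and `.` separators, and the timestamp layout are assumptions made for illustration; the text specifies only which pieces are concatenated.

```python
import itertools
import threading
from datetime import datetime, timezone


class IdentifierServer:
    """Hypothetical server-side issuer of one-up suffixes; each value is handed out once."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # serialize concurrent sign-ups

    def next_suffix(self):
        with self._lock:
            return next(self._counter)


def make_computer_id(textual_name, server, now=None):
    """Concatenate a textual name, a millisecond timestamp, and a one-up suffix."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    return f"{textual_name}-{stamp}-{server.next_suffix()}"


def make_user_id(user_name, computer_id):
    """Append the computer identifier to an individual user's textual name."""
    return f"{user_name}.{computer_id}"
```

Two identifiers created with the same name in the same millisecond still differ because the suffix comes from a single server-side counter.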
  • the communications module is responsible for sending and receiving all information from and to the Web server.
  • Three types of information are transmitted: 1) Web site recommendations derived from an affinity group, 2) banner advertisements, and 3) responses to questions posed to affinity group members.
  • Three types of information are received: 1) HTTP commands and atomic phrases, 2) requests for banner advertisements, and 3) questions to be posed to affinity group members.
  • communications involving transmission of information are initiated by the server and are accomplished via a point-to-point (PTP) connection. All communications involving receipt of information are also accomplished via a PTP connection, initiated, however, by the client computer. All PTP connections, regardless of the originator, are established and remain open only for the duration of the transmission.
  • the communications module operates based upon receive and transmit queues 504.
  • the type of information in the queues is identifiable, preferably by the file type. All received information is queued for processing by the other modules.
  • the communications module periodically examines its "transmit queue" and, if a file or files are present, transmits the file(s) to the appropriate client computers.
  • the communications module has responsibility for re-try of transmissions when communications fail or transfers are interrupted. When the transfer is complete, the transferred files are deleted from the server.
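A periodic pass over a directory-based transmit queue, with retry and deletion on success, might look like the sketch below. The `send` callback is hypothetical (it stands in for opening a PTP connection and transferring one file), and `max_retries` is an assumed parameter; the text does not specify a retry count.

```python
from pathlib import Path


def process_transmit_queue(queue_dir, send, max_retries=3):
    """Send every file in `queue_dir`, deleting each file once it has been sent.

    `send(path)` is a hypothetical transport callback returning True on success;
    returning False or raising OSError counts as a failed or interrupted transfer.
    Returns (sent_names, failed_names).
    """
    sent, failed = [], []
    for path in sorted(Path(queue_dir).iterdir()):
        if not path.is_file():
            continue
        for _attempt in range(max_retries):
            try:
                ok = send(path)
            except OSError:
                ok = False
            if ok:
                path.unlink()  # transfer complete: remove the file from the queue
                sent.append(path.name)
                break
        else:
            failed.append(path.name)  # left in place for the next periodic pass
    return sent, failed
```

Leaving failed files in the queue means a later periodic pass retries them automatically, which matches the text's point that a file is deleted only after its transfer completes.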
  • the definition of an affinity group is the hub of the personal navigation system.
  • the define affinity group module 506 examines all atomic phrases and, through a series of processing steps, enables comparison of the Web usage of different users and computation of a measure of similarity between users. Users determined to be similar are thus members of an affinity group.
  • Affinity groups are ad hoc dynamic associations; they vary with time and the nature of the comparison being performed. Affinity group definition is accomplished by the creation of a histogram that captures the universe of atomic phrases. The histogram is itself a list of all the atomic phrases captured from all the Web pages that have been visited by all users of the personal navigation system.
  • each atomic phrase is compared to the atomic phrases residing within the existing universe. If an atomic phrase is not present in the histogram, it is added to the universe. If the atomic phrase is already present, no action will be taken. Only one occurrence of each atomic phrase is present in the histogram.
  • the universal histogram can be described as a sequential list of every atomic phrase encountered to date. Note that an atomic phrase is composed of any combination of the characters within the ASCII character set. Thus, the universal histogram is completely insensitive to language and context. A histogram representing a small world might be similar to that shown in Figure 6.
  • each user's histogram is a concatenation of their demographic information and their list of atomic phrases. All demographic information is represented in the histogram as numeric values, which requires the set of demographic information to be closed. Demographic information such as gender, age, marital status, and state of residence is captured and consistently described by numeric values, which are simply transcribed into text when displayed. Other information, such as special interests and hobbies, is selected from a closed list, which also allows this information to be represented numerically.
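The universal histogram behaves like an append-only list in which each distinct atomic phrase appears once and keeps a fixed position. In the sketch below, the auxiliary dictionary used for fast membership checks is an implementation assumption, not something the text describes.

```python
class UniversalHistogram:
    """Append-only list of every distinct atomic phrase encountered so far."""

    def __init__(self):
        self.phrases = []  # a phrase's position in this list is its index
        self._index = {}   # phrase -> position, for constant-time lookups

    def add(self, phrase):
        """Add `phrase` if it is new; return its position either way."""
        pos = self._index.get(phrase)
        if pos is None:
            pos = len(self.phrases)
            self.phrases.append(phrase)
            self._index[phrase] = pos
        return pos
```

Because positions never change, each user's histogram can store a position as its "pointer" into the universal list.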
  • the list of atomic phrases is actually a list of pointers to the locations of the atomic phrases within the universal histogram. Again, there is only a single occurrence of each unique atomic phrase in the user's histogram. Each pointer is accompanied by other relevant information, including the following: 1) a date/time stamp reflecting when the atomic phrase was encountered; 2) a count reflecting the number of times the atomic phrase has been encountered; and 3) the URL of the Web page from which the atomic phrase was extracted.
  • the atomic phrases with the highest weighted values (a weighted combination of count and recency) indicate the topics in which the user has the greatest interest.
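A user histogram entry carries a pointer into the universal histogram plus a timestamp, a count, and a URL. In this sketch, keeping only the most recent timestamp and URL when a phrase is seen again is an assumption; the text does not say whether earlier values are retained. The demographic codes in the test are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PhraseEntry:
    phrase_index: int   # pointer into the universal histogram
    last_seen: datetime # when the phrase was most recently encountered
    count: int = 0      # number of times the phrase has been encountered
    url: str = ""       # page the phrase was (most recently) extracted from


@dataclass
class UserHistogram:
    demographics: list  # closed-set attributes, already numerically coded
    entries: dict = field(default_factory=dict)  # phrase_index -> PhraseEntry

    def record(self, phrase_index, url, when):
        """Register one encounter; each phrase occurs at most once per user."""
        entry = self.entries.get(phrase_index)
        if entry is None:
            entry = PhraseEntry(phrase_index, when)
            self.entries[phrase_index] = entry
        entry.count += 1
        entry.last_seen = when
        entry.url = url
        return entry
```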
  • Figure 7 depicts a sample histogram for an individual user.
  • Each histogram can be viewed as a waveform with the amplitude of the wave at any point in the histogram being the numerically coded demographic information or the weighted combination of count and recency.
  • These waveforms provide the mathematical basis for formation of affinity groups using a neural technology called autoclustering.
  • By proper selection of the weight assigned to the recency of encounter relative to the current date, a natural decay of the significance of each atomic phrase occurs. This natural decay tracks an individual user's change of interests over time and thus the change of affinity groups as personal interests change.
  • demographic information should remain relatively static while the atomic phrases and their frequencies representing Web usage are very dynamic.
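One way to produce the "weighted combination of count and recency" and the resulting waveform is sketched below. The exponential half-life decay and the `half_life_days` value are assumptions chosen for illustration; the text says only that the recency weight should produce a natural decay.

```python
from datetime import datetime


def phrase_weight(count, last_seen, now, half_life_days=30.0):
    """Count scaled by an exponential recency decay (the decay form is an assumption)."""
    age_days = (now - last_seen).total_seconds() / 86400.0
    return count * 0.5 ** (age_days / half_life_days)


def histogram_vector(demographics, weights, universe_size):
    """Waveform: coded demographics followed by one amplitude per universal-histogram slot.

    `weights` maps a universal-histogram index to that phrase's weighted value;
    phrases the user has never encountered get amplitude 0.0.
    """
    amplitudes = [0.0] * universe_size
    for index, w in weights.items():
        amplitudes[index] = w
    return list(demographics) + amplitudes
```

With a 30-day half-life, a phrase counted 8 times but last seen 30 days ago weighs the same as one counted 4 times today; a shorter half-life lets affinity groups follow changing interests more quickly.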
  • the content of the set of atomic phrases extracted from a Web page constitutes the entire "meaning" of the Web page. No external interpretation of each page is required.
  • Atomic phrases must be accurately and sufficiently recognized to successfully capture the "meaning" of the Web page.
  • meaning is extracted from a Web page, or other information source, using "cognitive engineering."
  • Cognitive engineering is the technology associated with the design and implementation of computerized applications that emulate intelligent human behaviors, such as decision-making, plan development, and problem solving.
  • cognitive engineering technology applies cognitive engineering tools according to a "Cognitive Engineering Methodology" (CEM).
  • CEM is an objective, systematic methodology for developing systems having embedded intelligence (e.g., neural nets, expert systems, and regression models). This methodology consists of seven steps: 1) problem evaluation and analysis; 2) feature extraction and enhancement; 3) sampling; 4) data analysis and modification; 5) model design and development; 6) model evaluation; and 7) system implementation, testing, and validation. Steps four through six are repeated as required.
  • a "spiral development methodology” based on a rapid prototyping approach is often appropriate.
  • the purpose of CEM is to provide a consistent framework within which the two components of data mining can be carried out: "knowledge discovery" and "predictive modeling." Knowledge discovery occurs in steps 1 and 2 above. Predictive modeling occurs in steps 3 through 7 above.
  • Knowledge discovery is the isolation and characterization of actionable information from data. It consists of problem evaluation and analysis, feature selection, and feature enhancement.
  • the first step is problem evaluation and analysis. Problem evaluation begins with interviews of domain experts and the collection of raw domain data from accessible repositories.
  • a problem description is written by the system developer, and OLAP tools are used to assess the available data sources for quality and information content.
  • Some techniques used to support the data analysis phase are, first, conventional and statistical techniques such as population modeling by statistical moments (e.g., means and standard deviations), correlation (e.g., testing data for dependence or independence), chi-square analysis (e.g., testing hypotheses about the statistical character of the data), simple visualization (e.g., histograms, scatterplots, graphs, and charts), and time-series analysis (e.g., control charts, linear predictive modeling).
  • a second group of techniques are online analytical processing (OLAP) techniques such as stratification and segmentation ("slicing and dicing" data), roll-ups (summarizing data in
  • the second step in the knowledge discovery process is feature extraction and enhancement. Feature extraction is the process whereby data are characterized for processing by pattern recognition and exploitation tools and techniques.
  • Some techniques used during the feature extraction process are elementary conventional methods such as counts, ratios, differences, and quotients; integral transforms; Fourier transforms (e.g., windowed fast Fourier transforms); wavelets (multi-resolution decomposition); and general kernel filters (spectral, spatial, and temporal).
  • Other techniques can be classified as quantization and coding, such as: MAX quantization, histogram equalization, and view-through-feature coding.
  • More techniques include semantic feature extraction such as tokenization (parse tree) and bag-of-words, as well as regression features such as model coefficients (e.g., slope of least-squares line).
  • Feature enhancement is the process of transforming and coding data in such a way as to make the information it contains more accessible for automatic exploitation by predictive models.
  • Some techniques used for feature enhancement are: Bayesian analysis (How well will a linear classifier do on this problem?); feature registration and normalization (e.g., z-scoring); excision, replication, and synthesis (e.g., class collisions and population imbalance); feature correlation and salience (Do the features provide independent information?); principal component analysis (PCA) (e.g., Karhunen-Loeve); independent component analysis (ICA); filling in missing data fields (e.g., intra-vector regression); and rule induction (RML, LCR, and BAM (e.g., BOLTZ routines)).
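Z-scoring, one of the normalization techniques named above, rescales a feature column to zero mean and unit (population) standard deviation, so that features with large numeric ranges, such as raw phrase counts, do not dominate distance computations during clustering. Below is a minimal standard-library sketch; returning zeros for a constant column is an assumed convention.

```python
from statistics import mean, pstdev


def z_score(column):
    """Rescale a feature column to zero mean and unit population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature: carries no information
    return [(x - mu) / sigma for x in column]
```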
  • Predictive modeling is the automated exploitation of actionable information based upon the results of the knowledge discovery phase. It consists of sampling, data analysis and modification, model design and development, model evaluation, and system implementation, testing, and validation.
  • Sampling, which is the third step of the CEM, is the first step of the predictive modeling phase.
  • Four statistically representative random samples are created from the data set conditioned in steps 1 and 2. These random samples are the calibration set, the training set, the validation set, and the hold-back set.
  • the calibration set is used to estimate statistical and information-theoretic parameters for the problem (e.g., ranges, minimum values, maximum values, variances, scores, counts, and entropies).
  • the training set is used to construct regression models from adaptive algorithms (e.g., neural networks).
  • the validation set is used to perform "blind tests" to determine the ability of the predictive model to generalize.
  • the hold-back set is retained in a blind store, and used for final validation of the completed model.
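The four development sets can be produced by shuffling the conditioned data once and cutting it into disjoint pieces. This is a minimal sketch: the default fractions are hypothetical, and a plain random partition is used, whereas "statistically representative" samples may call for stratified sampling in practice.

```python
import random


def four_way_split(records, fractions=(0.1, 0.6, 0.2, 0.1), seed=None):
    """Shuffle `records` and split them into calibration, training, validation,
    and hold-back sets. The default fractions are illustrative only."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    cut1 = round(fractions[0] * n)
    cut2 = cut1 + round(fractions[1] * n)
    cut3 = cut2 + round(fractions[2] * n)
    return {
        "calibration": shuffled[:cut1],
        "training": shuffled[cut1:cut2],
        "validation": shuffled[cut2:cut3],
        "hold_back": shuffled[cut3:],  # kept in a blind store until final validation
    }
```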
  • the fourth step is data analysis and modification. Once features have been extracted and the development sets are created by sampling, another round of analysis and data transformation is conducted. The same techniques and tools used in step 1 above are applied to the refined data sets to create enhanced sets that will serve as the basis for the final predictive model.
  • the fifth step is model design and development.
  • a predictive modeling paradigm (e.g., neural network, expert system, or black-box regression) is selected.
  • This selection is based upon the analysis conducted in earlier stages of the process, and is a matter of engineering judgment.
  • Generally, complex problems in well-understood domains are addressed using conventional techniques (e.g., logistic regression and expert systems), while hard problems in poorly understood domains are addressed using advanced adaptive algorithms (e.g., neural networks).
  • Numerous tools for the automatic construction of predictive models may be used, including both model-based and non-model-based techniques.
  • Model-based techniques can include the following: hard analytic models (e.g., ad hoc mathematical) and knowledge-based expert systems (e.g., forward and backward chaining).
  • Non-model-based techniques can include the following: neural networks (e.g., backpropagation and reinforcement learning); multi-layer perceptrons; Hopfield nets; adaptive resonance theory (ART); restricted coulomb energy (RCE); radial basis functions (RBF, RFN); adaptive logic networks (APN); hybrid systems (e.g., "bagging"); and CART-like systems (e.g., INDUCE and SPLITS).
  • the sixth step involves model evaluation.
  • To evaluate a predictive model, it is applied to the validation set (a "blind test") and scored for performance (typically for classification accuracy or optimization of some objective function). If the results are "good" (a subjective judgment), a further validation may be performed by combining the calibration and validation sets and using n-fold cross validation.
  • Some other techniques used for model validation are sensitivity analysis and application to "use cases.”
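N-fold cross validation splits the combined data into n folds, and each fold serves once as the test set while the model is fitted on the remaining folds. The sketch below uses contiguous folds, which assumes the records were already shuffled during sampling; `fit` and `score` are hypothetical callbacks standing in for whatever model paradigm was chosen.

```python
def n_fold_indices(n_items, n_folds):
    """Yield (train_indices, test_indices); every item is tested exactly once."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    indices = list(range(n_items))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_items // n_folds + (1 if i < n_items % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(records, n_folds, fit, score):
    """Average `score(model, test_records)` over folds; `fit`/`score` are caller-supplied."""
    scores = []
    for train_idx, test_idx in n_fold_indices(len(records), n_folds):
        model = fit([records[i] for i in train_idx])
        scores.append(score(model, [records[i] for i in test_idx]))
    return sum(scores) / len(scores)
```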
  • the seventh step in the CEM process involves model implementation, testing, and validation.
  • Once a deliverable level of performance is achieved, a delivery version is constructed. This version is tested on the hold-back set.
  • the CEM process is used to capture the "meaning" of the Web page.
  • the CEM tools in the CSI Advisor Toolkit are used to extract the pertinent content from the information source, a Web page in the preferred embodiment, and create the atomic phrases.
  • Web pages tend to change with relative frequency.
  • the atomic phrases that constitute the meaning of a particular Web page are dynamic and will change over time.
  • a natural decay of the importance of atomic phrases that no longer appear in a particular Web page will occur due to the measure of importance assigned to the recency of encounter. If the Web page changes so that the atomic phrases in the new set are different from those in the old, the site is not the "same site" for the purposes of the personal navigation system, even though the URL is the same.
  • the counts and recency of the new atomic phrases will indicate importance and form the basis for the affinity group.
  • An affinity group is a nonpersistent group of users whose histograms are similar at a particular moment in time. Affinity groups will continuously change and any individual user will likely be a member of multiple affinity groups, each of which represents a different collection of interests. The obvious question is how similar is "similar enough" for multiple users to reside in the same affinity group.
  • the fundamental technique used to determine similarities and create affinity groups is called autoclustering.
  • an autoclustering technique called Weighted Pair-Group Centroid is used by the personal navigation system to determine the similarity of users for placement in affinity groups. A detailed description of this and other autoclustering techniques can be found at http://www.statsoftinc.com/textbook.stcluan.html, which is hereby incorporated by reference as though fully set forth herein.
  • the Weighted Pair-Group Centroid clustering algorithm assumes each entry in a user's histogram is a point in a feature space where the numeric value in each histogram is a coordinate in the feature space. In this way, using N values from a user's histogram represents that user as a point in an N-dimensional feature space. This allows the mathematics of Euclidean N-space to be applied to the analysis of a user's demographics and histories of visitations to information sources, e.g., Web pages.
  • Affinity groups are created by matching users in the N-dimensional feature space through clustering. This is a four-step process. First, histogram features are selected for use in creating an affinity group for a user or class of users. This is done by designating interests or demographic attributes that are of interest for the match. The default for user affinity grouping is to select the J attributes of the user having the highest histogram counts. Second, the number of users, M, needed to form the desired affinity group is determined. This will usually default to a statistically relevant value (e.g., 0.01 % of the total population), or a similarity threshold (e.g., a maximum distance beyond which individuals cannot be regarded as similar for the purposes of the clustering).
  • Third, each of the N-dimensional points (histogram values) in the entire population to be searched is assigned an initial weight of one.
  • Fourth, the two N-dimensional points (histogram values) having the smallest distance between them are found. These two points are replaced by a single point located at their weighted mean, having a weight equal to the sum of the weights of the original two points. This process continues until exactly M points remain. More generally, any computable function can be used as the "similarity measure"; if Euclidean distance is used, the result is conventional nearness-based autoclustering.
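The third and fourth steps can be sketched as a naive agglomerative loop that follows the merge rule stated above: replace the closest pair by their weighted mean, whose weight is the sum of the pair's weights. This is an illustrative O(n³) sketch using Euclidean distance (`math.dist`, Python 3.8+), not a scalable implementation, and it makes no claim about how the referenced textbook variant handles ties or weighting.

```python
import math


def weighted_centroid_merge(points, m):
    """Merge the two closest points into their weighted mean until `m` remain.

    Returns (centroid, weight, members) tuples; `members` lists the indices of
    the original points absorbed into each centroid.
    """
    if not 1 <= m <= len(points):
        raise ValueError("m must be between 1 and len(points)")
    clusters = [(list(p), 1, [i]) for i, p in enumerate(points)]  # initial weight of one
    while len(clusters) > m:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a][0], clusters[b][0])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        (pa, wa, ma), (pb, wb, mb) = clusters[a], clusters[b]
        w = wa + wb
        merged = ([(wa * x + wb * y) / w for x, y in zip(pa, pb)], w, ma + mb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters
```

Tracking `members` is what lets the surviving centroids be mapped back to actual users when forming the affinity group.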
  • the personal navigation system architecture allows the fourth step above to be replaced with the following, more general, affinity clustering step.
  • a computable function (the objective function) is applied to the original user for which the affinity group is being formed.
  • call the value so obtained V.
  • This same computable function is then applied to each of the M N-dimensional points (histogram values); call these values $W_i$.
  • the list $|W_i - V|$ is sorted in ascending order, and the first (i.e., smallest) K elements are selected. These are the K points having function values most like the function value of the original user, and their members are selected as the affinity group.
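The objective-function variant reduces to computing $V$ for the original user, computing $W_i$ for each candidate, and keeping the K smallest $|W_i - V|$. In the sketch below, the objective function is whatever the caller passes in; using `sum` in the example is purely hypothetical.

```python
def select_affinity_group(user_point, candidate_points, objective, k):
    """Return indices of the k candidates whose objective value is closest to the user's."""
    v = objective(user_point)  # V
    diffs = sorted((abs(objective(p) - v), i) for i, p in enumerate(candidate_points))
    return [i for _, i in diffs[:k]]  # ties broken by candidate index
```

For example, `select_affinity_group((1, 1), [(0, 0), (1, 2), (5, 5), (2, 0)], sum, 2)` keeps the candidates whose coordinate sums (2 and 3) are nearest to the user's sum of 2.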
  • the alternate embodiment of the fourth step allows complete generality in the formation of affinity groups to support clustering for "arbitrary groups."
  • Arbitrary groups are affinity groups that are, for example, most alike in interest, most alike in web access schedules (apart from interest), most demographically similar, most similar in likelihood of some future behavior (e.g., purchasing, fraud, default), or most similar in an abstract sense.
  • the Web sites visited by the K closest neighbors in multi-dimensional space (i.e., the members of the affinity group) are compared with the Web sites visited by the user.
  • Web sites, or other information sources, not visited by the original defining user are garnered from the K affinity members, and prepared for transmission to the user's computer.
  • Recommendations are preferably made based upon the popularity of the Web site, with more popular Web sites being recommended first.
  • Popularity is defined as a combination of the number of times the Web site is accessed, the recency of the access, and the dwell time at the site.
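Ranking the sites garnered from affinity members by popularity can be sketched as follows. The text names the three ingredients (access count, recency, dwell time) but not how they combine, so the per-visit additive score, the half-life decay, and all weights below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    age_days: float       # time elapsed since this visit
    dwell_seconds: float  # time spent at the site


def recommend(user_visited, member_visits, w_count=1.0, w_recency=1.0,
              w_dwell=0.01, half_life_days=30.0):
    """Rank sites visited by affinity members but not by the user, most popular first."""
    scores = {}
    for v in member_visits:
        if v.url in user_visited:
            continue  # only recommend sites the user has not seen
        recency = 0.5 ** (v.age_days / half_life_days)
        contribution = w_count + w_recency * recency + w_dwell * v.dwell_seconds
        scores[v.url] = scores.get(v.url, 0.0) + contribution  # each access adds to the count term
    return sorted(scores, key=lambda u: (-scores[u], u))
```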
  • the atomic phrases within the question are concatenated with the user's demographic information forming a mini-histogram representing only the very limited universe of the question.
  • the mini-histogram is autoclustered as above to identify histograms (and so, other users) that are similar to the mini-histogram.
  • the web sites visited by these other users may be recommended to the poser of the question.
  • Banner advertisements are associated with specific affinity groups, and therefore, groups of atomic phrases. These associations are established manually by members of the personal navigation system staff by determining atomic phrases that represent the "meaning" of the banner advertisement. An individual's histogram is examined to determine regions of interest. Banner advertisements associated with regions of high weight are selected for transmission to the client computer. This function is implemented in the personal navigation system by a banner advertisement module 508.
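Selecting banners for a user amounts to matching each banner's manually assigned atomic phrases against the high-weight regions of the user's histogram. In this sketch, treating "regions of high weight" as the top-N phrases by weight, and the `top_n_phrases` and `max_banners` parameters, are assumptions; phrases are keyed by text rather than by universal-histogram index for readability.

```python
def select_banners(user_weights, banner_phrases, top_n_phrases=5, max_banners=2):
    """Pick banners whose assigned phrases overlap the user's highest-weight phrases.

    `user_weights` maps phrase -> weighted value; `banner_phrases` maps banner -> phrases.
    """
    ranked = sorted(user_weights, key=lambda p: (-user_weights[p], p))
    top = set(ranked[:top_n_phrases])
    scored = []
    for banner, phrases in banner_phrases.items():
        score = sum(user_weights[p] for p in phrases if p in top)
        if score > 0:
            scored.append((-score, banner))
    return [banner for _, banner in sorted(scored)[:max_banners]]
```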
  • the process question module 510 implements the dialogue box features of the personal navigation system, preferably through traditional electronic mail capabilities.
  • the atomic phrases within the question are concatenated with the submitter's demographic information forming a mini-histogram representing only the very limited universe of the question.
  • the mini-histogram is autoclustered and compared to the individual histograms that are nearby in multi-dimensional space. Histograms that are similar to the mini-histogram, i.e., histograms that contain high weights for the same, limited set of atomic phrases, are deemed eligible for receipt of the question.
  • the Web sites visited by these individuals are preferably recommended to the poser of the question.
  • An electronic mail message is programmatically formed and sent to each recipient.
  • incoming electronic mail responses are simply forwarded to the submitter of the question. If the submitter is not on-line, the server stores any responses until the submitter is on-line and the message can be sent. Because the personal navigation system is preferably required to retain the anonymity of users, the process question module 510 translates user identities into the appropriate e-mail addresses.
  • the personal navigation system also preferably allows question submitters and responders to initiate a two-way dialog thread through which they can communicate directly. This is accomplished using Instant Messaging (IM) technology, originated by America
  • IM has recently become an industry standard means for establishing point-to-point communications between two on-line users.
  • IM technology is available through several commercial sources.
  • Several utility routines provided by the utilities module 512 are preferred to effectively support the personal navigation system. These include but are not limited to the following: displaying a list of users and demographic information; displaying affinity groups; and displaying banner advertisements and attaching banner advertisements to an affinity group.
  • the report module 514 provides the ability to mine consumers' Internet usage behavior to create market intelligence and other reports, as deemed appropriate.
  • the primary tool used to identify usage trends is a collection of advanced software routines that perform pattern recognition, data analysis, clustering, classification, inductive learning, and other intelligent information processing tasks. In the most preferred embodiment, these routines collectively comprise CSI's Advisor Toolkit. These routines enable exploitation of the repository of Web usage information collected by the personal navigation system. Combined into a hybrid system, these powerful routines perform detailed and in-depth analysis of Web usage data for knowledge discovery and identification of hidden patterns and correlations within the usage patterns. From these analyses, market intelligence reports, customer behavior reports, and predictions of behavior based on the historical behavior and profiles of individual users, and communities of users, can be produced.
  • the software downloaded to the user's computer performs a variety of functional tasks. These may be combined in a single software module or may be separated into individual modules. In either event, these modules interact through either internal message passing or event queuing. Figure 8 provides a pictorial view of how these functions interact.
  • the capture module 800 is responsible for interception, parsing, and storage of Internet usage.
  • the capture module executes as a plug-in to the computer's default browser.
  • the capture module activates based upon two specific events posted by the browser: 1) the sending of an HTTP command, and 2) the receipt of an HTML tag.
  • the HTTP command is captured.
  • the HTML tag is captured and parsed for atomic phrases.
  • the set of captured atomic phrases constitutes the entire "meaning" of the Web page.
  • the atomic phrases are captured primarily from the HTML header, anchor, center, title, and header tags, and potentially from other tagged fields such as a table.
  • These tagged fields are normally rich in meaning. Nontext objects (such as images and sounds) will not be captured.
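Collecting text from the meaning-rich tagged fields can be sketched with Python's standard `html.parser` module, as a stand-in for the RML-based capture described below. The tag set is an interpretation of "header, anchor, center, title, header tags, and ... a table" (h1 through h6 for header tags, `th`/`td` for tables) and is an assumption. This only gathers candidate text; splitting it into atomic phrases is a separate step.

```python
from html.parser import HTMLParser

# Assumed interpretation of the tagged fields named in the text.
CAPTURED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a", "center", "th", "td"}


class TaggedTextExtractor(HTMLParser):
    """Collect (tag, text) pairs from captured tags; nontext objects such as <img> are skipped."""

    def __init__(self):
        super().__init__()
        self._open = []       # stack of currently open captured tags
        self.fragments = []   # (tag, normalized text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in CAPTURED_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            while self._open:  # pop back to the matching tag, tolerating sloppy nesting
                if self._open.pop() == tag:
                    break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self._open and text:
            self.fragments.append((self._open[-1], text))


def extract_tagged_text(html):
    parser = TaggedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments
```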
  • parsing of the HTML tags and extraction of atomic phrases are accomplished via a combination of the RML tool and well-established techniques within the natural language parsing domain.
  • RML performs the basic parsing task of separating the narrative text into sentences, phrases, and individual words. Each word then goes through several distinct processing steps. The residual after the processing steps represents the atomic phrases, and thus the meaning of the Web page.
  • Atomic phrases with a higher weight carry more of the meaning of the page. If weighting of words is desired, it is preferable to count the frequency at which the atomic phrase occurs, both within the individual Web page and throughout the universe of atomic phrases. Within an individual page, a higher frequency of occurrence indicates a higher level of importance. With respect to the universe of atomic phrases, the inverse is true: a lower frequency of occurrence within the universe indicates a higher level of importance with regard to an individual Web page.
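The two frequency rules (frequent on the page means more important, frequent across the universe means less important) match the familiar TF-IDF pattern. The sketch below swaps in a smoothed TF-IDF as one way to realize them. Measuring universe frequency as the number of pages containing each phrase, and the particular smoothing formula, are assumptions rather than the text's stated method.

```python
import math
from collections import Counter


def phrase_weights(page_phrases, pages_containing, total_pages):
    """TF-IDF-style weights for the atomic phrases of one page.

    `pages_containing` maps phrase -> number of pages in the universe containing it.
    Smoothed IDF, ln((1 + N) / (1 + df)) + 1, stays positive even for very common phrases.
    """
    tf = Counter(page_phrases)  # frequency within this page
    weights = {}
    for phrase, n in tf.items():
        df = pages_containing.get(phrase, 0)
        idf = math.log((1 + total_pages) / (1 + df)) + 1
        weights[phrase] = n * idf
    return weights
```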
  • Both the HTTP commands and the atomic phrases are saved in local storage 802 as disk files on the user's computer, in a specifically named "transmit" subdirectory that acts as a queue for all data to be sent to the personal navigation system Web server. All saved information is date- and time-stamped and tagged with the user identifier.
  • the communications module 804 is responsible for sending and receiving all information to and from the user's computer. Three types of information are transmitted: 1) HTTP commands and atomic phrases; 2) requests for banner advertisements; and 3) questions to be posed to affinity group members.
  • the communications module operates based upon receive and transmit queues. The type of information in the queues is identifiable, preferably by the file type. All received information is queued for processing by the other modules.
  • the communications module periodically examines its transmit queue and, if one or more files are present, transmits them to the personal navigation system server.
  • the communications module is responsible for retrying transmissions when communications fail or transfers are interrupted.
  • the transferred files are deleted from the user's computer.
  • Incoming files are queued in a manner that enables recognition by the type of data they represent (i.e., affinity information or banner advertisements) and are subsequently pre-processed and displayed by the personal navigation system graphical user interface.
  • the graphical user interface (GUI) module 806 provides several functions including displaying Web site recommendations based on an affinity group, acting as a receiver for questions posted to affinity group members, displaying responses to posed questions, selecting the active user, changing user and/or demographic information, and displaying banner advertisements.
  • GUI: graphical user interface
  • the personal navigation system GUI executes as its own window and is completely independent from Web browser operation on the user's computer.
  • the GUI module recognizes that Web site recommendations have been received.
  • the GUI displays the recommendations and initiates the flashing of a personal navigation system icon indicating new information has been received.
  • the GUI also provides a text window, the dialogue box, into which short, concise questions can be entered. Questions are then queued for transmission to the personal navigation system server.
  • the GUI provides an electronic mail capability that queues received messages and allows users to view them at their leisure. Most commonly recognized email functions are implemented.
  • the "Instant Message" capability mentioned above is implemented, allowing two personal navigation system users to establish a direct point-to-point link between their computers and thus to communicate directly with each other in a completely anonymous manner.
  • the GUI module recognizes that new banner advertisements have been received.
  • the GUI displays the banner advertisements in a manner viewable by the user.
  • the GUI also provides a means for selecting the current user of the computer.
  • a list of individual users resides on the client computer and is displayed in a manner that provides for easy selection of the desired user.
  • the GUI further provides a means for adding/changing individual users and demographic information, as well as a means for deleting an individual user.
  • a button on the GUI invokes the default Web browser, connects to the personal navigation system Web site, extracts the appropriate demographic information from the database, and displays the information in a formatted manner enabling changes to the information.
  • This screen is preferably the same screen that is used for initial entry of demographic information at the time of sign-up.
  • An “OK” button designates completion of the change and the new information is saved in the data repository.
  • a “Cancel” button aborts the update and no changes are made.
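The capture step described in the bullets above (keep only the text found inside the header, anchor, center, title, header, and table tags, ignore nontext objects, then break the text into phrases) can be sketched with Python's standard `html.parser` module. This is a minimal illustration under stated assumptions, not the patent's RML-based implementation: the tag set `CAPTURE_TAGS` (mapping "header tags" to `h1` through `h6` and tables to `th`/`td` cells), the punctuation-based phrase splitting, and the function name `capture_phrases` are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed mapping of the tags named in the description (title, anchor, center,
# header tags, tables) onto concrete HTML elements; hypothetical for this sketch.
CAPTURE_TAGS = {"title", "a", "center", "h1", "h2", "h3", "h4", "h5", "h6", "th", "td"}


class PhraseCapture(HTMLParser):
    """Collects text that appears inside a capture tag.

    Everything else, including nontext objects such as <img> or <audio>,
    is ignored because it never produces captured text.
    """

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0              # > 0 while inside at least one capture tag
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in CAPTURE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in CAPTURE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def capture_phrases(html: str) -> list[str]:
    """Return lower-cased candidate phrases from the capture tags of one page."""
    parser = PhraseCapture()
    parser.feed(html)
    parser.close()
    phrases = []
    for chunk in parser.chunks:
        # Assumed splitting rule: break captured text on sentence/clause punctuation.
        for piece in re.split(r"[.,;:!?]+", chunk):
            piece = " ".join(piece.lower().split())  # normalize case and whitespace
            if piece:
                phrases.append(piece)
    return phrases
```

A page with the title "Fly Fishing Guide", an `h1` reading "Best rivers, top flies", an ignored paragraph, an image, and an anchor "Trout Streams" yields the phrases `fly fishing guide`, `best rivers`, `top flies`, and `trout streams`; the further per-word processing steps mentioned in the description are not modeled here.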
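The weighting rule stated in the bullets above (a phrase matters more the more often it appears on the page, and less the more often it appears across the universe of phrases) is the idea behind the standard TF-IDF measure. The sketch below uses TF-IDF as a stand-in, since the description gives no formula; counting "universe" frequency as the number of captured pages that contain the phrase, the natural-log form, and the function name `phrase_weights` are assumptions.

```python
import math
from collections import Counter


def phrase_weights(page_phrases: list[str], universe: list[list[str]]) -> dict[str, float]:
    """Weight each atomic phrase of one page with classic TF-IDF.

    page_phrases: atomic phrases captured from one Web page.
    universe:     one phrase list per captured page (assumed to include this page).

    weight = (occurrences on this page) * log(N / pages containing the phrase),
    so frequency on the page raises the weight and frequency across the
    universe lowers it.
    """
    n_pages = len(universe)
    doc_freq = Counter()
    for phrases in universe:
        doc_freq.update(set(phrases))      # count each phrase once per page
    term_freq = Counter(page_phrases)
    return {
        phrase: count * math.log(n_pages / doc_freq[phrase])
        for phrase, count in term_freq.items()
        if doc_freq[phrase] > 0            # skip phrases absent from the universe
    }
```

With this form, a phrase found on every captured page gets weight 0, while a phrase concentrated on a single page scales with its count there.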
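The local storage and transmit behavior described above (a "transmit" subdirectory acting as a queue, date/time and user-ID stamping, periodic transmission, retry on failure, and deletion after transfer) might be sketched as follows. The file naming scheme, the extension codes that mark each data type, the exponential back-off, and the injected `send` callable that stands in for the real network transfer are assumptions; `enqueue` and `flush_queue` are hypothetical names.

```python
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def enqueue(transmit_dir: Path, user_id: str, kind: str, payload: str) -> Path:
    """Save one item in the 'transmit' queue, stamped with date/time and user ID.

    `kind` becomes the file extension (hypothetical codes such as 'phr' for
    HTTP commands/atomic phrases, 'ban' for banner requests, 'qst' for
    questions), so each queued item's type is identifiable by file type.
    """
    transmit_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = transmit_dir / f"{user_id}_{stamp}.{kind}"
    path.write_text(f"{user_id}\t{stamp}\n{payload}", encoding="utf-8")
    return path


def flush_queue(transmit_dir: Path, send: Callable[[Path], None],
                retries: int = 3, delay: float = 1.0) -> list[Path]:
    """Transmit every queued file, retrying failed transfers.

    `send` must raise OSError (e.g. ConnectionError) on failure. A file is
    deleted only after a successful send; files that exhaust their retries
    stay queued for the next periodic pass.
    """
    sent = []
    for path in sorted(transmit_dir.glob("*.*")):
        for attempt in range(retries):
            try:
                send(path)
            except OSError:
                if attempt < retries - 1:
                    time.sleep(delay * 2 ** attempt)  # back off before retrying
                continue
            path.unlink()  # transferred: remove the local copy
            sent.append(path)
            break
    return sent
```

Calling `flush_queue` from a timer gives the periodic examination of the queue; anything that fails all retries is simply picked up again on the next pass.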
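On the receiving side, the description says only that incoming files are queued so that their data type (for example, affinity information or banner advertisements) can be recognized, preferably by file type, before the GUI processes them. A minimal sketch of that dispatch follows; the extension codes and the function name `route_incoming` are hypothetical.

```python
from pathlib import Path

# Hypothetical extension codes; the source only says the type of queued
# information is identifiable "preferably by the file type".
INCOMING_TYPES = {
    ".aff": "affinity",  # Web site recommendations from an affinity group
    ".ban": "banner",    # banner advertisements
    ".ans": "answer",    # responses to questions posed to affinity group members
}


def route_incoming(receive_dir: Path) -> dict[str, list[Path]]:
    """Group files in the receive queue by data type, sorted by file name.

    Files with unrecognized extensions are left in place for other handling.
    """
    routed: dict[str, list[Path]] = {kind: [] for kind in INCOMING_TYPES.values()}
    for path in sorted(receive_dir.iterdir()):
        kind = INCOMING_TYPES.get(path.suffix.lower())
        if kind is not None and path.is_file():
            routed[kind].append(path)
    return routed
```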

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a personalized navigation system that provides the user with recommendations of sources of interest. These recommendations are based on complex filtering algorithms that take into account the user's past navigation behavior (506) and the behavior of other users who have similar backgrounds and have shown interest in the same concepts. The personalized navigation system combines detailed demographic data and time-stamped Web page content to build a complex waveform (500) representing an individual's navigation habits. As the individual navigates the Web, this waveform is compared with other waveforms in order to provide site recommendations. The time stamping tracks the individual's interests as they change over time. A complementary element of the system is a dialogue box (504) that allows the user to pose a query on any subject of interest. The system instantly forms a special affinity group of other users and forwards the query to them anonymously, by email or instant messaging, to solicit their recommendations in a blind free-form dialogue or offline. In addition, tracking the navigation habits of the personalized navigation system's users allows the system to collect valuable marketing data.
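The abstract does not say how one user's navigation "waveform" is compared with another's. As one possible illustration, and explicitly a swapped-in technique rather than the patent's method, each user's profile can be treated as a mapping from atomic phrases to weights and compared by cosine similarity, with the most similar users forming a candidate affinity group. The names `cosine` and `affinity_group`, the profile format, and the group size `k` are assumptions; the time-stamping that lets interests drift over time is not modeled.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse phrase-weight profiles (0.0 if either is empty)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affinity_group(target: str, profiles: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    """Return the k users whose profiles are most similar to the target user's."""
    others = [user for user in profiles if user != target]
    others.sort(key=lambda user: cosine(profiles[target], profiles[user]), reverse=True)
    return others[:k]
```

For instance, a user weighted toward "trout" and "river" is grouped first with another trout-and-river user, then with a river-only user, ahead of a user interested only in "stocks".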
PCT/US2000/027419 1999-10-04 2000-10-04 Method of dynamically recommending web sites and answering user queries based upon affinity groups Ceased WO2001025947A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU78579/00A AU7857900A (en) 1999-10-04 2000-10-04 Method of dynamically recommending web sites and answering user queries based upon affinity groups

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15763299P 1999-10-04 1999-10-04
US60/157,632 1999-10-04

Publications (1)

Publication Number Publication Date
WO2001025947A1 (fr) 2001-04-12

Family

ID=22564589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/027419 Ceased WO2001025947A1 (fr) 1999-10-04 2000-10-04 Method of dynamically recommending web sites and answering user queries based upon affinity groups

Country Status (2)

Country Link
AU (1) AU7857900A (fr)
WO (1) WO2001025947A1 (fr)

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2371644A (en) * 2000-09-25 2002-07-31 Mythink Technology Co Ltd Real-time analysis of browsing over the internet
EP1449145A2 (fr) * 2001-11-26 2004-08-25 Kenan Tur Method and device for processing information in monitoring systems for the management of ethics, risks and/or values, and corresponding computer program product and storage medium
WO2004114161A1 (fr) * 2003-06-16 2004-12-29 Google Inc. System and method for providing search results based on preferred countries
EP1557773A3 (fr) * 2004-01-26 2005-11-09 Microsoft Corporation System and method for searching across diverse resources
US7159023B2 (en) 1999-12-23 2007-01-02 Alexa Internet Use of web usage trail data to identify relationships between browsable items
US7451129B2 (en) 2003-03-31 2008-11-11 Google Inc. System and method for providing preferred language ordering of search results
US7660815B1 (en) 2006-06-30 2010-02-09 Amazon Technologies, Inc. Method and system for occurrence frequency-based scaling of navigation path weights among online content sources
US7668821B1 (en) 2005-11-17 2010-02-23 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
US7673330B2 (en) 2006-01-05 2010-03-02 Microsoft Corporation Ad-hoc creation of group based on contextual information
US7685192B1 (en) 2006-06-30 2010-03-23 Amazon Technologies, Inc. Method and system for displaying interest space user communities
WO2010081238A1 (fr) * 2009-01-19 2010-07-22 Kibboko, Inc. Method and system for document classification
US7774335B1 (en) 2005-08-23 2010-08-10 Amazon Technologies, Inc. Method and system for determining interest levels of online content navigation paths
US7797421B1 (en) 2006-12-15 2010-09-14 Amazon Technologies, Inc. Method and system for determining and notifying users of undesirable network content
US7827055B1 (en) 2001-06-07 2010-11-02 Amazon.Com, Inc. Identifying and providing targeted content to users having common interests
US7949659B2 (en) 2007-06-29 2011-05-24 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US7966395B1 (en) 2005-08-23 2011-06-21 Amazon Technologies, Inc. System and method for indicating interest of online content
US7991650B2 (en) 2008-08-12 2011-08-02 Amazon Technologies, Inc. System for obtaining recommendations from multiple recommenders
US7991757B2 (en) 2008-08-12 2011-08-02 Amazon Technologies, Inc. System for obtaining recommendations from multiple recommenders
US8060463B1 (en) 2005-03-30 2011-11-15 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US8078615B2 (en) 2002-04-12 2011-12-13 Stumbleupon, Inc. Method and system for single-action personalized recommendation and display of internet content
US8122086B1 (en) 2005-11-01 2012-02-21 Amazon Technologies, Inc. Strategies for presenting a sequence of messages to a user
US20120047448A1 (en) * 2009-04-29 2012-02-23 Waldeck Technology, Llc System and method for social browsing using aggregated profiles
US8260787B2 (en) 2007-06-29 2012-09-04 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US8306972B2 (en) 2003-03-31 2012-11-06 Google Inc. Ordering of search results based on language and/or country of the search results
US8327266B2 (en) 2006-07-11 2012-12-04 Napo Enterprises, Llc Graphical user interface system for allowing management of a media item playlist based on a preference scoring system
US8386509B1 (en) 2006-06-30 2013-02-26 Amazon Technologies, Inc. Method and system for associating search keywords with interest spaces
US8396951B2 (en) 2007-12-20 2013-03-12 Napo Enterprises, Llc Method and system for populating a content repository for an internet radio service based on a recommendation network
US8429691B2 (en) * 2008-10-02 2013-04-23 Microsoft Corporation Computational recommendation engine
WO2013181434A1 (fr) * 2012-05-31 2013-12-05 Qualcomm Incorporated Context-based predictive localizations
US8620699B2 (en) 2006-08-08 2013-12-31 Napo Enterprises, Llc Heavy influencer media recommendations
CN103544313A (zh) * 2013-11-04 2014-01-29 北京国双科技有限公司 Data processing method and apparatus for web page recommendation
US8751507B2 (en) 2007-06-29 2014-06-10 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US8909667B2 (en) 2011-11-01 2014-12-09 Lemi Technology, Llc Systems, methods, and computer readable media for generating recommendations in a media recommendation system
US8990700B2 (en) 2011-10-31 2015-03-24 Google Inc. Rating and review interface
US9003056B2 (en) 2006-07-11 2015-04-07 Napo Enterprises, Llc Maintaining a minimum level of real time media recommendations in the absence of online friends
GB2525189A (en) * 2014-04-14 2015-10-21 Stephen Morris Internet-based search mechanism
US9292179B2 (en) 2006-07-11 2016-03-22 Napo Enterprises, Llc System and method for identifying music content in a P2P real time recommendation network
US9367808B1 (en) 2009-02-02 2016-06-14 Napo Enterprises, Llc System and method for creating thematic listening experiences in a networked peer media recommendation environment
US10261938B1 (en) 2012-08-31 2019-04-16 Amazon Technologies, Inc. Content preloading using predictive models
US12051087B2 (en) 2022-11-23 2024-07-30 Sas Institute Inc. Method and system for digital traffic campaign management
US12141713B2 (en) * 2016-09-02 2024-11-12 Hithink Financial Services Inc. Systems and methods for semantic analysis based on knowledge graph

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975609A (zh) * 2016-05-18 2016-09-28 德稻全球创新网络(北京)有限公司 Intelligent recommendation method and system for industrial design products

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5931907A (en) * 1996-01-23 1999-08-03 British Telecommunications Public Limited Company Software agent for comparing locally accessible keywords with meta-information and having pointers associated with distributed information
US5933811A (en) * 1996-08-20 1999-08-03 Paul D. Angles System and method for delivering customized advertisements within interactive communication systems
US6009410A (en) * 1997-10-16 1999-12-28 At&T Corporation Method and system for presenting customized advertising to a user on the world wide web

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7159023B2 (en) 1999-12-23 2007-01-02 Alexa Internet Use of web usage trail data to identify relationships between browsable items
GB2371644B (en) * 2000-09-25 2004-10-06 Mythink Technology Co Ltd Method and system for real-time analyzing and processing data over the internet
GB2371644A (en) * 2000-09-25 2002-07-31 Mythink Technology Co Ltd Real-time analysis of browsing over the internet
US7827055B1 (en) 2001-06-07 2010-11-02 Amazon.Com, Inc. Identifying and providing targeted content to users having common interests
US8285589B2 (en) 2001-06-07 2012-10-09 Amazon.Com, Inc. Referring-site based recommendations
EP1449145A2 (fr) * 2001-11-26 2004-08-25 Kenan Tur Method and device for processing information in monitoring systems for the management of ethics, risks and/or values, and corresponding computer program product and storage medium
US8078615B2 (en) 2002-04-12 2011-12-13 Stumbleupon, Inc. Method and system for single-action personalized recommendation and display of internet content
US8306972B2 (en) 2003-03-31 2012-11-06 Google Inc. Ordering of search results based on language and/or country of the search results
US7451129B2 (en) 2003-03-31 2008-11-11 Google Inc. System and method for providing preferred language ordering of search results
US9183311B2 (en) 2003-03-31 2015-11-10 Google Inc. Ordering of search results based on language and/or country of the search results
WO2004114161A1 (fr) * 2003-06-16 2004-12-29 Google Inc. System and method for providing search results based on preferred countries
US7451130B2 (en) 2003-06-16 2008-11-11 Google Inc. System and method for providing preferred country biasing of search results
US7346613B2 (en) 2004-01-26 2008-03-18 Microsoft Corporation System and method for a unified and blended search
EP1557773A3 (fr) * 2004-01-26 2005-11-09 Microsoft Corporation System and method for searching across diverse resources
US8892508B2 (en) 2005-03-30 2014-11-18 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US8554723B2 (en) 2005-03-30 2013-10-08 Amazon Technologies, Inc. Mining of user event data to identify users with common interest
US8060463B1 (en) 2005-03-30 2011-11-15 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US9160548B2 (en) 2005-03-30 2015-10-13 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US9519938B2 (en) 2005-03-30 2016-12-13 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US8224773B2 (en) 2005-03-30 2012-07-17 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US9792332B2 (en) 2005-03-30 2017-10-17 Amazon Technologies, Inc. Mining of user event data to identify users with common interests
US7831582B1 (en) 2005-08-23 2010-11-09 Amazon Technologies, Inc. Method and system for associating keywords with online content sources
US7860895B1 (en) 2005-08-23 2010-12-28 Amazon Technologies, Inc. Method and system for determining interest spaces among online content sources
US7774335B1 (en) 2005-08-23 2010-08-10 Amazon Technologies, Inc. Method and system for determining interest levels of online content navigation paths
US7966395B1 (en) 2005-08-23 2011-06-21 Amazon Technologies, Inc. System and method for indicating interest of online content
US9444648B1 (en) 2005-11-01 2016-09-13 Amazon Technologies, Inc. Strategies for presenting a recommendation as supplemental information
US9996872B1 (en) 2005-11-01 2018-06-12 Amazon Technologies, Inc. Strategies for presenting a recommendation as supplemental information
US8122086B1 (en) 2005-11-01 2012-02-21 Amazon Technologies, Inc. Strategies for presenting a sequence of messages to a user
US8122020B1 (en) 2005-11-17 2012-02-21 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
US8577880B1 (en) 2005-11-17 2013-11-05 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
US7668821B1 (en) 2005-11-17 2010-02-23 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
US7673330B2 (en) 2006-01-05 2010-03-02 Microsoft Corporation Ad-hoc creation of group based on contextual information
US8386509B1 (en) 2006-06-30 2013-02-26 Amazon Technologies, Inc. Method and system for associating search keywords with interest spaces
US7685192B1 (en) 2006-06-30 2010-03-23 Amazon Technologies, Inc. Method and system for displaying interest space user communities
US7660815B1 (en) 2006-06-30 2010-02-09 Amazon Technologies, Inc. Method and system for occurrence frequency-based scaling of navigation path weights among online content sources
US9292179B2 (en) 2006-07-11 2016-03-22 Napo Enterprises, Llc System and method for identifying music content in a P2P real time recommendation network
US8327266B2 (en) 2006-07-11 2012-12-04 Napo Enterprises, Llc Graphical user interface system for allowing management of a media item playlist based on a preference scoring system
US10469549B2 (en) 2006-07-11 2019-11-05 Napo Enterprises, Llc Device for participating in a network for sharing media consumption activity
US9003056B2 (en) 2006-07-11 2015-04-07 Napo Enterprises, Llc Maintaining a minimum level of real time media recommendations in the absence of online friends
US8620699B2 (en) 2006-08-08 2013-12-31 Napo Enterprises, Llc Heavy influencer media recommendations
US7797421B1 (en) 2006-12-15 2010-09-14 Amazon Technologies, Inc. Method and system for determining and notifying users of undesirable network content
US8751507B2 (en) 2007-06-29 2014-06-10 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US7949659B2 (en) 2007-06-29 2011-05-24 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US8260787B2 (en) 2007-06-29 2012-09-04 Amazon Technologies, Inc. Recommendation system with multiple integrated recommenders
US8396951B2 (en) 2007-12-20 2013-03-12 Napo Enterprises, Llc Method and system for populating a content repository for an internet radio service based on a recommendation network
US9071662B2 (en) 2007-12-20 2015-06-30 Napo Enterprises, Llc Method and system for populating a content repository for an internet radio service based on a recommendation network
US7991650B2 (en) 2008-08-12 2011-08-02 Amazon Technologies, Inc. System for obtaining recommendations from multiple recommenders
US8249948B1 (en) 2008-08-12 2012-08-21 Amazon Technologies, Inc. System for obtaining recommendations from multiple recommenders
US7991757B2 (en) 2008-08-12 2011-08-02 Amazon Technologies, Inc. System for obtaining recommendations from multiple recommenders
US8429691B2 (en) * 2008-10-02 2013-04-23 Microsoft Corporation Computational recommendation engine
WO2010081238A1 (fr) * 2009-01-19 2010-07-22 Kibboko, Inc. Method and system for document classification
US9367808B1 (en) 2009-02-02 2016-06-14 Napo Enterprises, Llc System and method for creating thematic listening experiences in a networked peer media recommendation environment
US20120047448A1 (en) * 2009-04-29 2012-02-23 Waldeck Technology, Llc System and method for social browsing using aggregated profiles
US8990700B2 (en) 2011-10-31 2015-03-24 Google Inc. Rating and review interface
US9015109B2 (en) 2011-11-01 2015-04-21 Lemi Technology, Llc Systems, methods, and computer readable media for maintaining recommendations in a media recommendation system
US8909667B2 (en) 2011-11-01 2014-12-09 Lemi Technology, Llc Systems, methods, and computer readable media for generating recommendations in a media recommendation system
WO2013181434A1 (fr) * 2012-05-31 2013-12-05 Qualcomm Incorporated Context-based predictive localizations
US9633310B2 (en) 2012-05-31 2017-04-25 Qualcomm Incorporated Predictive searching with modified search terms that are based on behaviors
US8972318B2 (en) 2012-05-31 2015-03-03 Qualcomm Incorporated Predicting user behavior using feedback on previously run predictive searches
US10261938B1 (en) 2012-08-31 2019-04-16 Amazon Technologies, Inc. Content preloading using predictive models
CN103544313A (zh) * 2013-11-04 2014-01-29 北京国双科技有限公司 Data processing method and apparatus for web page recommendation
GB2525189A (en) * 2014-04-14 2015-10-21 Stephen Morris Internet-based search mechanism
US10977387B2 (en) 2014-04-14 2021-04-13 Bubblr Limited Internet-based search mechanism
US12141713B2 (en) * 2016-09-02 2024-11-12 Hithink Financial Services Inc. Systems and methods for semantic analysis based on knowledge graph
US12051087B2 (en) 2022-11-23 2024-07-30 Sas Institute Inc. Method and system for digital traffic campaign management

Also Published As

Publication number Publication date
AU7857900A (en) 2001-05-10

Similar Documents

Publication Publication Date Title
WO2001025947A1 (fr) Method of dynamically recommending web sites and answering user queries based upon affinity groups
KR100852034B1 (ko) Method and apparatus for classifying and presenting documents of a distributed database
US6647383B1 (en) System and method for providing interactive dialogue and iterative search functions to find information
Beitzel et al. Temporal analysis of a very large topically categorized web query log
KR101031449B1 (ko) System and method for processing search queries using trend analysis
Pu et al. Subject categorization of query terms for exploring Web users' search interests
US8260786B2 (en) Method and apparatus for categorizing and presenting documents of a distributed database
US20070136256A1 (en) Method and apparatus for representing text using search engine, document collection, and hierarchal taxonomy
US20090248682A1 (en) System and method for personalized search
US9317584B2 (en) Keyword index pruning
CN101019118A (zh) Personalization of placed-content ordering in search results
CA2578513A1 (fr) System and method for analyzing online information
EP1360608A2 (fr) Enterprise web mining system and method
AU2001291248A1 (en) Enterprise web mining system and method
Xue et al. Content-aware trust propagation toward online review spam detection
Li et al. A feature-free search query classification approach using semantic distance
Chau et al. Redips: Backlink search and analysis on the Web for business intelligence analysis
WO2010087882A1 (fr) Personalization engine for building a user profile
Abbattista et al. Learning user profiles for content-based filtering in e-commerce
WO2008032037A1 (fr) Method and system for filtering and searching data using word frequencies
Montaner et al. A taxonomy of personalized agents on the internet
Daryaie Zanjani et al. Predicting user click behaviour in search engine advertisements
Markellou et al. Web personalization for e-marketing intelligence
Truran et al. The effect of user intent on the stability of search engine results
Wen Development of personalized online systems for web search, recommendations, and e-commerce

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP