EP1221078A1

EP1221078A1 - Method and system for selecting and automatically updating arbitrary elements from structured documents

Info

Publication number: EP1221078A1
Application number: EP00961548A
Authority: EP
Inventors: Arturo Pizano; Sanjeev Segan; Chellury R. Sastry; Darrin Lewis
Original assignee: Siemens Corporate Research Inc
Current assignee: Siemens Corporate Research Inc
Priority date: 1999-09-15
Filing date: 2000-09-05
Publication date: 2002-07-10
Also published as: AU769236B2; AU7348900A; BR0014049A; WO2001019160A2; CN1451126A; WO2001019160A3

Abstract

According to one embodiment of the invention, there is provided a method for automatically providing a user with user selected elements from a structured document in a computer network environment. The structured document includes a plurality of features. The method includes the step of selecting at least one section of interest in the structured document (step 208a). An interval at which updated versions of the at least one selected section are to be received by the user is specified (step 208b). A map is generated that includes identifiers for features associated with the at least one selected section (step 206). The identifiers are stored (step 212). A contemporaneous version of the structured document corresponding to the stored identifiers is fetched (step 304). Sections of the contemporaneous version of the structured document corresponding to the stored identifiers are extracted (step 306a). The extracted sections are provided to the user in accordance with the interval specified by the user (step 308).

Description

METHOD AND SYSTEM FOR SELECTING AND AUTOMATICALLY UPDATING ARBITRARY ELEMENTS FROM STRUCTURED DOCUMENTS

BACKGROUND

1. Technical Field

The present invention relates generally to computer processing systems and, in particular, to a method and system for selecting and automatically updating arbitrary elements from structured documents in a computer processing system(s).

2. Background Description

It is quite apparent that the Internet/Intranet provides a cost effective and ubiquitous medium to connect people and organizations worldwide to exchange and share information via web documents with rich multimedia content.

One of the newest technologies used on the Web to provide information is referred to as "push". In push technology, entire Web sites and even applications can be sent to a user's computer on a predefined periodic basis without the user having to repeatedly request them. In push technology, you "subscribe" to Web sites (also called "channels") of information and they are sent to you at intervals that you specify. A "channel" generally refers to an area of interest that a publisher constructs, which can include HTML pages, JAVA applets, ActiveX components, multimedia objects, and other information packaged together to provide users information via push technology. The term "subscribing" does not refer to paying for information. Rather, the term is used to state that you have asked to receive information in accordance with a regular schedule.

With the evolution of "Push technology", primarily with POINTCAST, content providers on the web are able to deliver customized newsletters to millions of their users on the web. This is one of the first solutions towards satisfying a key customer requirement, i.e., the automatic delivery of desired web content whenever needed.

Currently, both NETSCAPE and INTERNET EXPLORER browsers utilize Push technology (NETCASTING and WEBCASTING, respectively) in the form of channels to deliver personalized web content to users. Through the use of channels, web content is automatically delivered to a user in a pre-determined manner without the need for his/her explicit interaction. Users can customize what, when, and how often web content is delivered to their desktops. The delivered content could be rich in multimedia and interactive content, based on dynamic HTML (HyperText Markup Language) and JAVASCRIPT, and the delivered content can be viewed in either a browser window or full screen mode.

One of the key drawbacks of techniques for delivering personalized web content to users is the limitation on the amount of personalization that can be achieved. Personalized web pages, like those offered by YAHOO, LYCOS and CNN, allow personalization only in the limited context of the material available at these sites.

Also these pages do not refresh (update) their content.

Thus, there is a need by internet users to take the idea of personalization a step further. That is, users would prefer only certain parts of a web page to be delivered to their desktop at regular intervals. For example, some users may prefer only certain stock quotes of all quotes listed on a web page to be delivered. Other users may prefer only the Associated Press headlines that are refreshed every so often on the web sites of major newspapers (e.g., the New York Times or the Washington Post) to be delivered. When it comes to personalization, it is desirable to provide the customer with as many choices as possible.

One system that attempts to deliver only a part of a web page to a user is called DIBS (see http://www.modaka.com/solutions/index.html). This tool lets a user select a rectangular region of a web page, the contents of which are updated and sent to the user's desktop at intervals chosen by the user. There are two main drawbacks in a system such as DIBS. First, the selection mechanism is purely geometric. This is a problem because web pages are dynamically changing over time with the addition of advertisements, etc. Hence, the contents that are present at a particular geometric region on a web page at one instant in time may not have any relevance to those present at that location at another instant in time. Second, the selection mechanism is not context sensitive; thus, over a period of time, either extra content or missing content results.

Accordingly, it would be desirable and highly advantageous to have a method and system which overcome the problems inherent in the conventional methods and systems for providing portions of web pages, i.e., geometric selection and change (loss or addition) of content.

SUMMARY OF THE INVENTION

The present invention is directed to a method and system for selecting and automatically updating arbitrary elements from structured documents.

According to a first aspect of the present invention, there is provided a system for automatically providing a user with user selected elements from a structured document in a computer network environment. The structured document includes a plurality of features.

The system includes an authoring tool for enabling a user to select sections of interest in the structured document, for enabling the user to specify an interval at which updated versions of the selected sections are to be received, for generating a map comprising identifiers for features associated with the selected sections of interest, and for outputting the identifiers. A database stores the identifiers. A server fetches a contemporaneous version of the structured document corresponding to the stored identifiers, extracts sections of the contemporaneous version of the structured document corresponding to the stored identifiers, and provides the extracted sections to the user in accordance with the interval specified by the user.

According to a second aspect of the present invention, the network environment is the world wide web and the structured document is a web page.

According to a third aspect of the present invention, the selected sections are provided to the user in an HTML file.

According to a fourth aspect of the present invention, the authoring tool includes a user interface control for receiving a location identifier of the structured document.

According to a fifth aspect of the present invention, the server generates a map based on the contemporaneous version of the structured document that includes identifiers for features included in the contemporaneous version.

According to a sixth aspect of the present invention, the server compares the identifiers of the fetched contemporaneous version of the structured document with the stored identifiers to determine those identifiers which are common therebetween.

According to a seventh aspect of the present invention, the common identifiers are used by said server to generate a new document that includes updated versions of the sections selected by the user in the structured document.

These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of the main components of a system 100 for selecting and automatically updating arbitrary elements from structured documents according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating the steps performed with respect to a client side of the system of FIG. 1 according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating the steps performed with respect to the server side of the system of FIG. 1 according to an embodiment of the present invention; and

FIG. 4 is a diagram illustrating the results of the steps of FIGs. 2 and 3 according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to a method and system for selecting and automatically updating arbitrary elements from structured documents. The present invention is preferably, although not necessarily, practiced through a network of two or more computers (e.g., Internet/Intranet). Further, the present invention may be practiced on a single computer, where information in a certain structured file is being constantly updated (e.g., manually), and a user desires to view a portion of that document at a time interval specified by him or herself. As is known to one skilled in the art, the type of the structured documents will vary depending upon the specific implementation.

For illustrative purposes, the present invention is described herein with respect to the World Wide Web (hereinafter "Web"), and the structured documents are "Web pages". However, it is to be appreciated that the invention is not limited to only the Web and to Web pages and, thus, other computer configurations (preferably networks) and types of structured documents may be used.

The illustrative embodiment of the invention includes a client or user portion and a server portion. In the case of the Web, the present invention allows a user to pick and choose elements (snippets) of interest from an arbitrary web page. Moreover, the present invention may be implemented in an interactive manner and configured to employ the well-known "cut-and-paste" editing paradigm. Further, the present invention provides enhancements to enable the formatting of the resulting "web-page snippets". Additionally, the present invention provides the server portion with the support needed to maintain up-to-date snippets.

A general description of the present invention will now be given to introduce the reader to the concepts and advantages of the invention. Subsequently, more detailed descriptions of various aspects of the invention will be provided.

The present invention allows a user to select different elements of different web pages, and the contents of the user's selections are all periodically updated and delivered to the user. The selection techniques will vary depending upon the specific implementation.

The invention solves the problems of geometric selection and loss of context present in the prior art. The present invention allows a user to, for example, select stock quotes from one page, latest sports headlines from another page, weather reports from yet another page, and so on. The present invention will update the contents of all these selections at intervals chosen by the user and then deliver the selections to the user's desktop. Unlike systems such as DIBS, our system does not update user selections based on geometry; rather a user selection is closely tied to the underlying structure of the web document. Hence, updates are more reliable and, over time, the relevance of a user's selections is maintained.

It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Preferably, the present invention is implemented in software as a program tangibly embodied on a program storage device. The program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the program (or a combination thereof) which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed.

FIG. 1 is a diagram of the main components of a system 100 for selecting and automatically updating arbitrary elements from structured documents according to an embodiment of the present invention. The three main components of the invention are: a client front end authoring tool 102; a database engine 104; and a web server 106. These components may be grouped into a client portion (or client side) and a server portion (or server side). However, other configurations are possible, including at least one intermediate portion (depending on the specific implementation). In the illustrative embodiment of the invention, the client front end authoring tool 102 corresponds to the client portion, and the database engine 104 and the web server 106 correspond to the server portion. These components correspond to the three main tasks performed by the present invention, that is, identifying selections from web pages, saving these selections, and reproducing the saved selections, respectively.

FIG. 2 is a flow chart illustrating the steps performed with respect to a client side of the system of FIG. 1 according to an embodiment of the present invention. The authoring page is loaded into the web browser (step 202). The authoring page enables the user to select sections of web pages. In a preferred embodiment of the present invention, the authoring page has a user interface control in the form of a floating toolbar.

The URL (uniform resource locator) of a first target page is entered into this user interface (step 204). This results in the target page being fetched. Upon fetch of the target page, a client side topographical map (hereinafter "t-map") is constructed for that page (step 206). The t-map is constructed at step 206 to generate feature-IDs for the mapped features of the page. A t-map is a map/data structure that associates an identifier for a mapped feature in a structured document with corresponding location information for that feature, the location information corresponding to a given presentation of the document. Optionally, the t-map may include additional information associated with each feature.

A feature in a structured document may be, for example, a table, a paragraph, and so forth. The identifier for a mapped feature is referred to as a "feature-ID".

Structured documents are often hierarchical and their presentation will likely consist of nested regions.

The t-map allows for the indexing of mapped features based on the position and scope of those features. Moreover, the t-map allows for the correlation of nested regions in the presentation of a document with the underlying document features. In an illustrative embodiment of the present invention, regions with the broadest scope will appear lower in the t-map than regions with a more limited scope. Moreover, in the illustrative embodiment, the location information is in the form of absolute plane coordinates. However, it is to be appreciated that other orders may be used to arrange the information in the t-map, such as, for example, regions with the narrowest scope appearing lower in the t-map than regions with a broader scope. Moreover, other types of information may be used to represent the location of mapped information.
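By way of illustration only, a flat t-map of the kind described above might be sketched as follows. The names `Region`, `build_flat_tmap` and `innermost_feature_at`, the path-like feature-ID strings, and the use of region area as a measure of scope are all assumptions made for this sketch; the description does not prescribe them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Absolute plane coordinates of a feature in one presentation (assumed layout)."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def build_flat_tmap(features: Dict[str, Region]) -> List[Tuple[str, Region]]:
    """Order entries so that the broadest regions appear lowest (last) in the map.

    Area is used here as a stand-in for "scope"; this is an assumption.
    """
    return sorted(features.items(), key=lambda item: item[1].area())


def innermost_feature_at(tmap: List[Tuple[str, Region]], x: int, y: int) -> Optional[str]:
    """Return the feature-ID of the narrowest region containing the point, if any."""
    for feature_id, region in tmap:
        if region.left <= x <= region.right and region.top <= y <= region.bottom:
            return feature_id
    return None


# Hypothetical features of a page, keyed by hypothetical feature-IDs.
features = {
    "body": Region(0, 0, 800, 600),
    "body/table[0]": Region(50, 100, 750, 400),
    "body/table[0]/tr[1]": Region(50, 150, 750, 200),
}
tmap = build_flat_tmap(features)
```

Because narrower regions come first in this ordering, a scan from the top of the map finds the most specific feature under a given point before any enclosing feature.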

In another embodiment of the present invention, the t-map is implemented as a linked hierarchical structure in which feature-regions that enclose other feature-regions are high-level nodes in the structure. Each feature-region that is contained within a region defined by a higher-level node will appear as a child (node) of that node. Siblings may be linked horizontally and generations (parents-children) may be linked vertically.
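A minimal sketch of such a linked hierarchical structure is given below, purely as an illustration. The names `TMapNode`, `insert` and `build_tree` are hypothetical, and the sketch assumes that the broadest region encloses all others and that regions are inserted from broadest to narrowest.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), an assumed layout


@dataclass
class TMapNode:
    feature_id: str
    box: Box
    # Children are the vertical (parent-child) links; the order of this
    # list stands in for the horizontal (sibling) links.
    children: List[TMapNode] = field(default_factory=list)


def area(box: Box) -> int:
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


def encloses(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def insert(parent: TMapNode, node: TMapNode) -> None:
    """Place node under the deepest existing node whose region encloses it."""
    for child in parent.children:
        if encloses(child.box, node.box):
            insert(child, node)
            return
    parent.children.append(node)


def build_tree(entries: List[Tuple[str, Box]]) -> TMapNode:
    """Build the hierarchy, inserting broader regions before narrower ones."""
    ordered = sorted(entries, key=lambda e: area(e[1]), reverse=True)
    root = TMapNode(*ordered[0])
    for feature_id, box in ordered[1:]:
        insert(root, TMapNode(feature_id, box))
    return root


# Hypothetical regions: one table with two rows inside a page body.
root = build_tree([
    ("row0", (50, 100, 750, 150)),
    ("body", (0, 0, 800, 600)),
    ("row1", (50, 150, 750, 200)),
    ("table0", (50, 100, 750, 400)),
])
```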

Given the teaching provided herein, one of ordinary skill in the related art will contemplate these and other similar implementations of the t-map.

The user selects sections of interest in the target page (step 208a). The user then specifies at what frequency these sections of interest are to be refreshed (step 208b). It is then determined whether there are any remaining target pages (for which sections of interest thereof are to be selected) (step 210). If so, then a return is made to step 204. On the other hand, if there are no remaining target pages, then the URLs and feature ids of these sections are stored in the database engine 104 (step 212).
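The refresh frequency specified at step 208b could be honored with a simple due-time check such as the one sketched below. The function names and the choice of `datetime`/`timedelta` values are assumptions for illustration; the description does not state how the interval is represented or scheduled.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_refresh_due(last_delivered: Optional[datetime],
                   interval: timedelta,
                   now: datetime) -> bool:
    """Return True if the selected sections have never been delivered,
    or if at least one full interval has elapsed since the last delivery."""
    return last_delivered is None or now - last_delivered >= interval


def next_refresh(last_delivered: datetime, interval: timedelta) -> datetime:
    """Return the earliest time at which the next update should be delivered."""
    return last_delivered + interval


# Hypothetical example: a user asked for updates every 15 minutes.
interval = timedelta(minutes=15)
last = datetime(2000, 9, 5, 12, 0)
```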

The database engine 104 is the interface between the client and server. For each user of the system 100, the database engine 104 stores the URL/feature id lists for each target page the user has used to construct his or her paste-up page. A paste-up page is a preview version of a final target HTML file that is displayed to a user. The paste-up page reflects the selections made by the user. The database engine 104 also maintains the required security and privilege level information about users. The web server 106 uses database information to manage user sessions.
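As a rough illustration of the URL/feature id lists kept per user, the following sketch stores them in an in-memory SQLite database. The table name, column names and helper functions are hypothetical; the description does not specify a schema or a particular database product.

```python
import sqlite3
from typing import Dict, List


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per (user, URL, feature id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE selections (
               user_id TEXT NOT NULL,
               url TEXT NOT NULL,
               feature_id TEXT NOT NULL,
               refresh_seconds INTEGER NOT NULL,
               PRIMARY KEY (user_id, url, feature_id))"""
    )
    return conn


def save_selection(conn: sqlite3.Connection, user_id: str, url: str,
                   feature_ids: List[str], refresh_seconds: int) -> None:
    """Store the feature ids a user selected on one target page (cf. step 212)."""
    conn.executemany(
        "INSERT OR REPLACE INTO selections VALUES (?, ?, ?, ?)",
        [(user_id, url, fid, refresh_seconds) for fid in feature_ids],
    )
    conn.commit()


def load_selections(conn: sqlite3.Connection, user_id: str) -> Dict[str, List[str]]:
    """Return the URL -> feature id list mapping for one user (cf. step 302)."""
    rows = conn.execute(
        "SELECT url, feature_id FROM selections "
        "WHERE user_id = ? ORDER BY url, feature_id",
        (user_id,),
    )
    result: Dict[str, List[str]] = {}
    for url, feature_id in rows:
        result.setdefault(url, []).append(feature_id)
    return result


# Hypothetical usage with made-up URLs and feature ids.
conn = open_store()
save_selection(conn, "alice", "http://example.com/quotes", ["table[0]/tr[2]"], 900)
save_selection(conn, "alice", "http://example.com/news", ["div[1]", "div[3]"], 3600)
```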

Once a user "logs in", a session is initiated. The paste-up page is then created as described below. When the user logs out, or after a timeout interval, the session is closed.

FIG. 3 is a flow chart illustrating the steps performed with respect to the server side of the system of FIG. 1 according to an embodiment of the present invention. The server 106 reads the feature id/URL list information from the database engine 104 (step 302). The server 106 then pre-fetches the target web pages, which are required to construct the paste-up page, and constructs the server side T-Map (step 304). Once these pages are available locally, the relevant content therein is extracted using the text of the pre-fetched web page (step 306a). Step 306a includes the step of comparing the feature-IDs of the pre-fetched target web page with the stored feature-IDs to determine those feature-IDs which are common therebetween. The content is then pasted up in the final target HTML file, which the user can view (step 306b). The browser is then pointed to the final target HTML file (step 308). FIG. 4 is a diagram illustrating the results of the steps of FIGs. 2 and 3 according to an embodiment of the present invention.
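The comparison of step 306a and the paste-up of step 306b can be pictured with the following sketch. It assumes, for illustration only, that the server side T-Map is available as a dictionary from feature-ID to the HTML fragment of that feature; the function names and the wrapper markup are likewise hypothetical.

```python
from typing import Dict, List


def common_feature_ids(stored_ids: List[str],
                       current_tmap: Dict[str, str]) -> List[str]:
    """Return the stored feature-IDs that also occur in the freshly built map,
    preserving the order in which the user selected them."""
    return [fid for fid in stored_ids if fid in current_tmap]


def paste_up(stored_ids: List[str], current_tmap: Dict[str, str]) -> str:
    """Assemble a target HTML document from the current content of the
    features common to the stored selection and the fetched page."""
    parts = [current_tmap[fid] for fid in common_feature_ids(stored_ids, current_tmap)]
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


# Hypothetical data: the page has changed since authoring, and one
# previously selected feature ("div[9]") no longer exists.
stored = ["table[0]/tr[2]", "div[9]", "div[1]"]
current = {
    "div[1]": "<div>Latest headline</div>",
    "table[0]/tr[2]": "<tr><td>SIE</td><td>101.5</td></tr>",
    "div[4]": "<div>Advertisement</div>",
}
page = paste_up(stored, current)
```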

The front-end authoring tool 102 allows users to select content from different web pages. It provides a simple mechanism to select content from different web pages as shown in FIG. 4. A resizable rectangle is used to enclose the selection area. When a particular region is selected from a page, the underlying features are identified and stored as feature-IDs in the database engine 104. The user also specifies at what interval he or she wants the content to refresh.
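One possible way to resolve a selection rectangle to its underlying features is sketched below. The rule used here (take the features whose regions lie entirely inside the rectangle, then keep only the outermost of those) is an assumption for illustration, as are the function names and the coordinate layout; the description does not state how the underlying features are identified.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), an assumed layout


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def features_in_selection(tmap: Dict[str, Box], selection: Box) -> List[str]:
    """Return feature-IDs fully enclosed by the selection rectangle,
    dropping any feature that sits inside another enclosed feature."""
    enclosed = {fid: box for fid, box in tmap.items() if contains(selection, box)}
    return sorted(
        fid for fid, box in enclosed.items()
        if not any(other != box and contains(other, box)
                   for other in enclosed.values())
    )


# Hypothetical client-side t-map and a rectangle drawn around the table.
client_tmap = {
    "body": (0, 0, 800, 600),
    "ad": (0, 0, 800, 90),
    "table0": (50, 100, 750, 400),
    "table0/row0": (50, 100, 750, 150),
}
selected = features_in_selection(client_tmap, (40, 95, 760, 410))
```

Only the resulting feature-IDs need to be stored; the rectangle itself plays no part in later updates.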

At the server side, the selected pages are prefetched. A server side T-Map is then generated from the HTML source of the page. Next, using the feature id identified during the authoring process, the relevant selections from these pages are reconstructed and pasted up on a target page, which is then available for the user to view. This is illustrated in the bottom half of FIG. 4.
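As an illustration of deriving feature identifiers from HTML source alone, the sketch below walks the markup with Python's standard `html.parser` module and names each element by its structural path. The path format (for example `/html[0]/body[0]/p[1]`), the class name, and the assumption of well-formed, properly nested markup are choices made for this sketch and are not taken from the description.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never have a closing tag and so never become parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FeatureIdParser(HTMLParser):
    """Assign each element a path-like feature-ID based on document structure."""

    def __init__(self) -> None:
        super().__init__()
        self.path: List[str] = []                 # feature-IDs of open elements
        self.counters: List[Dict[str, int]] = [{}]  # per open element: tag -> children seen
        self.feature_ids: List[str] = []

    def handle_starttag(self, tag, attrs):
        index = self.counters[-1].get(tag, 0)
        self.counters[-1][tag] = index + 1
        parent = self.path[-1] if self.path else ""
        feature_id = f"{parent}/{tag}[{index}]"
        self.feature_ids.append(feature_id)
        if tag not in VOID_TAGS:
            self.path.append(feature_id)
            self.counters.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed nesting: each end tag closes the innermost open element.
        if tag not in VOID_TAGS and self.path:
            self.path.pop()
            self.counters.pop()


def feature_ids_from_html(source: str) -> List[str]:
    parser = FeatureIdParser()
    parser.feed(source)
    parser.close()
    return parser.feature_ids


ids = feature_ids_from_html(
    "<html><body><p>a</p><table><tr><td>1</td></tr></table><p>b</p></body></html>"
)
```

Identifiers of this kind depend only on the structure of the page, so the same scheme applied at the client and at the server yields comparable feature-IDs for an unchanged page.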

A description of some of the advantages of using a context sensitive mechanism to "map" a document and a T-Map as a basis for selection and reproduction of web content will now be given. First, the selection mechanism is independent of geometric, browser or platform considerations. Second, if the structure of a web page ever changes, then the T-Maps at the client and server ends will be out of sync and the user can be notified of the need to re-author the content of that page. Third, once a section of a document is selected, the server is able to retrieve the selected structural features of the document regardless of changes to that section, allowing a context sensitive presentation of information. Fourth, the process of generating the T-Map follows the hierarchical nature of the document structure. Even if the selection is incomplete, since the context of the selection is known, the server can gather information to reconstruct missing parts of the document for a better presentation.
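The second advantage, detecting that the client and server T-Maps have fallen out of sync, might be approximated as in the sketch below. Treating "out of sync" as "at least one stored feature-ID is absent from the current page" is an assumption, as are the function names and the wording of the notice.

```python
from typing import Iterable, List, Optional


def missing_feature_ids(stored_ids: Iterable[str],
                        current_ids: Iterable[str]) -> List[str]:
    """Return stored feature-IDs that no longer occur in the current page."""
    current = set(current_ids)
    return [fid for fid in stored_ids if fid not in current]


def reauthor_notice(url: str, stored_ids: Iterable[str],
                    current_ids: Iterable[str]) -> Optional[str]:
    """Return a message for the user if re-authoring appears necessary, else None."""
    missing = missing_feature_ids(stored_ids, current_ids)
    if not missing:
        return None
    return f"The structure of {url} has changed; please re-select: {', '.join(missing)}"


# Hypothetical example in which one stored feature has disappeared.
notice = reauthor_notice(
    "http://example.com/news",
    ["div[1]", "div[9]"],
    ["div[0]", "div[1]", "div[2]"],
)
```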

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present system and method is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as defined by the appended claims.

Claims

WHAT IS CLAIMED IS:

1. A system for automatically providing a user with user selected elements from a structured document in a computer network environment, the structured document comprising a plurality of features, the system comprising: an authoring tool for enabling a user to select sections of interest in the structured document, for enabling the user to specify an interval at which updated versions of the selected sections are to be received, for generating a map comprising identifiers for features associated with the selected sections of interest, and for outputting the identifiers; a database for storing the identifiers; and a server for fetching a contemporaneous version of the structured document corresponding to the stored identifiers, extracting sections of the contemporaneous version of the structured document corresponding to the stored identifiers, and providing the extracted sections to the user in accordance with the interval specified by the user.

2. The system according to claim 1, wherein the network environment is the world wide web and the structured document is a web page.

3. The system according to claim 2, wherein the selected sections are provided to the user in an HTML file.

4. The system according to claim 2, wherein said authoring tool comprises a user interface control for receiving a location identifier of the structured document.

5. The system according to claim 2, wherein said server generates a map based on the contemporaneous version of the structured document that comprises identifiers for features comprised in the contemporaneous version.

6. The system according to claim 5, wherein said server compares the identifiers of the fetched contemporaneous version of the structured document with the stored identifiers to determine those identifiers which are common therebetween.

7. The system according to claim 6, wherein the common identifiers are used by said server to generate a new document that includes updated versions of the sections selected by the user in the structured document.

8. A method for automatically providing a user with user selected elements from a structured document in a computer network environment, the structured document comprising a plurality of features, the method comprising the steps of: selecting at least one section of interest in the structured document; specifying an interval at which updated versions of the at least one selected section are to be received by the user; generating a map comprising identifiers for features associated with the at least one selected section; storing the identifiers; fetching a contemporaneous version of the structured document corresponding to the stored identifiers; extracting sections of the contemporaneous version of the structured document corresponding to the stored identifiers; and providing the extracted sections to the user in accordance with the interval specified by the user.

9. The method according to claim 8, wherein the network environment is the world wide web and the structured document is a web page.

10. The method according to claim 9, wherein the selected sections are provided to the user in an HTML file.

11. The method according to claim 9, wherein said selecting step comprises the steps of: receiving a location identifier of the structured document; and fetching the structured document using the location identifier.

12. The method according to claim 9, wherein said method further comprises the step of generating a map based on the contemporaneous version of the structured document that comprises identifiers for features comprised in the contemporaneous version.

13. The method according to claim 12, wherein said extracting step comprises the step of comparing the identifiers of the fetched contemporaneous version of the structured document with the stored identifiers to determine those identifiers which are common therebetween.

14. The method according to claim 13, wherein said providing step comprises the step of generating a new document that includes updated versions of the sections selected by the user in the structured document, using the common identifiers.

15. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for automatically providing a user with user selected elements from a structured document in a computer network environment, the structured document comprising a plurality of features, said method steps comprising: selecting at least one section of interest in the structured document; specifying an interval at which updated versions of the at least one selected section are to be received by the user; generating a map comprising identifiers for features associated with the at least one selected section; storing the identifiers; fetching a contemporaneous version of the structured document corresponding to the stored identifiers; extracting sections of the contemporaneous version of the structured document corresponding to the stored identifiers; and providing the extracted sections to the user in accordance with the interval specified by the user.

16. The program storage device according to claim 15, wherein the network environment is the world wide web and the structured document is a web page.

17. The program storage device according to claim 16, wherein the selected sections are provided to the user in an HTML file.

18. The program storage device according to claim 16, wherein said selecting step comprises the steps of: receiving a location identifier of the structured document; and fetching the structured document using the location identifier.

19. The program storage device according to claim 16, wherein said method further comprises the step of generating a map based on the contemporaneous version of the structured document that comprises identifiers for features comprised in the contemporaneous version.

20. The program storage device according to claim 19, wherein said extracting step comprises the step of comparing the identifiers of the fetched contemporaneous version of the structured document with the stored identifiers to determine those identifiers which are common therebetween.

21. The program storage device according to claim 20, wherein said providing step comprises the step of generating a new document that includes updated versions of the sections selected by the user in the structured document, using the common identifiers.