EP1221078A1 - Method and system for selecting and automatically updating arbitrary elements from structured documents - Google Patents
- Publication number
- EP1221078A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- structured document
- user
- identifiers
- sections
- contemporaneous
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9562—Bookmark management
Definitions
- The present invention relates generally to computer processing systems and, in particular, to a method and system for selecting and automatically updating arbitrary elements from structured documents in a computer processing system(s).
- One of the newest technologies used on the Web to provide information is referred to as "push".
- With push technology, entire Web sites and even applications can be sent to a user's computer on a predefined periodic basis without the user having to request them repeatedly.
- You "subscribe" to Web sites (also called "channels") of information, and they are sent to you at intervals that you specify.
- A "channel" generally refers to an area of interest that a publisher constructs, which can include HTML pages, JAVA applets, ActiveX components, multimedia objects, and other information packaged together to provide users information via push technology.
- The term "subscribing" does not refer to paying for information. Rather, it means that you have asked to receive information on a regular schedule.
- The NETSCAPE and INTERNET EXPLORER browsers use push technology (NETCASTING and WEBCASTING, respectively) in the form of channels to deliver personalized web content to users.
- With push technology, web content is automatically delivered to a user in a predetermined manner without the need for his or her explicit interaction.
- Users can customize what, when, and how often web content is delivered to their desktops.
- The delivered content can be rich in multimedia and interactive content based on dynamic HTML (HyperText Markup Language) and JAVASCRIPT, and it can be viewed either in a browser window or in full-screen mode.
- One system that attempts to deliver only part of a web page to a user is called DIBS (see http://www.modaka.com/solutions/index.html).
- This tool lets a user select a rectangular region of a web page, the contents of which are updated and sent to the user's desktop at intervals chosen by the user.
- There are two main drawbacks in a system such as DIBS.
- First, the selection mechanism is purely geometric. This is a problem because web pages change dynamically over time with the addition of advertisements, etc. Hence, the contents present at a particular geometric region of a web page at one instant may have no relevance to the contents present at that location at another instant.
- Second, the selection mechanism is not context sensitive; thus, over a period of time, the selection yields either extra content or missing content.
- The present invention is directed to a method and system for selecting and automatically updating arbitrary elements from structured documents.
- A system for automatically providing a user with user-selected elements from a structured document in a computer network environment includes the following components.
- The system includes an authoring tool that enables a user to select sections of interest in the structured document, enables the user to specify an interval at which updated versions of the selected sections are to be received, generates a map comprising identifiers for features associated with the selected sections of interest, and outputs the identifiers.
- A database stores the identifiers.
- A server fetches a contemporaneous version of the structured document corresponding to the stored identifiers, extracts the sections of that contemporaneous version corresponding to the stored identifiers, and provides the extracted sections to the user in accordance with the interval specified by the user.
- The network environment is the World Wide Web, and the structured document is a web page.
- The selected sections are provided to the user in an HTML file.
- The authoring tool includes a user interface control for receiving a location identifier (e.g., a URL) of the structured document.
- The server generates a map, based on the contemporaneous version of the structured document, that includes identifiers for the features included in that version.
- The server compares the identifiers of the fetched contemporaneous version of the structured document with the stored identifiers to determine which identifiers the two have in common.
- The server uses the common identifiers to generate a new document that includes updated versions of the sections the user selected in the structured document.
- FIG. 1 is a diagram of the main components of a system 100 for selecting and automatically updating arbitrary elements from structured documents according to an embodiment of the present invention.
- FIG. 2 is a flow chart illustrating the steps performed with respect to a client side of the system of FIG. 1 according to an embodiment of the present invention.
- FIG. 3 is a flow chart illustrating the steps performed with respect to the server side of the system of FIG. 1 according to an embodiment of the present invention.
- FIG. 4 is a diagram illustrating the results of the steps of FIGS. 2 and 3 according to an embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
- The present invention is directed to a method and system for selecting and automatically updating arbitrary elements from structured documents.
- The present invention is preferably, although not necessarily, practiced through a network of two or more computers (e.g., the Internet or an intranet). Further, the present invention may be practiced on a single computer, where information in a certain structured file is constantly being updated (e.g., manually) and a user desires to view a portion of that document at a time interval he or she specifies.
- The type of the structured documents will vary depending upon the specific implementation.
- The present invention is described herein with respect to the World Wide Web (hereinafter "Web"), and the structured documents are "Web pages".
- However, the invention is not limited to the Web and Web pages; other computer configurations (preferably networks) and types of structured documents may be used.
- The illustrative embodiment of the invention includes a client (user) portion and a server portion.
- The present invention allows a user to pick and choose elements ("snippets") of interest from an arbitrary web page.
- The present invention may be implemented in an interactive manner and configured to employ the well-known "cut-and-paste" editing paradigm.
- The present invention provides enhancements that enable formatting of the resulting "web-page snippets".
- The present invention provides the server portion with the support needed to keep snippets up to date.
- The present invention allows a user to select different elements of different web pages; the contents of all the user's selections are periodically updated and delivered to the user.
- The selection techniques will vary depending upon the specific implementation.
- The invention solves the problems of geometric selection and loss of context present in the prior art.
- For example, the present invention allows a user to select stock quotes from one page, the latest sports headlines from another page, weather reports from yet another page, and so on.
- The present invention updates the contents of all these selections at intervals chosen by the user and then delivers the selections to the user's desktop.
- Our system does not update user selections based on geometry; rather, a user selection is closely tied to the underlying structure of the web document. Hence, updates are more reliable, and the relevance of a user's selections is maintained over time.
- The present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
- The present invention is implemented in software as a program tangibly embodied on a program storage device.
- The program may be uploaded to, and executed by, a machine comprising any suitable architecture.
- The machine is implemented on a computer platform having hardware such as one or more central processing units (CPUs), a random access memory (RAM), and one or more input/output (I/O) interfaces.
- The computer platform also includes an operating system and microinstruction code.
- The various processes and functions described herein may be either part of the microinstruction code or part of the program (or a combination thereof) that is executed via the operating system.
- Various other peripheral devices, such as an additional data storage device and a printing device, may be connected to the computer platform.
- FIG. 1 is a diagram of the main components of a system 100 for selecting and automatically updating arbitrary elements from structured documents according to an embodiment of the present invention.
- The three main components of the invention are a client front-end authoring tool 102, a database engine 104, and a web server 106. These components may be grouped into a client portion (or client side) and a server portion (or server side). However, other configurations are possible, including configurations with at least one intermediate portion (depending on the specific implementation).
- The client front-end authoring tool 102 corresponds to the client portion.
- The database engine 104 and the web server 106 correspond to the server portion.
- These components correspond, respectively, to the three main tasks performed by the present invention: identifying selections from web pages, saving these selections, and reproducing the saved selections.
- FIG. 2 is a flow chart illustrating the steps performed with respect to a client side of the system of FIG. 1 according to an embodiment of the present invention.
- The authoring page is loaded into the web browser (step 202).
- The authoring page enables the user to select sections of web pages.
- The authoring page has a user interface control in the form of a floating toolbar.
- The URL (uniform resource locator) of a first target page is entered into this user interface (step 204), which causes the target page to be fetched.
- A client-side topographical map (hereinafter "t-map") is constructed for that page (step 206).
- The t-map is constructed at step 206 to generate feature-IDs for the mapped features of the page.
- A t-map is a map/data structure that associates an identifier for a mapped feature in a structured document with corresponding location information for that feature, the location information corresponding to a given presentation of the document.
- The t-map may include additional information associated with each feature.
- A feature in a structured document may be, for example, a table, a paragraph, and so forth.
- The identifier for a mapped feature is referred to as a "feature-ID".
- Structured documents are often hierarchical, and their presentation will likely consist of nested regions.
- The t-map allows mapped features to be indexed based on their position and scope. Moreover, the t-map allows nested regions in the presentation of a document to be correlated with the underlying document features. In an illustrative embodiment of the present invention, regions with the broadest scope appear lower in the t-map than regions with a more limited scope, and the location information is in the form of absolute plane coordinates. However, other orders may be used to arrange the information in the t-map (for example, regions with the narrowest scope may appear lower in the t-map than regions with a broader scope), and other types of information may be used to represent the location of mapped features.
- The t-map is implemented as a linked hierarchical structure in which feature-regions that enclose other feature-regions are high-level nodes. Each feature-region contained within the region defined by a higher-level node appears as a child node of that node. Siblings may be linked horizontally, and generations (parents and children) may be linked vertically.
- The user selects sections of interest in the target page (step 208a).
- The user specifies how frequently these sections of interest are to be refreshed (step 208b).
- The database engine 104 is the interface between the client and the server. For each user of the system 100, the database engine 104 stores the URL/feature-ID lists for each target page the user has used to construct his or her paste-up page.
- A paste-up page is a preview version of a final target HTML file that is displayed to a user; it reflects the selections made by the user.
- The database engine 104 also maintains the required security and privilege-level information about users.
- The web server 106 uses database information to manage user sessions.
- A session is initiated.
- The paste-up page is then created as described below.
- The session is closed.
- FIG. 3 is a flow chart illustrating the steps performed with respect to the server side of the system of FIG. 1 according to an embodiment of the present invention.
- The server 106 reads the feature-ID/URL list information from the database engine 104 (step 302).
- The server 106 then pre-fetches the target web pages required to construct the paste-up page and constructs the server-side t-map (step 304). Once these pages are available locally, the relevant content is extracted from the text of the pre-fetched web pages (step 306a).
- The content is then pasted up in the final target HTML file, which the user can view (step 306b).
- The browser is then pointed to the final target HTML file (step 308).
- Step 306a includes comparing the feature-IDs of the pre-fetched target web page with the stored feature-IDs to determine which feature-IDs they have in common.
- The front-end authoring tool 102 allows users to select content from different web pages, providing a simple selection mechanism as shown in FIG. 4. A resizable rectangle is used to enclose the selection area. When a particular region of a page is selected, the underlying features are identified and stored as feature-IDs in the database engine 104. The user also specifies the interval at which he or she wants the content to be refreshed.
- The selected pages are prefetched.
- A server-side t-map is then generated from the HTML source of each page.
- The relevant selections from these pages are reconstructed and pasted up on a target page, which is then available for the user to view. This is illustrated in the bottom half of FIG. 4.
- The selection mechanism is independent of geometric, browser, or platform considerations.
- If the structure of a target page changes so that the stored feature-IDs no longer match, the client-side and server-side t-maps will be out of sync, and the user can be notified of the need to re-author the content of that page.
- The server is able to retrieve the selected structural features of the document regardless of changes within those sections, allowing a context-sensitive presentation of information.
- The process of generating the t-map follows the hierarchical nature of the document structure. Even if a selection is incomplete, the context of the selection is known, so the server can gather information to reconstruct the missing parts of the document for a better presentation.
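The t-map described for step 206 is a linked hierarchy of feature-regions. The following is a minimal Python sketch of that hierarchy, not the patent's implementation. It assumes well-formed HTML (every feature tag explicitly closed), a hypothetical set of "feature" tags (`FEATURE_TAGS`), and a hypothetical path-style feature-ID scheme such as `doc/table[0]/tr[0]`; the patent does not say how feature-IDs are formed. Location information is omitted because `html.parser` performs no layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Assumption: the tags treated as mappable "features" (tables, paragraphs, ...).
FEATURE_TAGS = {"table", "tr", "td", "p", "div", "ul", "ol", "li"}


@dataclass(eq=False)  # nodes link to each other, so compare by identity
class FeatureNode:
    feature_id: str                           # hypothetical path-style feature-ID
    tag: str
    parent: FeatureNode | None = None         # vertical link to the enclosing region
    children: list[FeatureNode] = field(default_factory=list)
    next_sibling: FeatureNode | None = None   # horizontal link to the next sibling


class TMapBuilder(HTMLParser):
    """Builds the feature hierarchy; assumes every feature tag is explicitly closed."""

    def __init__(self) -> None:
        super().__init__()
        self.root = FeatureNode("doc", "doc")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag not in FEATURE_TAGS:
            return
        parent = self.stack[-1]
        # Position among same-tag siblings, e.g. table[0], table[1].
        index = sum(1 for c in parent.children if c.tag == tag)
        node = FeatureNode(f"{parent.feature_id}/{tag}[{index}]", tag, parent)
        if parent.children:
            parent.children[-1].next_sibling = node
        parent.children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in FEATURE_TAGS and self.stack[-1].tag == tag:
            self.stack.pop()


def build_tmap(html: str) -> dict[str, FeatureNode]:
    """Parses a page and returns an index from feature-ID to t-map node."""
    builder = TMapBuilder()
    builder.feed(html)
    builder.close()
    index: dict[str, FeatureNode] = {}
    pending = [builder.root]
    while pending:
        node = pending.pop()
        index[node.feature_id] = node
        pending.extend(node.children)
    return index


tmap = build_tmap("<body><table><tr><td>IBM</td></tr></table><p>News</p></body>")
print(sorted(tmap))
# -> ['doc', 'doc/p[0]', 'doc/table[0]', 'doc/table[0]/tr[0]', 'doc/table[0]/tr[0]/td[0]']
```

Because each ID in this sketch encodes a feature's place in the hierarchy rather than its screen position, the same ID can be looked up again in a later version of the page, as long as the enclosing structure has not changed.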
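Mapping the resizable selection rectangle of FIG. 4 onto feature-IDs uses the location half of the t-map. This sketch assumes each feature-ID is paired with absolute plane coordinates (`Region`, a hypothetical type) and that IDs are path-style, so an ancestor's ID is a prefix of its descendants' IDs. It keeps only the outermost features that lie entirely inside the rectangle; this is one plausible selection rule, not one the patent specifies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical location record: absolute plane coordinates of a feature."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: Region) -> bool:
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def select_features(tmap: dict[str, Region], selection: Region) -> list[str]:
    """Returns the feature-IDs of the outermost features that lie entirely
    inside the selection rectangle; nested features travel with their parent."""
    inside = [fid for fid, region in tmap.items() if selection.contains(region)]
    # Path-style IDs: an ancestor's ID followed by "/" prefixes its descendants' IDs.
    return sorted(fid for fid in inside
                  if not any(fid.startswith(other + "/") for other in inside))


tmap = {
    "doc/table[0]": Region(0, 0, 400, 100),
    "doc/table[0]/tr[0]": Region(0, 0, 400, 50),
    "doc/table[0]/tr[1]": Region(0, 50, 400, 100),
    "doc/p[0]": Region(0, 120, 400, 200),
}
print(select_features(tmap, Region(0, 0, 400, 60)))       # -> ['doc/table[0]/tr[0]']
print(select_features(tmap, Region(-10, -10, 500, 110)))  # -> ['doc/table[0]']
```

Only the IDs, not the coordinates, would then be stored, which is what keeps later updates independent of where the feature happens to be drawn.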
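The database engine's per-user URL/feature-ID lists could be kept as in the following sketch, which uses Python's built-in `sqlite3` module. The table layout, the function names (`open_store`, `save_selection`, `load_selections`), and storing the refresh interval alongside each selection are assumptions; the security and privilege information the database engine also maintains is omitted.

```python
from __future__ import annotations

import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Creates the hypothetical selection table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS selection (
               user_id          TEXT    NOT NULL,
               url              TEXT    NOT NULL,
               feature_id       TEXT    NOT NULL,
               interval_minutes INTEGER NOT NULL,
               PRIMARY KEY (user_id, url, feature_id))"""
    )
    return conn


def save_selection(conn: sqlite3.Connection, user_id: str, url: str,
                   feature_ids: list[str], interval_minutes: int) -> None:
    """Replaces the user's stored feature-IDs for one target page."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM selection WHERE user_id = ? AND url = ?",
                     (user_id, url))
        conn.executemany(
            "INSERT INTO selection VALUES (?, ?, ?, ?)",
            [(user_id, url, fid, interval_minutes) for fid in feature_ids],
        )


def load_selections(conn: sqlite3.Connection, user_id: str) -> dict[str, list[str]]:
    """Returns {url: [feature_id, ...]} for one user, as read at step 302."""
    rows = conn.execute(
        "SELECT url, feature_id FROM selection WHERE user_id = ? "
        "ORDER BY url, feature_id",
        (user_id,),
    )
    result: dict[str, list[str]] = {}
    for url, fid in rows:
        result.setdefault(url, []).append(fid)
    return result


conn = open_store()
save_selection(conn, "alice", "http://example.com/quotes", ["doc/table[0]"], 15)
print(load_selections(conn, "alice"))  # -> {'http://example.com/quotes': ['doc/table[0]']}
```

Replacing a page's rows as a unit (delete, then insert inside one transaction) matches the idea that re-authoring a page supersedes its earlier selection.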
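Honoring the refresh frequency chosen at step 208b amounts to checking, on each pass, which target pages are due to be fetched again. A minimal sketch, assuming the server records a last-refresh time per URL; the function `due_urls` and its parameters are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def due_urls(last_refreshed: dict[str, datetime],
             intervals: dict[str, timedelta],
             now: datetime) -> list[str]:
    """Hypothetical scheduler check: returns the target URLs whose user-chosen
    refresh interval has elapsed. URLs never fetched before are always due."""
    due = []
    for url, interval in intervals.items():
        last = last_refreshed.get(url)
        if last is None or now - last >= interval:
            due.append(url)
    return sorted(due)


# Keys stand in for target-page URLs.
now = datetime(2000, 9, 5, 12, 0)
intervals = {"quotes": timedelta(minutes=15), "news": timedelta(hours=1),
             "weather": timedelta(minutes=30)}
last = {"quotes": datetime(2000, 9, 5, 11, 40), "news": datetime(2000, 9, 5, 11, 30)}
print(due_urls(last, intervals, now))  # -> ['quotes', 'weather']
```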
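Steps 304 to 306b can be approximated as follows, under stated assumptions: target pages are well-formed XHTML, and each stored feature-ID is an `xml.etree.ElementTree` path expression such as `body/table[1]` (a hypothetical encoding). Instead of building a separate server-side t-map and intersecting two ID sets, this sketch resolves each stored ID directly against the freshly fetched page: IDs that resolve are the common ones, and IDs that do not are returned so the user can be asked to re-author that page. The function names `extract_sections` and `paste_up` are illustrative.

```python
from __future__ import annotations

import html
import xml.etree.ElementTree as ET


def extract_sections(page_xhtml: str,
                     stored_ids: list[str]) -> tuple[list[str], list[str]]:
    """Resolves each stored feature-ID (assumed to be an ElementTree path)
    against the current page.

    Returns (serialized sections for the IDs still present, IDs that no
    longer resolve). A non-empty second list signals that the page's
    structure has drifted and the selection should be re-authored.
    """
    root = ET.fromstring(page_xhtml)
    sections: list[str] = []
    missing: list[str] = []
    for fid in stored_ids:
        elem = root.find(fid)
        if elem is None:
            missing.append(fid)
            continue
        elem.tail = None  # drop text that follows the element in its parent
        sections.append(ET.tostring(elem, encoding="unicode"))
    return sections, missing


def paste_up(title: str, sections: list[str]) -> str:
    """Assembles the extracted sections into a final target HTML file."""
    body = "\n".join(f'<div class="snippet">{s}</div>' for s in sections)
    return (f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body>\n{body}\n</body></html>")


page = ("<html><body><table><tr><td>IBM 101</td></tr></table>"
        "ad text<p>Rain likely</p></body></html>")
sections, missing = extract_sections(page, ["body/table[1]", "body/p[2]"])
print(sections)  # -> ['<table><tr><td>IBM 101</td></tr></table>']
print(missing)   # -> ['body/p[2]']
print(paste_up("My snippets", sections))
```

Note that the advertisement text beside the table does not leak into the snippet: extraction follows the element boundary, not a screen rectangle.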
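Reconstructing the missing context of an incomplete selection can be illustrated for table cells and list items: a lone `<td>` does not render meaningfully on its own, so this sketch wraps it in shallow copies of its ancestors up to the enclosing `<table>`. It assumes well-formed XHTML and ElementTree path expressions as feature-IDs; the tag sets `NEEDS_CONTEXT` and `CONTAINERS` and the function `with_context` are hypothetical.

```python
from __future__ import annotations

import copy
import xml.etree.ElementTree as ET

# Assumptions: tags that need an enclosing container to render, and the
# containers at which reconstruction stops.
NEEDS_CONTEXT = {"td", "th", "tr", "li"}
CONTAINERS = {"table", "ul", "ol"}


def with_context(root: ET.Element, path: str) -> str | None:
    """Serializes the feature at `path`. A table cell/row or list item is
    wrapped in shallow copies of its ancestors up to the nearest container."""
    target = root.find(path)
    if target is None:
        return None
    parents = {child: parent for parent in root.iter() for child in parent}
    snippet = copy.deepcopy(target)
    snippet.tail = None
    node = target
    if target.tag in NEEDS_CONTEXT:
        while node.tag not in CONTAINERS and node in parents:
            node = parents[node]
            # Copy only the ancestor's tag and attributes, not its other children.
            wrapper = ET.Element(node.tag, dict(node.attrib))
            wrapper.append(snippet)
            snippet = wrapper
    return ET.tostring(snippet, encoding="unicode")


root = ET.fromstring("<html><body><table border='1'><tr><td>IBM</td>"
                     "<td>101</td></tr></table><p>Rain</p></body></html>")
print(with_context(root, "body/table/tr/td[2]"))
# -> <table border="1"><tr><td>101</td></tr></table>
```

Keeping the ancestors' attributes (here `border`) but not their other children preserves the presentation context without pulling in content the user did not select.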
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US39695199A | 1999-09-15 | 1999-09-15 | |
| US396951 | 1999-09-15 | | |
| PCT/US2000/024344 WO2001019160A2 (en) | 1999-09-15 | 2000-09-05 | Method and system for selecting and automatically updating arbitrary elements from structured documents |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP1221078A1 (en) | 2002-07-10 |
Family
ID=23569266
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP00961548A Ceased EP1221078A1 (en) | 1999-09-15 | 2000-09-05 | Method and system for selecting and automatically updating arbitrary elements from structured documents |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP1221078A1 (en) |
| CN (1) | CN1451126A (en) |
| AU (1) | AU769236B2 (en) |
| BR (1) | BR0014049A (en) |
| WO (1) | WO2001019160A2 (en) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2411017A (en) * | 2004-02-13 | 2005-08-17 | Satellite Information Services | Updating mark-up language documents from contained instructions |
| US20140053053A1 (en) * | 2005-03-31 | 2014-02-20 | Google Inc. | Methods and systems for real-time extraction of user-specified information |
| US7836085B2 (en) | 2007-02-05 | 2010-11-16 | Google Inc. | Searching structured geographical data |
| US8005842B1 (en) | 2007-05-18 | 2011-08-23 | Google Inc. | Inferring attributes from search queries |
| US8977645B2 (en) | 2009-01-16 | 2015-03-10 | Google Inc. | Accessing a search interface in a structured presentation |
| US8412749B2 (en) | 2009-01-16 | 2013-04-02 | Google Inc. | Populating a structured presentation with new values |
| AU2010256777A1 (en) * | 2009-06-01 | 2011-12-22 | Google Inc. | Searching methods and devices |
| CN101702161B (en) * | 2009-11-05 | 2012-07-04 | 金蝶软件(中国)有限公司 | Data extraction method, device and data management system |
| US8375328B2 (en) | 2009-11-11 | 2013-02-12 | Google Inc. | Implementing customized control interfaces |
| CN110569314B (en) * | 2018-06-05 | 2023-05-05 | 阿里巴巴集团控股有限公司 | Structured data generation method, device, equipment and storage medium |
2000
- 2000-09-05 AU AU73489/00A patent/AU769236B2/en not_active Ceased
- 2000-09-05 CN CN00815732A patent/CN1451126A/en active Pending
- 2000-09-05 BR BR0014049-0A patent/BR0014049A/en not_active IP Right Cessation
- 2000-09-05 EP EP00961548A patent/EP1221078A1/en not_active Ceased
- 2000-09-05 WO PCT/US2000/024344 patent/WO2001019160A2/en not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| See references of WO0119160A2 * |
Also Published As
| Publication number | Publication date |
|---|---|
| AU769236B2 (en) | 2004-01-22 |
| AU7348900A (en) | 2001-04-17 |
| BR0014049A (en) | 2003-07-15 |
| WO2001019160A2 (en) | 2001-03-22 |
| CN1451126A (en) | 2003-10-22 |
| WO2001019160A3 (en) | 2002-05-30 |
Similar Documents
| Publication | Title |
|---|---|
| US6289362B1 (en) | System and method for generating, transferring and using an annotated universal address |
| US6081829A (en) | General purpose web annotations without modifying browser |
| US7360166B1 (en) | System, method and apparatus for selecting, displaying, managing, tracking and transferring access to content of web pages and other sources |
| US6546406B1 (en) | Client-server computer system for large document retrieval on networked computer system |
| US7562287B1 (en) | System, method and apparatus for selecting, displaying, managing, tracking and transferring access to content of web pages and other sources |
| US6122647A (en) | Dynamic generation of contextual links in hypertext documents |
| US5649186A (en) | System and method for a computer-based dynamic information clipping service |
| US6397217B1 (en) | Hierarchical caching techniques for efficient dynamic page generation |
| CN101427229B (en) | Techniques for modifying the presentation of information displayed to an end user of a computer system |
| US20080028334A1 (en) | Searchable personal browsing history |
| Kamba et al. | ANATAGONOMY: a personalized newspaper on the World Wide Web |
| US20160098170A1 (en) | Discoverability and navigation of hyperlinks |
| US20040168123A1 (en) | Infrastructure for generating web content |
| US20030070143A1 (en) | Method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation |
| US20080195932A1 (en) | Method and apparatus for re-editing and redistributing web documents |
| US20020047856A1 (en) | Web based stacked images |
| CA2500263A1 (en) | System, method and apparatus for selecting, displaying, managing, tracking and transferring access to content of web pages and other sources |
| MXPA05011863A (en) | System and method for customization of search results. |
| US20060106844A1 (en) | Method and system for client-side manipulation of tables |
| AU769236B2 (en) | Method and system for selecting and automatically updating arbitrary elements from structured documents |
| WO2003105027A1 (en) | Improved web browser |
| JP2000067038A (en) | Homepage preparing device |
| US7047487B1 (en) | Methods for formatting electronic documents |
| US20020138621A1 (en) | System and method for displaying remotely stored content on a web page |
| Michard et al. | The Aquarelle resource discovery system |
Legal Events
| Code | Title | Description |
|---|---|---|
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20020311 |
| AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor names: SASTRY, CHELLURY, R.; PIZANO, ARTURO; SEGAN, SANJEEV; LEWIS, DARRIN |
| 17Q | First examination report despatched | Effective date: 20030124 |
| RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB IT |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
| 18R | Application refused | Effective date: 20081021 |