US20020023073A1 - Network interactive tree search method and system
- Publication number
- US20020023073A1 (application US09/368,110)
- Authority
- US
- United States
- Prior art keywords
- search
- pages
- list
- readable medium
- computer readable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
Description
- This application relates to an invention similar to that in a patent application having attorney docket BC9-99-057, with the same inventors as identified above, commonly assigned herewith to International Business Machines.
- 1. Field of the Invention
- The invention disclosed broadly relates to the field of computer networks, and more particularly relates to the field of search methods for the World-Wide Web (WWW or simply, the Web).
- 2. Description of the Related Art
- The Internet is a global network of computers and computer networks that are all linked and communicate by virtue of the Internet Protocol (IP). The IP is a packet-switched communications protocol. In such protocols the information to be transmitted is broken up into a series of packets (i.e., sets of data) that are encapsulated in a sort of electronic envelope (the packet) including a portion called a header that includes fields for identifying the source of the transmission, the destination, and other information about the data to be delivered to the destination (called the payload). A popular application for the Internet is to access the Web, which uses a protocol called HTTP (HyperText Transfer Protocol) that client units use for connecting to servers in the Web. A client unit (e.g., a microcomputer unit with a communication subsystem connected to the Internet) invokes HTTP by simply typing an “http://” prefix with the desired Web address. Once the connection is made to the desired Web site, the user (or client) can access any document stored on that site that is available to that user. The interface used by the client is an application program called a Web browser (the Netscape and Explorer browsers are popular examples). The browser establishes hypertext links to the subject server, enabling the user to view graphical and textual representations of information provided by the server.
- The Web relies on an interpreted markup language called HTML (HyperText Mark Up Language), with which Web-compliant browsers are capable of rendering text, graphics, images, audio, real-time video, etc. HTML is independent of client operating systems, so the same content can be rendered across a wide variety of software and hardware operating platforms. Software platforms include Windows 3.1, Windows NT, Apple's Copland and Macintosh, IBM's AIX and OS/2, HP Unix, etc. Popular compliant Web browsers include Microsoft's Internet Explorer, Netscape Navigator, Lynx and Mosaic. The browser interprets links to files, images, sound clips, etc. through the use of hypertext links. Upon user invocation of a hypertext link to a Web page, the browser initiates a network request to receive the desired Web page.
- Users of the Internet are faced with an ever-increasing number of sites, each containing varied information. This results in difficulty finding the desired information. Among commonly used tools for locating information are the so-called search engines or portals to the Internet. These sites provide various indexes to other sites. Search engines use crawlers or spiders, programs having their own sets of rules, to index pages on the Web. Some of these follow every link on every page they find. Others ignore some kinds of links.
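- As a minimal sketch of the link-following that such crawlers perform, the following Python fragment collects the hypertext links found on one page and resolves them against the page's own address. The names LinkCollector and extract_links, and the example addresses in the usage note, are illustrative assumptions and do not come from any particular search engine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute targets of <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return the list of link targets found in html_text, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links
```

- For example, calling extract_links("http://www.example.com/home.html", page_text) on a page containing a relative link to "/products/index.html" would yield the absolute address "http://www.example.com/products/index.html". A crawler that ignores some kinds of links could apply its own rule inside handle_starttag.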
- A common problem with the general Internet search is that often too many result pages are returned and many of these have low relevance to the search request issued by the end-user. Typically, the search engines used in corporate sites are not as powerful as the Internet search engines and provide less information than is desirable.
- Finding information on the Internet, or on corporate intranets, can be a daunting task. Even targeted searches frequently result in hundreds or thousands of hits. Many producers of Web pages intentionally use techniques to cause their pages to be displayed as a result of searches which are not really pertinent. This results in too much information, much of it not useful. In addition, many Web domains have other links buried within their pages, and restricting a search to a specific Web domain results in ignoring information contained in these links. This results in too little information. Thus, there is a need for a search process producing more directly useable results.
- Corporate sites frequently employ a search engine to allow users to search their corporate pages. These search engines are often less effective than desirable or lack advanced features of more generic search engines. At times, end users desire information which is in related sites, perhaps business partners, etc., which is not contained within the corporate pages and which will not be displayed as a result of the corporate page search. Some search engines, such as Hotbot, allow a user to specify a domain, but do not then search the related sites.
- Accordingly, there is a need for a system for searching the Internet that limits the search results and which overcomes the above problems and produces more directly useful search results.
- Briefly, according to the invention, a method for searching for data in a data network comprising hyperlinked pages comprises the steps of (1) receiving an initial set of network addresses for pages in the data network; (2) receiving a non-negative integer, N, specifying a chain length; (3) receiving a set of at least one search argument comprising search criteria; and (4) performing a search wherein all pages linked to said initial set of addresses by a chain of distance less than or equal to N are examined for compliance with the search criteria, and all pages meeting such criteria are returned as successful objects of the search.
- According to optional embodiments the foregoing method can be implemented as a computer readable medium with instructions for performing the above steps, as an application program, or as a browser resident at an end user's computer system. It is also possible to implement the method as a special purpose information handling system.
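- The four steps above can be sketched, under stated assumptions, as a breadth-first walk that stops after N link-following passes. In this sketch the network is replaced by two in-memory dictionaries (link_graph and page_text) so that the fragment is self-contained; the function name tree_search and all parameter names are hypothetical, and the sketch is not offered as the claimed implementation.

```python
def tree_search(link_graph, page_text, start_urls, n, matches):
    """Return the URLs within n links of start_urls whose page satisfies matches.

    link_graph: dict mapping a URL to the list of URLs its page links to
                (an assumption standing in for fetching and parsing pages)
    page_text:  dict mapping a URL to the text of its page
    start_urls: the initial set of network addresses
    n:          non-negative chain length N
    matches:    function taking page text and returning True or False
    """
    if n < 0:
        raise ValueError("N must be a non-negative integer")

    ordered = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(ordered)
    frontier = list(ordered)

    for _ in range(n):
        # Each pass follows every URL found in the previous pass one more step,
        # which lengthens the chain by one.
        next_frontier = []
        for url in frontier:
            for target in link_graph.get(url, []):
                if target not in seen:  # skip duplicates and loops back
                    seen.add(target)
                    ordered.append(target)
                    next_frontier.append(target)
        frontier = next_frontier

    # Only after the list is complete are the pages checked against the criteria.
    return [url for url in ordered if matches(page_text.get(url, ""))]
```

- With N equal to zero only the initial pages are examined; each increment of N admits pages one link further away. Tracking the set of addresses already seen is what keeps links that loop back to earlier pages from being followed again.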
- FIG. 1 is an illustration of a typical Internet Web page.
- FIG. 2 is an illustration of typical Internet Web page linkage showing a results page produced by a search engine and links to other sites.
- FIG. 3 shows a simplified configuration of an information handling system suitable for performing a search method according to the invention.
- FIG. 4 is a simplified flow chart for four basic processes according to the invention.
- FIG. 5 is a flow chart illustrating the first process shown in FIG. 4.
- FIG. 6 is a continuation of the flow chart of FIG. 5.
- FIG. 7 is a flow chart illustrating the second process of FIG. 4.
- FIG. 8 is a flow chart illustrating the third process of FIG. 4.
- FIG. 9 is a flow chart illustrating the fourth process relating to presenting the URL list to the user.
- In FIG. 1 we show a typical Internet
search result page 100 that may have been produced from a search inquiry using any of the popular search engines such as AltaVista, Lycos, Excite or any others. It may contain headers andfooter information 102, graphic pictures andanimation 104, and typically contains 106 and 110. It also typically contains other “hot links” (text information URL references 108 with the appropriate supporting logic to allow a user to “click” on the address or phrase and have the browser initiate a call to that location). Depending on how precise the original search arguments were, and how many references exist, the number of pages returned may be small or very large, as noted earlier. - A pictorial representation of the results of a typical search are shown in FIG. 2. The initial search results are shown in the
Initial Page 202.Page 202 shows three network addresses (URLs 1-3). Each URL points at (or links to) a different page. The page at URL-1 points topage 204; the page at URL-2 points atpage 210; and the page at URL-3 points atpage 206. Each of these pages, comprise URLs that identify pages that link to other pages. -
202, 206 and 208 are all within the same site; whereasPages 204, 210 and 212 are in other sites. Pages point to other pages, with URLs that again point to other pages, and often loop back to pages already referenced.pages - FIG. 2 can be thought of as a tree, with a “root” (the first page found, page 202) and “branches” (the next layer of pages that
page 202 references, 206, 204, and 210), with more branches that each of these pages reference, etc.pages - Referring to FIG. 3, there is shown an illustration of the software and hardware configuration for an
end user system 300 according to an aspect of the invention. Thesystem 300 comprises a plurality of software applications 302 including aclient application 304 operating in accordance with the invention. Theconfiguration 300 also includes aWindows 95Operating System 306 comprising a 32-bit shell 308 and awindows core 310 for interacting with the applications programs. TheWindows Operating System 306 comprises avirtual machine manager 312, an installable file manager system/I/O support/Winsock Support module 314, and a configuration manager. TheOperating System 306 also includes auniversal driver 318 for interacting with various device drivers 320, each provided by an OEM (original equipment manufacturer) each for driving a plurality of OEM devices 322 (e.g., a printer, CD ROM drive, and communications card). Other conventional hardware and software components for information handling systems is included but not shown, for purposes of simplicity. - In a method according to one aspect of the invention, the
end user application 304 allows for standard search classifications and operators. This includes any terms, Boolean operators such as AND, OR, NOT, NOR, etc; and also allows a “starting location” parameter. Theapplication 304 includes program instructions for performing any of various methods according to the invention. - For simplicity, the application is shown as a client application running on a
Windows 95 system. However, the application could run as a client or a server, and on any operating system such as Windows, Netware, UNIX, or IBM OS/2, since all modern operating systems have the ability for applications to pass messages among the applications they support. In another embodiment the functionality of the invention or part thereof could be implement in the browser 305 or the search engine. - In FIG. 4 we show a simplified flowchart illustrating a method for performing a search according to the invention. The method comprises four
400, 500, 600, and 700.principal operations Operation 400 comprises various steps (see FIGS. 5 and 6) for generating a search argument to be sent to a search engine.Operation 500 relates to determining the parameters of a tree search.Operation 600 relates to building the search tree.Operation 700 relates to presenting the user with a choice of a verbose (full tree) list of search results or a list of root search results only. The chart shows that the process may proceed fromoperation 500 directly to operation 700 (if no tree search is selected) or may proceed to operating 600 and thenoperation 500 and on tooperation 700. - Referring to FIG. 5, there is shown detail of the process of
block 400. Instep 402, the end user using anapplication program 304 fills in the search criteria, selects one or more operators, and also fills in the search location. A decision 404 is then made to determine whether the search should be restricted to specific domains or locations. If it is not restricted, instep 406, the application formats a typical browser search argument, using the parameters that the given search engine supports, and withholding the “required location” parameter. The “required location” parameter is an option for the user to limit (or restrict) the search to a given site or set of sites. If the search is restricted, then instep 408, the domain filters are stored for later use, and the process continues atstep 406. Afterstep 406, theapplication 304 sends the argument to the browser (step 410) and the browser sends the search message to the search engine in a normal manner (step 412). - When the search engine returns the results, (step 414), the browser passes these results to the client application 304 (step 416), and the
application 304 reads and (optionally) categorizes the returned URLs. This will produce a set of clustered URLs. These search results comprise what we will call a basis list of addresses or URLs. - Referring to FIG. 6, the
process 400 continues atstep 420. Theapplication 304 stores (step 420) the list of URLs, and then determines (step 422) whether there are more search result pages. If there are, the application instructs the browser to get the next page (step 424), and continues until there are no additional pages of results. Then when there are no further pages of results, instep 426, the stored domain (location) restriction list is retrieved and the URLs not on the list are discarded. - Referring to FIG. 7, the
process 500 begins atdecision 502 wherein a determination is made to establish whether client categorization has been requested. If it has, then step 504 orders the search results by URL group (.com, .org, etc), by name within the group, and by most senior URL to least senior URL within the name. We define most senior as the page with the least number of additional slashes (/) after the “.com” or other qualifier. - In
step 506 the user receives the grouped search results and is allowed to select/deselect URLs for the next step. This step occurs in response to a negative determination indecision 502 or directly followsstep 504. Thus, the application 304 (optionally) displays the clustered list and allows the user to select/deselect clusters ordered to continue to operate upon. For instance, the user could command the application to discard the “gov” clusters and the clusters of the form “.org” since the user may not be looking for information from the government or organizations at this time. Indecision 508, a determination is made as to whether a tree search has been selected. If not, the process continues atstep 700 in FIG. 9. If a tree search has been selected, the process is continued atstep 510. - In
step 510, the application 304 prompts the user for “N”, the length of the chain of links to be used in the tree search, and for the search arguments to be used in the tree search.
- Operation 600 provides the tree search. The first time through, we use the list of URLs produced in operation 500 as our tree list. In steps 602-606, we use the tree list of URLs, examine the page associated with each URL on the list, and add any new URLs contained on those pages to the tree. Step 606 is the decision step where we determine whether we have exhausted the list of URLs we began with in step 602, or whether there are more uninspected pages associated with the list.
- In
step 608, the search tree is examined for duplicate links to the same page, and these duplicates are eliminated from the list. The process continues at step 610. In step 610, we determine whether we have completed N iterations, that is, followed the URLs to a chain N deep. If we have not, we continue the process in step 602 and follow each of the URLs on the tree for one more step. This increases the chain length by one. If in step 610 we conclude that we have completed the chain of length N, a URL list is produced in step 612. In step 614, we examine the pages referenced on this list to see if they meet the search criteria. We reject those pages not meeting the search criteria and output the list of URLs that do meet the search criteria. This list is output to operation 700 (FIG. 4), with the details shown in FIG. 9. Referring to FIG. 9, operation 700 begins with a decision 702, wherein it is determined whether the user has selected a verbose list. If yes, then in step 704 all links resulting from the search tree are shown. If not, then in step 706 only the most senior links are shown in the list of references (this is called a “terse” list).
- It is important to note that throughout this process we have contained all of the processing logic within this application. There has not been any impact on the established search engines. Once the original search is returned, the client application uses the browser functions to call in the referenced pages and scan for the embedded hot links. The pruning is invisible to the search engines and the browser.
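As an illustration only, the tree search of operation 600 and the verbose/terse choice of operation 700 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application 304: the names `tree_search`, `fetch_links`, `meets_criteria`, `seniority`, `listing`, and `WEB` are hypothetical, page retrieval is replaced by an in-memory link graph, the search criteria are reduced to a caller-supplied predicate, and a "terse" list is assumed to keep the most senior link per host name.

```python
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlsplit


def tree_search(basis: Iterable[str],
                fetch_links: Callable[[str], Iterable[str]],
                meets_criteria: Callable[[str], bool],
                n: int) -> List[str]:
    """Follow links n levels deep from the basis list, then filter the result."""
    tree: List[str] = []
    seen: Set[str] = set()
    for url in basis:                       # tree list produced by operation 500
        if url not in seen:
            seen.add(url)
            tree.append(url)
    uninspected = list(tree)
    for _ in range(n):                      # step 610: stop after n iterations
        found: List[str] = []
        for url in uninspected:             # steps 602-606: inspect each page
            for link in fetch_links(url):
                if link not in seen:        # step 608: drop duplicate links
                    seen.add(link)
                    found.append(link)
        tree.extend(found)
        uninspected = found                 # only new pages need inspecting next
    return [u for u in tree if meets_criteria(u)]   # steps 612-614


def seniority(url: str) -> int:
    """Count slashes after the host name; fewer slashes means more senior.

    Assumption: a trailing slash is not counted.
    """
    return urlsplit(url).path.rstrip("/").count("/")


def listing(urls: List[str], verbose: bool) -> List[str]:
    """Verbose: every link (step 704). Terse: most senior per host (step 706)."""
    if verbose:
        return list(urls)
    best: Dict[str, int] = {}
    for u in urls:
        host = urlsplit(u).netloc
        best[host] = min(best.get(host, seniority(u)), seniority(u))
    return [u for u in urls if seniority(u) == best[urlsplit(u).netloc]]


# Hypothetical link graph standing in for real page fetches by the browser.
WEB = {
    "http://a.com/": ["http://a.com/docs/x", "http://b.org/"],
    "http://a.com/docs/x": ["http://c.com/deep/page", "http://a.com/"],
    "http://b.org/": ["http://a.com/docs/x"],
}
hits = tree_search(["http://a.com/"], lambda u: WEB.get(u, []), lambda u: True, n=2)
print(listing(hits, verbose=False))
```

Because duplicates are removed as they are found, each pass in this sketch inspects only the pages added in the previous pass; pages inspected earlier cannot contribute new links, so the resulting list is the same as re-walking the whole tree.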
- An alternative embodiment is to run this process on a server, with the client using typical browser access to this server. This would allow a processor much more powerful than that of a typical client system to perform the multiple passes of searching and pruning. Servers are typically larger and faster in terms of processor speed, multiple processor architectures, RAM, and caching, and they also typically have a much faster connection to the network than the typical client on a LAN or a dial-up connection. This approach creates a client-server-server model, with the targeted search engine being the last server in the chain.
- Another alternative implementation would be to install this method on the targeted search engine itself (e.g. Yahoo, AltaVista, Excite, etc.) and use normal browser access. This would allow the search engine to return a more meaningful list of hits initially, with no additional search requests required, assuming the original search arguments have not changed.
- This methodology allows the user a great deal of flexibility in how a search is conducted. For example, if a search is first undertaken without using this application, and a reasonably small number of hits is returned, the search data can be examined the same way it is today, essentially by serial examination of each page URL returned. If a large number of hits is returned, the original search arguments could be given to the new application, either by reentering them or by using the clipboard function (or its equivalent), and then running the application to reduce the number of hits while increasing the relevancy of those finally returned.
- This invention supports either extending the search to reach all the URLs reachable from a given URL or restricting the search to a specific targeted location.
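The restriction to a targeted location (the domain filters stored in step 408 and applied in step 426) could be sketched in Python along the following lines. This is a hedged illustration, not the method of the application 304: the function names `in_location` and `restrict` are hypothetical, URLs are assumed to be absolute (with a scheme), and a location is assumed to match a host name either exactly or as a parent domain.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def in_location(url: str, locations: Iterable[str]) -> bool:
    """True if the URL's host equals, or is a subdomain of, any location."""
    host = (urlsplit(url).hostname or "").lower()
    for loc in locations:
        loc = loc.lower().lstrip(".")       # accept "ibm.com" or ".org"
        if host == loc or host.endswith("." + loc):
            return True
    return False


def restrict(urls: Iterable[str], locations: Iterable[str]) -> List[str]:
    """Discard URLs that are not on the stored restriction list."""
    wanted = list(locations)
    if not wanted:                          # no restriction was requested
        return list(urls)
    return [u for u in urls if in_location(u, wanted)]


# Hypothetical basis list used only to exercise the filter.
basis = ["http://www.ibm.com/a", "http://example.org/", "http://notibm.com/"]
print(restrict(basis, ["ibm.com"]))
```

Matching on a leading dot boundary keeps a host such as `notibm.com` from passing a filter for `ibm.com`; an empty restriction list leaves the basis list untouched, which corresponds to the unrestricted case.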
- Although a specific embodiment of the invention has been disclosed, it will be understood by those having skill in the art that changes can be made to this specific embodiment without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiment, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.
Claims (36)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/368,110 US6397210B1 (en) | 1999-08-04 | 1999-08-04 | Network interactive tree search method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/368,110 US6397210B1 (en) | 1999-08-04 | 1999-08-04 | Network interactive tree search method and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20020023073A1 (en) | 2002-02-21 |
| US6397210B1 (en) | 2002-05-28 |
Family
ID=23449886
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/368,110 Expired - Fee Related US6397210B1 (en) | 1999-08-04 | 1999-08-04 | Network interactive tree search method and system |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US6397210B1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060059126A1 (en) * | 2004-09-16 | 2006-03-16 | International Business Machines Corporation | System and method for network searching |
| US20080183720A1 (en) * | 2005-10-27 | 2008-07-31 | Douglas Stuart Brown | Systems, Methods, and Media for Dynamically Generating a Portal Site Map |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3698242B2 (en) * | 1999-08-20 | 2005-09-21 | 日本電気株式会社 | Information set importance determination system and method, and recording medium recording information set importance determination program |
| US6732086B2 (en) * | 1999-09-07 | 2004-05-04 | International Business Machines Corporation | Method for listing search results when performing a search in a network |
| US6983311B1 (en) * | 1999-10-19 | 2006-01-03 | Netzero, Inc. | Access to internet search capabilities |
| JP2001175528A (en) * | 1999-12-21 | 2001-06-29 | Tokyo Kikai Seisakusho Ltd | System and method for providing information |
| GB0004578D0 (en) * | 2000-02-25 | 2000-04-19 | Xrefer Com Limited | Automated data cross-referencing method |
| US6789076B1 (en) * | 2000-05-11 | 2004-09-07 | International Business Machines Corp. | System, method and program for augmenting information retrieval in a client/server network using client-side searching |
| US7155491B1 (en) * | 2000-11-13 | 2006-12-26 | Websidestory, Inc. | Indirect address rewriting |
| US7823057B1 (en) | 2001-01-04 | 2010-10-26 | Adobe Systems Incorporated | Simplified document creation |
| US7058644B2 (en) * | 2002-10-07 | 2006-06-06 | Click Commerce, Inc. | Parallel tree searches for matching multiple, hierarchical data structures |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5920859A (en) * | 1997-02-05 | 1999-07-06 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
| US6144962A (en) * | 1996-10-15 | 2000-11-07 | Mercury Interactive Corporation | Visualization of web sites and hierarchical data structures |
| US6112202A (en) * | 1997-03-07 | 2000-08-29 | International Business Machines Corporation | Method and system for identifying authoritative information resources in an environment with content-based links between information resources |
| US6101503A (en) * | 1998-03-02 | 2000-08-08 | International Business Machines Corp. | Active markup--a system and method for navigating through text collections |
| US6138113A (en) * | 1998-08-10 | 2000-10-24 | Altavista Company | Method for identifying near duplicate pages in a hyperlinked database |
- 1999-08-04: US application US09/368,110, patent US6397210B1 (en), not active, Expired - Fee Related
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060059126A1 (en) * | 2004-09-16 | 2006-03-16 | International Business Machines Corporation | System and method for network searching |
| US7490082B2 (en) * | 2004-09-16 | 2009-02-10 | International Business Machines Corporation | System and method for searching internet domains |
| US20080183720A1 (en) * | 2005-10-27 | 2008-07-31 | Douglas Stuart Brown | Systems, Methods, and Media for Dynamically Generating a Portal Site Map |
| US8326837B2 (en) * | 2005-10-27 | 2012-12-04 | International Business Machines Corporation | Dynamically generating a portal site map |
Also Published As
| Publication number | Publication date |
|---|---|
| US6397210B1 (en) | 2002-05-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6397218B1 (en) | Network interactive search engine server and method | |
| US7702811B2 (en) | Method and apparatus for marking of web page portions for revisiting the marked portions | |
| US6230196B1 (en) | Generation of smart HTML anchors in dynamic web page creation | |
| US6453342B1 (en) | Method and apparatus for selective caching and cleaning of history pages for web browsers | |
| US7293012B1 (en) | Friendly URLs | |
| US6222634B1 (en) | Apparatus and method for printing related web pages | |
| US7660781B2 (en) | Method, apparatus and computer-readable medium for searching and navigating a document database | |
| US6480853B1 (en) | Systems, methods and computer program products for performing internet searches utilizing bookmarks | |
| US8250050B2 (en) | Systems and methods for managing database authentication and sessions | |
| US7058644B2 (en) | Parallel tree searches for matching multiple, hierarchical data structures | |
| JP3924102B2 (en) | Method for customizing file and information processing system | |
| US6433794B1 (en) | Method and apparatus for selecting a java virtual machine for use with a browser | |
| EP1428139B1 (en) | System and method for extracting content for submission to a search engine | |
| US7865494B2 (en) | Personalized indexing and searching for information in a distributed data processing system | |
| US6408316B1 (en) | Bookmark set creation according to user selection of selected pages satisfying a search condition | |
| CN1151457C (en) | System and method for sharing search engine query based on World Wide Web | |
| US6941552B1 (en) | Method and apparatus to retain applet security privileges outside of the Java virtual machine | |
| US6308210B1 (en) | Method and apparatus for traffic control and balancing for an internet site | |
| US20010011365A1 (en) | Method and apparatus for passively browsing the internet | |
| US6963901B1 (en) | Cooperative browsers using browser information contained in an e-mail message for re-configuring | |
| JP2001043244A (en) | Method and apparatus for implementing a search selection tool on a browser | |
| US20020143861A1 (en) | Method and apparatus for managing state information in a network data processing system | |
| US7275086B1 (en) | System and method for embedding a context-sensitive web portal in a computer application | |
| US6397210B1 (en) | Network interactive tree search method and system | |
| US20060080612A1 (en) | Dynamic portlet tabbing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:STERN, EDITH H.;DUNN, JAMES M.;WILLNER, BARRY E.;REEL/FRAME:010158/0537;SIGNING DATES FROM 19990802 TO 19990803 |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | REMI | Maintenance fee reminder mailed | |
| | LAPS | Lapse for failure to pay maintenance fees | |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20140528 |