HK1173525B - Propagating signals across a web graph

Info

Publication number: HK1173525B
Application number: HK13100620.3A
Authority: HK
Inventors: William Finley Thomas; De Melo Duarte Herbert; Middha Bhuvan; Qi Dehu; Holt Gibbs Tanton; Muthukrishnan Sambavi
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2011-02-18
Filing date: 2013-01-15
Publication date: 2018-04-27

Description

Propagating signals across a Web graph

Background

A search engine or search website generates a set of search results that are responsive to a search query. Search engines attempt to select the most responsive documents, videos, pictures, and web pages to include in search results. The search engine matches terms in the query with terms associated with a web page to determine whether the web page matches the search query. The search engine may then rank the matching web pages according to responsiveness and display the most responsive search results.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in isolation as an aid in determining the scope of the claimed subject matter.

Embodiments of the present invention are generally directed to a method of propagating signals across a web graph. A signal describes a file or otherwise provides useful information about a file in the web graph. A web graph is a collection of files that are related to each other. The files may be related to each other by links, such as hyperlinks. For example, a web page may be related to other web pages connected by hyperlinks. A signal is propagated in the sense that information from one document is associated with the description of a related document (the target document). This information may not be directly found in the target file. The search engine may use this information to determine that the target document is relevant to the search query.

Drawings

Embodiments of the invention will be described in detail hereinafter with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of an exemplary computing environment suitable for implementing embodiments of the present invention;

FIG. 2 is a diagram of a computing system architecture suitable for propagating signals across a web graph, according to an embodiment of the invention;

FIG. 3 is a diagram illustrating relationships between files according to an embodiment of the invention;

FIG. 4 is a table illustrating contents within a file index according to an embodiment of the invention;

FIG. 5 is a table illustrating contents within an anchor stream according to an embodiment of the present invention;

FIG. 6 is a flow diagram illustrating a method of adding terms from related documents to a document description of a document in accordance with an embodiment of the present invention;

FIG. 7 is a flow diagram illustrating a method of associating terms from related documents with document descriptions of documents in accordance with an embodiment of the present invention; and

FIG. 8 is a flow diagram illustrating a method for presenting search results using an anchor stream generated by propagating terms between files related by links, according to an embodiment of the present invention.

Detailed Description

The subject matter of embodiments of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

Embodiments of the present invention are generally directed to a method of propagating signals across a web graph. A signal describes a file or otherwise provides useful information about a file in the web graph. A web graph is a collection of files that are related to each other. The files may be related to each other by links, such as hyperlinks. For example, a web page may be related to other web pages connected by hyperlinks. A signal is propagated in the sense that information from one document is associated with the description of a related document (the target document). This information may not be found in the target file. The search engine may use this information to determine that the target document is relevant to the search query.

Thus, in one embodiment, one or more computer-readable storage media are provided having computer-executable instructions embodied thereon that, when executed by a computing device, perform a method of adding terms from a related document to a document description of a target document. The method comprises determining that a term found in the related document does not match filter criteria, wherein terms matching the filter criteria are not added to the document description of the target document. The document description of the target document includes terms within a plurality of signal streams associated with the target document. The method further comprises calculating a similarity score for the term. The similarity score is based on the cosine similarity between the target document and the related document. The method further comprises calculating a source credibility score for the term based on the static ranking of the related document. The static ranking is based on the individual popularity score of the related document. The method further comprises calculating a corroboration score for the term based on the similarity between the term and a term used in a link to the related document. The method further comprises calculating a uniqueness score for the term based on whether the term is currently associated with the document description through other sources. The method further comprises calculating a term score for the term based on the similarity score, the source credibility score, the corroboration score, and the uniqueness score. The method further comprises associating the term with the document description because the term score is above a threshold score.

In another embodiment, a method is provided for associating terms from a related document with a document description of a target document, wherein the related document is related to the target document by a forward-link or backward-link relationship. The document description is used to determine whether the target document should be returned as a search result in response to a query. The method comprises calculating a similarity score for the term. The similarity score is based on the similarity between the target document and the related document. The method further comprises calculating a source credibility score for the term based on the static ranking of the related document. The static ranking is based on the individual popularity score of the related document. The method further comprises calculating a corroboration score for the term based on the similarity between the term and a term used in a link from another document to the related document. The method further comprises calculating a uniqueness score for the term based on whether the term is currently associated with the document description through other sources. The method further comprises calculating a term score for the term based on a weighted combination of the similarity score, the source credibility score, the corroboration score, and the uniqueness score. The method further comprises associating the term with the document description because the term score is above a threshold score.

In an embodiment, one or more computer-readable storage media are provided having computer-executable instructions embodied thereon that, when executed by a computing device, perform a method of presenting search results using an anchor stream generated by propagating terms between files related by links. The method comprises the following steps: a search query composed of one or more terms is received. The method further comprises the following steps: because at least one of the one or more terms is associated with a target document in the anchor stream, the target document is determined to match the search query. As used in this application, the term anchor stream refers to either a forward-looking anchor stream or a backward-looking anchor stream. Terms that are not included in the target document are associated with the target document by the anchor stream because these terms are included in the related document and are determined to be related to the target document. The related files are linked to or from the target file. The method further comprises the following steps: presenting the target file as a search result in response to a search query.

Having briefly described an overview of embodiments of the present invention, an exemplary operating environment suitable for implementing embodiments of the present invention is described below.

Exemplary Operating Environment

Referring to the drawings in general, and initially to FIG. 1 in particular, an exemplary operating environment for implementing embodiments of the present invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.

The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a server. Generally, program components including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including general-purpose computers, specialty computing devices, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

With continued reference to FIG. 1, the computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation components 116, input/output (I/O) ports 118, I/O components 120, and an illustrative power supply 122. Bus 110 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, a presentation component such as a display device may be considered an I/O component 120. Also, processors have memory. The inventors hereof recognize that such is the nature of the art, and reiterate that the diagram of FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. No distinction is made between categories such as "workstation," "server," "laptop," "handheld device," etc., as these are all contemplated and referred to as "computer" or "computing device" within the scope of FIG. 1.

Computing device 100 typically includes a variety of computer-readable storage media. By way of example, computer storage media may include: Random Access Memory (RAM); Read Only Memory (ROM); Electrically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technology; Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD), or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage devices, or other magnetic storage devices; or any other medium that can be used to encode desired information and be accessed by computing device 100. The computer-readable storage media may be non-transitory.

The memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory 112 may be removable, non-removable, or a combination thereof. Exemplary memory includes solid state memory, hard drives, optical drives, and the like. Computing device 100 includes one or more processors 114 that read data from various entities such as bus 110, memory 112, or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components 116 include a display device, speakers, a printing component, a vibrating component, and the like. I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which are built-in. Illustrative I/O components 120 include a microphone, satellite dish, scanner, printer, wireless device, and the like.

Exemplary System Architecture

Turning now to FIG. 2, an exemplary computing system architecture 200 suitable for adding terms from related documents to a document description of a target document in accordance with embodiments of the present invention is provided. The computing system architecture 200 shown in FIG. 2 is an example of one suitable computing system architecture. The computing system architecture 200 runs on one or more computing devices similar to the computing device 100 described with reference to FIG. 1. Neither should the computing system architecture 200 be interpreted as having any dependency or requirement relating to any single module/component or combination of modules/components illustrated herein.

The computing system architecture 200 includes a search site 205 (alternatively described as a search engine), a search interface 210, a web graph data store 215, a forward anchor stream component 220, a backward anchor stream component 225, a stream store component 230, and a file index 235.

The search site 205 provides search results responsive to queries submitted by users. The search site 205 may be accessed by navigating to a URL associated with a search interface. The search site 205 displays one or more search results responsive to the search query on a user interface. The search site 205 may also provide advertisements and other features related to the search query. The search site 205 may include a web crawler that traverses one or more computer networks and catalogs the files it encounters. These files may be indexed for comparison with a search query. Relationships between files may be stored in a web graph. Links between files may be the basis for relationships.

The search interface component 210 generates an interface through which a search query may be submitted by a user. The search interface component 210 also presents a search results interface and/or other search features. The search interface component 210 may allow a user to set preferences, modify user profiles, log in, and otherwise facilitate the transfer of information between the user and the search site 205. In one embodiment, the search interface component 210 provides different categories, or verticals, that the user can select for searching. For example, the search interface component 210 may allow a user to search for travel, shopping information, maps, books, or other categories of information. By selecting one of these categories, the user may limit the search results to those that fit the selected category.

The web graph data store 215 stores one or more web graphs. Web graphs describe the relationships between files. These relationships are established by links. As used throughout, the individual file being analyzed is described as the target file. Other files may link to the target file, and the target file may link to other files. In some cases, an individual file may link to the target file and the target file may link back to that individual file. A visual depiction of the relationships between files is shown in FIG. 3.

Turning now to FIG. 3, relationships between files in a web graph 300 are visually illustrated, in accordance with an embodiment of the present invention. FIG. 3 illustrates a simple set of relationships between a target file 310 and other files. An actual web graph of files may contain thousands of relationships between individual files. The simple illustration shown in FIG. 3 is not intended to be limiting, but merely to illustrate aspects of the relationships between related files. In one embodiment, the web graph 300 describes the relationship between files published over the Internet. A web page is an example of such a file.

The web graph 300 includes a target file 310. The target file includes text 312, link 314, link 316, and link 318. Target file 310 may be a web page. The link 314 is associated with the link text "spreadsheet help" and links to the linked file A 320. The link 316 is associated with the link text "database help" and links to the linked file B 322. The link 318 is associated with the link text "email help" and links to the linked file C 324. Each of links 314, 316, and 318 is depicted as a forward link with reference to target file 310.

Target file 310 is also associated with other files by backward links. The backward links may be determined through the web graph 300. Target file 310 itself may not contain information indicating that one or more files link to target file 310. The files linked to the target file 310 by backward links include linked file D 326, linked file A 320, and linked file E 328. As can be seen, a single file (i.e., linked file A 320) may be linked to the target file 310 by both forward and backward links.

As will be described in more detail later, information from related files 320, 322, and 324 may be associated with target file 310 within the forward-looking anchor stream. Information from related files 326, 320, and 328 may be associated with target file 310 through the backward-looking anchor stream.
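The forward and backward relationships of FIG. 3 can be sketched as a small adjacency structure. This is only an illustrative sketch: the dictionary layout, the labels such as "target" and "A", and the helper name invert_links are assumptions made for the example and are not taken from the described embodiments.

```python
from collections import defaultdict

# Forward links for the FIG. 3 example: source file -> files it links to.
# The labels are illustrative stand-ins for files 310 and 320 through 328.
forward_links = {
    "target": ["A", "B", "C"],  # links 314, 316, and 318
    "A": ["target"],
    "D": ["target"],
    "E": ["target"],
}


def invert_links(forward):
    """Derive backward links (file -> files linking to it) from forward links."""
    backward = defaultdict(list)
    for source, destinations in forward.items():
        for destination in destinations:
            backward[destination].append(source)
    return dict(backward)


backward_links = invert_links(forward_links)

# Files that could feed the forward-looking anchor stream of the target.
forward_related = set(forward_links["target"])
# Files that could feed the backward-looking anchor stream of the target.
backward_related = set(backward_links["target"])
# Files related in both directions, like linked file A in FIG. 3.
both_directions = forward_related & backward_related
```

Deriving the backward links from the graph, rather than from the target file, mirrors the point that the target file itself may hold no record of which files link to it.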

Returning to FIG. 2, the forward anchor stream component 220 analyzes the web graph and constructs a forward-looking anchor stream. The forward-looking anchor stream includes a number of terms. Each term within the anchor stream may be associated with one or more files within the web graph. The forward anchor stream component 220 uses a number of criteria to determine whether terms in a related file should be associated with the target file. Again, for the forward-looking anchor stream, related files are those associated with the target file by a forward link. Criteria for determining whether a term should be associated with a target document are described within the description of FIG. 6.

The backward anchor stream component 225 constructs a backward-looking anchor stream. The backward-looking anchor stream is similar to the forward-looking anchor stream previously described, except that terms from files related to the target file by a backward relationship are associated with the target file. The criteria used to associate terms with the target file may be similar to those used to generate the forward-looking anchor stream. In one embodiment, the weights given to the different criteria in determining whether to associate a term with a target file differ depending on whether a forward-looking or a backward-looking anchor stream is being constructed.

The stream store component 230 stores forward-looking anchor streams and backward-looking anchor streams. As used in this application, the term anchor stream refers to either a forward-looking anchor stream or a backward-looking anchor stream. Additional signal streams may be stored in the stream store component 230. Examples of additional signal streams include a content signal stream, a header signal stream, a metadata signal stream, a click signal stream, and a spam-rank signal stream. The header signal stream associates terms used in the file header with the file. The content signal stream associates terms used within the content of the file with the file. The click signal stream may associate terms within a search query with a document that was clicked on after being presented in response to the search query.

The file index 235 stores additional information about files contained within the web graph. Information that may be included within the file index is illustrated in FIG. 4.

Turning now to FIG. 4, an exemplary file index 400 is illustrated, in accordance with embodiments of the present invention. Exemplary file index 400 includes a column 410 of file names, file 1, file 2, file 3, through file N. Column 420 includes a file ID associated with the file and column 430 includes a file address or location associated with the file. This is a simplified document index and the actual document index may include additional information such as the publication date, author, and other such information of the document.

Turning now to FIG. 5, an exemplary signal stream is illustrated, in accordance with an embodiment of the present invention. The signal stream 500 includes a column of terms 510 and a column of document IDs 520. Each term is associated with one or more document IDs. For example, the first term 512 "spreadsheet" is associated with a first plurality of document IDs 522. The second term 514 "database" is associated with a different plurality of document IDs 524. The third term 516 "motion" is associated with a third plurality of document IDs 526. The search engine may use the signal stream to match terms in the query with terms associated with documents by document ID. In one embodiment, the search engine uses a single signal stream to determine whether a document matches a query. In other words, terms within the query must be associated with the matching document through a single signal stream. In another embodiment, a document is determined to match a search query if terms in the search query are associated with the document in one or more of the signal streams.
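The two matching embodiments can be illustrated with a minimal sketch in which each signal stream is a mapping from a term to a set of document IDs. The stream contents, the document IDs, and the function names below are hypothetical, and requiring every query term to be covered is an assumption made for the example.

```python
# Hypothetical signal streams: term -> set of document IDs (as in FIG. 5).
anchor_stream = {"spreadsheet": {1, 2, 3}, "database": {2, 4}}
content_stream = {"help": {2, 5}, "database": {5}}


def matches_single_stream(query_terms, stream, doc_id):
    """True when one stream alone associates every query term with the document."""
    return all(doc_id in stream.get(term, set()) for term in query_terms)


def matches_across_streams(query_terms, streams, doc_id):
    """True when each query term is associated with the document in some stream."""
    return all(
        any(doc_id in stream.get(term, set()) for stream in streams)
        for term in query_terms
    )


query = ["spreadsheet", "help"]
single = matches_single_stream(query, anchor_stream, 2)
combined = matches_across_streams(query, [anchor_stream, content_stream], 2)
```

Here document 2 fails the single-stream test for the anchor stream, because "help" is only associated with it in the content stream, but it matches once the streams are combined.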

Turning now to FIG. 6, a flow diagram is depicted that illustrates a method 600 of adding terms from related documents to a document description of a target document, in accordance with an embodiment of the present invention. As previously described, the target document is the document to which the other documents are related by forward or backward links. Relationships between files may be determined by analyzing a web graph containing the files.

At step 610, it is determined that a term found in the related document does not match the filter criteria. Terms that match the filter criteria are not added to the document description. Thus, a determination that a term found in a related document does not match the filter criteria means that the term has cleared the first hurdle to being associated with the target document. The document description includes a plurality of signals that describe the document. These signals may be stored in one or more signal streams. Each signal stream that includes terms associated with the document may be part of the document description.

In one embodiment, the filter criteria include a list of common words excluded from the signal stream. For example, very common words may not be useful in determining relevance of a target document. Excluding the extremely common words from other analysis prevents the signal stream from being flooded with low value terms.

In another embodiment, the filter criteria excludes terms from all related documents when the target document links to more than a threshold number of other documents. When a target document links to a large number of other documents, the terms found within these other documents may not add too much descriptive value to the target document.

When it is determined that the term does not match the filter criteria, further analysis is performed to determine whether the term should be added to the document description of the target document. Again, a term may be added to the description of a target document by including it in the signal stream and associating it with the target document within the signal stream.
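The two filter embodiments described above can be sketched together as a single check. The word list, the out-link threshold of 100, and the function name passes_filter are assumptions chosen for illustration; the description does not specify particular values, and it presents the two criteria as separate embodiments rather than as one combined test.

```python
# Assumed list of very common words; a real list would be far longer.
COMMON_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in"})
# Assumed cap on how many files a target may link to before its related
# files stop contributing terms.
MAX_OUT_LINKS = 100


def passes_filter(term, target_out_link_count,
                  common_words=COMMON_WORDS, max_out_links=MAX_OUT_LINKS):
    """Return True when the term does not match either filter criterion."""
    if target_out_link_count > max_out_links:
        # The target links to too many files: exclude all related-file terms.
        return False
    return term.lower() not in common_words


candidates = ["The", "spreadsheet", "of", "pivot"]
kept = [term for term in candidates if passes_filter(term, target_out_link_count=3)]
```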

At step 620, a similarity score is calculated for the term. The similarity score may be based on cosine similarity between the target document and the related document. Cosine similarity measures similarity between terms or documents by calculating the cosine of the angle between vectors describing the terms or documents. When the angle is small, the vectors point in a similar direction and the terms or documents associated with the vectors are similar. In general, greater similarity between documents suggests that terms in the related document are more likely to be related to the target document. For example, terms from a similar document may be another way of describing content in the target document.
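A minimal sketch of cosine similarity between two documents follows. Representing each document as a bag of lowercase, whitespace-separated words with raw counts is an assumption made for the example; the description does not say how the document vectors are built.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(counts_a[word] * counts_b[word]
              for word in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


similar = cosine_similarity("spreadsheet help", "spreadsheet formulas")
unrelated = cosine_similarity("spreadsheet help", "email filters")
```

With this representation the score ranges from 0 (no shared words) to 1 (identical word distributions), so a larger value corresponds to a smaller angle between the vectors.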

At step 630, a corroboration score for the term is calculated based on the similarity between the term and a term used in a link to the related document. A strong relationship or similarity between the term used in the link and the term being evaluated indicates that the term being evaluated is closely associated with the target document.

At step 640, a uniqueness score is calculated for the term based on whether the term is currently associated with the document description through other sources. Other sources may be other signal streams. In one embodiment, when the term is unique across all signal streams associated with the document, the uniqueness suggests that the term should be included in the document description. Including unique terms in the document description may allow the target document to be returned as a search result when it would otherwise not be associated with terms in the search query.
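The uniqueness check can be sketched as a lookup across the other signal streams. Treating the score as binary (1.0 when no other stream carries the term for the document, otherwise 0.0) is an assumption for illustration; the description only says the score is based on whether the term is already associated through other sources. The stream contents and names are hypothetical.

```python
def uniqueness_score(term, doc_id, other_streams):
    """Return 1.0 if no other stream associates the term with the document."""
    already_present = any(
        doc_id in stream.get(term, set()) for stream in other_streams
    )
    return 0.0 if already_present else 1.0


# Hypothetical streams: term -> set of document IDs.
content_stream = {"spreadsheet": {7}, "help": {7, 9}}
click_stream = {"excel": {9}}

new_term = uniqueness_score("pivot", 7, [content_stream, click_stream])
known_term = uniqueness_score("help", 7, [content_stream, click_stream])
```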

At step 650, a term score for the term is calculated based on the similarity score, the source credibility score, the corroboration score, and the uniqueness score. In one embodiment, the various scores may not be given equal weight in the term score calculation. In one embodiment, the weighting given to the different scores is determined by a machine learning algorithm. The machine learning algorithm may be trained with human training data that indicates whether a particular term from the related document is actually related to the target document. The machine learning algorithm then analyzes the various scores and combines them with a weighting that produces an accurate reflection of the relevance of the terms to the target document.
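One simple way to combine the four scores is a weighted sum, sketched below. The weight values are placeholders invented for the example; in the embodiment described above they would come from a trained machine learning algorithm, and nothing here reflects actual learned values. The names TermSignals and term_score are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSignals:
    similarity: float
    source_credibility: float
    corroboration: float
    uniqueness: float


# Placeholder weights (assumed, not learned), reusing the same four fields.
ASSUMED_WEIGHTS = TermSignals(
    similarity=0.4, source_credibility=0.2, corroboration=0.3, uniqueness=0.1
)


def term_score(signals, weights=ASSUMED_WEIGHTS):
    """Weighted sum of the four per-term scores."""
    return (
        signals.similarity * weights.similarity
        + signals.source_credibility * weights.source_credibility
        + signals.corroboration * weights.corroboration
        + signals.uniqueness * weights.uniqueness
    )


score = term_score(TermSignals(0.5, 1.0, 0.0, 1.0))
```

Passing a different weights argument allows separate weightings for forward-looking and backward-looking anchor streams.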

At step 660, the term is associated with the document description because the term score is above the threshold score. The threshold score may be a static number. For example, on a scale of 1 to 10, the threshold score may be 8, and any term with a term score higher than 8 is then associated with the document description of the target document. In another embodiment, the threshold score is dynamic, based on the total number of terms evaluated for a particular target document. For example, the top 20 terms may be included in the document description of the target document. In this example, the threshold score would be the score of the 20th-ranked term, and terms ranked below the 20th term would not be included in the document description.
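The static and dynamic thresholds can be sketched as two selection functions over a mapping of terms to term scores. The function names and sample scores are hypothetical, and breaking ties in the top-N case by input order is an assumption, since the description does not address ties.

```python
def select_static(scored_terms, threshold=8.0):
    """Keep terms whose score is strictly higher than a fixed threshold."""
    return {term for term, score in scored_terms.items() if score > threshold}


def select_top_n(scored_terms, n=20):
    """Keep the n highest-scoring terms (a dynamic threshold)."""
    ranked = sorted(scored_terms.items(), key=lambda item: item[1], reverse=True)
    return {term for term, _ in ranked[:n]}


# Hypothetical term scores on a scale of 1 to 10.
scores = {"pivot": 9.0, "formula": 8.5, "macro": 7.0, "ribbon": 3.0}
static_choice = select_static(scores)
dynamic_choice = select_top_n(scores, n=3)
```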

Turning now to FIG. 7, a method 700 of associating terms from a related document with a document description of a target document is described, in accordance with an embodiment of the present invention. As previously described, the related documents are related to the target document by either a forward or backward linking relationship. The file description is used to determine whether the target file should be returned as a search result in response to a query.

Determining whether a file should be returned as a search result in response to a query may be a two-part analysis. The first part may be to determine whether the target document matches the search query. The second part may be to determine the degree of relevance of the target document to the search query. Some files may be more relevant than others. In other words, not all files that match the search query may be presented simply because they match the search query. The search engine tries to present the most relevant documents. The document description may be used for both match determination and relevance determination.

As previously described, the document description may include multiple signal streams. In particular, the signal streams may include forward-looking anchor streams and backward-looking anchor streams. The forward-looking anchor stream associates terms from forward-related files with the target file. The backward-looking anchor stream associates terms from files that link to the target file with the target file. As used in this application, the term anchor stream refers to either a forward-looking anchor stream or a backward-looking anchor stream.

At step 710, a similarity score for the term is calculated. The similarity score is based on the similarity between the target document and the related document. The similarity between these documents can be determined by calculating their cosine similarity. As previously mentioned, similarity between the documents suggests that the term should be included in the document description.

At step 720, a source credibility score for the term is calculated based on the static ranking of the related documents. The static ranking is based on the individual popularity scores of the related documents. The popularity score for the related document may be generated based on the number of other pages linked to the related document. The static ranking and popularity score may also be based on traffic to other files or other factors.
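As a rough sketch, a popularity score based on the number of pages linking to each document could be computed from backward links as follows. Normalizing by the largest in-link count and using the result directly as the source credibility score are assumptions for illustration; the description leaves the static ranking formula open and notes that traffic or other factors may also contribute.

```python
def popularity_scores(backward_links):
    """Map each document to its in-link count scaled into the range 0..1."""
    counts = {doc: len(set(sources)) for doc, sources in backward_links.items()}
    highest = max(counts.values(), default=0)
    if highest == 0:
        return {doc: 0.0 for doc in counts}
    return {doc: count / highest for doc, count in counts.items()}


# Hypothetical backward links: document -> documents that link to it.
backward_links = {"A": ["x", "y"], "B": ["x"], "C": []}
credibility = popularity_scores(backward_links)
```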

At step 730, a corroboration score for the term is calculated based on the similarity between the term and a term used in a link from another document to the linked-to document. Links from other files to the linked-to file encompass links found on the target file to other files and links found on other files to the target file. For example, if the term "help" is used in a link and the term being evaluated is "auxiliary," the potential similarity between these terms may result in a high corroboration score. In this case, a high corroboration score indicates that the term should be included in the description of the target document.
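The corroboration step compares the term being evaluated with terms used in link text. The sketch below takes the best match between the evaluated term and any link term, using the character-level ratio from Python's standard difflib module as a stand-in similarity measure. That choice is an assumption: a lexical ratio rewards shared spelling only, so it would not capture synonym pairs of the kind the example above describes, and a production system would presumably use a semantic measure instead. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def corroboration_score(term, link_terms):
    """Best lexical similarity (0..1) between the term and any link term."""
    if not link_terms:
        return 0.0  # No link text is available to corroborate the term.
    return max(
        SequenceMatcher(None, term.lower(), link_term.lower()).ratio()
        for link_term in link_terms
    )


# Link text "spreadsheet help" split into terms.
link_terms = ["spreadsheet", "help"]
exact = corroboration_score("help", link_terms)
close = corroboration_score("helps", link_terms)
far = corroboration_score("xyz", ["help"])
```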

At step 740, a uniqueness score is calculated for a term based on whether the term is currently associated with a document description through other sources. With reference to FIG. 6, the uniqueness score has been previously described. At step 750, a term score for the term is calculated based on a weighted combination of the similarity score, the source credibility score, the corroboration score, and the uniqueness score. At step 760, the term is associated with the document description because the term score is above the threshold score.

Turning now to FIG. 8, a method 800 for presenting search results using anchor streams generated by propagating terms between documents related by links is shown in accordance with an embodiment of the present invention. As previously described, the term anchor stream refers to either a forward-looking anchor stream or a backward-looking anchor stream. At step 810, a search query is received. The search query is composed of one or more terms. The search query may be received through a search interface presented by a search engine over the World Wide Web.

At step 820, the target document is determined to match the search query because at least one term of the one or more terms in the search query is associated with the target document in the anchor stream. The anchor stream may be a forward-looking anchor stream or a backward-looking anchor stream. In one embodiment, all of the threshold number of terms needed to determine that the target document matches the search query are associated with the target document by the anchor stream. In another embodiment, at least one term required to determine that the target document matches the search query is associated with the target document by the anchor stream, and the other required terms are associated with the target document in an additional signal stream. In other words, the signal streams may be used in isolation to determine that a document matches a search query, or they may be used in combination with each other. In one embodiment, the anchor stream associates terms with the target document that are not included in the target document but are determined to be related to the target document using various criteria. As previously described, related documents are related by a link to or from the target document.
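The matching logic of step 820 can be sketched as a set-coverage check. This is an illustrative simplification under stated assumptions: each signal stream is modeled as a set of lowercase terms, `min_terms` stands for the threshold number of query terms that must be matched, and the function name `matches_query` is hypothetical.

```python
from typing import Iterable


def matches_query(query_terms: Iterable[str],
                  anchor_stream: set[str],
                  other_streams: Iterable[set[str]] = (),
                  min_terms: int = 1) -> bool:
    """Decide whether a target document matches a search query.

    At least one query term must come from the anchor stream; the
    remaining required terms may come from the anchor stream or from
    any additional signal stream (for example, a content stream).
    """
    query = {t.lower() for t in query_terms}

    # Union of every term associated with the document by any stream.
    covered = set(anchor_stream)
    for stream in other_streams:
        covered |= stream

    anchor_hits = query & anchor_stream
    all_hits = query & covered
    return bool(anchor_hits) and len(all_hits) >= min_terms
```

Passing no additional streams corresponds to the embodiment in which the anchor stream is used in isolation; passing a content stream corresponds to the combined embodiment.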

At step 830, the target document is presented as a search result in response to the search query.

Embodiments of the present invention have been described by way of illustration and not limitation. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.

Claims

1. A method of adding terms from a related document to a document description of a target document, the method comprising:

determining that a term found in a related document does not match filter criteria, wherein terms matching the filter criteria are not added to a document description of a target document, wherein the document description of the target document includes terms within a plurality of signal streams associated with the target document, and wherein the related document is related because the target document is linked to the related document;

calculating a similarity score for the term, wherein the similarity score is based on a cosine similarity between the terms of the target document and the terms of the related document;

calculating a source credibility score for the term based on a static ranking of the related documents, wherein the static ranking is based on an independent popularity score of the related documents;

calculating a corroboration score for the term based on a similarity between the term and terms used in links from other documents to the related document;

calculating a uniqueness score for the term based on whether the term is currently associated with a document description through other sources;

calculating a term score for the term based on the similarity score, source credibility score, the corroboration score, and the uniqueness score; and

associating the term with the document description because the term score is above a threshold score.

2. The method of claim 1, wherein the filter criteria comprises a list of excluded common terms.

3. The method of claim 1, wherein the filter criteria excludes terms from all related documents when the target document links to more than a threshold number of other documents.

4. The method of claim 1, wherein the similarity score is also based on a similarity between the term and one or more terms used in a link from the related document to the target document.

5. The method of claim 1, wherein the static ranking is based on a spam score of the related documents.

6. The method of claim 1, wherein calculating the term score for the term further comprises using a weighting factor for each of the similarity score, the source credibility score, the corroboration score, and the uniqueness score.

7. The method of claim 1, wherein the threshold score is determined by ranking term scores computed for each of a plurality of terms in a link to a document.

8. A method for associating terms from related documents with a document description of a target document, wherein the related documents are related to the target document by either a forward or backward linking relationship, and wherein the document description is used to determine whether the target document should be returned as a search result in response to a query, the method comprising:

calculating a similarity score for the terms, wherein the similarity score is based on similarities between the terms of the target document and the terms of the related documents;

calculating a corroboration score for a term used in the related document based on similarities between the term and terms used in links from other documents to the related document;

calculating a term score for the term based on a weighted combination of the similarity score, source credibility score, the corroboration score, and the uniqueness score; and

9. The method of claim 8, wherein the method further comprises determining that terms found in the related documents do not match filter criteria, wherein terms matching the filter criteria are not added to the document description, and wherein the filter criteria comprise a list of excluded common terms.

10. The method of claim 8, wherein the method further comprises generating weights for the term scores using a machine learning algorithm.

11. The method of claim 8, wherein the similarity score is calculated using cosine similarity.

12. The method of claim 8, wherein calculating the term score for the term further comprises using a weighting factor for each of the similarity score, the source credibility score, the corroboration score, and the uniqueness score.

13. A method of presenting search results using an anchor stream generated by propagating terms between documents related by links, the method comprising:

receiving a search query composed of one or more terms;

determining that a target document matches the search query because at least one term of the one or more terms is associated with the target document in an anchor stream, wherein the anchor stream associates with the target document terms that are not included in the target document because the terms are included in a related document and are determined to be relevant to the target document, and wherein the related document is linked from the target document;

determining that the target document matches the search query because the at least one term is associated with the target document in a signal stream that associates terms included in the target document with the target document; and

presenting the target document as a search result in response to the search query.

14. The method of claim 13, wherein determining that a target document matches the search query because at least one of the one or more terms is associated with a target document in the anchor stream further comprises determining that at least one of the one or more terms is associated with the target document in a content stream that associates terms found in content of the target document with the target document.

15. The method of claim 13, wherein the anchor stream is a forward-looking anchor stream that associates terms from a different document linked to the target document with the target document.

16. An apparatus for adding terms from a related document to a document description of a target document, comprising means for performing the method of any one of claims 1-7.

17. An apparatus for associating terms from a related document with a document description of a target document, comprising means for performing the method of any of claims 8-12.

18. An apparatus for presenting search results using an anchor stream, comprising means for performing the method of any of claims 13-15.