US20130097139A1 - Programmable multi-filtering
- Publication number: US20130097139A1 (application US 13/275,111)
- Authority: US (United States)
- Prior art keywords: selection, search, execution, models, grouping
- Prior art date
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9038—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/638—Presentation of query results
- Techniques of the present disclosure relate to generating search results for a search query, and more specifically to grouping and aggregating search results according to selection models.
- Search engines are designed to provide data mining services. The approaches for developing data mining search engines vary and depend on the criteria that the search engine should meet. For example, some data mining applications can be optimized to return a significant quantity of relevant documents (hits, matches) in response to a search query submitted to the search engine. That may require developing algorithms for determining the relevancy of the documents returned in response to a search query, for determining measures of document relevancy, and for determining the content of the returned documents.
- Other data mining applications can be optimized to generate various views of the returned documents. For example, such an application can be configured to organize the list of returned documents not only by the scores associated with the documents, but also by additional criteria.
- However, both groups of applications may be unable to supplement the list of returned documents with aggregated information generated for the returned documents. For example, if a user submitted a search query seeking the titles of music albums recorded by a well-known artist such as Michael Jackson, it may be desirable to provide not only the list of albums, but also information indicating aggregated details about each album. Such information may help the user determine which album is most relevant to the user's search. Furthermore, such information may help the user refine a generic search query and formulate a more specific query.
- Hence, providing aggregated information for groups of search results, in addition to an organized list of the search results, enhances the user's experience from the initiation of a search session through finding the desired result. It also makes the process efficient by reducing the number of transactions required to build the final aggregated results.
- The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements:
- FIG. 1 illustrates an embodiment of a search engine environment;
- FIG. 2 illustrates a data flow associated with processing grouping requests;
- FIG. 3 illustrates an embodiment of generating selection models and execution models;
- FIG. 4 illustrates an embodiment of a relationship between an execution model and an execution result;
- FIG. 5 illustrates an embodiment of an example display generated for grouped search results;
- FIG. 6 illustrates an embodiment of an approach for programmable multi-filtering;
- FIG. 7 illustrates a computer system on which embodiments may be implemented.
- In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
- Embodiments are described herein according to the following outline: 1.0 General Overview; 2.0 Structural and Functional Overview; 3.0 Programmable Multi-filtering of Search Results; 4.0 Example of an Embodiment of Programmable Multi-filtering; 5.0 Implementation Mechanisms (Hardware Overview); 6.0 Extensions and Alternatives.
- Techniques disclosed herein include approaches for programmable multi-filtering of search results.
- Programmable multi-filtering can be applied to a variety of data mining applications, and in particular to data mining applications implemented in search engines.
- In an embodiment, programmable multi-filtering of search results is performed in two phases.
- One phase can be referred to as a back-end phase, and pertains to the initial processing of a search query. It can involve transforming a search query into multiple back-end requests which, once executed, provide one or more sets of search results.
- Another phase can be referred to as a front-end phase, and pertains to processing of the obtained search results.
- In particular, in an embodiment, a back-end phase involves receiving a search query, parsing the search query, generating a plurality of selection models, generating a plurality of back-end requests, and executing the back-end requests to generate a set of search results.
- The search query can comprise a query select statement and a plurality of search terms.
- The plurality of selection models can be generated based on the query select statement and the plurality of search terms.
- Each selection model, from the plurality of selection models, can comprise a unique combination of one or more terms, from the plurality of search terms, that is not present in the other selection models.
- In an embodiment, the back-end phase further comprises obtaining, for each particular selection model from the plurality of selection models, a plurality of particular selection results for that selection model.
- In an embodiment, one or more search cores execute a plurality of execution models by mining multi-dimensional information extracted from distributed search engine results.
- In an embodiment, the plurality of particular selection results is grouped into a set of search results.
- In an embodiment, a front-end phase involves analyzing and aggregating the set of search results.
- The grouping and aggregating of the search results can be executed in parallel.
- In an embodiment, grouping of the search results is performed based on one or more selection models identified in the back-end phase of the processing, using one or more attributes associated with the models. For example, using a particular attribute of the search results, the search results that are associated with the same value of the particular attribute can be grouped into one group, while search results that are associated with another value of the particular attribute can be grouped into another group. For instance, if a search query was issued to return the names of music albums recorded after the year 2005, then all returned album names can be grouped based on the name of the artist. If the returned search results indicate one hundred (100) different artist names, then the returned search results can potentially be divided into one hundred different groups.
- In an embodiment, groups identified from the search results can be graphically represented in a tree structure. For example, if ten groups were identified from the search results, then the corresponding tree structure can be represented as a tree having a root and ten branches originating from the root. According to another example, the tree structure can have nested branches, which represent groups and subgroups of the search results.
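- As a minimal sketch of this attribute-based grouping (illustrative only, not the patented implementation; the Album record and its fields are hypothetical), search results for an album query could be partitioned by the value of the artist attribute and rendered as a root with one branch per group:

```java
// Illustrative sketch only: grouping album hits by one attribute (artist)
// and printing the resulting root-and-branches tree.
import java.util.*;
import java.util.stream.*;

public class GroupByAttribute {
    record Album(String artist, String title, int year) {}

    public static void main(String[] args) {
        List<Album> hits = List.of(
                new Album("Artist 1", "Album A", 2006),
                new Album("Artist 1", "Album B", 2008),
                new Album("Artist 2", "Album C", 2007));

        // One group per distinct value of the "artist" attribute.
        Map<String, List<Album>> groups = hits.stream()
                .collect(Collectors.groupingBy(Album::artist));

        // A tree with one branch per group, rooted at the query.
        System.out.println("root");
        groups.forEach((artist, albums) -> {
            System.out.println("  group: " + artist);
            albums.forEach(a -> System.out.println("    hit: " + a.title()));
        });
    }
}
```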
- In an embodiment, a grouping level is identified for each group identified for the search results.
- The level associated with a particular group can represent that group's level in a hierarchical tree structure. For example, if a search query was issued to return the names of music albums recorded after the year 2005, then the returned search results can be grouped based on the name of the artist, and within each group associated with a particular artist, one or more subgroups representing a particular type of music in a recorded album can also be identified.
- In one scenario, a grouping based on the name of the artist can be associated with a first level of grouping, while a grouping based on the type of music in the album can be associated with a second level.
- In this scenario, the search results are first grouped based on the name of the artist, and then, for each artist, the results in a group can be grouped based on the type of the recorded music albums.
- The two levels can be represented in a hierarchical tree structure by two levels originating at a root.
- A first group level can comprise the different names of the artists, while a second group level can comprise the different types of music albums for each artist.
- In another scenario, a grouping based on the type of music in the album can be associated with a first level, while a grouping based on the name of the artist can be associated with a second level.
- In this scenario, the search results are first grouped based on the type of the recorded music albums, and then, for each type of music, the search results are grouped based on the name of the artist.
- The two levels can be represented in a corresponding hierarchical tree structure by two levels originating at a root.
- A first group level can comprise the different types of music albums, while a second group level can comprise the different names of the artists for each type of music album.
- In an embodiment, information about each group of search results can be aggregated. For example, if search results providing the names of music albums recorded after 2005 were divided into groups based on the name of the artist, then the aggregated information associated with a group can comprise information about the quantity of music tracks recorded by the particular artist, the quantity of hits recorded by the particular artist, the quantity of search hits that were issued for the particular recording, the average price of the recordings/albums recorded by the particular artist, the minimum and maximum prices of the recordings/albums recorded by the particular artist, and other information that can be derived from the search results. Furthermore, the aggregated information can summarize the musical career of the particular artist, the artist's accomplishments, awards, recorded albums and other information about the artist.
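- A minimal sketch of such per-group aggregation (illustrative only; the Album record and its price field are hypothetical) might compute a count and minimum, maximum and average prices for each artist group:

```java
// Illustrative sketch only: computing per-group aggregates (count, min, max,
// average price) for album groups keyed by artist.
import java.util.*;
import java.util.stream.*;

public class GroupAggregates {
    record Album(String artist, double price) {}

    public static void main(String[] args) {
        List<Album> hits = List.of(
                new Album("Artist 1", 9.99), new Album("Artist 1", 14.99),
                new Album("Artist 2", 7.49));

        Map<String, DoubleSummaryStatistics> aggregates = hits.stream()
                .collect(Collectors.groupingBy(
                        Album::artist,
                        Collectors.summarizingDouble(Album::price)));

        // Each group carries its own aggregated information.
        aggregates.forEach((artist, stats) -> System.out.printf(
                "%s: count=%d min=%.2f max=%.2f avg=%.2f%n",
                artist, stats.getCount(), stats.getMin(),
                stats.getMax(), stats.getAverage()));
    }
}
```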
- In an embodiment, groups of search results and the aggregated information associated with the groups are presented to a user.
- For example, the groups and aggregated information can be displayed in a graphical user interface displayed for the user.
- In an embodiment, the graphical user interface comprises a panel for a result display, and panels for any of a timeline data display, a hit-map display, a demographic information display, a price range display, and any other panel that can be used to display data.
- FIG. 1 illustrates an embodiment of a search engine environment 100 .
- the search engine environment 100 comprises one or more search engines 120 , one or more databases 130 , one or more client computers 140 a . . . 140 n , and one or more computer networks 150 .
- Other components, such as servers, routers, data repositories, and data clouds, can be included in the search engine environment 100.
- In an embodiment, a search engine 120 is configured to collect information available on the Internet or in dedicated data repositories, process the collected information, and store the processed information in storage, such as a database 130.
- Search engine 120 can be further configured to receive a search query, process the search query and return search results in response to receiving the query.
- Search engine 120 can implement the functional units that are shown within the search engine 120 , and the processes described herein, using hardware logic such as in an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), system-on-a-chip (SoC) or other combinations of hardware, firmware and/or software.
- In an embodiment, search engine 120 is a vertical search platform that provides scalability and state-of-the-art search technology.
- For example, search engine 120 can provide a multi-filtering tool that exceeds the scope of the conventional grouping implemented by, for example, the "group-by" and "join" search query statements.
- In an embodiment, search engine 120 is configured to perform a method for programmable multi-filtering of search results.
- The method comprises back-end processing and front-end processing.
- While executing the back-end phase, search engine 120 receives a search query, parses the search query, generates multiple back-end requests, and executes the back-end requests to generate a set of search results.
- While executing the front-end phase, search engine 120 analyzes and aggregates the set of search results. For example, in the front-end phase, search engine 120 groups the search results and aggregates them according to one or more selection models.
- Search engine 120 can group and aggregate data in parallel. For example, search engine 120 can group and aggregate data for each of the multiple back-end requests at the same time. While performing the front-end phase, for each result in a result set generated by a back-end request, search engine 120 can identify the group to which the result belongs and the level to which the identified group belongs.
- In an embodiment, search engine 120 groups search results into groups by classifying the search results based on different characteristics of the results. For instance, in response to receiving a search query that requests the titles of music albums recorded after the year 2005, search engine 120 can return a list of different music albums performed by different artists.
- The search results can be grouped by the name of the artist, and/or by the name of the album. Grouping by the name of the artist can be referred to as a first level of grouping, while grouping by the name of the album for each artist can be referred to as a second level of grouping.
- Thus, the set of songs in the first level is grouped by the name of the artist, and in the second level by the name of the album for a particular artist.
- In an embodiment, search engine 120 generates and collects aggregated data for each group identified at each level. For example, if a result set comprises a list of music albums, and the music albums are grouped by the name of the artist, then the aggregated data for a group can include information that is specific to the group. That can include a quantity of music albums found for a particular artist, a quantity of albums within the group, an average price of a music album for each artist, a maximum price of music albums for each artist, a minimum price of music albums for each artist, and other types of information.
- Aggregated data for a group can also include information such as a quantity of different albums in the group, an average price of the albums in the group, the maximum price of the albums in the group, the minimum price of the albums in the group, or other types of information.
- In an embodiment, search engine 120 also aggregates search results by generating a nested tree for the search results. Aggregating the search results allows the search results to be displayed as divided into various groups. For example, if a search query returned three titles of music songs, of which two songs are credited to one artist and one song is credited to another artist, and each song was part of a different album, then the search results can be organized in a tree structure having two branches. One branch can depict the three music songs organized by the name of the artist (Artist 1, Artist 2), and the other branch can depict the three music songs organized by the name of the album (Album 1, Album 2, Album 3).
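- The nested-tree organization described above can be sketched as follows (illustrative only; the Song record and its fields are hypothetical), with the same three songs appearing under an artist-keyed branch and an album-keyed branch:

```java
// Illustrative sketch only: organizing the same three songs into a tree with
// two branches, one keyed by artist and one keyed by album.
import java.util.*;
import java.util.stream.*;

public class ResultTree {
    record Song(String title, String artist, String album) {}

    public static void main(String[] args) {
        List<Song> results = List.of(
                new Song("Song 1", "Artist 1", "Album 1"),
                new Song("Song 2", "Artist 1", "Album 2"),
                new Song("Song 3", "Artist 2", "Album 3"));

        Map<String, List<String>> byArtist = results.stream().collect(
                Collectors.groupingBy(Song::artist,
                        Collectors.mapping(Song::title, Collectors.toList())));
        Map<String, List<String>> byAlbum = results.stream().collect(
                Collectors.groupingBy(Song::album,
                        Collectors.mapping(Song::title, Collectors.toList())));

        // Two branches of the same result tree.
        System.out.println("by artist: " + byArtist);
        System.out.println("by album:  " + byAlbum);
    }
}
```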
- In an embodiment, search engine 120 comprises one or more processors 102, one or more search units 104, one or more grouping searchers 106, one or more selection transformers 108, one or more grouping executors 110, one or more presenting units 112, and one or more search cores 114 a, 114 b.
- In an embodiment, a processor 102 facilitates communications between search engine 120 and client computers 140 a . . . 140 n. Furthermore, processor 102 can process commands received by search engine 120, process responses received by search engine 120, and facilitate various types of operations executed by search engine 120. Processor 102 comprises hardware and software logic configured to execute various processes on search engine 120.
- In an embodiment, a search unit 104 is configured to receive a search query comprising a query select statement and a plurality of search terms.
- In an embodiment, a grouping searcher 106 is configured to generate a plurality of selection models based on the query select statement and the plurality of search terms.
- In an embodiment, grouping searcher 106 is further configured to identify one or more hierarchies in the search query, enable execution of one or more nested grouping operations for the search query, and enable execution of one or more parallel grouping operations for the search query.
- In an embodiment, grouping searcher 106 is further configured to group a plurality of selection results into a final result.
- In an embodiment, grouping searcher 106 is further configured to group one or more search terms into one or more groups of features.
- In an embodiment, a selection model, from the plurality of selection models, is generated based on a unique combination of one or more terms, from the plurality of search terms, that is not present in the other selection models.
- A selection model can be created by a client application of the user who issued a search query to search engine 120.
- A selection model can be an abstract list-manipulation model.
- A selection model can comprise a variety of directives.
- For example, a set of main directives can comprise an "all" directive for processing an input list as a whole, an "each" directive for processing each element of the input separately, a "group" directive for partitioning the input list into sub-lists, and an "output" directive for including output data in the search results.
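- A minimal sketch of a selection model built from these four directives (illustrative only; the patent does not prescribe this representation, and the class names are hypothetical) might model the expression all(group(customer) each(output(count( )))) as a small tree of directive nodes:

```java
// Illustrative sketch only: a selection model as a tree of the four main
// directives named above (all, each, group, output).
import java.util.*;
import java.util.stream.*;

public class SelectionModelSketch {
    interface Directive { String render(); }

    record All(List<Directive> children) implements Directive {
        public String render() { return "all(" + join(children) + ")"; }
    }
    record Each(List<Directive> children) implements Directive {
        public String render() { return "each(" + join(children) + ")"; }
    }
    record Group(String expression) implements Directive {
        public String render() { return "group(" + expression + ")"; }
    }
    record Output(String aggregate) implements Directive {
        public String render() { return "output(" + aggregate + ")"; }
    }

    static String join(List<Directive> ds) {
        return ds.stream().map(Directive::render).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Builds the model for: all(group(customer) each(output(count())))
        Directive model = new All(List.of(
                new Group("customer"),
                new Each(List.of(new Output("count()")))));
        System.out.println(model.render());
    }
}
```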
- In an embodiment, a selection transformer 108 is configured to transform selection models into a plurality of execution models. For example, for each of the plurality of selection models, selection transformer 108 transforms the selection model into a plurality of execution models.
- In an embodiment, selection transformer 108 is further configured to group the plurality of execution results into a selection result. For example, once the execution models are executed by other units of search engine 120, the execution results for the execution models are provided to selection transformer 108, and the execution results are grouped into a selection result.
- In an embodiment, a grouping executor 110 is configured to distribute execution models to search cores 114 a . . . 114 b and to receive execution results from the search cores 114 a . . . 114 b.
- In an embodiment, any of search cores 114 a . . . 114 b is configured to execute execution models to generate execution results.
- Any of search cores 114 a . . . 114 b can be configured to execute a plurality of execution models by mining multi-dimensional information stored in storage 130.
- Any of search cores 114 a . . . 114 b can be configured to access distributed databases associated with search engine 120.
- Each of search cores 114 a . . . 114 b can be configured to search the same search core repository.
- Alternatively, each of search cores 114 a . . . 114 b can be configured to search separate search core repositories.
- In an embodiment, grouping executor 110 is further configured to group a plurality of execution results into a selection result in an approximate single-pass process.
- Alternatively, grouping executor 110 can be configured to group the plurality of execution results into a selection result in a multi-pass process.
- In an embodiment, a presenting unit 112 is configured to present final results.
- For example, the final results can be grouped and aggregated, and the grouped and aggregated results can be sent to a client computer 140 a via a network 150.
- In an embodiment, presenting unit 112 is further configured to cause a user interface to be displayed on any of client computers 140 a . . . 140 n.
- The user interface can comprise a variety of panels, including a panel for a result display, a panel for a timeline data display, a panel for a hit-map display, a panel for a demographic information display, a panel for a price range display, and other panels.
- Storage 130 can be configured to store a variety of information, including information related to search queries, selection models, execution models, execution results, selection results, and any other information that search engine 120 may require.
- In an embodiment, search engine 120 communicates with one or more client computers 140 a . . . 140 n via a communications network 150.
- FIG. 1 shows one or more client computers 140 a . . . 140 n and one network 150.
- However, implementations may use any number of client computers 140 and any number of networks 150.
- In an embodiment, network 150 is communicatively coupled to client computers 140 a . . . 140 n and to search engine 120.
- Network 150 is used to maintain various communications sessions and may implement one or more communications protocols.
- Client computers 140 a . . . 140 n and search engine 120 may implement the processes described herein using hardware logic such as in an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), system-on-a-chip (SoC) or other combinations of hardware, firmware and/or software.
- In an embodiment, client computers 140 a . . . 140 n, search engine 120 and network 150 comprise hardware or software logic configured to generate and maintain various types of communications session information and routing information for data communications network 150.
- In an embodiment, an approach for multi-filtering of multi-dimensional information is presented.
- The multi-filtering can be implemented on a variety of search platforms.
- For example, the multi-filtering can be implemented on a vertical search platform such as "Vespa 4.0," which provides scalability and state-of-the-art search technology and is available from Yahoo! Inc., Santa Clara, Calif.
- FIG. 2 illustrates a data flow associated with processing grouping requests.
- FIG. 2 depicts a search container 200 , and one or more search cores 210 a . . . 210 b .
- Search container 200 comprises a grouping searcher 202 , a selection transformer 204 and a grouping executor 206 , each of which was briefly described in reference to FIG. 1 .
- Search cores 210 a . . . 210 b can run multiple select statements in parallel for the same query.
- In an embodiment, a search container 200 is a multi-filtering tool and is configured to perform multi-filtering.
- In an embodiment, search container 200 performs multi-filtering by utilizing a ranking framework for deriving and executing various ranking expressions tailored for various applications.
- The ranking expressions can be designed to perform various math operations as well as conditional branching.
- The ranking expressions can operate on a variety of document attributes.
- In an embodiment, search container 200 is configured to execute an approach for multi-filtering by executing two types of processing: front-end processing and back-end processing.
- In an embodiment, the front-end processing starts with a grouping searcher 202 generating one or more selection models based on a received search query.
- The following example illustrates the data flow associated with processing grouping requests as described in FIG. 2. Consider a user search query over a table called "orders" that asks for a count of the orders associated with a customer named "Smith."
- Processed conventionally, such a search query would require at least two processing phases.
- The first phase can be referred to as initial processing, and comprises accessing the table called "orders" and selecting those records from the "orders" table that contain information about "Smith."
- The second phase can be referred to as post-processing, and comprises counting the number of records from the table "orders" that indeed contain the information about "Smith."
- Such two-step processing can be inefficient and time-consuming at times.
- In contrast, a grouping searcher 202 of the search container 200 can represent the above user search query using the following expression: all(group(customer) each(output(count( )))). Based on the expression, grouping searcher 202 can generate various grouping instructions. The exact grouping instructions depend on the implementation.
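- The contrast between the conventional two-phase processing and a single grouping pass can be sketched as follows (illustrative only; the Order record is hypothetical, and the code is not the patented implementation):

```java
// Illustrative sketch only: conventional two-phase processing (select the
// matching "orders" records, then count them) versus a single grouping pass
// that produces a count per customer group.
import java.util.*;
import java.util.stream.*;

public class OrdersExample {
    record Order(String customer, double amount) {}

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("Smith", 10.0), new Order("Smith", 25.0),
                new Order("Jones", 5.0));

        // Phase 1 (initial processing): select the records about "Smith".
        List<Order> smithOrders = orders.stream()
                .filter(o -> o.customer().equals("Smith"))
                .toList();
        // Phase 2 (post-processing): count the selected records.
        long smithCount = smithOrders.size();
        System.out.println("two-phase count for Smith: " + smithCount);

        // Analogue of all(group(customer) each(output(count()))):
        // one pass that produces a count per customer group.
        Map<String, Long> countsPerCustomer = orders.stream()
                .collect(Collectors.groupingBy(Order::customer, Collectors.counting()));
        System.out.println("grouped counts: " + countsPerCustomer);
    }
}
```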
- In an embodiment, a grouping searcher 202 is configured to generate one or more selection models.
- An example of an embodiment of generating selection models is depicted in FIG. 3 .
- FIG. 3 illustrates an embodiment of generating selection models and execution models.
- In FIG. 3, one or more selection models are generated for the expression "all(group(a) each(group(b) . . . ) each(group(c) . . . ))."
- The expression indicates a group(a) 320, a group(b) 330, and a group(c) 340.
- The group(a) 320 is displayed above group(b) 330 and group(c) 340.
- Group(a) 320 has an associated level.
- Group(b) 330 and group(c) 340 also have an associated level.
- The level associated with group(a) 320 is higher than the level associated with group(b) 330 and group(c) 340.
- In an embodiment, one or more execution models 350 are generated for each group identified in a selection model 310.
- The one or more execution models 350 can be generated by a selection transformer 204 of FIG. 2.
- In an embodiment, a selection transformer 204 of FIG. 2 is configured to generate execution models. For example, selection transformer 204 receives one or more selection models, and based on the selection model information, selection transformer 204 generates a plurality of execution models. An example of an embodiment of generating the execution models is depicted in FIG. 3.
- In the depicted example, one or more execution models are generated for one or more groups of selection model 310.
- For example, an execution model 360 is generated for group(a) 320 and group(b) 330, while an execution model 370 is generated for group(a) 320 and group(c) 340.
- The execution model 360 is a separate model from the execution model 370.
- As depicted in FIG. 3, the execution model 360 comprises a root, an expression "all(group(a))" and an expression "each(group(b) output(count( )))", while the execution model 370 comprises a root, an expression "all(group(a))" and an expression "each(group(c) output(count( )))".
- The transformation process is also able to discard execution models that either have no outputs or that can be collapsed into another parallel execution model.
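- A minimal sketch of this transformation (illustrative only; the record names are hypothetical) splits a selection model with parallel each(group( . . . )) branches into one execution model per branch and discards branches that produce no output:

```java
// Illustrative sketch only: transforming one selection model of the form
// all(group(a) each(group(b) ...) each(group(c) ...)) into one execution
// model per parallel branch, discarding branches without outputs.
import java.util.*;

public class SelectionToExecution {
    record Branch(String groupExpression, boolean hasOutput) {}
    record SelectionModel(String topGroup, List<Branch> parallelBranches) {}
    record ExecutionModel(String expression) {}

    static List<ExecutionModel> transform(SelectionModel model) {
        List<ExecutionModel> executionModels = new ArrayList<>();
        for (Branch branch : model.parallelBranches()) {
            if (!branch.hasOutput()) continue;   // nothing to compute: discard
            executionModels.add(new ExecutionModel(
                    "all(group(" + model.topGroup() + ") each(group("
                    + branch.groupExpression() + ") output(count())))"));
        }
        return executionModels;
    }

    public static void main(String[] args) {
        SelectionModel model = new SelectionModel("a",
                List.of(new Branch("b", true), new Branch("c", true),
                        new Branch("d", false)));
        transform(model).forEach(m -> System.out.println(m.expression()));
    }
}
```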
- In an embodiment, the one or more execution models are sent to a grouping executor 206, depicted in FIG. 2.
- In an embodiment, a grouping executor 206 is configured to generate execution models for each search core.
- For example, grouping executor 206 can receive a plurality of execution models, and use them to generate a plurality of execution models for each search core. For instance, if two search cores have been identified by grouping executor 206, then grouping executor 206 can generate a plurality of execution models for the first search core, and a plurality of execution models for the second search core.
- In an embodiment, each of the plurality of execution models is executed by search cores 210 a . . . 210 b.
- Although FIG. 2 depicts two search cores 210 a . . . 210 b, more than two search cores 210 can be dedicated to executing the execution models.
- Once search cores 210 a . . . 210 b finish processing the plurality of execution models, the search cores 210 a . . . 210 b provide a plurality of execution results for the search cores to a grouping executor 206.
- In an embodiment, a grouping executor 206 groups the execution results provided for each search core into a plurality of execution results.
- An example of grouping the plurality of the execution results for search cores is depicted in FIG. 4 .
- FIG. 4 illustrates an embodiment of a relationship between an execution model and an execution result.
- In FIG. 4, execution results 450 are grouped for a plurality of execution models 410.
- FIG. 4 depicts two execution models: an execution model 412 comprises a root 420, an expression "all(group(a))" 430 and an expression "each(group(b) output(count( )))" 440, while an execution model 414 comprises other respective clauses. Beneath the grouping executor sits a dispatch that scatters the execution models across all search cores, and the same dispatch merges the results so that the grouping searcher gets exactly one execution result per execution model.
- In an embodiment, execution model 412 is matched with one execution result in the following manner: an execution result generated for the root 420 of execution model 412 is referred to as execution result 452; an execution result generated for the "all(group(a))" expression 430 of execution model 412 is referred to as execution result 454; and an execution result generated for the clause "each(group(b) output(count( )))" 440 is referred to as execution result 456.
- Similarly, execution model 414 is matched with exactly one execution result.
- In an embodiment, the execution results can be represented in a tree structure 450.
- In FIG. 4, the tree has two branches: a branch 452-454-456, which comprises execution results generated for execution model 412; and a branch 462-464-466, which comprises execution results generated for execution model 414.
- The branch 452-454-456 comprises results r+a1+a2+b1+b2+b3, while branch 462-464-466 comprises results r+a2+a3+c1+c2+c3.
- Grouping of the execution results can cause a repetition of some execution results in the tree structure of the execution results.
- For example, the results "r" and "a2" are included in both branches.
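- The merge performed by the dispatch can be sketched as follows (illustrative only; the data shapes are hypothetical): partial per-group counts returned by each search core are combined into exactly one execution result per execution model:

```java
// Illustrative sketch only: merging partial execution results from two search
// cores into a single execution result by summing per-group counts.
import java.util.*;

public class MergeCoreResults {
    static Map<String, Long> merge(List<Map<String, Long>> perCoreCounts) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> coreResult : perCoreCounts) {
            coreResult.forEach((group, count) -> merged.merge(group, count, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Partial counts for groups a1, a2, a3 from two search cores.
        Map<String, Long> core1 = Map.of("a1", 4L, "a2", 2L);
        Map<String, Long> core2 = Map.of("a2", 3L, "a3", 5L);
        // Exactly one merged execution result per execution model.
        System.out.println(merge(List.of(core1, core2)));
    }
}
```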
- In an embodiment, grouping of the execution results can be performed using custom expressions, such as group clauses.
- The expressions can comprise numerical constants, document attributes, functions defined over other expressions (such as md5, cat, xor, and, or, add, sub, mul, div, mod), data types of expressions resolved using best effort, arithmetical operands, and other types of expressions.
- The available expressions include the following (name: description; argument types → result type):
- Arithmetic expressions: add, sub, mul, div and mod combine numeric arguments into a numeric result; a negation expression negates its right argument (Numeric → Numeric).
- Bitwise expressions: and: AND the arguments in order (Long+ → Long). or: OR the arguments in order (Long+ → Long). xor: XOR the arguments in order (Long+ → Long).
- String expressions: strlen: count the number of bytes in the argument (String → Long). strcat: concatenate the arguments in order (String+ → String).
- Type conversion expressions: todouble: convert the argument to double (Any → Double). tolong: convert the argument to long (Any → Long). tostring: convert the argument to string (Any → String). toraw: convert the argument to raw (Any → Raw).
- Raw data expressions: cat: concatenate the binary representations of the arguments (Any+ → Raw). md5: compute an md5 over the binary representation of the argument and keep the lowest 64 bits (Any → Raw).
- Accessor expressions: relevance: return the computed rank of a document (None → Double).
- Bucket expressions: fixedwidth: map the value of the first argument into the number of fixed-width buckets given by the second argument (Any, Numeric → NumericBucketList). predefined: map the value of the first argument into the given buckets (Any, Bucket+ → BucketList).
- Time expressions (all Long → Long): time.dayofmonth: the day of month (1-31) for the given timestamp. time.dayofweek: the day of week (0-6) for the given timestamp, Monday being 0. time.dayofyear: the day of year (0-365) for the given timestamp. time.hourofday: the hour of day (0-23). time.minuteofhour: the minute of hour (0-59). time.monthofyear: the month of year (1-12). time.secondofminute: the second of minute (0-59). time.year: the full year (e.g. 2009) of the given timestamp.
- List expressions: size: the number of elements in the argument if it is a list, otherwise 1 (Any → Long). sort: sort the elements of the argument in ascending order if it is a list, otherwise a NOP (Any → Any). reverse: reverse the elements of the argument if it is a list, otherwise a NOP (Any → Any).
- Z-curve expressions: zcurve.x: the X component of the given zcurve-encoded 2d point (Long → Long). zcurve.y: the Y component of the given zcurve-encoded 2d point (Long → Long).
- Collation expressions: uca: convert the attribute string using the Unicode collation algorithm, useful for sorting (Any, Locale(String), Strength(String) → Raw).
- In an embodiment, the type of the results generated by custom expressions can be either a scalar or a single-dimension array. For example, an expression "add(<array>)" adds all elements together to produce a scalar, while adding two arrays element-wise produces a new array whose length is the maximum of the lengths of the input arrays.
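- A minimal sketch of this scalar-versus-array behavior (illustrative only; the method names are hypothetical) is shown below:

```java
// Illustrative sketch only: add over a single array collapses it to a scalar;
// adding two arrays element-wise yields an array whose length is the larger
// of the two input lengths.
import java.util.*;

public class ExpressionResults {
    static long add(long[] values) {               // add(<array>) -> scalar
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    static long[] add(long[] a, long[] b) {        // array + array -> array
        long[] result = new long[Math.max(a.length, b.length)];
        for (int i = 0; i < a.length; i++) result[i] += a[i];
        for (int i = 0; i < b.length; i++) result[i] += b[i];
        return result;
    }

    public static void main(String[] args) {
        System.out.println(add(new long[] {1, 2, 3}));                    // 6
        System.out.println(Arrays.toString(
                add(new long[] {1, 2, 3}, new long[] {10, 20})));         // [11, 22, 3]
    }
}
```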
- In an embodiment, groups can contain subgroups.
- The subgroups can be generated by using sub-expressions and group operations.
- Groups can be nested to any number of levels.
- Each level of grouping can specify a set of aggregates configured to collect search results that belong to the particular group.
- Aggregated information for a particular group can comprise various types of information.
- For example, the aggregated information can comprise a list of documents retrieved using a particular summary class.
- The aggregated information can comprise the count of the documents in the group.
- The aggregated information can comprise the sum, average, min, max, or xor computed for the expression associated with the group.
- In an embodiment, the order in which the search results are ordered can be determined for some or all levels of grouping. For example, an order for grouping the documents within a particular group can be defined and associated with a particular level of the grouping.
- A simple example of grouping for counting the number of documents in each group can be expressed as: all(group(a) each(output(count( )))).
- Two parallel groupings can be expressed as: all(all(group(a) each(output(count( )))) all(group(b) each(output(count( ))))).
- A simple example of grouping only the 1000 best hits at each search core node can be expressed as: all(max(1000) all(group(a) each(output(count( ))))).
- A simple example of grouping all search results can be expressed as: all(group(a) each(output(count( )))) where(true).
- A simple example of grouping with locale-aware sorting can be expressed as: all(group(s) order(max(uca(s, "sv"))) each(output(count( )))) or all(group(s) order(max(uca(s, "sv", "PRIMARY"))) each(output(count( )))).
- Grouping and multivalue fields: a simple example of grouping based on a map from strings to integers, where the map keys define the groups and the corresponding values are summed, can be expressed as: all(group(mymap.key) each(output(sum(mymap.value)))).
- Ordering groups: a simple example of grouping in which a modulo-5 operation is applied before the group is selected, and the groups are ordered by sum(b), can be expressed as: all(group(a % 5) order(sum(b)) each(output(count( )))).
- Collecting aggregates: grouping can also count the number of documents in each group and return the best hit in each group.
- In an embodiment, ordering of the grouped search results can be performed using any of the available aggregates.
- In an embodiment, multi-filtering can be used to implement various types of search result ordering.
- For example, the multi-filtering can be used to implement a strict ordering of the search results.
- Other types of ordering can include ascending ordering, descending ordering, and any type of ordering specified for each level of the grouping.
- In an embodiment, the quantity of groups returned for each level can be restricted. This can be accomplished by using, for example, a "max" operation expression, which allows only, for example, the first n groups, as specified by the order operation, to be returned.
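- The combined effect of an order( . . . ) clause and a "max" restriction can be sketched as follows (illustrative only; the data and names are hypothetical): the groups are sorted by an aggregate and only the first n groups are returned:

```java
// Illustrative sketch only: sorting groups by an aggregate value and keeping
// only the first n groups, as an order(...) clause combined with max(n) would.
import java.util.*;
import java.util.stream.*;

public class TopGroups {
    public static void main(String[] args) {
        // Per-group aggregate, e.g. sum(b) for each value of a % 5.
        Map<String, Long> groupAggregates = Map.of(
                "0", 42L, "1", 7L, "2", 93L, "3", 15L, "4", 60L);
        int max = 3;   // keep only the first 3 groups, as max(3) would

        List<String> topGroups = groupAggregates.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(max)
                .map(Map.Entry::getKey)
                .toList();
        System.out.println(topGroups);   // [2, 4, 0]
    }
}
```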
- In an embodiment, a grouping executor 206 is also configured to transmit the plurality of execution results to a selection transformer 204.
- In an embodiment, a selection transformer 204 receives the plurality of execution results and generates one selection result per selection model.
- In an embodiment, a grouping searcher receives one or more selection results and displays the selection results grouped according to one or more selection models. An example of the grouped selection results is depicted in FIG. 5.
- FIG. 5 illustrates an example of a display for grouped search results.
- The example depicted in FIG. 5 illustrates search results generated for a search query seeking a count for each of the three most popular songs performed by Michael Jackson and a count for each of the three most popular songs performed by The Beatles.
- A count may represent, for example, the count of different recordings of a particular song, the count of websites providing a recording of a particular song, or any other related count.
- FIG. 5 comprises three columns: a first GroupId column 510 , a second GroupId column 520 and a count column 530 .
- The first GroupId column 510 is labeled "GroupId" 512 and contains a GroupId 514 for "Michael Jackson" and a GroupId 516 for "The Beatles."
- In the second GroupId column 520, the names of the songs are listed.
- The names of the songs are organized by the group identifiers.
- For the group associated with Michael Jackson, the three most popular songs include "Thriller," "Bad," and "Dangerous."
- In the depicted example, the lists were truncated to three elements (songs); however, in other implementations, a list can comprise any number of elements.
- Execution of the search query returned search results, and the search results are organized by GroupId.
- The counts for the search results are displayed in a count column 530.
- In the depicted example, the count of M. Jackson's "Thriller" was 9, the count of M. Jackson's "Bad" was 11, the count of M. Jackson's "Dangerous" was 14, the count of The Beatles' "A Hard Day's Night" was 13, the count of The Beatles' "Sgt. Pepper's Lonely Hearts Club Band" was 13, and the count of The Beatles' "Abbey Road" was 17.
- In an embodiment, results grouping can produce groups that contain outputs, group lists, and hit lists.
- Group lists can contain sub-groups, and hit lists can contain hits that are part of the owning group.
- FIG. 6 illustrates an embodiment of an approach for programmable multi-filtering.
- In the first step of FIG. 6, a search engine receives a search query.
- The search query can be issued by a client application executed on a user computer.
- The search query can be issued to request one or more search results that satisfy the terms present in the search query.
- In step 602, the search engine generates one or more selection models for the received search query. The details of generating the one or more selection models are provided in the description of FIG. 2-3.
- In step 604, the search engine generates one or more execution models based on the one or more selection models generated for the search query. The details of generating the one or more execution models are provided in the description of FIG. 2 and FIG. 4.
- Next, the search engine generates one or more execution models for each search core.
- The execution models can be customized for a particular search core. For example, if two search cores are available to perform a search, then the search engine can generate two execution models, each model for one search core.
- An example of generating the one or more execution models is provided in the description of FIG. 2 and FIG. 4 .
- Alternatively, each search core can receive the same set of execution models.
- The execution of the execution models can be performed in parallel by each search core, and thus execution of the execution models can be performed simultaneously by the search cores.
- The details of executing the execution models are provided in the description of FIG. 2.
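- A minimal sketch of executing the execution models in parallel across search cores (illustrative only; the thread-pool mechanism and simulated cores shown here are assumptions, not the patented implementation) is shown below:

```java
// Illustrative sketch only: running the same execution model on multiple
// (simulated) search cores in parallel, one thread per core.
import java.util.*;
import java.util.concurrent.*;

public class ParallelCores {
    // Simulated search core: returns per-group counts for one execution model.
    static Map<String, Long> execute(int coreId, String executionModel) {
        return Map.of("group-from-core-" + coreId, 10L + coreId);
    }

    public static void main(String[] args) throws Exception {
        List<String> executionModels = List.of("all(group(a) each(output(count())))");
        int cores = 2;
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Map<String, Long>>> futures = new ArrayList<>();
        for (int core = 0; core < cores; core++) {
            final int id = core;
            futures.add(pool.submit(() -> execute(id, executionModels.get(0))));
        }
        for (Future<Map<String, Long>> f : futures) {
            System.out.println(f.get());   // partial result from one core
        }
        pool.shutdown();
    }
}
```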
- In step 610, the search engine checks whether the execution of all execution models has been completed. If the execution of the execution models has not been completed, then the process proceeds to step 612, in which the execution of the execution models is continued.
- Otherwise, in step 614, the execution of the execution models has been completed.
- The search engine then receives a plurality of execution results and generates selection results. Generation of the selection results can be performed online or offline. If the generation is performed online, then the selection results can be provided to the user immediately. If the generation is performed offline, then the selection results can be provided to the user with some delay. The details are provided in the description of FIG. 2-4.
- Finally, the search engine presents the selection results to a user.
- The selection results can be aggregated. An example of the selection results is described in FIG. 4.
- In an embodiment, the techniques described herein are implemented by one or more special-purpose computing devices.
- the special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
- Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
- the special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
- FIG. 7 is a block diagram that illustrates a computer system 700 upon which an embodiment of the invention may be implemented.
- Computer system 700 includes a bus 702 or other communication mechanism for communicating information, and a hardware processor 704 coupled with bus 702 for processing information.
- Hardware processor 704 may be, for example, a general purpose microprocessor.
- Computer system 700 also includes a main memory 706 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 702 for storing information and instructions to be executed by processor 704 .
- Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704 .
- Such instructions when stored in storage media accessible to processor 704 , render computer system 700 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- Computer system 700 further includes a read only memory (ROM) 708 or other static storage device coupled to bus 702 for storing static information and instructions for processor 704 .
- A storage device 710, such as a magnetic disk or optical disk, is provided and coupled to bus 702 for storing information and instructions.
- Computer system 700 may be coupled via bus 702 to a display 712, such as a liquid crystal display (LCD) or a cathode ray tube (CRT), for displaying information to a computer user.
- An input device 714 is coupled to bus 702 for communicating information and command selections to processor 704 .
- Another type of user input device is cursor control 716, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 704 and for controlling cursor movement on display 712.
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- Computer system 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 700 in response to processor 704 executing one or more sequences of one or more instructions contained in main memory 706 . Such instructions may be read into main memory 706 from another storage medium, such as storage device 710 . Execution of the sequences of instructions contained in main memory 706 causes processor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 710 .
- Volatile media includes dynamic memory, such as main memory 706 .
- Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
- Storage media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between storage media.
- For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 702.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 704 for execution.
- For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
- The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- A modem local to computer system 700 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 702 .
- Bus 702 carries the data to main memory 706 , from which processor 704 retrieves and executes the instructions.
- The instructions received by main memory 706 may optionally be stored on storage device 710 either before or after execution by processor 704.
- Computer system 700 also includes a communication interface 718 coupled to bus 702 .
- Communication interface 718 provides a two-way data communication coupling to a network link 720 that is connected to a local network 722 .
- For example, communication interface 718 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
- As another example, communication interface 718 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- In any such implementation, communication interface 718 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 720 typically provides data communication through one or more networks to other data devices.
- For example, network link 720 may provide a connection through local network 722 to a host computer 724 or to data equipment operated by an Internet Service Provider (ISP) 726.
- ISP 726 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 728 .
- Internet 728 uses electrical, electromagnetic or optical signals that carry digital data streams.
- The signals through the various networks and the signals on network link 720 and through communication interface 718, which carry the digital data to and from computer system 700, are example forms of transmission media.
- Computer system 700 can send messages and receive data, including program code, through the network(s), network link 720 and communication interface 718 .
- For example, a server 730 might transmit a requested code for an application program through Internet 728, ISP 726, local network 722 and communication interface 718.
- The received code may be executed by processor 704 as it is received, and/or stored in storage device 710 or other non-volatile storage for later execution.
Abstract
Description
- Techniques of the present disclosure relate to generating search results for a search query, and more specifically to grouping and aggregating search results according to selection models.
- Search engines are designed to provide data mining services. The approaches for developing data mining search engines may vary and may depend on the criteria that the search engine should meet. For example, some data mining applications can be optimized to return a significant quantity of relevant documents (hits, matches) in response to a search query submitted to the search engine. That may require developing algorithms for determining a relevancy of the documents returned in response to a search query. Also, that may require developing algorithms for determining measures of a document relevancy and for determining content of the returned documents.
- Other data mining application can be optimized to generate various views of the returned documents. For example, the application can be configured to organize a list of returned documents not only by the scores associated with the documents, but also to organize the list of the returned documents by some additional criteria.
- However, both groups of the applications may be unable to supplement the list of returned documents with aggregated information generated for the returned documents. For example, if a user submitted a search query seeking the titles of music albums recorded by a well known artist—Michael Jackson, then it may be desirable to provide not only the list of the albums, but also to provide some information indicating aggregated details about each album. Such information may help the user to determine the album that can be the most relevant to the user's search. Furthermore, such information may help the user to refine his generic search query and formulate a more specific query.
- Hence, providing aggregated information for groups of the search results in addition to an organized list of the search results enhances the user's experience from initiating a search session and finding the desired result. It also makes the process efficient by optimizing the number of transactions required to build final aggregated results.
- The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
-
FIG. 1 illustrates an embodiment of a search engine environment; -
FIG. 2 illustrates a data flow associated with processing grouping requests; -
FIG. 3 illustrates an embodiment of generating of selection models and execution models; -
FIG. 4 illustrates an embodiment of relationship between an execution model and execution result; -
FIG. 5 illustrates an embodiment of an example of a display generated for grouped search results; -
FIG. 6 illustrates an embodiment of an approach for programmable multi-filtering; -
FIG. 7 illustrates a computer system on which embodiments may be implemented. - In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
- Embodiments are described herein according to the following outline:
-
- 1.0 General Overview
- 2.0 Structural and Functional Overview
- 3.0 Programmable Multi-filtering of Search Results
- 4.0 Example of an Embodiment of Programmable Multi-filtering
- 5.0 Implementation Mechanisms—Hardware Overview
- 6.0 Extensions and Alternatives
- 1.0 General Overview
- Techniques disclosed herein include approaches for programmable multi-filtering of search results. Programmable multi-filtering can be applied to a variety of data mining applications, and in particular to data mining applications implemented in search engines.
- In an embodiment, programmable multi-filtering of search results is performed in two phases. One phase can be referred to as a back-end phase, and pertains to an initial processing of a search query. It can involve transforming a search query into a multiple back-end requests which, once executed, provide one or more sets of search results. Another phase can be referred to as a front-end phase, and pertains to processing of the obtained search results.
- In particular, in an embodiment, a back-end phase involves receiving a search query, parsing the search query, generating a plurality of selection models, generating a plurality of back-end requests, and executing the back-end requests to generate a set of search results. The search query can comprise a query select statement and a plurality of search terms. The plurality of selection models can be generated based on the query select statement and the plurality of search terms. Each selection model, from the plurality of selection models, can comprise a unique combination of one or more terms, from the plurality of search terms, that is not present in other selection models, from the plurality of selection models.
- In an embodiment, the back-end phase processing further comprises obtaining a plurality of particular selection results for a particular selection model for each particular selection model, from the plurality of models.
- In an embodiment, one or more search cores execute a plurality of execution models by mining multi-dimensional information extracted from distributed search engine results.
- In an embodiment, a plurality of particular selection results is grouped to a set of search results.
- In an embodiment, a front-end phase involves analyzing and aggregating a set of search results. The grouping and aggregating of the search results can be executed in parallel.
- In an embodiment, grouping of the search results is performed based on one or more selection models identified in a back-end phase of the processing, which used one or more attributes associated with the models. For example, using a particular attribute of the search results, the search results that are associated with the same value of the particular attribute can be grouped to one group. Other search results that are associated with another value of the particular attribute can be grouped to another group. For instance, if a search query was issued to return the names of music albums recorded after year 2005, then all returned names of the albums can be grouped based on the names of the artist. If the returned search results indicate one hundred (100) different names of the artists, then the returned search results can be potentially divided into one hundred different groups.
- In an embodiment, groups identified from the search results can be graphically represented in a tree-structure. For example, if ten groups were identified from the search results, then a corresponding tree-structure can be represented as a tree having a root and ten branches originated from the root. According to another example, the tree-structure can have nested brunches, which represent groups and subgroups of the search results.
- In an embodiment, a grouping level (level) is identified for each group identified for search results. A level associated with a particular group can represent the level in a hierarchical tree-structure. For example, if a search query was issued to return the names of music albums recorded after year 2005, then the returned search results can be grouped based on the name of the artist, and within each group associated with a particular artist, one or more subgroups representing a particular type of music in a recorded album can be also identified.
- In one scenario, a grouping based on the name of the artist can be associated with a first level of grouping, while a grouping based on the type of music in the album can be associated with a second level. In this scenario, the search results are first grouped based on the name of the artist, and then, for each artist, the results in a group can be grouped based on the type of the recorded music albums. The two levels can be represented in a hierarchical tree-structure by two levels originated at a root. A first group level can comprise different names of the artists, while a second group level can comprise different types of music albums for each artist.
- In another scenario, a grouping based on the type of music in the album can be associated with a first level, while a grouping based on the name of the artist can be associated with a second level. In this scenario, the search results are first grouped based on the type of the recorded music albums, and then, for each type of music, the search results are grouped based on the name of the artist. The two levels can be represented in a corresponding hierarchical tree-structure by two levels originating at a root. A first group level can comprise different types of music albums, while a second group level can comprise different names of the artists for each type of music albums.
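- A minimal sketch of this two-level grouping, under the assumption that results are plain records with artist and genre fields (illustrative names only), shows how swapping the grouping keys yields either hierarchy:

```python
from collections import defaultdict

def nested_group(results, first_key, second_key):
    """Group results by first_key, then group each first-level group by second_key."""
    tree = defaultdict(lambda: defaultdict(list))
    for result in results:
        tree[result[first_key]][result[second_key]].append(result)
    return {outer: dict(inner) for outer, inner in tree.items()}

albums = [
    {"artist": "Artist 1", "genre": "pop", "album": "Album 1"},
    {"artist": "Artist 1", "genre": "rock", "album": "Album 2"},
    {"artist": "Artist 2", "genre": "pop", "album": "Album 3"},
]
print(nested_group(albums, "artist", "genre"))  # first level: artist, second level: genre
print(nested_group(albums, "genre", "artist"))  # first level: genre, second level: artist
```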
- In an embodiment, information about each group of search results can be aggregated. For example, if search results providing the names of music albums recorded after 2005 were divided into groups based on the name of the artist, then aggregated information associated with the group can comprise information about the quantity of music tracks recorded by the particular artist, the quantity of hits recorded by the particular artist, the quantity of search-hits that were issued for the particular recording, the average price of the recordings/albums recorded by the particular artist, the minimum and the maximum prices of the recordings/albums recorded by the particular artist, and other information that can be derived from the search results. Furthermore, the aggregated information can provide information summarizing a musical career of the particular artist, the artist's accomplishments, awards, recorded albums and other information about the artist.
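- The per-group aggregation can be pictured with a short sketch like the one below; the field names and the chosen statistics are assumptions for illustration, not the data model of the disclosure:

```python
def aggregate_group(hits, price_field="price"):
    """Derive per-group summary statistics from one group of search results."""
    prices = [hit[price_field] for hit in hits if price_field in hit]
    return {
        "count": len(hits),
        "avg_price": sum(prices) / len(prices) if prices else None,
        "min_price": min(prices, default=None),
        "max_price": max(prices, default=None),
    }

group = [{"album": "Album 1", "price": 9.99}, {"album": "Album 2", "price": 14.99}]
print(aggregate_group(group))
```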
- In an embodiment, groups of search results and aggregated information associated with the groups are presented to a user. For example, the groups and aggregated information can be displayed in a graphical user interface displayed for the user.
- In an embodiment, a graphical user interface comprises any of a panel for a result display, a panel for a timeline data display, a panel for a hit-map display, a panel for a demographic information display, a panel for a price range display, and any other panel that can be used to display data.
- 2.0 Structural and Functional Overview
- FIG. 1 illustrates an embodiment of a search engine environment 100. The search engine environment 100 comprises one or more search engines 120, one or more databases 130, one or more client computers 140 a . . . 140 n, and one or more computer networks 150. Other components, such as servers, routers, data repositories, data clouds, can be included in the search engine environment 100.
- In an embodiment, a search engine 120 is configured to collect information available on the Internet or dedicated data repositories, process the collected information and store the processed information in storage, such as a database 130. Search engine 120 can be further configured to receive a search query, process the search query and return search results in response to receiving the query.
- Search engine 120 can implement the functional units that are shown within the search engine 120, and the processes described herein, using hardware logic such as in an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), system-on-a-chip (SoC) or other combinations of hardware, firmware and/or software.
- In an embodiment, search engine 120 is a vertical search platform that provides scalability and state-of-the-art search technology. For example, search engine 120 can provide a multi-filtering tool that exceeds the scope of conventional grouping implemented by, for example, the “group-by” and “join” search query statements.
- In an embodiment, search engine 120 is configured to perform a method for programmable data multi-filtering of search results. The method comprises a back-end processing and a front-end processing.
- In an embodiment, while executing a back-end phase, search engine 120 receives a search query, parses the search query, generates multiple back-end requests, and executes the back-end requests to generate a set of search results.
- In an embodiment, while executing a front-end phase, search engine 120 analyzes and aggregates a set of search results. For example, in the front-end phase, search engine 120 groups the search results and aggregates the search results according to one or more selection models.
- Search engine 120 can group and aggregate data in parallel. For example, search engine 120 can group and aggregate data for each of the multiple back-end requests at the same time. While performing the front-end phase, for each result in a result set generated by a back-end request, search engine 120 can identify the group to which the result belongs and the level to which the identified group belongs.
- In an embodiment, search engine 120 groups search results into groups by classifying the search results based on different characteristics of the results. For instance, in response to receiving a search query that requests providing titles of music albums recorded after year 2005, search engine 120 can return a list of different music albums performed by different artists. The search results can be grouped by the name of the artist, and/or by the name of the album. Grouping by the name of the artist can be referred to as a first level of grouping, while grouping by the name of the album for each artist can be referred to as a second level of grouping. Hence, the set of songs in the first level is grouped by the name of the artist, and in the second level is grouped by the name of the album for a particular artist.
- In an embodiment, search engine 120 generates and collects aggregated data for each group identified at each level. For example, if a result set comprises a list of music albums, and the music albums are grouped by the name of the artist, then aggregated data for a group can include information that is specific to the group. That can include a quantity of music albums found for a particular artist, a quantity of albums within the group, an average price of music albums for each artist, a maximum price of music albums for each artist, a minimum price of music albums for each artist, and other types of information.
- According to another example, if a result set comprises a list of music albums, and the music albums are grouped by the type of the music, then aggregated data for the group can include information such as a quantity of different albums in the group, an average price of the albums in the group, the maximum price of the albums in the group, the minimum price of the albums in the group, or other types of information.
- In an embodiment, search engine 120 also aggregates search results by generating a nested tree for the search results. Aggregating the search results allows displaying the search results as divided into various groups. For example, if a search result query returned three titles of music songs, out of which two songs are credited to one artist and one song is credited to another artist, and each song was a part of a different album, then the search results can be organized in a tree structure having two branches. One branch can depict the three music songs organized by the name of the artist (Artist 1, Artist 2), and the other branch can depict the three music songs organized by the name of the album (Album 1, Album 2, Album 3).
- In an embodiment, search engine 120 provides grouped and aggregated search results. Continuing with the previous example, the search results can be displayed as organized by the name of the artist, and as organized by the name of the album. In addition to the grouping, additional information can be displayed to provide information specific to the group, such as a quantity of the documents in each group, average prices of the documents in each group, maximum and minimum prices in each group, and other information specific to the group.
- In an embodiment, search engine 120 comprises one or more processors 102, one or more search units 104, one or more grouping searchers 106, one or more selection transformers 108, one or more grouping executors 110, one or more presenting units 112, and one or more search cores 114 a, 114 b.
- In an embodiment, a processor 102 facilitates communications between search engine 120 and client computers 140 a . . . 140 n. Furthermore, processor 102 can process commands received and executed by procurement computer 110, process responses received by search engine 120, and facilitate various types of operations executed by search engine 120. Processor 102 comprises hardware and software logic configured to execute various processes on search engine 120.
- In an embodiment, a search unit 104 is configured to receive a search query comprising a query select statement and a plurality of search terms.
- In an embodiment, a grouping searcher 106 is configured to generate a plurality of selection models based on a query select statement and a plurality of search terms.
- In an embodiment, grouping searcher 106 is further configured to identify one or more hierarchies in the search query, enable execution of one or more nested grouping operations for the search query and enable execution of one or more parallel grouping operations for the search query.
- In an embodiment, grouping searcher 106 is further configured to group a plurality of selection results into a final result.
- In an embodiment, grouping searcher 106 is further configured to group one or more search terms into one or more groups of features.
- In an embodiment, a selection model, from a plurality of selection models, is generated based on a unique combination of one or more terms, from the plurality of search terms, that is not present in other selection models, from the plurality of selection models.
- In an embodiment, a selection model can be created by a client application of the user who issued a search query to a search engine 120. A selection model can be an abstract list-manipulation model.
- In an embodiment, a selection model can comprise a variety of directives. For example, a set of main directives can comprise an “all” directive for processing an input list as a whole, an “each” directive for processing each element of the input separately, a “group” directive for partitioning the input list into sub-lists, and an “output” directive for including output data in the search results.
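- One possible in-memory representation of such a tree-shaped selection model is sketched below; this is an assumption made for illustration rather than the API of the disclosed system, with each node carrying one of the main directives and its children:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One directive in a tree-shaped selection model."""
    directive: str            # "all", "each", "group", or "output"
    argument: str = ""        # e.g. the attribute for "group" or the aggregate for "output"
    children: List["Node"] = field(default_factory=list)

# all(group(a) each(group(b) each(output(count()))) each(group(c) each(output(count()))))
selection_model = Node("all", children=[
    Node("group", "a", children=[
        Node("each", children=[Node("group", "b", children=[
            Node("each", children=[Node("output", "count()")])])]),
        Node("each", children=[Node("group", "c", children=[
            Node("each", children=[Node("output", "count()")])])]),
    ]),
])
```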
- In an embodiment, a selection transformer 108 is configured to transform selection models into a plurality of execution models. For example, for each of the plurality of selection models, selection transformer 108 transforms a selection model into a plurality of execution models.
- In an embodiment, selection transformer 108 is further configured to group the plurality of execution results into a selection result. For example, once the execution models are executed by other units of search engine 120, the execution results for the execution models are provided to selection transformer 108, and the execution results are grouped into a selection result.
- In an embodiment, a grouping executor 110 is configured to distribute execution models to search cores 114 a . . . 114 b and to receive execution results from the search cores 114 a . . . 114 b.
- In an embodiment, any of search cores 114 a . . . 114 b is configured to execute execution models to generate execution results. For example, any of search cores 114 a . . . 114 b can be configured to execute a plurality of execution models by mining multi-dimensional information stored in storage 130. Furthermore, any of search cores 114 a . . . 114 b can be configured to access distributed databases associated with a search engine 120.
- In an embodiment, each of search cores 114 a . . . 114 b can be configured to search the same search core repository. Alternatively, each of search cores 114 a . . . 114 b can be configured to search separate search core repositories.
- In an embodiment, grouping executor 110 is further configured to group a plurality of execution results into a selection result in an approximate single-pass process. Alternatively, grouping executor 110 can be configured to group the plurality of execution results into a selection result in a multi-pass process.
- In an embodiment, a presenting unit 112 is configured to present final results. For example, the final results can be grouped and aggregated, and the grouped and aggregated results can be sent to a client computer 140 a via a network 150.
- In an embodiment, presenting unit 112 is further configured to cause displaying a user interface on any of client computers 140 a . . . 140 n. The user interface can comprise a variety of panels, including a panel for a result display, a panel for a timeline data display, a panel for a hit-map display, a panel for a demographic information display, a panel for a price range display, and other panels.
- In an embodiment, various search core repositories are referred to as storage 130. Storage 130 can be configured to store a variety of information, including information related to search queries, selection models, execution models, execution results, selection results, and any other information that search engine 120 may require.
- In an embodiment, search engine 120 communicates with one or more client computers 140 a . . . 140 n via a communications network 150.
- For purposes of illustrating clear examples, FIG. 1 shows one or more client computers 140 a . . . 140 n, and one network 150. However, practical embodiments may use any number of client computers 140, and any number of networks 150.
- In an embodiment, network 150 is communicatively coupled to client computers 140 a . . . 140 n, and search engine 120. Network 150 is used to maintain various communications sessions and may implement one or more communications protocols.
- Each client computer 140 a . . . 140 n, and search engine 120, can be any type of workstation, laptop, PDA device, phone, or portable device.
- Client computers 140 a . . . 140 n and search engine 120 may implement the processes described herein using hardware logic such as in an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), system-on-a-chip (SoC) or other combinations of hardware, firmware and/or software.
- In an embodiment, client computers 140 a . . . 140 n, search engine 120 and network 150 comprise hardware or software logic configured to generate and maintain various types of communications session information, and routing information for data communications network 150.
- In an embodiment, client computers 140 a . . . 140 n can be used by users who issued search queries to a search engine 120. For example, from a client computer 140 a, a search query can be sent via a network 150 to search engine 120 for processing, and multi-filtered results can be sent from search engine 120 via a network 150 back to the client computer 140 a.
- 3.0 Programmable Multi-Filtering of Search Results
- In an embodiment, an approach for multi-filtering of multi-dimensional information is presented. The multi-filtering can be implemented on a variety of search platforms. For example, the multi-filtering can be implemented on a vertical search platform “Vespa 4.0” that provides scalability and state-of-the-art search technology and that is available from Yahoo! Inc., Santa Clara, Calif.
- FIG. 2 illustrates a data flow associated with processing grouping requests. In an embodiment, FIG. 2 depicts a search container 200, and one or more search cores 210 a . . . 210 b. Search container 200 comprises a grouping searcher 202, a selection transformer 204 and a grouping executor 206, each of which was briefly described in reference to FIG. 1. Search cores 210 a . . . 210 b can run multiple select statements in parallel for the same query.
- In an embodiment, a search container 200 is a multi-filtering tool and is configured to perform multi-filtering.
- In an embodiment, a search container 200 performs multi-filtering by utilizing a ranking framework for deriving and executing various ranking expressions tailored for various applications. The ranking expressions can be designed to perform various math operations as well as conditional branching. The ranking expressions can operate on a variety of document attributes.
- In an embodiment, a search container 200 is configured to execute an approach for multi-filtering by executing two types of processing: a front-end processing and a back-end processing.
- A front-end processing involves initiating one or more search container instances that run one or more searcher plug-ins. The front-end processing enables grouping across multiple search cores without peer communication, and thus, parts of the grouping logic can be implemented as searcher plug-ins.
- In an embodiment, a front-end processing starts with a grouping searcher 202 generating one or more selection models based on a received search query.
- In an embodiment, generating one or more selection models is a client-type grouping. The client-type grouping requests are referred to as the selection models. The selection models can be represented as tree-type models, and can be created programmatically.
- The example below is used to illustrate a data flow associated with processing grouping requests as described in FIG. 2. In this example, it is assumed that a user issued an SQL search query: SELECT COUNT (*) FROM orders WHERE customer=‘Smith’. In SQL, the processing of the above search query would require at least two processing phases. The first phase can be referred to as an initial processing, and comprises accessing the table called “orders,” and selecting those records from the table called “orders” that contain information about “Smith.” The second phase can be referred to as a post-processing, and comprises counting the number of records from the table “orders” that indeed contain the information about “Smith.” The two-step processing can be inefficient and time-consuming at times.
- In contrast to the SQL processing, in an embodiment, a grouping searcher 202 of the search container 200 can represent the above user search query using the following expression: all(group(customer) each(output(count( )))). Based on the expression, grouping searcher 202 can generate various grouping instructions. The examples of the grouping instructions depend on the implementation.
- In an embodiment, a grouping searcher 202 is configured to generate one or more selection models. An example of an embodiment of generating selection models is depicted in FIG. 3.
- FIG. 3 illustrates an embodiment of generating selection models and execution models. In the example depicted in FIG. 3, one or more selection models are generated for an expression “all(group(a) each(group(b) . . . ) each(group(c) . . . )).” The expression indicates a group(a) 320, a group(b) 330, and a group(c) 340. The group(a) 320 is displayed above group(b) 330 and group(c) 340. Group(a) 320 has an associated level. Group(b) 330 and group(c) 340 also have an associated level. The level associated with group(a) 320 is higher than the level associated with group(b) 330 and group(c) 340.
- In an embodiment, for each group identified in a selection model 310, one or more execution models 350 are generated. The one or more execution models 350 can be generated by a selection transformer 204 of FIG. 2.
- In an embodiment, a selection transformer 204 of FIG. 2 is configured to generate execution models. For example, selection transformer 204 receives one or more selection models, and based on the selection model information, selection transformer 204 generates a plurality of execution models. An example of an embodiment of generating the execution models is depicted in FIG. 3.
- Continuing with the description of FIG. 3, one or more execution models are generated for one or more groups of selection model 310. In the example depicted in FIG. 3, an execution model 360 is generated for a group(a) 320 and a group(b) 330, and an execution model 370 is generated for a group(a) 320 and a group(c) 340. The execution model 360 is a separate model from the execution model 370. As depicted in FIG. 3, the execution model 360 comprises a root, an expression “all(group(a))” and an expression “each(group(b) output(count( )))”, while the execution model 370 comprises a root, an expression “all(group(a))” and an expression “each(group(c) output(count( )))”. In an embodiment, there is one execution model for each path through the selection model from root to any leaf. The transformation process is also able to discard execution models that either have no outputs, or that can be collapsed into another parallel execution model.
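- The path-per-execution-model idea can be illustrated with the following toy sketch, which is not the disclosed implementation; it assumes a selection model encoded as nested (label, children) tuples and discards paths that never reach an output:

```python
def execution_models(node, path=None):
    """Enumerate one execution model per root-to-leaf path of a selection model.

    A node is a (label, children) pair; a leaf has an empty child list.  Paths
    that never reach an "output(...)" label are dropped, mirroring the idea that
    execution models without outputs are discarded.
    """
    label, children = node
    path = (path or []) + [label]
    if not children:
        return [path] if any(step.startswith("output") for step in path) else []
    models = []
    for child in children:
        models.extend(execution_models(child, path))
    return models

# all(group(a) each(group(b) output(count())) each(group(c) output(count())))
selection_model = ("all", [("group(a)", [
    ("each", [("group(b)", [("output(count())", [])])]),
    ("each", [("group(c)", [("output(count())", [])])]),
])])
for model in execution_models(selection_model):
    print(" -> ".join(model))
```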
- In an embodiment, one or more execution models are sent to a grouping executor 206, depicted in FIG. 2.
- Continuing with the description of FIG. 2, in an embodiment, a grouping executor 206 is configured to generate execution models for each search core. For example, grouping executor 206 can receive a plurality of execution models, and use the execution models to generate a plurality of execution models for each search core. For instance, if two search cores have been identified by grouping executor 206, then grouping executor 206 can generate a plurality of execution models for the first search core, and a plurality of execution models for the second search core.
- In an embodiment, each of the plurality of execution models is executed by search cores 210 a . . . 210 b. Although FIG. 2 depicts two search cores 210 a . . . 210 b, more than two search cores 210 can be dedicated to execute the execution models.
- In an embodiment, once search cores 210 a . . . 210 b finish processing the plurality of execution models, the search cores 210 a . . . 210 b provide a plurality of execution results to a grouping executor 206.
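- The fan-out of execution models to the search cores and the collection of their results can be pictured with the toy sketch below; the cores here are stand-ins that return dummy results, and the threading approach shown is only one of many possible ways to run the cores in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_core(core_id, execution_models):
    """Stand-in for a search core evaluating its execution models (returns dummy results)."""
    return {model: f"result-of-{model}-from-core-{core_id}" for model in execution_models}

execution_models = ["model-360", "model-370"]
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_on_core, core_id, execution_models) for core_id in range(2)]
    per_core_results = [future.result() for future in futures]
print(per_core_results)
```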
- In an embodiment, a grouping executor 206 groups execution results provided for each search core into a plurality of execution results. An example of grouping the plurality of the execution results for search cores is depicted in FIG. 4.
- FIG. 4 illustrates an embodiment of a relationship between an execution model and an execution result. In the example depicted in FIG. 4, execution results 450 are grouped for a plurality of execution models 410. In particular, FIG. 4 depicts two execution models: an execution model 412 comprises a root 420, an expression “all(group(a))” 430 and an expression “each(group(b) output(count( )))” 440, while an execution model 414 comprises other respective clauses. Beneath the grouping executor sits a dispatch that scatters the execution model across all search cores, and the same dispatch merges the results so that the grouping searcher gets exactly one execution result per execution model.
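- The dispatch-style merge can be sketched as follows for the simple case of per-group hit counts; this is an assumption about how partial results from the cores could be combined, shown only for illustration:

```python
from collections import Counter

def merge_core_results(per_core_results):
    """Merge partial per-group counts returned by each search core into one result.

    Each core contributes a mapping of group value -> hit count for the same
    execution model; the merge adds the partial counts together so that exactly
    one execution result remains per execution model.
    """
    merged = Counter()
    for core_result in per_core_results:
        merged.update(core_result)
    return dict(merged)

core_0 = {"Artist 1": 4, "Artist 2": 1}
core_1 = {"Artist 1": 2, "Artist 3": 5}
print(merge_core_results([core_0, core_1]))
```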
- As depicted in FIG. 4, for an execution model 412, a grouping executor 206 matches the model with one execution result in the following manner: an execution result generated for a root 420 of an execution model 412 is referred to as execution result 452; an execution result generated for an “all(group(a))” expression 430 of the execution model 412 is referred to as execution result 454, and an execution result generated for a clause “each(group(b) output(count( )))” 440 is referred to as execution result 456.
- Similarly, execution model 414 is matched with exactly one execution result.
- In an embodiment, execution results can be represented in a tree-structure 450. The tree has two branches: a branch 452-454-456, which comprises execution results generated for an execution model 412; and a branch 462-464-466, which comprises execution results generated for an execution model 414. Cumulatively, the branch 452-454-456 comprises results r+a1+a2+b1+b2+b3, while branch 462-464-466 comprises results r+a2+a3+c1+c2+c3.
- As depicted in FIG. 4, grouping of the execution results can cause a repetition of some execution results in a tree-structure of the execution results. In the depicted example, the results “r” and “a2” are included in both branches.
- In an embodiment, grouping of the execution results can be performed using custom expressions, such as group clauses. The expressions can comprise numerical constants, document attributes, functions defined over other expressions (such as md5, cat, xor, and, or, add, sub, mul, div, mod), data types of expressions resolved using best effort, arithmetical operands, and other types of expressions.
- TABLE 1 (below) illustrates examples of various expressions that can be used to group execution results:
-
TABLE 1 Name Description Arguments Result Arithmetic expressions add Add the arguments together. Numeric+ Numeric + Add left and right argument. Numeric, Numeric Numeric mul Multiply the arguments together. Numeric+ Numeric * Multiply left and right argument. Numeric, Numeric Numeric sub Subtract second argument from Numeric+ Numeric first, third from result, etc. − Subtract right argument from Numeric, Numeric Numeric left. div Divide first argument by second, Numeric+ Numeric result by third, etc. / Divide left argument by right. Numeric, Numeric Numeric mod Modulo first argument by Numeric+ Numeric second, result by third, etc. % Modulo left argument by right. Numeric, Numeric Numeric neg Negate argument. Numeric Numeric − Negate right argument. Numeric Numeric Bitwise expressions and AND the arguments in order. Long+ Long or OR the arguments in order. Long+ Long xor XOR the arguments in order. Long+ Long String expressions strlen Count the number of bytes in String Long argument. strcat Concatenate arguments in order. String+ String Type conversion expressions todouble Convert argument to double. Any Double tolong Convert argument to long. Any Long tostring Convert argument to string. Any String toraw Convert argument to raw. Any Raw Raw data expressions cat Cat the binary representation of Any+ Raw the arguments together. md5 Does an md5 over the binary Any Raw representation of the argument, and keeps the lowest 64 bits. Accessor expressions relevance Return the computed rank of a None Double document. <attribute-name> Return the value of the named None Any attribute. Bucket expressions fixedwidth Maps the value of the first Any, Numeric NumericBucketList argument into second argument number of fixed width buckets. predefined Maps the value of the first Any Bucket+ BucketList argument into the given buckets. Time expressions time.dayofmonth Returns the day of month (1-31) Long Long for the given timestamp. time.dayofweek Returns the day of week (0-6) Long Long for the given timestamp, Monday being 0. time.dayofyear Returns the day of year (0-365) Long Long for the given timestamp. time.hourofday Returns the hour of day (0-23) Long Long for the given timestamp. time.minuteofhour Returns the minute of hour (0- Long Long 59) for the given timestamp. time.monthofyear Returns the month of year (1-12) Long Long for the given timestamp. time.secondofminute Returns the second of minute (0- Long Long 59) for the given timestamp. time.year Returns the full year (e.g. 2009) Long Long of the given timestamp. List expressions size Return the number of elements Any Long in the argument if it is a list. If not return 1. sort Sort the elements in argument in Any Any ascending order if argument is a list If not it is a NOP. reverse Reverse the elements in the Any Any argument if argument is a list If not it is a NOP. Other expressions zcurve.x Returns the X component of the Long Long given zcurve encoded 2d point. zcurve.y Returns the Y component of the Long Long given zcurve encoded 2d point. uca Converts the attribute string Any Locale(String), Raw using unicode collation Strength(String) algorithm, useful for sorting. 
Single argument standard mathematical expressions math.exp Double Double math.log Double Double math.log 1p Double Double math.log 10 Double Double math.sqrt Double Double math.cbrt Double Double math.sin Double Double math.cos Double Double math.tan Double Double math.asin Double Double math.acos Double Double math.atan Double Double math.sinh Double Double math.cosh Double Double math.tanh Double Double math.asinh Double Double math.acosh Double Double math.atanh Double Double Dual argument standard mathematical expressions math.pow Return X{circumflex over ( )}Y. Double, Double Double math.hypot Return length of hypothenus Double, Double Double given X and Y sqrt(X{circumflex over ( )}2 + Y{circumflex over ( )}2). - TABLE 2 (below) illustrates an example of the language grammar that can be used to generate custom expressions:
-
TABLE 2 Language grammar request ::= group [ “where” “(” ( “true” | “$query” ) “)” ] group ::= ( “all” | “each”) “(” operations “)” [ “as” “(” identifier “)” ] operations ::= [ “group” “(” expression “)” ] ( ( “alias” “(” identifier “,” expression “)” ) | ( “max” “(” number “)” ) | ( “order” “(” expList | aggrList “)” ) | ( “output” “(” aggrList “)” ) | ( “precision” “(” number “)” ) )* group* aggrList ::= aggr ( “,” aggr )* aggr ::= ( ( “count” “(” “)” ) | ( “sum” “(” exp “)” ) ( “avg” “(” exp “)” ) | ( “max” “(” exp “)” ) | ( “min” “(” exp “)” ) ( “xor” “(” exp “)” ) | ( “summary” “(” [ identifier ] “)” ) ) [ “as” “(” identifier “)” ] expList ::= exp ( “,” exp )* exp ::= ( “+” | “−”) ( “$” identifier [ “=” math ] ) | ( math ) | ( aggr ) math ::= value [ ( “+” | “−” | “*” | “/” | “%” ) value ] value ::= ( “(” exp “)” ) | ( “add” “(” expList “)” ) | ( “and” “(” expList “)” ) | ( “cat” “(” expList “)” ) | ( “div” “(” expList “)” ) | ( “fixedwidth” “(” exp “,” number “)” ) | ( “math” “.” ( ( “exp” | “log” | “log1p” | “log10” | “sqrt” | “cbrt” | “sin” | “cos” | “tan” | “asin” | “acos” | “atan” | “sinh” | “cosh” | “tanh” | “asinh” | “acosh” | “atanh” ) “(” exp “)” | ( “pow” | “hypot” ) “(” exp “,” exp “)” )) | ( “max” “(” expList “)” ) | ( “md5” “(” exp “,” number “,” number “)” ) | ( “min” “(” expList “)” ) | ( “mod” “(” expList “)” ) | ( “mul” “(” expList “)” ) | ( “or” “(” expList “)” ) | ( “predefined” “(” exp “,” “(” bucket ( “,” bucket )* “)” “)” ) | ( “reverse” “(” exp “)” ) | ( “relevance” “(” “)” ) | ( “sort” “(” exp “)” ) | ( “strcat” “(” expList “)” ) | ( “strlen” “(” exp “)” ) | ( “size” “(” exp“)” ) | ( “sub” “(” expList “)” ) | ( “time” “.” ( “year” | “monthofyear” | “dayofmonth” | “dayofyear” | “dayofweek” | “hourofday” | “minuteofhour” | “secondofminute” ) “(” exp “)” ) | ( “todouble” “(” exp “)” ) | ( “tolong” “(” exp “)” ) | ( “tostring” “(” exp “)” ) | ( “toraw” “(” exp “)” ) | ( “uca” “(” exp “,” string [ “,” string ] “)” ) | ( “xor” “(” expList “)” ) | ( “zcurve” “.” ( “x” | “y” ) “(” exp “)” ) | ( attributeName ) bucket ::= “bucket” ( “(” | “[” | “<”) ) ( “−inf” | rawvalue | number | string ) [ “,” ( “inf” | rawvalue | number | string ) ] (“)” | “+” | “>”) rawvalue ::= “{” ( ( string | number) “,”)* “}” - In an embodiment, a type of the results generated by custom expressions can be either scalar or single dimension arrays. For example, an expression “add(<array>)” adds all elements together to produce a scalar. Adding elements to arrays can produce a new array with length of max(|A|, |B|). The type of the elements can match the arithmetic type rules for scalar values.
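- A small sketch of how the scalar and array arithmetic described above could behave, assuming that a missing element in the shorter array is treated as zero (a detail the description does not spell out):

```python
from itertools import zip_longest

def add_scalar(values):
    """add(<array>): collapse an array to a scalar by summing its elements."""
    return sum(values)

def add_arrays(a, b):
    """Element-wise addition of two arrays, padding the shorter one with zeros,
    so the result has length max(|A|, |B|)."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]

print(add_scalar([1, 2, 3]))            # 6
print(add_arrays([1, 2, 3], [10, 20]))  # [11, 22, 3]
```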
- In an embodiment, groups can contain subgroups. The subgroups can be generated by using sub-expressions and group operations.
- In an embodiment, groups can be nested within any number of levels. Each level of grouping can specify a set of aggregates configured to collect search results that belong to the particular group.
- Aggregated information for a particular group can comprise various types of information. For example, the aggregated information can comprise a list of documents retrieved using a particular summary class. Furthermore, the aggregated information can comprise the count of the documents in the group. Moreover, the aggregated information can comprise the sum, average, min, max, or xor computed for the expression associated with the group.
- TABLE 3 (below) illustrates an example of aggregators that can be used to aggregate information:
-
TABLE 3

| Name | Description | Arguments | Result |
|---|---|---|---|
| Group aggregators | | | |
| count | Simply increments a long counter every time it is invoked. | None | Long |
| sum | Sums the argument over all selected documents. | Numeric | Numeric |
| avg | Computes the average over all selected documents. | Numeric | Numeric |
| min | Keeps the minimum value of selected documents. | Numeric | Numeric |
| max | Keeps the maximum value of selected documents. | Numeric | Numeric |
| xor | XOR the values (their least significant 64 bits) of all selected documents. | Any | Long |
| Hit aggregators | | | |
| summary | Produces a summary of the requested summary class. | Name of summary class | Summary |
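- The group aggregators in TABLE 3 can be approximated with a streaming sketch such as the following; it is illustrative only and, for simplicity, feeds every aggregate from the same numeric value:

```python
class Aggregator:
    """Streaming aggregates in the spirit of the group aggregators (illustrative sketch)."""
    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None
        self.xor = 0

    def feed(self, value):
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.xor ^= int(value) & 0xFFFFFFFFFFFFFFFF  # keep the least significant 64 bits

    def summary(self):
        avg = self.total / self.count if self.count else None
        return {"count": self.count, "sum": self.total, "avg": avg,
                "min": self.minimum, "max": self.maximum, "xor": self.xor}

agg = Aggregator()
for price in [9, 14, 11]:
    agg.feed(price)
print(agg.summary())
```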
- TABLE 4 (below) illustrates examples of grouping:
-
TABLE 4 TopN/Full corpus Grouping A simple example of grouping provisioning for counting the number of documents in each group can be expressed as all(group(a) each(output(count( )))). Two parallel groupings can be expressed as: all(all(group(a) each(output(count( )))) all(group(b) each(output(count( ))))) A simple example of grouping provisioning for grouping only the 1000 best hits at each search core node (providing a lower accuracy, but a higher speed) can be expressed as: all(max(1000) all(group(a) each(output(count( ))))) A simple example of grouping provisioning for grouping of all search results can be expressed as: all(group(a) each(output(count( )))) where(true). Locale aware sorting A simple example of grouping with a local aware sorting can be expressed as: all(group(s) order(max(uca(s, “sv”))) each(output(count( )))) all(group(s) order(max(uca(s, “sv”, “PRIMARY”))) each(output(count( )))) Grouping and multivalue fields A simple example of grouping based on a map from strings to integers, where the strings are can be processed by a sort of key can be expressed as: all(group(mymap.key) each(output(sum(mymap.value)))) Ordering groups A simple example of grouping using a modulo-5 operation before the group is selected can be expressed as: all(group(a % 5) order(sum(b)) each(output(count( )))) Collecting aggregates A simple example of grouping where the number of documents in each group is counted and the best hit in each group is returned can be expressed as: all(group(a) each(max(1) each(output(summary( ))))) Predefined buckets A simple example of grouping based on predefined buckets for a raw attribute value can be expressed as: all(group(predefined(age, [0, 10>, [10,inf>)) each(outtput(count( )))) Other Grouping Examples Single level grouping on “a” attribute, returning at most 5 groups with full hit count as well as the 69 best hits. 
all(group(a) max(5) each(max(69) output(count( )) each(output(summary( ))))) Two level grouping on “a” and “b” attribute: all(group(a) max(5) each(output(count( )) all(group(b) max(5) each(max(69) output(count( )) each(output(summary( ))))))) Three level grouping on “a”, “b” and “c” attribute: all(group(a) max(5) each(output(count( )) all(group(b) max(5) each(output(count( )) all(group(c) max(5) each(max(69) output(count( )) each(output(summary( ))))))) As above example, but also collect best hit in level 2: all(group(a) max(5) each(output(count( )) all(group(b) max(5) each(output(count( )) all(max(1) each(output(summary( )))) all(group(c) max(5) each(max(69) output(count( )) each(output(summary( ))))))) As above example, but also collect best hit in level 1: all(group(a) max(5) each(output(count( )) all(max(1) each(output(summary( )))) all(group(b) max(5) each(output(count( )) all(max(1) each(output(summary( )))) all(group(c) max(5) each(max(69) output(count( )) each(output(summary( ))))))) As above example, but using different document summaries on each level: all(group(a) max(5) each(output(count( )) all(max(1) each(output(summary(complexsummary)))) all(group(b) max(5) each(output(count( )) all(max(1) each(output(summary(simplesummary)))) all(group(c) max(5) each(max(69) output(count()) each(output(summary(fastsummary))))))) Group on fixed width buckets for numeric attribute, then on “a” attribute, count hits in leaf nodes: all(group(fixedwidth(n, 3)) each(group(a) max(2) each(output(count( ))))) As above example, but limiting groups in level 1, and returning hits from level 2: all(group(fixedwidth(n, 3)) max(5) each(group(a) max(2) each(output(count( )) each(output(summary( )))))) Deep grouping with counting and hit collection on all levels: all(group(a) max(5) each(output(count( )) all(max(1) each(output(summary( )))) all(group(b) each(output(count( )) all(max(1) each(output(summary( )))) all(group(c) each(output(count( )) all(max(1) each(output(summary( )))))))))) Time aware grouping Group by year: all(group(time.year(a)) each(output(count( )))) Group by year, then by month: all(group(time.year(a)) each(output(count( )) all(group(time.month(a)) each(output(count( )))))) Group by year, then by month, then day, then by hour: all(group(time.year(a)) each(output(count( )) all(group(time.monthofyear(a)) each(output(count( )) all(group(time.dayofmonth(a)) each(output(count( )) all(group(time.hourofday(a)) each(output(count( )))))))))) Groups today, yesterday, lastweek, and lastmonth using predefined aggregator, and groups each day within each of these separately: all(group(predefined((now( ) − a) / (60 * 60 * 24), bucket(0,1), bucket(1,2), bucket(3,7), bucket(8,31))) each(output(count( )) all(max(2) each(output(summary( )))) all(group((now( ) − a) / (60 * 60* 24)) each(output(count( )) all(max(2) each(output(summary( )))))))) - In an embodiment, ordering of the grouped search results can be performed using any of the available aggregates.
- In an embodiment, a multi-filtering can be used to implement various types of search results ordering. For example, the multi-filtering can be used to implement a strict ordering of the search results. Other types of ordering can include an ascending ordering, a descending ordering and any type of ordering specified for each level of the grouping.
- In an embodiment, a quantity of groups returned for each level can be restricted. This can be accomplished by using for example, a “max” operation expression, and allowing returning only for example, first n groups as specified by the order operation.
- Continuing with the description of
FIG. 2 , in an embodiment, agrouping executor 206 is also configured to transmit a plurality of execution results to aselection transformer 204. - In an embodiment, a
selection transformer 204 receives a plurality of execution results and generates one selection result per selection model. - In an embodiment, a grouping searcher received one or more selection results and displays the selection results grouped according to one or more selection models. Example of the grouped selection results is depicted in
FIG. 5 . -
FIG. 5 illustrates an example of a display for grouped search results. The example depicted inFIG. 5 illustrates search results generated for a search query seeking a count for each of three most popular songs performed by Michael Jackson and a count for each of three most popular songs performed by The Beatles. A count may represent for example, the count of different recordings of a particular song, the count of websites providing the recording of a particular song, or any other related count. -
FIG. 5 comprises three columns: afirst GroupId column 510, asecond GroupId column 520 and acount column 530. In thefirst GroupId column 510, labeled “GroupId” 512, two group identifiers are listed:GroupId 514 “Michael Jackson,” andGroupId 516 “The Beatles.” - In the
second GroupId column 520, the names of the songs are listed. The names of the songs are organized by the group identifiers. In particular, for theGroupId 514 “Michael Jackson,” three most popular songs include: “Thriller,” “Bad,” and “Dangerous.” For theGroupId 516 “The Beatles,” three most popular songs include: “A Hard Day's Night,” “Sgt. Pepper's Lonely Hearts Club Band,” and “Abbey Road.” In the example depicted inFIG. 5 , the lists were truncated to three elements (songs); however, in other implementation, a list can comprise any number of elements. - In
FIG. 5 , execution of the search query returned search results, and the search results are organized by a GroupId. As depicted inFIG. 5 , the search results can be displayed in acount column 530. In the depicted example, it was determined that the count of M. Jackson's “Thriller” was 9, the count of M. Jackson's “Bad” was 11, the count of M. Jackson's “Dangerous” was 14, the count of The Beatles' “A Hard Day's Night” was 13, the count of The Beatles' “Sgt. Pepper's Lonely Hearts Club Band” was 13, and the count of The Beatles' “Abbey Road” was 17. - In an embodiment, results grouping can produce groups that contain outputs, group lists, and hit lists. Group lists can contain sub-groups, and hit lists can contain hits that are part of the owning group.
- 4.0 Example of an Embodiment of Programmable Multi-Filtering
-
FIG. 6 illustrates an embodiment of an approach for programmable multi-filtering. - In
step 600, a search engine receives a search query. The search query can be issued by a client application executed on a user computer. The search query can be issued to request one or more search results that satisfy the terms present in the search query. - In
step 602, a search engine generates one or more selection models for the received search query. The details of generating the one or more selection models are provided in the description ofFIG. 2-3 . - In
step 604, a search engine generates one or more execution models based on the one or more selection models generated for the search query. The details of generating the one or more execution models are provided in the description ofFIG. 2 andFIG. 4 . - In
step 606, a search engine generates one or more execution models for each search core. The execution models can be customized according to the search core available to a particular search core. For example, if two search cores are available to perform a search, then the search engine can generate two execution models, each model for one search core. An example of generating the one or more execution models is provided in the description ofFIG. 2 andFIG. 4 . - In
step 608, each search core will receive the same set of execution models. The execution of the execution models can be performed in parallel by each search core, and thus execution of the execution models can be performed simultaneously by the search cores. The details of executing the execution models are provided in the description ofFIG. 2 . - In
step 610, a search engine checks whether the execution of all execution models has been completed. If the execution of the execution models has not been completed, then the process proceeds to step 612, in which the execution of the execution models is continued. - However, if the execution of the execution models has been completed, then the proceeds to step 614.
- In
step 614, a search engine receives a plurality of execution results, and generates selection results. Generating of the selection results can be performed online or offline. If the generation is performed online, then the selection results can be immediately provided to the user. If the generation is performed offline, then the selection results can be provided to the user with some delay. The details are provided in the description ofFIG. 2-4 . - In
step 616, a search engine presents the selection results to a user. The selection results can be aggregated. An example of the selection results is described inFIG. 4 . - 5.0 Hardware Overview
- According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
- For example,
FIG. 7 is a block diagram that illustrates acomputer system 700 upon which an embodiment of the invention may be implemented.Computer system 700 includes abus 702 or other communication mechanism for communicating information, and ahardware processor 704 coupled withbus 702 for processing information.Hardware processor 704 may be, for example, a general purpose microprocessor. -
Computer system 700 also includes amain memory 706, such as a random access memory (RAM) or other dynamic storage device, coupled tobus 702 for storing information and instructions to be executed byprocessor 704.Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed byprocessor 704. Such instructions, when stored in storage media accessible toprocessor 704, rendercomputer system 700 into a special-purpose machine that is customized to perform the operations specified in the instructions. -
Computer system 700 further includes a read only memory (ROM) 708 or other static storage device coupled tobus 702 for storing static information and instructions forprocessor 704. Astorage device 710, such as a magnetic disk or optical disk, is provided and coupled tobus 702 for storing information and instructions. -
Computer system 700 may be coupled viabus 702 to adisplay 712, such as a cathode ray tube (LCD, CRT), for displaying information to a computer user. Aninput device 714, including alphanumeric and other keys, is coupled tobus 702 for communicating information and command selections toprocessor 704. Another type of user input device iscursor control 716, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections toprocessor 704 and for controlling cursor movement ondisplay 712. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. -
Computer system 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes orprograms computer system 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed bycomputer system 700 in response toprocessor 704 executing one or more sequences of one or more instructions contained inmain memory 706. Such instructions may be read intomain memory 706 from another storage medium, such asstorage device 710. Execution of the sequences of instructions contained inmain memory 706 causesprocessor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. - The term “storage media” as used herein refers to any media that store data and/or instructions that cause a machine to operation in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as
storage device 710. Volatile media includes dynamic memory, such asmain memory 706. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge. - Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise
bus 702. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. - Various forms of media may be involved in carrying one or more sequences of one or more instructions to
processor 704 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local tocomputer system 700 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data onbus 702.Bus 702 carries the data tomain memory 706, from whichprocessor 704 retrieves and executes the instructions. The instructions received bymain memory 706 may optionally be stored onstorage device 710 either before or after execution byprocessor 704. -
Computer system 700 also includes acommunication interface 718 coupled tobus 702.Communication interface 718 provides a two-way data communication coupling to anetwork link 720 that is connected to alocal network 722. For example,communication interface 718 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example,communication interface 718 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation,communication interface 718 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information. - Network link 720 typically provides data communication through one or more networks to other data devices. For example,
network link 720 may provide a connection throughlocal network 722 to ahost computer 724 or to data equipment operated by an Internet Service Provider (ISP) 726.ISP 726 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 728.Local network 722 andInternet 728 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals onnetwork link 720 and throughcommunication interface 718, which carry the digital data to and fromcomputer system 700, are example forms of transmission media. -
Computer system 700 can send messages and receive data, including program code, through the network(s),network link 720 andcommunication interface 718. In the Internet example, aserver 730 might transmit a requested code for an application program throughInternet 728,ISP 726,local network 722 andcommunication interface 718. - The received code may be executed by
processor 704 as it is received, and/or stored instorage device 710, or other non-volatile storage for later execution. - In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
- 6.0 Extensions and Alternatives
- In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/275,111 US20130097139A1 (en) | 2011-10-17 | 2011-10-17 | Programmable multi-filtering |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/275,111 US20130097139A1 (en) | 2011-10-17 | 2011-10-17 | Programmable multi-filtering |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20130097139A1 true US20130097139A1 (en) | 2013-04-18 |
Family
ID=48086679
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/275,111 Abandoned US20130097139A1 (en) | 2011-10-17 | 2011-10-17 | Programmable multi-filtering |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US20130097139A1 (en) |
Patent Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6473502B1 (en) * | 1999-08-31 | 2002-10-29 | Worldcom, Inc. | System, method and computer program product for achieving local number portability costing and network management support |
| US20070220055A1 (en) * | 2001-06-29 | 2007-09-20 | Siebel Systems, Inc. | Automatic generation of data models and accompanying user interfaces |
| US20070112727A1 (en) * | 2003-07-04 | 2007-05-17 | Jardine Lewis F | Method for querying collated data sets |
| US20060031214A1 (en) * | 2004-07-14 | 2006-02-09 | Microsoft Corporation | Method and system for adaptive categorial presentation of search results |
| US20060242132A1 (en) * | 2005-04-26 | 2006-10-26 | Computer Associates Think, Inc. | Method and apparatus for in-built searching and aggregating functionality |
| US20070226200A1 (en) * | 2006-03-22 | 2007-09-27 | Microsoft Corporation | Grouping and regrouping using aggregation |
| US20100121861A1 (en) * | 2007-08-27 | 2010-05-13 | Schlumberger Technology Corporation | Quality measure for a data context service |
| US20100005061A1 (en) * | 2008-07-01 | 2010-01-07 | Stephen Basco | Information processing with integrated semantic contexts |
| US20130054569A1 (en) * | 2010-04-30 | 2013-02-28 | Alibaba Group Holding Limited | Vertical Search-Based Query Method, System and Apparatus |
Cited By (61)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160117417A1 (en) * | 2014-10-27 | 2016-04-28 | Joseph Wong | Detection of the n-queries via unit test |
| US9779180B2 (en) * | 2014-10-27 | 2017-10-03 | Successfactors, Inc. | Detection of the N-queries via unit test |
| US11636105B2 (en) | 2016-09-26 | 2023-04-25 | Splunk Inc. | Generating a subquery for an external data system using a configuration file |
| US11663227B2 (en) | 2016-09-26 | 2023-05-30 | Splunk Inc. | Generating a subquery for a distinct data intake and query system |
| US11232100B2 (en) | 2016-09-26 | 2022-01-25 | Splunk Inc. | Resource allocation for multiple datasets |
| US11238112B2 (en) | 2016-09-26 | 2022-02-01 | Splunk Inc. | Search service system monitoring |
| US11243963B2 (en) | 2016-09-26 | 2022-02-08 | Splunk Inc. | Distributing partial results to worker nodes from an external data system |
| US11250056B1 (en) | 2016-09-26 | 2022-02-15 | Splunk Inc. | Updating a location marker of an ingestion buffer based on storing buckets in a shared storage system |
| US11269939B1 (en) | 2016-09-26 | 2022-03-08 | Splunk Inc. | Iterative message-based data processing including streaming analytics |
| US11281706B2 (en) | 2016-09-26 | 2022-03-22 | Splunk Inc. | Multi-layer partition allocation for query execution |
| US11294941B1 (en) | 2016-09-26 | 2022-04-05 | Splunk Inc. | Message-based data ingestion to a data intake and query system |
| US11321321B2 (en) | 2016-09-26 | 2022-05-03 | Splunk Inc. | Record expansion and reduction based on a processing task in a data intake and query system |
| US12393631B2 (en) | 2016-09-26 | 2025-08-19 | Splunk Inc. | Processing data using nodes in a scalable environment |
| US11341131B2 (en) | 2016-09-26 | 2022-05-24 | Splunk Inc. | Query scheduling based on a query-resource allocation and resource availability |
| US11392654B2 (en) | 2016-09-26 | 2022-07-19 | Splunk Inc. | Data fabric service system |
| US11442935B2 (en) | 2016-09-26 | 2022-09-13 | Splunk Inc. | Determining a record generation estimate of a processing task |
| US11461334B2 (en) | 2016-09-26 | 2022-10-04 | Splunk Inc. | Data conditioning for dataset destination |
| US12204536B2 (en) | 2016-09-26 | 2025-01-21 | Splunk Inc. | Query scheduling based on a query-resource allocation and resource availability |
| US12204593B2 (en) | 2016-09-26 | 2025-01-21 | Splunk Inc. | Data search and analysis for distributed data systems |
| US11550847B1 (en) | 2016-09-26 | 2023-01-10 | Splunk Inc. | Hashing bucket identifiers to identify search nodes for efficient query execution |
| US11562023B1 (en) | 2016-09-26 | 2023-01-24 | Splunk Inc. | Merging buckets in a data intake and query system |
| US11567993B1 (en) | 2016-09-26 | 2023-01-31 | Splunk Inc. | Copying buckets from a remote shared storage system to memory associated with a search node for query execution |
| US11580107B2 (en) | 2016-09-26 | 2023-02-14 | Splunk Inc. | Bucket data distribution for exporting data to worker nodes |
| US11586627B2 (en) | 2016-09-26 | 2023-02-21 | Splunk Inc. | Partitioning and reducing records at ingest of a worker node |
| US11586692B2 (en) | 2016-09-26 | 2023-02-21 | Splunk Inc. | Streaming data processing |
| US11593377B2 (en) * | 2016-09-26 | 2023-02-28 | Splunk Inc. | Assigning processing tasks in a data intake and query system |
| US11599541B2 (en) | 2016-09-26 | 2023-03-07 | Splunk Inc. | Determining records generated by a processing task of a query |
| US11604795B2 (en) | 2016-09-26 | 2023-03-14 | Splunk Inc. | Distributing partial results from an external data system between worker nodes |
| US11615104B2 (en) | 2016-09-26 | 2023-03-28 | Splunk Inc. | Subquery generation based on a data ingest estimate of an external data system |
| US12141183B2 (en) | 2016-09-26 | 2024-11-12 | Cisco Technology, Inc. | Dynamic partition allocation for query execution |
| US11176208B2 (en) | 2016-09-26 | 2021-11-16 | Splunk Inc. | Search functionality of a data intake and query system |
| US11966391B2 (en) | 2016-09-26 | 2024-04-23 | Splunk Inc. | Using worker nodes to process results of a subquery |
| US12013895B2 (en) | 2016-09-26 | 2024-06-18 | Splunk Inc. | Processing data using containerized nodes in a containerized scalable environment |
| US11620336B1 (en) | 2016-09-26 | 2023-04-04 | Splunk Inc. | Managing and storing buckets to a remote shared storage system based on a collective bucket size |
| US11995079B2 (en) | 2016-09-26 | 2024-05-28 | Splunk Inc. | Generating a subquery for an external data system using a configuration file |
| US11874691B1 (en) | 2016-09-26 | 2024-01-16 | Splunk Inc. | Managing efficient query execution including mapping of buckets to search nodes |
| US11797618B2 (en) | 2016-09-26 | 2023-10-24 | Splunk Inc. | Data fabric service system deployment |
| US11860940B1 (en) | 2016-09-26 | 2024-01-02 | Splunk Inc. | Identifying buckets for query execution using a catalog of buckets |
| US11921672B2 (en) | 2017-07-31 | 2024-03-05 | Splunk Inc. | Query execution at a remote heterogeneous data store of a data fabric service |
| US12248484B2 (en) | 2017-07-31 | 2025-03-11 | Splunk Inc. | Reassigning processing tasks to an external storage system |
| US12118009B2 (en) | 2017-07-31 | 2024-10-15 | Splunk Inc. | Supporting query languages through distributed execution of query engines |
| US11989194B2 (en) | 2017-07-31 | 2024-05-21 | Splunk Inc. | Addressing memory limits for partition tracking among worker nodes |
| US11860874B2 (en) | 2017-09-25 | 2024-01-02 | Splunk Inc. | Multi-partitioning data for combination operations |
| US11500875B2 (en) | 2017-09-25 | 2022-11-15 | Splunk Inc. | Multi-partitioning for combination operations |
| US11720537B2 (en) | 2018-04-30 | 2023-08-08 | Splunk Inc. | Bucket merging for a data intake and query system using size thresholds |
| US11334543B1 (en) | 2018-04-30 | 2022-05-17 | Splunk Inc. | Scalable bucket merging for a data intake and query system |
| CN109857901A (en) * | 2019-01-25 | 2019-06-07 | 杭州网易云音乐科技有限公司 | Information displaying method and device and method and apparatus for information search |
| US11615087B2 (en) | 2019-04-29 | 2023-03-28 | Splunk Inc. | Search time estimate in a data intake and query system |
| US11715051B1 (en) | 2019-04-30 | 2023-08-01 | Splunk Inc. | Service provider instance recommendations using machine-learned classifications and reconciliation |
| US12007996B2 (en) | 2019-10-18 | 2024-06-11 | Splunk Inc. | Management of distributed computing framework components |
| US11494380B2 (en) | 2019-10-18 | 2022-11-08 | Splunk Inc. | Management of distributed computing framework components in a data fabric service system |
| US11922222B1 (en) | 2020-01-30 | 2024-03-05 | Splunk Inc. | Generating a modified component for a data intake and query system using an isolated execution environment image |
| US11704313B1 (en) | 2020-10-19 | 2023-07-18 | Splunk Inc. | Parallel branch operation using intermediary nodes |
| US12039014B2 (en) | 2020-12-01 | 2024-07-16 | Motorola Solutions, Inc. | Obtaining potential match results for a reference image across a plurality of system sites |
| US12072939B1 (en) | 2021-07-30 | 2024-08-27 | Splunk Inc. | Federated data enrichment objects |
| US12093272B1 (en) | 2022-04-29 | 2024-09-17 | Splunk Inc. | Retrieving data identifiers from queue for search of external data system |
| US12436963B2 (en) | 2022-04-29 | 2025-10-07 | Splunk Inc. | Retrieving data identifiers from queue for search of external data system |
| US12271389B1 (en) | 2022-06-10 | 2025-04-08 | Splunk Inc. | Reading query results from an external data system |
| US12141137B1 (en) | 2022-06-10 | 2024-11-12 | Cisco Technology, Inc. | Query translation for an external data system |
| US12287790B2 (en) | 2023-01-31 | 2025-04-29 | Splunk Inc. | Runtime systems query coordinator |
| US12265525B2 (en) | 2023-07-17 | 2025-04-01 | Splunk Inc. | Modifying a query for processing by multiple data processing systems |
Similar Documents
| Publication | Title |
|---|---|
| US20130097139A1 (en) | Programmable multi-filtering |
| US11604794B1 (en) | Interactive assistance for executing natural language queries to data sets |
| US9898554B2 (en) | Implicit question query identification |
| US11500865B1 (en) | Multiple stage filtering for natural language query processing pipelines |
| JP5623431B2 (en) | Identifying query aspects |
| US10726018B2 (en) | Semantic matching and annotation of attributes |
| US9798772B2 (en) | Using persistent data samples and query-time statistics for query optimization |
| US20150310073A1 (en) | Finding patterns in a knowledge base to compose table answers |
| US8977625B2 (en) | Inference indexing |
| US20120117054A1 (en) | Query Analysis in a Database |
| US11216474B2 (en) | Statistical processing of natural language queries of data sets |
| WO2013066929A1 (en) | Method and apparatus of ranking search results, and search method and apparatus |
| EP2291778A2 (en) | Searching using patterns of usage |
| US20250094398A1 (en) | Unified RDBMS framework for hybrid vector search on different data types via SQL and NoSQL |
| WO2013082506A1 (en) | Method and apparatus for information searching |
| CN103942198A (en) | Method and device for mining intentions |
| Cheng et al. | Supporting entity search: a large-scale prototype search engine |
| CN103942232A (en) | Method and equipment for mining intentions |
| CN103440308B (en) | Digital thesis search method based on formal concept analysis |
| US8332415B1 (en) | Determining spam in information collected by a source |
| RU2755568C1 (en) | Method for parallel execution of the join operation while processing large structured highly active data |
| US20240184782A1 (en) | Heuristic database querying with dynamic partitioning |
| Ma et al. | Web API discovery using semantic similarity and Hungarian algorithm |
| US20200089799A1 (en) | Cube construction for an OLAP system |
| CN110245208A (en) | A retrieval analysis method, device and medium based on big data storage |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: YAHOO! INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST. Assignors: THORESEN, SIMON; BALDERSHEIM, HENNING; PETTERSEN, HAAVARD; AND OTHERS. Signing dates: 2011-10-13 to 2011-10-17. Reel/Frame: 027080/0026 |
| AS | Assignment | Owner name: EXCALIBUR IP, LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST. Assignor: YAHOO! INC. Reel/Frame: 038383/0466. Effective date: 2016-04-18 |
| AS | Assignment | Owner name: YAHOO! INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST. Assignor: EXCALIBUR IP, LLC. Reel/Frame: 038951/0295. Effective date: 2016-05-31 |
| AS | Assignment | Owner name: EXCALIBUR IP, LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST. Assignor: YAHOO! INC. Reel/Frame: 038950/0592. Effective date: 2016-05-31 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |