US20220019601A1 - Methods, apparatus, and systems to aggregate partitioned computer database data - Google Patents
- Publication number
- US20220019601A1 (U.S. application Ser. No. 17/491,146)
- Authority
- US
- United States
- Prior art keywords
- nodes
- database data
- request
- data
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Definitions
- This disclosure relates generally to databases, and, more particularly, to methods, apparatus and systems to aggregate partitioned computer database data.
- Security data streams may be generated by collecting data coming from large numbers of machines distributed across large-scale customer systems.
- FIG. 1 illustrates an example system in which partitioned database data may be scalably aggregated, according to teachings of this disclosure.
- FIG. 2 is an example dashboard for the example system of FIG. 1 .
- FIG. 3 is a block diagram illustrating an example implementation for the example cluster and the example nodes of FIG. 1 .
- FIG. 4 is a flowchart representative of example hardware logic or machine-readable instructions for implementing the example nodes, and/or, more generally, the example cluster of FIG. 1 .
- FIG. 5 illustrates an example processor platform structured to execute the example machine-readable instructions of FIG. 4 to implement the example nodes, and/or, more generally, the example cluster of FIG. 1 .
- R JOIN S is the set of all combinations of tuples in R and S that have common attribute names.
- common attributes that are the subset of fields K.
- each physical node in the cluster must provide the data from either S or R to the other nodes, which can then do a local merge.
- a local merge between R and P(S) means that each r in R is compared with all the rows in P(S) to ensure a total Cartesian product is determined for the JOIN.
- the providing of the data between nodes is known as re-shuffling, and makes a real-time JOIN infeasible on known distributed systems.
- the partitioned data can be scalably filtered, joined and aggregated in real-time, across the plurality of nodes without re-shuffling.
- EPS is an important benchmark in the field of SIEM
- a 2× improvement in EPS represents a significant improvement in database systems, apparatus and methods for SIEM.
- Such improvements allow SIEM analysts the ability to more quickly detect security events and respond to mitigate them in the computer systems they are monitoring, thereby lessening chances of, for example, data loss, data theft, computer system unavailability, etc.
- the database data 102 stored in the portions 106 A-N may be subsequently aggregated (e.g., combined, merged, etc.), in real-time, according to teachings of this disclosure.
- the portions 106 A-N are stored on different nodes 104 A-N. Additionally, and/or alternatively, a portion 106 A-N may be stored on multiple nodes 104 A-N for redundancy, multiple portions 106 A-N may be stored on a node 104 A-N, etc.
- the example system 100 of FIG. 1 includes an example data director 108 .
- the example data director 108 directs the database data 102 (e.g., distributes, spreads, etc.) to the portions 106 A-N according to, for example, a pattern not depending on data content.
- Example patterns include a random distribution, a rotating distribution, a round robin distribution, etc.
- the same data is stored in multiple portions 106 A-N.
- the distribution of the database data 102 can spread the database data 102 evenly (e.g., substantially evenly) across the nodes 104 A-N, bottlenecks can be avoided, and the database data 102 can be partitioned at a high rate, e.g., over two million EPS.
- the database data 102 is segmented into dimensions and facts. Facts contain references (e.g., keys, pointers, etc.) to one or more dimensions.
- the dimensions are stored in a dimension table 110 that is replicated on each node 104 A-N.
- Example dimensions include rules that are used to trigger an alert, the descriptions or properties of rules, etc.
- the portions 106 A-N store database data in a fact table (one of which is designated at reference numeral 112 ) containing example facts F 1 , F 2 , . . . FN.
- the example facts F 1 -FN are stored in separate rows of the fact table 112 (e.g., horizontally arranged, horizontally partitioned, etc.).
- the other portions 106 A-M likewise store the same or different facts in rows of a fact table.
- subsets of the rows of a fact table 112 are stored by a shard, which is a set of nodes 104 A-N storing the same data.
- the example portions 106 A-N, the example fact table 112 , and the example dimension table 110 may be stored on any number and/or type(s) of computer-readable storage device(s) and/or disk(s) using any number and/or type(s) of data structure(s).
- the example nodes 104 A of FIG. 1 include an example querier 114 .
- one of the example queriers 114 (e.g., the querier 114 associated with the node 104 A)
- An example SQL query that can be included in the request 116 is:
- the querier 114 of the node 104 A which is acting as the coordinator, forms (e.g., defines, generates, etc.) sub-queries 120 B, 120 C, . . . 120 N to be executed by the queriers 114 of respective nodes 104 B-N.
- the coordinator forms a sub-query 120 A to be executed by the coordinator (i.e., the querier 114 of node 104 A).
- the example coordinator forms a sub-query 120 A-N for each node 104 A-N, which may be the same, that stores a portion 106 A-N of the database data related to the request 116 .
- Example SQL sub-queries 120 A-N are:
- the example coordinator of FIG. 1 combines the results of the sub-queries 120 A-N to form a response 124 to the request 116 containing the result(s) of the query contained in the request 116 .
- an example SQL command that may be executed by the coordinator to reduce the results of the sub-queries 120 A-N(e.g., remove redundant entries) is:
- the example API 132 sends a query 116 to the designated coordinator 118 (e.g., one of the queriers 114 ).
- the coordinator 118 performs three actions: (1) determines which nodes need to participate in the resolution of the query 116 ; (2) creates sub-queries 120 A-N, which may be the same, to be computed locally on each participating node 104 A-N; and (3) consolidates the individual results 122 A-N into a single result set.
- the determination of the participating nodes is done by, for example, observing the sharding key values resulting from the WHERE condition of the example query 116 .
- if no sharding key is provided as part of the WHERE condition, then one replica of each shard participates in the query 116 (e.g., all shards participate). If a set of sharding key-values is provided as part of the WHERE condition, then the knowledge of what shards manage those key-values is used to determine to what shards to send each sub-query 120 A-N.
- the creation of the sub-queries 120 A-N includes translation of aggregate functions and, if sharding key-values are present, filtering criteria segregated per node 104 A-N according to the data managed by each shard.
- the translation is implemented by, for example, mapping one high-level aggregate function (e.g., from the example AVG(s)) to one or more lower level aggregate functions to be computed in the sub-queries 120 A-N(e.g., COUNT(F.s) and SUM(F.s) in the illustrated example).
- the filtering conditions, if present, are inserted for the data range managed by each shard (the example sub-queries 120 A-N shown above do not use sharding keys).
- the data shuffling inherent in known distributed JOIN operations is not required in the examples disclosed herein, at least because facts (e.g., alerts) are partitioned and dimensions (e.g., rules) are replicated. This obviates the need to provide either the facts or the dimensions over a network to ensure all combinations of facts and dimensions can be considered during a JOIN operation.
- the coordinator identifies whether any nodes 104 B-N fail to respond to the sub-queries 120 B-N. If/when a node 104 B-N fails to respond, the coordinator stores a handoff hint in a hints directory 134 on the affected node 104 B-N for handling by cluster management processes.
- a user uses an example client application 126 executing on, for example, an example client device 128 to interact with the cluster 103 .
- the example client application 126 of FIG. 1 may be used to generate and send requests 116 , and process the query results for those requests 116 received in responses 122 to populate, update, etc. the contents of an example dashboard 130 .
- An example dashboard 130 is shown in FIG. 2 .
- the example dashboard 130 of FIG. 2 includes an example graph 202 depicting the numbers of security related events associated with different event types that have been received, an example graph 204 showing a time distribution of security related events reception, etc.
- the example client application 126 maintains the dashboard 130 by sending query requests 116 to collect the database data 102 necessary to update and maintain the graphs 202 , 204 shown in the dashboard 130 .
- the example processor platform 500 of FIG. 5 may be used to implement the example client device 128.
- the example client application 126 of FIG. 1 includes an example application programming interface (API) 132 .
- the example API 132 of FIG. 1 enables the client application 126 to communicate with the cluster 103 using SQL operations that are translated to, for example, Apache Thrift-based interfaces implemented by the nodes 104 A-N. Additionally, the API 132 applies one or more policies to select a coordinator for a request 116 . For example, the API 132 may be aware of the topology of the cluster 103 formed by the nodes 104 A-N, and route the request 116 to the closest node 104 A-N according to the database data that is being requested.
- While an example manner of implementing the partitioned database system 100 is illustrated in FIG. 1 , one or more of the elements, processes and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way.
- the example nodes 104 A-N, the example portions 106 A-N, the example data director 108 , the example dimension tables 110 , the example facts table 112 , the example queriers 114 , the example coordinator, the example client application 126 , the example API 132 , and/or, more generally, the example system 100 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware.
- any of the example nodes 104 A-N, the example portions 106 A-N, the example data director 108 , the example dimension tables 110 , the example facts table 112 , the example queriers 114 , the example coordinator, the example client application 126 , the example API 132 , and/or, more generally, the example system 100 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)).
- At least one of the example nodes 104 A-N, the example portions 106 A-N, the example data director 108 , the example dimension tables 110 , the example facts table 112 , the example queriers 114 , the example coordinator, the example client application 126 , the example API 132 , and the example system 100 is/are hereby expressly defined to include a non-transitory computer-readable storage device or storage disk such as a memory, a digital versatile disc (DVD), a compact disc (CD), a Blu-ray disc, etc. including the software and/or firmware.
- the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
- FIG. 3 is a block diagram illustrating an example implementation for the example cluster 103 and the example nodes 104 A-N of FIG. 1 .
- the example nodes 104 A-N of FIG. 3 include a respective example service interface module 302 A, 302 B, 302 C, . . . 302 N.
- the example service interface modules 302 A-N of FIG. 3 form a distributed service interface layer 302 for the cluster 103 that provides a common interface for the client application 126 to query the portions 106 A-N.
- the example distributed service layer 302 of FIG. 3 enables the client application 126 to query the cluster 103 as if it is a single database node.
- the client application 126 can send a single query request 116 that gets decomposed into sub-queries 120 A-N for execution by the nodes 104 A-N, without the client application 126 needing to be aware of how the database data 102 is partitioned.
- the example service interface modules 302 A-N include a respective example querier 114 .
- the distributed service layer 302 maintains information regarding the current processing loads of the nodes 104 A-N, and distances between the nodes 104 A-N in terms of experienced latency, which may not be consistent with network topology, identifies which nodes 104 B-N are operational, parses query requests 116 to identify which nodes 104 B-N should receive each sub-query 120 B-N, and merges the results before sending the query results to the client application 126 .
- the example nodes 104 A-N of FIG. 3 include a respective example data management module 304 A, 304 B, 304 C, . . . 304 N.
- the example data management modules 304 A-N of FIG. 3 form a data management layer 304 for the cluster 103 .
- the example data management layer 304 of FIG. 3 maintains durable system state information for the database data 102 stored in the portions 106 A-N, and makes any necessary durable changes to the portions 106 A-N.
- the example nodes 104 A-N of FIG. 3 include a respective example cluster management module 306 A, 306 B, 306 C, . . . 306 N.
- the example cluster management modules 306 A-N of FIG. 3 form a cluster management layer 306 for the cluster 103 .
- the example cluster management layer 306 of FIG. 3 ensures long term consistency of the cluster 103 , e.g., by checking portions 106 A-N (e.g., fact tables 112 ) and dimension tables 110 for internal consistency, running data repair processes when inconsistencies are identified, allowing empty nodes to be introduced in the cluster 103 , allowing a node to leave the cluster 103 (e.g., be decommissioned), re-balancing data across shards, allowing a node that has been outside the cluster for a period of time to be re-introduced, etc.
- the example service interface modules 302 A-N, the example service interface layer 302 , the example data management modules 304 A-N, the example data management layer 304 , the example cluster management modules 306 A-N, the example cluster management layer 306 , and/or, more generally, the example nodes 104 A-N and the example cluster 103 of FIG. 3 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware.
- any of the example service interface modules 302 A-N, the example service interface layer 302 , the example data management modules 304 A-N, the example data management layer 304 , the example cluster management modules 306 A-N, the example cluster management layer 306 , and/or, more generally, the example nodes 104 A-N and the example cluster 103 of FIG. 3 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s).
- At least one of the example service interface modules 302 A-N, the example service interface layer 302 , the example data management modules 304 A-N, the example data management layer 304 , the example cluster management modules 306 A-N, the example cluster management layer 306 , the example nodes 104 A-N, and the example cluster 103 of FIG. 3 is/are hereby expressly defined to include a non-transitory computer-readable storage device or storage disk such as a memory, a DVD, a CD, a Blu-ray disc, etc. including the software and/or firmware.
- the example nodes 104 A-N and the example cluster 103 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 3 , and/or may include more than one of any or all of the illustrated elements, processes and devices.
- A flowchart representative of example hardware logic or machine-readable instructions for implementing the example nodes 104 A-N, and/or, more generally, the example cluster 103 of FIGS. 1 and 3 is shown in FIG. 4 .
- the machine-readable instructions may be a program or portion of a program for execution by a processor such as the processor 510 shown in the example processor platform 500 discussed below in connection with FIG. 5 .
- the program may be embodied in software stored on a non-transitory computer-readable storage medium such as a compact disc read-only memory (CD-ROM), a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 510 , but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 510 and/or embodied in firmware or dedicated hardware.
- any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware.
- the example process of FIG. 4 may be implemented using executable instructions (e.g., computer and/or machine-readable instructions) stored on a non-transitory computer and/or machine-readable medium such as a hard disk drive, a flash memory, a read-only memory, a CD-ROM, a DVD, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
- a non-transitory computer-readable medium is expressly defined to include any type of computer-readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.
- A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, and (6) B with C.
- the program of FIG. 4 begins at block 402 , where the cluster 103 waits to receive a query request 116 from a client application 126 .
- a query request 116 is received at a first of the nodes 104 A-N acting as a coordinator for the query request 116 (block 402 )
- the coordinator identifies the nodes 104 A-N affected by the query request 116 (block 404 ).
- the coordinator decomposes the query request 116 into sub-queries 120 A-N (block 406 ), and sends the sub-queries 120 A-N to the identified nodes 104 A-N(block 408 ).
- the coordinator waits to receive results 122 A-N for the sub-queries 120 A-N from the identified nodes 104 A-N(block 410 ).
- the coordinator combines the results (block 412 ) and reduces the results to, for example, remove redundant data (block 414 ).
- the coordinator sends a response 124 with the results to the client application 126 (block 416 ), and control exits from the example program of FIG. 4 .
- the coordinator determines whether a timeout has occurred (block 418 ). If a timeout has not occurred (block 418 ), the coordinator continues to wait for sub-query results 122 A-N(block 410 ). If a timeout has occurred (block 418 ), the coordinator stores a hinted handoff notice for the node(s) 104 A-N from which sub-query results 122 A-N have not been received (block 420 ), and control proceeds to block 412 to combine the sub-query results 122 A-N that were received.
- FIG. 5 is a block diagram of an example processor platform 500 structured to execute the instructions of FIG. 4 to implement the cluster 103 and nodes 104 A-N of FIGS. 1 and 3 .
- the processor platform 500 can be, for example, a server, a personal computer, a workstation, or any other type of computing device.
- the processor platform 500 of the illustrated example includes a processor 510 .
- the processor 510 of the illustrated example is hardware.
- the processor 510 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer.
- the hardware processor may be a semiconductor based (e.g., silicon based) device.
- the processor implements the example queriers 114 , the example coordinator, the example service interface modules 302 A-N, the example data management modules 304 A-N, the example cluster management modules 306 A-N, the example data director 108 , the example client application 126 , the example API 132 .
- the processor 510 of the illustrated example includes a local memory 512 (e.g., a cache).
- the processor 510 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518 .
- the volatile memory 514 may be implemented by Synchronous Dynamic Random-Access Memory (SDRAM), Dynamic Random-Access Memory (DRAM), RAMBUS® Dynamic Random-Access Memory (RDRAM®) and/or any other type of random access memory device.
- the non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514 , 516 is controlled by a memory controller.
- the processor platform 500 of the illustrated example also includes an interface circuit 520 .
- the interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
- one or more input devices 522 are connected to the interface circuit 520 .
- the input device(s) 522 permit(s) a user to enter data and/or commands into the processor 510 .
- the input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
- One or more output devices 524 are also connected to the interface circuit 520 of the illustrated example.
- the output devices 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker.
- the interface circuit 520 of the illustrated example thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
- the example dashboard 130 may be displayed on an output device 524
- the interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526 .
- the communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
- the interface circuit 520 includes a radio frequency (RF) module, antenna(s), amplifiers, filters, modulators, etc.
- the processor platform 500 of the illustrated example also includes one or more mass storage devices 528 for storing software and/or data.
- mass storage devices 528 include floppy disk drives, hard drive disks, CD drives, Blu-ray disc drives, redundant array of independent disks (RAID) systems, and DVD drives.
- the example portions 106 A-N, the example fact table 112 , and the example dimension table 110 are stored on the mass storage device 528 .
- Coded instructions 532 including the coded instructions of FIG. 4 may be stored in the mass storage device 528 , in the volatile memory 514 , in the non-volatile memory 516 , and/or on a removable non-transitory computer-readable storage medium such as a CD-ROM or a DVD.
- Example methods, systems, apparatus, and articles of manufacture to aggregate partitioned database data are disclosed herein. Further examples and combinations thereof include at least the following.
- Example 1 is a partitioned database system that includes:
- a data director to distribute a plurality of portions of database data across the plurality of nodes, the plurality of portions distributed according to a pattern not based on data content
- queriers associated with respective ones of the plurality of nodes, the queriers to execute respective sub-queries of respective portions of the database data
- a coordinator to:
- Example 2 is the partitioned database system of example 1, wherein at least some of the nodes store their respective portions of the database data in a horizontally-arranged fact table.
- Example 3 is the partitioned database system of any of examples 1 to 2, wherein the pattern is at least one of a rotating pattern, or a random pattern.
- Example 4 is the partitioned database system of any of examples 1 to 3, wherein the queriers implement a distributed interface, the distributed interface to monitor a topology of storage devices associated with the nodes, and a real-time status of the partitioned database system.
- Example 5 is the partitioned database system of any of examples 1 to 4, wherein a first of the queriers is to perform the respective sub-query without a shuffle of the respective portion of the database data.
- Example 6 is the partitioned database system of any of examples 1 to 5, wherein a first of the sub-queries is a linearly-scalable query.
- Example 7 is the partitioned database system of any of examples 1 to 6, wherein the coordinator is a first of the queriers, and the coordinator is to: form the sub-queries based on the request; and send the sub-queries to others of the queriers.
- Example 8 is the partitioned database system of example 7, wherein the first of the queriers is to decompose the request to form the sub-queries.
- Example 9 is the partitioned database system of any of examples 1 to 8, wherein the request is received from a client application.
- Example 10 is a method that includes:
- decomposing a request to query the database data by executing an instruction with at least one processor, to form a plurality of sub-queries of respective portions of the database data;
- Example 11 is the method of example 10, wherein distributing the database data is according to at least one of a rotating pattern, or a random pattern.
- Example 12 is the method of any of examples 10 to 11, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
- Example 13 is the method of any of examples 10 to 11, wherein a first of the plurality of sub-queries is linearly scalable.
- Example 14 is the method of any of examples 10 to 13, wherein receiving the query request and decomposing the query request to form the plurality of sub-queries is performed on a first node of the plurality of nodes, the first node to:
- Example 15 is the method of example 14, wherein combining the results includes merging and reducing the results of the sub-queries.
- Example 16 is a non-transitory computer-readable storage medium storing instructions that, when executed, cause a machine to at least:
- Example 17 is the non-transitory computer-readable storage medium of example 16, wherein a first of the sub-queries is linearly scalable.
- Example 18 is the non-transitory computer-readable storage medium of any of examples 16 to 17, including further instructions that, when executed, cause the machine to combine the results by merging and reducing the results of the sub-queries.
- Example 19 is the non-transitory computer-readable storage medium of any of examples 16 to 18, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
- Example 20 is the non-transitory computer-readable storage medium of any of examples 16 to 19, wherein a distribution pattern of the database data is not dependent on data content.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Operations Research (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- This patent arises from a continuation of U.S. patent application Ser. No. 15/935,746, (now U.S. Pat. No. ______) which was filed on Mar. 26, 2018. U.S. patent application Ser. No. 15/935,746 is hereby incorporated herein by reference in its entirety. Priority to U.S. patent application Ser. No. 15/935,746 is hereby claimed.
- This disclosure relates generally to databases, and, more particularly, to methods, apparatus and systems to aggregate partitioned computer database data.
- Data driven security applies big data analytics to security data streams. Security data streams may be generated by collecting data coming from large numbers of machines distributed across large-scale customer systems.
- FIG. 1 illustrates an example system in which partitioned database data may be scalably aggregated, according to teachings of this disclosure.
- FIG. 2 is an example dashboard for the example system of FIG. 1.
- FIG. 3 is a block diagram illustrating an example implementation for the example cluster and the example nodes of FIG. 1.
- FIG. 4 is a flowchart representative of example hardware logic or machine-readable instructions for implementing the example nodes, and/or, more generally, the example cluster of FIG. 1.
- FIG. 5 illustrates an example processor platform structured to execute the example machine-readable instructions of FIG. 4 to implement the example nodes, and/or, more generally, the example cluster of FIG. 1.
- When useful, the same reference numbers will be used in the drawing(s) and accompanying written description to refer to the same or like parts. Connecting lines or connectors shown in the various figures presented are intended to represent example functional relationships and/or physical or logical couplings between the various elements.
- In the field of computer security information and event management (SIEM), security operations center analysts need to be able to interactively control data stream aggregation and filtering to identify data stream properties that might otherwise remain unobserved. However, as customer deployments have grown to cloud scale, security data streams have become so large (e.g., hundreds of millions of events) that stream ingestion and aggregation can no longer be handled by a single database node. Accordingly, systems, methods and apparatus that scale beyond the existing limits of a single node are disclosed herein. Some examples disclosed herein scale stream data ingestion by partitioning (e.g., spreading, distributing) the stream data across a plurality of nodes, as the stream data is received and ingested (i.e., in substantially real-time).
- In known large systems, data that has been partitioned in that way across a plurality of nodes cannot be joined (e.g., aggregated, combined, etc.) without a data re-shuffle. For example, given two tables R and S, R JOIN S is the set of all combinations of tuples in R and S that have common attribute names. Consider, for example, common attributes that form the subset of fields K. To compute R JOIN S, the database system must take each row r in R and find all the tuples s in S where r.k=s.k. To compute this in a distributed system, where both R and S are distributed, each physical node in the cluster must provide the data from either S or R to the other nodes, which can then do a local merge. A local merge between R and P(S) means that each r in R is compared with all the rows in P(S) to ensure a total Cartesian product is determined for the JOIN. The providing of the data between nodes is known as re-shuffling, and it makes a real-time JOIN infeasible on known distributed systems. In contrast, according to teachings of this disclosure, the partitioned data can be scalably filtered, joined and aggregated in real-time, across the plurality of nodes without re-shuffling. Currently-available, expensive and complex systems are only capable of processing approximately one million events-per-second (EPS). In stark contrast, the teachings of this disclosure have been used to demonstrate systems that are capable of over two million EPS. As EPS is an important benchmark in the field of SIEM, a 2× improvement in EPS represents a significant improvement in database systems, apparatus and methods for SIEM. Such improvements give SIEM analysts the ability to more quickly detect security events and respond to mitigate them in the computer systems they are monitoring, thereby lessening the chances of, for example, data loss, data theft, computer system unavailability, etc.
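- To make the contrast concrete, the following sketch (an illustration only, not the patented implementation; all names are hypothetical) computes the join of a partitioned table R against a table S in two ways: with S partitioned as well, where purely local merges miss matches unless rows are re-shuffled between nodes, and with S replicated on every node, where the union of the local merges equals the full JOIN with no data movement:

```python
# Illustrative sketch: why a JOIN of two partitioned tables needs
# re-shuffling, while a partitioned table joined against a replicated
# table can be computed locally on each node.

R = [{"k": i % 3, "r": i} for i in range(6)]        # fact-like table
S = [{"k": k, "s": f"dim-{k}"} for k in range(3)]   # dimension-like table

def partition(rows, n_nodes):
    """Content-agnostic (round-robin) partitioning across n_nodes."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def local_join(r_rows, s_rows):
    """Local merge: compare each r with every s visible on the node."""
    return [{**r, **s} for r in r_rows for s in s_rows if r["k"] == s["k"]]

n_nodes = 2
r_parts = partition(R, n_nodes)

# Case 1: S is also partitioned -> joining only locally misses matches,
# so rows of R or S would have to be re-shuffled between nodes.
s_parts = partition(S, n_nodes)
partial = [row for node in range(n_nodes)
           for row in local_join(r_parts[node], s_parts[node])]

# Case 2: S is replicated on every node -> the union of the local joins
# equals the full join, with no data movement between nodes.
replicated = [row for node in range(n_nodes)
              for row in local_join(r_parts[node], S)]

full = local_join(R, S)
assert len(replicated) == len(full)   # no re-shuffle needed
assert len(partial) <= len(full)      # local-only join can drop matches
```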
- Reference will now be made in detail to non-limiting examples, some of which are illustrated in the accompanying drawings.
- FIG. 1 illustrates an example partitioned database system 100 in which database data 102 (e.g., security event data for a SIEM system) may be scalably partitioned, in real-time, across a cluster 103 of nodes 104A, 104B, 104C, . . . 104N. An example node 104A-N is a computer system (e.g., a server, a workstation, etc.) having one or more non-transitory storage devices or storage disks for storing database data. The database data 102 is partitioned into portions 106A, 106B, 106C, . . . 106N of the database data 102 that are stored on the nodes 104A-N. The database data 102 stored in the portions 106A-N may be subsequently aggregated (e.g., combined, merged, etc.), in real-time, according to teachings of this disclosure. In some examples, the portions 106A-N are stored on different nodes 104A-N. Additionally, and/or alternatively, a portion 106A-N may be stored on multiple nodes 104A-N for redundancy, multiple portions 106A-N may be stored on a node 104A-N, etc.
- To partition the database data 102 into the portions 106A-N, the example system 100 of FIG. 1 includes an example data director 108. As the database data 102 is received in real-time, the example data director 108 directs (e.g., distributes, spreads, etc.) the database data 102 to the portions 106A-N according to, for example, a pattern that does not depend on data content. Example patterns include a random distribution, a rotating distribution, a round robin distribution, etc. In some examples, the same data is stored in multiple portions 106A-N. Because the distribution of the database data 102 can spread the database data 102 evenly (e.g., substantially evenly) across the nodes 104A-N, bottlenecks can be avoided, and the database data 102 can be partitioned at a high rate, e.g., over two million EPS.
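- A minimal sketch of such a content-agnostic data director, under the assumption of a simple round-robin pattern, is shown below; the class and method names are illustrative and not taken from the disclosure:

```python
import itertools

class DataDirector:
    """Sketch of a content-agnostic data director: incoming events are
    spread over the portions in round-robin order, so the placement of an
    event does not depend on what the event contains."""

    def __init__(self, portions):
        self._portions = portions                         # e.g., one list per node
        self._next = itertools.cycle(range(len(portions)))

    def direct(self, event):
        self._portions[next(self._next)].append(event)

# Usage: four portions on four (hypothetical) nodes.
portions = [[] for _ in range(4)]
director = DataDirector(portions)
for event_id in range(12):
    director.direct({"event_id": event_id})
assert [len(p) for p in portions] == [3, 3, 3, 3]         # evenly spread
```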
- In the illustrated example of FIG. 1, the database data 102 is segmented into dimensions and facts. Facts contain references (e.g., keys, pointers, etc.) to one or more dimensions. In the example of FIG. 1, the dimensions are stored in a dimension table 110 that is replicated on each node 104A-N. Example dimensions include rules that are used to trigger an alert, the descriptions or properties of rules, etc.
- In the illustrated example of FIG. 1, the portions 106A-N store database data in a fact table (one of which is designated at reference numeral 112) containing example facts F1, F2, . . . FN. The example facts F1-FN are stored in separate rows of the fact table 112 (e.g., horizontally arranged, horizontally partitioned, etc.). The other portions 106A-M likewise store the same or different facts in rows of a fact table. In some examples, subsets of the rows of a fact table 112 are stored by a shard, which is a set of nodes 104A-N storing the same data.
- The example portions 106A-N, the example fact table 112, and the example dimension table 110 may be stored on any number and/or type(s) of computer-readable storage device(s) and/or disk(s) using any number and/or type(s) of data structure(s).
- To query the database data 102, the example nodes 104A-N of FIG. 1 include an example querier 114. For each request 116 to query the database data 102, one of the example queriers 114 (e.g., the querier 114 associated with the node 104A) is a coordinator for the request 116. An example SQL query that can be included in the request 116 is:
- SELECT D.Msg, AVG(F.s) As Average
- FROM Alert As F INNER JOIN Rule As D ON (F.DSIDSigID=DID)
- WHERE D.class=<filter>
- GROUP BY D.Msg
- In the illustrated example, the querier 114 of the node 104A, which is acting as the coordinator, forms (e.g., defines, generates, etc.) sub-queries 120B, 120C, . . . 120N to be executed by the queriers 114 of respective nodes 104B-N. In some examples, the coordinator forms a sub-query 120A to be executed by the coordinator (i.e., the querier 114 of node 104A). The example coordinator forms a sub-query 120A-N (which may be the same for each node) for each node 104A-N that stores a portion 106A-N of the database data related to the request 116. Example SQL sub-queries 120A-N are:
- SELECT D.Msg, SUM(F.s) As Sum, COUNT(F.s) As Count
- FROM Alert As F INNER JOIN Rule As D ON (F.DSIDSigID=DID)
- WHERE D.class=<filter>
- GROUP BY D.Msg
The queries shown above are illustrative examples of queries that may be performed to populate the example dashboard 130 of FIG. 2. In these examples, the table Alert is the F (partition-able) fact table that needs to be joined with Rule (which is a dimension table). The result of the example query 116 is the average volume (F.s) of messages per type (D.Msg). The query 116 is the high-level query sent to the database coordinator 118. The example sub-queries 120A-N represent the intermediate calculations done by each node 104A-N before final aggregation. The example queriers 114 execute their sub-query 120A-N, and return responses 122A, 122B, 122C, . . . 122N to the coordinator containing the result(s) of their sub-query 120A-N. Because the database data 102 is separated into horizontally-partitioned fact tables and dimension tables, the database data 102 does not need to be reshuffled (e.g., moved between nodes 104A-N) to perform the sub-queries 120A-N. By eliminating reshuffling, the sub-queries 120A-N are linearly scalable and, thus, the overall query request 116 can be performed in real-time. Moreover, the aggregation of data can be changed after the database data 102 has been partitioned.
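- The translation from the single AVG in the request 116 to the SUM and COUNT computed in the sub-queries 120A-N can be sketched as a small rewrite table; the rules and function below are assumptions made for illustration, not the disclosed implementation:

```python
# Sketch of the aggregate translation: one high-level aggregate in the
# request is mapped to lower-level aggregates each node can compute
# locally (rewrite rules and names here are illustrative).
LOCAL_AGGREGATES = {
    "AVG": ["SUM", "COUNT"],   # AVG is derived later from SUM and COUNT
    "SUM": ["SUM"],
    "COUNT": ["COUNT"],
    "MIN": ["MIN"],
    "MAX": ["MAX"],
}

def to_sub_query_select(high_level):
    """Map e.g. ('AVG', 'F.s') to the SELECT items of the sub-queries."""
    func, arg = high_level
    return [f"{local}({arg}) As {local.title()}" for local in LOCAL_AGGREGATES[func]]

print(to_sub_query_select(("AVG", "F.s")))
# ['SUM(F.s) As Sum', 'COUNT(F.s) As Count']
```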
- The example coordinator of FIG. 1 combines the results of the sub-queries 120A-N to form a response 124 to the request 116 containing the result(s) of the query contained in the request 116. Assuming the results of the sub-queries 120A-N have been combined into MapResults, an example SQL command that may be executed by the coordinator to reduce the results of the sub-queries 120A-N (e.g., remove redundant entries) is:
- SELECT Msg, SUM(Sum)/SUM(Count) As Average
- FROM MapResults
- GROUP BY Msg
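- A programmatic equivalent of this reduce step might look as follows; the row format mirrors the example above (Msg, Sum, Count), while the function itself is a hypothetical sketch:

```python
from collections import defaultdict

def reduce_map_results(map_results):
    """Sketch of the reduce step: per-group partial sums and counts
    reported by the nodes are combined into the final average."""
    totals = defaultdict(lambda: [0.0, 0])        # Msg -> [sum, count]
    for row in map_results:                       # rows 122A-N from all nodes
        totals[row["Msg"]][0] += row["Sum"]
        totals[row["Msg"]][1] += row["Count"]
    return {msg: s / c for msg, (s, c) in totals.items() if c}

# Two nodes reporting partial results for the same message type:
map_results = [
    {"Msg": "port-scan", "Sum": 10.0, "Count": 4},   # from node 104A
    {"Msg": "port-scan", "Sum": 6.0, "Count": 4},    # from node 104B
]
assert reduce_map_results(map_results) == {"port-scan": 2.0}
```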
- The example API 132 sends a query 116 to the designated coordinator 118 (e.g., one of the queriers 114). The coordinator 118 performs three actions: (1) determines which nodes need to participate in the resolution of the query 116; (2) creates sub-queries 120A-N, which may be the same, to be computed locally on each participating node 104A-N; and (3) consolidates the individual results 122A-N into a single result set. The determination of the participating nodes is done by, for example, observing the sharding key values resulting from the WHERE condition of the example query 116. If no sharding key is provided as part of the WHERE condition, then one replica of each shard participates in the query 116 (e.g., all shards participate). If a set of sharding key-values is provided as part of the WHERE condition, then the knowledge of which shards manage those key-values is used to determine to which shards to send each sub-query 120A-N. The creation of the sub-queries 120A-N includes translation of aggregate functions and, if sharding key-values are present, filtering criteria segregated per node 104A-N according to the data managed by each shard. The translation is implemented by, for example, mapping one high-level aggregate function (e.g., the AVG(F.s) from the example above) to one or more lower-level aggregate functions to be computed in the sub-queries 120A-N (e.g., COUNT(F.s) and SUM(F.s) in the illustrated example). The filtering conditions, if present, are inserted for the data range managed by each shard (the example sub-queries 120A-N shown above do not use sharding keys). The data shuffling inherent in known distributed JOIN operations is not required in the examples disclosed herein, at least because facts (e.g., alerts) are partitioned and dimensions (e.g., rules) are replicated. This obviates the need to provide either the facts or the dimensions over a network to ensure all combinations of facts and dimensions can be considered during a JOIN operation.
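- The shard determination described above can be sketched as follows, assuming a shard map from shard identifiers to managed key values and replica nodes; the data layout and function name are illustrative assumptions:

```python
def participating_nodes(shard_map, where_key_values=None):
    """Sketch of shard determination: shard_map maps a shard id to
    (managed key values, replica nodes); names are illustrative."""
    if not where_key_values:
        # No sharding key in the WHERE clause: one replica of every shard.
        return {shard: replicas[0] for shard, (_, replicas) in shard_map.items()}
    selected = {}
    for shard, (managed, replicas) in shard_map.items():
        if managed & set(where_key_values):
            selected[shard] = replicas[0]
    return selected

shard_map = {
    "shard-0": ({1, 2}, ["node-104A", "node-104B"]),
    "shard-1": ({3, 4}, ["node-104C", "node-104D"]),
}
assert participating_nodes(shard_map) == {"shard-0": "node-104A", "shard-1": "node-104C"}
assert participating_nodes(shard_map, where_key_values=[3]) == {"shard-1": "node-104C"}
```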
- In some examples, the coordinator identifies whether any nodes 104B-N fail to respond to the sub-queries 120B-N. If/when a node 104B-N fails to respond, the coordinator stores a handoff hint in a hints directory 134 on the affected node 104B-N for handling by cluster management processes.
- In the illustrated example of FIG. 1, a user (e.g., a SIEM security operations center analyst) uses an example client application 126 executing on, for example, an example client device 128 to interact with the cluster 103. For example, the example client application 126 of FIG. 1 may be used to generate and send requests 116, and to process the query results for those requests 116 received in responses 122 to populate, update, etc. the contents of an example dashboard 130. An example dashboard 130 is shown in FIG. 2. The example dashboard 130 of FIG. 2 includes an example graph 202 depicting the numbers of security related events associated with different event types that have been received, an example graph 204 showing a time distribution of security related event reception, etc. The example client application 126 maintains the dashboard 130 by sending query requests 116 to collect the database data 102 necessary to update and maintain the graphs 202, 204 shown in the dashboard 130. The example processor platform 500 of FIG. 5 may be used to implement the example client device 128.
- To enable the client application 126 to communicate with the nodes 104A-N, the example client application 126 of FIG. 1 includes an example application programming interface (API) 132. The example API 132 of FIG. 1 enables the client application 126 to communicate with the cluster 103 using SQL operations that are translated to, for example, Apache Thrift-based interfaces implemented by the nodes 104A-N. Additionally, the API 132 applies one or more policies to select a coordinator for a request 116. For example, the API 132 may be aware of the topology of the cluster 103 formed by the nodes 104A-N, and route the request 116 to the closest node 104A-N according to the database data that is being requested.
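- One way such a topology-aware selection policy could be sketched is shown below; the latency figures, key placement, and function name are assumptions for illustration only:

```python
def pick_coordinator(topology, requested_keys):
    """Sketch of a routing policy like the one described for API 132: send
    the request to the node that is 'closest' to the data being requested.
    The topology data below is a made-up example."""
    best_node, best_latency = None, float("inf")
    for node, info in topology.items():
        if info["keys"] & requested_keys and info["latency_ms"] < best_latency:
            best_node, best_latency = node, info["latency_ms"]
    return best_node

topology = {
    "node-104A": {"latency_ms": 2.5, "keys": {1, 2}},
    "node-104B": {"latency_ms": 0.8, "keys": {3, 4}},
}
assert pick_coordinator(topology, {3}) == "node-104B"
```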
- While an example manner of implementing the partitioned database system 100 is illustrated in FIG. 1, one or more of the elements, processes and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example nodes 104A-N, the example portions 106A-N, the example data director 108, the example dimension tables 110, the example facts table 112, the example queriers 114, the example coordinator, the example client application 126, the example API 132, and/or, more generally, the example system 100 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example nodes 104A-N, the example portions 106A-N, the example data director 108, the example dimension tables 110, the example facts table 112, the example queriers 114, the example coordinator, the example client application 126, the example API 132, and/or, more generally, the example system 100 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example nodes 104A-N, the example portions 106A-N, the example data director 108, the example dimension tables 110, the example facts table 112, the example queriers 114, the example coordinator, the example client application 126, the example API 132, and the example system 100 is/are hereby expressly defined to include a non-transitory computer-readable storage device or storage disk such as a memory, a digital versatile disc (DVD), a compact disc (CD), a Blu-ray disc, etc. including the software and/or firmware. Further still, the example system 100 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 1, and/or may include more than one of any or all of the illustrated elements, processes and devices. As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
- FIG. 3 is a block diagram illustrating an example implementation for the example cluster 103 and the example nodes 104A-N of FIG. 1. To provide an interface to the example nodes 104A-N, the example nodes 104A-N of FIG. 3 include a respective example service interface module 302A, 302B, 302C, . . . 302N. The example service interface modules 302A-N of FIG. 3 form a distributed service interface layer 302 for the cluster 103 that provides a common interface for the client application 126 to query the portions 106A-N. The example distributed service layer 302 of FIG. 3 enables the client application 126 to query the cluster 103 as if it is a single database node. For example, the client application 126 can send a single query request 116 that gets decomposed into sub-queries 120A-N for execution by the nodes 104A-N, without the client application 126 needing to be aware of how the database data 102 is partitioned. To query the portions 106A-N, the example service interface modules 302A-N include a respective example querier 114.
- In the illustrated example of FIG. 3, the distributed service layer 302 maintains information regarding the current processing loads of the nodes 104A-N and the distances between the nodes 104A-N in terms of experienced latency (which may not be consistent with network topology), identifies which nodes 104B-N are operational, parses query requests 116 to identify which nodes 104B-N should receive each sub-query 120B-N, and merges the results before sending the query results to the client application 126.
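- The bookkeeping described for the distributed service layer 302 might be sketched as follows; the heartbeat timeout, smoothing factor, and class name are assumptions rather than details taken from the disclosure:

```python
import time

class NodeStatusRegistry:
    """Sketch of per-node bookkeeping: observed latency and liveness are
    recorded and consulted when choosing which replica of a shard should
    run a sub-query (field names and thresholds are assumptions)."""

    def __init__(self, timeout_s=5.0):
        self._timeout_s = timeout_s
        self._last_seen = {}        # node -> last heartbeat timestamp
        self._latency_ms = {}       # node -> smoothed observed latency

    def record(self, node, latency_ms):
        self._last_seen[node] = time.monotonic()
        prev = self._latency_ms.get(node, latency_ms)
        self._latency_ms[node] = 0.8 * prev + 0.2 * latency_ms   # EWMA smoothing

    def operational(self, node):
        seen = self._last_seen.get(node)
        return seen is not None and time.monotonic() - seen < self._timeout_s

    def pick_replica(self, replicas):
        """Prefer the live replica with the lowest observed latency."""
        live = [n for n in replicas if self.operational(n)]
        return min(live, key=lambda n: self._latency_ms.get(n, float("inf")), default=None)

registry = NodeStatusRegistry()
registry.record("node-104B", 1.2)
registry.record("node-104C", 0.4)
assert registry.pick_replica(["node-104A", "node-104B", "node-104C"]) == "node-104C"
```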
- To manage data persistency, the example nodes 104A-N of FIG. 3 include a respective example data management module 304A, 304B, 304C, . . . 304N. The example data management modules 304A-N of FIG. 3 form a data management layer 304 for the cluster 103. The example data management layer 304 of FIG. 3 maintains durable system state information for the database data 102 stored in the portions 106A-N, and makes any necessary durable changes to the portions 106A-N.
- To manage the cluster 103, the example nodes 104A-N of FIG. 3 include a respective example cluster management module 306A, 306B, 306C, . . . 306N. The example cluster management modules 306A-N of FIG. 3 form a cluster management layer 306 for the cluster 103. The example cluster management layer 306 of FIG. 3 ensures long term consistency of the cluster 103, e.g., by checking the portions 106A-N (e.g., the fact tables 112) and the dimension tables 110 for internal consistency, running data repair processes when inconsistencies are identified, allowing empty nodes to be introduced in the cluster 103, allowing a node to leave the cluster 103 (e.g., be decommissioned), re-balancing data across shards, allowing a node that has been outside the cluster for a period of time to be re-introduced, etc.
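- A consistency check of the kind attributed to the cluster management layer 306 could be sketched as follows; the checksum scheme and names are assumptions, since the disclosure does not specify the mechanism:

```python
import hashlib

def shard_checksum(rows):
    """Order-independent checksum of one replica's rows (illustrative only)."""
    digest = 0
    for row in rows:
        h = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        digest ^= int.from_bytes(h[:8], "big")
    return digest

def find_inconsistent_shards(replicas_by_shard):
    """Flag shards whose replicas do not agree, so a data repair process
    can be scheduled for them."""
    needs_repair = []
    for shard, replicas in replicas_by_shard.items():
        checksums = {shard_checksum(rows) for rows in replicas.values()}
        if len(checksums) > 1:
            needs_repair.append(shard)
    return needs_repair

replicas = {
    "shard-0": {"node-104A": [{"f": 1}], "node-104B": [{"f": 1}]},
    "shard-1": {"node-104C": [{"f": 2}], "node-104D": []},          # diverged replica
}
assert find_inconsistent_shards(replicas) == ["shard-1"]
```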
- While an example manner of implementing the example cluster 103 and the example nodes 104A-N of FIG. 1 is illustrated in FIG. 3, one or more of the elements, processes and/or devices illustrated in FIG. 3 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example service interface modules 302A-N, the example service interface layer 302, the example data management modules 304A-N, the example data management layer 304, the example cluster management modules 306A-N, the example cluster management layer 306, and/or, more generally, the example nodes 104A-N and the example cluster 103 of FIG. 3 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example service interface modules 302A-N, the example service interface layer 302, the example data management modules 304A-N, the example data management layer 304, the example cluster management modules 306A-N, the example cluster management layer 306, and/or, more generally, the example nodes 104A-N and the example cluster 103 of FIG. 3 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example service interface modules 302A-N, the example service interface layer 302, the example data management modules 304A-N, the example data management layer 304, the example cluster management modules 306A-N, the example cluster management layer 306, the example nodes 104A-N, and the example cluster 103 of FIG. 3 is/are hereby expressly defined to include a non-transitory computer-readable storage device or storage disk such as a memory, a DVD, a CD, a Blu-ray disc, etc. including the software and/or firmware. Further still, the example nodes 104A-N and the example cluster 103 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 3, and/or may include more than one of any or all of the illustrated elements, processes and devices.
- A flowchart representative of example hardware logic or machine-readable instructions for implementing the example nodes 104A-N, and/or, more generally, the example cluster 103 of FIGS. 1 and 3 is shown in FIG. 4. The machine-readable instructions may be a program or portion of a program for execution by a processor such as the processor 510 shown in the example processor platform 500 discussed below in connection with FIG. 5. The program may be embodied in software stored on a non-transitory computer-readable storage medium such as a compact disc read-only memory (CD-ROM), a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 510, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 510 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowchart illustrated in FIG. 4, many other methods of implementing the example cluster 103 and the example nodes 104A-N may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally, and/or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware.
- As mentioned above, the example process of
FIG. 4 may be implemented using executable instructions (e.g., computer and/or machine-readable instructions) stored on a non-transitory computer and/or machine-readable medium such as a hard disk drive, a flash memory, a read-only memory, a CD-ROM, a DVD, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory computer-readable medium is expressly defined to include any type of computer-readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. - “Including” and “comprising” (and all forms and tenses thereof) are used herein to be open-ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open-ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, and (6) B with C.
- The program of
FIG. 4 begins at block 402, where the cluster 103 waits to receive a query request 116 from a client application 126. When a query request 116 is received at a first of the nodes 104A-N acting as a coordinator for the query request 116 (block 402), the coordinator identifies the nodes 104A-N affected by the query request 116 (block 404). The coordinator decomposes the query request 116 into sub-queries 120A-N (block 406), and sends the sub-queries 120A-N to the identified nodes 104A-N (block 408). - The coordinator waits to receive
results 122A-N for the sub-queries 120A-N from the identified nodes 104A-N (block 410). When the results 122A-N have been received (block 410), the coordinator combines the results (block 412) and reduces the results to, for example, remove redundant data (block 414). The coordinator sends a response 124 with the results to the client application 126 (block 416), and control exits from the example program of FIG. 4. - Returning to block 410, when not all
sub-query results 122A-N have been received (block 410), the coordinator determines whether a timeout has occurred (block 418). If a timeout has not occurred (block 418), the coordinator continues to wait for sub-query results 122A-N (block 410). If a timeout has occurred (block 418), the coordinator stores a hinted handoff notice for the node(s) 104A-N from which sub-query results 122A-N have not been received (block 420), and control proceeds to block 412 to combine the sub-query results 122A-N that were received.
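- The coordinator flow of blocks 402-420 can be summarized with a short, hypothetical sketch. The Python fragment below is illustrative only and is not the claimed implementation: the names TIMEOUT_SECONDS, store_hinted_handoff, and merge_and_reduce are invented for this sketch, the node objects are assumed to expose an execute(sub_query) method returning rows, and the decompose callable stands in for blocks 404-406.

```python
import concurrent.futures

TIMEOUT_SECONDS = 30.0  # illustrative timeout for block 418


def store_hinted_handoff(node_id, sub_query):
    # Block 420: record that this node still owes a sub-query result so the
    # missing data can be reconciled later.
    print(f"hinted handoff stored for node {node_id}: {sub_query!r}")


def merge_and_reduce(partial_results):
    # Blocks 412-414: combine the per-node results and drop redundant rows.
    seen, merged = set(), []
    for rows in partial_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


def coordinate(query_request, nodes, decompose):
    # Blocks 404-406: identify the affected nodes and form one sub-query per
    # node; decompose() is assumed to return {node_id: sub_query}.
    sub_queries = decompose(query_request, nodes)

    partial_results = []
    pool = concurrent.futures.ThreadPoolExecutor()
    try:
        # Block 408: dispatch each sub-query to its node.
        futures = {pool.submit(nodes[nid].execute, sq): (nid, sq)
                   for nid, sq in sub_queries.items()}
        # Blocks 410/418: wait for results, but only up to the timeout.
        done, pending = concurrent.futures.wait(
            futures, timeout=TIMEOUT_SECONDS)

        for future in done:
            partial_results.append(future.result())
        for future in pending:
            node_id, sub_query = futures[future]
            store_hinted_handoff(node_id, sub_query)  # block 420
    finally:
        # Do not block on the timed-out sub-queries (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)

    # Blocks 412-416: merge, reduce, and return the response to the client.
    return merge_and_reduce(partial_results)
```

In this sketch the timed-out work is merely noted rather than awaited, mirroring the hinted-handoff branch; how those notices are later reconciled is left out of the illustration.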
- FIG. 5 is a block diagram of an example processor platform 500 structured to execute the instructions of FIG. 4 to implement the cluster 103 and nodes 104A-N of FIGS. 1 and 3. The processor platform 500 can be, for example, a server, a personal computer, a workstation, or any other type of computing device. - The
processor platform 500 of the illustrated example includes a processor 510. The processor 510 of the illustrated example is hardware. For example, the processor 510 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor implements the example queriers 114, the example coordinator, the example service interface modules 302A-N, the example data management modules 304A-N, the example cluster management modules 306A-N, the example data director 108, the example client application 126, and the example API 132. - The
processor 510 of the illustrated example includes a local memory 512 (e.g., a cache). The processor 510 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518. The volatile memory 514 may be implemented by Synchronous Dynamic Random-Access Memory (SDRAM), Dynamic Random-Access Memory (DRAM), RAMBUS® Dynamic Random-Access Memory (RDRAM®) and/or any other type of random-access memory device. The non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 is controlled by a memory controller. - The
processor platform 500 of the illustrated example also includes an interface circuit 520. The interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface. - In the illustrated example, one or
more input devices 522 are connected to the interface circuit 520. The input device(s) 522 permit(s) a user to enter data and/or commands into the processor 510. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system. - One or
more output devices 524 are also connected to the interface circuit 520 of the illustrated example. The output devices 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or a speaker. The interface circuit 520 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor. The example dashboard 130 may be displayed on an output device 524. - The
interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc. In some examples of a Wi-Fi system, the interface circuit 520 includes a radio frequency (RF) module, antenna(s), amplifiers, filters, modulators, etc. - The
processor platform 500 of the illustrated example also includes one or more mass storage devices 528 for storing software and/or data. Examples of such mass storage devices 528 include floppy disk drives, hard disk drives, CD drives, Blu-ray disc drives, redundant array of independent disks (RAID) systems, and DVD drives. In the illustrated example, the example portions 106A-N, the example fact table 112, and the example dimension table 110 are stored on the mass storage device 528. -
Coded instructions 532 including the coded instructions of FIG. 4 may be stored in the mass storage device 528, in the volatile memory 514, in the non-volatile memory 516, and/or on a removable non-transitory computer-readable storage medium such as a CD-ROM or a DVD. - From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that aggregate partitioned database data and, in doing so, make computer operations more efficient. Thus, through use of the teachings of this disclosure, computers can process database data in real time at rates that would otherwise be infeasible.
- Example methods, systems, apparatus, and articles of manufacture to aggregate partitioned database data are disclosed herein. Further examples and combinations thereof include at least the following.
- Example 1 is a partitioned database system that includes:
- a plurality of nodes;
- a data director to distribute a plurality of portions of database data across the plurality of nodes, the plurality of portions distributed according to a pattern not based on data content;
- queriers associated with respective ones of the plurality of nodes, the queriers to execute respective sub-queries of respective portions of the database data; and
- a coordinator to:
- receive a request to query the database data; and
- merge results of the plurality of sub-queries to form a response to the request.
- Example 2 is the partitioned database system of example 1, wherein at least some of the nodes store their respective portions of the database data in a horizontally-arranged fact table.
- Example 3 is the partitioned database system of any of examples 1 to 2, wherein the pattern is at least one of a rotating pattern, or a random pattern.
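- As a concrete illustration of the rotating or random, content-independent distribution recited in examples 1 to 3, consider the following hypothetical Python sketch. It is not the disclosed data director 108; the helper names rotating_pattern, random_pattern, and distribute are invented for this sketch, and rows are assigned to nodes without ever inspecting their contents.

```python
import itertools
import random


def rotating_pattern(node_ids):
    # Round-robin: node ids repeat in a fixed rotation.
    return itertools.cycle(node_ids)


def random_pattern(node_ids):
    # Uniformly random assignment, still independent of row content.
    while True:
        yield random.choice(node_ids)


def distribute(rows, node_ids, pattern=rotating_pattern):
    # Assign each row to a node using only the pattern, never the row data.
    portions = {node_id: [] for node_id in node_ids}
    for row, node_id in zip(rows, pattern(node_ids)):
        portions[node_id].append(row)
    return portions


# Example: six rows spread round-robin across three nodes.
print(distribute(range(6), ["node_a", "node_b", "node_c"]))
# {'node_a': [0, 3], 'node_b': [1, 4], 'node_c': [2, 5]}
```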
- Example 4 is the partitioned database system of any of examples 1 to 3, wherein the queriers implement a distributed interface, the distributed interface to monitor a topology of storage devices associated with the nodes, and a real-time status of the partitioned database system.
- Example 5 is the partitioned database system of any of examples 1 to 4, wherein a first of the queriers is to perform the respective sub-query without a shuffle of the respective portion of the database data.
- Example 6 is the partitioned database system of any of examples 1 to 5, wherein a first of the sub-queries is a linearly-scalable query.
- Example 7 is the partitioned database system of any of examples 1 to 6, wherein the coordinator is a first of the queriers, and the coordinator is to: form the sub-queries based on the request; and send the sub-queries to others of the queriers.
- Example 8 is the partitioned database system of example 7, wherein the first of the queriers is to decompose the request to form the sub-queries.
- Example 9 is the partitioned database system of any of examples 1 to 8, wherein the request is received from a client application.
- Example 10 is a method that includes:
- distributing respective portions of database data across a plurality of nodes;
- decomposing a request to query the database data, by executing an instruction with at least one processor, to form a plurality of sub-queries of respective portions of the database data;
- executing the sub-queries on respective ones of the nodes; and
- combining results of the plurality of sub-queries, by executing an instruction with at least one processor, to form a response to the request.
- Example 11 is the method of example 10, wherein distributing the database data is according to at least one of a rotating pattern, or a random pattern.
- Example 12 is the method of any of examples 10 to 11, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
- Example 13 is the method of any of examples 10 to 11, wherein a first of the plurality of sub-queries is linearly scalable.
- Example 14 is the method of any of examples 10 to 13, wherein receiving the query request and decomposing the query request to form the plurality of sub-queries are performed on a first node of the plurality of nodes, the first node to:
- send the sub-queries to respective ones of the plurality of nodes;
- receive the results of the sub-queries from the respective ones of the nodes; and
- combine the results to form the response.
- Example 15 is the method of example 14, wherein combining the results includes merging and reducing the results of the sub-queries.
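- A minimal sketch of the merge-and-reduce step of example 15 follows, under the assumption that each sub-query returns its partial aggregates as a mapping from group key to count. The function and data names are illustrative only and are not part of the disclosed method.

```python
from collections import Counter


def merge_and_reduce_counts(sub_query_results):
    # Merge the per-node partial aggregates, then reduce them to a single
    # total per group key for inclusion in the response.
    totals = Counter()
    for partial in sub_query_results:
        totals.update(partial)
    return dict(totals)


# Example: three nodes return partial counts for the same grouping.
print(merge_and_reduce_counts([{"us": 3, "de": 1}, {"us": 2}, {"de": 4, "fr": 1}]))
# {'us': 5, 'de': 5, 'fr': 1}
```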
- Example 16 is a non-transitory computer-readable storage medium storing instructions that, when executed, cause a machine to at least:
- decompose a request to query database data to form a plurality of sub-queries of respective portions of the database data, the portions of the database data distributed on respective nodes of a partitioned database system;
- send the sub-queries to respective nodes for execution on respective portions of the database data; and
- combine results of the plurality of sub-queries to form a response to the request.
- Example 17 is the non-transitory computer-readable storage medium of example 16, wherein a first of the sub-queries is linearly scalable.
- Example 18 is the non-transitory computer-readable storage medium of any of examples 16 to 17, including further instructions that, when executed, cause the machine to combine the results by merging and reducing the results of the sub-queries.
- Example 19 is the non-transitory computer-readable storage medium of any of examples 16 to 18, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
- Example 20 is the non-transitory computer-readable storage medium of any of examples 16 to 19, wherein a distribution pattern of the database data is not dependent on data content.
- Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/491,146 US20220019601A1 (en) | 2018-03-26 | 2021-09-30 | Methods, apparatus, and systems to aggregate partitioned computer database data |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/935,746 US11138230B2 (en) | 2018-03-26 | 2018-03-26 | Methods, apparatus, and systems to aggregate partitioned computer database data |
| US17/491,146 US20220019601A1 (en) | 2018-03-26 | 2021-09-30 | Methods, apparatus, and systems to aggregate partitioned computer database data |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/935,746 Continuation US11138230B2 (en) | 2018-03-26 | 2018-03-26 | Methods, apparatus, and systems to aggregate partitioned computer database data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20220019601A1 true US20220019601A1 (en) | 2022-01-20 |
Family
ID=67984247
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/935,746 Active 2038-09-22 US11138230B2 (en) | 2018-03-26 | 2018-03-26 | Methods, apparatus, and systems to aggregate partitioned computer database data |
| US17/491,146 Abandoned US20220019601A1 (en) | 2018-03-26 | 2021-09-30 | Methods, apparatus, and systems to aggregate partitioned computer database data |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/935,746 Active 2038-09-22 US11138230B2 (en) | 2018-03-26 | 2018-03-26 | Methods, apparatus, and systems to aggregate partitioned computer database data |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US11138230B2 (en) |
| EP (1) | EP3776254B1 (en) |
| CN (1) | CN112204541A (en) |
| WO (1) | WO2019190983A1 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11138230B2 (en) | 2018-03-26 | 2021-10-05 | Mcafee, Llc | Methods, apparatus, and systems to aggregate partitioned computer database data |
| US11556710B2 (en) * | 2018-05-11 | 2023-01-17 | International Business Machines Corporation | Processing entity groups to generate analytics |
| US11580105B2 (en) | 2018-10-31 | 2023-02-14 | Anaplan, Inc. | Method and system for implementing subscription barriers in a distributed computation system |
| US11475003B1 (en) * | 2018-10-31 | 2022-10-18 | Anaplan, Inc. | Method and system for servicing query requests using dataspaces |
| US11281683B1 (en) | 2018-10-31 | 2022-03-22 | Anaplan, Inc. | Distributed computation system for servicing queries using revisions maps |
| US11354324B1 (en) | 2018-10-31 | 2022-06-07 | Anaplan, Inc. | Method and system for servicing query requests using revisions maps |
| US11481378B1 (en) * | 2018-10-31 | 2022-10-25 | Anaplan, Inc. | Method and system for servicing query requests using document-based metadata |
| US11573927B1 (en) | 2018-10-31 | 2023-02-07 | Anaplan, Inc. | Method and system for implementing hidden subscriptions in a distributed computation system |
| US11494423B1 (en) * | 2020-09-09 | 2022-11-08 | Amazon Technologies, Inc. | Generating partial boolean query results based on client-specified latency constraints |
| CN113238804B (en) * | 2021-05-17 | 2022-06-28 | 深圳掌酷软件有限公司 | A system and method for waking up a designated application based on an intelligent terminal screen-off state |
| CN113468208B (en) * | 2021-07-19 | 2024-08-23 | 网易(杭州)网络有限公司 | Method and device for generating data query statement, server and storage medium |
| US11928120B2 (en) * | 2022-05-31 | 2024-03-12 | Microsoft Technology, LLC. | Distributed data query under data flow limitations |
| US12360779B1 (en) * | 2023-06-16 | 2025-07-15 | Google Llc | Replicating configuration data across a multi-tenant system |
Citations (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1995026003A1 (en) * | 1994-03-24 | 1995-09-28 | Software Ag | Database query system |
| US5742806A (en) * | 1994-01-31 | 1998-04-21 | Sun Microsystems, Inc. | Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system |
| US6285997B1 (en) * | 1998-11-16 | 2001-09-04 | International Business Machines Corporation | Query optimization with deferred update and autonomous sources |
| US20040034616A1 (en) * | 2002-04-26 | 2004-02-19 | Andrew Witkowski | Using relational structures to create and support a cube within a relational database system |
| WO2005057429A1 (en) * | 2003-12-08 | 2005-06-23 | Koninklijke Philips Electronics N.V. | Searching in a melody database |
| WO2006009822A2 (en) * | 2004-06-18 | 2006-01-26 | Nexql | Integrated database indexing system |
| US20060294087A1 (en) * | 2005-06-23 | 2006-12-28 | Vladimir Mordvinov | System and method for processing and decomposition of a multidimensional query against a relational data source |
| US20080040317A1 (en) * | 2006-08-09 | 2008-02-14 | Dettinger Richard D | Decomposed query conditions |
| US20100017395A1 (en) * | 2008-07-16 | 2010-01-21 | Sapphire Information Systems Ltd. | Apparatus and methods for transforming relational queries into multi-dimensional queries |
| US20100094851A1 (en) * | 2008-10-09 | 2010-04-15 | International Business Machines Corporation | Node-level sub-queries in distributed databases |
| US20100125594A1 (en) * | 2008-11-14 | 2010-05-20 | The Regents Of The University Of California | Method and Apparatus for Improving Performance of Approximate String Queries Using Variable Length High-Quality Grams |
| US20110173164A1 (en) * | 2010-01-13 | 2011-07-14 | International Business Machines Corporation | Storing tables in a database system |
| US7984043B1 (en) * | 2007-07-24 | 2011-07-19 | Amazon Technologies, Inc. | System and method for distributed query processing using configuration-independent query plans |
| US8204873B2 (en) * | 2009-08-26 | 2012-06-19 | Hewlett-Packard Development Company, L.P. | System and method for query expression optimization |
| US20120310916A1 (en) * | 2010-06-04 | 2012-12-06 | Yale University | Query Execution Systems and Methods |
| US20130117307A1 (en) * | 2011-11-08 | 2013-05-09 | Sybase, Inc. | Snapshot isolation support for distributed query processing in a shared disk database cluster |
| US20130239093A1 (en) * | 2012-03-09 | 2013-09-12 | Microsoft Corporation | Parallelizing top-down interprocedural analysis |
| US20130262502A1 (en) * | 2012-03-30 | 2013-10-03 | Khalifa University of Science, Technology, and Research | Method and system for continuous query processing |
| US8880502B2 (en) * | 2004-03-15 | 2014-11-04 | International Business Machines Corporation | Searching a range in a set of values in a network with distributed storage entities |
| WO2015099961A1 (en) * | 2013-12-02 | 2015-07-02 | Qbase, LLC | Systems and methods for hosting an in-memory database |
| US20160103877A1 (en) * | 2014-10-10 | 2016-04-14 | International Business Machines Corporation | Joining data across a parallel database and a distributed processing system |
| US20160259839A1 (en) * | 2013-11-08 | 2016-09-08 | International Business Machines Corporation | Reporting and summarizing metrics in sparse relationships on an oltp database |
| US20160350375A1 (en) * | 2015-05-29 | 2016-12-01 | Oracle International Corporation | Optimizing execution plans for in-memory-aware joins |
| US20160371355A1 (en) * | 2015-06-19 | 2016-12-22 | Nuodb, Inc. | Techniques for resource description framework modeling within distributed database systems |
| WO2017094009A1 (en) * | 2015-12-03 | 2017-06-08 | Dyadic Security Ltd | Securing sql based databases with cryptographic protocols |
| US20170185647A1 (en) * | 2015-12-23 | 2017-06-29 | Gluent Inc. | System and method for adaptive filtering of data requests |
| US20170308592A1 (en) * | 2016-04-22 | 2017-10-26 | Cloudera, Inc. | Interactive identification of similar sql queries |
| US20180113905A1 (en) * | 2016-10-26 | 2018-04-26 | Sap Se | Optimization of split queries |
| US10528599B1 (en) * | 2016-12-16 | 2020-01-07 | Amazon Technologies, Inc. | Tiered data processing for distributed data |
| US10649995B2 (en) * | 2010-04-19 | 2020-05-12 | Salesforce.Com, Inc. | Methods and systems for optimizing queries in a multi-tenant store |
| US11030192B2 (en) * | 2015-01-30 | 2021-06-08 | Splunk Inc. | Updates to access permissions of sub-queries at run time |
Family Cites Families (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5745746A (en) * | 1996-06-24 | 1998-04-28 | International Business Machines Corporation | Method for localizing execution of subqueries and determining collocation of execution of subqueries in a parallel database |
| US6092062A (en) * | 1997-06-30 | 2000-07-18 | International Business Machines Corporation | Relational database query optimization to perform query evaluation plan, pruning based on the partition properties |
| US6405198B1 (en) * | 1998-09-04 | 2002-06-11 | International Business Machines Corporation | Complex data query support in a partitioned database system |
| US7167853B2 (en) * | 1999-05-20 | 2007-01-23 | International Business Machines Corporation | Matching and compensation tests for optimizing correlated subqueries within query using automatic summary tables |
| JP2002169808A (en) * | 2000-11-30 | 2002-06-14 | Hitachi Ltd | Secure multi-database system |
| US6795817B2 (en) * | 2001-05-31 | 2004-09-21 | Oracle International Corporation | Method and system for improving response time of a query for a partitioned database object |
| US7246159B2 (en) * | 2002-11-01 | 2007-07-17 | Fidelia Technology, Inc | Distributed data gathering and storage for use in a fault and performance monitoring system |
| CA2427202A1 (en) * | 2003-04-30 | 2004-10-30 | Ibm Canada Limited - Ibm Canada Limitee | Method and system for aggregation subquery join elimination |
| CA2556979A1 (en) | 2004-02-21 | 2005-10-20 | Datallegro, Inc. | Ultra-shared-nothing parallel database |
| US7814042B2 (en) * | 2004-08-17 | 2010-10-12 | Oracle International Corporation | Selecting candidate queries |
| US20070226177A1 (en) * | 2006-03-23 | 2007-09-27 | International Business Machines Corporation | Evaluating a current partitioning of a database |
| US7962442B2 (en) * | 2006-08-31 | 2011-06-14 | International Business Machines Corporation | Managing execution of a query against selected data partitions of a partitioned database |
| US7657505B2 (en) * | 2007-01-19 | 2010-02-02 | Microsoft Corporation | Data retrieval from a database utilizing efficient eager loading and customized queries |
| US7805456B2 (en) * | 2007-02-05 | 2010-09-28 | Microsoft Corporation | Query pattern to enable type flow of element types |
| US8688683B2 (en) * | 2009-11-30 | 2014-04-01 | Business Objects Software Ltd. | Query plan reformulation |
| US8655901B1 (en) * | 2010-06-23 | 2014-02-18 | Google Inc. | Translation-based query pattern mining |
| US10860563B2 (en) * | 2012-01-06 | 2020-12-08 | Microsoft Technology Licensing, Llc | Distributed database with modular blocks and associated log files |
| US10579634B2 (en) | 2012-08-30 | 2020-03-03 | Citus Data Bilgi Islemleri Ticaret A.S. | Apparatus and method for operating a distributed database with foreign tables |
| US9563663B2 (en) * | 2012-09-28 | 2017-02-07 | Oracle International Corporation | Fast path evaluation of Boolean predicates |
| US9053210B2 (en) * | 2012-12-14 | 2015-06-09 | Microsoft Technology Licensing, Llc | Graph query processing using plurality of engines |
| US9081826B2 (en) | 2013-01-07 | 2015-07-14 | Facebook, Inc. | System and method for distributed database query engines |
| US9152669B2 (en) * | 2013-03-13 | 2015-10-06 | Futurewei Technologies, Inc. | System and method for distributed SQL join processing in shared-nothing relational database clusters using stationary tables |
| US9607042B2 (en) * | 2013-09-16 | 2017-03-28 | Mastercard International Incorporated | Systems and methods for optimizing database queries |
| GB2521197A (en) * | 2013-12-13 | 2015-06-17 | Ibm | Incremental and collocated redistribution for expansion of an online shared nothing database |
| US10120902B2 (en) * | 2014-02-20 | 2018-11-06 | Citus Data Bilgi Islemleri Ticaret A.S. | Apparatus and method for processing distributed relational algebra operators in a distributed database |
| US10437843B2 (en) * | 2014-07-29 | 2019-10-08 | Microsoft Technology Licensing, Llc | Optimization of database queries via transformations of computation graph |
| US10185730B2 (en) * | 2014-12-31 | 2019-01-22 | Nexenta Systems, Inc. | Methods and systems for key-value-tuple-encoded storage |
| WO2017062288A1 (en) * | 2015-10-07 | 2017-04-13 | Oracle International Corporation | Relational database organization for sharding |
| US10331634B2 (en) * | 2015-10-07 | 2019-06-25 | Oracle International Corporation | Request routing and query processing in a sharded database |
| US20170337232A1 (en) * | 2016-05-19 | 2017-11-23 | Fifth Dimension Holdings Ltd. | Methods of storing and querying data, and systems thereof |
| US10116725B2 (en) * | 2016-05-27 | 2018-10-30 | Intuit Inc. | Processing data retrieval requests in a graph projection of an application programming interfaces (API) |
| US10885027B2 (en) | 2016-08-24 | 2021-01-05 | Nec Corporation | Progressive processing for querying system behavior |
| US11138230B2 (en) | 2018-03-26 | 2021-10-05 | Mcafee, Llc | Methods, apparatus, and systems to aggregate partitioned computer database data |
- 2018
- 2018-03-26 US US15/935,746 patent/US11138230B2/en active Active
- 2019
- 2019-03-25 WO PCT/US2019/023877 patent/WO2019190983A1/en not_active Ceased
- 2019-03-25 CN CN201980030556.6A patent/CN112204541A/en active Pending
- 2019-03-25 EP EP19776041.6A patent/EP3776254B1/en active Active
- 2021
- 2021-09-30 US US17/491,146 patent/US20220019601A1/en not_active Abandoned
Patent Citations (32)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5742806A (en) * | 1994-01-31 | 1998-04-21 | Sun Microsystems, Inc. | Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system |
| WO1995026003A1 (en) * | 1994-03-24 | 1995-09-28 | Software Ag | Database query system |
| US6285997B1 (en) * | 1998-11-16 | 2001-09-04 | International Business Machines Corporation | Query optimization with deferred update and autonomous sources |
| US20040034616A1 (en) * | 2002-04-26 | 2004-02-19 | Andrew Witkowski | Using relational structures to create and support a cube within a relational database system |
| WO2005057429A1 (en) * | 2003-12-08 | 2005-06-23 | Koninklijke Philips Electronics N.V. | Searching in a melody database |
| US8880502B2 (en) * | 2004-03-15 | 2014-11-04 | International Business Machines Corporation | Searching a range in a set of values in a network with distributed storage entities |
| WO2006009822A2 (en) * | 2004-06-18 | 2006-01-26 | Nexql | Integrated database indexing system |
| US20060294087A1 (en) * | 2005-06-23 | 2006-12-28 | Vladimir Mordvinov | System and method for processing and decomposition of a multidimensional query against a relational data source |
| US20080040317A1 (en) * | 2006-08-09 | 2008-02-14 | Dettinger Richard D | Decomposed query conditions |
| US20130138633A1 (en) * | 2006-08-09 | 2013-05-30 | International Business Machines Corporation | Decomposed query conditions |
| US7984043B1 (en) * | 2007-07-24 | 2011-07-19 | Amazon Technologies, Inc. | System and method for distributed query processing using configuration-independent query plans |
| US20100017395A1 (en) * | 2008-07-16 | 2010-01-21 | Sapphire Information Systems Ltd. | Apparatus and methods for transforming relational queries into multi-dimensional queries |
| US20100094851A1 (en) * | 2008-10-09 | 2010-04-15 | International Business Machines Corporation | Node-level sub-queries in distributed databases |
| US20100125594A1 (en) * | 2008-11-14 | 2010-05-20 | The Regents Of The University Of California | Method and Apparatus for Improving Performance of Approximate String Queries Using Variable Length High-Quality Grams |
| US8204873B2 (en) * | 2009-08-26 | 2012-06-19 | Hewlett-Packard Development Company, L.P. | System and method for query expression optimization |
| US20110173164A1 (en) * | 2010-01-13 | 2011-07-14 | International Business Machines Corporation | Storing tables in a database system |
| US10649995B2 (en) * | 2010-04-19 | 2020-05-12 | Salesforce.Com, Inc. | Methods and systems for optimizing queries in a multi-tenant store |
| US20120310916A1 (en) * | 2010-06-04 | 2012-12-06 | Yale University | Query Execution Systems and Methods |
| US20130117307A1 (en) * | 2011-11-08 | 2013-05-09 | Sybase, Inc. | Snapshot isolation support for distributed query processing in a shared disk database cluster |
| US20130239093A1 (en) * | 2012-03-09 | 2013-09-12 | Microsoft Corporation | Parallelizing top-down interprocedural analysis |
| US20130262502A1 (en) * | 2012-03-30 | 2013-10-03 | Khalifa University of Science, Technology, and Research | Method and system for continuous query processing |
| US20160259839A1 (en) * | 2013-11-08 | 2016-09-08 | International Business Machines Corporation | Reporting and summarizing metrics in sparse relationships on an oltp database |
| WO2015099961A1 (en) * | 2013-12-02 | 2015-07-02 | Qbase, LLC | Systems and methods for hosting an in-memory database |
| US20160103877A1 (en) * | 2014-10-10 | 2016-04-14 | International Business Machines Corporation | Joining data across a parallel database and a distributed processing system |
| US11030192B2 (en) * | 2015-01-30 | 2021-06-08 | Splunk Inc. | Updates to access permissions of sub-queries at run time |
| US20160350375A1 (en) * | 2015-05-29 | 2016-12-01 | Oracle International Corporation | Optimizing execution plans for in-memory-aware joins |
| US20160371355A1 (en) * | 2015-06-19 | 2016-12-22 | Nuodb, Inc. | Techniques for resource description framework modeling within distributed database systems |
| WO2017094009A1 (en) * | 2015-12-03 | 2017-06-08 | Dyadic Security Ltd | Securing sql based databases with cryptographic protocols |
| US20170185647A1 (en) * | 2015-12-23 | 2017-06-29 | Gluent Inc. | System and method for adaptive filtering of data requests |
| US20170308592A1 (en) * | 2016-04-22 | 2017-10-26 | Cloudera, Inc. | Interactive identification of similar sql queries |
| US20180113905A1 (en) * | 2016-10-26 | 2018-04-26 | Sap Se | Optimization of split queries |
| US10528599B1 (en) * | 2016-12-16 | 2020-01-07 | Amazon Technologies, Inc. | Tiered data processing for distributed data |
Non-Patent Citations (3)
| Title |
|---|
| Eleftherios Kalogeros et al., "Redundancy in Linked Data Partitioning for Efficient Query Evaluation", 3rd International Conference on Future Internet of Things and Cloud Aug 2015, (Page(s): 497-504) * |
| L. Ghionna et al., "Hypertree Decompositions for Query Optimization", IEEE 23rd International Conference on Data Engineering, April, 2007 , (Page(s): 36-45) * |
| Yuxiang Wang et al., "AQP++: A Hybrid Approximate Query Processing Framework for Generalized Aggregation Queries", International Conference on Advanced Cloud and Big Data Aug 2016, (CBD) (Page(s): 56-62) * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240362219A1 (en) * | 2023-04-28 | 2024-10-31 | Ocient Holdings LLC | Query execution in a database system utilizing segment handles |
| US20240403857A1 (en) * | 2023-06-01 | 2024-12-05 | Vocalink International Limited | Systems and methods for aggregate routing among interconnecting directories |
| US12499110B2 (en) * | 2024-07-12 | 2025-12-16 | Ocient Holdings LLC | Query execution in a database system utilizing segment handles |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3776254A1 (en) | 2021-02-17 |
| US11138230B2 (en) | 2021-10-05 |
| EP3776254A4 (en) | 2021-12-22 |
| US20190294724A1 (en) | 2019-09-26 |
| CN112204541A (en) | 2021-01-08 |
| EP3776254B1 (en) | 2023-12-06 |
| WO2019190983A1 (en) | 2019-10-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20220019601A1 (en) | Methods, apparatus, and systems to aggregate partitioned computer database data | |
| CN109213792B (en) | Data processing method, server, client, device and readable storage medium | |
| JP6887544B2 (en) | Enriching events with dynamically typed big data for event processing | |
| US11354314B2 (en) | Method for connecting a relational data store's meta data with hadoop | |
| US8788458B2 (en) | Data caching for mobile applications | |
| US10262032B2 (en) | Cache based efficient access scheduling for super scaled stream processing systems | |
| US8635250B2 (en) | Methods and systems for deleting large amounts of data from a multitenant database | |
| US9405811B2 (en) | Systems and methods for interest-driven distributed data server systems | |
| US11042568B2 (en) | Proxy views for extended monitoring of database systems | |
| US11301447B2 (en) | Entity database | |
| CN112527848B (en) | Report data query method, device and system based on multiple data sources and storage medium | |
| US10223437B2 (en) | Adaptive data repartitioning and adaptive data replication | |
| US7987193B2 (en) | System and method for setting status flags for mobile data distribution based on distribution rules | |
| CN110287189B (en) | Method and system for processing mobile vehicle data based on spark streaming | |
| WO2016107155A1 (en) | Asset management method and system | |
| US11429636B2 (en) | Smart elastic scaling based on application scenarios | |
| US11561974B2 (en) | Cross-datasource querying using composite shapes | |
| EP3992808A1 (en) | Data processing method and related apparatus | |
| WO2023028330A1 (en) | System and method for query acceleration for use with data analytics environments | |
| US9934304B2 (en) | Systems and methods for memory optimization interest-driven business intelligence systems | |
| US20230251953A1 (en) | Client-side telemetry data filter model | |
| US10515063B1 (en) | System and method for real-time data acquisition and display | |
| US20150074150A1 (en) | Data management via active and inactive table space containers | |
| US12327132B2 (en) | Request processing methods and apparatuses, computing device and storage medium | |
| US9465846B2 (en) | Storing events from a datastream |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| AS | Assignment |
Owner name: MCAFEE, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MICHELIS, PABLO A.;STEWART, BRIAN H.;SIGNING DATES FROM 20180326 TO 20180329;REEL/FRAME:058397/0643 |
|
| AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT AND COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:059354/0335 Effective date: 20220301 |
|
| AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE PATENT TITLES AND REMOVE DUPLICATES IN THE SCHEDULE PREVIOUSLY RECORDED AT REEL: 059354 FRAME: 0335. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:060792/0307 Effective date: 20220301 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |