
WO2022091204A1 - Data analysis processing device, data analysis processing method, and program - Google Patents

Data analysis processing device, data analysis processing method, and program

Info

Publication number
WO2022091204A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
multidimensional
range
storage area
cube
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/040213
Other languages
English (en)
Japanese (ja)
Inventor
哲 八木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to PCT/JP2020/040213
Priority to JP2022558636A (JP7464142B2)
Publication of WO2022091204A1
Anticipated expiration
Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Definitions

  • One aspect of the present invention relates to a data analysis processing apparatus, a data analysis processing method, and a program.
  • Real-world events change temporally, spatially, or both. In other words, an event arises, disappears, or undergoes a state transition.
  • The data that embodies an event can be mapped to what data analysis technology calls a multidimensional cube.
  • A data analysis processing device executes online analytical processing (OLAP) operations on the multidimensional cube to analyze the data.
  • The data analysis processing device uses, for example, a method as disclosed in Non-Patent Document 1.
  • When the data analysis processing device executes an OLAP operation on a certain multidimensional cube, an argument specified by the client is used as the argument of the OLAP operation.
  • The data analysis processing device can use a relational database to execute OLAP operations. Therefore, when an OLAP operation is newly performed on a certain multidimensional cube and the data constituting another multidimensional cube is to be used as an argument of the OLAP operation, that is, when the data constituting the certain multidimensional cube is searched / manipulated using the data constituting the other multidimensional cube as a key, the speed-up means of the relational database can be used. For example, a speed-up means as disclosed in Non-Patent Document 2 can be used.
  • With this means, up to two items among the data of each dimension and the data representing each characteristic that compose the multidimensional cube are classified by value range, based on one of a list of one-dimensional value ranges, a list of names, or a hash function common among the multidimensional cubes, and are stored and managed in the storage area corresponding to the single value range to which the data belongs.
  • The range of a search / operation is then limited to the storage areas corresponding to the same value range of both multidimensional cubes, and when a plurality of searches / operations are executed at the same time, conflicts over the storage area to be searched / operated are avoided.
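The conventional speed-up described above resembles range partitioning of a relational table. The following is a minimal sketch of that idea, not the method of Non-Patent Document 2: the boundary list `BOUNDARIES`, the row layout, and the function names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical boundaries of one-dimensional value ranges shared by both
# cubes: (-inf, 10), [10, 20), [20, 30), [30, +inf).
BOUNDARIES = [10, 20, 30]


def range_id(value):
    """Return the index of the single value range that contains value."""
    return bisect_right(BOUNDARIES, value)


def partition(rows, key):
    """Store each row in the storage area of the one range it belongs to."""
    areas = defaultdict(list)
    for row in rows:
        areas[range_id(row[key])].append(row)
    return areas


def join_by_range(cube_a, cube_b, key):
    """Match rows of two cubes, visiting only same-range storage areas."""
    areas_a, areas_b = partition(cube_a, key), partition(cube_b, key)
    matches = []
    # Only range ids present in both cubes can contain matching rows.
    for rid in areas_a.keys() & areas_b.keys():
        for a in areas_a[rid]:
            for b in areas_b[rid]:
                if a[key] == b[key]:
                    matches.append((a, b))
    return matches
```

Because every row lives in exactly one storage area, pairs of same-range areas are disjoint and could be processed independently. This only works while each value is one-dimensional and falls into a single range, which is the limitation the description goes on to discuss.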
  • However, this means can be used only in a limited range. A method that is applicable when each of the data of each dimension and the data representing each characteristic constituting the multidimensional cube is one-dimensional data cannot be applied when that data is multidimensional. Furthermore, when data classified by value range belongs to a plurality of value ranges, it is not possible to avoid conflicts over the storage area to be searched / operated and thereby increase speed.
  • Specifically, when a conventional data analysis processing device newly performs an OLAP operation on a certain multidimensional cube and uses the data constituting another multidimensional cube as an argument of the OLAP operation, the speed-up means of the relational database can be used, but the range in which speed-up is possible is limited.
  • That is, if the data is classified by value range based on one of a list of one-dimensional value ranges, a list of names, or a hash function common among the cubes, and the data belongs to a single value range, it is stored and managed in the storage area corresponding to that single value range. The range to be searched / operated is then limited to the storage areas corresponding to the same value range of both multidimensional cubes, and when multiple searches / operations are performed, the speed can be increased by further avoiding conflicts over the storage area to be searched / operated.
  • However, the data cannot be classified by a multidimensional value range common among the multidimensional cubes, and when data classified by value range belongs to a plurality of value ranges, it cannot be stored and managed in duplicate in the storage areas corresponding to each of those value ranges.
  • The present invention has been made in view of the above circumstances, and is intended to provide a technique capable of executing OLAP operations on a multidimensional cube at high speed.
  • The data analysis processing apparatus includes a multidimensional database, an OLAP operation execution unit, and a multidimensional database management unit.
  • The multidimensional database stores data embodying a real-world event in a multidimensional cube constructed for each subject, in association with the identifier of the event.
  • The OLAP operation execution unit executes an OLAP (Online Analytical Processing) operation on a multidimensional cube in response to a request from a client. Further, when the OLAP operation execution unit executes an OLAP operation on a certain multidimensional cube, it uses, as the argument of the OLAP operation, at least one of the argument specified by the client and the data constituting another multidimensional cube.
  • The multidimensional database management unit manages time-dimension data, spatial-dimension data, multiple types of unique-dimension data, and data representing multiple types of characteristics in a multidimensional cube. More specifically, if each of the data of each dimension and the data representing each characteristic constituting the multidimensional cube is multidimensional data, the multidimensional database management unit classifies the multidimensional data by a multidimensional value range common among the multidimensional cubes. When data classified by value range belongs to a single value range, the multidimensional database management unit stores and manages the data in the storage area corresponding to that value range.
  • When data classified by value range belongs to multiple value ranges, the multidimensional database management unit stores and manages the data entity or a reference to the data, in duplicate, in the storage area corresponding to each of those value ranges. In addition, when searching / manipulating the data constituting a multidimensional cube using the data constituting another multidimensional cube as a key, the multidimensional database management unit uses the value range used for classification as an index. When executing a single search / operation, it limits the range to be searched / operated to the storage area corresponding to the same value range of both multidimensional cubes and the storage areas corresponding to value ranges in the vicinity of that value range. When multiple searches / operations are executed in parallel, it further avoids conflicts over the storage area to be searched / operated.
  • FIG. 1 is a functional block diagram showing an example of a data analysis processing apparatus according to the present invention.
  • FIG. 2 is a diagram for explaining a data storage state in the multidimensional database 16.
  • FIG. 3 is a diagram showing an example of a value range wide enough to contain the widest data or the main data.
  • FIG. 4 is a diagram showing an example of storage areas corresponding to a hierarchy of value ranges in which an upper value range contains the adjacent lower value ranges.
  • FIG. 5 is a sequence diagram for explaining an example of the operation of the data analysis processing device 10.
  • FIG. 6 is a flowchart showing an example of the processing procedure of the multidimensional database management unit 15.
  • FIG. 7 is a diagram for explaining an example of processing for limiting the search / operation range in the storage area by the multidimensional database management unit 15.
  • FIG. 8 is a diagram for explaining another example of the process of limiting the search / operation range in the storage area by the multidimensional database management unit 15.
  • FIG. 9 is a diagram for explaining an example of an operation of avoiding a conflict in a storage area searched / operated by the multidimensional database management unit 15.
  • FIG. 10 is a diagram for explaining another example of the operation of avoiding the conflict of the storage area searched / operated by the multidimensional database management unit 15.
  • FIG. 11 is a diagram for explaining an example of a process in which the multidimensional database management unit 15 selects a hierarchy of a range.
  • FIG. 12 is a schematic diagram for explaining an example of an operation of suppressing redundant processing when a range corresponding to a plurality of storage areas is selected.
  • FIG. 13 is a diagram showing an example of tabular data representing the situation shown in FIG.
  • FIG. 14 is a block diagram showing an example of the hardware configuration of the data analysis processing apparatus according to the present invention.
  • FIG. 1 is a functional block diagram showing an example of a data analysis processing apparatus according to the present invention.
  • The data analysis processing device 10 includes an OLAP operation execution unit 11, a multidimensional database management unit 15, and a multidimensional database 16.
  • The multidimensional database 16 stores data embodying real-world events in multidimensional cubes, in association with an event identifier that identifies the event from which the data originates.
  • Multidimensional cubes are constructed for each subject.
  • The accumulated data includes time-dimension data, spatial-dimension data, a plurality of types of unique-dimension data, and data representing a plurality of types of characteristics.
  • The unique-dimension data is subject-dependent data, of which there are multiple types.
  • The characteristic data is identified by the time-dimension, spatial-dimension, and unique-dimension data.
  • When each of the data of each dimension and the data representing each characteristic constituting the multidimensional cube is multidimensional data, the multidimensional database 16 classifies the multidimensional data by a multidimensional value range common among the multidimensional cubes. When data classified by value range belongs to a single value range, the multidimensional database 16 stores the data in the storage area corresponding to that value range. When data classified by value range belongs to a plurality of value ranges, the multidimensional database 16 stores the data entity or a reference to it, in duplicate, in the storage area corresponding to each of those value ranges.
  • FIG. 2 is a diagram for explaining the data accumulation state in the multidimensional database 16.
  • FIG. 2 shows data a to c, which are two-dimensional data representing features and the like, and value ranges 1 to 4, which are two-dimensional value ranges representing areas and the like.
  • Data a to c are classified into value range 1; data b is also classified into value range 2, and data c is also classified into value range 3.
  • That is, data a belongs to value range 1, data b belongs to value ranges 1 and 2, and data c belongs to value ranges 1 and 3.
  • The main body of the data entity is stored in the storage area corresponding to the value range that the data overlaps most widely, and a copy of the entity or a reference to its main body is stored in the storage areas corresponding to the other value ranges.
  • The reference is, for example, the address of the data stored in the storage.
  • The main body of an entity accumulated in a storage area can be distinguished from a copy of an entity or a reference to the main body of an entity by, for example, partitioning the storage area, marking the stored data, or creating an index.
  • A copy of an entity and a reference to the main body of an entity accumulated in a storage area can be changed, arbitrarily or according to a criterion, from a copy to a reference or from a reference to a copy.
  • The value range is set to, for example, a size that can contain the widest data or a size that can contain the main data. By doing so, the number of value ranges to which a piece of data belongs can be kept to at most the number of adjacent value ranges.
  • The multidimensional database 16 thus classifies the multidimensional data by multidimensional value range, and when data classified by value range belongs to a single value range, stores the data in the storage area corresponding to that value range.
  • When data classified by value range belongs to a plurality of value ranges, the multidimensional database 16 stores the data entity or a reference to it, in duplicate, in the storage area corresponding to each of those value ranges.
  • The mark * represents the entity (main body) of the data, and the mark ** represents a copy of the entity or a reference to the main body of the entity.
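The storage scheme described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: the value ranges are taken to be square cells of a uniform grid with hypothetical edge length `CELL`, each datum is represented by an axis-aligned bounding box with positive area, and the tuple layout of the stored records is invented for the example.

```python
from collections import defaultdict

CELL = 10.0  # hypothetical edge length of one square two-dimensional value range


def overlapping_cells(box):
    """Yield (cell, overlap_area) for every grid cell the box intersects."""
    xmin, ymin, xmax, ymax = box
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            w = min(xmax, (cx + 1) * CELL) - max(xmin, cx * CELL)
            h = min(ymax, (cy + 1) * CELL) - max(ymin, cy * CELL)
            if w > 0 and h > 0:  # skip cells that are only touched on an edge
                yield (cx, cy), w * h


def store(areas, data_id, box):
    """Store the entity in the most-overlapped cell and references elsewhere."""
    cells = sorted(overlapping_cells(box), key=lambda c: -c[1])
    (main_cell, _), others = cells[0], cells[1:]
    areas[main_cell].append(("entity", data_id, box))
    for cell, _ in others:
        # The reference records where the main body of the entity lives.
        areas[cell].append(("reference", data_id, main_cell))
    return main_cell


areas = defaultdict(list)
store(areas, "a", (2, 2, 8, 8))    # fits in one value range
store(areas, "b", (6, 2, 13, 8))   # spans two adjacent value ranges
```

Storing a reference rather than a full copy in the secondary cells keeps the storage overhead small, which matches the remark in the description that references reduce the required storage area.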
  • FIG. 3 is a diagram showing an example of a value range wide enough to contain the widest data or the main data.
  • The data, including data already accumulated, is re-accumulated according to the new value range.
  • A hierarchy of value ranges in which an upper value range contains the adjacent lower value ranges is constructed, and the level of the hierarchy to be used is selected according to the situation.
  • When a level of the hierarchy that corresponds to a plurality of storage areas is selected for the multidimensional database 16, the data stored in duplicate in the plurality of storage areas is not used.
  • FIG. 4 is a diagram showing an example of storage areas corresponding to a hierarchy of value ranges in which an upper value range contains the adjacent lower value ranges.
  • The OLAP operation execution unit 11 executes an OLAP operation on multidimensional data according to the OLAP operation and arguments received from the client 20. That is, the OLAP operation execution unit 11 instructs the multidimensional database management unit 15 to perform an OLAP operation on the multidimensional data. Further, when the OLAP operation execution unit 11 receives the result of the instructed operation from the multidimensional database management unit 15, it transmits the operation result to the client 20.
  • In response to the instruction from the OLAP operation execution unit 11, the multidimensional database management unit 15 refers, as index information, to the information on the value ranges used for classifying the data of each dimension and the data representing each characteristic constituting the multidimensional cube, and specifies the storage area to be searched / operated based on the referenced index information. Further, the multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube in parallel, with the value range corresponding to the storage area as the processing unit. Then, when the search / operation of all the storage areas to be searched / operated is completed, the multidimensional database management unit 15 aggregates the search / operation results and returns the operation result to the OLAP operation execution unit 11. Further, the multidimensional database management unit 15 manages the multidimensional database 16 so that data is accumulated and used in it as described above.
  • FIG. 5 is a sequence diagram for explaining an example of the operation of the data analysis processing device 10.
  • When the OLAP operation execution unit 11 receives an OLAP operation and its arguments from the client 20, it instructs the multidimensional database management unit 15 to operate on the multidimensional data accordingly.
  • In response to the operation instruction, the multidimensional database management unit 15 refers, as index information, to the information on the value ranges used for classifying the data of each dimension and the data representing each characteristic constituting the multidimensional cube, and specifies the storage area to be searched / operated based on the referenced index information. The multidimensional database management unit 15 then searches / operates the data constituting the multidimensional cube in parallel, with the value range corresponding to the storage area as the processing unit (“PARALLELL” surrounded by the broken line in FIG. 5).
  • The multidimensional database management unit 15 repeats this until the search / operation of all the storage areas to be searched / operated is completed (“LOOP” surrounded by the broken line in FIG. 5). When it is completed, the unit aggregates the search / operation results and returns the operation result to the OLAP operation execution unit 11.
  • The OLAP operation execution unit 11 repeats the instruction to the multidimensional database management unit 15 according to the received OLAP operation and the contents of its arguments ("LOOP" surrounded by the broken line in FIG. 5).
  • When the OLAP operation execution unit 11 has acquired the final operation result corresponding to the OLAP operation and the contents of its arguments, it returns the operation result of the OLAP operation to the client 20.
  • FIG. 6 is a flowchart showing an example of the processing procedure of the multidimensional database management unit 15.
  • The multidimensional database management unit 15 waits to receive an operation instruction for multidimensional data from the OLAP operation execution unit 11 (step S11).
  • On receiving the instruction, the multidimensional database management unit 15 refers, as index information, to the information on the value ranges used for classifying the data of each dimension and the data representing each characteristic constituting the multidimensional cube (step S12).
  • The multidimensional database management unit 15 specifies the storage area to be searched / operated based on the referenced index information (step S13), and searches / operates the data constituting the multidimensional cube in parallel, with the value range corresponding to the storage area as the processing unit (steps S141 to S14N). This process is repeated until it is determined in step S15 that the search / operation of all the storage areas to be searched / operated has been completed.
  • In doing so, the multidimensional database management unit 15 limits the search / operation range to the storage area corresponding to the same value range of both multidimensional cubes and the storage areas corresponding to value ranges in the vicinity of that value range. Further, when a plurality of searches / operations are executed in parallel, the multidimensional database management unit 15 further avoids conflicts over the storage area to be searched / operated. Then, the multidimensional database management unit 15 aggregates the search / operation results (step S16).
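The flow of steps S13 to S16 can be illustrated with a small thread-pool sketch. It is a simplified model under assumptions: storage areas are plain in-memory lists keyed by a value-range id, the search is a filter by a caller-supplied predicate, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_area(area_rows, predicate):
    """Search one storage area; this is the unit of parallel work."""
    return [row for row in area_rows if predicate(row)]


def parallel_search(areas, target_ranges, predicate, max_workers=4):
    # S13: use the value-range ids as an index to pick the storage areas.
    targets = [areas[r] for r in target_ranges if r in areas]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # S141..S14N: one task per storage area.
        # S15: list() blocks until every area has been searched.
        partials = list(pool.map(lambda rows: search_area(rows, predicate), targets))
    # S16: aggregate the per-area results into one result.
    return [row for part in partials for row in part]


areas = {0: [1, 2, 3], 1: [4, 5], 2: [6]}
evens = parallel_search(areas, [0, 2, 9], lambda x: x % 2 == 0)
```

Since each task touches exactly one storage area, tasks do not contend for the same area. Threads are used here only to keep the sketch short; a real system might use processes or database-side parallelism instead.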
  • When the multidimensional database management unit 15 executes an OLAP operation on a certain multidimensional cube in response to an operation instruction for multidimensional data, and the data constituting another multidimensional cube is used as an argument of the OLAP operation, the data constituting the certain multidimensional cube is searched / operated using the data constituting the other multidimensional cube as a key.
  • That is, when executing a single search / operation, the multidimensional database management unit 15 uses as an index the value range used for classifying the data of each dimension and the data representing each characteristic constituting the multidimensional cube, and limits the search / operation range to the storage area corresponding to the same value range of both multidimensional cubes and the storage areas corresponding to value ranges in the vicinity of that value range. Further, when a plurality of searches / operations are executed in parallel, the multidimensional database management unit 15 further avoids conflicts over the storage area to be searched / operated.
  • FIG. 7 is a diagram for explaining an example of processing for limiting the search / operation range in the storage area by the multidimensional database management unit 15.
  • When the multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube 1 using the data constituting the multidimensional cube 0 as a key, the data that is contained in or overlaps the data classified into value ranges 01, 02, and 04 and stored and managed in the corresponding storage areas 01, 02, and 04 is classified into value ranges 11, 12, and 14, respectively, and stored and managed in the corresponding storage areas 11, 12, and 14.
  • FIG. 8 is a diagram for explaining another example of the process of limiting the search / operation range in the storage area by the multidimensional database management unit 15.
  • When the multidimensional database management unit 15 searches / operates the data constituting the multidimensional cube 1 using the data constituting the multidimensional cube 0 as a key, the data in the vicinity (represented by the dotted circle) of the center of gravity of the data classified into value range 01 and stored and managed in the storage area corresponding to value range 01 is classified into value range 11 and into value ranges 12, 14, and 15, which lie within the radius of the dotted circle from value range 11.
  • The range to be searched / operated can therefore be limited to the set of area 01 and areas 11, 12, 14, and 15, which are the storage areas corresponding to these value ranges. The same applies to data classified into other value ranges and stored and managed in the corresponding storage areas.
  • In this way, when the multidimensional database management unit 15 specifies the storage area to be searched / operated based on the referenced index information, it limits the range to be searched / operated to the storage area corresponding to the same value range of both multidimensional cubes and the storage areas corresponding to value ranges in the vicinity of that value range.
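Limiting a keyed search to the same value range and its neighbors can be sketched as below. The sketch makes simplifying assumptions: value ranges are grid cells addressed by hypothetical `(column, row)` tuples, and the neighborhood is a square of `radius` cells around the key's cell rather than the circle drawn in the figures.

```python
def candidate_cells(cell, radius=1):
    """Return the cell itself plus all cells within radius steps of it."""
    cx, cy = cell
    return {
        (cx + dx, cy + dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
    }


def search_with_key(cube0_areas, cube1_areas, match, radius=1):
    """Search cube 1 using cube 0 data as keys, visiting only nearby areas."""
    results = []
    for cell, keys in cube0_areas.items():
        for other in sorted(candidate_cells(cell, radius)):
            # Storage areas outside the neighborhood are never read.
            for target in cube1_areas.get(other, []):
                for key in keys:
                    if match(key, target):
                        results.append((key, target))
    return results
```

For a cell in the corner of the grid, only four of the nine candidates exist: the cell itself and three neighbors. This is comparable to value range 11 being searched together with value ranges 12, 14, and 15 in the description of FIG. 8.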
  • FIG. 9 is a diagram for explaining an example of an operation of avoiding a conflict in the storage area to be searched / operated by the multidimensional database management unit 15. This will be described in association with the schematic diagram of FIG. 7. As shown in FIG. 9, when the data constituting the multidimensional cube 1 is searched / operated using the data constituting the multidimensional cube 0 as a key, the data is searched / operated in parallel in units of the set of areas 01 and 11, the set of areas 02 and 12, and the set of areas 04 and 14, which are the storage areas corresponding to the same value range of both multidimensional cubes. This avoids conflicts over the storage area to be searched / operated.
  • FIG. 10 is a diagram for explaining another example of the operation of avoiding the conflict of the storage area searched / operated by the multidimensional database management unit 15. This will be described in association with the schematic diagram of FIG.
  • As in FIG. 8, the data in the vicinity (represented by the dotted circle) of the center of gravity of the data that is classified into value range 01 and stored and managed in the storage area corresponding to value range 01 is classified into value range 11 and into value ranges 12, 14, and 15, which lie within the radius of the dotted circle from value range 11.
  • Similarly, the data in the vicinity is classified into value range 14 and into value ranges 11, 12, 15, 17, and 18, which lie within the radius of the dotted circle from value range 14, and is stored and managed in the corresponding storage areas 11, 12, 15, 17, and 18.
  • Suppose that the data constituting the multidimensional cube is searched / operated in parallel in units of the set of area 01 with areas 15, 14, 12, and 11 and the set of area 04 with areas 18, 17, 15, and 14, which are the storage areas corresponding to the same value range of both multidimensional cubes and the storage areas in the vicinity of that value range. Area 15, for example, then belongs to the unit for the data in area 01 as well as to the unit for the data in area 04.
  • The destination of a reference to the main body of a data entity and the main body of that data entity are in the same storage area. Therefore, when the main body of any data stored in the storage area is searched / operated, a conflict over the storage area to be searched / operated cannot be avoided. On the other hand, when a reference to the main body of stored data is searched / operated in the storage area, a conflict over the storage area to be searched / operated can be avoided. Further, if a reference to the main body of the entity is accumulated instead of a copy of the entity, the required amount of storage area can be reduced.
  • Based on the referenced index information, the multidimensional database management unit 15 thus searches / operates the data constituting the multidimensional cube in parallel, with the value range corresponding to the storage area as the processing unit, and further avoids conflicts over the storage area to be searched / operated.
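Whether a set of parallel processing units can contend for a storage area can be checked by looking for areas shared between units. The following sketch is an illustration only; the helper name `conflicting_areas` and the representation of a unit as a set of area labels are assumptions, and the area labels are taken from the descriptions of FIG. 9 and FIG. 10 above.

```python
from collections import Counter


def conflicting_areas(units):
    """Return the storage areas that appear in more than one parallel unit."""
    counts = Counter(area for unit in units for area in set(unit))
    return {area for area, n in counts.items() if n > 1}


# Units as described for FIG. 9: each pairs one area of cube 0 with the
# same-range area of cube 1, so no area is shared.
fig9_units = [{"01", "11"}, {"02", "12"}, {"04", "14"}]

# Units as described for FIG. 10: neighboring areas are included, so the
# two units overlap.
fig10_units = [
    {"01", "11", "12", "14", "15"},
    {"04", "14", "15", "17", "18"},
]
```

An empty result means the units can run in parallel without touching the same storage area. A non-empty result identifies the areas where some additional measure is needed, such as the use of references discussed in the description.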
  • A storage area to which the data does not belong is excluded from the processing target in the first place.
  • Because the entity or a reference is stored and managed in duplicate in the storage area corresponding to each value range, the same data may be searched / operated in multiple sets of storage areas. If the same result is obtained as a consequence, the duplicated results are consolidated when the results are aggregated.
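Consolidating duplicated results during aggregation can be sketched as a pass that keeps the first result seen for each datum. The `(data_id, value)` layout of a result is a hypothetical choice for this example.

```python
def aggregate(partial_results):
    """Merge per-unit results, keeping one result per data identifier."""
    seen = set()
    merged = []
    for part in partial_results:
        for data_id, value in part:
            if data_id not in seen:  # drop results already reported by another unit
                seen.add(data_id)
                merged.append((data_id, value))
    return merged


merged = aggregate([[("a", 1), ("b", 2)], [("b", 2), ("c", 3)]])
```

This removes duplicates after the work has been done. The description later discusses suppressing the redundant processing itself, which avoids doing the duplicate work at all.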
  • FIG. 11 is a diagram for explaining an example of a process in which the multidimensional database management unit 15 selects a level of the value range hierarchy.
  • When the multidimensional database management unit 15 identifies the storage area to be searched / operated based on the referenced index information and searches / operates the data constituting the multidimensional cube in parallel, with the storage area corresponding to a value range as the unit, it constructs, for the value ranges used for classifying the data of each dimension and the data representing each characteristic constituting the multidimensional cube, a hierarchy in which an upper value range contains the adjacent lower value ranges, and selects the level of the hierarchy to be used as the processing unit of the search / operation according to the situation.
  • When the selection is made according to the values of the stored data, the level whose value ranges can contain the widest data or the main data is selected, so that the number of value ranges to which a piece of data belongs is limited to at most the number of adjacent value ranges.
  • The value range that can contain the widest data and the value range that can contain the main data are obtained by identifying, each time data is accumulated, the level of the value range that can contain that data, and then calculating the level of the largest value range and the level of the most frequent value range. For example, since data a and b cannot be contained in a level 2 value range but can be contained in a level 1 value range, the level 1 value range is selected.
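The data-driven selection of a level can be sketched as follows. The sketch assumes a square domain with hypothetical edge length `DOMAIN_EDGE` and a hierarchy with 1, 2, and 8 cells per axis at levels 0, 1, and 2 (that is, 1, 4, and 64 value ranges); the "widest data" policy takes the coarsest level needed by any datum, and the "main data" policy takes the most frequent level.

```python
from collections import Counter

# Hypothetical hierarchy: cells per axis at each level of a square domain.
CELLS_PER_AXIS = {0: 1, 1: 2, 2: 8}
DOMAIN_EDGE = 80.0  # hypothetical edge length of the whole domain


def finest_level_containing(extent):
    """Deepest level whose cell edge is at least as long as the data extent."""
    fitting = [lvl for lvl, n in CELLS_PER_AXIS.items() if DOMAIN_EDGE / n >= extent]
    return max(fitting, default=0)  # fall back to level 0 for oversized data


def select_level(extents, policy="widest"):
    """Pick the hierarchy level from the extents of the accumulated data."""
    levels = [finest_level_containing(e) for e in extents]
    if policy == "widest":
        return min(levels)  # level of the largest value range needed
    return Counter(levels).most_common(1)[0][0]  # most frequent level
```

With these numbers the cell edges are 80, 40, and 10. Two data items with extents 25 and 30 do not fit a level 2 cell but fit a level 1 cell, so `select_level([25, 30])` picks level 1, in line with the example of data a and b in the description.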
  • When the selection is made according to the executable degree of parallelism, it is based on the number of available CPU cores and the status of other processing, so as to maximize processing capacity. For example, if the level 2 value ranges are selected, the 64 storage areas correspond to 64 value ranges, and 64 is the upper limit of the executable degree of parallelism. If the level 1 value ranges are selected, the 64 storage areas are aggregated into four, corresponding to four value ranges, and 4 is the upper limit of the executable degree of parallelism. If the level 0 value range is selected, the 64 storage areas are aggregated into one, corresponding to one value range, and 1 is the upper limit of the executable degree of parallelism.
  • The executable degree of parallelism becomes larger than the number of CPU cores when I/O waits are taken into account, and smaller than the number of CPU cores when the execution of other processes is taken into account. Therefore, the executable degree of parallelism is calculated based on information set in advance and information acquired from the OS (operating system). For example, if the number of CPU cores is 4, level 1, whose number of value ranges is closest to the number of CPU cores, is selected.
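The selection by degree of parallelism can be sketched with the numbers from the example above (1, 4, and 64 value ranges at levels 0, 1, and 2). The function name and the tie-breaking rule (prefer the coarser level) are assumptions; `os.cpu_count()` stands in for the information acquired from the OS and ignores I/O waits and other processes.

```python
import os

# Number of value ranges at each level, as in the example in the text.
RANGES_PER_LEVEL = {0: 1, 1: 4, 2: 64}


def select_level_by_parallelism(parallelism=None):
    """Pick the level whose number of value ranges is closest to parallelism."""
    if parallelism is None:
        parallelism = os.cpu_count() or 1  # crude stand-in for OS-derived info
    return min(
        RANGES_PER_LEVEL,
        # Ties are broken in favor of the coarser (lower) level; an assumption.
        key=lambda lvl: (abs(RANGES_PER_LEVEL[lvl] - parallelism), lvl),
    )
```

With four CPU cores, `select_level_by_parallelism(4)` returns level 1, matching the example in the description. A fuller model would adjust `parallelism` upward for I/O-bound work and downward when other processes compete for the cores.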
  • FIGS. 12 and 13 are diagrams for explaining an example of processing for suppressing redundant processing by the multidimensional database management unit 15.
  • Consider a case in which the multidimensional database management unit 15 selects a level of the value range hierarchy that corresponds to a plurality of storage areas as the level used as the search / operation processing unit.
  • In this case, redundant processing can be suppressed by not using the data that is stored and managed in duplicate in the plurality of storage areas.
  • When data belongs to multiple value ranges, the entity or a reference is stored and managed in duplicate in the storage area corresponding to each value range, so the same data may be searched / operated in multiple sets of storage areas. If the same result is obtained as a consequence, the duplicated results must be consolidated when the results are aggregated.
  • The multidimensional database management unit 15 suppresses this redundant processing.
  • In this example, for the level 2 ranges included in the level 1 range, data a is classified into range 2 of level 2 and stored and managed in the corresponding storage area 2, and data b is classified into ranges 2, 3, 6, and 7 of level 2 and stored and managed in the corresponding storage areas 2, 3, 6, and 7. The example also shows that ranges 1 to 16 of level 2 are included in range 3 of level 1, and that ranges 1 to 4 of level 1 are included in range 1 of level 0.
  • FIG. 13 is an example of tabular data representing the situation described above. As in FIG. 11, when the level 1 range is selected as the range hierarchy used as the unit of search/operation processing, the multidimensional database management unit 15 reads out and processes data in order from each storage area corresponding to the level 2 ranges included in the level 1 range. For example, when data a is read from the storage area corresponding to range 2 of level 2, searching the tabular data of FIG. 13 shows that data a is stored only in the storage area corresponding to range 2 of level 2. Therefore, to suppress redundant processing, the multidimensional database management unit 15 searches/operates on the storage area corresponding to range 2 of level 2 of the paired multidimensional cube.
  • Similarly, for data b, the multidimensional database management unit 15 searches/operates on the storage areas corresponding to ranges 2, 3, 6, and 7 of level 2 of the paired multidimensional cube. Further, to suppress redundant processing, the multidimensional database management unit 15 marks in the tabular data of FIG. 13 that data b has been processed, and does not read data b from the storage areas corresponding to ranges 3, 6, and 7 of level 2.
  • If the main body of an entity, duplicates of the entity, and references to the main body of the entity have been accumulated in the storage areas corresponding to the hierarchy, the duplicates and the references can be deleted and the deletion reflected in the tabular data of FIG. 13. It is also possible, after the deletion, to return the storage areas and the tabular data to their state before the deletion.
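The redundancy suppression described for data a and data b can be sketched as follows. The dictionary layout and the function name `process_without_redundancy` are assumptions made for illustration. The placement of data a in range 2 and data b in ranges 2, 3, 6, and 7 follows the example in the text, and a Python set stands in for the "processed" mark kept in the tabular data.

```python
# Placement table in the spirit of FIG. 13: data item -> level 2 ranges holding it.
placement = {"a": [2], "b": [2, 3, 6, 7]}

# Contents of each storage area, derived from the placement table.
storage = {}
for item, ranges in placement.items():
    for r in ranges:
        storage.setdefault(r, []).append(item)


def process_without_redundancy(area_order, storage, placement):
    """Visit storage areas in order and handle each data item exactly once."""
    processed = set()  # plays the role of the "processed" mark in the table
    work = []          # (item, areas of the paired cube to search/operate on)
    for area in area_order:
        for item in storage.get(area, []):
            if item in processed:
                continue  # duplicate copy or reference: do not read it again
            processed.add(item)
            work.append((item, placement[item]))
    return work


# Level 2 ranges 1 to 16 are the ones contained in the selected level 1 range.
result = process_without_redundancy(range(1, 17), storage, placement)
print(result)
```

Data b is picked up once, when storage area 2 is read, and is skipped when areas 3, 6, and 7 are visited, so no duplicated results need to be aggregated afterwards.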
  • FIG. 14 is a block diagram showing an example of the hardware configuration of the data analysis processing apparatus according to the present invention.
  • the data analysis processing device 10 includes a processor 12, a storage 200 for storing a multidimensional database 16, an interface unit 13, and a memory 14. That is, the data analysis processing device 10 is a computer, and is realized as, for example, a personal computer, a server computer, or the like.
  • the interface unit 13 is connected to the network 100 and receives access from the client 20 connected to the network 100.
  • the storage 200 is a non-volatile storage medium (block device) such as an HDD (Hard Disk Drive) or SSD (Solid State Drive).
  • the storage 200 stores a multidimensional database 16 in a predetermined storage area in addition to basic programs such as an OS (Operating System) and a device driver, and a program for realizing the functions of the data analysis processing device 10.
  • The memory 14 in FIG. 14 is, for example, a RAM (Random Access Memory), and stores a program 14a loaded from the storage 200 and various data 14b.
  • the processor 12 in FIG. 14 is an arithmetic unit such as a Central Processing Unit (CPU) or a Micro Processing Unit (MPU), and its function is realized by a program loaded in the memory 14.
  • the processor 12 includes an OLAP operation execution unit 11 and a multidimensional database management unit 15 as processing functions related to the embodiment.
  • the OLAP operation execution unit 11, the multidimensional database management unit 15, and the time-series alignment unit 17 are processing functions realized by the processor 12 executing the instructions included in the program 14a. That is, the data analysis processing device 10 of the present invention can also be realized by a computer and a program. In addition to recording and distributing the program on a recording medium such as an optical medium, it is also possible to provide the program through a network.
  • The OLAP operation execution unit 11 and the multidimensional database management unit 15 may also be realized in various other forms, including integrated circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (field-programmable gate array), in place of or in addition to the processor 12.
  • the processor 12 can receive the OLAP operation and the argument from the client 20 via the interface unit 13, and can send the operation result to the client 20.
  • The multidimensional database management unit 15 classifies the data by value ranges that are shared among the multidimensional cubes. Further, when data classified by range belongs to a single range, the multidimensional database management unit 15 stores the data in the storage area corresponding to that range; when the data belongs to a plurality of ranges, the entity or a reference to it is duplicated and accumulated in the storage area corresponding to each range.
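The classification rule above can be illustrated with a small one-dimensional sketch. The fixed range width `RANGE_WIDTH`, the representation of data as closed intervals, and the helper names `ranges_of` and `store` are assumptions made for this example; the source does not specify how ranges are defined.

```python
from collections import defaultdict

RANGE_WIDTH = 10  # hypothetical width of each value range, shared by all cubes


def ranges_of(lo, hi, width=RANGE_WIDTH):
    """Return the indices of every range that the interval [lo, hi] overlaps."""
    return list(range(int(lo // width), int(hi // width) + 1))


def store(storage, key, lo, hi):
    """Store a reference to `key` in the storage area of each overlapping range."""
    for r in ranges_of(lo, hi):
        storage[r].append(key)


storage = defaultdict(list)
store(storage, "a", 12, 15)  # belongs to a single range
store(storage, "b", 18, 31)  # belongs to several ranges, so it is duplicated
print(dict(storage))
```

Because every cube uses the same range boundaries, a range index found for data in one cube identifies the matching storage area in another cube directly.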
  • the range information used to classify the data to be operated that constitutes the multidimensional cube is used as index information.
  • The range of search/operation is limited to the storage areas corresponding to the same range in both multidimensional cubes and the storage areas corresponding to ranges near that range. Further, when a plurality of searches/operations are executed at the same time, contention for the storage areas to be searched/operated on is also avoided.
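A minimal sketch of limiting the search to the same and nearby ranges is shown below. The meaning of "near" is not defined in the text, so the `reach` parameter (how many neighbouring ranges on each side count as near) is an assumption, as are the function names and the one-dimensional, zero-based range numbering.

```python
def candidate_areas(range_index, n_ranges, reach=1):
    """Ranges of the paired cube worth searching: the same range plus neighbours."""
    lo = max(0, range_index - reach)
    hi = min(n_ranges - 1, range_index + reach)
    return list(range(lo, hi + 1))


def search_paired_cube(key_range, cube_b, n_ranges):
    """Collect items of cube B that lie in the same or a nearby range only."""
    hits = []
    for r in candidate_areas(key_range, n_ranges):
        hits.extend(cube_b.get(r, []))
    return hits


# Storage areas of the paired cube, keyed by range index (illustrative data).
cube_b = {0: ["p"], 1: ["q"], 2: ["r"], 5: ["far"]}
print(search_paired_cube(1, cube_b, n_ranges=8))
```

The item stored in range 5 is never touched, which is the source of the speed-up: the range index acts as an index that prunes the storage areas to visit.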
  • the multidimensional database management unit 15 uses data constituting another multidimensional cube as an argument of the OLAP operation.
  • For the data of each dimension constituting the multidimensional cube, the multidimensional database management unit 15 constructs a hierarchy of value ranges in which each upper value range includes the adjacent lower value ranges.
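A range hierarchy of this kind can be sketched in one dimension as follows. The domain size `DOMAIN`, the equal-width ranges, and the zero-based numbering are assumptions for illustration and do not reproduce the range numbering used in the figures; only the range counts per level (1, 4, 64) are taken from the earlier example.

```python
# Hypothetical 1-D domain [0, 64) with the range counts from the earlier example:
# level 0 has 1 range, level 1 has 4 ranges, level 2 has 64 ranges.
DOMAIN = 64
RANGES_PER_LEVEL = {0: 1, 1: 4, 2: 64}


def range_index(value, level):
    """Zero-based index of the range that contains `value` at `level`."""
    width = DOMAIN / RANGES_PER_LEVEL[level]
    return int(value // width)


def parent(index, level):
    """Index of the range one level up that contains range `index` of `level`."""
    fan_out = RANGES_PER_LEVEL[level] // RANGES_PER_LEVEL[level - 1]
    return index // fan_out


v = 37.5
print([range_index(v, level) for level in (0, 1, 2)])
```

Each upper range contains a contiguous block of lower ranges, so moving up one level is a single integer division and the storage areas of the lower level can be aggregated without re-classifying the data.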
  • The multidimensional database management unit 15 selects the range hierarchy to be used as the processing unit of search/operation according to conditions such as the values of the accumulated data and the executable degree of parallelism. Further, when it selects a range hierarchy corresponding to a plurality of storage areas, it does not use the data stored and managed in duplicate in the plurality of storage areas.
  • According to the embodiment, it is possible to speed up the process of searching/operating on the data constituting another multidimensional cube by using the data constituting one multidimensional cube as a key. That is, according to the embodiment, it becomes possible to provide a data analysis processing device, a data analysis processing method, and a program capable of executing OLAP operations on a multidimensional cube at high speed. More specifically, according to the embodiment, when data constituting another multidimensional cube is used as an argument of an OLAP operation, the data constituting one multidimensional cube and the data constituting another multidimensional cube are used.
  • the present invention is not limited to the above-described embodiment as it is, and at the implementation stage, the components can be modified and embodied within a range that does not deviate from the gist thereof.
  • various inventions can be formed by an appropriate combination of the plurality of components disclosed in the above-described embodiment. For example, some components may be removed from all the components shown in the embodiments. In addition, components from different embodiments may be combined as appropriate.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data analysis processing device according to one aspect of the present invention comprises a multidimensional database, an OLAP operation execution unit, and a multidimensional database management unit. The multidimensional database stores data representing a real-world event in a multidimensional cube constructed for each subject, in association with the identifier of the event. The OLAP operation execution unit executes an online analytical processing (OLAP) operation on a multidimensional cube in response to a request from a client. The multidimensional database management unit manages, in the multidimensional cube, three-dimensional data, spatial data, multiple types of single-dimensional data, and data representing multiple types of features. When each piece of data constituting the multidimensional cube consists of multidimensional data, the multidimensional database management unit classifies the multidimensional data into a multidimensional value range common to the multidimensional cubes.
PCT/JP2020/040213 2020-10-27 2020-10-27 Data analysis processing device, data analysis processing method, and program Ceased WO2022091204A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/040213 WO2022091204A1 (fr) 2020-10-27 2020-10-27 Data analysis processing device, data analysis processing method, and program
JP2022558636A JP7464142B2 (ja) 2020-10-27 2020-10-27 Data analysis processing device, data analysis processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/040213 WO2022091204A1 (fr) 2020-10-27 2020-10-27 Data analysis processing device, data analysis processing method, and program

Publications (1)

Publication Number Publication Date
WO2022091204A1 true WO2022091204A1 (fr) 2022-05-05

Family

ID=81382206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/040213 Ceased WO2022091204A1 (fr) 2020-10-27 2020-10-27 Data analysis processing device, data analysis processing method, and program

Country Status (2)

Country Link
JP (1) JP7464142B2 (fr)
WO (1) WO2022091204A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007502466A (ja) * 2003-08-12 2007-02-08 オラクル・インターナショナル・コーポレイション System and method for cross-attribute analysis and manipulation in online analytical processing (OLAP) and multidimensional planning applications by dimension splitting
US20070150862A1 (en) * 2005-11-07 2007-06-28 Business Objects, S.A. Apparatus and method for defining report parts
JP2016518646A (ja) * 2013-03-15 2016-06-23 デシジョン, インク. System, apparatus, and method for generating context objects mapped to data measurements by dimension data
JP2018136963A (ja) * 2014-11-19 2018-08-30 株式会社インフォメックス Data search device, data search method, data search program, and recording medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAGI SATORU: "A concept of a multidimensional data analysis system for real-world phenomena", IPSJ SIG TECHNICAL REPORT, vol. 2019-DBS-169, no. 14, 10 September 2019 (2019-09-10), pages 1 - 6, XP055938138, ISSN: 2188-871X *

Also Published As

Publication number Publication date
JP7464142B2 (ja) 2024-04-09
JPWO2022091204A1 (fr) 2022-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959722

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022558636

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20959722

Country of ref document: EP

Kind code of ref document: A1