WO2015125230A1

WO2015125230A1 - Data update method, and computer system

Info

Publication number: WO2015125230A1
Application number: PCT/JP2014/053897
Authority: WO
Inventors: 康志宮田; 輝早川; 博泰西山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2014-02-19
Filing date: 2014-02-19
Publication date: 2015-08-27
Anticipated expiration: 2016-08-19
Also published as: JPWO2015125230A1; JP6143938B2

Abstract

When new unit data is added to original graph data, a computer system identifies from meta information of contraction graph data a first contraction value which is a contraction value corresponding to a first node value in the new unit data and a second contraction value which is a contraction value corresponding to a second node value in the new unit data, and adds, to the contraction graph data, new unit contraction data combining the first contraction value, the second contraction value, and an edge from the first contraction value to the second contraction value (the same edge as an edge in the new unit data).

Description

Data update method and computer system

The present invention relates generally to updating graph data.

Graph data is used, for example, for at least part of a database and serves as a search target. One known form of graph data is RDF data, that is, data in the RDF (Resource Description Framework) format. Patent Document 1 discloses a technique for contracting RDF data: it generates contracted RDF data by contracting the original RDF data.

Patent Document 1: International Publication No. 2013/111287 (pamphlet)

When data is added to the original RDF data, the only way to make the added data retrievable through the contracted RDF data is to create new contracted RDF data from the original RDF data as it stands after the addition.

However, contracting graph data such as RDF data takes time that grows with the size of the graph data.

The original graph data is a set of unit data, each of which combines a first node value, a second node value, and an edge from the first node value to the second node value. The contracted graph data is a set of unit contracted data, each of which combines the contracted value of a first node value, the contracted value of a second node value, and an edge from the former to the latter. When new unit data is added to the original graph data, the computer system uses the meta information of the contracted graph data to identify two contracted values: a first contracted value, corresponding to the first node value in the new unit data, and a second contracted value, corresponding to the second node value in the new unit data. It then adds to the contracted graph data new unit contracted data that combines the first contracted value, the second contracted value, and an edge from the first contracted value to the second contracted value (the same edge as the one in the new unit data).

Even when new data is added to the original graph data, the added data can be searched for through the contracted RDF data. There is no need to create new contracted graph data from the original graph data after the addition.

FIG. 1 shows the configuration of a computer system according to an embodiment.
FIG. 2 shows the tabular structure of the original RDF data 115.
FIG. 3 shows the structure of unit RDF data.
FIG. 4 shows the tree structure represented by the original RDF data 115 of FIG. 2.
FIG. 5 shows an example of adding unit RDF data to the original RDF data 115.
FIG. 6 outlines the flow of creating the contraction criterion table 101 from the original RDF data 115.
FIG. 7 outlines the flow of creating the contracted RDF data 116 using the contraction criterion table 101.
FIG. 8 shows the tabular structure of the contracted RDF data 116.
FIG. 9 shows an example of search processing using the contracted RDF data 116.
FIG. 10 is a flowchart of the difference update process.
FIG. 11 is a flowchart of the contracted RDF update process (step 1004 in FIG. 10).
FIG. 12 is a flowchart of the contracted value determination process (step 1102 in FIG. 11).
FIG. 13 shows an example of the result of step 1203 in FIG. 12.
FIG. 14 shows an example of the result of step 1213 in FIG. 12.
FIG. 15 shows an example of the result of step 1103 in FIG. 11.
FIG. 16 is a flowchart of the data organization process.
FIG. 17 shows an example of the contraction condition setting screen.

An embodiment in which RDF data is used as the graph data is described below.

In the following description, information is sometimes described using the expression "kkk table", but the information may be expressed in a data structure other than a table. To indicate that the information does not depend on a particular data structure, a "kkk table" can also be called "kkk information".

In the following description, processing is sometimes described with a "program" as the subject. A program performs its defined processing by being executed by a processor, using storage resources (for example, memory) and/or a communication interface device (for example, a communication port) as appropriate, so the subject of the processing may instead be the processor. Conversely, processing described with the processor as the subject can be interpreted as being performed by executing one or more programs.

FIG. 1 shows the configuration of the computer system according to the embodiment.

The computer system 100 includes one or more computers. At least one of them is a physical computer, and one or more virtual computers may also be included. The computer system 100 has an input device 111, an output device 112, a communication interface device (I/F) 113, a storage resource 103, and a processor 110 connected to them. At least one of the input device 111 and the output device 112 may reside in a remote display computer (not shown) connected to the computer system 100.

The input device 111 is one or more input devices, for example a keyboard and a pointing device. The output device 112 is one or more output devices including a display device, for example a liquid crystal display. The input device 111 and the output device 112 may be integrated, as in a touch panel.

The I/F 113 is one or more communication interface devices, for example at least one of a LAN (Local Area Network) controller and an HBA (Host Bus Adapter). An external storage apparatus 114 is connected to the I/F 113. The external storage apparatus 114 may be a storage device such as an SSD (Solid State Drive) or HDD (Hard Disk Drive). It may also be a storage apparatus with one or more RAID (Redundant Arrays of Inexpensive (or Independent) Disks) groups built from multiple storage devices. The external storage apparatus 114 stores the original RDF data 115 and the contracted RDF data 116, which is the result of contracting the original RDF data 115. In this embodiment, RDF data is called "original RDF data" because it is the source from which the contracted RDF data is produced. The original RDF data 115 and the contracted RDF data 116 may be, for example, data in a database. Alternatively, the external storage apparatus 114 may be omitted and the original RDF data 115 and the contracted RDF data 116 stored in the storage resource 103, as in an in-memory database.

The storage resource 103 is one or more storage devices including volatile or nonvolatile memory. It stores the contraction criterion table 101 and the contraction table 102, which are created when the original RDF data 115 is contracted. These two tables are an example of the meta information of the contracted RDF data 116. The storage resource 103 also stores the following programs, which are executed by the processor 110:
- an initial contraction program 104, which creates the contracted RDF data 116 for the first time;
- a query processing program 105, which converts an input original query into a contracted query and processes the contracted query;
- a difference update program 106, which updates the contracted RDF data 116;
- a data organization program 107, which creates new contracted RDF data from the original RDF data 115 when a predetermined condition is met.
For example, the difference update program 106 can perform the contracted RDF update process and the contracted value determination process described later.

The processor 110 is one or more processors. A processor may be a microprocessor such as a CPU (Central Processing Unit) or a processor core such as a CPU core. In addition to at least one of these, it may include a hardware circuit that performs part of the processing (for example, encryption/decryption or compression/decompression).

FIG. 2 shows the tabular structure of the original RDF data 115.

As shown in FIG. 2, the original RDF data 115 is a table made up of records, each corresponding to one unit of RDF data. Each record holds the subject, predicate, and object values of its unit RDF data. This original RDF data 115 represents the rank, degree, name, and friendship (friend) of five countries: A, B, C, D, and E.

As shown in FIG. 3, unit RDF data consists of three elements (values) called the "subject", the "predicate", and the "object". Unit data of graph data combines a first node value, a second node value, and an edge from the first node value to the second node value. In unit RDF data, the subject is an example of the first node value, the predicate is an example of the edge, and the object is an example of the second node value. Unit RDF data is also called a "triple".

Combining unit RDF data like that of FIG. 3 builds the original RDF data 115 with the tree structure shown in FIG. 4. In other words, the table of FIG. 2 represents the tree of FIG. 4. Graph data is generally a set of nodes and edges, and the original RDF data 115 is organized the same way. In the original RDF data 115, a node that can be both a subject and an object (an example of a non-terminal node) is sometimes called a "resource". A node that can only be an object (an example of a terminal node) is sometimes called a "literal".

As shown in FIG. 5, unit RDF data may be added to the original RDF data 115. Specifically, a record corresponding to the added unit RDF data is appended to the original RDF data 115 of FIG. 2. Adding unit RDF data to the original RDF data 115 is therefore easy.
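To make the triple model concrete, here is a minimal Python sketch. It represents unit RDF data as records, shows that adding unit RDF data is just appending a record, and separates resources from literals. The `Triple` type, the `classify_nodes` helper, and the sample values are illustrative assumptions, not the contents of FIG. 2. The sketch also simplifies the resource test: any node that appears as a subject somewhere is treated as a resource.

```python
from typing import NamedTuple, Union

class Triple(NamedTuple):
    """One unit of RDF data: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: Union[str, int]  # a resource name or a literal value

def classify_nodes(triples):
    """Return (resources, literals).

    Simplification: a node that appears as a subject anywhere is a resource;
    a node that only ever appears as an object is a literal.
    """
    subjects = {t.subject for t in triples}
    objects = {t.obj for t in triples}
    return subjects, objects - subjects

# Illustrative records only (hypothetical, not the data of FIG. 2).
original = [
    Triple("A", "rank", 1),
    Triple("A", "name", "Alpha"),
    Triple("A", "friend", "B"),
    Triple("B", "rank", 7),
    Triple("B", "degree", 1),
]
# Adding unit RDF data is simply appending one more record.
original.append(Triple("B", "friend", "A"))
resources, literals = classify_nodes(original)
```

Here `resources` is `{"A", "B"}`, and the literals are the remaining object values `1`, `7`, and `"Alpha"`.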

With the prior art, however, making the added unit RDF data retrievable from the original RDF data 115 via the contracted RDF data 116 requires creating new contracted RDF data 116 from the original RDF data 115 after the addition.

In this embodiment, when unit RDF data is added to the original RDF data 115, the contraction criterion table 101 and the contraction table 102 are updated as necessary. The contracted values of the subject and the object in the added unit RDF data are then obtained from these tables. Unit contracted RDF data consisting of the two obtained contracted values and the predicate connecting them (the same predicate as in the added unit RDF data) is added to the contracted RDF data 116. As a result, the added unit RDF data can be retrieved from the original RDF data 115 using the contracted RDF data 116, without creating new contracted RDF data 116 from the original RDF data 115 after the addition. In this embodiment, a "contracted value" may be any unique value.

This embodiment is described in more detail below.

First, the flow of creating the contracted RDF data 116 is outlined with reference to FIGS. 6 and 7. FIG. 6 outlines creating the contraction criterion table 101 from the original RDF data 115, and FIG. 7 outlines creating the contracted RDF data 116 using the contraction criterion table 101.

As shown in FIG. 6, the initial contraction program 104 selects resources that have the same predicates from the original RDF data 115, producing multiple RDF data parts with the same structure. Each RDF data part has a resource (subject), two predicates extending from it (rank and degree), and the object connected to each predicate. The initial contraction program 104 extracts the set of object values from these RDF data parts and divides it into groups (object value ranges), for example so that each group contains roughly the same number of objects. The value set may instead be divided by other rules, for example based on access frequency. From the boundary values of the division (the maximum and minimum of each group), the initial contraction program 104 creates the contraction ranges (object value ranges) of the contraction criterion table 101. String values can also be given a maximum and minimum by ordering them according to a predetermined rule such as dictionary order or character code order. The initial contraction program 104 then generates a unique contracted value for each contraction range, associates the two, and records the predicate as the criterion predicate. The resulting contraction criterion table 101 defines how literals are mapped to contracted values. In the contraction criterion table 101 of FIG. 6, for the criterion predicate "rank", an object whose value is less than 2 is converted to the contracted value "c1", and an object whose value is 2 or more is converted to the contracted value "c2".
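The following minimal Python sketch shows one way to build such a table by equal-count splitting, then look up a literal's contracted value with binary search over the range boundaries. The names `build_criterion_table` and `contract_literal`, the `(thresholds, labels)` layout, and the sample triples are all illustrative assumptions. The sample data does not reproduce the boundaries of FIG. 6.

```python
from bisect import bisect_right

def build_criterion_table(triples, predicates, num_groups):
    """Build a contraction criterion table by equal-count splitting.

    Returns {predicate: (thresholds, contracted_values)}. Group k holds values v
    with thresholds[k-1] <= v < thresholds[k]; the first group is unbounded
    below and the last group is unbounded above.
    """
    table = {}
    next_id = 1
    for pred in predicates:
        values = sorted(o for _, p, o in triples if p == pred)
        if not values:
            continue
        size = -(-len(values) // num_groups)  # ceiling division: values per group
        thresholds = []
        for start in range(size, len(values), size):
            # The first value of each later group becomes a range boundary;
            # repeated values never produce a duplicate boundary.
            if not thresholds or values[start] > thresholds[-1]:
                thresholds.append(values[start])
        labels = [f"c{next_id + k}" for k in range(len(thresholds) + 1)]
        next_id += len(labels)
        table[pred] = (thresholds, labels)
    return table

def contract_literal(table, pred, value):
    """Map a literal to the contracted value of the range that contains it."""
    thresholds, labels = table[pred]
    return labels[bisect_right(thresholds, value)]

# Hypothetical triples, not the data of FIG. 2.
triples = [
    ("A", "rank", 1), ("A", "degree", 2),
    ("B", "rank", 7), ("B", "degree", 1),
    ("C", "rank", 1), ("C", "degree", 2),
    ("D", "rank", 5), ("D", "degree", 1),
]
table = build_criterion_table(triples, ["rank", "degree"], num_groups=2)
print(table)  # {'rank': ([5], ['c1', 'c2']), 'degree': ([2], ['c3', 'c4'])}
```

`bisect_right` places a value equal to a boundary into the upper range. That matches the "2 or more" convention described for FIG. 6.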

Next, as shown in FIG. 7, the initial contraction program 104 uses the contraction criterion table 101 (see FIG. 6) to convert each literal (object) of the same-structure RDF data parts into a contracted value. It then gathers the resources of RDF data parts that contain the same contracted values into a resource set. It generates a unique contracted value for each resource set and registers the correspondence in the contraction table 102. In other words, the initial contraction program 104 obtains contracted values for all literals in the original RDF data 115 based on the contraction criterion table 101, and creates the contraction table 102 recording the correspondence between original resources and contracted values. If several subject resources are each connected to the same contracted objects (contracted literal values), those resources receive the same contracted value, and this correspondence is managed in the contraction table. The contraction table 102 is thus information that maps each resource in the original RDF data 115 to one contracted value. Based on the contraction table 102, the initial contraction program 104 converts the value of each resource into a contracted value. The contracted RDF data 116 is produced by applying two conversions to the original RDF data 115: resource conversion (converting resource values into contracted values) and literal conversion (converting literal values into contracted values). Contracted values can then be determined again by finding the sets of resources linked to those resource contracted values. Repeating this operation converts every node of the original RDF data 115 into a contracted value. FIG. 8 shows the tabular structure of the resulting contracted RDF data 116.

In FIG. 8, unlike a numerical value, a literal (object) whose value is a character string cannot be placed in ascending or descending order. Because no distance between strings can be defined either, no value range can be set for it. Therefore, when a literal value is a string, it is replaced in the contracted RDF data 116 with an invalid value (for example, "other") and excluded from contraction. However, string literals may also be contracted by converting the strings into numbers according to character codes or other rules.
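The resource conversion and literal conversion described above can be sketched in Python as follows. This minimal sketch assumes a single level of contraction (the repeated refinement is not shown). A resource's "signature" is the set of contracted literals it reaches through criterion predicates, and resources with the same signature share one contracted value. The `CRITERIA` contents, the function names, and the sample triples are hypothetical. Only the "rank" split follows the FIG. 6 description.

```python
from bisect import bisect_right

# Contraction criterion table: predicate -> (thresholds, contracted values).
# The "rank" split follows FIG. 6 as described (<2 -> c1, >=2 -> c2);
# the "degree" split is an assumption for this sketch.
CRITERIA = {
    "rank": ([2], ["c1", "c2"]),
    "degree": ([2], ["c3", "c4"]),
}

def contract_literal(pred, value):
    """Literal conversion; strings and non-criterion predicates become 'other'."""
    if isinstance(value, str) or pred not in CRITERIA:
        return "other"
    thresholds, labels = CRITERIA[pred]
    return labels[bisect_right(thresholds, value)]

def contract(triples):
    """One level of contraction; returns (contraction_table, contracted_triples)."""
    resources = {s for s, _, _ in triples}
    # Signature: contracted literals reached through criterion predicates.
    signature = {r: set() for r in resources}
    for s, p, o in triples:
        if p in CRITERIA and o not in resources:
            signature[s].add((p, contract_literal(p, o)))
    next_id = 1 + sum(len(labels) for _, labels in CRITERIA.values())
    contraction_table, ids = {}, {}
    for r in sorted(resources):
        key = frozenset(signature[r])
        if key not in ids:  # a new resource set gets a fresh contracted value
            ids[key] = f"c{next_id}"
            next_id += 1
        contraction_table[r] = ids[key]
    # Apply resource conversion and literal conversion to every triple.
    contracted = {
        (contraction_table[s], p,
         contraction_table[o] if o in resources else contract_literal(p, o))
        for s, p, o in triples
    }
    return contraction_table, contracted

# Hypothetical input, not the data of FIG. 2.
triples = [
    ("A", "rank", 1), ("A", "degree", 3), ("A", "name", "Alpha"), ("A", "friend", "B"),
    ("B", "rank", 7), ("B", "degree", 1),
    ("C", "rank", 1), ("C", "degree", 1),
    ("D", "rank", 3), ("D", "degree", 1), ("D", "friend", "A"),
]
table, contracted = contract(triples)
```

With this input, B and D share the contracted value `c6`, and the 11 original triples collapse to 9 contracted triples.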

The computer system 100 can use the contracted RDF data 116 to search for data in the original RDF data 115.

For example, suppose the computer system 100 receives the query 901 shown in FIG. 9 (hereinafter, the original query) from a query issuer. The query issuer may be an application program (not shown) executed in the computer system 100, or a computer (not shown) outside the computer system 100. The programs 104 to 107 described above may be part of a database management system. The original query 901 asks for resources whose object for the predicate "rank" is 4 or more and whose object for the predicate "degree" is 1.

First, the query processing program 105 converts the original query 901 into a contracted query 902, that is, a query in which the condition values of the original query 901 are replaced with contracted values.

Specifically, the query processing program 105 uses the contraction criterion table 101 to convert the literals connected to the predicates "rank" and "degree" into contracted values. As a result, "4 or more" for "rank" becomes the contracted value "c2", and "1" for "degree" becomes the contracted value "c3". If the query specifies a concrete value for a resource, the query processing program 105 also converts that value using the contraction table 102. These conversions produce the contracted query 902.

The query processing program 105 then pattern-matches the contracted query 902 against the contracted RDF data 116. This finds the contracted RDF data part consisting of the resource "c6" and the literals "c2" and "c3". Next, using the contraction table 102 and the original query 901, the query processing program 105 maps the contracted values back to original values: "c6" back to the resource "B" or "D", "c2" back to "4 or more", and "c3" back to "1". With this information (the original query 901 with "B" or "D" substituted as the resource value), it searches the original RDF data 115. That is, it looks in the original RDF data 115 for data whose resource is "B" with "rank" 4 or more and "degree" 1, and for data whose resource is "D" with "rank" 4 or more and "degree" 1. The search finds the data whose resource is "B", with "rank" 7 and "degree" 1.
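The four query steps (conversion, pattern matching, expansion back to candidate resources, and the final check against the original data) can be sketched in Python. In this minimal sketch, a query condition is represented as a closed range `(predicate, lo, hi)`, and a range condition maps to every contracted value whose range overlaps it. All tables and data below are hypothetical and only internally consistent; they are not the contents of FIGS. 2, 8, or 9.

```python
import math

# Hypothetical, mutually consistent state (not the data of FIGS. 2 and 8).
CRITERIA = {"rank": ([2], ["c1", "c2"]), "degree": ([2], ["c3", "c4"])}
CONTRACTION_TABLE = {"A": "c5", "B": "c6", "C": "c7", "D": "c6"}
CONTRACTED = {
    ("c5", "rank", "c1"), ("c5", "degree", "c4"),
    ("c6", "rank", "c2"), ("c6", "degree", "c3"),
    ("c7", "rank", "c1"), ("c7", "degree", "c3"),
}
ORIGINAL = [
    ("A", "rank", 1), ("A", "degree", 3),
    ("B", "rank", 7), ("B", "degree", 1),
    ("C", "rank", 1), ("C", "degree", 1),
    ("D", "rank", 3), ("D", "degree", 1),
]

def labels_for_range(pred, lo, hi):
    """Contracted values whose range [L, U) overlaps the query range [lo, hi]."""
    thresholds, labels = CRITERIA[pred]
    bounds = [-math.inf, *thresholds, math.inf]
    return {labels[k] for k in range(len(labels))
            if bounds[k] <= hi and lo < bounds[k + 1]}

def search(conditions):
    """conditions: list of (predicate, lo, hi), meaning lo <= object <= hi."""
    # 1. Convert the original query into a contracted query.
    wanted = {p: labels_for_range(p, lo, hi) for p, lo, hi in conditions}
    # 2. Pattern-match the contracted query against the contracted RDF data.
    hits = {s for s, _, _ in CONTRACTED
            if all(any((s, p, c) in CONTRACTED for c in cs)
                   for p, cs in wanted.items())}
    # 3. Map contracted resources back to candidate original resources.
    candidates = sorted(r for r, c in CONTRACTION_TABLE.items() if c in hits)
    # 4. Evaluate the original query only for those candidates.
    def matches(r):
        return all(any(s == r and p == pred and lo <= o <= hi
                       for s, p, o in ORIGINAL)
                   for pred, lo, hi in conditions)
    return candidates, [r for r in candidates if matches(r)]

print(search([("rank", 4, math.inf), ("degree", 1, 1)]))  # (['B', 'D'], ['B'])
```

As in the walkthrough, the contracted data narrows the candidates to "B" and "D", and only "B" survives the check against the original data.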

Using the contracted RDF data 116 in this way narrows a large number of candidate resources down to "B" and "D". Because resources other than "B" and "D" need not be searched, the search time is shortened.

In this embodiment, the computer system 100 can update the contracted RDF data 116 and its meta information (the contraction criterion table 101 and the contraction table 102). This is described in detail below.

FIG. 10 is a flowchart of the difference update process.

The difference update program 106 receives a difference to the original RDF data 115 as input (step 1001). A difference is either the addition of unit RDF data to the original RDF data 115 or the deletion of unit RDF data from it. When a value in some unit RDF data is changed, two differences are applied: the old unit RDF data is deleted, and unit RDF data containing the new value is added.

The difference update program 106 locks the database (step 1002). Locking the database may mean, for example, that the query processing program 105 stops processing original queries. Without the lock, a query could run after the original RDF data has been updated but before the contracted RDF data has, and data present in the updated database would not be returned to the query issuer. Locking prevents this kind of error.

The difference update program 106 then updates the original RDF data 115 according to the difference input in step 1001 (step 1003). If the difference is an addition of unit RDF data, the difference update program 106 adds a record for that unit RDF data to the original RDF data 115. If the difference is a deletion, it removes the record for that unit RDF data from the original RDF data 115.

The difference update program 106 performs the contracted RDF update process (step 1004).

The difference update program 106 then unlocks the database (step 1005). From this point, the query processing program 105 can process original queries again.
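Steps 1001 to 1005 can be sketched in Python as a single critical section. In this minimal sketch, a `threading.Lock` stands in for the database lock, and the contracted RDF update (step 1004) is passed in as a callable. The `Database` class, the `difference_update` method, and the `record_additions` stand-in are hypothetical names. The sketch also assumes a value change is submitted as a delete followed by an add, so that both halves run under one lock.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Database:
    original: set = field(default_factory=set)    # original RDF data 115
    contracted: set = field(default_factory=set)  # contracted RDF data 116
    lock: threading.Lock = field(default_factory=threading.Lock)

    def query(self, fn):
        """Queries wait while a difference update holds the lock."""
        with self.lock:
            return fn(self.original, self.contracted)

    def difference_update(self, differences, update_contracted):
        """differences: list of ("add" | "delete", triple) pairs (step 1001).

        A value change is passed as [("delete", old), ("add", new)] so that
        both halves are applied under one lock.
        """
        with self.lock:                              # step 1002: lock
            for op, triple in differences:
                if op == "add":                      # step 1003: update the original data
                    self.original.add(triple)
                elif op == "delete":
                    self.original.discard(triple)
                else:
                    raise ValueError(f"unknown difference: {op}")
                update_contracted(self, op, triple)  # step 1004
        # Leaving the with-block releases the lock (step 1005).

def record_additions(db, op, triple):
    # Hypothetical stand-in for the contracted RDF update process:
    # additions are recorded, deletions leave the contracted data unchanged.
    if op == "add":
        db.contracted.add(("contracted", triple))

db = Database()
db.difference_update([("add", ("F", "rank", 3))], record_additions)
db.difference_update([("delete", ("F", "rank", 3)), ("add", ("F", "rank", 5))],
                     record_additions)
print(sorted(db.original))  # [('F', 'rank', 5)]
```

Using a `with` block guarantees the unlock in step 1005 even if the update raises an exception.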

Two alternatives to locking and unlocking the database are conceivable. In alternative 1, the difference update program 106 takes a snapshot of both the original RDF data 115 and the contracted RDF data 116. Original queries processed during the difference update process may then reference those snapshots. In alternative 2, until the contracted RDF update process (step 1004) for the difference input in step 1001 finishes, the difference update program 106 may have queries bypass contraction and fall back to searching the original RDF data 115 directly.
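Alternative 1 can be sketched in Python as a copy-on-write snapshot. Readers take an immutable (original, contracted) pair, and the updater builds new versions alongside and publishes both at once by replacing a single reference. The `SnapshotStore` class, its method names, and the sample values are hypothetical. The sketch assumes a single writer thread.

```python
class SnapshotStore:
    """Alternative 1 sketch: readers use an immutable (original, contracted) pair."""

    def __init__(self, original=(), contracted=()):
        self._snapshot = (frozenset(original), frozenset(contracted))

    def snapshot(self):
        # A query grabs one pair and keeps using it, even if an update
        # publishes a newer pair while the query is running.
        return self._snapshot

    def add(self, triple, contracted_triple):
        original, contracted = self._snapshot
        # Build the new versions off to the side, then publish both together
        # with one reference assignment (assumes a single writer).
        self._snapshot = (original | {triple}, contracted | {contracted_triple})

store = SnapshotStore()
before = store.snapshot()
store.add(("F", "rank", 5), ("c8", "rank", "c2"))  # hypothetical values
after = store.snapshot()
```

A query holding `before` keeps seeing a consistent pair without the new triple, while queries started later see `after`, where both sets already include the update.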

FIG. 11 is a flowchart of the contracted RDF update process (step 1004 in FIG. 10).

The difference update program 106 determines whether the difference (the input of step 1001 in FIG. 10) is a deletion (step 1101).

If the difference is an addition (step 1101: No), the difference update program 106 performs the contracted value determination process (step 1102). This converts the subject and the object of the added unit RDF data into contracted values and, as necessary, updates the contraction criterion table 101 or the contraction table 102. The difference update program 106 then connects the first contracted value (the contracted value of the subject) to the second contracted value (the contracted value of the object) with the same predicate as in the added unit RDF data. It adds to the contracted RDF data 116 a record for the unit contracted RDF data that combines the first contracted value, the second contracted value, and that predicate (step 1103). This ends the contracted RDF update process.

When the difference is a deletion (step 1101: Yes), the differential update program 106 ends the reduced RDF update process without executing steps 1102 and 1103. That is, for a deletion, neither the reduced RDF data 116 nor its meta information is updated. This is acceptable because the reduced RDF data 116 only needs to contain every connection relationship present in the original RDF data 115; it may contain more. Even if a reduced value for a node that no longer exists in the original RDF data 115 is identified from the reduced RDF data 116 and used, the only consequence is that no matching data is found in the original RDF data 115; no problem is caused for the query issuer. Because steps 1102 and 1103 are skipped for deletions, the reduced RDF update process completes quickly.

When a value in unit RDF data is updated, the difference consists of both a deletion and an addition, as described above. In this case, the processing for the deletion may be skipped and only the processing for the addition executed.
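The deletion shortcut and the treatment of value updates can be sketched as follows. `apply_difference`, `apply_value_update`, and the `reduce_node` mapping are hypothetical names; `reduce_node` stands in for the reduced value determination of step 1102 and, for simplicity, ignores the predicate-dependent handling of literals.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
ReducedTriple = Tuple[str, str, str]


def apply_difference(kind: str, triple: Triple,
                     reduce_node: Callable[[str], str],
                     reduced_rdf: Set[ReducedTriple]) -> None:
    """Reduced RDF update process (FIG. 11) for one difference."""
    if kind == "delete":
        # Step 1101: Yes. Steps 1102 and 1103 are skipped; nothing changes.
        return
    subject, predicate, obj = triple
    # Steps 1102 and 1103 for an addition.
    reduced_rdf.add((reduce_node(subject), predicate, reduce_node(obj)))


def apply_value_update(old: Triple, new: Triple,
                       reduce_node: Callable[[str], str],
                       reduced_rdf: Set[ReducedTriple]) -> None:
    """A value update is a deletion of `old` plus an addition of `new`."""
    apply_difference("delete", old, reduce_node, reduced_rdf)  # no effect
    apply_difference("add", new, reduce_node, reduced_rdf)
```

For example, with `groups = {"a": "R1", "b": "R2", "c": "R3"}`, updating `("a", "knows", "b")` to `("a", "knows", "c")` adds `("R1", "knows", "R3")` and keeps the stale `("R1", "knows", "R2")`. The stale record only makes the reduced graph over-approximate the original data: a query narrowed through it finds no matching triple, which is the harmless outcome described above.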

FIG. 12 is a flowchart of the reduced value determination process (step 1102 in FIG. 11). This process is performed for each of the subject and the object of the added unit RDF data, so the process of FIG. 12 is performed twice per unit RDF data.

The differential update program 106 determines whether the object in the added unit RDF data is a literal (step 1201).

When the object is a literal (step 1201: Yes), the differential update program 106 determines, based on the reduction criterion table 101, whether the predicate in the added unit RDF data has already been reduced (step 1202). If a reference predicate identical to the predicate in the added unit RDF data is registered in the reduction criterion table 101, the determination in step 1202 is true (already reduced).

When the determination in step 1202 is false (step 1202: No), the differential update program 106 adds to the reduction criterion table 101 a record whose reference predicate is the predicate in the added unit RDF data (step 1203). The added record registers the reduced value that the differential update program 106 assigns to that predicate, and a reduction range that uses the object (value) in the added unit RDF data as its threshold. FIG. 13 shows an example of the result of step 1203, that is, a record added to the reduction criterion table 101 (the record in the broken-line frame). This example corresponds to the addition illustrated in FIG. 5 (the unit RDF data in the broken-line frame). After step 1203, the differential update program 106 determines, from the reduction criterion table 101, the reduced value corresponding to the object in the added unit RDF data (step 1204). In this way, the differential update program 106 converts the subject and the object in the added unit RDF data into reduced values.

When the determination in step 1202 is true (step 1202: Yes), the differential update program 106 skips step 1203 and determines the reduced value for each of the subject and the object in the added unit RDF data (step 1204).
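The literal branch (steps 1202 to 1204) might look like the following sketch. The table layout is an assumption, since the text does not spell one out: for each reference predicate, the reduction criterion table 101 is modeled as a list of `(upper threshold, reduced value)` ranges sorted by threshold, each range covering values up to its threshold, with the last range also absorbing larger values. A newly registered predicate gets one range whose threshold is the object value. The `new_reduced_value` callable is a hypothetical generator of fresh reduced values.

```python
import bisect
from typing import Callable, Dict, List, Tuple

# Assumed layout of reduction criterion table 101:
# reference predicate -> [(upper threshold, reduced value), ...] sorted by threshold.
CriterionTable = Dict[str, List[Tuple[float, str]]]


def reduced_value_for_literal(
    table: CriterionTable,
    predicate: str,
    value: float,
    new_reduced_value: Callable[[], str],
) -> str:
    if predicate not in table:  # step 1202: No (predicate not yet reduced)
        # Step 1203: register the predicate; the object value becomes the threshold.
        table[predicate] = [(value, new_reduced_value())]
    ranges = table[predicate]
    thresholds = [threshold for threshold, _ in ranges]
    # Step 1204: pick the first range whose threshold is >= value ...
    i = bisect.bisect_left(thresholds, value)
    # ... or the last range, assumed open-ended, if value exceeds every threshold.
    i = min(i, len(ranges) - 1)
    return ranges[i][1]
```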

When the object is not a literal (step 1201: No), the differential update program 106 determines, based on the reduction table 102, whether the subject or the object in the added unit RDF data is a new resource (step 1212). If the subject (value) or the object (value) in the added unit RDF data is not registered as a resource in the reduction table 102, the determination in step 1212 is true (new resource).

When the determination in step 1212 is true (step 1212: Yes), the differential update program 106 adds to the reduction table 102 a record whose new resource is the subject or the object in the added unit RDF data (step 1213). The added record registers the reduced value that the differential update program 106 assigns to the new resource. FIG. 14 shows an example of the result of step 1213, that is, a record added to the reduction table 102 (the record in the broken-line frame). This example corresponds to the addition illustrated in FIG. 5 (the unit RDF data in the broken-line frame). When both the subject and the object in the added unit RDF data are new resources, two records, one for the subject and one for the object, are added to the reduction table 102. After step 1213, the differential update program 106 determines, from the reduction table 102, the reduced value corresponding to the subject or the object in the added unit RDF data (step 1214). In this way, the differential update program 106 converts the subject and the object in the added unit RDF data into reduced values.

When the determination in step 1212 is false (step 1212: No), the differential update program 106 skips step 1213 and determines the reduced value for each of the subject and the object in the added unit RDF data (step 1214).
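For a resource, steps 1212 to 1214 amount to a lookup with insert-on-miss. In this sketch the reduction table 102 is assumed to be a plain dictionary from resource to reduced value, and `new_reduced_value` is a hypothetical generator of fresh reduced values. Calling the function once for the subject and once for the object reproduces the case in which both are new and two records are added.

```python
from typing import Callable, Dict


def reduced_value_for_resource(
    reduction_table: Dict[str, str],
    resource: str,
    new_reduced_value: Callable[[], str],
) -> str:
    if resource not in reduction_table:  # step 1212: Yes (new resource)
        reduction_table[resource] = new_reduced_value()  # step 1213
    return reduction_table[resource]  # step 1214
```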

The reduced value determination process (step 1102 in FIG. 11) ends with step 1204 or 1214, after which step 1103 in FIG. 11 is performed. FIG. 15 shows an example of the result of step 1103, that is, a record added to the reduced RDF data 116 (the record in the broken-line frame). This example corresponds to the addition illustrated in FIGS. 5, 13, and 14 (the unit RDF data in the broken-line frame).

In this way, the reduced RDF data 116 and its meta information (the reduction criterion table 101 and the reduction table 102) are updated. As a result, even when the original RDF data 115 is updated, data can be retrieved from the updated original RDF data 115 using the reduced RDF data 116, without regenerating the reduced RDF data 116 from the updated original RDF data 115.

If the reduced RDF data 116 continues to be updated in this manner, search performance may degrade.

Specifically, when the reduced RDF data 116 is created, the literal value range (reduction range) is divided so that each subrange contains the same number of literals. For example, if there are 1000 literals whose values are the consecutive integers 1 to 1000, the literal value range may be divided into 1 to 250, 251 to 500, 501 to 750, and 751 to 1000. Each value range then contains exactly 250 literals, so when the search range is narrowed using the reduced RDF data 116, the number of elements (literals) in the narrowed-down range does not become skewed, and reasonable search performance can be maintained. In this embodiment, however, although the reduced RDF data 116 and its meta information are updated, the value ranges (reduction ranges) are not re-divided. As differences accumulate, the literal counts of the individual value ranges become increasingly skewed, the narrowing effect weakens, and search performance may therefore degrade.
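The equal-count division in the example, and the skew that builds up afterwards, can be reproduced with a short sketch. `equal_count_ranges` and `counts_per_range` are hypothetical helpers that assume at least `parts` literal values and no duplicate values straddling a range boundary; values falling outside every range are simply not counted.

```python
from typing import List, Sequence, Tuple


def equal_count_ranges(values: Sequence[int], parts: int) -> List[Tuple[int, int]]:
    """Split literal values into `parts` inclusive ranges with (near-)equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    ranges = []
    for k in range(parts):
        low = ordered[k * n // parts]
        high = ordered[(k + 1) * n // parts - 1]
        ranges.append((low, high))
    return ranges


def counts_per_range(values: Sequence[int], ranges: List[Tuple[int, int]]) -> List[int]:
    """Number of literal values falling into each (fixed) range."""
    return [sum(1 for v in values if low <= v <= high) for low, high in ranges]


values = list(range(1, 1001))
ranges = equal_count_ranges(values, 4)
print(ranges)                            # [(1, 250), (251, 500), (501, 750), (751, 1000)]
print(counts_per_range(values, ranges))  # [250, 250, 250, 250]

# Differences added later are not used to re-divide the ranges.
values += list(range(1, 251))
print(counts_per_range(values, ranges))  # [500, 250, 250, 250]
```

Without re-division, the first range ends up holding twice as many literals as each of the others, which is exactly the skew that weakens narrowing.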

Therefore, in this embodiment, the data organization program 107 performs the data organization process shown in FIG. 16. Specifically, when a reduction condition, which is a predetermined condition under which search performance is regarded as having degraded, is satisfied (step 1601: Yes), the data organization program 107 creates up-to-date reduced RDF data from the original RDF data 115 (step 1602). After step 1602, the data organization program 107 may delete the old reduced RDF data from the external storage device 114.
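The check-and-rebuild pass of FIG. 16 can be sketched as follows. `organize_once` is a hypothetical name; it treats the reduction condition as satisfied when any enabled sub-condition holds, which matches the "at least one of" wording in the claims, and leaves the actual regeneration to a caller-supplied `rebuild` callback.

```python
from typing import Callable, Iterable


def organize_once(conditions: Iterable[Callable[[], bool]],
                  rebuild: Callable[[], None]) -> bool:
    """One pass of the data organization process; returns True if it rebuilt."""
    # Step 1601: the reduction condition holds if any enabled sub-condition holds.
    if any(condition() for condition in conditions):
        rebuild()  # Step 1602: regenerate reduced RDF data from the original data.
        return True
    return False
```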

The reduction condition may be predetermined and fixed, may be changed manually by an administrator, or may be changed automatically (for example, based on history information about search processing that uses the reduced RDF data 116). For example, the data organization program 107 displays, on the output device (display device) 112, a reduction condition setting screen for entering the reduction condition.

FIG. 17 shows an example of the reduction condition setting screen.

The reduction condition setting screen is, for example, a GUI (Graphical User Interface) that has a tool (for example, check boxes) for specifying which of the five conditions are adopted as the reduction condition, and an input field for the threshold associated with each condition. The data organization program 107 stores, in the storage resource 103, reduction condition information representing the reduction condition entered through this screen.

As described above, the reduction condition consists of at least one of the following five conditions.

(1) Frequency

The frequency is the time elapsed since step 1602 (generation of new reduced graph data) was last performed. In other words, "frequency" as a reduction condition means that step 1602 is performed periodically. The threshold for frequency is the execution cycle of step 1602 (for example, "every Saturday"), which can be entered in the threshold input field for frequency. The data organization program 107 generates new reduced graph data once per cycle set as the threshold. The data organization program 107 may also store, in the storage resource 103 (or the external storage device 114), history information including at least one of the execution times of step 1602, the time-series change in search performance (for example, the time from receiving an original query until returning the search result for that query to the query issuer), and the time-series change in the load on the processor 110, and may update the frequency threshold based on that history information. Because creating new reduced RDF data 116 involves inspecting all of the original RDF data 115, the data organization program 107 may additionally take the size of the original RDF data 115 into account when updating the frequency threshold.
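For a weekly cycle such as "every Saturday", the frequency check could be written as below. The helper names, the choice of midnight as the run time, and Python's weekday numbering (Monday = 0, Saturday = 5) are assumptions of this sketch.

```python
from datetime import datetime, timedelta

SATURDAY = 5  # datetime.weekday(): Monday is 0


def next_weekly_run(last_rebuild: datetime, weekday: int = SATURDAY) -> datetime:
    """First midnight on `weekday` strictly after the day of the last rebuild."""
    days_ahead = (weekday - last_rebuild.weekday()) % 7 or 7
    next_day = last_rebuild + timedelta(days=days_ahead)
    return next_day.replace(hour=0, minute=0, second=0, microsecond=0)


def frequency_condition(last_rebuild: datetime, now: datetime,
                        weekday: int = SATURDAY) -> bool:
    """True once the next scheduled weekly run time has been reached."""
    return now >= next_weekly_run(last_rebuild, weekday)
```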

(2) Time

The time is the time (for example, year, month, day, hour, and minute) at which step 1602 (generation of new reduced graph data) is performed. In other words, "time" as a reduction condition means that step 1602 is performed at a specified time. The threshold for time is the execution time of step 1602, which (for example, "2014/02/12/02:00") can be entered in the threshold input field for time. The data organization program 107 generates new reduced graph data when the current time reaches the time set as the threshold. The data organization program 107 may also store, in the storage resource 103 (or the external storage device 114), history information including at least one of the time at which step 1602 was last executed, the time-series change in search performance, and the time-series change in the load on the processor 110, and may determine the time threshold based on that history information. The data organization program 107 may additionally take the size of the original RDF data 115 into account when determining the time threshold.
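A sketch of the time condition follows. Reading "2014/02/12/02:00" as year/month/day/hour:minute is an assumption, as is the hypothetical `already_run` flag, which keeps the condition from firing again on every later check once the scheduled time has passed.

```python
from datetime import datetime


def parse_execution_time(text: str) -> datetime:
    """Parse a value such as "2014/02/12/02:00" (assumed year/month/day/hour:minute)."""
    return datetime.strptime(text, "%Y/%m/%d/%H:%M")


def time_condition(scheduled: datetime, now: datetime, already_run: bool) -> bool:
    """Trigger once when the current time reaches the scheduled time."""
    return not already_run and now >= scheduled
```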

(3) Differential capacity ratio

The differential capacity ratio is the ratio of the current size of the original graph data to the size of the original graph data at a certain point in the past. If the original graph data is not updated, the literal counts (element counts) of the value ranges (reduction ranges) do not become skewed, and re-reducing the original RDF data 115 while the literal counts are still balanced would be wasted processing. The differential capacity ratio is therefore adopted as one element of the reduction condition. The threshold for the differential capacity ratio is this ratio, which (for example, "200%") can be entered in the threshold input field for the differential capacity ratio. The data organization program 107 generates new reduced graph data when the ratio of the current size of the original graph data to its size at that past point reaches the ratio set as the threshold. The data organization program 107 may also store, in the storage resource 103 (or the external storage device 114), history information including the time-series change in the differential capacity ratio and the time-series change in search performance, and may update the threshold for the differential capacity ratio based on that history information. The data organization program 107 may additionally take the size of the original RDF data 115 into account when updating this threshold.
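The differential capacity ratio test is a single comparison. In this sketch the sizes are plain byte counts, and the function name is hypothetical; which past point serves as the baseline (for example, when the reduced data was last generated) is left to the caller, since the text only says "a certain point in the past".

```python
def capacity_ratio_condition(baseline_size: int, current_size: int,
                             threshold_percent: float) -> bool:
    """True when current_size / baseline_size has reached threshold_percent %."""
    if baseline_size <= 0:
        raise ValueError("baseline size must be positive")
    # Cross-multiplied to avoid division: current/baseline >= threshold/100.
    return current_size * 100 >= threshold_percent * baseline_size
```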

(4) Search performance

The threshold for search performance is the time described above as the measure of search performance (the time from receiving an original query until returning its search result), which (for example, "1sec") can be entered in the threshold input field for search performance. The data organization program 107 may monitor search performance. When search performance falls below the performance set as the threshold a predetermined number of times or more (for example, even once), that is, when a search takes longer than the threshold time, the data organization program 107 generates new reduced graph data.
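The search performance condition can be sketched as counting slow searches. Which response times are examined (for example, those observed since the last regeneration) is an assumption left to the caller, and the parameter names are hypothetical.

```python
from typing import Iterable


def search_performance_condition(response_times_sec: Iterable[float],
                                 threshold_sec: float = 1.0,
                                 min_violations: int = 1) -> bool:
    """True when at least `min_violations` searches took longer than the threshold."""
    violations = sum(1 for t in response_times_sec if t > threshold_sec)
    return violations >= min_violations
```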

(5) Bias

As described above, when the reduced RDF data 116 is created, each value range (reduction range) contains the same number of literals. The number of literals in each value range may be stored in the storage resource 103 by the initial reduction program 104 or the data organization program 107. Each time the original RDF data 115 is updated (data addition or deletion), the data organization program 107 or the differential update program 106 may identify the value range to which a literal in the added or deleted data belongs and update the literal count for that range. As the measure of bias in the literal counts (element counts), the threshold input field may accept, for example, the ratio of the minimum literal count to the maximum literal count. The data organization program 107 generates new reduced graph data when the bias among the literal counts of the value ranges reaches or exceeds the bias entered as the threshold (when the min-to-max ratio is used, a larger bias corresponds to a smaller ratio).
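Per-range literal counts and the min-to-max ratio test can be sketched as follows. `RangeLiteralCounts` and `bias_condition` are hypothetical names; ranges are assumed to be given by ascending inclusive upper bounds, with the last range open-ended.

```python
import bisect
from typing import List


class RangeLiteralCounts:
    """Hypothetical per-range literal counter for one reference predicate."""

    def __init__(self, upper_bounds: List[int], counts: List[int]) -> None:
        self.upper_bounds = upper_bounds  # inclusive upper bound of each range, ascending
        self.counts = counts

    def _range_index(self, value: int) -> int:
        i = bisect.bisect_left(self.upper_bounds, value)
        return min(i, len(self.upper_bounds) - 1)  # last range assumed open-ended

    def on_add(self, value: int) -> None:
        self.counts[self._range_index(value)] += 1

    def on_delete(self, value: int) -> None:
        self.counts[self._range_index(value)] -= 1

    def min_max_ratio(self) -> float:
        largest = max(self.counts)
        return min(self.counts) / largest if largest else 1.0


def bias_condition(counts: RangeLiteralCounts, threshold_ratio: float) -> bool:
    # A larger bias means a smaller min/max ratio, so "bias >= threshold bias"
    # becomes "ratio <= threshold ratio".
    return counts.min_max_ratio() <= threshold_ratio
```

Starting from the four balanced ranges of 250 literals each, adding the values 1 to 250 again gives counts `[500, 250, 250, 250]` and a ratio of 0.5, which trips a threshold of 0.5.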

Although one embodiment has been described above, it is an example for explaining the present invention and is not intended to limit the scope of the present invention to this embodiment. The present invention can also be implemented in various other forms. For example, link information (an index) from the reduced RDF data 116 may be added to each record of the original RDF data 115. The link information may include the reduced values corresponding to the subject and the object. In this case, the query processing program 105 can identify resources in the original RDF data 115 using a reduced value as the key.
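The link-information variant can be sketched as annotating each original record with the reduced values of its subject and object, then indexing by those values. The function names are hypothetical, all nodes are assumed to be resources (so `reduce_node` needs no predicate), and indexing only by the subject's reduced value is a choice of this sketch; an object-side index would be built the same way.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
LinkedTriple = Tuple[str, str, str, str, str]


def attach_links(original: List[Triple],
                 reduce_node: Callable[[str], str]) -> List[LinkedTriple]:
    """Append the reduced values of subject and object to each original record."""
    return [(s, p, o, reduce_node(s), reduce_node(o)) for s, p, o in original]


def index_by_subject_reduced(linked: List[LinkedTriple]) -> Dict[str, List[LinkedTriple]]:
    """Group original records by the reduced value of their subject."""
    index: Dict[str, List[LinkedTriple]] = defaultdict(list)
    for record in linked:
        index[record[3]].append(record)
    return index
```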

100: Computer system

Claims

1. A data update method used in a system that retrieves data from original graph data using reduced graph data, which is data obtained by reducing the original graph data, wherein
the original graph data is a set of unit data,
each unit data is a combination of a first node value, a second node value, and an edge from the first node value to the second node value,
the reduced graph data is a set of a plurality of unit reduced data,
each unit reduced data is a combination of a reduced value of the first node value, a reduced value of the second node value, and an edge from the reduced value of the first node value to the reduced value of the second node value, and
the data update method comprises:
when new unit data is added to the original graph data, identifying, from meta information of the reduced graph data, a first reduced value that is the reduced value corresponding to the first node value in the new unit data and a second reduced value that is the reduced value corresponding to the second node value in the new unit data; and
adding, to the reduced graph data, new unit reduced data that is a combination of the first reduced value, the second reduced value, and an edge from the first reduced value to the second reduced value, the edge being the same as the edge in the new unit data,
Data update method.

2. The meta information includes first and second sub meta information,
the first sub meta information is information representing, for each edge in the original graph data, the relationship between a reduced value and a range of terminal node values in the original graph data, and
the second sub meta information is information representing a plurality of reduced values respectively corresponding to a plurality of non-terminal node values included in the original graph data,
The data update method according to claim 1.

3. When the second node value in the new unit data is a terminal node value and the type of the edge in the new unit data is not registered in the first sub meta information, a first combination of the edge in the new unit data, a new reduced value, and a value range determined according to the second node value in the new unit data is added to the first sub meta information, and
the second reduced value is identified from the first sub meta information including the first combination,
The data update method according to claim 2.

4. When the second node value in the new unit data is a non-terminal node value and at least one of the first node value and the second node value in the new unit data is not registered in the second sub meta information, a second combination of a new reduced value and the value that is at least one of the first node value and the second node value in the new unit data and is not registered in the second sub meta information is added to the second sub meta information, and
the first reduced value is identified from the second sub meta information including the second combination,
The data update method according to claim 2.

5. Even if unit data is deleted from the original graph data, the reduced graph data is not updated,
The data update method according to claim 1.

6. Input of a predetermined condition under which search performance is regarded as having degraded is accepted, and
when the predetermined condition is satisfied, the reduced graph data is newly generated by reducing the original graph data,
The data update method according to claim 1.

7. The predetermined condition being satisfied means at least one of the following: (a) a predetermined time has elapsed since the reduced graph data was last newly generated, (b) the current time is a predetermined time, (c) the current size of the original graph data is at least a predetermined multiple of the size of the original graph data at a certain point in the past, (d) the performance of retrieving data from the original graph data has fallen below a predetermined level, and (e) the bias among the numbers of terminal nodes belonging respectively to a plurality of terminal node value ranges is equal to or greater than a predetermined bias,
The data update method according to claim 6.

8. A storage resource that stores meta information of reduced graph data, which is data obtained by reducing original graph data; and
a processor that is connected to the storage resource and retrieves data from the original graph data using the reduced graph data, wherein
the original graph data is a set of unit data,
each unit data is a combination of a first node value, a second node value, and an edge from the first node value to the second node value,
the reduced graph data is a set of a plurality of unit reduced data,
each unit reduced data is a combination of a reduced value of the first node value, a reduced value of the second node value, and an edge from the reduced value of the first node value to the reduced value of the second node value, and
when new unit data is added to the original graph data, the processor:
identifies, from the meta information, a first reduced value that is the reduced value corresponding to the first node value in the new unit data and a second reduced value that is the reduced value corresponding to the second node value in the new unit data; and
adds, to the reduced graph data, new unit reduced data that is a combination of the first reduced value, the second reduced value, and an edge from the first reduced value to the second reduced value, the edge being the same as the edge in the new unit data,
Computer system.

9. The meta information includes first and second sub meta information,
the first sub meta information is information representing, for each edge in the original graph data, the relationship between a reduced value and a range of terminal node values in the original graph data, and
the second sub meta information is information representing a plurality of reduced values respectively corresponding to a plurality of non-terminal node values included in the original graph data,
The computer system according to claim 8.

10. When the second node value in the new unit data is a terminal node value and the type of the edge in the new unit data is not registered in the first sub meta information, the processor adds, to the first sub meta information, a first combination of the edge in the new unit data, a new reduced value, and a value range determined according to the second node value in the new unit data, and
the second reduced value is identified from the first sub meta information including the first combination,
The computer system according to claim 9.

11. When the second node value in the new unit data is a non-terminal node value and at least one of the first node value and the second node value in the new unit data is not registered in the second sub meta information, the processor adds, to the second sub meta information, a second combination of a new reduced value and the value that is at least one of the first node value and the second node value in the new unit data and is not registered in the second sub meta information, and
the first reduced value is identified from the second sub meta information including the second combination,
The computer system according to claim 9.

12. Even if unit data is deleted from the original graph data, the processor does not update the reduced graph data,
The computer system according to claim 8.

13. The processor accepts input of a predetermined condition under which search performance is regarded as having degraded, and,
when the predetermined condition is satisfied, newly generates the reduced graph data by reducing the original graph data,
The computer system according to claim 8.

14. The predetermined condition being satisfied means at least one of the following: (a) a predetermined time has elapsed since the reduced graph data was last newly generated, (b) the current time is a predetermined time, (c) the current size of the original graph data is at least a predetermined multiple of the size of the original graph data at a certain point in the past, (d) the performance of retrieving data from the original graph data has fallen below a predetermined level, and (e) the bias among the numbers of terminal nodes belonging respectively to a plurality of terminal node value ranges is equal to or greater than a predetermined bias,
The computer system according to claim 13.