WO2017191877A1 - Compression device and method for managing provenance - Google Patents
Compression device and method for managing provenance
- Publication number
- WO2017191877A1 (PCT/KR2016/013271)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- pattern
- graph
- encoding
- rdf
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Definitions
- The present invention relates to a compression apparatus and method for managing provenance, and more particularly, to a compression apparatus and method for managing the provenance of RDF (Resource Description Framework) documents.
- RDF Resource Description Framework
- The Semantic Web, first established as a technical standard by the World Wide Web Consortium (W3C), is a framework in which information about resources in a distributed environment, together with their relationships and semantics, is expressed as ontologies in a machine-processable form so that automated machines can handle it.
- W3C World Wide Web Consortium
- RDF is a standard for expressing information about resources on the web. It supports common rules about the semantics, syntax, and structure of heterogeneous data.
- RDF is represented as a graph and consists of triples, each made up of a subject, a predicate, and an object.
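- The triple structure described above can be illustrated with a minimal sketch. The document names, the predicate names (`hasAuthor`, `cites`), and the `Triple` and `to_graph` helpers are hypothetical and are not taken from the patent; the sketch only shows how a list of subject-predicate-object triples can be read as a directed, labeled graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example data (not from the patent).
triples = [
    Triple("DocumentA", "hasAuthor", "Kim"),
    Triple("DocumentA", "cites", "DocumentB"),
    Triple("DocumentB", "hasAuthor", "Kim"),
]


def to_graph(triples):
    """Build an adjacency view: node -> list of (predicate, target node)."""
    graph = {}
    for t in triples:
        graph.setdefault(t.subject, []).append((t.predicate, t.obj))
        graph.setdefault(t.obj, [])  # make sure pure objects also appear as nodes
    return graph


graph = to_graph(triples)
```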
- LOD Linked Open Data
- The source information of RDF data is information about where the RDF data came from, who created it, and how it was changed.
- Provenance has emerged as metadata for managing the source information and the usage history of such RDF data.
- Provenance data is metadata representing the source information of the data and its history of use.
- The PROV model was proposed by the W3C.
- The PROV model consists of nodes, which are entities, activities, and agents, and of properties.
- An entity (referred to below as an object) represents an RDF document on the Semantic Web. Activities represent various operations, such as changing and deleting documents on the Semantic Web. Finally, an agent represents an individual or organization that performs an activity.
- These nodes are organically connected. Managing provenance data with the standard PROV model improves the compatibility of Semantic Web data and allows the data to be searched with a standard query language.
- Provenance data is organized as graphs in order to represent history information, and such graphs contain the same data repeatedly.
- A provenance compression technique based on the flow of provenance is therefore needed.
- Existing techniques manage redundant data by compressing the overlapping portions.
- However, no existing compression technique applies the standard provenance model. Because existing techniques compress general-purpose processing data, they are difficult to apply to provenance data composed of RDF.
- No compression scheme using the standard model has been proposed. Existing schemes manage the provenance data but not the original RDF document.
- Provenance data can be tens of times larger than the original data, so the provenance data on the Semantic Web amounts to a large volume of data.
- Provenance data is currently managed in a way suited to each management technique, but it needs to be managed using a standard model so that various users can use it.
- Existing provenance management techniques do not manage the original document separately and do not consider RDF data.
- Existing RDF data compression techniques do not consider the change history.
- An aspect of the present invention is to provide a compression scheme for efficiently managing a large amount of RDF provenance data.
- Another technical problem to be solved by the present invention is to reduce the storage capacity required for RDF provenance data.
- A compression apparatus for managing provenance includes: a provenance generation unit configured to receive history information and a final document and to generate data provenance using a provenance model; a pre-encoding unit, connected to the provenance generation unit, that pre-encodes the string data of the data provenance into numeric string data, stores the data in a pre-encoding table, and outputs numeric string data provenance; a final RDF compression unit that receives the numeric string data provenance, encodes the subject and object together into a numeric string, encodes only the predicates separately into a numeric string, stores them in a final RDF data encoding table, generates an encoding provenance graph using the data stored in the final RDF data encoding table, and stores a data pattern compression graph for the final document using the values stored in a graph pattern variable table; and a provenance pattern compression unit that is connected to the pre-encoding unit and receives the numeric string data provenance.
- The provenance model may include an object node, an agent node, an activity node, and a metadata node having information about a time and a source.
- Preferably, the pre-encoding unit encodes agent nodes, metadata nodes, and object nodes and stores the encoded values in a data table, encodes activity nodes and stores the encoded values in an activity table, and encodes attributes and stores the encoded values in a predicate table.
- A compression method for managing provenance includes: generating data provenance by using history information and a final document; pre-encoding the string data of the data provenance into numeric string data, storing the data in a pre-encoding table, and outputting numeric string data provenance; and receiving the numeric string data provenance, encoding the subject and object together into a numeric string, and encoding only the predicates separately into a numeric string.
- In the method, the provenance model may likewise include an object node, an agent node, an activity node, and a metadata node having information about a time and a source.
- This example uses an extended PROV model that extends the standard PROV model to represent the provenance data.
- The extended PROV model handles the final RDF document that is changed or added, which makes history tracking easier.
- Because this example, unlike the existing PROV model, manages the final RDF document, the original RDF document is compressed through original RDF compression so that the final RDF document does not occupy much storage space.
- The redundant portions around the activity nodes in the PROV model are compressed into subgraphs, so the data is stored in compressed form in consideration of its usage history.
- FIG. 1 is a block diagram of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating the operation of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 3 shows an example of the extended PROV model of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 4A is an example of data provenance generated according to a conventional PROV model.
- FIG. 4B is an example of data provenance generated according to the extended PROV model of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 5 is an example of data provenance input to the pre-encoding unit of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 6 is an example of numeric string data provenance generated by the pre-encoding operation of a compression apparatus for provenance management according to an embodiment of the present invention.
- FIG. 7 is an example of an encoding provenance graph generated according to an embodiment of the present invention.
- FIG. 8 is an example of a graph pattern extracted from a final RDF document in accordance with one embodiment of the present invention.
- FIGS. 9A and 9B are repetitive graph patterns extracted from the graph pattern of FIG. 8.
- FIG. 10 is a data pattern compression graph for the final RDF document in accordance with one embodiment of the present invention.
- FIG. 11 illustrates a process of extracting a subgraph from numeric string data provenance in accordance with an embodiment of the present invention.
- FIG. 13 is an example of a pattern-compressed provenance graph according to an embodiment of the present invention.
- 10: Provenance generation unit 20: Provenance compression unit
- A provenance data compression method based on the PROV model is used to compress the provenance data.
- The provenance data compression method extends the existing PROV model to represent RDF data.
- Since time is indicated in the extended PROV model, it can be confirmed when a change occurred. Therefore, by using the extended PROV model, the data is compressed in consideration of the history information. In addition, because the provenance data is represented over time, one can see who modified which documents and when.
- Since most of the provenance data consists of strings, it is converted into numeric data through dictionary encoding. Because the extended PROV model manages the original RDF documents to be changed, the original documents take up a lot of space, so original RDF compression reduces the size of each original RDF document.
- The provenance compression module extracts the patterns used in the same order based on the activity nodes of the PROV model and compresses the provenance data.
- The compression device for managing provenance includes a provenance generation unit 10 that receives history information and a final document, a provenance compression unit 20 connected to the provenance generation unit 10, and a storage unit 30 connected to the provenance compression unit 20.
- The provenance generation unit 10 generates the provenance of the relevant data, that is, of the final document (hereinafter, the 'provenance of the corresponding data' is referred to as 'data provenance'), by using the history information and the final document.
- The provenance compression unit 20 includes a pre-encoding unit 21 that encodes the generated data provenance from character strings into numeric strings; a final RDF compression unit 22 that is connected to the pre-encoding unit 21 and compresses the final document using the data provenance encoded as numeric strings (hereinafter referred to as 'numeric data provenance'); and a provenance pattern compression unit 23 that is connected to the pre-encoding unit 21 and extracts and compresses the history information from the numeric data provenance.
- The storage unit 30 is a storage medium that stores the data and information necessary for the operation of the provenance compression device, the data and information generated during the operation, and the like.
- Both the history information and the final document have an RDF data structure in RDF format, and the provenance is also expressed in RDF format.
- Documents in RDF format are represented as graphs, so RDF documents, such as final RDF documents or original RDF documents, and their provenance are also represented as graphs.
- The pre-encoding unit 21 converts the data of the data provenance, which is string data, into numeric data through a pre-encoding operation.
- The final RDF compression unit 22 compresses the final document on the Semantic Web (that is, the final document in RDF form, or final RDF document). Unlike the existing PROV model, this example manages the final document.
- The final document refers to a document on the Semantic Web.
- The final RDF compression unit 22 performs an encoding operation as the pre-encoding unit 21 does, but encodes the final document using a method different from the encoding method used in the pre-encoding unit 21.
- Specifically, the subject S and the object O are encoded together, while the predicate P is encoded separately.
- Identical patterns are then searched for, and the final document is compressed using the identical patterns found.
- Two patterns are regarded as the same if the predicate (P) is used in the same way, even if the subject (S) and the object (O) differ.
- The provenance pattern compression unit 23 extracts subgraphs based on the activities of the PROV model from the provenance graph. If an extracted subgraph occurs with a frequency at or above a predetermined value, the final graph is changed using the patterned information.
- The PROV model is a standard model proposed by the W3C to manage provenance data.
- Provenance data is not compatible when it is managed by different methods on the Semantic Web, whereas most Semantic Web data can be expressed with the standard PROV model.
- The provenance compression method using the PROV model is used to represent the flow of provenance.
- The PROV model represents a data flow as a model for managing the provenance data.
- The existing PROV model represents existing provenance data easily, but it is insufficient for representing RDF documents on the web (i.e., documents with an RDF data structure) because it has no nodes representing them. In addition, although provenance is created over time, the existing model does not record accurate information about when a change occurred.
- This example extends the existing PROV model by adding a part that represents metadata. Unlike the existing model, this reveals which parts of an RDF document on the Semantic Web were changed and when.
- The provenance generation unit 10 uses the extended PROV model shown in FIG. 3.
- The extended PROV model consists of the nodes already described (N11-N13), namely objects, activities, and agents, and of attributes (used, wasGeneratedBy, wasDerivedFrom, wasInformedBy, wasAttributedTo, actedOnBehalfOf, wasAssociatedWith, time, source), with a metadata node N14 added to the existing PROV model. As a result, the data provenance generated with the extended PROV model additionally represents information about when an RDF document was changed and which RDF document was modified.
- An agent is an individual or an organization and represents the person or organization that performed an activity.
- The metadata consists of a time and a source and identifies when the activity was executed and which RDF document was modified.
- An object represents an RDF document, and an activity represents what was done to that RDF document.
- The 'used' attribute connects an activity to an object in the graph and represents the object (N11) required for the execution of the activity.
- The 'wasGeneratedBy' attribute connects an object to an activity and indicates that the object resulted from the activity (N12).
- The 'wasDerivedFrom' attribute connects an object to the object from which it was derived.
- The 'wasInformedBy' attribute represents the exchange, between activities, of an object created by one activity.
- The 'wasAttributedTo' attribute represents an agent's influence on an object.
- The 'time' attribute connects an activity with the time in the metadata, so that it is known when the activity was performed.
- The 'source' attribute links an activity with the source in the metadata and refers to the RDF document on which the activity was performed.
- Table 1 describes the definitions of the elements used in the extended PROV model shown in FIG. 3.
- An object is a document having an RDF data structure, that is, an RDF document.
- An activity is one of four kinds: insert, delete, change, and versioning.
- Metadata is generated when an activity is actually run and represents the time of the activity or the document (i.e., the source) to be modified.
- Time means the time when the page was modified or added. Source means the changed content when a page is changed, and the newly added content when a new page is added.
- In the example of FIG. 4A, a new 'Document F' is created by inserting 'Document C' and 'Document D' into a document (not shown), and the generated 'Document F' was created by an individual named 'Jieun'.
- A new 'Document X' is then generated when an individual named 'Line Art' inserts certain content into 'Document F'.
- FIG. 4B illustrates data provenance generated using the extended PROV model according to the present example when 'Document F' and 'Document X' are generated through the same process as that of FIG. 4A.
- A new 'Document F' is generated by inserting 'Document C' and 'Document D' into a document (not shown), and the metadata for time (M11) and the metadata for source (M12) show that 'Document F' was created on September 02, 2015 by adding RDF data.
- Likewise, as the metadata (M21, M22) show, a new 'Document X' is created when the individual 'Line Art' adds the corresponding RDF data to the newly generated 'Document F' on September 03, 2015.
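- As a hedged illustration of the extended model, the creation of 'Document F' can be written as a small set of provenance triples. The function `insert_provenance`, the activity identifier `insert#1`, the source string, the date format, and the choice to attach `time` and `source` as edges leaving the activity node are all assumptions made for this sketch; the patent text only states that metadata nodes for time and source are added.

```python
def insert_provenance(activity, inputs, output, agent, time, source):
    """Return provenance triples for one insert activity (illustrative only)."""
    # The activity uses each input document.
    triples = [(activity, "used", doc) for doc in inputs]
    triples += [
        (output, "wasGeneratedBy", activity),   # the new document results from the activity
        (output, "wasAttributedTo", agent),     # the agent who created the document
        (activity, "wasAssociatedWith", agent),
        (activity, "time", time),               # metadata node: when the activity ran
        (activity, "source", source),           # metadata node: what RDF data was added
    ]
    return triples


prov_f = insert_provenance(
    activity="insert#1",                        # hypothetical activity identifier
    inputs=["Document C", "Document D"],
    output="Document F",
    agent="Jieun",
    time="2015-09-02",
    source="RDF data added to Document F",      # placeholder for the source metadata
)
```

Without the last two triples the structure corresponds to a conventional PROV description; the `time` and `source` edges are what make the change date and the changed content recoverable.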
- The provenance generation unit 10 thus additionally includes metadata nodes indicating a time and a source when generating the data provenance for the document (or data), and applies the generated data provenance to the provenance compression unit 20 (S10).
- Data provenance consists of string data that is tens of times larger than the original data.
- The pre-encoding unit 21 changes the string data of the data provenance into numeric data (S20).
- The pre-encoding unit 21 analyzes the input provenance data and encodes each node and branch.
- Because the number of activity nodes and attributes is smaller than the number of other nodes, and because provenance pattern compression is based on the activity nodes, the encoded values are stored in a total of three tables: a data table that stores the encoded values of agent nodes, metadata nodes, and object nodes; an activity table that stores the encoded values of activity nodes; and a predicate table that stores the encoded values of attributes.
- The data table, activity table, and predicate table may be stored in the storage unit 30 or in the pre-encoding unit 21.
- The input provenance data is analyzed and encoded through text encoding.
- Text encoding is divided into three encoding schemes.
- Text is encoded in input order.
- The text is analyzed to check whether the encoding table already contains encoded data for it. If it does not, the data for the nodes and attributes is encoded and stored in the predicate table, the activity table, or the data table, as appropriate.
- A new 'Document A' is generated by inserting an RDF document corresponding to the metadata M31 into an existing 'Document B'.
- The pre-encoding unit 21 searches the pre-encoding table stored in the storage unit 30.
- Table 2 is an example of a pre-encoding table stored in the storage unit 30. When document A is encoded, the pre-encoding table is first checked for the data, and a new ID is assigned if the data is not present. A new ID is obtained by adding 1 to the last ID in the corresponding table.
- Encoding character strings into numbers through text encoding in the pre-encoding unit 21 reduces the size of the provenance data.
- 'Document B' is encoded as 1 and '2015.09.01.' is encoded as 2.
- The ID of the input document A becomes 3, obtained by adding 1 to 2, the last ID of the data table.
- For the insert activity, the ID is assigned as 2 because the existing change activity has ID 1.
- Each item is stored in its corresponding table (that is, the data table, the activity table, or the predicate table) with an identification number (ID) that increases sequentially by 1.
- Information about object nodes, agent nodes, and metadata nodes is stored in the data table, information about activity nodes is stored in the activity table, and attribute information is stored in the predicate table.
- The pre-encoded data is reflected in the graph and in the encoding.
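- The pre-encoding described above amounts to dictionary encoding with three separate tables. The following is a minimal sketch under that reading; the class name `PreEncodingTable` and its `encode` method are hypothetical, and the tables are seeded only with the values mentioned in the text ('Document B', '2015.09.01.', and the change activity).

```python
class PreEncodingTable:
    """Three dictionaries (data, activity, predicate) mapping strings to integer IDs."""

    def __init__(self):
        self.tables = {"data": {}, "activity": {}, "predicate": {}}

    def encode(self, table_name, text):
        table = self.tables[table_name]
        if text not in table:             # check the table first
            table[text] = len(table) + 1  # IDs are sequential, so this is last ID + 1
        return table[text]


# Seed the tables with the state described in the text.
pre = PreEncodingTable()
pre.encode("data", "Document B")    # ID 1
pre.encode("data", "2015.09.01.")   # ID 2
pre.encode("activity", "change")    # ID 1

doc_a_id = pre.encode("data", "Document A")     # new entry in the data table
insert_id = pre.encode("activity", "insert")    # new entry in the activity table
repeat_id = pre.encode("data", "Document B")    # already present, ID is reused
```

Here `doc_a_id` is 3 and `insert_id` is 2, matching the example in the text, and encoding 'Document B' again returns its existing ID 1.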
- The data provenance generated by the extended PROV model also manages the RDF data to be changed.
- Since RDF data is composed of numerous triples, it takes up a lot of capacity. Accordingly, large RDF data is compressed before being stored so that it does not take up much storage space. RDF data also generally has fewer predicates than subjects and objects.
- RDF graphs that have the same predicate pattern are therefore turned into patterns based on the predicates in the final RDF data.
- The variables included in a pattern are managed in a variable table created in the storage unit 30, and each piece of final RDF data is converted into the created pattern and stored in compressed form.
- As shown in FIG. 2, the final RDF compression unit 22 performs a final RDF compression step S30 that includes an RDF encoding step S31, consisting of an RDF data analysis step S311 and a text encoding step S312, and an RDF pattern compression step S32, consisting of a pattern extraction step S321 and a final document pattern compression step S322.
- The final document (i.e., the final RDF document) is the document that the source in the metadata points to, which is a document on the Semantic Web.
- The string data is changed into numeric data through the RDF data analysis step S311.
- This conversion into numeric data is performed in a manner different from the encoding scheme of the pre-encoding unit 21.
- Whereas the pre-encoding unit 21 encodes the data sequentially in input order, the final RDF compression unit 22 encodes the subjects and the objects together in one numeric sequence and encodes the predicates separately in another numeric sequence.
- The encoded final RDF document is then compressed via RDF pattern compression.
- In RDF pattern compression, when the same predicates are used, they are compressed as a pattern and stored in the storage unit 30.
- The final RDF compression unit 22 searches for the corresponding encoding ID in the final RDF data encoding table stored in the storage unit 30. If the corresponding encoding ID does not exist in the final RDF data encoding table, encoding is performed by adding 1 to the last ID.
- In the encoding of the final RDF compression unit 22, subjects and objects are encoded together and only the predicates are encoded separately; the predicates are encoded in the order in which they are entered.
- [Table 3] shows an example of the final RDF data encoding table generated through the RDF data analysis step of the final RDF compression unit 22.
- The elements (A, B, G, C, O, X, P, J, Q, S, H, K, V) listed in the string column of the 'subject, object' part are words (i.e., nouns such as article or Kim Young-Chul) used as subjects or objects in the final RDF document, and the elements (D, F, G, Q, W, S) are verbs used as predicates in the final RDF document (e.g., submit, compose). These nouns and verbs are shown as letters of the alphabet for convenience.
- It can be seen that the final RDF document lists a total of 14 different subjects or objects (A, B, G, C, O, X, P, J, Q, S, H, K, V) and a total of eight verbs.
- An ID is assigned only once even when the data is repeated many times. For example, even though the predicates 'D' and 'F' are extracted repeatedly, the predicate 'D' is assigned the single ID 1 and the predicate 'F' is assigned the single ID 2.
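- The encoding rule above (one shared ID space for subjects and objects, a separate ID space for predicates, and one ID per distinct string) can be sketched as follows. The function `encode_rdf` and the four example triples are hypothetical; only the letters and the idea that 'D' receives ID 1 and 'F' receives ID 2 follow the text.

```python
def encode_rdf(triples):
    """Encode (subject, predicate, object) strings into integer IDs.

    Subjects and objects share one table; predicates use their own table.
    """
    node_ids, pred_ids = {}, {}
    encoded = []
    for s, p, o in triples:
        # setdefault assigns a new ID only the first time a string is seen.
        s_id = node_ids.setdefault(s, len(node_ids) + 1)
        o_id = node_ids.setdefault(o, len(node_ids) + 1)
        p_id = pred_ids.setdefault(p, len(pred_ids) + 1)
        encoded.append((s_id, p_id, o_id))
    return encoded, node_ids, pred_ids


# Hypothetical input in which predicates 'D' and 'F' each occur twice.
encoded, node_ids, pred_ids = encode_rdf([
    ("A", "D", "B"),
    ("B", "F", "G"),
    ("A", "D", "G"),
    ("G", "F", "A"),
])
```

The resulting integer triples are the edges of an encoded graph: node values come from the 'subject, object' table and edge labels from the 'predicate' table.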
- The final RDF compression unit 22 then proceeds to the text encoding step S312 and rebuilds the provenance graph using the encoded data, producing the encoding provenance graph.
- An example of an encoding provenance graph is shown in FIG. 7. In FIG. 7, the value of each node is the ID given in the 'subject, object' part, the direction of the arrow connecting two nodes is determined by which node is the subject and which is the object, and the number written above the arrow is the ID given in the 'predicate' part.
- RDF data is characterized by having fewer verbs than subjects and objects and by containing repeated patterns of verbs.
- The same pattern means that only the subject and object variables differ while the order of the verbs is the same.
- Such patterns are extracted by treating the subject and the object as variables.
- In the pattern extraction step S321, the final RDF compression unit 22 extracts the graph patterns that appear repeatedly in the encoding provenance graph and stores, in the pattern storage unit, the graph patterns whose number of repetitions is greater than or equal to a set number.
- FIG. 8 shows an example of graph patterns that can be extracted from the final RDF document.
- In this example, graph pattern 1 (pattern1), in which verb 1 and verb 2 are used, is repeated three times,
- and a second pattern, in which verb 4 and verb 5 are used, is repeated twice.
- Accordingly, the two repetitive graph patterns pattern1 and pattern2 are extracted as shown in FIGS. 9A and 9B, and the shape and the number of repetitions of the extracted graph patterns are as shown in Table 4.
- This table is stored in the storage unit 30.
- Table 5 is an example of a graph pattern variable table for graph pattern 1 (pattern1) shown in FIG. 9A.
- In the order in which graph pattern 1 is found, the information (that is, the subject or the object) entering node ?x is the information with identification numbers (IDs) 1, 3, and 9 (see Table 3); the information entering node ?y is the information with IDs 1, 2, and 3 (A, B, and G in Table 3); and the information entering node ?z is the information with IDs 2, 12, and 8 (B, H, and P in Table 3).
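- One possible reading of the pattern statistics and graph pattern variable tables is sketched below. It assumes, purely for illustration, that a pattern is a two-edge chain ?x -p1-> ?y -p2-> ?z identified by its predicate pair; the actual pattern shapes of FIGS. 9A and 9B are not reproduced here, and the function name `extract_chain_patterns`, the threshold parameter `min_count`, and the example triples are hypothetical.

```python
from collections import defaultdict


def extract_chain_patterns(encoded_triples, min_count=2):
    """Group two-edge chains by predicate pair and keep the frequent ones.

    Returns {(p1, p2): [(x, y, z), ...]} where each tuple is one binding of
    the variables ?x, ?y, ?z, listed in the order in which it was found.
    """
    out_edges = defaultdict(list)
    for s, p, o in encoded_triples:
        out_edges[s].append((p, o))

    bindings = defaultdict(list)          # plays the role of the variable table
    for x, p1, y in encoded_triples:
        for p2, z in out_edges.get(y, []):
            bindings[(p1, p2)].append((x, y, z))

    # The number of bindings per pattern is the pattern statistic.
    return {pat: rows for pat, rows in bindings.items() if len(rows) >= min_count}


# Hypothetical encoded graph: predicates 1 then 2 occur as a chain three times.
encoded_triples = [
    (1, 1, 2), (2, 2, 3),
    (4, 1, 5), (5, 2, 6),
    (7, 1, 8), (8, 2, 9),
    (3, 4, 7),
]
patterns = extract_chain_patterns(encoded_triples, min_count=3)
```

Each retained pattern then needs to be stored only once, with the rows of its variable table recording which subject and object IDs fill the variables in each occurrence.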
- The final RDF compression unit 22 proceeds to the final document pattern compression step S322 and generates a data pattern compression graph for the final RDF document by using the extracted repeated graph patterns pattern1 and pattern2 (see FIG. 10).
- The provenance for the final RDF document is compressed and stored as a data pattern compression graph.
- The final RDF compression unit 22 compresses and stores the graph of the final RDF document by storing the nodes changed on the basis of the extracted repeating graph patterns (see FIG. 10).
- The name of each graph pattern is determined based on the table shown in [Table 5], with names assigned in order.
- The patterns by which provenance data is processed are often repeated.
- Document usage shows similar or identical patterns across various documents, such as creating a document and then changing the parts that users need. The provenance pattern compression unit 23 of the present example therefore extracts these repeated usage patterns and uses them to compress and store the data.
- The compression operation of the provenance pattern compression unit 23 is substantially the same as that of the final RDF compression unit 22 except for the object being processed, but the compression rules differ.
- The final RDF compression unit 22 extracts identical patterns based on the predicates, whereas the provenance pattern compression unit 23 extracts identical patterns based on the activity nodes.
- The provenance pattern compression unit 23 receives the numeric string data provenance.
- A subgraph is generated based on the activities in the provenance (S41).
- The provenance pattern compression unit 23 stores the generated subgraphs in the subgraph statistics table of the storage unit 30 and extracts the subgraphs that occur repeatedly (S42).
- The provenance pattern compression unit 23 compares the number of occurrences of an extracted subgraph with the set number of times; if the number of occurrences is equal to or greater than the set number, the subgraph is treated as a reference pattern and is compressed and stored.
- FIG. 11 illustrates the process of extracting subgraphs from the numeric data provenance. As illustrated in FIG. 11, the subgraphs are generated based on the activity data.
- A pattern that has not been used recently (i.e., a pattern not used for a predetermined time)
- The statistical data related to the subgraphs is recorded in the subgraph statistics table stored in the storage unit 30, which has the form shown in [Table 6].
- The number of occurrences of each subgraph is managed in the subgraph statistics table.
- The number of times a subgraph appears is recorded in the subgraph statistics table. If that number is equal to or greater than the set number, the subgraph is compressed into a reference pattern and stored in the storage unit 30. The set number is specified as a limit value, and this value varies according to the data being processed. All subgraphs that can be extracted in FIG. 11 are counted in the subgraph statistics table.
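- The counting and thresholding described above can be sketched as follows. How a subgraph is delimited around an activity is not specified in this text, so the sketch assumes that the subgraph of an activity is summarized by its activity type together with the attributes on its incoming and outgoing edges; the functions `activity_signatures` and `reference_patterns`, the node labels, and the `limit` parameter are hypothetical. String labels are used for readability, although the input in the patent is the numeric data provenance.

```python
from collections import Counter, defaultdict


def activity_signatures(triples, activity_type):
    """Summarize the subgraph around each activity node.

    triples: (subject, predicate, object) edges of the provenance graph.
    activity_type: maps each activity node to its activity type.
    """
    edges = defaultdict(list)
    for s, p, o in triples:
        if s in activity_type:
            edges[s].append(("out", p))
        if o in activity_type:
            edges[o].append(("in", p))
    return {a: (activity_type[a], tuple(sorted(e))) for a, e in edges.items()}


def reference_patterns(signatures, limit):
    """Count identical subgraph signatures and keep those seen at least `limit` times."""
    stats = Counter(signatures.values())   # plays the role of the subgraph statistics table
    return {sig: n for sig, n in stats.items() if n >= limit}


# Hypothetical provenance with two insert activities and one change activity.
activity_type = {"a1": "insert", "a2": "insert", "a3": "change"}
triples = [
    ("a1", "used", "C"), ("F", "wasGeneratedBy", "a1"),
    ("a2", "used", "F"), ("X", "wasGeneratedBy", "a2"),
    ("a3", "used", "X"),
]
signatures = activity_signatures(triples, activity_type)
refs = reference_patterns(signatures, limit=2)
```

Only the insert subgraph, which occurs twice, reaches the limit of 2 and becomes a reference pattern; the single change subgraph stays uncompressed.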
- the provisional pattern compression unit 23 is stored after the pattern is compressed as shown in FIG. 13 (S43).
- Fig. 13 is a pattern compressed proof graph according to the present example.
- the repeated subgraph is stored as a reference pattern. It is generated as a reference pattern and is converted into string data and stored as shown in Table 7. The final result is stored as a node converted into a reference pattern to compress and store the graph of the provenance data.
- The first reference pattern 2 (reference pattern 2-1) is associated with document A, document P, and document V.
- The second reference pattern 2 (reference pattern 2-2) is associated with document K, document Y, and document F.
- The final RDF document itself is processed by the final RDF compression unit 22, and the history information of the final RDF document is processed by the provenance pattern compression unit 23, so that the management operations for the final RDF document and for the history information take place separately.
- The extended PROV model handles the final RDF document that is changed or added, which makes history tracking easier.
- Because this example, unlike the existing PROV model, manages the final RDF document, the original RDF document is compressed through the original RDF compression so that the final RDF document does not take up much storage space.
- The redundant portions of the data activity nodes in the PROV model are compressed into subgraphs, so that the data is stored in compressed form in consideration of its usage history.
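
The counting and thresholding steps described above (S42, S43) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the representation of a subgraph as a sorted tuple of (predicate, object) edges, the function names, the `RP` identifier scheme, and the sample activity data are all hypothetical. Only the idea of a per-subgraph occurrence count compared against a configurable set number comes from the description.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple, Union

# Hypothetical representation: a subgraph is the set of (predicate, object)
# edges hanging off one activity node, stored as a sorted tuple so that
# identical subgraphs compare equal and can be used as dictionary keys.
Edge = Tuple[str, str]
Subgraph = Tuple[Edge, ...]


def canonical(edges: Iterable[Edge]) -> Subgraph:
    """Return an order-independent, hashable form of a subgraph."""
    return tuple(sorted(edges))


def build_statistics(provenance: List[Tuple[str, Subgraph]]) -> Counter:
    """Play the role of the subgraph statistics table: occurrences per subgraph."""
    return Counter(subgraph for _, subgraph in provenance)


def select_reference_patterns(stats: Counter, threshold: int) -> Dict[Subgraph, str]:
    """Promote every subgraph seen at least `threshold` times to a reference pattern."""
    references: Dict[Subgraph, str] = {}
    for subgraph, occurrences in stats.items():
        if occurrences >= threshold:
            references[subgraph] = f"RP{len(references) + 1}"  # assumed ID scheme
    return references


def compress(
    provenance: List[Tuple[str, Subgraph]], threshold: int
) -> Tuple[List[Tuple[str, Union[str, Subgraph]]], Dict[Subgraph, str]]:
    """Replace each repeated subgraph by the ID of its reference pattern."""
    stats = build_statistics(provenance)
    references = select_reference_patterns(stats, threshold)
    compressed = [(doc, references.get(sg, sg)) for doc, sg in provenance]
    return compressed, references


# Illustrative data only: three documents share one activity subgraph.
edit = canonical([("used", "tool-1"), ("wasAssociatedWith", "agent-7")])
review = canonical([("wasAssociatedWith", "agent-9")])
provenance = [("A", edit), ("P", edit), ("V", edit), ("K", review)]

compressed, references = compress(provenance, threshold=3)
```

With a threshold of 3, the subgraph shared by documents A, P, and V is stored once as a reference pattern and each document keeps only a reference to it, while the subgraph that appears once for document K is left uncompressed. Raising or lowering `threshold` corresponds to changing the set number according to the data being processed.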
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Document Processing Apparatus (AREA)
Abstract
The present invention relates to a compression device for managing provenance, the device comprising: a provenance generation unit that generates data provenance by receiving history information and a final document and by using a provenance model; a pre-encoding unit that pre-encodes character-string data of the data provenance into numeric-string data, stores it in a pre-encoding table, and outputs numeric-string data provenance; a final RDF compression unit that receives the numeric-string data provenance, encodes a subject and an object together into a numeric string, encodes a predicate alone and separately into a numeric string, stores these in a final RDF data encoding table, generates an encoding provenance graph using the data stored in the final RDF data encoding table, extracts repeated graph patterns using the generated encoding provenance graph, stores the number of repetitions of the extracted graph patterns in a pattern statistics table, stores, in a graph pattern variable table, a subject or an object of each node of the extracted graph patterns in the order in which the extracted graph patterns were found, and generates a data pattern compression graph for the final document using the values stored in the graph pattern variable table; and a provenance pattern compression unit that receives the numeric-string data provenance, generates a subgraph having a repeating pattern with reference to activity data in the numeric-string data provenance, stores information on the number of occurrences of the subgraph having a repeating pattern in a subgraph statistics table, and, if the subgraph having a repeating pattern occurs a predefined number of times or more, determines that the subgraph having a repeating pattern is a reference pattern.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020160053651A KR101783791B1 (ko) | 2016-05-01 | 2016-05-01 | Compression apparatus and method for provenance management |
| KR10-2016-0053651 | 2016-05-01 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017191877A1 (fr) | 2017-11-09 |
Family
ID=60139054
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2016/013271 Ceased WO2017191877A1 (fr) | 2016-05-01 | 2016-11-17 | Compression device and method for managing provenance |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR101783791B1 (fr) |
| WO (1) | WO2017191877A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4199361A1 (fr) * | 2021-12-17 | 2023-06-21 | Dassault Systèmes | Compressed graph notation |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
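| CN110727683B (zh) * | 2019-09-30 | 2024-04-26 | 杭州久益机械股份有限公司 | Distributed compressor condition monitoring data encoding method and monitoring method |
| KR102597181B1 (ko) * | 2020-12-29 | 2023-11-02 | 케이웨어 (주) | Data management server for managing meta-information and control method thereof |
| KR102796537B1 (ko) * | 2021-12-28 | 2025-04-15 | 경희대학교 산학협력단 | Apparatus and method for compressing Resource Description Framework data |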
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080126399A1 (en) * | 2006-06-29 | 2008-05-29 | Macgregor Robert M | Method and apparatus for optimizing data while preserving provenance information for the data |
| US9053437B2 (en) * | 2008-11-06 | 2015-06-09 | International Business Machines Corporation | Extracting enterprise information through analysis of provenance data |
| US9069808B2 (en) * | 2009-05-20 | 2015-06-30 | International Business Machines Corporation | Indexing provenance data and evaluating provenance data queries in data processing systems |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8229775B2 (en) | 2008-11-06 | 2012-07-24 | International Business Machines Corporation | Processing of provenance data for automatic discovery of enterprise process information |
| US9058308B2 (en) | 2012-03-07 | 2015-06-16 | Infosys Limited | System and method for identifying text in legal documents for preparation of headnotes |
2016
- 2016-05-01: KR application KR1020160053651A, patent KR101783791B1 (ko), not active (Expired - Fee Related)
- 2016-11-17: WO application PCT/KR2016/013271, publication WO2017191877A1 (fr), not active (Ceased)
Non-Patent Citations (5)
| Title |
|---|
| BOK, KYUNG SOO ET AL.: "Provenance Compression Scheme Considering RDF Graph Patterns", THE JOURNAL OF THE KOREA CONTENTS ASSOCIATION, 1 February 2016 (2016-02-01), pages 374 - 386, XP055436479 * |
| HAN, JI EUN ET AL.: "An Efficient RDF Compression Scheme Considering Duplication of RDF Documents", PROCEEDINGS OF THE KOREAN INSTITUTE OF INFORMATION SCIENTISTS AND ENGINEERS CONFERENCE, 1 December 2015 (2015-12-01), pages 112 - 114, XP055436478 * |
| HAN, JI EUN ET AL.: "Efficient RDF Provenance Compression Scheme Considering Duplication", PROCEEDINGS OF THE KOREA CONTENTS ASSOCIATION CONFERENCE, 1 May 2015 (2015-05-01), pages 75 - 76, XP055436475 * |
| MCGLOTHLIN, JAMES P. ET AL.: "Efficient RDF Data Management Including Provenance and Uncertainty", PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM (IDEAS 10, 14 August 2010 (2010-08-14), XP058351748 * |
| ZHAO, JUN ET AL.: "Provenance Requirements for the Next Version of RDF", W3C WORKSHOP RDF NEXT STEPS, 2010, XP055436472 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4199361A1 (fr) * | 2021-12-17 | 2023-06-21 | Dassault Systèmes | Compressed graph notation |
| US12386897B2 (en) * | 2021-12-17 | 2025-08-12 | Dassault Systemes | Compressed graph notation |
Also Published As
| Publication number | Publication date |
|---|---|
| KR101783791B1 (ko) | 2017-10-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105868204B (zh) | | Method and apparatus for converting Oracle scripting language SQL |
| US6658377B1 (en) | | Method and system for text analysis based on the tagging, processing, and/or reformatting of the input text |
| WO2010087566A1 (fr) | | Document analysis system |
| WO2017191877A1 (fr) | | Compression device and method for managing provenance |
| WO2010120101A2 (fr) | | Keyword recommendation method using an inverse vector space model, and corresponding apparatus |
| WO2011122724A1 (fr) | | Code inspection execution system for performing code inspection on ABAP source code |
| WO2010050675A2 (fr) | | Method for automatically extracting relation triplets using a dependency grammar parse tree |
| CN108228701A (zh) | | System for implementing a Chinese near-natural-language query interface |
| JPH08255172A (ja) | | Document retrieval system |
| JP2011059935A (ja) | | Design check knowledge construction method and system |
| WO2012130145A1 (fr) | | Method and device for acquiring and searching for relevant knowledge information |
| WO2011162444A1 (fr) | | Named entity dictionary combined with an ontology schema, and device and method for renewing a named entity dictionary or a mining rule database using a mining rule |
| WO2022030670A1 (fr) | | Framework deep learning system and method using a query |
| WO2022191368A1 (fr) | | Data processing method and device for training a neural network that categorizes an intent in natural language |
| WO2013008978A1 (fr) | | System and method for searching object identification results |
| WO2011068315A4 (fr) | | Apparatus for selecting an optimal database using a maximum concept strength recognition technique, and associated method |
| JP7103763B2 (ja) | | Information processing system and information processing method |
| Lagerström et al. | | Extended influence diagram generation |
| WO2014092360A1 (fr) | | Method for evaluating patents based on complex factors |
| WO2025136017A1 (fr) | | Method for performing data analysis based on a natural-language query using generative AI, and electronic device for implementing same |
| WO2024029939A1 (fr) | | Method for building an ESG database containing structured ESG guide data using an ESG guide auxiliary tool, and ESG guide service providing system for implementing same |
| WO2017159906A1 (fr) | | Data structure for determining the translation order of words included in a source-language text, program for generating a data structure, and computer-readable medium storing same |
| WO2021060951A1 (fr) | | Creation management method based on a relationship between electronic documents, and creation management system |
| WO2024019225A1 (fr) | | Method for processing structured data and unstructured data in a plurality of different databases, and data processing platform providing said method |
| WO2024019224A1 (fr) | | Method for processing structured data and unstructured data in a database, and data processing platform for providing the method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16901103; Country of ref document: EP; Kind code of ref document: A1 |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 16901103; Country of ref document: EP; Kind code of ref document: A1 |