[go: up one dir, main page]

CN107016065A - It is customizable to rely on semantic effective origin filter method - Google Patents

It is customizable to rely on semantic effective origin filter method Download PDF

Info

Publication number
CN107016065A
CN107016065A CN201710158061.4A CN201710158061A CN107016065A CN 107016065 A CN107016065 A CN 107016065A CN 201710158061 A CN201710158061 A CN 201710158061A CN 107016065 A CN107016065 A CN 107016065A
Authority
CN
China
Prior art keywords
node
direct
origin
direct result
dependence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710158061.4A
Other languages
Chinese (zh)
Inventor
孙连山
侯涛
石丽波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology filed Critical Shaanxi University of Science and Technology
Priority to CN201710158061.4A priority Critical patent/CN107016065A/en
Publication of CN107016065A publication Critical patent/CN107016065A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明提出一种可定制依赖语义的高效用起源过滤方法,包含以下步骤:步骤1:检查待过滤的起源图,确保其为符合标准化起源模型及其模型约束的有向无环图;步骤2:声明起源过滤需求,即待过滤的起源元素、直接或间接起源依赖关系;步骤3:按照起源过滤机制中的过滤规则和修复规则处理待过滤的节点和节点对;步骤4:按照起源图整理规则整理并更新起源图,得到起源过滤视图;步骤5:按照起源过滤视图评估模型定量地评估起源过滤视图的效用,生成起源过滤报告。本发明的方法,能够选择性地过滤或保持起源元素之间的依赖语义,能够在实现过滤需求的同时有效地保持所发布起源数据的效用。

The present invention proposes a high-efficiency origin filtering method that can customize dependency semantics, including the following steps: Step 1: Check the origin graph to be filtered to ensure that it is a directed acyclic graph that conforms to the standardized origin model and its model constraints; Step 2 : Declare origin filtering requirements, that is, origin elements to be filtered, direct or indirect origin dependencies; Step 3: Process the nodes and node pairs to be filtered according to the filtering rules and repair rules in the origin filtering mechanism; Step 4: Organize according to the origin graph Organize and update the origin map according to the rules to obtain the origin filter view; Step 5: Quantitatively evaluate the effectiveness of the origin filter view according to the origin filter view evaluation model, and generate an origin filter report. The method of the invention can selectively filter or maintain the dependency semantics between origin elements, and can effectively maintain the utility of the published origin data while realizing the filtering requirement.

Description

可定制依赖语义的高效用起源过滤方法High-Utility Origin Filtering Method with Customizable Dependency Semantics

技术领域technical field

本发明属于数据分享技术领域,涉及实现起源数据安全分享的起源过滤技术领域,特别涉及一种可定制依赖语义的高效用起源过滤方法。The invention belongs to the technical field of data sharing, relates to the technical field of origin filtering for realizing safe sharing of origin data, and in particular to a high-efficiency origin filtering method with customizable dependency semantics.

背景技术Background technique

数据起源(Data Provenance,简称起源)也称为数据溯源、数据世系等,记录数据的来源、数据所经历的处理过程以及时间、地点、工具、方法、策略、组织和人员等相关信息,是描述数据演化历史的元数据。起源是评估数据真实性、可信性以及可重现性的重要基础。为实现跨组织的起源数据的发布和共享,W3C相继发布了起源数据模型标准OPM和PROV-DM,用带标注的有向无环图表示起源信息,如图1所示。起源图的核心结构包括实体、活动、代理等三类起源元素以及它们之间的多种依赖关系。Data Provenance (Data Provenance, referred to as origin) is also known as data traceability, data lineage, etc. It records the source of data, the processing process experienced by data, and related information such as time, place, tools, methods, strategies, organizations, and personnel. Metadata of data evolution history. Provenance is an important basis for assessing the authenticity, credibility and reproducibility of data. In order to realize the release and sharing of origin data across organizations, W3C successively released the origin data model standards OPM and PROV-DM, which represent origin information with a marked directed acyclic graph, as shown in Figure 1. The core structure of the origin graph includes three types of origin elements, such as entities, activities, and agents, and various dependencies among them.

起源数据记录描述数据演变历史,可能蕴含各种敏感信息。如公开法律文档起源信息,如其制定者信息及其成形之前有争议的草案内容等可能是敏感的。直接发布和共享原始的起源信息可能造成敏感信息泄露,并进而造成严重的后果。原始的起源还往往包含一些用户不需要的冗余细节信息。起源过滤(Provenance Sanitization)也称为起源校订、起源抽象,是一种改造起源图,在起源数据发布之前,选择性地过滤起源图中的敏感或冗余信息,得到安全、可用的起源过滤视图(sanitized provenance view)的新技术。Origin data records describe the history of data evolution and may contain various sensitive information. For example, disclosing the origin information of legal documents, such as the information of its framers and the controversial draft content before its formation, may be sensitive. Direct publishing and sharing of original provenance information may result in the disclosure of sensitive information with serious consequences. The original origin also often contains some redundant details that the user does not need. Provenance Sanitization, also known as Provenance Revision and Provenance Abstraction, is a transformation of the Provenance Graph, which selectively filters sensitive or redundant information in the Provenance Graph before the Provenance data is released to obtain a safe and usable Provenance Filtered View (sanitized provenance view) new technology.

现有的起源过滤技术能过滤起源图中具体的节点和边乃至子图,但仍存在如下不足。首先,现有的技术无法制起源依赖语义,不能灵活地按需增减起源图所蕴含的起源依赖语义,特别是起源间接依赖语义,过滤对象覆盖不全面。其次,现有的技术侧重于实现起源安全,在过滤敏感信息的时候,往往采用分组抽象的方式聚合相关节点,造成一部分本应公开的节点和相关的边被过度抽象和隐藏,导致脱敏后的起源过滤视图效用偏低。而起源效用是起源用户获得并利用起源满足其业务需要的程度。效用太低的数据起源将失去价值。Existing provenance filtering techniques can filter specific nodes, edges and even subgraphs in the provenance graph, but there are still the following deficiencies. First of all, the existing technology cannot control the origin-dependent semantics, and cannot flexibly increase or decrease the origin-dependent semantics contained in the origin graph as needed, especially the origin-indirect-dependent semantics, and the coverage of filtering objects is not comprehensive. Secondly, the existing technologies focus on the realization of origin security. When filtering sensitive information, they often use group abstraction to aggregate related nodes, causing some nodes and related edges that should be disclosed to be over-abstracted and hidden, resulting in desensitization. The origin filter view for is low. Origin utility, on the other hand, is the degree to which origin users acquire and use origin to meet their business needs. A data origin with too little utility loses its value.

发明内容Contents of the invention

为克服现有技术的缺点,本发明的目的在于提供一种可定制依赖语义的高效用起源过滤方法,以解决现有起源过滤技术中过滤对象覆盖不全面、脱敏后起源效用偏低等问题;能够选择性地过滤或保持起源元素之间的依赖语义,过滤对象不仅包括现有起源过滤技术可以处理的起源元素和直接依赖关系,还包括蕴含丰富语义的起源间接依赖关系;本发明的过滤机制包含过滤规则及与之相配套的修复规则,能在实现过滤需求的同时有效地保持起源数据的效用。In order to overcome the shortcomings of the prior art, the purpose of the present invention is to provide a high-efficiency origin filtering method that can be customized and depend on semantics, so as to solve the problems of incomplete coverage of filtering objects and low origin utility after desensitization in the existing origin filtering technology. ; Can selectively filter or maintain the dependency semantics between the source elements, the filtering object not only includes the source elements and direct dependencies that can be processed by the existing source filtering technology, but also includes the source indirect dependencies that contain rich semantics; the filter of the present invention The mechanism includes filtering rules and matching repair rules, which can effectively maintain the effectiveness of the original data while realizing the filtering requirements.

为了实现上述目的,本发明采用的技术方案是:In order to achieve the above object, the technical scheme adopted in the present invention is:

一种可定制依赖语义的高效用起源过滤方法,包括以下步骤:A high-efficiency origin filtering method with customizable dependency semantics, comprising the following steps:

步骤1:检查待过滤的起源图,确保其为符合标准化起源模型及其模型约束的有向无环图;Step 1: Check the origin graph to be filtered to ensure that it is a directed acyclic graph that conforms to the standardized origin model and its model constraints;

步骤2:声明起源过滤需求,即待过滤的起源元素、直接及间接起源依赖关系。其中,起源元素用起源图中的节点表示,直接或间接依赖关系用起源图中的节点对表示;Step 2: Declare origin filtering requirements, that is, origin elements to be filtered, direct and indirect origin dependencies. Among them, origin elements are represented by nodes in the origin graph, and direct or indirect dependencies are represented by node pairs in the origin graph;

步骤3:按照起源过滤机制中的过滤规则和修复规则处理待过滤的节点和节点对。过滤顺序为节点、表示直接依赖关系的节点对、表示间接依赖关系的节点对。依据过滤对象的类型及其直接前因节点和直接后果节点的类型组合选择并应用适当的过滤和修复操作;特别地,依据最小代价决策法选择适用于间接依赖关系的过滤和修复操作,代价包括过滤代价和修复代价。Step 3: Process the nodes and node pairs to be filtered according to the filtering rules and repairing rules in the origin filtering mechanism. The filtering order is nodes, pairs of nodes representing direct dependencies, pairs of nodes representing indirect dependencies. Select and apply appropriate filtering and repairing operations according to the type of the filtering object and the combination of the type of the direct cause node and the direct consequence node; in particular, the filtering and repairing operation suitable for indirect dependencies is selected according to the minimum cost decision method, and the cost includes Filter cost and repair cost.

步骤4:按照起源图整理规则整理并更新起源图,得到起源过滤视图;检查起源图中的代理类节点和活动类节点,若节点独立则删除之;检查起源图中的边,若边所依附的顶点之一已被删除,则删除该边;检查起源图中的边,若该边所表示的依赖关系可由其它边推理得到,则删除该边;更新起源图,得到最终的起源过滤视图。Step 4: Arrange and update the origin graph according to the origin graph sorting rules to obtain the origin filter view; check the proxy nodes and activity nodes in the origin graph, and delete them if they are independent; check the edges in the origin graph, if the edges are attached If one of the vertices has been deleted, delete the edge; check the edge in the origin graph, if the dependency represented by the edge can be deduced from other edges, delete the edge; update the origin graph to get the final origin filter view.

步骤5:按照起源过滤视图评估模型定量地评估起源过滤视图的效用,生成起源过滤报告。其中,起源效用是如下测量值的综合:(1)除待过滤节点外,过滤视图中的节点保留率;(2)除待过滤边之外,过滤视图中的边的保留率;(3)除待过滤连通路径节点对之外,过滤视图中的连通路径节点对保留率;(4)路径未变化的连通节点对数占过滤视图的中连通节点对总数的比率;起源过滤报告内容包括原始起源图、过滤需求、起源过滤视图、过滤需求的满足情况、多过滤的起源元素及依赖关系、为修复起源依赖语义而改变的可推理起源依赖关系以及起源过滤视图的效用值。Step 5: Quantitatively evaluate the utility of the origin filtering view according to the origin filtering view evaluation model, and generate an origin filtering report. Among them, the origin utility is the synthesis of the following measurements: (1) the retention rate of nodes in the filtered view except for the nodes to be filtered; (2) the retention rate of edges in the filtered view except for the edges to be filtered; (3) In addition to the connected path node pairs to be filtered, the retention rate of connected path node pairs in the filter view; (4) the ratio of the number of connected node pairs with unchanged paths to the total number of connected node pairs in the filter view; the source filter report includes the original Origin graphs, filtering requirements, origin filtering views, satisfaction of filtering requirements, multi-filtering origin elements and dependencies, reasonable origin dependencies changed to fix origin dependency semantics, and utility values for origin filtering views.

与现有技术相比,本发明的有益效果是:Compared with prior art, the beneficial effect of the present invention is:

1.解决了现有方法无法定制起源依赖语义的问题,实现了过滤对象全面覆盖,不仅支持按需过滤起源元素和直接依赖关系,还支持按需过滤或保持间接依赖关系;1. Solve the problem that the existing methods cannot customize the origin dependency semantics, realize the full coverage of filtering objects, not only support on-demand filtering of origin elements and direct dependencies, but also support on-demand filtering or maintaining indirect dependencies;

2.设计了起源效用保持机制,即根据起源依赖语义的固有性质,在起源过滤机制中设计了修复规则,在实现过滤需求的同时有效地保持起源效用,减少起源消费者对起源的使用障碍,方便其利用起源实现相关数据的真实性评估、可信性验证等业务目标。2. The origin utility maintenance mechanism is designed, that is, according to the inherent nature of the origin-dependent semantics, the restoration rules are designed in the origin filtering mechanism, which effectively maintains the origin utility while realizing the filtering requirements, and reduces the origin consumers' barriers to the use of the origin, It is convenient for it to use the source to achieve business goals such as authenticity assessment and credibility verification of relevant data.

3.设计了起源过滤视图评估模型,定量地评估起源过滤视图的效用,并生成起源过滤报告,为进一步利用起源制定各类业务决策提供科学的依据。3. Design the origin filter view evaluation model, quantitatively evaluate the effectiveness of the origin filter view, and generate the origin filter report, to provide a scientific basis for further making various business decisions by using the origin.

附图说明Description of drawings

图1为本发明参考的标准起源模型PROV-DM的核心结构图。Fig. 1 is a core structure diagram of the standard origin model PROV-DM referred to in the present invention.

图2为本发明的起源过滤方法流程图。Fig. 2 is a flow chart of the origin filtering method of the present invention.

图3为实体类节点过滤规则及修复规则(表1)的可视化示意图。Fig. 3 is a visual schematic diagram of filtering rules and repairing rules (Table 1) of entity class nodes.

图4为活动类节点过滤规则及修复规则(表2)的可视化示意图。Fig. 4 is a visual schematic diagram of filtering rules and repairing rules (Table 2) of activity class nodes.

图5为代理类节点过滤规则及修复规则(表3)的可视化示意图。Fig. 5 is a visual schematic diagram of filtering rules and repairing rules (Table 3) of proxy class nodes.

图6为过滤前的原起源图实例。Figure 6 is an example of the original source graph before filtering.

图7为在过滤需求R1的情况下,使用本发明的方法处理图6得到的起源过滤视图。FIG. 7 is an origin filtering view obtained by using the method of the present invention to process FIG. 6 in the case of filtering requirement R1.

图8为在过滤需求R1的情况下,使用其它方法处理图6得到的起源过滤视图。Fig. 8 is an origin filtering view obtained by processing Fig. 6 with other methods in the case of filtering requirement R 1 .

图9为在过滤需求R2的情况下,使用本发明的方法处理图6得到的起源过滤视图。Fig. 9 is an origin filtering view obtained by processing Fig. 6 with the method of the present invention in the case of filtering requirement R2.

具体实施方式detailed description

下面结合附图和具体实施例进一步阐明本发明,应理解这些实施例仅用于说明本发明而不用于限制本发明的范围,在阅读了本发明之后,本领域技术人员对本发明的各种等价形式的修改均落于本申请所附权利要求所限定的范围。Further illustrate the present invention below in conjunction with accompanying drawing and specific embodiment, should be understood that these embodiments are only for illustrating the present invention and are not intended to limit the scope of the present invention, after having read the present invention, those skilled in the art will understand various aspects of the present invention All modifications of the valence form fall within the scope defined by the appended claims of the present application.

本发明假定待过滤的起源图具有可追溯的依赖语义,即若任意节点v可以追溯到另一节点u,则u是v发生的部分起因。实施例参考标准起源模型PROV-DM,本发明方法所涉及的相关概念定义如下:The present invention assumes that the origin graph to be filtered has traceable dependency semantics, that is, if any node v can be traced back to another node u, then u is part of the cause of v. The embodiments refer to the standard origin model PROV-DM, and the related concepts involved in the method of the present invention are defined as follows:

定义1起源图ProvGraph=(V,E):Definition 1 origin graph ProvGraph=(V, E):

节点集V={v1,v2,…,vn},在起源图中,每个节点v表示一个起源元素,起源元素分为实体(EN),活动(AC)和代理(AG)等三种类型,分别表示数据演化过程中的中间形态制品、所经历的操作活动以及实施相关操作活动的人或组织;Node set V={v 1 , v 2 ,...,v n }, in the origin graph, each node v represents an origin element, and the origin elements are divided into entities (EN), activities (AC) and agents (AG), etc. Three types, representing the intermediate morphological artifacts in the process of data evolution, the operating activities experienced, and the people or organizations that implement related operating activities;

映射τ:V→{EN,AC,AG}将节点v映射为其类型,即对节点v∈V,其类型τ(v)∈{EN,AC,AG};Mapping τ: V → {EN, AC, AG} maps node v to its type, that is, for node v ∈ V, its type τ(v) ∈ {EN, AC, AG};

边集E={ei=<u,v>|u∈V;v∈V;i=1,2,…,m},在起源图中,边e=<u,v>是从v到u的有向弧,表示v到u的直接依赖关系,即v是u的直接后果,而u是v的直接起因。Edge set E={e i =<u, v>|u∈V;v∈V; i=1, 2,..., m}, in the origin graph, edge e=<u, v> is from v to The directed arc of u indicates the direct dependence of v to u, that is, v is the direct consequence of u, and u is the direct cause of v.

定义2起源元素子集:实体集(Entity)、活动集(Activity)和代理集(Agent):Definition 2 Subsets of origin elements: entity set (Entity), activity set (Activity) and agent set (Agent):

根据起源元素的类型可将起源元素集合V可分为三个互不相交的子集:实体集(Entity),活动集(Activity)和代理集(Agent),有V=Entity∪Activity∪Agent。According to the type of source elements, the source element set V can be divided into three mutually disjoint subsets: entity set (Entity), activity set (Activity) and agent set (Agent), there is V=Entity∪Activity∪Agent.

实体集:Entity={ent|ent(id,attributes),τ(ent)=EN},一个实体节点e描述某个数据对象过去某个时间段或时间点的状态,e具有唯一标识符(id)和一系列属性;Entity set: Entity={ent|ent(id, attributes), τ(ent)=EN}, an entity node e describes the state of a data object at a certain time period or point in the past, and e has a unique identifier (id ) and a series of attributes;

活动集:Activity={act|act(id,startTime,endTime,attributes),τ(act)=AC},一个活动节点a表示在一段时间内作用在实体节点上的过程或者会引起状态发生变化的一系列操作;Activity set: Activity={act|act(id, startTime, endTime, attributes), τ(act)=AC}, an activity node a represents the process that acts on the entity node within a period of time or will cause the state to change a series of operations;

代理集:Agent={ag|ag(id,attributes),τ(ag)=AG},一个代理节点ag表示为一个活动的发生、一个实体的存在或另一个代理的活动承担某种责任的代理人或组织。Agent set: Agent={ag|ag(id, attributes), τ(ag)=AG}, an agent node ag represents an agent that takes some responsibility for the occurrence of an activity, the existence of an entity, or the activity of another agent person or organization.

不同类型的起源元素之间的依赖关系具有不同的语义,可根据相互依赖的起源元素的类型,将起源图中的边划分为不同的子类。The dependencies between different types of origin elements have different semantics, and the edges in the origin graph can be divided into different subclasses according to the types of interdependent origin elements.

定义3直接依赖关系子集:Usage,Generation,Association,Attribution,Derivation,Communication,Delegation:Define 3 subsets of direct dependencies: Usage, Generation, Association, Attribution, Derivation, Communication, Delegation:

起源图中的边表示相邻两节点间的因果关系,边集E可分为七个互不相交的子集,E=Usage∪Generation∪Association∪Attribution∪Derivation∪Communication∪Delegation,其中:The edges in the origin graph represent the causal relationship between two adjacent nodes. The edge set E can be divided into seven mutually disjoint subsets, E=Usage∪Generation∪Association∪Attribution∪Derivation∪Communication∪Delegation, where:

Usage(使用依赖)={<ent,act>|ent∈Entity,act∈Activity,表示活动act使用实体ent};Usage (use dependency) = {<ent, act>|ent∈Entity, act∈Activity, which means that the activity act uses the entity ent};

Generation(产生依赖)={<act,ent>|act∈Activity,ent∈Entity,表示实体ent由活动act产生};Generation (generating dependency)={<act, ent>|act∈Activity, ent∈Entity, which means that the entity ent is generated by the activity act};

Association(关联依赖)={<ag,act>|ag∈Agent,act∈Activity,表示代理ag在活动act中承担责任};Association (association dependency) = {<ag, act>|ag∈Agent, act∈Activity, indicating that the agent ag takes responsibility in the activity act};

Attribution(归属依赖)={<ag,ent>|ag∈Agent,ent∈Entity,表示代理ag对实体ent承担责任};Attribution (attribution dependency) = {<ag, ent>|ag∈Agent, ent∈Entity, indicating that the agent ag is responsible for the entity ent};

Derivation(派生依赖)={<ent1,ent2>|ent1,ent2∈Entity,表示实体ent2由实体ent1派生而来};Derivation (derived dependency)={<ent 1 , ent 2 >|ent 1 , ent 2 ∈ Entity, which means that entity ent 2 is derived from entity ent 1 };

Communication(通信依赖)={<act1,act2>|act1,act2∈Activity,表示活动act2使用了由活动act1产生的实体};Communication (communication dependence) = {<act 1 , act 2 >|act 1 , act 2 ∈ Activity, indicating that activity act 2 uses the entity generated by activity act 1 };

Delegation(代理依赖)={<ag1,ag2>|ag1,ag2∈Agent,表示代理ag2代表代理ag1,称ag1为ag2的上级代理}。Delegation (agent dependency)={<ag 1 , ag 2 >|ag 1 , ag 2 ∈ Agent, means that agent ag 2 represents agent ag 1 , and ag 1 is called the superior agent of ag 2 }.

定义4起源间接依赖关系集PathSet:Definition 4 origin indirect dependency set PathSet:

连通路径P(u,v)={vi|v0=u;vk=v;<vi,vi+1>∈E;i=0,1,…,k-1;k>2},是起源图中的节点序列,表示节点v对u的间接依赖关系,即v是u的间接后果,u是v的间接起因;Connected path P(u, v)={v i |v 0 =u; v k =v; <v i , v i+1 >∈E;i=0,1,...,k-1;k>2 }, is the sequence of nodes in the origin graph, indicating the indirect dependency of node v on u, that is, v is the indirect consequence of u, and u is the indirect cause of v;

连通路径集PathSet(u,v)={P(u,v)|u∈V,v∈V},是节点对u,v之间所有连通路径构成的集合,表示节点v对节点u的间接依赖语义。The set of connected paths PathSet(u, v)={P(u, v)|u∈V, v∈V} is a set of all connected paths between the node pair u and v, representing the indirect connection between node v and node u Depends on semantics.

在起源图ProvGraph中,若ent1、ent2、ent3∈Entity,act1、act2、act3∈Activity,ag1、ag2∈Agent,则如下定理表示相关起源依赖关系的传递性和可推理性:In ProvGraph, if ent 1 , ent 2 , ent 3 ∈ Entity, act 1 , act 2 , act 3 ∈ Activity, ag 1 , ag 2 ∈ Agent, then the following theorem expresses the transitivity and reliability of related origin dependencies Reasonable:

定理1Attribution关系具有可推理性:Theorem 1 The Attribution relation is deducible:

(1)负责活动act1的代理ag1对该活动产生的实体ent1负责;即,若<ag1,act1>∈E且<act1,ent1>∈E,则<ag1,ent1>成立;(1) The agent ag 1 responsible for the activity act 1 is responsible for the entity ent 1 generated by the activity; that is, if <ag 1 ,act 1 >∈E and <act 1 ,ent 1 >∈E, then <ag 1 ,ent 1 >established;

(2)负责代理ag2的上级代理ag1对该代理负责的实体ent1负责;即,若<ag1,ag2>∈E且<ag2,ent1>∈E,则<ag1,ent1>成立。(2) The superior agent ag 1 responsible for the agent ag 2 is responsible for the entity ent 1 responsible for the agent; that is, if <ag 1 , ag 2 >∈E and <ag 2 , ent 1 >∈E, then <ag 1 , ent 1 > established.

定理2Association关系具有可推理性:Theorem 2 Association relation is deducible:

负责代理ag2的上级代理ag1对该代理负责的活动act1负责;即,若<ag1,ag2>∈E且<ag2,act1>∈E,则<ag1,act1>成立。The superior agent ag 1 responsible for the agent ag 2 is responsible for the activity act 1 that the agent is responsible for; that is, if <ag 1 , ag 2 >∈E and <ag 2 , act 1 >∈E, then <ag 1 , act 1 > established.

定理3Communication关系具有可推理性:Theorem 3Communication relation is deducible:

活动act1产生的某个实体ent1是活动act2开始并随后产生其它实体的必要条件,则表示这两个实体间存在通信关系;即,若<act1,ent1>∈E且<ent1,act2>∈E,则<act1,act2>成立。An entity ent 1 generated by activity act 1 is a necessary condition for the start of activity act 2 and subsequent generation of other entities, which means that there is a communication relationship between these two entities; that is, if <act 1 , ent 1 >∈E and <ent 1 , act 2 >∈E, then <act 1 , act 2 > holds.

定理4Communication关系具有传递性:Theorem 4Communication relationship is transitive:

活动act1与活动act2间存在通信关系,活动act2与活动act3间存在通信关系,则活动act3与活动act1间也存在通信关系;即,若<act1,act2>∈E且<act2,act3>∈E,则<act1,act3>成立。There is a communication relationship between act 1 and act 2 , and there is a communication relationship between act 2 and act 3 , so there is also a communication relationship between act 3 and act 1 ; that is, if <act 1 , act 2 >∈E And <act 2 , act 3 >∈E, then <act 1 , act 3 > holds.

定理5Delegation关系具有传递性:Theorem 5Delegation relationship is transitive:

代理ag2的上级代理是ag1,ag3的上级代理是ag2,则可以说ag1是ag3的上级代理;即,若<ag1,ag2>∈E且<ag2,ag3>∈E,则<ag1,ag3>成立。The superior agent of agent ag 2 is ag 1 , and the superior agent of ag 3 is ag 2 , then it can be said that ag 1 is the superior agent of ag 3 ; that is, if <ag 1 , ag 2 >∈E and <ag 2 , ag 3 >∈E, then <ag 1 , ag 3 > holds.

定理6Derivation关系具有传递性:Theorem 6Derivation relation is transitive:

若实体ent1由实体ent2派生而来,实体ent2由实体ent3派生而来,则从一定程度上可以认为ent1由ent3派生而来;即,若<ent3,ent2>∈E且<ent2,ent1>∈E,则<ent3,ent1>成立。If entity ent 1 is derived from entity ent 2 , and entity ent 2 is derived from entity ent 3 , then to a certain extent, ent 1 can be considered to be derived from ent 3 ; that is, if <ent 3 , ent 2 >∈ E and <ent 2 , ent 1 >∈E, then <ent 3 , ent 1 > holds.

为准确地表达过滤操作,下面定义起源过滤的基本操作。In order to express the filtering operation accurately, the basic operation of origin filtering is defined below.

定义5起源过滤基本操作:Definition 5 Basic operation of origin filtering:

(1)GetNumVex(ProvGraph):获取起源图ProvGraph节点的个数;(1) GetNumVex(ProvGraph): Get the number of ProvGraph nodes in the origin graph;

(2)GetNumVexPair(ProvGraph):获取起源图ProvGraph连通节点对的个数;(2) GetNumVexPair(ProvGraph): Get the number of connected node pairs in ProvGraph;

(3)GetPre(v):获取节点v的直接前因节点集合,若u∈GetPre(v),则存在e=<u,v>∈E,即u是v发生的直接起因之一;(3) GetPre(v): Get the direct antecedent node set of node v, if u∈GetPre(v), then there is e=<u, v>∈E, that is, u is one of the direct causes of v;

(3)GetSub(v):获取节点v的直接后果节点集合,若u∈GetSub(v),则存在边e=<v,u>∈E,即u是v的发生的直接后果之一;(3) GetSub(v): Get the direct consequence node set of node v, if u∈GetSub(v), then there is an edge e=<v, u>∈E, that is, u is one of the direct consequences of the occurrence of v;

(4)GetPreAct(v):获取节点v的最近前因活动节点集合,若u∈GetPreAct(v),则存在P(u,v)∈PathSet(u,v),u是距离v最近的直接或间接前因活动节点,即任意s∈P(u,v),若s≠u且s≠v,则τ(s)≠AC;(4) GetPreAct(v): Get the nearest antecedent activity node set of node v, if u∈GetPreAct(v), then there is P(u, v)∈PathSet(u, v), u is the closest direct path to v Or an indirect antecedent active node, that is, any s∈P(u, v), if s≠u and s≠v, then τ(s)≠AC;

(5)GetSubAct(v):获取节点v的最近后果活动节点集合,若u∈GetSubAct(v),则存在P(v,u)∈PathSet(u,v),u是距离v最近的直接或间接后果活动节点,即任意s∈P(v,u),若s≠u且s≠v,则τ(s)≠AC;(5) GetSubAct(v): Get the nearest consequence activity node set of node v, if u∈GetSubAct(v), then there is P(v,u)∈PathSet(u,v), u is the closest direct or Indirect consequence active node, that is, any s∈P(v,u), if s≠u and s≠v, then τ(s)≠AC;

(6)AddEdge(u,v):增加边e=<u,v>;(6) AddEdge(u, v): add edge e=<u, v>;

(7)DeleteVex(v):删除节点v;(7) DeleteVex(v): delete node v;

(8)DeleteEdge(u,v):删除边e=<u,v>。(8) DeleteEdge(u, v): delete edge e=<u, v>.

本发明的一种可定制依赖语义的高效用起源过滤方法,处理符合起源标准模型及其约束的有向无环起源图,根据包括待过滤节点和节点对的过滤需求,按照所提出的起源过滤机制中的过滤规则和修复规则处理起源图,最后采用所提出的起源过滤视图评估模型评估过滤视图的效用,生成起源过滤报告。A high-efficiency origin filtering method with customizable dependency semantics of the present invention processes directed acyclic origin graphs conforming to the origin standard model and its constraints, and filters according to the proposed origin according to the filtering requirements including nodes to be filtered and node pairs The filtering rules and repairing rules in the mechanism process the provenance graph, and finally the proposed provenance filtering view evaluation model is used to evaluate the effectiveness of the filtering view and generate the provenance filtering report.

步骤1:检查待过滤的起源数据,确保其为符合标准化起源模型及其模型约束的有向无环图。Step 1: Check the origin data to be filtered to ensure that it is a directed acyclic graph conforming to the standardized origin model and its model constraints.

模型约束1:起源中不存在单独的活动和代理;Model Constraint 1: There are no separate activities and agents in origin;

模型约束2:一个实体不能由两个以上的活动产生;Model Constraint 2: An entity cannot be produced by more than two activities;

模型约束3:当一个直接依赖关系可由其它直接依赖关系推理得到时,应在起源图中省略相应的边;Model constraint 3: When a direct dependency can be deduced from other direct dependencies, the corresponding edge should be omitted in the origin graph;

模型约束4:起源图中不能包含环,可利用拓扑排序算法检测起源图中是否有环。Model constraint 4: The origin graph cannot contain a cycle, and the topological sorting algorithm can be used to detect whether there is a cycle in the origin graph.

步骤2:声明起源过滤需求,即待过滤的起源元素、直接及间接起源依赖关系;其中,起源元素用起源图中的节点表示,直接或间接依赖关系用起源图中的节点对表示。Step 2: Declare origin filtering requirements, that is, origin elements to be filtered, direct and indirect origin dependencies; among them, origin elements are represented by nodes in the origin graph, and direct or indirect dependencies are represented by node pairs in the origin graph.

声明过滤需求时应注意:(1)声明起源元素时,需要说明元素的标识符或部分属性(attributes),当元素类型是活动时,还可以指出活动发生的起止时间(startTime和endTime);(2)声明起源依赖关系时,无论直接依赖关系还是间接依赖关系,均需指明该依赖关系所依附的两个起源元素的标识符或部分属性。When declaring filtering requirements, you should pay attention to: (1) When declaring the source element, you need to specify the identifier or some attributes of the element (attributes). When the element type is an activity, you can also indicate the start and end time of the activity (startTime and endTime); ( 2) When declaring the source dependency, regardless of the direct dependency or the indirect dependency, it is necessary to specify the identifiers or partial attributes of the two source elements to which the dependency is attached.

步骤3:按照如下过滤规则和修复规则处理待过滤的节点和节点对。Step 3: Process the nodes and node pairs to be filtered according to the following filtering rules and repairing rules.

本发明方法按照元素、直接依赖关系、间接依赖关系即节点、边、路径的顺序依次处理待过滤对象,具体流程如图2所示。The method of the present invention sequentially processes the objects to be filtered according to the order of elements, direct dependencies, and indirect dependencies, ie, nodes, edges, and paths. The specific process is shown in FIG. 2 .

首先,依次取待过滤的节点v,判断节点v及其直接前因节点和直接后果节点的类型,选取并应用适当的过滤和修复规则删除节点v以及与节点v相关的边,修复被破坏的起源依赖关系,更新起源图;节点类型不同,相应的过滤和修复规则也不同:当节点类型是实体(EN)时,规则如表1所示;当节点类型是活动(AC)时,规则如表2所示;当节点类型是代理(AG)时,规则如表3所示。其中,sv∈GetSub(v)表示v的直接后果节点,pv∈GetPre(v)表示v的直接前因节点,sav∈GetSubAct(v)表示v的最近后果活动节点,pav∈GetPreAct(v)表示v的最近前因活动节点,表格中“—”表示该情况不用或无法修复。其中,实体类过滤规则及修复规则的可视化示意图如图3所示,活动类节点过滤规则及修复规则的可视化示意图如图4所示,代理类节点过滤规则及修复规则的可视化示意图如图5所示。First, take the node v to be filtered in turn, judge the type of node v and its direct cause node and direct consequence node, select and apply appropriate filtering and repair rules to delete node v and the edges related to node v, and repair the damaged Origin dependencies, update the origin graph; different node types, corresponding filtering and repair rules are also different: when the node type is entity (EN), the rules are shown in Table 1; when the node type is activity (AC), the rules are as follows Table 2; when the node type is agent (AG), the rules are shown in Table 3. Among them, sv ∈ GetSub(v) represents the immediate consequence node of v, pv ∈ GetPre(v) represents the direct cause node of v, sav ∈ GetSubAct(v) represents the nearest consequence activity node of v, and pav ∈ GetPreAct(v) represents The most recent antecedent active node of v, "—" in the table indicates that the situation is not used or cannot be repaired. Among them, the visual schematic diagram of entity class filtering rules and repair rules is shown in Figure 3, the visual schematic diagram of activity class node filtering rules and repair rules is shown in Figure 4, and the visual schematic diagram of agent class node filtering rules and repair rules is shown in Figure 5 Show.

表1实体节点v的过滤及修复规则Table 1 Filtering and repairing rules of entity node v

表2活动节点v的过滤及修复规则Table 2 Filtering and repairing rules of active node v

表3代理节点v的过滤及修复规则Table 3 Filtering and repairing rules of proxy node v

然后,依次取待过滤的节点对(u,v),检查两点在当前起源图中是否存在且连通,若不存在或不连通则无需处理;若两点存在且连通判断u、v是否相邻,若不相邻则标记为待过滤连通路径节点对,否则删除u、v之间的边,并判断u、v两点的类型以及u的直接前因节点和v的直接后果节点类型,在依赖关系修复规则中选取适合的修复操作,修复被破坏的起源依赖关系,更新起源图。修复规则如表4和表5所示,其中,sv∈GetSub(v)表示v的直接后果节点,pu∈GetPre(u)表示u的直接前因节点,sav∈GetSubAct(v)表示v的最近后果活动节点,pau∈GetPreAct(u)表示u的最近前因活动节点,表格中“—”表示该情况不用修复或无法修复,“×”表示不存在该情况。Then, take the node pair (u, v) to be filtered in turn, check whether the two points exist and are connected in the current origin graph, if they do not exist or are not connected, no processing is required; if the two points exist and are connected, judge whether u and v If they are not adjacent, they are marked as connected path node pairs to be filtered, otherwise, delete the edge between u and v, and judge the types of u and v, as well as the direct cause node of u and the direct consequence node type of v, Select the appropriate repair operation in the dependency repair rules, repair the damaged origin dependencies, and update the origin graph. The restoration rules are shown in Table 4 and Table 5, where sv ∈ GetSub(v) represents the immediate consequence node of v, pu ∈ GetPre(u) represents the immediate cause node of u, and sav ∈ GetSubAct(v) represents the nearest Consequence activity node, pau∈GetPreAct(u) means u's recent antecedent activity node, "—" in the table means that the situation does not need to be repaired or cannot be repaired, and "×" means that the situation does not exist.

表4边(u,v)以及路径p(u,v)的修复规则Table 4 Repair rules for edge (u, v) and path p(u, v)

表5边(u,v)以及路径p(u,v)的修复规则Table 5 Repair rules for edge (u, v) and path p(u, v)

最后,依次取待过滤连通路径节点对(u,v),遍历起源图,依次获取u、v之间的连通路径p,并采用最小代价决策的方法过滤路径p,即通过代价函数分别计算删除路径p上每一条边的过滤代价和修复代价和,选取并删除总代价最小的边<s,t>;根据u、v两点的类型以及u的直接前因节点和v的直接后果节点的类型,选取并应用适当的修复规则,修复被破坏的起源间接依赖关系;若s的直接前因节点和t的直接后果节点不同时在路径p上,则还须根据s、t两点的类型以及s的直接前因节点和t的直接后果节点的类型,选择并应用适当的修复规则,修复被破坏的起源图。修复规则如表4和表5所示。Finally, take the node pair (u, v) of the connected path to be filtered in turn, traverse the origin graph, and obtain the connected path p between u and v in turn, and use the minimum cost decision method to filter the path p, that is, calculate and delete through the cost function The filter cost and repair cost sum of each edge on the path p, select and delete the edge <s, t> with the smallest total cost; according to the types of u and v and the direct cause node of u and the direct consequence node of v type, select and apply appropriate repair rules, and repair the damaged origin indirect dependencies; if the direct cause node of s and the direct consequence node of t are not on the path p at the same time, then it must also be based on the types of s and t As well as the type of the immediate cause node of s and the immediate consequence node of t, an appropriate repair rule is selected and applied to repair the broken origin graph. The repair rules are shown in Table 4 and Table 5.

最小代价决策方法中过滤代价为起源过滤视图连通节点对数减少率,修复代价预估为修复后路径变化的连通节点对数的比率。设原起源图中的连通节点对数为VP、过滤视图中的连通节点对数为VPS、过滤视图中路径变化的连通节点对数为VPSC,则过滤代价最小代价决策法过滤节点对(u,v)之间的连通路径的算法如下:In the minimum cost decision-making method, the filtering cost is the reduction rate of connected node logarithms in the origin filter view, and the restoration cost is estimated as the ratio of the connected node logarithms of path changes after repairing. Let the logarithm of connected nodes in the original source graph be VP, the logarithm of connected nodes in the filter view be VP S , and the logarithm of connected nodes whose paths change in the filter view be VP SC , then the filtering cost The algorithm of the minimum cost decision-making method to filter the connected path between the node pair (u, v) is as follows:

步骤4:按照如下整理规则整理并更新起源图,得到起源过滤视图。Step 4: Arrange and update the origin graph according to the following arrangement rules to obtain the origin filter view.

对得到的过滤视图进一步整理:(1)检查起源图中的代理节点和活动节点,若节点独立则删除之;(2)检查起源图中的边,若边所依附的顶点之一已被删除,则删除该边;(3)检查起源图中的边,若该边所表示的依赖关系可由其它边推理得到,则删除该边;(4)更新起源图,得到最终的起源过滤视图。Further organize the obtained filtered view: (1) check the proxy node and active node in the origin graph, and delete the node if it is independent; (2) check the edge in the origin graph, if one of the vertices attached to the edge has been deleted , then delete the edge; (3) check the edge in the origin graph, if the dependency represented by the edge can be deduced from other edges, delete the edge; (4) update the origin graph to obtain the final origin filter view.

步骤5:按照如下起源过滤视图评估模型定量地评估起源过滤视图的效用,生成起源过滤报告。Step 5: Quantitatively evaluate the effectiveness of the origin filtering view according to the following origin filtering view evaluation model, and generate an origin filtering report.

首先,对上一步骤中得到的起源过滤视图进行效用评估。其中,评估计算值包括:(1)除待过滤节点外,过滤视图中的节点保留率;(2)除待过滤边之外,过滤视图中的边的保留率;(3)除待过滤连通路径节点对之外,过滤视图中的连通路径节点对保留率;(4)路径未变化的连通节点对数占过滤视图的中连通节点对总数的比率;(5)按照各个起源图结构的重要性分别给(1)(2)(3)(4)赋权值ω1、ω2、ω3、ω4,其中ω1234=100%。First, utility evaluation is performed on the origin filtered view obtained in the previous step. Among them, the evaluation calculation value includes: (1) the retention rate of nodes in the filter view except for the nodes to be filtered; (2) the retention rate of edges in the filter view except for the edges to be filtered; In addition to path node pairs, the retention rate of connected path node pairs in the filter view; (4) the ratio of the number of connected node pairs with unchanged paths to the total number of connected node pairs in the filter view; (5) according to the importance of each origin graph structure assign weights ω 1 , ω 2 , ω 3 , ω 4 to (1)(2)(3)(4) respectively, where ω 1234 =100%.

设待过滤的节点数为VR、边数为ER、连通路径数为VPR,原起源图中的节点数为V、边数E、连通路径数为VP,过滤视图中的节点数为VS、边数为ES、连通路径数为VPS、路径变化的连通节点对数为VPSC,则起源过滤视图效用计算公式如下:Suppose the number of nodes to be filtered is V R , the number of edges is E R , the number of connected paths is VPR , the number of nodes in the original source graph is V, the number of edges E, and the number of connected paths are VP, and the number of nodes in the filter view is V S , the number of edges is E S , the number of connected paths is VPS , and the logarithm of connected nodes with path changes is VP SC , then the calculation formula of origin filter view utility is as follows:

实施例中起源原图符合标准化起源模型PROV-DM,如图6所示。声明两个不同安全需求:The origin original image in the embodiment conforms to the standardized origin model PROV-DM, as shown in FIG. 6 . Declare two different security requirements:

R1:待过滤的节点包括E2、Act3、Ag1,待过滤的节点对包括(E1,Act1);R 1 : the nodes to be filtered include E 2 , Act 3 , Ag 1 , and the node pairs to be filtered include (E 1 , Act 1 );

R2:待过滤的节点包括E2、Ag1,待过滤的节点对包括(E3,E5)。R 2 : the nodes to be filtered include E 2 and Ag 1 , and the node pairs to be filtered include (E 3 , E 5 ).

采用所提出的起源过滤机制对图6所示的起源图进行处理,实现需求R1得到的起源过滤视图如图7所示,实现需求R2得到的起源过滤视图如图9所示。作为实验效果对比,选用一个现有起源过滤方法对起源图6进行过滤,实现过滤需求R1,得到的起源过滤视图如图8所示。Using the proposed origin filtering mechanism to process the origin graph shown in Figure 6, the origin filtering view obtained by realizing requirement R 1 is shown in Figure 7, and the origin filtering view obtained by realizing requirement R 2 is shown in Figure 9. As a comparison of experimental results, an existing origin filtering method is selected to filter the origin graph 6 to realize the filtering requirement R 1 , and the obtained origin filtering view is shown in FIG. 8 .

利用起源过滤视图评估模型对实施例中各起源过滤视图进行效用量化评估,为了保证可比性,各起源视图均赋值ω1=ω2=ω3=30%,ω4=10%,需要指出的是,对ω1、ω2、ω3、ω4的其它赋值组合也落入本专利范围内。Use the origin filter view evaluation model to perform quantitative evaluation of the effectiveness of each origin filter view in the embodiment. In order to ensure comparability, each origin view is assigned a value of ω 1 = ω 2 = ω 3 = 30%, ω 4 = 10%. It needs to be pointed out Yes, other assignment combinations for ω 1 , ω 2 , ω 3 , and ω 4 also fall within the scope of this patent.

图8所示的起源过滤视图效用为65.78%;图7所示的起源过滤视图效用为81.04%,与图8所示的起源过滤视图相比,效用提高了15.26%,说明本发明的起源过滤方法与其它方法相比较,起源过滤视图的效用有明显的提升;图9所示的起源过滤视图效用为81.11%,说明本发明的起源过滤方法不仅可以处理复杂的起源依赖关系,也能保证较高的效用。The source filter view utility shown in Figure 8 is 65.78%; the source filter view utility shown in Figure 7 is 81.04%, compared with the source filter view shown in Figure 8, the utility has improved by 15.26%, illustrating the source filter of the present invention Compared with other methods, the effectiveness of the origin filtering view is significantly improved; the origin filtering view shown in Figure 9 has an effectiveness of 81.11%, which shows that the origin filtering method of the present invention can not only handle complex origin dependencies, but also ensure a relatively high utility.

最后,对起源过滤相关内容进行汇总,生成起源过滤报告,报告内容包括原始起源图、过滤需求、起源过滤视图、过滤需求的满足情况、多过滤的起源元素及依赖关系、为修复起源依赖语义而改变的可推理起源依赖关系、起源过滤视图的效用评估值。实施例中由图6得到图9所示的起源过滤视图的报告如表6所示。Finally, summarize the content related to origin filtering, and generate an origin filtering report, which includes the original origin map, filtering requirements, origin filtering view, satisfaction of filtering requirements, multi-filtering origin elements and dependencies, and repairing origin dependency semantics. Changed reasonable origin dependencies, utility evaluation values for origin filter views. Table 6 shows the report of the source filtering view shown in FIG. 9 obtained from FIG. 6 in the embodiment.

表6起源过滤报告摘要表Table 6 Origin Filtering Report Summary Table

Claims (9)

1. a kind of customizable effective origin filter method for relying on semanteme, it is characterised in that comprise the following steps:
Step 1:Check provenance graph to be filtered, it is ensured that it is to meet the directed acyclic for having standardized source model and its model constraint Figure;
Step 2:Statement origin filtration needs, i.e., to be filtered plays source element, the direct and dependence that originates from indirectly;Wherein, mistake The source element that rises to be filtered is represented with the node in provenance graph in filter demand, direct or indirect dependence to be filtered in filtration needs Node in relation provenance graph is to representing;
Step 3:To be filtered playing source element and rely on is handled according to the filtering rule and reparation rule in the strobe utility of origin to close System;
Step 4:Arranged according to the arrangement rule in the strobe utility of origin and update the result of step 3, obtain origin filtering View;
Step 5:The effectiveness of origin filtered view, generation origin filtering report are quantitatively assessed according to origin filtered view assessment models Accuse.
2. customizable according to claim 1 rely on semantic effective origin filter method, it is characterised in that the origin Figure has retrospective dependence semantic, i.e. if arbitrary node v can trace back to another node u, u is the part cause that v occurs In provenance graph, each node v represents that is played a source element, and according to the type for playing source element, origin element set is divided into three Individual mutually disjoint subset:In entity set, active set and agency's collection, the entity set, an entity node describes some data Object is gone in the state at some period or time point, the active set, and an active node represents to make within a period of time With the process on entity node or the sequence of operations that state can be caused to change, the agency concentrates, an agency Node is expressed as a movable generation, the presence or another activity acted on behalf of of an entity and undertakes the agent of certain responsibility Or tissue;
The relation that directly relies on is expressed as follows:
If v is u direct result, u is v direct cause, is designated as<U, v>, claim v to u's to directly rely on relation, or v direct Dependent on u, directly relying on relation includes such as Types Below:
(1) Usage (using relying on):Active node uses entity node;
(2) Generation (producing dependence), entity node is produced by active node;
(3) Association (association is relied on), agent node is undertaken the responsibility in active node;
(4) Attribution (ownership is relied on), agent node is undertaken the responsibility to entity node;
(5) Derivation (derive from and rely on), certain entity node is derived from by another entity node;
(6) Communication (communication is relied on), certain active node v communications depend on another active node u, v to use u institutes Some entity node produced;
(7) Delegation (agency relies on), certain agent node represents another agent node;
The relation that indirectly relies on is expressed as follows:
If v is u indirect consequence, u is v indirect cause, then node v indirectly relies on relation to u;
By following theorem represent related origin dependence transitivity and can be inferential:
(1) entity node one for being responsible for a pair of activities of agent node generation of active node one is responsible for, i.e. a pair of entity node The ownership dependence of agent node one has can be inferential;
(2) entity node one that a pair of the agencies of superior agency node for being responsible for agent node two are responsible for is responsible for, i.e. entity node The ownership dependence of a pair of agent nodes one has can be inferential;
(3) active node one that a pair of the agencies of superior agency node for being responsible for agent node two are responsible for is responsible for, i.e. active node The association dependence of a pair of agent nodes one has can be inferential;
(4) some entity node one that active node one is produced is that active node two starts and then produces necessity of other entities Condition, then the communication of active node two is dependent on the communication dependence of active node one, i.e. active node two to active node one With can be inferential;
(5) communication of active node one depends on active node two, and the communication of active node two depends on active node three, then active section The communication of point one has transitivity dependent on the communication dependence between active node three, i.e. active node;
(6) agent node three represents superior agency node two, and agent node two represents its higher level's agent node one, then agent node Three dependences of acting on behalf of that can be represented to a certain extent between superior agency node one, i.e. agent node have transitivity;
(7) if entity node one, which derives from, depends on entity node two, entity node two, which derives from, depends on entity node three, then entity Node one derives from has transitivity dependent on the derivation dependence between entity node three, i.e. entity node.
3. customizable according to claim 1 rely on semantic effective origin filter method, it is characterised in that the step 3, filtering order for represent the node of element, represent to directly rely on the node of relation to, represent to indirectly rely on the node pair of relation, According to filtering object type and its directly before because the type combination of node and direct result node is selected and is filtered and is repaiied Multiple operation.
4. customizable according to claim 1 rely on semantic effective origin filter method, it is characterised in that the step 3, handle rise source element and dependence to be filtered and include following sub-step:
Step 31:Take node v to be filtered successively, and according to the node and its directly before because of the class of node and direct result node Type, chooses and application filtering and reparation rule, deletion of node and the side related to node, and the destroyed origin of reparation, which is relied on, closes System, updates provenance graph;
Step 32:Take undressed node pair to be filtered<U, v>, if entering step 34 without if, otherwise check that u, 2 points of v are rising It whether there is and connect in the figure of source, without processing if being not present or not connecting;Determined whether if existing and connect if 2 points Whether u, v are adjacent, and step 33 is entered if the two is adjacent, otherwise labeled as communication path node pair to be filtered, step 32 is returned to;
Step 33:Delete the side between u, v, and type according to 2 points of u, v and u it is direct before because node and v it is direct after The type of fruit node, chooses and applies reparation rule, repairs destroyed origin dependence, updates provenance graph, return to step 32;
Step 34:Communication path node pair to be filtered is taken successively<U, v>, the communication path p between u, v is obtained successively, using most The method filtering p of small cost decision-making, i.e., calculated the filtering cost of each edge on the p of path and repair cost respectively by cost function With the minimum side of the total cost Cost values of deletion<S, t>;According to 2 points of u, v type and u it is direct before it is straight because of node and v The type of consequence node is connect, chooses and applies reparation rule, destroyed origin is repaired and indirectly relies on relation;If s it is direct before Because when node is different with t direct result node on the p of path, then must also be according to 2 points of s, t type and s direct cause The type of node and t direct result node, selects and applies reparation rule, repairs destroyed provenance graph.
5. customizable according to claim 4 rely on semantic effective origin filter method, it is characterised in that the cost Function is:Wherein VP is the connection node logarithm in former provenance graph, VPSFor the company in filtered view Logical node logarithm, VPSCFor the connection node logarithm of path change in filtered view.
6. customizable according to claim 4 rely on semantic effective origin filter method, it is characterised in that described all kinds of Filtering and reparation rule are:
Entity class node is filtered and reparation rule is as follows:
(1) when before directly because node is empty and direct result node is entity class or activity class when, filtering rule:Delete the node And its side between direct result node;The situation is without repairing;
(2) when direct result node be it is empty and directly before because node be entity class, activity class or proxy class when, filtering rule:Delete Except the node and its with directly before because of the side between node;The situation is without repairing;
(3) when before directly because node is proxy class and direct result node is entity class or activity class when, filtering rule:Deleting should Node and its with directly before because of the side between node and direct result node;The situation can not be repaired;
(4) when before directly because node is entity class and direct result node is activity class when, filtering rule:Delete the node and with It is directly preceding because of the side between node and direct result node;Reparation rule:Because of node before being searched directly first in provenance graph It is nearest before because of active node, if increasing direct result node in the presence of if to preceding because of the communication dependence of active node recently;
(5) when before directly because node is activity class and direct result node is entity class when, filtering rule:Delete the node and its With directly before because of the side between node and direct result node;Reparation rule:Direct result node is searched first in provenance graph Nearest consequence active node, if increase in the presence of if nearest consequence active node to directly before because of the communication dependence of node;
(6) when because of node and direct result node being all activity class before directly, filtering rule:Delete the node and its with it is direct The preceding side because between node and direct result node;Reparation rule:Increase from direct result node to directly preceding leading to because of node Believe dependence;
(7) when because of node and direct result node being all entity class before directly, filtering rule:Delete the node and its with it is direct The preceding side because between node and direct result node;Reparation rule:Increase from direct result node to directly before because of the group of node Raw dependence;
Activity class node is filtered and reparation rule is as follows:
(1) when before directly because node is empty and direct result node is entity class or activity class when, filtering rule:Delete the node And its side between direct result node;The situation is without repairing;
(2) when direct result node be it is empty and directly before because node be entity class, activity class or proxy class when, filtering rule:Delete Except the node and its with directly before because of the side between node;The situation is without repairing;
(3) when directly before because node be agency and direct result node be activity class when, filtering rule:Delete the node and its with Because of the side between node and direct result node before directly;The situation can not be repaired;
(4) when directly before because node be agency and direct result node be entity class when, filtering rule:Delete the node and its with Because of the side between node and direct result node before directly;Reparation rule:Increase direct result node to directly preceding because of node Belong to dependence;
(5) when before directly because node is entity class and direct result node is activity class when, filtering rule:Delete the node and its With directly before because of the side between node and direct result node;Reparation rule:Because of node before being searched directly first in provenance graph It is nearest before because of active node, if increasing in the presence of if from direct result node to being closed because the communication of active node is relied on before recently System;
(6) when before directly because node is activity class and direct result node is entity class when, filtering rule:Delete the node and its With directly before because of the side between node and direct result node;Reparation rule:Direct result node is searched first in provenance graph Nearest consequence active node, if increase in the presence of if from nearest consequence active node to directly before because node communication rely on close System;
(7) when because of node and direct result node being all activity class before directly, filtering rule:Delete the node and its with it is direct The preceding side because between node and direct result node;Reparation rule:Increase from direct result node to directly preceding leading to because of node Believe dependence;
(8) when because of node and direct result node being all entity class before directly, filtering rule:Delete the node and its with it is direct The preceding side because between node and direct result node;Reparation rule need to divide situation discussion:Depended on when direct result node derives from When before directly because of node, without repairing;Otherwise in provenance graph respectively search directly before because node it is nearest before because of active node And the nearest consequence active node of direct result node, if exist if increase from nearest consequence node to recently before because of node Communication dependence;
Proxy class node is filtered and reparation rule is as follows:
(1) when before directly because node is empty and direct result node is entity class, activity class or proxy class when, filtering rule:Delete Except the node and its side between direct result node;The situation is without repairing;
(2) when direct result node be it is empty and directly before because node be proxy class when, filtering rule:Delete the node and its with it is straight Connect the side between consequence node;The situation is without repairing;
(3) when before directly because node is proxy class and direct result node is proxy class when, filtering rule:Delete the node and its With the side between direct result node;Reparation rule:Increase to rely on from direct result node to the directly preceding agency because of node and close System;
(4) when before directly because node is proxy class and direct result node is entity class when, filtering rule:Delete the node and its With the side between direct result node;Reparation rule:Increase direct result node to directly preceding because of the ownership dependence of node;
(5) when before directly because node is proxy class and direct result node is activity class when, filtering rule:Delete the node and its With the side between direct result node;Reparation rule:Increase direct result node to directly preceding because of the association dependence of node;
Delete Usage classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because node be empty or during proxy class, the situation can not be repaired;
(2) when u it is direct before because node be entity class when, first in provenance graph u it is direct before because node nearest cause live Dynamic node, it is preceding because of the communication dependence of active node recently from v to this if increasing in the presence of if;
(3) when u it is direct before because node be activity class when, then increase before direct from v to u because of the communication dependence of node;
Delete Generation classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because node be proxy class when, then can increase before direct from v to u and to be closed because the ownership of node is relied on System;
(2) when v direct result node is empty and u it is direct before because node be not proxy class when, it is impossible to repair;
(3) when v direct result node is entity class, v nearest consequence active node is searched first in provenance graph, if depositing Then increasing from the node to the communication dependence u;
(4) when v direct result node is activity class, then the communication dependence from the node to u is increased;
Delete Derivation classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because node be empty or during proxy class, the situation can not be repaired;
(2) u it is direct before because direct result node that node is activity class and v be entity class when, v is searched in provenance graph first Nearest consequence active node, if increasing in the presence of if before direct from the node to u because of the communication dependence of node;
(3) when u it is direct before because direct result node that node is entity class and v be activity class when, looked into first in provenance graph Look for u it is nearest before because of active node, if increasing in the presence of if from v direct result node to the communication dependence of the node;
(4) when u it is direct before because the direct result node of node and v be all entity class when, first in provenance graph search u most Nearby because of active node and v nearest consequence active node, increase if existing from nearest consequence active node to most nearby Because of the communication dependence of active node;
(5) when u it is direct before because the direct result node of node and v be all activity class when, then increase the direct result node from v To the direct preceding because of the communication dependence of node of u;
Delete Communication classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because node be empty or during proxy class, the situation can not be repaired;
(2) when u it is direct before because direct result node that node is entity class and v be activity class when, then increase from v it is direct after Communication dependence of the fruit node to u;
(3) when u it is direct before because direct result node that node is activity class and v be entity class when, then increase straight from v to u Because of the communication dependence of node before connecing;
(4) when u it is direct before because the direct result node of node and v is all entity class when have two kinds of optional reparation rules:The It is a kind of that the nearest preceding because of active node of u is searched first in provenance graph, rely on pass if increasing the communication from v to the node in the presence of if System;The nearest consequence active node for first looking for v second, if increasing the communication dependence from the node to u in the presence of if;
(5) when u it is direct before because the direct result node of node and v is all activity class when have two kinds of optional reparation rules:The A kind of increase is direct preceding because of the communication dependence of node from v to u;Second of increase is from v direct result node to the logical of u Believe dependence;
Delete Association classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when v direct result node is space-time, the situation can not be repaired;
(2) when v direct result node is entity class, increase from v direct result node to u ownership dependence;
(3) when u it is direct before because direct result node that node is proxy class and v be activity class when, increase direct from v to u It is preceding because of the association dependence of node;
Delete Attribution classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because node be proxy class when, because of the ownership dependence of node before direct from v to u of increase;
Delete Delegation classes side<U, v>Available origin dependence reparation rule is as follows afterwards:
(1) when u it is direct before because the direct result node of node and v be all space-time, the situation can not be repaired;
(2) when u it is direct before because node be empty or when proxy class and v direct result node are entity class, increase is from the direct of v Ownership dependence of the consequence node to u;
(3) when u it is direct before because node be empty or when proxy class and v direct result node are activity classes, increase is from the direct of v Association dependence of the consequence node to u;
(4) when u it is direct before because node be empty and when v direct result node is proxy class, increase the direct result node from v Dependence is acted on behalf of to u;
(5) when u it is direct before because direct result node that node is proxy class and v be proxy class when, have two kinds of optional reparations Rule:The first increase acts on behalf of dependence from v direct result node to u;Direct cause of second of increase from v to u Node acts on behalf of dependence.
7. customizable according to claim 1 rely on semantic effective origin filter method, it is characterised in that the step 4, according to arrangement rule arrange step 3 obtained by provenance graph, obtain origin filtered view include following sub-step:
Step 41:The proxy class node and activity class node in provenance graph are checked, is deleted if node disjoint;
Step 42:The side in provenance graph is checked, if one of summit that side is depended on has been deleted, the side is deleted;
Step 43:Check provenance graph in side, if the dependence represented by can by it is other while reasoning obtain, delete should Side;
Step 44:Provenance graph is updated, final origin filtered view is obtained.
8. customizable according to claim 1 rely on semantic effective origin filter method, it is characterised in that the step 5, the filtering report of generation origin includes following sub-step:
Step 51:Calculate nodes V to be filteredR, side number E to be filteredR, communication path number VP to be filteredR
Step 52:Extreme saturation original provenance graph, calculates total node number V, side number E, communication path number VP;
Step 53:Extreme saturation origin filtered view, calculates the nodes V retained after filteringS, side number ES, communication path number VPS、 The connection node logarithm VP of path change compared with former provenance graphSC
Step 54:Calculate the effect assessment value of origin filtered view:
Wherein, ω1、ω2、ω3、ω4Point 30%, 30%, 30%, 10% is not entered as;It should be noted that according to demand, ω1、ω2、ω3、ω4Assignment may be selected Other combinations, but ω need to be met1234=100%;
Step 55:The filtering report of generation origin.
9. customizable according to claim 8 rely on semantic effective origin filter method, it is characterised in that the origin Filtering report content includes original provenance graph, filtration needs, origin filtered view, filtration needs and meets situation, filters more Rise source element and dependence, for repair origin rely on it is semantic changes can reasoning originate from dependence and the filtering that originates from is regarded The value of utility of figure.
CN201710158061.4A 2017-03-16 2017-03-16 It is customizable to rely on semantic effective origin filter method Pending CN107016065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710158061.4A CN107016065A (en) 2017-03-16 2017-03-16 It is customizable to rely on semantic effective origin filter method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710158061.4A CN107016065A (en) 2017-03-16 2017-03-16 It is customizable to rely on semantic effective origin filter method

Publications (1)

Publication Number Publication Date
CN107016065A true CN107016065A (en) 2017-08-04

Family

ID=59440723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710158061.4A Pending CN107016065A (en) 2017-03-16 2017-03-16 It is customizable to rely on semantic effective origin filter method

Country Status (1)

Country Link
CN (1) CN107016065A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204789A (en) * 2021-05-25 2021-08-03 陕西科技大学 Origin filtering framework based on filtering primitive
CN114650249A (en) * 2020-12-02 2022-06-21 南京中兴软件有限责任公司 Algorithm model and path determination method, electronic device, SDN controller and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN102314638A (en) * 2010-06-30 2012-01-11 国际商业机器公司 Be used for the historical method and system of managerial meeting
CN103823885A (en) * 2014-03-07 2014-05-28 河海大学 Data provenance dependence relation analysis model-based data dependence analysis method
CN104239581A (en) * 2014-10-13 2014-12-24 河海大学 Database-system-oriented replicated data provenance tracing method
CN105550332A (en) * 2015-12-21 2016-05-04 河海大学 Dual-layer index structure based origin graph query method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN102314638A (en) * 2010-06-30 2012-01-11 国际商业机器公司 Be used for the historical method and system of managerial meeting
CN103823885A (en) * 2014-03-07 2014-05-28 河海大学 Data provenance dependence relation analysis model-based data dependence analysis method
CN104239581A (en) * 2014-10-13 2014-12-24 河海大学 Database-system-oriented replicated data provenance tracing method
CN105550332A (en) * 2015-12-21 2016-05-04 河海大学 Dual-layer index structure based origin graph query method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650249A (en) * 2020-12-02 2022-06-21 南京中兴软件有限责任公司 Algorithm model and path determination method, electronic device, SDN controller and medium
CN113204789A (en) * 2021-05-25 2021-08-03 陕西科技大学 Origin filtering framework based on filtering primitive
CN113204789B (en) * 2021-05-25 2023-08-25 陕西科技大学 A Origin Filtering Framework Based on Filtering Primitives

Similar Documents

Publication Publication Date Title
US20240070487A1 (en) Systems and methods for enriching modeling tools and infrastructure with semantics
CN112905580B (en) Multi-source heterogeneous data fusion system and method based on industrial big data
US8619084B2 (en) Dynamic adaptive process discovery and compliance
CN118761745B (en) OA collaborative workflow optimization method applied to enterprise
US20110270853A1 (en) Dynamic Storage and Retrieval of Process Graphs
CN112184302A (en) Product recommendation method and device, rule engine and storage medium
CN106484401B (en) An Automatic Refactoring Method for Object-Oriented Software
JP2023027741A (en) Real-time enterprise footprint automated system
Fraternali et al. Automating function point analysis with model driven development
CN110781177A (en) Electric energy meter electricity utilization information sorting method and device and readable storage medium
CN107016065A (en) It is customizable to rely on semantic effective origin filter method
CN107194280B (en) Model establishing method and device
KR101710649B1 (en) System for providing evaluation service of enterprise value by automated network deduction
CN111709636A (en) An emergency response strategy evaluation method, device, electronic device and storage medium
CN118535369A (en) System fault processing method and device, storage medium and electronic equipment
Dutta et al. Aggregation of heterogeneously related information with extended geometric Bonferroni mean and its application in group decision making
CN117764744A (en) Complaint early warning method, device, equipment and storage medium for insurance user
CN119917793A (en) A data processing method and related device
CN116795822A (en) Construction project abnormal data processing method and device
CN113204789B (en) A Origin Filtering Framework Based on Filtering Primitives
CN113656267B (en) Device energy efficiency calculation method and device, electronic device and storage medium
Assunção et al. A multi-objective solution for retrieving class diagrams
CN119417618B (en) A policy trusteeship method and system
CN115438036B (en) Data redundancy processing system and method for unified index database of power grid
CN120256641A (en) Related party information change processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170804