CN104899137A - Discovering method for defect mode in concurrent program - Google Patents
Discovering method for defect mode in concurrent program Download PDFInfo
- Publication number
- CN104899137A CN104899137A CN201510264963.7A CN201510264963A CN104899137A CN 104899137 A CN104899137 A CN 104899137A CN 201510264963 A CN201510264963 A CN 201510264963A CN 104899137 A CN104899137 A CN 104899137A
- Authority
- CN
- China
- Prior art keywords
- migration
- state
- sequence
- thread
- program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 64
- 230000007547 defect Effects 0.000 title claims abstract description 35
- 238000005065 mining Methods 0.000 claims abstract description 20
- 230000005012 migration Effects 0.000 claims description 73
- 238000013508 migration Methods 0.000 claims description 73
- 238000012544 monitoring process Methods 0.000 claims description 23
- 230000008569 process Effects 0.000 claims description 9
- 239000000523 sample Substances 0.000 claims description 3
- 238000000151 deposition Methods 0.000 claims 1
- 238000003745 diagnosis Methods 0.000 abstract description 4
- 238000010380 label transfer Methods 0.000 description 6
- 230000007704 transition Effects 0.000 description 6
- 238000000682 scanning probe acoustic microscopy Methods 0.000 description 5
- 230000006399 behavior Effects 0.000 description 4
- 230000002085 persistent effect Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 239000000243 solution Substances 0.000 description 4
- 230000009467 reduction Effects 0.000 description 3
- 238000002347 injection Methods 0.000 description 2
- 239000007924 injection Substances 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000001960 triggered effect Effects 0.000 description 1
Landscapes
- Debugging And Monitoring (AREA)
Abstract
Description
技术领域 technical field
本发明属于软件故障诊断领域,是一种自主发现并发程序缺陷模式的方法,尤其是一种针对系统执行轨迹使用序列模式挖掘的缺陷模式发现方法。 The invention belongs to the field of software fault diagnosis, and is a method for autonomously discovering defect patterns of concurrent programs, in particular a method for discovering defect patterns using sequence pattern mining for system execution tracks.
背景技术 Background technique
发现并发程序缺陷是一件困难的事情,一般都只能根据日志信息判断故障原因,而普通的监控系统只对软件行为进行检测而无法进一步判断软件的执行是否处于正确的轨迹(正确的行为)之上。 It is difficult to find concurrent program defects. Generally, the cause of the fault can only be judged according to the log information, while the ordinary monitoring system only detects the software behavior and cannot further judge whether the execution of the software is on the correct track (correct behavior). above.
基于面向方面编程的监控程序可将监测逻辑和业务逻辑相互分离,能方便获取到一个并行程序系统的执行轨迹。针对这些获取到的并发程序系统的执行轨迹,可利用序列模式挖掘技术对这些轨迹进行挖掘,这样能有效解决并发程序故障诊断困难的问题,并能从并发程序执行轨迹这一海量数据中提取有用信息,以获取到隐含在历史并发程序轨迹序列集上的故障特征关键点。这种技术将可有效地保障并发软件系统的安全性。 The monitoring program based on aspect-oriented programming can separate monitoring logic and business logic, and can easily obtain the execution track of a parallel program system. Based on the acquired execution trajectories of concurrent program systems, sequential pattern mining technology can be used to mine these trajectories, which can effectively solve the problem of difficult fault diagnosis of concurrent programs and extract useful data from the massive data of concurrent program execution trajectories. information to obtain the key points of fault characteristics implied in the sequence set of historical concurrent program trajectories. This technique can effectively guarantee the security of concurrent software systems.
现有技术的典型实现方案如下: A typical implementation scheme of the prior art is as follows:
第一,基于专家系统的故障/缺陷诊断程序。其主要原理是将领域专家的知识与经验被表示成产生式规则,并在知识库和数据库的支持下,综合运用各种规则进行一系列的推理。该诊断程序在运行中要求用户输入必要的信息后,就可快速地直接找到最终故障/缺陷或是最有可能的故障/缺陷,并由用户对结果进行进一步判断。但这种方法需要建立规则库,并利用规则库对程序轨迹状态进行判断和预测,这种方法的主要缺点是自动化程度不高,方法的复用性较差。 First, the fault/defect diagnosis program based on the expert system. Its main principle is to express the knowledge and experience of domain experts as production rules, and with the support of knowledge base and database, use various rules comprehensively to carry out a series of inferences. After the diagnostic program requires the user to input necessary information during operation, the final fault/defect or the most probable fault/defect can be found directly and quickly, and the user can further judge the result. But this method needs to establish a rule base, and use the rule base to judge and predict the state of the program trajectory. The main disadvantage of this method is that the degree of automation is not high, and the reusability of the method is poor.
第二,方便和有效的获取程序执行轨迹是进行软件故障/缺陷诊断的前提。在程序执行轨迹获取方面,美国加利福尼亚大学伯克利分校和贝儿实验室共同提出了动态偏序化简算法(Dynamic Partial Order Reduction,DPOR),探索线程的交互执行情况,并提出了Persistent集合的概念,以保证线程执行情况搜索的完备性。该算法从程序的初始状态执行的迁移序列开始采用深度优先搜索程序状态栈,动态探索线程间交互信息,然后利用这些信息去鉴定回溯点,从而探索状态空间的可选路径,获取程序执行轨迹。 Second, convenient and effective acquisition of program execution trace is a prerequisite for software fault/defect diagnosis. In terms of program execution trajectory acquisition, the University of California, Berkeley and Bell Labs jointly proposed the Dynamic Partial Order Reduction algorithm (Dynamic Partial Order Reduction, DPOR), which explores the interactive execution of threads, and proposes the concept of Persistent collections. To ensure the completeness of thread execution search. The algorithm uses depth-first search of the program state stack from the transition sequence executed by the initial state of the program, dynamically explores the interaction information between threads, and then uses this information to identify the backtracking point, so as to explore the optional path of the state space and obtain the program execution trajectory.
在程序状态的搜索过程中计算各个状态的Persistent集合,使得程序里所有迁移序列都不是空序列,在某一状态下可行的迁移集合里所有迁移和其他迁移序列中的迁移都是独立的,保证搜索的完备性。 During the search process of the program state, the Persistent set of each state is calculated, so that all migration sequences in the program are not empty sequences, and all migrations in the feasible migration set in a certain state are independent of migrations in other migration sequences, ensuring that completeness of the search.
但该算法的主要缺点是没有将完整地程序轨迹序列获取并存储;没有为程序轨迹序列的进一步分析提供基础;没有针对含有循环的并发程序进行优化,会出现对已经搜索过的迁移反复搜索的情况,这会造成了DPOR算法效率的低下,这反映在采用该算法的总探索时间和单次平均探索时间都较长。 However, the main disadvantage of this algorithm is that it does not acquire and store the complete program trajectory sequence; it does not provide a basis for further analysis of the program trajectory sequence; it does not optimize for concurrent programs containing loops, and there will be repeated searches for migrations that have been searched. This will cause the low efficiency of the DPOR algorithm, which is reflected in the long total exploration time and single average exploration time of the algorithm.
第三,在对缺陷模式的发现方法上,美国康奈尔大学提出序列模式挖掘算法(Sequential PAttern Mining,SPAM)进行序列模式挖掘,对序列集进行挖掘得到频繁模式。SPAM算法首先将序列信息存储于数据库位图中,然后可在项集层面实行项集拓展,和可在序列层面实行序列拓展,并使用格理论对数据库进行深度优先遍历从而找出序列模式。SPAM算法的主要缺点是需要用户自己来设定最小支持度阈值,不当的支持度阈值可能带来挖掘出的模式质量差的问题;只能对单一类型的序列集进行处理,误报情况严重。 Third, in terms of the discovery method of defect patterns, Cornell University in the United States proposed a sequential pattern mining algorithm (Sequential PAttern Mining, SPAM) to mine sequential patterns, and to mine sequence sets to obtain frequent patterns. The SPAM algorithm first stores the sequence information in the database bitmap, and then implements item set expansion at the item set level, and sequence expansion at the sequence level, and uses lattice theory to perform depth-first traversal on the database to find out the sequence pattern. The main disadvantage of the SPAM algorithm is that the user needs to set the minimum support threshold by himself. An improper support threshold may cause poor quality of the mined patterns; only a single type of sequence set can be processed, and the false positives are serious.
发明内容 Contents of the invention
本发明的目的在于针对以上不足,在传统基于动态偏序化简算法获取并发程序执行轨迹和SPAM算法进行序列模式挖掘的基础上,针对算法的核心步骤,即获取并发程序轨迹和对轨迹进行挖掘以发现缺陷模式,提出涉及这两步的改进的做法,能自主发现程序缺陷模式。 The purpose of the present invention is to address the above deficiencies, on the basis of the traditional dynamic partial order simplification algorithm to obtain the concurrent program execution track and the SPAM algorithm for sequential pattern mining, aiming at the core steps of the algorithm, that is, to obtain the concurrent program track and to mine the track By discovering the defect mode, an improvement method involving these two steps is proposed, and the program defect mode can be found independently.
本发明的技术方案是基于已插装的并发程序,在其运行过程中动态调度线程的执行,实时收集程序执行信息,根据需要存储程序轨迹序列,对执行序列进行挖掘得到缺陷模式。 The technical solution of the present invention is based on an inserted concurrent program, dynamically schedules the execution of threads during its running, collects program execution information in real time, stores program track sequences as required, and mines the execution sequences to obtain defect patterns.
本发明提供一种并发程序的缺陷模式发现方法,包括以下步骤: The present invention provides a method for discovering defect patterns of concurrent programs, comprising the following steps:
(1)监控目标程序,使用面向方面编程探针代码的注入方法对监控点进行监控以便获取状态信息; (1) Monitor the target program, and use the injection method of the aspect-oriented programming probe code to monitor the monitoring point in order to obtain status information;
(2)使用动态标记迁移算法DLT对设置监控点的目标程序的执行进行调度,获取目标程序的监控点的状态信息; (2) Use the dynamic label transfer algorithm DLT to schedule the execution of the target program with monitoring points, and obtain the status information of the monitoring points of the target program;
(3)把监控到的状态信息,以轨迹序列的形式存储在数据库中,获取数据库中每条轨迹序列的执行结果,对轨迹序列进行分类,得到成功执行序列集和失效执行序列集; (3) Store the monitored state information in the database in the form of trajectory sequences, obtain the execution results of each trajectory sequence in the database, classify the trajectory sequences, and obtain the successful execution sequence set and the failure execution sequence set;
(4)在成功执行序列集和失效执行序列集中,利用序列模式挖掘算法ISPAM进行挖掘,得到目标程序缺陷模式。 (4) In the successful execution sequence set and the failure execution sequence set, the sequence pattern mining algorithm ISPAM is used to mine, and the defect pattern of the target program is obtained.
在上述技术方案中,步骤(1)中所述的获取状态信息的步骤如下: In the above technical solution, the steps for obtaining status information described in step (1) are as follows:
步骤一,基于面向方面编程思想和待监控的源程序,首先由用户在待监控的并发程序中针对共享变量的读写操作定义监控点,接着用户在监控点处添加若干个切点; Step 1. Based on the idea of aspect-oriented programming and the source program to be monitored, the user first defines monitoring points for the read and write operations of shared variables in the concurrent program to be monitored, and then the user adds several cut points at the monitoring points;
步骤二,由用户定义每一个切点对应的通知,同时定义该通知需执行的处理代码,当切点匹配时,通知内定义的处理代码在切点之前,或在切点之后,或在切点附近这3处执行,通知内的transform()方法将用于存放包含源程序中被监控方法的参数表数组args中的行号、序列号信息的临时变量meg中的内容以及执行环境中的线程标识符转换成监控现场信息,并存入在事件类EventPO中。 Step 2. The user defines the notification corresponding to each cut point, and defines the processing code to be executed for the notification. When the cut point matches, the processing code defined in the notification is before, after, or after the cut point. These 3 places near the point are executed, the transform() method in the notification will be used to store the line number in the parameter table array args of the monitored method in the source program, the content in the temporary variable meg of the serial number information, and the content in the execution environment The thread identifier is converted into monitoring site information and stored in the event class EventPO.
在上述技术方案中,步骤(2)中所述的动态标记迁移算法DLT(S, T, t,SeS)的算法输入参数有:全局状态栈S,全局迁移序列集T,局部迁移t和初始的全局序列集SeS;算法输出参数有:全局序列集SeS;其步骤如下: In the above technical solution, the algorithm input parameters of the dynamic label transfer algorithm DLT(S, T, t, SeS) described in step (2) include: global state stack S, global migration sequence set T, local migration t and initial The global sequence set SeS; the output parameters of the algorithm are: the global sequence set SeS; the steps are as follows:
步骤一,将S中的栈顶元素出栈存入状态s,将SeS中的当前序列存入序列变量σ中; Step 1, pop the top element of the stack in S and store it in state s, and store the current sequence in SeS into the sequence variable σ;
步骤二,更新s的回溯信息; Step 2, update the backtracking information of s;
步骤三,如果存在线程p和s状态下可行迁移集合s.enabled中线程p的一个迁移元素t,执行以下步骤: Step 3, if there is a migration element t of thread p in the feasible migration set s.enabled in the state of thread p and s, perform the following steps:
(3.1)将线程p赋值给状态s的回溯集s.backtrack; (3.1) Assign the thread p to the backtracking set s.backtrack of the state s;
(3.2)创建状态s下已经执行过的迁移集合s.done,并将其置为空; (3.2) Create the migration set s.done that has been executed in state s, and set it to empty;
(3.3)如果存在某一线程q是状态s的回溯集s.backtrack与状态s下已经执行过的迁移集合s.done的差集中的元素,重复执行以下步骤,直到不存在这样的线程q为止: (3.3) If there is a thread q that is an element in the difference set between the backtracking set s.backtrack of state s and the migration set s.done that has been executed in state s, repeat the following steps until there is no such thread q :
(3.3.1)将q加入状态s下已经执行过的迁移集合s.done中; (3.3.1) Add q to the migration set s.done that has been executed in state s;
(3.3.2)从状态s的回溯集s.backtrack中删除线程q; (3.3.2) Delete thread q from the backtracking set s.backtrack of state s;
(3.3.3)设p’是执行迁移tm的线程,若线程p’属于s.done,且tm属于s.enabled,则将tm加入集合s.retrieved中; (3.3.3) Let p' be the thread that executes migration t m , if thread p' belongs to s.done, and t m belongs to s.enabled, then add t m to the set s.retrieved;
(3.3.4)将s状态下可行迁移集合s.enabled中线程q执行的迁移设置为tn; (3.3.4) Set the migration performed by thread q in the feasible migration set s.enabled in state s to t n ;
(3.3.5)创建项集is并初始化为空,并将tn添加到is中; (3.3.5) Create the itemset is and initialize it to be empty, and add t n to is;
(3.3.6)将项集is添加到序列变量σ中; (3.3.6) Add the itemset is to the sequence variable σ;
(3.3.7)判断从状态s出发且执行tn迁移后到达的状态是否为空值,若不是,则执行以下步骤,否则不执行: (3.3.7) Determine whether the state starting from state s and arriving after t n migration is empty, if not, execute the following steps, otherwise do not execute:
(3.3.7.1)将从状态s出发且执行tn迁移后得到的状态设为s'; (3.3.7.1) Set the state obtained after starting from state s and performing t n migration as s';
(3.3.7.2)将独立于迁移t且属于s.enabled中的迁移t'存入s'.retrieved集合中; (3.3.7.2) Store the migration t' that is independent of the migration t and belongs to s.enabled into the s'.retrieved collection;
(3.3.7.3)将s'.enabled与s'.retrieved集合的差集存入s'.enabled中; (3.3.7.3) Store the difference between s'.enabled and s'.retrieved in s'.enabled;
(3.3.7.4)将迁移tn添加入迁移序列T中; (3.3.7.4) Add migration t n to migration sequence T;
(3.3.7.5)将状态s'压入状态栈S中; (3.3.7.5) Push state s' into state stack S;
(3.3.7.6)递归调用DLT(S, T, t,SeS)算法; (3.3.7.6) Recursively call the DLT(S, T, t, SeS) algorithm;
(3.3.7.7)将T中的最后一项序列移出; (3.3.7.7) Remove the last sequence in T;
(3.3.8)将S中的一项状态出栈; (3.3.8) Pop a state in S out of the stack;
步骤四,返回全局序列集SeS。 Step 4, return the global sequence set SeS.
在上述技术方案中,步骤(4)中序列模式挖掘算法ISPAM的步骤如下: In the above technical solution, the steps of the sequential pattern mining algorithm ISPAM in step (4) are as follows:
步骤一,将候选模式集合R和前k个缺陷模式的集合L设置为空集,并设置全局变量minsup为0; Step 1: Set the candidate pattern set R and the top k defect pattern set L as empty sets, and set the global variable minsup to 0;
步骤二,将成功执行序列集GS和失效执行序列集BS分别转化为垂直序列数据集V(GS)和V(BS),生成V(BS)中项集列表Sinit; Step 2, converting the successful execution sequence set GS and the failure execution sequence set BS into vertical sequence data sets V(GS) and V(BS) respectively, and generating the item set list S init in V(BS);
步骤三,对于项集列表Sinit中的每一个项items,完成以下步骤: Step 3, for each item items in the itemset list S init , complete the following steps:
(3.1)计算其是否是频繁项,如果是: (3.1) Calculate whether it is a frequent item, if it is:
(3.1.1)调用SAVE()方法对由单个项组成的模式存储到L中; (3.1.1) Call the SAVE() method to store the schema consisting of a single item into L;
(3.1.2)在R集合添加项items,Sinit,Sinit中在items之后的项的集合组成的三元组; (3.1.2) Add items to the R collection, S init , a triplet composed of a collection of items after items in S init ;
步骤四,如果存在元组 属于R集合并且r的支持度大于等于minsup,则重复做以下步骤,直到不满足这一条件为止: Step 4, if there is a tuple Belongs to the R set and the support of r is greater than or equal to minsup, then repeat the following steps until this condition is not met:
(4.1)选择R集合中拥有最高支持度的模式r的元组; (4.1) Select the tuple of the pattern r with the highest support in the R set ;
(4.2)通过调用SEARCH()方法对候选模式集合R进行拓展,更新候选模式集合R,并将拓展得到的模式调用SAVE()方法存储到L中; (4.2) Expand the candidate pattern set R by calling the SEARCH() method, update the candidate pattern set R, and store the extended patterns in L by calling the SAVE() method;
(4.3)将元组从R中移除; (4.3) will tuple removed from R;
(4.4)将R中的所有满足条件的元组从R中移除; (4.4) will satisfy all in R tuples of conditions are removed from R;
步骤五,返回含有前k个缺陷模式的集合L。 Step 5, return the set L containing the top k defect patterns.
本发明方法与现有技术相比具有以下优点: Compared with the prior art, the inventive method has the following advantages:
1.对目标源程序进行插装并监控,对目标程序的监控进行设置,针对DPOR算法对循环较多的程序,执行效率较低的缺点,排除了对相互独立迁移的选取,并将已经检索过的迁移加入一个专门的集合,避免反复搜索已经执行过的迁移,从而实现了对包含循环较多程序的化简,提高了执行效率,在选择性搜索过程中,记录轨迹序列,将轨迹序列存入序列集合中,得到全部的程序轨迹序列。 1. Insert and monitor the target source program, set the monitoring of the target program, and aim at the shortcomings of the DPOR algorithm for programs with many loops and low execution efficiency, eliminate the selection of independent migration, and retrieve the The migrations that have been performed are added to a special set to avoid repeated searches for migrations that have been executed, thereby simplifying the program that contains many loops and improving execution efficiency. During the selective search process, the trajectory sequence is recorded and the trajectory sequence Store it in the sequence set to get all the program trajectory sequences.
2.针对SPAM算法中用户设置最小支持度带来的模式质量过差和数量过多的问题,引入项数k来规定对最佳模式数量的挖掘,转化为前k项序列模式挖掘,得到前k项效果最佳的缺陷解释模式。基于此,为了达到减少搜索空间,提高算法的效率,同时能够解决缺陷误报问题的目的,定义了成功执行序列集、失效执行序列集、相对支持度概念,所谓成功执行序列集指的是并发程序的执行结果符合程序的预期行为的执行序列集;所谓失效执行序列集指的是并发程序的执行结果不符合程序的预期行为的执行序列集;所谓相对支持度指的频繁出现在失效执行序列集而较少出现在成功执行序列集的一种相对度。在此基础上,本发明提出了在挖掘过程中通过对成功执行序列集和失效执行序列集中同一模式的比较来计算其相对支持度,减少对频繁出现在成功执行序列集中的模式选取,从而降低缺陷误报率的方法。 2. Aiming at the problem of poor pattern quality and too many patterns caused by the minimum support set by the user in the SPAM algorithm, the number of items k is introduced to specify the number of optimal pattern mining, which is transformed into the top k sequential pattern mining, and the top k items are obtained. The best defect explanation model for k items. Based on this, in order to achieve the purpose of reducing the search space, improving the efficiency of the algorithm, and solving the problem of false alarms, the concepts of successful execution sequence set, failure execution sequence set, and relative support are defined. The so-called successful execution sequence set refers to concurrent The execution result of the program conforms to the execution sequence set of the expected behavior of the program; the so-called failure execution sequence set refers to the execution sequence set whose execution result of the concurrent program does not meet the expected behavior of the program; the so-called relative support refers to the execution sequence set that frequently appears in the failure execution sequence A relative measure of sets that occur less frequently in successfully executed sequences. On this basis, the present invention proposes to calculate the relative support degree by comparing the same pattern in the successful execution sequence set and the failure execution sequence set in the mining process, so as to reduce the selection of patterns frequently appearing in the successful execution sequence set, thereby reducing the The method of defect false positive rate.
附图说明 Description of drawings
图1为本发明方法的流程图。 Fig. 1 is the flowchart of the method of the present invention.
具体实施方式 Detailed ways
本实施使用装有Windows 7操作系统的PC机作为平台,通过MyEclipse开发并发程序缺陷检测程序。本发明实施时,用户必须安装1.7版本以上的JDK,以获取Java提供的新特性。 This implementation uses a PC with Windows 7 operating system as a platform to develop a concurrent program defect detection program through MyEclipse. When the present invention is implemented, the user must install JDK above version 1.7 to obtain the new features provided by Java.
此外,需定义如下术语: In addition, the following terms need to be defined:
(1) 监控现场:指监控的目标程序运行时的状态信息。 (1) Monitoring site: refers to the status information of the monitored target program when it is running.
(2) 通知:指程序执行中明确定义的点(接入点)被程序运行所触发时应该执行的代码。 (2) Notification: Refers to the code that should be executed when a clearly defined point (access point) in program execution is triggered by program execution.
(3) 并发程序的状态:指由所有共享变量的全局状态和每个线程的局部状态所组成的状态。 (3) The state of a concurrent program: refers to the state consisting of the global state of all shared variables and the local state of each thread.
(4) 并发程序的迁移:指为将程序从一个状态转移到下一个状态,某个线程执行的一个可见操作。 (4) Migration of concurrent programs: Refers to a visible operation performed by a thread to transfer the program from one state to the next.
(5) 垂直成功执行序列集:指用位图表示的垂直格式的成功执行序列集。 (5) Vertical successful execution sequence set: refers to the successful execution sequence set in vertical format represented by a bitmap.
(6) 垂直失效执行序列集:指用位图表示的垂直格式的失效执行序列集。 (6) Vertical invalidation execution sequence set: refers to the invalidation execution sequence set in vertical format represented by a bitmap.
本发明实施例涉及如下文献: Embodiments of the present invention relate to the following documents:
[1] Flanagan C, Godefroid P. Dynamic Partial-Order Reduction for Model Checking Software POPL'2005[M]. New York: ACM press, 2005。 [1] Flanagan C, Godefroid P. Dynamic Partial-Order Reduction for Model Checking Software POPL'2005[M]. New York: ACM press, 2005.
[2] Ayres, J., Flannick, J., Gehrke, J., Yiu, T.. Sequential PAttern mining using a bitmap representation, Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002), July 23-26, 2002, Edmonton, Alberta, pp. 429-435 (2002)。 [2] Ayres, J., Flannick, J., Gehrke, J., Yiu, T.. Sequential PAttern mining using a bitmap representation, Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002) , July 23-26, 2002, Edmonton, Alberta, pp. 429-435 (2002).
以下以在PC机上采用Java语言在JDK开发环境下开发并发程序缺陷检测程序为例说明本发明技术方案。 The technical scheme of the present invention will be described below by taking the Java language on a PC to develop a concurrent program defect detection program under the JDK development environment as an example.
(一)设置监控目标程序。 (1) Set up the monitoring target program.
为了获取并发程序执行的状态信息,首先需要对被监控的目标程序进行设置。具体做法是:基于面向方面编程思想,在需监控的目标程序中定义监控点、切点(pointcut)、通知(advice),使用探针注入方式对监控点进行监控以获取状态信息。 In order to obtain the status information of concurrent program execution, it is first necessary to set the target program to be monitored. The specific method is: based on the idea of aspect-oriented programming, define monitoring points, pointcuts, and advice in the target program to be monitored, and use probe injection to monitor the monitoring points to obtain status information.
获取状态信息过程中涉及到的相关变量定义,具体如表1所示。 The definitions of related variables involved in the process of obtaining status information are shown in Table 1.
表1获取状态信息过程中的符号
获取状态信息的步骤如下: The steps to obtain status information are as follows:
步骤一,基于面向方面编程思想和待监控的源程序,首先由用户在待监控的并发程序中针对共享变量的读写操作定义监控点,接着用户在监控点处添加若干个切点; Step 1. Based on the idea of aspect-oriented programming and the source program to be monitored, the user first defines monitoring points for the read and write operations of shared variables in the concurrent program to be monitored, and then the user adds several cut points at the monitoring points;
步骤二,由用户定义每一个切点对应的通知,同时定义该通知需执行的处理代码,当切点匹配时,通知内定义的处理代码在切点之前,或在切点之后,或在切点附近这3处执行,通知内的transform()方法将临时变量meg中的内容以及执行环境中的线程标识符转换成监控现场信息,并存入在事件类EventPO中。 Step 2. The user defines the notification corresponding to each cut point, and defines the processing code to be executed for the notification. When the cut point matches, the processing code defined in the notification is before, after, or after the cut point. These 3 places are executed near the point. The transform() method in the notification converts the content in the temporary variable meg and the thread identifier in the execution environment into monitoring site information, and stores it in the event class EventPO.
(二)使用动态标记迁移算法DLT对设置监控点的目标程序的执行进行调度。 (2) Use the dynamic label transfer algorithm DLT to schedule the execution of the target program that sets the monitoring point.
使用本发明提出的动态标记迁移算法DLT,对待监控的并发程序中的线程执行进行调度。 The thread execution in the concurrent program to be monitored is scheduled by using the dynamic label transfer algorithm DLT proposed by the present invention.
本算法要用到的相关符号定义如表2所示。 The definitions of the relevant symbols used in this algorithm are shown in Table 2.
表2.动态标记迁移算法DLT中的符号
所述动态标记迁移算法DLT(S, T, t,SeS)的步骤如下: The steps of the dynamic label transfer algorithm DLT (S, T, t, SeS) are as follows:
算法输入参数有:全局状态栈S,全局迁移序列集T,局部迁移t和初始的全局序列集SeS。算法输出参数有:全局序列集SeS。 The input parameters of the algorithm are: the global state stack S, the global transition sequence set T, the local transition t and the initial global sequence set SeS. The output parameters of the algorithm are: global sequence set SeS.
(1)将S中的栈顶元素出栈存入状态s,将SeS中的当前序列存入序列变量σ中; (1) Pop the top element of S into the state s, store the current sequence in SeS into the sequence variable σ;
(2)更新s的回溯信息; (2) Update the backtracking information of s;
(3)如果存在线程p和s状态下可行迁移集合s.enabled中线程p的一个迁移元素t,执行以下步骤: (3) If there is a migration element t of thread p in the feasible migration set s.enabled in the state of thread p and s, perform the following steps:
(3.1)将线程p赋值给状态s的回溯集s.backtrack; (3.1) Assign the thread p to the backtracking set s.backtrack of the state s;
(3.2)创建状态s下已经执行过的迁移集合s.done,并将其置为空; (3.2) Create the migration set s.done that has been executed in state s, and set it to empty;
(3.3)如果存在某一线程q是状态s的回溯集s.backtrack与状态s下已经执行过的迁移集合s.done的差集中的元素,重复执行以下步骤,直到不存在这样的线程q为止: (3.3) If there is a thread q that is an element in the difference set between the backtracking set s.backtrack of state s and the migration set s.done that has been executed in state s, repeat the following steps until there is no such thread q :
(3.3.1)将q加入状态s下已经执行过的迁移集合s.done中; (3.3.1) Add q to the migration set s.done that has been executed in state s;
(3.3.2)从状态s的回溯集s.backtrack中删除线程q; (3.3.2) Delete thread q from the backtracking set s.backtrack of state s;
(3.3.3)设p’是执行迁移tm的线程,若线程p’属于s.done,且tm属于s.enabled,则将tm加入集合s.retrieved中; (3.3.3) Let p' be the thread that executes migration t m , if thread p' belongs to s.done, and t m belongs to s.enabled, then add t m to the set s.retrieved;
(3.3.4)将s状态下可行迁移集合s.enabled中线程q执行的迁移设置为tn; (3.3.4) Set the migration performed by thread q in the feasible migration set s.enabled in state s to t n ;
(3.3.5)创建项集is并初始化为空,并将tn添加到is中; (3.3.5) Create the itemset is and initialize it to be empty, and add t n to is;
(3.3.6)将项集is添加到序列变量σ中; (3.3.6) Add the itemset is to the sequence variable σ;
(3.3.7)判断从状态s出发且执行tn迁移后到达的状态是否为空值,若不是,则执行以下步骤,否则不执行: (3.3.7) Determine whether the state starting from state s and arriving after t n migration is empty, if not, execute the following steps, otherwise do not execute:
(3.3.7.1)将从状态s出发且执行tn迁移后得到的状态设为s'; (3.3.7.1) Set the state obtained after starting from state s and performing t n migration as s';
(3.3.7.2)将独立于迁移t且属于s.enabled中的迁移t'存入s'.retrieved集合中; (3.3.7.2) Store the migration t' that is independent of the migration t and belongs to s.enabled into the s'.retrieved collection;
(3.3.7.3)将s'.enabled与s'.retrieved集合的差集存入s'.enabled中; (3.3.7.3) Store the difference between s'.enabled and s'.retrieved in s'.enabled;
(3.3.7.4)将迁移tn添加入迁移序列T中; (3.3.7.4) Add migration t n to migration sequence T;
(3.3.7.5)将状态s'压入状态栈S中; (3.3.7.5) Push state s' into state stack S;
(3.3.7.6)递归调用DLT(S, T, t,SeS)算法; (3.3.7.6) Recursively call the DLT(S, T, t, SeS) algorithm;
(3.3.7.7)将T中的最后一项序列移出; (3.3.7.7) Remove the last sequence in T;
(3.3.8)将S中的一项状态出栈; (3.3.8) Pop a state in S out of the stack;
(4)返回全局序列集SeS。 (4) Return the global sequence set SeS.
其具体算法如下。 Its specific algorithm is as follows.
(三)把监控到的状态信息,以轨迹序列的形式存储在数据库中。 (3) Store the monitored state information in the database in the form of trajectory sequence.
把监控获取到的状态信息,存储在事件类EventPO中,通过Dao方法将事件类EventPO中的信息以轨迹序列的形式存储在数据库SDB中。类EventPO记录每个监控点事件;类SequencePO记录每次执行的序列信息,其中包括序列id、序列结果状态(status)。 Store the state information acquired by monitoring in the event class EventPO, and store the information in the event class EventPO in the database SDB in the form of a track sequence through the Dao method. Class EventPO records each monitoring point event; class SequencePO records sequence information for each execution, including sequence id and sequence result status (status).
使用Hibernate进行对象关系映射ORM(Object Relationship Mapping,ORM)操作,和数据访问对象DAO(Data Access Object,DAO)操作Java中的持久化对象PO(persistant object,PO),完成将EventPO和SequencePO中数据存储到数据库SDB。 Use Hibernate to perform object relationship mapping ORM (Object Relationship Mapping, ORM) operation, and data access object DAO (Data Access Object, DAO) to operate the persistent object PO (persistent object, PO) in Java, and complete the data in EventPO and SequencePO Stored to the database SDB.
事件类EventPO的定义如下: The event class EventPO is defined as follows:
public class EventPO { public class EventPO {
private int primaryKey; //表示项的序号 Private int primaryKey; //Indicates the serial number of the item
private int threadID; //表示线程的序号 Private int threadID; //Indicates the serial number of the thread
private String threadName;//表示线程名 private String threadName;//Indicates the thread name
private int methodLine;//表示被监控方法所在行 Private int methodLine;//Indicates the line where the monitored method is located
private String methodName;//表示被监控方法名 private String methodName;//Indicates the name of the monitored method
private int shareMemoryIndex;//表示共享内存的序号 private int shareMemoryIndex;//Indicates the serial number of the shared memory
private String shareMemoryValue;//表示共享内存的值 private String shareMemoryValue;//Indicates the value of shared memory
private Date date;//表示操作日期 private Date date;//Indicates the operation date
private int millisecond;//表示操作时间 Private int millisecond;//Indicates the operation time
private SequencePO sequence;//表示所属的序列号 private SequencePO sequence;//Indicates the sequence number it belongs to
} }
上述定义中: In the above definition:
primaryKey表示项的序号; primaryKey represents the serial number of the item;
threadID表示线程的序号; threadID represents the serial number of the thread;
threadName表示线程名; threadName represents the thread name;
methodLine表示被监控方法所在行; methodLine indicates the line where the monitored method is located;
methodName表示被监控方法名; methodName indicates the name of the monitored method;
shareMemoryIndex表示共享内存的序号; shareMemoryIndex indicates the serial number of the shared memory;
shareMemoryValue表示共享内存的值; shareMemoryValue represents the value of shared memory;
date表示操作日期; date represents the operation date;
millisecond表示操作时间; millisecond indicates the operation time;
sequence表示所属的序列号。 sequence indicates the sequence number it belongs to.
(四)获取数据库中每条轨迹序列的执行结果,对轨迹序列进行分类,得到成功执行序列集和失效执行序列集。 (4) Obtain the execution results of each trajectory sequence in the database, classify the trajectory sequences, and obtain a successful execution sequence set and a failure execution sequence set.
获取轨迹序列中的执行结果状态(status),依据预先定义的失效结果,判定并发程序中序列执行结果的状态是否丢失修改,或是否存在不可重复读、或是否读“脏”数据等情况中的一种,如是则执行结果是失效的,对轨迹序列进行分类,得到成功执行序列集和失效执行序列集。 Obtain the execution result status (status) in the track sequence, and determine whether the status of the sequence execution result in the concurrent program is lost or modified, or whether there is non-repeatable read, or whether to read "dirty" data, etc. according to the predefined failure result One, if yes, the execution result is invalid, classify the trajectory sequence, and obtain a successful execution sequence set and a failure execution sequence set.
(五)基于轨迹序列的并发程序缺陷模式挖掘。 (5) Concurrent program defect pattern mining based on trajectory sequence.
在成功执行序列集和失效执行序列集中,利用以下改进的序列模式挖掘算法ISPAM对存储在数据库的数据集进行挖掘,得到并发程序缺陷模式。 In the successful execution sequence set and the failure execution sequence set, the following improved sequence pattern mining algorithm ISPAM is used to mine the data set stored in the database to obtain the concurrent program defect pattern.
序列模式挖掘算法ISPAM的相关变量含义如表3所示。 The meanings of the relevant variables of the sequential pattern mining algorithm ISPAM are shown in Table 3.
表3 序列模式挖掘算法ISPAM中的符号
所述序列模式挖掘算法ISPAM的步骤如下: The steps of the sequential pattern mining algorithm ISPAM are as follows:
(1)将候选模式集合R和前k个缺陷模式的集合L设置为空集,并设置全局变量minsup为0; (1) Set the candidate pattern set R and the top k defect pattern set L as empty sets, and set the global variable minsup to 0;
(2) 步骤二,将成功执行序列集GS和失效执行序列集BS分别转化为垂直序列数据集V(GS)和V(BS),生成V(BS)中项集列表Sinit; (2) Step 2: Transform the successful execution sequence set GS and the failure execution sequence set BS into vertical sequence data sets V(GS) and V(BS) respectively, and generate the item set list S init in V(BS);
(3)对于项集列表Sinit中的每一个项items,完成以下步骤: (3) For each item in the itemset list S init , complete the following steps:
(3.1)计算其是否是频繁项,如果是: (3.1) Calculate whether it is a frequent item, if it is:
(3.1.1)调用SAVE()方法对由单个项组成的模式存储到L中; (3.1.1) Call the SAVE() method to store the schema consisting of a single item into L;
(3.1.2)在R集合添加项items,Sinit,Sinit中在items之后的项的集合组成的三元组; (3.1.2) Add items to the R collection, S init , a triplet composed of a collection of items after items in S init ;
(4)如果存在元组属于R集合并且r的支持度大于等于minsup,则重复做以下步骤,直到不满足这一条件为止: (4) If there is a tuple Belongs to the R set and the support of r is greater than or equal to minsup, then repeat the following steps until this condition is not met:
(4.1)选择R集合中拥有最高支持度的模式r的元组; (4.1) Select the tuple of the pattern r with the highest support in the R set ;
(4.2)通过调用SEARCH()方法对候选模式集合R进行拓展,更新候选模式集合R,并将拓展得到的模式调用SAVE()方法存储到L中; (4.2) Expand the candidate pattern set R by calling the SEARCH() method, update the candidate pattern set R, and store the extended patterns in L by calling the SAVE() method;
(4.3)将元组从R中移除; (4.3) will tuple removed from R;
(4.4)将R中的所有满足条件的元组从R中移除; (4.4) will satisfy all in R tuples of conditions are removed from R;
(5)返回含有前k个缺陷模式的集合L。 (5) Return the set L containing the top k defect patterns.
其具体算法如下。 Its specific algorithm is as follows.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510264963.7A CN104899137B (en) | 2015-05-22 | 2015-05-22 | A Defect Pattern Discovery Method for Concurrent Programs |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510264963.7A CN104899137B (en) | 2015-05-22 | 2015-05-22 | A Defect Pattern Discovery Method for Concurrent Programs |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN104899137A true CN104899137A (en) | 2015-09-09 |
| CN104899137B CN104899137B (en) | 2017-09-01 |
Family
ID=54031812
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510264963.7A Expired - Fee Related CN104899137B (en) | 2015-05-22 | 2015-05-22 | A Defect Pattern Discovery Method for Concurrent Programs |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN104899137B (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106529304A (en) * | 2016-10-27 | 2017-03-22 | 南京大学 | Android application concurrent vulnerability detection system |
| CN106598554A (en) * | 2015-10-14 | 2017-04-26 | 上海汽车集团股份有限公司 | Code generating apparatus |
| CN108139751A (en) * | 2015-09-30 | 2018-06-08 | 西门子股份公司 | Determine the method and diagnostic method of the diagnostic mode of the time series of technological system |
| CN108182144A (en) * | 2017-12-14 | 2018-06-19 | 东南大学 | A kind of concurrent program method for decomposing based on sequential mode mining |
| CN109522097A (en) * | 2018-10-11 | 2019-03-26 | 天津大学 | A kind of concurrent defect inspection method based on self-adapting random test |
| CN112965844A (en) * | 2019-12-12 | 2021-06-15 | 北京沃东天骏信息技术有限公司 | CPU surge accident processing method and device |
| CN115080374A (en) * | 2021-03-11 | 2022-09-20 | 中国科学院软件研究所 | General concurrent defect detection method and system based on partial order relation |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101710378A (en) * | 2009-10-10 | 2010-05-19 | 北京理工大学 | Software security flaw detection method based on sequential pattern mining |
| CN101727391A (en) * | 2009-12-14 | 2010-06-09 | 北京理工大学 | Method for extracting operation sequence of software vulnerability characteristics |
| US20100332540A1 (en) * | 2009-06-29 | 2010-12-30 | Siemens Corporation | Condition monitoring with automatically generated error templates from log messages and sensor trends based on time semi-intervals |
| US20150074035A1 (en) * | 2013-09-02 | 2015-03-12 | Appnomic Systems Private Limited | Detecting root cause for transaction degradation using causal bayesian networks |
-
2015
- 2015-05-22 CN CN201510264963.7A patent/CN104899137B/en not_active Expired - Fee Related
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100332540A1 (en) * | 2009-06-29 | 2010-12-30 | Siemens Corporation | Condition monitoring with automatically generated error templates from log messages and sensor trends based on time semi-intervals |
| CN101710378A (en) * | 2009-10-10 | 2010-05-19 | 北京理工大学 | Software security flaw detection method based on sequential pattern mining |
| CN101727391A (en) * | 2009-12-14 | 2010-06-09 | 北京理工大学 | Method for extracting operation sequence of software vulnerability characteristics |
| US20150074035A1 (en) * | 2013-09-02 | 2015-03-12 | Appnomic Systems Private Limited | Detecting root cause for transaction degradation using causal bayesian networks |
Non-Patent Citations (2)
| Title |
|---|
| 杨羡环: "基于函数调用序列的漏洞定位方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
| 黄南平: "基于序列模式挖掘软件异常行为检测", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108139751A (en) * | 2015-09-30 | 2018-06-08 | 西门子股份公司 | Determine the method and diagnostic method of the diagnostic mode of the time series of technological system |
| CN106598554A (en) * | 2015-10-14 | 2017-04-26 | 上海汽车集团股份有限公司 | Code generating apparatus |
| CN106529304A (en) * | 2016-10-27 | 2017-03-22 | 南京大学 | Android application concurrent vulnerability detection system |
| CN106529304B (en) * | 2016-10-27 | 2019-06-14 | 南京大学 | An Android application concurrency vulnerability detection system |
| CN108182144A (en) * | 2017-12-14 | 2018-06-19 | 东南大学 | A kind of concurrent program method for decomposing based on sequential mode mining |
| CN108182144B (en) * | 2017-12-14 | 2020-12-11 | 东南大学 | A Concurrent Program Analysis Method Based on Sequence Pattern Mining |
| CN109522097A (en) * | 2018-10-11 | 2019-03-26 | 天津大学 | A kind of concurrent defect inspection method based on self-adapting random test |
| CN109522097B (en) * | 2018-10-11 | 2023-03-07 | 天津大学 | A Concurrent Defect Detection Method Based on Adaptive Random Testing |
| CN112965844A (en) * | 2019-12-12 | 2021-06-15 | 北京沃东天骏信息技术有限公司 | CPU surge accident processing method and device |
| CN115080374A (en) * | 2021-03-11 | 2022-09-20 | 中国科学院软件研究所 | General concurrent defect detection method and system based on partial order relation |
| CN115080374B (en) * | 2021-03-11 | 2024-07-02 | 中国科学院软件研究所 | A general concurrent defect detection method and system based on partial order relation |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104899137B (en) | 2017-09-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN104899137B (en) | A Defect Pattern Discovery Method for Concurrent Programs | |
| AU2019200046B2 (en) | Utilizing artificial intelligence to test cloud applications | |
| US11269822B2 (en) | Generation of automated data migration model | |
| US9189135B2 (en) | Three-dimensional GUI object stores in automation test tools | |
| US8140573B2 (en) | Exporting and importing business objects based on metadata | |
| JP2017508210A5 (en) | ||
| CN109376196B (en) | Method and device for batch synchronization of redo logs | |
| Corazza et al. | A probabilistic based approach towards software system clustering | |
| CN113553313A (en) | A data migration method and system, storage medium and electronic device | |
| US10789294B2 (en) | Method and system for performing searches of graphs as represented within an information technology system | |
| Rezig et al. | Dagger: A data (not code) debugger | |
| Singer et al. | Fundamental nano-patterns to characterize and classify java methods | |
| vanden Broucke et al. | Improved artificial negative event generation to enhance process event logs | |
| CN117312035A (en) | Root cause analysis method, root cause analysis device and root cause analysis medium | |
| CN115470285B (en) | Spacecraft operation mode mining method, system, storage medium and electronic equipment | |
| Kim et al. | Parallel processing of multiple graph queries using MapReduce | |
| CN119166610B (en) | Database operation and maintenance method, system and storage medium | |
| CN106294092B (en) | Semi-automatic log analysis method and system based on ontology knowledge base | |
| CN114461659A (en) | Killing method, device, computer equipment and storage medium | |
| CN115296896B (en) | Method, device, and electronic device for dynamically generating attack paths | |
| Lu et al. | Towards efficient closed infrequent itemset mining using bi-directional traversing | |
| CN112434831A (en) | Troubleshooting method and device, storage medium and computer equipment | |
| US20160335300A1 (en) | Searching Large Data Space for Statistically Significant Patterns | |
| CN113793063B (en) | Method and device for detecting conflict of planning project scheme of power distribution network | |
| WO2024049486A1 (en) | Zero-shot learning using decision tree |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170901 |