CN104899137A

CN104899137A - Discovering method for defect mode in concurrent program

Info

Publication number: CN104899137A
Application number: CN201510264963.7A
Authority: CN
Inventors: 叶俊民; 李超; 曹洋洋; 叶竹君; 李蓉; 杨艳
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2015-05-22
Filing date: 2015-05-22
Publication date: 2015-09-09
Anticipated expiration: 2035-05-22
Also published as: CN104899137B

Abstract

The present invention belongs to the field of software fault diagnosis and provides an autonomously discovering method for a defect mode in a concurrent program, particularly a defect mode discovering method for mining system execution traces by using sequential patterns. According to the technical scheme provided by the invention, based on a plugged concurrent program, the execution of threads are dynamically scheduled during the operation of the plugged concurrent program; program execution information is collected in real time; a program trace sequence is stored as required; the execution sequence is mined; and the program defect pattern is autonomously discovered.

Description

A Defect Pattern Discovery Method for Concurrent Programs

技术领域 technical field

本发明属于软件故障诊断领域，是一种自主发现并发程序缺陷模式的方法，尤其是一种针对系统执行轨迹使用序列模式挖掘的缺陷模式发现方法。 The invention belongs to the field of software fault diagnosis, and is a method for autonomously discovering defect patterns of concurrent programs, in particular a method for discovering defect patterns using sequence pattern mining for system execution tracks.

背景技术 Background technique

发现并发程序缺陷是一件困难的事情，一般都只能根据日志信息判断故障原因，而普通的监控系统只对软件行为进行检测而无法进一步判断软件的执行是否处于正确的轨迹（正确的行为）之上。 It is difficult to find concurrent program defects. Generally, the cause of the fault can only be judged according to the log information, while the ordinary monitoring system only detects the software behavior and cannot further judge whether the execution of the software is on the correct track (correct behavior). above.

基于面向方面编程的监控程序可将监测逻辑和业务逻辑相互分离，能方便获取到一个并行程序系统的执行轨迹。针对这些获取到的并发程序系统的执行轨迹，可利用序列模式挖掘技术对这些轨迹进行挖掘，这样能有效解决并发程序故障诊断困难的问题，并能从并发程序执行轨迹这一海量数据中提取有用信息,以获取到隐含在历史并发程序轨迹序列集上的故障特征关键点。这种技术将可有效地保障并发软件系统的安全性。 The monitoring program based on aspect-oriented programming can separate monitoring logic and business logic, and can easily obtain the execution track of a parallel program system. Based on the acquired execution trajectories of concurrent program systems, sequential pattern mining technology can be used to mine these trajectories, which can effectively solve the problem of difficult fault diagnosis of concurrent programs and extract useful data from the massive data of concurrent program execution trajectories. information to obtain the key points of fault characteristics implied in the sequence set of historical concurrent program trajectories. This technique can effectively guarantee the security of concurrent software systems.

现有技术的典型实现方案如下： A typical implementation scheme of the prior art is as follows:

第一，基于专家系统的故障/缺陷诊断程序。其主要原理是将领域专家的知识与经验被表示成产生式规则,并在知识库和数据库的支持下,综合运用各种规则进行一系列的推理。该诊断程序在运行中要求用户输入必要的信息后,就可快速地直接找到最终故障/缺陷或是最有可能的故障/缺陷,并由用户对结果进行进一步判断。但这种方法需要建立规则库，并利用规则库对程序轨迹状态进行判断和预测，这种方法的主要缺点是自动化程度不高，方法的复用性较差。 First, the fault/defect diagnosis program based on the expert system. Its main principle is to express the knowledge and experience of domain experts as production rules, and with the support of knowledge base and database, use various rules comprehensively to carry out a series of inferences. After the diagnostic program requires the user to input necessary information during operation, the final fault/defect or the most probable fault/defect can be found directly and quickly, and the user can further judge the result. But this method needs to establish a rule base, and use the rule base to judge and predict the state of the program trajectory. The main disadvantage of this method is that the degree of automation is not high, and the reusability of the method is poor.

第二，方便和有效的获取程序执行轨迹是进行软件故障/缺陷诊断的前提。在程序执行轨迹获取方面，美国加利福尼亚大学伯克利分校和贝儿实验室共同提出了动态偏序化简算法（Dynamic Partial Order Reduction，DPOR），探索线程的交互执行情况，并提出了Persistent集合的概念，以保证线程执行情况搜索的完备性。该算法从程序的初始状态执行的迁移序列开始采用深度优先搜索程序状态栈，动态探索线程间交互信息，然后利用这些信息去鉴定回溯点，从而探索状态空间的可选路径，获取程序执行轨迹。 Second, convenient and effective acquisition of program execution trace is a prerequisite for software fault/defect diagnosis. In terms of program execution trajectory acquisition, the University of California, Berkeley and Bell Labs jointly proposed the Dynamic Partial Order Reduction algorithm (Dynamic Partial Order Reduction, DPOR), which explores the interactive execution of threads, and proposes the concept of Persistent collections. To ensure the completeness of thread execution search. The algorithm uses depth-first search of the program state stack from the transition sequence executed by the initial state of the program, dynamically explores the interaction information between threads, and then uses this information to identify the backtracking point, so as to explore the optional path of the state space and obtain the program execution trajectory.

在程序状态的搜索过程中计算各个状态的Persistent集合，使得程序里所有迁移序列都不是空序列，在某一状态下可行的迁移集合里所有迁移和其他迁移序列中的迁移都是独立的，保证搜索的完备性。 During the search process of the program state, the Persistent set of each state is calculated, so that all migration sequences in the program are not empty sequences, and all migrations in the feasible migration set in a certain state are independent of migrations in other migration sequences, ensuring that completeness of the search.

但该算法的主要缺点是没有将完整地程序轨迹序列获取并存储；没有为程序轨迹序列的进一步分析提供基础；没有针对含有循环的并发程序进行优化，会出现对已经搜索过的迁移反复搜索的情况，这会造成了DPOR算法效率的低下，这反映在采用该算法的总探索时间和单次平均探索时间都较长。 However, the main disadvantage of this algorithm is that it does not acquire and store the complete program trajectory sequence; it does not provide a basis for further analysis of the program trajectory sequence; it does not optimize for concurrent programs containing loops, and there will be repeated searches for migrations that have been searched. This will cause the low efficiency of the DPOR algorithm, which is reflected in the long total exploration time and single average exploration time of the algorithm.

第三，在对缺陷模式的发现方法上，美国康奈尔大学提出序列模式挖掘算法(Sequential PAttern Mining,SPAM)进行序列模式挖掘，对序列集进行挖掘得到频繁模式。SPAM算法首先将序列信息存储于数据库位图中，然后可在项集层面实行项集拓展，和可在序列层面实行序列拓展，并使用格理论对数据库进行深度优先遍历从而找出序列模式。SPAM算法的主要缺点是需要用户自己来设定最小支持度阈值，不当的支持度阈值可能带来挖掘出的模式质量差的问题；只能对单一类型的序列集进行处理，误报情况严重。 Third, in terms of the discovery method of defect patterns, Cornell University in the United States proposed a sequential pattern mining algorithm (Sequential PAttern Mining, SPAM) to mine sequential patterns, and to mine sequence sets to obtain frequent patterns. The SPAM algorithm first stores the sequence information in the database bitmap, and then implements item set expansion at the item set level, and sequence expansion at the sequence level, and uses lattice theory to perform depth-first traversal on the database to find out the sequence pattern. The main disadvantage of the SPAM algorithm is that the user needs to set the minimum support threshold by himself. An improper support threshold may cause poor quality of the mined patterns; only a single type of sequence set can be processed, and the false positives are serious.

发明内容 Contents of the invention

本发明的目的在于针对以上不足，在传统基于动态偏序化简算法获取并发程序执行轨迹和SPAM算法进行序列模式挖掘的基础上，针对算法的核心步骤，即获取并发程序轨迹和对轨迹进行挖掘以发现缺陷模式，提出涉及这两步的改进的做法，能自主发现程序缺陷模式。 The purpose of the present invention is to address the above deficiencies, on the basis of the traditional dynamic partial order simplification algorithm to obtain the concurrent program execution track and the SPAM algorithm for sequential pattern mining, aiming at the core steps of the algorithm, that is, to obtain the concurrent program track and to mine the track By discovering the defect mode, an improvement method involving these two steps is proposed, and the program defect mode can be found independently.

本发明的技术方案是基于已插装的并发程序，在其运行过程中动态调度线程的执行，实时收集程序执行信息，根据需要存储程序轨迹序列，对执行序列进行挖掘得到缺陷模式。 The technical solution of the present invention is based on an inserted concurrent program, dynamically schedules the execution of threads during its running, collects program execution information in real time, stores program track sequences as required, and mines the execution sequences to obtain defect patterns.

本发明提供一种并发程序的缺陷模式发现方法，包括以下步骤： The present invention provides a method for discovering defect patterns of concurrent programs, comprising the following steps:

（1）监控目标程序，使用面向方面编程探针代码的注入方法对监控点进行监控以便获取状态信息； (1) Monitor the target program, and use the injection method of the aspect-oriented programming probe code to monitor the monitoring point in order to obtain status information;

（2）使用动态标记迁移算法DLT对设置监控点的目标程序的执行进行调度，获取目标程序的监控点的状态信息； (2) Use the dynamic label transfer algorithm DLT to schedule the execution of the target program with monitoring points, and obtain the status information of the monitoring points of the target program;

（3）把监控到的状态信息，以轨迹序列的形式存储在数据库中，获取数据库中每条轨迹序列的执行结果，对轨迹序列进行分类，得到成功执行序列集和失效执行序列集； (3) Store the monitored state information in the database in the form of trajectory sequences, obtain the execution results of each trajectory sequence in the database, classify the trajectory sequences, and obtain the successful execution sequence set and the failure execution sequence set;

（4）在成功执行序列集和失效执行序列集中，利用序列模式挖掘算法ISPAM进行挖掘，得到目标程序缺陷模式。 (4) In the successful execution sequence set and the failure execution sequence set, the sequence pattern mining algorithm ISPAM is used to mine, and the defect pattern of the target program is obtained.

在上述技术方案中，步骤（1）中所述的获取状态信息的步骤如下： In the above technical solution, the steps for obtaining status information described in step (1) are as follows:

步骤一，基于面向方面编程思想和待监控的源程序，首先由用户在待监控的并发程序中针对共享变量的读写操作定义监控点，接着用户在监控点处添加若干个切点； Step 1. Based on the idea of aspect-oriented programming and the source program to be monitored, the user first defines monitoring points for the read and write operations of shared variables in the concurrent program to be monitored, and then the user adds several cut points at the monitoring points;

步骤二，由用户定义每一个切点对应的通知，同时定义该通知需执行的处理代码，当切点匹配时，通知内定义的处理代码在切点之前，或在切点之后，或在切点附近这3处执行，通知内的transform()方法将用于存放包含源程序中被监控方法的参数表数组args中的行号、序列号信息的临时变量meg中的内容以及执行环境中的线程标识符转换成监控现场信息，并存入在事件类EventPO中。 Step 2. The user defines the notification corresponding to each cut point, and defines the processing code to be executed for the notification. When the cut point matches, the processing code defined in the notification is before, after, or after the cut point. These 3 places near the point are executed, the transform() method in the notification will be used to store the line number in the parameter table array args of the monitored method in the source program, the content in the temporary variable meg of the serial number information, and the content in the execution environment The thread identifier is converted into monitoring site information and stored in the event class EventPO.

在上述技术方案中，步骤（2）中所述的动态标记迁移算法DLT(S, T, t,SeS)的算法输入参数有：全局状态栈S，全局迁移序列集T，局部迁移t和初始的全局序列集SeS；算法输出参数有：全局序列集SeS；其步骤如下： In the above technical solution, the algorithm input parameters of the dynamic label transfer algorithm DLT(S, T, t, SeS) described in step (2) include: global state stack S, global migration sequence set T, local migration t and initial The global sequence set SeS; the output parameters of the algorithm are: the global sequence set SeS; the steps are as follows:

步骤一，将S中的栈顶元素出栈存入状态s，将SeS中的当前序列存入序列变量σ中； Step 1, pop the top element of the stack in S and store it in state s, and store the current sequence in SeS into the sequence variable σ;

步骤二，更新s的回溯信息； Step 2, update the backtracking information of s;

步骤三，如果存在线程p和s状态下可行迁移集合s.enabled中线程p的一个迁移元素t，执行以下步骤： Step 3, if there is a migration element t of thread p in the feasible migration set s.enabled in the state of thread p and s, perform the following steps:

（3.1）将线程p赋值给状态s的回溯集s.backtrack； (3.1) Assign the thread p to the backtracking set s.backtrack of the state s;

（3.2）创建状态s下已经执行过的迁移集合s.done，并将其置为空； (3.2) Create the migration set s.done that has been executed in state s, and set it to empty;

（3.3）如果存在某一线程q是状态s的回溯集s.backtrack与状态s下已经执行过的迁移集合s.done的差集中的元素，重复执行以下步骤，直到不存在这样的线程q为止： (3.3) If there is a thread q that is an element in the difference set between the backtracking set s.backtrack of state s and the migration set s.done that has been executed in state s, repeat the following steps until there is no such thread q :

（3.3.1）将q加入状态s下已经执行过的迁移集合s.done中； (3.3.1) Add q to the migration set s.done that has been executed in state s;

（3.3.2）从状态s的回溯集s.backtrack中删除线程q； (3.3.2) Delete thread q from the backtracking set s.backtrack of state s;

（3.3.3）设p’是执行迁移t_m的线程，若线程p’属于s.done，且t_m属于s.enabled，则将t_m加入集合s.retrieved中； (3.3.3) Let p' be the thread that executes migration t _m , if thread p' belongs to s.done, and t _m belongs to s.enabled, then add t _m to the set s.retrieved;

（3.3.4）将s状态下可行迁移集合s.enabled中线程q执行的迁移设置为t_n； (3.3.4) Set the migration performed by thread q in the feasible migration set s.enabled in state s to t _n ;

（3.3.5）创建项集is并初始化为空，并将t_n添加到is中； (3.3.5) Create the itemset is and initialize it to be empty, and add t _n to is;

（3.3.6）将项集is添加到序列变量σ中； (3.3.6) Add the itemset is to the sequence variable σ;

（3.3.7）判断从状态s出发且执行t_n迁移后到达的状态是否为空值，若不是，则执行以下步骤，否则不执行： (3.3.7) Determine whether the state starting from state s and arriving after t _n migration is empty, if not, execute the following steps, otherwise do not execute:

（3.3.7.1）将从状态s出发且执行t_n迁移后得到的状态设为s'； (3.3.7.1) Set the state obtained after starting from state s and performing t _n migration as s';

（3.3.7.2）将独立于迁移t且属于s.enabled中的迁移t'存入s'.retrieved集合中； (3.3.7.2) Store the migration t' that is independent of the migration t and belongs to s.enabled into the s'.retrieved collection;

（3.3.7.3）将s'.enabled与s'.retrieved集合的差集存入s'.enabled中； (3.3.7.3) Store the difference between s'.enabled and s'.retrieved in s'.enabled;

（3.3.7.4）将迁移t_n添加入迁移序列T中； (3.3.7.4) Add migration t _n to migration sequence T;

（3.3.7.5）将状态s'压入状态栈S中； (3.3.7.5) Push state s' into state stack S;

（3.3.7.6）递归调用DLT(S, T, t,SeS)算法; (3.3.7.6) Recursively call the DLT(S, T, t, SeS) algorithm;

（3.3.7.7）将T中的最后一项序列移出； (3.3.7.7) Remove the last sequence in T;

（3.3.8）将S中的一项状态出栈； (3.3.8) Pop a state in S out of the stack;

步骤四，返回全局序列集SeS。 Step 4, return the global sequence set SeS.

在上述技术方案中，步骤（4）中序列模式挖掘算法ISPAM的步骤如下： In the above technical solution, the steps of the sequential pattern mining algorithm ISPAM in step (4) are as follows:

步骤一，将候选模式集合R和前k个缺陷模式的集合L设置为空集，并设置全局变量minsup为0； Step 1: Set the candidate pattern set R and the top k defect pattern set L as empty sets, and set the global variable minsup to 0;

步骤二，将成功执行序列集GS和失效执行序列集BS分别转化为垂直序列数据集V(GS)和V(BS)，生成V(BS)中项集列表S_init； Step 2, converting the successful execution sequence set GS and the failure execution sequence set BS into vertical sequence data sets V(GS) and V(BS) respectively, and generating the item set list S _init in V(BS);

步骤三，对于项集列表S_init中的每一个项items，完成以下步骤： Step 3, for each item items in the itemset list S _init , complete the following steps:

（3.1）计算其是否是频繁项，如果是： (3.1) Calculate whether it is a frequent item, if it is:

（3.1.1）调用SAVE()方法对由单个项组成的模式存储到L中； (3.1.1) Call the SAVE() method to store the schema consisting of a single item into L;

（3.1.2）在R集合添加项items，S_init，S_init中在items之后的项的集合组成的三元组； (3.1.2) Add items to the R collection, S _init , a triplet composed of a collection of items after items in S _init ;

步骤四，如果存在元组属于R集合并且r的支持度大于等于minsup，则重复做以下步骤，直到不满足这一条件为止： Step 4, if there is a tuple Belongs to the R set and the support of r is greater than or equal to minsup, then repeat the following steps until this condition is not met:

（4.1）选择R集合中拥有最高支持度的模式r的元组； (4.1) Select the tuple of the pattern r with the highest support in the R set ;

（4.2）通过调用SEARCH()方法对候选模式集合R进行拓展，更新候选模式集合R，并将拓展得到的模式调用SAVE()方法存储到L中； (4.2) Expand the candidate pattern set R by calling the SEARCH() method, update the candidate pattern set R, and store the extended patterns in L by calling the SAVE() method;

（4.3）将元组从R中移除； (4.3) will tuple removed from R;

（4.4）将R中的所有满足条件的元组从R中移除； (4.4) will satisfy all in R tuples of conditions are removed from R;

步骤五，返回含有前k个缺陷模式的集合L。 Step 5, return the set L containing the top k defect patterns.

本发明方法与现有技术相比具有以下优点： Compared with the prior art, the inventive method has the following advantages:

1.对目标源程序进行插装并监控，对目标程序的监控进行设置，针对DPOR算法对循环较多的程序，执行效率较低的缺点，排除了对相互独立迁移的选取，并将已经检索过的迁移加入一个专门的集合，避免反复搜索已经执行过的迁移，从而实现了对包含循环较多程序的化简，提高了执行效率，在选择性搜索过程中，记录轨迹序列，将轨迹序列存入序列集合中，得到全部的程序轨迹序列。 1. Insert and monitor the target source program, set the monitoring of the target program, and aim at the shortcomings of the DPOR algorithm for programs with many loops and low execution efficiency, eliminate the selection of independent migration, and retrieve the The migrations that have been performed are added to a special set to avoid repeated searches for migrations that have been executed, thereby simplifying the program that contains many loops and improving execution efficiency. During the selective search process, the trajectory sequence is recorded and the trajectory sequence Store it in the sequence set to get all the program trajectory sequences.

2.针对SPAM算法中用户设置最小支持度带来的模式质量过差和数量过多的问题，引入项数k来规定对最佳模式数量的挖掘，转化为前k项序列模式挖掘，得到前k项效果最佳的缺陷解释模式。基于此，为了达到减少搜索空间，提高算法的效率，同时能够解决缺陷误报问题的目的，定义了成功执行序列集、失效执行序列集、相对支持度概念，所谓成功执行序列集指的是并发程序的执行结果符合程序的预期行为的执行序列集；所谓失效执行序列集指的是并发程序的执行结果不符合程序的预期行为的执行序列集；所谓相对支持度指的频繁出现在失效执行序列集而较少出现在成功执行序列集的一种相对度。在此基础上，本发明提出了在挖掘过程中通过对成功执行序列集和失效执行序列集中同一模式的比较来计算其相对支持度，减少对频繁出现在成功执行序列集中的模式选取，从而降低缺陷误报率的方法。 2. Aiming at the problem of poor pattern quality and too many patterns caused by the minimum support set by the user in the SPAM algorithm, the number of items k is introduced to specify the number of optimal pattern mining, which is transformed into the top k sequential pattern mining, and the top k items are obtained. The best defect explanation model for k items. Based on this, in order to achieve the purpose of reducing the search space, improving the efficiency of the algorithm, and solving the problem of false alarms, the concepts of successful execution sequence set, failure execution sequence set, and relative support are defined. The so-called successful execution sequence set refers to concurrent The execution result of the program conforms to the execution sequence set of the expected behavior of the program; the so-called failure execution sequence set refers to the execution sequence set whose execution result of the concurrent program does not meet the expected behavior of the program; the so-called relative support refers to the execution sequence set that frequently appears in the failure execution sequence A relative measure of sets that occur less frequently in successfully executed sequences. On this basis, the present invention proposes to calculate the relative support degree by comparing the same pattern in the successful execution sequence set and the failure execution sequence set in the mining process, so as to reduce the selection of patterns frequently appearing in the successful execution sequence set, thereby reducing the The method of defect false positive rate.

附图说明 Description of drawings

图1为本发明方法的流程图。 Fig. 1 is the flowchart of the method of the present invention.

具体实施方式 Detailed ways

本实施使用装有Windows 7操作系统的PC机作为平台，通过MyEclipse开发并发程序缺陷检测程序。本发明实施时，用户必须安装1.7版本以上的JDK，以获取Java提供的新特性。 This implementation uses a PC with Windows 7 operating system as a platform to develop a concurrent program defect detection program through MyEclipse. When the present invention is implemented, the user must install JDK above version 1.7 to obtain the new features provided by Java.

此外，需定义如下术语： In addition, the following terms need to be defined:

（1）监控现场：指监控的目标程序运行时的状态信息。 (1) Monitoring site: refers to the status information of the monitored target program when it is running.

（2）通知：指程序执行中明确定义的点（接入点）被程序运行所触发时应该执行的代码。 (2) Notification: Refers to the code that should be executed when a clearly defined point (access point) in program execution is triggered by program execution.

（3）并发程序的状态：指由所有共享变量的全局状态和每个线程的局部状态所组成的状态。 (3) The state of a concurrent program: refers to the state consisting of the global state of all shared variables and the local state of each thread.

（4）并发程序的迁移：指为将程序从一个状态转移到下一个状态，某个线程执行的一个可见操作。 (4) Migration of concurrent programs: Refers to a visible operation performed by a thread to transfer the program from one state to the next.

（5）垂直成功执行序列集：指用位图表示的垂直格式的成功执行序列集。 (5) Vertical successful execution sequence set: refers to the successful execution sequence set in vertical format represented by a bitmap.

（6）垂直失效执行序列集：指用位图表示的垂直格式的失效执行序列集。 (6) Vertical invalidation execution sequence set: refers to the invalidation execution sequence set in vertical format represented by a bitmap.

本发明实施例涉及如下文献： Embodiments of the present invention relate to the following documents:

[1] Flanagan C, Godefroid P. Dynamic Partial-Order Reduction for Model Checking Software POPL＇2005[M]. New York: ACM press, 2005。 [1] Flanagan C, Godefroid P. Dynamic Partial-Order Reduction for Model Checking Software POPL'2005[M]. New York: ACM press, 2005.

[2] Ayres, J., Flannick, J., Gehrke, J., Yiu, T.. Sequential PAttern mining using a bitmap representation, Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002), July 23-26, 2002, Edmonton, Alberta, pp. 429-435 (2002)。 [2] Ayres, J., Flannick, J., Gehrke, J., Yiu, T.. Sequential PAttern mining using a bitmap representation, Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002) , July 23-26, 2002, Edmonton, Alberta, pp. 429-435 (2002).

以下以在PC机上采用Java语言在JDK开发环境下开发并发程序缺陷检测程序为例说明本发明技术方案。 The technical scheme of the present invention will be described below by taking the Java language on a PC to develop a concurrent program defect detection program under the JDK development environment as an example.

（一）设置监控目标程序。 (1) Set up the monitoring target program.

为了获取并发程序执行的状态信息，首先需要对被监控的目标程序进行设置。具体做法是：基于面向方面编程思想，在需监控的目标程序中定义监控点、切点（pointcut）、通知（advice），使用探针注入方式对监控点进行监控以获取状态信息。 In order to obtain the status information of concurrent program execution, it is first necessary to set the target program to be monitored. The specific method is: based on the idea of aspect-oriented programming, define monitoring points, pointcuts, and advice in the target program to be monitored, and use probe injection to monitor the monitoring points to obtain status information.

获取状态信息过程中涉及到的相关变量定义，具体如表1所示。 The definitions of related variables involved in the process of obtaining status information are shown in Table 1.

表1获取状态信息过程中的符号符号名符号类型含义 meg 字符串存放参数表数组args中的行号、序列号信息的临时变量 transform() 方法将执行环境中的线程标识符和meg信息转换成监控现场信息的方法 EventPO 类存储监控现场信息的类 Table 1 Symbols in the process of obtaining status information symbolic name symbol type meaning meg string A temporary variable that stores the line number and serial number information in the parameter table array args transform() method Method for converting thread identifier and meg information in the execution environment into monitoring scene information EventPO kind A class that stores monitoring site information

获取状态信息的步骤如下： The steps to obtain status information are as follows:

步骤二，由用户定义每一个切点对应的通知，同时定义该通知需执行的处理代码，当切点匹配时，通知内定义的处理代码在切点之前，或在切点之后，或在切点附近这3处执行，通知内的transform()方法将临时变量meg中的内容以及执行环境中的线程标识符转换成监控现场信息，并存入在事件类EventPO中。 Step 2. The user defines the notification corresponding to each cut point, and defines the processing code to be executed for the notification. When the cut point matches, the processing code defined in the notification is before, after, or after the cut point. These 3 places are executed near the point. The transform() method in the notification converts the content in the temporary variable meg and the thread identifier in the execution environment into monitoring site information, and stores it in the event class EventPO.

（二）使用动态标记迁移算法DLT对设置监控点的目标程序的执行进行调度。 (2) Use the dynamic label transfer algorithm DLT to schedule the execution of the target program that sets the monitoring point.

使用本发明提出的动态标记迁移算法DLT，对待监控的并发程序中的线程执行进行调度。 The thread execution in the concurrent program to be monitored is scheduled by using the dynamic label transfer algorithm DLT proposed by the present invention.

本算法要用到的相关符号定义如表2所示。 The definitions of the relevant symbols used in this algorithm are shown in Table 2.

表2.动态标记迁移算法DLT中的符号符号名符号类型含义 update_backtrack_info 文献[1]中定义的方法更新回溯信息的方法 s 程序状态共享变量的全局状态和每个线程的局部状态组成的状态 SeS 集合由所有序列构成的集合 enabled 集合某一状态下可行迁移集合，其集合元素是可行的迁移 p 线程 s.enabled中的某一线程 t 迁移使得状态发生转移的一个可见操作 σ 序列 SeS中的当前序列 t_m 迁移该迁移属于s状态下可行迁移集合s.enabled中的迁移，且执行该迁移的线程是状态s下已经执行过的迁移集合s.done中的线程 t_n 迁移该迁移属于s状态下可行迁移集合s.enabled中线程q执行的迁移 retrieved 集合存放某一状态下能够执行却无执行必要的全部迁移的集合 q 线程某一线程 backtrack 集合某一状态下的回溯集，其集合元素是可执行但尚未执行的线程 done 集合某一状态下已经执行过的迁移集合 next(s,t) 方法从状态s出发且执行t迁移 s' 程序状态从状态s出发且执行t_n迁移后得到的状态 T 集合由所有迁移构成的集合 S 栈由所有状态构成的栈 p’ 集合执行迁移t_m的线程 is 项集添加到序列变量σ中的项集 t' 迁移该迁移属于s.enabled，独立于迁移t Table 2. Symbols in Dynamic Label Transfer Algorithm DLT symbolic name symbol type meaning update_backtrack_info The method defined in [1] Methods for Updating Backtrace Information the s program status A state consisting of the global state of shared variables and the local state of each thread SeS gather the set of all sequences enabled gather A set of feasible migrations in a certain state whose elements are feasible migrations p thread A thread in s.enabled t migrate A visible operation that causes a state transition σ sequence Current sequence in SeS t _m migrate This migration belongs to the migration in the feasible migration set s.enabled in the state s, and the thread that executes the migration is the thread in the migration set s.done that has been executed in the state s t _n migrate This migration belongs to the migration performed by thread q in the feasible migration set s.enabled in state s retrieved gather Store a collection of all migrations that can be executed in a certain state but are not necessary q thread a certain thread back track gather A traceback set in a certain state whose set elements are executable but not yet executed threads done gather A collection of migrations that have been executed in a certain state next(s,t) method Start from state s and perform t transition s' program status The state obtained after starting from state s and performing t _n transition T gather A collection of all migrations S the stack a stack of all states p' gather thread executing migration t _m is item set Itemsets added to the sequence variable σ t' migrate This migration belongs to s.enabled, independent of migration t

所述动态标记迁移算法DLT(S, T, t,SeS)的步骤如下： The steps of the dynamic label transfer algorithm DLT (S, T, t, SeS) are as follows:

算法输入参数有：全局状态栈S，全局迁移序列集T，局部迁移t和初始的全局序列集SeS。算法输出参数有：全局序列集SeS。 The input parameters of the algorithm are: the global state stack S, the global transition sequence set T, the local transition t and the initial global sequence set SeS. The output parameters of the algorithm are: global sequence set SeS.

（1）将S中的栈顶元素出栈存入状态s，将SeS中的当前序列存入序列变量σ中； (1) Pop the top element of S into the state s, store the current sequence in SeS into the sequence variable σ;

（2）更新s的回溯信息； (2) Update the backtracking information of s;

（3）如果存在线程p和s状态下可行迁移集合s.enabled中线程p的一个迁移元素t，执行以下步骤： (3) If there is a migration element t of thread p in the feasible migration set s.enabled in the state of thread p and s, perform the following steps:

（4）返回全局序列集SeS。 (4) Return the global sequence set SeS.

其具体算法如下。 Its specific algorithm is as follows.

（三）把监控到的状态信息，以轨迹序列的形式存储在数据库中。 (3) Store the monitored state information in the database in the form of trajectory sequence.

把监控获取到的状态信息，存储在事件类EventPO中，通过Dao方法将事件类EventPO中的信息以轨迹序列的形式存储在数据库SDB中。类EventPO记录每个监控点事件；类SequencePO记录每次执行的序列信息，其中包括序列id、序列结果状态（status）。 Store the state information acquired by monitoring in the event class EventPO, and store the information in the event class EventPO in the database SDB in the form of a track sequence through the Dao method. Class EventPO records each monitoring point event; class SequencePO records sequence information for each execution, including sequence id and sequence result status (status).

使用Hibernate进行对象关系映射ORM（Object Relationship Mapping，ORM）操作，和数据访问对象DAO（Data Access Object，DAO）操作Java中的持久化对象PO（persistant object，PO），完成将EventPO和SequencePO中数据存储到数据库SDB。 Use Hibernate to perform object relationship mapping ORM (Object Relationship Mapping, ORM) operation, and data access object DAO (Data Access Object, DAO) to operate the persistent object PO (persistent object, PO) in Java, and complete the data in EventPO and SequencePO Stored to the database SDB.

事件类EventPO的定义如下： The event class EventPO is defined as follows:

public class EventPO { public class EventPO {

private int primaryKey; //表示项的序号 Private int primaryKey; //Indicates the serial number of the item

private int threadID; //表示线程的序号 Private int threadID; //Indicates the serial number of the thread

private String threadName;//表示线程名 private String threadName;//Indicates the thread name

private int methodLine;//表示被监控方法所在行 Private int methodLine;//Indicates the line where the monitored method is located

private String methodName;//表示被监控方法名 private String methodName;//Indicates the name of the monitored method

private int shareMemoryIndex;//表示共享内存的序号 private int shareMemoryIndex;//Indicates the serial number of the shared memory

private String shareMemoryValue;//表示共享内存的值 private String shareMemoryValue;//Indicates the value of shared memory

private Date date;//表示操作日期 private Date date;//Indicates the operation date

private int millisecond;//表示操作时间 Private int millisecond;//Indicates the operation time

private SequencePO sequence;//表示所属的序列号 private SequencePO sequence;//Indicates the sequence number it belongs to

} }

上述定义中： In the above definition:

primaryKey表示项的序号； primaryKey represents the serial number of the item;

threadID表示线程的序号； threadID represents the serial number of the thread;

threadName表示线程名； threadName represents the thread name;

methodLine表示被监控方法所在行； methodLine indicates the line where the monitored method is located;

methodName表示被监控方法名； methodName indicates the name of the monitored method;

shareMemoryIndex表示共享内存的序号； shareMemoryIndex indicates the serial number of the shared memory;

shareMemoryValue表示共享内存的值； shareMemoryValue represents the value of shared memory;

date表示操作日期； date represents the operation date;

millisecond表示操作时间； millisecond indicates the operation time;

sequence表示所属的序列号。 sequence indicates the sequence number it belongs to.

（四）获取数据库中每条轨迹序列的执行结果，对轨迹序列进行分类，得到成功执行序列集和失效执行序列集。 (4) Obtain the execution results of each trajectory sequence in the database, classify the trajectory sequences, and obtain a successful execution sequence set and a failure execution sequence set.

获取轨迹序列中的执行结果状态（status），依据预先定义的失效结果，判定并发程序中序列执行结果的状态是否丢失修改，或是否存在不可重复读、或是否读“脏”数据等情况中的一种，如是则执行结果是失效的，对轨迹序列进行分类，得到成功执行序列集和失效执行序列集。 Obtain the execution result status (status) in the track sequence, and determine whether the status of the sequence execution result in the concurrent program is lost or modified, or whether there is non-repeatable read, or whether to read "dirty" data, etc. according to the predefined failure result One, if yes, the execution result is invalid, classify the trajectory sequence, and obtain a successful execution sequence set and a failure execution sequence set.

（五）基于轨迹序列的并发程序缺陷模式挖掘。 (5) Concurrent program defect pattern mining based on trajectory sequence.

在成功执行序列集和失效执行序列集中，利用以下改进的序列模式挖掘算法ISPAM对存储在数据库的数据集进行挖掘，得到并发程序缺陷模式。 In the successful execution sequence set and the failure execution sequence set, the following improved sequence pattern mining algorithm ISPAM is used to mine the data set stored in the database to obtain the concurrent program defect pattern.

序列模式挖掘算法ISPAM的相关变量含义如表3所示。 The meanings of the relevant variables of the sequential pattern mining algorithm ISPAM are shown in Table 3.

表3 序列模式挖掘算法ISPAM中的符号符号名符号类型含义 R 集合候选模式集合 L 集合前k个缺陷模式的集合 minsup 双精度变量最小支持度 GS 集合成功执行序列集 BS 集合失效执行序列集 V(GS) 集合垂直成功执行序列集 V(BS) 集合垂直失效执行序列集 S_init 列表项集列表 SAVE() 方法实现存储过程的方法 items 项 S_init中的一个项三元组可拓展序列记录元组 SEARCH() 文献[2]中定义的方法实现候选集生成过程的方法 r 模式模式挖掘过程中的一个候选模式 k 整型变量模式的个数 sup(r) 方法计算r的支持度的方法 bv(items) 方法计算items是否为频繁项的方法 Table 3 Symbols in the sequential pattern mining algorithm ISPAM symbolic name symbol type meaning R gather set of candidate patterns L gather Collection of top k defect patterns minsup double variable minimum support GS gather Successful execution of the sequence set BS gather Failed Execution Sequence Set V(GS) gather Vertical successfully executed sequence set V(BS) gather Vertical Failure Execution Sequence Set S _init the list itemset list SAVE() method How to implement a stored procedure items item An entry in S _init Triad Extensible Sequence Record Tuple SEARCH() The method defined in [2] A method to implement the candidate set generation process r model A Candidate Pattern in Pattern Mining Process k integer variable number of patterns sup(r) method How to calculate the support of r bv(items) method Method for calculating whether items are frequent items

所述序列模式挖掘算法ISPAM的步骤如下： The steps of the sequential pattern mining algorithm ISPAM are as follows:

（1）将候选模式集合R和前k个缺陷模式的集合L设置为空集，并设置全局变量minsup为0； (1) Set the candidate pattern set R and the top k defect pattern set L as empty sets, and set the global variable minsup to 0;

（2）步骤二，将成功执行序列集GS和失效执行序列集BS分别转化为垂直序列数据集V(GS)和V(BS)，生成V(BS)中项集列表S_init； (2) Step 2: Transform the successful execution sequence set GS and the failure execution sequence set BS into vertical sequence data sets V(GS) and V(BS) respectively, and generate the item set list S _init in V(BS);

（3）对于项集列表S_init中的每一个项items，完成以下步骤： (3) For each item in the itemset list S _init , complete the following steps:

（4）如果存在元组属于R集合并且r的支持度大于等于minsup，则重复做以下步骤，直到不满足这一条件为止： (4) If there is a tuple Belongs to the R set and the support of r is greater than or equal to minsup, then repeat the following steps until this condition is not met:

（4.3）将元组从R中移除； (4.3) will tuple removed from R;

（5）返回含有前k个缺陷模式的集合L。 (5) Return the set L containing the top k defect patterns.

其具体算法如下。 Its specific algorithm is as follows.

Claims

1. a defect mode discover method for concurrent program, is characterized in that the method comprises the following steps:

(1) monitoring objective program, uses the method for implanting of Aspect-oriented programming probe identification code to monitor control point to obtain status information;

(2) use the execution of dynamically labeled migration algorithm DLT to the target program arranging control point to dispatch, obtain the status information of the control point of target program;

(3) the status information monitored, store in a database with the form of track sets, obtain the execution result of every bar track sets in database, classify to track sets, succeeding to perform sequence sets and lost efficacy performs sequence sets;

(4) perform in sequence sets at successful execution sequence sets and inefficacy, utilize Sequential Pattern Mining Algorithm ISPAM to excavate, obtain target program defect mode.

2. the defect mode discover method of concurrent program according to claim 1, is characterized in that the step of the acquisition status information described in step (1) is as follows:

Step one, based on Aspect-oriented programming thought and source program to be monitored, for the read-write operation definition control point of shared variable in concurrent program to be monitored, then adds several point of contacts at control point place;

Step 2, define the notice that each point of contact is corresponding, define the process code that this notice need perform simultaneously, when point of contact is mated, in notice, the process code of definition is before point of contact, or after point of contact, or this 3 place performs near point of contact, transform () method in notice converts the content being used for depositing in the temporary variable meg of the line number comprised in the parameter list array args of monitored method in source program, sequence number information and the thread identifier in execution environment to monitoring site information, and stored in event class EventPO.

3. the defect mode discover method of concurrent program according to claim 1, it is characterized in that the dynamically labeled migration algorithm DLT (S described in step (2), T, t, SeS) algorithm input parameter has: global state stack S, overall situation migration series collection T, local migration t and initial global sequence collection SeS; Algorithm output parameter has: global sequence collection SeS; Its step is as follows:

Step one, pops stored in state s by the stack top element in S, by the current sequence in SeS stored in sequence variables σ;

Step 2, upgrades the traceback information of s;

Step 3, if a migration element t of s.enabled thread p is gathered in feasible migration under there is thread p and s state, performs following steps:

(3.1) by thread p assignment to the retraceable set s.backtrack of state s;

(3.2) the migration set s.done performed under creation state s, and be set to sky;

(3.3) if the element in the difference set of the migration set s.done performed under there is the retraceable set s.backtrack and state s that a certain thread q is state s, following steps are repeated, until there is not such thread q:

(3.3.1) q is added in the migration set s.done performed under state s;

(3.3.2) from the retraceable set s.backtrack of state s, thread q is deleted;

(3.3.3) p ' is established to perform migration t _mthread, if thread p ' belongs to s.done, and t _mbelong to s.enabled, then by t _madd in set s.retrieved;

(3.3.4) migration that migration set s.enabled thread q feasible under s state performs is set to t _n;

(3.3.5) create item collection is and be initialized as sky, and by t _nadd in is;

(3.3.6) item being collected is adds in sequence variables σ;

(3.3.7) judge from state s and perform t _nwhether the state arrived after migration is null value, if not, then perform following steps, otherwise do not perform:

(3.3.7.1) will from state s and perform t _nthe state obtained after migration is set to s';

(3.3.7.2) migration t' in s.enabled will be belonged to stored in s'.retrieved set independent of migration t;

(3.3.7.3) difference set gathered by s'.enabled and s'.retrieved is stored in s'.enabled;

(3.3.7.4) t will be moved _nbe added in migration series T;

(3.3.7.5) state s' is pressed in state stack S;

(3.3.7.6) recursive call DLT (S, T, t, SeS) algorithm;

(3.3.7.7) sequence of last in T is shifted out;

(3.3.8) state of in S is popped;

Step 4, returns global sequence collection SeS.

4. the defect mode discover method of concurrent program according to claim 1, is characterized in that the step of Sequential Pattern Mining Algorithm ISPAM in step (4) is as follows:

Step one, is set to empty set by the set L of candidate pattern set R and a front k defect mode, and to arrange global variable minsup be 0;

Step 2, is separately converted to vertical series data set V (GS) and V (BS) by successful execution sequence sets GS and the execution sequence sets BS that lost efficacy, generates V (BS) middle term collection list S _init;

Step 3, for item collection list S _initin each items, complete following steps:

(3.1) whether calculate it is frequent episode, if:

(3.1.1) SAVE () method is called to the pattern storage be made up of single item in L;

(3.1.2) add-ins items is gathered at R, S _init, S _initin the tlv triple of set composition of item after items;

, if there is tuple in step 4 belong to R set and the support of r is more than or equal to minsup, then repeat to do following steps, until do not meet this condition:

(4.1) tuple having the pattern r of the highest support in R set is selected ;

(4.2) by calling SEARCH () method, candidate pattern set R being expanded, upgrading candidate pattern set R, and be stored in L by expanding mode invocation the SAVE () method obtained;

(4.3) by tuple remove from R;

(4.4) by R all meet the tuple of condition removes from R;

Step 5, returns the set L containing a front k defect mode.