CN119396687A

CN119396687A - Software defect location method and device based on multiple views

Info

Publication number: CN119396687A
Application number: CN202411363094.9A
Authority: CN
Inventors: 谢晓园; 周纯英; 陈功
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2024-09-27
Filing date: 2024-09-27
Publication date: 2025-02-07

Abstract

The application relates to the technical field of software engineering, and in particular to a multi-view-based software defect positioning method and device. The method comprises: extracting multiple kinds of relation data between defect reports and source code files in a software project; performing characterization learning on the multi-source relations between the defect reports and the source code files based on a graph neural network and a contrast learning technology to obtain characterizations of the defect reports and the source code files; calculating the feature similarity between a defect report and the source code files; generating, according to the similarity, a similarity ranking of the source code files for the corresponding defect report; selecting, based on the similarity ranking, the source code files whose similarity is greater than a preset similarity; and determining defect positioning information in the software project by using the selected source code files. The application thereby addresses problems in the related art, namely that multi-source information, a key avenue for improving accuracy, cannot be fully mined and utilized, which results in low positioning accuracy, that the influence of multi-source information on positioning performance has not been analyzed, and that an effective strategy is lacking.

Description

Software defect positioning method and device based on multiple views

Technical Field

The application relates to the technical field of software engineering, in particular to a software defect positioning method and device based on multiple views.

Background

Project teams often receive a large number of defect reports during the life cycle of a software system. Once a defect report is received, developers need to analyze the source code files quickly and in depth in order to accurately identify and repair the underlying defect, so efficient defect localization techniques are particularly critical.

In the related art, defect positioning techniques fall mainly into two categories: spectrum-based methods, which rely on dynamic spectrum information collected from different executions of a program to help locate the source of a fault, and information retrieval-based methods, which need no dynamic execution information and instead analyze the text of defect reports and source code files, locating defects by evaluating the correlation between the two. Information retrieval-based methods can in turn be divided into two main types: text matching methods and semantic matching methods. Text matching methods directly use lexical-level similarity, treating the defect report as a query and the program modules as a corpus for text retrieval. The introduction of semantic matching methods has brought new vitality to information retrieval-based defect positioning, and such methods are expected to become the mainstream direction of the field.

However, when the text information of a defect report is deficient, the related art fails to fully mine and exploit multi-source information, a key avenue for improving accuracy, which results in low positioning accuracy. In addition, the potential influence of multi-source information on positioning performance has not been analyzed in depth, and no effective strategy is available for adaptively extracting and integrating valuable data from multiple information sources.

Disclosure of Invention

The application provides a software defect positioning method and device based on multiple views, which are used to solve the following problems in the related art: when the text information of a defect report is deficient, multi-source information, a key avenue for improving accuracy, cannot be fully mined and utilized, which results in low positioning accuracy; the potential influence of multi-source information on positioning performance has not been analyzed in depth; and no effective strategy is available for adaptively extracting and integrating valuable data from multiple information sources.

An embodiment of the first aspect of the application provides a software defect positioning method based on multiple views, which comprises the following steps: extracting multiple kinds of relation data between defect reports and source code files in a software project, and performing characterization learning on the multi-source relations between the defect reports and the source code files based on a graph neural network and a contrast learning technology to obtain characterizations of the defect reports and the source code files; calculating the feature similarity between a defect report and the source code files based on these characterizations, and generating, according to the similarity, a similarity ranking of the source code files for the corresponding defect report; and selecting, based on the similarity ranking, the source code files whose similarity is greater than a preset similarity, and determining defect positioning information in the software project by using the selected source code files.
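The ranking and selection steps described above can be illustrated with a minimal sketch. It assumes the characterizations are already available as NumPy arrays and that cosine similarity is the feature similarity; the function name `rank_source_files`, the file names, and the threshold value are hypothetical and are not taken from the application.

```python
import numpy as np

def rank_source_files(report_vec, file_vecs, file_names, threshold=0.5):
    """Rank candidate source files for one defect report and keep those above a threshold.

    report_vec: (d,) characterization of the defect report (assumed precomputed)
    file_vecs:  (n, d) characterizations of the candidate source code files
    threshold:  hypothetical "preset similarity"
    """
    # Normalize so that a dot product equals cosine similarity.
    r = report_vec / np.linalg.norm(report_vec)
    f = file_vecs / np.linalg.norm(file_vecs, axis=1, keepdims=True)
    sims = f @ r
    order = np.argsort(-sims)  # indices sorted by descending similarity
    ranking = [(file_names[i], float(sims[i])) for i in order]
    selected = [(name, s) for name, s in ranking if s > threshold]
    return ranking, selected

# Toy data: three candidate files in a 2-dimensional characterization space.
report = np.array([1.0, 0.0])
files = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
names = ["A.java", "B.java", "C.java"]
ranking, selected = rank_source_files(report, files, names, threshold=0.5)
```

In this toy example the selected files are the ones a developer would inspect first; how the preset similarity is chosen is left open by the source.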

Optionally, in an embodiment of the present application, extracting the multiple kinds of relation data between the defect reports and the source code files in the software project and performing characterization learning on the multi-source relations based on the graph neural network and the contrast learning technology to obtain the characterizations of the defect reports and the source code files includes: extracting, from the multiple kinds of relation data, repair history data between defect reports and source code files, similarity data between defect reports, and co-reference data between source code files, and constructing a separate view from each, wherein the views may include, but are not limited to, a historical interaction view, a defect report similar view, and a source code co-reference view; performing characterization learning, using the graph neural network, on the historical interaction view constructed from the repair history data to obtain a first defect report characterization matrix of the defect reports and a first source code file characterization matrix of the source code files; performing characterization learning, using the graph neural network, on the defect report similar view constructed from the similarity data to obtain a second defect report characterization matrix of the defect reports; and performing characterization learning, using the graph neural network, on the source code co-reference view constructed from the co-reference data to obtain a second source code file characterization matrix of the source code files.

Optionally, in an embodiment of the present application, extracting the multiple kinds of relation data and performing characterization learning on the multi-source relations based on the graph neural network and the contrast learning technology further includes: calculating a matching score between a defect report and a source code file by using an objective function, based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix; performing information interaction between the first defect report characterization matrix and the second defect report characterization matrix to obtain the defect report characterization distance of the same sample and the defect report characterization distance of different samples; performing information interaction between the first source code file characterization matrix and the second source code file characterization matrix to obtain the source code file characterization distance of the same sample and the source code file characterization distance of different samples; and optimizing the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix by combining the matching score, the defect report characterization distances, and the source code file characterization distances with the contrast learning technology and a multi-task optimization strategy, the optimized matrices serving as the final feature characterizations of the defect reports and the source code files.

Optionally, in an embodiment of the present application, the calculation formula of the matching score may be, but is not limited to:

where $z_r$ and $z_c$ are the characterization vectors of defect report $v_r$ and source code file $v_c$, respectively.

The objective function of the matching score may be, but is not limited to:

where the sum runs over all positive sample pairs (i.e., the pairs of a defect report and a source code file that are actually associated), and C is the total number of candidate source code files.

Optionally, in one embodiment of the present application, the objective function of the contrast learning may be, but is not limited to:

where the characterizations of the same sample in different views form the positive sample pairs, the characterizations of different samples form the negative sample pairs, the similarity function calculates the cosine similarity of two characterization vectors, |R| and |C| are the numbers of defect reports and source code files, respectively, and τ is the temperature hyperparameter of contrast learning.

An embodiment of the second aspect of the application provides a multi-view-based software defect positioning device, which comprises a learning module, a sorting module, and a determining module. The learning module is used for extracting multiple kinds of relation data between defect reports and source code files in a software project, and performing characterization learning on the multi-source relations between the defect reports and the source code files based on a graph neural network and a contrast learning technology to obtain characterizations of the defect reports and the source code files. The sorting module is used for calculating the feature similarity between a defect report and the source code files based on these characterizations, and generating, according to the similarity, a similarity ranking of the source code files for the corresponding defect report. The determining module is used for selecting, based on the similarity ranking, the source code files whose similarity is greater than a preset similarity, and determining defect positioning information in the software project by using the selected source code files.

Optionally, in one embodiment of the present application, the learning module includes: an obtaining unit configured to extract, from the multiple kinds of relation data between the defect reports and the source code files, repair history data between defect reports and source code files, similarity data between defect reports, and co-reference data between source code files, and to construct a separate view from each, where the views may include, but are not limited to, a historical interaction view, a defect report similar view, and a source code co-reference view; a first learning unit configured to perform characterization learning, using the graph neural network, on the historical interaction view constructed from the repair history data to obtain a first defect report characterization matrix of the defect reports and a first source code file characterization matrix of the source code files; a second learning unit configured to perform characterization learning, using the graph neural network, on the defect report similar view constructed from the similarity data to obtain a second defect report characterization matrix of the defect reports; and a third learning unit configured to perform characterization learning, using the graph neural network, on the source code co-reference view constructed from the co-reference data to obtain a second source code file characterization matrix of the source code files.

Optionally, in one embodiment of the present application, the learning module further includes: a matching unit configured to calculate a matching score between a defect report and a source code file by using an objective function, based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix; a first generating unit configured to perform information interaction between the first defect report characterization matrix and the second defect report characterization matrix to obtain the defect report characterization distance of the same sample and the defect report characterization distance of different samples; a second generating unit configured to perform information interaction between the first source code file characterization matrix and the second source code file characterization matrix to obtain the source code file characterization distance of the same sample and the source code file characterization distance of different samples; and a third generating unit configured to optimize the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix by combining the matching score, the defect report characterization distances, and the source code file characterization distances with the contrast learning technology and a multi-task optimization strategy, the optimized matrices serving as the final feature characterizations of the defect reports and the source code files.

The objective function of the matching score may be, but is not limited to:

An embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the multi-view based software defect localization method as described in the above embodiment.

A fourth aspect of the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements a multi-view based software defect localization method as above.

A fifth aspect of the present application embodiment provides a computer program product comprising a computer program which when executed implements a multi-view based software defect localization method as above.

According to the embodiments of the application, multi-source relation views can be constructed from the defect reports and source code files in a software project, and a graph neural network is used to perform deep characterization learning on the nodes in each view, so that the three relations are explicitly encoded into the characterizations of the defect reports and the source code files. The similarity between the defect report characterization and the source code file characterizations is then calculated, the source code files whose similarity is greater than a preset similarity are selected, and the defect positioning information in the software project is determined. When the text information of a defect report is deficient, deep mining and efficient use of multi-source information compensate for the missing text information. In addition, contrast learning and multi-task joint optimization automatically integrate information from the various sources into the representation learning of defect reports and source code files, so that a more comprehensive and accurate characterization model is constructed and the performance and efficiency of defect positioning are significantly improved. The application thereby addresses problems in the related art, namely that multi-source information, a key avenue for improving accuracy, cannot be fully mined and utilized, which results in low positioning accuracy, that the influence of multi-source information on positioning performance has not been analyzed, and that an effective strategy is lacking.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of a software defect localization method based on multiple views according to an embodiment of the present application;

FIG. 2 is a block diagram of a multi-source relational formal structure provided in accordance with one embodiment of the application;

FIG. 3 is a block diagram of a process for software defect localization according to one embodiment of the present application;

FIG. 4 is a flow chart of the working principle of a multi-view based software defect localization method according to an embodiment of the present application;

FIG. 5 is a block diagram of a multi-view based software defect localization apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.

The following describes a multi-view software defect positioning method and device according to embodiments of the present application with reference to the accompanying drawings. The background section identified the following problems: when the text information of a defect report is deficient, multi-source information, a key avenue for improving accuracy, cannot be fully mined and utilized, which results in low positioning accuracy; the potential influence of multi-source information on positioning performance has not been analyzed in depth; and no effective strategy is available for adaptively extracting and integrating valuable data from multiple information sources. To address these problems, the application provides a multi-view-based software defect positioning method. In this method, multi-source relation views can be constructed from the defect reports and source code files in a software project, and a graph neural network is used to perform deep characterization learning on the nodes in each view, so that the three relations are explicitly encoded into the characterizations of the defect reports and the source code files. The similarity between the defect report characterization and the source code file characterizations is then calculated, the source code files whose similarity is greater than a preset similarity are selected, and the defect positioning information in the software project is determined. When the text information of a defect report is deficient, deep mining and efficient use of multi-source information compensate for the missing text information. In addition, contrast learning and multi-task joint optimization automatically integrate information from the various sources into the representation learning of defect reports and source code files, so that a more comprehensive and accurate characterization model is constructed and the performance and efficiency of defect positioning are significantly improved. The problems of the related art described above are thereby addressed.

Specifically, fig. 1 is a flowchart of a software defect positioning method based on multiple views according to an embodiment of the present application.

As shown in fig. 1, the multi-view-based software defect localization method includes the following steps:

In step S101, various relationship data between the defect report and the source code file in the software project are extracted, and the multi-source relationship between the defect report and the source code file is subjected to characterization learning based on the graph neural network and the contrast learning technology, so as to obtain the characterization of the defect report and the source code file.

As a possible implementation, the embodiment of the application can extract multiple kinds of relation data between defect reports and source code files from the software project, and construct the repair history relation between defect reports and source code files, the similarity relation between defect reports, and the co-reference relation between source code files into separate views. A graph neural network is then used to perform characterization learning on the nodes in each view, so that the repair history relation, the similarity relation, and the co-reference relation are explicitly encoded into the characterizations, yielding the feature characterization of each defect report and of each source code file.

Optionally, in one embodiment of the application, extracting the multiple kinds of relation data between the defect reports and the source code files in the software project and performing characterization learning on the multi-source relations based on the graph neural network and the contrast learning technology to obtain the characterizations of the defect reports and the source code files includes: extracting, from the multiple kinds of relation data, repair history data between defect reports and source code files, similarity data between defect reports, and co-reference data between source code files, and constructing a separate view from each, wherein the views comprise a historical interaction view, a defect report similar view, and a source code co-reference view; performing characterization learning, using the graph neural network, on the historical interaction view constructed from the repair history data to obtain a first defect report characterization matrix of the defect reports and a first source code file characterization matrix of the source code files; performing characterization learning, using the graph neural network, on the defect report similar view constructed from the similarity data to obtain a second defect report characterization matrix of the defect reports; and performing characterization learning, using the graph neural network, on the source code co-reference view constructed from the co-reference data to obtain a second source code file characterization matrix of the source code files.

In some embodiments, the historical interaction view of embodiments of the application may be defined as a weighted bipartite graph $G_{R-C} = (V_{R-C}, E_{R-C})$, in which a node $v_i$ ($v_i \in V_{R-C}$) represents either a defect report received during the life cycle of the software project or a class file in the software system. If the source code file that must be repaired for defect report $v_r$ is $v_c$, there is an edge $e_{rc} = (v_r, v_c) \in E_{R-C}$ between them. Further, in the embodiment of the present application, the weight between two nodes may be represented as:

$$a_{rc}(v_r, v_c) = \mathrm{tex\_sim}(v_r, v_c),$$

where $\mathrm{tex\_sim}(v_r, v_c)$ calculates the text similarity score between defect report $v_r$ and source code file $v_c$.

In addition, it should be noted that, in the embodiment of the present application, a CodeBERT model may be used to convert the defect report and the source code file into high-dimensional vectors, and the cosine similarity between the two vectors is calculated and used as the weight between the two nodes.
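As a minimal sketch of the edge weighting just described, the snippet below fills a weight matrix for the historical interaction view. It assumes the text embeddings have already been produced by some encoder (for example CodeBERT) and are passed in as NumPy arrays; the function names and the index-based `fix_pairs` format are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_history_view(fix_pairs, report_emb, file_emb):
    """Weight matrix of the bipartite historical interaction view.

    fix_pairs:  iterable of (r, c) index pairs, meaning report r was repaired in file c
    report_emb: (m, k) text embeddings of defect reports (assumed precomputed)
    file_emb:   (n, k) text embeddings of source code files (assumed precomputed)
    Returns an (m, n) matrix A with A[r, c] = tex_sim(v_r, v_c) on edges and 0 elsewhere.
    """
    A = np.zeros((report_emb.shape[0], file_emb.shape[0]))
    for r, c in fix_pairs:
        A[r, c] = cosine_similarity(report_emb[r], file_emb[c])
    return A

# Toy embeddings standing in for encoder outputs.
report_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
file_emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
A = build_history_view([(0, 0), (1, 1)], report_emb, file_emb)
```

Only pairs present in the repair history receive a weight; pairs that merely look textually similar stay at zero, which keeps the view tied to observed fixes.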

For example, as shown in fig. 2 (a), the embodiment of the present application obtains a historical interaction view based on the weighted bipartite graph.

Further, in the historical interaction view, the node set comprises a set of defect reports $R$ and a set of source code files $C$, where $m$ and $n$ denote the numbers of defect reports and source code files, respectively. The initial features of all nodes in the historical interaction view are randomly initialized from a uniform distribution and used as the input of the graph neural network. For any defect report $v_r$, the neighbor set $N(v_r)$ is defined as the set of source code files in $C$ connected to it by an edge, namely $N(v_r) = \{v_c \in C \mid (v_r, v_c) \in E_{R-C}\}$. Similarly, for any source code file $v_c$, the neighbor set $N(v_c)$ is the set of defect reports in $R$ connected to it by an edge, namely $N(v_c) = \{v_r \in R \mid (v_r, v_c) \in E_{R-C}\}$. A GAT (Graph Attention Network) model is adopted: through a multi-layer iterative process, each node aggregates the feature information of its neighbor nodes and fuses it with its own feature vector from the previous layer, so that the features are updated layer by layer. The last layer outputs an updated defect report characterization matrix, namely the first defect report characterization matrix (of size $m \times d$), and an updated source code file characterization matrix, namely the first source code file characterization matrix (of size $n \times d$), where $d$ is the dimension of the feature vector.
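To make the neighbor aggregation concrete, here is a minimal single-head graph attention layer in NumPy. It follows the standard GAT formulation rather than the specific network of the application: fusing a node with its own previous-layer feature is approximated by a self-loop, edge weights are ignored, and the `tanh` activation and all parameter names are assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """One single-head graph attention layer (standard GAT form, illustrative only).

    H:   (N, d_in) node features from the previous layer
    adj: (N, N) boolean adjacency, True where an edge exists
    W:   (d_in, d_out) shared linear transform
    a:   (2 * d_out,) attention vector
    """
    N = H.shape[0]
    Z = H @ W
    d_out = Z.shape[1]
    # a^T [Z_i || Z_j] splits into a source part and a destination part.
    src = Z @ a[:d_out]
    dst = Z @ a[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    # Self-loops let each node keep its own previous-layer feature (assumption).
    mask = adj | np.eye(N, dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)       # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ Z)

# Toy bipartite view: nodes 0-1 are defect reports, nodes 2-4 are source files.
rng = np.random.default_rng(0)
H0 = rng.uniform(-1.0, 1.0, size=(5, 4))       # uniform random initial features
adj = np.zeros((5, 5), dtype=bool)
for r, c in [(0, 2), (0, 3), (1, 3)]:
    adj[r, c] = adj[c, r] = True
H1 = gat_layer(H0, adj, rng.normal(size=(4, 8)), rng.normal(size=16))
Z_reports, Z_files = H1[:2], H1[2:]            # per-view characterization matrices
```

Stacking several such layers, each with its own `W` and `a`, would give the multi-layer iteration mentioned above; in practice a library implementation would normally be used instead.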

In some embodiments, the defect report similar view of embodiments of the present application may be defined as a weighted homogeneous graph $G_{R-R} = (V_{R-R}, E_{R-R})$, in which each node represents a defect report. If two defect reports $v_r$ and $u_r$ share a commonly repaired source code file, there is a certain correlation between them, i.e., there is an edge $(v_r, u_r) \in E_{R-R}$ between them. Further, in the embodiment of the present application, the weight between two nodes may be represented as the text similarity score between the two defect reports $v_r$ and $u_r$.

For example, as shown in FIG. 2(b), two defect reports have a certain degree of similarity (indicated by the short dashed line in FIG. 2(b)) when the same defective file was repaired for both of them. The embodiment of the application constructs all similarity relations among the defect reports in this manner to form the defect report similar view, and the weight between two nodes is calculated as described above.

Furthermore, in the embodiment of the application, the nodes of the defect report similar view comprise only the defect reports $R$, and the neighbor set of any defect report $v_r$ is $N(v_r) = \{u_r \in R \mid (v_r, u_r) \in E_{R-R}\}$. The similarity relations between defect reports are subjected to characterization learning by a GAT model, which finally outputs an updated defect report characterization matrix, namely the second defect report characterization matrix.

In some embodiments, the source code co-reference view of embodiments of the present application may be defined as a weighted homogeneous graph $G_{C-C} = (V_{C-C}, E_{C-C})$, in which each node represents a source code file. If two source code files $v_c$ and $u_c$ were modified together to repair the same defect report, there is a certain correlation between them, i.e., there is an edge $(v_c, u_c) \in E_{C-C}$ between them. Further, in the embodiment of the present application, the weight between two nodes may be, but is not limited to, the number of co-references between the two source code files $v_c$ and $u_c$; the larger the number of co-references, the greater the correlation between the two files.

For example, as shown in FIG. 2(b), when two source code files are both defective files related to the same defect report, there is a co-reference relationship between them (shown by the long dashed line in FIG. 2(b)). The embodiment of the application constructs all co-reference relations among the source code files in this manner to form the source code co-reference view, and the weight between two nodes is calculated as described above.

Furthermore, in the embodiment of the application, the nodes of the source code co-reference view comprise only the source code files $C$, and the neighbor set of any source code file $v_c$ is $N(v_c) = \{u_c \in C \mid (v_c, u_c) \in E_{C-C}\}$. The co-reference relations between source code files are subjected to characterization learning by the GAT model, which finally outputs an updated source code file characterization matrix, namely the second source code file characterization matrix.
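The two homogeneous views can both be derived from the same repair history. The following sketch assumes the history is given as a 0/1 report-by-file matrix and that report embeddings are precomputed; the function name and input layout are hypothetical, and cosine similarity stands in for the text similarity score.

```python
import numpy as np

def build_homogeneous_views(fix_matrix, report_emb):
    """Derive the defect report similar view and the source code co-reference view.

    fix_matrix: (m, n) 0/1 matrix, 1 where report r was repaired in file c
    report_emb: (m, k) text embeddings of defect reports (assumed precomputed)
    Returns (rr, cc): (m, m) report-report weights and (n, n) co-reference counts.
    """
    F = np.asarray(fix_matrix, dtype=float)
    # cc[i, j] = number of defect reports whose fix touched both file i and file j.
    cc = F.T @ F
    np.fill_diagonal(cc, 0.0)
    # Two reports are linked only if they share at least one repaired file.
    shared = (F @ F.T) > 0
    np.fill_diagonal(shared, False)
    unit = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)
    rr = np.where(shared, unit @ unit.T, 0.0)  # cosine similarity on existing edges
    return rr, cc

# Report 0 fixed files 0 and 1, report 1 fixed file 1, report 2 fixed file 2.
fix_matrix = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
report_emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
rr, cc = build_homogeneous_views(fix_matrix, report_emb)
```

Here reports 0 and 1 are connected because both touched file 1, and files 0 and 1 have a co-reference count of 1 because report 0 touched both.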

Optionally, in one embodiment of the application, extracting the multiple kinds of relation data and performing characterization learning on the multi-source relations based on the graph neural network and the contrast learning technology further includes: calculating a matching score between a defect report and a source code file by using an objective function, based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix; performing information interaction between the first defect report characterization matrix and the second defect report characterization matrix to obtain the defect report characterization distance of the same sample and the defect report characterization distance of different samples; performing information interaction between the first source code file characterization matrix and the second source code file characterization matrix to obtain the source code file characterization distance of the same sample and the source code file characterization distance of different samples; and optimizing the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix by combining the matching score, the defect report characterization distances, and the source code file characterization distances with the contrast learning technology and a multi-task optimization strategy, the optimized matrices serving as the final feature characterizations of the defect reports and the source code files.

The calculation formula of the matching score may be, but is not limited to:

where $z_r$ and $z_c$ are the characterization vectors of defect report $v_r$ and source code file $v_c$, respectively.

The objective function of the matching score may be, but is not limited to:

The objective function of the contrast learning may be, but is not limited to:

It can be understood that the embodiment of the application can strengthen the sharing and fusion of information among different views through contrast learning, while adopting a multi-task optimization strategy to improve both the accuracy of defect positioning and the effectiveness of information sharing among views, as follows:

In some embodiments, the present application focuses on the defect localization task and calculates a matching score between the first defect report characterization matrix and the first source code file characterization matrix in the historical interaction view, where the calculation formula of the matching score may be, but is not limited to:

Further, in some embodiments, the defect localization task may be regarded as a ranking problem, and a cross-entropy loss function is used to train the model so as to maximize the score difference between positive samples and negative samples; the objective function may be defined as, but is not limited to:

In some embodiments, the application may perform cross-view contrast learning to enhance information sharing. It will be appreciated that, because the different views lie in different representation spaces, embodiments of the present application may incorporate a contrast learning mechanism to facilitate information sharing between them. Specifically, for defect reports, the first defect report characterization matrix in the historical interaction view and the second defect report characterization matrix in the defect report similar view undergo information interaction, and more discriminative characterizations are learned by maximizing the mutual information between the two matrices. The same information sharing process is performed for source code files. Contrast learning optimizes the model by comparing the characterization distance of the same sample (positive sample pair) with that of different samples (negative sample pair). The objective function of contrast learning according to the embodiment of the present application may be, but is not limited to:

Wherein the positive sample pairs are the characterizations of the same sample in different views, the negative sample pairs are the characterizations of different samples, the similarity function calculates the cosine similarity of two characterization vectors, |R| and |C| are the numbers of defect reports and source code files, respectively, and τ is the temperature hyperparameter for contrast learning, set to 0.1.
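The cross-view contrast learning objective can be illustrated with a minimal sketch. It assumes an InfoNCE-style form in which each sample's characterization in one view is pulled towards its own characterization in the other view and pushed away from all other samples; this form, the function names and the toy vectors are assumptions for illustration only. Cosine similarity and the temperature τ = 0.1 are the only elements taken from the description.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_view_contrastive_loss(view_a, view_b, tau=0.1):
    """InfoNCE-style loss (an assumed form) between two views of the same samples.

    view_a[i] and view_b[i] characterize the same sample (positive pair);
    view_a[i] and view_b[j] with j != i are negative pairs.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / tau for other in view_b]
        # Log-sum-exp over all candidates, stabilised by subtracting the max.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)

# Toy characterizations of three defect reports in two views (made-up numbers).
history_view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_view = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Shifting the second view by one position breaks every positive pair.
shuffled_view = [aligned_view[1], aligned_view[2], aligned_view[0]]

loss_aligned = cross_view_contrastive_loss(history_view, aligned_view)
loss_shuffled = cross_view_contrastive_loss(history_view, shuffled_view)
```

In this toy case the loss is lower when the two views agree on each sample than when the pairing is broken, which is the behaviour the contrast learning step relies on.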

It will be appreciated that, on the one hand, for a given defect report, the embodiment of the application maps its characterization from the first defect report characterization matrix in the historical interaction view and its characterization from the second defect report characterization matrix in the defect report similar view into the same space through two corresponding MLPs (Multi-Layer Perceptrons). On the other hand, the embodiment of the application pairs the defect report with another randomly selected defect report and obtains the mapped characterization of that report in the same way. The distance between the positive sample pair of the defect report should then be smaller than the distance between its negative sample pair.

Furthermore, the embodiment of the application can adopt multi-task combined optimization to jointly optimize the objective functions of defect positioning and contrast learning, and the overall optimization targets can be as follows:

L=L_FL+λL_CL,

where λ is a hyperparameter balancing the weights of the different tasks, set to 0.01.
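The overall objective L = L_FL + λL_CL can be sketched as follows. This is an illustrative sketch only: it assumes that the matching score is the inner product of z_r and z_c and that the localization loss is a binary cross-entropy over labelled report/file pairs, neither of which is confirmed by the formulas in this text. The helper names, the toy vectors and the contrast loss value of 2.0 are hypothetical.

```python
import math

def dot(u, v):
    """Inner product, used here as an assumed matching score."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def localization_loss(z_reports, z_files, labelled_pairs):
    """Binary cross-entropy over (report index, file index, label) triples."""
    total = 0.0
    for r, c, label in labelled_pairs:
        p = sigmoid(dot(z_reports[r], z_files[c]))
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(labelled_pairs)

def joint_loss(loss_fl, loss_cl, lam=0.01):
    """Overall objective L = L_FL + lambda * L_CL, with lambda = 0.01."""
    return loss_fl + lam * loss_cl

# One defect report and two files: file 0 was fixed for it, file 1 was not.
z_reports = [[1.0, 0.5]]
z_files = [[2.0, 1.0], [-1.0, -1.0]]
correct_labels = [(0, 0, 1), (0, 1, 0)]
flipped_labels = [(0, 0, 0), (0, 1, 1)]

loss_fl = localization_loss(z_reports, z_files, correct_labels)
loss_fl_flipped = localization_loss(z_reports, z_files, flipped_labels)
loss_cl = 2.0  # placeholder value standing in for the contrast learning loss
total = joint_loss(loss_fl, loss_cl)
```

With λ = 0.01 the contrast learning term acts as a light regularizer on top of the localization loss rather than dominating it.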

In step S102, feature similarities between the defect report and the source code file are calculated based on the characterizations of the defect report and the source code file, and a similarity ranking of the source code files is generated for the corresponding defect report according to the similarities.

As a possible implementation manner, according to the defect report feature characterization and the source code file feature characterization, the embodiment of the application calculates the similarity between each defect report and the source code file by using cosine similarity, so as to generate a similarity ranking of the defect report and the source code file.
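The cosine similarity ranking of step S102 can be sketched as follows. The file names and vectors are made-up illustrations, and `rank_files` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_files(report_vec, file_vecs):
    """Return (file name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine(report_vec, vec)) for name, vec in file_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical feature characterizations of one defect report and three files.
report_vec = [1.0, 0.0, 0.0]
file_vecs = {
    "Lexer.java": [0.5, 0.5, 0.0],
    "Logger.java": [0.0, 0.1, 0.9],
    "Parser.java": [0.9, 0.1, 0.0],
}

ranking = rank_files(report_vec, file_vecs)
ranked_names = [name for name, _ in ranking]
```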

In step S103, source code files with similarity greater than a preset similarity are selected based on the similarity sorting, and defect positioning information in the software project is determined by using the selected source code files.

In the actual execution process, the embodiment of the application can determine the defect positioning information in the software project according to the source code files with similarity greater than a certain similarity. The certain similarity may be set by those skilled in the art according to actual situations, and the present application is not particularly limited.

By way of example, the embodiment of the application can recommend to the defect report, according to the similarity ranking, the Top-N source code files whose similarity is greater than a certain similarity, namely those with the highest matching degree, so as to assist developers in quickly locating and repairing defects.

For example, as shown in fig. 3, after the defect report and the source code file in the embodiment of the present application are subjected to feature extraction in the previous steps, a defect report feature and a source code file feature can be obtained respectively, and further, the similarity score of the features between each defect report and the source code file is calculated by using cosine similarity, and the similarity scores are sorted according to the similarity score, and Top-N source code files with the highest matching degree with the defect report are selected as recommendations, so as to assist developers in quickly locating and repairing software defects.
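The selection in step S103 can be sketched as a threshold filter followed by a Top-N cut. The threshold 0.3, N = 2 and the scores below are hypothetical values chosen for illustration; the description leaves the preset similarity to the practitioner.

```python
def recommend(ranking, min_similarity, top_n):
    """Keep files whose similarity exceeds min_similarity, then take the first top_n.

    `ranking` must already be sorted by descending similarity.
    """
    kept = [(name, score) for name, score in ranking if score > min_similarity]
    return kept[:top_n]

# A similarity ranking for one defect report (made-up scores).
ranking = [
    ("Parser.java", 0.99),
    ("Lexer.java", 0.71),
    ("Cache.java", 0.40),
    ("Logger.java", 0.05),
]

top_two = recommend(ranking, min_similarity=0.3, top_n=2)
all_above_threshold = recommend(ranking, min_similarity=0.3, top_n=10)
```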

The working principle of the multi-view software defect positioning method according to the embodiment of the application is explained below with reference to a specific embodiment.

Fig. 4 is a flowchart of an operating principle of a multi-view software defect positioning method according to an embodiment of the present application.

Step S401, input layer.

The embodiment of the application can extract various relation data between the defect report and the source code file from the software project, and construct the repair history relation between defect reports and source code files, the similarity relation between defect reports, and the common reference relation between source code files into different views: a historical interaction view constructed based on the repair history data, a defect report similar view constructed based on the similar data, and a source code common reference view constructed based on the common reference data.
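The three views of the input layer can be sketched as adjacency maps. How similar defect reports and commonly referenced files are detected is outside this sketch; the input pairs, identifiers and the `build_views` helper are hypothetical.

```python
from collections import defaultdict

def build_views(fix_history, similar_reports, co_referenced_files):
    """Build three adjacency maps (node -> set of neighbours), one per view."""
    history_view = defaultdict(set)  # bipartite: defect reports <-> fixed files
    for report, files in fix_history.items():
        for file_name in files:
            history_view[report].add(file_name)
            history_view[file_name].add(report)

    report_view = defaultdict(set)   # defect report <-> similar defect report
    for a, b in similar_reports:
        report_view[a].add(b)
        report_view[b].add(a)

    code_view = defaultdict(set)     # source file <-> commonly referenced file
    for a, b in co_referenced_files:
        code_view[a].add(b)
        code_view[b].add(a)

    return dict(history_view), dict(report_view), dict(code_view)

# Hypothetical relation data extracted from a project.
fix_history = {"BUG-1": ["Parser.java", "Lexer.java"], "BUG-2": ["Parser.java"]}
similar_reports = [("BUG-1", "BUG-2")]
co_referenced_files = [("Parser.java", "Lexer.java")]

history_view, report_view, code_view = build_views(
    fix_history, similar_reports, co_referenced_files
)
```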

Step S402, GNN (Graph Neural Network) embedding layer.

The embodiment of the application can obtain the first defect report characterization matrix of the defect report and the first source code file characterization matrix of the source code file based on the historical interaction view, obtain the second defect report characterization matrix of the defect report based on the defect report similar view, and obtain the second source code file characterization matrix of the source code file based on the source code common reference view.
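As an illustration of how a graph neural network turns one view into characterization vectors, the sketch below runs a weight-free mean-aggregation message passing over an adjacency map and averages the layer outputs. This section does not specify the GNN variant, so the propagation rule, the layer count and the initial vectors are assumptions, and a real model would also learn its parameters.

```python
def propagate(embeddings, adjacency, num_layers=2):
    """Mean-aggregation message passing; returns the average over all layers."""
    layers = [embeddings]
    current = embeddings
    for _ in range(num_layers):
        updated = {}
        for node, vec in current.items():
            neighbours = sorted(adjacency.get(node, ()))
            if not neighbours:
                updated[node] = list(vec)  # isolated node keeps its vector
                continue
            updated[node] = [
                sum(current[n][k] for n in neighbours) / len(neighbours)
                for k in range(len(vec))
            ]
        layers.append(updated)
        current = updated
    return {
        node: [sum(layer[node][k] for layer in layers) / len(layers)
               for k in range(len(vec))]
        for node, vec in embeddings.items()
    }

# Toy historical interaction view: report r1 was fixed in files f1 and f2.
adjacency = {"r1": {"f1", "f2"}, "f1": {"r1"}, "f2": {"r1"}}
initial = {"r1": [1.0, 0.0], "f1": [0.0, 1.0], "f2": [0.0, 1.0]}

characterizations = propagate(initial, adjacency, num_layers=2)
```

After propagation the report vector carries information from the files it was fixed in and vice versa, which is the sense in which a relation is encoded into the characterization.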

Step S403, joint learning layer.

The embodiment of the application can strengthen the encoding of shared information among different views through the contrast learning technology, while adopting a multi-task optimization strategy to improve both the accuracy of defect positioning and the effectiveness of information sharing among views. The contrast learning technology and the multi-task joint optimization are applied as follows:

First, the embodiment of the application performs ranking learning and optimization for the defect positioning task. It may be appreciated that the embodiment of the present application focuses on the defect localization task and calculates the matching score between the first defect report characterization matrix and the first source code file characterization matrix in the historical interaction view, where the calculation formula of the matching score is as described above.

Further, the defect positioning task can be regarded as a ranking problem, and the model is trained with a cross-entropy loss function so as to maximize the score difference between positive samples and negative samples; the objective function is defined as described above.

Second, the embodiment of the present application performs cross-view contrast learning to enhance information sharing. It will be appreciated that, because the different views lie in different representation spaces, embodiments of the present application may incorporate a contrast learning mechanism to facilitate information sharing between them.

Specifically, for defect reports, the first defect report characterization matrix in the historical interaction view and the second defect report characterization matrix in the defect report similar view undergo information interaction, and more discriminative characterizations are learned by maximizing the mutual information between the two matrices. The same information sharing process is performed for source code files. Contrast learning optimizes the model by comparing the characterization distance of the same sample (positive sample pair) with that of different samples (negative sample pair). On the one hand, for a given defect report, its characterization from the first defect report characterization matrix in the historical interaction view and its characterization from the second defect report characterization matrix in the defect report similar view are mapped into the same space through two corresponding MLPs. On the other hand, the defect report is paired with another randomly selected defect report, whose mapped characterization is obtained in the same way. The distance between the positive sample pair of the defect report should then be smaller than the distance between its negative sample pair.

Third, the embodiment of the application performs multi-task joint optimization. It will be appreciated that the embodiment may adopt multi-task joint optimization to jointly optimize the objective functions of defect localization and contrast learning, with the overall optimization objective as described above.

Finally, the embodiment of the application performs similarity ranking. It can be understood that the embodiment of the application can use cosine similarity to calculate the similarity between each defect report and each source code file from the obtained defect report feature characterization and source code file feature characterization, generate the similarity ranking of the source code files for the defect report, and recommend the Top-N source code files with the highest matching degree to the defect report according to the similarity ranking, thereby assisting developers in quickly locating and repairing the defect.

According to the multi-view-based software defect positioning method provided by the embodiment of the application, multi-source relation views can be constructed from the defect reports and source code files in a software project, and a graph neural network performs deep characterization learning on the nodes in each view, so that the three relations are explicitly encoded into the characterizations of the defect reports and source code files. The characterizations are optimized with the contrast learning technology and multi-task joint optimization, the similarity between the defect report feature characterization and the source code file feature characterization is calculated, source code files whose similarity is greater than a certain similarity are selected, and the defect positioning information in the software project is determined. When the text information of a defect report is scarce, deep mining and efficient use of the multi-source information effectively compensates for the missing information. Because the contrast learning technology and multi-task joint optimization automatically encode the information from the various sources into the representation learning of the defect reports and source code files, a more comprehensive and accurate characterization model is constructed, and the performance and efficiency of defect positioning are significantly improved. Therefore, the problems in the related art are solved, namely that multi-source information cannot be fully mined and utilized, resulting in low positioning accuracy, and that the influence of multi-source information on positioning performance is not analyzed and an effective strategy is lacking.

Next, a multi-view-based software defect localization apparatus according to an embodiment of the present application will be described with reference to the accompanying drawings.

Fig. 5 is a block diagram of a multi-view software defect localization apparatus according to an embodiment of the present application.

As shown in fig. 5, the multi-view based software defect localization apparatus 10 includes a learning module 100, a sorting module 200, and a determining module 300.

The learning module 100 is configured to extract various relationship data between the defect report and the source code file in the software project, and perform characterization learning on the multi-source relationship between the defect report and the source code file based on the graph neural network and the contrast learning technology, so as to obtain a characterization of the defect report and the source code file.

The ranking module 200 is configured to calculate feature similarities between the defect report and the source code file based on the defect report and the characterization of the source code file, and generate a similarity ranking of the source code file for the corresponding defect report according to the similarities.

The determining module 300 is configured to select source code files with similarity greater than a preset similarity based on the similarity sorting, and determine defect location information in the software project by using the selected source code files.
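The cooperation of the three modules can be sketched as a small pipeline object. The class name, the callable signatures and the stub implementations are hypothetical; the stubs use a plain inner product and a fixed threshold of 0.5 only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]
Ranking = List[Tuple[str, float]]

@dataclass
class DefectLocalizationApparatus:
    """Three cooperating parts, mirroring modules 100, 200 and 300."""
    learn: Callable[[], Tuple[Embeddings, Embeddings]]  # learning module 100
    rank: Callable[[Vector, Embeddings], Ranking]       # sorting module 200
    select: Callable[[Ranking], List[str]]              # determining module 300

    def locate(self, report_id: str) -> List[str]:
        report_vecs, file_vecs = self.learn()
        ranking = self.rank(report_vecs[report_id], file_vecs)
        return self.select(ranking)

def stub_learn():
    # Stands in for GNN plus contrast learning; returns fixed toy vectors.
    reports = {"BUG-1": [1.0, 0.0]}
    files = {"Parser.java": [0.8, 0.2], "Logger.java": [0.1, 0.9]}
    return reports, files

def stub_rank(report_vec, file_vecs):
    scored = [(name, sum(a * b for a, b in zip(report_vec, vec)))
              for name, vec in file_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def stub_select(ranking):
    return [name for name, score in ranking if score > 0.5]

apparatus = DefectLocalizationApparatus(stub_learn, stub_rank, stub_select)
suspects = apparatus.locate("BUG-1")
```

Keeping the three parts behind separate callables lets each module be replaced independently, for example swapping the stub ranking for a cosine similarity ranking.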

Alternatively, in one embodiment of the present application, the learning module 100 includes an acquisition unit, a first learning unit, a second learning unit, and a third learning unit.

The acquisition unit is used for extracting repair history data between the defect report and the source code file, similar data between defect reports, and common reference data between source code files based on the various relation data between the defect report and the source code file, and constructing them into different views respectively, wherein the views can comprise a historical interaction view, a defect report similar view and a source code common reference view.

The first learning unit is used for constructing the historical interaction view based on the repair history data, and performing characterization learning on the historical interaction view by utilizing the graph neural network so as to obtain a first defect report characterization matrix of the defect report and a first source code file characterization matrix of the source code file.

The second learning unit is used for constructing the defect report similar view based on the similar data, and performing characterization learning on the defect report similar view by utilizing the graph neural network so as to obtain a second defect report characterization matrix of the defect report.

The third learning unit is used for constructing the source code common reference view based on the common reference data, and performing characterization learning on the source code common reference view by utilizing the graph neural network so as to obtain a second source code file characterization matrix of the source code file.

Optionally, in one embodiment of the present application, the learning module 100 further includes a matching unit, a first generating unit, a second generating unit, and a third generating unit.

The matching unit is used for calculating matching scores between the defect report and the source code file by utilizing an objective function based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix and the second source code file characterization matrix.

The first generation unit is used for carrying out information interaction on the first defect report characterization matrix and the second defect report characterization matrix so as to obtain defect report characterization distances of the same sample and defect report characterization distances of different samples.

And the second generating unit is used for carrying out information interaction on the first source code file characterization matrix and the second source code file characterization matrix so as to obtain the source code file characterization distance of the same sample and the source code file characterization distances of different samples.

The third generating unit is used for optimizing the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix and the second source code file characterization matrix by combining the matching score, the defect report characterization distances and the source code file characterization distances with the contrast learning technology and a multi-task strategy, and using the optimized characterization matrices as the final features of the defect report and the source code file for the subsequent similarity ranking.

Alternatively, in one embodiment of the present application, the calculation formula of the matching score may be, but is not limited to:

The objective function of the matching score may be, but is not limited to:

It should be noted that the foregoing explanation of the embodiment of the multi-view software defect positioning method is also applicable to the multi-view software defect positioning device of the embodiment, and will not be repeated herein.

According to the multi-view-based software defect positioning device provided by the embodiment of the application, multi-source relation views can be constructed from the defect reports and source code files in a software project, and a graph neural network performs deep characterization learning on the nodes in each view, so that the three relations are explicitly encoded into the characterizations of the defect reports and source code files. The characterizations are optimized with the contrast learning technology and multi-task joint optimization, the similarity between the defect report feature characterization and the source code file feature characterization is calculated, source code files whose similarity is greater than a certain similarity are selected, and the defect positioning information in the software project is determined. When the text information of a defect report is scarce, deep mining and efficient use of the multi-source information effectively compensates for the missing information. Because the contrast learning technology and multi-task joint optimization automatically encode the information from the various sources into the representation learning of the defect reports and source code files, a more comprehensive and accurate characterization model is constructed, and the performance and efficiency of defect positioning are significantly improved. Therefore, the problems in the related art are solved, namely that multi-source information cannot be fully mined and utilized, resulting in low positioning accuracy, and that the influence of multi-source information on positioning performance is not analyzed and an effective strategy is lacking.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:

a memory 601, a processor 602, and a computer program stored on the memory 601 and executable on the processor 602.

The processor 602 implements the multi-view based software defect localization method provided in the above embodiments when executing a program.

Further, the electronic device further includes:

A communication interface 603 for communication between the memory 601 and the processor 602.

A memory 601 for storing a computer program executable on the processor 602.

The memory 601 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one disk memory.

If the memory 601, the processor 602, and the communication interface 603 are implemented independently, the communication interface 603, the memory 601, and the processor 602 may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.

Alternatively, in a specific implementation, if the memory 601, the processor 602, and the communication interface 603 are integrated on a chip, the memory 601, the processor 602, and the communication interface 603 may perform communication with each other through internal interfaces.

The processor 602 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the application.

Embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a multi-view based software defect localization method as above.

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed, implements a multi-view based software defect localization method as above.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, for example via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A multi-view based software defect localization method, comprising the steps of:

extracting various relation data between a defect report and a source code file in a software project, and performing characterization learning on the multi-source relations between the defect report and the source code file based on a graph neural network and a contrast learning technology to obtain characterizations of the defect report and the source code file;

Calculating feature similarity between the defect report and the source code file based on the defect report and the characterization of the source code file, and generating similarity ordering of the source code file for the corresponding defect report according to the similarity;

And selecting source code files with similarity larger than preset similarity based on the similarity sorting, and determining defect positioning information in the software project by using the selected source code files.

2. The method of claim 1, wherein extracting a plurality of relationship data between a defect report and a source code file in a software project and performing characterization learning on a multi-source relationship between the defect report and the source code file based on a graph neural network and a contrast learning technique to obtain a characterization of the defect report and the source code file comprises:

extracting repair history data between the defect report and the source code file, similar data between defect reports, and common reference data between source code files based on the various relation data between the defect report and the source code file, and constructing different views respectively, wherein the views comprise a historical interaction view, a defect report similar view and a source code common reference view;

constructing the historical interaction view based on the repair history data, and performing characterization learning on the historical interaction view by utilizing the graph neural network to obtain a first defect report characterization matrix of the defect report and a first source code file characterization matrix of the source code file;

performing characterization learning on the defect report similar view constructed based on the similar data by utilizing the graph neural network to obtain a second defect report characterization matrix of the defect report;

and constructing the source code common reference view based on the common reference data, and performing characterization learning on the source code common reference view by utilizing the graph neural network so as to obtain a second source code file characterization matrix of the source code file.

3. The method of claim 2, wherein extracting a plurality of relationship data between defect reports and source code files in a software project and performing characterization learning on a multi-source relationship between the defect reports and the source code files based on a graph neural network and a contrast learning technique to obtain a characterization of the defect reports and the source code files, further comprising:

Calculating a matching score between the defect report and the source code file using an objective function based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix;

Performing information interaction on the first defect report characterization matrix and the second defect report characterization matrix to obtain defect report characterization distances of the same sample and defect report characterization distances of different samples;

Performing information interaction on the first source code file characterization matrix and the second source code file characterization matrix to obtain source code file characterization distances of the same sample and source code file characterization distances of different samples;

And optimizing the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix and the second source code file characterization matrix based on the matching score, the defect report characterization distances and the source code file characterization distances, using the contrast learning technique together with a multi-task strategy, and using the optimized characterization matrices as the final features of the defect report and the source code file for the subsequent calculation of the similarity sorting.
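The joint objective of claim 3 can be sketched as follows. The formulas of claims 4 and 5 are not reproduced in this text, so every concrete choice below is an assumption: summing the two views before an inner-product matching score, a BPR-style pairwise ranking loss, an InfoNCE-style contrast loss in which the same sample seen in two views forms the positive pair, and a single hypothetical weight `weight` combining the tasks.

```python
import numpy as np

def matching_scores(report_a, report_b, file_a, file_b):
    """Assumed fusion: sum the two views per entity, then take inner products."""
    return (report_a + report_b) @ (file_a + file_b).T

def bpr_loss(scores, triples):
    """Pairwise ranking loss over (report, fixed_file, other_file) triples."""
    losses = []
    for r, f_pos, f_neg in triples:
        diff = scores[r, f_pos] - scores[r, f_neg]
        losses.append(np.logaddexp(0.0, -diff))  # equals -log(sigmoid(diff))
    return float(np.mean(losses))

def info_nce(view_a, view_b, temperature=0.2):
    """Pull the same sample's two views together, push different samples apart."""
    a = view_a / (np.linalg.norm(view_a, axis=1, keepdims=True) + 1e-12)
    b = view_b / (np.linalg.norm(view_b, axis=1, keepdims=True) + 1e-12)
    logits = (a @ b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def multitask_loss(report_a, report_b, file_a, file_b, triples, weight=0.1):
    """Matching loss plus weighted contrast losses for reports and files."""
    scores = matching_scores(report_a, report_b, file_a, file_b)
    contrast = info_nce(report_a, report_b) + info_nce(file_a, file_b)
    return bpr_loss(scores, triples) + weight * contrast
```

Minimizing such a combined loss with any gradient-based optimizer would update all four characterization matrices at once, which is one way to read the "multi-task strategy" in the claim.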

4. The method of claim 3, wherein,

The calculation formula of the matching score is as follows:

the objective function of the matching score is:

5. The method of claim 3, wherein the objective function of the contrast learning is:

6. A multi-view based software defect localization apparatus, comprising:

The learning module is used for extracting various relation data between a defect report and a source code file in a software project, and carrying out characterization learning on the multi-source relation between the defect report and the source code file based on a graph neural network and a contrast learning technology so as to obtain characterizations of the defect report and the source code file;

The sorting module is used for calculating the feature similarity between the defect report and the source code file based on the characterizations of the defect report and the source code file, and generating a similarity sorting of the source code files for the corresponding defect report according to the similarity;

And the determining module is used for selecting source code files with similarity larger than a preset similarity based on the similarity sorting, and determining defect positioning information in the software project by using the selected source code files.
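The sorting and determining modules can be sketched as below. The claims speak only of "feature similarity" and a "preset similarity"; cosine similarity, the function name `rank_files`, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
import numpy as np

def rank_files(report_vec, file_matrix, file_names, threshold=0.5):
    """Rank files by similarity to one defect report and keep those above threshold."""
    r = report_vec / (np.linalg.norm(report_vec) + 1e-12)
    f = file_matrix / (np.linalg.norm(file_matrix, axis=1, keepdims=True) + 1e-12)
    sims = f @ r                   # assumed cosine similarity per file
    order = np.argsort(-sims)      # descending similarity
    ranking = [(file_names[i], float(sims[i])) for i in order]
    selected = [(name, s) for name, s in ranking if s > threshold]
    return ranking, selected
```

The `ranking` list corresponds to the similarity sorting, and `selected` holds the candidate files from which defect positioning information would be determined.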

7. The apparatus of claim 6, wherein the learning module comprises:

The acquisition unit is used for extracting, from the various relation data between the defect report and the source code file, repair history data between the defect report and the source code file, similar data among defect reports, and common reference data among source code files, and constructing different views respectively, wherein the views comprise a history interaction view, a defect report similar view and a source code common reference view;

The first learning unit is used for constructing a history interaction view based on the repair history data, and performing characterization learning on the history interaction view by utilizing the graph neural network so as to obtain a first defect report characterization matrix of the defect report and a first source code file characterization matrix of the source code file;

The second learning unit is used for constructing a defect report similar view based on the similar data, and performing characterization learning on the defect report similar view by utilizing the graph neural network so as to obtain a second defect report characterization matrix of the defect report;

And the third learning unit is used for constructing a source code common reference view based on the common reference data, and performing characterization learning on the source code common reference view by utilizing the graph neural network so as to obtain a second source code file characterization matrix of the source code file.

8. The apparatus of claim 7, wherein the learning module further comprises:

The matching unit is used for calculating a matching score between the defect report and the source code file using an objective function based on the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix, and the second source code file characterization matrix;

The first generation unit is used for carrying out information interaction on the first defect report characterization matrix and the second defect report characterization matrix so as to obtain defect report characterization distances of the same sample and defect report characterization distances of different samples;

The second generating unit is used for carrying out information interaction on the first source code file characterization matrix and the second source code file characterization matrix so as to obtain source code file characterization distances of the same sample and source code file characterization distances of different samples;

And the third generating unit is used for optimizing the first defect report characterization matrix, the second defect report characterization matrix, the first source code file characterization matrix and the second source code file characterization matrix based on the matching score, the defect report characterization distances and the source code file characterization distances, using the contrast learning technology together with a multi-task strategy, and using the optimized characterization matrices as the final features of the defect report and the source code file for the subsequent calculation of the similarity sorting.

9. The apparatus of claim 8, wherein,

The calculation formula of the matching score is as follows:

the objective function of the matching score is:

10. The apparatus of claim 8, wherein the objective function of the contrast learning is:

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the multi-view based software defect localization method as claimed in any one of claims 1 to 5.

12. A computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the multi-view based software defect localization method as claimed in any one of claims 1 to 5.

13. A computer program product comprising a computer program which, when executed, is adapted to implement the multi-view based software defect localization method as claimed in any one of claims 1 to 5.