Disclosure of Invention
In view of the above, the invention provides a heterogeneous model framework extraction and matching method, a heterogeneous model framework extraction and matching device and a storage medium, which can distill the features and structures of heterogeneous models arising in the design and construction processes so as to provide a reference for similar requirements.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A heterogeneous model framework extraction and matching method comprises the following steps:
preprocessing the heterogeneous model;
forming the preprocessed heterogeneous model into a template;
uniformly representing the templates by using a binary tree structure, and storing the templates in a framework library;
taking the element sequence obtained by traversing the binary tree as the serialization feature of the template, and obtaining the inclusion relations among the serialization features;
and extracting, from the serialization features, sequences whose occurrence frequency is higher than a preset threshold value and processing them to serve as model construction modes, wherein extracting the model construction modes reveals further potential features and supports management of the framework library.
Preferably, the specific process of preprocessing the heterogeneous model includes:
classifying the heterogeneous models according to different categories, and extracting the characteristics of the heterogeneous models.
Preferably, the specific process of forming the preprocessed heterogeneous model into a template comprises the following steps:
reading the expressions under the keywords in the heterogeneous model, and extracting the parameter names contained in the expressions;
reading the expressions under the equations in the heterogeneous model, extracting the parameter names contained in the expressions, and forming a parameter list;
judging whether each parameter name under the keywords appears in the parameter list, deleting the parameter name if it does not appear, replacing the parameter name with a uniform name if it does appear, and finally forming a template.
Preferably, the features include any one or any several of domain features, logic features and descriptive features.
Preferably, the specific process of uniformly representing the templates by using a binary tree structure comprises the following steps:
splitting each formula contained in the template on spaces and storing the result in a first list; judging whether the length of the formula is equal to 1; if the length is equal to 1, directly outputting the formula as the root node of the binary tree; if the length is not equal to 1, scanning the formula sequentially and judging whether it contains brackets;
if brackets exist, removing the brackets contained in the formula, taking the operator with the lowest priority in the formula as the root node, recursively constructing the left subtree from the formula element sequence before the root node, and recursively constructing the right subtree from the formula element sequence after the root node; if no brackets exist, directly constructing the binary tree by the same recursion, until the formula is represented by a binary tree.
Preferably, the specific process of taking the element sequence obtained by traversing the binary tree as the serialization feature of the template and obtaining the inclusion relations among the serialization features comprises:
taking the formula element sequence obtained by in-order traversal of the binary tree as the serialization feature of the template, and matching the serialization features with a naive string matching algorithm to obtain the inclusion relations between the templates.
Further, the invention also provides a heterogeneous model framework extraction and matching device, which comprises:
the data preprocessing module, which is used for preprocessing the heterogeneous model;
the template forming module, which is used for forming the preprocessed heterogeneous model into a template;
the template representation module, which is used for uniformly representing the templates by using a binary tree structure and storing the templates in a framework library;
the processing module, which is used for taking the element sequence obtained by traversing the binary tree as the serialization feature of the template and obtaining the inclusion relations among the serialization features;
the extraction module, which is used for extracting, from the serialization features, sequences whose occurrence frequency is higher than a preset threshold value and processing them to serve as model construction modes, wherein extracting the model construction modes reveals further potential features and supports management of the framework library.
Further, the present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the heterogeneous model framework extraction and matching method as described in any one of the above.
Compared with the prior art, the method, the device and the storage medium for extracting and matching the heterogeneous model framework can distill the features and structures of heterogeneous models arising in the design and construction processes. Extracting the model construction modes reveals potential model features and enables models to be reused, thereby providing a reference for similar requirements.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, embodiment 1 of the present invention discloses a heterogeneous model framework extraction and matching method, which includes:
preprocessing the heterogeneous model;
forming the preprocessed heterogeneous model into a template;
uniformly representing the templates by using a binary tree structure, and storing the templates in a framework library;
taking the element sequence obtained by traversing the binary tree as the serialization feature of the template, and obtaining the inclusion relations among the serialization features;
extracting, from the serialization features, sequences whose occurrence frequency is higher than a preset threshold value and processing them to serve as model construction modes, wherein extracting the model construction modes reveals further potential features and supports management of the framework library.
In a specific embodiment, the specific process of preprocessing the heterogeneous model includes:
Classifying the heterogeneous models according to different categories, and extracting the characteristics of the heterogeneous models.
In a specific embodiment, the specific process of forming the pre-processed heterogeneous model into a template includes:
Reading an expression under a keyword in the heterogeneous model, and extracting a parameter name contained in the expression;
reading an expression under an equation in the heterogeneous model, extracting parameter names contained in the expression, and forming a parameter list;
Judging whether each parameter name under the keywords appears in the parameter list, deleting the parameter name if it does not appear, replacing the parameter name with a uniform name if it does appear, and finally forming a template.
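The template-forming steps above can be sketched as follows. This is a minimal illustration only: the input layout (two lists of expression strings, one for the keyword section and one for the equation section), the regular expression used to recognize parameter names, and the names `names_in` and `form_template` are assumptions made for this sketch and are not taken from the embodiment. Names that occur only in the equations are left unchanged here, which is also an assumption.

```python
import re

# Assumed notion of a parameter name: a letter or underscore followed by
# letters, digits or underscores (function names would also match).
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def names_in(expressions):
    """Return the parameter names found in the expressions, in order of first appearance."""
    seen = []
    for expr in expressions:
        for name in IDENT.findall(expr):
            if name not in seen:
                seen.append(name)
    return seen


def form_template(keyword_exprs, equation_exprs):
    """Rename shared parameter names to var1..varN and drop unused keyword names."""
    keyword_names = names_in(keyword_exprs)
    parameter_list = names_in(equation_exprs)
    # Keyword names that never appear in the equations are deleted.
    kept = [name for name in keyword_names if name in parameter_list]
    mapping = {name: f"var{i}" for i, name in enumerate(kept, start=1)}
    # Names present in the mapping are replaced; all other text is kept as-is.
    template = [
        IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), eq)
        for eq in equation_exprs
    ]
    return template, mapping


if __name__ == "__main__":
    template, mapping = form_template(["a = 1", "b = 2", "e = 5"], ["a + b = va"])
    print(template, mapping)
```

In this sketch the keyword name `e` is dropped because it never appears in an equation, while `a` and `b` receive uniform names.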
In a particular embodiment, the features include any one or any combination of domain features, logic features, descriptive features.
Referring to fig. 2-3, in a specific embodiment, the method for uniformly representing templates using a binary tree structure includes:
First, each formula contained in the template is split on spaces and stored in a first list (list 1). Second, the binary tree is built: whether the length of the formula is equal to 1 is judged; if the length is equal to 1, the formula is directly output as the root node of the binary tree; if the length is not equal to 1, the formula is scanned sequentially and whether it contains brackets is judged.
If brackets exist, the brackets contained in the formula are removed, the operator with the lowest priority in the formula is taken as the root node, the left subtree is constructed recursively from the formula element sequence before the root node, and the right subtree is constructed recursively from the formula element sequence after the root node. If no brackets exist, the binary tree is constructed directly by the same recursion, until the formula is represented by a binary tree. The binary trees built in this way are stored in sequence in a second list (list 2).
The root node is the lowest-priority operator in the model formula: the assignment operator is considered first, then the relational operators, then the arithmetic operators, and so on. Third, the binary trees are merged. The binary trees in list 2 are traversed in a loop; each time, the first two binary trees are taken out and connected into a new binary tree whose root node carries a uniform name (equ1, equ2, ..., equN), until all binary trees are merged into one binary tree. Fourth, the element sequence obtained by in-order traversal of the binary tree is stored as the serialization feature of the template.
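The construction and in-order serialization described above can be illustrated with a short sketch. Several details are assumptions made for illustration rather than statements of the embodiment: the concrete priority table `PRIORITY`, the choice of the rightmost operator among operators of equal priority (so that `a - b - c` groups as `(a - b) - c`), the restriction to binary operators, and the names `Node`, `strip_outer_brackets`, `build_tree` and `in_order`. Brackets are handled here by ignoring operators nested inside them and stripping a pair that encloses the whole sequence, which is one way of realizing the "remove the brackets" step.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ranking: a lower number means lower priority, i.e. chosen as root first.
PRIORITY = {"=": 0, "<": 1, ">": 1, "<=": 1, ">=": 1, "==": 1, "!=": 1,
            "+": 2, "-": 2, "*": 3, "/": 3}


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def strip_outer_brackets(tokens):
    """Remove bracket pairs that enclose the whole token list."""
    while len(tokens) >= 2 and tokens[0] == "(" and tokens[-1] == ")":
        depth = 0
        for i, tok in enumerate(tokens):
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
            if depth == 0 and i < len(tokens) - 1:
                return tokens  # the first bracket closes early, so it is not an outer pair
        tokens = tokens[1:-1]
    return tokens


def build_tree(tokens):
    """Recursively build a binary tree from a space-split formula."""
    tokens = strip_outer_brackets(tokens)
    if not tokens:
        raise ValueError("empty formula or missing operand")
    if len(tokens) == 1:
        return Node(tokens[0])  # a single element is output directly as a node
    depth, root_index = 0, None
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and tok in PRIORITY:
            # "<=" keeps the rightmost operator among those of equal priority.
            if root_index is None or PRIORITY[tok] <= PRIORITY[tokens[root_index]]:
                root_index = i
    if root_index is None:
        raise ValueError("no operator found outside brackets")
    return Node(tokens[root_index],
                build_tree(tokens[:root_index]),
                build_tree(tokens[root_index + 1:]))


def in_order(node):
    """Return the element sequence of the tree in in-order (left, root, right)."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


if __name__ == "__main__":
    # Illustrative formula; the "*" operator is chosen for this example only.
    tree = build_tree("var1 / var2 = var5 + var4 * ( var3 - var6 )".split())
    print(tree.value, in_order(tree))
```

Note that the in-order sequence no longer contains brackets, since the grouping they expressed is carried by the tree structure.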
In a specific embodiment, the specific process of taking the element sequence obtained by traversing the binary tree as the serialization feature of the template and obtaining the inclusion relations among the serialization features includes:
Taking the formula element sequence obtained by in-order traversal of the binary tree as the serialization feature of the template, and matching the serialization features with a naive string matching algorithm to obtain the inclusion relations between the templates.
Specifically, of the two model formulas to be matched, one is first set as the main string and the other as the pattern string. Next, it is determined whether the character to be matched is an operator or an operand. If it is an operand, the comparison moves on to the next character. If it is an operator, the corresponding characters in the main string and the pattern string are compared; if they are equal, the comparison moves on to the next character. Otherwise, the next round of comparison starts in the main string from the character following the starting position of the current round, and in the pattern string from its first character. When the comparison of the main string with the pattern string reaches the end, the inclusion relation between the templates is obtained.
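The matching procedure above can be sketched as a naive scan over element sequences. The sketch reflects one reading of the description, namely that two elements match when both are operands (whatever their names) or when both are the same operator; this reading, the operator set `OPERATORS`, and the names `elements_match` and `find_pattern` are assumptions made for illustration.

```python
# Assumed operator set; every other element is treated as an operand.
OPERATORS = {"=", "+", "-", "*", "/", "<", ">", "<=", ">=", "==", "!="}


def elements_match(a, b):
    """Operands match any operand; operators must be identical."""
    if a in OPERATORS or b in OPERATORS:
        return a == b
    return True


def find_pattern(main, pattern):
    """Return the first index of main at which pattern matches, or -1 if none."""
    n, m = len(main), len(pattern)
    for start in range(n - m + 1):
        j = 0
        # Compare element by element; on a mismatch the next round restarts
        # one position later in main and at the first element of pattern.
        while j < m and elements_match(main[start + j], pattern[j]):
            j += 1
        if j == m:
            return start
    return -1


if __name__ == "__main__":
    main = "var1 / var2 = var5 + var4 * var3 - var6".split()
    print(find_pattern(main, "x + y * z".split()))
    print(find_pattern(main, "a - b + c".split()))
```

A non-negative result indicates that the whole pattern string occurs in the main string with the same operator structure; how partial matches are scored is not covered by this sketch.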
The formula element sequence obtained by in-order traversal of the binary tree serves as the serialization feature of the template, and matching the serialization features with the naive algorithm yields the inclusion relations between templates. An inclusion relation means that the element sequence of one formula contains some or all of the elements of another formula. Inclusion relations are divided into partial inclusion and full inclusion, and when they are computed, particular attention is paid to whether the operators of the template formulas are identical. As shown in Table 1, the similarity between templates can be expressed by three relations: completely equal, full inclusion and partial inclusion. For example, let the main string be "var1/var2=var5+var4 (var3-var6)" and let the pattern strings be the three model formulas in Table 1. Matching the main string against each pattern string yields the inclusion relations between them: the main string fully contains all elements of pattern string 1, contains part of the elements of pattern string 2, and contains several parts of the elements of pattern string 3.
Similar parts among the templates are obtained from the inclusion relations; sequences that appear frequently in these parts are extracted and, after processing (de-noising and de-duplication), serve as model construction modes. The framework set formed by the templates and the model construction modes together constitute the heterogeneous model framework library. As a supplement to the templates, a model construction mode can be matched and combined with a template, so that models meeting different application requirements can be established quickly.
TABLE 1 template similarity
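The extraction of sequences whose occurrence frequency exceeds a preset threshold can be illustrated as follows. The embodiment does not fix how frequency is counted or how de-duplication is performed, so this sketch makes explicit assumptions: candidate sequences are contiguous runs of `min_len` to `max_len` elements, each template counts at most once per sequence, and de-duplication drops any frequent sequence that is contained in a longer frequent one. The function name `frequent_sequences` and its parameters are hypothetical.

```python
from collections import Counter


def frequent_sequences(features, threshold, min_len=3, max_len=7):
    """Return contiguous element runs occurring in more than `threshold` templates."""
    counts = Counter()
    for feature in features:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(feature) - n + 1):
                seen.add(tuple(feature[i:i + n]))
        counts.update(seen)  # each template contributes at most 1 per sequence

    frequent = [seq for seq, count in counts.items() if count > threshold]

    def contained(short, long):
        """True if `short` is a strictly shorter contiguous part of `long`."""
        k = len(short)
        return k < len(long) and any(
            long[i:i + k] == short for i in range(len(long) - k + 1)
        )

    # Assumed de-duplication: keep only sequences not covered by a longer one.
    kept = [s for s in frequent if not any(contained(s, t) for t in frequent)]
    return sorted(kept)


if __name__ == "__main__":
    features = [
        "var1 = var2 + var3".split(),
        "var4 = var2 + var3 * var5".split(),
        "var1 = var2 - var3".split(),
    ]
    for seq in frequent_sequences(features, threshold=1, min_len=3, max_len=5):
        print(" ".join(seq))
```

The threshold is applied strictly ("higher than"), matching the wording of the method steps.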
Referring to fig. 4, embodiment 1 of the present invention further provides a heterogeneous model framework extraction and matching device, including:
the data preprocessing module is used for preprocessing the heterogeneous model;
The template forming module is used for forming the preprocessed heterogeneous model into a template;
the template representation module is used for uniformly representing the templates by using a binary tree structure and storing the templates in the framework library;
The processing module is used for taking the element sequence obtained by traversing the binary tree as the serialization feature of the template and obtaining the inclusion relations among the serialization features;
The extraction module is used for extracting, from the serialization features, sequences whose occurrence frequency is higher than a preset threshold value and processing them to serve as model construction modes, wherein extracting the model construction modes reveals further potential features and supports management of the framework library.
Embodiment 1 of the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the heterogeneous model framework extraction and matching method according to embodiment 1 above.
Example 2
The method provided in example 1 was specifically applied to the extraction of a fuel cell model, and the specific procedure was as follows:
(1) Template extraction. To realize extraction and matching of fuel cell models, so that a model for a given field or problem can be built quickly and accurately, the fuel cell models are first classified by category, and their structural features (domain features, logic features, descriptive features and the like) are then extracted to form templates. First, the expressions under the keywords in the model are read, and the variable names in them (such as a, b, c and d) are extracted. Then the variable names under the equations in the model (such as a, b, c, d and va) are read to form a variable name list. Finally, each variable name under the keywords is checked against the variable name list of the equations: if it appears there, it is replaced with a uniform name (var1, var2, ..., varN); otherwise it is deleted. The template is thus formed.
(2) Serialized representation of templates. To facilitate subsequent management of the framework library, a structured representation method based on binary trees is used to serialize the fuel cell model templates. In the first step, the model file is read and the model is preprocessed: the formulas in the model are split on spaces and stored in list 1. In the second step, the binary trees are built. Whether the length of the model formula is equal to 1 is judged; because the length of the model formula is greater than 1, the formula is scanned from left to right to judge whether it contains brackets. No brackets appear in the model formula, so the operator with the lowest priority is taken directly as the root node, the left subtree is constructed recursively from the formula element sequence before the root node, and the right subtree is constructed recursively from the formula element sequence after the root node. The binary trees built in this way are stored in sequence in list 2. In the third step, the binary trees are merged. A binary tree is built for each model formula in the template, so a template with several formulas gives rise to several binary trees; to facilitate subsequent processing, all formulas of one template need to be represented by a single binary tree, which requires a merging operation. The first two binary trees in list 2 are taken out in turn and connected into a new binary tree whose root node carries a uniform name (equ1, equ2, ..., equN), with one tree as the left subtree and the other as the right subtree, until all binary trees are merged into one binary tree. In the fourth step, the element sequence obtained by in-order traversal of the binary tree is stored as the serialization feature of the template.
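The merging operation of the third step can be sketched as follows. The description does not state where a newly merged tree is placed relative to the remaining trees; this sketch assumes it is put back at the front of the list, so that it is merged with the next tree in the following round. The names `Node`, `merge_trees` and `in_order` are hypothetical and used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge_trees(trees):
    """Merge the per-formula trees of one template into a single binary tree."""
    if not trees:
        raise ValueError("at least one binary tree is required")
    pending = list(trees)
    k = 0
    while len(pending) > 1:
        k += 1
        first, second = pending[0], pending[1]
        # Assumption: the merged tree replaces the two trees at the front.
        pending[:2] = [Node(f"equ{k}", first, second)]
    return pending[0]


def in_order(node):
    """Return the element sequence of the tree in in-order (left, root, right)."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


if __name__ == "__main__":
    t1 = Node("=", Node("var1"), Node("var2"))
    t2 = Node("=", Node("var3"), Node("var4"))
    t3 = Node("var5")
    print(in_order(merge_trees([t1, t2, t3])))
```

With this placement the uniform names `equ1`, `equ2`, ... appear in the serialization feature as separators between the formulas of the template.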
(3) Obtaining the model construction modes. An improved naive matching algorithm is used to match the fuel cell model templates against one another. First, of the two model formulas to be matched, one is set as the main string and the other as the pattern string. Next, it is determined whether the character to be matched is an operator. If it is not, the comparison moves on to the next character. If it is, the corresponding characters in the main string and the pattern string are compared: if they are equal, the comparison moves on to the next character; otherwise, the next round of comparison starts in the main string from the character following the starting position of the current round, and in the pattern string from its first character. After the comparison is finished, the inclusion relations between the fuel cell model templates are obtained; they are divided into three types: completely equal, full inclusion and partial inclusion. The similar parts of the fuel cell model templates are obtained from the inclusion relations, and sequences whose occurrence frequency is higher than a preset threshold value are extracted from these parts and processed (de-duplicated and de-noised) to obtain the model construction modes. Extracting the model construction modes reveals further potential features and supports management of the framework library.
(4) Constructing the heterogeneous model framework library. The framework set formed by the fuel cell model templates and the model construction modes together constitute the heterogeneous model framework library. By combining the model construction modes with the templates, models meeting different application requirements can be established quickly.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by referring to one another. Since the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.