
CN116433799A - A flow chart generation method and device based on semantic similarity and subgraph matching - Google Patents


Info

Publication number
CN116433799A
Authority
CN
China
Prior art keywords
document
word
flow chart
user
subgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310698508.2A
Other languages
Chinese (zh)
Other versions
CN116433799B (en)
Inventor
袁水平
董丙冰
高元鑫
吴信东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd
Priority to CN202310698508.2A
Publication of CN116433799A
Application granted
Publication of CN116433799B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/10 Requirements analysis; Specification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a flowchart generation method based on semantic similarity and subgraph matching, comprising: obtaining a user requirement document and RPA project asset library documents; computing the semantic similarity between the user requirement document and the RPA project asset library documents, and ranking by semantic similarity from high to low to obtain the top-k subgraphs to be matched; constructing a query knowledge graph from the user requirement document and setting a start node; and searching and matching the subgraphs to be matched against the query knowledge graph to obtain the final best-matching graph and return the corresponding flowchart. The technical solution reduces the time spent on search traversal during subgraph matching and, by constraining the result both semantically and structurally, generates the flowchart for the current user requirement more accurately.

Figure 202310698508

Description

A Flowchart Generation Method and Device Based on Semantic Similarity and Subgraph Matching

Technical Field

The invention belongs to the technical field of data processing, and in particular relates to a flowchart generation method and device based on semantic similarity and subgraph matching.

Background

In executing business processes, employees currently spend a great deal of time working with enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, spreadsheets, and legacy systems, performing manual, repetitive tasks such as entering, copying, pasting, extracting, merging, and moving large amounts of data from one system to another. Some of these highly structured, routine, manual tasks can be handled by software robots, which leaves knowledge workers more time for value-added work. Robotic process automation (RPA) has emerged as a software-based solution for automating rule-based business processes that involve routine tasks, structured data, and deterministic outcomes. The flowchart is an important part of RPA: a flowchart is drawn according to the user's requirements, and execution code is then generated from the flowchart to carry out the specified operations. Flowcharts are very helpful for understanding exactly how a process runs and for deciding how it should be improved. Drawing flowcharts by hand, however, takes a long time and consumes considerable human resources. Using the existing flowcharts in an RPA implementation library to generate a flowchart automatically for the current user requirement would save labor and resources and greatly improve the efficiency of RPA implementation; this is a current research direction.

Summary of the Invention

In view of this, the present invention proposes a flowchart generation method based on semantic similarity and subgraph matching, comprising the following steps:

S1. Obtain the user requirement document and the RPA project asset library documents;

S2. Compute the semantic similarity between the user requirement document and the RPA project asset library documents, and rank by semantic similarity from high to low to obtain the top-k subgraphs to be matched;

S3. Construct a query knowledge graph from the user requirement document, and set a start node;

S4. Search and match the subgraphs to be matched against the query knowledge graph, obtain the final best-matching graph, and return the corresponding flowchart.

The present invention also proposes a flowchart generation device based on semantic similarity and subgraph matching, comprising:

a processor;

a memory storing a computer program executable on the processor;

wherein the computer program, when executed by the processor, implements the flowchart generation method based on semantic similarity and subgraph matching.

The technical solution provided by the invention has the following beneficial effects:

The technical solution uses the user's requirement document to filter, by semantic similarity, the projects in the project library that are similar to the current requirement. At the same time, it builds the user's requirement document into a small requirement graph and matches it against the RPA project asset knowledge graph using fuzzy subgraph matching, in order to find projects whose process structure is similar to that of the current user requirement. Semantics are used first for preliminary screening, and subgraph matching is then used for search traversal. On the one hand this reduces the time spent on search traversal during subgraph matching; on the other hand, the dual semantic and structural constraints yield a more accurate flowchart for the current user requirement. Generating the flowchart automatically from the user requirement lets the user use it directly or fine-tune it, simplifies the drawing process, greatly reduces manual intervention, and improves the efficiency of RPA.

Brief Description of the Drawings

Fig. 1 is a flowchart of a flowchart generation method based on semantic similarity and subgraph matching according to an embodiment of the present invention;

Fig. 2 is the user requirement knowledge graph for downloading Agricultural Bank of China online banking statements, constructed in an embodiment of the present invention;

Fig. 3 is the knowledge graph of the Company B online banking statement download project in the RPA asset library, constructed in an embodiment of the present invention.

Detailed Description

To make the purpose, technical solution, and advantages of the present invention clearer, the embodiments of the present invention are further described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of the flowchart generation method based on semantic similarity and subgraph matching according to an embodiment of the present invention. With reference to Fig. 1, the method comprises the following steps:

S1. Obtain the user requirement document and the RPA project asset library documents.

In a further embodiment, the user requirement is downloading Agricultural Bank of China online banking statements.

S2. Compute the semantic similarity between the user requirement document and the RPA project asset library documents, and rank by semantic similarity from high to low to obtain the top-k subgraphs to be matched.

Specifically:

S21. Segment the user requirement document Q and the RPA project asset library user documents D_s into words, and remove stop words.

In this embodiment, the word segmentation component jieba is used to segment the user requirement document Q and the RPA project asset library user documents D_s.
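Step S21 can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the stop-word list and the helper names are hypothetical, and `jieba.lcut` is used only when the jieba package is installed; otherwise the sketch falls back to whitespace splitting so that it still runs.

```python
try:
    import jieba  # third-party Chinese word segmentation package
except ImportError:  # keep the sketch runnable without jieba
    jieba = None

# Hypothetical stop-word list; a real one would be loaded from a file.
STOP_WORDS = {"的", "和", "以及", "the", "and", "of"}

def tokenize(text):
    """Split text into words: jieba if available, else whitespace."""
    if jieba is not None:
        return jieba.lcut(text)
    return text.split()

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and blank tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stop_words]

def preprocess(text):
    """Segment a document and remove stop words (step S21)."""
    return remove_stop_words(tokenize(text))
```

The same `preprocess` call would be applied to Q and to every document in the asset library, so that both sides share one vocabulary.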

S22. Describe each document by the frequency with which words occur in its text, representing the user requirement document Q and the RPA project asset library user documents D_s as one-dimensional vectors.

In this embodiment, the normalized bag-of-words (BOW) model is used to represent Q and D_s respectively.
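A normalized bag-of-words vector of the kind described in step S22 can be sketched with the standard library alone. The function name `nbow` and the toy tokens are illustrative assumptions; the patent does not specify a data structure.

```python
from collections import Counter

def nbow(tokens):
    """Return {word: count / total_count} for one tokenized document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: c / total for word, c in counts.items()}

# Toy tokens, invented for illustration only.
q_tokens = ["download", "statement", "login", "statement"]
q_vec = nbow(q_tokens)
```

Because the weights of each document sum to 1, documents of different lengths become directly comparable.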

S23. Based on the one-dimensional vectors obtained in S22, learn an embedding vector for each word of the user requirement document Q and each word of the RPA project asset library user documents D_s. In this vector space, semantically similar words lie close to one another.

In this embodiment, the word2vec method is used to learn the embedding vector of each word of the user requirement document Q and of the RPA project asset library user documents D_s.

S24. Use the WMD (Word Mover's Distance) algorithm to compute the text similarity between the user requirement document Q and the RPA project asset library user documents D_s. The WMD algorithm models document distance as a combination of the semantic distances between the words of the two documents: it takes the Euclidean distance between the word vectors of any pair of words drawn from the two documents and then forms a weighted sum. WMD builds on word2vec and measures text similarity by computing the distances between the words of the texts.

S241. Normalize the number of occurrences of the words in document Q and document D_s, computing the word frequency of the i-th word in document Q and the word frequency of the j-th word in document D_s:

Figure SMS_12

Figure SMS_13

where m and n are the numbers of words in document Q and document D_s respectively, and the counts in the formulas are the numbers of occurrences of the i-th word in document Q and of the j-th word in document D_s;

S242. Compute the Euclidean distance C_{i,j} between word i from document Q and word j from document D_s:

Figure SMS_17

where the two vectors in the formula are the embedding vectors learned for words i and j;
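The pairwise distances of step S242 can be illustrated with a short Python sketch. The embeddings below are invented two-dimensional toy vectors, not learned word2vec vectors, and the function name `distance_matrix` is hypothetical.

```python
import math

def distance_matrix(q_words, d_words, embeddings):
    """C[i][j] = Euclidean distance between q_words[i] and d_words[j]."""
    return [[math.dist(embeddings[wq], embeddings[wd]) for wd in d_words]
            for wq in q_words]

# Toy 2-D embeddings, invented for illustration only.
toy = {"login": (0.0, 0.0), "sign-in": (3.0, 4.0), "export": (0.0, 1.0)}
C = distance_matrix(["login"], ["sign-in", "export"], toy)
```

Row i of the matrix holds the distances from the i-th word of Q to every word of the candidate document.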

S243. Use a dynamic programming algorithm to solve for the WMD distance between document Q and each document in the RPA project asset library D_s:

Figure SMS_20

Figure SMS_21

Figure SMS_22

where f is the WMD distance between document Q and document D_s, the weight in the formulas maps word i in document Q to word j in document D_s, and C_{i,j} is the distance between word i in document Q and word j in document D_s. Note that the weights from the i-th word of document Q to all words of a document in D_s sum to the word frequency of that i-th word; likewise, the weights from all words of document Q to the j-th word of a document in D_s sum to the word frequency of that j-th word. The smaller the value of f, the more similar the two documents are.
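The formula images for steps S241 to S243 are not reproduced in this text. For orientation, the standard Word Mover's Distance formulation that matches the verbal description can be written as below. The symbols d_i, d'_j, c_i, c'_j, x_i, x'_j and T_{ij} are assumptions borrowed from the usual WMD notation and may differ from the symbols in the patent's images; only Q, D_s, m, n, C_{i,j} and f appear in the text itself.

```latex
\begin{align}
d_i &= \frac{c_i}{\sum_{k=1}^{m} c_k}, \qquad
d'_j = \frac{c'_j}{\sum_{k=1}^{n} c'_k} \\
C_{i,j} &= \lVert x_i - x'_j \rVert_2 \\
f &= \min_{T \ge 0} \sum_{i=1}^{m} \sum_{j=1}^{n} T_{ij}\, C_{i,j} \\
\text{s.t.}\quad & \sum_{j=1}^{n} T_{ij} = d_i \quad (i = 1, \dots, m), \qquad
\sum_{i=1}^{m} T_{ij} = d'_j \quad (j = 1, \dots, n)
\end{align}
```

The two constraints express the statement that the weights leaving word i of Q sum to its word frequency and the weights arriving at word j of the candidate document sum to its word frequency.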

S244. Compute the similarity between document Q and document D_s:

Figure SMS_27

S245. Set a similarity threshold and, according to the similarity between document Q and document D_s, obtain the k RPA project asset documents whose value is below the threshold, together with the knowledge graphs and flowcharts corresponding to those k projects.
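The filtering in steps S243 to S245 can be illustrated with a small pure-Python sketch. Two simplifications are assumptions of this sketch and are not taken from the patent: the exact WMD is replaced by the relaxed WMD lower bound (each word sends all of its weight to its nearest counterpart, and the larger of the two directions is kept), and candidates are selected by comparing that distance directly with a hypothetical `max_distance` threshold, since the patent's similarity formula is not reproduced here. All data are invented.

```python
import math

def relaxed_wmd(q_nbow, d_nbow, embeddings):
    """Lower bound on WMD between two non-empty nBOW dicts {word: weight}."""
    def one_way(src, dst):
        # Every source word moves all of its weight to the closest target word.
        return sum(w * min(math.dist(embeddings[a], embeddings[b]) for b in dst)
                   for a, w in src.items())
    return max(one_way(q_nbow, d_nbow), one_way(d_nbow, q_nbow))

def select_candidates(q_nbow, library, embeddings, max_distance, k):
    """Return up to k (name, distance) pairs, closest first, within max_distance."""
    scored = [(name, relaxed_wmd(q_nbow, doc, embeddings))
              for name, doc in library.items()]
    kept = [item for item in scored if item[1] <= max_distance]
    return sorted(kept, key=lambda item: item[1])[:k]

# Toy data, invented for illustration only.
emb = {"login": (0.0, 0.0), "export": (1.0, 0.0), "invoice": (10.0, 0.0)}
query = {"login": 0.5, "export": 0.5}
library = {
    "bank_statement_project": {"login": 0.5, "export": 0.5},
    "invoice_project": {"invoice": 1.0},
}
top = select_candidates(query, library, emb, max_distance=2.0, k=3)
```

In a full pipeline each surviving project name would be used to look up its knowledge graph and flowchart, which then become the subgraphs to be matched in step S4.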

In a further embodiment, refer to Fig. 2 and Fig. 3. Fig. 2 is the user requirement knowledge graph for downloading Agricultural Bank of China online banking statements, and Fig. 3 is the knowledge graph of the Company B online banking statement download project in the RPA asset library, both constructed in an embodiment of the present invention. The user's requirement document covers Agricultural Bank of China online banking statements and the like, and can be matched to similar completed project documents in the RPA asset library, such as Company B's online banking statement project, which includes downloading China Construction Bank and Industrial and Commercial Bank of China online banking statements.

S3. Construct a query knowledge graph from the user requirement document, and set a start node. As shown in Fig. 2, the user requirement is downloading Agricultural Bank of China online banking statements, which includes the sub-steps of U-Shield (USB key) login, statement export, and statement data conversion.

S4. Search and match the subgraphs to be matched against the query knowledge graph, obtain the final best-matching graph, and return the corresponding flowchart.

S41. Use the TALE approximate large-graph matching tool to search the subgraphs to be matched for the best possible match with the query knowledge graph.

S42. If step S41 finds a best match, return the matched candidate subgraph and its corresponding flowchart; otherwise, perform step S43.

S43. Partition the graph along the edges of the current start node whose attribute is the inclusion ("contains") relation to obtain a set of subgraphs, and repeat steps S41 and S42 with the start node updated to the tail entity of the current node, until a best match is obtained or the subgraph set is empty.
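The control flow of steps S41 to S43 can be sketched as follows. The sketch is illustrative only: graphs are plain lists of `(head, relation, tail)` triples, the relation name `"contains"` stands in for the inclusion relation, and the `match` callback is a hypothetical placeholder for an approximate matcher such as TALE, whose interface is not described in the text.

```python
from collections import deque

def reachable(triples, root):
    """Triples reachable from root by following head -> tail edges."""
    seen, queue, sub = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for h, r, t in triples:
            if h == node:
                sub.append((h, r, t))
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return sub

def find_best_match(query, start, match, relation="contains"):
    """Try the whole query graph first (S41/S42), then its 'contains' parts (S43)."""
    pending, visited = deque([(start, query)]), {start}
    while pending:
        node, graph = pending.popleft()
        result = match(graph)          # S41: hypothetical approximate matcher
        if result is not None:         # S42: return the first match found
            return result
        for h, r, t in graph:          # S43: split on 'contains' edges of node
            if h == node and r == relation and t not in visited:
                visited.add(t)
                sub = reachable(graph, t)
                if sub:
                    pending.append((t, sub))
    return None

# Toy data, invented for illustration only.
query = [
    ("download", "contains", "login"),
    ("download", "contains", "export"),
    ("export", "next", "convert"),
]
known = {"export_flow": {("export", "next", "convert")}}

def toy_match(graph):
    # Exact-containment stand-in for a real approximate matcher.
    for name, triples in known.items():
        if set(graph) <= triples:
            return name
    return None

best = find_best_match(query, "download", toy_match)
```

The `visited` set guards against cycles in the query graph, and the loop ends either with a match or when no subgraphs remain, mirroring the stopping condition in S43.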

The present invention also proposes a flowchart generation device based on semantic similarity and subgraph matching, comprising:

a processor;

a memory storing a computer program executable on the processor;

wherein the computer program, when executed by the processor, implements the flowchart generation method based on semantic similarity and subgraph matching.

The technical solution uses the WMD algorithm to measure the similarity between the user requirement document and the RPA project asset library documents, finding projects that are relatively similar to the current user requirement. This narrows the scope of the subsequent subgraph-matching search and improves efficiency.

Graph partitioning is combined with subgraph matching to search iteratively for the best-matching subgraph. Fuzzy subgraph matching is used to search the RPA project asset knowledge graph for the subgraph that best matches the user requirement knowledge graph. Fuzzy subgraph matching tolerates some unmatched nodes and some missing edges, and once the relevant flowchart has been recommended it can be corrected manually. When no match is found, the method applies the edge-partitioning idea from graph partitioning while taking the characteristics of the user requirement knowledge graph into account: it restricts the relation edges to the "contains" relation, partitions the graph into subgraphs, and then searches again for the best-matching subgraph.

Semantic similarity and subgraph matching are used together to generate the flowchart. First, semantic similarity drives a first round of search that retrieves historical projects relatively similar to the user requirement and shrinks the search space for the second round. Second, fuzzy subgraph matching drives a second round of search that finds the historical project most similar to the user requirement in both semantics and structure, takes it as the best-matching project, and retrieves the corresponding flowchart.

The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A flowchart generation method based on semantic similarity and subgraph matching, characterized by comprising the following steps:
S1. Obtain the user requirement document and the RPA project asset library documents;
S2. Compute the semantic similarity between the user requirement document and the RPA project asset library documents, and rank by semantic similarity from high to low to obtain the top-k subgraphs to be matched;
S3. Construct a query knowledge graph from the user requirement document, and set a start node;
S4. Search and match the subgraphs to be matched against the query knowledge graph, obtain the final best-matching graph, and return the corresponding flowchart.

2. The flowchart generation method based on semantic similarity and subgraph matching according to claim 1, characterized in that step S2 specifically comprises:
S21. Segment the user requirement document Q and the RPA project asset library user documents D_s into words, and remove stop words;
S22. Describe each document by the frequency with which words occur in its text, representing the user requirement document Q and the RPA project asset library user documents D_s as one-dimensional vectors;
S23. Based on the one-dimensional vectors obtained in S22, learn an embedding vector for each word of the user requirement document Q and each word of the RPA project asset library user documents D_s;
S24. Use the WMD algorithm to compute the text similarity between the user requirement document Q and the RPA project asset library user documents D_s.

3. The flowchart generation method based on semantic similarity and subgraph matching according to claim 2, characterized in that, in step S21, the word segmentation component jieba is used to segment the user requirement document Q and the RPA project asset library user documents D_s.

4. The flowchart generation method based on semantic similarity and subgraph matching according to claim 2, characterized in that, in step S22, the normalized bag-of-words (BOW) model is used to represent Q and D_s respectively.

5. The flowchart generation method based on semantic similarity and subgraph matching according to claim 2, characterized in that, in step S23, the word2vec method is used to learn the embedding vector of each word of the user requirement document Q and of the RPA project asset library user documents D_s.

6. The flowchart generation method based on semantic similarity and subgraph matching according to claim 2, characterized in that step S24 specifically comprises:
S241. Normalize the number of occurrences of the words in document Q and document D_s, computing the word frequency of the i-th word in document Q and the word frequency of the j-th word in document D_s:
Figure QLYQS_12
Figure QLYQS_13
where m and n are the numbers of words in document Q and document D_s respectively, and the counts in the formulas are the numbers of occurrences of the i-th word in document Q and of the j-th word in document D_s;
S242. Compute the Euclidean distance C_{i,j} between word i from document Q and word j from document D_s:
Figure QLYQS_16
where the two vectors in the formula are the embedding vectors learned for words i and j;
S243. Use a dynamic programming algorithm to solve for the WMD distance between document Q and document D_s:
Figure QLYQS_19
Figure QLYQS_20
Figure QLYQS_21
where f is the WMD distance between document Q and document D_s, the weight in the formulas maps word i in document Q to word j in document D_s, and C_{i,j} is the distance between word i in document Q and word j in document D_s; note that the weights from the i-th word of document Q to all words of a document in D_s sum to the word frequency of that i-th word, and likewise the weights from all words of document Q to the j-th word of a document in D_s sum to the word frequency of that j-th word;
S244. Compute the similarity between document Q and document D_s:
Figure QLYQS_26
S245. Set a similarity threshold and, according to the similarity between document Q and document D_s, obtain the k RPA project asset documents whose value is below the threshold, together with the knowledge graphs and flowcharts corresponding to those k projects.

7. The flowchart generation method based on semantic similarity and subgraph matching according to claim 1, characterized in that step S4 specifically comprises:
S41. Use the TALE approximate large-graph matching tool to search the subgraphs to be matched for the best possible match with the query knowledge graph;
S42. If step S41 finds a best match, return the matched candidate subgraph and its corresponding flowchart; otherwise, perform step S43;
S43. Partition the graph along the edges of the current start node whose attribute is the inclusion relation to obtain a set of subgraphs, and repeat steps S41 and S42 with the start node updated to the tail entity of the current node, until a best match is obtained or the subgraph set is empty.

8. A flowchart generation device based on semantic similarity and subgraph matching, characterized in that the device comprises:
a processor;
a memory storing a computer program executable on the processor;
wherein the computer program, when executed by the processor, implements the flowchart generation method based on semantic similarity and subgraph matching according to any one of claims 1 to 7.
CN202310698508.2A 2023-06-14 2023-06-14 Flow chart generation method and device based on semantic similarity and sub-graph matching Active CN116433799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310698508.2A CN116433799B (en) 2023-06-14 2023-06-14 Flow chart generation method and device based on semantic similarity and sub-graph matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310698508.2A CN116433799B (en) 2023-06-14 2023-06-14 Flow chart generation method and device based on semantic similarity and sub-graph matching

Publications (2)

Publication Number Publication Date
CN116433799A (en) 2023-07-14
CN116433799B (en) 2023-08-25

Family

ID=87083684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310698508.2A Active CN116433799B (en) 2023-06-14 2023-06-14 Flow chart generation method and device based on semantic similarity and sub-graph matching

Country Status (1)

Country Link
CN (1) CN116433799B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628228A (en) * 2023-07-19 2023-08-22 安徽思高智能科技有限公司 RPA flow recommendation method and computer readable storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221558A1 (en) * 2011-02-28 2012-08-30 International Business Machines Corporation Identifying information assets within an enterprise using a semantic graph created using feedback re-enforced search and navigation
US20130111375A1 (en) * 2011-11-01 2013-05-02 Matthew Scott Frohliger Software user interface allowing logical expression to be expressed as a flowchart
CN110704574A (en) * 2019-09-03 2020-01-17 福建省农村信用社联合社 Method and system for managing bank business demand assets
CN111666740A (en) * 2020-06-22 2020-09-15 深圳壹账通智能科技有限公司 Flow chart generation method and device, computer equipment and storage medium
CN112836029A (en) * 2021-01-27 2021-05-25 润联软件系统(深圳)有限公司 A graph-based document retrieval method, system and related components
US20210209311A1 (en) * 2018-11-28 2021-07-08 Ping An Technology (Shenzhen) Co., Ltd. Sentence distance mapping method and apparatus based on machine learning and computer device
WO2021164171A1 (en) * 2020-02-17 2021-08-26 平安科技(深圳)有限公司 Method and apparatus for processing data in knowledge base, and computer device and storage medium
CN113641833A (en) * 2021-08-17 2021-11-12 同济大学 Service requirement matching method and device
US20210374479A1 (en) * 2020-06-02 2021-12-02 Accenture Global Solutions Limited Intelligent payment processing platform system and method
CN114298022A (en) * 2021-12-03 2022-04-08 天津大学 Subgraph matching method for large-scale complex semantic network
WO2022088409A1 (en) * 2020-10-28 2022-05-05 中国商用飞机有限责任公司北京民用飞机技术研究中心 Interactive retrieval method and apparatus, and computer device and storage medium
CN114780746A (en) * 2022-04-22 2022-07-22 润联软件系统(深圳)有限公司 Knowledge graph-based document retrieval method and related equipment thereof
US20220253871A1 (en) * 2020-10-22 2022-08-11 Assent Inc Multi-dimensional product information analysis, management, and application systems and methods
CN115496830A (en) * 2022-09-29 2022-12-20 中国银行股份有限公司 Method and device for generating product demand flow chart

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221558A1 (en) * 2011-02-28 2012-08-30 International Business Machines Corporation Identifying information assets within an enterprise using a semantic graph created using feedback re-enforced search and navigation
US20130111375A1 (en) * 2011-11-01 2013-05-02 Matthew Scott Frohliger Software user interface allowing logical expression to be expressed as a flowchart
US20210209311A1 (en) * 2018-11-28 2021-07-08 Ping An Technology (Shenzhen) Co., Ltd. Sentence distance mapping method and apparatus based on machine learning and computer device
CN110704574A (en) * 2019-09-03 2020-01-17 福建省农村信用社联合社 Method and system for managing bank business demand assets
WO2021164171A1 (en) * 2020-02-17 2021-08-26 平安科技(深圳)有限公司 Method and apparatus for processing data in knowledge base, and computer device and storage medium
US20210374479A1 (en) * 2020-06-02 2021-12-02 Accenture Global Solutions Limited Intelligent payment processing platform system and method
CN111666740A (en) * 2020-06-22 2020-09-15 深圳壹账通智能科技有限公司 Flow chart generation method and device, computer equipment and storage medium
US20220253871A1 (en) * 2020-10-22 2022-08-11 Assent Inc Multi-dimensional product information analysis, management, and application systems and methods
WO2022088409A1 (en) * 2020-10-28 2022-05-05 中国商用飞机有限责任公司北京民用飞机技术研究中心 Interactive retrieval method and apparatus, and computer device and storage medium
CN112836029A (en) * 2021-01-27 2021-05-25 润联软件系统(深圳)有限公司 A graph-based document retrieval method, system and related components
CN113641833A (en) * 2021-08-17 2021-11-12 同济大学 Service requirement matching method and device
CN114298022A (en) * 2021-12-03 2022-04-08 天津大学 Subgraph matching method for large-scale complex semantic network
CN114780746A (en) * 2022-04-22 2022-07-22 润联软件系统(深圳)有限公司 Knowledge graph-based document retrieval method and related equipment thereof
CN115496830A (en) * 2022-09-29 2022-12-20 中国银行股份有限公司 Method and device for generating product demand flow chart

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628228A (en) * 2023-07-19 2023-08-22 安徽思高智能科技有限公司 RPA flow recommendation method and computer readable storage medium
CN116628228B (en) * 2023-07-19 2023-09-19 安徽思高智能科技有限公司 An RPA process recommendation method and computer-readable storage medium

Also Published As

Publication number Publication date
CN116433799B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Miao et al. Towards unified data and lifecycle management for deep learning
EP4182813A1 (en) Enterprise knowledge graph building with mined topics and relationships
WO2022019973A1 (en) Enterprise knowledge graphs using enterprise named entity recognition
US20100318499A1 (en) Declarative framework for deduplication
US10089390B2 (en) System and method to extract models from semi-structured documents
US11544323B2 (en) Annotations for enterprise knowledge graphs using multiple toolkits
Miao et al. Modelhub: Towards unified data and lifecycle management for deep learning
Adithya et al. OntoReq: an ontology focused collective knowledge approach for requirement traceability modelling
WO2022020005A1 (en) Enterprise knowledge graphs using user-based mining
Chakraborty et al. Semantic etl—state-of-the-art and open research challenges
CN116433799B (en) Flow chart generation method and device based on semantic similarity and sub-graph matching
Jurek-Loughrey et al. Semi-supervised and unsupervised approaches to record pairs classification in multi-source data linkage
Li et al. Deep hierarchical learning for 3d semantic segmentation
Li et al. Data+ AI: LLM4Data and Data4LLM
US20230359661A1 (en) Logic rule-based relative support and confidence for semi-structured document content extraction
WO2012174632A1 (en) Method and apparatus for preference guided data exploration
CN114443783B (en) Supply chain data analysis and enhancement processing method and device
Lin et al. An efficient modified Hyperband and trust-region-based mode-pursuing sampling hybrid method for hyperparameter optimization
Hsu et al. Similarity search over personal process description graph
Zhao et al. Big data processing with probabilistic latent semantic analysis on MapReduce
CN112214683A (en) Hybrid recommendation model processing method, system and medium based on heterogeneous information network
CN119494402B (en) Industrial recipe and process knowledge answer generation method and device
Yuan et al. Effective generation of relational schema from multi-model data with reinforcement learning
Martyniuk et al. Semantifying the PlanQK Platform and Ecosystem for Quantum Applications
CN113849163B (en) API (application program interface) document map-based operating system intelligent programming method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20230714

Assignee: HUBEI THINGO TECHNOLOGY DEVELOPMENT Co.,Ltd.

Assignor: Anhui Sigao Intelligent Technology Co.,Ltd.

Contract record no.: X2024980044492

Denomination of invention: A method and device for generating flowcharts based on semantic similarity and subgraph matching

Granted publication date: 20230825

License type: Exclusive License

Record date: 20250103
