WO2025058122A1

WO2025058122A1 - System and method for generating multi-perspective event log

Info

Publication number: WO2025058122A1
Application number: PCT/KR2023/016843
Authority: WO
Inventors: 이상화; 원석래; 아스리아나 수트리스노와티리스카; 레비세이 풀샤시아이큐; 프라타마프란스; 김운재
Original assignee: Iochord Inc
Current assignee: Iochord Inc
Priority date: 2023-09-14
Filing date: 2023-10-27
Publication date: 2025-03-20
Anticipated expiration: 2026-03-14
Also published as: KR102655198B1

Abstract

The present invention relates to a system and a method for generating a multi-perspective event log, and may comprise: receiving a data source; extracting metadata from the data source; generating an event map and a sample event log by using the metadata; calculating the quality of the sample event log; and, if the quality value of the sample event log is greater than or equal to a reference value, generating, on the basis of the event map, a multi-perspective event log by collecting data from the data source.

Description

System and method for generating multi-perspective event log

The following embodiments relate to a system and method for generating a multi-perspective event log.

Process mining analysis takes an event log as input. An event log consists of a series of events that can be grouped in specific ways to represent different perspectives of a business process.

In practice, event logs are often not readily available. In most cases, an event log must be extracted from a data source (e.g., an information system or database) and converted into a specific structure and format that process mining can interpret.

The extraction and identification process (e.g., identifying cases, activities, and timestamps) requires manual effort and domain knowledge of the process. It is therefore time-consuming and costly, and it is a major challenge in event log generation.

Another challenge in event log generation is handling process perspectives. An event log is usually bound to a specific process analysis objective. That is, the business perspective, and the perspective itself, can be correlated with entities in the data.

Multiple perspectives can be extracted from a database. This means that multiple event logs and process models can be generated from the same process data.

In particular, several approaches, such as rdb2log and OpenSLEX, have been proposed for automatically extracting event logs from databases. However, these approaches require a partial or full scan of the database, so their computation time is long and their sources are limited to relational databases.

An object of the present invention is to provide a system and method for generating a multi-perspective event log.

A method for generating a multi-perspective event log according to an embodiment of the present invention may include: receiving a data source; extracting metadata from the data source; generating an event map and a sample event log using the metadata; calculating a quality of the sample event log; and, if the quality value of the sample event log is greater than or equal to a reference value, generating a multi-perspective event log by collecting data from the data source based on the event map.

Here, the step of generating the event map and the sample event log using the metadata may include: converting the extracted metadata into standardized entity relationship data models; generating a data catalog by connecting the entity relationship data models; inferring an activity concept and a timestamp concept from the data catalog; inferring a case concept from the data catalog; generating the event map by associating the activity concept, the timestamp concept, and the case concept; and extracting sample data from the data source and generating the sample event log by extracting data from the sample data based on the event map.

Here, the step of generating the data catalog by connecting the entity relationship data models may include: checking the similarity of the field names included in the entity relationship data models; connecting the entity relationship data models, using the checked similarity, so that the data catalog is acyclic; and removing non-essential optional fields from each of the entity relationship data models included in the data catalog.
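The linking steps above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes the catalog is treated as an undirected graph, that name similarity is measured with Python's `difflib.SequenceMatcher`, and that a union-find structure is used to skip any link that would close a cycle. The function name `link_models` and the `min_similarity` cutoff are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def link_models(models: dict[str, list[str]], min_similarity: float = 0.9):
    """Link entity models that share similar field names without creating a cycle.

    `models` maps a model (table) name to its field names. Returns a list of
    (model_a, model_b, field_a, field_b) links. All names are illustrative.
    """
    parent = {name: name for name in models}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Score every cross-model field pair by name similarity.
    scored = []
    for a, b in combinations(sorted(models), 2):
        for fa in models[a]:
            for fb in models[b]:
                score = SequenceMatcher(None, fa.lower(), fb.lower()).ratio()
                if score >= min_similarity:
                    scored.append((score, a, b, fa, fb))

    links = []
    # Strongest matches first; skip any link that would close a cycle.
    for score, a, b, fa, fb in sorted(scored, key=lambda s: (-s[0], s[1:])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            links.append((a, b, fa, fb))
    return links
```

With three models that all share `customer_id`, the sketch keeps two links and drops the third, so the resulting catalog stays a tree.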

Here, the step of inferring the activity concept and the timestamp concept from the data catalog may include: analyzing the data type of each field of the entity relationship data models included in the data catalog; analyzing each field name of the entity relationship data models using dictionary-based pattern matching; inferring the activity concept and the timestamp concept through date or time analysis of the fields included in the entity relationship data models; removing, from each of the entity relationship data models, the timestamp fields other than a representative timestamp field; removing duplicate column names from the entity relationship data models included in the data catalog; calculating a centrality for each of the entity relationship data models included in the data catalog based on the number of entity relationship data models that depend on it; removing, from the entity relationship data models included in the data catalog, any entity relationship data model that has no timestamp concept candidate and on which no other entity relationship data model depends; removing, from each of the entity relationship data models included in the data catalog, fields that are neither primary keys nor foreign keys and do not correspond to timestamps, so that each model has at most one categorical field and one numeric field; inferring the timestamp concept and an activity life cycle for each of the entity relationship data models included in the data catalog through text similarity analysis of its timestamp fields; and removing date and time patterns from the timestamp concept and inferring the activity concept by concatenating the fields that are neither primary keys nor foreign keys and do not correspond to timestamps.
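Two of these steps, dictionary-based matching of field names and stripping date and time patterns to obtain an activity label, can be illustrated with a short sketch. The word list `TIME_WORDS`, the type set `TIME_TYPES`, and both function names are assumptions made for illustration; the source does not specify the dictionary contents.

```python
import re

# Hypothetical dictionary of name fragments that hint at a timestamp field.
TIME_WORDS = ("date", "time", "timestamp", "at", "dt")
# Hypothetical set of column data types treated as date/time types.
TIME_TYPES = {"date", "datetime", "timestamp", "time"}


def is_timestamp_field(name: str, data_type: str) -> bool:
    """A field is a timestamp candidate if its data type or its name says so."""
    tokens = re.split(r"[_\W]+", name.lower())
    return data_type.lower() in TIME_TYPES or any(t in TIME_WORDS for t in tokens)


def activity_from_timestamp(name: str) -> str:
    """Strip date/time fragments from a timestamp field name to get an activity label."""
    tokens = [t for t in re.split(r"[_\W]+", name.lower()) if t]
    kept = [t for t in tokens if t not in TIME_WORDS]
    return " ".join(kept)
```

Under these assumptions a field named `order_created_at` yields the activity label `order created`, and `shipped_date` yields `shipped`.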

Here, the step of inferring the case concept from the data catalog may generate the case concept by determining the number of times each field of the entity relationship data models included in the data catalog is referenced and the ranking of those reference counts.
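As a rough illustration of ranking by reference count, the sketch below counts how often each field is the target of a foreign-key reference and sorts the result. The input shape (a list of foreign-key triples) and the function name `rank_case_candidates` are assumptions, not details from the source.

```python
from collections import Counter


def rank_case_candidates(foreign_keys: list[tuple[str, str, str]]) -> list[tuple[str, int]]:
    """Rank fields by how many times other models reference them.

    `foreign_keys` holds (referencing_model, referenced_model, referenced_field)
    triples; this input format is an assumption made for the sketch.
    """
    counts = Counter(f"{model}.{field}" for _, model, field in foreign_keys)
    # Most-referenced first; ties broken alphabetically for a stable ranking.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each highly ranked field is a candidate case identifier, and taking several of them rather than only the first is what would yield more than one perspective.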

Here, the method for generating a multi-perspective event log may further include: if at least one of the activity concept, the timestamp concept, and the case concept is not found, generating an event map based on data quality; and generating a multi-perspective event log from the data source based on the event map generated based on the data quality.

Here, the step of calculating the quality of the sample event log may calculate the quality by considering at least one of the case identifier ratio, the trace variant ratio, the average ratio of unique activities per case, the endpoint activity ratio, and the start-end activity ratio of the sample event log.
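The source names these ratios but does not define them here, so the following sketch uses assumed definitions: case identifier ratio as distinct cases over events, trace variant ratio as distinct activity sequences over cases, and the per-case share of unique activities averaged over all cases. It expects a non-empty log of `(case, activity, timestamp)` tuples with sortable timestamps, and it does not attempt to combine the ratios into a single score.

```python
from collections import defaultdict


def log_quality_metrics(events: list[tuple[str, str, str]]) -> dict[str, float]:
    """Compute illustrative ratios for a non-empty log of (case, activity, timestamp).

    The metric definitions below are assumptions, not taken from the source.
    """
    traces = defaultdict(list)
    # Order events by case and timestamp so each trace is a time-ordered sequence.
    for case, activity, _ in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case].append(activity)

    variants = {tuple(t) for t in traces.values()}
    unique_share = [len(set(t)) / len(t) for t in traces.values()]
    return {
        "case_identifier_ratio": len(traces) / len(events),
        "trace_variant_ratio": len(variants) / len(traces),
        "avg_unique_activity_ratio": sum(unique_share) / len(unique_share),
    }
```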

Here, the method for generating a multi-perspective event log may further include: if the quality value of the sample event log is less than the reference value, generating an event map based on data quality; and generating a multi-perspective event log from the data source based on the event map generated based on the data quality.
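The branching described so far (use the metadata-based event map when all three concepts were found and the sample log quality reaches the reference value, otherwise fall back to the data-quality-based map) can be summarized in a small sketch. The function name, the dictionary representation of an event map, and the callback parameters are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical representation: an event map is a dict that maps the notions
# "case", "activity" and "timestamp" to field names.
EventMap = dict


def choose_event_map(
    metadata_map: Optional[EventMap],
    sample_quality: Callable[[EventMap], float],
    fallback: Callable[[], EventMap],
    reference_value: float,
) -> EventMap:
    """Return the metadata-based map if it is complete and good enough,
    otherwise the data-quality-based map produced by `fallback`."""
    required = ("case", "activity", "timestamp")
    if metadata_map is None or any(not metadata_map.get(k) for k in required):
        return fallback()      # at least one concept could not be inferred
    if sample_quality(metadata_map) >= reference_value:
        return metadata_map    # quality is greater than or equal to the reference
    return fallback()          # quality is below the reference value
```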

Here, the step of generating the event map based on data quality may include: calculating data quality indicators from the data source; calculating data quality dimensions and average values from the data source; inferring event concept candidates using a classification algorithm; generating a plurality of candidate event maps based on the event concept candidates; extracting sample data from the data source and generating a sample candidate event log based on each of the plurality of candidate event maps; calculating a quality for each of the sample candidate event logs; and determining the event map from among the plurality of candidate event maps based on the quality of the sample candidate event logs.

Here, the step of inferring the event concept candidates using the classification algorithm may include: predicting, as the event concept candidate, the event concept of each field of the entity relationship data models extracted from the data source, using a classification model generated by supplying previously learned event logs to the classification algorithm as training data; removing, from the entity relationship data models, fields that are neither primary keys nor foreign keys and are not event concept fields; setting, as a threshold, the mean of the average data quality indicator values of all event concept fields included in the entity relationship data models, and removing any event concept field whose average data quality indicator value is lower than the threshold; modifying fields that have the same field name in the entity relationship data models but different event concept candidates so that they have the same event concept candidate; and removing, among fields with the same field name in the entity relationship data models, all fields except the one with the highest average data quality indicator value.
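The threshold and de-duplication steps can be sketched as below. The sketch covers only those two steps (it omits the classifier and the harmonization of differing concept candidates), and the dictionary keys `model`, `name`, `concept`, and `dq_mean` are hypothetical names for the per-field information.

```python
from statistics import mean


def filter_event_fields(fields: list[dict]) -> list[dict]:
    """Drop event concept fields below the global mean quality, then keep
    only the best-scoring field for each field name.

    Each field is a dict with the hypothetical keys "model", "name",
    "concept" and "dq_mean" (the field's average data quality indicator).
    """
    # Threshold = mean of the per-field average data quality values.
    threshold = mean(f["dq_mean"] for f in fields)
    kept = [f for f in fields if f["dq_mean"] >= threshold]

    # Among fields sharing a name, keep the one with the highest average.
    best: dict[str, dict] = {}
    for f in kept:
        current = best.get(f["name"])
        if current is None or f["dq_mean"] > current["dq_mean"]:
            best[f["name"]] = f
    return list(best.values())
```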

Here, the step of determining the event map from among the plurality of candidate event maps based on the quality of the sample candidate event logs may determine, as the event map, the candidate event map corresponding to the sample candidate event log with the highest quality.

Alternatively, the step of determining the event map from among the plurality of candidate event maps based on the quality of the sample candidate event logs may provide the user with the plurality of candidate event maps and the quality information of the corresponding sample candidate event logs, and determine the candidate event map selected by the user as the event map.

A system for generating a multi-perspective event log according to an embodiment of the present invention includes: a memory that stores a data source; and a processor, wherein the processor may extract metadata from the data source, generate an event map and a sample event log using the metadata, calculate a quality of the sample event log, and, if the quality value of the sample event log is greater than or equal to a reference value, generate a multi-perspective event log by collecting data from the data source based on the event map.

Here, when generating the event map and the sample event log using the metadata, the processor may convert the extracted metadata into standardized entity relationship data models, generate a data catalog by connecting the entity relationship data models, infer an activity concept and a timestamp concept from the data catalog, infer a case concept from the data catalog, generate the event map by associating the activity concept, the timestamp concept, and the case concept, extract sample data from the data source, and generate the sample event log by extracting data from the sample data based on the event map.

Here, if at least one of the activity concept, the timestamp concept, and the case concept is not found, the processor may generate an event map based on data quality and generate a multi-perspective event log from the data source based on the event map generated based on the data quality.

Here, when calculating the quality of the sample event log, the processor may consider at least one of the case identifier ratio, the trace variant ratio, the average ratio of unique activities per case, the endpoint activity ratio, and the start-end activity ratio of the sample event log.

Here, if the quality value of the sample event log is less than the reference value, the processor may generate an event map based on data quality and generate a multi-perspective event log from the data source based on the event map generated based on the data quality.

The present invention relates to a system and method for generating a multi-perspective event log that receive a data source, extract metadata from the data source, generate an event map and a sample event log using the metadata, calculate the quality of the sample event log, and, if the quality value of the sample event log is greater than or equal to a reference value, generate a multi-perspective event log by collecting data from the data source based on the event map. Because various case concepts are inferred automatically, a multi-perspective event log can be extracted without requiring the user to specify or infer the case concepts.

FIG. 1 is a diagram schematically illustrating the configuration of a system for generating a multi-perspective event log according to an embodiment of the present invention.

FIG. 2 is a flowchart schematically illustrating a process of generating a multi-perspective event log in a system according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a process of generating an event map and a sample event log using metadata in a system according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a process of generating a data catalog in a system according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a process of inferring an activity concept and a timestamp concept from a data catalog in a system according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a process of generating an event map based on data quality in a system according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a process of inferring event concept candidates using a classification algorithm in a system according to an embodiment of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various modifications may be made to the embodiments, so the scope of rights of the patent application is not limited or restricted by these embodiments. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for the purpose of description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In addition, in the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. In describing an embodiment, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the embodiment.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing components of the embodiments. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. When a component is described as being "connected," "coupled," or "joined" to another component, the component may be directly connected or joined to the other component, but it should be understood that yet another component may also be "connected," "coupled," or "joined" between the components.

A component that has a function in common with a component included in one embodiment is described using the same name in other embodiments. Unless stated otherwise, a description given in one embodiment may also apply to other embodiments, and detailed descriptions are omitted to the extent that they overlap.

Hereinafter, a system and method for generating a multi-perspective event log according to an embodiment of the present invention are described in detail with reference to the accompanying FIGS. 1 to 7.

Referring to FIG. 1, a system (100) for generating a multi-perspective event log may include a processor (110), a communication unit (120), and a memory (130).

The communication unit (120) is a communication interface device that includes a receiver and a transmitter, and it can transmit and receive data over a wired or wireless connection. The communication unit (120) can connect to an external database server or the like and receive a data source.

The memory (130) stores an operating system for controlling the overall operation of the system (100), application programs, and data to be stored. According to the present invention, it can also store the data source, metadata, entity relationship data models, data catalog, event map, sample event log, and multi-perspective event log.

The processor (110) may include a metadata extraction unit (111), a first event map generation unit (112), a quality calculation unit (113), a second event map generation unit (114), and a multi-perspective event log generation unit (115).

The metadata extraction unit (111) can receive the data source from the memory (130) and extract metadata from the data source.

The first event map generation unit (112) can generate an event map and a sample event log using the metadata.

More specifically, the first event map generation unit (112) can 1) convert the extracted metadata into standardized entity relationship data models, 2) generate a data catalog by connecting the entity relationship data models, 3) infer an activity concept (activity notion) and a timestamp concept (timestamp notion) from the data catalog and infer a case concept (case notion) from the data catalog, 4) generate an event map by associating the activity concept, the timestamp concept, and the case concept, and 5) extract sample data from the data source and generate a sample event log by extracting data from the sample data based on the event map.
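Step 5 can be pictured with a minimal sketch that projects sampled rows onto the three concepts named by an event map. It assumes rows are plain dictionaries and that the event map is a dictionary from `case`, `activity`, and `timestamp` to field names; both representations and the function name `build_event_log` are hypothetical.

```python
def build_event_log(rows: list[dict], event_map: dict[str, str]) -> list[dict]:
    """Project raw rows onto case/activity/timestamp using the event map.

    Rows with a missing value for any mapped field are dropped, and the
    remaining events are ordered by case and then by timestamp.
    """
    log = []
    for row in rows:
        event = {notion: row.get(field) for notion, field in event_map.items()}
        if all(value is not None for value in event.values()):
            log.append(event)
    return sorted(log, key=lambda e: (e["case"], e["timestamp"]))
```

Applying the same function with a different case field (for example a customer identifier instead of an order identifier) would give a second view of the same rows.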

Here, if at least one of the activity concept, the timestamp concept, and the case concept is not found, the first event map generation unit (112) can request the second event map generation unit (114) to generate an event map based on data quality.

The first event map generation unit (112) can generate the data catalog by connecting the entity relationship data models through the following process. The first event map generation unit (112) can 1) check the similarity of the field names included in the entity relationship data models, 2) connect the entity relationship data models, using the checked similarity, so that the data catalog is acyclic, and 3) remove non-essential optional fields from each of the entity relationship data models included in the data catalog.

The first event map generation unit (112) can infer the activity concept and the timestamp concept from the data catalog through the following process. The first event map generation unit (112) can 1) analyze the data type of each field of the entity relationship data models included in the data catalog, 2) analyze each field name of the entity relationship data models using dictionary-based pattern matching, 3) infer the activity concept and the timestamp concept through date or time analysis of the fields included in the entity relationship data models, 4) remove, from each of the entity relationship data models, the timestamp fields other than a representative timestamp field, 5) remove duplicate column names from the entity relationship data models included in the data catalog, 6) calculate a centrality for each of the entity relationship data models included in the data catalog based on the number of entity relationship data models that depend on it, 7) remove, from the entity relationship data models included in the data catalog, any entity relationship data model that has no timestamp concept candidate and on which no other entity relationship data model depends, 8) remove, from each of the entity relationship data models included in the data catalog, fields that are neither primary keys nor foreign keys and do not correspond to timestamps, so that each model has at most one categorical field and one numeric field, 9) infer the timestamp concept and an activity life cycle for each of the entity relationship data models included in the data catalog through text similarity analysis of its timestamp fields, and 10) remove date and time patterns from the timestamp concept and infer the activity concept by concatenating the fields that are neither primary keys nor foreign keys and do not correspond to timestamps.
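Steps 6 and 7 amount to an in-degree count followed by a filter. The sketch below assumes the dependencies are available as a mapping from each model to the set of models it references; that input shape and the function name `prune_models` are illustrative assumptions.

```python
def prune_models(depends_on: dict[str, set[str]], has_timestamp: dict[str, bool]):
    """Return (centrality, kept_models).

    `depends_on[m]` is the set of models that model `m` references.
    The centrality of a model is the number of models that depend on it.
    A model is removed when it has no timestamp candidate and no dependents.
    """
    centrality = {m: 0 for m in depends_on}
    for model, targets in depends_on.items():
        for target in targets:
            if target != model:  # ignore self-references
                centrality[target] = centrality.get(target, 0) + 1

    kept = [
        m for m in depends_on
        if has_timestamp.get(m, False) or centrality[m] > 0
    ]
    return centrality, kept
```

In a catalog where `order_items` references `orders` and `orders` references `customers`, a standalone table without timestamps is dropped, while `customers` survives because another model depends on it.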

When inferring the case concept from the data catalog, the first event map generation unit (112) can generate the case concept by determining the number of times each field of the entity relationship data models included in the data catalog is referenced and the ranking of those reference counts.

품질 계산부(113)는 샘플 이벤트 로그의 품질을 계산할 수 있다.The quality calculation unit (113) can calculate the quality of a sample event log.

보다 구체적으로, 품질 계산부(113)는 샘플 이벤트 로그의 사례 식별자 비율, 추적 변형 비율, 케이스당 고유 활동의 평균 비율, 엔드 포인트 활동 비율 및 시작 종료 활동 비율 중에서 적어도 하나를 고려해서 샘플 이벤트 로그의 품질을 계산할 수 있다.More specifically, the quality calculation unit (113) can calculate the quality of the sample event log by considering at least one of the case identifier ratio, the trace mutation ratio, the average ratio of unique activities per case, the endpoint activity ratio, and the start-end activity ratio of the sample event log.

If the quality value of the sample event log is less than the reference value, the second event map generation unit (114) may generate an event map based on data quality. More specifically, the second event map generation unit (114) may 1) calculate data quality indicators from the data source, 2) calculate data quality dimensions and their average values from the data source, 3) infer event concept candidates using a classification algorithm, 4) generate a plurality of candidate event maps based on the event concept candidates, 5) extract sample data from the data source and generate a sample candidate event log for each of the candidate event maps, 6) calculate the quality of each sample candidate event log, and 7) determine the event map from among the candidate event maps based on the quality of the sample candidate event logs.

When determining the event map from among the candidate event maps based on the quality of the sample candidate event logs, the second event map generation unit (114) may, if set to fully automatic, determine as the event map the candidate event map corresponding to the sample candidate event log with the highest quality.

If set to manual, the second event map generation unit (114) may instead provide the user with the candidate event maps and the quality information of the corresponding sample candidate event logs, and determine as the event map the candidate event map selected by the user.

When inferring event concept candidates using a classification algorithm, the second event map generation unit (114) may proceed as follows: 1) predict, as an event concept candidate, the event concept of each field of the entity relationship data models extracted from the data source, using a classification model generated by feeding event logs learned in advance to the classification algorithm as training data; 2) remove from the entity relationship data models any field that is neither a primary or foreign key nor an event concept field; 3) average the mean data quality indicator values of all event concept fields included in the entity relationship data models, set the result as a threshold, and remove any event concept field whose mean data quality indicator value is below the threshold; 4) where fields in the entity relationship data models share a field name but have different event concept candidates, modify them to have the same event concept candidate; and 5) among fields in the entity relationship data models that share a field name, remove all but the field with the highest mean data quality indicator value.

In addition, when the first event map generation unit (112) requests generation of an event map because at least one of the activity concept, the timestamp concept, and the case concept was not found, the second event map generation unit (114) may 1) generate an event map based on data quality, and 2) generate a multi-perspective event log from the data source based on that event map.

If the quality value of the sample event log calculated by the quality calculation unit (113) is greater than or equal to the reference value, the multi-perspective event log generation unit (115) may collect data from the data source based on the event map and generate a multi-perspective event log.

When the second event map generation unit (114) generates an event map based on data quality, the multi-perspective event log generation unit (115) may generate a multi-perspective event log from the data source based on that event map.

In FIG. 1, the metadata extraction unit (111), the first event map generation unit (112), the quality calculation unit (113), the second event map generation unit (114), and the multi-perspective event log generation unit (115) are shown as included in the processor (110), but the configuration is not limited thereto, and each may be configured as a separate device. They may also be implemented in the form of program instructions executable by various computer means.

Hereinafter, a method according to the present invention, configured as described above, is described with reference to the drawings.

Referring to FIG. 2, the system (100) may receive a data source (210).

The system (100) may then extract metadata from the data source (220).

The metadata may include key information indicating whether a field is a primary key or a foreign key, a field name identifying the field, type information indicating whether the field is of a case type or a timestamp type, and required information indicating whether the field is required or optional.
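As an illustration only, the metadata items listed above could be held in a small record such as the following Python sketch. The names `FieldMetadata`, `KeyKind`, `model`, and `concept_type` are hypothetical and do not appear in the specification; the `model` attribute (the table a field belongs to) is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KeyKind(Enum):
    """Key information: primary key, foreign key, or neither."""
    NONE = "none"
    PRIMARY = "primary"
    FOREIGN = "foreign"


@dataclass(frozen=True)
class FieldMetadata:
    model: str                   # assumed: name of the table or entity the field belongs to
    name: str                    # field name
    key: KeyKind                 # key information
    concept_type: Optional[str]  # type information: "case", "timestamp", or None
    required: bool               # required information: True if required, False if optional


# Example record for a primary-key field treated as a case-type field.
order_id_field = FieldMetadata(
    model="orders",
    name="order_id",
    key=KeyKind.PRIMARY,
    concept_type="case",
    required=True,
)
```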

The system (100) may generate an event map and a sample event log using the metadata (230). Step 230 is described in detail below with reference to FIG. 3.

The system (100) may calculate the quality of the sample event log (240).

In step 240, the system (100) may calculate the quality by considering at least one of the case identifier ratio, the trace variant ratio, the average ratio of unique activities per case, the endpoint activity ratio, and the start-end activity ratio of the sample event log.

More specifically, in step 240, the system (100) may calculate the quality of the event log with reference to <Mathematical Formula 1> and <Mathematical Formula 2> below.

[Mathematical Formula 1]

The interestingness of an event log may be expressed, for example, as a weighted combination of the five ratios:

$$I(L) = w_{cid}\,R_{cid}(L) + w_{tv}\,R_{tv}(L) + w_{ua}\,R_{ua}(L) + w_{ep}\,R_{ep}(L) + w_{se}\,R_{se}(L)$$

Here, $I(L)$ is the interestingness of the event log $L$; $R_{cid}(L)$ is the case identifier ratio of $L$; $N_{cid}(L)$ is the total number of unique case identifiers in $L$; $N_{ev}(L)$ is the total number of events in $L$; $R_{tv}(L)$ is the trace variant ratio of $L$; $N_{tv}(L)$ is the total number of trace variants in $L$; $R_{ua}(L)$ is the average ratio of unique activities per case; $N_{ua}(L, c)$ is the total number of unique activities of case $c$ in $L$; $N_{ev}(L, c)$ is the total number of events of case $c$ in $L$; $R_{ep}(L)$ is the endpoint activity ratio; $N_{se}(L)$ is the total number of unique start and end activities in $L$; $R_{se}(L)$ is the start-end activity ratio; and $N_{\cap}(L)$ is the total number of elements common to the start activities and the end activities in $L$.

$w_{cid}$, $w_{tv}$, $w_{ua}$, $w_{ep}$, and $w_{se}$ are weights, each with a value between 0 and 1, for the case identifier ratio, the trace variant ratio, the average ratio of unique activities per case, the endpoint activity ratio, and the start-end activity ratio, respectively.
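The five ratios and their weights can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the formula of the specification: the text above names the counts involved but the exact numerators, denominators, and combination rule are assumed here (unique cases over events, variants over cases, unique start and end activities over all activities, shared start and end activities over unique start and end activities, and a plain weighted sum). The function names and the `(case_id, activity, timestamp)` tuple layout are hypothetical, and the log is assumed to be non-empty.

```python
from collections import defaultdict


def log_ratios(events):
    """events: non-empty list of (case_id, activity, timestamp) tuples."""
    traces = defaultdict(list)
    for case_id, activity, _ts in sorted(events, key=lambda e: e[2]):
        traces[case_id].append(activity)

    n_cases = len(traces)
    variants = {tuple(t) for t in traces.values()}
    starts = {t[0] for t in traces.values()}
    ends = {t[-1] for t in traces.values()}
    activities = {a for t in traces.values() for a in t}

    return {
        # Assumed: unique case identifiers / total events.
        "case_id": n_cases / len(events),
        # Assumed: trace variants / unique case identifiers.
        "trace_variant": len(variants) / n_cases,
        # Mean over cases of (unique activities in case / events in case).
        "unique_activity": sum(len(set(t)) / len(t) for t in traces.values()) / n_cases,
        # Assumed: unique start and end activities / unique activities.
        "endpoint": len(starts | ends) / len(activities),
        # Assumed: activities that both start and end a trace / unique start and end activities.
        "start_end": len(starts & ends) / len(starts | ends),
    }


def interestingness(events, weights):
    """Assumed combination rule: weighted sum, each weight in [0, 1]."""
    ratios = log_ratios(events)
    return sum(weights[name] * value for name, value in ratios.items())
```

Setting a weight to 0 drops the corresponding ratio, which matches the statement that the quality may consider at least one of the five ratios.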

[Mathematical Formula 2]

$$I_{MP}(EM) = \frac{1}{n}\sum_{i=1}^{n} I(L_i)$$

Here, $I_{MP}(EM)$ is the multi-perspective event log interestingness, that is, the average of the event log interestingness over the event log perspectives $L_i$ generated from the event map result $EM$, and it corresponds to the quality of the sample event log; $I(L_i)$ is the event log interestingness of perspective $L_i$ according to <Mathematical Formula 1>; and $n$ is the total number of event log perspectives.

Exactly one event log is generated for each event map perspective, so the event map perspectives and the generated event log perspectives may have a one-to-one relationship.
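The averaging described for <Mathematical Formula 2> amounts to a plain arithmetic mean over the per-perspective scores. A minimal Python sketch, with the hypothetical function name `multi_perspective_quality`:

```python
def multi_perspective_quality(perspective_scores):
    """Mean of the per-perspective event log scores (one score per event log perspective)."""
    if not perspective_scores:
        raise ValueError("at least one event log perspective is required")
    return sum(perspective_scores) / len(perspective_scores)
```

The result is the value that is compared against the reference value in step 250.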

The system (100) may check whether the quality value of the sample event log is greater than or equal to the reference value (250).

If the check in step 250 shows that the quality value of the sample event log is greater than or equal to the reference value, the system (100) may collect data from the data source based on the event map and generate a multi-perspective event log (270).

If the check in step 250 shows that the quality value of the sample event log is less than the reference value, the system (100) may generate an event map based on data quality (260). Step 260 is described in detail below with reference to FIG. 6.

The system (100) may then, in step 270, generate a multi-perspective event log from the data source based on the event map generated based on data quality.
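The control flow of steps 210 to 270 can be summarized in a short Python sketch. Every callable passed in (`extract_metadata`, `build_event_map`, `build_sample_log`, `score`, `build_event_map_from_quality`, `collect`) is a hypothetical placeholder for the corresponding step; only the branching on the reference value is taken from the description.

```python
def generate_multi_perspective_log(
    source,
    *,
    extract_metadata,
    build_event_map,
    build_sample_log,
    score,
    build_event_map_from_quality,
    collect,
    reference_value,
):
    metadata = extract_metadata(source)                   # step 220
    event_map = build_event_map(metadata)                 # step 230: event map
    sample_log = build_sample_log(source, event_map)      # step 230: sample event log
    quality = score(sample_log)                           # step 240
    if quality < reference_value:                         # step 250
        event_map = build_event_map_from_quality(source)  # step 260
    return collect(source, event_map)                     # step 270
```

Passing the steps as parameters keeps the sketch self-contained; an actual system would bind them to its own components.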

Referring to FIG. 3, the system (100) may convert the extracted metadata into standardized entity relationship data models (310).

The system (100) may create a data catalog by connecting the entity relationship data models (320). Step 320 is described in detail below with reference to FIG. 4.

The system (100) may infer the activity concept and the timestamp concept from the data catalog (330). Step 330 is described in detail below with reference to FIG. 5.

The system (100) may infer the case concept from the data catalog (340).

In step 340, the system (100) may generate the case concept by determining the number of times each field of the entity relationship data models included in the data catalog is cited and the rank of that count. The case concept may include entity relationship data model information, the field name, the number of times the field is cited, and the rank of that count.
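The citation counting and ranking of step 340 could look like the following Python sketch. It assumes that a field is "cited" each time a foreign key in another model references it; that reading, the input layout, and the name `rank_case_candidates` are assumptions for illustration.

```python
from collections import Counter


def rank_case_candidates(foreign_keys):
    """foreign_keys: iterable of ((model, field), (referenced_model, referenced_field)).

    Returns one record per cited field with the items the case concept may
    include: model, field name, citation count, and rank of that count.
    """
    counts = Counter(target for _source, target in foreign_keys)
    # Highest citation count first; ties are broken by name so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"model": model, "field": field, "citations": n, "rank": rank}
        for rank, ((model, field), n) in enumerate(ranked, start=1)
    ]
```

In this sketch tied fields receive consecutive ranks; a shared rank for ties would be an equally reasonable choice.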

The system (100) may check whether the activity concept, the timestamp concept, and the case concept have all been found (350).

If the check in step 350 shows that not all of them have been found, that is, if at least one of the activity concept, the timestamp concept, and the case concept has not been found, the system (100) may proceed to step 260 of FIG. 2 and perform step 260.

If the check in step 350 shows that the activity concept, the timestamp concept, and the case concept have all been found, the system (100) may generate an event map by associating the activity concept, the timestamp concept, and the case concept (360). That is, in step 360, the system (100) may generate the event map so that it includes all of the activity concept, the timestamp concept, and the case concept.

The system (100) may extract sample data from the data source and generate a sample event log by extracting data from the sample data based on the event map (370).

If at least one of the activity concept, the timestamp concept, and the case concept is not found, the system (100) may generate an event map based on data quality (380).

The system (100) may then generate a multi-perspective event log from the data source based on the event map generated based on data quality (390).

FIG. 4 is a flowchart illustrating a process of creating a data catalog in a system according to an embodiment of the present invention.

Referring to FIG. 4, the system (100) may check the similarity of the field names included in the entity relationship data models (410).

Using the similarity found, the system (100) may connect the entity relationship data models so that the data catalog is acyclic (420).

The system (100) may remove optional, non-required fields from each of the entity relationship data models included in the data catalog (430).
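Steps 410 to 430 can be sketched in Python as below. This is a minimal sketch under several assumptions that the description does not state: similarity is measured with the standard library's `difflib.SequenceMatcher`, two models are linked only when a foreign key in one has a name similar to a primary key in the other, and a link is skipped when it would close a cycle. The dictionary layout and the names `build_catalog` and `similar` are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Step 410: field-name similarity (assumed measure and threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_catalog(models, threshold=0.8):
    """models: {model_name: [{"name": str, "key": "primary"|"foreign"|"none", "required": bool}]}.

    Returns (edges, pruned_models); an edge (child, parent) means child depends on parent.
    """
    edges = []

    def reachable(src, dst):
        # Depth-first search over the edges added so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(t for s, t in edges if s == node)
        return False

    for child, child_fields in models.items():
        for parent, parent_fields in models.items():
            if child == parent:
                continue
            linked = any(
                cf["key"] == "foreign"
                and pf["key"] == "primary"
                and similar(cf["name"], pf["name"], threshold)
                for cf in child_fields
                for pf in parent_fields
            )
            # Step 420: add the link only if it keeps the catalog acyclic.
            if linked and (child, parent) not in edges and not reachable(parent, child):
                edges.append((child, parent))

    # Step 430: drop optional fields.
    pruned = {m: [f for f in fields if f["required"]] for m, fields in models.items()}
    return edges, pruned
```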

Referring to FIG. 5, the system (100) may analyze the data type of each field of the entity relationship data models included in the data catalog (510).

The system (100) may analyze each field name of the entity relationship data models using dictionary-based pattern matching (512).
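Steps 510 and 512 can be illustrated with a small Python sketch that flags timestamp candidates from a field's data type and from a dictionary lookup on its name. The token dictionary, the list of temporal types, and the function name `is_timestamp_candidate` are hypothetical; the specification does not list the dictionary contents.

```python
import re

# Hypothetical dictionary of name tokens that suggest a date or time field.
TIMESTAMP_TOKENS = {"date", "time", "datetime", "timestamp", "at", "on"}
# Hypothetical set of column types treated as temporal.
TEMPORAL_TYPES = {"date", "time", "datetime", "timestamp"}


def is_timestamp_candidate(field_name, data_type):
    """Return True if the data type (step 510) or the field name (step 512) suggests a timestamp."""
    # Split on non-alphanumeric characters so that whole tokens are compared,
    # which avoids matching "date" inside a word such as "candidate".
    tokens = [t for t in re.split(r"[^a-z0-9]+", field_name.lower()) if t]
    name_hit = any(t in TIMESTAMP_TOKENS for t in tokens)
    type_hit = data_type.lower() in TEMPORAL_TYPES
    return name_hit or type_hit
```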

The system (100) may infer the activity concept and the timestamp concept through date or time analysis of the fields included in the entity relationship data models (514). The timestamp concept may include entity relationship data model information, field name information, and information on the life cycle of the field.

The system (100) may remove, from each of the entity relationship data models, all timestamp fields other than the representative timestamp field (516).

The system (100) may remove duplicate column names from the entity relationship data models included in the data catalog (518).

For each entity relationship data model included in the data catalog, the system (100) may calculate its centrality based on the number of entity relationship data models that depend on it (520).

The system (100) may remove, from the entity relationship data models included in the data catalog, any entity relationship data model that has no timestamp concept candidate and on which no other entity relationship data model depends (522).
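Steps 520 and 522 can be sketched in Python as follows, taking centrality to be the number of models that directly depend on a given model. The edge layout `(dependent, depended_on)` and the function names are assumptions for illustration.

```python
from collections import Counter


def centrality(edges, models):
    """Step 520: for each model, count the models that directly depend on it.

    edges: iterable of (dependent_model, depended_on_model) pairs.
    """
    counts = Counter(parent for _child, parent in set(edges))
    return {model: counts.get(model, 0) for model in models}


def prune_models(has_timestamp_candidate, edges):
    """Step 522: drop models with no timestamp candidate and no dependents.

    has_timestamp_candidate: {model_name: bool}.
    """
    scores = centrality(edges, has_timestamp_candidate)
    return [
        model
        for model, has_ts in has_timestamp_candidate.items()
        if has_ts or scores[model] > 0
    ]
```

A model without any timestamp candidate is kept as long as another model depends on it, since it may still supply a case identifier.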

For each entity relationship data model included in the data catalog, the system (100) may remove fields that are neither primary or foreign keys nor timestamps so that the model has at most one category field and one numeric field (524).

For each entity relationship data model included in the data catalog, the system (100) may infer the timestamp concept and the activity life cycle through text similarity analysis of the timestamp fields (526).

The system (100) may remove date and time patterns from the timestamp concept and infer the activity concept by linking the fields that are neither primary or foreign keys nor timestamps (528).
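One way to read step 528 is that an activity label is derived from a timestamp field name by stripping its date and time tokens, optionally combined with a value from the remaining category field. The Python sketch below illustrates that reading only; the token list, the label format, and the name `activity_from_timestamp_field` are hypothetical.

```python
import re

# Hypothetical date and time tokens to strip from a timestamp field name.
DATE_TIME_TOKENS = {"date", "time", "datetime", "timestamp", "at", "on", "dt", "ts"}


def activity_from_timestamp_field(field_name, category_value=None):
    """Derive an activity label from a timestamp field name.

    category_value is an optional value from the model's remaining category
    field, appended to the label (an assumed way of "linking" the fields).
    """
    tokens = [t for t in re.split(r"[^a-z0-9]+", field_name.lower()) if t]
    label = " ".join(t for t in tokens if t not in DATE_TIME_TOKENS)
    return f"{label} ({category_value})" if category_value else label
```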

Referring to FIG. 6, the system (100) may calculate data quality indicators from the data source (610).

The system (100) may calculate data quality dimensions and their average values from the data source (620).

The system (100) may perform the calculations of steps 610 and 620 with reference to Andrews, Robert, et al. "Quality-informed semi-automated event log generation for process mining." Decision Support Systems 132 (2020): 113265.

The system (100) may infer event concept candidates using a classification algorithm (630). Step 630 is described in detail below with reference to FIG. 7.

The system (100) may generate a plurality of candidate event maps based on the event concept candidates (640).

The system (100) may extract sample data from the data source and generate a sample candidate event log for each of the candidate event maps (650).

The system (100) may calculate the quality of each sample candidate event log (660).

The system (100) may determine the event map from among the candidate event maps based on the quality of the sample candidate event logs (670).

In step 670, the system (100) may determine as the event map the candidate event map corresponding to the sample candidate event log with the highest quality, or it may provide the user with the candidate event maps and the quality information of the corresponding sample candidate event logs and determine as the event map the candidate event map selected by the user.
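The two selection modes of step 670 can be sketched in Python as follows. The function name `choose_event_map`, the `mode` values, and the `ask_user` callback are hypothetical stand-ins for however the system exposes the setting and the user interaction.

```python
def choose_event_map(candidates, mode="auto", ask_user=None):
    """candidates: non-empty list of (event_map, quality) pairs.

    mode="auto": return the event map whose sample candidate log scored highest.
    mode="manual": show all candidates with their quality to the user via
    ask_user and return the event map the user picks.
    """
    if mode == "auto":
        best_map, _quality = max(candidates, key=lambda c: c[1])
        return best_map
    if ask_user is None:
        raise ValueError("manual mode requires an ask_user callback")
    return ask_user(candidates)
```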

Referring to FIG. 7, the system (100) may predict, as an event concept candidate, the event concept of each field of the entity relationship data models extracted from the data source, using a classification model generated by feeding event logs learned in advance to the classification algorithm as training data (710).

The system (100) may remove from the entity relationship data models any field that is neither a primary or foreign key nor an event concept field (720).

The system (100) may average the mean data quality indicator values of all event concept fields included in the entity relationship data models, set the result as a threshold, and remove any event concept field whose mean data quality indicator value is below the threshold (730).

Where fields in the entity relationship data models share a field name but have different event concept candidates, the system (100) may modify them to have the same event concept candidate (740).

Among fields in the entity relationship data models that share a field name, the system (100) may remove all but the field with the highest mean data quality indicator value (750).
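Steps 720 to 750 can be sketched in Python as a filtering pass over the classified fields. The dictionary layout and the name `filter_candidates` are hypothetical. The description does not say which concept wins in step 740 when same-named fields disagree; this sketch assumes the field with the highest mean data quality value decides, which lets steps 740 and 750 collapse into a single "keep the best field per name" pass.

```python
def filter_candidates(fields):
    """fields: list of dicts with keys
    "model", "name", "key" ("primary"|"foreign"|"none"),
    "concept" (predicted event concept or None), and "dq_mean" (mean data quality value).
    """
    # Step 720: keep key fields and fields that received an event concept.
    kept = [f for f in fields if f["key"] != "none" or f["concept"]]

    # Step 730: the threshold is the average of the mean data quality values
    # of all event concept fields; concept fields below it are removed.
    concept_fields = [f for f in kept if f["concept"]]
    if concept_fields:
        threshold = sum(f["dq_mean"] for f in concept_fields) / len(concept_fields)
        kept = [f for f in kept if not f["concept"] or f["dq_mean"] >= threshold]

    # Steps 740 and 750 (assumed resolution): per field name, keep only the
    # field with the highest mean data quality value, together with its concept.
    best = {}
    for f in kept:
        current = best.get(f["name"])
        if current is None or f["dq_mean"] > current["dq_mean"]:
            best[f["name"]] = f
    return list(best.values())
```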

The devices described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over network-connected computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set forth below.

Claims

A method of generating a multi-perspective event log, the method comprising:

receiving a data source;

extracting metadata from the data source;

generating an event map and a sample event log using the metadata;

calculating the quality of the sample event log; and

if the quality value of the sample event log is greater than or equal to a reference value, collecting data from the data source based on the event map to generate a multi-perspective event log.

In the first paragraph,

The step of generating the event map and the sample event log using the above metadata is:

A step of converting the extracted metadata into a standardized entity relationship data model;

A step of creating a data catalog by connecting the above entity relationship data models;

A step of inferring the activity concept and the timestamp concept from the above data catalog;

A step of inferring case concepts from the above data catalog;

A step of generating the event map by associating the above activity concept, the above time stamp concept and the above case concept; and

A step of extracting sample data from the above data source and extracting data from the sample data based on the above event map to generate the sample event log.

A method of generating a multi-perspective event log that includes the steps above.

In claim 2,

The step of creating a data catalog by connecting the above entity relationship data models includes:

A step of verifying the similarity of field names included in the above entity relationship data models;

A step of connecting the entity relationship data models so that the data catalog becomes acyclic by utilizing the verified similarity; and

A step of removing optional fields that are not essential from each of the entity relationship data models included in the above data catalog.

A method of generating a multi-perspective event log that includes the steps above.
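As a non-limiting sketch, the similarity check and the acyclic connection recited above could look like the following. It assumes that each entity relationship data model is given as a list of field names, that similarity is measured with the standard-library `difflib.SequenceMatcher`, and that "acyclic" is enforced on an undirected graph with a union-find structure; the threshold of 0.8 and the names `name_similarity` and `connect_models` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity of two field names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def connect_models(
    models: dict[str, list[str]], threshold: float = 0.8
) -> list[tuple[str, str, str, str]]:
    """Link models through similar field names, skipping links that would close a cycle."""
    parent = {name: name for name in models}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    candidates = []
    for left, right in combinations(sorted(models), 2):
        for left_field in models[left]:
            for right_field in models[right]:
                score = name_similarity(left_field, right_field)
                if score >= threshold:
                    candidates.append((score, left, left_field, right, right_field))

    # Most similar pairs first; the sort is stable, so ties keep discovery order.
    candidates.sort(key=lambda c: c[0], reverse=True)

    links = []
    for _, left, left_field, right, right_field in candidates:
        left_root, right_root = find(left), find(right)
        if left_root != right_root:  # already-connected models would form a cycle
            parent[left_root] = right_root
            links.append((left, left_field, right, right_field))
    return links
```

The removal of non-essential optional fields is not shown, since it depends on how optionality is recorded in the metadata.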

In claim 2,

The step of inferring the activity concept and the timestamp concept from the above data catalog includes:

A step of analyzing the data type of each field for the entity relationship data models included in the above data catalog;

A step of analyzing each field name of the above entity relationship data models using dictionary-based pattern matching;

A step of inferring the activity concept and the timestamp concept through date or time analysis among the fields included in the above entity relationship data models;

A step of removing all timestamp fields except the representative timestamp field from each of the above entity relationship data models;

A step of removing duplicate column names in the entity relationship data models included in the above data catalog;

A step of calculating the centrality of each of the entity relationship data models included in the data catalog based on the number of entity relationship data models that depend on each of the entity relationship data models;

A step of removing, from the entity relationship data models included in the data catalog, any entity relationship data model that has no timestamp concept candidate and on which no other entity relationship data model depends;

A step of removing fields, among those that are neither primary or foreign keys nor timestamp fields, so that each of the entity relationship data models included in the above data catalog has at most one category field and one numeric field;

A step of inferring the timestamp concept and the activity life cycle through text similarity analysis of the timestamp field for each of the entity relationship data models included in the data catalog; and

A step of removing the date and time pattern from the above timestamp concept and inferring the activity concept by connecting the fields that are not the primary key or the foreign key and the fields that do not correspond to the timestamp.

A method of generating a multi-perspective event log that includes the steps above.
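Three of the steps above (data type analysis combined with dictionary-based name matching, centrality, and removal of models that have neither a timestamp candidate nor dependents) may be illustrated with the minimal sketch below. The hint pattern, the set of type names, and the functions `timestamp_candidates`, `centrality` and `prune` are assumptions made for the example, not definitions from the embodiments.

```python
import re

# Hypothetical dictionary of name fragments that suggest a timestamp field.
TIMESTAMP_HINTS = re.compile(r"date|time|_at$", re.IGNORECASE)
TIMESTAMP_TYPES = {"date", "datetime", "timestamp"}


def timestamp_candidates(fields: dict[str, str]) -> list[str]:
    """fields maps field name -> declared data type."""
    return [
        name
        for name, dtype in fields.items()
        if dtype.lower() in TIMESTAMP_TYPES or TIMESTAMP_HINTS.search(name)
    ]


def centrality(dependencies: dict[str, set[str]]) -> dict[str, int]:
    """dependencies maps a model to the models it references.

    The centrality of a model is the number of models that depend on it.
    """
    scores = {model: 0 for model in dependencies}
    for referenced in dependencies.values():
        for target in referenced:
            scores[target] = scores.get(target, 0) + 1
    return scores


def prune(
    models: dict[str, dict[str, str]], dependencies: dict[str, set[str]]
) -> dict[str, dict[str, str]]:
    """Drop models with no timestamp candidate and no dependent model."""
    scores = centrality(dependencies)
    return {
        name: fields
        for name, fields in models.items()
        if timestamp_candidates(fields) or scores.get(name, 0) > 0
    }
```

A name-only pattern such as this one produces false positives (for example a counter named `update_count`), which is one reason the data type is checked as well.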

In claim 2,

The step of inferring the case concept from the above data catalog includes:

Generating the case concept by inferring, for each field of the entity relationship data models included in the data catalog, the number of times the field is referenced and the rank of that count.

A method of generating a multi-perspective event log.
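One possible reading of this step, given purely as an example, is to count how often each field is the target of a foreign-key reference and to rank fields by that count, with the top-ranked field serving as the case concept candidate. The input shape (pairs of referencing and referenced field) and the name `rank_case_candidates` are hypothetical.

```python
from collections import Counter


def rank_case_candidates(foreign_keys: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """foreign_keys holds (referencing_field, referenced_field) pairs.

    Returns the referenced fields ordered by how often they are referenced.
    """
    counts = Counter(referenced for _, referenced in foreign_keys)
    # Count descending, then name ascending so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```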

In claim 2,

If at least one of the above activity concept, the above timestamp concept and the above case concept is not found, a step of generating an event map based on data quality; and

A step of generating a multi-perspective event log from the data source based on the event map generated based on the above data quality.

A method of generating a multi-perspective event log that further includes the steps above.

In claim 1,

The step of calculating the quality of the above sample event log includes:

Calculating the quality by considering at least one of the case identifier ratio, trace variant ratio, average ratio of unique activities per case, endpoint activity ratio, and start-end activity ratio of the above sample event log.

A method of generating a multi-perspective event log.
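The claim names the indicators without defining them at this point, so the sketch below rests entirely on assumed definitions: the case identifier ratio is taken as distinct cases divided by events, the trace-level ratio as distinct activity sequences divided by cases, and the unique-activity ratio as the mean share of distinct activities within each case. The endpoint and start-end ratios are omitted, and the function name `quality_indicators` is hypothetical.

```python
from collections import defaultdict


def quality_indicators(events: list[tuple[str, str, str]]) -> dict[str, float]:
    """events holds (case_id, activity, timestamp) tuples; must be non-empty.

    Timestamps only need to sort correctly (for example ISO 8601 strings).
    """
    traces: dict[str, list[str]] = defaultdict(list)
    for case_id, activity, _ in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    variants = {tuple(trace) for trace in traces.values()}
    unique_ratios = [len(set(trace)) / len(trace) for trace in traces.values()]
    return {
        # Assumed definition: distinct cases per event.
        "case_identifier_ratio": len(traces) / len(events),
        # Assumed definition: distinct activity sequences per case.
        "trace_variant_ratio": len(variants) / len(traces),
        # Assumed definition: mean share of distinct activities within a case.
        "avg_unique_activity_ratio": sum(unique_ratios) / len(unique_ratios),
    }
```

How the individual indicators are combined into a single quality value and compared with the reference value is left open here.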

In claim 1,

If the quality value of the above sample event log is below the reference value, a step of generating an event map based on data quality; and

In claim 8,

The step of generating an event map based on the above data quality includes:

A step of calculating data quality indicators from the above data source;

A step of calculating data quality dimensions and average values from the above data source;

A step of inferring event concept candidates using a classification algorithm;

A step of generating a plurality of candidate event maps based on the above event concept candidates;

A step of extracting sample data from the above data source and generating a sample candidate event log based on each of the plurality of candidate event maps;

A step of calculating the quality for each of the above sample candidate event logs; and

A step of determining the event map among the plurality of candidate event maps based on the quality of the sample candidate event logs.

A method of generating a multi-perspective event log that includes the steps above.

In claim 9,

The step of inferring the event concept candidates using the above classification algorithm includes:

A step of predicting, as an event concept candidate, an event concept for each field of the entity relationship data models extracted from the data source, using a classification model generated by inputting previously collected event logs into the classification algorithm as training data;

A step of removing fields that are neither primary key nor foreign key and are not event concept fields from the above entity-relationship data models;

A step of setting an average value of data quality indicators for all event concept fields included in the entity relationship data models as a threshold value, and removing an event concept field whose average value of data quality indicators is lower than the threshold value;

A step of modifying event concept fields included in the above entity relationship data models that have the same field name but different event concept candidates so that they have the same event concept candidate; and

A step of removing, among fields with the same field name included in the above entity relationship data models, all fields except the field with the highest average value of the data quality indicators.

A method of generating a multi-perspective event log that includes the steps above.
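The threshold and duplicate-removal steps above may be sketched as follows, on the assumption that each event concept field is described by a dictionary whose `quality` entry already holds the average of its data quality indicators. The dictionary keys and the function name `filter_event_concept_fields` are hypothetical; the classification and candidate-harmonisation steps are not shown.

```python
def filter_event_concept_fields(fields: list[dict]) -> list[dict]:
    """Each field dict has 'model', 'name', 'concept' and 'quality' keys.

    'quality' is the average of the field's data quality indicators.
    """
    # The threshold is the mean quality over all event concept fields.
    threshold = sum(f["quality"] for f in fields) / len(fields)
    kept = [f for f in fields if f["quality"] >= threshold]

    # Among fields sharing a name, keep only the one with the highest quality.
    best_by_name: dict[str, dict] = {}
    for f in kept:
        current = best_by_name.get(f["name"])
        if current is None or f["quality"] > current["quality"]:
            best_by_name[f["name"]] = f
    return list(best_by_name.values())
```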

In claim 9,

The step of determining the event map among the plurality of candidate event maps based on the quality of the sample candidate event logs includes:

Determining, as the event map, the candidate event map corresponding to the sample candidate event log with the highest quality.

A method of generating a multi-perspective event log.
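For illustration, selecting the event map from several candidates can be written as a score-and-pick-the-maximum loop. The event map shape (concept name to field name), the scoring callable and the function name `select_event_map` are assumptions of this sketch.

```python
from typing import Callable

Row = dict[str, object]
EventMap = dict[str, str]  # concept name -> source field name (assumed shape)


def select_event_map(
    candidates: list[EventMap],
    sample_rows: list[Row],
    score: Callable[[list[Row]], float],
) -> tuple[EventMap, list[float]]:
    """Build one sample log per candidate map and return the best-scoring map."""
    scores = []
    for event_map in candidates:
        sample_log = [
            {concept: row[field] for concept, field in event_map.items()}
            for row in sample_rows
        ]
        scores.append(score(sample_log))
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores
```

The per-candidate scores are returned alongside the winner so that they could equally be presented to a user who makes the final choice.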

In claim 9,

Providing the user with quality information on the sample candidate event logs corresponding to the above plurality of candidate event maps, and determining the event map based on the candidate event map selected by the user.

A method of generating a multi-perspective event log.

A computer-readable recording medium characterized by having recorded thereon a program for executing the method of any one of claims 1 to 12.

A memory that stores a data source; and

A processor,

Wherein the above processor

Extracts metadata from the above data source,

Generates an event map and a sample event log using the above metadata,

Calculates the quality of the above sample event log, and

If the quality value of the above sample event log is greater than or equal to the reference value, collects data from the data source based on the above event map to generate a multi-perspective event log.

A system that generates multi-perspective event logs.

In claim 14,

The above processor,

When generating the event map and the sample event log using the above metadata,

Converts the above extracted metadata into a standardized entity relationship data model,

Creates a data catalog by connecting the above entity relationship data models,

Infers the activity concept and the timestamp concept from the above data catalog,

Infers the case concept from the above data catalog,

Generates the event map by associating the above activity concept, the above timestamp concept and the above case concept, and

Extracts sample data from the above data source and extracts data from the sample data based on the above event map to generate the sample event log.

A system that generates multi-perspective event logs.

In claim 15,

The above processor,

If at least one of the above activity concept, the above timestamp concept and the above case concept is not found, generates an event map based on data quality, and

Generates a multi-perspective event log from the above data source based on the event map generated based on the above data quality.

A system that generates multi-perspective event logs.

In claim 14,

The above processor,

When calculating the quality of the above sample event log,

A system that generates multi-perspective event logs.

In claim 14,

The above processor,

If the quality value of the above sample event log is below the reference value, generates an event map based on data quality.

A system that generates multi-perspective event logs.