KR20190087807A

KR20190087807A - Datalake framework

Info

Publication number: KR20190087807A
Application number: KR1020180006070A
Authority: KR
Inventors: 차병래; 박선; 신병춘; 김종원
Original assignee: 제노테크주식회사
Priority date: 2018-01-17
Filing date: 2018-01-17
Publication date: 2019-07-25

Abstract

The present invention relates to a data lake framework for building an enterprise-wide data lake that captures, processes, and analyzes the large volumes of data generated by the business systems of a company or institution and provides that data to the users or systems that consume it.
The framework comprises: a data acquisition unit (20) connected to business devices (100) that generate business data within the enterprise, for acquiring the business data; a data delivery unit (30) that decouples the business devices (100) from the data acquisition unit (20) and supports communication for delivering the acquired business data; a transfer rate control unit (40) for controlling the rate at which business data is delivered through the data delivery unit (30); a lambda architecture unit (50) that models the delivered business data into transformed data, processes it in batch or in real time, and then generates and manages metadata for the transformed data; a data security unit (60) that encrypts the transformed data to protect it and prevent it from leaking to the outside; a distributed storage unit (70) for storing the encrypted transformed data; and a data provision unit (80) that extracts the stored transformed data and provides it to a user device (200) that wishes to use it.

Description

Data Lake Framework {DATALAKE FRAMEWORK}

The present invention relates to a data lake framework and, more particularly, to a data lake framework for building an enterprise-wide data lake that captures, processes, and analyzes the large volumes of data generated by the business systems of a company or institution and provides that data to the users or systems that consume it.

With recent advances in IT, use of the Internet and similar technologies within enterprises has grown, and enterprises now produce and consume large amounts of data.

Accordingly, enterprises have recognized the need for systems that build and analyze enterprise data in order to store and manage these large volumes, and the trend is to build data warehouses, data silos, and the like.

A data warehouse is a practical methodology that efficiently integrates, coordinates, and manages the individual database management systems operated in a distributed manner across a large organization, and that provides the foundation for an efficient decision-making system. It consists of management hardware, management software, extraction, transformation, and sorting tools, a database marketing system, metadata, and end-user access and utilization tools.

As described in Korean Laid-open Patent Publication No. 10-2018-0000413 (published January 3, 2018), such a data warehouse can be generated from a database that stores business information for a given business field by a method comprising: obtaining a model database for the business field; performing ETL (Extraction, Transformation & Transfer) on the model database to generate a multidimensional modeling structure; mapping the business information to the model database to generate a reference database; and generating the data warehouse from the reference database based on the multidimensional modeling structure.

However, because of the sheer volume and complexity of the data, a data warehouse carries a risk of failure and requires an enormous investment of cost and time.

To make up for these shortcomings of the data warehouse, a growing number of enterprises are adopting data lakes.

Instead of first defining a general database structure and then filling it with data that fits that structure, a data lake stores all kinds of data and makes the data available in the required format when it is needed.

A data lake makes it possible to collect and store any type of data at any scale and at low cost. Its advantages include securing data and preventing unauthorized access, cataloging, searching, and discovering relevant data in a central repository, and performing new types of data analysis.

However, because a data lake handles vast amounts of data, its processing speed tends to degrade.

There is therefore a need for a framework that delivers the advantages of a data lake for storing and managing an enterprise's vast data while preventing this loss of speed.

The present invention has been proposed to solve the problems described above. One object is to provide a data lake framework that stores and manages data flows according to defined criteria so that the large volumes of data generated by the business systems of a company or institution can be captured, processed, and analyzed.

Another object is to provide a data lake framework that can capture and manage metadata as data flows in, covering data traceability, data lineage, and security based on data sensitivity across the data flow, so that the data can be provided to the users or systems that consume it.

A further object is to provide a data lake framework in which, when data is stored, it is possible to choose whether or not to encrypt it.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, a data lake framework according to the present invention comprises: a data acquisition unit (20) connected to business devices (100) that generate business data within the enterprise, for acquiring the business data; a data delivery unit (30) that decouples the business devices (100) from the data acquisition unit (20) and supports communication for delivering the acquired business data; a transfer rate control unit (40) for controlling the rate at which business data is delivered through the data delivery unit (30); a lambda architecture unit (50) that models the delivered business data into transformed data, processes it in batch or in real time, and then generates and manages metadata for the transformed data; a data security unit (60) that encrypts the transformed data to protect it and prevent it from leaking to the outside; a distributed storage unit (70) for storing the encrypted transformed data; and a data provision unit (80) that extracts the stored transformed data and provides it to a user device (200) that wishes to use it.

The lambda architecture unit (50) may comprise: a topological data analysis unit (51) that cleans, processes, and models the business data to convert it into transformed data; a batch processing unit (52) for batch processing the transformed data produced by the topological data analysis unit (51); a real-time processing unit (53) for processing that transformed data in real time; a data processing support unit (54) that, in order to improve the quality of the transformed data, supports machine learning and data science processing based on the business data for the batch processing unit (52) and the real-time processing unit (53); and a metadata unit (55) for generating and managing metadata for the transformed data.

In the present invention, the data security unit (60) preferably comprises: an encryption selection unit (61) for selecting whether to protect the transformed data; an encryption setting unit (62) for setting the encryption password that protects the transformed data; and an encryption management unit (63) that manages the master encryption key used to decrypt the encrypted transformed data.

The user device (200) comprises: a user communication unit (210) for communicating with the data security unit (60) and the data provision unit (80); a user information unit (220) for receiving the user's details as input and generating user information; and a data request unit (230) that requests transformed data by generating a request signal containing the generated user information and a user IP address identifying the user device (200). The data security unit (60) may further comprise: a user authentication unit (64) that authenticates the request by determining whether the user information in the request signal received from the user device (200) is pre-registered user information; an authorization unit (65) that grants permission to use the transformed data based on the user information authenticated by the user authentication unit (64); and an IP address verification unit (66) that checks whether the user IP address in the request signal falls within a preset IP address range.
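
The two checks described for the user authentication unit (64) and the IP address verification unit (66) can be sketched as follows. This is a minimal illustration, not the patented implementation: the registered-user set, the allowed network `10.0.0.0/24`, and the function name `check_request` are all assumptions introduced here.

```python
import ipaddress

# Hypothetical registry and allowed range; the patent specifies neither.
REGISTERED_USERS = {"alice", "bob"}
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/24")


def check_request(user_id: str, user_ip: str) -> bool:
    """Return True only if the user is registered and the IP is in range."""
    if user_id not in REGISTERED_USERS:
        return False  # role of the user authentication unit (64)
    try:
        address = ipaddress.ip_address(user_ip)
    except ValueError:
        return False  # malformed address in the request signal
    return address in ALLOWED_NETWORK  # role of the IP address verification unit (66)
```

A request is accepted only when both checks pass; granting specific usage rights, the role of the authorization unit (65), is left out of this sketch.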

As described above, according to the present invention, the large volumes of data generated by the business systems of a company or institution are captured, processed, analyzed, and stored according to data flow criteria, which not only improves the speed of the GPU and SSD but also prevents overload.

In addition, metadata can be captured and managed as data flows in, covering data traceability, data lineage, and security based on data sensitivity across the data flow, so that the users or systems consuming the data can efficiently extract the data they want and security is improved.

Furthermore, when data is stored, it is possible to choose whether or not to encrypt it, which helps prevent the data from leaking to the outside.

FIG. 1 shows a data lake framework according to an embodiment of the present invention.
FIG. 2 shows the physical resources of the data lake framework according to an embodiment of the present invention.
FIG. 3 shows cloud bursting and cloud spanning in the data lake framework according to an embodiment of the present invention.
FIG. 4 shows the lambda architecture unit of the data lake framework according to an embodiment of the present invention.
FIG. 5 shows the data security unit of the data lake framework according to an embodiment of the present invention.
FIG. 6 shows the user device of the data lake framework according to an embodiment of the present invention.
FIG. 7 is a flowchart of storing data in a data lake built with the data lake framework according to an embodiment of the present invention.
FIG. 8 is a flowchart of providing data from a data lake built with the data lake framework according to an embodiment of the present invention.

Hereinafter, the data lake framework according to the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows a data lake framework according to an embodiment of the present invention, FIG. 2 shows the physical resources of the framework, and FIG. 3 shows cloud bursting and cloud spanning in the framework.

FIG. 4 shows the lambda architecture unit of the framework, FIG. 5 shows its data security unit, and FIG. 6 shows the user device.

FIG. 7 is a flowchart of storing data in a data lake built with the framework, and FIG. 8 is a flowchart of providing data from that data lake.

In assigning reference numerals to the elements in the drawings, the same elements are given the same numerals wherever possible, even when they appear in different drawings, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted. Directional terms such as 'top', 'bottom', 'front', 'rear', 'leading end', 'forward', and 'trailing end' are used with reference to the orientation of the drawing(s) being described. Because the elements of the embodiments can be positioned in various orientations, the directional terms are used for illustration and are not limiting.

As shown in FIG. 1, a data lake framework according to a preferred embodiment of the present invention may comprise: a data acquisition unit (20) connected to a plurality of business devices (100) that generate business data within the enterprise, for acquiring the business data; a data delivery unit (30) that supports communication for delivering the business data acquired by the data acquisition unit (20); a transfer rate control unit (40) for controlling the rate at which business data is delivered through the data delivery unit (30); a lambda architecture unit (50) that models the delivered business data into transformed data and then processes it in batch or in real time; a data security unit (60) that encrypts and protects the transformed data; a distributed storage unit (70) for storing the encrypted transformed data; and a data provision unit (80) for providing the transformed data to a user device (200) that wishes to use the stored transformed data.

Configuring the data lake framework in this way makes it possible to build an enterprise-wide data lake that captures, processes, and analyzes the large volumes of data generated by the business systems of a company or institution and provides it to users or data-consuming systems.

The business device (100) can generate business data containing business information using the business system of a company or institution.

The large volume of business data generated by the plurality of business devices (100) is transmitted to the data lake to be stored and managed.

As shown in FIGS. 2 and 3, the data lake is made up of physical resources (10), such as computing resources (11), storage resources (12), and networking resources (13), that support its logical functions.

The computing resources (11) can be divided into CPUs, GPUs, and an external multi-cloud. The CPU consists of multiple cores, and the external multi-cloud is provided through cloud bursting.

The storage resources (12) may consist of RAM/ROM, HDDs (hard disk drives), SSHDs (solid-state hybrid drives), SSDs (solid-state drives), cloud storage, SDS (software-defined storage), and the like. The cloud storage and SDS are provided through cloud spanning.

The networking resources (13) may consist of switches, routers, bridges, and the like and, logically, of SDN (software-defined networking), virtual networking technologies, and the like.

Physical resources configured in this way can support the logical functions.

The data acquisition unit (20) is connected to the business devices (100) and receives and acquires the business data. That is, the data acquisition unit (20) connects to the business devices (100) over the Internet, cloud computing, or the like, and receives the business data from them.

The data acquisition unit (20) can convert the business data into messages that can be processed further.

In other words, for further processing after acquisition, the data acquisition unit (20) parses the acquired business data, converts the parsed data, publishes it as a message, and stores the message temporarily in buffer memory.

To convert the acquired business data into messages, the data acquisition unit (20) must be flexible enough to accommodate a variety of schema specifications. Data can be classified as structured, semi-structured, or unstructured, and not every type of data can easily be defined by a schema, so the data acquisition unit (20) needs this flexibility for the business data it acquires.
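
The parse, convert, publish, and buffer steps can be illustrated with a small sketch. Everything named here (`Message`, `acquire`, the in-memory `buffer`) is hypothetical, and the classification rule (valid JSON counts as semi-structured, comma-separated text as structured, anything else as unstructured) is a deliberately crude assumption made only to show how one envelope can carry differently shaped data.

```python
import json
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    source: str      # which business device produced the data
    kind: str        # "structured", "semi-structured", or "unstructured"
    payload: object  # parsed content, shape depends on kind


buffer = deque()  # stands in for the temporary buffer memory


def acquire(source: str, raw: str) -> Message:
    """Parse raw business data, wrap it in a Message, and buffer it."""
    try:
        parsed, kind = json.loads(raw), "semi-structured"
    except json.JSONDecodeError:
        if "," in raw:
            parsed, kind = raw.split(","), "structured"
        else:
            parsed, kind = raw, "unstructured"
    message = Message(source, kind, parsed)
    buffer.append(message)
    return message
```

Because the envelope does not fix the shape of `payload`, new data types can be accepted without changing the downstream message interface.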

The data acquisition unit (20) must also have a fast connection mechanism so that the message-converted business data can be pushed smoothly into the data lake for further processing.

The business data acquired through the data acquisition unit (20) is converted into messages and then sent to the data delivery unit (30).

The data delivery unit (30) decouples the business devices (100) from the data acquisition unit (20) and supports communication for delivering the acquired business data to the lambda architecture unit (50).

In other words, the data delivery unit (30) separates the connection between the data acquisition unit (20) and the business devices (100), which prevents unnecessary data from flowing in, and guarantees delivery of the message-converted business data received from the data acquisition unit (20).

To guarantee delivery of the message-converted business data, the data delivery unit (30) must provide message persistence, which can be backed by a storage medium.

The data delivery unit (30) also supports two messaging structures: queues for 1:1 communication and topic messaging for 1:n publish and subscribe.
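
The difference between the two structures can be shown with a toy in-memory broker. The `Broker` class and its method names are assumptions made for illustration; the sketch keeps messages only in memory, so it does not provide the storage-backed persistence that the data delivery unit (30) is described as requiring.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: queues deliver each message once, topics fan out to all."""

    def __init__(self):
        self.queues = defaultdict(deque)      # 1:1, one consumer per message
        self.subscribers = defaultdict(list)  # 1:n, every subscriber gets a copy

    def send(self, queue_name, message):
        self.queues[queue_name].append(message)

    def receive(self, queue_name):
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)
        return len(self.subscribers[topic])  # number of deliveries made
```

A message sent to a queue is gone once one consumer receives it, whereas a message published to a topic reaches every subscriber registered at that moment.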

The data delivery unit (30) thereby prevents the business data acquired by the data acquisition unit (20) from being altered while it is in transit.

The transfer rate control unit (40) controls the rate at which business data is delivered to the lambda architecture unit (50) through the data delivery unit (30).
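
The patent does not say how the delivery rate is controlled. One common technique, used here purely as an assumed stand-in, is a token bucket: tokens accumulate at a fixed rate up to a cap, and each delivered message spends one token. The class name, parameters, and the explicit `now` argument (passed in to keep the sketch deterministic) are all illustrative choices.

```python
class TokenBucket:
    """Token-bucket limiter: at most `rate` messages/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens (burst size)
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous call

    def allow(self, now: float) -> bool:
        """Return True if a message may be forwarded at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=1.0` and `capacity=2.0`, two messages pass immediately, a third at the same instant is held back, and one more is admitted a second later.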

The transfer rate control unit (40) must be highly scalable under varying load conditions; this scalability is what allows it to process diverse data at high speed.

The transfer rate control unit (40) can provide fault tolerance in the form of fail-safe behavior and failover, and it supports multi-threaded and multi-event execution so that multiple processes can run concurrently.

Fail-safe behavior keeps a failure or an operating mistake in the data lake from leading to a catastrophic result, and failover means that a standby unit automatically takes over when the data lake stops operating.

The transfer rate control unit (40) can also convert the structure of the business data into a target data format and provide the delivered data in an intact form for further processing. That is, it converts the business data into the target data format that the lambda architecture unit (50) needs for processing, and by providing the business data in an intact form it prevents improper information from being entered into the business data.
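
A minimal sketch of such a conversion step is shown below. The target schema (`id` as an integer, `amount` as a float) and the function name `to_target_format` are hypothetical; the point is only that conversion and integrity checking happen together, so malformed records are rejected before they reach processing.

```python
# Hypothetical target schema: field name -> type expected downstream.
REQUIRED_FIELDS = {"id": int, "amount": float}


def to_target_format(record: dict) -> dict:
    """Coerce a raw record into the target format, or raise ValueError."""
    converted = {}
    for name, caster in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        try:
            converted[name] = caster(record[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {name}: {record[name]!r}") from exc
    return converted  # fields outside the schema are dropped
```

Dropping unknown fields and rejecting uncoercible values is one possible reading of "intact form"; a real deployment might quarantine bad records instead of raising.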

As shown in FIG. 4, the lambda architecture unit (50), on receiving the business data delivered through the data delivery unit (30), models the business data into transformed data and then processes it either in batch, which is the mode for bounded data, or in real time, which is the mode for unbounded, continuously flowing data.

Here, the lambda architecture unit (50) can select either batch processing or real-time processing, but when batch or real-time processing is additionally required, the unit is needed to resolve the problem of merging batch data with real-time data.

The lambda architecture unit (50) may comprise: a topological data analysis unit (51) for modeling the business data; a batch processing unit (52) for batch processing the modeled business data; a real-time processing unit (53) for processing the modeled business data in real time; a data processing support unit (54) that supports machine learning and data science processing for the batch processing unit (52) and the real-time processing unit (53) in processing the modeled business data; and a metadata unit (55) for generating metadata for the modeled business data.

The topological data analysis unit (51) uses topological data analysis to convert the business data received from the data delivery unit (30) into transformed data, that is, modeled data.

That is, the topological data analysis unit (51) cleans and processes the business data and applies a data modeling algorithm, thereby modeling the business data and converting it into transformed data.

The batch processing unit (52) can ensure optimal use of system resources. It performs batch processing on the business data received through the data delivery unit (30) and can be applied to long-running jobs in order to produce high-quality output for the modeled, transformed data.

The batch processing unit (52) can also provide a mechanism for replaying or reverting a batch run in order to recover it, and it must support machine learning and data science processing based on the business data in order to generate high-quality transformed data.

To improve the data quality of the transformed data, the batch processing unit (52) can also provide deduplication, detection of erroneous data, and a view of data lineage.
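
These three quality functions can be sketched in a single pass over a batch. The record layout (an `id` key and a numeric `amount`), the error rule (missing `id` or negative `amount`), and the lineage annotation are assumptions chosen for illustration and are not taken from the patent.

```python
def clean_batch(records):
    """Deduplicate by id, reject erroneous rows, and annotate lineage.

    Returns (clean_records, rejected_indices). Assumes each record is a dict
    with an "id" and a numeric "amount"; both names are hypothetical.
    """
    seen = set()
    clean, rejected = [], []
    for index, record in enumerate(records):
        key = record.get("id")
        if key is None or record.get("amount", 0) < 0:
            rejected.append(index)  # error detection
            continue
        if key in seen:
            continue                # deduplication: keep the first occurrence
        seen.add(key)
        # Lineage view: remember which input row each output row came from.
        clean.append({**record, "lineage": {"source_index": index}})
    return clean, rejected
```

Keeping the source index on every surviving row is the simplest form of lineage; it lets a later consumer trace an output record back to its position in the input batch.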

The real-time processing unit (53) is separate from the batch processing unit (52) and processes the business data in real time.

The real-time processing unit (53) can generate data models related to real-time processing. All long-running processes must be delegated to batch mode, and fast access and storage support are needed so that business data does not go unprocessed and pile up.

The real-time processing unit (53) must generate its output model in a form that can be merged with the data sets batch-processed by the batch processing unit (52). Generating the output model this way makes it possible to provide enriched data when data is later provided to the user device (200).
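
The merge requirement can be illustrated with the simplest possible case, assumed here for the sake of example: both units emit per-key counts, the batch view covers everything up to the last batch run, and the real-time view holds only the increments seen since then. The function name `merge_views` and the additive merge rule are not from the patent.

```python
def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine a batch view with real-time increments into one result.

    Assumes both views map a key to a numeric count and that the real-time
    view contains only data that arrived after the batch view was computed.
    """
    merged = dict(batch_view)  # start from the complete but stale batch result
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta  # add the recent increments
    return merged
```

The merge is trivial only because the two outputs share a shape, which is why the real-time output model has to be designed with the batch data sets in mind.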

With the real-time processing unit (53) in place, operations on the data stream can be performed quickly even when a large volume of business data arrives from the business devices (100).

상기 데이터처리지원부(54)는, 상기 배치처리부(52)와 실시간처리부(53)로 상기 업무데이터를 기반으로 한 기계 학습 및 데이터 과학 처리를 지원한다.The data processing support unit 54 supports the machine learning and data science processing based on the business data by the batch processing unit 52 and the real time processing unit 53.

상기 데이터처리지원부(54)로 인해 상기 배치처리부(52)와 실시간처리부(53)에서 처리되는 변환데이터의 품질을 대폭 향상시킬 수 있게 된다.The quality of the converted data processed in the batch processing unit 52 and the real-time processing unit 53 can be greatly improved due to the data processing support unit 54.

In addition, the converted data whose quality has been improved through batch or real-time processing is later stored in the distributed storage unit 70 and passed to the data providing unit 80 so that it can be provided to the user device 200.

The metadata unit 55 generates, stores, and manages metadata for the converted data so that the data can be searched.

Metadata is structured data about data: data attached to content according to fixed rules so that the information being sought can be found efficiently within a large body of information. That is, the metadata unit 55 generates and manages metadata for each item of converted data so that, when a data request is later received from the user device 200, the requested data can be extracted from the many items of converted data stored in the data lake.
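The role of the metadata unit 55 can be pictured with a small in-memory catalog. This is a minimal sketch under assumed names: `MetadataEntry`, `MetadataCatalog`, the tag scheme, and the storage paths are all hypothetical, and the specification does not say how metadata is actually structured.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataEntry:
    """Hypothetical metadata record for one item of converted data."""
    dataset_id: str
    storage_path: str
    tags: set[str] = field(default_factory=set)


class MetadataCatalog:
    """Keeps one entry per item of converted data and answers lookups."""

    def __init__(self) -> None:
        self._entries: dict[str, MetadataEntry] = {}

    def register(self, entry: MetadataEntry) -> None:
        self._entries[entry.dataset_id] = entry

    def search(self, tag: str) -> list[MetadataEntry]:
        """Return every entry carrying the given tag."""
        return [entry for entry in self._entries.values() if tag in entry.tags]

    def locate(self, dataset_id: str) -> Optional[str]:
        """Return where the converted data is stored, or None if unknown."""
        entry = self._entries.get(dataset_id)
        return entry.storage_path if entry else None
```

A request for converted data would first consult the catalog and only then touch storage, which is what lets the requested item be found without scanning the whole data lake.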

The metadata unit 55 can provide software CI (Content Integration), CD (Content Delivery), and CD (Content Deployment). That is, the metadata unit 55 can provide content integration, content delivery, and content deployment functions for software.

In addition, the metadata unit 55 provides cloud bursting and cloud spanning in order to supply the computing resources and storage resources.

Cloud bursting is an application deployment model used in hybrid cloud environments: when the computing capacity of the business device 100 is exceeded, the excess demand is automatically sent to a public cloud so that the application can keep running.
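The bursting rule can be sketched as a placement decision. This is only an illustration: the function `assign_jobs`, the unit of capacity, and the first-come placement order are assumptions, not details given in the specification.

```python
def assign_jobs(jobs: list[tuple[str, int]], local_capacity: int) -> dict[str, str]:
    """Place each job locally while capacity lasts; burst the rest to a public cloud.

    jobs is a list of (name, required_units) pairs handled in arrival order.
    """
    placement: dict[str, str] = {}
    remaining = local_capacity
    for name, required_units in jobs:
        if required_units <= remaining:
            placement[name] = "local"
            remaining -= required_units
        else:
            # Demand exceeds what is left locally, so the job runs in the public cloud.
            placement[name] = "public"
    return placement
```

Cloud spanning differs in that a single job would be split across several environments at once rather than moved as a whole.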

Cloud spanning is a delivery model in which application components that require large amounts of computing resources are deployed across several cloud environments at the same time, so that multiple computers can be connected and cooperate.

The converted data generated when the lambda architecture unit 50 processes the business data is passed to the data security unit 60 for protection.

As shown in FIG. 5, the data security unit 60 encrypts the converted data to protect it. Encrypting the converted data helps prevent the security incidents that arise as more users rely on large-scale data analysis for decision making.

The data security unit 60 may include an encryption selection unit 61 for selecting whether to protect the converted data, a password setting unit 62 for setting a password for the converted data when protection is selected through the encryption selection unit 61, and a password management unit 63 for managing a master encryption key for decrypting the converted data.

Before the converted data is stored, the encryption selection unit 61 can judge from the importance of the converted data whether it needs protection, and select whether to proceed with encryption.

When the encryption selection unit 61 selects not to protect the converted data, the converted data is stored in the distributed storage unit 70 as it is.

When the encryption selection unit 61 selects to protect the converted data, the converted data is encrypted through the password setting unit 62 before it is stored in the distributed storage unit 70.

When encryption is selected in the encryption selection unit 61, the password setting unit 62 sets a password for the converted data. The password used to encrypt the converted data may be entered directly or entered automatically.

The password set for the converted data through the password setting unit 62 can be managed by the password management unit 63.

The password management unit 63 can manage a master encryption key for decrypting the converted data. The master encryption key can decrypt multiple items of encrypted converted data.

Because the password management unit 63 manages the master encryption key, the encrypted converted data can be decrypted.
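One way to read the relationship between per-data passwords and a master encryption key is key wrapping: each item is encrypted under its own key, and that key is stored wrapped under the master key. The sketch below follows that reading, which is an assumption rather than something the specification states. The names `KeyManager`, `protect`, and `recover` are hypothetical, and the XOR keystream is a toy stand-in used only because the Python standard library has no symmetric cipher; it is NOT secure, and a real deployment would use a vetted authenticated cipher.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream (illustration only, NOT secure). Applying it twice restores data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyManager:
    """Hypothetical stand-in for the password management unit 63."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def wrap(self, data_key: bytes) -> bytes:
        return _toy_cipher(self._master_key, data_key)

    def unwrap(self, wrapped_key: bytes) -> bytes:
        return _toy_cipher(self._master_key, wrapped_key)


def protect(data: bytes, needs_protection: bool, manager: KeyManager) -> dict:
    """Selection (unit 61) followed by optional encryption (unit 62)."""
    if not needs_protection:
        return {"encrypted": False, "payload": data}
    data_key = secrets.token_bytes(32)  # a password entered automatically
    return {
        "encrypted": True,
        "payload": _toy_cipher(data_key, data),
        "wrapped_key": manager.wrap(data_key),
    }


def recover(record: dict, manager: KeyManager) -> bytes:
    """Decrypt a stored record; only the master key holder can unwrap the data key."""
    if not record["encrypted"]:
        return record["payload"]
    data_key = manager.unwrap(record["wrapped_key"])
    return _toy_cipher(data_key, record["payload"])
```

Under this reading, a single master key is enough to recover any number of encrypted items, because every per-data key is stored in wrapped form alongside its payload.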

Meanwhile, the data security unit 60 can use the standard TLS (Transport Layer Security) protocol while business data is transmitted from the business device 100 to the data acquisition unit 20. Protecting the business data with TLS in this way prevents it from being leaked while in transit.
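On the sending side, TLS protection might be set up as in the following sketch, which uses Python's standard `ssl` module. The helper name `make_client_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions made for illustration; the specification only names standard TLS.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate and host name."""
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_tls_context()
```

The resulting context would then be used to wrap the socket that carries business data from the business device 100 to the data acquisition unit 20.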

The data security unit 60 prevents data from being leaked to unauthenticated user devices 200. Because the converted data is encrypted before it is persisted to the distributed storage unit 70 and decrypted before it is retrieved, encryption is transparent to the user device 200 accessing the data: the user device 200 needs no code changes to encrypt or decrypt data, which is convenient for users.

The data security unit 60 also authenticates the user device 200 requesting the converted data; that is, it can check whether the requesting user device 200 is entitled to receive the converted data.

Here, the data security unit 60 may include a user authentication unit 64 for authenticating the user information received from the user device 200, an authorization unit 65 for granting permission to use the converted data based on the user information authenticated by the user authentication unit 64, and an IP address verification unit 66 for checking the IP address of the user device 200.

The data security unit 60 can use any one or more of the user authentication unit 64, the authorization unit 65, and the IP address verification unit 66 to authenticate the business device 100 or the user device 200.

The user authentication unit 64 authenticates the user device 200, and user information is preferably stored in it in advance.

The user authentication unit 64 receives the user information generated by the user device 200 and determines whether it is included in the pre-stored user information. If the user information provided by the user device 200 is found in the pre-stored user information, the user device 200 is authenticated; if not, the user device 200 is not authenticated.

Data must not be provided to a user device 200 that the user authentication unit 64 has not authenticated.

The user authentication unit 64 may use multi-step (multi-factor) authentication.

The authorization unit 65 grants usage rights based on the user information received from the user device 200 authenticated by the user authentication unit 64. In other words, the usage rights for the converted data that each pre-stored user may receive must be set in the authorization unit 65 in advance, and usage rights for the converted data are granted according to the user information of the authenticated user device 200.

Here, the authorization unit 65 distinguishes between two kinds of authorization: authorization for account-related operations, handled by RBAC (Role Based Access Control), which assigns rights to positions, and authorization for operations on data and computing resources, handled by ABAC (Attribute Based Access Control).
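The split between the two models can be illustrated with a short sketch. The role table, the attribute names (`department`, `clearance`, `sensitivity`), and both function names are hypothetical; the specification does not define the roles or attributes used.

```python
# Hypothetical role table for account-related operations (RBAC).
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "administrator": {"create_account", "delete_account", "reset_password"},
    "team_lead": {"reset_password"},
    "analyst": set(),
}


def rbac_allows(role: str, operation: str) -> bool:
    """RBAC: the decision depends only on the user's position (role)."""
    return operation in ROLE_PERMISSIONS.get(role, set())


def abac_allows(user: dict, resource: dict) -> bool:
    """ABAC: the decision compares attributes of the user and of the resource.

    Assumed rule: same department, and clearance at least the resource sensitivity.
    """
    same_department = user.get("department") == resource.get("department")
    cleared = user.get("clearance", 0) >= resource.get("sensitivity", 0)
    return same_department and cleared
```

Account operations would go through `rbac_allows`, while each request for converted data or computing resources would go through `abac_allows`, so an authenticated user still receives only the data their attributes permit.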

Because the authorization unit 65 controls access rights to the data lake, even an authenticated user device cannot receive all of the converted data, which prevents important information from being leaked.

The IP address verification unit 66 presets the range of IP addresses to which converted data may be provided, and determines whether the user IP address received from the user device 200 falls within the preset IP address range.

If the user IP address of the user device 200 falls within the preset IP address range, the user device 200 is authenticated so that converted data can be provided to it; if not, the user device 200 is not authenticated and converted data cannot be provided to it.
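The range check can be sketched with Python's standard `ipaddress` module. The networks listed in `ALLOWED_NETWORKS` and the function name `ip_is_allowed` are illustrative assumptions; the specification does not give concrete ranges.

```python
import ipaddress

# Hypothetical preset ranges from which converted data may be requested.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def ip_is_allowed(user_ip: str) -> bool:
    """Return True only if user_ip parses and lies inside a preset range."""
    try:
        address = ipaddress.ip_address(user_ip)
    except ValueError:
        # A malformed address is treated the same as an address outside the range.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

Rejecting malformed input outright keeps the check fail-closed, which matches the stated intent that devices outside the range receive no converted data.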

Checking the user IP address of the user device 200 through the IP address verification unit 66 can greatly improve the security of the data lake.

In addition, the data security unit 60 can perform auditing based on logs of data management activities and diagnostics based on metadata about data-related activities.

The distributed storage unit 70 stores the converted data encrypted through the data security unit 60.

The distributed storage unit 70 can support serial (sequential) and random operations, and can be flexible and scalable for storing multiple data structures.

The distributed storage unit 70 also determines the overall responsiveness of the lambda architecture unit 50 to the business data and data streams received through the data acquisition unit 20. Responsiveness is set by the slowest system in the chain: if the distributed storage unit 70 is not fast, the operations performed by the real-time processing unit 53 of the lambda architecture unit 50 slow down, which hinders real-time processing.

The data providing unit 80 provides the stored converted data to the user device 200 that wants to use it.

The data providing unit 80 can transmit the stored converted data to the user device 200 by means such as a data service or an export. The converted data is preferably transmitted in a form the user device 200 can use, such as a message, a file, or a data dump.

The data providing unit 80 also provides a merged view of the batch-processed data and the real-time-processed data, and must be scalable and responsive to the applications that consume the converted data.

Meanwhile, as shown in FIG. 6, the user device 200 that wants to receive converted data through the data providing unit 80 is a user or system that consumes data. It may include a user communication unit 210 for communicating with the data providing unit 80, a user information unit 220 for generating user information, and a data request unit 230 for requesting converted data by generating a request signal that contains the generated user information and a user IP address identifying the user device 200.

The user communication unit 210 can communicate over the Internet or the like, but is not limited to this.

The user information unit 220 generates user information from entered information such as the user's ID and position. Generating user information through the user information unit 220 allows the user to be distinguished from other users.

The data request unit 230 generates a request signal containing the user information generated by the user information unit 220 and the user IP address. That is, to request converted data, the data request unit 230 includes the user information and the user IP address in the request signal so that the requester can be distinguished from other users.

The request signal generated by the data request unit 230 is transmitted through the user communication unit 210 to the data security unit 60 and the data providing unit 80.

As shown in FIG. 7, a data lake built with the data lake framework configured as above acquires the business data generated by the business device 100 through the data acquisition unit 20 (S10).

The acquired business data is transferred to the lambda architecture unit 50 through the data transfer unit 30.

The lambda architecture unit 50 models the received business data, converts it into converted data, and then processes it by selecting either batch processing or real-time processing (S11, S12, S13).

After batch or real-time processing, the lambda architecture unit 50 improves the quality of the converted data.

The lambda architecture unit 50 also generates and stores metadata for the converted data (S14) and transmits the converted data to the data security unit 60.

The data security unit 60 that receives the converted data selects whether the converted data is to be secured (S15).

If the data security unit 60 selects that the converted data needs to be secured, a password for the converted data is set through the password setting unit 62 (S16).

So that the converted data encrypted by the password setting unit 62 can be decrypted, the password is sent to the password management unit 63, which must manage the master encryption key capable of decrypting the converted data.

The converted data encrypted through the password setting unit 62 is transmitted to the distributed storage unit 70 and stored (S17).

If, on the other hand, the data security unit 60 selects that the converted data does not need to be secured, the converted data is transmitted to the distributed storage unit 70 and stored without encryption (S17).
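Steps S10 to S17 can be traced in a compact sketch. Everything named here is hypothetical: the function `ingest`, the `urgent` flag used to choose between real-time and batch processing (the specification does not state the selection criterion), the JSON serialization, and the `encrypt` callable, which stands in for whatever cipher the password setting unit 62 applies.

```python
import json
from typing import Callable


def ingest(
    record: dict,
    needs_protection: bool,
    encrypt: Callable[[bytes], bytes],
    storage: dict[str, bytes],
    catalog: dict[str, dict],
) -> str:
    """Run one acquired record (S10) through modeling, metadata, security, and storage."""
    # S11: model the raw record into converted data (here: normalized field names).
    converted = {key.strip().lower(): value for key, value in record.items()}

    # S12/S13: choose a processing path. ASSUMPTION: an 'urgent' flag selects real time.
    mode = "realtime" if converted.get("urgent") else "batch"

    # S14: generate and store metadata for the converted data.
    dataset_id = str(converted["id"])
    catalog[dataset_id] = {
        "mode": mode,
        "encrypted": needs_protection,
        "fields": sorted(converted),
    }

    # S15/S16: encrypt only when protection was selected.
    payload = json.dumps(converted, sort_keys=True).encode("utf-8")
    if needs_protection:
        payload = encrypt(payload)

    # S17: store the (possibly encrypted) converted data.
    storage[dataset_id] = payload
    return dataset_id
```

Both branches of S15 end in the same storage step, so the only difference between protected and unprotected data at rest is whether `encrypt` was applied.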

Meanwhile, to receive data from a data lake built with the data lake framework according to an embodiment of the present invention, as shown in FIG. 8, the user device 200 first generates a request signal containing user information and a user IP address in order to request converted data.

The user device 200 transmits the request signal to the data security unit 60 and the data providing unit 80 to request converted data (S20).

On receiving the request signal, the data security unit 60 determines whether the user information of the user device 200 is pre-registered user information (S21).

That is, through the user authentication unit 64, the data security unit 60 authenticates the user device 200 if its user information is registered in advance, and otherwise does not authenticate it, so that no converted data is transmitted to the user device 200.

The user information authenticated by the user authentication unit 64 is passed to the authorization unit 65, which grants usage rights for the converted data according to that user information (S22).

Here, the IP address verification unit 66 may also check whether the user IP address contained in the request signal of the user device 200 that has been granted usage rights falls within the preset IP address range.

If the user IP address of the user device 200 falls within the IP address range, the user device 200 is connected to the data providing unit 80 so that it can receive converted data (S23).

If the user IP address of the user device 200 does not fall within the IP address range, the user device 200 is not connected to the data providing unit 80.
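The ordering of steps S20 to S23 can be sketched as a chain of checks that stops at the first failure. The `DataRequest` fields, the registered users, the granted data sets, and the allowed network are hypothetical sample values, and the sketch assumes the request carries a syntactically valid IP address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical request signal (S20): user information plus user IP address."""
    user_id: str
    user_ip: str
    dataset_id: str


# Illustrative stand-ins for pre-registered users, granted rights, and the preset range.
REGISTERED_USERS = {"kim", "lee"}
GRANTS = {"kim": {"sales_2018"}, "lee": {"sales_2018", "hr_payroll"}}
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# The checks run in the order described: authentication, authorization, IP range.
CHECKS: list[tuple[str, Callable[[DataRequest], bool]]] = [
    ("S21 authentication", lambda r: r.user_id in REGISTERED_USERS),
    ("S22 authorization", lambda r: r.dataset_id in GRANTS.get(r.user_id, set())),
    ("IP address check", lambda r: ipaddress.ip_address(r.user_ip) in ALLOWED_NETWORK),
]


def serve(request: DataRequest, storage: dict[str, bytes]) -> tuple[Optional[bytes], str]:
    """Return the converted data and a status, or (None, reason) at the first failed check."""
    for step, check in CHECKS:
        if not check(request):
            return None, f"rejected at {step}"
    return storage.get(request.dataset_id), "provided (S23)"
```

Because the loop returns at the first failed check, a device that fails authentication is never evaluated for authorization, and storage is only read once every check has passed.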

The data lake framework for building a data lake used as described above captures, processes, analyzes, and stores, along the data flow, the large volumes of data generated through business systems in a company or institution. This has the effect of improving the speed of GPUs and SSDs as well as preventing overload.

In addition, while data flows in, metadata can be captured and managed from the standpoint of data traceability, data lineage, and security based on data sensitivity across the life cycle. Users of the data, or systems that consume it, can therefore efficiently extract the data they want, and security is improved.

In addition, whether to encrypt data can be selected when the data is stored, so the data can be protected and prevented in advance from being leaked.

The embodiments of the present invention described above and shown in the drawings should not be construed as limiting the technical idea of the present invention. The scope of protection of the present invention is limited only by the matters set forth in the claims, and a person of ordinary skill in the art can improve and modify the technical idea of the present invention in various forms. Such improvements and modifications therefore fall within the scope of protection of the present invention insofar as they are obvious to a person of ordinary skill in the art.

10: physical resources 11: computing resources
12: storage resources 13: networking resources
20: data acquisition unit 30: data transfer unit
40: transfer rate control unit 50: lambda architecture unit
51: phase data analysis unit 52: batch processing unit
53: real-time processing unit 54: data processing support unit
55: metadata unit 60: data security unit
61: encryption selection unit 62: password setting unit
63: password management unit 64: user authentication unit
65: authorization unit 66: IP address verification unit
70: distributed storage unit 80: data providing unit
100: business device 200: user device
210: user communication unit 220: user information unit
230: data request unit

Claims

A data lake framework comprising:
a data acquisition unit (20) connected to a business device (100) that generates business data within an enterprise, for acquiring the business data;
a data transfer unit (30) for decoupling the business device (100) connected to the data acquisition unit (20) and supporting communication for transferring the acquired business data;
a transfer rate control unit (40) for controlling the transfer rate of the business data transferred through the data transfer unit (30);
a lambda architecture unit (50) for modeling the business data transferred through the data transfer unit (30), converting it into converted data, performing batch processing or real-time processing, and generating and managing metadata for the converted data;
a data security unit (60) for encrypting the converted data in order to protect it and prevent it from being leaked to the outside;
a distributed storage unit (70) for storing the encrypted converted data; and
a data providing unit (80) for extracting the converted data and providing it to a user device (200) that wants to use the stored converted data.

The data lake framework according to claim 1,
wherein the lambda architecture unit (50) comprises: a phase data analysis unit (51) for cleaning, processing, and modeling the business data and converting it into converted data; a batch processing unit (52) for batch processing the converted data converted by the phase data analysis unit (51); a real-time processing unit (53) for processing the converted data in real time; a data processing support unit (54) for supporting machine learning and data science processing based on the business data in the batch processing unit (52) and the real-time processing unit (53) so as to improve the quality of the converted data; and a metadata unit (55) for generating and managing metadata for the converted data.

The data lake framework according to claim 1 or 2,
wherein the data security unit (60) comprises: an encryption selection unit (61) for selecting whether to protect the converted data; a password setting unit (62) for setting a password for protecting the converted data; and a password management unit (63) for managing a master encryption key for decrypting the converted data.

The data lake framework according to claim 3,
wherein the user device (200) comprises: a user communication unit (210) for communicating with the data security unit (60) and the data providing unit (80); a user information unit (220) for generating user information from entered information about the user; and a data request unit (230) for requesting converted data by generating a request signal that contains the generated user information and a user IP address identifying the user device (200), and
wherein the data security unit (60) comprises: a user authentication unit (64) for authenticating the user information by determining whether the user information contained in the request signal received from the user device (200) is pre-registered user information; an authorization unit (65) for granting usage rights for the converted data based on the authenticated user information in the request signal; and an IP address verification unit (66) for checking whether the user IP address contained in the request signal falls within a preset IP address range.