WO2017094967A1

WO2017094967A1 - Natural language processing schema and method and system for establishing knowledge database therefor

Info

Publication number: WO2017094967A1
Application number: PCT/KR2016/000868
Authority: WO
Inventors: 최기선; 함영균; 남상하; 최규현
Original assignee: Korea Advanced Institute of Science and Technology KAIST
Current assignee: Korea Advanced Institute of Science and Technology KAIST
Priority date: 2015-12-03
Filing date: 2016-01-27
Publication date: 2017-06-08
Anticipated expiration: 2018-06-03
Also published as: KR101802051B1; KR20170065417A

Abstract

Provided are a method and a system for building a natural language processing schema for a text. Words included in the text are identified, identification information including location information related to each identified word is associated with that word, and relationship information indicating the dependency relationships between the words included in the text is identified. The schema is built on the basis of the identification information and the relationship information.

Description

Natural language processing schema and method and system for establishing knowledge database therefor

The following embodiments relate to a natural language processing schema and a method of building a knowledge database therefor, and more particularly to a method of building a schema of natural language processing annotation information for each word, and a knowledge database therefor, by treating the words in a text as individual entities.

Recently, as research on the semantic web and big data has become more active, the demand for processing language resources has been increasing rapidly, and various studies on building language resources based on ontologies are under way.

Meanwhile, to address problems with the accessibility of natural language processing (NLP) annotations, which exist as data in various forms, studies that seek to define natural language processing annotation information on the basis of an ontology are also under way.

Existing methods for processing natural language processing annotation information convert the metadata of a text into an ontology and simply store annotation information sentence by sentence. Such methods have the advantage of letting users work with data in a uniform format, but search quality and speed degrade when a user wants specific information about a word or detailed information about the structure between words.

Accordingly, there is a need for a method of processing natural language processing annotation information that provides users with stable and reliable information while improving their access to natural language processing results.

Korean Patent Registration No. 10-1476225 (registered December 18, 2014) discloses an apparatus and method that receive combined data consisting of natural language and mathematical formulas, separate the natural language from the formulas, and enable the natural language and the formulas to be indexed on the basis of an analysis of the constituent information of each.

The information described above is provided only to aid understanding; it may include content that does not form part of the prior art, and it may not include what the prior art could suggest to a person skilled in the art.

A method and system may be provided for associating each word included in a text with annotation information related to that word and for building a natural language processing schema for the text.

A method and system may be provided for building a linguistic knowledge database that, when a string such as a word is indexed, makes it possible to determine the usage of the word, the structure of sentences related to the word, and the linguistic structure and semantic relationships among multiple words.

In one aspect, there is provided a method of building a schema for a text, the method including: identifying at least one word included in the text; associating, with the identified word, identification information including location information related to the identified word; identifying, based on the identification information, relationship information between the identified word and another word included in the text; and building a schema for the text based on the identification information and the relationship information.

A word included in the text may be a character, or a continuous string of characters, in the text that has meaning by itself.

The identifying of the at least one word may include determining, among the words included in the text, target words that are valid for building the schema, and extracting the determined target words.
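
As an illustration of determining and extracting target words, the following minimal Python sketch filters tokens that have already been tagged with a part of speech. The Token type, the tag names NOUN and VERB, and the extract_target_words helper are assumptions made for this example and do not appear in the source; treating nouns and verbs as valid target words is only one possible choice.

```python
from typing import NamedTuple

class Token(NamedTuple):
    surface: str   # the word as it appears in the text
    pos: str       # part-of-speech tag (hypothetical tag set)

# Assumption: nouns and verbs are the words considered valid for the schema.
TARGET_POS = {"NOUN", "VERB"}

def extract_target_words(tokens):
    """Return the tokens whose part of speech makes them target words."""
    return [t for t in tokens if t.pos in TARGET_POS]

tokens = [Token("Ford", "NOUN"), Token("was", "AUX"),
          Token("born", "VERB"), Token("in", "ADP"), Token("US", "NOUN")]
targets = extract_target_words(tokens)
```

Here targets keeps only the noun and verb tokens, in their original order.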

The associating of the identification information may associate the identification information of each extracted target word with the corresponding target word.

The identifying of the relationship information may identify relationship information between at least two of the extracted target words.

The building of the schema may include: determining a plurality of nodes associated with each of the extracted target words and with each item of identification information associated with the extracted target words; connecting, based on the identified relationship information, a first node associated with a first target word among the extracted target words to a second node associated with a second target word related to the first target word; and connecting the first node to a node representing the identification information of the first target word, and connecting the second node to a node representing the identification information of the second target word.
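
The node-linking steps above can be sketched as a small labeled graph. This is a minimal illustration under stated assumptions: the adjacency-map representation, the build_schema helper, the "key=value" naming of identification nodes, and the sample data are all hypothetical and are not taken from the source.

```python
from collections import defaultdict

def build_schema(id_info, relations):
    """Build an adjacency map: word node -> set of (label, neighbour node)."""
    graph = defaultdict(set)
    for word, info in id_info.items():
        # Connect each target-word node to nodes holding its identification information.
        for key, value in info.items():
            graph[word].add((key, f"{key}={value}"))
    for first, label, second in relations:
        # Connect the first target word's node to the related second word's node.
        graph[first].add((label, second))
    return dict(graph)

# Hypothetical sample data for illustration only.
id_info = {"Ford": {"pos": "noun", "offset": 0},
           "born": {"pos": "verb", "offset": 9}}
relations = [("born", "subject", "Ford")]
schema = build_schema(id_info, relations)
```

Each word node ends up linked both to its own identification nodes and to the nodes of the words it is related to.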

The first target word may be a word included in a search query entered by a user.

Information on the association between the identified word and the identification information may be stored in a database.

The identification information may include at least one of Uniform Resource Identifier (URI) information of the identified word, information indicating the part of speech of the identified word, and linguistic information of the identified word.

The identification information may include at least one of: information indicating the position of the text within the document containing the text; information indicating the position of the identified word within the text; information indicating the morphological analysis result of the identified word; information indicating the part of speech of the identified word; information indicating the entity name of the identified word; and information indicating the tag of the entity name of the identified word. The identification information may include natural language processing annotation information for the text and other information.

When the identified word is also included in another text different from the text, the identifying of the relationship information may identify relationship information between the identified word and another word included in the other text.

The building of the schema may build a schema for the other text based on the identification information of the identified word and the relationship information between the identified word and the other word included in the other text.

The relationship information may include information indicating a dependency relationship between the identified word and the other word.

The dependency relationship may be at least one of a subject-predicate relationship, a subject-object relationship, and a modification relationship between the identified word and the other word.

In another aspect, there is provided a method of building a database of schemas for text, the method including: identifying at least one word included in a text; associating, with the identified word, identification information including location information related to the identified word; identifying, based on the identification information, relationship information between the identified word and another word included in the text; building a schema for the text based on the identification information and the relationship information; and storing the built schema in a database.

In still another aspect, there is provided a method of providing search results, the method including: identifying at least one word included in a text; associating, with the identified word, identification information including location information related to the identified word; identifying, based on the identification information, relationship information between the identified word and another word included in the text; building a schema for the text based on the identification information and the relationship information; receiving a search query from a user; and, when the search query includes the identified word, providing a search result based on the built schema as the search result for the user's search request made with the search query.
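
The lookup at the end of this aspect can be illustrated with a short Python sketch. It assumes a schema that has already been built and is held as a dictionary from identified word to its relations; the search helper, the whitespace tokenization of the query, and the sample schema are hypothetical and not part of the source.

```python
def search(query, schema):
    """Return schema entries for every query word that is an identified word."""
    results = {}
    for word in query.split():       # assumption: whitespace-separated query
        if word in schema:           # the query contains an identified word
            results[word] = schema[word]
    return results

# Hypothetical pre-built schema: identified word -> list of (relation, other word).
schema = {"Ford": [("subject_of", "born")], "born": [("location", "US")]}
hits = search("where was Ford born", schema)
```

A query that contains no identified word simply yields an empty result.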

In yet another aspect, there is provided a system for processing information included in a text, the system including: a control unit that identifies at least one word included in the text, associates with the identified word identification information including location information related to the identified word, identifies, based on the identification information, relationship information between the identified word and another word included in the text, and builds a schema for the text based on the identification information and the relationship information; and a storage unit that stores at least one of the built schema and information on the association between the identified word and the identification information.

By treating each word included in a text as an individual entity, building a natural language processing schema for the text, and building a database of that schema, a method and system are provided by which stable and reliable information can be delivered to users while improving their access to information.

A method and system are provided that, by indexing strings such as words inside the schema, make it possible to obtain information on the usage of a word, the structure of sentences related to the word, and the linguistic structure and semantic relationships among multiple words, without using an additional external indexing device or performing additional analysis.

FIGS. 1A to 1D illustrate a method of building, for a text, a knowledge structure corresponding to its real-world representation, according to an example.

FIG. 2 illustrates a method of building a natural language processing schema and a knowledge database for a text, according to an embodiment.

FIG. 3 illustrates a system that builds a natural language processing schema for a text and builds a database thereof, according to an embodiment.

FIG. 4 is a flowchart illustrating a method of building a natural language processing schema for a text and building a database thereof, according to an embodiment.

FIG. 5 is a flowchart illustrating a method of identifying the word(s) included in a text, according to an example.

FIG. 6 is a flowchart illustrating a method of building a natural language processing schema for a text, according to an example.

FIG. 7 is a flowchart illustrating a method of providing search results based on a natural language processing schema for a text, according to an embodiment.

FIG. 8 illustrates a natural language processing schema for a text, according to an example.

FIGS. 9A and 9B illustrate a natural language processing schema for a plurality of texts, according to an example.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIGS. 1A to 1D show, for the text (sentences) "Ford was born in the United States on July 14, 1913. He began his presidential term in 1977. ...", the real-world representation of the text and the knowledge structure built in the text environment.

The real-world representation of a text may indicate how the information contained in the text is perceived in the real world. For example, the real-world representation of the text may indicate that "Gerald Ford" was born in the United States (born in), was born on July 14, 1913 (born at), was the 38th President of the United States, began his term on August 9, 1974, ended his term on January 20, 1977, and died in the United States on December 26, 2006 (died at).

The knowledge structure of a text may express, in the text environment, the information perceived in the real-world representation of the text. For example, the knowledge structure of a text may be a natural language processing schema for the text stored in an ontology knowledge base (knowledge database). For example, the knowledge structure of the text may indicate that "Gerald Ford (res:제랄드_포드)" was born in the United States (prop:birthPlace), was born on July 14, 1913 (prop:birthDate), was President of the United States (prop:job), began his term on August 9, 1974 (prop:startYear), ended his term on January 20, 1977 (prop:endYear), and died on December 26, 2006 (prop:deathYear). The knowledge structure of a text may be built along a time axis, and may be built to be as close as possible to the information perceived in the real-world representation (or to include as much of that information as possible).
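
The knowledge structure in this example can be written down as subject-predicate-object triples. The sketch below is only an illustration: the resource and property names are the ones quoted in the text, while the object literals, their date format, and the objects helper are assumptions made for this example.

```python
# Triples (subject, predicate, object). Predicate names come from the text;
# the object literals and their date format are assumptions for illustration.
SUBJECT = "res:제랄드_포드"
triples = [
    (SUBJECT, "prop:birthPlace", "United States"),
    (SUBJECT, "prop:birthDate", "1913-07-14"),
    (SUBJECT, "prop:job", "President of the United States"),
    (SUBJECT, "prop:startYear", "1974-08-09"),
    (SUBJECT, "prop:endYear", "1977-01-20"),
    (SUBJECT, "prop:deathYear", "2006-12-26"),
]

def objects(subject, predicate):
    """Look up every object stored for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

birth_date = objects(SUBJECT, "prop:birthDate")
```

Storing the facts as flat triples keeps each statement independently queryable, which is the property the node-and-edge structure of FIGS. 1B to 1D relies on.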

The knowledge structure of the text may be built using the Natural Language Processing Annotation Knowledge Base Format (NKF). For example, as shown in FIGS. 1B to 1D, each word (entity) included in the text and/or information associated with the text may be treated as an individual entity and connected as a node, and the relationships between words may be extracted so that information representing each relationship is associated with a node as an annotation, whereby a knowledge structure for the text is built. NKF may denote the format of the intended natural language processing annotation knowledge base.

In FIGS. 1A to 1D, "res" may denote a knowledge entity (for example, a noun or a word) in the text, and connections between entities (relation extraction) may be made based on relationship information between the entities. For example, when an entity is a verb, relationship information with at least two other entities may be required in order to extract its relationships with other entities.

NKF may build, for a text, a knowledge structure that includes annotation information based on linguistic analysis, and the knowledge structure so built may be stored as a database. In addition, in response to a user's search request, search information in which the text and the knowledge structure are associated may be provided as a search result.

The knowledge structure (schema) for a text and the method of building its database are described in more detail below with reference to FIGS. 2 to 9B.

FIG. 2 shows a method in which a natural language processing schema including annotation information related to an (input) text is built through natural language processing of the text according to the NKF described above with reference to FIG. 1, and in which the corresponding knowledge base (database) is built.

In step 210, a natural language processing annotation tool may perform natural language processing on the text(s) (sentence(s)) manually or automatically. In step 220, a natural language processing annotation unit may store and output the result of the processing by the annotation tool, that is, the annotation result (annotation information) for each text (sentence). The output of the annotation unit may differ depending on the processing result of step 210. In step 230, a word extractor may extract from the text (sentence) the target word(s) to be used in building the schema, according to the intended purpose. For example, nouns or verbs included in a sentence may be extracted as target words. In step 240, the target words extracted by the word extractor may be treated as individual entities and indexed. This individualization of target words may be performed, for example, within a Resource Description Framework (RDF) schema. In step 250, additional information on the extracted target words (identification information on each target word and/or relationship information between target words) may be obtained from the annotation unit and associated with the corresponding target word. In step 260, binary relations between the extracted target words may be obtained from the annotation unit and associated with the target words, and the target words may be connected to one another according to the obtained binary relations. In step 270, the graph built by steps 210 to 260 (corresponding to the schema for the text) and/or information representing the association between the extracted words and their annotation information may be stored in the knowledge database.
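
Steps 210 to 270 can be sketched end to end as follows. This is a minimal illustration and not the patented implementation: the annotate function is a canned stand-in for a real annotation tool, and the dictionary layout of the knowledge base, the tag names, and the helper names are all assumptions.

```python
def annotate(sentence):
    """Steps 210-220 stand-in: return (word, pos) pairs and binary relations.
    A real system would call an NLP annotation tool; this lookup is hypothetical."""
    canned = {
        "Ford was born": ([("Ford", "noun"), ("was", "aux"), ("born", "verb")],
                          [("born", "subject", "Ford")]),
    }
    return canned[sentence]

def build_knowledge_base(sentences):
    knowledge_base = {"words": {}, "edges": []}            # store filled for step 270
    for sent_id, sentence in enumerate(sentences):
        tagged, relations = annotate(sentence)             # steps 210-220
        targets = [(i, w, p) for i, (w, p) in enumerate(tagged)
                   if p in ("noun", "verb")]               # step 230: target words
        for i, word, pos in targets:                       # steps 240-250: index and annotate
            entry = knowledge_base["words"].setdefault(word, [])
            entry.append({"sentence": sent_id, "position": i, "pos": pos})
        knowledge_base["edges"].extend(relations)          # step 260: binary relations
    return knowledge_base

kb = build_knowledge_base(["Ford was born"])
```

Keeping the per-word entries and the relation edges in one structure mirrors the idea that the graph and the word-to-annotation associations are stored together in the knowledge database.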

Steps 210 to 270 described above may be performed by the system 300 described below or by its components.

The knowledge structure (schema) for a text and the method of building its database are described in more detail below with reference to FIGS. 3 to 9B.

The technical details described above with reference to FIG. 1 apply as they are, so a more detailed description is omitted here.

The illustrated system 300 may correspond to a system that builds a natural language processing schema for a text through natural language processing of the text according to the NKF described above with reference to FIGS. 1 and 2.

The system 300 may include a control unit 310, a communication unit 320, and a storage unit 330. The control unit 310 may manage the components of the system 300 and may execute programs or applications used by the system 300. For example, the control unit 310 may execute a program or application for performing natural language processing on an (input) text and building a schema for the text. The control unit 310 may also handle the operations required to execute programs or applications and to process data. The control unit 310 may be at least one processor of the system 300 or at least one core within a processor.

The communication unit 320 may be a device for communicating with device(s) or a server other than the system 300. For example, the communication unit 320 may receive a text (or a search query) from other device(s) or a server. Although not shown, the communication unit 320 may include one or more antennas for transmitting and receiving signals and information to and from other device(s) or a server. The communication unit 320 may be a hardware module of the system 300, such as a network interface card, a network interface chip, or a networking interface port, or a software module, such as a network device driver or a networking program.

The storage unit 330 may store information related to the schema-building program or application executed by the control unit 310 and/or the built schema. The storage unit 330 may be a database that stores the words included in a text, or the associations between those words and their identification information. Alternatively, the storage unit 330 may be a database that stores the built schema. Unlike what is illustrated, the storage unit 330 may also be configured as a device separate from the system 300.

The method by which the system 300 builds the knowledge structure (schema) for a text and its database is described in more detail below with reference to FIGS. 4 to 9B.

The technical details described above with reference to FIGS. 1 and 2 apply as they are, so a more detailed description is omitted here.

In step 410, the control unit 310 may identify at least one word included in a text. The text may be a sentence, present in a document, that includes at least one word. A word included in the text may be a character, or a continuous string of characters, in the text that has meaning by itself. For example, a word may be a noun (phrase), a verb (phrase), an adjective (phrase), or an adverb (phrase). Alternatively, a word may be a particle (postposition).

In step 420, the control unit 310 may associate, with the word identified in step 410, identification information including location information related to the identified word. The identification information may be generated by the control unit 310 through natural language processing of the text. The identification information related to a word may be at least one of Uniform Resource Identifier (URI) information of the identified word, information indicating the part of speech of the identified word, and linguistic information of the identified word. The URI information of the identified word may include at least one of information on which document the identified word is in (location information of the document), information on which sentence of which document it is in (location information of the sentence), and information on where in the sentence it is located. For example, the URI information of the identified word may include the URL of the document in which the word appears.
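
One way to picture URI information that carries the document location, the sentence location, and the position within the sentence is sketched below. The "#s<n>/w<n>" fragment layout, the example.org URL, and both helper functions are assumptions made for illustration; the source does not specify a concrete URI format.

```python
def word_uri(document_url, sentence_index, word_index):
    """Compose an identifier from document, sentence and in-sentence position.
    The '#s<n>/w<n>' fragment layout is an assumption, not the patent's format."""
    return f"{document_url}#s{sentence_index}/w{word_index}"

def parse_word_uri(uri):
    """Recover (document_url, sentence_index, word_index) from such an identifier."""
    document_url, fragment = uri.rsplit("#", 1)
    sentence_part, word_part = fragment.split("/")
    return document_url, int(sentence_part[1:]), int(word_part[1:])

uri = word_uri("http://example.org/doc1", 0, 2)
```

Because all three positions are recoverable from the identifier alone, a word node can be traced back to its exact occurrence without consulting a separate index.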

In addition, the identification information may include at least one of: information indicating the position of the text (sentence) within the document containing it; information indicating the position of the identified word within the text (sentence); information indicating the morphological analysis result of the identified word; information indicating the part of speech of the identified word; information indicating the entity name of the identified word; and information indicating the tag of the entity name of the identified word. An entity name may be a category assigned on the basis of the meaning of the word. For example, an entity name may indicate a person name, an organization name, or a place name. The entity name may correspond to "CLASS" in FIGS. 1A to 1D. The part of speech of a word may be, for example, a verb (phrase), a noun (phrase), an adjective (phrase), or an adverb (phrase).
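
The items of identification information listed above can be grouped into a single record, as in the following minimal sketch. The IdentificationInfo class, its field names, and the sample values (including the "PER" tag) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class IdentificationInfo:
    sentence_position: int             # position of the text (sentence) in the document
    word_position: int                 # position of the word within the text
    morphemes: tuple                   # morphological analysis result
    pos: str                           # part of speech
    entity_name: Optional[str] = None  # e.g. person, organization, place
    entity_tag: Optional[str] = None   # tag of the entity name (hypothetical tag set)

info = IdentificationInfo(0, 0, ("Ford",), "noun", "person", "PER")
record = asdict(info)                  # plain dict, convenient for storage
```

The two entity fields are optional because, per the text, the identification information need only include at least one of the listed items.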

The identification information may be included in the annotation information (additional information) obtained for the text through the natural language processing of steps 210 and 220 described above with reference to FIG. 2.

In step 430, the control unit 310 may identify, based on the identification information, relationship information between the identified word and another word included in the text. The relationship information may be information indicating a dependency relationship between the identified word and the other word in the text. The relationship information may be generated by the control unit 310 through natural language processing of the text. The dependency relationship may be, for example, at least one of a subject-predicate relationship, a subject-object relationship, a modification relationship, an adverbial relationship (determining whether either word is an adverb), and other dependency relationships between the identified word and the other word.
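
The dependency relationships named above can be modeled as typed binary relations between two words. The sketch below is an assumption-laden illustration: the Dependency enumeration, the Relation tuple with head and dependent fields, and the sample relations are hypothetical names introduced here, not structures defined in the source.

```python
from enum import Enum
from typing import NamedTuple

class Dependency(Enum):
    SUBJECT_PREDICATE = "subject-predicate"
    SUBJECT_OBJECT = "subject-object"
    MODIFICATION = "modification"
    ADVERBIAL = "adverbial"
    OTHER = "other"

class Relation(NamedTuple):
    head: str
    dependent: str
    kind: Dependency

def relations_of(word, relations):
    """Return every relation in which the given word takes part."""
    return [r for r in relations if word in (r.head, r.dependent)]

# Hypothetical relations for the sentence "Ford was born in the US".
relations = [Relation("born", "Ford", Dependency.SUBJECT_PREDICATE),
             Relation("born", "US", Dependency.ADVERBIAL)]
ford_relations = relations_of("Ford", relations)
```

Note that the verb takes part in two relations here, which matches the earlier remark that a verb entity may need relationship information with at least two other entities.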

관계 정보는 도 2를 참조하여 전술된 단계들(210 및 220)의 자연 언어 처리에 의해 텍스트에 대해 획득되는 주석 정보(부가 정보)에 포함될 수 있다.The relationship information may be included in annotation information (additional information) obtained for text by the natural language processing of steps 210 and 220 described above with reference to FIG. 2.
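For illustration, the identification information and the relationship information could be modeled as two small record types. The following Python sketch is only one possible representation; the class and field names (`IdentificationInfo`, `RelationInfo`, `word_span`, and so on) are assumptions made for this example and do not appear in the embodiments.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IdentificationInfo:
    """Per-word annotation; every field name here is hypothetical."""
    sentence_span: Tuple[int, int]      # position of the sentence in the document
    word_span: Tuple[int, int]          # position of the word in the sentence
    pos: str                            # part of speech, e.g. "noun"
    entity_class: Optional[str] = None  # entity name category, e.g. "Person"
    entity_tag: Optional[str] = None    # tag of the entity name, if any


@dataclass(frozen=True)
class RelationInfo:
    """Dependency relationship between two words of the same text."""
    source: str  # the identified word
    target: str  # the other word it is related to
    role: str    # e.g. "sbj", "obj", "ajt", "dp"


# Example values loosely based on the sentence "Gates co-founded Microsoft".
gates = IdentificationInfo(sentence_span=(0, 25), word_span=(0, 5),
                           pos="noun", entity_class="Person")
relation = RelationInfo(source="Gates", target="co-founded", role="sbj")
```

Keeping the two record types separate mirrors the text, in which identification information belongs to a single word while relationship information always involves two words.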

In step 440, the controller 310 may construct a schema for the text on the basis of the identification information and the relationship information of the words in the text. The constructed schema (NKF schema) may be an RDF schema constructed by entitizing the identified words in the text. In other words, the constructed NKF schema may represent word-centered information for each word included in the text using an RDF graph. The schema and the method of constructing it are described in more detail below with reference to FIGS. 6, 8, 9A, and 9B.

When a word identified in the text is also included in other text(s) different from that text, that is, when the same word is present in a sentence or sentences of a plurality of documents, the controller 310 may, in steps 420 and 430, generate identification information for each word and associate it with that word not only for the text but also for the other text(s), and may identify relationship information between each word and the other words included in the other text(s).

In other words, when the identified word is also included in a second text different from the first text, the controller 310 may, in step 430, identify relationship information between the identified word and another word included in the second text and, in step 440, construct a schema for the second text on the basis of the identification information of the identified word and that relationship information. For example, a natural language processing schema may be constructed for all texts that contain the same word. The natural language processing schema constructed for a plurality of texts is described in more detail with reference to FIGS. 9A and 9B.

In step 450, the controller 310 may store the schema constructed in step 440 in a database (or the storage unit 330). In this way, a knowledge database for the schema may be constructed.

Meanwhile, the controller 310 may store, in the database (or the storage unit 330), information on the association between an identified word in the text and the identification information of that word. In other words, the storage unit 330 may store at least one of the constructed schema and the information on the association between the identified word and its identification information. For example, the controller 310 may store each entitized word of the text, together with its association with its identification information, in the database (or the storage unit 330). Because information on an entitized word is stored in the database, the entitized word can have a unique value of its own, searches for the word can be made faster by indexing the entitized word, and the schema can be constructed by generating an RDF graph centered on the word.

Since the technical details described above with reference to FIGS. 1 to 3 apply as they are, a more detailed description is omitted below.

FIG. 5 describes in more detail a method of identifying (extracting) word(s) from the text. Steps 510 and 520 described below may be included in step 410 described above with reference to FIG. 4.

In step 510, the controller 310 may determine, among the words included in the text, target words that are valid for constructing the schema. A "target word" may be a character string that has meaning in itself and may correspond to a word in the text that forms a node when the schema is constructed (Goal of String; GoS). For example, the target words may be determined by classifying the words included in the text according to their part of speech or a predetermined tag set, or according to other morphological analysis. A "target word" is any word in the text other than a particle (postposition) and may be, for example, a verb or a noun.

In step 520, the controller 310 may extract the target words determined in step 510. The extracted target words may be used to determine the nodes of the schema to be constructed in step 440.
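Steps 510 and 520 can be pictured as a filter over part-of-speech-tagged words. The sketch below, in Python, assumes that a morphological analyzer has already produced (word, tag) pairs; the tag names, the `TARGET_POS` set, and the function name are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Hypothetical tag set: parts of speech treated as valid for schema nodes.
TARGET_POS = {"noun", "verb"}


def extract_target_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Keep the words whose part of speech is in TARGET_POS (steps 510, 520)."""
    return [word for word, pos in tagged if pos in TARGET_POS]


# Assumed, abridged analysis of "포드는 ... 미국에서 태어났다".
tagged_sentence = [("포드", "noun"), ("는", "particle"),
                   ("미국", "noun"), ("에서", "particle"),
                   ("태어났다", "verb")]
print(extract_target_words(tagged_sentence))
```

Particles are dropped here simply because their tag is not in the set, which matches the statement that a target word is a word other than a particle.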

The method of constructing the schema is described in more detail with reference to FIG. 6, described below. Since the technical details described above with reference to FIGS. 1 to 4 apply as they are, a more detailed description is omitted below.

Steps 610 to 630 described below may be included in step 440 described above with reference to FIG. 4.

In step 420, the controller 310 may associate the identification information of each of the target words extracted in step 510 described above with reference to FIG. 5 with the corresponding target word. In step 430, the controller 310 may identify relationship information between at least two of the extracted target words.

In step 610, the controller 310 may determine a plurality of nodes associated with each of the extracted target words and with each piece of identification information associated with those target words. For example, the controller 310 may determine nodes associated with each of the extracted target words and, in addition, nodes associated with the information about each target word that the identification information represents.

In step 620, on the basis of the relationship information identified in step 430, the controller 310 may connect a first node associated with a first target word among the extracted target words to a second node associated with a second target word that is related to the first target word. The second target word may have a dependency relationship with the first target word. When the nodes are connected, information indicating the relationship information may be added to and included in the schema.

In step 630, the controller 310 may connect the first node to a node representing the identification information of the first target word, and may connect the second node to a node representing the identification information of the second target word. When the nodes are connected, information indicating the identification information may be added to and included in the schema.

The constructed schema may be a graph having a tree structure with the first node at the top. The constructed schema is described in more detail with reference to FIGS. 8, 9A, and 9B.
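Steps 610 to 630 can be sketched as building a list of labeled edges. The Python example below is a simplified illustration, not the claimed implementation: the function name `build_schema`, the dictionary layout, and the edge labels are assumptions, and a plain list of triples stands in for an RDF graph.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_schema(words: Dict[str, Dict[str, str]],
                 relations: List[Triple]) -> List[Triple]:
    """Return (subject, label, object) edges for one text.

    words:     target word -> identification information (key/value pairs);
               its keys and values are the nodes determined in step 610
    relations: (first word, relation label, second word)
    """
    edges: List[Triple] = []
    # Step 620: connect the nodes of words that have a dependency relationship.
    for first, label, second in relations:
        if first in words and second in words:
            edges.append((first, label, second))
    # Step 630: connect each word node to nodes for its identification info.
    for word, info in words.items():
        for key, value in info.items():
            edges.append((word, key, value))
    return edges


words = {
    "Gates": {"pos": "ProperNoun", "ner": "Person"},
    "co-founded": {"pos": "Verb"},
}
schema = build_schema(words, [("Gates", "sbj", "co-founded")])
```

Each edge carries its label, which corresponds to the statement that information indicating the relationship or the identification information is included in the schema when nodes are connected.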

The first target word associated with the first node may be a word included in a search query entered by a user. For example, the first target word may be a word that is included in the user's search query and is subject to indexing.

Since the technical details described above with reference to FIGS. 1 to 5 apply as they are, a more detailed description is omitted below.

In step 710, the controller 310 may receive a search query from a user through the communication unit 320.

In step 720, when the received search query includes a word associated with the constructed schema (corresponding to the identified word described above with reference to FIGS. 4 to 6), the controller 310 may provide a search result based on the constructed schema as the result of the user's search request. For example, information related to the associated word that is included in the constructed schema may be provided as the search result. That is, search results containing rich information may be provided to the user at high speed without the aid of an external indexing device.
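A minimal Python sketch of steps 710 and 720 follows. It assumes that the stored schema is available as a dictionary keyed by entitized word and that the query is split on whitespace; both are simplifications made for this example, and the names `search` and `stored` are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def search(query: str, schema: Dict[str, List[Triple]]) -> List[Triple]:
    """Return the stored edges for every query term that is a schema word."""
    results: List[Triple] = []
    for term in query.split():
        # A term that is not an entitized word contributes nothing.
        results.extend(schema.get(term, []))
    return results


# Hypothetical stored schema, keyed by entitized word.
stored = {"Gates": [("Gates", "sbj", "co-founded"), ("Gates", "ner", "Person")]}
print(search("Gates biography", stored))
```

Because the lookup is keyed by the word itself, no separate index structure is consulted, which is the point made in the paragraph above.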

Since the technical details described above with reference to FIGS. 1 to 6 apply as they are, a more detailed description is omitted below.

The graph shown in FIG. 8 may correspond to an example of an NKF graph included in a schema constructed by the system 300 described above with reference to FIGS. 4 to 7.

FIG. 8 shows the NKF graph for the text (sentence) "Ford was born in the United States on July 14, 1913" (original: "포드는 1913년 7월 14일, 미국에서 태어났다").

The lbox may be a database (storage unit 330) that stores the words included in the text, or the association between each such word and its identification information. The information on each word may be entitized, given a unique value, and stored in the lbox. In other words, the lbox may be the database component of the constructed text schema.

The following describes the NKF ontology terms and the annotation formats used in the illustrated NKF graph. The annotations used may correspond to the information contained in the identification information and relationship information for each word described above. The annotation formats described below are one example, and the annotation formats used in an embodiment may differ from them.

1. Annotation format for words

1) nif:StringURI

A String URI may be a unique value representing the position information of a word's character string. Following the NIF 2.0 standard and the RFC 5147 standard, a String URI may be expressed as follows.

<URL#char=x,y>

Here, x may be the start position of the word's character string and y its end position. Position information starts from 0, and spacing may not be taken into account. In compliance with the NIF 2.0 standard, the position information should state the following.

context (6.3.1.1), begin index (6.3.1.2), end index (6.3.1.3)
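As an illustration of the String URI form, the Python sketch below builds and parses URIs with a `#char=x,y` fragment, which is the RFC 5147 character-range form also used in the application example later in this description. The helper names are hypothetical, and the sketch does not handle other RFC 5147 fragment schemes such as `line=`.

```python
import re
from typing import Tuple


def make_string_uri(url: str, begin: int, end: int) -> str:
    """Build a String URI of the form URL#char=begin,end."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return f"{url}#char={begin},{end}"


def parse_string_uri(uri: str) -> Tuple[str, int, int]:
    """Split a String URI back into (url, begin, end)."""
    match = re.fullmatch(r"(.+)#char=(\d+),(\d+)", uri)
    if match is None:
        raise ValueError(f"not a char-range String URI: {uri}")
    return match.group(1), int(match.group(2)), int(match.group(3))


uri = make_string_uri("http://example.org/example.html", 0, 5)
print(uri)  # http://example.org/example.html#char=0,5
```

The begin and end values recovered by `parse_string_uri` correspond to the begin index and end index that the position information is required to state.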

2) nif:referenceContext

referenceContext may indicate the full position of the sentence in which the word appears, that is, the full position within the knowledge base of the sentence containing the word. referenceContext can be useful for identifying the context of the word.

The ontology type may be "ObjectProperty", the domain may be "nif:String-entity", and the range may be "nif:StringURI".

3) nif:beginIndex

beginIndex may indicate the first position of the word in the knowledge base.

The ontology type may be "DatatypeProperty", the domain may be "nif:String-entity", and the range may be "xsd:nonNegativeInteger".

4) nif:endIndex

endIndex may indicate the last position of the word in the knowledge base.

5) nkf:String-entity

String-entity may represent an entity in the knowledge base, that is, a word in NKF. The ontology type is "Class".

6) nkf:hasStringURI

hasStringURI may be a property indicating where in the actual sentence a word in the knowledge base is located.

The ontology type may be "DatatypeProperty", the domain may be "nif:String-entity", and the range may be "nif:StringURI".

7) nkf:anchorOf

nkf:anchorOf may be a property indicating where in the sentence a word in the knowledge base is located.

The ontology type may be "DatatypeProperty" and the domain may be "nif:String-entity".

2. Annotation format for part-of-speech information

1) nif:oliaLink

oliaLink may represent the part-of-speech tag annotation that results from morphological analysis of a word in the knowledge base.

The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "Individual".

2) nif:oliaCategory

oliaCategory may represent the part-of-speech concept annotation that results from morphological analysis of a word in the knowledge base.

The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "Class".

3. Annotation format for entity name information

Entity name information uses the NERD ontology, and it may be annotated using the ITS ontology of the W3C.

1) itsrdf:taClassRef

taClassRef may represent the entity name concept annotation for a word in the knowledge base.

The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "Class".

2) itsrdf:taIdentRef

taIdentRef may represent the entity name tag annotation for a word in the knowledge base. The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "Individual".

4. Annotation format for syntactic parsing information

The results of natural language processing analysis are organized around the sentence, in a form that lists the analysis results against the unique value of each sentence. Expressing them around the word therefore requires new property definitions that can express the dependency relationships between words.

1) nkf:sbj

sbj indicates a dependency relationship between String URIs; here it may mean that the Subject stands in a subject relation to the Object.

The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "nif:StringURI".

2) nkf:obj

obj indicates a dependency relationship between String URIs; here it may mean that the Subject stands in an object relation to the Object.

3) nkf:ajt

ajt indicates a dependency relationship between String URIs; here it may mean that the Subject stands in an adverbial relation to the Object.

4) nkf:josa

josa indicates a dependency relationship between String URIs; here it may mean that the Object is a particle (postposition) attached to the Subject.

5) nkf:dp

dp may indicate any other dependency relationship between String URIs. The ontology type may be "ObjectProperty", the domain may be "nif:StringURI", and the range may be "nif:StringURI".

By using the annotation formats described above, word-centered information for the words included in the text can be expressed in the form of an RDF graph, as illustrated.

Since the technical details described above with reference to FIGS. 1 to 7 apply as they are, a more detailed description is omitted below.

FIGS. 9A and 9B may correspond to an example of an NKF graph included in a schema constructed for a plurality of texts that contain a common word, for example "Gates" (sentence 1: "Gates co-founded Microsoft"; sentence 2: "Gates has partnership with Paul").

A concrete application example is described below.

In the following example, the words "Gates" and "Microsoft" are entitized in the RDF schema.

The namespace URIs must be specified.

@prefix nkf: <http://example.nkf.org/nkf/>

@prefix : <http://example.nkf.org/entity/>

Entitization of the string

:String rdf:type nkf:Entity

Example sentence: "Gates co-founded Microsoft"

:Gates rdf:type nkf:Entity

:Microsoft rdf:type nkf:Entity

An entitized string may be a "noun", a "verb", a "named entity", or any other string that may be of interest to the user.

A unique value (URI) is assigned to each entitized word. In this example, for the entitized word "Gates", a unique value is assigned to each occurrence of "Gates" in the web document on the basis of its position information.

The entitized string may be mapped to the corresponding String URI and/or Sentence URI.

:String nkf:hasStringURI <StringURI>

:String nkf:hasSentenceURI <SentenceURI>

:Gates
    nkf:hasStringURI <http://example.org/example.html#char=0,5>;
    nkf:hasSentenceURI <http://example.org/example.html#char=0,25>

Each actual word is mapped to its unique value (URI). The string may be mapped to the corresponding String URI and/or Sentence URI.

<StringURI> rdf:type nkf:String

<StringURI> nkf:string "STRING"

<StringURI> nkf:hasSentenceURI <SentenceURI>

<http://example.org/example.html#char=0,5>
    rdf:type nkf:String;
    nkf:string "Gates";
    nkf:hasSentenceURI

For the unique value assigned above to each word, the part-of-speech information of the word is added as an annotation (part-of-speech tag annotation).

nkf:pos tag:ProperNoun

For the unique value assigned above to each word, the entity name tag information of the word is added as an annotation (NER tag annotation).

nkf:ner tag:Person

For the unique value assigned above to each word, the dependency relationship of the phrase (word) is added as an annotation.

<StringURI> nkf:dependency <StringURI>

nkf:dependency <http://example.org/example.html#char=6,16>;
nkf:role role:nsbj
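The annotation fragments above could be assembled for one String URI as in the following Python sketch. It only concatenates strings in the style of the example. The function name is hypothetical, the `rdf:`, `tag:`, and `role:` prefixes are assumed to be declared elsewhere, and the statement terminators are added by this sketch and are not part of the fragments above.

```python
from typing import List


def string_uri_turtle(uri: str, text: str, pos: str, ner: str,
                      dep_uri: str, role: str) -> str:
    """Serialize the annotations of one String URI as a Turtle-style statement."""
    lines: List[str] = [
        f"<{uri}>",
        "    rdf:type nkf:String ;",
        f'    nkf:string "{text}" ;',
        f"    nkf:pos tag:{pos} ;",
        f"    nkf:ner tag:{ner} ;",
        f"    nkf:dependency <{dep_uri}> ;",
        f"    nkf:role role:{role} .",
    ]
    return "\n".join(lines)


base = "http://example.org/example.html"
statement = string_uri_turtle(f"{base}#char=0,5", "Gates", "ProperNoun",
                              "Person", f"{base}#char=6,16", "nsbj")
print(statement)
```

The dependency target `#char=6,16` is the String URI of "co-founded" in the example sentence, so the statement records that "Gates" depends on that word with the role `nsbj`.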

As shown in FIGS. 9A and 9B, even when the same word ("Gates") is present in a plurality of texts, a different StringURI (StringURI 1 and StringURI 4) is assigned for each text, so that the information related to each text can be distinguished according to its context.

In addition, as illustrated, because the NKF graph for sentence 1 and the NKF graph for sentence 2 are linked to each other, information on the relationship between "Paul" and "Microsoft" can also be obtained. For example, when a user searches for "Paul", the information that "Paul" is a co-founder of "Microsoft" may also be obtained without the aid of a separate external indexing device.
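How two sentence graphs become connected through a shared entitized word can be sketched as a short graph traversal. In the Python example below, the `links` dictionary is a hand-written, hypothetical summary of the two example sentences at word level; it is not the NKF graph of FIGS. 9A and 9B, and the traversal only shows reachability, not the type of relationship.

```python
from typing import Dict, List, Set

# Hypothetical word-level links taken from the two example sentences.
links: Dict[str, List[str]] = {
    "Gates": ["Microsoft", "Paul"],   # sentence 1 and sentence 2
    "Microsoft": ["Gates"],
    "Paul": ["Gates"],
}


def related_words(start: str, max_hops: int = 2) -> Set[str]:
    """Collect the words reachable from `start` within `max_hops` links."""
    seen: Set[str] = {start}
    frontier: List[str] = [start]
    for _ in range(max_hops):
        # Expand one hop, skipping words that were already visited.
        frontier = [n for word in frontier for n in links.get(word, [])
                    if n not in seen]
        seen.update(frontier)
    return seen - {start}


print(related_words("Paul"))
```

With a single hop only "Gates" is reached from "Paul"; "Microsoft" becomes reachable at the second hop because both sentences share the entitized word "Gates".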

Without an NKF indexing schema such as the one shown in FIG. 9B, finding every usage of a common word ("Gates") requires checking each stringURI for whether it contains that word, that is, as many lookups as there are stringURIs, so the search may slow down in proportion to the number of stringURIs. (In conventional NIF, every stringURI must be checked for whether it corresponds to "Gates" or "Microsoft" and then searched for whether a nif:dependency exists; in the present embodiment, "Gates" and "Microsoft" are already registered in the knowledge base, so nif:dependency can be searched without querying the stringURIs.)
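The difference between the two lookups can be illustrated with a small Python sketch. The store contents, URLs, and function names below are made up for the example, and a dictionary stands in for the knowledge base in which entitized words are registered.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical store: String URI -> surface string.
strings: Dict[str, str] = {
    "http://example.org/a.html#char=0,5": "Gates",
    "http://example.org/a.html#char=17,26": "Microsoft",
    "http://example.org/b.html#char=0,5": "Gates",
}


def find_by_scan(word: str) -> List[str]:
    """Without a word index: test every String URI, O(number of URIs)."""
    return [uri for uri, text in strings.items() if text == word]


# With entitized words: one entry per word, built once.
index: Dict[str, List[str]] = defaultdict(list)
for uri, text in strings.items():
    index[text].append(uri)


def find_by_index(word: str) -> List[str]:
    """With a word index: a single dictionary lookup."""
    return index.get(word, [])
```

Both functions return the same String URIs, but the scan touches every stored URI on each query, whereas the indexed lookup cost does not grow with the number of String URIs once the index has been built.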

Since the technical details described above with reference to FIGS. 1 to 8 apply as they are, a more detailed description is omitted below.

The description of the embodiments above may be applied similarly to text written in languages other than Korean and English.

The foregoing description may relate to the international standards of ISO/TC 37/SC 4 Language Resource Management.

In addition, the content described in the following standards may be applied to the description of the embodiments above.

W3C (World Wide Web Consortium)

RDF (Resource Description Framework)

RDF - http://www.w3.org/RDF/

RDFS 1.1 (RDF Schema) - http://www.w3.org/TR/rdf-schema/

OWL (Web Ontology Language)

OWL - http://www.w3.org/TR/owl-features/

ITS (Internationalization Tag Set)

ITS 2.0 - http://www.w3.org/TR/its20/

NIF (NLP Interchange Format)

NIF core

- http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html

NERD (Named Entity Recognition and Disambiguation)

- http://nerd.eurecom.fr/ontology

OLiA (Ontologies of Linguistic Annotation)

- http://nachhalt.sfb632.uni-potsdam.de/owl/

The derivation of the syntactic structure of a text, the analysis format, and its representation may follow the content described in the standard "ISO 24615-1:2014 Language resource management -- Syntactic annotation framework (SynAF) -- Part 1: Syntactic model".

The term "annotation" may correspond to the definition given in clause 3.9 of the standard "ISO 24615-1:2014 Syntactic Annotation Framework" (feature-value pair denoting a linguistic property of a linguistic segment).

For the morphological analysis of a text, reference may be made to the standard "ISO 24611:2012 Language resource management -- Morpho-syntactic annotation framework (MAF)".

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited number of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

A method of building a schema for text, the method comprising:

identifying at least one word included in the text;

associating the identified word with identification information including location information associated with the identified word;

identifying relationship information between the identified word and other words included in the text based on the identification information; and

constructing a schema for the text based on the identification information and the relationship information.
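
The four steps recited above can be sketched in code. The following is a minimal illustration rather than the claimed implementation: the names `IdentifiedWord`, `identify_words`, `identify_relations`, and `build_schema` are hypothetical, a regular expression stands in for word identification, character offsets serve as the location information, and a simple "next word" link stands in for the dependency relationship that a real parser would supply.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifiedWord:
    text: str
    start: int  # location information: character offset where the word begins
    end: int    # character offset just past the end of the word


def identify_words(text):
    """Identify words and attach location information to each of them."""
    return [IdentifiedWord(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def identify_relations(words):
    """Placeholder for a dependency parser: relate each word to the next one.

    The offsets (identification information) decide which words are adjacent.
    """
    ordered = sorted(words, key=lambda w: w.start)
    return [(a, b, "precedes") for a, b in zip(ordered, ordered[1:])]


def build_schema(text):
    """Combine identification and relationship information into one schema."""
    words = identify_words(text)
    return {"text": text, "words": words, "relations": identify_relations(words)}


schema = build_schema("KAIST builds schemas")
```

Here `schema["words"]` holds each word with its offsets, and `schema["relations"]` holds the word pairs linked by the placeholder relation.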

The method of claim 1,

wherein a word included in the text is a character, or a contiguous string of characters, in the text that has meaning by itself.

The method of claim 1,

wherein identifying the at least one word comprises:

determining target words that are valid for building the schema among the words included in the text; and

extracting the determined target words.

The method of claim 3, wherein:

associating the identification information comprises associating the identification information of each of the extracted target words with the corresponding target word;

identifying the relationship information comprises identifying relationship information between at least two target words among the extracted target words; and

constructing the schema comprises:

determining a plurality of nodes associated with each of the extracted target words and with each piece of identification information associated with the extracted target words;

connecting, based on the identified relationship information, a first node associated with a first target word of the extracted target words and a second node associated with a second target word related to the first target word; and

connecting a node representing the identification information of the first target word to the first node, and connecting a node representing the identification information of the second target word to the second node.
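
The node-and-edge construction recited above might look like the following sketch. It assumes a plain dictionary of nodes and a list of labelled edges; the function name `build_schema_graph`, the edge label `identifiedBy`, and the sample part-of-speech tags are illustrative choices, not terms taken from the claims.

```python
def build_schema_graph(target_info, relations):
    """Build a node/edge graph for a set of target words.

    target_info: dict mapping each target word to its identification information
    relations:   list of (first_word, second_word, label) tuples
    """
    nodes = {}
    edges = []
    for word, info in target_info.items():
        nodes[("word", word)] = word   # node for the target word
        nodes[("info", word)] = info   # node for its identification information
        # Connect each word node to the node holding its identification info
        edges.append((("word", word), ("info", word), "identifiedBy"))
    for first, second, label in relations:
        # Connect two word nodes only when both are known target words
        if ("word", first) in nodes and ("word", second) in nodes:
            edges.append((("word", first), ("word", second), label))
    return nodes, edges


nodes, edges = build_schema_graph(
    {"KAIST": {"start": 0, "pos": "NNP"}, "builds": {"start": 6, "pos": "VBZ"}},
    [("builds", "KAIST", "subject")],
)
```

Keeping identification information in its own nodes, instead of as attributes on the word nodes, mirrors the claim language, in which the word node and the identification node are separate and explicitly connected.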

The method of claim 4,

wherein the first target word is a word included in a search word input by a user.

The method of claim 1,

wherein information relating to the association of the identified word with the identification information is stored in a database.

The method of claim 1,

wherein the identification information includes at least one of Uniform Resource Identifier (URI) information of the identified word, information indicating a part of speech of the identified word, and linguistic information of the identified word.

The method of claim 7,

wherein the identification information includes at least one of: information indicating a position of the text in a document containing the text, information indicating a position of the identified word in the text, information indicating a stemming result of the identified word, information indicating a part of speech of the identified word, information indicating the entity name of the identified word, and information indicating a tag of the entity name of the identified word.
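
One way to picture the identification information listed above is as a single record per word. The sketch below is an assumption-laden illustration: the class `IdentificationInfo`, the helper `make_info`, the example base URI, and the `#char=begin,end` fragment used to form the URI are all hypothetical choices, not a format stated in the claims.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentificationInfo:
    uri: str                          # URI of the identified word
    text_offset: int                  # position of the text within its document
    word_start: int                   # position of the word within the text
    word_end: int
    stem: Optional[str] = None        # stemming result
    pos: Optional[str] = None         # part of speech
    entity: Optional[str] = None      # entity name
    entity_tag: Optional[str] = None  # tag of the entity name


def make_info(base_uri, text_offset, word_start, word_end, **annotations):
    """Derive a URI from the word's absolute offsets and bundle the annotations."""
    begin = text_offset + word_start
    end = text_offset + word_end
    uri = f"{base_uri}#char={begin},{end}"  # assumed offset-based fragment
    return IdentificationInfo(uri, text_offset, word_start, word_end, **annotations)


info = make_info("http://example.org/doc1", 100, 0, 5,
                 pos="NNP", entity="KAIST", entity_tag="ORG")
```

Because every field other than the URI and the offsets is optional, a record can carry any subset of the listed items, which matches the "at least one of" wording.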

The method of claim 1, wherein,

when the identified word is also included in another text different from the text, identifying the relationship information comprises identifying relationship information between the identified word and other words included in the other text, and

constructing the schema comprises constructing the schema for the text based on the identification information of the identified word and the relationship information between the identified word and the other words included in the other text.

The method of claim 1,

wherein the relationship information includes information indicating a dependency relationship between the identified word and the other words.

The method of claim 10,

wherein the dependency relationship is at least one of a spelling relationship, a subject-object relationship, and a mathematical relationship between the identified word and the other words.

A method of building a database of schemas for text, the method comprising:

identifying at least one word included in the text;

identifying relationship information between the identified word and other words included in the text based on the identification information;

constructing a schema for the text based on the identification information and the relationship information; and

storing the constructed schema in a database.
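
The storing step could be realized in many ways. As a minimal sketch, the code below serializes a schema to JSON and keeps it in an in-memory SQLite database using Python's standard `sqlite3` module; the table name `schema_store`, its columns, and the helper names are assumptions made for illustration only.

```python
import json
import sqlite3


def store_schema(conn, text, words, relations):
    """Store one schema: the text, its words with offsets, and their relations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_store "
        "(id INTEGER PRIMARY KEY, text TEXT, body TEXT)"
    )
    body = json.dumps({"words": words, "relations": relations})
    cur = conn.execute(
        "INSERT INTO schema_store (text, body) VALUES (?, ?)", (text, body)
    )
    conn.commit()
    return cur.lastrowid


def load_schema(conn, schema_id):
    """Read a stored schema back, or return None if the id is unknown."""
    row = conn.execute(
        "SELECT text, body FROM schema_store WHERE id = ?", (schema_id,)
    ).fetchone()
    if row is None:
        return None
    return {"text": row[0], **json.loads(row[1])}


conn = sqlite3.connect(":memory:")
schema_id = store_schema(
    conn,
    "KAIST builds schemas",
    [{"word": "KAIST", "start": 0}, {"word": "builds", "start": 6}],
    [["builds", "KAIST", "subject"]],
)
loaded = load_schema(conn, schema_id)
```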

A method of providing search results, the method comprising:

identifying at least one word included in text;

constructing a schema for the text based on the identification information and the relationship information;

receiving a search word from a user; and

if the search word includes the identified word, providing a search result based on the constructed schema as a search result for the user's search request made with the search word.
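
The search behavior recited above can be illustrated with a small word-to-schema index. This is a sketch under simplifying assumptions: each schema is a plain dictionary with `text`, `words`, and `relations` keys, matching is case-insensitive and whitespace-based, and the names `build_index` and `search` are hypothetical.

```python
def build_index(schemas):
    """Map each identified word (lower-cased) to the schemas that contain it."""
    index = {}
    for schema in schemas:
        for word in schema["words"]:
            index.setdefault(word.lower(), []).append(schema)
    return index


def search(index, query):
    """Return the schemas whose identified words appear in the search word."""
    results = []
    for term in query.lower().split():
        for schema in index.get(term, []):
            if schema not in results:  # avoid returning the same schema twice
                results.append(schema)
    return results


schemas = [
    {"text": "KAIST builds schemas",
     "words": ["KAIST", "builds", "schemas"],
     "relations": [("builds", "KAIST", "subject")]},
]
index = build_index(schemas)
hits = search(index, "who builds schemas")
misses = search(index, "unrelated query")
```

A result is returned only when the search word contains an identified word, so a query with no identified words yields an empty list.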

A system for processing information included in text, the system comprising:

a control unit configured to identify at least one word included in the text, associate the identified word with identification information including location information associated with the identified word, identify relationship information between the identified word and other words included in the text based on the identification information, and build a schema for the text based on the identification information and the relationship information; and

a storage unit configured to store at least one of the constructed schema and information about an association between the identified word and the identification information.