KR102023573B1

KR102023573B1 - System and method for providing intelligent voice information

Info

Publication number: KR102023573B1
Application number: KR1020170166687A
Authority: KR
Inventors: 조정현; 김익재; 최희승; 남기표
Original assignee: 한국과학기술연구원
Priority date: 2017-12-06
Filing date: 2017-12-06
Publication date: 2019-09-24
Anticipated expiration: 2037-12-06
Also published as: KR20190066862A

Abstract

Embodiments relate to an intelligent voice information providing system and to an intelligent voice information providing method performed by the system. The system includes: a spatial context converter that generates a first spatial context in which objects of interest included in a target space are converted into nodes; a spatial processor that calculates the location of a user in the target space and generates a second spatial context by combining the user's location with the first spatial context; and a voice processor that, upon receiving a voice query associated with the target space, generates an answer to the voice query based on the second spatial context.

Description

Intelligent voice information providing system and method {SYSTEM AND METHOD FOR PROVIDING INTELLIGENT VOICE INFORMATION}

Embodiments of the present invention relate to providing information and, more particularly, to analyzing a space in which a user may be located as a spatial context and, when a voice query associated with the space is input, providing a voice answer based on the spatial context.

With recent advances in speech recognition and artificial intelligence (AI), technology for providing voice information, which understands the meaning of human speech and provides a corresponding answer, has also been advancing. The related market for voice information services (e.g., the intelligent virtual assistant market) is growing rapidly, driven by the voice information services of global companies (e.g., Apple's Siri, Amazon's Echo, Google's Google Assistant, Microsoft's Cortana, and Samsung's Bixby).

Voice information services are now moving beyond simple speech recognition and voice output. Building on advanced information processing, they are being developed to answer recognized speech through multiple modalities such as audio, visual, and/or tactile output. Such services are especially helpful to users with visual impairments or other disabilities.

However, current voice information services recognize the space in which the user is located, and provide information about it, using macroscopic positioning technology based on GPS, so they cannot recognize that space in detail. As a result, such users cannot obtain specific voice information about the microscopic space they currently occupy through current voice information services.

Patent Publication No. 10-2014-0068855
Patent Publication No. 10-2016-0069329

According to embodiments of the present invention, a system and a method are provided that analyze a space in which a user may be located as a spatial context and, when a voice query associated with the space is input, provide a voice answer based on the spatial context.

An intelligent voice information providing system according to one aspect of the present invention may include: a spatial context converter that generates a first spatial context in which objects of interest included in a target space are converted into nodes; a spatial processor that calculates the location of a user in the target space and generates a second spatial context by combining the user's location with the first spatial context; and a voice processor that, upon receiving a voice query associated with the target space, generates an answer to the voice query based on the second spatial context.

In one embodiment, the spatial context converter may be further configured to acquire raw image information of the target space, analyze the raw image information to represent each object of interest included in the target space as an object-of-interest node, and generate a first spatial context that includes the object-of-interest nodes and connecting lines that link them.
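As a rough illustration of a first spatial context made of nodes and connecting lines, the sketch below stores recognized objects as graph nodes and links them with edges. It is only a sketch under stated assumptions: the names `ObjectNode`, `SpatialContext`, and `build_first_context` are hypothetical, object recognition from raw images is taken as already done, every pair of nodes is linked, and each edge carries the Euclidean distance between the two objects. None of these choices is specified by the source.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class ObjectNode:
    name: str        # index label assigned to the recognized object
    position: tuple  # (x, y, z) in the reconstructed space; metres assumed


@dataclass
class SpatialContext:
    nodes: dict = field(default_factory=dict)  # name -> ObjectNode
    edges: dict = field(default_factory=dict)  # frozenset({a, b}) -> length


def build_first_context(objects):
    """Build a graph from a mapping of object name -> (x, y, z)."""
    ctx = SpatialContext()
    for name, position in objects.items():
        ctx.nodes[name] = ObjectNode(name, tuple(position))
    # Assumption: every pair of nodes is linked; the edge stores their distance.
    for a, b in combinations(ctx.nodes.values(), 2):
        ctx.edges[frozenset((a.name, b.name))] = math.dist(a.position, b.position)
    return ctx
```

Calling `build_first_context({"sofa": (0, 0, 0), "tv": (3, 4, 0)})` would yield two nodes and one edge of length 5.0. A sparser edge set would work equally well with the same structure.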

In one embodiment, the spatial context converter may be further configured to reconstruct the target space as a three-dimensional basic space based on the raw image information, recognize and index the objects of interest located in the three-dimensional basic space, and represent each indexed object as an object-of-interest node.

In one embodiment, the spatial context converter may be further configured to divide the three-dimensional basic space into segments that each contain a single object of interest, and to apply to each segment a spatial shape corresponding to its object of interest.

In one embodiment, the spatial processor may be further configured to represent the user as a user node based on the user's location, include the user node in the first spatial context, and generate a second spatial context in which the user node is set as the starting point and connecting lines link each pair of nodes without passing through any other node.
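One possible reading of "connecting lines that do not pass through another node" is geometric: an edge between two nodes is kept only if no third node lies on the straight segment between them. The sketch below implements that reading. It is an interpretation, not the source's stated algorithm, and `build_second_context`, `_on_segment`, the `tol` parameter, and the dictionary layout are all hypothetical.

```python
import math
from itertools import combinations


def _on_segment(p, a, b, tol=1e-6):
    """True if point p lies on the straight segment from a to b."""
    return abs(math.dist(a, p) + math.dist(p, b) - math.dist(a, b)) <= tol


def build_second_context(object_positions, user_position, user_key="user"):
    """Add a user node to the object nodes and link nodes with direct edges."""
    nodes = dict(object_positions)
    nodes[user_key] = user_position
    edges = {}
    for a, b in combinations(nodes, 2):
        # Assumed rule: skip the edge if any other node sits on the segment a-b.
        blocked = any(
            _on_segment(nodes[c], nodes[a], nodes[b])
            for c in nodes
            if c not in (a, b)
        )
        if not blocked:
            edges[frozenset((a, b))] = math.dist(nodes[a], nodes[b])
    # The user node is recorded as the starting point for later searches.
    return {"start": user_key, "nodes": nodes, "edges": edges}
```

With a table located exactly between the user and the TV, the user-TV edge is dropped, and the TV is reached from the user through the table node instead.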

In one embodiment, the spatial processor may be further configured to acquire a first-person image corresponding to the user's first-person viewpoint and to calculate the user's location based on the shape of an object of interest that appears in the first-person image.

In one embodiment, the spatial processor may be further configured to acquire a third-person image corresponding to a third-person viewpoint of the user and to calculate the user's location based on the shape of the user that appears in the third-person image.

In one embodiment, when the voice query mentions only a first object of interest, the answer may be expressed as the position of the first object of interest relative to a second object of interest that is different from the first.

In one embodiment, the second object of interest may be determined based on its distance from the first object of interest.

In one embodiment, the second object of interest may be determined further based on an attribute of the objects of interest located in the target space.
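The selection of a second object of interest by distance and attribute can be sketched as follows. This is a minimal illustration under assumptions: the attribute is modeled as a hypothetical `kind` field, objects of a preferred kind (here "furniture") are favored as reference points, ties are broken by Euclidean distance, and positions are taken to be in metres. The functions `pick_reference` and `describe` and the answer wording are not from the source.

```python
import math


def pick_reference(target, objects, preferred_kinds=("furniture",)):
    """Pick the reference object: preferred kind first, then nearest."""
    target_pos = objects[target]["pos"]
    candidates = [name for name in objects if name != target]
    return min(
        candidates,
        key=lambda name: (
            objects[name]["kind"] not in preferred_kinds,  # False sorts first
            math.dist(target_pos, objects[name]["pos"]),
        ),
    )


def describe(target, objects):
    """Express the target's position relative to the chosen reference object."""
    reference = pick_reference(target, objects)
    distance = math.dist(objects[target]["pos"], objects[reference]["pos"])
    return f"The {target} is about {distance:.1f} m from the {reference}."
```

Preferring large, stable objects as references is one plausible use of the attribute, since an answer such as "near the table" is usually more useful than "near the cup". `pick_reference` raises `ValueError` when the space contains no object other than the target.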

In one embodiment, when the voice query relates to a subspace of the target space, the answer may be generated using the location of the subspace. Here, the location of the subspace may be determined by searching for objects of interest associated with the subspace and using the objects found.

In one embodiment, the location of the subspace may be the center coordinate of the locations of the objects of interest found.
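The subspace location can be illustrated with a short sketch. It assumes that each object carries a hypothetical `subspace` tag and that "center coordinate" means the arithmetic mean of the object positions. A bounding-box center would be another valid reading, and the source does not say which is intended.

```python
def subspace_location(subspace, objects):
    """Return the mean position of all objects tagged with the given subspace."""
    members = [o["pos"] for o in objects.values() if o.get("subspace") == subspace]
    if not members:
        return None  # no object of interest is associated with this subspace
    count = len(members)
    # zip(*members) groups the x, y, and z coordinates so each can be averaged.
    return tuple(sum(axis) / count for axis in zip(*members))
```

For a kitchen containing a refrigerator at (0, 0, 0), a sink at (2, 0, 0), and a stove at (1, 3, 0), the function returns (1.0, 1.0, 0.0), which can then be treated like any other node position when answering a query such as "Where is the kitchen?".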

An intelligent voice information providing method according to another aspect of the present invention, performed by an intelligent voice information providing system, may include: acquiring a first spatial context in which objects of interest included in a target space are represented as nodes; calculating the location of a user in the target space; generating a second spatial context by combining the user's location with the first spatial context; and, upon receiving a voice query associated with the target space, generating an answer to the voice query based on the second spatial context.

In one embodiment, acquiring the first spatial context may further include: acquiring raw image information of the target space; analyzing the raw image information to represent each object of interest located in the target space as an object-of-interest node; and generating a first spatial context that includes the object-of-interest nodes and connecting lines that link them.

In one embodiment, analyzing the raw image information to represent each object of interest located in the target space as an object-of-interest node may include: reconstructing the target space as a three-dimensional basic space based on the raw image information; recognizing the objects of interest included in the three-dimensional basic space; indexing the recognized objects of interest; and representing each indexed object as an object-of-interest node.

In one embodiment, analyzing the raw image information to represent each object of interest located in the target space as an object-of-interest node may further include: dividing the three-dimensional basic space into segments that each contain a single object of interest; and applying to each segment a spatial shape corresponding to its object of interest.

In one embodiment, generating the second spatial context may include: representing the user as a user node based on the user's location; and including the user node in the first spatial context.

In one embodiment, generating the second spatial context may further include: setting the user node as the starting point; and generating connecting lines that link each pair of nodes without passing through any other node.

In one embodiment, generating the answer to the voice query using the second spatial context may include: analyzing the content of the voice query and requesting the necessary information; searching the nodes included in the second spatial context in response to the request; generating and returning the necessary information based on the search result; and generating the answer using the returned information.
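The four steps above (analyze the query, search the nodes, return the needed information, build the answer) can be sketched as a small pipeline. The sketch assumes the voice query has already been transcribed to text, matches object names by simple word lookup, and represents the second spatial context as a dictionary with `start` and `nodes` keys. All function names, the dictionary layout, and the answer wording are hypothetical.

```python
import math


def analyze_query(text, known_objects):
    """Step 1: find which known object, if any, the query asks about."""
    words = text.lower().replace("?", "").split()
    for name in known_objects:
        if name in words:
            return {"type": "locate", "target": name}
    return None  # the query does not mention an object in this space


def search_context(request, context):
    """Steps 2 and 3: look up the target node and return the needed information."""
    nodes = context["nodes"]
    user_pos = nodes[context["start"]]
    target = request["target"]
    return {"target": target, "distance": math.dist(user_pos, nodes[target])}


def answer(text, context):
    """Step 4: turn the returned information into an answer sentence."""
    objects = [name for name in context["nodes"] if name != context["start"]]
    request = analyze_query(text, objects)
    if request is None:
        return "Sorry, I could not relate that question to this space."
    info = search_context(request, context)
    return f"The {info['target']} is about {info['distance']:.1f} m from you."
```

Word lookup handles only single-word object names; a real system would rely on the speech recognition and language understanding that the source treats as known technology.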

When instructions stored on a computer-readable recording medium are executed, they may cause a processor to perform the intelligent voice information providing method according to the embodiments.

According to embodiments of the present invention, a space in which a user may be located is reinterpreted as a spatial context composed of nodes and connecting lines, so that the location of an object in the space, or of the user, can be conveyed to the user by voice as synesthetic information about space and vision. Here, synesthetic information means information that expresses the queried object in terms of its position relative to other objects.

As a result, the user receives the location of the target of interest (the user or an object) as a relative positional expression and can grasp that location more intuitively. In particular, because the location of an object can be given relative to the user, user convenience can be maximized.

In addition, convenience can be provided to visually impaired users in the indoor spaces where they spend most of their time. Because this process does not rely on macroscopic positioning technology such as GPS, the service can be provided smoothly even in indoor spaces where GPS signals do not work well.

Furthermore, when the location of the space is used, a relatively simplified spatial context is used, which can provide faster processing.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the claims.

FIG. 1 is a conceptual diagram of an intelligent voice information providing system according to an embodiment of the present invention.
FIG. 2 is a flowchart of an intelligent voice information providing method performed by the intelligent voice information providing system according to an embodiment of the present invention.
FIG. 3 is a flowchart of a spatial preprocessing process according to an embodiment of the present invention.
FIG. 4 illustrates the spatial preprocessing process as images according to an embodiment of the present invention.
FIG. 5 is a flowchart of a process of generating an answer corresponding to a voice query according to an embodiment of the present invention.
The drawings depict various embodiments of the invention for purposes of illustration only. Those skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods described herein may be used without departing from the principles of the invention described herein.

Embodiments will be described with reference to the accompanying drawings. However, the principles disclosed herein may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In the detailed description, well-known features and techniques may be omitted to avoid unnecessarily obscuring the features of the embodiments.

In this specification, a space in which a user may be located generally refers to the residential space where the user lives (e.g., an entire house or a particular room), but may also refer to a space where the user can spend a considerable amount of time (e.g., a workplace, a family member's home, or a relative's home).

In addition, in this specification, an indoor space generally refers to any of various spaces that are at least temporarily and/or partially closed off from the outside, have a certain area, and can be conceptually delimited (e.g., a space that has a door that can be opened and closed but stays open much of the time, the interior of a vehicle, or a partitioned area in a workplace).

In this specification, a target space refers to the space that is converted into a spatial context, that is, into a simpler redefinition of the space in which a user may be located.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings.

FIG. 1 is a conceptual diagram of an intelligent voice information providing system 1000 according to an embodiment of the present invention. The system 1000 includes a spatial context converter 100, a spatial processor 300, and a voice processor 500.

The intelligent voice information providing system 1000 according to the embodiments may be entirely hardware, or partly hardware and partly software. For example, each component of the system 1000 may collectively refer to hardware, and the software associated with it, for processing data of a particular format and content and/or exchanging such data electronically. That is, terms such as "unit", "module", "device", and "system" in this specification refer to a computer-related entity: hardware, a combination of hardware and software, or software. For example, a unit, module, device, or system herein may be, but is not limited to, a running process, a processor, an object, an executable, a thread of execution, a program, and/or a computer. For example, both an application running on a computer and the computer itself may correspond to a unit, module, device, or system herein.

The units 100, 300, and 500 that make up the intelligent voice information providing system 1000 according to the embodiments are not necessarily intended to denote physically distinct components. That is, although FIG. 1 shows the spatial context converter 100, the spatial processor 300, and the voice processor 500 as separate blocks, in some embodiments some or all of them may be integrated into a single device. For example, the spatial processor 300 and the voice processor 500 may be integrated into one smartphone. In addition, the units 100, 300, and 500 are merely functional divisions of the computing device on which they are implemented, according to the operations it performs, and do not necessarily denote separate elements. This is only an example, however. In other embodiments, one or more of the spatial context converter 100, the spatial processor 300, and the voice processor 500 may be implemented as a separate device that is physically distinct from the other units. For example, the spatial processor 300 may be a component communicatively connected to the spatial context converter 100.

The spatial context converter 100 is a means for converting a space in which a user may be located into a spatial context: it recognizes the objects included in that space and redefines the space in terms of the recognized objects. In one embodiment, it acquires raw image information of the target space, analyzes the raw image information to represent each object of interest in the target space as an object-of-interest node, and generates a first spatial context that includes the object-of-interest nodes and the connecting lines that link them.

To acquire spatial information about the target space, the spatial context converter 100 may be connected, by wire or wirelessly, to a first imager capable of acquiring ordinary two-dimensional images, for example a 360-degree camera, a depth camera, an RGB-D camera such as Microsoft's Kinect, or a combination of these. The image information acquired here directly represents the target space and is referred to as raw image information. In some embodiments, the image information about the target space may be acquired in advance by the image acquisition means described above. In still other embodiments, the first imager may be a 3D scanner that scans the shapes of objects.

The spatial context converter 100 acquires the raw image information and redefines the target space, which includes one or more objects of interest, as a spatial context. The process by which the spatial context converter 100 recognizes objects of interest in the target space and generates the spatial context is described in more detail in step S130 below. The spatial context converter 100 may provide the generated spatial context information to the spatial processor 300.

The spatial processor 300 analyzes the current state of the interior of the target space. It acquires information associated with the target space, that is, the spatial context into which the target space has been converted, calculates the user's current location, and, when a voice query associated with the target space is received, enables an answer to be generated using the spatial context.

In one embodiment, the spatial processor 300 may receive the spatial context generated by the spatial context converter 100 and use it as the basis for recognizing the target space in which the user is located. In this case, no separate processing, such as recognizing the objects inside the target space where the user is currently located, is required, so the state of the user's space can be recognized more quickly and conveniently.

In some other embodiments, the spatial processor 300 may analyze, in real time, the target space in which the user is currently located and convert it into a spatial context that takes the user and the objects of interest into account together. That is, it may generate a graph that includes a user node and object nodes. In this case, the spatial context converter 100 and the spatial processor 300 are integrated into a single component.

In those other embodiments, the spatial processor 300 may further correct the spatial context information generated by the spatial context converter 100 using information about the objects currently located in the target space. In this case, the positions of objects that were moved after preprocessing can be updated, so more accurate spatial information can be provided to the user.
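The correction step can be sketched as a comparison between stored node positions and freshly observed ones. This is only an illustration under assumptions: the function name `correct_context`, the `tolerance` threshold (in the same unit as the positions, metres assumed), and the dictionary layout are hypothetical, and obtaining the observed positions from live images is outside the sketch.

```python
import math


def correct_context(stored, observed, tolerance=0.1):
    """Return (updated positions, names of objects whose node was changed)."""
    updated = dict(stored)
    changed = []
    for name, position in observed.items():
        # Update a node if the object is new or has moved farther than tolerance.
        if name not in stored or math.dist(stored[name], position) > tolerance:
            updated[name] = position
            changed.append(name)
    return updated, changed
```

A small tolerance keeps measurement noise from rewriting the context on every frame, while a chair moved by a metre since preprocessing is picked up and relocated.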

The spatial processor 300 may calculate the location of the user in the target space. In one embodiment, the spatial processor 300 may acquire a first-person image corresponding to the user's first-person viewpoint and calculate the user's location based on the shape of an object of interest that appears in the first-person image. In this case, the spatial processor 300 may be connected to a second imager 331 by wire or wirelessly. The second imager 331 captures images corresponding to the user's first-person viewpoint and may be, for example, a smartphone. The image information acquired by the second imager 331 is referred to as first-person image information. In this embodiment, the spatial processor 300 may recognize the objects included in the first-person image information and calculate the user's current location by calculating the distances between those objects and the user.
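One standard way to turn object-to-user distances into a position is trilateration. The sketch below works in two dimensions with exactly three recognized objects whose positions are already known from the spatial context. The source does not state which localization technique is used, so this is a swapped-in textbook method, and the function name `locate_user` is hypothetical. Estimating the distances from the image itself is assumed to have been done elsewhere.

```python
def locate_user(landmarks, distances):
    """Estimate (x, y) from three known landmark positions and measured distances."""
    (x1, y1), (x2, y2), (x3, y3) = landmarks
    d1, d2, d3 = distances
    # Subtracting the first circle equation from the other two gives a
    # linear system A @ [x, y] = b in the unknown user position.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; the position is ambiguous")
    # Solve the 2x2 system with Cramer's rule.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)
```

With noisy distance estimates or more than three recognized objects, a least-squares fit over all landmarks would be the usual extension of this idea.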

In another embodiment, the spatial processor 300 may acquire a third-person image corresponding to a third-person viewpoint of the user and calculate the user's location information based on the shape of the user that appears in the third-person image. In this case, the spatial processor 300 may be connected to a third imager 333 by wire or wirelessly. The third imager 333 captures images corresponding to a third-person viewpoint of the user and may be, for example, a CCTV camera. The image information acquired by the third imager 333 is referred to as third-person image information. In this embodiment, the spatial processor 300 may calculate the user's current location based on the shape of the user included in the third-person image information.

The spatial processor 300 combines the calculated user location with the information associated with the target space (that is, the spatial context) and, on that basis, may provide the user with various information associated with the target space. In one embodiment, when the voice processor 500 requests the information needed to generate an answer to the user's voice query, the spatial processor 300 may generate return information using the spatial context that reflects the user's location and provide it. The process by which the spatial processor 300 generates the return information is described in more detail in step S550.

The voice processor 500 is a means for analyzing the user's input voice query and providing a voice answer corresponding to it. The voice processor 500 is connected, by wire or wirelessly, to a voice input/output device for voice queries and responses. The processes by which the voice processor 500 analyzes the user's voice query and provides a corresponding answer are well known to those skilled in the art through Apple's Siri, Amazon's Echo, and/or Patent Publication No. 10-2016-0069329, among others, so a detailed description is omitted here to keep the gist of the invention clear.

In one embodiment, the voice processor 500 analyzes whether the voice query contains an expression related to the target space. Such expressions include references to objects located in the target space or to the user's location.

When the voice processor 500 determines that the received voice query is associated with the target space (for example, a query asking where an object in the target space is located), it requests the spatial information needed for the answer from the spatial processor 300. The processes performed by the voice processor 500 are described in more detail under step S500.

The system 1000 may run or build various software on top of an operating system (OS), that is, a system. The operating system is a system program that allows software to use the hardware of a device, and may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as the Windows family, the Linux family, the Unix family, MAC, AIX, and HP-UX.

It will be apparent to those skilled in the art that the system 1000 may include other components not described herein. For example, the system may further include other hardware elements needed for the operations described herein, such as a network connecting the spatial context converter 100, the spatial processor 300, and the voice processor 500, as well as network interfaces and protocols.

FIG. 2 is a flowchart of an intelligent voice information providing method performed by the intelligent voice information providing system 1000, according to an embodiment of the present invention. The method includes: converting a target space containing one or more objects of interest into a first spatial context (S100); recognizing a user located in the target space and generating a second spatial context that reflects the user's location (S300); and performing a voice query and response based on the second spatial context (S500).

FIG. 3 is a flowchart of the spatial context conversion process (S100) performed by the spatial context converter 100, according to an embodiment of the present invention, and FIG. 4 shows the image corresponding to each step of the spatial context conversion process (S100), according to an embodiment of the present invention.

Referring to FIGS. 3 and 4, step S100 converts a target space containing one or more objects of interest into a spatial context. First, the spatial context converter 100 acquires raw image information through a first imager (S110). As shown in FIG. 4(a), the raw image is image information that directly represents the state of the target space. The spatial context converter 100 then analyzes the raw image information and shapes each object of interest contained in the target space into a node (S130).

Once the spatial context converter 100 has acquired the raw image information (S110), it reconstructs the target space as a 3D basic space based on that information (S131). Here, the 3D basic space is a base space that exists before the state of the target space (for example, which objects are located in it) has been analyzed. It is a three-dimensional virtual space, generated from a two-dimensional raw image such as that of FIG. 4(a), that represents the interior structure of the target space without distinguishing the background from the objects inside it, as in FIG. 4(b).

The spatial context converter 100 then recognizes the objects of interest located in the 3D basic space (S133). Here, objects of interest are the principal objects, among the many objects located in the target space, that can be used to establish relative positions. In one embodiment, the objects of interest include large, stationary objects such as a TV, a bookcase, a sink, a bed, or a sofa. In another embodiment, the objects of interest may include not only large, stationary objects but also objects that are small yet important to the user or that are used relatively often, such as a smartphone or a remote control.

For example, referring to FIGS. 4(a) and 4(b), the cup (A) and the book (B) on the dining table are objects located in the target space, but they are not objects of interest: they are relatively small, are used less often than items such as a smartphone, and are generally not used to express positional relationships relative to other objects.

An object of interest may be determined based on at least one of: the object's position in the target space, its size, whether its position changes, its positional relationship relative to other objects (for example, whether other objects are evenly distributed around it), and its frequency of use. In some embodiments, objects of interest may also be designated by the user.

The process by which the spatial context converter 100 recognizes objects of interest may be performed by machine, that is, automatically. In one embodiment, the spatial context converter 100 learns objects of interest from pre-stored attributes associated with them and, based on the learning result, analyzes the interior of the 3D basic space to determine the type and position of each object of interest it contains.

In this embodiment, the algorithm that learns objects of interest from the pre-stored attributes may be any of various supervised learning algorithms, such as a support vector machine (SVM), a decision tree, k-nearest neighbors (KNN), or a neural network.
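Of the supervised learning algorithms listed above, k-nearest neighbors is the simplest to illustrate. The following is a minimal sketch under stated assumptions, not the implementation described in the patent: the two-element feature vectors (width and height in metres), the sample values, and the function name `knn_classify` are all hypothetical.

```python
import math
from collections import Counter

def knn_classify(samples, query, k=3):
    """Label `query` by majority vote among its k nearest labelled samples.

    samples: list of (feature_vector, label) pairs (hypothetical training data).
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Assumed attributes: (width, height) of each stored object of interest.
training = [
    ((1.2, 0.7), "TV"), ((1.0, 0.6), "TV"), ((1.4, 0.8), "TV"),
    ((2.0, 0.9), "sofa"), ((2.2, 0.8), "sofa"), ((1.9, 1.0), "sofa"),
]
predicted = knn_classify(training, (1.1, 0.65))
```

A real system would presumably use richer shape features extracted from the 3D basic space; the sketch only shows how stored attributes can drive type recognition.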

In another embodiment, the spatial context converter 100 searches the interior of the 3D basic space for shapes similar to those of pre-stored 3D object-of-interest models and, when a similar shape is found, identifies the found shape as the corresponding 3D object-of-interest model. Whether a shape is similar is determined by whether it at least partially matches the attributes of the respective 3D object-of-interest model.
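The partial attribute match described above can be sketched as a count of agreeing attributes. This is only an illustration: the attribute names, the model table, the `min_matches` threshold, and the function `match_model` are assumptions, and the patent does not specify how the degree of match is scored.

```python
def match_model(candidate, models, min_matches=1):
    """Return the stored model whose attributes best match `candidate`.

    candidate: dict of attributes observed for a shape in the 3D basic space.
    models: dict mapping model name -> dict of stored attributes.
    Returns None when no model reaches `min_matches` agreeing attributes.
    """
    best_name, best_score = None, 0
    for name, attrs in models.items():
        # Count stored attributes that the candidate shape reproduces.
        score = sum(1 for key, value in attrs.items() if candidate.get(key) == value)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_matches else None

# Hypothetical stored models and one detected shape.
models = {
    "TV": {"shape": "flat box", "mount": "wall"},
    "sofa": {"shape": "long seat", "mount": "floor"},
}
detected = {"shape": "flat box", "mount": "wall", "color": "black"}
matched = match_model(detected, models)
```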

In this other embodiment, the spatial context converter 100 may further include a separate database (not shown). To recognize objects of interest that may be located in an indoor space, the database contains various 3D object models and the attribute information associated with each model. When the spatial context converter 100 further includes such a database, the stored information may be obtained from an external database or from user input. It will be apparent to those skilled in the art that the database may also include other data stores not explicitly mentioned here.

Once the objects of interest contained in the 3D basic space have been recognized (S133), the spatial context converter 100 indexes the recognized objects of interest to identify them (S137). In one embodiment, the spatial context converter 100 indexes the recognized objects of interest by type. Each index is an indicator that identifies an object located in the real space: for example, the first index denotes a TV, the second a dining table, the third a bed, and the fourth a window.

In one embodiment, step S130 may further include dividing the 3D basic space into segments, each containing a single object of interest indexed in step S137 (S134), and applying to each segment a spatial shape corresponding to its object of interest (S135). Here, as shown in FIG. 4(c), a spatial shape is a shape to which an object of interest can be reduced in its minimal form, such as a circle or a rectangle, and which represents the object simply and intuitively. In this case, step S137 performs the indexing on the spatial shapes, as in FIG. 4(c).

Dividing the 3D space into segments that each contain an object of interest makes it possible to analyze the target space part by part, on the basis of the objects of interest.

The spatial context converter 100 shapes each indexed object of interest into an object-of-interest node (S137). An object-of-interest node is a node associated with an object of interest. Each node contains attributes, such as the type of the object of interest (indicating which object it is), its size, and the sub-space in which it is typically located, together with information on the object's position. Here, the position of the object of interest is expressed as three-dimensional coordinates.
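The contents of an object-of-interest node can be pictured as a small record. The following sketch assumes Python dataclasses; the class name `ObjectNode`, the field names, the units, and the sample values are hypothetical and only mirror the attributes listed above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ObjectNode:
    """Hypothetical node for one indexed object of interest."""
    index: int                            # index assigned when the object was identified
    kind: str                             # type of the object, e.g. "TV"
    size: Tuple[float, float, float]      # assumed width, depth, height in metres
    subspace: str                         # sub-space where the object is usually found
    position: Tuple[float, float, float]  # 3D coordinates in the target space

tv = ObjectNode(1, "TV", (1.2, 0.1, 0.7), "living room", (0.0, 2.0, 1.0))
table = ObjectNode(2, "dining table", (1.6, 0.9, 0.75), "kitchen", (3.0, 1.0, 0.0))
```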

Assigning an individual index to each object type in this way allows the type of each object to be identified efficiently. Moreover, because users generally perceive the layout of an indoor space in terms of the types of objects in it, this indexing serves as a basis for providing spatial information that is familiar and intuitive to the user.

The spatial context converter 100 can then generate a first spatial context that includes the object-of-interest nodes and the connection lines linking them (S150). In other words, the target space, which in reality contains complex and possibly unnecessary information, is redefined as a spatial context that is simpler to analyze. Analysis can then proceed by traversing the nodes of the graph rather than examining all of the actual interior information, which is advantageous in terms of processing speed.
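A graph of this kind can be sketched with plain dictionaries. The example below is an illustration under assumptions: it links every pair of object nodes and stores the Euclidean distance on each connection line, whereas the patent does not state which pairs are linked or what a connection line stores. The function name `build_first_context` and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def build_first_context(object_nodes):
    """Build a graph from object positions.

    object_nodes: dict mapping object name -> (x, y, z) position.
    Returns a dict with the nodes and a distance for every pair of nodes.
    """
    edges = {}
    for a, b in combinations(object_nodes, 2):
        # frozenset makes the edge key independent of node order.
        edges[frozenset((a, b))] = math.dist(object_nodes[a], object_nodes[b])
    return {"nodes": dict(object_nodes), "edges": edges}

first_context = build_first_context({
    "TV": (0.0, 0.0, 0.0),
    "sofa": (3.0, 4.0, 0.0),
    "bed": (0.0, 0.0, 2.0),
})
```

Later queries then only need to look up nodes and edges in this structure instead of re-analyzing the raw image.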

The spatial processor 300 recognizes the user located in the actual target space and combines the user's location with the spatial context containing the object-of-interest nodes (S300). In one embodiment, the spatial processor 300 obtains the analysis of the actual target space through the spatial context (that is, the graph) generated by the spatial context converter 100, and reflects the user's location in it (S300).

In this embodiment, because the spatial processor 300 has already obtained the primary analysis of the target space (the first spatial context) from the spatial context converter 100, it only needs to perform the additional step of reflecting the user's location in it.

In one embodiment, the spatial processor 300 calculates the location of the user in the target space (S310).

In one embodiment, the spatial processor 300 acquires first-person image information corresponding to the user's first-person view and calculates the user's location in the target space from that information (S310). The spatial processor 300 can analyze the shapes of the objects contained in the first-person image information to determine where in the target space the user is currently located. In some embodiments, the spatial processor 300 may take the location of the second imager 331 as the user's location.

In another embodiment, the spatial processor 300 acquires third-person image information corresponding to a third-person view of the user and calculates the user's location in the target space from that information (S310). The spatial processor 300 captures the shape of the user in the third-person image and calculates the user's location from its relationship to the other objects and the background. The spatial processor 300 may express the user's location as the coordinates of part or all of the user's body. In this embodiment, the spatial processor 300 and the third imager 333 are configured to communicate with each other by wire or wirelessly.

The spatial processor 300 combines the determined user location with the first spatial context to generate a second spatial context (S330). Based on the determined location, the spatial processor 300 shapes the user in the target space into a user node and adds the user node to the first spatial context, producing the final spatial context used to generate answers to voice queries (S330).

In one embodiment, the spatial processor 300 sets the user node as the starting point and generates connection lines that link the nodes to one another without passing through any other node. Referring to FIG. 4(d), the spatial context consists of a user node, object-of-interest nodes, and the connection lines linking them. Whether an object or the user is located in the target space can therefore be determined by checking whether the corresponding node exists, and the positional relationships among them can be readily read off the connection lines. This allows an answer to a voice query to be generated more quickly and conveniently in step S500 below.
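One way to read "without passing through any other node" is geometric: a connection line is drawn only if no third node lies on the straight segment between its two end points. The sketch below applies that reading to the lines leaving the user node only. The tolerance `tol`, the function names, and the coordinates are assumptions; the patent does not define the test.

```python
import math

def blocks(p, q, m, tol=0.25):
    """Return True if point m lies within tol of the segment p-q,
    strictly between its end points (tol is an assumed threshold)."""
    d = [qi - pi for pi, qi in zip(p, q)]
    length_sq = sum(c * c for c in d)
    if length_sq == 0:
        return False
    # Parameter of the projection of m onto the line through p and q.
    t = sum((mi - pi) * c for pi, mi, c in zip(p, m, d)) / length_sq
    if t <= 0 or t >= 1:
        return False
    closest = [pi + t * c for pi, c in zip(p, d)]
    return math.dist(closest, m) <= tol

def build_second_context(object_nodes, user_position, user="user"):
    """Add a user node and link it to every object node that can be
    reached without passing through another object node."""
    links = [
        name for name, pos in object_nodes.items()
        if not any(blocks(user_position, pos, other_pos)
                   for other, other_pos in object_nodes.items()
                   if other != name)
    ]
    nodes = dict(object_nodes)
    nodes[user] = user_position
    return {"start": user, "nodes": nodes, "user_links": links}

context = build_second_context(
    {"TV": (2.0, 0.0, 0.0), "sofa": (4.0, 0.0, 0.0), "bed": (0.0, 3.0, 0.0)},
    user_position=(0.0, 0.0, 0.0),
)
```

In this example the sofa sits directly behind the TV as seen from the user, so only the TV and the bed are linked to the user node.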

In some other embodiments, the starting point may be set to a node other than the user node. In another embodiment, the starting point may be set based on compass direction. For example, the starting point may be set to the easternmost object-of-interest node. In that case, referring to FIG. 5 below, when the voice processor 500 receives the question "What is the easternmost object?", the information needed to generate the answer can be found more quickly.
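Choosing a direction-based starting point reduces to a single comparison over node coordinates. This sketch assumes that the +x axis of the coordinate system points east, which the patent does not state; `easternmost_start` and the coordinates are hypothetical.

```python
def easternmost_start(object_nodes):
    """Return the name of the node with the largest x coordinate.

    Assumption: the +x axis of the target-space coordinates points east.
    """
    return max(object_nodes, key=lambda name: object_nodes[name][0])

start = easternmost_start({
    "TV": (0.0, 2.0, 1.0),
    "sofa": (4.0, 1.0, 0.0),
    "bed": (2.0, 5.0, 0.0),
})
```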

FIG. 5 is a flowchart of a process of generating an answer corresponding to a voice query, according to an embodiment of the present invention.

When the voice processor 500 receives a voice query associated with the target space, it generates an answer corresponding to the voice query and provides it to the user (S500). The generated answer is characterized in that it is based on the second spatial context.

The voice processor 500 receives the voice query and analyzes its content. When the voice query contains a space-related expression (for example, the location of an object), the voice processor 500 determines that it is a voice query associated with the target space. This determination does not depend on whether the object is actually located in the target space.

If the analysis determines that the voice query is associated with the target space, the voice processor 500 requests from the spatial processor 300 the information needed to generate the corresponding answer (hereinafter, the required information) (S510).

In response to the request, the spatial processor 300 searches the second spatial context for the required information (S530). The spatial processor 300 can traverse the nodes of the graph along the connection lines to analyze the positional relationships among the objects and users located in the space.

In one example, to find the object closest to the user, the object nodes in the one-ring layer are searched first, and the node located closest to the user node is selected from among them.
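The one-ring search can be sketched as a minimum over the user node's direct neighbours. The adjacency dictionary, the coordinates, and the function name `nearest_in_one_ring` are assumptions for illustration; here the one-ring layer is taken to mean the nodes joined to the user node by a single connection line.

```python
import math

def nearest_in_one_ring(positions, adjacency, user="user"):
    """Return the neighbour of the user node that is closest to the user.

    positions: dict mapping node name -> (x, y, z).
    adjacency: dict mapping node name -> list of directly linked nodes.
    """
    ring = adjacency[user]  # only direct neighbours are examined
    return min(ring, key=lambda name: math.dist(positions[user], positions[name]))

positions = {
    "user": (0.0, 0.0, 0.0),
    "TV": (2.0, 0.0, 0.0),
    "bed": (0.0, 3.0, 0.0),
    "sofa": (4.0, 0.0, 0.0),
}
adjacency = {"user": ["TV", "bed"], "TV": ["user", "sofa"]}
closest = nearest_in_one_ring(positions, adjacency)
```

Restricting the search to the ring avoids computing a distance for every node in the graph.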

Because the nodes are connected so that no connection line passes through another node, the relative position of one object of interest with respect to another can also be readily analyzed by following the connection lines.

The spatial processor 300 searches for the required information, extracts from the second spatial context the information needed to produce it, generates return information based on the extracted information, and returns it to the voice processor 500 (S550). The voice processor 500 uses the return information to generate an answer and provides the generated answer to the user (S570).

The return information generated by the spatial processor 300 may vary depending on the content of the voice query.

In one embodiment, when the content of the voice query relates to two or more objects of interest, the return information includes information on the relative positional relationship between the queried objects of interest.

For example, when the voice query “How far is it from the TV to the sofa?” is input, the voice processor 500 analyzes the content of the query, determines that the positions of the “TV” and the “sofa” and/or the distance between them are needed, and requests the required information from the spatial processor 300.

The spatial processor 300 first checks whether the “TV” and the “sofa” exist in the target space and, if they do, calculates the distance between the two objects from their positions. The spatial processor 300 calculates this distance from the position information contained in the corresponding object nodes. The result is returned to the voice processor 500, which uses it to provide a voice answer to the user. For example, the answer “The distance between the TV and the sofa is about 1 m.” may be provided.
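The existence check followed by the distance calculation can be sketched in a few lines. The function name `distance_answer`, the coordinates, and the answer template are assumptions; in the patent the sentence itself is produced by the voice processor 500 from the returned information.

```python
import math

def distance_answer(nodes, first, second):
    """Return an answer sentence, or None if either object is absent.

    nodes: dict mapping object name -> (x, y, z) position.
    """
    if first not in nodes or second not in nodes:
        return None  # the object is not in the target space
    distance = math.dist(nodes[first], nodes[second])
    return f"The distance between the {first} and the {second} is about {round(distance)} m."

nodes = {"TV": (0.0, 0.0, 0.0), "sofa": (1.0, 0.0, 0.0)}
answer = distance_answer(nodes, "TV", "sofa")
```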

In another embodiment, when the content of the voice query relates to only one object of interest, the return information is calculated as information on that object's positional relationship relative to one of the remaining nodes.

For example, when the voice query “Where did I put the remote control?” is input, the voice processor 500 analyzes the content of the query and finds that only one object, the “remote control”, is involved. The voice processor 500 determines that the information required for the answer is relative positional information that expresses the location of the “remote control”, and requests it from the spatial processor 300.

The spatial processor 300 first checks whether the “remote control” exists in the target space. Because only one object of interest is involved in the voice query, it also determines the other object of interest needed to generate the answer. Here, the other object of interest refers to the node, other than the queried object of interest, that serves as the reference for the relative positional relationship. This reference node may be a node associated with at least one of the user, the nearest object, and the largest object.

If the object closest to the remote control is determined to be the dining table, the location of the remote control is expressed as a positional relationship relative to the dining table, and this expression is returned to the voice processor 500. As a result, the voice processor 500 can generate and provide the answer “The remote control is on the dining table.”
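The single-object case can be sketched by taking the nearest other node as the reference, which is one of the options named above. The rule that decides between "on" and "near" (the target is higher than the reference and within 0.5 m horizontally) is purely an assumption for illustration, as are the function name and coordinates.

```python
import math

def describe_location(nodes, target):
    """Express the target's location relative to its nearest other node.

    nodes: dict mapping object name -> (x, y, z) position.
    Returns None if the target is absent or no reference node exists.
    """
    if target not in nodes:
        return None
    others = {name: pos for name, pos in nodes.items() if name != target}
    if not others:
        return None
    reference = min(others, key=lambda name: math.dist(nodes[target], others[name]))
    tx, ty, tz = nodes[target]
    rx, ry, rz = nodes[reference]
    # Assumed rule: "on" when the target is above the reference and horizontally close.
    horizontally_close = math.hypot(tx - rx, ty - ry) < 0.5
    relation = "on" if tz > rz and horizontally_close else "near"
    return f"The {target} is {relation} the {reference}."

nodes = {
    "remote control": (3.0, 1.0, 0.8),
    "dining table": (3.0, 1.0, 0.0),
    "TV": (0.0, 2.0, 1.0),
}
answer = describe_location(nodes, "remote control")
```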

In another embodiment, when the content of the voice query relates to a sub-space of the target space, the answer is generated using the location of that sub-space. The location of the sub-space may be determined by searching for the objects of interest associated with the sub-space and using the objects found. Here, a sub-space is a partial space that is spatially subordinate to the target space. For example, when the target space is an entire residence, the sub-spaces include the bathroom, the kitchen, the living room, the bedrooms, and so on.

For example, when the voice query “Guide me to the bathroom” is input, the voice processor 500 analyzes the content of the query (S510), determines that the location of the “bathroom” and the user's location are needed, and requests them from the spatial processor 300 (S530).

The spatial processor 300 searches the attribute information of each node in the spatial context and extracts all nodes associated with the “bathroom”. It then determines the approximate location of the “bathroom” from the position information of the extracted nodes. In this example, if a “washbasin”, a “toilet”, and a “bathtub” are found, the location of the “bathroom” is determined from the positions of the “washbasin”, the “toilet”, and the “bathtub”. In one embodiment, the location of the “bathroom” is taken to be the coordinates at the center of the “washbasin”, the “toilet”, and the “bathtub”.
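Taking the center of the associated nodes can be sketched as an average of their coordinates. The node layout (a list of dictionaries with `subspace` and `position` keys), the function name, and the sample positions are assumptions, and the arithmetic mean is only one way to read "the coordinates at the center".

```python
def subspace_location(nodes, subspace):
    """Return the mean position of all nodes tagged with `subspace`.

    nodes: list of dicts with "name", "subspace" and "position" keys.
    Returns None when no node belongs to the sub-space.
    """
    points = [node["position"] for node in nodes if node["subspace"] == subspace]
    if not points:
        return None
    # Average each coordinate axis separately.
    return tuple(sum(axis) / len(points) for axis in zip(*points))

nodes = [
    {"name": "washbasin", "subspace": "bathroom", "position": (4.0, 0.0, 0.0)},
    {"name": "toilet", "subspace": "bathroom", "position": (6.0, 0.0, 0.0)},
    {"name": "bathtub", "subspace": "bathroom", "position": (5.0, 3.0, 0.0)},
    {"name": "TV", "subspace": "living room", "position": (0.0, 2.0, 1.0)},
]
bathroom = subspace_location(nodes, "bathroom")
```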

The spatial processor 300 calculates the distance between the “bathroom” and the user and returns it to the voice processor 500 (S550), and the voice processor 500 can use the returned information to generate the answer “Turn right and walk about 3 meters.” (S570).
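An answer of this kind needs a turning direction as well as a distance. The sketch below assumes that the user's heading is known as an angle in degrees, measured counter-clockwise from the +x axis in a 2D floor plan. The patent does not say how the heading is obtained, and the 30 degree threshold for "Go straight", the function name, and the sentence template are likewise assumptions.

```python
import math

def guidance(user_xy, heading_deg, target_xy):
    """Return a turn-and-walk instruction from the user to the target.

    heading_deg: assumed user heading, counter-clockwise from the +x axis.
    """
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed turn angle in [-180, 180): positive is left, negative is right.
    turn = (bearing - heading_deg + 180) % 360 - 180
    if abs(turn) < 30:
        direction = "Go straight"
    elif turn > 0:
        direction = "Turn left"
    else:
        direction = "Turn right"
    return f"{direction} and walk about {round(distance)} meters."

instruction = guidance(user_xy=(0.0, 0.0), heading_deg=90.0, target_xy=(3.0, 0.0))
```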

In this way, the intelligent voice information providing system 1000 and method abstract the target space in which the user is actually located into a spatial context (that is, a graph) of nodes and connection lines, enabling faster spatial analysis and information processing. Furthermore, by expressing the answers given to the user in terms of relative positional relationships, they can provide voice answers that are more intuitive and familiar to the user.

The steps shown in FIGS. 2, 3, and 5 are merely exemplary: the steps may be performed in a different order, additional steps may be included, or some steps may be omitted.

The operations of the intelligent voice information providing system and method according to the embodiments described above may be implemented, at least in part, as a computer program.

The computer may be a computing device such as a desktop computer, a laptop computer, a notebook, a smartphone, or the like, or any device into which such a computing device may be integrated. A computer is a device having one or more general-purpose or special-purpose processors, memory, storage, and networking components (either wireless or wired). The computer may run an operating system such as one compatible with Microsoft Windows, Apple OS X or iOS, a Linux distribution, or Google's Android OS.

In addition, a computer program for implementing the present embodiments takes the form of functional programs, instructions, and code that can be executed on the computer described above. The operations described above, and the units, modules, and the like associated with them, may be implemented in software, firmware, hardware, or any combination thereof. For example, a software module may be implemented as a computer program product consisting of a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the steps, operations, or processes described. The functional programs, instructions, and code for implementing the present embodiments will be readily understood by those skilled in the art to which the embodiments pertain.

In addition, the operations of the intelligent voice information providing system and method may be recorded on a computer-readable recording medium. A computer-readable recording medium on which a program implementing the operations of the intelligent voice information providing system and method according to the embodiments is recorded includes all kinds of recording devices that store data readable by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Although the present invention has been described above with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and variations of the embodiments are possible. Such modifications should nevertheless be regarded as falling within the technical protection scope of the present invention. Therefore, the true technical protection scope of the present invention should be determined by the technical spirit of the appended claims.

Claims

An intelligent voice information providing system comprising:
a spatial context converter configured to generate a first spatial context in which objects of interest included in a target space are converted into nodes;
a spatial processor configured to calculate a location of a user located in the target space and to generate a second spatial context by combining the location of the user with the first spatial context; and
a voice processor configured to generate, when a voice query associated with the target space is received, an answer corresponding to the voice query based on the second spatial context,
wherein the spatial processor is configured to shape the user into a user node based on the location of the user and to include the user node in the first spatial context.
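
As an illustration only, the relationship between the first and second spatial contexts in the claim above can be sketched as a small graph structure. The names `Node`, `SpatialContext`, and `with_user`, and the sample coordinates, are hypothetical and do not appear in the patent; this is a minimal sketch assuming nodes carry a 3D position and connection lines are undirected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    position: tuple          # (x, y, z) in the target space; units are assumed
    kind: str = "object"     # "object" for objects of interest, "user" for the user


@dataclass
class SpatialContext:
    nodes: dict = field(default_factory=dict)   # name -> Node
    edges: set = field(default_factory=set)     # undirected connection lines

    def add_node(self, node):
        self.nodes[node.name] = node

    def connect(self, a, b):
        self.edges.add(frozenset((a, b)))


def with_user(first, user_position):
    """Return a second spatial context: a copy of the first plus a user node."""
    second = SpatialContext(dict(first.nodes), set(first.edges))
    second.add_node(Node("user", user_position, kind="user"))
    return second


# First spatial context: objects of interest only.
first = SpatialContext()
first.add_node(Node("sofa", (1.0, 0.0, 0.0)))
first.add_node(Node("tv", (1.0, 3.0, 0.0)))
first.connect("sofa", "tv")

# Second spatial context: the same graph with the user included as a node.
second = with_user(first, (0.0, 0.0, 0.0))
```

Copying the first context rather than mutating it is a design choice of this sketch: it lets the object-only graph be reused each time the user's location is recalculated.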

The system of claim 1,
wherein the spatial context converter is further configured to acquire raw image information of the target space, analyze the raw image information to shape each object of interest included in the target space into an object-of-interest node, and generate a first spatial context including connection lines that connect the object-of-interest nodes to one another.

The system of claim 2,
wherein the spatial context converter is further configured to reconstruct the target space as a three-dimensional basic space based on the raw image information, recognize and index the objects of interest located within the three-dimensional basic space, and shape each indexed object into an object-of-interest node.

The system of claim 3,
wherein the spatial context converter is further configured to divide the three-dimensional basic space into segments each including a single object of interest, and to apply to each segment a spatial shape corresponding to its object of interest.

The system of claim 1,
wherein the spatial processor is further configured to generate a second spatial context including connection lines that take the user node as a starting point and connect to each node without passing through another node.
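
One possible reading of "without passing through another node" is sketched below: a line from the user to an object is kept only if no other object lies within a small radius of that line. The 2D coordinates, the `radius` threshold, and the function names are assumptions made for illustration; the claim itself does not specify a geometric test.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def user_connection_lines(user_pos, objects, radius=0.3):
    """Connect the user node to every object whose line of approach is clear."""
    lines = []
    for name, pos in objects.items():
        blocked = any(
            point_segment_distance(other_pos, user_pos, pos) < radius
            for other, other_pos in objects.items()
            if other != name
        )
        if not blocked:
            lines.append(("user", name))
    return lines


objects = {"table": (2.0, 0.0), "lamp": (4.0, 0.0), "door": (0.0, 3.0)}
lines = user_connection_lines((0.0, 0.0), objects)
# The lamp sits directly behind the table as seen from the user,
# so only the table and the door receive a direct connection line.
```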

The system of claim 1,
wherein the spatial processor is further configured to acquire a first-person view image corresponding to a first-person view of the user, and to calculate the location of the user based on a shape of an object of interest included in the first-person view image.

The system of claim 1,
wherein the spatial processor is further configured to acquire a third-person view image corresponding to a third-person view of the user, and to calculate the location of the user based on a shape of the user included in the third-person view image.

The system of claim 1,
wherein, when only a first object of interest is included in the content of the voice query, the answer is expressed as a relative positional relationship with a second object of interest different from the first object of interest.

The system of claim 8,
wherein the second object of interest is determined based on its distance to the first object of interest.

The system of claim 9,
wherein the second object of interest is further determined based on attributes of the objects of interest located in the target space.
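
The selection of a second object of interest by distance and attribute, as described in the three claims above, could look roughly like the following. This is a hedged sketch: the attribute label `"fixed"`, the dictionary layout, and the answer wording are hypothetical, since the claims do not say which attributes are used or how the answer is phrased.

```python
import math


def pick_reference(first, objects, require_attr=None):
    """Pick the object nearest to `first`, optionally restricted by an attribute."""
    candidates = [
        (math.dist(objects[first]["pos"], info["pos"]), name)
        for name, info in objects.items()
        if name != first
        and (require_attr is None or require_attr in info["attrs"])
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def relative_answer(first, objects, require_attr=None):
    """Describe where `first` is relative to the chosen reference object."""
    ref = pick_reference(first, objects, require_attr)
    if ref is None:
        return f"The {first} has no suitable reference object nearby."
    d = math.dist(objects[first]["pos"], objects[ref]["pos"])
    return f"The {first} is about {d:.1f} m from the {ref}."


objects = {
    "remote": {"pos": (1.0, 1.0), "attrs": {"small"}},
    "cup": {"pos": (1.2, 1.0), "attrs": {"small"}},
    "table": {"pos": (1.0, 2.0), "attrs": {"fixed"}},
}

nearest = pick_reference("remote", objects)                 # distance only
landmark = pick_reference("remote", objects, "fixed")       # distance plus attribute
reply = relative_answer("remote", objects, "fixed")
```

Filtering by an attribute before taking the nearest candidate reflects the intuition that a large or stationary object usually makes a more useful landmark than a nearby small one.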

The system of claim 1,
wherein, when a subspace of the target space is related to the content of the voice query, the answer is generated using a location of the subspace, and
the location of the subspace is determined by searching for objects of interest associated with the subspace and using the found objects of interest.

The system of claim 11,
wherein the location of the subspace is the center coordinate of the locations of the found objects of interest.
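
A minimal sketch of the subspace location described in the two claims above, assuming that "center coordinate" means the arithmetic mean of the found objects' positions (a bounding-box center would be another plausible reading). The `subspace` tag on each object and the sample data are hypothetical.

```python
def subspace_location(subspace, objects):
    """Return the mean position of the objects associated with `subspace`."""
    found = [info["pos"] for info in objects.values() if info["subspace"] == subspace]
    if not found:
        return None
    dims = len(found[0])
    return tuple(sum(pos[i] for pos in found) / len(found) for i in range(dims))


objects = {
    "sink": {"pos": (0.0, 0.0), "subspace": "kitchen"},
    "stove": {"pos": (2.0, 0.0), "subspace": "kitchen"},
    "fridge": {"pos": (1.0, 3.0), "subspace": "kitchen"},
    "sofa": {"pos": (8.0, 5.0), "subspace": "living room"},
}

kitchen = subspace_location("kitchen", objects)
```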

An intelligent voice information providing method performed by an intelligent voice information providing system, the method comprising:
acquiring a first spatial context in which objects of interest included in a target space are shaped into nodes;
calculating a location of a user located in the target space;
generating a second spatial context by combining the location of the user with the first spatial context; and
generating, when a voice query associated with the target space is received, an answer corresponding to the voice query based on the second spatial context,
wherein generating the second spatial context comprises:
shaping the user into a user node based on the location of the user; and
including the user node in the first spatial context.

The method of claim 13,
wherein acquiring the first spatial context comprises:
acquiring raw image information of the target space;
analyzing the raw image information and shaping each object of interest located in the target space into an object-of-interest node; and
generating a first spatial context including connection lines that connect the object-of-interest nodes to one another.

The method of claim 14,
wherein analyzing the raw image information and shaping each object of interest located in the target space into an object-of-interest node comprises:
reconstructing the target space as a three-dimensional basic space based on the raw image information;
recognizing the objects of interest included in the three-dimensional basic space;
indexing the recognized objects of interest; and
shaping each indexed object into an object-of-interest node.

The method of claim 15,
wherein analyzing the raw image information and shaping each object of interest located in the target space into an object-of-interest node further comprises:
dividing the three-dimensional basic space into segments each containing a single object of interest; and
applying to each segment a spatial shape corresponding to its single object of interest.

(Deleted)

The method of claim 13,
wherein generating the second spatial context further comprises:
setting the user node as a starting point; and
generating connection lines that connect to each node without passing through another node.

The method of claim 13,
wherein generating the answer corresponding to the voice query based on the second spatial context comprises:
analyzing the content of the voice query and requesting necessary information;
searching for a node included in the second spatial context in response to the request;
generating and returning the necessary information based on the search result; and
generating the answer using the returned necessary information.
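
The four steps of the claim above (analyze the query, search for a node, return the necessary information, generate the answer) can be illustrated with a toy pipeline. Everything here is an assumption for illustration: the query is taken as already-transcribed text, the analysis is simple keyword matching, the "necessary information" is just the distance from the user node, and the function names are hypothetical.

```python
import math


def analyze_query(query, known_names):
    """Step 1: find which known object the query asks about (keyword match)."""
    words = query.lower().replace("?", " ").split()
    for name in known_names:
        if name in words:
            return {"target": name}
    return None


def search_node(context, request):
    """Step 2: look up the requested node in the second spatial context."""
    return context.get(request["target"])


def build_info(context, target_pos):
    """Step 3: derive the necessary information from the search result."""
    return {"distance": math.dist(context["user"], target_pos)}


def answer(query, context):
    """Step 4: turn the returned information into a spoken-style answer."""
    names = [n for n in context if n != "user"]
    request = analyze_query(query, names)
    if request is None:
        return "I could not find that object in this space."
    info = build_info(context, search_node(context, request))
    return f"The {request['target']} is about {info['distance']:.1f} m from you."


# Second spatial context reduced to node positions, including the user node.
context = {"user": (0.0, 0.0), "sofa": (3.0, 4.0), "tv": (1.0, 0.0)}
reply = answer("Where is the sofa?", context)
unknown = answer("Where is the piano?", context)
```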

A computer-readable recording medium storing instructions which, when executed by a processor, cause the processor to perform the intelligent voice information providing method according to any one of claims 13 to 16, 18, and 19.