KR101742779B1

KR101742779B1 - System for making dynamic digital image by voice recognition

Info

Publication number: KR101742779B1
Application number: KR1020150066329A
Authority: KR
Inventors: 이석희; 김건우
Original assignee: 이석희; 김건우
Priority date: 2015-05-12
Filing date: 2015-05-12
Publication date: 2017-06-01
Anticipated expiration: 2035-05-12
Also published as: KR20160133335A

Abstract

The present invention provides a voice-recognition-type three-dimensional digital image implementation system. In this system, voice-based text information generated automatically through voice recognition is displayed inside an image frame, so that image frames such as photographs and videos can be rendered in a lively, three-dimensional manner. Various editing functions are provided, which improves user convenience and enables dynamic image frames. The recognized voice information can be checked in real time, so that voice recognition and generation of the voice-based text information can be re-executed when invalid voice information is recognized or a recognition error occurs. Either the photographer's voice or the subject's voice can be selectively recognized as needed, which improves the functionality of voice recognition. In addition to voice-based text information, preset built-in text information or direct-input text information entered by the user can also be displayed inside the image frame, which widens the user's choices and diversifies how text information is displayed.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention comprises: a voice recognition module (10a) that recognizes external voice; an information conversion module (20) that converts the voice information recognized by the voice recognition module (10a) into text information to generate voice-based text information; a text information notation setting module (30a) that sets characteristic information and editing information for the text information displayed inside an image frame; an imaging module (40) that photographs a subject to generate an image frame; a voice recognition module activation management module (10b) that lets the user select whether the voice recognition module (10a) is activated; a text information display management module (30b) that, when the voice recognition module (10a) is activated, causes the voice-based text information to be displayed in the image frame generated by the imaging module (40) in accordance with the characteristic information and editing information set by the text information notation setting module (30a); and an edited image frame management module (30c) that stores and manages edited image frames, that is, image frames in which text information has been displayed.

Description

[0001] Voice-recognition-type three-dimensional digital image implementation system

The present invention relates to a voice-recognition-type three-dimensional digital image implementation system. More specifically, it relates to a system in which voice-based text information generated automatically through voice recognition is displayed inside an image frame, so that image frames such as photographs and videos can be rendered in a lively, three-dimensional manner; in which various editing functions are provided, improving user convenience and enabling dynamic image frames; in which the recognized voice information is checked in real time so that voice recognition and generation of the voice-based text information can be re-executed when invalid voice information is recognized or a recognition error occurs; in which either the photographer's voice or the subject's voice can be selectively recognized as needed, improving the functionality of voice recognition; and in which, in addition to voice-based text information, preset built-in text information or direct-input text information entered by the user can also be displayed inside the image frame, widening the user's choices and diversifying how text information is displayed.

With the recent rapid development of optical technology and the electronics industry, new kinds of digital devices are appearing in large numbers, and conventional cameras and mobile phones are also evolving into new concepts. For example, with a conventional camera, an image captured using light is recorded, and the captured image can be viewed only after it has been developed and printed.

In contrast, a digital camera (or DSLR) does not require a complicated developing and printing process after a picture is taken. It stores the image in a digital storage medium built into the camera and outputs it through a monitor or printer, so the captured image can be checked easily. Such a digital camera can take over the roles of a conventional camera and a scanner, and its high compatibility with PC image data makes editing and correction simple. However, once time has passed, the specific circumstances at the time of shooting, such as the shooting location, the feeling at the time, and the companions, cannot easily be recalled from the captured image alone.

Meanwhile, mobile phones are evolving in various ways, from feature phones to smartphones, and can capture videos or still images and transmit the captured images. However, videos or images captured with a mobile phone have the same problem as those from a digital camera once time has passed.

To address this, techniques for adding information such as text to an image are being developed. Related art includes Korean Registered Patent No. 10-1053045, "Information Input System for Video Content," and Korean Registered Patent No. 10-1115701, "Method and Apparatus for Annotating Video Content with Metadata Generated Using Speech Recognition Technology."

These conventional techniques propose recognizing a voice through voice recognition, converting the recognized voice into text, and adding it to a digital photograph. However, because they simply convert the recognized voice into text and add it to the photograph, they do not provide various editing functions. In addition, when several people are speaking at once, voice recognition errors occur and the techniques fail to work properly.


Korean Registered Patent No. 10-1053045, "Information Input System for Video Content"
Korean Registered Patent No. 10-1115701, "Method and Apparatus for Annotating Video Content with Metadata Generated Using Speech Recognition Technology"

Accordingly, an object of the present invention is to remedy these problems of the prior art and to provide a new type of voice-recognition-type three-dimensional digital image implementation system in which voice-based text information generated automatically through voice recognition is displayed inside an image frame, so that image frames such as photographs and videos can be rendered in a lively, three-dimensional manner.

Another object of the present invention is to provide a new type of voice-recognition-type three-dimensional digital image implementation system in which various editing functions are provided, improving user convenience and enabling dynamic image frames.

A further object of the present invention is to provide a new type of voice-recognition-type three-dimensional digital image implementation system in which the recognized voice information is checked in real time, so that voice recognition and generation of the voice-based text information can be re-executed when invalid voice information is recognized or a recognition error occurs, thereby preventing incorrect voice-based text information from being displayed in the image frame.

A further object of the present invention is to provide a new type of voice-recognition-type three-dimensional digital image implementation system in which either the photographer's voice or the subject's voice can be selectively recognized as needed, improving the functionality of voice recognition.

A further object of the present invention is to provide a new type of voice-recognition-type three-dimensional digital image implementation system in which, in addition to voice-based text information, preset built-in text information or direct-input text information entered by the user can also be displayed inside the image frame, widening the user's choices and diversifying how text information is displayed.

According to a feature of the present invention for achieving the above objects, the present invention provides a voice-recognition-type three-dimensional digital image implementation system characterized by comprising: a voice recognition module (10a) that recognizes external voice; an information conversion module (20) that converts the voice information recognized by the voice recognition module (10a) into text information to generate voice-based text information; a text information notation setting module (30a) that sets characteristic information and editing information for the text information displayed inside an image frame; an imaging module (40) that photographs a subject to generate an image frame; a voice recognition module activation management module (10b) that lets the user select whether the voice recognition module (10a) is activated; a text information display management module (30b) that, when the voice recognition module (10a) is activated, causes the voice-based text information to be displayed in the image frame generated by the imaging module (40) in accordance with the characteristic information and editing information set by the text information notation setting module (30a); and an edited image frame management module (30c) that stores and manages edited image frames, that is, image frames in which text information has been displayed.

In the voice-recognition-type three-dimensional digital image implementation system according to the present invention, the text information notation setting module (30a) may include: a font size setting unit (31) in which the font size of the text information is set; a typeface setting unit (32) in which the typeface of the text information is set; a text color setting unit (33) in which the color of the text information is set; a notation language setting unit (34) in which the language used to display the text information is set; and a text notation position setting unit (35) in which the position of the text information within the image frame is set.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention may include a display module (50) that outputs, in real time, the voice-based text information generated by the voice recognition module (10a) and the information conversion module (20), so that the recognized voice information can be checked in real time.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention may include a voice-based text information deletion module (60) that deletes voice-based text information which the user, judging the validity of the text output on the display module (50), has determined to be invalid. By judging whether the recognized voice information is valid, the system can re-execute voice recognition and generation of the voice-based text information when invalid voice information has been recognized.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention may include: a built-in text information setting module (70) in which built-in text information is set and stored; and a built-in text information management module (71) that lets the user select whether to use built-in text information and, when its use is selected, outputs the plurality of currently set built-in text items to the display module (50) so that the user can select one. When the user selects use of built-in text information, the voice recognition module (10a) may be deactivated.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention may include: a direct-input text information input module (80) in which text entered directly by the user is stored; and a direct-input text information management module (81) that lets the user select whether to use direct-input text information and, when its use is selected, causes the direct-input text information to be displayed inside the image frame. When the user selects use of direct-input text information, the voice recognition module (10a) may be deactivated.

The voice-recognition-type three-dimensional digital image implementation system according to the present invention may include a voice recognition mode selection module (90) that lets the user select one of two voice recognition modes: a photographer voice recognition mode, which recognizes the voice of a photographer located inside a set area measured from the position of the imaging module (40), and a subject voice recognition mode, which recognizes the voice of a subject located outside that set area. The voice recognition module (10a) then performs either photographer voice recognition or subject voice recognition according to the selected mode.
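
The source does not say how the distance between a speaker and the imaging module (40) is measured. As a rough illustration only, the sketch below assumes each detected voice already carries an estimated distance and filters voices by the selected mode. `VoiceSource`, `select_sources`, and the 1.0 m default radius for the set area are hypothetical names and values, not part of the patent.

```python
from dataclasses import dataclass
from enum import Enum


class RecognitionMode(Enum):
    PHOTOGRAPHER = "photographer"  # voices inside the set area
    SUBJECT = "subject"            # voices outside the set area


@dataclass(frozen=True)
class VoiceSource:
    label: str         # hypothetical identifier for a detected speaker
    distance_m: float  # assumed: estimated distance from the imaging module


def select_sources(sources, mode, setting_radius_m=1.0):
    """Keep only the voice sources that match the selected recognition mode."""
    if mode is RecognitionMode.PHOTOGRAPHER:
        return [s for s in sources if s.distance_m <= setting_radius_m]
    return [s for s in sources if s.distance_m > setting_radius_m]


sources = [VoiceSource("photographer", 0.3), VoiceSource("subject", 2.5)]
near = select_sources(sources, RecognitionMode.PHOTOGRAPHER)
far = select_sources(sources, RecognitionMode.SUBJECT)
print([s.label for s in near])  # ['photographer']
print([s.label for s in far])   # ['subject']
```

Only the filtering step is shown; in a real device the distance estimate would have to come from hardware or signal processing that the source leaves unspecified.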

In the voice-recognition-type three-dimensional digital image implementation system according to the present invention, the voice recognition module (10a), the voice recognition module activation management module (10b), the information conversion module (20), the text information notation setting module (30a), the imaging module (40), the text information display management module (30b), and the edited image frame management module (30c) may be provided in a digital device (1), which includes a mobile phone (2), a smart device (5) including a smartphone (3), and a digital camera (6).

With the voice-recognition-type three-dimensional digital image implementation system according to the present invention, image frames such as photographs and videos are rendered in a lively, three-dimensional manner, and the various editing functions provided improve user convenience and enable dynamic image frames. In addition, because the recognized voice information is checked in real time, voice recognition and generation of the voice-based text information can be re-executed when invalid voice information is recognized or a recognition error occurs, which prevents incorrect voice-based text information from being displayed in the image frame.

Furthermore, with the voice-recognition-type three-dimensional digital image implementation system according to the present invention, the photographer's voice or the subject's voice is selectively recognized as needed, which improves the functionality of voice recognition. In addition to voice-based text information, preset built-in text information or direct-input text information entered by the user can also be displayed inside the image frame, which widens the user's choices and diversifies how text information is displayed.

FIG. 1 is an exemplary view of digital devices to which the voice-recognition-type three-dimensional digital image implementation system according to an embodiment of the present invention is applied;
FIG. 2 is a block diagram of the basic configuration of the voice-recognition-type three-dimensional digital image implementation system according to an embodiment of the present invention;
FIG. 3 is a block diagram of the text information notation setting module according to an embodiment of the present invention;
FIG. 4 is a block diagram showing the voice information validity determination configuration of the voice-recognition-type three-dimensional digital image implementation system according to an embodiment of the present invention;
FIG. 5 is a block diagram of the extended configuration of the voice-recognition-type three-dimensional digital image implementation system according to an embodiment of the present invention;
FIG. 6 is a block diagram showing the voice information mode selection and execution configuration of the voice-recognition-type three-dimensional digital image implementation system according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying FIGS. 1 to 6. In the drawings and the detailed description, illustration and mention of configurations and operations that those working in this field can readily understand from general voice recognition, speech-to-text conversion, image capture, text display on image frames, image editing, and similar techniques are abbreviated or omitted. In particular, detailed description and illustration of the specific technical configuration and operation of elements not directly related to the technical features of the present invention are omitted, and only the technical configuration related to the present invention is briefly illustrated or described.

As shown in FIG. 1, the voice-recognition-type three-dimensional digital image implementation system (100) according to the present invention is applied to a digital device (1), which includes a mobile phone (2), a smart device (5) such as a smartphone (3) or a smart pad (4), and a digital camera (6). It causes voice-based text information generated automatically through voice recognition to be displayed inside an image frame, so that image frames such as photographs and videos can be rendered in a lively, three-dimensional manner.

To this end, as shown in FIG. 2, the voice-recognition-type three-dimensional digital image implementation system (100) according to an embodiment of the present invention comprises a voice recognition module (10a), an information conversion module (20), a text information notation setting module (30a), an imaging module (40), a voice recognition module activation management module (10b), a text information display management module (30b), and an edited image frame management module (30c).

The voice-recognition-type three-dimensional digital image implementation system (100) according to the present invention is realized by providing the voice recognition module (10a), the voice recognition module activation management module (10b), the information conversion module (20), the text information notation setting module (30a), the imaging module (40), the text information display management module (30b), and the edited image frame management module (30c) in the various digital devices (1) described above.

The voice recognition module (10a) recognizes external voice, that is, the various cheers, sounds, and utterances called out by the photographer or by the people being photographed. Utterances that may be spoken while shooting, such as "Honey, I love you," "Fighting, everyone," "We're in France," "Stay healthy, Mom and Dad," "Such-and-such birthday in such-and-such year," "11th wedding anniversary, with my wife," "Our youngest's first birthday party," "Kimchi," "Cheese," "Smile," "Hang in there, my friend," "At Haeundae," "High school classmates' reunion," "OK, taking the picture," and "I love you," can be recognized through the voice recognition module (10a).

The information conversion module (20) converts the voice information recognized by the voice recognition module (10a) into text information to generate voice-based text information.

The text information notation setting module (30a) sets characteristic information and editing information for the text information displayed inside an image frame. To this end, as shown in FIG. 3, the text information notation setting module (30a) may include: a font size setting unit (31) in which the font size of the text information is set (sizes such as 4, 6, 8, 10, 12, 15, and 30); a typeface setting unit (32) in which the typeface of the text information is set (Myeongjo, Gothic, Gungseo, Gulim, Dotum, Batang, and any other applicable typeface); a text color setting unit (33) in which the color of the text information is set (any color the user wants); a notation language setting unit (34) in which the language used to display the text information is set (any language, such as Korean, English, Chinese, Japanese, French, or German); and a text notation position setting unit (35) in which the position of the text information within the image frame is set (any position on the photograph, such as upper right, lower right, upper left, lower left, or center).
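
One way to picture the five setting units is as fields of a single settings record. The following is a minimal sketch under that assumption; the class name, field names, default values, and the five named positions are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass

# Assumed set of named positions, based on the examples in the text.
POSITIONS = ("top_left", "top_right", "bottom_left", "bottom_right", "center")


@dataclass
class TextNotationSettings:
    font_size: int = 12              # font size setting unit (31)
    typeface: str = "Gothic"         # typeface setting unit (32)
    color: str = "#FFFFFF"           # text color setting unit (33)
    language: str = "ko"             # notation language setting unit (34)
    position: str = "bottom_right"   # text notation position setting unit (35)

    def __post_init__(self):
        # Reject values that could not be rendered sensibly.
        if self.font_size <= 0:
            raise ValueError("font_size must be positive")
        if self.position not in POSITIONS:
            raise ValueError(f"unknown position: {self.position}")


settings = TextNotationSettings(font_size=15, position="top_left")
print(settings.typeface, settings.font_size, settings.position)
# Gothic 15 top_left
```

Keeping the settings in one record means the display step can receive them as a single argument, which mirrors how module (30b) is described as following whatever module (30a) has set.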

The imaging module (40) photographs a subject to generate an image frame. The imaging module (40) may capture a still photograph and generate it as an image frame, or capture a video and generate it as image frames.

The voice recognition module activation management module (10b) lets the user select whether the voice recognition module (10a) is activated. When the voice recognition module (10a) is activated by the voice recognition module activation management module (10b), the voice recognition module (10a) starts voice recognition.

When the voice recognition module (10a) is activated, the text information display management module (30b) causes the voice-based text information to be displayed in the image frame generated by the imaging module (40), in accordance with the characteristic information and editing information set by the text information notation setting module (30a). The voice-recognition-type three-dimensional digital image implementation system (100) according to the present invention also lets the user select whether the photo capture time is displayed. When capture time display is selected, the text information display management module (30b) causes the capture time to be displayed together with the text information.
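
As a rough sketch of what the display management step might compute, the functions below build the caption string (with the capture time appended when the user has selected that option) and map a named position to pixel coordinates. The function names, the timestamp format, and the 10-pixel margin are assumptions made for illustration; actual text rendering onto the frame is omitted.

```python
from datetime import datetime


def build_caption(text, captured_at, show_time):
    """Return the caption text, optionally followed by the capture time."""
    if show_time:
        return f"{text} ({captured_at:%Y-%m-%d %H:%M})"
    return text


def anchor_point(frame_w, frame_h, position, margin=10):
    """Map a named position to pixel coordinates for the caption anchor."""
    xs = {"left": margin, "center": frame_w // 2, "right": frame_w - margin}
    ys = {"top": margin, "center": frame_h // 2, "bottom": frame_h - margin}
    if position == "center":
        return xs["center"], ys["center"]
    vertical, horizontal = position.split("_")  # e.g. "bottom_right"
    return xs[horizontal], ys[vertical]


caption = build_caption("At Haeundae", datetime(2015, 5, 12, 14, 30), show_time=True)
print(caption)                                 # At Haeundae (2015-05-12 14:30)
print(anchor_point(640, 480, "bottom_right"))  # (630, 470)
```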

The edited image frame management module (30c) stores and manages edited image frames, that is, image frames in which text information has been displayed.
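
Taken together, the modules described so far could be wired roughly as follows. This is only a sketch: `Frame`, `EditedFrameStore`, and `capture_with_caption` are hypothetical names, and the capture, recognition, and conversion steps are passed in as stand-in callables rather than real camera or speech APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    image_id: str                  # placeholder for actual image data
    caption: Optional[str] = None  # text displayed in the frame, if any


@dataclass
class EditedFrameStore:            # stands in for module (30c)
    frames: List[Frame] = field(default_factory=list)

    def save(self, frame):
        self.frames.append(frame)


def capture_with_caption(shoot, recognize, to_text, voice_enabled, store):
    """Capture a frame and, if voice recognition is enabled, caption and store it."""
    frame = shoot()                           # imaging module (40)
    if voice_enabled:                         # user's choice via module (10b)
        frame.caption = to_text(recognize())  # modules (10a) and (20)
        store.save(frame)                     # module (30c)
    return frame


store = EditedFrameStore()
frame = capture_with_caption(
    shoot=lambda: Frame("IMG_0001"),
    recognize=lambda: b"raw-audio",
    to_text=lambda audio: "I love you",
    voice_enabled=True,
    store=store,
)
print(frame.caption, len(store.frames))  # I love you 1
```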

As shown in FIG. 4, the voice-recognition-type three-dimensional digital image implementation system (100) according to an embodiment of the present invention may include a display module (50) and a voice-based text information deletion module (60). By judging whether the recognized voice information is valid, the system can re-execute voice recognition and generation of the voice-based text information when invalid voice information has been recognized.

The display module (50) outputs, in real time, the voice-based text information generated by the voice recognition module (10a) and the information conversion module (20), so that the recognized voice information can be checked in real time.

The voice-based text information deletion module (60) deletes voice-based text information which the user, judging the validity of the text output on the display module (50), has determined to be invalid.
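
The confirm-or-retry behavior of the display module (50) and the deletion module (60) can be sketched as a simple loop. `capture_valid_text`, its callable parameters, and the `max_attempts` limit are assumptions for illustration; the patent does not specify a retry limit.

```python
def capture_valid_text(recognize, is_valid, max_attempts=3):
    """Re-run recognition until the user accepts the text or attempts run out."""
    for _ in range(max_attempts):
        text = recognize()   # recognition and conversion (modules 10a, 20)
        if is_valid(text):   # user's judgment on the text shown by module (50)
            return text
        # Rejected text is discarded (module 60) and recognition runs again.
    return None


attempts = iter(["kim chee", "kimchi"])
result = capture_valid_text(lambda: next(attempts), lambda t: t == "kimchi")
print(result)  # kimchi
```

Returning `None` when every attempt is rejected leaves it to the caller to decide what happens next, for example capturing the frame without any text.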

As shown in FIG. 5, the voice-recognition-type three-dimensional digital image implementation system (100) according to an embodiment of the present invention may also include a built-in text information setting module (70), a built-in text information management module (71), a direct-input text information input module (80), and a direct-input text information management module (81), so that built-in text information or direct-input text information, in addition to voice-based text information, can be displayed in the image frame.

The built-in text information setting module 70 is a module in which built-in text information is set and stored. Phrases frequently used when taking pictures, such as "Kimchi", "Cheese", "Fighting", and "I love you", may be set as built-in text information, and various other phrases may be set by the system designer or the user.

The built-in text information management module 71 allows the user to select whether or not built-in text information is used. When its use is selected, the module outputs the plurality of currently set built-in text information items to the display module 50 so that the user can choose among them. When the user selects use of built-in text information, the voice recognition module 10a or the direct input text information input module 80 is deactivated.

The direct input text information input module 80 is a module in which text entered directly by the user is stored.

The direct input text information management module 81 allows the user to select whether or not direct input text information is used and, when its use is selected, causes the direct input text information to be displayed in the image frame.

When the user selects use of direct input text information, the voice recognition module 10a or the built-in text information management module 71 is deactivated.
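
As a non-limiting illustration, the selection among voice-based, built-in, and direct input text might be sketched as a single active source. The names `TextSource` and `TextSourceSelector` are hypothetical, and treating the three sources as strictly mutually exclusive is a simplifying assumption of this sketch; the specification states only that selecting one source deactivates the modules associated with the others.

```python
from enum import Enum


class TextSource(Enum):
    VOICE = "voice"        # voice recognition module 10a
    BUILT_IN = "built_in"  # built-in text information management module 71
    DIRECT = "direct"      # direct input text information modules 80 and 81


class TextSourceSelector:
    """Tracks which single text source is currently active."""

    def __init__(self, initial: TextSource = TextSource.VOICE) -> None:
        self.active = initial

    def select(self, source: TextSource) -> None:
        # Activating one source implicitly deactivates the other two.
        self.active = source

    def is_active(self, source: TextSource) -> bool:
        return self.active is source
```

Holding one `active` value, rather than three independent flags, makes it impossible for two sources to be enabled at the same time.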

Meanwhile, as shown in FIG. 6, the voice recognition type stereoscopic digital image implementation system 100 according to an embodiment of the present invention may include a voice recognition mode selection module 90, so that voice recognition is performed according to the selected voice recognition mode.

The voice recognition mode selection module 90 allows the user to select one of two voice recognition modes: a photographer voice recognition mode, which recognizes the voice of a photographer located inside a set area measured from the position of the photographing module 40, and a subject voice recognition mode, which recognizes the voice of a subject located outside that set area. Correspondingly, the voice recognition module 10a performs either photographer voice recognition or subject voice recognition according to the selected mode.
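
By way of example only, the two voice recognition modes might be sketched as a filter on the estimated distance between each voice source and the photographing module 40. The names `RecognitionMode`, `VoiceSample`, and `filter_by_mode`, the `distance_m` field, and the default radius of 1.0 m are assumptions made for this sketch; the specification does not state how the set area is defined or how the position of a speaker is determined.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RecognitionMode(Enum):
    PHOTOGRAPHER = "photographer"  # voices inside the set area
    SUBJECT = "subject"            # voices outside the set area


@dataclass
class VoiceSample:
    text: str
    distance_m: float  # hypothetical estimated distance from module 40


def filter_by_mode(samples: List[VoiceSample],
                   mode: RecognitionMode,
                   radius_m: float = 1.0) -> List[VoiceSample]:
    """Keep only the samples that belong to the selected mode."""
    if mode is RecognitionMode.PHOTOGRAPHER:
        return [s for s in samples if s.distance_m <= radius_m]
    return [s for s in samples if s.distance_m > radius_m]
```

In this sketch a sample exactly on the boundary is treated as lying inside the set area; that tie-breaking rule is likewise an arbitrary choice.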

In the voice recognition type stereoscopic digital image implementation system 100 according to the embodiment of the present invention configured as described above, voice-based text information generated automatically through voice recognition is displayed inside the image frame, so that image frames such as photographs and moving images can be rendered in a lively and stereoscopic manner. Various editing functions are provided, which improves user convenience and makes dynamic image frames possible. The recognized voice information can be checked in real time, so that when invalid voice information is recognized or a voice recognition error occurs, voice recognition and generation of the voice-based text information can be re-executed. The functionality of voice recognition is further improved by allowing either the photographer's voice or the subject's voice to be recognized selectively as needed. Finally, because preset built-in text information or direct input text information entered by the user can also be displayed inside the image frame in addition to voice-based text information, the user's range of choice is widened and the notation of text information can be diversified.

Although the voice recognition type stereoscopic digital image implementation system according to an embodiment of the present invention has been described and illustrated above with reference to the foregoing description and drawings, this is merely an example, and those of ordinary skill in the art will understand that various changes and modifications are possible without departing from the technical spirit of the present invention.

1: digital device 2: mobile phone
3: smartphone 4: smart pad
5: smart device 6: digital camera
10a: voice recognition module 10b: voice recognition module activation management module
20: information conversion module 30a: text information notation setting module
31: character size setting unit 32: font setting unit
33: character color setting unit 34: notation language setting unit
35: text notation position setting unit 30b: text information display management module
30c: edited image frame management module 40: photographing module
50: display module 60: voice-based text information deletion module
70: built-in text information setting module 71: built-in text information management module
80: direct input text information input module 81: direct input text information management module
90: voice recognition mode selection module
100: voice recognition type stereoscopic digital image implementation system

Claims

A voice recognition module (10a) for recognizing slogans, sounds, and voices shouted by a selected one of a photographer and a photographed person;
An information conversion module (20) for converting the voice information recognized by the voice recognition module (10a) into text information to generate voice-based text information;
A text information notation setting module (30a) for setting characteristic information and editing information for text information to be displayed in the image frame, the module including a character size setting unit (31) for setting the character size of the text information, a font setting unit (32) for setting the font of the text information, a character color setting unit (33) for setting the character color of the text information, a notation language setting unit (34) for setting the notation language of the text information, and a text notation position setting unit (35) for setting the notation position of the text information within the image frame;
A photographing module (40) for photographing a subject to generate an image frame, the photographing module capturing either a still image or a moving image to generate the image frame;
A voice recognition module activation management module (10b) for allowing the user to select whether or not the voice recognition module (10a) is activated;
A text information display management module (30b) for causing, when the voice recognition module (10a) is activated, the voice-based text information to be displayed in the image frame generated by the photographing module (40) in accordance with the characteristic information and editing information set by the text information notation setting module (30a), and for displaying the photographing time together with the text information when photographing-time notation is selected by the user;
An edited image frame management module (30c) for storing and managing an edited image frame, which is an image frame in which text information is displayed;
A display module (50) for outputting, in real time, the voice-based text information generated by the voice recognition module (10a) and the information conversion module (20) so that the recognized voice information can be checked in real time;
A voice-based text information deletion module (60) for deleting voice-based text information that the user, judging the validity of the voice-based text information output on the display module (50), has determined to be invalid;
A built-in text information setting module (70) for setting and storing built-in text information set by a system designer and a user;
A built-in text information management module (71) for allowing the user to select whether or not built-in text information is used, for outputting, when use of built-in text information is selected, the plurality of currently set built-in text information items to the display module (50) so that the user can select among them, and for causing the voice recognition module (10a) or the direct input text information input module (80) to be deactivated when use of built-in text information is selected by the user;
A direct input text information input module (80) in which text entered directly by the user is stored;
A direct input text information management module (81) for allowing the user to select whether or not direct input text information is used, for causing, when use of direct input text information is selected, the direct input text information to be displayed in the image frame, and for causing the voice recognition module (10a) or the built-in text information management module (71) to be deactivated when use of direct input text information is selected by the user; and
A voice recognition mode selection module (90) for allowing the user to select one of a photographer voice recognition mode, which recognizes the voice of a photographer located inside a set area measured from the position of the photographing module (40), and a subject voice recognition mode, which recognizes the voice of a subject located outside the set area, so that the voice recognition module (10a) performs either photographer voice recognition or subject voice recognition according to the selected voice recognition mode,
A voice recognition type stereoscopic digital image implementation system comprising the foregoing modules and applied to a digital device (1) including a mobile phone (2), a smart device (5) including a smartphone (3) and a smart pad (4), and a digital camera (6).

delete