KR20120088493A

KR20120088493A - Method and system for providing multimedia content overlay image chatting service using voice recognition of personal communication terminal

Info

Publication number: KR20120088493A
Application number: KR1020110062235A
Authority: KR
Inventors: 이길수; 윤정준
Original assignee: (주)티아이스퀘어
Priority date: 2011-01-31
Filing date: 2011-06-27
Publication date: 2012-08-08
Anticipated expiration: 2031-06-27
Also published as: KR101268436B1

Abstract

Provided are a method and system for providing a multimedia content synthesis video chat service that uses voice recognition on a smartphone or tablet personal computer to show synthesized visual content together with messenger chat content. The method provides the service on at least one user terminal, among a first terminal and a second terminal, through a service providing system that offers communication services including a messaging service. It comprises: a first step of extracting speech from a voice message obtained from the user terminal; a second step of separating words from the extracted speech; a third step of comparing the separated words with the keywords in a preset keyword list to find a keyword that matches a word; a fourth step of extracting multimedia content data, in image or video form, corresponding to the keyword; and a fifth step of outputting the voice message and playing the multimedia content data.

Description

Method and system for providing a video chat service for synthesizing multimedia contents using speech recognition in personal mobile terminal

The present invention relates to a method and system for providing a multimedia content synthesis video chat service and, more particularly, to a method and system that use voice recognition on a personal mobile terminal, such as a smartphone or tablet personal computer, to provide synthesized multimedia content to the user terminal together with the video chat content.

Driven by recent advances in mobile communication technology, a variety of communication services for personal mobile terminals are being researched and developed. These include video call services, in which the parties talk while viewing each other's faces, and video chat services, in which text messages can be exchanged during a video call.

A video chat service is typically carried out by exchanging instant messages with the other party on a personal mobile terminal over a data network such as a third-generation (3G) packet-switched (PS) network or Wi-Fi. Because such a service mostly just transmits the user's face video while sending text messages through a separately provided chat window, the video chat screen is monotonous, and the experience is sometimes more awkward and inconvenient than a plain voice call or video call.

In addition, plans have recently been announced to offer, as a chat service, a voice message function that delivers the user's spoken input as is. Such a function, however, merely delivers the message and offers no dynamic enjoyment.

The main object of the present invention is to provide a multimedia content synthesis video chat service method that can give the user an entertaining screen by dynamically showing additional still images or videos in real time according to the content of the user's input voice message, and a multimedia content synthesis video chat service system that uses this method.

To solve the above technical problem, a method for providing a multimedia content synthesis video chat service according to one aspect of the present invention includes a process of servicing a chat, using a message application installed on the terminals, between a calling personal mobile terminal and a called personal mobile terminal that are connected to a service support server through a network. The method comprises: performing, at a SIP (Session Initiation Protocol) Registrar/Proxy server provided in the service support server, a location registration procedure for the calling and called personal mobile terminals; relaying and setting up a chat connection call with the called personal mobile terminal in response to a chat connection request from the calling personal mobile terminal; and, at an MSRP (Message Session Relay Protocol) Relay server provided in the service support server, transmitting messages received from the calling personal mobile terminal to the called personal mobile terminal and messages received from the called personal mobile terminal to the calling personal mobile terminal. Here, the calling or called personal mobile terminal uses the voice recognition engine provided in the message application to synthesize a still image or video corresponding to at least one word in the speech of a message with the messaging signal and output the result.

A system for providing a multimedia content synthesis video chat service according to another aspect of the present invention is connected to a network and services a chat, using a message application installed on the terminals, between a calling personal mobile terminal and a called personal mobile terminal. The system comprises: a SIP (Session Initiation Protocol) Registrar/Proxy server that performs a location registration procedure for the calling and called personal mobile terminals and that relays and sets up a chat connection call with the called personal mobile terminal in response to a chat connection request from the calling personal mobile terminal; and an MSRP (Message Session Relay Protocol) Relay server that transmits messages received from the calling personal mobile terminal to the called personal mobile terminal and messages received from the called personal mobile terminal to the calling personal mobile terminal. Here, the calling or called personal mobile terminal uses the voice recognition engine provided in the message application to extract at least one word from the speech in a message, synthesizes a still image or video corresponding to the extracted word with the messaging signal, and outputs the result.

A method for providing a multimedia content synthesis video chat service according to yet another aspect of the present invention provides the service on at least one user terminal, among a first terminal and a second terminal, through a service providing system that offers communication services including a messaging service. The method comprises: a first step of extracting speech from a voice message obtained from the user terminal; a second step of separating words from the extracted speech; a third step of comparing the separated words with the keywords in a preset keyword list to find a keyword that matches a word; a fourth step of extracting multimedia content data, in image or video form, corresponding to the keyword; and a fifth step of outputting the voice message and playing the multimedia content data.

Preferably, the third step may include receiving, from the content management server of the service providing system, the keyword list and a held-content information list corresponding to the keyword list.

In one embodiment, the user terminal includes: a SIP Registrar/Proxy server interworking unit that is connected to the SIP (Session Initiation Protocol) Registrar/Proxy server of the service providing system and that generates, sends, and receives SIP messages; an MSRP Relay server interworking unit that sends and receives instant messages through the MSRP (Message Session Relay Protocol) Relay server of the service providing system; a content management server interworking unit connected to the content management server; and a voice recognition processing unit that is connected to the MSRP Relay server interworking unit and recognizes words or keywords in a voice message.

In one embodiment, the voice recognition processing unit includes: a voice extraction unit that passes the voice message to the voice recognition engine in real time; a voice recognition engine that recognizes the voice message received from the voice extraction unit and generates analysis data; a sentence word processing unit that builds a sentence from the analysis data received from the voice recognition engine; a keyword word detection unit that detects a word or keyword in the sentence received from the sentence word processing unit; and a word image search unit that searches for multimedia content corresponding to the word or keyword received from the keyword word detection unit.
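The chain of units described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `KEYWORD_CONTENT` table, and its entries are hypothetical, and the recognition engine is replaced by a stub that treats the incoming bytes as an already-transcribed UTF-8 sentence.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical keyword-to-content table; the patent does not list actual entries.
KEYWORD_CONTENT = {
    "birthday": "cake.gif",
    "rain": "rain.swf",
    "love": "heart.png",
}


@dataclass
class Overlay:
    keyword: str   # keyword detected in the voice message
    content: str   # name of the image/video to play alongside the message


def recognize(voice_message: bytes) -> str:
    """Stub for the voice recognition engine (assumption: input is a UTF-8 transcript)."""
    return voice_message.decode("utf-8")


def split_words(sentence: str) -> List[str]:
    """Sentence word processing: lowercase the sentence and split it into words."""
    return re.findall(r"[a-z']+", sentence.lower())


def detect_keywords(words: Iterable[str], keyword_list: Iterable[str]) -> List[str]:
    """Keyword word detection: keep words found in the keyword list, once each, in order."""
    allowed = set(keyword_list)
    hits: List[str] = []
    for word in words:
        if word in allowed and word not in hits:
            hits.append(word)
    return hits


def find_overlays(voice_message: bytes) -> List[Overlay]:
    """Word image search: map each detected keyword to its multimedia content."""
    words = split_words(recognize(voice_message))
    return [Overlay(k, KEYWORD_CONTENT[k]) for k in detect_keywords(words, KEYWORD_CONTENT)]
```

Calling `find_overlays(b"Happy birthday, I love you")` yields one overlay per matched keyword, which a client could then play while the voice message itself is output. Exact string matching is used here for simplicity; the source does not say how matches are scored.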

In one embodiment, the fourth step includes a download processing unit, connected to the content management server interworking unit, receiving the multimedia content data from the content management server.

In one embodiment, the content management server includes: a content client interworking unit connected to the content management server interworking unit; a content upload processing unit that is connected to the content client interworking unit and transmits content to the user terminal; and a content database that stores and manages multimedia content corresponding to words or keywords.

In one embodiment, the user terminal includes a chat connection and location registration processing unit that is connected to the SIP Registrar/Proxy server interworking unit and that, when the location of the user terminal changes, works with the SIP Registrar/Proxy server to handle roaming of the user terminal.

In one embodiment, the method for providing a multimedia content synthesis video chat service further includes, before the first step, establishing a chat connection between the first terminal and the second terminal at the MSRP Relay server in response to a request from the first or second terminal.

In one embodiment, the method for providing a multimedia content synthesis video chat service further includes, after the fifth step, separating a word or keyword from a text message entered in the chat window of the user terminal, searching for an image or Flash file corresponding to the separated word or keyword, and playing it on the screen of the user terminal together with the text message.

According to yet another aspect of the present invention, a system for providing a multimedia content synthesis video chat service provides communication services, including a messaging service, to a first terminal and a second terminal while providing the multimedia content synthesis video chat service to at least one user terminal among them. The system comprises: a signal processing server that routes SIP (Session Initiation Protocol) messages between the first terminal and the second terminal; an MSRP (Message Session Relay Protocol) Relay server that establishes a chat connection or message session between the first terminal and the second terminal; and a content management server that, at the request of at least one user terminal among the first and second terminals, transmits multimedia content data corresponding to a specific keyword to that user terminal. Here, the user terminal plays the multimedia content data, in image or video form, while outputting a voice message that is input through a microphone or received over the network and that contains the keyword.

In one embodiment, the content management server transmits to the user terminal, at the terminal's request, a keyword list and a held-content information list corresponding to each keyword in the keyword list.
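One way a terminal might use the two lists is to work out which content files it still needs to download. The sketch below is an assumption for illustration only: the source does not describe the list formats or any local cache, so the `missing_content` helper, the dictionary layout, and the file names are all hypothetical.

```python
from typing import Dict, List, Set


def missing_content(
    keyword_list: List[str],
    content_info: Dict[str, str],
    local_files: Set[str],
) -> List[str]:
    """Return content names referenced by the keyword list that are not held locally.

    keyword_list -- keywords received from the content management server
    content_info -- hypothetical mapping of keyword -> content file name
    local_files  -- content file names already stored on the terminal
    """
    needed: List[str] = []
    for keyword in keyword_list:
        name = content_info.get(keyword)
        # Skip keywords without content, files already present, and duplicates.
        if name is not None and name not in local_files and name not in needed:
            needed.append(name)
    return needed
```

For example, with keywords `["rain", "love", "snow"]`, content info `{"rain": "rain.swf", "love": "heart.png"}`, and `heart.png` already on the device, only `rain.swf` would be requested from the server.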

In one embodiment, the content management server includes: a content client interworking unit connected to the content management server interworking unit of the user terminal; a content upload processing unit that is connected to the content client interworking unit and transmits content to the user terminal; and a content database that stores and manages multimedia content corresponding to keywords.

In one embodiment, the signal processing server includes a SIP (Session Initiation Protocol) Registrar server and a SIP Proxy server connected to the SIP Registrar server. Here, when the location of the user terminal changes, the SIP Proxy server works with the chat connection and location registration processing unit of the user terminal to handle roaming of the user terminal.

In one embodiment, the user terminal separates a word or keyword from a text message entered in the chat window or received over the network, searches for an image or Flash file corresponding to the separated word or keyword, and plays it on the screen of the user terminal together with the voice message and the text message.

According to the present invention, a specific word or keyword is detected in a voice message that is input to or received by a personal mobile terminal, such as a smartphone or tablet PC, over the Internet (including Wi-Fi) or a 3G packet-switched (PS) network, and a Flash file or image related to the detected keyword is output in real time, together with the video chat signal, on the user's own terminal and/or the other party's terminal. Video chat users can therefore enjoy a more entertaining chat. In addition, by using the voice recognition engine of the smartphone or tablet PC to synthesize at least part of the speech a user inputs during a chat with related images or videos in real time, so that it can be seen or heard, the service provider can offer users an entertaining video chat service.

Furthermore, to receive this service, the user neither has to go to the trouble of configuring separate content in advance nor has to perform separate key operations, such as DTMF (Dual-Tone Multi-Frequency) input, to show the other party a still image or video. In other words, the user receives a multimedia content synthesis video chat service with improved convenience.

Moreover, if the existing text input mode is used together with the voice input mode during a chat, at the user's choice, the multimedia content corresponding to the specific words or keywords contained in the voice message and in the text message can be displayed together on the video chat screen. The user can therefore use the service while moving freely between voice and text messages. The user can also emphasize what they want to convey to the chat partner and enjoy a more vivid chat.

FIG. 1 is a schematic configuration diagram of a system for providing a multimedia content synthesis video chat service (hereinafter, simply the service providing system) according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a method for providing a multimedia content synthesis video chat service according to an embodiment of the present invention.
FIG. 3 is a schematic flowchart of a method for providing a multimedia content synthesis video chat service (hereinafter, simply the service providing method) according to another embodiment of the present invention.
FIG. 4 is a schematic configuration diagram of a messenger application of a user terminal that can be employed in the service providing method of FIG. 2 or FIG. 3.
FIG. 5 is a schematic configuration diagram of the voice recognition processing unit of the messenger application of FIG. 4.
FIG. 6 is a flowchart illustrating the location registration procedure of the user terminal of FIG. 1.
FIG. 7 is a flowchart illustrating the chat connection procedure between the user terminals of FIG. 1.
FIG. 8 is a flowchart illustrating the procedure for sending and receiving multimedia content between the user terminals of FIG. 1.
FIG. 9 is a flowchart illustrating the chat connection termination procedure between the user terminals of FIG. 1.
FIG. 10 is a schematic block diagram illustrating an automatic content update function that can be employed in the service providing system of FIG. 1.
FIG. 11 is a diagram showing a user interface screen of a message application that can be employed in the user terminal of FIG. 1.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic configuration diagram of a system for providing a multimedia content synthesis video chat service (hereinafter, simply the service providing system) according to an embodiment of the present invention.

Referring to FIG. 1, the service providing system 100 according to this embodiment provides a messaging service between user terminals 10 and 20, such as smartphones and tablet personal computers, over the Internet (including wireless LANs such as Wi-Fi) or a 3G packet-switched (PS) network. While doing so, it analyzes the speech of an input voice message, synthesizes multimedia content corresponding to the words recognized by the analysis with the message signal, and shows the result to the chat participants in real time.

The multimedia content corresponding to a recognized word includes still images or videos, such as images or Flash files. It covers any image data or video data that can be transmitted over an Internet Protocol (IP) based network, and it serves to express an image or feeling corresponding to a preset word or keyword.

The service providing system 100 is a service support apparatus connected to the user terminals 10 and 20 and includes a SIP Registrar/Proxy server 110, an MSRP Relay server 120, and a content management server 130.

The SIP Registrar/Proxy server 110 identifies whether a user who wants to use the video chat service over the network is a subscriber to the multimedia content synthesis video chat service, and runs the scenario for providing that service according to a service request signal from the user terminal. For example, the signal processing server 110 may receive a call connection request signal for a multimedia content synthesis video chat from a user terminal and, according to that signal, establish a session for a video call between the terminals concerned.

The SIP Registrar/Proxy server 110 handles SIP (Session Initiation Protocol) signaling based on an IMS (Internet Protocol Multimedia Subsystem) network and performs an S-Proxy (SIP Proxy) function that controls SIP session requests to external networks. It also handles management, statistics, and CDRs (Call Detail Records) for SIP calls routed from the IMS network. Here, the IMS network refers to a set of nodes, or an architectural framework, for providing IP multimedia services. It may include a call session control function (CSCF), a home subscriber server (HSS), and an application server, and it passes information about user terminals for which a given IP multimedia service is configured to the service providing system. The SIP Registrar/Proxy server 110 may also handle subscriber authentication in cooperation with a registration server. The registration server is responsible for managing subscribers' phone book lists, subscriber status, and subscriber authentication, and it interworks with the IMS network over SIP.
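Location registration with a SIP registrar is conventionally done with a REGISTER request as defined in RFC 3261. The source does not reproduce the message itself, so the following is only a sketch of what a terminal might send; the `build_register` helper, the UDP transport, and the example user, domain, and address are assumptions, and authentication headers are omitted.

```python
import uuid


def build_register(user: str, domain: str, contact_host: str,
                   contact_port: int = 5060, expires: int = 3600,
                   cseq: int = 1) -> str:
    """Build a minimal SIP REGISTER request (no authentication), as a string."""
    aor = f"sip:{user}@{domain}"                 # address-of-record being registered
    branch = "z9hG4bK" + uuid.uuid4().hex        # RFC 3261 branch "magic cookie" prefix
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_host}:{contact_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{aor}>",
        f"Call-ID: {uuid.uuid4().hex}@{contact_host}",
        f"CSeq: {cseq} REGISTER",
        # Contact carries the terminal's current address; re-sending REGISTER
        # with a new Contact is how a moved terminal updates its location.
        f"Contact: <sip:{user}@{contact_host}:{contact_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"
```

A terminal whose IP address changes would typically call something like `build_register("alice", "example.com", "192.0.2.10")` again with its new contact host and an incremented `cseq`, so that the registrar's binding follows the terminal.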

The MSRP (Message Session Relay Protocol) Relay server 120 is responsible for transmitting instant messages, and for transfer session management and reliable delivery of binary files such as music, images, and videos.
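To make the relay's payload concrete, the sketch below frames a single-chunk MSRP SEND request following the general message layout of RFC 4975. It is illustrative only: the `build_msrp_send` helper and the example paths and identifiers are assumptions, chunking of large files is not handled, and the source does not show the actual messages exchanged.

```python
def build_msrp_send(tx_id: str, to_path: str, from_path: str,
                    message_id: str, body: str,
                    content_type: str = "text/plain") -> bytes:
    """Frame one complete (single-chunk) MSRP SEND request as bytes."""
    payload = body.encode("utf-8")
    size = len(payload)
    headers = [
        f"MSRP {tx_id} SEND",
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"Message-ID: {message_id}",
        f"Byte-Range: 1-{size}/{size}",   # whole message sent in one chunk
        f"Content-Type: {content_type}",
    ]
    head = "\r\n".join(headers).encode("ascii") + b"\r\n\r\n"
    # End-line: seven hyphens, the transaction ID, and "$" marking the final chunk.
    end_line = f"\r\n-------{tx_id}$\r\n".encode("ascii")
    return head + payload + end_line
```

The same framing carries binary content by changing `content_type` and passing the payload as bytes; a real client would also split large files into chunks and track delivery reports.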

The content management server 130 stores and manages a variety of multimedia content, in image or video form, corresponding to preset words or keywords. So that it can receive multimedia content from external content providers (CPs) and the like, the content management server 130 may provide a platform for, and interwork with, portal sites on external networks, open markets such as mobile app stores, and ucloud servers. Here, portal sites include Daum (www.daum.net), Naver (www.naver.com), and the like; mobile app stores include SK Telecom's T-Store, Apple's iPhone App Store, the Nokia Ovi Store, Android Market, BlackBerry App World, and the like; and a ucloud server, which may be provided by the service providing system of the present invention, a portal site, an open market, or the like, refers to a means, or a component performing the function, of converting image or video multimedia content stored on a server into a form suitable for the user terminal and transmitting it to the user terminal at the terminal's request.

In the detailed description below, the SIP Registrar/Proxy server 110 is referred to as the signal processing server 110, since it is a representative example of a server that processes SIP (Session Initiation Protocol) based signals; the MSRP Relay server 120 is referred to as the message processing server 120, since it is a representative example of a server that handles the sending and receiving of instant messages; and the content management server 130 is referred to simply as the CMS 130.

When the service providing system 100 operates as the server that provides the service of this embodiment, the user terminals 10 and 20 may operate as client terminals that interwork with the service providing system 100. In addition, a calling or called user terminal may connect to the counterpart user terminal in the role of at least one of a server and a client.

According to this embodiment, the user terminals 10 and 20 interwork with the service providing system 100 through a preinstalled chat or messenger application and, during a video chat, can display on the screen the multimedia content corresponding to a specific word or keyword in a voice message, together with the video chat signal that includes the voice message, while outputting the audio through the speaker. In other words, the service providing system 100 provides the user terminals with a messenger application, including a voice recognition engine, that is installed to offer a voice-recognition video synthesis chat service between user terminals 10 and 20 such as smartphones or tablet PCs. It comprises the signal processing server 110, which controls the current location of the user terminals 10 and 20 and their chat connections; the message processing server 120, which relays the voice messages and multimedia content data exchanged between the user terminals 10 and 20; and the CMS 130, which transmits the updated recognized-word list (hereinafter also called the keyword list) and the related list of held multimedia content (hereinafter also called the held-content information list) among the chat-connected user terminals. Through the dynamic interaction of these components, the system provides the multimedia content synthesis video chat service between the user terminals 10 and 20.

In this embodiment, the application refers to a mobile application, a module that operates in the application layer, the topmost layer in the architecture of the user terminal or the network. From the user's side, a mobile application is a module that the user directly activates or deactivates on the user terminal and that provides functions such as calls, information retrieval, and entertainment. From the operator's side, it can function as an entity that provides the user terminal with a specific service, for example the multimedia content synthesis video chat service of this embodiment, or that bills the user terminal. Such an application interworks with the service providing system over the network so that the user can easily use the services the system provides, and it can be implemented in various forms and configurations depending on the type or form of the user terminal, the type or version of the operating system installed on it, the development language, and so on.

FIG. 2 is a schematic flowchart of a method for providing a multimedia content synthesis video chat service according to an embodiment of the present invention.

Referring to FIG. 2, while the user terminals 10 and 20 exchange voice messages for video chat through the message processing server 120 of the service providing system 100 (see FIG. 1), at least one user terminal determines, through a preinstalled application (hereinafter, the app), whether a voice message has been input (S210).

If the determination in step S210 is that a voice message has been input, the user terminal extracts the voice from the voice message through the voice analysis engine provided in the app (S212), generates a sentence from the extracted voice (S214), and separates the generated sentence into words (S216).

The user terminal compares a word separated through speech recognition with the keywords that correspond to preset multimedia content data (S220).

If the comparison in step S220 shows that the word does not match any keyword (S222), the next word of the sentence is read (S224). The process then returns to step S220, and the newly read word is compared with the keywords.

If, on the other hand, the comparison in step S220 shows that the word matches a specific keyword (S222), the multimedia content data corresponding to that keyword is extracted (S226) and is played back composited or overlaid on the chat screen of the user terminal that outputs the voice message (S230).
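
The matching loop of steps S216 to S226 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the table `KEYWORD_CONTENT`, the file names in it, and the helpers `split_words` and `find_content` are hypothetical, and splitting on whitespace is a simplification that would not be adequate for Korean, where particles attach to words.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword-to-content table; the description only says it is preset.
KEYWORD_CONTENT: Dict[str, str] = {
    "birthday": "cake.png",
    "rain": "rain.swf",
}


def split_words(sentence: str) -> List[str]:
    """S216: separate the recognized sentence into words (assumed: whitespace split)."""
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return [w for w in words if w]


def find_content(sentence: str, table: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """S220 to S226: compare the words one by one and return the first match."""
    for word in split_words(sentence):   # S224: read the next word
        if word in table:                # S222: the word matches a keyword
            return word, table[word]     # S226: extract the matching content
    return None                          # no keyword in this sentence
```

A caller would pass the returned content to whatever component overlays it on the chat screen (S230); a result of `None` means the voice message is played without an overlay.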

FIG. 3 is a schematic flowchart of a method for providing a multimedia content synthesis video chat service according to another embodiment of the present invention.

Referring to FIG. 3, while the user terminals 10 and 20 exchange voice messages for video chat through the message processing server 120 of the service providing system 100 (see FIG. 1), at least one user terminal determines, through a preinstalled application (hereinafter, the app), whether a voice message has been input (S310).

If the determination in step S310 is that a voice message has been input, the user terminal extracts the voice from the voice message through the voice analysis engine provided in the app (S312), generates a sentence from the extracted voice (S314), and separates the generated sentence into words (S316).

The user terminal compares a word separated through speech recognition with the keywords in a preset keyword list for multimedia content (S320). These keywords may be those contained in a keyword list downloaded on request from the content management server of the service providing system before step S320. Here, along with the keyword list, the user terminal may download from the service providing system a held content information list containing the content information that corresponds to each keyword in the keyword list (S318). Step S318 may also be performed before step S310, in which it is determined whether a voice message has been input.

If the comparison in step S320 shows that the word does not match any keyword (S322), the next word of the sentence is read (S324). The process then returns to step S320, and the newly read word is compared with the keywords.

If, on the other hand, the comparison in step S320 shows that the word matches a specific keyword (S322), the user terminal requests the multimedia content data corresponding to that keyword from the server, that is, the service providing system (S326), receives the content from the server (S328), and plays back the received multimedia content data composited or superimposed on the chat screen of the user terminal that outputs the voice message (S330).
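
The difference from FIG. 2 is that the content is requested from the server only after a keyword matches. A minimal sketch of that variant follows; the function name `overlay_for_words` and the `request_content` callback are hypothetical stand-ins for the S326/S328 exchange, whose actual transport is not specified at this point in the description.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def overlay_for_words(
    words: Iterable[str],
    keyword_list: Set[str],
    request_content: Callable[[str], bytes],
) -> Optional[Tuple[str, bytes]]:
    """Return (keyword, content) for the first word found in the downloaded keyword list."""
    for word in words:                           # S324: read the next word
        if word in keyword_list:                 # S320/S322: compare with the keyword list
            return word, request_content(word)   # S326/S328: request and receive the content
    return None
```

Passing the request as a callback keeps the matching logic independent of how the terminal actually talks to the server, and it makes it easy to see that no request is sent for sentences that contain no keyword.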

FIG. 4 is a schematic block diagram of a messenger application of a user terminal that can be employed in the service providing method of FIG. 2 or FIG. 3.

Referring to FIG. 4, the messenger application 12 installed on the user terminal of this embodiment includes a speech recognition engine. The messenger application 12 comprises a SIP Registrar/Proxy server interworking unit 410, a chat connection and location registration processing unit 412, a content management server interworking unit 420, a download processing unit 422, a memory 424, an MSRP Relay server interworking unit 430, an MSRP message transmission processing unit 432, a speech recognition processing unit 434, a voice input processing unit 436, an MSRP message reception processing unit 442, and an output unit 444.

Here, the SIP Registrar/Proxy server interworking unit 410 exchanges SIP messages with the SIP Registrar/Proxy server of the service providing system. The chat connection and location registration processing unit 412 registers the current location of the user terminal itself or looks up the location of the counterpart user terminal, and thereby establishes and maintains the chat connection.

The content management server interworking unit 420 interworks with the content management server of the service providing system to receive the updated keyword list of recognition words, the held content information list containing the content information that corresponds to the keyword list, and the multimedia content data in the form of image or Flash files. The download processing unit 422 stores the keyword list, the held content information list, the multimedia content data, and the like received from the content management server in the memory of the user terminal.

The MSRP Relay server interworking unit 430 exchanges MSRP messages with the MSRP Relay server of the service providing system. The MSRP message reception processing unit 442 analyzes an MSRP message received from the MSRP Relay server interworking unit 430, determines whether it carries the video chat audio signal, the video chat video signal, or an image or Flash to be composited with the video chat signal, extracts the respective data, and passes it to the output unit 444. The output unit 444 outputs the voice message through the speaker and outputs text, images, and Flash data on the chat screen.

The voice input processing unit 436 receives the voice that the user inputs during the chat and passes it to the speech recognition processing unit 434. The speech recognition processing unit 434 uses the speech recognition engine and the keyword list stored in the memory to look up the image or Flash to be composited, outputs the composited image or Flash on the screen through the output unit 444, and passes the voice message and the image or Flash data to be composited to the MSRP message transmission processing unit 432. The MSRP message transmission processing unit 432 converts the video chat audio signal, the video chat video signal, and the image or Flash to be composited with the video chat video signal into MSRP messages and sends them to the MSRP Relay server interworking unit 430.

FIG. 5 is a schematic block diagram of a speech recognition processing unit that can be employed in the messenger application of FIG. 4.

Referring to FIG. 5, the speech recognition processing unit 434 may comprise a voice extraction unit 510, a speech recognition engine 520, a sentence word processing unit 530, a keyword word detection unit 540, and a word image search unit 550.

The voice extraction unit 510 extracts the voice from the received voice message data.

The speech recognition engine 520 converts the extracted voice into a sentence. It receives the voice data input in real time from the voice extraction unit 510 through a voice streaming data reception process 522, analyzes the voice on the basis of the voice analysis information 526 in a database using an engine that performs a predetermined voice analysis process 524, and sends the sentence obtained from the analyzed voice to the sentence word processing unit 530 in real time through a voice analysis data transmission process 528.

The sentence word processing unit 530 separates the sentence into individual words.

The keyword word detection unit 540 goes through the sentence word by word in order to find the registered keywords it contains.

The word image search unit 550 searches for the image or Flash related to each individual word using the keyword list registered in the memory. The word image search unit 550 can find the content either in the storage of the user terminal or by requesting it from the content management server of the service providing system.
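
The two lookup paths of the word image search unit 550 can be illustrated with a local-first search that falls back to the server. This is a sketch under assumptions: the class name `WordImageSearcher`, the `fetch_remote` callback, and in particular the choice to store a fetched item locally for later lookups are hypothetical and not stated in the description.

```python
from typing import Callable, Dict, Optional


class WordImageSearcher:
    """Look up content for a keyword: local storage first, then the server."""

    def __init__(self, local_store: Dict[str, bytes],
                 fetch_remote: Callable[[str], Optional[bytes]]) -> None:
        self.local_store = local_store      # stands in for the terminal's storage
        self.fetch_remote = fetch_remote    # stands in for a request to the content management server

    def search(self, keyword: str) -> Optional[bytes]:
        if keyword in self.local_store:     # found on the terminal
            return self.local_store[keyword]
        data = self.fetch_remote(keyword)   # otherwise ask the server
        if data is not None:
            self.local_store[keyword] = data  # assumption: keep it for the next lookup
        return data
```

Checking local storage first avoids a network round trip for content the terminal already holds, which matters because the overlay is meant to appear while the voice message is still being played.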

FIG. 6 is a flowchart illustrating the location registration procedure of the user terminals of FIG. 1.

As shown in FIG. 6, the location registration procedure for the first terminal 10 (the first user terminal) and the second terminal 20 (the second user terminal) is carried out between each user terminal 10, 20 and the SIP Registrar/Proxy server 110 (hereinafter simply the SIP Proxy server) using the SIP protocol. The interworking procedure is as follows.

First, the SIP Proxy server 110 receives a REGISTER message from the first terminal 10 and starts the registration procedure in response (S610).

The SIP Proxy server 110 requests authentication by sending the first terminal 10 a 401 Unauthorized response message that contains a server-generated nonce and the authentication algorithm (S612). The first terminal 10 generates an authentication key value using the user password and sends a REGISTER message that includes the authorization information (Authorization) to the SIP Proxy server 110 (S614).

The SIP Proxy server 110 receives the REGISTER message containing the authorization information from the first terminal 10 and authenticates it by verifying the authentication key value. If authentication succeeds, the server registers the location information of the first terminal 10 and sends a 200 OK message to the first terminal 10.

Similarly to steps S610 to S616 above, the SIP Proxy server 110 receives a REGISTER message from the second terminal 20, which is the second user terminal, and starts the registration procedure for the second terminal 20 (S620).

The SIP Proxy server 110 requests authentication by sending the second terminal 20 a 401 Unauthorized response message that contains a server-generated nonce and the authentication algorithm (S622). The second terminal 20 generates an authentication key value using the user password and sends a REGISTER message that includes the authorization information to the SIP Proxy server 110 (S624).

The SIP Proxy server 110 receives the REGISTER message containing the authorization information from the second terminal 20 and authenticates it by verifying the authentication key value. If authentication succeeds, the server registers the location information of the second terminal 20 and sends a 200 OK message to the second terminal 20.
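
The description does not name the authentication algorithm carried in the 401 response. Assuming it is the MD5 Digest scheme that SIP inherits from HTTP, in its basic form without the optional qop parameter, the authentication key value that the terminal computes and the server verifies can be sketched as follows. The function names and all user, realm, and nonce values used with them are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Terminal side: derive the response value from the password and the server nonce."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # secret part
    ha2 = md5_hex(f"{method}:{uri}")                  # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # assumed: basic form without qop


def verify(received: str, username: str, realm: str, password: str,
           method: str, uri: str, nonce: str) -> bool:
    """Server side: recompute the value and compare in constant time."""
    expected = digest_response(username, realm, password, method, uri, nonce)
    return hmac.compare_digest(received, expected)
```

Because the nonce is generated by the server for each challenge, the password itself never travels over the network and a captured response cannot simply be replayed against a later challenge.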

FIG. 7 is a flowchart illustrating the chat connection procedure between the user terminals of FIG. 1.

As shown in FIG. 7, the first terminal 10 first sends an MSRP AUTH message to the MSRP Relay server 120 in order to obtain the MSRP URI (Uniform Resource Identifier) of the MSRP Relay server to use for message transmission (S711).

The MSRP Relay server 120 sends an MSRP 401 Unauthorized response message to the first terminal 10 to indicate that authentication is required (S713). The first terminal 10 then sends the MSRP Relay server an MSRP AUTH message carrying an Authorization header field that contains the authentication key (S715).

The MSRP Relay server 120 authenticates by checking the values of the Authorization header field and, if authentication succeeds, sends the first terminal 10 an MSRP 200 OK response message carrying the MSRP URI that the first terminal 10 will use when sending messages (S717). The first terminal 10 then sends an INVITE message, with its own MSRP URI carried in the SDP (Session Description Protocol) body, via the SIP Proxy server 110 to the second terminal 20 with which it wants to establish the chat (S719).

The SIP Proxy server 110 looks up the location of the second terminal 20 through a registration server or the like and sends a 100 Trying message to the first terminal 10 to indicate that a call connection is being attempted (S721). The SIP Proxy server 110 forwards the INVITE message received from the first terminal 10 to the actual location of the second terminal 20 according to the location information it looked up (S723).

On receiving the INVITE message from the first terminal 10, the second terminal 20 stores the MSRP URI of the first terminal 10 and sends an MSRP AUTH message to the MSRP Relay server 120 in order to obtain the MSRP URI of the MSRP Relay server to use for message transmission (S725).

The MSRP Relay server 120 sends an MSRP 401 Unauthorized response message to the second terminal 20 to indicate that authentication is required (S727). The second terminal 20 sends the MSRP Relay server 120 an MSRP AUTH message carrying an Authorization header field that contains the authentication key (S729). The MSRP Relay server 120 authenticates by checking the values of the Authorization header field and, if authentication succeeds, sends the second terminal 20 an MSRP 200 OK response message carrying the MSRP URI that the second terminal 20 will use when sending messages (S731).

The second terminal 20 sends a 180 Ringing message to the SIP Proxy server 110 and notifies the user of the second terminal 20 that a chat connection request has arrived (S733).

The SIP Proxy server 110 forwards the 180 Ringing message to the first terminal 10 to indicate that the second terminal 20 is waiting for its user to answer (S735). When the user of the second terminal 20 chooses to join the chat, the second terminal 20 sends a 200 OK message, with its MSRP URI carried in the SDP, to the SIP Proxy server 110 (S737). The SIP Proxy server 110 forwards the 200 OK message to the first terminal 10 to indicate that the second terminal 20 has joined the chat (S739). At this point, the first terminal 10 obtains the MSRP URI of the second terminal 20 from the SDP in the 200 OK message.

The first terminal 10 sends an ACK message to the SIP Proxy server 110 to indicate that the chat connection is complete (S741). The SIP Proxy server 110 forwards the ACK message to the second terminal 20 (S743). Once this process is complete, the first terminal 10 and the second terminal 20 are connected to the MSRP Relay server 120 and can send messages using the counterpart's MSRP URI. The connection to the MSRP Relay server 120 is maintained until a SIP BYE message is sent or received.
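
Carrying an MSRP URI in the SDP of the INVITE (S719) and reading it back from the SDP of the 200 OK (S739) can be sketched as follows. The sketch assumes the usual MSRP convention of a `message` media line with an `a=path` attribute; the function names, port, URIs, and accepted content types are hypothetical, and a complete SDP body would also need the session-level lines, which are omitted here.

```python
from typing import List, Sequence


def build_msrp_media(port: int, path_uris: Sequence[str],
                     accept_types: Sequence[str] = ("text/plain",),
                     transport: str = "TCP/MSRP") -> str:
    """Build only the SDP media section that advertises an MSRP session."""
    lines = [
        f"m=message {port} {transport} *",
        "a=accept-types:" + " ".join(accept_types),
        # The path may hold more than one URI when a relay is involved.
        "a=path:" + " ".join(path_uris),
    ]
    return "\r\n".join(lines) + "\r\n"


def extract_msrp_path(sdp: str) -> List[str]:
    """Return the MSRP URIs from the first a=path attribute, or an empty list."""
    for line in sdp.splitlines():
        if line.startswith("a=path:"):
            return line[len("a=path:"):].split()
    return []
```

Each terminal only needs the counterpart's path to address later SEND requests, which is why the exchange of these attributes in INVITE and 200 OK is enough to set up the message channel.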

FIG. 8 is a flowchart illustrating the procedure for exchanging messages between the user terminals of FIG. 1.

As shown in FIG. 8, the procedure for exchanging messages between the first terminal 10 and the second terminal 20 using the multimedia content synthesis video chat service is as follows. Here, a message includes a voice message and multimedia content data, such as an image or Flash, to be composited with the video chat signal. Steps S810 to S816 below show a message being sent from the first terminal 10 to the second terminal 20, and steps S820 to S826 show a message being sent from the second terminal 20 to the first terminal 10.

First, the first terminal 10 places the message intended for the second terminal 20 in an MSRP SEND message and sends it to the MSRP Relay server 120 (S810). The MSRP Relay server 120 forwards the SEND message to the second terminal 20 using the destination MSRP URI of the received MSRP SEND message (S812).

The second terminal 20 sends a 200 OK message to the MSRP Relay server 120 to acknowledge receipt of the SEND message (S814), then analyzes the received SEND message and extracts and processes the voice and the image or Flash data to be composited. The second terminal 20 can output the multimedia content that corresponds to a word obtained from the voice message together with the voice message. The MSRP Relay server 120 forwards the 200 OK message to the first terminal 10 to indicate that the second terminal 20 has received the message (S816).

Next, the second terminal 20 places the message intended for the first terminal 10 in an MSRP SEND message and sends it to the MSRP Relay server 120 (S820). The MSRP Relay server 120 forwards the SEND message to the first terminal 10 using the destination MSRP URI of the received MSRP SEND message (S822).

The first terminal 10 sends a 200 OK message to the MSRP Relay server 120 to acknowledge receipt of the SEND message from the second terminal 20 (S824), then analyzes the received SEND message and extracts and processes the voice and the image or Flash data to be composited. The first terminal 10 can output the multimedia content that corresponds to a word obtained from the voice message of the second terminal 20 together with the voice message. The MSRP Relay server 120 forwards the 200 OK message to the second terminal 20 to indicate that the first terminal 10 has received the message (S826).
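
The SEND and 200 OK exchange above can be made concrete by building the two frames. The sketch follows the general MSRP framing (request line, path headers, a body closed by an end line made of seven hyphens, the transaction ID, and `$`); the function names, transaction IDs, URIs, and the use of a MIME type such as `image/png` to mark overlay content are assumptions, since the description does not say how voice, image, and Flash payloads are distinguished.

```python
def build_msrp_send(transaction_id: str, to_path: str, from_path: str,
                    message_id: str, content_type: str, body: bytes) -> bytes:
    """Build one complete (unchunked) MSRP SEND request carrying a non-empty body."""
    headers = [
        f"MSRP {transaction_id} SEND",
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"Message-ID: {message_id}",
        f"Byte-Range: 1-{len(body)}/{len(body)}",
        f"Content-Type: {content_type}",
    ]
    head = "\r\n".join(headers).encode("ascii") + b"\r\n\r\n"
    end_line = f"-------{transaction_id}$".encode("ascii")   # "$" marks the last chunk
    return head + body + b"\r\n" + end_line + b"\r\n"


def build_msrp_ok(transaction_id: str, to_path: str, from_path: str) -> bytes:
    """Build the 200 OK response that acknowledges a SEND with the same transaction ID."""
    lines = [
        f"MSRP {transaction_id} 200 OK",
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"-------{transaction_id}$",
    ]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```

The receiving terminal would branch on the Content-Type of each SEND to decide whether the body goes to the speaker or is overlaid on the chat screen.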

FIG. 9 is a flowchart illustrating the procedure for terminating the chat connection between the user terminals of FIG. 1.

As shown in FIG. 9, in the chat connection termination procedure between the first terminal 10 and the second terminal 20, either the first terminal 10 or the second terminal 20 may initiate the termination. The procedure when the first terminal 10 ends the chat is as follows.

The first terminal 10 sends a SIP BYE message to the SIP Proxy server 110 to end the chat (S910). The SIP Proxy server 110 forwards the BYE message to the second terminal 20 (S912).

The second terminal 20 closes its connection to the MSRP Relay server 120, releases all resources used for the chat connection, and sends a 200 OK message to the SIP Proxy server 110 (S914). The SIP Proxy server 110 forwards the 200 OK message to the first terminal 10 (S916). On receiving the 200 OK message, the first terminal 10 closes its connection to the MSRP Relay server 120 and releases all resources, ending the chat.

FIG. 10 is a schematic block diagram illustrating an automatic content update function that can be employed in the service providing system of FIG. 1.

As shown in FIG. 10, the service providing system 100 of this embodiment works with the user terminal 10 so that the keyword list, the held content information list containing the image or Flash content information associated with each keyword in the keyword list, and the content items themselves are updated on the user terminal 10 automatically.

When the messenger application 12 including the speech recognition engine is launched on the user terminal 10, the download processing unit 422 of the user terminal 10 reads from the memory 424 the list of content stored on the user terminal 10 and sends it through the content management server interworking unit 420 to the content management server 130 of the service providing system 100.

The content upload processing unit 134 in the content management server 130 receives the content list of the user terminal 10 through the content client interworking unit 132 and queries the content database 136 to see whether there is any content to update. If there is, the content upload processing unit 134 sends the updated content to the user terminal 10 via the content client interworking unit 132. The download processing unit 422 in the user terminal 10 stores the updated content received from the content management server interworking unit 420 in the memory and updates the latest content information. If the content of the service providing system is later updated while the user is using the chat service, the content upload processing unit 134 sends the updated content to the user terminal 10 via the content client interworking unit 132, and the user terminal 10 stores it in the memory 424 through the download processing unit 422 and updates its content information.
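
The description does not say how the content upload processing unit 134 decides which items need updating. One simple possibility, shown here purely as an assumption, is to compare a version number per content ID between the list sent by the terminal and the content database 136; the function name `contents_to_update` and the version scheme are hypothetical.

```python
from typing import Dict, List


def contents_to_update(client_list: Dict[str, int], server_db: Dict[str, int]) -> List[str]:
    """Return the IDs of items that are missing or outdated on the terminal.

    Both arguments map a content ID to a version number (an assumed scheme).
    """
    return sorted(
        content_id
        for content_id, version in server_db.items()
        if client_list.get(content_id, -1) < version
    )
```

For example, a terminal holding `{"cake": 1, "rain": 1}` checked against a database of `{"cake": 1, "rain": 2, "snow": 1}` would be sent `rain` and `snow`. An empty result means the terminal is already up to date and nothing is transmitted.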

According to the embodiment described above, a multimedia content synthesis video chat service can be provided in which, over a 3G/4G PS network or a WiFi network, a user terminal such as a smartphone or tablet personal computer plays back in real time the voice input by the user, together with a corresponding additional image or video, alongside the chat signal.

FIG. 11 is a diagram illustrating a user interface screen of a messenger application employable in the user terminal of FIG. 1.

As illustrated in FIG. 11, when the user runs the messenger application including the speech recognition engine on the user terminal, the messenger application lists the friends registered by the user on the screen of the user terminal (1110).

The user can press the "Chat button" to the right of the friend list to start a video chat with the desired user. User menus that can be displayed at the top of the screen include "Friend Management", "Content Management", and "Settings". In that case, the "Friend Management" menu can be used to add, edit, or delete contacts in the friend list. The "Content Management" menu can be used to view the list of recognized words or keywords registered in the user terminal and the list of associated Flash files or images, and to connect to the service providing system to download additional recognized words or keyword lists together with their associated multimedia content data. The "Settings" menu lets the user set service options such as the user's own picture, the color of the chat text, and the voice input mode.

During a video call between the calling user terminal and the called user terminal (1121, 1122), when the calling user says "Want to go get coffee?" aloud, the speech recognition engine on the user terminal recognizes the word or keyword "coffee" in the voice message. Based on the recognition result, the first app on the calling user terminal searches the content database 1133 of its own terminal for a Flash file or image related to the keyword and then outputs that content on the screen (1131).

Once the Flash file or image to be output on the user interface is determined from the content list of the calling user terminal, it is composited onto or overlaid on the screen. The input voice message is also transmitted to the other party.

The second app on the called user terminal receives the voice message from the calling user terminal, recognizes the keyword "coffee" in it, searches the content database 1134 for multimedia content corresponding to the keyword, and outputs that content on the screen, composited with or overlaid on the video call or video chat signal. At this time, the received voice message is output through the speaker (1132).
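The keyword lookup performed independently on the calling and called terminals can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: it assumes the speech recognition engine has already produced a text transcript, and the names `CONTENT_DB`, `split_words`, `find_content`, and `handle_voice_message`, as well as the file names, are hypothetical.

```python
import re

# Hypothetical local content database: keyword -> content file.
CONTENT_DB = {"coffee": "coffee_cup.swf", "rain": "rain.png"}


def split_words(transcript):
    """Split recognized text into lowercase words, dropping punctuation."""
    return re.findall(r"\w+", transcript.lower())


def find_content(transcript, content_db):
    """Return (keyword, content) pairs for words that exactly match a keyword."""
    return [(w, content_db[w]) for w in split_words(transcript) if w in content_db]


def handle_voice_message(transcript, content_db):
    """Describe what a terminal would do with one recognized voice message."""
    matches = find_content(transcript, content_db)
    return {
        "play_audio": True,  # the voice message is output regardless of matches
        "overlays": [content for _, content in matches],
    }
```

Because each terminal runs the same lookup against its own database, only the voice message needs to travel between them. Exact matching on whitespace-separated words is a simplification; a real engine would need language-specific word separation.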

Because speech recognition can produce errors due to ambient noise, the first app and the second app may also search for similar keywords and the Flash files or images related to them, and output the list of retrieved content so that the user can select the correct Flash file or image.
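One way to produce such a list of similar keywords is approximate string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; the patent does not specify a similarity measure, so the choice of this function, the `cutoff` value, and the names `KEYWORDS` and `candidate_keywords` are assumptions for illustration only.

```python
from difflib import get_close_matches

# Hypothetical keyword list registered on the terminal.
KEYWORDS = ["coffee", "copy", "toffee", "rain"]


def candidate_keywords(recognized_word, keywords, limit=3, cutoff=0.6):
    """Return up to `limit` keywords similar to the recognized word, best first."""
    return get_close_matches(recognized_word.lower(), keywords, n=limit, cutoff=cutoff)
```

The returned candidates could then be mapped to their Flash files or images and shown to the user as a selection list. A production system would more likely compare phonetic similarity rather than spelling, since the errors come from misheard audio.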

As a variation applicable to the above embodiment, a predetermined menu may be selected to activate a plus voice mode. In this mode, in addition to recognizing words in the voice message, a text recognition engine in the messenger application recognizes specific words in a text message entered in the chat window, searches for corresponding additional multimedia content, and displays it on the user terminal together with the video chat signal. In that case, a video chat service that synthesizes multimedia content based on both voice messages and text messages is available, so the user can easily emphasize the content of the video chat and enjoy a more vivid chat.

Although the present invention has been described above with reference to preferred embodiments, the present invention is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains could make various modifications and variations from this description. Accordingly, the present invention should be construed with reference to the overall disclosure, including the appended claims and drawings, and all equivalents or equivalent modifications thereof fall within the scope of the inventive concept.

Claims

A method for providing a multimedia content synthesis video chat service in at least one user terminal of a first terminal and a second terminal through a service providing system that provides a communication service including a messaging service, the method comprising:
a first step of extracting a voice from a voice message obtained from the user terminal;
a second step of separating a word from the extracted voice;
a third step of searching for a keyword matching the word by comparing the separated word with keywords in a preset keyword list;
a fourth step of extracting multimedia content data in the form of an image or a video corresponding to the keyword; and
a fifth step of outputting the voice message and playing the multimedia content data.

The method of claim 1,
wherein the third step includes receiving the keyword list and a list of reserved content information corresponding to the keyword list from a content management server of the service providing system.

The method of claim 2,
wherein the user terminal includes: a SIP Registrar/Proxy server interworking unit connected to a Session Initiation Protocol (SIP) Registrar/Proxy server of the service providing system, for generating, sending, and receiving SIP messages; an MSRP Relay server interworking unit for transmitting and receiving instant messages through a Message Session Relay Protocol (MSRP) Relay server of the service providing system; a content management server interworking unit connected to the content management server; and a speech recognition processing unit connected to the MSRP Relay server interworking unit, for recognizing a word or keyword in a voice message.

The method of claim 3,
wherein the speech recognition processing unit includes:
a voice extractor for delivering a voice message to a speech recognition engine in real time;
the speech recognition engine, configured to recognize the voice message received from the voice extractor and generate analysis data;
a sentence word processor for generating a sentence from the analysis data received from the speech recognition engine;
a keyword word detector for detecting a word or keyword from the sentence received from the sentence word processor; and
a word image search unit for searching for multimedia content corresponding to the word or keyword received from the keyword word detector.

The method of claim 4,
wherein the fourth step includes receiving the multimedia content data from the content management server by a download processing unit connected to the content management server interworking unit.

The method of claim 5,
wherein the content management server includes: a content client interworking unit connected to the content management server interworking unit; a content upload processing unit connected to the content client interworking unit to transmit content to the user terminal; and a content database storing and managing multimedia content corresponding to the word or keyword.

The method of claim 3,
wherein the user terminal further includes a chat connection and location registration processing unit that is connected to the SIP Registrar/Proxy server interworking unit and, when the location of the user terminal changes, processes roaming of the user terminal in conjunction with the SIP Registrar/Proxy server.

The method of claim 7,
further comprising, before the first step, establishing a chat connection between the first terminal and the second terminal at the MSRP Relay server at the request of the first terminal or the second terminal.

The method of claim 1,
further comprising, after the fifth step, separating a word or keyword from a text message input to the chat window of the user terminal, searching for an image or Flash file corresponding to the separated word or keyword, and displaying or playing it on the screen of the user terminal together with the text message.

A system for providing a multimedia content synthesis video chat service, which provides a communication service including a messaging service and provides the multimedia content synthesis video chat service to at least one user terminal of a first terminal and a second terminal while a communication service is performed between the first terminal and the second terminal, the system comprising:
a signal processing server for routing a Session Initiation Protocol (SIP) message between the first terminal and the second terminal;
a Message Session Relay Protocol (MSRP) Relay server for establishing a chat connection or a message session between the first terminal and the second terminal; and
a content management server for transmitting multimedia content data corresponding to a specific keyword to the user terminal at the request of at least one user terminal of the first terminal and the second terminal,
wherein the user terminal plays the multimedia content data in the form of an image or a video while outputting a voice message that is input through a microphone or received through a network and that includes the keyword.

The system of claim 10,
wherein the content management server transmits a keyword list for the keyword and a list of reserved content information corresponding to each keyword of the keyword list to the user terminal at the request of the user terminal.

The system of claim 10,
wherein the content management server includes: a content client interworking unit connected to a content management server interworking unit of the user terminal; a content upload processing unit connected to the content client interworking unit to transmit content to the user terminal; and a content database storing and managing multimedia content corresponding to the word or keyword.

The system of claim 10,
wherein the signal processing server includes a Session Initiation Protocol (SIP) Registrar server and a SIP Proxy server connected to the SIP Registrar server, and wherein the SIP Proxy server is connected to a chat connection and location registration processing unit of the user terminal and processes roaming of the user terminal when the location of the user terminal changes.

The system of claim 10,
wherein the user terminal separates a word or keyword from a text message input into a chat window or received through a network, searches for an image or Flash file corresponding to the separated word or keyword, and displays or plays it on the screen of the user terminal together with the text message.