WO2025047334A1

WO2025047334A1 - Video processing system, video processing method, and program

Info

Publication number: WO2025047334A1
Application number: PCT/JP2024/028068
Authority: WO
Inventors: 拓也森田; 哲夫金子
Original assignee: Sony Group Corp
Current assignee: Sony Group Corp
Priority date: 2023-08-25
Filing date: 2024-08-06
Publication date: 2025-03-06
Anticipated expiration: 2026-02-25

Abstract

The present disclosure provides a video processing system, a video processing method, and a program with which it is possible to generate a composite video having a higher degree of completion. At the time of imaging, a first video processing unit generates and displays a simple composite video by performing first video processing for simply synthesizing CG with video and creates a metafile including meta-information used in synthesizing the CG with the video. After imaging, a second video processing unit generates a highly accurate composite video by performing second video processing for synthesizing CG with video at high accuracy using a background in which the simple composite video generated by using the metafile is displayed. The present technology can be applied, for example, to a video processing system that synthesizes two-dimensional or three-dimensional CG with video.

Description

Video processing system, video processing method, and program

The present disclosure relates to a video processing system, a video processing method, and a program, and in particular to a video processing system, a video processing method, and a program capable of generating a composite video with a higher degree of completion.

Conventionally, techniques have been developed that track the movement of feature points in a video over time in order to, for example, composite two-dimensional or three-dimensional CG (computer graphics) in accordance with the movement of the video.

For example, Patent Document 1 discloses a compositing method that, in order to composite three-dimensional CG in real time, identifies positions on an image by inter-frame tracking of feature points in previously captured moving image data.

Patent Document 1: JP 2008-15576 A

Conventionally, however, the user shoots while checking video onto which no CG has been composited, and therefore has to imagine what the composite video will look like. As a result, the composite video produced by later-stage video processing may turn out to have been shot with camerawork that differs from what the user had in mind, which lowers the degree of completion of the composite video.

The present disclosure has been made in view of such circumstances, and makes it possible to generate a composite video with a higher degree of completion.

A video processing system according to one aspect of the present disclosure includes: a first video processing unit that, at the time of shooting, generates and displays a simple composite video by performing first video processing for simply compositing CG onto video, and creates a metafile including meta information used to composite the CG onto the video; and a second video processing unit that, after shooting, generates a highly accurate composite video by performing second video processing for compositing CG onto the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed.

A video processing method or program according to one aspect of the present disclosure includes: at the time of shooting, generating and displaying a simple composite video by performing first video processing for simply compositing CG onto video, and creating a metafile including meta information used to composite the CG onto the video; and, after shooting, generating a highly accurate composite video by performing second video processing for compositing CG onto the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed.

In one aspect of the present disclosure, at the time of shooting, a simple composite video is generated and displayed by performing first video processing for simply compositing CG onto video, and a metafile including meta information used to composite the CG onto the video is created. After shooting, a highly accurate composite video is generated by performing second video processing for compositing CG onto the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed.

FIG. 1 is a diagram illustrating an example of use of a video processing system to which the present technology is applied.
FIG. 2 is a block diagram showing a configuration example of an embodiment of the video processing system.
FIG. 3 is a flowchart illustrating video processing for generating a simple composite video at the time of shooting.
FIG. 4 is a flowchart illustrating video processing for generating a highly accurate composite video at the time of main line video processing.
FIG. 5 is a diagram illustrating an example of use of 2DCG in the video processing system.
FIG. 6 is a diagram showing an example of a UI image for instructing execution of video processing at the time of shooting.
FIG. 7 is a diagram showing an example of a UI image for instructing execution of video processing at the time of main line video processing.
FIG. 8 is a block diagram showing a configuration example of an embodiment of a computer to which the present technology is applied.

Specific embodiments to which the present technology is applied are described in detail below with reference to the drawings.

<Example of use of the video processing system>
An example of use of a video processing system to which the present technology is applied will be described with reference to FIG. 1.

The upper part of FIG. 1 shows an example at the time of shooting using the video processing system 11. The camera 21 shoots the subject 12 while transmitting the video acquired by the shooting to the smartphone 22. Various communication standards, such as HDMI (High-Definition Multimedia Interface) (registered trademark), SDI (Serial Digital Interface), USB (Universal Serial Bus) (registered trademark), and wireless LAN (Local Area Network), can be used to transmit the video from the camera 21 to the smartphone 22. The smartphone 22 then performs video processing (tracking, composition processing, and rendering) that simply composites a 3DCG model onto the subject 12 shown in the video shot by the camera 21, and displays the resulting simple composite video.

Therefore, by using the video processing system 11, the user can check in real time on the smartphone 22, at the time of shooting, a simple composite video in which the 3DCG model is simply composited onto the subject 12. This allows the user to shoot with camerawork whose framing anticipates the compositing of the 3DCG model onto the subject 12. In addition, when reviewing the shot video, the user can check the simple composite video and can therefore consider on the spot, for example, whether the subject 12 needs to be re-shot.

In the video processing system 11, after the camera 21 has finished shooting the subject 12, video processing can be performed using the personal computer 23. That is, the video processing at the time of shooting composites the 3DCG model only in a simple manner, so the 3DCG model still needs to be composited with higher accuracy.

The lower part of FIG. 1 shows an example at the time of main line video processing using the video processing system 11. The personal computer 23 acquires a main line video file from the camera 21 via the recording medium 24, and acquires a 3DCG metafile by communicating with the smartphone 22. The main line video file is a file containing video onto which no 3DCG has been composited. The 3DCG metafile is a file containing the various kinds of meta information that were used to composite the 3DCG model onto the subject 12 in the simple video processing performed on the smartphone 22. The personal computer 23 then plays back the simple composite video using the 3DCG metafile while, in the background, performing video processing that composites the 3DCG model onto the subject 12 with high accuracy, thereby generating a higher-quality composite video.

In this way, by using the video processing system 11, the user can have the personal computer 23 generate a composite video with a higher degree of completion at the time of main line video processing, immediately after shooting with the camera 21 has finished. The user can therefore start editing, releasing, and otherwise working on the video more quickly.

Although FIG. 1 has been described in terms of video processing that composites a 3DCG model onto the subject 12, the present technology is not limited to such processing. For example, the present technology can also be applied to video processing that composites a 3DCG model into a space in the video other than the subject. That is, video processing can be performed that composites 3DCG models of a refrigerator, a bed, and the like into the space in a video of a room containing only a desk and chairs. When performing such video processing, the present technology tracks the camera 21 so that the position, orientation, and the like of the 3DCG model are composited to move with the space in the video in accordance with the movement of the camera 21.

Note that when the video processing system 11 is used with the smartphone 22 fixed to the camera 21, as shown in FIG. 1, the smartphone 22 may perform the simple video processing using either the output of the six-axis sensor provided in the camera 21 or the output of the six-axis sensor provided in the smartphone 22. On the other hand, when the video processing system 11 is used without the smartphone 22 fixed to the camera 21, the output of the six-axis sensor provided in the camera 21 must be supplied to the smartphone 22, and the smartphone 22 performs the simple video processing using that output.

<Configuration example of the video processing system>
FIG. 2 is a block diagram showing a configuration example of an embodiment of a video processing system to which the present technology is applied.

As shown in FIG. 2, the video processing system 11 includes a camera 21, a smartphone 22, and a personal computer 23, and is configured so that a 3DCG model is imported from a model storage unit 25 into the smartphone 22 and the personal computer 23.

The camera 21 includes a posture recognition unit 31, an imaging unit 32, a time code generation unit 33, a data transmission unit 34, a display unit 35, and a recording media drive 36.

The posture recognition unit 31 recognizes the posture of the camera 21 and sequentially supplies posture information indicating that posture to the data transmission unit 34. For example, the posture recognition unit 31 can recognize the posture of the camera 21 based on six-axis data indicating the six-axis acceleration or angular velocity detected by a six-axis sensor provided in the camera 21.

The imaging unit 32 shoots the subject 12 using an imaging element such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor, and supplies the video acquired by the shooting to the data transmission unit 34, the display unit 35, and the recording media drive 36.

The time code generation unit 33 generates a time code corresponding to the frame rate of the video shot by the imaging unit 32, and supplies it to the data transmission unit 34 and the recording media drive 36.
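As a minimal sketch of what a frame-rate-dependent time code could look like, the following converts a frame index into an `HH:MM:SS:FF` string. The non-drop-frame format, the integer frame rate, and the function name `frame_to_timecode` are assumptions made for illustration; the disclosure does not specify the time code format.

```python
def frame_to_timecode(frame_index: int, fps: int) -> str:
    """Convert a zero-based frame index to a non-drop-frame HH:MM:SS:FF string.

    Assumes an integer frame rate; drop-frame rates such as 29.97 fps
    are deliberately out of scope for this sketch.
    """
    if frame_index < 0 or fps <= 0:
        raise ValueError("frame_index must be >= 0 and fps must be > 0")
    total_seconds, frames = divmod(frame_index, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    # Wrap at 24 hours, as is conventional for time-of-day style time codes.
    return f"{hours % 24:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


if __name__ == "__main__":
    print(frame_to_timecode(1799, 30))  # 00:00:59:29
```

Because the frame count field rolls over at the frame rate, the same frame index maps to different time codes at different frame rates, which is why the generator needs to know the frame rate of the imaging unit.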

The data transmission unit 34 adds the time code supplied from the time code generation unit 33 to each frame of the video supplied from the imaging unit 32. Furthermore, to each frame of that video, the data transmission unit 34 adds the posture information obtained at the timing corresponding to that frame, selected from the posture information of the camera 21 sequentially supplied from the posture recognition unit 31. The data transmission unit 34 then sequentially transmits the video, with the time code and the posture information of the camera 21 added, to the smartphone 22.
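The per-frame tagging described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the types `PoseSample` and `TaggedFrame`, the shared time base in seconds, and the nearest-sample selection rule are all hypothetical, since the disclosure only says that the posture information obtained at the timing corresponding to each frame is added.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float, float]


@dataclass
class PoseSample:
    t: float     # sample time in seconds (assumed to share the video clock)
    pose: Pose   # posture information as XYZ values


@dataclass
class TaggedFrame:
    index: int
    timecode: str
    pose: Pose


def timecode(index: int, fps: int) -> str:
    """Non-drop-frame HH:MM:SS:FF time code for an integer frame rate."""
    seconds, frames = divmod(index, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def nearest_pose(samples: List[PoseSample], t: float) -> PoseSample:
    """Return the sample closest in time to t; samples must be sorted by t."""
    if not samples:
        raise ValueError("no pose samples available")
    times = [s.t for s in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    return before if t - before.t <= after.t - t else after


def tag_frames(n_frames: int, fps: int, samples: List[PoseSample]) -> List[TaggedFrame]:
    """Attach a time code and the nearest pose sample to each frame."""
    return [
        TaggedFrame(i, timecode(i, fps), nearest_pose(samples, i / fps).pose)
        for i in range(n_frames)
    ]
```

Selecting the nearest sample is only one option; interpolating between the two neighboring samples would be an equally plausible choice when the sensor rate is lower than the frame rate.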

The display unit 35 displays a live view video, which is the real-time video being shot by the imaging unit 32.

The recording media drive 36 adds the time code supplied from the time code generation unit 33 to each frame of the video shot by the imaging unit 32, and records a main line video file containing that video on the recording medium 24 (FIG. 1). The main line video file can also include, as necessary, a proxy file in which the resolution of the video shot by the imaging unit 32 is reduced. The posture information of the camera 21 may be included in the main line video file. However, if, for example, the 3DCG metafile described later includes the posture information of the camera 21 or the smartphone 22, the posture information of the camera 21 does not have to be included in the main line video file.

The smartphone 22 includes a posture recognition unit 41, a simple tracking unit 42, a simple composition processing unit 43, a simple rendering unit 44, a display unit 45, and a metafile creation unit 46.

The posture recognition unit 41 recognizes the posture of the smartphone 22 and sequentially supplies posture information indicating that posture to the simple tracking unit 42 and the metafile creation unit 46. For example, the posture recognition unit 41 can recognize the posture of the smartphone 22 based on six-axis data indicating the six-axis acceleration or angular velocity detected by a six-axis sensor provided in the smartphone 22.

The simple tracking unit 42 tracks the self-position of the camera 21 in a simple manner based on, for example, the posture information supplied from the posture recognition unit 41 and feature points extracted from the video (everything in the video, including the subject 12 and scenery other than the subject 12). For example, when the video processing system 11 is used with the smartphone 22 fixed to the camera 21 (see FIG. 1), the simple tracking unit 42 can use both the posture information of the smartphone 22 supplied from the posture recognition unit 41 and the posture information of the camera 21 added to the video transmitted from the camera 21. On the other hand, when the video processing system 11 is used without the smartphone 22 fixed to the camera 21, the simple tracking unit 42 can use only the posture information of the camera 21 added to the video transmitted from the camera 21. The simple tracking unit 42 then supplies tracking information indicating the result of the simple tracking of the self-position of the camera 21 (for example, information indicating the approximate position, orientation, and posture of the camera 21) to the simple composition processing unit 43.

Based on the tracking information supplied from the simple tracking unit 42, the simple composition processing unit 43 performs composition processing that simply composites the 3DCG model imported from the model storage unit 25 onto the video transmitted from the camera 21. For example, the simple composition processing unit 43 determines the approximate position, orientation, posture, and the like of the 3DCG model to be composited onto the video so that they correspond to the scenery in the video and to the position, orientation, posture, and the like of the subject 12. The simple composition processing unit 43 then generates simple 3DCG model composition information indicating the approximate position, orientation, posture, and the like of the 3DCG model, and supplies it to the simple rendering unit 44.

The simple rendering unit 44 places the 3DCG model based on the simple 3DCG model composition information supplied from the simple composition processing unit 43, sets a virtual camera corresponding to the camera 21 based on the tracking information, and renders the 3DCG model in a simple manner using that virtual camera. The simple rendering unit 44 then generates a simple composite video by superimposing the image of the 3DCG model obtained by the rendering on each frame of the video transmitted from the camera 21, and supplies the simple composite video to the display unit 45.
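The superimposition step can be illustrated with a per-pixel alpha blend. This sketch assumes that the rendered 3DCG image carries an 8-bit alpha channel and that both images have the same size; the function name `overlay_cg` and the nested-list pixel layout are hypothetical, and the disclosure does not state how the superimposition is actually carried out.

```python
from typing import List, Tuple

Rgb = Tuple[int, int, int]
Rgba = Tuple[int, int, int, int]


def overlay_cg(frame: List[List[Rgb]], cg: List[List[Rgba]]) -> List[List[Rgb]]:
    """Blend a rendered CG image (RGBA, alpha 0..255) over one video frame (RGB).

    Pixels where alpha is 0 keep the camera image; pixels where alpha is 255
    show the CG model; values in between are mixed linearly.
    """
    if len(frame) != len(cg) or any(len(f) != len(c) for f, c in zip(frame, cg)):
        raise ValueError("frame and CG image must have the same dimensions")
    out: List[List[Rgb]] = []
    for frame_row, cg_row in zip(frame, cg):
        row: List[Rgb] = []
        for (fr, fg, fb), (cr, cg_green, cb, a) in zip(frame_row, cg_row):
            alpha = a / 255
            row.append((
                round(cr * alpha + fr * (1 - alpha)),
                round(cg_green * alpha + fg * (1 - alpha)),
                round(cb * alpha + fb * (1 - alpha)),
            ))
        out.append(row)
    return out
```

A real implementation would run this on the GPU or with a vectorized array library; the nested loops here are only meant to make the per-frame blend explicit.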

The display unit 45 displays the simple composite video supplied from the simple rendering unit 44, that is, video in which the image of the 3DCG model rendered by the virtual camera corresponding to the camera 21 is simply composited onto the video shot by the camera 21.

The metafile creation unit 46 creates a 3DCG metafile containing the various kinds of meta information that were used to composite the 3DCG model onto the video in the simple video processing performed on the smartphone 22, and transmits it to the personal computer 23. For example, the 3DCG metafile includes at least the simple 3DCG model composition information (XYZ coordinates indicating the approximate position of the 3DCG model) with a time code added for each frame, and the posture information (XYZ coordinates) with a time code added. The 3DCG metafile can also include the model name of the 3DCG model, the approximate position (XYZ coordinates) of the 3DCG model at the starting point of shooting, zoom/focus/iris (plus gain/white balance) information, and the like.
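One possible shape for such a metafile is sketched below. The JSON serialization, the class names `MetaEntry` and `CgMetafile`, and all field names are assumptions made for illustration; the disclosure lists what information the 3DCG metafile contains but does not define a file format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple

Xyz = Tuple[float, float, float]


@dataclass
class MetaEntry:
    timecode: str        # time code of the frame this entry belongs to
    model_position: Xyz  # simple 3DCG model composition information (XYZ)
    camera_pose: Xyz     # posture information (XYZ)


@dataclass
class CgMetafile:
    model_name: Optional[str] = None              # optional per the description
    initial_model_position: Optional[Xyz] = None  # position at the start of shooting
    entries: List[MetaEntry] = field(default_factory=list)


def to_json(meta: CgMetafile) -> str:
    """Serialize the metafile; tuples are written as JSON arrays."""
    return json.dumps(asdict(meta), indent=2)


def from_json(text: str) -> CgMetafile:
    """Parse a metafile, converting JSON arrays back into tuples."""
    raw = json.loads(text)
    initial = raw.get("initial_model_position")
    return CgMetafile(
        model_name=raw.get("model_name"),
        initial_model_position=tuple(initial) if initial is not None else None,
        entries=[
            MetaEntry(e["timecode"], tuple(e["model_position"]), tuple(e["camera_pose"]))
            for e in raw["entries"]
        ],
    )
```

Keeping the time code inside every entry, rather than relying on list order, is what allows the entries to be matched against the main line video later even if frames are missing on either side.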

The personal computer 23 includes a recording media drive 51, a synchronization processing unit 52, a tracking unit 53, a composition processing unit 54, a rendering unit 55, and a display unit 56.

When the recording medium 24 (FIG. 1), on which the main line video file was recorded by the recording media drive 36 of the camera 21, is loaded, the recording media drive 51 reads the main line video file from the recording medium 24 and supplies it to the synchronization processing unit 52.

The synchronization processing unit 52 performs synchronization processing that synchronizes the 3DCG metafile transmitted from the smartphone 22 with the video contained in the main line video file supplied from the recording media drive 51, according to their respective time codes. The synchronization processing unit 52 then supplies the synchronized main line video file to the tracking unit 53 and the rendering unit 55, and supplies the synchronized 3DCG metafile to the composition processing unit 54 and the rendering unit 55.
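Synchronization by time code can be sketched as a keyed lookup. The function name `synchronize`, the dictionary-based entry layout, and the choice to pair unmatched frames with `None` are assumptions for illustration only; the disclosure states only that the two inputs are synchronized according to their time codes.

```python
from typing import Dict, List, Optional, Tuple


def synchronize(
    frame_timecodes: List[str],
    meta_entries: List[Dict],
) -> List[Tuple[str, Optional[Dict]]]:
    """Pair each main line video frame with the metafile entry that has the same time code.

    Frames with no matching entry are paired with None so that the caller
    can decide how to handle the gap (skip, interpolate, and so on).
    """
    by_timecode = {entry["timecode"]: entry for entry in meta_entries}
    return [(tc, by_timecode.get(tc)) for tc in frame_timecodes]


if __name__ == "__main__":
    frames = ["00:00:00:00", "00:00:00:01", "00:00:00:02"]
    meta = [
        {"timecode": "00:00:00:01", "model_position": (0.1, 0.0, 1.5)},
        {"timecode": "00:00:00:00", "model_position": (0.0, 0.0, 1.5)},
    ]
    for tc, entry in synchronize(frames, meta):
        print(tc, entry)
```

Because matching is done by key rather than by position, the metafile entries do not need to arrive in frame order.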

The tracking unit 53 tracks the self-position of the camera 21 with high accuracy based on, for example, the posture information contained in the 3DCG metafile supplied from the synchronization processing unit 52 and feature points extracted from the video contained in the main line video file supplied from the synchronization processing unit 52. The tracking unit 53 then supplies tracking information indicating the result of the highly accurate tracking of the self-position of the camera 21 (for example, information indicating the highly accurate position, orientation, and posture of the camera 21) to the composition processing unit 54.

Based on the tracking information supplied from the tracking unit 53, the composition processing unit 54 performs composition processing that composites, with high accuracy, the 3DCG model imported from the model storage unit 25 onto the video contained in the main line video file supplied from the synchronization processing unit 52. For example, the composition processing unit 54 determines the highly accurate position, orientation, posture, and the like of the 3DCG model to be composited onto the video so that they correspond to the scenery in the video and to the position, orientation, posture, and the like of the subject 12. The composition processing unit 54 then generates 3DCG model composition information indicating the highly accurate position, orientation, posture, and the like of the 3DCG model, and supplies it to the rendering unit 55.

The rendering unit 55 uses the simple 3DCG model composition information contained in the 3DCG metafile supplied from the synchronization processing unit 52 to generate a simple composite video in the same manner as the simple rendering unit 44, and supplies it to the display unit 56 for display. In the background, while the simple composite video generated by the rendering unit 55 is displayed on the display unit 56, the tracking by the tracking unit 53 and the composition processing by the composition processing unit 54 are performed.

Furthermore, in the background while the simple composite video is displayed on the display unit 56, the rendering unit 55 places the 3DCG model based on the 3DCG model composition information supplied from the composition processing unit 54, sets a virtual camera corresponding to the camera 21 based on the tracking information, and renders the 3DCG model with high accuracy using that virtual camera. The rendering unit 55 then generates a highly accurate composite video by superimposing the image of the 3DCG model obtained by the rendering on each frame of the video contained in the main line video file supplied from the synchronization processing unit 52, and supplies the highly accurate composite video to the display unit 56.

The display unit 56 displays the simple composite video supplied from the rendering unit 55 and then, when the highly accurate composite video is supplied from the rendering unit 55, displays that highly accurate composite video.
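The "preview in the foreground, accurate processing in the background" behavior can be sketched with a worker thread. This is a hedged illustration only: the function `run_with_preview` and its callback parameters are hypothetical, and a single worker thread stands in for whatever scheduling the personal computer 23 actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")
Image = TypeVar("Image")


def run_with_preview(
    frames: List[Frame],
    simple_render: Callable[[Frame], Image],
    precise_render: Callable[[Frame], Image],
    show: Callable[[Image], None],
) -> List[Image]:
    """Show the simple composite while the precise composite is computed in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Background: high-accuracy tracking, composition, and rendering (stand-in).
        future = pool.submit(lambda: [precise_render(f) for f in frames])
        # Foreground: play back the simple composite rebuilt from the metafile.
        for frame in frames:
            show(simple_render(frame))
        precise = future.result()  # wait here if the background work is still running
    # Once available, the highly accurate composite replaces the preview.
    for image in precise:
        show(image)
    return precise
```

All calls to `show` are made from the calling thread, so the display order is deterministic: every preview frame first, then the highly accurate result.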

The video processing system 11 is configured as described above. At the time of shooting, the user can shoot with camerawork that gives more appropriate framing while checking the simple composite video displayed on the smartphone 22. Furthermore, at the time of main line video processing, the user can edit the video with a concrete picture of the final composite video in mind by checking the simple composite video displayed on the personal computer 23. A higher-quality composite video can then be generated in the background while the personal computer 23 is displaying the simple composite video. The video processing system 11 can thereby generate a composite video with a higher degree of completion.

<Processing example of video processing>
A processing example of the video processing executed in the video processing system 11 will be described with reference to FIGS. 3 and 4. The example described here uses the posture information of the camera 21 and does not use the posture information of the smartphone 22.

FIG. 3 is a flowchart illustrating video processing for generating a simple composite video at the time of shooting. For example, the processing starts when the user performs an operation to carry out video processing in which the camera 21 and the smartphone 22 operate in cooperation.

In step S11, in the camera 21, the imaging unit 32 starts shooting the subject 12, the posture recognition unit 31 starts recognizing the posture of the camera 21, and the time code generation unit 33 starts generating the time code. The data transmission unit 34 then sequentially transmits the video, with the time code and the posture information of the camera 21 added, to the smartphone 22.

In step S12, the simple tracking unit 42 tracks the self-position of the camera 21 based on, for example, the posture information of the camera 21 added to the video transmitted in step S11 and feature points extracted from that video. The simple tracking unit 42 then supplies tracking information indicating the result of the simple tracking of the self-position of the camera 21 to the simple composition processing unit 43.

　ステップＳ１３において、簡易合成処理部４３は、ステップＳ１２で供給されるトラッキング情報に基づいて、ステップＳ１１で送信されてきた映像に対して3DCGモデルを簡易的に合成する合成処理を行い、簡易3DCGモデル合成情報を生成して簡易レンダリング部４４に供給する。 In step S13, the simple synthesis processing unit 43 performs synthesis processing to simply synthesize a 3DCG model onto the video transmitted in step S11 based on the tracking information supplied in step S12, and generates simple 3DCG model synthesis information and supplies it to the simple rendering unit 44.

　ステップＳ１４において、簡易レンダリング部４４は、ステップＳ１３で供給される簡易3DCGモデル合成情報に基づいて3DCGモデルを配置し、ステップＳ１２で供給されるトラッキング情報に基づいてカメラ２１に対応する仮想カメラを設定して、その仮想カメラを用いて3DCGモデルを簡易的にレンダリングする。そして、簡易レンダリング部４４は、レンダリングによって得られる3DCGモデルの画像を、カメラ２１から送信されてくる映像に重畳させることで簡易合成映像を生成し、表示部４５に供給する。 In step S14, the simple rendering unit 44 positions the 3DCG model based on the simple 3DCG model synthesis information supplied in step S13, sets a virtual camera corresponding to the camera 21 based on the tracking information supplied in step S12, and uses the virtual camera to simply render the 3DCG model. The simple rendering unit 44 then superimposes the image of the 3DCG model obtained by rendering onto the image transmitted from the camera 21 to generate a simple synthesized image and supplies it to the display unit 45.
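Setting a virtual camera that corresponds to the camera 21, as in step S14, can be illustrated with a pinhole projection. The sketch below is not the renderer of the disclosure; it assumes a camera-to-world rotation matrix, a focal length in pixels, a principal point at the image centre, and a camera frame with +z pointing forward. The class `VirtualCamera` and the function `project` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class VirtualCamera:
    position: Vec3                        # camera centre in world coordinates
    rotation: Sequence[Sequence[float]]   # 3x3 camera-to-world rotation matrix
    focal_px: float                       # focal length in pixels (assumed known)
    width: int                            # image width in pixels
    height: int                           # image height in pixels


def project(camera: VirtualCamera, point: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world-space point to pixel coordinates, or None if behind the camera."""
    # World -> camera coordinates: p_cam = R^T (p_world - c)
    d = [point[i] - camera.position[i] for i in range(3)]
    x, y, z = (sum(camera.rotation[r][c] * d[r] for r in range(3)) for c in range(3))
    if z <= 0:
        return None
    u = camera.focal_px * x / z + camera.width / 2
    v = camera.focal_px * y / z + camera.height / 2
    return (u, v)
```

If the pose of the virtual camera follows the tracking information frame by frame, a CG model anchored at a fixed world position moves across the image in the same way as the filmed scene, which is what allows the rendered image to be superimposed on the video.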

　ステップＳ１５において、表示部４５は、ステップＳ１４で供給された簡易合成映像を表示する。 In step S15, the display unit 45 displays the simplified composite image provided in step S14.

　ステップＳ１６において、メタファイル作成部４６は、例えば、ステップＳ１１で送信されてきた映像に付加されているカメラ２１の姿勢情報や、ステップＳ１３で生成された簡易3DCGモデル合成情報などを少なくとも含み、それらにタイムコードが付加された3DCGメタファイルを作成する。 In step S16, the metafile creation unit 46 creates a 3DCG metafile that includes at least the posture information of the camera 21 that was added to the video transmitted in step S11 and the simple 3DCG model synthesis information generated in step S13, for example, and to which a time code has been added.

　ステップＳ１６の処理後、処理はステップＳ１１に戻り、例えば、ユーザによって映像処理を終了する操作が行われるまで、以下、同様の処理が繰り返して行われる。 After step S16, the process returns to step S11, and the same process is repeated until, for example, the user performs an operation to end the video processing.

　図４は、本線映像処理時において高精度な合成映像を生成する映像処理を説明するフローチャートである。例えば、ユーザが、本線映像ファイルが記録された記録メディア２４をパーソナルコンピュータ２３にセットし、スマートフォン２２およびパーソナルコンピュータ２３の通信を接続させた後、映像処理を開始させる操作を行うと処理が開始される。 FIG. 4 is a flowchart explaining the image processing for generating a highly accurate composite image during main line image processing. For example, the process is started when the user inserts the recording medium 24 on which the main line image file is recorded into the personal computer 23, connects the smartphone 22 and the personal computer 23 for communication, and then performs an operation to start image processing.

　ステップＳ２１において、スマートフォン２２では、メタファイル作成部４６が、図３のステップＳ１６で作成した3DCGメタファイルを、パーソナルコンピュータ２３へ送信する。 In step S21, the metafile creation unit 46 in the smartphone 22 transmits the 3DCG metafile created in step S16 of FIG. 3 to the personal computer 23.

　ステップＳ２２において、記録メディアドライブ５１は、記録メディア２４から本線映像ファイルを読み出して、同期処理部５２に供給する。 In step S22, the recording media drive 51 reads the main line video file from the recording media 24 and supplies it to the synchronization processing unit 52.

　ステップＳ２３において、同期処理部５２は、ステップＳ２１で送信されてきた3DCGメタファイルと、ステップＳ２２で供給された本線映像ファイルに含まれている映像とを、それぞれのタイムコードに従って同期させる同期処理を行う。そして、同期処理部５２は、同期が取られた本線映像ファイルをトラッキング部５３およびレンダリング部５５に供給し、同期が取られた3DCGメタファイルを合成処理部５４およびレンダリング部５５に供給する。 In step S23, the synchronization processing unit 52 performs synchronization processing that synchronizes the 3DCG metafile transmitted in step S21 with the video contained in the main line video file supplied in step S22, according to their respective time codes. The synchronization processing unit 52 then supplies the synchronized main line video file to the tracking unit 53 and the rendering unit 55, and supplies the synchronized 3DCG metafile to the synthesis processing unit 54 and the rendering unit 55.
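The synchronization in step S23 is in essence a join on the time code. The following is a minimal sketch, assuming that each metafile entry is a dictionary with a `"timecode"` key and that each frame arrives as a `(timecode, frame)` pair; these data shapes and the function name `synchronize` are assumptions, not part of the disclosure.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def synchronize(
    frames: Iterable[Tuple[str, Any]],
    meta_entries: Iterable[Dict[str, Any]],
) -> List[Tuple[str, Any, Optional[Dict[str, Any]]]]:
    """Pair every video frame with the metafile entry that has the same time code.

    Frames keep their original order; a frame without a matching entry is
    paired with None so that the caller can decide how to handle the gap.
    """
    meta_by_timecode = {entry["timecode"]: entry for entry in meta_entries}
    return [(tc, frame, meta_by_timecode.get(tc)) for tc, frame in frames]
```

Indexing the metafile by time code makes the result independent of the order in which the entries were written or received.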

　ステップＳ２４において、レンダリング部５５は、ステップＳ２３で供給された3DCGメタファイルに含まれている簡易3DCGモデル合成情報を用いて、簡易レンダリング部４４と同様に簡易合成映像を生成し、表示部５６に供給して表示させる。 In step S24, the rendering unit 55 uses the simple 3DCG model synthesis information contained in the 3DCG metafile supplied in step S23 to generate a simple synthesis image in the same manner as the simple rendering unit 44, and supplies it to the display unit 56 for display.

　一方、ステップＳ２４の処理と並行して、ステップＳ２５乃至Ｓ２７の処理が行われる。 Meanwhile, steps S25 to S27 are carried out in parallel with step S24.

　ステップＳ２５において、トラッキング部５３は、ステップＳ２３で供給された3DCGメタファイルに含まれている姿勢情報や、ステップＳ２３で供給された本線映像ファイルに含まれている映像から抽出される特徴点などに基づいて、カメラ２１の自己位置を高精度にトラッキングする。そして、トラッキング部５３は、カメラ２１の自己位置を高精度にトラッキングした結果を示すトラッキング情報を、合成処理部５４に供給する。 In step S25, the tracking unit 53 tracks the self-position of the camera 21 with high accuracy based on the posture information included in the 3DCG metafile supplied in step S23 and feature points extracted from the image included in the main line image file supplied in step S23. The tracking unit 53 then supplies the synthesis processing unit 54 with tracking information indicating the results of tracking the self-position of the camera 21 with high accuracy.

　ステップＳ２６において、合成処理部５４は、ステップＳ２５で供給されたトラッキング情報に基づいて、ステップＳ２３で供給された本線映像ファイルに含まれている映像に対して3DCGモデルを高精度に合成する合成処理を行い、3DCGモデル合成情報を生成してレンダリング部５５に供給する。 In step S26, the synthesis processing unit 54 performs synthesis processing to synthesize a 3DCG model with high accuracy onto the image contained in the main line image file supplied in step S23 based on the tracking information supplied in step S25, and generates 3DCG model synthesis information and supplies it to the rendering unit 55.

　ステップＳ２７において、レンダリング部５５は、ステップＳ２６で供給された3DCGモデル合成情報に基づいて3DCGモデルを配置し、ステップＳ２５で供給されたトラッキング情報に基づいてカメラ２１に対応する仮想カメラを設定し、その仮想カメラを用いて3DCGモデルを高精度にレンダリングする。そして、レンダリング部５５は、レンダリングによって得られる3DCGモデルの画像を、同期処理部５２から供給される本線映像ファイルに含まれている映像の１フレームごとに重畳させることで高精度な合成映像を生成し、表示部５６に供給する。 In step S27, the rendering unit 55 positions the 3DCG model based on the 3DCG model synthesis information supplied in step S26, sets a virtual camera corresponding to camera 21 based on the tracking information supplied in step S25, and uses the virtual camera to render the 3DCG model with high precision. The rendering unit 55 then superimposes the image of the 3DCG model obtained by rendering onto each frame of the video included in the main line video file supplied from the synchronization processing unit 52 to generate a highly accurate synthesized video, and supplies it to the display unit 56.

　ステップＳ２８において、表示部５６は、ステップＳ２４で表示した簡易合成映像に替えて、ステップＳ２７で供給された高精度な合成映像を表示する。 In step S28, the display unit 56 displays the high-precision composite image provided in step S27, replacing the simplified composite image displayed in step S24.
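The relationship between steps S24 to S28 (show the simple composite first, compute the highly accurate composite in the background, then replace the display) can be sketched with a worker thread. This is an illustration under assumptions only: the three callables stand in for the rendering and display units, and `process_with_preview` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def process_with_preview(
    render_preview: Callable[[], Any],
    render_final: Callable[[], Any],
    display: Callable[[Any], None],
) -> None:
    """Display a quick preview while the expensive result is computed in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Corresponds to steps S25 to S27: tracking, synthesis and rendering run in the background.
        future = pool.submit(render_final)
        # Corresponds to step S24: the simple composite is shown in the meantime.
        display(render_preview())
        # Corresponds to step S28: the preview is replaced once the accurate result is ready.
        display(future.result())
```

For example, `process_with_preview(lambda: "preview", lambda: "final", print)` prints the preview first and the final result second, regardless of how long the background task takes.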

　ステップＳ２８の処理後、処理はステップＳ２１に戻り、例えば、ユーザによって映像処理を終了させる操作が行われるまで、以下、同様の処理が繰り返して行われる。 After step S28, the process returns to step S21, and the same process is repeated until, for example, the user performs an operation to end the video processing.

　以上のような映像処理が実行されることで、映像処理システム１１では、撮影時に簡易合成映像を表示し、本線映像処理時に簡易合成映像を表示するバックグラウンドで合成映像を生成することで、より完成度の高い合成映像を生成することができる。 By executing the video processing described above, the video processing system 11 displays the simple composite image during shooting and, during main line video processing, generates the composite image in the background while the simple composite image is displayed, and can thereby generate a more complete composite image.

　＜映像処理システムの利用例＞ <Examples of using the video processing system>
　図５を参照して、映像処理システムで２Ｄモデルを利用する利用例について説明する。 An example of using a 2D model in the video processing system will be described with reference to FIG. 5.

　上述の図１を参照して説明したように、映像処理システム１１は、撮影時および本線映像処理時において、映像に映されている被写体に対して3DCGモデルを合成することができる。さらに、映像処理システム１１は、3DCGモデルと同様に、被写体に対して2Dモデルを合成することができる。 As described above with reference to FIG. 1, the image processing system 11 can synthesize a 3DCG model for a subject shown in an image when shooting and processing main line images. Furthermore, the image processing system 11 can synthesize a 2D model for a subject in the same way as a 3DCG model.

　図５の上側に示すように、カメラ２１が、被写体１３の撮影を行いながら、その撮影により取得される映像をスマートフォン２２へ送信し、スマートフォン２２が、その映像に映されている被写体１３に2Dモデルを簡易的に合成する映像処理を行って、その結果として得られる簡易合成映像を表示する。なお、図５に示す例では、2Dモデルとしてモザイク画像が図示されているが、その他、スタンプなどの画像を2Dモデルとして利用することができる。 As shown in the upper part of Figure 5, camera 21 photographs subject 13 while transmitting the image acquired by the photograph to smartphone 22, which then performs image processing to simply combine a 2D model with subject 13 shown in the image, and displays the resulting simplified composite image. Note that in the example shown in Figure 5, a mosaic image is shown as the 2D model, but other images such as stamps can also be used as the 2D model.

　その後、図５の下側に示すように、パーソナルコンピュータ２３が、カメラ２１から記録メディア２４を介して本線映像ファイルを取得するとともに、スマートフォン２２と通信を行うことによって2Dメタファイルを取得する。2Dメタファイルは、スマートフォン２２において行われた簡易的な映像処理で被写体１３に2Dモデルを合成するのに用いられた各種のメタ情報が含まれるファイルである。そして、パーソナルコンピュータ２３は、2Dメタファイルを用いて簡易合成映像を再生しながら、そのバックグラウンドで、被写体１３に2Dモデルを高精度に合成する映像処理を行って、より高品位な合成映像を生成する。もちろん、被写体１３に2Dモデルを合成する他、映像内の被写体１３以外の空間に2Dモデルを合成してもよい。 Then, as shown in the lower part of FIG. 5, the personal computer 23 acquires the main line video file from the camera 21 via the recording medium 24, and acquires a 2D metafile by communicating with the smartphone 22. The 2D metafile is a file that contains the various meta information used to composite the 2D model onto the subject 13 in the simple video processing performed on the smartphone 22. The personal computer 23 then plays back the simple composite video using the 2D metafile while, in the background, performing video processing that composites the 2D model onto the subject 13 with high accuracy, thereby generating a higher quality composite video. Of course, besides compositing the 2D model onto the subject 13, the 2D model may also be composited into a region of the video other than the subject 13.

　このように、映像処理システム１１を利用することで、図１を参照して上述した3DCGモデルを利用する利用例と同様に、より完成度の高い合成映像をパーソナルコンピュータ２３に生成させることができる。 In this way, by using the video processing system 11, it is possible to generate a more complete composite image on the personal computer 23, similar to the example of using the 3DCG model described above with reference to FIG. 1.

　例えば、2Dメタファイルには、１フレームごとにタイムコードが付加された簡易2Dモデル合成情報（2Dモデルの簡易的な位置情報を示すXYZ座標）が少なくとも含まれる。また、2Dメタファイルには、2Dモデルのモデル名や、ズーム／フォーカス／アイリス（＋ゲイン／ホワイトバランス）情報などを含むことができる。なお、2Dモデルは、カメラ２１の動きに合わせて向きを変更させる必要はない。 For example, the 2D metafile contains at least simple 2D model synthesis information (XYZ coordinates indicating simple position information of the 2D model) with a time code added for each frame. The 2D metafile can also contain the model name of the 2D model, zoom/focus/iris (+gain/white balance) information, etc. It is not necessary to change the orientation of the 2D model in accordance with the movement of the camera 21.
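The contents listed for the 2D metafile can be pictured as a small per-frame record. The sketch below assumes a JSON serialization and places the optional zoom/focus/iris values on each frame entry; the text prescribes neither the file format nor the field layout, so the names `MetaEntry`, `Metafile2D`, and every field name are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MetaEntry:
    timecode: str                              # time code of the frame
    position_xyz: Tuple[float, float, float]   # simple position of the 2D model
    zoom: Optional[float] = None               # optional lens information
    focus: Optional[float] = None
    iris: Optional[float] = None


@dataclass
class Metafile2D:
    model_name: str
    entries: List[MetaEntry] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the metafile; tuples become JSON arrays."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Metafile2D":
        """Rebuild the metafile, restoring the position arrays as tuples."""
        raw = json.loads(text)
        entries = [
            MetaEntry(**{**e, "position_xyz": tuple(e["position_xyz"])})
            for e in raw["entries"]
        ]
        return cls(model_name=raw["model_name"], entries=entries)
```

A record of this kind is small compared with the video itself, which is consistent with transferring it separately from the main line video file.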

　＜ＵＩイメージの表示例＞ <UI image display example>
　図６および図７を参照して、映像処理システム１１を利用した映像処理の実行を指示するＵＩ（User Interface）イメージについて説明する。 A UI (User Interface) image for instructing the execution of video processing using the video processing system 11 will be described with reference to FIGS. 6 and 7.

　図６は、撮影時において映像処理の実行を指示するＵＩイメージの一例を示す図である。 Figure 6 shows an example of a UI image that instructs the execution of video processing during shooting.

　図６のＡには、カメラ２１の表示部３５に表示されるＵＩイメージが示されており、「簡易合成を行うために姿勢情報および映像を連携しますか？」というメッセージとともに、「Ｙｅｓ」のGUI（Graphical User Interface）ボタンおよび「Ｎｏ」のGUIボタンが表示されている。図６のＢには、スマートフォン２２の表示部４５に表示されるＵＩイメージが示されており、「簡易合成を行うために姿勢情報、映像、および2D/3DCGモデルを受信しますか？」というメッセージとともに、「Ｙｅｓ」のGUIボタンおよび「Ｎｏ」のGUIボタンが表示されている。 A in FIG. 6 shows a UI image displayed on the display unit 35 of the camera 21, displaying a "Yes" GUI (Graphical User Interface) button and a "No" GUI button along with the message "Do you want to link posture information and images to perform simple composition?". B in FIG. 6 shows a UI image displayed on the display unit 45 of the smartphone 22, displaying a "Yes" GUI button and a "No" GUI button along with the message "Do you want to receive posture information, images, and 2D/3DCG models to perform simple composition?".

　例えば、ユーザが、図示するようなＵＩイメージを表示部３５および表示部４５に表示させ、それぞれの「Ｙｅｓ」のGUIボタンに対するユーザ操作を行うと、カメラ２１からスマートフォン２２へ姿勢情報および映像の送信が開始される。そして、スマートフォン２２では、姿勢情報および映像の受信が開始され、モデル記憶部２５から2D/3DCGモデルのインポートが行われる。なお、2D/3DCGモデルのインポートは事前に行われるようにしてもよい。 For example, when a user causes a UI image as shown in the figure to be displayed on display unit 35 and display unit 45, and performs a user operation on each of the "Yes" GUI buttons, transmission of posture information and video from camera 21 to smartphone 22 begins. Then, smartphone 22 begins receiving posture information and video, and imports the 2D/3DCG model from model storage unit 25. Note that the import of the 2D/3DCG model may be performed in advance.

　図７は、本線映像処理時において映像処理の実行を指示するＵＩイメージの一例を示す図である。 FIG. 7 shows an example of a UI image that instructs the execution of video processing during main line video processing.

　図７のＡには、カメラ２１の表示部３５に表示されるＵＩイメージが示されており、「本線映像の編集を行うために映像を送信しますか？」というメッセージとともに、「Ｙｅｓ」のGUIボタンおよび「Ｎｏ」のGUIボタンが表示されている。図７のＢには、スマートフォン２２の表示部４５に表示されるＵＩイメージが示されており、「本線映像の編集を行うために2D/3DCGメタファイルを送信しますか？」というメッセージとともに、「Ｙｅｓ」のGUIボタンおよび「Ｎｏ」のGUIボタンが表示されている。図７のＣには、パーソナルコンピュータ２３の表示部５６に表示されるＵＩイメージが示されており、「本線映像の編集を行うために2D/3DCGメタファイル、映像、および2D/3DCGモデルを受信しますか？」というメッセージとともに、「Ｙｅｓ」のGUIボタンおよび「Ｎｏ」のGUIボタンが表示されている。 A of FIG. 7 shows a UI image displayed on the display unit 35 of the camera 21, in which a "Yes" GUI button and a "No" GUI button are displayed along with the message "Do you want to send video to edit the main line video?". B of FIG. 7 shows a UI image displayed on the display unit 45 of the smartphone 22, in which a "Yes" GUI button and a "No" GUI button are displayed along with the message "Do you want to send a 2D/3DCG metafile to edit the main line video?". C of FIG. 7 shows a UI image displayed on the display unit 56 of the personal computer 23, in which a "Yes" GUI button and a "No" GUI button are displayed along with the message "Do you want to receive a 2D/3DCG metafile, video, and 2D/3DCG model to edit the main line video?".

　例えば、ユーザが、図示するようなＵＩイメージを表示部３５、表示部４５、および表示部５６に表示させ、それぞれの「Ｙｅｓ」のGUIボタンに対するユーザ操作を行うと、カメラ２１からパーソナルコンピュータ２３へ映像の送信が開始されるとともに、スマートフォン２２からパーソナルコンピュータ２３へ2D/3DCGメタファイルの送信が開始される。そして、パーソナルコンピュータ２３では、映像および2D/3DCGメタファイルの受信が開始され、モデル記憶部２５から2D/3DCGモデルのインポートが行われる。なお、2D/3DCGモデルのインポートは事前に行われるようにしてもよい。 For example, when a user causes a UI image as shown in the figure to be displayed on display unit 35, display unit 45, and display unit 56, and performs a user operation on each of the "Yes" GUI buttons, transmission of video from camera 21 to personal computer 23 begins, and transmission of a 2D/3DCG metafile from smartphone 22 to personal computer 23 begins. Then, personal computer 23 begins receiving the video and 2D/3DCG metafile, and imports the 2D/3DCG model from model storage unit 25. Note that the import of the 2D/3DCG model may be performed in advance.

　映像処理システム１１では、このようなＵＩイメージを用いて、映像処理の実行を指示することができる。 In the video processing system 11, such a UI image can be used to instruct the execution of video processing.

　本実施の形態において、ＣＧの合成処理とは、カメラ２１で撮影した映像に対して後段の編集でＣＧを合成することである。ＣＧは、カメラ２１の動きに合わせて移動する必要もある。また、トラッキングとは、カメラ２１で撮影した映像内の特徴点、例えば、映像内のオブジェクトなどをトラッキングして、２Ｄトラッキングや３Ｄカメラトラッキングを行うことである。２Ｄトラッキングは、カメラ２１で撮影した映像の特徴点や加速度センサを利用して映像内の物体をトラッキングすることであり、例えば、映像内の人間に常にモザイクを追従させたいときなどに使用される。３Ｄカメラトラッキングは、カメラ２１で撮影した映像はどのようにカメラ２１を動かして撮影したのかを加速度センサや特徴点などに基づいて解析し、仮想カメラを生成することである。一般的には、３Ｄカメラトラッキングは、ＣＧと仮想カメラをリンクさせることで、映像の動きに合わせて合成したＣＧも同じように動き、違和感のない合成が行えるようになる。 In this embodiment, CG synthesis processing refers to synthesizing CG with the image captured by the camera 21 in a later editing stage. The CG also needs to move in accordance with the movement of the camera 21. Tracking refers to tracking feature points in the image captured by the camera 21, such as objects in the image, to perform 2D tracking or 3D camera tracking. 2D tracking refers to tracking objects in the image using the feature points of the image captured by the camera 21 and an acceleration sensor, and is used, for example, when it is desired to have a mosaic always follow a person in the image. 3D camera tracking refers to analyzing how the camera 21 was moved to capture the image based on an acceleration sensor and feature points, and generating a virtual camera. Generally, 3D camera tracking links the CG and the virtual camera, so that the synthesized CG moves in the same way in accordance with the movement of the image, allowing for seamless synthesis.
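The 2D tracking use case mentioned above, a mosaic that follows a person, comes down to pixelating the tracked rectangle in every frame. The following is a minimal sketch on a grayscale image stored as nested lists; the tracker that supplies the rectangle is out of scope, and the function name `apply_mosaic` and its parameters are assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def apply_mosaic(image: Image, left: int, top: int,
                 width: int, height: int, block: int = 2) -> Image:
    """Return a copy of the image with the given rectangle pixelated.

    Each block x block cell inside the rectangle is replaced by the integer
    mean of its pixels; the rectangle is clipped to the image bounds.
    """
    out = [row[:] for row in image]
    rows = len(image)
    cols = len(image[0]) if image else 0
    bottom = min(top + height, rows)
    right = min(left + width, cols)
    for by in range(max(top, 0), bottom, block):
        for bx in range(max(left, 0), right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            pixels = [image[y][x] for y in ys for x in xs]
            mean = sum(pixels) // len(pixels)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

Calling the function once per frame with the rectangle reported by the 2D tracker keeps the mosaic on the subject as it moves through the video.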

　なお、カメラ２１が備えるジャイロは、６軸または８軸のセンサであり、あるいは、それ以外のセンサであってもよい。また、スマートフォン２２に替えて、タブレットを使用したり、外付けジャイロセンサと外付けモニタとを組み合わせて使用したりすることができる。さらに、カメラ２１が、３Ｄトラッキングおよび合成処理を行う機能を備えている場合、スマートフォン２２などを外付けする必要はなく、上述したスマートフォン２２の処理をカメラ２１で実行することができる。また、カメラ２１による撮影と同時に簡易合成映像を生成するのではなく、カメラ２１によって撮影済みの映像を用いて簡易合成映像を生成してもよい。 The gyro provided in the camera 21 may be a 6-axis or 8-axis sensor, or may be another sensor. Also, instead of the smartphone 22, a tablet may be used, or an external gyro sensor may be used in combination with an external monitor. Furthermore, if the camera 21 has a function for performing 3D tracking and synthesis processing, there is no need to attach an external smartphone 22, and the above-mentioned processing of the smartphone 22 can be performed by the camera 21. Also, instead of generating a simple composite image at the same time as shooting with the camera 21, a simple composite image may be generated using images already shot by the camera 21.

　また、スマートフォン２２がパーソナルコンピュータ２３と同等の処理能力を備えている場合、スマートフォン２２で本線映像処理を行ってもよく、この場合、本線映像ファイルおよび3DCGメタファイルをパーソナルコンピュータ２３に送信する必要はない。さらに、カメラ２１がスマートフォン２２およびパーソナルコンピュータ２３と同等の処理能力を備えていれば、全ての映像処理をカメラ２１の内部で実行するようにしてもよく、この場合、カメラ２１から外部に映像を送信する必要はない。 Furthermore, if the smartphone 22 has the same processing capability as the personal computer 23, the main line image processing may be performed by the smartphone 22, in which case there is no need to transmit the main line image file and the 3DCG metafile to the personal computer 23. Furthermore, if the camera 21 has the same processing capability as the smartphone 22 and the personal computer 23, all image processing may be performed inside the camera 21, in which case there is no need to transmit the image from the camera 21 to the outside.

　＜コンピュータの構成例＞ <Example of computer configuration>
　次に、上述した一連の処理（映像処理方法）は、ハードウェアにより行うこともできるし、ソフトウェアにより行うこともできる。一連の処理をソフトウェアによって行う場合には、そのソフトウェアを構成するプログラムが、汎用のコンピュータ等にインストールされる。 Next, the series of processes described above (the video processing method) can be performed by hardware or by software. When the series of processes is performed by software, a program constituting the software is installed in a general-purpose computer or the like.

　図８は、上述した一連の処理を実行するプログラムがインストールされるコンピュータの一実施の形態の構成例を示すブロック図である。 FIG. 8 is a block diagram showing an example of the configuration of one embodiment of a computer on which a program that executes the series of processes described above is installed.

　プログラムは、コンピュータに内蔵されている記録媒体としてのハードディスク１０５やROM１０３に予め記録しておくことができる。 The program can be pre-recorded on the hard disk 105 or ROM 103 as a recording medium built into the computer.

　あるいはまた、プログラムは、ドライブ１０９によって駆動されるリムーバブル記録媒体１１１に格納（記録）しておくことができる。このようなリムーバブル記録媒体１１１は、いわゆるパッケージソフトウェアとして提供することができる。ここで、リムーバブル記録媒体１１１としては、例えば、フレキシブルディスク、CD-ROM(Compact Disc Read Only Memory)，MO(Magneto Optical)ディスク，DVD(Digital Versatile Disc)、磁気ディスク、半導体メモリ等がある。 Alternatively, the program can be stored (recorded) on a removable recording medium 111 driven by the drive 109. Such a removable recording medium 111 can be provided as so-called packaged software. Here, examples of the removable recording medium 111 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, a semiconductor memory, etc.

　なお、プログラムは、上述したようなリムーバブル記録媒体１１１からコンピュータにインストールする他、通信網や放送網を介して、コンピュータにダウンロードし、内蔵するハードディスク１０５にインストールすることができる。すなわち、プログラムは、例えば、ダウンロードサイトから、ディジタル衛星放送用の人工衛星を介して、コンピュータに無線で転送したり、LAN(Local Area Network)、インターネットといったネットワークを介して、コンピュータに有線で転送することができる。 In addition to being installed in a computer from the removable recording medium 111 as described above, the program can also be downloaded to the computer via a communication network or broadcasting network and installed in the built-in hard disk 105. That is, the program can be transferred to the computer wirelessly from a download site via an artificial satellite for digital satellite broadcasting, or transferred to the computer via a wired connection via a network such as a LAN (Local Area Network) or the Internet.

　コンピュータは、CPU(Central Processing Unit)１０２を内蔵しており、CPU１０２には、バス１０１を介して、入出力インタフェース１１０が接続されている。 The computer has a built-in CPU (Central Processing Unit) 102, to which an input/output interface 110 is connected via a bus 101.

　CPU１０２は、入出力インタフェース１１０を介して、ユーザによって、入力部１０７が操作等されることにより指令が入力されると、それに従って、ROM(Read Only Memory)１０３に格納されているプログラムを実行する。あるいは、CPU１０２は、ハードディスク１０５に格納されたプログラムを、RAM(Random Access Memory)１０４にロードして実行する。 When a command is input by the user via the input/output interface 110, such as by operating the input unit 107, the CPU 102 executes a program stored in the ROM (Read Only Memory) 103 accordingly. Alternatively, the CPU 102 loads a program stored on the hard disk 105 into the RAM (Random Access Memory) 104 and executes it.

　これにより、CPU１０２は、上述したフローチャートにしたがった処理、あるいは上述したブロック図の構成により行われる処理を行う。そして、CPU１０２は、その処理結果を、必要に応じて、例えば、入出力インタフェース１１０を介して、出力部１０６から出力、あるいは、通信部１０８から送信、さらには、ハードディスク１０５に記録等させる。 As a result, the CPU 102 performs processing according to the above-mentioned flowchart, or processing performed by the configuration of the above-mentioned block diagram. Then, the CPU 102 outputs the processing results from the output unit 106 via the input/output interface 110, or transmits them from the communication unit 108, or even records them on the hard disk 105, as necessary.

　なお、入力部１０７は、キーボードや、マウス、マイク等で構成される。また、出力部１０６は、LCD(Liquid Crystal Display)やスピーカ等で構成される。 The input unit 107 is composed of a keyboard, mouse, microphone, etc. The output unit 106 is composed of an LCD (Liquid Crystal Display), speaker, etc.

　ここで、本明細書において、コンピュータがプログラムに従って行う処理は、必ずしもフローチャートとして記載された順序に沿って時系列に行われる必要はない。すなわち、コンピュータがプログラムに従って行う処理は、並列的あるいは個別に実行される処理（例えば、並列処理あるいはオブジェクトによる処理）も含む。 In this specification, the processing performed by a computer according to a program does not necessarily have to be performed in chronological order according to the order described in the flowchart. In other words, the processing performed by a computer according to a program also includes processing that is executed in parallel or individually (for example, parallel processing or processing by objects).

　また、プログラムは、１のコンピュータ（プロセッサ）により処理されるものであっても良いし、複数のコンピュータによって分散処理されるものであっても良い。さらに、プログラムは、遠方のコンピュータに転送されて実行されるものであっても良い。 The program may be processed by one computer (processor), or may be distributed among multiple computers. Furthermore, the program may be transferred to a remote computer for execution.

　さらに、本明細書において、システムとは、複数の構成要素（装置、モジュール（部品）等）の集合を意味し、すべての構成要素が同一筐体中にあるか否かは問わない。したがって、別個の筐体に収納され、ネットワークを介して接続されている複数の装置、及び、１つの筐体の中に複数のモジュールが収納されている１つの装置は、いずれも、システムである。 Furthermore, in this specification, a system refers to a collection of multiple components (devices, modules (parts), etc.), regardless of whether all the components are in the same housing. Therefore, multiple devices housed in separate housings and connected via a network, and a single device in which multiple modules are housed in a single housing, are both systems.

　また、例えば、１つの装置（または処理部）として説明した構成を分割し、複数の装置（または処理部）として構成するようにしてもよい。逆に、以上において複数の装置（または処理部）として説明した構成をまとめて１つの装置（または処理部）として構成されるようにしてもよい。また、各装置（または各処理部）の構成に上述した以外の構成を付加するようにしてももちろんよい。さらに、システム全体としての構成や動作が実質的に同じであれば、ある装置（または処理部）の構成の一部を他の装置（または他の処理部）の構成に含めるようにしてもよい。 Also, for example, the configuration described above as one device (or processing unit) may be divided and configured as multiple devices (or processing units). Conversely, the configurations described above as multiple devices (or processing units) may be combined and configured as one device (or processing unit). Of course, configurations other than those described above may also be added to the configuration of each device (or each processing unit). Furthermore, as long as the configuration and operation of the system as a whole are substantially the same, part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit).

　また、例えば、本技術は、１つの機能を、ネットワークを介して複数の装置で分担、共同して処理するクラウドコンピューティングの構成をとることができる。 Also, for example, this technology can be configured as cloud computing, in which a single function is shared and processed collaboratively by multiple devices via a network.

　また、例えば、上述したプログラムは、任意の装置において実行することができる。その場合、その装置が、必要な機能（機能ブロック等）を有し、必要な情報を得ることができるようにすればよい。 Furthermore, for example, the above-mentioned program can be executed in any device. In that case, it is sufficient that the device has the necessary functions (functional blocks, etc.) and is able to obtain the necessary information.

　また、例えば、上述のフローチャートで説明した各ステップは、１つの装置で実行する他、複数の装置で分担して実行することができる。さらに、１つのステップに複数の処理が含まれる場合には、その１つのステップに含まれる複数の処理は、１つの装置で実行する他、複数の装置で分担して実行することができる。換言するに、１つのステップに含まれる複数の処理を、複数のステップの処理として実行することもできる。逆に、複数のステップとして説明した処理を１つのステップとしてまとめて実行することもできる。 Furthermore, for example, each step described in the above flowchart can be executed by one device, or can be shared and executed by multiple devices. Furthermore, if one step includes multiple processes, the multiple processes included in that one step can be executed by one device, or can be shared and executed by multiple devices. In other words, multiple processes included in one step can be executed as multiple step processes. Conversely, processes described as multiple steps can be executed collectively as one step.

　なお、コンピュータが実行するプログラムは、プログラムを記述するステップの処理が、本明細書で説明する順序に沿って時系列に実行されるようにしても良いし、並列に、あるいは呼び出しが行われたとき等の必要なタイミングで個別に実行されるようにしても良い。つまり、矛盾が生じない限り、各ステップの処理が上述した順序と異なる順序で実行されるようにしてもよい。さらに、このプログラムを記述するステップの処理が、他のプログラムの処理と並列に実行されるようにしても良いし、他のプログラムの処理と組み合わせて実行されるようにしても良い。 In addition, the processing of the steps that describe a program executed by a computer may be executed chronologically in the order described in this specification, or may be executed in parallel, or individually at the required timing, such as when a call is made. In other words, as long as no contradictions arise, the processing of each step may be executed in an order different from the order described above. Furthermore, the processing of the steps that describe this program may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.

　なお、本明細書において複数説明した本技術は、矛盾が生じない限り、それぞれ独立に単体で実施することができる。もちろん、任意の複数の本技術を併用して実施することもできる。例えば、いずれかの実施の形態において説明した本技術の一部または全部を、他の実施の形態において説明した本技術の一部または全部と組み合わせて実施することもできる。また、上述した任意の本技術の一部または全部を、上述していない他の技術と併用して実施することもできる。 Note that the multiple present technologies described in this specification can be implemented independently and individually, provided no contradictions arise. Of course, any multiple present technologies can also be implemented in combination. For example, part or all of the present technology described in any embodiment can be implemented in combination with part or all of the present technology described in another embodiment. Also, part or all of any of the present technologies described above can be implemented in combination with other technologies not described above.

　＜構成の組み合わせ例＞
　なお、本技術は以下のような構成も取ることができる。
（１）
　撮影時に、映像に対してＣＧ（Computer Graphics）を簡易的に合成する第１の映像処理を行うことで簡易合成映像を生成して表示させ、前記映像に対して前記ＣＧを合成するのに用いられたメタ情報を含むメタファイルを作成する第１の映像処理部と、
　撮影後、前記メタファイルを用いて生成される前記簡易合成映像が表示されるバックグラウンドで、前記映像に対してＣＧを高精度に合成する第２の映像処理を行うことで高精度な合成映像を生成する第２の映像処理部と
　を備える映像処理システム。
（２）
　撮影を行うカメラをさらに備え、
　前記カメラの姿勢を示す姿勢情報に基づいて、前記第１の映像処理部および前記第２の映像処理部が行われる
　上記（１）に記載の映像処理システム。
（３）
　前記第１の映像処理部は、
　　前記姿勢情報に基づいて特定される前記カメラの撮影方向、および、前記映像から抽出される特徴点に基づいて、前記カメラを簡易的にトラッキングする簡易トラッキング部と、
　　前記トラッキングの結果に基づいて、前記映像に前記ＣＧを合成するのに必要な簡易合成情報を生成する簡易合成処理部と、
　　前記簡易合成情報に基づいて前記ＣＧを配置して、前記姿勢情報に基づいて特定される前記カメラに対応する仮想カメラを設定して、その仮想カメラを用いて前記ＣＧを簡易的にレンダリングして前記簡易合成映像を生成する簡易レンダリング部と
　を有する上記（２）に記載の映像処理システム。
（４）
　前記カメラは、前記カメラにより撮影された前記映像のフレームを特定するタイムコードを生成するタイムコード生成部を有し、
　前記第２の映像処理部は、前記メタファイルに付加されている前記タイムコード、および、前記映像のフレームごとに付加されている前記タイムコードに従って、前記メタファイルと前記映像とを同期させる同期処理部を有する
　上記（３）に記載の映像処理システム。
（５）
　前記第２の映像処理部は、
　　前記姿勢情報に基づいて特定される前記カメラの撮影方向、および、前記映像から抽出される特徴点に基づいて、前記カメラを高精度にトラッキングするトラッキング部と、
　　前記トラッキングの結果に基づいて、前記映像に前記ＣＧを合成するのに必要な合成情報を生成する合成処理部と、
　　前記合成情報に基づいて前記ＣＧを配置して、前記姿勢情報に基づいて特定される前記カメラに対応する仮想カメラを設定して、その仮想カメラを用いて前記ＣＧを高精度にレンダリングして前記合成映像を生成するレンダリング部と
　を有する上記（４）に記載の映像処理システム。
（６）
　前記レンダリング部が、前記メタファイルに含まれている前記簡易合成情報を用いて前記簡易合成映像を生成して表示させているバックグラウンドで、前記トラッキング部によるトラッキング、前記合成処理部による前記合成情報の生成、前記レンダリング部による前記合成映像の生成が行われる
　上記（５）に記載の映像処理システム。
（７）
　映像処理システムが、
　撮影時に、映像に対してＣＧ（Computer Graphics）を簡易的に合成する第１の映像処理を行うことで簡易合成映像を生成して表示させ、前記映像に対して前記ＣＧを合成するのに用いられたメタ情報を含むメタファイルを作成することと、
　撮影後、前記メタファイルを用いて生成される前記簡易合成映像が表示されるバックグラウンドで、前記映像に対してＣＧを高精度に合成する第２の映像処理を行うことで高精度な合成映像を生成することと
　を含む映像処理方法。
（８）
　映像処理システムのコンピュータに、
　撮影時に、映像に対してＣＧ（Computer Graphics）を簡易的に合成する第１の映像処理を行うことで簡易合成映像を生成して表示させ、前記映像に対して前記ＣＧを合成するのに用いられたメタ情報を含むメタファイルを作成することと、
　撮影後、前記メタファイルを用いて生成される前記簡易合成映像が表示されるバックグラウンドで、前記映像に対してＣＧを高精度に合成する第２の映像処理を行うことで高精度な合成映像を生成することと
　を含む映像処理を実行させるためのプログラム。 <Examples of configuration combinations>
The present technology can also be configured as follows.
(1)
a first image processing unit that performs a first image processing for simply synthesizing CG (Computer Graphics) with an image during shooting to generate and display a simplified synthesized image, and creates a metafile including meta information used to synthesize the CG with the image;
and a second image processing unit that generates a highly accurate composite image by performing a second image processing for synthesizing CG with the image with high precision after shooting while the simple composite image generated using the metafile is displayed in the background.
(2)
Further equipped with a camera for taking pictures,
The image processing system according to (1) above, wherein the first image processing unit and the second image processing unit are operated based on posture information indicating a posture of the camera.
(3)
The first image processing unit includes:
a simplified tracking unit that simply tracks the camera based on the shooting direction of the camera identified based on the posture information and feature points extracted from the video;
a simplified synthesis processing unit that generates simplified synthesis information required for synthesizing the CG image on the video based on a result of the tracking;
The video processing system described in (2) above has a simple rendering unit that positions the CG based on the simple synthesis information, sets a virtual camera corresponding to the camera identified based on the posture information, and uses the virtual camera to simply render the CG to generate the simple composite image.
(4)
the camera has a time code generation unit that generates a time code that identifies a frame of the video captured by the camera;
The second video processing unit has a synchronization processing unit that synchronizes the metafile and the video according to the time code added to the metafile and the time code added to each frame of the video.The video processing system described in (3) above.
(5)
The second image processing unit includes:
a tracking unit that tracks the camera with high accuracy based on the shooting direction of the camera specified based on the posture information and feature points extracted from the video;
a synthesis processing unit that generates synthesis information required for synthesizing the CG image on the video based on a result of the tracking;
and a rendering unit that positions the CG based on the synthesis information, sets a virtual camera corresponding to the camera identified based on the posture information, and uses the virtual camera to render the CG with high accuracy to generate the synthetic image.
(6)
The video processing system described in (5) above, in which tracking is performed by the tracking unit, the synthesis information is generated by the synthesis processing unit, and the synthetic video is generated by the rendering unit in the background while the rendering unit generates and displays the simple synthetic video using the simple synthesis information contained in the metafile.
(7)
A video processing method in which a video processing system:
during shooting, performs first video processing that simply synthesizes CG (Computer Graphics) with video to generate and display a simple composite video, and creates a metafile including meta information used to synthesize the CG with the video; and
after shooting, performs second video processing that synthesizes the CG with the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed, thereby generating a highly accurate composite video.
(8)
A program that causes a computer of a video processing system to execute processing including:
during shooting, performing first video processing that simply synthesizes CG (Computer Graphics) with video to generate and display a simple composite video, and creating a metafile including meta information used to synthesize the CG with the video; and
after shooting, performing second video processing that synthesizes the CG with the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed, thereby generating a highly accurate composite video.
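As a non-limiting illustration of the synchronization described in (4) above, the pairing of metafile entries with recorded frames by time code might be sketched in Python as follows. The record layouts `MetaEntry` and `Frame`, the function name `synchronize`, and the `HH:MM:SS:FF` time code strings are assumptions made for this sketch only; the present disclosure does not define a metafile format or an implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MetaEntry:
    """One metafile record (hypothetical layout, assumed for illustration)."""
    time_code: str    # time code added to the metafile, e.g. "01:00:00:02"
    simple_info: str  # placeholder for the simple synthesis information


@dataclass(frozen=True)
class Frame:
    """One recorded video frame carrying the time code added by the camera."""
    time_code: str
    data: bytes


def synchronize(
    meta_entries: List[MetaEntry], frames: List[Frame]
) -> List[Tuple[Frame, Optional[MetaEntry]]]:
    """Pair each frame with the metafile entry that has the same time code.

    A frame without a matching entry is paired with None so that a later
    stage can decide how to handle the gap.
    """
    by_code: Dict[str, MetaEntry] = {m.time_code: m for m in meta_entries}
    return [(frame, by_code.get(frame.time_code)) for frame in frames]


if __name__ == "__main__":
    meta = [MetaEntry("01:00:00:01", "info-1"), MetaEntry("01:00:00:02", "info-2")]
    video = [Frame("01:00:00:02", b"..."), Frame("01:00:00:03", b"...")]
    for frame, entry in synchronize(meta, video):
        print(frame.time_code, entry.simple_info if entry else "no metafile entry")
```

In this sketch an exact match on the time code is assumed; how unmatched frames are treated is left open.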

Note that the present embodiment is not limited to the embodiment described above, and various modifications are possible without departing from the gist of the present disclosure. Furthermore, the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.

11 Video processing system, 12 and 13 Subjects, 21 Camera, 22 Smartphone, 23 Personal computer, 24 Recording medium, 25 Model storage unit, 31 Posture recognition unit, 32 Imaging unit, 33 Time code generation unit, 34 Data transmission unit, 35 Display unit, 36 Recording medium drive, 41 Posture recognition unit, 42 Simple tracking unit, 43 Simple synthesis processing unit, 44 Simple rendering unit, 45 Display unit, 46 Metafile creation unit, 51 Recording medium drive, 52 Synchronization processing unit, 53 Tracking unit, 54 Synthesis processing unit, 55 Rendering unit, 56 Display unit

Claims

A video processing system comprising:
a first video processing unit that, during shooting, performs first video processing for simply synthesizing CG (Computer Graphics) with video to generate and display a simple composite video, and creates a metafile including meta information used to synthesize the CG with the video; and
a second video processing unit that, after shooting, performs second video processing for synthesizing the CG with the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed, thereby generating a highly accurate composite video.

The video processing system according to claim 1, further comprising a camera that captures the video,
wherein the first video processing unit and the second video processing unit operate based on posture information indicating a posture of the camera.

The first video processing unit includes:
a simple tracking unit that simply tracks the camera based on the shooting direction of the camera, as identified from the posture information, and on feature points extracted from the video;
a simple synthesis processing unit that generates, based on a result of the tracking, simple synthesis information required for synthesizing the CG with the video; and
a simple rendering unit that positions the CG based on the simple synthesis information, sets a virtual camera corresponding to the camera identified based on the posture information, and simply renders the CG using the virtual camera to generate the simple composite video.

The video processing system according to claim 3, wherein
the camera has a time code generation unit that generates a time code identifying each frame of the video captured by the camera, and
the second video processing unit has a synchronization processing unit that synchronizes the metafile and the video according to the time code added to the metafile and the time code added to each frame of the video.

The second video processing unit includes:
a tracking unit that tracks the camera with high accuracy based on the shooting direction of the camera, as identified from the posture information, and on feature points extracted from the video;
a synthesis processing unit that generates, based on a result of the tracking, synthesis information required for synthesizing the CG with the video; and
a rendering unit that positions the CG based on the synthesis information, sets a virtual camera corresponding to the camera identified based on the posture information, and renders the CG with high accuracy using the virtual camera to generate the highly accurate composite video.

The video processing system according to claim 5, wherein the tracking by the tracking unit, the generation of the synthesis information by the synthesis processing unit, and the generation of the highly accurate composite video by the rendering unit are performed in the background while the rendering unit generates and displays the simple composite video using the simple synthesis information contained in the metafile.

A video processing method in which a video processing system:
during shooting, performs first video processing that simply synthesizes CG (Computer Graphics) with video to generate and display a simple composite video, and creates a metafile including meta information used to synthesize the CG with the video; and
after shooting, performs second video processing that synthesizes the CG with the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed, thereby generating a highly accurate composite video.

A program that causes a computer of a video processing system to execute processing including:
during shooting, performing first video processing that simply synthesizes CG (Computer Graphics) with video to generate and display a simple composite video, and creating a metafile including meta information used to synthesize the CG with the video; and
after shooting, performing second video processing that synthesizes the CG with the video with high accuracy, in the background while the simple composite video generated using the metafile is displayed, thereby generating a highly accurate composite video.
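By way of illustration only, the post-shooting flow recited above, in which the highly accurate composite is generated in the background while the simple composite regenerated from the metafile is displayed, might be sketched in Python as follows. The functions `simple_composite`, `accurate_composite`, and `post_shoot`, the dictionary keys, and the string stand-ins for video frames are all hypothetical and are not taken from the present disclosure; the use of a single worker thread is likewise an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def simple_composite(frame: str, meta: Dict[str, str]) -> str:
    """Stand-in for the quick composite rebuilt from metafile contents."""
    return f"simple({frame},{meta['simple_info']})"


def accurate_composite(frame: str, meta: Dict[str, str]) -> str:
    """Stand-in for high-accuracy tracking, synthesis, and rendering."""
    return f"accurate({frame},{meta['posture']})"


def post_shoot(
    frames: List[str], metafile: List[Dict[str, str]]
) -> Tuple[List[str], List[str]]:
    """Show simple composites while the accurate ones are built in the background."""
    displayed: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the second (high-accuracy) processing on a background worker.
        future = pool.submit(
            lambda: [accurate_composite(f, m) for f, m in zip(frames, metafile)]
        )
        # Meanwhile, the foreground "displays" the simple composite per frame.
        for frame, meta in zip(frames, metafile):
            displayed.append(simple_composite(frame, meta))
        # Wait for the background result before leaving the executor.
        final = future.result()
    return displayed, final


if __name__ == "__main__":
    shown, final = post_shoot(
        ["f0", "f1"],
        [
            {"simple_info": "s0", "posture": "p0"},
            {"simple_info": "s1", "posture": "p1"},
        ],
    )
    print(shown)
    print(final)
```

The sketch only shows the ordering of the two kinds of processing; an actual system would replace the string stand-ins with real tracking and rendering.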