WO2012153747A1

WO2012153747A1 - Information processing device, information processing method, and information processing program

Info

Publication number: WO2012153747A1
Application number: PCT/JP2012/061800
Authority: WO
Inventors: 真澄石川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-05-12
Filing date: 2012-04-27
Publication date: 2012-11-15
Anticipated expiration: 2013-11-12
Also published as: JP2014170980A

Abstract

[Problem] To generate video content that retains the feel of source video. [Solution] This information processing device is characterized by the provision of an extraction means, a determination means, and a generation means. Said extraction means extracts two or more partial videos or still images from a source video. On the basis of characteristics of said source video, the determination means determines the presentation method for the two or more partial videos or still images extracted by the extraction means. On the basis of the presentation method determined by the determination means, the generation means generates video content that includes the two or more partial videos or still images.

Description

Information processing apparatus, information processing method, and information processing program

　本発明は、情報処理装置、情報処理方法および情報処理プログラムに関し、特に動画の一部を抽出して、新たな映像コンテンツを生成する情報処理装置、情報処理方法および情報処理プログラムに関する。 The present invention relates to an information processing apparatus, an information processing method, and an information processing program, and more particularly to an information processing apparatus, an information processing method, and an information processing program that extract a part of a moving image and generate new video content.

　映像から抽出した画像をもとに新規映像を生成する技術の一例が、特許文献１に記載されている。特許文献１のシステムは、入力映像を構成する各フレーム間の内容（カメラの動きや物体の動きや人の顔等の映像内容や、音声イベント等の音響的内容）をもとにキーフレームを選択する。そして、特許文献１のシステムは、選択したキーフレームが任意の提示時間で表示される、新規映像を生成する。
　また、入力映像から抽出したキーフレームをもとに新規映像を生成する方式の一例が、特許文献２に記載されている。特許文献２の画像処理装置は、入力映像から、顔検出またはユーザ操作に応じてキーフレームを選択する。そして、特許文献２の画像処理装置は、選択したキーフレームが、そのキーフレーム中の顔の大きさ、笑顔、年齢、向き、操作情報をもとに決定した提示時間で表示される、新規映像を生成する。
　特許文献３には、設定された再生モードに応じて選ばれる特徴的な映像区間である代表区間と、入力映像中の代表区間以外の短い映像区間であるつなぎ区間をもとに新規映像を生成する映像処理装置が記載されている。特許文献３の映像処理装置は、新規映像中で連続する映像区間の類似性をもとに、ＢＧＭ（Ｂａｃｋｇｒｏｕｎｄ　Ｍｕｓｉｃ）を設定する。
　特許文献４には、顔にズームインするシーンからキーフレームを選択する動画像インデックス生成装置が記載されている。特許文献４の動画像インデックス生成装置は、入力映像中で連続する２フレームの関係性（互いに同一人物の大きさが異なる顔が映ったフレームであること）をもとに、顔が最もズーム・アップされた画像を選択する。 An example of a technique for generating a new video based on an image extracted from a video is described in Patent Document 1. The system of Patent Document 1 uses key frames based on the contents between frames constituting an input video (video contents such as camera movements, object movements, human faces, and acoustic contents such as audio events). select. And the system of patent document 1 produces | generates the new image | video by which the selected key frame is displayed by arbitrary presentation time.
Patent Document 2 describes an example of a method for generating a new video based on a key frame extracted from an input video. The image processing apparatus of Patent Literature 2 selects a key frame from an input video according to face detection or user operation. Then, the image processing apparatus of Patent Document 2 displays a new video in which the selected key frame is displayed with the presentation time determined based on the face size, smile, age, orientation, and operation information in the key frame. Is generated.
In Patent Document 3, a new video is generated based on a representative section that is a characteristic video section selected according to a set playback mode and a connecting section that is a short video section other than the representative section in the input video. A video processing apparatus is described. The video processing apparatus disclosed in Patent Document 3 sets BGM (Background Music) based on the similarity of consecutive video sections in a new video.
Patent Document 4 describes a moving image index generation apparatus that selects a key frame from a scene that zooms in on a face. The moving image index generation device of Patent Document 4 is based on the relationship between two consecutive frames in an input video (that is, a frame in which faces of the same person are different in size). Select the uploaded image.

特表２００９−５３７０４７号公報Special table 2009-537047 gazette 特開２０１０−２１３１３６号公報JP 2010-213136 A 特開２００８−１７８０９０号公報JP 2008-178090 A 特開平１０−２２４７３６号公報Japanese Patent Laid-Open No. 10-224736

　しかしながら、特許文献１及び特許文献２に記載されている技術は、生成された映像を、ソース映像中でのキーフレーム間の関連性を考慮することなく提示する。そのため、ソース映像中で、互いに共通の被写体の変化や動作の遷移、あるいは、互いに共通の意図で撮影された被写体を、新規映像の視聴により理解できない場合がある。
　特許文献３に記載されている技術は、キーフレーム単体の内容もしくは事前の設定をもとに、キーフレームの提示時間とエフェクトを決定する技術である。
　特許文献４に記載されている技術は、入力映像中で連続する２フレーム間の関連性をキーフレームの選択に利用する技術である。
　特許文献３及び特許文献４の技術は、新規映像中で連続するフレームにそれぞれ対応づけられた入力映像中の区間の関連性から、キーフレームの提示時間とエフェクトを決定することはできない。
　本発明の目的は、上述の課題を解決する技術を提供することにある。 However, the techniques described in Patent Literature 1 and Patent Literature 2 present the generated video without considering the relationship between key frames in the source video. For this reason, in the source video, there may be cases where it is not possible to understand a change in a common subject, a transition of operation, or a subject shot with a common intention by viewing a new video.
The technique described in Patent Document 3 is a technique for determining a key frame presentation time and an effect based on the content of a key frame alone or a preset setting.
The technique described in Patent Document 4 is a technique that uses the relationship between two consecutive frames in an input video for selecting a key frame.
The techniques of Patent Literature 3 and Patent Literature 4 cannot determine the key frame presentation time and the effect from the relevance of the sections in the input video that are respectively associated with consecutive frames in the new video.
The objective of this invention is providing the technique which solves the above-mentioned subject.

　上記目的を達成するため、本発明に係る情報処理装置は、ソース動画から、少なくとも２つの部分動画または静止画像を抽出する抽出手段と、前記抽出手段で抽出した前記少なくとも２つの部分動画または静止画像の提示方法を、前記ソース動画の特徴に基づいて決定する決定手段と、前記決定手段で決定した提示方法に基づいて、前記少なくとも２つの部分動画または静止画像を含む映像コンテンツを生成する生成手段と、を備えたことを特徴とする。
　上記目的を達成するため、本発明に係る情報処理方法は、ソース動画から、少なくとも２つの部分動画または静止画像を抽出し、抽出した前記少なくとも２つの部分動画または静止画像の提示方法を、前記ソース動画の特徴に基づいて決定し、決定した提示方法に基づいて、前記少なくとも２つの部分動画または静止画像を含む映像コンテンツを生成することを特徴とする。
　上記目的を達成するため、本発明に係る情報処理プログラムは、コンピュータを、ソース動画から、少なくとも２つの部分動画または静止画像を抽出する抽出手段と、前記抽出手段で抽出した前記少なくとも２つの部分動画または静止画像の提示方法を、前記ソース動画の特徴に基づいて決定する決定手段と、前記決定手段で決定した提示方法に基づいて、前記少なくとも２つの部分動画または静止画像を含む映像コンテンツを生成する生成手段として動作させることを特徴とする。 To achieve the above object, an information processing apparatus according to the present invention includes an extraction unit that extracts at least two partial moving images or still images from a source moving image, and the at least two partial moving images or still images extracted by the extraction unit. Determining means for determining the presentation method based on the characteristics of the source video, and generating means for generating video content including the at least two partial videos or still images based on the presentation method determined by the determining means , Provided.
To achieve the above object, an information processing method according to the present invention extracts at least two partial moving images or still images from a source moving image, and provides the extracted presentation method of the at least two partial moving images or still images as the source It is determined based on the feature of the moving image, and video content including the at least two partial moving images or still images is generated based on the determined presentation method.
In order to achieve the above object, an information processing program according to the present invention comprises a computer for extracting at least two partial moving images or still images from a source moving image, and the at least two partial moving images extracted by the extracting means. Alternatively, a determining unit that determines a still image presentation method based on characteristics of the source moving image and a video content including the at least two partial moving images or still images are generated based on the presentation method determined by the determining unit. It is characterized by operating as a generating means.

　本発明によれば、ソース映像の雰囲気を残した映像コンテンツを生成することができる。 According to the present invention, it is possible to generate video content that retains the atmosphere of the source video.

本発明の第１実施形態に係る情報処理装置の構成を示すブロック図である。It is a block diagram which shows the structure of the information processing apparatus which concerns on 1st Embodiment of this invention. 本発明の第２実施形態に係る情報処理装置の構成を示すブロック図である。It is a block diagram which shows the structure of the information processing apparatus which concerns on 2nd Embodiment of this invention. 本発明の第２実施形態に係る情報処理装置のスライドショー生成を説明する図である。It is a figure explaining the slide show production | generation of the information processing apparatus which concerns on 2nd Embodiment of this invention. 本発明の第２実施形態に係る情報処理装置の処理の流れを示すフローチャートである。It is a flowchart which shows the flow of a process of the information processing apparatus which concerns on 2nd Embodiment of this invention. 本発明の第２実施形態に係る情報処理装置の映像コンテンツ生成を説明する図である。It is a figure explaining the video content production | generation of the information processing apparatus which concerns on 2nd Embodiment of this invention. 本発明の第２実施形態に係る情報処理装置の映像コンテンツ生成を説明する図である。It is a figure explaining the video content production | generation of the information processing apparatus which concerns on 2nd Embodiment of this invention. 本発明の第３実施形態に係る情報処理装置の映像コンテンツ生成を説明する図である。It is a figure explaining the video content production | generation of the information processing apparatus which concerns on 3rd Embodiment of this invention. 本発明の第３実施形態に係る情報処理装置の映像コンテンツ生成を説明する図である。It is a figure explaining the video content production | generation of the information processing apparatus which concerns on 3rd Embodiment of this invention. 本発明の第３実施形態に係る情報処理装置の映像コンテンツ生成を説明する図である。It is a figure explaining the video content production | generation of the information processing apparatus which concerns on 3rd Embodiment of this invention.

　以下に、図面を参照して、本発明の実施の形態について例示的に詳しく説明する。ただし、以下の実施の形態に記載されている構成要素はあくまで例示であり、本発明の技術範囲をそれらのみに限定する趣旨のものではない。
　［第１実施形態］
　本発明の第１実施形態としての情報処理装置１００について、図１を用いて説明する。情報処理装置１００は、動画を編集して映像コンテンツを生成する装置である。
　図１に示すように、情報処理装置１００は、キーフレーム抽出部１０１（抽出部）と提示方法決定部１０２（決定部）と映像コンテンツ生成部１０３（生成部）とを含む。
　キーフレーム抽出部１０１は、ソース動画から、少なくとも２つの部分動画または静止画像を抽出する。提示方法決定部１０２は、キーフレーム抽出部１０１で抽出した少なくとも２つの部分動画または静止画像の提示方法を、ソース動画の特徴に基づいて決定する。そして、映像コンテンツ生成部１０３は、提示方法決定部１０２で決定した提示方法に基づいて、少なくとも２つの部分動画または静止画像を含む映像コンテンツを生成する。映像コンテンツは、例えばスライドショーを含む。
　本実施形態によれば、ソース映像の雰囲気を残した映像コンテンツを生成することができる。
　［第２実施形態］
　（前提技術）
　映像投稿サイトでは、視聴者が映像の内容を素早く把握して興味をもった映像を効率的に選択できるように、映像から抽出した静止画像や部分動画（以下、キーフレームと呼ぶ）をリスト提示する機能が用いられている。
　視聴者が映像全体を適切に理解するためには、通常複数のキーフレームが必要である。しかし、多数のキーフレームをディスプレイ上に一度に提示すると、キーフレーム１枚あたりの大きさが小さくなり内容を十分確認できない場合がある。
　その問題を解決するためには、キーフレームを切り替えて順に提示する提示方法が有効と考えられる（以降、キーフレームを切り替えて順に提示した映像を映像コンテンツと呼ぶ）。映像コンテンツ中で連続して提示されるキーフレームの間の関連性を理解することは、入力映像の内容を把握する上で有効な場合がある。例えば、視聴者は、映像コンテンツ中で連続するキーフレームが、共通の被写体の変化や動作の遷移を表現していることを理解することで、動作や変化の流れを理解できる。また、視聴者は、連続するキーフレームに含まれる被写体が、入力映像内で共通の意図で（たとえば、同程度の興味で）撮影されたものであることを理解することで、被写体の重要度の関係性を理解できる。
　連続するキーフレームの関連性を視聴者に理解させるためには、キーフレームの提示時間や、キーフレーム間に挿入されるエフェクト等の提示方法は重要な意味をもつ。例えば、連続するキーフレームが同じ方法で提示された場合、実際にはそれらのキーフレーム間に関連性が無くても、それらのキーフレームに関連性があると視聴者が誤解する場合がある。また、連続するキーフレームがまったく異なる方法で提示された場合、キーフレーム間に関連性がないと視聴者が誤認する可能性がある。よって、キーフレーム間の関連性を視聴者に正しく理解させるためには、キーフレーム間の内容の関連性に応じて提示ルールを制御することが有効といえる。
　［構成］
　本発明の第２実施形態に係る情報処理装置２００の構成について、図２を用いて説明する。図２は、本実施形態に係る情報処理装置２００の概略構成を説明するための図である。
　情報処理装置２００は、キーフレーム抽出部２０１（抽出部）と提示方法決定部２０２（決定部）と映像コンテンツ生成部２０３（生成部）と映像入力部２０４とを備えている。
　キーフレーム抽出部２０１は、入力映像としてのソース動画２１０から、キーフレームとして、少なくとも２つの部分動画または静止画像を抽出する。提示方法決定部２０２は、キーフレーム抽出部２０１で抽出した少なくとも２つの部分動画または静止画像の提示方法を、ソース動画の特徴に基づいて決定する。映像コンテンツ生成部２０３は、提示方法決定部２０２で決定した提示方法に基づいて、少なくとも２つの部分動画または静止画像を連続的に提示する新規映像としての映像コンテンツ２４０を生成する。
　また、提示方法決定部２０２は、関連性判定部２２１（判定部）と提示方法選択部２２２（選択部）とを含む。関連性判定部２２１は、少なくとも２つの部分動画または静止画像に含まれる対象物に共通性があるか否かを、ソース動画に基づいて判定する。関連性判定部２２１は、少なくとも２つの部分動画または静止画像に含まれる対象物が同一か否かを、ソース動画に基づいて判定する。提示方法選択部２２２は、対象物に共通性がある場合には、対象物に共通性がない場合とは異なる提示方法を選択する。提示方法決定部２０２は、映像コンテンツ２４０における、キーフレームの提示時間を決定する。
　映像入力部２０４は、ビデオカメラなどからソース動画２１０を入力して、キーフレーム抽出部２０１および提示方法決定部２０２に渡す。キーフレーム抽出部２０１は、ソース動画２１０から抽出したキーフレームのみならず、そのキーフレームに関連するキーフレーム情報を映像コンテンツ生成部２０３に送る。キーフレーム情報は、キーフレームを識別するキーフレームＩＤ（ｉｄｅｎｔｉｆｉｅｒ）、映像コンテンツ内での提示順位、キーフレームの画素情報である。
　映像入力部２０４は、関連性判定部２２１からの要求に応じて、入力映像の情報（映像情報）を関連性判定部２２１に入力する。映像情報は、例えば、キーフレームＩＤ、キーフレームに対応した区間の画素情報や音響情報である。キーフレームに対応した区間は、例えば、入力映像中のキーフレームが属する単位区間、あるいは、キーフレームと同一の被写体が含まれている単位区間である。単位区間は、以下の４つの区間の何れかまたはその組合せであってもよい。一定の時間間隔で区切られた区間。カメラの切り変わり点等の撮影機器の制御信号をもとに区切られた区間。フレームの画像変化点や音響変化点等の映像から抽出される特徴量をもとに区切られた区間。場所や被写体や時間帯等の撮影内容の変化点として手動で区切られた区間。
　キーフレームに対応する区間は、各キーフレームに対して少なくとも１個存在する。複数のキーフレームが１個の区間に対応づけられても構わない。区間の画像情報は、例えば、区間に属するフレームの画像情報である。区間の音響情報は、例えば、区間と同期した音情報である。また、区間情報は、区間内に映っている被写体、撮影場所、撮影時刻を記述したメタ情報、ＧＰＳ（Ｇｌｏｂａｌ　Ｐｏｓｉｔｉｏｎｉｎｇ　Ｓｙｓｔｅｍ）等のセンサ情報を含んでいてもよい。
　関連性判定部２２１は、キーフレーム抽出部２０１から入力されたキーフレーム情報をもとに、映像入力部２０４からキーフレームに対応する映像情報を取得する。そして、関連性判定部２２１は、キーフレーム間の関連性を判定する。関連性判定部２２１は、キーフレーム関連性情報を提示方法選択部２２２に入力する。キーフレーム関連性情報は、例えば、キーフレームＩＤと関連性フラグである。キーフレーム情報は、上記に加えてキーフレームの画素情報を含んでいてもよい。関連性フラグは、あらかじめ規定された関連性種別のうち、現在のキーフレームとその後に提示されるキーフレームとの間に存在する関連性種別を示すデータ、もしくは、いずれの関連性種別も存在しない（関連性が無い）ことを示すデータである。たとえば、関連性判定部２２１は、あるキーフレームとその後のキーフレームとの間に存在する全関連性種別の関連性フラグにフラグ１を設定する。そして、関連性判定部２２１は、存在しない関連種別の関連性フラグにフラグ０を設定する。あるいは、関連性判定部２２１は、関連性種別に応じて意味を持つ任意の数値を、関連性フラグに設定してもよい。
　関連性判定部２２１は、キーフレームにおける対象物の撮影方法が同一か否かを、ソース動画に基づいて判定してもよい。関連性判定部２２１は、キーフレームに含まれる対象物に共通性があるか否かを、ソース動画の音響的な特徴に基づいて判定してもよい。
　被写体の同一性に着目した関連性の判定方法について以下に述べる。
　（関連性１．　被写体の同一性）
　関連性判定部２２１は、キーフレーム同士の被写体の同一性を、キーフレームを抽出したソース動画での撮影の連続性などによって決定することができる。関連性１は、このように決定された関連性である。
　「被写体が同一である」とは、映像コンテンツ中で連続するキーフレーム対の被写体が、共通であることをいう。「被写体が同一である」場合には、キーフレーム対に含まれる被写体が見かけ上まったく変化のない場合も、キーフレーム対に含まれる被写体が一連の変化や動作の過程で互いに異なるある瞬間の場合も含まれる。たとえば、時間の経過に伴って色が変わる建造物を撮影した映像において、色が変化する前と後のキーフレームの被写体は、キーフレームのみから判断すると異なるように見える。しかし、関連性判定部２２１は、ソース動画を参照することによって、色が変化する前と後のキーフレームの被写体が同一であると判断することができる。昆虫が孵化する様子を撮影したソース動画から抽出された、孵化の前と後のキーフレーム対の関係なども同様に、キーフレームのみから判断すると被写体が異なるように見える。しかし、関連性判定部２２１は、ソース動画を参照することによって被写体が同一であると判断することができる。一方、時間の経過に伴って音色が変化する楽器を撮影した映像から、音色が変化する前後のキーフレームを抽出した場合には、キーフレームの抽出箇所の音声のみから判断すると、それらのキーフレームは異なる被写体を撮影したものと判断されうる。しかし、この場合も、関連性判定部２２１は、ソース動画全体を参照することによって被写体が同一であると判断することができる。
　すなわち、ソース動画を参照することによって、同じ被写体を連続的に撮影した動画から抽出されたキーフレーム群が、同じ被写体を連続的に撮影した動画から抽出されたことが明確になる。つまり、関連性判定部２２１は、例えば、ソース動画の編集点を見つけ、その間のキーフレーム群については被写体が同じだと推定することができる。
　関連性判定部２２１は、は、あるキーフレームの被写体とその後のキーフレームの被写体とが、同一である場合に、関連性１についての関連性フラグに１を設定する。関連性判定部２２１は、あるキーフレームの被写体とその後のキーフレームの被写体とが、同一でない場合に、関連性１についての関連性フラグに０を設定する。被写体の同一性判定は、たとえば以下の方法で実現できる。映像コンテンツ中で連続するキーフレーム対が同一の区間に対応づけられており、かつ、それらのキーフレームから検出された被写体領域（被写体の画像領域）が同一の場合に、関連性判定部２２１は、その連続するキーフレーム対の被写体が同一であると判定してもよい。あるいは、関連性判定部２２１は、映像コンテンツ中で連続するキーフレーム対それぞれから被写体領域を検出する。そして、関連性判定部２２１は、各キーフレームに対応した区間からそれぞれ被写体領域を検出し、トラッキングする。関連性判定部２２１は、一方の区間から検出された被写体領域およびトラック過程の被写体領域と、もう一方の区間から検出された被写体領域およびトラック過程の被写体領域とを比較する。関連性判定部２２１は、それらの被写体領域の、色や形状などの画像特徴量が類似する場合に、各キーフレームの被写体は同一であると判定してもよい。
　あるいは、関連性判定部２２１は、映像コンテンツ中で連続するキーフレーム対について、各キーフレームに対応した区間からそれぞれ被写体の画像領域と被写体の発する音響情報を抽出してもよい。そして、一方の区間から検出された被写体領域と、もう一方の区間から検出された被写体領域とで、画像的特徴と音響情報とがともに類似する場合に、関連性判定部２２１は、キーフレーム対に含まれる被写体は同一であると判定してもよい。
　キーフレームから被写体領域を検出する手法は、あらかじめ登録された特定対象を検出する場合と、登録されていない一般対象を検出する場合とに分けられる。関連性判定部２２１は、特定対象を検出する場合は、登録された各対象の画像データをテンプレートにすればよい。そして、関連性判定部２２１は、様々な解像度に変換したテンプレートで区間に属するキーフレームを走査すればよい。そして、関連性判定部２２１は、テンプレートと同じ位置の画素値の差分が小さい領域を対応する被写体領域として検出すればよい。
　あるいは、関連性判定部２２１は、キーフレームの各部分領域から色・テクスチャ・形状を表現する画像特徴量を抽出してもよい。そして、関連性判定部２２１は、登録された各対象の画像特徴量と類似した画像特徴量をもつ部分領域を対応する被写体領域としてもよい。また、特定対象が人物の場合には、顔全体から得られる情報を利用する手法がある。そのような手法として、例えば様々な顔の映っている画像をテンプレートとして記憶し、キーフレームとテンプレートの差分がある閾値以下のとき顔が入力画像中に存在すると判定する手法が挙げられる。また、関連性判定部２２１は、肌色などの色情報や、エッジの方向や密度を組み合わせたモデルをあらかじめ記憶しておき、モデルに類似した領域を被写体領域として検出してもよい。更に、関連性判定部２２１は、以下の手法のいずれかまたはその組合せを利用してもよい。顔（頭部）の輪郭を楕円、目や口を細長の形状をしていることを利用して作成したテンプレートを用いて検出する手法。頬や額の部分は輝度が高く、目や口の部分の輝度は低いという輝度分布の特性を利用する手法。顔の対称性や肌色領域と位置を利用する手法。
　また、関連性判定部２２１は、大量の人物顔と非顔の学習サンプルから得られた特徴量分布を統計的に学習し、入力画像から得られる特徴量が顔と非顔のどちらの分布に属するかを判定する手法（ニューラルネットやサポートベクターマシン、ＡｄａＢｏｏｓｔ法）など用いてもよい。また一般対象を検出する場合は、関連性判定部２２１は、例えばＮｏｒｍａｌｉｚｅｄ　Ｃｕｔや、Ｓａｌｉｅｎｃｙ　Ｍａｐ、Ｄｅｐｔｈ　ｏｆ　Ｆｉｅｌｄ（ＤｏＦ）を用いてもよい。Ｎｏｒｍａｌｉｚｅｄ　Ｃｕｔは、画像を複数の領域に分割する手法である。Ｊｉａｎｂｏ　Ｓｈｉ　ａｎｄ　Ｊｉｔｅｎｄｒａ　Ｍａｌｉｋ，“Ｎｏｒｍａｌｉｚｅｄ　Ｃｕｔｓ　ａｎｄ　Ｉｍａｇｅ　Ｓｅｇｍｅｎｔａｔｉｏｎ”，ＩＥＥＥＴｒａｎｓａｃｔｉｏｎｓ　ｏｎ　Ｐａｔｔｅｒｎ　Ａｎａｌｙｓｉｓ　ａｎｄ　Ｍａｃｈｉｎｅ　Ｉｎｔｅｌｌｉｇｅｎｃｅ，ｖｏｌ．２２，Ｎｏ．８，Ａｕｇｕｓｔ２０００にＮｏｒｍａｌｉｚｅｄ　Ｃｕｔについて詳しい説明がある。関連性判定部２２１は、Ｎｏｒｍａｌｉｚｅｄ　Ｃｕｔにより分割された領域のうち、キーフレーム中央に位置する領域を被写体領域として検出してもよい。また、関連性判定部２２１は、Ｓａｌｉｅｎｃｙ　Ｍａｐにより高い重要度が算出された領域を被写体領域として検出してもよい。Ｓａｌｉｅｎｃｙ　Ｍａｐは、画像中の物体領域を視覚注意から算出する方法である。Ｓａｌｉｅｎｃｙ　Ｍａｐについては、Ｌ．Ｉｔｔｉ，Ｃ．Ｋｏｃｈ　ａｎｄ　Ｅ．Ｎｉｅｂｕｒ，“Ａ　Ｍｏｄｅｌ　ｏｆ　Ｓａｌｉｅｎｃｙ−ｂａｓｅｄ　Ｖｉｓｕａｌ　Ａｔｔｅｎｔｉｏｎｆｏｒ　Ｒａｐｉｄ　Ｓｃｅｎｅ　Ａｎａｌｙｓｉｓ，”ＩＥＥＥ　Ｔｒａｎｓ．Ｐａｔｔｅｒｎ　Ａｎａｌｙｓｉｓ　ａｎｄ　Ｍａｃｈｉｎｅ　Ｉｎｔｅｌｌｉｇｅｎｃｅ，Ｖｏｌ．２０，Ｎｏ．１１，ｐｐ．１２５４−１２５９，１９９８に詳しい記載がある。
　また、Ｄｏｆは、被写界深度内に存在する対象のエッジにはボケがなく、被写界深度外のエッジにボケが発生する特性に基づく手法である。詳しくは、３Ｄｕ−Ｍｉｎｇ　Ｔｓａｉ，Ｈｕ−Ｊｏｎｇ　Ｗａｎｇ，“Ｓｅｇｍｅｎｔｉｎｇ　ｆｏｃｕｓｅｄ　ｏｂｊｅｃｔｓ　ｉｎ　ｃｏｍｐｌｅｘ　ｖｉｓｕａｌ　ｉｍａｇｅｓ”，Ｐａｔｔｅｒｎ　Ｒｅｃｏｇｎｉｔｉｏｎ　Ｌｅｔｔｅｒｓ，Ｖｏｌ．１９，ｐｐ．９２９　９４０，１９９８．に開示がある。関連性判定部２２１は、エッジの太さをもとにボケ量を算出し、ボケの少ないエッジを結合し、焦点が合っている領域を対象領域として検出してもよい。
　関連性判定部２２１は、静止画像中の位置または視認性の高さ（照明条件、向き、角度、画面上での位置、他の対象による隠れ、ボケ、（人物の場合には）表情、等に基づく映りの良さを示す評価値）または複数画像での出現頻度をもとにキーフレームから対象領域を検出してもよい。また、関連性判定部２２１は、検出された複数の被写体領域を組み合わせて１つとしてもよい。関連性判定部２２１は、キーフレームに対応した区間からの被写体領域の検出を、たとえば、次のように行ってもよい。関連性判定部２２１は、キーフレーム中から検出された被写体領域の画像情報をテンプレートにすればよい。そして、関連性判定部２２１は、キーフレームに対応する区間に属するいずれかのフレームから被写体領域を検出すればよい。
　区間内で検出された被写体領域のトラッキングは、たとえ以下の方法で実現できる。関連性判定部２２１は、被写体領域が検出されたフレームを開始フレームにする。関連性判定部２２１は、時間方向に隣接するフレームに対し被写体領域の検出処理を行う。関連性判定部２２１は、被写体領域の検出に用いるテンプレートに、既に検出された対象領域の画像特徴量を用いればよい。そして、関連性判定部２２１は、既に検出された対象領域の検出位置を中心に規定範囲の領域を、そのテンプレートで走査すればよい。関連性判定部２２１は、各被写体領域から画像特徴量を抽出し、画像特徴量の差が小さいほど高い値を算出する尺度をもとに、被写体領域の間の類似度を算出してもよい。画像特徴量は、被写体領域から検出された色、エッジ、テクスチャ等の画像情報をもとに算出できる。あるいは、関連性判定部２２１は、各被写体の画像領域からＳＩＦＴ（Ｓｃａｌｅ−Ｉｎｖａｒｉａｎｔ　Ｆｅａｔｕｒｅ　Ｔｒａｎｓｆｏｒｍ）等の局所特徴点を検出すればよい。そして、関連性判定部２２１は、画像領域間で特徴点を対応づければよい。そして、関連性判定部２２１は、対応づけられた特徴点の個数が多い、もしくは、対応づけられた特徴点の位置関係が画像間で似ているほど高い値を算出する尺度を用いてもよい。関連性判定部２２１は、例えば、被写体領域が検出された区間から、音響情報として複数の周波数帯の音響エネルギーを抽出してもよい。そして、関連性判定部２２１は、音響エネルギーの差が小さいほど高い値を算出する尺度をもとに、被写体の発する音響情報の類似性を算出してもよい。関連性判定部２２１は、上記のように、キーフレームに対応した区間の情報を用いることで、キーフレーム情報のみを用いた場合よりも、キーフレーム間で生じた被写体の見え方の変化や背景の変化に対して頑強に、被写体の同一性を判定できる。
　提示方法選択部２２２は、キーフレーム切り替え時のエフェクトまたはジングルを決定する。提示方法選択部２２２は、連続するキーフレームがお互いに関連性を有する場合には、関連性を有しない場合とは異なるエフェクトまたはジングルを決定する。提示方法選択部２２２は、映像コンテンツにおけるキーフレームの背景音楽を決定する。
　具体的には、提示方法選択部２２２は、関連性判定部２２１から入力されるキーフレーム関連性情報と、あらかじめ登録された提示ルールをもとに、キーフレームの提示方法を決定する。提示方法選択部２２２は、キーフレームの提示方法を示す情報（提示方法情報）を、映像コンテンツ生成部２０３に入力する。提示方法情報は、各キーフレームの提示方法を示すデータである。提示方法情報は、キーフレームＩＤと提示時間を含んでいればよい。提示方法情報は、上記に加えてエフェクト、ＢＧＭ、音響ジングル、映像ジングルを含んでいてもよい。
　提示ルールは、関連性種別に応じたキーフレームの提示方法を規定するルールである。提示ルールは、連続するキーフレーム対の各提示時間を規定するパラメータを含む。また、提示ルールは、提示時間に加えて、キーフレームの間に挿入するエフェクトやＢＧＭ、ジングル（短い映像や音楽、効果音）に関する制御パラメータを含んでいてもよい。また、提示ルールには、連続するキーフレーム対にいずれの関連性種別も存在しない場合の提示方法が規定されていてもよい。提示ルールとして、例えば以下が挙げられる。
　（１）提示時間に関するルール
　提示方法決定部２０２は、少なくとも２つの部分動画または静止画像がお互いに関連性を有する場合には、一方の提示時間に基づいて他方の提示時間を決定する。具体的には、提示方法決定部２０２は、少なくとも２つの部分動画または静止画像に含まれる対象物が同一である場合には、前に挿入される部分動画または静止画像の提示時間より、後に挿入される部分動画または静止画像の提示時間を短くする。一方、提示方法決定部２０２は、少なくとも２つの部分動画または静止画像がお互いに関連性を有しない場合には、独立に提示時間を決定する。
　言い換えれば、提示方法決定部２０２は、連続するキーフレーム対に含まれる被写体の同一性もしくは被写体に対する撮影者の興味の同一性をもとに、キーフレーム対の提示時間を決定する。例えば、連続するキーフレーム対に含まれる被写体が同一、もしくは被写体に対する撮影者の興味が同一の場合には、提示方法決定部２０２は、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の被写体もしくは同一の興味で撮影された被写体を含むキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の被写体もしくは同一の興味で撮影された被写体を含むキーフレーム群のうち、提示時間がＴｑ以下になったキーフレームの次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の被写体もしくは同一の興味で撮影された被写体を含むキーフレーム群のうち、最後に提示されるキーフレームの提示時間を初期値Ｔｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定しておいた映像コンテンツ全体の時間から、提示するキーフレーム数に応じて、ＴｓとＴｐの値を算出してもよい。また、連続するキーフレーム対に含まれる被写体が同一でない、もしくは、同一の興味で撮影された被写体でない場合には、提示方法決定部２０２は、前のキーフレームの提示時間と独立に後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　例として、図３を用いて、人物Ａの周囲を回りながら撮影した動画から５枚のキーフレーム３０１~３０５を抽出した場合について説明する。図３の例ではキーフレーム３０１~３０５に含まれる被写体は同一である。そのため、提示方法決定部２０２は、上述のように、最初のキーフレームの提示時間Ｔｓに対して、パラメータａを乗算することによって、後続のキーフレームの提示時間を算出する。このとき、始めのキーフレーム３０１の提示時間を初期値Ｔｓとすると、後続のキーフレーム３０２の提示時間Ｔｉは以下の式（１）で表わされる。

　さらに、正面を向いたキーフレーム３０３で視認性の評価値が閾値以上の場合、キーフレーム３０３の提示時間をＴｐとすると、後続のキーフレーム３０４、３０５の提示時間Ｔｊは以下の式（１）で表わされる。

　パラメータａが０から１の間に設定されていれば、人物Ａを含むキーフレームのうち初めに提示されたキーフレーム３０１と人物Ａの映りがよいキーフレーム３０３は長く提示される。他のキーフレームは、キーフレーム３０１、３０３から遠ざかるに従って徐々に短く提示される。
　これにより、本実施形態には、利用者が、対象が初めて登場したキーフレームや映りのよいキーフレームの内容を理解することができ、その他の画像の内容は理解した内容とほぼ同様であると理解することができるという効果がある。また本実施形態の情報処理装置２００は、同じ対象を含む画像であっても、連続する画像の提示時間が変化する映像を生成できる。そのため、本実施形態には、視聴者を飽きさせないテンポ感ある映像コンテンツを生成できるという効果がある。
　（２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対に含まれる被写体の同一性、もしくは被写体に対する撮影者の興味の同一性をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対に含まれる被写体が同一もしくは被写体に対する撮影者の興味が同一の場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対に含まれる被写体が同一でない場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ（Ｄｉｇｉｔａｌ　Ｖｉｄｅｏ　Ｅｆｆｅｃｔ））を挿入する。また例えば、連続するキーフレーム対に含まれる被写体が同一もしくは同一の興味で撮影された被写体の場合には、提示方法選択部２２２は、キーフレーム対の提示中に同じＢＧＭを流す。連続するキーフレーム対に含まれる被写体が同一でない場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、被写体が同一でない、もしくは同一の興味で撮影された被写体を含まないキーフレームの間に、映像や音響のジングルを挿入してもよい。これにより、同一の被写体もしくは同一の興味で撮影された被写体を含むキーフレーム群は、画像や音響の変化がなく滑らかに接続される。
　そのため、視聴者は、それらのキーフレーム群に、同一の被写体もしくは同一の重要度の被写体が含まれていることを、容易に理解できる。そして、視聴者は、それらのキーフレーム群が、一連の変化や動作の途中の画像であるか同一の意図で撮影された被写体を含む画像であることを、容易に理解できる。また、キーフレーム中の被写体が同一でない、あるいは、同一の意図で撮影された被写体を含まない場合には、画像や音響が大きく変化する。そのため、視聴者は、キーフレームの内容が大きく変化したことに気づくことができ、新規映像の理解に集中することができる。
　映像コンテンツ生成部２０３は、提示方法選択部２２２で選択された提示方法情報と、キーフレーム抽出部２０１から入力されるキーフレーム情報をもとに、新規な映像コンテンツを生成し、出力する。
　（動作）
　次に、図４のフローチャートを参照して、本実施の形態の動作について詳細に説明する。以下では、一例として、本実施形態の情報処理装置２００が、図５に示すキーフレーム５０１~５１３を抽出して、映像コンテンツを生成する場合の動作について説明する。この映像コンテンツは、建物の中にある温室で、花と人物とを撮影した出来事を伝えるものである。図５の矩形が、関連性判定部２２１によって各キーフレームから検出された対象領域である。
　また、提示方法選択部２２２は、対象領域が同一であるキーフレーム対については、提示ルールとして、大小関係もしくは部分関係に基づくルールを用いて提示方法を制御する。また、提示方法選択部２２２は、対象領域が同一でないキーフレーム対については、提示ルールとして、同種性に基づくルールを用いて提示方法を制御する。なお、大小関係、部分関係、同種性に基づくルールについては、第３実施形態以降に詳しく説明する。
　まず、ステップＳ４０１において映像入力部２０４はソースとなる動画を入力する。ステップＳ４０２において、映像入力部２０４は、入力したソース動画を、キーフレーム抽出部２０１に渡す。キーフレーム抽出部２０１は、キーフレームの抽出を行う。
　ステップＳ４０３では、キーフレーム抽出部２０１は、キーフレーム情報を提示方法決定部２０２に渡す。また、映像入力部２０４は、映像情報を提示方法決定部２０２に渡す。関連性判定部２２１は、キーフレーム５０１~５１３が抽出されたソース動画を参照して、被写体の同一性や、撮影方法の共通性を判定する。
　さらに、ステップＳ４０３において、関連性判定部２２１は、キーフレーム５０１~５１３から対象領域を検出する。関連性判定部２２１には、対象としてあらかじめ建造物・草花・人が登録されているものとする。また、関連性判定部２２１は、それぞれのモデルを学習しているものとする。そして、関連性判定部２２１は、キーフレーム５０１~５１３から、それぞれ、建造物の対象領域として実線矩形で囲まれた箇所を検出する。
　ステップＳ４０５では、関連性判定部２２１は、対象領域０および対象領域１の画素情報から画像特徴量を抽出する。そして、関連性判定部２２１は、領域間の類似性をもとに同一性・大小関係・部分関係・同種性を判定する。対象領域０と１は、建造物の種別として検出されているため、関連性判定部２２１は、対象領域０と１の間には同種性があると判定する。また、関連性判定部２２１は、キーフレーム５０１上の破線矩形の領域を、対象領域１と対象領域０の共通領域として検出する。そのため、関連性判定部２２１は、対象領域１と０は大小関係にあると判定する。また共通領域以外の領域が対象領域０上には存在しないため、関連性判定部２２１は、対象領域１と０の間には、部分関係の関係性はないと判定する。よって、キーフレーム５０１におけるキーフレーム５０２との間の関連性フラグは、同一性・大小関係・部分関係・同種性の順に１、−１、０、１となる。
　提示方法選択部２２２は、画像関連性情報として、画像ＩＤと関連性フラグをもとに提示方法を選択する。例えば、キーフレーム５０１とキーフレーム５０２の対象領域は同一のため、提示方法選択部２２２は、大小関係もしくは部分関係に基づくルールを適用する。提示方法決定部２０２は、開始画像であるキーフレーム５０１の提示時間は初期値Ｔｓにする。そして、提示方法決定部２０２は、キーフレーム５０１、５０２の大小関係が小・大の関係であるため、キーフレーム５０２の提示時間を、ａ＊Ｔｓにする。また、キーフレーム５０１、５０２に大小関係があることから、提示方法選択部２２２は、キーフレーム５０１、５０２の切り替わりのエフェクトとして、視覚的変化の少ないディゾルブを挿入する（ステップＳ４０７）。
　映像コンテンツ生成部２０３は、決定した提示時間・エフェクトで、キーフレ−ム５０１、５０２を用いて映像コンテンツを生成する（ステップＳ４０９）。
　図６の例では、キーフレームから検出される対象領域の種別が対象領域６０１である。各関連性種別に対する関連性フラグが６０２である。提示方法決定部２０２によって決定される提示時間長およびエフェクトが、提示時間長６０３およびエフェクト６０４である。
　本実施形態によれば、入力映像から抽出されたキーフレームを用いて、キーフレーム中の被写体の入力映像中での意味的な関連性を理解しやすい新規映像を生成することができる。
　［第３実施形態］
　第２実施形態に開示した関連性１に代えて、あるいは加えて以下の関連性のいずれか１つの変化に応じてそれぞれ提示方法を変更してもよい。
　（関連性２．動画撮影方法）
　キーフレーム抽出部２０１が、ソース動画中で撮影者が被写体を撮影した撮影方法に関連性のあるキーフレーム群を抽出した場合、本実施形態の情報処理装置２００は、その撮影方法の関連性に応じた提示方法で、それらのキーフレーム群を提示する。例えば、連続する複数のキーフレームが、いずれもフォロー撮影（被写体の動きに追従するようにカメラを動かして撮影する技法）で撮影された動画部分から抽出された場合、関連性判定部２２１は、それらのキーフレーム群は関連性があると判定する。
　同様に、連続する複数のキーフレームが、いずれもズーム撮影で撮影された動画部分から抽出された場合や、いずれも一定時間以上静止撮影された動画部分から抽出された場合にも、関連性判定部２２１は、それらのキーフレーム群にはある種の関連性、共通性が存在すると判断する。関連性判定部２２１は、撮影方法に関連性がある場合、関連性２についての関連性フラグに１を設定する。関連性判定部２２１は、撮影方法にない場合、関連性２についての関連性フラグに０を設定する。
　関連性判定部２２１は、連続するキーフレーム対それぞれから被写体領域を検出する。そして、関連性判定部２２１は、この被写体領域の情報をもとに、キーフレームに対応する区間から、それぞれ被写体領域を検出する。関連性判定部２２１は、被写体領域および背景領域の動きベクトルを解析する。そして、関連性判定部２２１は、被写体がフォロー、ズーム、静止等の意図的な方法で撮影されていることを判定する。
　関連性判定部２２１は、例えば以下の方法で、区間内の被写体領域の撮影方法を判定できる。関連性判定部２２１は、被写体がフォロー撮影されていることを、特許第４５９３３１４公報に開示されたフォロー対象判定方式によって判定できる。また、関連性判定部２２１は、被写体がズームインもしくは静止撮影されていることを、特開２００７−１９８１４号公報のカメラモーション判定方式によって判定できる。
　〔動画撮影方法の共通性に応じたルール〕
　（２−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対のソース動画の撮影方法の同一性をもとに、キーフレーム対の提示時間を決定する。例えば、同一の撮影方法で撮影されたキーフレームが連続した場合、提示方法決定部２０２は、徐々に提示時間を短くしていく。すなわち、提示方法決定部２０２は、同一の方法で撮影されたキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにする。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、同一の方法で撮影されたキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の方法で撮影されたキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の方法で撮影されたキーフレーム群のうち、最後に提示される画像の提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。連続するキーフレーム対が異なる方法で撮影された場合には、提示方法決定部２０２は、前のキーフレームの提示時間と独立に後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　（２−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対の撮影方法の同一性をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対が同じ方法で撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対が異なる方法で撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対が同じ方法で撮影された場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。また、提示方法選択部２２２は、連続するキーフレーム対が異なる方法で撮影された場合には、キーフレームの切り替え時に、ＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、異なる方法で撮影されたキーフレーム間に、ジングルを挿入してもよい。これにより、連続するキーフレーム対が同じ方法で撮影された場合、その連続するキーフレーム対は、画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、それらのキーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、連続するキーフレーム対が異なる方法で撮影された場合には、画像や音響が大きく変化する。そのため、視聴者は、内容に変化があることに気づくことができる。そして、視聴者は、映像コンテンツの内容理解に集中することができる。
　（関連性３．　対象の大小関係）
　関連性判定部２２１は、キーフレーム同士の大小関係を、キーフレームを抽出したソース動画でのズーム撮影の有無および対象領域の面積によって決定してもよい。関連性３は、このように決定された関連性である。
　「対象の大小関係にある」とは、映像コンテンツ内で連続するキーフレーム対に含まれる対象が同一であり、かつ対象領域の面積に規定値以上の差があることである。例えば、対象の周囲を含んだ画像と、対象のみを撮影した画像を組み合わせて映像コンテンツを生成することで、対象を紹介するケースがある。
　関連性判定部２２１は、対象の大小関係を、同一と判定された対象領域に共通する部分領域の面積、または共通する部分領域に含まれる特徴点間の距離によって判定できる。例えば、関連性判定部２２１は、特徴点間の距離が大であるほど対象が大きく撮影されていると判定できる。関連性判定部２２１は、映像コンテンツ内で連続するキーフレーム対の間で同一と判定された対象領域の間で、対象の大小関係を判定してもよい。この場合、関連性判定部２２１は、あるキーフレーム中の対象領域の面積よりも、その次のキーフレーム中の対象領域の面積の方が大きい場合、関連性３についての関連性フラグに１を設定する。関連性判定部２２１は、あるキーフレーム中の対象領域の面積よりも、その次のキーフレーム中の対象領域の面積の方が小さい場合、関連性３についての関連性フラグに−１を設定する。関連性判定部２２１は、あるキーフレーム中の対象領域の面積と、その次のキーフレーム中の対象領域の面積の間に、大小関係が存在しない場合、関連性３についての関連性フラグに０を設定する。あるいは、関連性判定部２２１は、映像コンテンツに含まれる全キーフレームから検出された対象領域のうち、同一と判定された対象領域に共通する部分領域の面積もしくは特徴点間距離を比較して、対象の大小関係を決定してもよい。例えば、関連性判定部２２１は、同一と判定された対象領域に共通する部分領域の最大面積Ｓｍａｘと最小面積Ｓｍｉｎをもとに、（Ｓｍａｘ＋２Ｓｍｉｎ）／３よりも小さい同一の対象領域を小、（Ｓｍａｘ＋２Ｓｍｉｎ）／３よりも大きく（２Ｓｍａｘ＋Ｓｍｉｎ）／３より小さい同一の対象領域を中、（２Ｓｍａｘ＋Ｓｍｉｎ）／３よりも大きい同一の対象領域を大とする。関連性判定部２２１は、連続するキーフレーム中の対象領域が、小と中もしくは中と大の関係であれば、関連性フラグに１を設定する。関連性判定部２２１は、連続するキーフレーム中の対象領域が、小と大の関係であれば、関連性フラグに２を設定する。関連性判定部２２１は、連続するキーフレーム中の対象領域が、大と中もしくは中と小の関係であれば、関連性フラグに−１を設定する。関連性判定部２２１は、連続するキーフレーム中の対象領域が、大と小の関係であれば、関連性フラグに−２を設定する。関連性判定部２２１は、連続するキーフレーム中の対象領域に大小関係が存在しない場合、関連性フラグに０を設定する。
　〔対象の大小関係に応じたルール〕
　（３−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対に含まれる対象の大小関係をもとに、キーフレーム対の提示時間を決定する。例えば、提示方法決定部２０２は、対象の大小関係にあるキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓに決定する。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、対象の大小関係にあるキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐに決定してもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、大小関係にあるキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓに決定してもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、大小関係にあるキーフレーム群のうち最後に提示されるキーフレームの提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。また、提示方法決定部２０２は、連続するキーフレーム対に含まれる対象に大小関係がない場合には、前のキーフレームの提示時間と独立に、後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　様々な大きさの対象Ｂを撮影したキーフレームを情報処理装置２００が再生させる場合について、図７を用いて説明する。関連性判定部２２１は、映像コンテンツに含まれる全キーフレームから検出された対象領域のうち同一と判定された対象領域の間の面積を比較して、連続するキーフレーム間の大小関係を決定したとする。また、提示方法決定部２０２は、あるキーフレームの提示時間に対し、関連性フラグ分パラメータａを乗算させることで、次のキーフレームの提示時間を算出するとする。このとき、始めのキーフレーム７０１の提示時間は初期値Ｔｓであり、キーフレーム７０１、７０２は小と中の関係であり、キーフレーム７０２、７０３は中と大の関係であり、キーフレーム７０３、７０４は大と小の関係であるとする。このとき、キーフレーム７０１、７０２の関連性フラグが１であるため、キーフレーム７０２の提示時間はａＴｓになる（ａの乗算）。さらにキーフレーム７０３の関連性フラグが１であるため、キーフレーム７０３の提示時間はａ×ａ×Ｔｓになる（ａの乗算）。キーフレーム７０３、７０４の関連性フラグは−２であるため、７０４の提示時間はＴｓになる（ａ×ａの除算）。パラメータａが０から１の間に設定されていれば、対象Ｂが小さく撮影されたキーフレーム（ロングショット）が長く提示される。また、対象Ｂがより大きく撮影されたキーフレーム（ミドルショット、タイトショット）は短く提示される。
　これにより、利用者は、対象Ｂ以外の情景が映りこんだ情報量の多いキーフレームについて内容を理解することができる。そして、利用者は、以降の内容が前のキーフレームの一部分であることを直感的に理解することができる。また、本実施形態の情報処理装置２００は、同じ対象を含む画像であっても、連続する画像の提示時間が変化する映像を生成することができる。そのため、視聴者を飽きさせないテンポ感ある映像コンテンツが生成できるという効果がある。
　（３−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対に含まれる対象の大小関係をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対に含まれる対象が大小関係にある場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対に含まれる対象が大小関係にない場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対に含まれる対象が大小関係にある場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。また、連続するキーフレーム対に含まれる対象が同一でない場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。
　また、提示方法選択部２２２は、大小関係が存在しない画像間に、ジングルを挿入してもよい。これにより、大小関係の対象を撮影したキーフレーム群は、画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、キーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、キーフレーム群が大小関係に無い場合には、画像や音響が大きく変化する。そのために、視聴者は内容に変化があったことに気づくことができ、映像コンテンツの内容理解に集中することができる。
　（関連性４．　対象の部分関係）
　関連性判定部２２１は、関連性を、２つのキーフレームに表わされた対象の部分関係によって決定してもよい。つまり、関連性判定部２２１は、キーフレーム対に含まれる２つのキーフレームに表わされた対象が全体と部分との関係にあるかによって、関連性を決定してもよい。関連性４は、このように決定された関連性である。
　「対象の部分関係にある」は、対象映像コンテンツ内で連続するキーフレーム対に映っている対象が、同一であり、かつ互いに異なる対象の部分を撮影した画像となっていることを表す。例えば、広い景色や、大きな対象や、長い対象を撮影したい場合に、対象の一部を撮影したキーフレームを組み合わせて映像コンテンツ再生することで全体を表現するケースがこれにあたる。
　関連性判定部２２１は、あるキーフレーム中の対象領域とその次のキーフレーム中の対象領域とが対象の部分関係にある場合には、関連性４についての関連性フラグに１を設定する。関連性判定部２２１は、あるキーフレーム中の対象領域とその次のキーフレーム中の対象領域とが対象の部分関係でない場合には、関連性４についての関連性フラグに０を設定する。関連性判定部２２１は、映像コンテンツ内で連続するキーフレーム中の同一と判定された対象領域に共通する部分領域（共通領域）をもとに、対象の部分関係を判定できる。例えば、関連性判定部２２１は、対象領域の一方をテンプレートにする。そして、関連性判定部２２１は、もう一方の対象領域を走査して差分の少ない位置を検出し重なった領域を共通領域にする。関連性判定部２２１は、各対象領域の共通領域以外の領域がどちらも規定面積以上の場合に、対象の部分関係と判定する。あるいは、関連性判定部２２１は、映像コンテンツに含まれる全キーフレームから同一と判定された対象領域の相対的な位置をもとに、対象の部分関係を判定してもよい。
　全体から部分への対象の変化が連続した場合には、提示方法決定部２０２は、関連性の変化が無い場合と同様に、提示方法の変更を行なう。
　〔対象の部分関係に応じたルール〕
　（４−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対に含まれる対象の部分関係をもとに、キーフレーム対の提示時間を決定する。例えば、提示方法決定部２０２は、対象の部分関係にあるキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにする。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、対象の部分関係にあるキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、部分関係にあるキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、部分関係にあるキーフレーム群のうち、最後に提示される画像の提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。提示方法決定部２０２は、連続するキーフレーム対に含まれる対象に部分関係がない場合には、前のキーフレームの提示時間と独立に、後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　風景を撮影したキーフレームを情報処理装置２００が再生させる場合について、図８を用いて説明する。関連性判定部２２１は、連続するキーフレーム間の部分関係を、映像コンテンツに含まれる全キーフレームから検出された対象領域のうち、同一と判定された対象領域の間で共通する部分領域と、対象領域との位置関係をもとに、判定したとする。また、提示方法決定部２０２は、あるキーフレームの提示時間に対して規定パラメータを乗算することで、次のキーフレームの提示時間を算出するとする。
　提示方法決定部２０２は、始めのキーフレーム８０１の提示時間を初期値Ｔｓにする。キーフレーム８０１と８０２、８０２と８０３は部分関係があり、キーフレーム８０３と８０４とは部分関係がない。このとき、始めのキーフレーム８０１の提示時間は初期値Ｔｓであり、キーフレーム８０１、８０２の関連性フラグは１なので、キーフレーム８０２の提示時間はａ×Ｔｓになる。さらに、キーフレーム８０２、８０３の関連性フラグも１なので、キーフレーム８０３の提示時間はａ２　Ｔｓになる。キーフレーム８０３と８０４の関連性フラグは０なので、提示方法決定部２０２は、キーフレーム８０４の提示時間を初期値のＴｓにする。
　パラメータａは、０から１の間で、かつ、キーフレーム間で一致する部分領域の面積が大であるほど小さい値を設定されている。そうすると、風景について初めて提示されるキーフレーム８０１が長く提示される。また、その他の部分は、前に提示された画像との重複する情報量に応じた提示時間で提示される。これにより、利用者は、風景について始めに提示されたキーフレームについて内容を理解することができる。そして、利用者は、以降の内容が始めのキーフレームとほぼ同等の内容であることを理解することができる。また、本実施形態の情報処理装置２００は、同じ対象を含む画像であっても、連続する画像の提示時間が変化する映像を生成することができる。そのため、本字死刑対には、視聴者を飽きさせないテンポ感ある映像コンテンツが生成できるという効果がある。
　（４−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対に含まれる対象の部分関係をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対に含まれる対象が部分関係にある場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対に含まれる対象が部分関係にない場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対が対象の部分関係にある場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。連続するキーフレーム対の対象が同一でない場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、大小関係が存在しない画像間に、ジングルを挿入してもよい。これにより、連続するキーフレーム対が対象の部分関係にある場合、その連続するキーフレーム対は、画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、連続するキーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、連続するキーフレーム対が部分関係にない場合には、画像や音響が大きく変化する。そのために、視聴者は内容に変化があったことに気づくことができ、映像コンテンツの内容理解に集中することができる。
　（関連性５．　対象の同種性）
　関連性判定部２２１は、関連性を、２つのキーフレームに表わされた対象が同種か否かによって決定してもよい。関連性５は、このように決定された関連性である。
　「対象が同種である」とは、映像コンテンツ内で連続するキーフレーム対に映っている主要な対象が、互いに同じ種別の対象であることである。関連性判定部２２１は、あるキーフレーム中の対象領域とその次のキーフレーム中の対象領域とが同種の関係にある場合、関連性５についての関連性フラグに１を設定する。関連性判定部２２１は、あるキーフレーム中の対象領域とその次のキーフレーム中の対象領域とが異種の場合、関連性５についての関連性フラグに０を設定する。対象の同種性の判別は、同種性を判別したい各種別に属する対象の画像データ（登録データ）をもとに、機械学習に基づく方法で実現できる。関連性判定部２２１は、まず登録データから各種別に属する対象の画像特徴量を抽出する。関連性判定部２２１は、画像特徴量として、色ヒストグラムやエッジヒストグラム等の大域特徴を用いてもよい。関連性判定部２２１は、画像特徴量として、ＨｏＧ（Ｈｉｓｔｏｇｒａｍｓ　ｏｆ　Ｏｒｉｅｎｔｅｄ　Ｇｒａｄｉｅｎｔ）やＳＩＦＴ等の局所特徴量を用いてもよい。関連性判定部２２１は、大域特徴を用いてＳＶＭ（Ｓｉｎｇｕｌａｒ　Ｖａｌｕｅ　Ｄｅｃｏｍｐｏｓｉｔｉｏｎ）やニューラルネットワークやＧＭＭ（Ｇａｕｓｓｉａｎ　Ｍｉｘｔｕｒｅ　Ｍｏｄｅｌ）等で学習を行ってもよい。あるいは、関連性判定部２２１は、局所特徴量からＢｏＷ（Ｂａｇ　ｏｆ　Ｗｏｒｄｓ）のように特徴量空間の変換を行った上で学習を行ってもよい。関連性判定部２２１は、映像コンテンツに含まれる各キーフレーム中の対象領域について同種性を判別する際、各対象領域の画像特徴量と、学習の結果得られた各種別のモデルとの間でそれぞれ類似性を求める。そして、関連性判定部２２１は、対象領域を規定値以上の類似度を得た最も近いモデルの種別と判定する。関連性判定部２２１は、同じ種別と判定された対象領域を同種と判定する。関連性判定部２２１は、上記以外の方法で、同種性を判定しても構わない。
　同種の対象を含む画像が３つ連続した場合には、提示方法決定部２０２は、関連性の変化が無い場合と同様に、提示方法の変更を行なう。
　〔対象の同種性に応じたルール〕
　（５−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対に含まれる対象の同種性をもとに、キーフレーム対の提示時間を決定する。例えば、同種の対象を撮影したキーフレームが連続した場合、提示方法決定部２０２は、徐々に提示時間を短くしていく。すなわち、提示方法決定部２０２は、同種の対象を含むキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにする。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、同種の対象を含むキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同種の対象を含むキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同種の対象を含むキーフレーム群のうち、最後に提示される画像の提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。提示方法決定部２０２は、連続するキーフレーム対に含まれる対象が同種でない場合には、前のキーフレームの提示時間と独立に後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　花を撮影したキーフレームを再生したキーフレームを、情報処理装置２００が再生させる場合について、図９を用いて説明する。関連性判定部２２１は、連続するキーフレーム間の同種性を、機械学習に基づく方法で判定したとする。また、提示方法決定部２０２は、あるキーフレームの提示時間に対して、関連性フラグ分のパラメータを乗算させることで、次のキーフレームの提示時間を算出する。提示方法決定部２０２は、始めのキーフレーム９０１の提示時間を初期値Ｔｓとする。キーフレーム９０１、９０２、キーフレーム９０２、９０３は同種、キーフレーム９０３、９０４は異種の関係である。このとき、始めのキーフレーム９０１、９０２の関連性フラグが１であるため、キーフレーム９０２の提示時間はａ×Ｔｓになる。さらにキーフレーム９０２、９０３の関連性フラグが１であるため、キーフレーム９０３の提示時間はａ２　Ｔｓになる。キーフレーム９０３、９０４の関連性フラグは０であるため、提示方法決定部２０２は、キーフレーム９０４の提示時間を初期値のＴｓに戻す。パラメータａが０から１の間に設定されていると、植物を含むキーフレームのうち初めてに提示されたキーフレーム９０１が長く提示される。また、後続のキーフレームは、９０１から離れるほど短い提示時間で提示される。これにより、利用者は、始めに提示されたキーフレームから画像内容が植物であることを理解することができる。そして、利用者は、以降のキーフレームの内容がほぼ同等であることを理解することができる。
　また、本実施形態の情報処理装置２００は、同じ対象を含む画像であっても、連続する画像の提示時間が変化する映像を生成することができる。そのため、本実施形態には、視聴者を飽きさせないテンポ感ある映像コンテンツが生成できるという効果がある。情報処理装置２００は、花畑で撮影した複数の花の画像を、同種の被写体を順に再生することで、この種の被写体が沢山存在したことを表現できる。
　（５−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対に含まれる対象の同種性をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対に含まれる対象が同種の場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対に含まれる対象が異種の場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対が同種の場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。連続するキーフレーム対が異種の場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、異種のキーフレーム間に、ジングルを挿入してもよい。これにより、連続するキーフレーム対に含まれる対象が同種の場合、その連続するキーフレーム対は、画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、キーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、連続するキーフレーム対に含まれる対象が異種の場合には、画像や音響が大きく変化する。そのために、視聴者は、内容に変化があったことに気づくことができ、映像コンテンツの内容理解に集中することができる。
　（関連性６．　撮影場所の同一性）
　関連性判定部２２１は、関連性を、２つのキーフレームの撮影場所の共通性によって決定してもよい。関連性６は、このように決定された関連性である。
　「撮影場所が同一である」とは、映像コンテンツ内で連続するキーフレーム対を撮影した場所が同一であることである。関連性判定部２２１は、あるキーフレームの撮影場所とその次のキーフレームの撮影場所が同じである場合、関連性６についての関連性フラグに１を設定する。また、関連性判定部２２１は、あるキーフレームの撮影場所とその次のキーフレームの撮影場所異なる場合、関連性６についての関連性フラグに０を設定する。関連性判定部２２１は、撮影場所の同一性を、キーフレーム中の対象領域以外の領域（背景領域）の類似度をもとに判定できる。例えば、関連性判定部２２１は、キーフレームから対象領域と背景領域を分離すればよい。そして、関連性判定部２２１は、背景領域から抽出した画像特徴量が類似する場合に、同一の撮影場所と判定してもよい。関連性判定部２２１は、上記以外の方法で、撮影場所の同一性を判定しても構わない。関連性判定部２２１は、映像コンテンツ内で連続するキーフレームの間で背景の類似性を判定することで、撮影場所の同一性を判定してもよい。あるいは、関連性判定部２２１は、映像コンテンツに含まれる全キーフレーム中の背景領域の同一性をもとに、撮影場所の同一性を判定してもよい。関連性判定部２２１は、画像情報に加えて、メタ情報である撮影場所やセンサ情報であるＧＰＳを組み合わせて、撮影場所の同一性を判定してもよい。
　同じ撮影場所で撮影された画像が３つ連続した場合には、提示方法決定部２０２は、関連性の変化が無い場合と同様にして、提示方法の変更を行なう。
　〔撮影場所の同一性に応じたルール〕
　（６−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対の撮影場所の同一性をもとに、キーフレーム対の提示時間を決定する。例えば、同一の撮影場所で撮影されたキーフレームが連続した場合、提示方法決定部２０２は、徐々に提示時間を短くしていく。例えば、提示方法決定部２０２は、同一の場所で撮影されたキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにする。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、同一の場所で撮影されたキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の場所で撮影されたキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の場所で撮影されたキーフレーム群のうち、最後に提示される画像の提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。連続するキーフレーム対が異なる場所で撮影された場合には、提示方法決定部２０２は、前のキーフレームの提示時間と独立に後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。
　（６−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対の撮影場所の同一性をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対が同じ場所で撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対が異なる場所で撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対が同じ場所で撮影された場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。連続するキーフレーム対が異なる場所で撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、異なる場所で撮影されたキーフレーム間に、ジングルを挿入してもよい。これにより、連続するキーフレーム対が同じ場所で撮影された場合、その連続するキーフレーム対は画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、キーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、連続するキーフレーム対が異なる場所で撮影された場合には、画像や音響が大きく変化する。そのため、視聴者は内容に変化があることに気づくことができ、映像コンテンツの内容理解に集中することができる。
　（関連性７．　撮影時間帯の同一性）
　関連性判定部２２１は、関連性を、キーフレーム対に含まれる２つのキーフレームの撮影時間帯の共通性によって決定してもよい。関連性７は、このように決定された関連性である。
　「撮影時間帯が同一である」とは、映像コンテンツ内で連続するキーフレーム対を撮影した時間帯が同一であることである。関連性判定部２２１は、あるキーフレームの撮影時間帯とその次のキーフレームの撮影時間帯が同じである場合、関連性７についての関連性フラグに１を設定する。関連性判定部２２１は、あるキーフレームの撮影時間帯とその次のキーフレームの撮影時間帯が異なる場合、関連性７についての関連性フラグに０を設定する。関連性判定部２２１は、撮影時間帯の同一性を、キーフレーム中の背景領域の色情報をもとに判定できる。例えば、関連性判定部２２１は、１日を複数の時間帯に分割する。関連性判定部２２１は、各時間帯における太陽光の色ヒストグラムの統計量を保持しておく。そして、関連性判定部２２１は、キーフレームの背景領域中にいずれかの時間帯の統計量と近い部分領域が含まれるときに、そのキーフレームがその時間帯に撮影されたと判定する。関連性判定部２２１は、各キーフレームの撮影時間帯を推定する。そして、関連性判定部２２１は、推定したキーフレームの撮影時間帯が同じ場合に、キーフレームの撮影時間帯が同一であると判定する。関連性判定部２２１は、撮影時間帯の同一性を、上記以外の方法で判定しても構わない。関連性判定部２２１は、撮影時間帯の同一性を、映像コンテンツ内で連続するキーフレームの間で撮影時間帯の類似性を判定してもよい。あるいは、関連性判定部２２１は、映像コンテンツに含まれる全キーフレーム中の撮影時間帯の同一性をもとに、撮影時間帯の同一性を判定してもよい。関連性判定部２２１は、画像情報に加えて、メタ情報である撮影時刻と組み合わせて、撮影時間帯の同一性を判定してもよい。
　同じ撮影時間帯に撮影された画像が３つ連続した場合には、提示方法決定部２０２は、関連性の変化が無い場合と同様にして、提示方法の変更を行なう。例えば、提示方法決定部２０２は、同じ時間間隔で徐々に提示時間を短くしていく。
　〔撮影時間帯の同一性に応じたルール〕
　（７−１）提示時間に関するルール
　提示方法決定部２０２は、連続するキーフレーム対の撮影時間帯の同一性をもとに、キーフレーム対の提示時間を決定する。例えば、一定範囲の撮影時間に撮影されたキーフレームが連続した場合、提示方法決定部２０２は、徐々に提示時間を短くしていく。すなわち、提示方法決定部２０２は、同一の時間帯に撮影されたキーフレーム群のうち、はじめに提示されるキーフレームの提示時間を初期値Ｔｓにする。そして、提示方法決定部２０２は、Ｔｓを基準として後続のキーフレームの提示時間を決定する。また、提示方法決定部２０２は、同一の時間帯に撮影されたキーフレーム群のうち、視認性の高いキーフレームの提示時間をＴｐにしてもよい。そして、提示方法決定部２０２は、Ｔｐを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の時間帯に撮影されたキーフレーム群のうち、キーフレームの提示時間がＴｑ以下になった次のキーフレームの提示時間を初期値Ｔｓにしてもよい。そして、提示方法決定部２０２は、Ｔｓを基準に後続のキーフレームの提示時間を決定してもよい。また、提示方法決定部２０２は、同一の時間帯に撮影されたキーフレーム群のうち、最後に提示される画像の提示時間をＴｓに設定してもよい。提示方法決定部２０２は、あらかじめ設定されている映像コンテンツ全体の提示時間から、提示する画像数に応じて、Ｔｓ、Ｔｐの値を算出してもよい。連続するキーフレーム対が異なる時間帯に撮影された場合には、提示方法決定部２０２は、前のキーフレームの提示時間と独立に後続のキーフレームの提示時間を決定する。例えば、提示方法決定部２０２は、後続のキーフレームの提示時間を、初期値Ｔｓに設定してもよい。提示方法決定部２０２は、後続のキーフレームの提示時間を、規定範囲内のランダムな値に設定してもよい。なお、ソース動画が長時間にわたるノーカット動画である場合、関連性判定部２２１は、キーフレームを抽出した箇所の動画の撮影時間帯が大きく違っていても、それらのキーフレームは関連性のあるキーフレームであると判定することができる。
　（７−２）エフェクト・ＢＧＭ・ジングルに関するルール
　提示方法選択部２２２は、連続するキーフレーム対の撮影時間帯の同一性をもとに、キーフレーム対の間に挿入するエフェクト・ＢＧＭ・ジングルを決定する。例えば、連続するキーフレーム対が同一の時間帯に撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の少ないエフェクトとしてあらかじめ登録された特殊効果（ディゾルブやフェード等）を挿入する。連続するキーフレーム対が異なる時間帯に撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時に、視覚的な変化の大きいエフェクトとしてあらかじめ登録された特殊効果（ページめくり、ワイプ等のＤＶＥ）を挿入する。また例えば、連続するキーフレーム対が同一の時間帯に撮影された場合には、提示方法選択部２２２は、キーフレーム対の提示中同じＢＧＭを流す。連続するキーフレーム対が異なる時間帯に撮影された場合には、提示方法選択部２２２は、キーフレームの切り替え時にＢＧＭを止めるもしくは異なるＢＧＭに切り替える。また、提示方法選択部２２２は、異なる時間帯のキーフレーム間に、ジングルを挿入してもよい。これにより、連続するキーフレーム対が同一の時間帯に撮影された場合、その連続するキーフレーム対は、画像や音響の変化がなく滑らかに接続される。そのため、視聴者は、それらのキーフレームが互いに変化がなくほぼ同じ内容であることを容易に理解できる。また、連続するキーフレーム対が異なる時間帯に撮影された場合には、画像や音響が大きく変化する。そのために、視聴者は、内容に変化があったことに気づくことができ、映像コンテンツの内容理解に集中することができる。提示方法選択部２２２は、提示ルールに、上記のいずれか１つのルールを適用してもよい。提示方法選択部２２２は、提示ルールに、複数のルールを組み合わせて用いてもよい。映像コンテンツ生成部２０３は、提示方法決定部２０２から入力された提示方法情報と、映像入力部２０４から入力された画像情報をもとに、映像コンテンツを生成する。
　［他の実施形態］
　以上、本発明の実施形態について詳述したが、それぞれの実施形態に含まれる別々の特徴を如何様に組み合わせたシステムまたは装置も、本発明の範疇に含まれる。
　また、本発明は、複数の機器から構成されるシステムに適用されてもよいし、単体の装置に適用されてもよい。さらに、本発明は、実施形態の機能を実現する情報処理プログラムが、システムあるいは装置に直接あるいは遠隔から供給される場合にも適用可能である。したがって、本発明の機能をコンピュータで実現するために、コンピュータにインストールされるプログラム、あるいはそのプログラムを格納した媒体、そのプログラムをダウンロードさせるＷＷＷ（Ｗｏｒｌｄ　Ｗｉｄｅ　Ｗｅｂ）サーバも、本発明の範疇に含まれる。
　情報処理装置１００、情報処理装置２００は、それぞれ、コンピュータ及びコンピュータを制御するプログラム、専用のハードウェア、又は、コンピュータ及びコンピュータを制御するプログラムと専用のハードウェアの組合せにより実現することができる。
　キーフレーム抽出部１００、提示方法決定部１０２、映像コンテンツ生成部１０３、キーフレーム抽出部２０１、提示方法決定部２０２、映像コンテンツ生成部２０３、映像入力部２０４、関連性判定部２２１、提示方法選択部２２２は、例えば、プログラムを記憶する記録媒体からメモリに読み込まれた、各部の機能を実現するための専用のプログラムと、そのプログラムを実行するプロセッサにより実現することができる。あるいは、キーフレーム抽出部１００、提示方法決定部１０２、映像コンテンツ生成部１０３、キーフレーム抽出部２０１、提示方法決定部２０２、映像コンテンツ生成部２０３、映像入力部２０４、関連性判定部２２１、提示方法選択部２２２の一部又は全部を、各部の機能を実現する専用の回路によって実現することもできる。
　以上、実施形態を参照して本発明を説明したが、本発明は上記実施形態に限定されるものではない。本発明の構成や詳細には、本発明のスコープ内で当業者が理解し得る様々な変更をすることができる。
　以上、実施形態（及び実施例）を参照して本願発明を説明したが、本願発明は上記実施形態（及び実施例）に限定されものではない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。
　この出願は、２０１１年５月１２日に出願された日本出願特願２０１１−１０７１０４を基礎とする優先権を主張し、その開示の全てをここに取り込む。 Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the drawings. However, the components described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention only to them.
[First Embodiment]
An information processing apparatus 100 as a first embodiment of the present invention will be described with reference to FIG. The information processing apparatus 100 is an apparatus that generates video content by editing a moving image.
As illustrated in FIG. 1, the information processing apparatus 100 includes a key frame extraction unit 101 (extraction unit), a presentation method determination unit 102 (determination unit), and a video content generation unit 103 (generation unit).
The key frame extraction unit 101 extracts at least two partial moving images or still images from the source moving image. The presentation method determination unit 102 determines the presentation method of at least two partial moving images or still images extracted by the key frame extraction unit 101 based on the characteristics of the source moving image. The video content generation unit 103 generates video content including at least two partial moving images or still images based on the presentation method determined by the presentation method determination unit 102. The video content includes, for example, a slide show.
According to the present embodiment, it is possible to generate video content that leaves the atmosphere of the source video.
[Second Embodiment]
(Prerequisite technology)
Video posting sites provide a list of still images and partial videos (hereinafter referred to as key frames) extracted from the video so that viewers can quickly grasp the video content and select videos of interest efficiently. Function is used.
In order for the viewer to properly understand the entire video, multiple key frames are usually required. However, when a large number of key frames are presented on the display at a time, the size per key frame may be reduced, and the content may not be sufficiently confirmed.
In order to solve the problem, it is considered that a presentation method in which key frames are switched and presented in order is effective (hereinafter, videos presented in order by switching key frames are referred to as video contents). Understanding the relevance between key frames that are continuously presented in video content may be effective in understanding the content of the input video. For example, the viewer can understand the flow of motion and change by understanding that consecutive key frames in the video content express a change of a common subject and a transition of motion. In addition, the viewer understands that the subject included in the consecutive key frames is taken with a common intention (for example, with similar interest) in the input video, and thus the importance of the subject. Understand the relationship.
In order for the viewer to understand the relevance of consecutive key frames, the presentation method of key frame presentation time and effects inserted between key frames is important. For example, if successive key frames are presented in the same way, the viewer may misunderstand that the key frames are related, even though there is actually no relationship between the key frames. Also, if consecutive key frames are presented in a completely different manner, the viewer may misunderstand that there is no association between the key frames. Therefore, in order to allow the viewer to correctly understand the relationship between the key frames, it can be said that it is effective to control the presentation rule according to the relationship between the contents of the key frames.
[Constitution]
The configuration of the information processing apparatus 200 according to the second embodiment of the present invention will be described with reference to FIG. FIG. 2 is a diagram for explaining a schematic configuration of the information processing apparatus 200 according to the present embodiment.
The information processing apparatus 200 includes a key frame extraction unit 201 (extraction unit), a presentation method determination unit 202 (determination unit), a video content generation unit 203 (generation unit), and a video input unit 204.
The key frame extraction unit 201 extracts at least two partial moving images or still images as key frames from the source moving image 210 as input video. The presentation method determination unit 202 determines the presentation method of at least two partial moving images or still images extracted by the key frame extraction unit 201 based on the characteristics of the source moving image. The video content generation unit 203 generates video content 240 as a new video that continuously presents at least two partial moving images or still images based on the presentation method determined by the presentation method determination unit 202.
The presentation method determination unit 202 includes an association determination unit 221 (determination unit) and a presentation method selection unit 222 (selection unit). The relevance determination unit 221 determines whether or not the objects included in at least two partial moving images or still images have commonality based on the source moving image. The relevancy determination unit 221 determines whether or not the objects included in at least two partial moving images or still images are the same based on the source moving image. When the object has commonality, the presentation method selection unit 222 selects a presentation method different from that when the object has no commonality. The presentation method determination unit 202 determines a key frame presentation time in the video content 240.
The video input unit 204 inputs the source moving image 210 from a video camera or the like, and passes it to the key frame extraction unit 201 and the presentation method determination unit 202. The key frame extraction unit 201 sends not only the key frame extracted from the source moving image 210 but also key frame information related to the key frame to the video content generation unit 203. The key frame information is a key frame ID (identifier) for identifying the key frame, a presentation order in the video content, and pixel information of the key frame.
The video input unit 204 inputs information on the input video (video information) to the relevance determination unit 221 in response to a request from the relevance determination unit 221. The video information is, for example, a key frame ID, pixel information of a section corresponding to the key frame, and acoustic information. The section corresponding to the key frame is, for example, a unit section to which the key frame in the input video belongs, or a unit section including the same subject as the key frame. The unit section may be any of the following four sections or a combination thereof. Sections separated by a certain time interval. A section divided based on the control signal of the photographic equipment such as the camera turning point. A section divided based on features extracted from video such as image change points and sound change points of frames. A section that is manually delimited as a point of change in shooting content such as location, subject, or time zone.
There is at least one section corresponding to each key frame for each key frame. A plurality of key frames may be associated with one section. The image information of the section is, for example, image information of a frame that belongs to the section. The acoustic information of the section is, for example, sound information synchronized with the section. The section information may include sensor information such as meta information describing a subject, a shooting location, and a shooting time shown in the section, and GPS (Global Positioning System).
The relevancy determination unit 221 acquires video information corresponding to the key frame from the video input unit 204 based on the key frame information input from the key frame extraction unit 201. Then, the relevance determination unit 221 determines relevance between key frames. The relevancy determination unit 221 inputs key frame relevance information to the presentation method selection unit 222. The key frame relevance information is, for example, a key frame ID and a relevance flag. The key frame information may include pixel information of the key frame in addition to the above. The relevance flag is data indicating the relevance type existing between the current key frame and the key frame to be presented thereafter among the relevance types specified in advance, or there is no relevance type. This data indicates that there is no relationship. For example, the relevancy determination unit 221 sets the flag 1 to the relevance flags of all relevance types existing between a certain key frame and a subsequent key frame. Then, the relevancy determination unit 221 sets the flag 0 to the relevance flag of the related type that does not exist. Or the relevance determination part 221 may set the arbitrary numerical value which has a meaning according to a relevance type to a relevance flag.
The relevancy determination unit 221 may determine whether the shooting method of the target object in the key frame is the same based on the source moving image. The relevance determination unit 221 may determine whether the objects included in the key frame have commonality based on the acoustic characteristics of the source video.
A method for determining the relevance focusing on the identity of the subject will be described below.
(Relevance 1. Subject identity)
The relevance determination unit 221 can determine the identity of the subject between the key frames based on the continuity of shooting with the source moving image from which the key frames are extracted. The relationship 1 is the relationship determined in this way.
“Subjects are the same” means that the subjects of a pair of key frames that are continuous in the video content are common. When “subjects are the same”, the subject included in the key frame pair does not change at all, but the subject included in the key frame pair is different from each other in the course of a series of changes and operations. Is also included. For example, in an image of a building that changes color with the passage of time, the subject of the key frame before and after the color change appears to be different when judged from the key frame alone. However, the relevancy determination unit 221 can determine that the subject of the key frame before and after the color change is the same by referring to the source moving image. Similarly, the relationship between the key frame pair before and after hatching extracted from the source video that captures the insect hatching appears to be a different subject when judged from the key frame alone. However, the relevancy determination unit 221 can determine that the subject is the same by referring to the source moving image. On the other hand, if key frames before and after the timbre change are extracted from a video of a musical instrument whose timbre changes over time, those key frames can be determined based on only the sound at the key frame extraction location. Can be determined to have taken a different subject. However, also in this case, the relevance determination unit 221 can determine that the subject is the same by referring to the entire source moving image.
That is, by referring to the source moving image, it becomes clear that the key frame group extracted from the moving image obtained by continuously shooting the same subject is extracted from the moving image obtained by continuously shooting the same subject. In other words, for example, the relevancy determination unit 221 can find an editing point of the source moving image and estimate that the subject is the same for the key frame group between them.
The relevancy determination unit 221 sets 1 in the relevance flag for relevance 1 when the subject of a certain key frame and the subject of the subsequent key frame are the same. The relevancy determination unit 221 sets a relevance flag for relevance 1 to 0 when the subject of a certain key frame and the subject of a subsequent key frame are not the same. The identity determination of the subject can be realized by the following method, for example. When consecutive key frame pairs in the video content are associated with the same section and the subject areas (image areas of the subject) detected from these key frames are the same, the relevancy determination unit 221 Alternatively, it may be determined that the subjects of the consecutive key frame pairs are the same. Alternatively, the relevancy determination unit 221 detects the subject area from each of the key frame pairs that are continuous in the video content. Then, the relevancy determination unit 221 detects and tracks the subject area from the section corresponding to each key frame. The relevance determining unit 221 compares the subject area and the track process subject area detected from one section with the subject area and the track process subject area detected from the other section. The relevancy determination unit 221 may determine that the subject of each key frame is the same when the image feature amounts such as color and shape of the subject regions are similar.
Alternatively, the relevance determination unit 221 may extract the image area of the subject and the acoustic information emitted by the subject from the section corresponding to each key frame for the key frame pairs continuous in the video content. When the subject area detected from one section and the subject area detected from the other section have similar image characteristics and acoustic information, the relevancy determination unit 221 selects the key frame pair. May be determined to be the same subject.
The method of detecting the subject area from the key frame is divided into a case where a specific object registered in advance is detected and a case where a general object which is not registered is detected. When detecting the specific target, the relevance determination unit 221 may use the registered image data of each target as a template. Then, the relevance determination unit 221 may scan the key frames belonging to the section with templates converted into various resolutions. Then, the relevancy determination unit 221 may detect an area where the difference in pixel values at the same position as the template is small as the corresponding subject area.
Alternatively, the relevancy determination unit 221 may extract an image feature amount expressing color, texture, and shape from each partial region of the key frame. Then, the relevancy determination unit 221 may set a partial region having an image feature amount similar to the registered image feature amount of each target as a corresponding subject region. In addition, when the specific target is a person, there is a method of using information obtained from the entire face. As such a method, for example, an image in which various faces are reflected is stored as a template, and a method of determining that a face is present in an input image when a difference between a key frame and the template is equal to or less than a threshold value. Further, the relevancy determination unit 221 may store a model combining color information such as skin color, edge direction and density in advance, and detect a region similar to the model as a subject region. Furthermore, the relevance determination unit 221 may use any of the following methods or a combination thereof. A method of detecting the outline of the face (head) using an ellipse and a template created using the shape of an elongated eye and mouth. A technique that uses the characteristics of the luminance distribution that the cheeks and forehead are bright and the eyes and mouth are low. A technique that uses the symmetry and skin tone area and position of the face.
In addition, the relevance determination unit 221 statistically learns the feature amount distribution obtained from a large amount of human face and non-face learning samples, and the feature amount obtained from the input image is either the face or non-face distribution. A method for determining whether the image belongs (neural network, support vector machine, AdaBoost method) or the like may be used. Further, when detecting a general target, the relevancy determination unit 221 may use, for example, Normalized Cut, Saliency Map, Depth of Field (DoF). Normalized Cut is a method of dividing an image into a plurality of regions. Jianbo Shi and Jitendra Malik, “Normalized Cuts and Image Segmentation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8. August 2000 has a detailed description of Normalized Cut. The relevancy determination unit 221 may detect a region located in the center of the key frame among the regions divided by the normalized cut as the subject region. In addition, the relevancy determination unit 221 may detect an area for which a high importance level has been calculated by the Salient Map as a subject area. Saliency Map is a method for calculating an object region in an image from visual attention. For Saliency Map, see L. Itti, C.I. Koch and E.M. Niebur, “A Model of Salientity-based Visual Attention for Rapid Scene Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, no. 11, pp. 1254-1259, 1998 has a detailed description.
Dof is a method based on the characteristic that the target edge existing within the depth of field is not blurred and the edge outside the depth of field is blurred. For details, see 3Du-Ming Tsai, Hu-Jong Wang, “Segmenting focused objects in complex visual images”, Pattern Recognition Letters, Vol. 19, pp. 929 940, 1998. There is a disclosure. The relevancy determination unit 221 may calculate the amount of blur based on the thickness of the edge, combine the edges with less blur, and detect a focused region as the target region.
The relevancy determination unit 221 is a position in a still image or high visibility (lighting conditions, orientation, angle, position on the screen, hiding by other objects, blur, facial expression (in the case of a person), etc. The target area may be detected from the key frame on the basis of the evaluation value indicating the goodness of the reflection based on or the appearance frequency in a plurality of images. In addition, the relevance determination unit 221 may combine a plurality of detected subject areas into one. The relevancy determination unit 221 may detect the subject area from the section corresponding to the key frame as follows, for example. The relevancy determination unit 221 may use the image information of the subject area detected from the key frame as a template. Then, the relevancy determination unit 221 may detect the subject area from any frame belonging to the section corresponding to the key frame.
Tracking of the subject area detected in the section can be realized by the following method. The relevancy determination unit 221 sets the frame in which the subject area is detected as the start frame. The relevancy determination unit 221 performs subject area detection processing on adjacent frames in the time direction. The relevancy determination unit 221 may use the image feature amount of the target area that has already been detected as a template used for detection of the subject area. Then, the relevance determination unit 221 only needs to scan the region in the specified range around the detection position of the target region that has already been detected with the template. The relevancy determination unit 221 may extract the image feature amount from each subject region, and calculate the similarity between the subject regions based on a scale that calculates a higher value as the difference between the image feature amounts is smaller. . The image feature amount can be calculated based on image information such as a color, an edge, and a texture detected from the subject area. Alternatively, the relevancy determination unit 221 may detect local feature points such as SIFT (Scale-Invariant Feature Transform) from the image area of each subject. Then, the relevance determination unit 221 may associate feature points between image regions. Then, the relevancy determination unit 221 may use a scale that calculates a higher value as the number of associated feature points is larger or the positional relationship between the associated feature points is similar between images. . For example, the relevancy determination unit 221 may extract acoustic energy of a plurality of frequency bands as acoustic information from the section in which the subject region is detected. Then, the relevancy determination unit 221 may calculate the similarity of the acoustic information emitted by the subject based on a scale that calculates a higher value as the difference in acoustic energy is smaller. As described above, the relevancy determination unit 221 uses the information of the section corresponding to the key frame, so that the change in the appearance of the subject that occurs between the key frames and the background can be achieved as compared with the case where only the key frame information is used. It is possible to determine the identity of a subject robustly against changes in the subject.
The presentation method selection unit 222 determines an effect or jingle when switching key frames. The presentation method selection unit 222 determines an effect or jingle different from the case where there is no relationship when consecutive key frames are related to each other. The presentation method selection unit 222 determines the background music of the key frame in the video content.
Specifically, the presentation method selection unit 222 determines a key frame presentation method based on key frame relevance information input from the relevance determination unit 221 and a pre-registered presentation rule. The presentation method selection unit 222 inputs information (presentation method information) indicating a key frame presentation method to the video content generation unit 203. The presentation method information is data indicating the presentation method of each key frame. The presentation method information only needs to include the key frame ID and the presentation time. In addition to the above, the presentation method information may include an effect, BGM, audio jingle, and video jingle.
The presentation rule is a rule that defines a method for presenting a key frame according to the relevance type. The presentation rule includes a parameter that defines each presentation time of consecutive key frame pairs. In addition to the presentation time, the presentation rules may include control parameters related to effects, BGM, and jingles (short video, music, and sound effects) inserted between key frames. The presentation rule may define a presentation method in the case where no relevance type exists in the continuous key frame pairs. Examples of the presentation rules include the following.
(1) Rules regarding presentation time
When at least two partial moving images or still images are related to each other, the presentation method determination unit 202 determines the other presentation time based on one presentation time. Specifically, when at least two partial moving images or still images contain the same object, the presentation method determining unit 202 inserts the presentation method after the partial moving image or still image to be inserted before. To shorten the presentation time of the partial video or still image. On the other hand, the presentation method determination unit 202 determines the presentation time independently when at least two partial moving images or still images are not related to each other.
In other words, the presentation method determination unit 202 determines the presentation time of the key frame pair based on the identity of the subject included in the consecutive key frame pairs or the identity of the photographer's interest in the subject. For example, when the subjects included in successive key frame pairs are the same or the photographer's interest in the subjects is the same, the presentation method determination unit 202 sets the presentation time of the key frame presented first to the initial value Ts. Also good. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set Tp as the presentation time of a highly visible key frame in a key frame group including the same subject or a subject photographed with the same interest. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 sets the presentation time of the key frame next to the key frame whose presentation time is equal to or less than Tq among the key frames including the same subject or the subject photographed with the same interest as an initial value. It may be Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the key frame that is presented last among the key frames including the same subject or the subject photographed with the same interest as the initial value Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of key frames to be presented from the preset time of the entire video content. In addition, when the subjects included in the consecutive key frame pairs are not the same or are not photographed with the same interest, the presentation method determining unit 202 selects the subsequent key independently of the presentation time of the previous key frame. Determine the frame presentation time. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
As an example, a case in which five key frames 301 to 305 are extracted from a moving image shot around the person A will be described with reference to FIG. In the example of FIG. 3, the subjects included in the key frames 301 to 305 are the same. Therefore, the presentation method determination unit 202 calculates the presentation time of the subsequent key frame by multiplying the presentation time Ts of the first key frame by the parameter a as described above. At this time, when the presentation time of the first key frame 301 is an initial value Ts, the presentation time Ti of the subsequent key frame 302 is expressed by the following equation (1).

Further, when the visibility evaluation value of the key frame 303 facing the front is equal to or greater than the threshold, when the presentation time of the key frame 303 is Tp, the presentation time Tj of the subsequent

key frames

304 and 305 is expressed by the following equation (1). It is represented by

If the parameter a is set between 0 and 1, among the key frames including the person A, the key frame 301 presented first and the key frame 303 with good reflection of the person A are presented long. Other key frames are presented gradually shorter as they move away from the

key frames

301 and 303.
As a result, in this embodiment, the user can understand the contents of the key frame in which the object first appears or the key frame with good reflection, and the contents of the other images are almost the same as the understood contents. It has the effect of being able to understand. Further, the information processing apparatus 200 according to the present embodiment can generate a video in which the presentation time of successive images changes even for images including the same target. Therefore, the present embodiment has an effect that it is possible to generate video content with a sense of tempo that does not bore viewers.
(2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines the effect, BGM, and jingle to be inserted between the key frame pairs based on the identity of the subject included in the successive key frame pairs or the identity of the photographer's interest in the subject. To do. For example, when the subjects included in consecutive key frame pairs are the same or the photographer's interest in the subjects is the same, the presentation method selection unit 222 registers in advance as an effect with little visual change when switching key frames. Insert special effects (such as dissolves and fades). When the subjects included in the consecutive key frame pairs are not the same, the presentation method selection unit 222 performs special effects (such as page turning and wipe) registered in advance as effects having a large visual change at the time of key frame switching. Insert DVE (Digital Video Effect). Further, for example, when the subjects included in the consecutive key frame pairs are subjects photographed with the same or the same interest, the presentation method selection unit 222 plays the same BGM while presenting the key frame pairs. When the subjects included in the consecutive key frame pairs are not the same, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frame is switched. In addition, the presentation method selection unit 222 may insert a video or audio jingle between key frames that do not include subjects that are not the same or are photographed with the same interest. Thereby, the key frame group including the same subject or the subject photographed with the same interest is smoothly connected without any change in image or sound.
Therefore, the viewer can easily understand that the same subject or the subject having the same importance is included in these key frame groups. Then, the viewer can easily understand that these key frame groups are images in the middle of a series of changes and operations, or images including a subject photographed with the same intention. In addition, when the subjects in the key frame are not the same or do not include subjects photographed with the same intention, the image and sound change greatly. Therefore, the viewer can notice that the contents of the key frame have changed greatly, and can concentrate on understanding the new video.
The video content generation unit 203 generates and outputs a new video content based on the presentation method information selected by the presentation method selection unit 222 and the key frame information input from the key frame extraction unit 201.
(Operation)
Next, the operation of the present embodiment will be described in detail with reference to the flowchart of FIG. Hereinafter, as an example, an operation when the information processing apparatus 200 according to the present embodiment extracts the key frames 501 to 513 illustrated in FIG. 5 and generates video content will be described. This video content conveys the event of shooting a flower and a person in the greenhouse in the building. A rectangle in FIG. 5 is a target area detected from each key frame by the relevance determination unit 221.
Also, the presentation method selection unit 222 controls the presentation method using a rule based on a magnitude relationship or a partial relationship as a presentation rule for a key frame pair having the same target area. Moreover, the presentation method selection part 222 controls a presentation method using the rule based on a homogeneity as a presentation rule about the key frame pair whose object area | region is not the same. The rules based on the magnitude relationship, the partial relationship, and the homogeneity will be described in detail in the third and subsequent embodiments.
First, in step S401, the video input unit 204 inputs a moving image serving as a source. In step S <b> 402, the video input unit 204 passes the input source video to the key frame extraction unit 201. The key frame extraction unit 201 extracts key frames.
In step S <b> 403, the key frame extraction unit 201 passes the key frame information to the presentation method determination unit 202. In addition, the video input unit 204 passes the video information to the presentation method determination unit 202. The relevance determining unit 221 determines the identity of the subject and the commonality of the shooting method with reference to the source moving image from which the key frames 501 to 513 are extracted.
Further, in step S403, the relevance determination unit 221 detects the target area from the key frames 501 to 513. Assume that in the relevance determination unit 221, buildings, flowers, and people are registered in advance as targets. Further, it is assumed that the relevancy determination unit 221 has learned each model. Then, the relevancy determination unit 221 detects, from the key frames 501 to 513, locations surrounded by a solid line rectangle as the target region of the building.
In step S <b> 405, the relevance determination unit 221 extracts an image feature amount from the pixel information of the target area 0 and the target area 1. Then, the relevance determination unit 221 determines the identity, the magnitude relationship, the partial relationship, and the homogeneity based on the similarity between the regions. Since the

target areas

0 and 1 are detected as the types of buildings, the relevance determination unit 221 determines that there is a homogeneity between the

target areas

0 and 1. In addition, the relevance determination unit 221 detects a rectangular area on the key frame 501 as a common area between the target area 1 and the target area 0. Therefore, the relevance determination unit 221 determines that the

target areas

1 and 0 are in a magnitude relationship. In addition, since there is no area other than the common area on the target area 0, the relevancy determination unit 221 determines that there is no partial relationship between the

target areas

1 and 0. Therefore, the relationship flag between the key frame 501 and the key frame 502 is 1, −1, 0, 1 in the order of identity, magnitude relationship, partial relationship, and homogeneity.
The presentation method selection unit 222 selects a presentation method based on the image ID and the relevance flag as the image relevance information. For example, since the target areas of the key frame 501 and the key frame 502 are the same, the presentation method selection unit 222 applies a rule based on a magnitude relationship or a partial relationship. The presentation method determination unit 202 sets the presentation time of the key frame 501 that is the start image to the initial value Ts. Then, the presentation method determination unit 202 sets the presentation time of the key frame 502 to a * Ts because the magnitude relationship between the

key frames

501 and 502 is a small / large relationship. Since the

key frames

501 and 502 have a magnitude relationship, the presentation method selection unit 222 inserts a dissolve with a small visual change as an effect of switching the key frames 501 and 502 (step S407).
The video content generation unit 203 generates video content using the

key frames

501 and 502 with the determined presentation time / effect (step S409).
In the example of FIG. 6, the type of the target area detected from the key frame is the target area 601. The relevance flag for each relevance type is 602. The presentation time length and the effect determined by the presentation method determination unit 202 are the presentation time length 603 and the effect 604.
According to the present embodiment, it is possible to generate a new video that can easily understand the semantic relevance of the subject in the key frame in the input video by using the key frame extracted from the input video.
[Third Embodiment]
Instead of or in addition to the relevance 1 disclosed in the second embodiment, the presentation method may be changed according to any one of the following relevance changes.
(Relevance 2. Movie shooting method)
When the key frame extraction unit 201 extracts a key frame group relevant to the photographing method in which the photographer photographed the subject in the source video, the information processing apparatus 200 according to the present embodiment determines the relevance of the photographing method. These key frames are presented by a corresponding presentation method. For example, when a plurality of consecutive key frames are all extracted from a moving image portion captured by follow shooting (a technique of shooting by moving the camera to follow the movement of the subject), the relevancy determination unit 221 It is determined that these key frames are related.
Similarly, if multiple consecutive key frames are extracted from a moving image part that was shot with zoom shooting, or if they were all extracted from a moving picture part that was shot statically for a certain period of time, the relevance determination The unit 221 determines that there is a certain relationship or commonality among these key frame groups. The relevance determination unit 221 sets 1 to the relevance flag for relevance 2 when there is relevance to the imaging method. The relevance determination unit 221 sets 0 in the relevance flag for relevance 2 when there is no shooting method.
The relevancy determination unit 221 detects a subject area from each successive key frame pair. Then, the relevance determining unit 221 detects the subject areas from the sections corresponding to the key frames based on the information on the subject areas. The relevancy determination unit 221 analyzes the motion vectors of the subject area and the background area. Then, the relevancy determination unit 221 determines that the subject is photographed by an intentional method such as follow, zoom, or stillness.
The relevancy determination unit 221 can determine the method of shooting the subject area in the section, for example, by the following method. The relevancy determination unit 221 can determine that the subject is being followed by the follow target determination method disclosed in Japanese Patent No. 4593314. Further, the relevancy determination unit 221 can determine that the subject is zoomed in or is still photographed by the camera motion determination method disclosed in Japanese Patent Laid-Open No. 2007-19814.
[Rules based on commonality of video shooting methods]
(2-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of the key frame pairs based on the identity of the source video shooting methods of consecutive key frame pairs. For example, when the key frames shot by the same shooting method are consecutive, the presentation method determination unit 202 gradually shortens the presentation time. That is, the presentation method determination unit 202 sets the presentation time of the key frame presented first among the key frame groups photographed by the same method to the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. Further, the presentation method determination unit 202 may set the presentation time of a key frame having high visibility among key frame groups photographed by the same method to Tp. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may set the presentation time of the next key frame in which the presentation time of the key frame is equal to or less than Tq among the key frame groups photographed by the same method as the initial value Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the image presented last among the key frame groups photographed by the same method to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. When successive key frame pairs are photographed by different methods, the presentation method determination unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
(2-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines an effect, a BGM, and a jingle to be inserted between a pair of key frames based on the identity of consecutive key frame pairs. For example, when consecutive key frame pairs are photographed by the same method, the presentation method selection unit 222 performs special effects (such as dissolves and fades) registered in advance as effects with little visual change when switching key frames. ) Is inserted. When consecutive key frame pairs are photographed by different methods, the presentation method selection unit 222 performs special effects (such as page turning and wipe) registered in advance as effects having a large visual change when switching key frames. DVE) is inserted. Further, for example, when consecutive key frame pairs are photographed by the same method, the presentation method selection unit 222 plays the same BGM during presentation of the key frame pairs. In addition, when successive key frame pairs are captured by different methods, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frames are switched. In addition, the presentation method selection unit 222 may insert jingles between key frames captured by different methods. As a result, when successive key frame pairs are photographed in the same manner, the successive key frame pairs are smoothly connected without any change in image or sound. Therefore, the viewer can easily understand that those key frames have almost the same contents without any change. Further, when consecutive key frame pairs are captured by different methods, the image and sound change greatly. Therefore, the viewer can notice that there is a change in the content. The viewer can concentrate on understanding the content of the video content.
(Relevance 3. Target size relationship)
The relevancy determination unit 221 may determine the magnitude relationship between the key frames based on the presence / absence of zoom shooting in the source moving image from which the key frames are extracted and the area of the target region. The relationship 3 is the relationship determined in this way.
The “target size relationship” means that the targets included in consecutive key frame pairs in the video content are the same, and the area of the target region has a difference greater than a specified value. For example, there is a case where a target is introduced by generating a video content by combining an image including the periphery of the target and an image obtained by photographing only the target.
The relevance determination unit 221 can determine the magnitude relationship between objects based on the area of partial areas common to the target areas determined to be the same or the distance between feature points included in the common partial areas. For example, the relevancy determination unit 221 can determine that the larger the distance between feature points is, the larger the target is photographed. The relevancy determination unit 221 may determine the size relationship between the target areas determined to be the same between consecutive key frame pairs in the video content. In this case, when the area of the target region in the next key frame is larger than the area of the target region in a certain key frame, the relevancy determination unit 221 sets 1 to the relevance flag for the relevance 3. Set. When the area of the target region in the next key frame is smaller than the area of the target region in a certain key frame, the relevance determining unit 221 sets −1 in the relevance flag for the relevance 3 . When there is no magnitude relationship between the area of the target region in a certain key frame and the area of the target region in the next key frame, the relevancy determination unit 221 sets the relevance flag for relevance 3 to 0. Set. Alternatively, the relevancy determination unit 221 compares the areas of the partial regions common to the target regions determined to be the same among the target regions detected from all the key frames included in the video content, or the distances between the feature points, You may determine the magnitude relationship of object. For example, the relevancy determination unit 221 reduces the same target area smaller than (Smax + 2Smin) / 3 based on the maximum area Smax and the minimum area Smin of the partial areas common to the target areas determined to be the same, ( The same target area that is larger than (Smax + 2Smin) / 3 and smaller than (2Smax + Smin) / 3 is set as medium, and the same target area that is larger than (2Smax + Smin) / 3 is set as large. The relevancy determination unit 221 sets the relevance flag to 1 if the target area in the continuous key frame has a small-medium or medium-large relationship. The relevancy determination unit 221 sets 2 in the relevance flag if the target area in the continuous key frame has a small and large relationship. The relevancy determination unit 221 sets −1 to the relevance flag if the target area in the continuous key frame has a large / medium or medium / small relationship. The relevance determination unit 221 sets −2 to the relevance flag if the target area in the continuous key frame has a large and small relationship. The relevancy determination unit 221 sets the relevance flag to 0 when there is no magnitude relationship between target areas in consecutive key frames.
[Rules according to the target size]
(3-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of a key frame pair based on the size relationship of objects included in successive key frame pairs. For example, the presentation method determination unit 202 determines the presentation time of the key frame that is presented first among the key frame groups having a target size relationship as the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may determine the presentation time of a key frame with high visibility among the key frame groups having a target size relationship as Tp. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may determine, as an initial value Ts, the presentation time of the next key frame in which the key frame presentation time is equal to or less than Tq among the large and small key frame groups. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the key frame presented last among the key frame groups having a magnitude relationship to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. The presentation method determining unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame when there is no magnitude relationship between the objects included in the continuous key frame pairs. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
A case where the information processing apparatus 200 reproduces a key frame obtained by photographing the target B having various sizes will be described with reference to FIG. The relevance determination unit 221 compares the areas of the target areas determined to be the same among the target areas detected from all the key frames included in the video content, and determines the magnitude relationship between successive key frames. And In addition, the presentation method determination unit 202 calculates the presentation time of the next key frame by multiplying the presentation time of a certain key frame by the relevance flag parameter a. At this time, the presentation time of the first key frame 701 is the initial value Ts, the

key frames

701 and 702 have a small and medium relationship, the

key frames

702 and 703 have a medium and large relationship, and the

key frames

703, 703, Assume that 704 has a relationship between large and small. At this time, since the relevance flag of the

key frames

701 and 702 is 1, the presentation time of the key frame 702 is aTs (multiplication of a). Further, since the relevance flag of the key frame 703 is 1, the presentation time of the key frame 703 is a × a × Ts (multiplication of a). Since the relevance flag of the

key frames

703 and 704 is −2, the presentation time of 704 is Ts (a × a division). If the parameter a is set between 0 and 1, a key frame (long shot) in which the target B is photographed small is presented long. In addition, a key frame (middle shot or tight shot) in which the target B is photographed larger is presented short.
As a result, the user can understand the contents of key frames with a large amount of information in which scenes other than the target B are reflected. The user can intuitively understand that the subsequent contents are a part of the previous key frame. Further, the information processing apparatus 200 according to the present embodiment can generate a video in which the presentation time of successive images changes even for images including the same target. Therefore, there is an effect that it is possible to generate video content with a sense of tempo that does not bore viewers.
(3-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines an effect, a BGM, and a jingle to be inserted between the key frame pairs based on the size relationship of the objects included in the continuous key frame pairs. For example, when the objects included in successive key frame pairs are in a size relationship, the presentation method selection unit 222 performs special effects (dissolve or Insert fade etc. When the objects included in the consecutive key frame pairs are not in a size relationship, the presentation method selection unit 222 performs special effects (page turning, wipe, etc.) registered in advance as effects having a large visual change when switching key frames. Etc. DVE). Further, for example, when the objects included in the continuous key frame pairs are in a magnitude relationship, the presentation method selection unit 222 plays the same BGM during the presentation of the key frame pairs. In addition, when the targets included in the consecutive key frame pairs are not the same, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frame is switched.
The presentation method selection unit 222 may insert jingles between images that do not have a magnitude relationship. Thereby, the key frame group which image | photographed the object of magnitude relationship is connected smoothly, without a change of an image or an audio | voice. Therefore, the viewer can easily understand that the key frames have almost the same contents without any change. In addition, when the key frame group is not in a size relationship, the image and sound change greatly. Therefore, the viewer can notice that the content has changed, and can concentrate on understanding the content of the video content.
(Relevance 4. Target partial relationship)
The relevancy determination unit 221 may determine the relevance based on the partial relationship of the target represented in the two key frames. That is, the relevancy determination unit 221 may determine relevance depending on whether the object represented in the two key frames included in the key frame pair is in a relationship between the whole and the part. The relationship 4 is the relationship determined in this way.
“Having a partial relationship between objects” indicates that the objects shown in consecutive key frame pairs in the target video content are the same and are images obtained by capturing different target parts. For example, when a wide landscape, a large object, or a long object is to be photographed, this is a case in which the whole is expressed by reproducing video content by combining key frames in which a part of the object is photographed.
When the target area in a certain key frame and the target area in the next key frame have a target partial relationship, the relevancy determination unit 221 sets 1 in the relevance flag for the relevance 4. When the target area in a certain key frame and the target area in the next key frame are not the target partial relationship, the relevance determining unit 221 sets 0 in the relevance flag for the relevance 4. The relevance determination unit 221 can determine a target partial relationship based on a partial area (common area) common to target areas determined to be the same in consecutive key frames in video content. For example, the relevance determination unit 221 uses one of the target areas as a template. Then, the relevancy determination unit 221 scans the other target area, detects a position with a small difference, and sets the overlapped area as a common area. The relevance determination unit 221 determines a target partial relationship when both regions other than the common region of each target region are larger than the specified area. Alternatively, the relevancy determination unit 221 may determine the target partial relationship based on the relative position of the target region determined to be the same from all the key frames included in the video content.
When the change of the object from the whole to the part continues, the presentation method determination unit 202 changes the presentation method as in the case where there is no change in relevance.
[Rules according to the target partial relationship]
(4-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of a key frame pair based on the target partial relationship included in successive key frame pairs. For example, the presentation method determination unit 202 sets the presentation time of the key frame presented first among the key frame groups in the target partial relationship to the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set Tp as the presentation time of a key frame with high visibility among the key frame groups in the target partial relationship. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may set the presentation time of the next key frame in which the presentation time of the key frame is equal to or less than Tq among the key frame groups having a partial relationship as the initial value Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the image presented last among the key frame groups having a partial relationship to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. The presentation method determination unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame when there is no partial relationship between the objects included in the consecutive key frame pairs. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
A case where the information processing apparatus 200 reproduces a key frame in which a landscape is photographed will be described with reference to FIG. The relevance determination unit 221 includes a partial region that is common among target regions determined to be the same among target regions detected from all key frames included in the video content with respect to a partial relationship between consecutive key frames. Assume that the determination is made based on the positional relationship with the target region. Also, it is assumed that the presentation method determination unit 202 calculates the presentation time of the next key frame by multiplying the presentation time of a certain key frame by a specified parameter.
The presentation method determination unit 202 sets the presentation time of the first key frame 801 to the initial value Ts.

Key frames

801 and 802 and 802 and 803 have a partial relationship, and

key frames

803 and 804 have no partial relationship. At this time, the presentation time of the first key frame 801 is the initial value Ts, and the relevance flag of the

key frames

801 and 802 is 1, so the presentation time of the key frame 802 is a × Ts. Further, since the relevance flag of the

key frames

802 and 803 is 1, the presentation time of the key frame 803 is a2 Ts. Since the association flag between the

key frames

803 and 804 is 0, the presentation method determination unit 202 sets the presentation time of the key frame 804 to the initial value Ts.
The parameter a is set to a smaller value between 0 and 1 and as the area of the partial region that matches between the key frames is larger. Then, the key frame 801 that is presented for the first time about the landscape is presented for a long time. In addition, the other part is presented at a presentation time corresponding to the amount of information overlapping with the previously presented image. Thereby, the user can understand the content about the key frame first presented about the scenery. Then, the user can understand that the subsequent contents are substantially equivalent to the first key frame. Further, the information processing apparatus 200 according to the present embodiment can generate a video in which the presentation time of successive images changes even for images including the same target. Therefore, the actual capital punishment pair has the effect of generating video content with a tempo that does not bore viewers.
(4-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines an effect, a BGM, and a jingle to be inserted between a pair of key frames based on a target partial relationship included in successive key frame pairs. For example, when the objects included in successive key frame pairs are in a partial relationship, the presentation method selection unit 222 performs special effects (dissolve or pre-registered) as effects with little visual change when switching key frames. Insert fade etc. When the target included in the consecutive key frame pairs is not in a partial relationship, the presentation method selection unit 222 performs special effects (page turning, wipe, etc.) registered in advance as effects having a large visual change when switching key frames. Etc. DVE). Also, for example, when consecutive key frame pairs are in the target partial relationship, the presentation method selection unit 222 plays the same BGM during presentation of the key frame pairs. When the targets of successive key frame pairs are not the same, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frame is switched. The presentation method selection unit 222 may insert jingles between images that do not have a magnitude relationship. Thereby, when a continuous key frame pair has a target partial relationship, the continuous key frame pair is smoothly connected without any change in image or sound. Therefore, the viewer can easily understand that the continuous key frames have almost the same contents without any change. In addition, when continuous key frame pairs are not in a partial relationship, the image and sound change greatly. Therefore, the viewer can notice that the content has changed, and can concentrate on understanding the content of the video content.
(Relevance 5. Target homogeneity)
The relevance determination unit 221 may determine relevance depending on whether or not the objects represented in the two key frames are of the same type. The relationship 5 is the relationship determined in this way.
“The target is the same type” means that main targets appearing in consecutive key frame pairs in the video content are the same type of target. When the target area in a certain key frame and the target area in the next key frame have the same kind of relationship, the relevancy determination unit 221 sets 1 in the relevance flag for the relevance 5. When the target area in a certain key frame and the target area in the next key frame are different, the relevancy determination unit 221 sets 0 in the relevance flag for the relevance 5. Discrimination of the homogeneity of the object can be realized by a method based on machine learning based on the image data (registered data) of the object belonging to each type for which homogeneity is desired. The relevancy determination unit 221 first extracts image feature amounts of objects belonging to various types from the registration data. The relevancy determination unit 221 may use a global feature such as a color histogram or an edge histogram as the image feature amount. The relevancy determination unit 221 may use a local feature amount such as HoG (Histograms of Oriented Gradient) or SIFT as the image feature amount. The relevancy determination unit 221 may perform learning using a global feature using SVM (Singular Value Decomposition), a neural network, a GMM (Gaussian Mixture Model), or the like. Alternatively, the relevancy determination unit 221 may perform learning after converting a feature amount space from a local feature amount, such as BoW (Bag of Words). When determining the homogeneity of the target area in each key frame included in the video content, the relevancy determination unit 221 determines between the image feature amount of each target area and the various models obtained as a result of learning. Each seeks similarity. Then, the relevance determination unit 221 determines that the target region is the closest model type that has obtained a similarity equal to or higher than a specified value. The relevancy determination unit 221 determines that the target areas determined to be the same type are the same type. The relevancy determination unit 221 may determine the homogeneity by a method other than the above.
When three images including the same type of target are consecutive, the presentation method determination unit 202 changes the presentation method as in the case where there is no change in relevance.
[Rules according to target homogeneity]
(5-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of a key frame pair based on the homogeneity of objects included in successive key frame pairs. For example, when key frames obtained by shooting the same type of object are consecutive, the presentation method determination unit 202 gradually shortens the presentation time. That is, the presentation method determination unit 202 sets the presentation time of the key frame presented first in the key frame group including the same type of target to the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of a highly visible key frame among key frame groups including the same type of target to Tp. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may set the presentation time of the next key frame in which the presentation time of the key frame is equal to or less than Tq in the key frame group including the same type of target as the initial value Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the image presented last among the key frame groups including the same type of target to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. The presentation method determination unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame when the objects included in the successive key frame pairs are not of the same type. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
A case where the information processing apparatus 200 reproduces a key frame obtained by reproducing a key frame obtained by photographing a flower will be described with reference to FIG. Assume that the relevancy determination unit 221 determines the homogeneity between successive key frames by a method based on machine learning. Also, the presentation method determination unit 202 calculates the presentation time of the next key frame by multiplying the presentation time of a certain key frame by the parameter for the relevance flag. The presentation method determination unit 202 sets the presentation time of the first key frame 901 as the initial value Ts. The key frames 901 and 902 and the

key frames

902 and 903 have the same type, and the

key frames

903 and 904 have a different type. At this time, since the relevance flag of the first

key frames

901 and 902 is 1, the presentation time of the key frame 902 is a × Ts. Further, since the relevance flag of the

key frames

902 and 903 is 1, the presentation time of the key frame 903 is a2 Ts. Since the relevance flag of the

key frames

903 and 904 is 0, the presentation method determination unit 202 returns the presentation time of the key frame 904 to the initial value Ts. When the parameter a is set between 0 and 1, the key frame 901 presented for the first time among the key frames including the plant is presented for a long time. Subsequent key frames are presented in a presentation time that is shorter as they move away from 901. Thereby, the user can understand that the image content is a plant from the key frame presented first. Then, the user can understand that the contents of the subsequent key frames are almost the same.
Further, the information processing apparatus 200 according to the present embodiment can generate a video in which the presentation time of successive images changes even for images including the same target. Therefore, the present embodiment has an effect that it is possible to generate video content with a sense of tempo that does not bore viewers. The information processing apparatus 200 can express that there are many subjects of this type by sequentially reproducing the same type of subjects in a plurality of flower images taken in the flower field.
(5-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines an effect, a BGM, and a jingle to be inserted between key frame pairs based on the homogeneity of objects included in consecutive key frame pairs. For example, when the objects included in consecutive key frame pairs are of the same type, the presentation method selection unit 222 performs special effects (such as dissolves and fades) registered in advance as effects with little visual change when switching key frames. ) Is inserted. When the targets included in consecutive key frame pairs are different, the presentation method selection unit 222 performs special effects (such as page turning and wipe) registered in advance as effects having a large visual change when switching key frames. DVE) is inserted. Also, for example, when consecutive key frame pairs are of the same type, the presentation method selection unit 222 plays the same BGM during presentation of the key frame pairs. When the consecutive key frame pairs are different, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frame is switched. The presentation method selection unit 222 may insert jingles between different types of key frames. Thereby, when the object contained in a continuous key frame pair is the same type, the continuous key frame pair is smoothly connected without any change in image or sound. Therefore, the viewer can easily understand that the key frames have almost the same contents without any change. In addition, when the objects included in the continuous key frame pairs are different, the image and sound change greatly. Therefore, the viewer can notice that the content has changed, and can concentrate on understanding the content of the video content.
(Relevance 6. Identity of shooting location)
The relevance determination unit 221 may determine the relevance based on the commonality of the shooting locations of the two key frames. The relationship 6 is the relationship determined in this way.
“The shooting location is the same” means that the location where the consecutive key frame pairs are shot in the video content is the same. When the shooting location of a certain key frame is the same as the shooting location of the next key frame, the relationship determination unit 221 sets 1 to the relationship flag for the relationship 6. Further, the relevance determination unit 221 sets a relevance flag for relevance 6 to 0 when the shooting location of a certain key frame differs from the shooting location of the next key frame. The relevancy determination unit 221 can determine the identity of the shooting location based on the similarity of an area (background area) other than the target area in the key frame. For example, the relevancy determination unit 221 may separate the target area and the background area from the key frame. Then, the relevancy determination unit 221 may determine the same shooting location when the image feature values extracted from the background region are similar. The relevancy determination unit 221 may determine the identity of the shooting location by a method other than the above. The relevance determination unit 221 may determine the identity of the shooting location by determining the similarity of the background between consecutive key frames in the video content. Alternatively, the relevancy determination unit 221 may determine the identity of the shooting location based on the identity of the background area in all key frames included in the video content. The relevancy determination unit 221 may determine the identity of the shooting location by combining the shooting location as meta information and the GPS as sensor information in addition to the image information.
When three images taken at the same shooting location are consecutive, the presentation method determination unit 202 changes the presentation method in the same manner as when there is no change in relevance.
[Rules according to the identity of the shooting location]
(6-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of the key frame pairs based on the identity of the shooting locations of successive key frame pairs. For example, when key frames shot at the same shooting location are consecutive, the presentation method determination unit 202 gradually shortens the presentation time. For example, the presentation method determination unit 202 sets the presentation time of the key frame presented first among the key frame groups photographed at the same place as the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of a key frame having high visibility among key frame groups photographed at the same place as Tp. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may set the presentation time of the next key frame in which the presentation time of the key frame is equal to or less than Tq among the key frame groups photographed at the same place as the initial value Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the image presented last among the key frame groups photographed at the same place to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. When consecutive key frame pairs are photographed at different locations, the presentation method determination unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range.
(6-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines an effect, a BGM, and a jingle to be inserted between the key frame pairs based on the identity of the shooting locations of successive key frame pairs. For example, when consecutive key frame pairs are photographed at the same place, the presentation method selection unit 222 performs special effects (such as dissolves and fades) registered in advance as effects with little visual change when switching key frames. ) Is inserted. When consecutive key frame pairs are photographed at different locations, the presentation method selection unit 222 performs special effects (such as page turning and wipe) registered in advance as effects having a large visual change when switching key frames. DVE) is inserted. Further, for example, when consecutive key frame pairs are photographed at the same place, the presentation method selection unit 222 plays the same BGM during presentation of the key frame pairs. When consecutive key frame pairs are photographed at different locations, the presentation method selection unit 222 stops the BGM or switches to a different BGM when the key frame is switched. In addition, the presentation method selection unit 222 may insert jingles between key frames taken at different locations. As a result, when successive key frame pairs are photographed at the same place, the successive key frame pairs are smoothly connected without any change in image or sound. Therefore, the viewer can easily understand that the key frames have almost the same contents without any change. In addition, when consecutive key frame pairs are shot at different locations, the image and sound change greatly. Therefore, the viewer can notice that there is a change in the content, and can concentrate on understanding the content of the video content.
(Relevance 7. Same time zone)
The relevancy determination unit 221 may determine the relevance based on the commonality of the shooting time zones of two key frames included in the key frame pair. The relationship 7 is the relationship determined in this way.
“Shooting time zones are the same” means that the time zones when shooting consecutive key frame pairs in the video content are the same. The relevancy determination unit 221 sets 1 to the relevance flag for relevance 7 when the shooting time zone of a certain key frame is the same as the shooting time zone of the next key frame. When the shooting time zone of a certain key frame and the shooting time zone of the next key frame are different, the relevance determination unit 221 sets 0 to the relevance flag for the relevance 7. The relevance determination unit 221 can determine the identity of the shooting time zone based on the color information of the background area in the key frame. For example, the relevancy determination unit 221 divides one day into a plurality of time zones. The relevancy determination unit 221 holds the statistics of the color histogram of sunlight in each time zone. Then, the relevancy determination unit 221 determines that the key frame was captured in the time zone when the background region of the key frame includes a partial region close to the statistical amount of any time zone. The relevancy determination unit 221 estimates the shooting time period of each key frame. Then, the relevancy determination unit 221 determines that the key frame shooting time zones are the same when the estimated key frame shooting time zones are the same. The relevancy determination unit 221 may determine the identity of the shooting time zone by a method other than the above. The relevancy determination unit 221 may determine the similarity of the shooting time zones between consecutive key frames in the video content for the identity of the shooting time zones. Alternatively, the relevancy determination unit 221 may determine the identity of the shooting time zone based on the identity of the shooting time zones in all key frames included in the video content. The relevance determination unit 221 may determine the identity of the shooting time period in combination with the shooting time that is meta information in addition to the image information.
When three images taken in the same shooting time period are consecutive, the presentation method determination unit 202 changes the presentation method in the same manner as when there is no change in relevance. For example, the presentation method determination unit 202 gradually shortens the presentation time at the same time interval.
[Rules according to the identity of the shooting period]
(7-1) Rules regarding presentation time
The presentation method determination unit 202 determines the presentation time of a key frame pair based on the identity of shooting time zones of consecutive key frame pairs. For example, in the case where key frames photographed within a certain range of photographing time are consecutive, the presentation method determination unit 202 gradually shortens the presentation time. That is, the presentation method determination unit 202 sets the presentation time of the key frame presented first among the key frame groups photographed in the same time zone to the initial value Ts. And the presentation method determination part 202 determines the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of a key frame having high visibility among key frame groups photographed in the same time zone as Tp. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Tp. In addition, the presentation method determination unit 202 may set the presentation time of the next key frame in which the presentation time of the key frame is equal to or less than Tq among the key frame groups photographed in the same time zone as the initial value Ts. And the presentation method determination part 202 may determine the presentation time of a subsequent key frame on the basis of Ts. In addition, the presentation method determination unit 202 may set the presentation time of the image presented last among the key frame groups photographed in the same time zone to Ts. The presentation method determination unit 202 may calculate the values of Ts and Tp according to the number of images to be presented from the preset presentation time of the entire video content. When consecutive key frame pairs are captured in different time zones, the presentation method determination unit 202 determines the presentation time of the subsequent key frame independently of the presentation time of the previous key frame. For example, the presentation method determination unit 202 may set the presentation time of subsequent key frames to the initial value Ts. The presentation method determination unit 202 may set the presentation time of subsequent key frames to a random value within a specified range. When the source video is an uncut video over a long period of time, the relevancy determination unit 221 determines that the key frames are related keys even if the shooting time zone of the video where the key frame is extracted is significantly different. It can be determined that it is a frame.
(7-2) Rules regarding effects, BGM, and jingles
The presentation method selection unit 222 determines the effect, BGM, and jingle to be inserted between the key frame pairs based on the identity of the shooting time zones of the consecutive key frame pairs. For example, when consecutive key frame pairs are photographed in the same time zone, the presentation method selection unit 222 performs special effects (dissolve or Insert fade etc. When consecutive key frame pairs are photographed at different time periods, the presentation method selection unit 222 performs special effects (page turning, wipe, etc.) registered in advance as effects having a large visual change when switching key frames. DVE) is inserted. Further, for example, when consecutive key frame pairs are photographed in the same time zone, the presentation method selection unit 222 plays the same BGM during presentation of the key frame pairs. When consecutive key frame pairs are photographed in different time zones, the presentation method selection unit 222 stops the BGM or switches to a different BGM when switching the key frame. Further, the presentation method selection unit 222 may insert jingles between key frames in different time zones. As a result, when consecutive key frame pairs are photographed in the same time zone, the consecutive key frame pairs are smoothly connected without any change in image or sound. Therefore, the viewer can easily understand that those key frames have almost the same contents without any change. Further, when consecutive key frame pairs are photographed in different time zones, the image and sound change greatly. Therefore, the viewer can notice that the content has changed, and can concentrate on understanding the content of the video content. The presentation method selection unit 222 may apply any one of the above rules to the presentation rules. The presentation method selection unit 222 may use a plurality of rules in combination with the presentation rules. The video content generation unit 203 generates video content based on the presentation method information input from the presentation method determination unit 202 and the image information input from the video input unit 204.
[Other Embodiments]
As mentioned above, although embodiment of this invention was explained in full detail, the system or apparatus which combined the separate characteristic contained in each embodiment how was included in the category of this invention.
In addition, the present invention may be applied to a system composed of a plurality of devices, or may be applied to a single device. Furthermore, the present invention can also be applied to a case where an information processing program that implements the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, in order to realize the functions of the present invention on a computer, a program installed in the computer, a medium storing the program, and a WWW (World Wide Web) server that downloads the program are also included in the scope of the present invention. .
The information processing apparatus 100 and the information processing apparatus 200 can be realized by a computer and a program for controlling the computer, dedicated hardware, or a combination of the computer and the program for controlling the computer and dedicated hardware, respectively.
Key frame extraction unit 100, presentation method determination unit 102, video content generation unit 103, key frame extraction unit 201, presentation method determination unit 202, video content generation unit 203, video input unit 204, relevance determination unit 221, presentation method selection The unit 222 can be realized by, for example, a dedicated program for realizing the function of each unit read into a memory from a recording medium that stores the program, and a processor that executes the program. Alternatively, the key frame extraction unit 100, the presentation method determination unit 102, the video content generation unit 103, the key frame extraction unit 201, the presentation method determination unit 202, the video content generation unit 203, the video input unit 204, the relevance determination unit 221, and the presentation Part or all of the method selection unit 222 may be realized by a dedicated circuit that implements the function of each unit.
The present invention has been described above with reference to the embodiments, but the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
While the present invention has been described with reference to the embodiments (and examples), the present invention is not limited to the above embodiments (and examples). Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims the priority on the basis of Japanese application Japanese Patent Application No. 2011-107104 for which it applied on May 12, 2011, and takes in those the indications of all here.

Claims

Extraction means for extracting at least two partial videos or still images from the source video;
Determining means for determining a method of presenting the at least two partial moving images or still images extracted by the extracting means based on characteristics of the source moving images;
Generating means for generating video content including the at least two partial moving images or still images based on the presentation method determined by the determining means;
An information processing apparatus comprising:

2. The information processing apparatus according to claim 1, wherein the generation unit generates the video content that continuously presents the at least two still images.

3. The information processing apparatus according to claim 1, wherein the determining unit determines a presentation time of the at least two partial moving images or still images in the video content.

The said determination means determines the other presentation time based on one presentation time, when the said at least 2 partial moving image or a still image has a relation with each other. Information processing device.

When the at least two partial moving images or the objects included in the still image are the same, the determining means includes the partial moving image or the still image inserted after the presentation time of the previously inserted partial moving image or the still image. The information processing apparatus according to claim 4, wherein an image presentation time is shortened.

The said determination means determines presentation time independently, when the said at least 2 partial moving image or a still image does not have a mutual relationship, The presentation time of Claim 3 thru | or 5 characterized by the above-mentioned. Information processing device.

7. The information processing apparatus according to claim 1, wherein the determining unit determines an effect or a jingle when switching the at least two partial moving images or still images in the video content.

The said determination means determines the effect or jingle different from the case where it does not have a relationship, when the said at least 2 partial moving image or a still image has a relationship with each other. Information processing device.

9. The information processing apparatus according to claim 1, wherein the determining unit determines background music of the at least two partial moving images or still images in the video content.

The determining means includes
Determination means for determining whether or not the object included in the at least two partial moving images or still images is related based on the source moving image;
A selection means for selecting a presentation method different from the case where the object is not related when the object is related;
The information processing apparatus according to any one of claims 1 to 9, further comprising:

The determination means includes
The information processing apparatus according to claim 10, wherein whether or not the objects included in the at least two partial moving images or still images are the same is determined based on the source moving image.

The determination means includes
The information processing apparatus according to claim 10 or 11, wherein it is determined based on the source moving image whether or not the shooting method of the object in the at least two partial moving images or still images is the same.

The determination means includes
13. The method according to claim 10, wherein whether or not the objects included in the at least two partial moving images or the still image have commonality is determined based on an acoustic feature of the source moving image. The information processing apparatus according to item 1.

Extract at least two partial videos or still images from the source video,
A method of presenting the extracted at least two partial moving images or still images is determined based on characteristics of the source moving image;
An information processing method comprising: generating video content including the at least two partial moving images or still images based on the determined presentation method.

Computer
Extraction means for extracting at least two partial videos or still images from the source video;
Determining means for determining a method of presenting the at least two partial moving images or still images extracted by the extracting means based on characteristics of the source moving images;
Generating means for generating video content including the at least two partial moving images or still images based on the presentation method determined by the determining means;
An information processing program characterized by being operated in the same manner.