
WO2024261856A1 - Information processing device, information processing method, and recording medium - Google Patents


Info

Publication number
WO2024261856A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
difference
information processing
unit
processing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/022755
Other languages
French (fr)
Japanese (ja)
Inventor
和也 柿崎
拓磨 天田
雅弘 佛崎
俊則 荒木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to PCT/JP2023/022755
Publication of WO2024261856A1
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection

Definitions

  • This disclosure relates to the technical fields of information processing devices, information processing methods, and recording media.
  • Patent Document 1 describes a technology for detecting whether a video is a fake video (e.g., a false video generated by a synthesis process), in which a class indicates whether a person's face (object) is real (true) or fake (false), and a synthesized area (i.e., an area added to an original video) is detected as an element of the detection target class.
  • In view of this, the present disclosure aims to provide an information processing device, an information processing method, and a recording medium capable of accurately detecting whether an input image is a composite image.
  • One aspect of the information processing device includes a receiving means for receiving an input of a first image and a second image, a synthesis means for synthesizing a third image based on the first image and the second image, a difference emphasis means for emphasizing the difference between the second image and the third image, a calculation means for calculating an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and a determination means for determining whether the second image is a synthesized image or not according to the index.
  • One aspect of the information processing method is to receive an input of a first image and a second image, synthesize a third image based on the first image and the second image, emphasize the difference between the second image and the third image, calculate an index representing the likelihood of the second image being a synthesized image based on the emphasized difference, and determine whether the second image is a synthesized image or not based on the index.
  • a computer program is recorded to cause a computer to execute an information processing method that accepts input of a first image and a second image, synthesizes a third image based on the first image and the second image, emphasizes the difference between the second image and the third image, calculates an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and determines whether the second image is a synthesized image according to the index.
  • the information processing device, information processing method, and recording medium disclosed herein can accurately detect whether an input image is a composite image.
  • FIG. 1 is a block diagram showing a configuration of a first information processing device according to the present disclosure.
  • FIG. 2 is a block diagram showing a configuration of a second information processing device according to the present disclosure.
  • FIG. 3 is a flowchart showing a processing operation of the second information processing device according to the present disclosure.
  • FIG. 4 is a block diagram showing a configuration of a third information processing device according to the present disclosure. FIG. 5 is a flowchart showing a processing operation of the third information processing device according to the present disclosure.
  • FIG. 6 is a block diagram showing a configuration of a fourth information processing device according to the present disclosure. FIG. 7 is a flowchart showing a processing operation of the fourth information processing device according to the present disclosure.
  • a first embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a first embodiment of an information processing device, an information processing method, and a recording medium will be described using a first information processing device 1 according to the present disclosure.
  • FIG. 1 is a block diagram showing a configuration of a first information processing device 1 according to the present disclosure.
  • the information processing device 1 includes a receiving unit 11, a synthesis unit 12, a difference emphasizing unit 13, a calculation unit 14, and a determination unit 15.
  • the receiving unit 11 receives input of a first image and a second image.
  • the synthesis unit 12 synthesizes a third image based on the first image and the second image.
  • the difference emphasizing unit 13 emphasizes the difference between the second image and the third image.
  • the calculation unit 14 calculates an index representing the likelihood of the second image being a synthesized image based on the emphasized difference.
  • the determination unit 15 determines whether the second image is a synthesized image according to the index.
  • [1-2: Technical Effects of Information Processing Device 1]
  • the first information processing device 1 makes a judgment based on the difference between an input image and a composite image, and can therefore accurately detect whether or not the input image is a composite image.
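As an illustrative sketch only, the five units of the first embodiment can be strung together in a few lines. This is not the patented implementation: `synthesize` is a trivial blend standing in for a real synthesis model, `calculate_index` is a toy statistic standing in for a trained calculation model, and the gain and threshold values are arbitrary assumptions.

```python
import numpy as np

def synthesize(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Stand-in for the synthesis unit 12: a plain average of the two
    images. A real implementation would use a face-swap model."""
    return (first.astype(np.float32) + second.astype(np.float32)) / 2.0

def emphasize_difference(second: np.ndarray, third: np.ndarray,
                         gain: float = 4.0) -> np.ndarray:
    """Difference emphasis unit 13: per-pixel difference scaled by a
    real-number gain (the gain value is arbitrary)."""
    diff = np.abs(second.astype(np.float32) - third.astype(np.float32))
    return np.clip(diff * gain, 0.0, 255.0)

def calculate_index(emphasized: np.ndarray) -> float:
    """Calculation unit 14: a toy index in [0, 1]; the patent uses a
    trained calculation model here."""
    return float(emphasized.mean() / 255.0)

def determine(index: float, threshold: float = 0.5) -> bool:
    """Determination unit 15: True means 'synthesized image'."""
    return index > threshold

# An input identical to the synthesis result yields a zero difference,
# a low index, and a "not synthesized" determination.
first = np.full((8, 8), 100, dtype=np.uint8)
second = first.copy()
third = synthesize(first, second)
assert determine(calculate_index(emphasize_difference(second, third))) is False
```

The point of the sketch is the data flow: the determination is driven entirely by the emphasized difference between the second image and the synthesized third image.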
  • a second embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a second embodiment of an information processing device, an information processing method, and a recording medium will be described using a second information processing device 2 according to the present disclosure.
  • FIG. 2 is a block diagram showing the configuration of the second information processing device 2.
  • the information processing device 2 includes a calculation device 21 and a storage device 22.
  • the information processing device 2 may include a communication device 23, an input device 24, and an output device 25.
  • the information processing device 2 does not have to include at least one of the communication device 23, the input device 24, and the output device 25.
  • the calculation device 21, the storage device 22, the communication device 23, the input device 24, and the output device 25 may be connected via a data bus 26.
  • the arithmetic device 21 includes, for example, at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array).
  • the arithmetic device 21 reads a computer program.
  • the arithmetic device 21 may read a computer program stored in the storage device 22.
  • the arithmetic device 21 may read a computer program stored in a computer-readable, non-transitory recording medium, using a recording medium reading device (e.g., the input device 24 described later) that is provided in the information processing device 2 and not shown in the figure.
  • the arithmetic device 21 may acquire (i.e., download or read) a computer program from a device (not shown) located outside the information processing device 2 via the communication device 23 (or other communication device).
  • the arithmetic device 21 executes the read computer program.
  • a logical functional block for executing the operation to be performed by the information processing device 2 is realized within the calculation device 21.
  • the calculation device 21 can function as a controller for realizing a logical functional block for executing the operation (in other words, processing) to be performed by the information processing device 2.
  • the arithmetic device 21 realizes a reception unit 211, which is a specific example of the "reception means" described in the appendix described later, a synthesis unit 212, which is a specific example of the "synthesizing means" described in the appendix described later, a difference emphasis unit 213, which is a specific example of the "difference emphasis means" described in the appendix described later, a calculation unit 214, which is a specific example of the "calculation means" described in the appendix described later, a judgment unit 215, which is a specific example of the "judgment means" described in the appendix described later, and an output unit 216.
  • the difference emphasis unit 213 may have an extraction unit 2131 and an emphasis unit 2132. Details of the operations of the reception unit 211, synthesis unit 212, difference emphasis unit 213, calculation unit 214, judgment unit 215, and output unit 216 will be described later with reference to FIG. 3.
  • the storage device 22 can store desired data.
  • the storage device 22 may temporarily store a computer program executed by the arithmetic device 21.
  • the storage device 22 may temporarily store data that is temporarily used by the arithmetic device 21 when the arithmetic device 21 is executing a computer program.
  • the storage device 22 may store data that the information processing device 2 stores for a long period of time.
  • the storage device 22 may include at least one of a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device.
  • the storage device 22 may include a non-transitory recording medium.
  • the communication device 23 is capable of communicating with devices external to the information processing device 2 via a communication network (not shown).
  • the communication device 23 may be a communication interface based on standards such as Ethernet (registered trademark), Wi-Fi (registered trademark), Bluetooth (registered trademark), and USB (Universal Serial Bus).
  • the input device 24 is a device that accepts information input to the information processing device 2 from outside the information processing device 2.
  • the input device 24 may include an operating device (e.g., at least one of a keyboard, a mouse, and a touch panel) that can be operated by an operator of the information processing device 2.
  • the input device 24 may include a reading device that can read information recorded as data on a recording medium that can be attached externally to the information processing device 2.
  • the output device 25 is a device that outputs information to the outside of the information processing device 2.
  • the output device 25 may output information as an image. That is, the output device 25 may include a display device (a so-called display) capable of displaying an image showing the information to be output.
  • the output device 25 may output information as sound. That is, the output device 25 may include an audio device (a so-called speaker) capable of outputting sound.
  • the output device 25 may output information on paper. That is, the output device 25 may include a printing device (a so-called printer) capable of printing desired information on paper.
  • [2-3: Information Processing Operation Performed by Information Processing Device 2]
  • FIG. 3 is a flowchart showing the flow of the information processing operation performed by the information processing device 2. Note that this disclosure assumes that the first image is not a fake image but a real image.
  • the reception unit 211 receives input of a first image (step S20).
  • the reception unit 211 may receive input of a face image including a person's face area as the first image.
  • the first image may be a still image.
  • the "first image” may be referred to as the "source image.”
  • the reception unit 211 receives input of a second image (step S21).
  • the reception unit 211 may receive input of a face image including a person's facial area as the second image.
  • the second image may be a still image.
  • the second image may be a moving image.
  • the "second image” may be referred to as the "image to be determined.”
  • the synthesis unit 212 synthesizes a third image based on the first image and the second image.
  • the "third image” may be referred to as a "synthetic image”.
  • the synthesis unit 212 may generate a synthetic image using, for example, a technique called face swap. Face swap is a technique for exchanging a face area of a source image with a face area of a target image.
  • the synthesis unit 212 may generate a synthetic image by fitting a face area of the judgment target image into a face area of the source image.
  • the synthesis unit 212 may generate a synthetic image having the characteristics of the source image. For example, the synthesis unit 212 may generate a synthetic image that maintains the facial expression of the source image.
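A heavily simplified sketch of the face-swap idea described above: copy the face region of one image into the face region of the other. Real face swap aligns, warps, and blends using landmarks; the bounding-box representation and the equal-size assumption here are purely illustrative.

```python
import numpy as np

def face_swap(source: np.ndarray, target: np.ndarray,
              src_box: tuple, tgt_box: tuple) -> np.ndarray:
    """Copy the target's face region into the source image. Boxes are
    (top, left, height, width); this sketch assumes equal-size regions,
    whereas real face swap aligns, warps, and blends using landmarks."""
    sy, sx, h, w = src_box
    ty, tx, th, tw = tgt_box
    assert (h, w) == (th, tw), "sketch assumes equal-size face regions"
    out = source.copy()
    out[sy:sy + h, sx:sx + w] = target[ty:ty + h, tx:tx + w]
    return out
```

Because only the face region is replaced, the rest of the synthetic image keeps the characteristics of the source image, as the text notes.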
  • the extraction unit 2131 extracts the difference between the image to be determined and the composite image (step S23). In other words, the extraction unit 2131 extracts the portion where the image to be determined and the composite image differ.
  • the extraction unit 2131 may obtain the difference in pixel value for each pixel, and generate a difference image in which each pixel is represented by the difference in pixel value.
  • the highlighting unit 2132 highlights the difference (step S24).
  • the highlighting unit 2132 may, for example, highlight the difference by multiplying each pixel value of the difference image by a real number. That is, the highlighting unit 2132 may increase the pixel value of a pixel in the difference image whose pixel value is not 0 (that is, a pixel for which there is a difference in pixel value between the image to be determined and the composite image).
  • the difference image in which the difference is highlighted by the highlighting unit 2132 may be referred to as a "difference highlighted image".
  • the highlighting unit 2132 may, for example, generate a difference highlighted image by setting the pixel value of a pixel in the difference image whose pixel value is not 0 to the maximum value that it can take.
  • the highlighting unit 2132 may, for example, generate a difference highlighted image by setting the pixel value of a pixel in the difference image whose pixel value is equal to or greater than a predetermined value to the maximum value that it can take.
  • the highlighting unit 2132 may, for example, generate a difference highlighted image that can take four types of pixel values, for example, large difference, medium difference, small difference, and no difference, by setting a range of pixel values in the difference image.
  • the highlighting unit 2132 may employ any method to highlight the difference.
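The extraction and highlighting options above (per-pixel difference, multiplication by a real number, setting pixels at or above a predetermined value to the maximum value, and quantization into four levels) can each be sketched in a few lines of NumPy. The gain, threshold, and bin edges are illustrative assumptions.

```python
import numpy as np

def difference_image(judged: np.ndarray, composite: np.ndarray) -> np.ndarray:
    # Step S23: per-pixel difference of pixel values.
    return np.abs(judged.astype(np.int16) - composite.astype(np.int16))

def emphasize_by_gain(diff: np.ndarray, gain: int = 8) -> np.ndarray:
    # Multiply each pixel value of the difference image by a real number.
    return np.clip(diff * gain, 0, 255).astype(np.uint8)

def emphasize_binary(diff: np.ndarray, threshold: int = 1) -> np.ndarray:
    # Set pixels at or above a predetermined value to the maximum value (255).
    return np.where(diff >= threshold, 255, 0).astype(np.uint8)

def emphasize_quantized(diff: np.ndarray) -> np.ndarray:
    # Four levels -- no / small / medium / large difference -- mapped to
    # 0, 85, 170, 255. The bin edges are illustrative.
    levels = np.digitize(diff, [10, 50, 120])
    return (levels * 85).astype(np.uint8)
```

Any of these outputs serves as the "difference highlighted image" fed to the calculation unit.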
  • the calculation unit 214 calculates an index representing the likelihood that the image to be determined is a fake image based on the emphasized difference (step S25).
  • the index calculated by the calculation unit 214 may be a real number.
  • the calculation unit 214 may use a calculation model to calculate an index representing the likelihood that the image to be determined is a fake image.
  • the calculation model is a model that outputs an index representing the likelihood that the image is a fake image when the emphasized difference is input.
  • the calculation model may be a machine-learned model.
  • the learning mechanism that trains the calculation model may train it using difference-emphasized images, each accompanied by information indicating the correct answer (i.e., whether the image to be determined is a real image or a fake image), as teacher data.
  • the learning mechanism may train the calculation model on a method for calculating the index using the information indicating the correct answer and the index representing the likelihood that the image is a fake image output by the calculation model.
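The training step can be illustrated with a deliberately simple stand-in for the calculation model: a one-feature logistic regression trained by gradient descent on synthetic teacher data, where the feature is the mean pixel value of a difference-emphasized image and the label is the correct answer (real or fake). The feature choice, data, and learning rate are all assumptions; the patent does not specify the model architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher data: the feature is the mean pixel value of a
# difference-emphasized image (scaled to [0, 1]); the label is the
# correct answer (0 = real image, 1 = fake image).
real_feats = rng.uniform(0, 40, size=100)
fake_feats = rng.uniform(80, 255, size=100)
x = np.concatenate([real_feats, fake_feats]) / 255.0
y = np.concatenate([np.zeros(100), np.ones(100)])

# One-feature logistic regression trained by plain gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x * w + b)))  # model output in (0, 1)
    w -= 1.0 * np.mean((p - y) * x)          # gradient of log loss w.r.t. w
    b -= 1.0 * np.mean(p - y)                # gradient of log loss w.r.t. b

def index(mean_diff: float) -> float:
    """Index representing the likelihood that the image is a fake image."""
    return float(1.0 / (1.0 + np.exp(-(mean_diff / 255.0 * w + b))))

assert index(5.0) < 0.5 < index(200.0)
```

After training, small emphasized differences map to a low index and large ones to a high index, which is the behavior the determination step relies on.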
  • the determination unit 215 determines whether the second image is a composite image or not based on the index (step S26).
  • the determination unit 215 may determine whether the second image is a composite image or not by comparing the index with a predetermined threshold value.
  • If the index exceeds the predetermined threshold (step S26: Yes), the determination unit 215 determines that the second image is a composite image (step S27). If the index does not exceed the predetermined threshold (step S26: No), the determination unit 215 determines that the second image is not a composite image (step S28).
  • the output unit 216 outputs according to the determination result (step S29).
  • the output unit 216 may control the output device 25 to cause the output device 25 to output according to the determination result.
  • the second information processing device 2 of the present disclosure emphasizes the difference between the input judgment target image and the composite image and judges based on the emphasized difference, so that it is possible to accurately detect whether the input judgment target image is a fake image or not.
  • Since the information processing device 2 judges whether an image is a fake image by comparing the index with a threshold value, it is possible to adjust, by setting the threshold value, how readily an image that resembles a fake image is judged to be a fake image.
  • a third embodiment of an information processing device, an information processing method, and a recording medium will be described below.
  • a third embodiment of an information processing device, an information processing method, and a recording medium will be described using a third information processing device 3 according to the present disclosure.
  • the second image may be referred to as a "video to be determined”.
  • a moving image showing an event that has not actually occurred may be referred to as a fake video.
  • a moving image showing an event that has actually occurred may be referred to as a real video.
  • [3-1: Fake video]
  • a genuine video may include a video showing an action performed by person B in front of the camera as captured by the camera.
  • a fake video may include a moving image synthesized to make it appear as if person A, a different person from person B, performed the action performed by person B in front of the camera as captured by the camera.
  • A technique called reenactment is known that creates fake videos in which the facial expression of a person in an original image changes to a desired expression or the person faces a desired direction.
  • a technology is known that uses at least one facial image of person A and changes the facial expression of person A in the facial image to match the facial expression of person B, creating a moving image that makes it appear as if person A is changing his or her facial expression (hereinafter sometimes referred to as "animating still images").
  • the original video may be, for example, a video image showing the actions of person B in front of the camera as captured by the camera.
  • the original image may also be a still image showing person A, who is different from person B.
  • To animate a still image, landmarks are first detected in the original image. Landmarks are also detected from each of the video frames that make up the original video. Next, for each video frame that makes up the original video, the original image is edited so that the landmarks in the original image match the landmarks in the corresponding video frame, generating a composite frame. The generated composite frames are then joined together to generate an animated still image.
  • a landmark may be a characteristic position of a subject that appears in the image.
  • the facial direction, expression, etc. of person A in the original image can be changed using the facial landmarks of person B in the original video, and a video can be synthesized in which the facial direction, expression, etc. of person A changes.
  • the landmarks that change the facial direction, expression, etc. of a person may be characteristic parts of the face.
  • the characteristic positions on the face may be specific points on parts of the face such as the eyes, nose, mouth, etc.
  • If the acquired moving image is similar to a composite moving image synthesized using still images, the acquired moving image is likely to be a fake moving image.
  • In the third embodiment, this property is used to determine whether or not the moving image is a fake moving image. That is, a composite moving image is generated using still images, and the acquired moving image is compared with the composite moving image to determine whether or not the acquired moving image is a fake moving image.
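A minimal sketch of the still-image animation procedure, under strong simplifying assumptions: the "landmark" here is a single bright-pixel centroid, and the "edit" is a rigid integer translation. Real reenactment detects many facial landmarks (eyes, nose, mouth points) and warps the face non-rigidly; the function names are illustrative.

```python
import numpy as np

def detect_landmarks(image: np.ndarray) -> np.ndarray:
    """Hypothetical landmark detector: the centroid of bright pixels,
    standing in for facial landmark points (eyes, nose, mouth)."""
    ys, xs = np.nonzero(image > 128)
    return np.array([ys.mean(), xs.mean()])

def edit_to_match(original: np.ndarray, src_lm: np.ndarray,
                  frame_lm: np.ndarray) -> np.ndarray:
    """Edit the original image so its landmark matches the frame's
    landmark. Here a rigid integer translation; real reenactment warps
    the face non-rigidly."""
    dy, dx = np.round(frame_lm - src_lm).astype(int)
    return np.roll(np.roll(original, dy, axis=0), dx, axis=1)

def animate(original: np.ndarray, driving_frames: list) -> list:
    """One composite frame per driving video frame; joining the returned
    frames gives the animated still image."""
    src_lm = detect_landmarks(original)
    return [edit_to_match(original, src_lm, detect_landmarks(f))
            for f in driving_frames]
```

The structure matters more than the toy operations: detect landmarks once in the original image, detect them per driving frame, edit the original to match each frame, and join the results.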
  • FIG. 4 is a block diagram showing the configuration of the third information processing device 3.
  • the third information processing device 3 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2. Furthermore, the third information processing device 3 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2. However, the information processing device 3 may not include at least one of the communication device 23, the input device 24, and the output device 25.
  • the third information processing device 3 differs from the second information processing device 2 in that the synthesis unit 312 includes a detection unit 3121.
  • Other features of the information processing device 3 may be the same as other features of the information processing device 2. For this reason, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be appropriately omitted.
  • FIG. 5 is a flowchart showing the flow of information processing operations performed by the information processing device 3.
  • the reception unit 311 receives input of a source image as a first image (step S20).
  • the detection unit 3121 detects landmarks from the source image.
  • the detection unit 3121 may detect characteristic positions in the face area as landmarks from a still image.
  • the detection unit 3121 may detect specific points of parts of the body such as the eyes, nose, and mouth as landmarks from the source image.
  • the reception unit 311 receives input of the video to be judged as the second image (step S30).
  • the detection unit 3121 detects landmarks from each of the one or more frames included in the video to be judged (step S31).
  • the one or more frames included in the video to be judged may be all frames included in the video to be judged.
  • the one or more frames included in the video to be judged may be any one or more frames included in the moving image.
  • the detection unit 3121 may detect, as landmarks, positions from the image to be judged that are equivalent to the landmarks detected from the source image.
  • the synthesis unit 312 synthesizes a third image based on the source image, the landmarks of the source image, and the landmarks of one or more frames included in the video to be determined (step S32).
  • the third image is a synthetic video including one or more frames.
  • the "third image” may be referred to as a "synthetic video.”
  • the synthesis unit 312 may first generate a synthesis frame for each input frame constituting the judgment target moving image by editing the landmarks of the source image to match the landmarks of the corresponding input frame. Next, the synthesis unit 312 may connect each synthesis frame together to animate the source images, which are still images, and generate a synthesis moving image.
  • the extraction unit 3131 extracts the difference between the determination target moving image and the composite moving image (step S33).
  • the extraction unit 3131 extracts the difference between a frame included in the determination target moving image and a frame included in the composite moving image corresponding to the frame.
  • the extraction unit 3131 may extract the difference between the determination target moving image and the composite moving image to generate a difference moving image.
  • if the determination target moving image d_i includes frames 1 to F, it may be expressed as d_i = [x_i^1, ..., x_i^F].
  • if the composite moving image d_f includes frames 1 to F, it may be expressed as d_f = [x_f^1, ..., x_f^F].
  • the extraction unit 3131 may generate a difference video including difference frames corresponding to the any one or more frames.
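Under this notation, frame t of the difference video is the per-pixel difference between the corresponding frames of the judgment target video and the composite video. A minimal sketch:

```python
import numpy as np

def difference_video(judged_frames: list, composite_frames: list) -> list:
    """Frame t of the difference video is the per-pixel difference
    between frame t of the judgment target video d_i and frame t of
    the composite video d_f."""
    return [np.abs(xi.astype(np.int16) - xf.astype(np.int16))
            for xi, xf in zip(judged_frames, composite_frames)]
```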
  • the emphasis unit 3132 emphasizes the difference (step S34).
  • the emphasis unit 3132 may generate a difference emphasized moving image including a difference emphasized frame in which a difference between a frame included in the determination target moving image and a frame included in the composite moving image corresponding to the frame is emphasized.
  • the difference emphasized moving image d_diff generated by the emphasis unit 3132 may be expressed as d_diff = [x_diff^1, ..., x_diff^F], where x_diff^t is the difference-emphasized frame obtained from frame t.
  • the calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on one or more frames included in the difference-emphasized video (step S35).
  • the calculation unit 314 may use a calculation model to calculate an index representing the likelihood that the image to be judged is a fake image.
  • when one or more frames included in the difference-emphasized video are input, the calculation model outputs an index representing the likelihood that the video to be judged is a fake video.
  • the one or more frames included in the difference-emphasized video may be all frames included in the difference-emphasized video.
  • the one or more frames included in the difference-emphasized video may be any one or more frames included in the moving image.
  • the determination unit 315 determines whether the video to be determined is a fake video or not based on the index (step S36).
  • the determination unit 315 may determine whether the video to be determined is a fake video or not by comparing the index with a predetermined threshold value.
  • If the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38).
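One way to realize steps S35 and S36 is to score each difference-emphasized frame, aggregate the per-frame indices, and compare the result with the threshold. Averaging is an assumption here; the patent leaves the aggregation to the calculation model.

```python
import numpy as np

def fake_video_index(frame_indices: list) -> float:
    """Aggregate per-frame indices into one video-level index. Averaging
    is an assumption; the patent leaves aggregation to the calculation
    model."""
    return float(np.mean(frame_indices))

def is_fake_video(frame_indices: list, threshold: float = 0.5) -> bool:
    """Steps S36-S38: compare the index with a predetermined threshold."""
    return fake_video_index(frame_indices) > threshold

assert is_fake_video([0.9, 0.8, 0.7]) is True
assert is_fake_video([0.1, 0.2, 0.1]) is False
```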
  • the output unit 316 outputs according to the determination result (step S39).
  • a synthetic video generated using still images and landmarks can capture the characteristics of a fake video generated using a technology such as deep fake.
  • the third information processing device 3 of the present disclosure uses the property that if the characteristics of a synthetic video generated using a source image, which is an input still image, and the input video to be judged are similar, the video to be judged is likely to be a fake video.
  • the third information processing device 3 of the present disclosure can accurately judge whether the video to be judged is a genuine video that is not forged or a fake video that is forged.
  • the information processing device 3 can accurately detect whether the input video to be judged is a fake video or not based on the difference between each frame.
  • the information processing device 3 can accurately detect deep fakes generated using landmarks.
  • a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described using a fourth information processing device 4 according to the present disclosure.
  • FIG. 6 is a block diagram showing the configuration of the fourth information processing device 4.
  • the fourth information processing device 4 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2 and the third information processing device 3. Furthermore, the fourth information processing device 4 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2 and the third information processing device 3. However, the information processing device 4 may not include at least one of the communication device 23, the input device 24, and the output device 25.
  • the fourth information processing device 4 differs from the second information processing device 2 and the third information processing device 3 in that a matching unit 417, a spoofing determination unit 418, and an authentication unit 419 are further realized in the calculation device 21.
  • Other features of the information processing device 4 may be the same as other features of at least one of the information processing device 2 and the information processing device 3. Therefore, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be omitted as appropriate.
  • the fourth information processing device 4 is a mechanism capable of performing biometric authentication of a person.
  • the information processing device 4 may be a mechanism capable of performing a matching operation using an image, and determining whether or not a person is impersonating another person using the image, thereby authenticating the person.
  • the fourth information processing device 4 of the present disclosure may be applied to online identity verification such as electronic know your customer (eKYC).
  • Accurate determination of whether or not a video is fake is an important issue in increasing the reliability of services such as eKYC.
  • In eKYC, the input includes a face image from an official document, which can serve as source material for synthesizing a fake video.
  • FIG. 7 is a flowchart showing the flow of information processing operations performed by the information processing device 4. Note that in the fourth embodiment, a case will also be described in which the second image is a moving image including a plurality of frames, and the second image will be referred to as a determination target moving image.
  • the reception unit 311 receives an input of a source image as a first image (step S20).
  • the reception unit 311 may receive an input of a facial photograph on an identification document such as a driver's license or a My Number card as the source image.
  • the reception unit 311 receives an input of a video to be judged as a second image (step S30).
  • the matching unit 417 matches the facial image of the person (step S40). If the first image is a facial image on an official document such as a driver's license or a My Number card, the matching unit 417 may match the person appearing in the first image with the person appearing in the video to be judged. In this case, if the matching between the person appearing in the first image and the person appearing in the video to be judged fails, the information processing operation may be terminated. Alternatively, the matching unit 417 may match the received first image with a registered facial image that has been registered in advance. Alternatively, the matching unit 417 may match the received video to be judged with a registered facial image that has been registered in advance. In other words, the matching unit 417 may match at least one of the person appearing in the first image and the person appearing in the video to be judged.
  • since the source image as the first image and a fake video synthesized based on it are similar, even if the video to be judged is a fake video, there is a high possibility that the first image will be successfully matched with the video to be judged.
  • the spoofing determination unit 418 performs spoofing determination using the video to be judged (step S41).
  • the video to be judged may be used both for spoofing determination and for determining whether or not it is a fake video.
  • the video to be judged may show an action performed by the person in response to an instruction from the information processing device 4.
  • the information processing device 4 may instruct the person on face direction, gaze direction, and face position.
  • the information processing device 4 may guide the person's gaze.
  • the information processing device 4 may instruct the person to perform a gesture.
  • the spoofing determination unit 418 may perform active liveness determination using the video to be judged.
  • the detection unit 3121 detects landmarks from each of one or more frames included in the video to be judged (step S31).
  • the synthesis unit 312 generates a composite video based on the source image and the landmarks from each of one or more frames included in the video to be judged (step S32).
  • the synthesis unit 312 generates a composite image based on a facial photograph as the source image.
  • the extraction unit 3131 extracts the difference between a frame included in the video to be judged and a frame included in the composite video corresponding to that frame (step S33).
  • the emphasis unit 3132 emphasizes the difference (step S34).
  • the calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video, based on the difference-emphasized video (step S35).
  • the determination unit 315 determines whether the video to be judged is a fake video or not based on the index (step S36).
  • the determination unit 315 may determine whether the video to be judged is a fake video or not by comparing the index with a predetermined threshold value.
  • if the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be judged is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be judged is not a fake video (step S38).
  • the authentication unit 419 authenticates the person based on the matching result from the matching unit 417 and the determination result from the spoofing determination unit 418 (step S42).
  • the authentication unit 419 may authenticate the person on the condition that the determination unit 315 determines that the video to be judged is less likely than a predetermined standard to be a fake image and that the spoofing determination unit 418 determines that the person has acted in accordance with the instructions.
  • successful authentication of the person by the authentication unit 419 may correspond to confirmation of the person's identity.
  • the output unit 416 outputs the authentication result for the person (step S43).
[4-3: Technical Effects of Information Processing Device 4]
  • the fourth information processing device 4 of the present disclosure can accurately detect whether an input video to be judged is a fake video or not, and can therefore perform identity verification with high accuracy.
  • the difference emphasis means includes: an extraction means for extracting a difference between the second image and the third image; and a highlighting means for highlighting the difference.
  • the information processing device according to claim 1 or 2, wherein the second image is a video including a plurality of frames, the synthesizing means synthesizes the third image including one or more frames, and the difference emphasis means emphasizes a difference between a frame included in the second image and a frame included in the third image corresponding to that frame.
  • the information processing device according to claim 3, wherein the difference emphasizing means generates a difference-emphasized video including a difference frame in which a difference between a frame included in the second image and a frame included in the third image corresponding to that frame is emphasized, and the calculation means calculates an index representing a likelihood that the second image is a synthesized image based on the difference-emphasized video.
  • the information processing device according to claim 3, wherein the synthesis means includes a detection means for detecting a landmark from the second image, and synthesizes the third image based on the first image and the landmark.
  • the information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
  • [Appendix 8] An information processing method comprising: accepting input of a first image and a second image; synthesizing a third image based on the first image and the second image; emphasizing a difference between the second image and the third image; calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference; and determining whether the second image is a synthesized image or not according to the index.
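The steps enumerated in Appendix 8 can be sketched end to end as follows. This is a minimal illustration, not the disclosed implementation: `synthesize` stands in for the unspecified synthesis step, the learned scoring model is replaced by a mean-difference proxy, and the gain of 4 and threshold of 0.1 are illustrative assumptions.

```python
import numpy as np

def detect_synthesized(first, second, synthesize, threshold=0.1):
    """Sketch of the Appendix 8 method: accept two images, synthesize a
    third, emphasize the difference, score it, and judge the second image."""
    third = synthesize(first, second)                  # synthesize the third image
    diff = np.abs(second.astype(np.int16) - third.astype(np.int16))
    emphasized = np.clip(diff.astype(np.float32) * 4.0, 0.0, 255.0)  # emphasize
    index = float(emphasized.mean()) / 255.0           # synthesized-image likeness
    return index > threshold, index
```

With a perfect synthesizer the difference vanishes and the index is 0; any systematic mismatch between the second and third images pushes the index toward 1.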

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An information processing device 1 comprises: a reception unit 11 that receives input of a first image and a second image; a synthesis unit 12 that synthesizes a third image on the basis of the first image and the second image; a difference emphasis unit 13 that emphasizes a difference between the second image and the third image; a calculation unit 14 that calculates an index representing the likeness of the second image to a synthesized image on the basis of the emphasized difference; and a determination unit 15 that determines whether the second image is a synthesized image according to the index.

Description

Information processing device, information processing method, and recording medium

This disclosure relates to the technical fields of information processing devices, information processing methods, and recording media.

Patent Document 1 describes a technology in which a class indicates whether a person's face (an object) is real (true) or fake (false), and a synthesized region (i.e., a region added to the original video) is detected as an element of the detection target class, thereby detecting whether the video is a fake video (e.g., a false video generated by synthesis processing).

International Publication No. WO 2022/054246

An object of the present disclosure is to provide an information processing device, an information processing method, and a recording medium that can accurately detect whether an input image is a composite image.

One aspect of the information processing device includes a receiving means for receiving an input of a first image and a second image, a synthesis means for synthesizing a third image based on the first image and the second image, a difference emphasis means for emphasizing the difference between the second image and the third image, a calculation means for calculating an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and a determination means for determining whether the second image is a synthesized image or not according to the index.

One aspect of the information processing method is to receive an input of a first image and a second image, synthesize a third image based on the first image and the second image, emphasize the difference between the second image and the third image, calculate an index representing the likelihood of the second image being a synthesized image based on the emphasized difference, and determine whether the second image is a synthesized image or not based on the index.

In one aspect of the recording medium, a computer program is recorded for causing a computer to execute an information processing method that accepts input of a first image and a second image, synthesizes a third image based on the first image and the second image, emphasizes the difference between the second image and the third image, calculates an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and determines whether the second image is a synthesized image according to the index.

The information processing device, information processing method, and recording medium according to the present disclosure can accurately detect whether an input image is a composite image.

FIG. 1 is a block diagram showing the configuration of a first information processing device according to the present disclosure.
FIG. 2 is a block diagram showing the configuration of a second information processing device according to the present disclosure.
FIG. 3 is a flowchart showing the processing operations of the second information processing device according to the present disclosure.
FIG. 4 is a block diagram showing the configuration of a third information processing device according to the present disclosure.
FIG. 5 is a flowchart showing the processing operations of the third information processing device according to the present disclosure.
FIG. 6 is a block diagram showing the configuration of a fourth information processing device according to the present disclosure.
FIG. 7 is a flowchart showing the processing operations of the fourth information processing device according to the present disclosure.

Hereinafter, embodiments of an information processing device, an information processing method, and a recording medium will be described with reference to the drawings.
[1: First embodiment]

A first embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, the first embodiment will be described using a first information processing device 1 according to the present disclosure.
[1-1: Configuration of information processing device 1]

FIG. 1 is a block diagram showing the configuration of the first information processing device 1 according to the present disclosure. As shown in FIG. 1, the information processing device 1 includes a receiving unit 11, a synthesis unit 12, a difference emphasizing unit 13, a calculation unit 14, and a determination unit 15. The receiving unit 11 receives input of a first image and a second image. The synthesis unit 12 synthesizes a third image based on the first image and the second image. The difference emphasizing unit 13 emphasizes the difference between the second image and the third image. The calculation unit 14 calculates an index representing the likelihood of the second image being a synthesized image based on the emphasized difference. The determination unit 15 determines whether the second image is a synthesized image according to the index.
[1-2: Technical Effects of Information Processing Device 1]

The first information processing device 1 according to the present disclosure makes a judgment based on the difference between an input image and a composite image, and can therefore accurately detect whether or not the input image is a composite image.
[2: Second embodiment]

A second embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, the second embodiment will be described using a second information processing device 2 according to the present disclosure.
[2-1: Fake image]

There is a technology that synthesizes an image of a person based on information from a single photograph of the person's face. For example, deepfake is known as a technology for synthesizing an image of a person. Deepfake is known as a technology for synthesizing a fake image that shows something that did not actually happen. Hereinafter, an image that shows something that did not actually happen may be called a fake image. A synthesized image may also be called a fake image. An image that shows something that actually happened may be called a real image.
[2-2: Configuration of information processing device 2]

FIG. 2 is a block diagram showing the configuration of the second information processing device 2. As shown in FIG. 2, the information processing device 2 includes a calculation device 21 and a storage device 22. Furthermore, the information processing device 2 may include a communication device 23, an input device 24, and an output device 25. However, the information processing device 2 does not have to include at least one of the communication device 23, the input device 24, and the output device 25. The calculation device 21, the storage device 22, the communication device 23, the input device 24, and the output device 25 may be connected via a data bus 26.

The arithmetic device 21 includes, for example, at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array). The arithmetic device 21 reads a computer program. For example, the arithmetic device 21 may read a computer program stored in the storage device 22. For example, the arithmetic device 21 may read a computer program stored in a computer-readable, non-transitory recording medium using a recording medium reading device (e.g., the input device 24 described later) provided in the information processing device 2. The arithmetic device 21 may acquire (i.e., download or read) a computer program from a device (not shown) located outside the information processing device 2 via the communication device 23 (or another communication device). The arithmetic device 21 executes the read computer program. As a result, logical functional blocks for executing the operations to be performed by the information processing device 2 are realized within the arithmetic device 21. In other words, the arithmetic device 21 can function as a controller that realizes logical functional blocks for executing the operations (in other words, processing) to be performed by the information processing device 2.

FIG. 2 shows an example of the logical functional blocks realized in the arithmetic device 21 to execute the information processing operation. As shown in FIG. 2, realized in the arithmetic device 21 are a reception unit 211, which is a specific example of the "reception means" described in the appendix below; a synthesis unit 212, which is a specific example of the "synthesis means" described in the appendix below; a difference emphasis unit, which is a specific example of the "difference emphasis means" described in the appendix below; a calculation unit 214, which is a specific example of the "calculation means" described in the appendix below; a determination unit 215, which is a specific example of the "determination means" described in the appendix below; and an output unit 216. However, the output unit 216 does not have to be realized in the arithmetic device 21. The difference emphasis unit may have an extraction unit 2131 and an emphasis unit 2132. Details of the operations of the reception unit 211, the synthesis unit 212, the difference emphasis unit, the calculation unit 214, the determination unit 215, and the output unit 216 will be described later with reference to FIG. 3.

The storage device 22 can store desired data. For example, the storage device 22 may temporarily store a computer program executed by the arithmetic device 21. The storage device 22 may temporarily store data that is temporarily used by the arithmetic device 21 while the arithmetic device 21 is executing a computer program. The storage device 22 may store data that the information processing device 2 retains for a long period of time. The storage device 22 may include at least one of a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device. In other words, the storage device 22 may include a non-transitory recording medium.

The communication device 23 is capable of communicating with devices external to the information processing device 2 via a communication network (not shown). The communication device 23 may be a communication interface based on standards such as Ethernet (registered trademark), Wi-Fi (registered trademark), Bluetooth (registered trademark), and USB (Universal Serial Bus).

The input device 24 is a device that accepts information input to the information processing device 2 from outside the information processing device 2. For example, the input device 24 may include an operating device (e.g., at least one of a keyboard, a mouse, and a touch panel) that can be operated by an operator of the information processing device 2. For example, the input device 24 may include a reading device that can read information recorded as data on a recording medium that can be attached externally to the information processing device 2.

The output device 25 is a device that outputs information to the outside of the information processing device 2. For example, the output device 25 may output information as an image. That is, the output device 25 may include a display device (a so-called display) capable of displaying an image showing the information to be output. For example, the output device 25 may output information as sound. That is, the output device 25 may include an audio device (a so-called speaker) capable of outputting sound. For example, the output device 25 may output information on paper. That is, the output device 25 may include a printing device (a so-called printer) capable of printing desired information on paper.
[2-3: Information Processing Operation Performed by Information Processing Device 2]

The information processing operation performed by the information processing device 2 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of the information processing operation performed by the information processing device 2. Note that this disclosure assumes that the first image is not a fake image but a real image.

As shown in FIG. 3, the reception unit 211 receives input of a first image (step S20). The reception unit 211 may receive input of a face image including a person's face region as the first image. The first image may be a still image. Hereinafter, the "first image" may be referred to as the "source image."

The reception unit 211 receives input of a second image (step S21). The reception unit 211 may receive input of a face image including a person's face region as the second image. The second image may be a still image or a moving image. The second embodiment describes a case where a second image that is a still image is processed. Hereinafter, the "second image" may be referred to as the "image to be determined."

The synthesis unit 212 synthesizes a third image based on the first image and the second image. Hereinafter, the "third image" may be referred to as the "composite image." The synthesis unit 212 may generate the composite image using, for example, a technique called face swap. Face swap is a technique for exchanging the face region of a source image with the face region of a target image. The synthesis unit 212 may generate the composite image by fitting the face region of the image to be determined into the face region of the source image. The synthesis unit 212 may generate a composite image having the characteristics of the source image. For example, the synthesis unit 212 may generate a composite image that maintains the facial expression of the source image.
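As a minimal illustration of the region-replacement idea behind face swap (real implementations align facial landmarks and blend the seam, which is omitted here), a mask-based composite might look like the following; the boolean `face_mask` input is an assumption of this sketch, not part of the disclosure.

```python
import numpy as np

def naive_face_swap(source, target, face_mask):
    # Copy the target image's face region (given by a boolean mask) into
    # the source image, leaving the rest of the source untouched.
    out = source.copy()
    out[face_mask] = target[face_mask]
    return out
```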

The extraction unit 2131 extracts the difference between the image to be determined and the composite image (step S23). In other words, the extraction unit 2131 extracts the portions where the image to be determined and the composite image differ. The extraction unit 2131 may obtain the difference in pixel value for each pixel and generate a difference image in which each pixel is represented by the difference in pixel values.

The emphasis unit 2132 emphasizes the difference (step S24). The emphasis unit 2132 may emphasize the difference further by, for example, multiplying each pixel value of the difference image by a real number. That is, the emphasis unit 2132 may increase the pixel values of pixels in the difference image whose pixel value is not 0 (a pixel value of 0 means there is no difference in pixel value between the image to be determined and the composite image). Hereinafter, a difference image in which the difference has been emphasized by the emphasis unit 2132 may be referred to as a "difference-emphasized image." The emphasis unit 2132 may generate the difference-emphasized image by, for example, setting the pixel values of pixels in the difference image whose pixel value is not 0 to the maximum possible value. The emphasis unit 2132 may generate the difference-emphasized image by, for example, setting the pixel values of pixels in the difference image whose pixel value is equal to or greater than a predetermined value to the maximum possible value. The emphasis unit 2132 may, for example, set ranges of pixel values in the difference image to generate a difference-emphasized image whose pixels can take four kinds of values: large difference, medium difference, small difference, and no difference. The emphasis unit 2132 may employ any method to emphasize the difference.
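The extraction (step S23) and emphasis (step S24) operations might be sketched as follows for 8-bit grayscale arrays; the gain and noise floor are illustrative parameters, not values from the disclosure.

```python
import numpy as np

def difference_image(target, synthesized):
    # Step S23: per-pixel absolute difference between the image to be
    # determined and the composite image (both uint8, same shape).
    return np.abs(target.astype(np.int16) - synthesized.astype(np.int16)).astype(np.uint8)

def emphasize(diff, gain=4.0, floor=8):
    # Step S24: zero out sub-floor differences (treated as noise), then
    # amplify the rest and clip to the valid 8-bit range.
    out = diff.astype(np.float32)
    out[out < floor] = 0.0
    return np.clip(out * gain, 0.0, 255.0).astype(np.uint8)
```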

The calculation unit 214 calculates an index representing the likelihood that the image to be determined is a fake image, based on the emphasized difference (step S25). The index calculated by the calculation unit 214 may be a real number. The calculation unit 214 may use a calculation model to calculate the index. The calculation model is a model that, when the emphasized difference is input, outputs an index representing the likelihood that the image is a fake image. The calculation model may be a machine-learned model. A learning mechanism that trains the calculation model may train it using, as teacher data, difference-emphasized images accompanied by information indicating the correct answer (that the image to be determined is a real image, or that it is a fake image). The learning mechanism may use the information indicating the correct answer and the fake-image likelihood index output by the calculation model to make the calculation model learn how to calculate the index.

The determination unit 215 determines whether the second image is a composite image or not according to the index (step S26). The determination unit 215 may determine whether the second image is a composite image or not by comparing the index with a predetermined threshold value.

If the index exceeds the predetermined threshold (step S26: Yes), the determination unit 215 determines that the second image is a composite image (step S27). If the index does not exceed the predetermined threshold (step S26: No), the determination unit 215 determines that the second image is not a composite image (step S28).
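Steps S25 and S26 could be sketched as below. The disclosure computes the index with a trained calculation model; the mean of the emphasized difference and the 0.1 threshold used here are placeholders for illustration only.

```python
import numpy as np

def fake_likelihood_index(emphasized_diff):
    # Stand-in for the learned calculation model (step S25): the mean
    # emphasized difference, normalized to the range [0, 1].
    return float(emphasized_diff.mean()) / 255.0

def is_fake(index, threshold=0.1):
    # Step S26: the image is judged to be a composite (fake) image when
    # the index exceeds the predetermined threshold.
    return index > threshold
```

Raising the threshold makes the judgment more conservative (fewer images are flagged as fake); lowering it makes it stricter, matching the adjustability noted in the technical-effects section.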

The output unit 216 produces output according to the determination result (step S29). The output unit 216 may control the output device 25 to cause it to produce output according to the determination result.
[2-4: Technical Effects of Information Processing Device 2]

When the emphasized difference is used for the comparison between the image to be determined and the composite image, the difference between the case where the image to be determined is a fake image and the case where it is not is easier to see, making it easier to judge whether the image to be determined is a fake image. The second information processing device 2 of the present disclosure emphasizes the difference between the input image to be determined and the synthesized composite image and makes its judgment based on the emphasized difference, so it can accurately detect whether the input image to be determined is a fake image. In addition, since the information processing device 2 judges whether an image is a fake image by comparison with a threshold value, how strongly an image must resemble a fake image before it is judged to be one can be adjusted by setting the threshold value.
[3: Third embodiment]

 情報処理装置、情報処理方法、及び、記録媒体の第3実施形態について説明する。以下では、本開示にかかる第3の情報処理装置3を用いて、情報処理装置、情報処理方法、及び記録媒体の第3実施形態について説明する。第3実施形態では、第2の画像が複数のフレームを含む動画像である場合を説明する。以下、「第2の画像」を「判定対象動画」とよぶ場合がある。また、実際には起こっていないことがらが写った動画像をフェイク動画とよぶ場合がある。また、実際に起こったことがらが写った動画像を本物動画とよぶ場合がある。
 [3-1:フェイク動画]
A third embodiment of an information processing device, an information processing method, and a recording medium will be described below. In the following, a third embodiment of an information processing device, an information processing method, and a recording medium will be described using a third information processing device 3 according to the present disclosure. In the third embodiment, a case where the second image is a moving image including a plurality of frames will be described. Hereinafter, the "second image" may be referred to as a "video to be determined". Also, a moving image showing an event that has not actually occurred may be referred to as a fake video. Also, a moving image showing an event that has actually occurred may be referred to as a real video.
[3-1: Fake video]

 例えば、本物動画は、カメラにより撮像されている人物Bがカメラの前で行った動作が写る動画を含んでいてもよい。これに対し、フェイク動画は、カメラにより撮像されている人物Bがカメラの前で行った動作を、人物Bとは異なる人物Aが行ったように合成された動画像を含んでいてもよい。 For example, a genuine video may include a video showing an action performed by person B in front of the camera as captured by the camera. In contrast, a fake video may include a moving image synthesized to make it appear as if person A, a different person from person B, performed the action performed by person B in front of the camera as captured by the camera.

 操演(Reenactment)とよばれる、元画像に写る人物の表情が所望の表情に変化したり、元画像に写る人物が所望の方向を向いたりするフェイク動画を生成する技術がある。例えば、人物Aの少なくとも1枚の顔画像に基づき、当該顔画像の人物Aの表情を人物Bの表情に合わせて変化させた、あたかも人物Aが表情を変えているかのような動画像を生成(以下、「静止画を動画化」とよぶ場合がある)する技術が知られている。 There is a technology called reenactment that creates fake videos in which the facial expression of a person in an original image changes to a desired expression or faces a desired direction. For example, a technology is known that uses at least one facial image of person A and changes the facial expression of person A in the facial image to match the facial expression of person B, creating a moving image that makes it appear as if person A is changing his or her facial expression (hereinafter sometimes referred to as "animating still images").

 元画像と元動画とを用いることにより、静止画を動画化することができる。元動画とは、例えば、カメラにより撮像されている人物Bがカメラの前で行った動作が写る動画像であってもよい。また、元画像は、人物Bとは異なる人物Aが写る静止画であってもよい。静止画の動画化では、まず、元画像のランドマークを検出する。また、元動画を構成する動画フレームの各々からランドマークを検出する。続いて、元動画を構成する各々の動画フレームについて、元画像のランドマークと該当動画フレームのランドマークとを合せるように元画像を編集して合成フレームを生成する。生成した各々の合成フレームを繋ぎ合わせることで、静止画を動画化することができる。ランドマークとは、画像に写る被写体の特徴的な位置であってもよい。 By using the original image and the original video, a still image can be animated. The original video may be, for example, a video image showing the actions of person B in front of the camera as captured by the camera. The original image may also be a still image showing person A, who is different from person B. When animating a still image, first, landmarks are detected in the original image. Landmarks are also detected from each of the video frames that make up the original video. Next, for each video frame that makes up the original video, the original image is edited so that the landmarks in the original image and the landmarks in the corresponding video frame are matched to generate a composite frame. The generated composite frames are then joined together to generate an animated still image. A landmark may be a characteristic position of a subject that appears in the image.
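The animation pipeline described above can be sketched as the loop below: warp the source image once per driving frame so that its landmarks match that frame's landmarks, then join the composite frames. Here `detect_landmarks` and `warp` are hypothetical stand-ins; the disclosure does not specify a particular detector or warper.

```python
def detect_landmarks(frame):
    # Stand-in: a real detector would locate eyes, nose, mouth, etc.
    return frame["landmarks"]

def warp(source_image, source_landmarks, target_landmarks):
    # Stand-in: a real warper would deform source_image so that its
    # landmarks coincide with target_landmarks.
    return {"source": source_image["name"], "aligned_to": target_landmarks}

def animate_still(source_image, driving_frames):
    """One composite frame per driving frame, joined into a composite video."""
    source_landmarks = detect_landmarks(source_image)
    return [
        warp(source_image, source_landmarks, detect_landmarks(frame))
        for frame in driving_frames
    ]

source = {"name": "person_A", "landmarks": [(10, 10), (20, 10)]}
driving = [{"landmarks": [(11, 10), (21, 10)]},
           {"landmarks": [(12, 11), (22, 11)]}]
composite_video = animate_still(source, driving)
print(len(composite_video))  # 2, one composite frame per driving frame
```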

 例えば、元画像に写る人物Aの顔の向き、表情等を、元動画に写る人物Bの顔のランドマークを用いて変化させ、人物Aの顔の向き、表情等が変化する動画を合成することができる。人物の顔の向き、表情等を変化させるランドマークとは、顔における特徴的な部位であってもよい。顔における特徴的な位置とは、目、鼻、口等の部位の特定のポイントであってもよい。 For example, the facial direction, expression, etc. of person A in the original image can be changed using the facial landmarks of person B in the original video, and a video can be synthesized in which the facial direction, expression, etc. of person A changes. The landmarks that change the facial direction, expression, etc. of a person may be characteristic parts of the face. The characteristic positions on the face may be specific points on parts of the face such as the eyes, nose, mouth, etc.

 入手した動画像が、静止画を用いて合成した合成動画と類似している場合、当該入手した動画像はフェイク動画である可能性が高い。本実施形態では、この性質をフェイク動画か否かの判定に利用する。すなわち、第3実施形態では、静止画を用いて合成動画を生成すること、及び入手した動画像と合成動画とを比較することにより、入手した動画像がフェイク動画であるか否かを判定する。
 [3-2:情報処理装置3の構成]
If the acquired moving image is similar to a composite moving image synthesized using still images, the acquired moving image is likely to be a fake moving image. In this embodiment, this property is used to determine whether or not the moving image is a fake moving image. That is, in the third embodiment, a composite moving image is generated using still images, and the acquired moving image is compared with the composite moving image to determine whether or not the acquired moving image is a fake moving image.
[3-2: Configuration of information processing device 3]

 図4を参照しながら、第3の情報処理装置3の構成について説明する。図4は、第3の情報処理装置3の構成を示すブロック図である。 The configuration of the third information processing device 3 will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the third information processing device 3.

 図4に示すように、第3の情報処理装置3は、第2の情報処理装置2と同様に、演算装置21と、記憶装置22とを備えている。更に、第3の情報処理装置3は、第2の情報処理装置2と同様に、通信装置23と、入力装置24と、出力装置25とを備えていてもよい。但し、情報処理装置3は、通信装置23、入力装置24及び出力装置25のうちの少なくとも1つを備えていなくてもよい。第3の情報処理装置3は、合成部312が検出部3121を含む点で、第2の情報処理装置2と異なる。情報処理装置3のその他の特徴は、情報処理装置2のその他の特徴と同一であってもよい。このため、以下では、すでに説明した実施形態と異なる部分について詳細に説明し、その他の重複する部分については適宜説明を省略するものとする。
 [3-3:情報処理装置3が行う情報処理動作]
As shown in FIG. 4, the third information processing device 3 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2. Furthermore, the third information processing device 3 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2. However, the information processing device 3 may not include at least one of the communication device 23, the input device 24, and the output device 25. The third information processing device 3 differs from the second information processing device 2 in that the synthesis unit 312 includes a detection unit 3121. Other features of the information processing device 3 may be the same as other features of the information processing device 2. For this reason, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be appropriately omitted.
[3-3: Information Processing Operation Performed by Information Processing Device 3]

 図5を参照しながら、情報処理装置3が行う情報処理動作の流れについて説明する。図5は、情報処理装置3が行う情報処理動作の流れを示すフローチャートである。 The flow of information processing operations performed by the information processing device 3 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the flow of information processing operations performed by the information processing device 3.

 図5に示す様に、受付部311は、第1の画像としてのソース画像の入力を受け付ける(ステップS20)。検出部3121は、ソース画像からランドマークを検出する。検出部3121は、静止画から、ランドマークとして、顔領域における特徴的な位置を検出してもよい。検出部3121は、ソース画像から、ランドマークとして、目、鼻、口等の部位の特定のポイントを検出してもよい。 As shown in FIG. 5, the reception unit 311 receives input of a source image as a first image (step S20). The detection unit 3121 detects landmarks from the source image. The detection unit 3121 may detect characteristic positions in the face area as landmarks from a still image. The detection unit 3121 may detect specific points of facial parts such as the eyes, nose, and mouth as landmarks from the source image.

 受付部311は、第2の画像としての判定対象動画の入力を受け付ける(ステップS30)。検出部3121は、判定対象動画が含む1以上のフレームの各々からランドマークを検出する(ステップS31)。判定対象動画が含む1以上のフレームは、判定対象動画が含む全てのフレームであってもよい。判定対象動画が含む1以上のフレームは、動画像が含む任意の1以上のフレームであってもよい。検出部3121は、判定対象画像から、ランドマークとして、ソース画像から検出したランドマークと同等の位置を検出してもよい。 The reception unit 311 receives input of the video to be judged as the second image (step S30). The detection unit 3121 detects landmarks from each of the one or more frames included in the video to be judged (step S31). The one or more frames included in the video to be judged may be all frames included in the video to be judged. The one or more frames included in the video to be judged may be any one or more frames included in the moving image. The detection unit 3121 may detect, as landmarks, positions from the image to be judged that are equivalent to the landmarks detected from the source image.

 合成部312は、ソース画像、及びソース画像のランドマーク、並びに、判定対象動画が含む1以上のフレームの各々のランドマークに基づいて、第3の画像を合成する(ステップS32)。第3実施形態において、第3の画像は、1以上のフレームを含む合成動画である。以下、「第3の画像」を「合成動画」とよぶ場合がある。 The synthesis unit 312 synthesizes a third image based on the source image, the landmarks of the source image, and the landmarks of one or more frames included in the video to be determined (step S32). In the third embodiment, the third image is a synthetic video including one or more frames. Hereinafter, the "third image" may be referred to as a "synthetic video."

 合成部312は、まず、判定対象動画を構成する各々の入力フレームについて、ソース画像のランドマークと該当入力フレームのランドマークとを合せるように編集した合成フレームを生成してもよい。続いて、合成部312は、各々の合成フレームを繋ぎ合わせることで、静止画であるソース画像を動画化し、合成動画を生成してもよい。 The synthesis unit 312 may first generate a synthesis frame for each input frame constituting the judgment target moving image by editing the landmarks of the source image to match the landmarks of the corresponding input frame. Next, the synthesis unit 312 may connect each synthesis frame together to animate the source images, which are still images, and generate a synthesis moving image.

 抽出部3131は、判定対象動画と、合成動画との差分を抽出する(ステップS33)。抽出部3131は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を抽出する。抽出部3131は、判定対象動画と、合成動画との差分を抽出し、差分動画を生成してもよい。判定対象動画d_iが1からFのフレームを含む場合、判定対象動画d_iを[x_i^1,・・・,x_i^F]と表してもよい。合成動画d_fが1からFのフレームを含む場合、合成動画d_fを[x_f^1,・・・,x_f^F]と表してもよい。この場合、差分動画d_diffは、d_f-d_i=[|x_f^1-x_i^1|,・・・,|x_f^F-x_i^F|]と表してもよい。ステップS31において、検出部3121が判定対象動画が含む任意の1以上のフレームからランドマークを検出した場合、抽出部3131は、当該任意の1以上のフレームに対応する差分フレームを含む差分動画を生成してもよい。 The extraction unit 3131 extracts the difference between the determination target moving image and the composite moving image (step S33). The extraction unit 3131 extracts the difference between a frame included in the determination target moving image and the corresponding frame included in the composite moving image. The extraction unit 3131 may extract the difference between the determination target moving image and the composite moving image to generate a difference moving image. When the determination target moving image d_i includes frames 1 to F, it may be expressed as [x_i^1, ..., x_i^F]. When the composite moving image d_f includes frames 1 to F, it may be expressed as [x_f^1, ..., x_f^F]. In this case, the difference moving image d_diff may be expressed as d_f - d_i = [|x_f^1 - x_i^1|, ..., |x_f^F - x_i^F|]. In step S31, if the detection unit 3121 detects landmarks from any one or more frames included in the video to be judged, the extraction unit 3131 may generate a difference video including difference frames corresponding to those one or more frames.
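The per-frame difference d_diff = d_f − d_i = [|x_f^1 − x_i^1|, …, |x_f^F − x_i^F|] can be sketched as follows, with each frame represented here as a flat list of integer pixel values (a simplification for illustration; names are not from the disclosure):

```python
def difference_video(target_frames, composite_frames):
    """Frame-by-frame absolute difference between the judgment-target
    video (frames x_i) and the composite video (frames x_f)."""
    return [
        [abs(xf - xi) for xf, xi in zip(f_frame, i_frame)]
        for f_frame, i_frame in zip(composite_frames, target_frames)
    ]

d_i = [[10, 40], [50, 50]]  # judgment-target video, F = 2 frames
d_f = [[12, 70], [50, 90]]  # composite video
print(difference_video(d_i, d_f))  # [[2, 30], [0, 40]]
```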

 強調部3132は、差分を強調する(ステップS34)。強調部3132は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を強調した差分強調フレームを含む差分強調動画を生成してもよい。強調部3132が生成した差分強調動画d_diffは、[α|x_f^1-x_i^1|,・・・,α|x_f^F-x_i^F|]と表してもよい。αは、実数であり、差分をより強調するためのパラメータである。 The emphasis unit 3132 emphasizes the difference (step S34). The emphasis unit 3132 may generate a difference-emphasized moving image including difference-emphasized frames in which the difference between a frame included in the determination target moving image and the corresponding frame included in the composite moving image is emphasized. The difference-emphasized moving image d_diff generated by the emphasis unit 3132 may be expressed as [α|x_f^1 - x_i^1|, ..., α|x_f^F - x_i^F|], where α is a real number and a parameter for further emphasizing the difference.

 算出部314は、差分強調動画が含む1以上のフレームに基づいて、判定対象動画のフェイク動画らしさを表す指標を算出する(ステップS35)。算出部314は、算出モデルを用いて、判定対象画像のフェイク画像らしさを表す指標を算出してもよい。第3実施形態において、算出モデルは、差分強調動画が含む1以上のフレームが入力されると、画像のフェイク画像らしさを表す指標を出力する。差分強調動画が含む1以上のフレームは、差分強調動画が含む全てのフレームであってもよい。差分強調動画が含む1以上のフレームは、動画像が含む任意の1以上のフレームであってもよい。 The calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on one or more frames included in the difference-emphasized video (step S35). The calculation unit 314 may use a calculation model to calculate an index representing the likelihood that the image to be judged is a fake image. In the third embodiment, when one or more frames included in the difference-emphasized video are input, the calculation model outputs an index representing the likelihood that the image is a fake image. The one or more frames included in the difference-emphasized video may be all frames included in the difference-emphasized video. The one or more frames included in the difference-emphasized video may be any one or more frames included in the moving image.
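The disclosure leaves the calculation model open. As one hedged illustration only, the sketch below scores each difference-emphasized frame with a stand-in per-frame scorer and averages the scores into a single fake-likeness index; a real system would use a learned model here.

```python
def fake_index(emphasized_frames, frame_score):
    """Average per-frame scores into one index (step S35).
    `frame_score` stands in for a learned model, which the
    disclosure does not specify."""
    scores = [frame_score(frame) for frame in emphasized_frames]
    return sum(scores) / len(scores)

def mean_pixel(frame):
    # Toy scorer: the mean value of an emphasized difference frame.
    return sum(frame) / len(frame)

print(fake_index([[6, 90], [0, 120]], mean_pixel))  # (48.0 + 60.0) / 2 = 54.0
```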

 判定部315は、指標に応じて、判定対象動画がフェイク動画か否かを判定する(ステップS36)。判定部315は、指標と所定の閾値との比較により、判定対象動画がフェイク動画か否かを判定してもよい。 The determination unit 315 determines whether the video to be determined is a fake video or not based on the index (step S36). The determination unit 315 may determine whether the video to be determined is a fake video or not by comparing the index with a predetermined threshold value.

 指標が所定の閾値を超過した場合(ステップS36:Yes)、判定部315は、判定対象動画がフェイク動画であると判定する(ステップS37)。指標が所定の閾値を超過しなかった場合(ステップS36:No)、判定部315は、判定対象動画がフェイク動画ではないと判定する(ステップS38)。出力部316は、判定結果に応じた出力をする(ステップS39)。
 [3-4:情報処理装置3の技術的効果]
If the index exceeds a predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38). The output unit 316 outputs according to the determination result (step S39).
[3-4: Technical Effects of Information Processing Device 3]

 静止画、及びランドマークを用いて生成した合成動画は、ディープフェイク等の技術を用いて生成されたフェイク動画の特徴を捉えることができる。本開示の第3の情報処理装置3は、入力された静止画であるソース画像を使って生成した合成動画と入力された判定対象動画の特徴が似通っている場合には判定対象動画がフェイク動画である可能性が高いという性質を用いる。本開示の第3の情報処理装置3は、判定対象動画が偽造されていない本物動画であるか、判定対象動画が偽造されたフェイク動画であるかを精度よく判定することができる。また、情報処理装置3は、フレーム毎の差分に基づいて、入力された判定対象動画がフェイク動画か否かを精度よく検知することができる。また、情報処理装置3は、ランドマークを用いて生成されたディープフェイクを精度よく検知することができる。
 [4:第4実施形態]
A synthetic video generated using still images and landmarks can capture the characteristics of a fake video generated using a technology such as deep fake. The third information processing device 3 of the present disclosure uses the property that if the characteristics of a synthetic video generated using a source image, which is an input still image, and the input video to be judged are similar, the video to be judged is likely to be a fake video. The third information processing device 3 of the present disclosure can accurately judge whether the video to be judged is a genuine video that is not forged or a fake video that is forged. In addition, the information processing device 3 can accurately detect whether the input video to be judged is a fake video or not based on the difference between each frame. In addition, the information processing device 3 can accurately detect deep fakes generated using landmarks.
[4: Fourth embodiment]

 情報処理装置、情報処理方法、及び、記録媒体の第4実施形態について説明する。以下では、本開示にかかる第4の情報処理装置4を用いて、情報処理装置、情報処理方法、及び記録媒体の第4実施形態について説明する。
 [4-1:情報処理装置4の構成]
A fourth embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described using a fourth information processing device 4 according to the present disclosure.
[4-1: Configuration of information processing device 4]

 図6を参照しながら、第4の情報処理装置4の構成について説明する。図6は、第4の情報処理装置4の構成を示すブロック図である。 The configuration of the fourth information processing device 4 will be described with reference to FIG. 6. FIG. 6 is a block diagram showing the configuration of the fourth information processing device 4.

 図6に示すように、第4の情報処理装置4は、第2の情報処理装置2、及び第3の情報処理装置3と同様に、演算装置21と、記憶装置22とを備えている。更に、第4の情報処理装置4は、第2の情報処理装置2、及び第3の情報処理装置3と同様に、通信装置23と、入力装置24と、出力装置25とを備えていてもよい。但し、情報処理装置4は、通信装置23、入力装置24及び出力装置25のうちの少なくとも1つを備えていなくてもよい。第4の情報処理装置4は、演算装置21内に照合部417、成りすまし判定部418、及び認証部419が更に実現される点で、第2の情報処理装置2、及び第3の情報処理装置3と異なる。情報処理装置4のその他の特徴は、情報処理装置2、及び情報処理装置3の少なくとも一方のその他の特徴と同一であってもよい。このため、以下では、すでに説明した実施形態と異なる部分について詳細に説明し、その他の重複する部分については適宜説明を省略するものとする。 6, the fourth information processing device 4 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2 and the third information processing device 3. Furthermore, the fourth information processing device 4 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2 and the third information processing device 3. However, the information processing device 4 may not include at least one of the communication device 23, the input device 24, and the output device 25. The fourth information processing device 4 differs from the second information processing device 2 and the third information processing device 3 in that a matching unit 417, a spoofing determination unit 418, and an authentication unit 419 are further realized in the calculation device 21. Other features of the information processing device 4 may be the same as other features of at least one of the information processing device 2 and the information processing device 3. Therefore, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be omitted as appropriate.

 第4の情報処理装置4は、人物の生体認証を実施可能な機構である。情報処理装置4は、画像を用いた照合動作をするとともに、画像を用いて人物が成りすましているか否かを判定し、人物を認証可能な機構であってもよい。 The fourth information processing device 4 is a mechanism capable of performing biometric authentication of a person. The information processing device 4 may be a mechanism capable of performing a matching operation using an image, and determining whether or not a person is impersonating another person using the image, thereby authenticating the person.

 本開示の第4の情報処理装置4は、電子本人確認(electronic Know Your Customer:eKYC)等のオンラインでの本人確認に適用されてもよい。上述したように、人物の顔写真一枚の情報を基に、当該人物の画像を合成する技術が存在しており、eKYCにおける成りすましの脅威となっている。フェイク動画か否かの正確な判定はeKYCのようなサービスの信頼性を高めるうえで重要な課題である。eKYCに対する入力には、フェイク動画を合成するための情報となる公的な文書の顔画像が含まれている。つまり、eKYCに対する成りすましとして、運転免許証、マイナンバーカード等の公的な文書の顔画像のように限られた情報を基にフェイク動画を合成し入力する手法が考えられる。
 [4-2:情報処理装置4が行う情報処理動作]
The fourth information processing device 4 of the present disclosure may be applied to online identity verification such as electronic know your customer (eKYC). As described above, there is a technology that synthesizes an image of a person based on information of a single face photograph of the person, which poses a threat of impersonation in eKYC. Accurate determination of whether or not a video is fake is an important issue in increasing the reliability of services such as eKYC. The input to eKYC includes a face image of an official document that serves as information for synthesizing the fake video. In other words, as an impersonation method for eKYC, a method of synthesizing and inputting a fake video based on limited information such as a face image of an official document such as a driver's license or a My Number card can be considered.
[4-2: Information Processing Operation Performed by Information Processing Device 4]

 図7を参照しながら、情報処理装置4が行う情報処理動作の流れについて説明する。図7は、情報処理装置4が行う情報処理動作の流れを示すフローチャートである。なお、第4実施形態でも、第2の画像が複数のフレームを含む動画像である場合を説明し、第2の画像を判定対象動画とよぶ。 With reference to FIG. 7, the flow of information processing operations performed by the information processing device 4 will be described. FIG. 7 is a flowchart showing the flow of information processing operations performed by the information processing device 4. Note that in the fourth embodiment, a case will also be described in which the second image is a moving image including a plurality of frames, and the second image will be referred to as a determination target moving image.

 図7に示す様に、受付部311は、第1の画像としてのソース画像の入力を受け付ける(ステップS20)。受付部311は、ソース画像として、運転免許証、マイナンバーカード等の本人確認書類の顔写真の入力を受け付けてもよい。受付部311は、第2の画像としての判定対象動画の入力を受け付ける(ステップS30)。 As shown in FIG. 7, the reception unit 311 receives an input of a source image as a first image (step S20). The reception unit 311 may receive an input of a facial photograph on an identification document such as a driver's license or a My Number card as the source image. The reception unit 311 receives an input of a video to be judged as a second image (step S30).

 照合部417は、人物の顔画像を照合する(ステップS40)。第1の画像が運転免許証、マイナンバーカード等の公的な文書の顔画像である場合、照合部417は、第1の画像に写る人物と、判定対象動画に写る人物とを照合してもよい。この場合、第1の画像に写る人物と、判定対象動画に写る人物との照合に失敗した際は、当該情報処理動作は終了してもよい。または、照合部417は、受け付けた第1の画像と予め登録されている登録顔画像とを照合してもよい。または、照合部417は、受け付けた判定対象動画と予め登録されている登録顔画像とを照合してもよい。すなわち、照合部417は、第1の画像に写る人物、及び判定対象動画に写る人物の少なくとも一方の照合をしてもよい。 The matching unit 417 matches the facial image of the person (step S40). If the first image is a facial image on an official document such as a driver's license or a My Number card, the matching unit 417 may match the person appearing in the first image with the person appearing in the video to be judged. In this case, if the matching between the person appearing in the first image and the person appearing in the video to be judged fails, the information processing operation may be terminated. Alternatively, the matching unit 417 may match the received first image with a registered facial image that has been registered in advance. Alternatively, the matching unit 417 may match the received video to be judged with a registered facial image that has been registered in advance. In other words, the matching unit 417 may match at least one of the person appearing in the first image and the person appearing in the video to be judged.

 なお、第1の画像としてのソース画像と、当該ソース画像に基づいて合成されたフェイク動画とは似ているので、判定対象動画がフェイク動画であった場合にも、第1の画像と判定対象動画との照合が成功する可能性は高い。 In addition, since the source image as the first image and the fake video synthesized based on the source image are similar, even if the video to be judged is a fake video, there is a high possibility that the first image will be successfully matched with the video to be judged.

 成りすまし判定部418は、判定対象動画を用いて成りすまし判定を実施する(ステップS41)。第4実施形態において、判定対象動画は、フェイク動画であるか否かの判定とともに、成りすまし判定に用いられてもよい。例えば、判定対象動画は、情報処理装置4からの指示により人物が実施した動作が写る動画であってもよい。情報処理装置4は、顔の向き、視線の向き、顔の位置を指示してもよい。情報処理装置4は、視線を誘導してもよい。情報処理装置4は、ジェスチャーを指示してもよい。成りすまし判定部418は、判定対象動画を用いてアクティブライブネス判定を実施してもよい。 The spoofing determination unit 418 performs spoofing determination using the video to be determined (step S41). In the fourth embodiment, the video to be determined may be used for spoofing determination along with determining whether it is a fake video or not. For example, the video to be determined may be a video showing an action performed by a person in response to an instruction from the information processing device 4. The information processing device 4 may instruct the face direction, gaze direction, and face position. The information processing device 4 may guide the gaze. The information processing device 4 may instruct a gesture. The spoofing determination unit 418 may perform active liveness determination using the video to be determined.

 検出部3121は、判定対象動画が含む1以上のフレームの各々からランドマークを検出する(ステップS31)。合成部312は、ソース画像、及び判定対象動画が含む1以上のフレームの各々のランドマークに基づいて、合成動画を生成する(ステップS32)。合成部312は、ソース画像としての顔写真に基づいて、合成画像を生成する。 The detection unit 3121 detects landmarks from each of one or more frames included in the video to be judged (step S31). The synthesis unit 312 generates a composite video based on the source image and the landmarks from each of one or more frames included in the video to be judged (step S32). The synthesis unit 312 generates a composite image based on a facial photograph as the source image.

 抽出部3131は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を抽出する(ステップS33)。強調部3132は、差分を強調する(ステップS34)。 The extraction unit 3131 extracts the difference between a frame included in the judgment target video and a frame included in the composite video corresponding to that frame (step S33). The emphasis unit 3132 emphasizes the difference (step S34).

 算出部314は、動画差分に基づいて、判定対象動画のフェイク動画らしさを表す指標を算出する(ステップS35)。判定部315は、指標に応じて、判定対象動画がフェイク動画か否かを判定する(ステップS36)。判定部315は、指標と所定の閾値との比較により、判定対象動画がフェイク動画か否かを判定してもよい。 The calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on the video difference (step S35). The determination unit 315 determines whether the video to be judged is a fake video or not based on the index (step S36). The determination unit 315 may determine whether the video to be judged is a fake video or not by comparing the index with a predetermined threshold value.

 指標が所定の閾値を超過した場合(ステップS36:Yes)、判定部315は、判定対象動画がフェイク動画であると判定する(ステップS37)。指標が所定の閾値を超過しなかった場合(ステップS36:No)、判定部315は、判定対象動画がフェイク動画ではないと判定する(ステップS38)。 If the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38).

 判定部315が判定対象動画がフェイク動画ではないと判定した場合、認証部419は、照合部417による照合結果、及び成りすまし判定部418による判定結果に基づき、人物を認証する(ステップS42)。また、認証部419は、判定部315が判定対象動画が所定の基準よりもフェイク画像らしくないと判定し、かつ、成りすまし判定部418が人物は指示に従った動作をしたと判定したことを条件に、人物を認証してもよい。認証部419による人物の認証が成功した場合とは、人物の本人確認ができた場合であってもよい。出力部416は、人物の認証結果を出力する(ステップS43)。
 [4-3:情報処理装置4の技術的効果]
When the determination unit 315 determines that the video to be determined is not a fake video, the authentication unit 419 authenticates the person based on the collation result by the collation unit 417 and the determination result by the masquerade determination unit 418 (step S42). The authentication unit 419 may also authenticate the person on the condition that the determination unit 315 determines that the video to be determined is less likely to be a fake image than a predetermined standard and the masquerade determination unit 418 determines that the person has acted in accordance with the instructions. The case where the authentication unit 419 has successfully authenticated the person may be the case where the person's identity has been confirmed. The output unit 416 outputs the authentication result of the person (step S43).
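The authentication condition in step S42 combines the three results; a minimal boolean sketch (names illustrative, not from the disclosure):

```python
def authenticate(is_fake_video: bool, face_matched: bool, liveness_ok: bool) -> bool:
    """Authenticate the person only when the video is not judged fake,
    the face matching (step S40) succeeded, and the instructed-action
    liveness check (step S41) passed."""
    return (not is_fake_video) and face_matched and liveness_ok

print(authenticate(False, True, True))  # True  -> identity confirmed
print(authenticate(True, True, True))   # False -> rejected as a fake video
```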
[4-3: Technical Effects of Information Processing Device 4]

 本開示の第4の情報処理装置4は、入力された判定対象動画がフェイク動画か否かを精度よく検知することができるので、精度よく本人確認をすることができる。
 [5:付記]
The fourth information processing device 4 of the present disclosure can accurately detect whether an input video to be judged is a fake video or not, and can therefore perform identity verification with high accuracy.
[5: Supplementary Note]

 以上説明した実施形態に関して、更に以下の付記を開示する。
 [付記1]
 第1の画像及び第2の画像の入力を受け付ける受付手段と、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成する合成手段と、
 前記第2の画像と前記第3の画像との差分を強調する差分強調手段と、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出する算出手段と、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する判定手段と
 を備える情報処理装置。
 [付記2]
 前記差分強調手段は、
  前記第2の画像と前記第3の画像との差分を抽出する抽出手段と、
  前記差分を強調する強調手段とを含む
 付記1に記載の情報処理装置。
 [付記3]
 前記第2の画像は、複数のフレームを含む動画像であり、
 前記合成手段は、1以上のフレームを含む前記第3の画像を合成し、
 前記差分強調手段は、前記第2の画像が含むフレームと、当該フレームに対応する前記第3の画像が含むフレームとの差分を強調する
 付記1又は2に記載の情報処理装置。
 [付記4]
 前記差分強調手段は、前記第2の画像が含むフレームと、当該フレームに対応する前記第3の画像が含むフレームとの差分を強調した差分フレームを含む差分強調動画を生成し、
 前記算出手段は、前記差分強調動画に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出する
 付記3に記載の情報処理装置。
 [付記5]
 前記合成手段は、前記第2の画像からランドマークを検出する検出手段を含み、
 前記第1の画像、及び前記ランドマークに基づいて、前記第3の画像を合成する
 付記3に記載の情報処理装置。
 [付記6]
 前記判定手段は、前記指標と所定の閾値との比較により、前記第2の画像が合成された画像か否かを判定する
 付記1又は2に記載の情報処理装置。
 [付記7]
 前記第1の画像に写る対象、及び前記第2の画像に写る対象の少なくとも一方を照合する照合手段と、
 前記判定手段による判定結果、及び前記照合手段による照合結果の少なくとも一方に基づいて、前記対象を認証する認証手段と
 を備える付記1又は2に記載の情報処理装置。
 [付記8]
 第1の画像及び第2の画像の入力を受け付け、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成し、
 前記第2の画像と前記第3の画像との差分を強調し、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出し、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する
 情報処理方法。
 [付記9]
 コンピュータに、
 第1の画像及び第2の画像の入力を受け付け、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成し、
 前記第2の画像と前記第3の画像との差分を強調し、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出し、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する
 情報処理方法を実行させるためのコンピュータプログラムが記録されている記録媒体。
The following supplementary notes are further disclosed regarding the above-described embodiment.
[Appendix 1]
A receiving means for receiving an input of a first image and a second image;
a synthesizing means for synthesizing a third image based on the first image and the second image;
a difference enhancing means for enhancing a difference between the second image and the third image;
a calculation means for calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference;
and determination means for determining, in accordance with the index, whether or not the second image is a synthesized image.
[Appendix 2]
The difference emphasis means is
an extraction means for extracting a difference between the second image and the third image;
and highlighting means for highlighting the difference.
The information processing device according to claim 1.
[Appendix 3]
the second image is a video including a plurality of frames;
The synthesizing means synthesizes the third image including one or more frames;
The information processing device according to claim 1 or 2, wherein the difference emphasis means emphasizes a difference between a frame included in the second image and a frame included in the third image corresponding to the frame.
[Appendix 4]
the difference emphasizing means generates a difference-emphasized video including a difference frame in which a difference between a frame included in the second image and a frame included in the third image corresponding to the frame is emphasized;
The information processing device according to claim 3, wherein the calculation means calculates an index representing a likelihood that the second image is a synthesized image based on the difference-emphasized moving image.
[Appendix 5]
The synthesis means includes a detection means for detecting a landmark from the second image,
and synthesizes the third image based on the first image and the landmarks.
The information processing device according to claim 3.
[Appendix 6]
The information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
[Appendix 7]
A matching means for matching at least one of an object appearing in the first image and an object appearing in the second image;
and an authentication unit that authenticates the target based on at least one of a determination result by the determination unit and a matching result by the matching unit.
[Appendix 8]
Accepting input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
highlighting a difference between the second image and the third image;
calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference;
determining whether the second image is a synthesized image or not according to the indicator.
[Appendix 9]
On the computer,
Accepting input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
highlighting a difference between the second image and the third image;
calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference;
determining whether the second image is a synthesized image or not according to the index.

 以上、実施の形態を参照して本開示を説明したが、本開示は上述の実施の形態に限定されるものではない。本開示の構成や詳細には、本開示のスコープ内で当業者が理解し得る様々な変更をすることができる。そして、各実施の形態は、適宜他の実施の形態と組み合わせることができる。 The present disclosure has been described above with reference to the embodiments, but the present disclosure is not limited to the above-mentioned embodiments. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present disclosure within the scope of the present disclosure. Furthermore, each embodiment can be combined with other embodiments as appropriate.

1, 2, 3, 4 Information processing device
11, 211, 311 Reception unit
12, 212, 312 Synthesis unit
13, 213, 313 Difference emphasis unit
2131, 3131 Extraction unit
2132, 3132 Emphasis unit
214, 314 Calculation unit
215, 315 Determination unit
216, 316, 416 Output unit
3121 Detection unit
417 Collation unit
418 Impersonation determination unit
419 Authentication unit

Claims (9)

An information processing device comprising:
a receiving means for receiving input of a first image and a second image;
a synthesizing means for synthesizing a third image based on the first image and the second image;
a difference emphasizing means for emphasizing a difference between the second image and the third image;
a calculating means for calculating an index representing a likelihood that the second image is a synthesized image, based on the emphasized difference; and
a determining means for determining whether the second image is a synthesized image according to the index.
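Read as an algorithm, the claim describes a four-stage pipeline: synthesize a third image from the two inputs, emphasize the pixel-wise difference, reduce it to a scalar index, then decide. The following is a minimal illustrative sketch only, not the claimed implementation: every function name is hypothetical, the "images" are toy grayscale lists, and the trivial averaging "synthesis" stands in for whatever generative model an actual system would use.

```python
def synthesize(first, second):
    # Placeholder synthesis: blend the two inputs pixel by pixel
    # (a real system would use a face-swap / reenactment model).
    return [[(a + b) // 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(first, second)]

def emphasize_difference(second, third, gain=4):
    # Amplify per-pixel differences so subtle synthesis artifacts stand out,
    # clipping to the 8-bit range.
    return [[min(255, abs(a - b) * gain) for a, b in zip(r1, r2)]
            for r1, r2 in zip(second, third)]

def likelihood_index(diff):
    # Mean emphasized difference, normalized to [0, 1].
    flat = [p for row in diff for p in row]
    return sum(flat) / (255 * len(flat))

def is_synthesized(index, threshold=0.1):
    # Decision stage (threshold value is arbitrary here).
    return index > threshold

first = [[100, 100], [100, 100]]    # reference image
second = [[100, 100], [100, 180]]   # image under test
third = synthesize(first, second)
diff = emphasize_difference(second, third)
index = likelihood_index(diff)
print(is_synthesized(index))        # → True
```

The gain, the averaging reducer, and the threshold are all placeholders; the claim only requires that *some* emphasized difference yield *some* index that is then compared.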
The information processing device according to claim 1, wherein the difference emphasizing means includes:
an extracting means for extracting the difference between the second image and the third image; and
an emphasizing means for emphasizing the difference.
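This claim factors difference emphasis into two distinct steps, extraction followed by amplification. A toy illustration of that decomposition (all names hypothetical; gain and data are arbitrary):

```python
def extract_difference(second, third):
    # Step 1: raw per-pixel absolute difference.
    return [[abs(a - b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(second, third)]

def emphasize(diff, gain=8):
    # Step 2: amplify the extracted difference, clipped to 8-bit range.
    return [[min(255, d * gain) for d in row] for row in diff]

raw = extract_difference([[10, 12]], [[10, 20]])
emphasized = emphasize(raw)
print(emphasized)  # → [[0, 64]]
```

Keeping the two steps separate means the amplification rule (linear gain here, but it could equally be a nonlinear or learned mapping) can change without touching the extraction.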
The information processing device according to claim 1 or 2, wherein:
the second image is a moving image including a plurality of frames;
the synthesizing means synthesizes the third image including one or more frames; and
the difference emphasizing means emphasizes a difference between a frame included in the second image and a corresponding frame included in the third image.
The information processing device according to claim 3, wherein:
the difference emphasizing means generates a difference-emphasized moving image including difference frames in which a difference between a frame included in the second image and a corresponding frame included in the third image is emphasized; and
the calculating means calculates the index representing a likelihood that the second image is a synthesized image, based on the difference-emphasized moving image.
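For the moving-image case in claims 3 and 4, the same emphasis runs frame by frame, and the index is computed over the whole difference-emphasized sequence. A sketch under the same toy assumptions as above (hypothetical names; 1x2-pixel frames stand in for real video):

```python
def emphasize_frame(f2, f3, gain=4):
    # Emphasized per-pixel difference for one frame pair.
    return [[min(255, abs(a - b) * gain) for a, b in zip(r2, r3)]
            for r2, r3 in zip(f2, f3)]

def difference_video(video2, video3, gain=4):
    # Pair each frame of the video under test with its synthesized counterpart.
    return [emphasize_frame(f2, f3, gain) for f2, f3 in zip(video2, video3)]

def video_index(diff_video):
    # Scalar index over every pixel of every difference frame.
    pixels = [p for frame in diff_video for row in frame for p in row]
    return sum(pixels) / (255 * len(pixels))

video2 = [[[100, 100]], [[100, 150]]]  # two frames under test
video3 = [[[100, 100]], [[100, 100]]]  # synthesized counterpart
dv = difference_video(video2, video3)
print(round(video_index(dv), 3))
```

Aggregating over the whole sequence rather than a single frame is one way a temporal artifact (flicker confined to a few frames) could still move the index.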
The information processing device according to claim 3, wherein the synthesizing means includes a detecting means for detecting landmarks from the second image, and synthesizes the third image based on the first image and the landmarks.
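Here the synthesis is driven by landmarks detected in the second image (for a face, typically keypoints such as eye corners and mouth outline). The sketch below is purely schematic: the one-point "detector" and marker-copy "synthesis" are hypothetical stand-ins for a real landmark detector and a warp of the first image onto the detected pose.

```python
def detect_landmarks(image):
    # Toy "detector": return the coordinates of the brightest pixel.
    # A real detector would return dozens of keypoints.
    h = max(range(len(image)), key=lambda r: max(image[r]))
    w = max(range(len(image[h])), key=lambda c: image[h][c])
    return [(h, w)]

def synthesize_with_landmarks(first, landmarks):
    # Copy the first image and mark the landmark positions, standing in
    # for warping the first image onto the pose found in the second image.
    third = [row[:] for row in first]
    for r, c in landmarks:
        third[r][c] = 255
    return third

second = [[10, 10], [10, 200]]
lms = detect_landmarks(second)
print(synthesize_with_landmarks([[0, 0], [0, 0]], lms))
```

The point of the claim is the data flow: appearance comes from the first image, geometry from landmarks of the second.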
The information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
The information processing device according to claim 1 or 2, further comprising:
a matching means for matching at least one of a target appearing in the first image and a target appearing in the second image; and
an authenticating means for authenticating the target based on at least one of a determination result by the determining means and a matching result by the matching means.
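This claim layers biometric matching on top of the synthesis check, so that a face which matches the enrolled target but is judged synthetic (an impersonation attempt) can still be rejected. One plausible combination rule, shown only as an assumption since the claim covers other combinations of the two results:

```python
def authenticate(match_ok, judged_synthesized):
    # Accept only when matching succeeded AND the image is not judged
    # synthetic; a deepfake of the right face therefore fails.
    return match_ok and not judged_synthesized

print(authenticate(True, False))  # genuine matching image → True
print(authenticate(True, True))   # matching but synthesized → False
```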
An information processing method comprising:
receiving input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
emphasizing a difference between the second image and the third image;
calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference; and
determining whether the second image is a synthesized image according to the index.
A recording medium having recorded thereon a computer program for causing a computer to execute an information processing method comprising:
receiving input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
emphasizing a difference between the second image and the third image;
calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference; and
determining whether the second image is a synthesized image according to the index.
PCT/JP2023/022755 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium Pending WO2024261856A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/022755 WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/022755 WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Publications (1)

Publication Number Publication Date
WO2024261856A1 true WO2024261856A1 (en) 2024-12-26

Family

ID=93935146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022755 Pending WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Country Status (1)

Country Link
WO (1) WO2024261856A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111861956A (en) * 2020-06-24 2020-10-30 北京金山云网络技术有限公司 Picture processing method and device, electronic equipment and medium
JP2021089219A (en) * 2019-12-05 2021-06-10 東洋製罐グループホールディングス株式会社 Image inspection system and image inspection method
US20220058375A1 (en) * 2020-02-21 2022-02-24 Samsung Electronics Co., Ltd. Server, electronic device, and control methods therefor

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021089219A (en) * 2019-12-05 2021-06-10 東洋製罐グループホールディングス株式会社 Image inspection system and image inspection method
US20220058375A1 (en) * 2020-02-21 2022-02-24 Samsung Electronics Co., Ltd. Server, electronic device, and control methods therefor
CN111861956A (en) * 2020-06-24 2020-10-30 北京金山云网络技术有限公司 Picture processing method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
JP7365445B2 (en) Computing apparatus and method
US11790494B2 (en) Facial verification method and apparatus based on three-dimensional (3D) image
CN108664782B (en) Facial verification methods and devices
JP7046625B2 (en) Face recognition method and equipment
KR102434562B1 (en) Method and apparatus for detecting fake fingerprint, method and apparatus for recognizing fingerprint
WO2017101267A1 (en) Method for identifying living face, terminal, server, and storage medium
CN108664879A (en) Face authentication method and apparatus
CN109766785B (en) Method and device for liveness detection of human face
CN114627543A (en) Method and apparatus for face recognition
US12236717B2 (en) Spoof detection based on challenge response analysis
WO2018234384A1 (en) DETECTION OF FACIAL ARTIFICIAL IMAGES USING FACIAL REFERENCES
KR101897072B1 (en) Method and apparatus for verifying facial liveness in mobile terminal
CN113992812A (en) Method and apparatus for activity detection
WO2024261856A1 (en) Information processing device, information processing method, and recording medium
KR20180108361A (en) Method and apparatus for verifying face
WO2024142399A1 (en) Information processing device, information processing system, information processing method, and recording medium
JP2008009617A (en) System, program, and method for individual biological information collation
EP4645214A1 (en) Information processing device, information processing system, information processing method, and recording medium
JP2022522251A (en) Handwritten signature authentication method and device based on multiple verification algorithms
US12112220B1 (en) Authenticating a physical card using sensor data
CN119312308B (en) Authentication method, device, computer equipment and storage medium
KR102579610B1 (en) Apparatus for Detecting ATM Abnormal Behavior and Driving Method Thereof
CN209690934U (en) A kind of face identification device
TWI576717B (en) Dimensional biometric identification system and method
WO2024105778A1 (en) Information processing device, information processing method, and recording medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23942293

Country of ref document: EP

Kind code of ref document: A1