
WO2024261856A1 - Information processing device, information processing method, and recording medium - Google Patents


Info

Publication number
WO2024261856A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
difference
information processing
unit
processing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/022755
Other languages
French (fr)
Japanese (ja)
Inventor
和也 柿崎
拓磨 天田
雅弘 佛崎
俊則 荒木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to PCT/JP2023/022755
Publication of WO2024261856A1
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection

Definitions

  • This disclosure relates to the technical fields of information processing devices, information processing methods, and recording media.
  • Patent Document 1 describes a technology for detecting whether a video is a fake video (e.g., a false video generated by a synthesis process), in which a class indicates whether a person's face (object) is real (true) or fake (false), and a synthesized area (i.e., an area added to an original video) is detected as an element of the detection target class.
  • In view of this, the present disclosure aims to provide an information processing device, an information processing method, and a recording medium capable of accurately detecting whether an input image is a composite image.
  • One aspect of the information processing device includes a receiving means for receiving an input of a first image and a second image, a synthesis means for synthesizing a third image based on the first image and the second image, a difference emphasis means for emphasizing the difference between the second image and the third image, a calculation means for calculating an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and a determination means for determining whether the second image is a synthesized image or not according to the index.
  • One aspect of the information processing method is to receive an input of a first image and a second image, synthesize a third image based on the first image and the second image, emphasize the difference between the second image and the third image, calculate an index representing the likelihood of the second image being a synthesized image based on the emphasized difference, and determine whether the second image is a synthesized image or not based on the index.
  • a computer program is recorded to cause a computer to execute an information processing method that accepts input of a first image and a second image, synthesizes a third image based on the first image and the second image, emphasizes the difference between the second image and the third image, calculates an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and determines whether the second image is a synthesized image according to the index.
  • the information processing device, information processing method, and recording medium disclosed herein can accurately detect whether an input image is a composite image.
  • FIG. 1 is a block diagram showing a configuration of a first information processing device according to the present disclosure.
  • FIG. 2 is a block diagram showing a configuration of a second information processing device according to the present disclosure.
  • FIG. 3 is a flowchart showing a processing operation of the second information processing device according to the present disclosure.
  • FIG. 4 is a block diagram showing a configuration of a third information processing device according to the present disclosure. FIG. 5 is a flowchart showing a processing operation of the third information processing device according to the present disclosure.
  • FIG. 6 is a block diagram showing a configuration of a fourth information processing device according to the present disclosure. FIG. 7 is a flowchart showing a processing operation of the fourth information processing device according to the present disclosure.
  • a first embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a first embodiment of an information processing device, an information processing method, and a recording medium will be described using a first information processing device 1 according to the present disclosure.
  • FIG. 1 is a block diagram showing a configuration of a first information processing device 1 according to the present disclosure.
  • the information processing device 1 includes a receiving unit 11, a synthesis unit 12, a difference emphasizing unit 13, a calculation unit 14, and a determination unit 15.
  • the receiving unit 11 receives input of a first image and a second image.
  • the synthesis unit 12 synthesizes a third image based on the first image and the second image.
  • the difference emphasizing unit 13 emphasizes the difference between the second image and the third image.
  • the calculation unit 14 calculates an index representing the likelihood of the second image being a synthesized image based on the emphasized difference.
  • the determination unit 15 determines whether the second image is a synthesized image according to the index.
  • [1-2: Technical Effects of Information Processing Device 1]
  • the first information processing device 1 makes a judgment based on the difference between an input image and a composite image, and can therefore accurately detect whether or not the input image is a composite image.
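As an illustrative sketch only, the five units of the first embodiment can be strung together in a few lines. This is not the patented implementation: `synthesize` is a trivial blend standing in for a real synthesis model, `calculate_index` is a toy statistic standing in for a trained calculation model, and the gain and threshold values are arbitrary assumptions.

```python
import numpy as np

def synthesize(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Stand-in for the synthesis unit 12: a plain average of the two
    images. A real implementation would use a face-swap model."""
    return (first.astype(np.float32) + second.astype(np.float32)) / 2.0

def emphasize_difference(second: np.ndarray, third: np.ndarray,
                         gain: float = 4.0) -> np.ndarray:
    """Difference emphasis unit 13: per-pixel difference scaled by a
    real-number gain (the gain value is arbitrary)."""
    diff = np.abs(second.astype(np.float32) - third.astype(np.float32))
    return np.clip(diff * gain, 0.0, 255.0)

def calculate_index(emphasized: np.ndarray) -> float:
    """Calculation unit 14: a toy index in [0, 1]; the patent uses a
    trained calculation model here."""
    return float(emphasized.mean() / 255.0)

def determine(index: float, threshold: float = 0.5) -> bool:
    """Determination unit 15: True means 'synthesized image'."""
    return index > threshold

# An input identical to the synthesis result yields a zero difference,
# a low index, and a "not synthesized" determination.
first = np.full((8, 8), 100, dtype=np.uint8)
second = first.copy()
third = synthesize(first, second)
assert determine(calculate_index(emphasize_difference(second, third))) is False
```

The point of the sketch is the data flow: the determination is driven entirely by the emphasized difference between the second image and the synthesized third image.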
  • a second embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a second embodiment of an information processing device, an information processing method, and a recording medium will be described using a second information processing device 2 according to the present disclosure.
  • FIG. 2 is a block diagram showing the configuration of the second information processing device 2.
  • the information processing device 2 includes a calculation device 21 and a storage device 22.
  • the information processing device 2 may include a communication device 23, an input device 24, and an output device 25.
  • the information processing device 2 does not have to include at least one of the communication device 23, the input device 24, and the output device 25.
  • the calculation device 21, the storage device 22, the communication device 23, the input device 24, and the output device 25 may be connected via a data bus 26.
  • the arithmetic device 21 includes, for example, at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array).
  • the arithmetic device 21 reads a computer program.
  • the arithmetic device 21 may read a computer program stored in the storage device 22.
  • the arithmetic device 21 may read a computer program stored in a computer-readable, non-transitory recording medium, using a recording medium reading device (e.g., the input device 24 described later) that is provided in the information processing device 2 and not shown in the figure.
  • the arithmetic device 21 may acquire (i.e., download or read) a computer program from a device (not shown) located outside the information processing device 2 via the communication device 23 (or other communication device).
  • the arithmetic device 21 executes the read computer program.
  • a logical functional block for executing the operation to be performed by the information processing device 2 is realized within the calculation device 21.
  • the calculation device 21 can function as a controller for realizing a logical functional block for executing the operation (in other words, processing) to be performed by the information processing device 2.
  • the arithmetic device 21 realizes a reception unit 211, which is a specific example of the "reception means" described in the appendix described later, a synthesis unit 212, which is a specific example of the "synthesizing means" described in the appendix described later, a difference emphasis unit 213, which is a specific example of the "difference emphasis means" described in the appendix described later, a calculation unit 214, which is a specific example of the "calculation means" described in the appendix described later, a judgment unit 215, which is a specific example of the "judgment means" described in the appendix described later, and an output unit 216.
  • the difference emphasis unit 213 may have an extraction unit 2131 and an emphasis unit 2132. Details of the operations of the reception unit 211, synthesis unit 212, difference emphasis unit 213, calculation unit 214, judgment unit 215, and output unit 216 will be described later with reference to FIG. 3.
  • the storage device 22 can store desired data.
  • the storage device 22 may temporarily store a computer program executed by the arithmetic device 21.
  • the storage device 22 may temporarily store data that is temporarily used by the arithmetic device 21 when the arithmetic device 21 is executing a computer program.
  • the storage device 22 may store data that the information processing device 2 stores for a long period of time.
  • the storage device 22 may include at least one of a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device.
  • the storage device 22 may include a non-transitory recording medium.
  • the communication device 23 is capable of communicating with devices external to the information processing device 2 via a communication network (not shown).
  • the communication device 23 may be a communication interface based on standards such as Ethernet (registered trademark), Wi-Fi (registered trademark), Bluetooth (registered trademark), and USB (Universal Serial Bus).
  • the input device 24 is a device that accepts information input to the information processing device 2 from outside the information processing device 2.
  • the input device 24 may include an operating device (e.g., at least one of a keyboard, a mouse, and a touch panel) that can be operated by an operator of the information processing device 2.
  • the input device 24 may include a reading device that can read information recorded as data on a recording medium that can be attached externally to the information processing device 2.
  • the output device 25 is a device that outputs information to the outside of the information processing device 2.
  • the output device 25 may output information as an image. That is, the output device 25 may include a display device (a so-called display) capable of displaying an image showing the information to be output.
  • the output device 25 may output information as sound. That is, the output device 25 may include an audio device (a so-called speaker) capable of outputting sound.
  • the output device 25 may output information on paper. That is, the output device 25 may include a printing device (a so-called printer) capable of printing desired information on paper.
  • [2-3: Information Processing Operation Performed by Information Processing Device 2]
  • FIG. 3 is a flowchart showing the flow of the information processing operation performed by the information processing device 2. Note that this disclosure assumes that the first image is not a fake image but a real image.
  • the reception unit 211 receives input of a first image (step S20).
  • the reception unit 211 may receive input of a face image including a person's face area as the first image.
  • the first image may be a still image.
  • the "first image” may be referred to as the "source image.”
  • the reception unit 211 receives input of a second image (step S21).
  • the reception unit 211 may receive input of a face image including a person's facial area as the second image.
  • the second image may be a still image.
  • the second image may be a moving image.
  • the "second image” may be referred to as the "image to be determined.”
  • the synthesis unit 212 synthesizes a third image based on the first image and the second image.
  • the "third image” may be referred to as a "synthetic image”.
  • the synthesis unit 212 may generate a synthetic image using, for example, a technique called face swap. Face swap is a technique for exchanging a face area of a source image with a face area of a target image.
  • the synthesis unit 212 may generate a synthetic image by fitting a face area of the judgment target image into a face area of the source image.
  • the synthesis unit 212 may generate a synthetic image having the characteristics of the source image. For example, the synthesis unit 212 may generate a synthetic image that maintains the facial expression of the source image.
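A heavily simplified sketch of the face-swap idea described above: copy the face region of one image into the face region of the other. Real face swap aligns, warps, and blends using landmarks; the bounding-box representation and the equal-size assumption here are purely illustrative.

```python
import numpy as np

def face_swap(source: np.ndarray, target: np.ndarray,
              src_box: tuple, tgt_box: tuple) -> np.ndarray:
    """Copy the target's face region into the source image. Boxes are
    (top, left, height, width); this sketch assumes equal-size regions,
    whereas real face swap aligns, warps, and blends using landmarks."""
    sy, sx, h, w = src_box
    ty, tx, th, tw = tgt_box
    assert (h, w) == (th, tw), "sketch assumes equal-size face regions"
    out = source.copy()
    out[sy:sy + h, sx:sx + w] = target[ty:ty + h, tx:tx + w]
    return out
```

Because only the face region is replaced, the rest of the synthetic image keeps the characteristics of the source image, as the text notes.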
  • the extraction unit 2131 extracts the difference between the image to be determined and the composite image (step S23). In other words, the extraction unit 2131 extracts the portion where the image to be determined and the composite image differ.
  • the extraction unit 2131 may obtain the difference in pixel value for each pixel, and generate a difference image in which each pixel is represented by the difference in pixel value.
  • the highlighting unit 2132 highlights the difference (step S24).
  • the highlighting unit 2132 may, for example, highlight the difference by multiplying each pixel value of the difference image by a real number. That is, the highlighting unit 2132 may increase the pixel value of a pixel in the difference image whose pixel value is not 0 (that is, a pixel for which there is a difference in pixel value between the image to be determined and the composite image).
  • the difference image in which the difference is highlighted by the highlighting unit 2132 may be referred to as a "difference highlighted image".
  • the highlighting unit 2132 may, for example, generate a difference highlighted image by setting the pixel value of a pixel in the difference image whose pixel value is not 0 to the maximum value that it can take.
  • the highlighting unit 2132 may, for example, generate a difference highlighted image by setting the pixel value of a pixel in the difference image whose pixel value is equal to or greater than a predetermined value to the maximum value that it can take.
  • the highlighting unit 2132 may, for example, generate a difference highlighted image that can take four types of pixel values, for example, large difference, medium difference, small difference, and no difference, by setting a range of pixel values in the difference image.
  • the highlighting unit 2132 may employ any method to highlight the difference.
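The extraction and highlighting options above (per-pixel difference, multiplication by a real number, setting pixels at or above a predetermined value to the maximum value, and quantization into four levels) can each be sketched in a few lines of NumPy. The gain, threshold, and bin edges are illustrative assumptions.

```python
import numpy as np

def difference_image(judged: np.ndarray, composite: np.ndarray) -> np.ndarray:
    # Step S23: per-pixel difference of pixel values.
    return np.abs(judged.astype(np.int16) - composite.astype(np.int16))

def emphasize_by_gain(diff: np.ndarray, gain: int = 8) -> np.ndarray:
    # Multiply each pixel value of the difference image by a real number.
    return np.clip(diff * gain, 0, 255).astype(np.uint8)

def emphasize_binary(diff: np.ndarray, threshold: int = 1) -> np.ndarray:
    # Set pixels at or above a predetermined value to the maximum value (255).
    return np.where(diff >= threshold, 255, 0).astype(np.uint8)

def emphasize_quantized(diff: np.ndarray) -> np.ndarray:
    # Four levels -- no / small / medium / large difference -- mapped to
    # 0, 85, 170, 255. The bin edges are illustrative.
    levels = np.digitize(diff, [10, 50, 120])
    return (levels * 85).astype(np.uint8)
```

Any of these outputs serves as the "difference highlighted image" fed to the calculation unit.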
  • the calculation unit 214 calculates an index representing the likelihood that the image to be determined is a fake image based on the emphasized difference (step S25).
  • the index calculated by the calculation unit 214 may be a real number.
  • the calculation unit 214 may use a calculation model to calculate an index representing the likelihood that the image to be determined is a fake image.
  • the calculation model is a model that outputs an index representing the likelihood that the image is a fake image when the emphasized difference is input.
  • the calculation model may be a machine-learned model.
  • the learning mechanism that trains the calculation model may train it using difference-emphasized images, each accompanied by information indicating the correct answer (i.e., whether the image to be determined is a real image or a fake image), as teacher data.
  • the learning mechanism may train the calculation model on a method for calculating the index using the information indicating the correct answer and the index representing the likelihood that the image is a fake image output by the calculation model.
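The training step can be illustrated with a deliberately simple stand-in for the calculation model: a one-feature logistic regression trained by gradient descent on synthetic teacher data, where the feature is the mean pixel value of a difference-emphasized image and the label is the correct answer (real or fake). The feature choice, data, and learning rate are all assumptions; the patent does not specify the model architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher data: the feature is the mean pixel value of a
# difference-emphasized image (scaled to [0, 1]); the label is the
# correct answer (0 = real image, 1 = fake image).
real_feats = rng.uniform(0, 40, size=100)
fake_feats = rng.uniform(80, 255, size=100)
x = np.concatenate([real_feats, fake_feats]) / 255.0
y = np.concatenate([np.zeros(100), np.ones(100)])

# One-feature logistic regression trained by plain gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x * w + b)))  # model output in (0, 1)
    w -= 1.0 * np.mean((p - y) * x)          # gradient of log loss w.r.t. w
    b -= 1.0 * np.mean(p - y)                # gradient of log loss w.r.t. b

def index(mean_diff: float) -> float:
    """Index representing the likelihood that the image is a fake image."""
    return float(1.0 / (1.0 + np.exp(-(mean_diff / 255.0 * w + b))))

assert index(5.0) < 0.5 < index(200.0)
```

After training, small emphasized differences map to a low index and large ones to a high index, which is the behavior the determination step relies on.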
  • the determination unit 215 determines whether the second image is a composite image or not based on the index (step S26).
  • the determination unit 215 may determine whether the second image is a composite image or not by comparing the index with a predetermined threshold value.
  • If the index exceeds the predetermined threshold (step S26: Yes), the determination unit 215 determines that the second image is a composite image (step S27). If the index does not exceed the predetermined threshold (step S26: No), the determination unit 215 determines that the second image is not a composite image (step S28).
  • the output unit 216 outputs according to the determination result (step S29).
  • the output unit 216 may control the output device 25 to cause the output device 25 to output according to the determination result.
  • the second information processing device 2 of the present disclosure emphasizes the difference between the input judgment target image and the composite image and judges based on the emphasized difference, so that it is possible to accurately detect whether the input judgment target image is a fake image or not.
  • Since the information processing device 2 judges whether an image is a fake image by comparing the index with a threshold value, it is possible to adjust, by setting the threshold value, how readily an image that resembles a fake image is judged to be a fake image.
  • a third embodiment of an information processing device, an information processing method, and a recording medium will be described below.
  • a third embodiment of an information processing device, an information processing method, and a recording medium will be described using a third information processing device 3 according to the present disclosure.
  • the second image may be referred to as a "video to be determined”.
  • a moving image showing an event that has not actually occurred may be referred to as a fake video.
  • a moving image showing an event that has actually occurred may be referred to as a real video.
  • [3-1: Fake video]
  • a genuine video may include a video showing an action performed by person B in front of the camera as captured by the camera.
  • a fake video may include a moving image synthesized to make it appear as if person A, a different person from person B, performed the action performed by person B in front of the camera as captured by the camera.
  • A technique called reenactment is known that creates fake videos in which the facial expression of a person in an original image changes to a desired expression or the person faces a desired direction.
  • a technology is known that uses at least one facial image of person A and changes the facial expression of person A in the facial image to match the facial expression of person B, creating a moving image that makes it appear as if person A is changing his or her facial expression (hereinafter sometimes referred to as "animating still images").
  • the original video may be, for example, a video image showing the actions of person B in front of the camera as captured by the camera.
  • the original image may also be a still image showing person A, who is different from person B.
  • To animate a still image, landmarks are first detected in the original image. Landmarks are also detected from each of the video frames that make up the original video. Next, for each video frame that makes up the original video, the original image is edited so that the landmarks in the original image match the landmarks in the corresponding video frame, generating a composite frame. The generated composite frames are then joined together to generate an animated still image.
  • a landmark may be a characteristic position of a subject that appears in the image.
  • the facial direction, expression, etc. of person A in the original image can be changed using the facial landmarks of person B in the original video, and a video can be synthesized in which the facial direction, expression, etc. of person A changes.
  • the landmarks that change the facial direction, expression, etc. of a person may be characteristic parts of the face.
  • the characteristic positions on the face may be specific points on parts of the face such as the eyes, nose, mouth, etc.
  • If the acquired moving image is similar to a composite moving image synthesized using still images, the acquired moving image is likely to be a fake moving image.
  • In the third embodiment, this property is used to determine whether or not the moving image is a fake moving image. That is, a composite moving image is generated using still images, and the acquired moving image is compared with the composite moving image to determine whether or not the acquired moving image is a fake moving image.
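A minimal sketch of the still-image animation procedure, under strong simplifying assumptions: the "landmark" here is a single bright-pixel centroid, and the "edit" is a rigid integer translation. Real reenactment detects many facial landmarks (eyes, nose, mouth points) and warps the face non-rigidly; the function names are illustrative.

```python
import numpy as np

def detect_landmarks(image: np.ndarray) -> np.ndarray:
    """Hypothetical landmark detector: the centroid of bright pixels,
    standing in for facial landmark points (eyes, nose, mouth)."""
    ys, xs = np.nonzero(image > 128)
    return np.array([ys.mean(), xs.mean()])

def edit_to_match(original: np.ndarray, src_lm: np.ndarray,
                  frame_lm: np.ndarray) -> np.ndarray:
    """Edit the original image so its landmark matches the frame's
    landmark. Here a rigid integer translation; real reenactment warps
    the face non-rigidly."""
    dy, dx = np.round(frame_lm - src_lm).astype(int)
    return np.roll(np.roll(original, dy, axis=0), dx, axis=1)

def animate(original: np.ndarray, driving_frames: list) -> list:
    """One composite frame per driving video frame; joining the returned
    frames gives the animated still image."""
    src_lm = detect_landmarks(original)
    return [edit_to_match(original, src_lm, detect_landmarks(f))
            for f in driving_frames]
```

The structure matters more than the toy operations: detect landmarks once in the original image, detect them per driving frame, edit the original to match each frame, and join the results.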
  • FIG. 4 is a block diagram showing the configuration of the third information processing device 3.
  • the third information processing device 3 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2. Furthermore, the third information processing device 3 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2. However, the information processing device 3 may not include at least one of the communication device 23, the input device 24, and the output device 25.
  • the third information processing device 3 differs from the second information processing device 2 in that the synthesis unit 312 includes a detection unit 3121.
  • Other features of the information processing device 3 may be the same as other features of the information processing device 2. For this reason, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be appropriately omitted.
  • FIG. 5 is a flowchart showing the flow of information processing operations performed by the information processing device 3.
  • the reception unit 311 receives input of a source image as a first image (step S20).
  • the detection unit 3121 detects landmarks from the source image.
  • the detection unit 3121 may detect characteristic positions in the face area as landmarks from a still image.
  • the detection unit 3121 may detect specific points of parts of the body such as the eyes, nose, and mouth as landmarks from the source image.
  • the reception unit 311 receives input of the video to be judged as the second image (step S30).
  • the detection unit 3121 detects landmarks from each of the one or more frames included in the video to be judged (step S31).
  • the one or more frames included in the video to be judged may be all frames included in the video to be judged.
  • the one or more frames included in the video to be judged may be any one or more frames included in the moving image.
  • the detection unit 3121 may detect, as landmarks, positions from the image to be judged that are equivalent to the landmarks detected from the source image.
  • the synthesis unit 312 synthesizes a third image based on the source image, the landmarks of the source image, and the landmarks of one or more frames included in the video to be determined (step S32).
  • the third image is a synthetic video including one or more frames.
  • the "third image” may be referred to as a "synthetic video.”
  • the synthesis unit 312 may first generate a synthesis frame for each input frame constituting the judgment target moving image by editing the landmarks of the source image to match the landmarks of the corresponding input frame. Next, the synthesis unit 312 may connect each synthesis frame together to animate the source images, which are still images, and generate a synthesis moving image.
  • the extraction unit 3131 extracts the difference between the determination target moving image and the composite moving image (step S33).
  • the extraction unit 3131 extracts the difference between a frame included in the determination target moving image and a frame included in the composite moving image corresponding to the frame.
  • the extraction unit 3131 may extract the difference between the determination target moving image and the composite moving image to generate a difference moving image.
  • if the determination target moving image d_i includes frames 1 to F, it may be expressed as d_i = [x_i^1, ..., x_i^F].
  • if the composite moving image d_f includes frames 1 to F, it may be expressed as d_f = [x_f^1, ..., x_f^F].
  • the extraction unit 3131 may generate a difference video including difference frames corresponding to the any one or more frames.
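Under this notation, frame t of the difference video is the per-pixel difference between the corresponding frames of the judgment target video and the composite video. A minimal sketch:

```python
import numpy as np

def difference_video(judged_frames: list, composite_frames: list) -> list:
    """Frame t of the difference video is the per-pixel difference
    between frame t of the judgment target video d_i and frame t of
    the composite video d_f."""
    return [np.abs(xi.astype(np.int16) - xf.astype(np.int16))
            for xi, xf in zip(judged_frames, composite_frames)]
```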
  • the emphasis unit 3132 emphasizes the difference (step S34).
  • the emphasis unit 3132 may generate a difference emphasized moving image including a difference emphasized frame in which a difference between a frame included in the determination target moving image and a frame included in the composite moving image corresponding to the frame is emphasized.
  • the difference emphasized moving image d_diff generated by the emphasis unit 3132 may be expressed as d_diff = [x_diff^1, ..., x_diff^F], where x_diff^t is the difference-emphasized frame obtained from frame t.
  • the calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on one or more frames included in the difference-emphasized video (step S35).
  • the calculation unit 314 may use a calculation model to calculate an index representing the likelihood that the image to be judged is a fake image.
  • when one or more frames included in the difference-emphasized video are input, the calculation model outputs an index representing the likelihood that the video to be judged is a fake video.
  • the one or more frames included in the difference-emphasized video may be all frames included in the difference-emphasized video.
  • the one or more frames included in the difference-emphasized video may be any one or more frames included in the moving image.
  • the determination unit 315 determines whether the video to be determined is a fake video or not based on the index (step S36).
  • the determination unit 315 may determine whether the video to be determined is a fake video or not by comparing the index with a predetermined threshold value.
  • If the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38).
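One way to realize steps S35 and S36 is to score each difference-emphasized frame, aggregate the per-frame indices, and compare the result with the threshold. Averaging is an assumption here; the patent leaves the aggregation to the calculation model.

```python
import numpy as np

def fake_video_index(frame_indices: list) -> float:
    """Aggregate per-frame indices into one video-level index. Averaging
    is an assumption; the patent leaves aggregation to the calculation
    model."""
    return float(np.mean(frame_indices))

def is_fake_video(frame_indices: list, threshold: float = 0.5) -> bool:
    """Steps S36-S38: compare the index with a predetermined threshold."""
    return fake_video_index(frame_indices) > threshold

assert is_fake_video([0.9, 0.8, 0.7]) is True
assert is_fake_video([0.1, 0.2, 0.1]) is False
```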
  • the output unit 316 outputs according to the determination result (step S39).
  • a synthetic video generated using still images and landmarks can capture the characteristics of a fake video generated using a technology such as deep fake.
  • the third information processing device 3 of the present disclosure uses the property that if the characteristics of a synthetic video generated using a source image, which is an input still image, and the input video to be judged are similar, the video to be judged is likely to be a fake video.
  • the third information processing device 3 of the present disclosure can accurately judge whether the video to be judged is a genuine video that is not forged or a fake video that is forged.
  • the information processing device 3 can accurately detect whether the input video to be judged is a fake video or not based on the difference between each frame.
  • the information processing device 3 can accurately detect deep fakes generated using landmarks.
  • a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described.
  • a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described using a fourth information processing device 4 according to the present disclosure.
  • FIG. 6 is a block diagram showing the configuration of the fourth information processing device 4.
  • the fourth information processing device 4 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2 and the third information processing device 3. Furthermore, the fourth information processing device 4 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2 and the third information processing device 3. However, the information processing device 4 may not include at least one of the communication device 23, the input device 24, and the output device 25.
  • the fourth information processing device 4 differs from the second information processing device 2 and the third information processing device 3 in that a matching unit 417, a spoofing determination unit 418, and an authentication unit 419 are further realized in the calculation device 21.
  • Other features of the information processing device 4 may be the same as other features of at least one of the information processing device 2 and the information processing device 3. Therefore, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be omitted as appropriate.
  • the fourth information processing device 4 is a mechanism capable of performing biometric authentication of a person.
  • the information processing device 4 may be a mechanism capable of performing a matching operation using an image, and determining whether or not a person is impersonating another person using the image, thereby authenticating the person.
  • the fourth information processing device 4 of the present disclosure may be applied to online identity verification such as electronic know your customer (eKYC).
  • Accurate determination of whether or not a video is fake is an important issue in increasing the reliability of services such as eKYC.
  • In eKYC, the input includes a face image from an official document, which can serve as source material for synthesizing a fake video.
  • FIG. 7 is a flowchart showing the flow of information processing operations performed by the information processing device 4. Note that in the fourth embodiment, a case will also be described in which the second image is a moving image including a plurality of frames, and the second image will be referred to as a determination target moving image.
  • the reception unit 311 receives an input of a source image as a first image (step S20).
  • the reception unit 311 may receive an input of a facial photograph on an identification document such as a driver's license or a My Number card as the source image.
  • the reception unit 311 receives an input of a video to be judged as a second image (step S30).
  • the matching unit 417 matches the facial image of the person (step S40). If the first image is a facial image on an official document such as a driver's license or a My Number card, the matching unit 417 may match the person appearing in the first image with the person appearing in the video to be judged. In this case, if the matching between the person appearing in the first image and the person appearing in the video to be judged fails, the information processing operation may be terminated. Alternatively, the matching unit 417 may match the received first image with a registered facial image that has been registered in advance. Alternatively, the matching unit 417 may match the received video to be judged with a registered facial image that has been registered in advance. In other words, the matching unit 417 may match at least one of the person appearing in the first image and the person appearing in the video to be judged.
  • since the source image as the first image and a fake video synthesized based on it are similar, even if the video to be judged is a fake video, there is a high possibility that the first image will be successfully matched with the video to be judged.
  • the spoofing determination unit 418 performs spoofing determination using the video to be judged (step S41).
  • the video to be judged may be used both for spoofing determination and for determining whether or not it is a fake video.
  • the video to be judged may show an action performed by the person in response to an instruction from the information processing device 4.
  • the information processing device 4 may instruct the person on face direction, gaze direction, and face position.
  • the information processing device 4 may guide the person's gaze.
  • the information processing device 4 may instruct the person to perform a gesture.
  • the spoofing determination unit 418 may perform active liveness determination using the video to be judged.
  • the detection unit 3121 detects landmarks from each of one or more frames included in the video to be judged (step S31).
  • the synthesis unit 312 generates a composite video based on the source image and the landmarks from each of one or more frames included in the video to be judged (step S32).
  • the synthesis unit 312 generates a composite image based on a facial photograph as the source image.
  • the extraction unit 3131 extracts the difference between a frame included in the video to be judged and a frame included in the composite video corresponding to that frame (step S33).
  • the emphasis unit 3132 emphasizes the difference (step S34).
  • the calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video, based on the difference-emphasized video (step S35).
  • the determination unit 315 determines whether the video to be judged is a fake video or not based on the index (step S36).
  • the determination unit 315 may determine whether the video to be judged is a fake video or not by comparing the index with a predetermined threshold value.
  • if the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be judged is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be judged is not a fake video (step S38).
  • the authentication unit 419 authenticates the person based on the matching result from the matching unit 417 and the determination result from the spoofing determination unit 418 (step S42).
  • the authentication unit 419 may authenticate the person on the condition that the determination unit 315 determines that the video to be judged is less likely than a predetermined standard to be a fake image and that the spoofing determination unit 418 determines that the person has acted in accordance with the instructions.
  • successful authentication of the person by the authentication unit 419 may correspond to confirmation of the person's identity.
  • the output unit 416 outputs the authentication result for the person (step S43).
[4-3: Technical Effects of Information Processing Device 4]
  • the fourth information processing device 4 of the present disclosure can accurately detect whether an input video to be judged is a fake video or not, and can therefore perform identity verification with high accuracy.
  • the difference emphasis means includes: an extraction means for extracting a difference between the second image and the third image; and a highlighting means for highlighting the difference.
  • the information processing device according to claim 1 or 2, wherein the second image is a video including a plurality of frames, the synthesizing means synthesizes the third image including one or more frames, and the difference emphasis means emphasizes a difference between a frame included in the second image and a frame included in the third image corresponding to that frame.
  • the information processing device according to claim 3, wherein the difference emphasizing means generates a difference-emphasized video including a difference frame in which a difference between a frame included in the second image and a frame included in the third image corresponding to that frame is emphasized, and the calculation means calculates an index representing a likelihood that the second image is a synthesized image based on the difference-emphasized video.
  • the information processing device according to claim 3, wherein the synthesis means includes a detection means for detecting a landmark from the second image, and synthesizes the third image based on the first image and the landmark.
  • the information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
  • [Appendix 8] An information processing method comprising: accepting input of a first image and a second image; synthesizing a third image based on the first image and the second image; emphasizing a difference between the second image and the third image; calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference; and determining whether the second image is a synthesized image or not according to the index.
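The steps enumerated in Appendix 8 can be sketched end to end as follows. This is a minimal illustration, not the disclosed implementation: `synthesize` stands in for the unspecified synthesis step, the learned scoring model is replaced by a mean-difference proxy, and the gain of 4 and threshold of 0.1 are illustrative assumptions.

```python
import numpy as np

def detect_synthesized(first, second, synthesize, threshold=0.1):
    """Sketch of the Appendix 8 method: accept two images, synthesize a
    third, emphasize the difference, score it, and judge the second image."""
    third = synthesize(first, second)                  # synthesize the third image
    diff = np.abs(second.astype(np.int16) - third.astype(np.int16))
    emphasized = np.clip(diff.astype(np.float32) * 4.0, 0.0, 255.0)  # emphasize
    index = float(emphasized.mean()) / 255.0           # synthesized-image likeness
    return index > threshold, index
```

With a perfect synthesizer the difference vanishes and the index is 0; any systematic mismatch between the second and third images pushes the index toward 1.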

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An information processing device 1 comprises: a reception unit 11 that receives input of a first image and a second image; a synthesis unit 12 that synthesizes a third image on the basis of the first image and the second image; a difference emphasis unit 13 that emphasizes a difference between the second image and the third image; a calculation unit 14 that calculates an index representing the likeness of the second image to a synthesized image on the basis of the emphasized difference; and a determination unit 15 that determines whether the second image is a synthesized image according to the index.

Description

Information processing device, information processing method, and recording medium

This disclosure relates to the technical fields of information processing devices, information processing methods, and recording media.

Patent Document 1 describes a technology in which a class indicates whether a person's face (an object) is real (true) or fake (false), and a synthesized region (i.e., a region added to the original video) is detected as an element of the detection target class, thereby detecting whether the video is a fake video (e.g., a false video generated by synthesis processing).

International Publication No. WO 2022/054246

An object of the present disclosure is to provide an information processing device, an information processing method, and a recording medium that can accurately detect whether an input image is a composite image.

One aspect of the information processing device includes a receiving means for receiving an input of a first image and a second image, a synthesis means for synthesizing a third image based on the first image and the second image, a difference emphasis means for emphasizing the difference between the second image and the third image, a calculation means for calculating an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and a determination means for determining whether the second image is a synthesized image or not according to the index.

One aspect of the information processing method is to receive an input of a first image and a second image, synthesize a third image based on the first image and the second image, emphasize the difference between the second image and the third image, calculate an index representing the likelihood of the second image being a synthesized image based on the emphasized difference, and determine whether the second image is a synthesized image or not based on the index.

In one aspect of the recording medium, a computer program is recorded for causing a computer to execute an information processing method that accepts input of a first image and a second image, synthesizes a third image based on the first image and the second image, emphasizes the difference between the second image and the third image, calculates an index representing the likelihood that the second image is a synthesized image based on the emphasized difference, and determines whether the second image is a synthesized image according to the index.

The information processing device, information processing method, and recording medium according to the present disclosure can accurately detect whether an input image is a composite image.

FIG. 1 is a block diagram showing the configuration of a first information processing device according to the present disclosure.
FIG. 2 is a block diagram showing the configuration of a second information processing device according to the present disclosure.
FIG. 3 is a flowchart showing the processing operations of the second information processing device according to the present disclosure.
FIG. 4 is a block diagram showing the configuration of a third information processing device according to the present disclosure.
FIG. 5 is a flowchart showing the processing operations of the third information processing device according to the present disclosure.
FIG. 6 is a block diagram showing the configuration of a fourth information processing device according to the present disclosure.
FIG. 7 is a flowchart showing the processing operations of the fourth information processing device according to the present disclosure.

Hereinafter, embodiments of an information processing device, an information processing method, and a recording medium will be described with reference to the drawings.
[1: First embodiment]

A first embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, the first embodiment will be described using a first information processing device 1 according to the present disclosure.
[1-1: Configuration of information processing device 1]

FIG. 1 is a block diagram showing the configuration of the first information processing device 1 according to the present disclosure. As shown in FIG. 1, the information processing device 1 includes a receiving unit 11, a synthesis unit 12, a difference emphasizing unit 13, a calculation unit 14, and a determination unit 15. The receiving unit 11 receives input of a first image and a second image. The synthesis unit 12 synthesizes a third image based on the first image and the second image. The difference emphasizing unit 13 emphasizes the difference between the second image and the third image. The calculation unit 14 calculates an index representing the likelihood of the second image being a synthesized image based on the emphasized difference. The determination unit 15 determines whether the second image is a synthesized image according to the index.
[1-2: Technical Effects of Information Processing Device 1]

The first information processing device 1 according to the present disclosure makes a judgment based on the difference between an input image and a composite image, and can therefore accurately detect whether or not the input image is a composite image.
[2: Second embodiment]

A second embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, the second embodiment will be described using a second information processing device 2 according to the present disclosure.
[2-1: Fake image]

There is a technology that synthesizes an image of a person based on information from a single photograph of the person's face. For example, deepfake is known as a technology for synthesizing an image of a person. Deepfake is known as a technology for synthesizing a fake image that shows something that did not actually happen. Hereinafter, an image that shows something that did not actually happen may be called a fake image. A synthesized image may also be called a fake image. An image that shows something that actually happened may be called a real image.
[2-2: Configuration of information processing device 2]

FIG. 2 is a block diagram showing the configuration of the second information processing device 2. As shown in FIG. 2, the information processing device 2 includes a calculation device 21 and a storage device 22. Furthermore, the information processing device 2 may include a communication device 23, an input device 24, and an output device 25. However, the information processing device 2 does not have to include at least one of the communication device 23, the input device 24, and the output device 25. The calculation device 21, the storage device 22, the communication device 23, the input device 24, and the output device 25 may be connected via a data bus 26.

The arithmetic device 21 includes, for example, at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array). The arithmetic device 21 reads a computer program. For example, the arithmetic device 21 may read a computer program stored in the storage device 22. For example, the arithmetic device 21 may read a computer program stored in a computer-readable, non-transitory recording medium using a recording medium reading device (e.g., the input device 24 described later) provided in the information processing device 2. The arithmetic device 21 may acquire (i.e., download or read) a computer program from a device (not shown) located outside the information processing device 2 via the communication device 23 (or another communication device). The arithmetic device 21 executes the read computer program. As a result, logical functional blocks for executing the operations to be performed by the information processing device 2 are realized within the arithmetic device 21. In other words, the arithmetic device 21 can function as a controller that realizes logical functional blocks for executing the operations (in other words, processing) to be performed by the information processing device 2.

FIG. 2 shows an example of the logical functional blocks realized in the arithmetic device 21 to execute the information processing operation. As shown in FIG. 2, realized in the arithmetic device 21 are a reception unit 211, which is a specific example of the "reception means" described in the appendix below; a synthesis unit 212, which is a specific example of the "synthesis means" described in the appendix below; a difference emphasis unit, which is a specific example of the "difference emphasis means" described in the appendix below; a calculation unit 214, which is a specific example of the "calculation means" described in the appendix below; a determination unit 215, which is a specific example of the "determination means" described in the appendix below; and an output unit 216. However, the output unit 216 does not have to be realized in the arithmetic device 21. The difference emphasis unit may have an extraction unit 2131 and an emphasis unit 2132. Details of the operations of the reception unit 211, the synthesis unit 212, the difference emphasis unit, the calculation unit 214, the determination unit 215, and the output unit 216 will be described later with reference to FIG. 3.

The storage device 22 can store desired data. For example, the storage device 22 may temporarily store a computer program executed by the arithmetic device 21. The storage device 22 may temporarily store data that is temporarily used by the arithmetic device 21 while the arithmetic device 21 is executing a computer program. The storage device 22 may store data that the information processing device 2 retains for a long period of time. The storage device 22 may include at least one of a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device. In other words, the storage device 22 may include a non-transitory recording medium.

The communication device 23 is capable of communicating with devices external to the information processing device 2 via a communication network (not shown). The communication device 23 may be a communication interface based on standards such as Ethernet (registered trademark), Wi-Fi (registered trademark), Bluetooth (registered trademark), and USB (Universal Serial Bus).

The input device 24 is a device that accepts information input to the information processing device 2 from outside the information processing device 2. For example, the input device 24 may include an operating device (e.g., at least one of a keyboard, a mouse, and a touch panel) that can be operated by an operator of the information processing device 2. For example, the input device 24 may include a reading device that can read information recorded as data on a recording medium that can be attached externally to the information processing device 2.

The output device 25 is a device that outputs information to the outside of the information processing device 2. For example, the output device 25 may output information as an image. That is, the output device 25 may include a display device (a so-called display) capable of displaying an image showing the information to be output. For example, the output device 25 may output information as sound. That is, the output device 25 may include an audio device (a so-called speaker) capable of outputting sound. For example, the output device 25 may output information on paper. That is, the output device 25 may include a printing device (a so-called printer) capable of printing desired information on paper.
[2-3: Information Processing Operation Performed by Information Processing Device 2]

The information processing operation performed by the information processing device 2 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of the information processing operation performed by the information processing device 2. Note that this disclosure assumes that the first image is not a fake image but a real image.

As shown in FIG. 3, the reception unit 211 receives input of a first image (step S20). The reception unit 211 may receive input of a face image including a person's face region as the first image. The first image may be a still image. Hereinafter, the "first image" may be referred to as the "source image."

The reception unit 211 receives input of a second image (step S21). The reception unit 211 may receive input of a face image including a person's face region as the second image. The second image may be a still image or a moving image. The second embodiment describes a case where a second image that is a still image is processed. Hereinafter, the "second image" may be referred to as the "image to be determined."

The synthesis unit 212 synthesizes a third image based on the first image and the second image. Hereinafter, the "third image" may be referred to as the "composite image." The synthesis unit 212 may generate the composite image using, for example, a technique called face swap. Face swap is a technique for exchanging the face region of a source image with the face region of a target image. The synthesis unit 212 may generate the composite image by fitting the face region of the image to be determined into the face region of the source image. The synthesis unit 212 may generate a composite image having the characteristics of the source image. For example, the synthesis unit 212 may generate a composite image that maintains the facial expression of the source image.
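As a minimal illustration of the region-replacement idea behind face swap (real implementations align facial landmarks and blend the seam, which is omitted here), a mask-based composite might look like the following; the boolean `face_mask` input is an assumption of this sketch, not part of the disclosure.

```python
import numpy as np

def naive_face_swap(source, target, face_mask):
    # Copy the target image's face region (given by a boolean mask) into
    # the source image, leaving the rest of the source untouched.
    out = source.copy()
    out[face_mask] = target[face_mask]
    return out
```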

The extraction unit 2131 extracts the difference between the image to be determined and the composite image (step S23). In other words, the extraction unit 2131 extracts the portions where the image to be determined and the composite image differ. The extraction unit 2131 may obtain the difference in pixel value for each pixel and generate a difference image in which each pixel is represented by the difference in pixel values.

The emphasis unit 2132 emphasizes the difference (step S24). The emphasis unit 2132 may emphasize the difference further by, for example, multiplying each pixel value of the difference image by a real number. That is, the emphasis unit 2132 may increase the pixel values of pixels in the difference image whose pixel value is not 0 (a pixel value of 0 means there is no difference in pixel value between the image to be determined and the composite image). Hereinafter, a difference image in which the difference has been emphasized by the emphasis unit 2132 may be referred to as a "difference-emphasized image." The emphasis unit 2132 may generate the difference-emphasized image by, for example, setting the pixel values of pixels in the difference image whose pixel value is not 0 to the maximum possible value. The emphasis unit 2132 may generate the difference-emphasized image by, for example, setting the pixel values of pixels in the difference image whose pixel value is equal to or greater than a predetermined value to the maximum possible value. The emphasis unit 2132 may, for example, set ranges of pixel values in the difference image to generate a difference-emphasized image whose pixels can take four kinds of values: large difference, medium difference, small difference, and no difference. The emphasis unit 2132 may employ any method to emphasize the difference.
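The extraction (step S23) and emphasis (step S24) operations might be sketched as follows for 8-bit grayscale arrays; the gain and noise floor are illustrative parameters, not values from the disclosure.

```python
import numpy as np

def difference_image(target, synthesized):
    # Step S23: per-pixel absolute difference between the image to be
    # determined and the composite image (both uint8, same shape).
    return np.abs(target.astype(np.int16) - synthesized.astype(np.int16)).astype(np.uint8)

def emphasize(diff, gain=4.0, floor=8):
    # Step S24: zero out sub-floor differences (treated as noise), then
    # amplify the rest and clip to the valid 8-bit range.
    out = diff.astype(np.float32)
    out[out < floor] = 0.0
    return np.clip(out * gain, 0.0, 255.0).astype(np.uint8)
```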

The calculation unit 214 calculates an index representing the likelihood that the image to be determined is a fake image, based on the emphasized difference (step S25). The index calculated by the calculation unit 214 may be a real number. The calculation unit 214 may use a calculation model to calculate the index. The calculation model is a model that, when the emphasized difference is input, outputs an index representing the likelihood that the image is a fake image. The calculation model may be a machine-learned model. A learning mechanism that trains the calculation model may train it using, as teacher data, difference-emphasized images accompanied by information indicating the correct answer (that the image to be determined is a real image, or that it is a fake image). The learning mechanism may use the information indicating the correct answer and the fake-image likelihood index output by the calculation model to make the calculation model learn how to calculate the index.

The determination unit 215 determines whether the second image is a composite image or not according to the index (step S26). The determination unit 215 may determine whether the second image is a composite image or not by comparing the index with a predetermined threshold value.

If the index exceeds the predetermined threshold (step S26: Yes), the determination unit 215 determines that the second image is a composite image (step S27). If the index does not exceed the predetermined threshold (step S26: No), the determination unit 215 determines that the second image is not a composite image (step S28).
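Steps S25 and S26 could be sketched as below. The disclosure computes the index with a trained calculation model; the mean of the emphasized difference and the 0.1 threshold used here are placeholders for illustration only.

```python
import numpy as np

def fake_likelihood_index(emphasized_diff):
    # Stand-in for the learned calculation model (step S25): the mean
    # emphasized difference, normalized to the range [0, 1].
    return float(emphasized_diff.mean()) / 255.0

def is_fake(index, threshold=0.1):
    # Step S26: the image is judged to be a composite (fake) image when
    # the index exceeds the predetermined threshold.
    return index > threshold
```

Raising the threshold makes the judgment more conservative (fewer images are flagged as fake); lowering it makes it stricter, matching the adjustability noted in the technical-effects section.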

The output unit 216 produces output according to the determination result (step S29). The output unit 216 may control the output device 25 to cause it to produce output according to the determination result.
[2-4: Technical Effects of Information Processing Device 2]

When the emphasized difference is used for the comparison between the image to be determined and the composite image, the difference between the case where the image to be determined is a fake image and the case where it is not is easier to see, making it easier to judge whether the image to be determined is a fake image. The second information processing device 2 of the present disclosure emphasizes the difference between the input image to be determined and the synthesized composite image and makes its judgment based on the emphasized difference, so it can accurately detect whether the input image to be determined is a fake image. In addition, since the information processing device 2 judges whether an image is a fake image by comparison with a threshold value, how strongly an image must resemble a fake image before it is judged to be one can be adjusted by setting the threshold value.
[3: Third embodiment]

 情報処理装置、情報処理方法、及び、記録媒体の第3実施形態について説明する。以下では、本開示にかかる第3の情報処理装置3を用いて、情報処理装置、情報処理方法、及び記録媒体の第3実施形態について説明する。第3実施形態では、第2の画像が複数のフレームを含む動画像である場合を説明する。以下、「第2の画像」を「判定対象動画」とよぶ場合がある。また、実際には起こっていないことがらが写った動画像をフェイク動画とよぶ場合がある。また、実際に起こったことがらが写った動画像を本物動画とよぶ場合がある。
 [3-1:フェイク動画]
A third embodiment of an information processing device, an information processing method, and a recording medium will be described below. In the following, a third embodiment of an information processing device, an information processing method, and a recording medium will be described using a third information processing device 3 according to the present disclosure. In the third embodiment, a case where the second image is a moving image including a plurality of frames will be described. Hereinafter, the "second image" may be referred to as a "video to be determined". Also, a moving image showing an event that has not actually occurred may be referred to as a fake video. Also, a moving image showing an event that has actually occurred may be referred to as a real video.
[3-1: Fake video]

 例えば、本物動画は、カメラにより撮像されている人物Bがカメラの前で行った動作が写る動画を含んでいてもよい。これに対し、フェイク動画は、カメラにより撮像されている人物Bがカメラの前で行った動作を、人物Bとは異なる人物Aが行ったように合成された動画像を含んでいてもよい。 For example, a genuine video may include a video showing an action performed by person B in front of the camera as captured by the camera. In contrast, a fake video may include a moving image synthesized to make it appear as if person A, a different person from person B, performed the action performed by person B in front of the camera as captured by the camera.

 操演(Reenactment)とよばれる、元画像に写る人物の表情が所望の表情に変化したり、元画像に写る人物が所望の方向を向いたりするフェイク動画を生成する技術がある。例えば、人物Aの少なくとも1枚の顔画像に基づき、当該顔画像の人物Aの表情を人物Bの表情に合わせて変化させた、あたかも人物Aが表情を変えているかのような動画像を生成(以下、「静止画を動画化」とよぶ場合がある)する技術が知られている。 There is a technology called reenactment that creates fake videos in which the facial expression of a person in an original image changes to a desired expression or faces a desired direction. For example, a technology is known that uses at least one facial image of person A and changes the facial expression of person A in the facial image to match the facial expression of person B, creating a moving image that makes it appear as if person A is changing his or her facial expression (hereinafter sometimes referred to as "animating still images").

 元画像と元動画とを用いることにより、静止画を動画化することができる。元動画とは、例えば、カメラにより撮像されている人物Bがカメラの前で行った動作が写る動画像であってもよい。また、元画像は、人物Bとは異なる人物Aが写る静止画であってもよい。静止画の動画化では、まず、元画像のランドマークを検出する。また、元動画を構成する動画フレームの各々からランドマークを検出する。続いて、元動画を構成する各々の動画フレームについて、元画像のランドマークと該当動画フレームのランドマークとを合せるように元画像を編集して合成フレームを生成する。生成した各々の合成フレームを繋ぎ合わせることで、静止画を動画化することができる。ランドマークとは、画像に写る被写体の特徴的な位置であってもよい。 By using the original image and the original video, a still image can be animated. The original video may be, for example, a video image showing the actions of person B in front of the camera as captured by the camera. The original image may also be a still image showing person A, who is different from person B. When animating a still image, first, landmarks are detected in the original image. Landmarks are also detected from each of the video frames that make up the original video. Next, for each video frame that makes up the original video, the original image is edited so that the landmarks in the original image and the landmarks in the corresponding video frame are matched to generate a composite frame. The generated composite frames are then joined together to generate an animated still image. A landmark may be a characteristic position of a subject that appears in the image.
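The animation pipeline described above can be sketched as the loop below: warp the source image once per driving frame so that its landmarks match that frame's landmarks, then join the composite frames. Here `detect_landmarks` and `warp` are hypothetical stand-ins; the disclosure does not specify a particular detector or warper.

```python
def detect_landmarks(frame):
    # Stand-in: a real detector would locate eyes, nose, mouth, etc.
    return frame["landmarks"]

def warp(source_image, source_landmarks, target_landmarks):
    # Stand-in: a real warper would deform source_image so that its
    # landmarks coincide with target_landmarks.
    return {"source": source_image["name"], "aligned_to": target_landmarks}

def animate_still(source_image, driving_frames):
    """One composite frame per driving frame, joined into a composite video."""
    source_landmarks = detect_landmarks(source_image)
    return [
        warp(source_image, source_landmarks, detect_landmarks(frame))
        for frame in driving_frames
    ]

source = {"name": "person_A", "landmarks": [(10, 10), (20, 10)]}
driving = [{"landmarks": [(11, 10), (21, 10)]},
           {"landmarks": [(12, 11), (22, 11)]}]
composite_video = animate_still(source, driving)
print(len(composite_video))  # 2, one composite frame per driving frame
```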

 例えば、元画像に写る人物Aの顔の向き、表情等を、元動画に写る人物Bの顔のランドマークを用いて変化させ、人物Aの顔の向き、表情等が変化する動画を合成することができる。人物の顔の向き、表情等を変化させるランドマークとは、顔における特徴的な部位であってもよい。顔における特徴的な位置とは、目、鼻、口等の部位の特定のポイントであってもよい。 For example, the facial direction, expression, etc. of person A in the original image can be changed using the facial landmarks of person B in the original video, and a video can be synthesized in which the facial direction, expression, etc. of person A changes. The landmarks that change the facial direction, expression, etc. of a person may be characteristic parts of the face. The characteristic positions on the face may be specific points on parts of the face such as the eyes, nose, mouth, etc.

 入手した動画像が、静止画を用いて合成した合成動画と類似している場合、当該入手した動画像はフェイク動画である可能性が高い。本実施形態では、この性質をフェイク動画か否かの判定に利用する。すなわち、第3実施形態では、静止画を用いて合成動画を生成すること、及び入手した動画像と合成動画とを比較することにより、入手した動画像がフェイク動画であるか否かを判定する。
 [3-2:情報処理装置3の構成]
If the acquired moving image is similar to a composite moving image synthesized using still images, the acquired moving image is likely to be a fake moving image. In this embodiment, this property is used to determine whether or not the moving image is a fake moving image. That is, in the third embodiment, a composite moving image is generated using still images, and the acquired moving image is compared with the composite moving image to determine whether or not the acquired moving image is a fake moving image.
[3-2: Configuration of information processing device 3]

 図4を参照しながら、第3の情報処理装置3の構成について説明する。図4は、第3の情報処理装置3の構成を示すブロック図である。 The configuration of the third information processing device 3 will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the third information processing device 3.

 図4に示すように、第3の情報処理装置3は、第2の情報処理装置2と同様に、演算装置21と、記憶装置22とを備えている。更に、第3の情報処理装置3は、第2の情報処理装置2と同様に、通信装置23と、入力装置24と、出力装置25とを備えていてもよい。但し、情報処理装置3は、通信装置23、入力装置24及び出力装置25のうちの少なくとも1つを備えていなくてもよい。第3の情報処理装置3は、合成部312が検出部3121を含む点で、第2の情報処理装置2と異なる。情報処理装置3のその他の特徴は、情報処理装置2のその他の特徴と同一であってもよい。このため、以下では、すでに説明した実施形態と異なる部分について詳細に説明し、その他の重複する部分については適宜説明を省略するものとする。
 [3-3:情報処理装置3が行う情報処理動作]
As shown in FIG. 4, the third information processing device 3 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2. Furthermore, the third information processing device 3 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2. However, the information processing device 3 may not include at least one of the communication device 23, the input device 24, and the output device 25. The third information processing device 3 differs from the second information processing device 2 in that the synthesis unit 312 includes a detection unit 3121. Other features of the information processing device 3 may be the same as other features of the information processing device 2. For this reason, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be appropriately omitted.
[3-3: Information Processing Operation Performed by Information Processing Device 3]

 図5を参照しながら、情報処理装置3が行う情報処理動作の流れについて説明する。図5は、情報処理装置3が行う情報処理動作の流れを示すフローチャートである。 The flow of information processing operations performed by the information processing device 3 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the flow of information processing operations performed by the information processing device 3.

 図5に示す様に、受付部311は、第1の画像としてのソース画像の入力を受け付ける(ステップS20)。検出部3121は、ソース画像からランドマークを検出する。検出部3121は、静止画から、ランドマークとして、顔領域における特徴的な位置を検出してもよい。検出部3121は、ソース画像から、ランドマークとして、目、鼻、口等の部位の特定のポイントを検出してもよい。 As shown in FIG. 5, the reception unit 311 receives input of a source image as a first image (step S20). The detection unit 3121 detects landmarks from the source image. The detection unit 3121 may detect characteristic positions in the face area as landmarks from a still image. The detection unit 3121 may detect specific points of facial parts such as the eyes, nose, and mouth as landmarks from the source image.

 受付部311は、第2の画像としての判定対象動画の入力を受け付ける(ステップS30)。検出部3121は、判定対象動画が含む1以上のフレームの各々からランドマークを検出する(ステップS31)。判定対象動画が含む1以上のフレームは、判定対象動画が含む全てのフレームであってもよい。判定対象動画が含む1以上のフレームは、動画像が含む任意の1以上のフレームであってもよい。検出部3121は、判定対象画像から、ランドマークとして、ソース画像から検出したランドマークと同等の位置を検出してもよい。 The reception unit 311 receives input of the video to be judged as the second image (step S30). The detection unit 3121 detects landmarks from each of the one or more frames included in the video to be judged (step S31). The one or more frames included in the video to be judged may be all frames included in the video to be judged. The one or more frames included in the video to be judged may be any one or more frames included in the moving image. The detection unit 3121 may detect, as landmarks, positions from the image to be judged that are equivalent to the landmarks detected from the source image.

 合成部312は、ソース画像、及びソース画像のランドマーク、並びに、判定対象動画が含む1以上のフレームの各々のランドマークに基づいて、第3の画像を合成する(ステップS32)。第3実施形態において、第3の画像は、1以上のフレームを含む合成動画である。以下、「第3の画像」を「合成動画」とよぶ場合がある。 The synthesis unit 312 synthesizes a third image based on the source image, the landmarks of the source image, and the landmarks of one or more frames included in the video to be determined (step S32). In the third embodiment, the third image is a synthetic video including one or more frames. Hereinafter, the "third image" may be referred to as a "synthetic video."

 合成部312は、まず、判定対象動画を構成する各々の入力フレームについて、ソース画像のランドマークと該当入力フレームのランドマークとを合せるように編集した合成フレームを生成してもよい。続いて、合成部312は、各々の合成フレームを繋ぎ合わせることで、静止画であるソース画像を動画化し、合成動画を生成してもよい。 The synthesis unit 312 may first generate a synthesis frame for each input frame constituting the judgment target moving image by editing the landmarks of the source image to match the landmarks of the corresponding input frame. Next, the synthesis unit 312 may connect each synthesis frame together to animate the source images, which are still images, and generate a synthesis moving image.

 抽出部3131は、判定対象動画と、合成動画との差分を抽出する(ステップS33)。抽出部3131は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を抽出する。抽出部3131は、判定対象動画と、合成動画との差分を抽出し、差分動画を生成してもよい。判定対象動画d_iが1からFのフレームを含む場合、判定対象動画d_iを[x_i^1,・・・,x_i^F]と表してもよい。合成動画d_fが1からFのフレームを含む場合、合成動画d_fを[x_f^1,・・・,x_f^F]と表してもよい。この場合、差分動画d_diffは、d_f-d_i=[|x_f^1-x_i^1|,・・・,|x_f^F-x_i^F|]と表してもよい。ステップS31において、検出部3121が判定対象動画が含む任意の1以上のフレームからランドマークを検出した場合、抽出部3131は、当該任意の1以上のフレームに対応する差分フレームを含む差分動画を生成してもよい。 The extraction unit 3131 extracts the difference between the determination target moving image and the composite moving image (step S33). The extraction unit 3131 extracts the difference between a frame included in the determination target moving image and the corresponding frame included in the composite moving image. The extraction unit 3131 may extract the difference between the determination target moving image and the composite moving image to generate a difference moving image. When the determination target moving image d_i includes frames 1 to F, it may be expressed as [x_i^1, ..., x_i^F]. When the composite moving image d_f includes frames 1 to F, it may be expressed as [x_f^1, ..., x_f^F]. In this case, the difference moving image d_diff may be expressed as d_f - d_i = [|x_f^1 - x_i^1|, ..., |x_f^F - x_i^F|]. In step S31, if the detection unit 3121 detects landmarks from any one or more frames included in the video to be judged, the extraction unit 3131 may generate a difference video including difference frames corresponding to those one or more frames.
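The per-frame difference d_diff = d_f − d_i = [|x_f^1 − x_i^1|, …, |x_f^F − x_i^F|] can be sketched as follows, with each frame represented here as a flat list of integer pixel values (a simplification for illustration; names are not from the disclosure):

```python
def difference_video(target_frames, composite_frames):
    """Frame-by-frame absolute difference between the judgment-target
    video (frames x_i) and the composite video (frames x_f)."""
    return [
        [abs(xf - xi) for xf, xi in zip(f_frame, i_frame)]
        for f_frame, i_frame in zip(composite_frames, target_frames)
    ]

d_i = [[10, 40], [50, 50]]  # judgment-target video, F = 2 frames
d_f = [[12, 70], [50, 90]]  # composite video
print(difference_video(d_i, d_f))  # [[2, 30], [0, 40]]
```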

 強調部3132は、差分を強調する(ステップS34)。強調部3132は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を強調した差分強調フレームを含む差分強調動画を生成してもよい。強調部3132が生成した差分強調動画d_diffは、[α|x_f^1-x_i^1|,・・・,α|x_f^F-x_i^F|]と表してもよい。αは、実数であり、差分をより強調するためのパラメータである。 The emphasis unit 3132 emphasizes the difference (step S34). The emphasis unit 3132 may generate a difference-emphasized moving image including difference-emphasized frames in which the difference between a frame included in the determination target moving image and the corresponding frame included in the composite moving image is emphasized. The difference-emphasized moving image d_diff generated by the emphasis unit 3132 may be expressed as [α|x_f^1 - x_i^1|, ..., α|x_f^F - x_i^F|], where α is a real number and a parameter for further emphasizing the difference.

 算出部314は、差分強調動画が含む1以上のフレームに基づいて、判定対象動画のフェイク動画らしさを表す指標を算出する(ステップS35)。算出部314は、算出モデルを用いて、判定対象画像のフェイク画像らしさを表す指標を算出してもよい。第3実施形態において、算出モデルは、差分強調動画が含む1以上のフレームが入力されると、画像のフェイク画像らしさを表す指標を出力する。差分強調動画が含む1以上のフレームは、差分強調動画が含む全てのフレームであってもよい。差分強調動画が含む1以上のフレームは、動画像が含む任意の1以上のフレームであってもよい。 The calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on one or more frames included in the difference-emphasized video (step S35). The calculation unit 314 may use a calculation model to calculate an index representing the likelihood that the image to be judged is a fake image. In the third embodiment, when one or more frames included in the difference-emphasized video are input, the calculation model outputs an index representing the likelihood that the image is a fake image. The one or more frames included in the difference-emphasized video may be all frames included in the difference-emphasized video. The one or more frames included in the difference-emphasized video may be any one or more frames included in the moving image.
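The disclosure leaves the calculation model open. As one hedged illustration only, the sketch below scores each difference-emphasized frame with a stand-in per-frame scorer and averages the scores into a single fake-likeness index; a real system would use a learned model here.

```python
def fake_index(emphasized_frames, frame_score):
    """Average per-frame scores into one index (step S35).
    `frame_score` stands in for a learned model, which the
    disclosure does not specify."""
    scores = [frame_score(frame) for frame in emphasized_frames]
    return sum(scores) / len(scores)

def mean_pixel(frame):
    # Toy scorer: the mean value of an emphasized difference frame.
    return sum(frame) / len(frame)

print(fake_index([[6, 90], [0, 120]], mean_pixel))  # (48.0 + 60.0) / 2 = 54.0
```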

 判定部315は、指標に応じて、判定対象動画がフェイク動画か否かを判定する(ステップS36)。判定部315は、指標と所定の閾値との比較により、判定対象動画がフェイク動画か否かを判定してもよい。 The determination unit 315 determines whether the video to be determined is a fake video or not based on the index (step S36). The determination unit 315 may determine whether the video to be determined is a fake video or not by comparing the index with a predetermined threshold value.

 指標が所定の閾値を超過した場合(ステップS36:Yes)、判定部315は、判定対象動画がフェイク動画であると判定する(ステップS37)。指標が所定の閾値を超過しなかった場合(ステップS36:No)、判定部315は、判定対象動画がフェイク動画ではないと判定する(ステップS38)。出力部316は、判定結果に応じた出力をする(ステップS39)。
 [3-4:情報処理装置3の技術的効果]
If the index exceeds a predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38). The output unit 316 outputs according to the determination result (step S39).
[3-4: Technical Effects of Information Processing Device 3]

 静止画、及びランドマークを用いて生成した合成動画は、ディープフェイク等の技術を用いて生成されたフェイク動画の特徴を捉えることができる。本開示の第3の情報処理装置3は、入力された静止画であるソース画像を使って生成した合成動画と入力された判定対象動画の特徴が似通っている場合には判定対象動画がフェイク動画である可能性が高いという性質を用いる。本開示の第3の情報処理装置3は、判定対象動画が偽造されていない本物動画であるか、判定対象動画が偽造されたフェイク動画であるかを精度よく判定することができる。また、情報処理装置3は、フレーム毎の差分に基づいて、入力された判定対象動画がフェイク動画か否かを精度よく検知することができる。また、情報処理装置3は、ランドマークを用いて生成されたディープフェイクを精度よく検知することができる。
 [4:第4実施形態]
A synthetic video generated using still images and landmarks can capture the characteristics of a fake video generated using a technology such as deep fake. The third information processing device 3 of the present disclosure uses the property that if the characteristics of a synthetic video generated using a source image, which is an input still image, and the input video to be judged are similar, the video to be judged is likely to be a fake video. The third information processing device 3 of the present disclosure can accurately judge whether the video to be judged is a genuine video that is not forged or a fake video that is forged. In addition, the information processing device 3 can accurately detect whether the input video to be judged is a fake video or not based on the difference between each frame. In addition, the information processing device 3 can accurately detect deep fakes generated using landmarks.
[4: Fourth embodiment]

 情報処理装置、情報処理方法、及び、記録媒体の第4実施形態について説明する。以下では、本開示にかかる第4の情報処理装置4を用いて、情報処理装置、情報処理方法、及び記録媒体の第4実施形態について説明する。
 [4-1:情報処理装置4の構成]
A fourth embodiment of an information processing device, an information processing method, and a recording medium will be described. Hereinafter, a fourth embodiment of an information processing device, an information processing method, and a recording medium will be described using a fourth information processing device 4 according to the present disclosure.
[4-1: Configuration of information processing device 4]

 図6を参照しながら、第4の情報処理装置4の構成について説明する。図6は、第4の情報処理装置4の構成を示すブロック図である。 The configuration of the fourth information processing device 4 will be described with reference to FIG. 6. FIG. 6 is a block diagram showing the configuration of the fourth information processing device 4.

 図6に示すように、第4の情報処理装置4は、第2の情報処理装置2、及び第3の情報処理装置3と同様に、演算装置21と、記憶装置22とを備えている。更に、第4の情報処理装置4は、第2の情報処理装置2、及び第3の情報処理装置3と同様に、通信装置23と、入力装置24と、出力装置25とを備えていてもよい。但し、情報処理装置4は、通信装置23、入力装置24及び出力装置25のうちの少なくとも1つを備えていなくてもよい。第4の情報処理装置4は、演算装置21内に照合部417、成りすまし判定部418、及び認証部419が更に実現される点で、第2の情報処理装置2、及び第3の情報処理装置3と異なる。情報処理装置4のその他の特徴は、情報処理装置2、及び情報処理装置3の少なくとも一方のその他の特徴と同一であってもよい。このため、以下では、すでに説明した実施形態と異なる部分について詳細に説明し、その他の重複する部分については適宜説明を省略するものとする。 6, the fourth information processing device 4 includes a calculation device 21 and a storage device 22, similar to the second information processing device 2 and the third information processing device 3. Furthermore, the fourth information processing device 4 may include a communication device 23, an input device 24, and an output device 25, similar to the second information processing device 2 and the third information processing device 3. However, the information processing device 4 may not include at least one of the communication device 23, the input device 24, and the output device 25. The fourth information processing device 4 differs from the second information processing device 2 and the third information processing device 3 in that a matching unit 417, a spoofing determination unit 418, and an authentication unit 419 are further realized in the calculation device 21. Other features of the information processing device 4 may be the same as other features of at least one of the information processing device 2 and the information processing device 3. Therefore, hereinafter, the parts that are different from the embodiments already described will be described in detail, and other overlapping parts will be omitted as appropriate.

 第4の情報処理装置4は、人物の生体認証を実施可能な機構である。情報処理装置4は、画像を用いた照合動作をするとともに、画像を用いて人物が成りすましているか否かを判定し、人物を認証可能な機構であってもよい。 The fourth information processing device 4 is a mechanism capable of performing biometric authentication of a person. The information processing device 4 may be a mechanism capable of performing a matching operation using an image, and determining whether or not a person is impersonating another person using the image, thereby authenticating the person.

 本開示の第4の情報処理装置4は、電子本人確認(electronic Know Your Customer:eKYC)等のオンラインでの本人確認に適用されてもよい。上述したように、人物の顔写真一枚の情報を基に、当該人物の画像を合成する技術が存在しており、eKYCにおける成りすましの脅威となっている。フェイク動画か否かの正確な判定はeKYCのようなサービスの信頼性を高めるうえで重要な課題である。eKYCに対する入力には、フェイク動画を合成するための情報となる公的な文書の顔画像が含まれている。つまり、eKYCに対する成りすましとして、運転免許証、マイナンバーカード等の公的な文書の顔画像のように限られた情報を基にフェイク動画を合成し入力する手法が考えられる。
 [4-2:情報処理装置4が行う情報処理動作]
The fourth information processing device 4 of the present disclosure may be applied to online identity verification such as electronic know your customer (eKYC). As described above, there is a technology that synthesizes an image of a person based on information of a single face photograph of the person, which poses a threat of impersonation in eKYC. Accurate determination of whether or not a video is fake is an important issue in increasing the reliability of services such as eKYC. The input to eKYC includes a face image of an official document that serves as information for synthesizing the fake video. In other words, as an impersonation method for eKYC, a method of synthesizing and inputting a fake video based on limited information such as a face image of an official document such as a driver's license or a My Number card can be considered.
[4-2: Information Processing Operation Performed by Information Processing Device 4]

 図7を参照しながら、情報処理装置4が行う情報処理動作の流れについて説明する。図7は、情報処理装置4が行う情報処理動作の流れを示すフローチャートである。なお、第4実施形態でも、第2の画像が複数のフレームを含む動画像である場合を説明し、第2の画像を判定対象動画とよぶ。 With reference to FIG. 7, the flow of information processing operations performed by the information processing device 4 will be described. FIG. 7 is a flowchart showing the flow of information processing operations performed by the information processing device 4. Note that in the fourth embodiment, a case will also be described in which the second image is a moving image including a plurality of frames, and the second image will be referred to as a determination target moving image.

 図7に示す様に、受付部311は、第1の画像としてのソース画像の入力を受け付ける(ステップS20)。受付部311は、ソース画像として、運転免許証、マイナンバーカード等の本人確認書類の顔写真の入力を受け付けてもよい。受付部311は、第2の画像としての判定対象動画の入力を受け付ける(ステップS30)。 As shown in FIG. 7, the reception unit 311 receives an input of a source image as a first image (step S20). The reception unit 311 may receive an input of a facial photograph on an identification document such as a driver's license or a My Number card as the source image. The reception unit 311 receives an input of a video to be judged as a second image (step S30).

 照合部417は、人物の顔画像を照合する(ステップS40)。第1の画像が運転免許証、マイナンバーカード等の公的な文書の顔画像である場合、照合部417は、第1の画像に写る人物と、判定対象動画に写る人物とを照合してもよい。この場合、第1の画像に写る人物と、判定対象動画に写る人物との照合に失敗した際は、当該情報処理動作は終了してもよい。または、照合部417は、受け付けた第1の画像と予め登録されている登録顔画像とを照合してもよい。または、照合部417は、受け付けた判定対象動画と予め登録されている登録顔画像とを照合してもよい。すなわち、照合部417は、第1の画像に写る人物、及び判定対象動画に写る人物の少なくとも一方の照合をしてもよい。 The matching unit 417 matches the facial image of the person (step S40). If the first image is a facial image on an official document such as a driver's license or a My Number card, the matching unit 417 may match the person appearing in the first image with the person appearing in the video to be judged. In this case, if the matching between the person appearing in the first image and the person appearing in the video to be judged fails, the information processing operation may be terminated. Alternatively, the matching unit 417 may match the received first image with a registered facial image that has been registered in advance. Alternatively, the matching unit 417 may match the received video to be judged with a registered facial image that has been registered in advance. In other words, the matching unit 417 may match at least one of the person appearing in the first image and the person appearing in the video to be judged.

 なお、第1の画像としてのソース画像と、当該ソース画像に基づいて合成されたフェイク動画とは似ているので、判定対象動画がフェイク動画であった場合にも、第1の画像と判定対象動画との照合が成功する可能性は高い。 In addition, since the source image as the first image and the fake video synthesized based on the source image are similar, even if the video to be judged is a fake video, there is a high possibility that the first image will be successfully matched with the video to be judged.

 成りすまし判定部418は、判定対象動画を用いて成りすまし判定を実施する(ステップS41)。第4実施形態において、判定対象動画は、フェイク動画であるか否かの判定とともに、成りすまし判定に用いられてもよい。例えば、判定対象動画は、情報処理装置4からの指示により人物が実施した動作が写る動画であってもよい。情報処理装置4は、顔の向き、視線の向き、顔の位置を指示してもよい。情報処理装置4は、視線を誘導してもよい。情報処理装置4は、ジェスチャーを指示してもよい。成りすまし判定部418は、判定対象動画を用いてアクティブライブネス判定を実施してもよい。 The spoofing determination unit 418 performs spoofing determination using the video to be determined (step S41). In the fourth embodiment, the video to be determined may be used for spoofing determination along with determining whether it is a fake video or not. For example, the video to be determined may be a video showing an action performed by a person in response to an instruction from the information processing device 4. The information processing device 4 may instruct the face direction, gaze direction, and face position. The information processing device 4 may guide the gaze. The information processing device 4 may instruct a gesture. The spoofing determination unit 418 may perform active liveness determination using the video to be determined.

 検出部3121は、判定対象動画が含む1以上のフレームの各々からランドマークを検出する(ステップS31)。合成部312は、ソース画像、及び判定対象動画が含む1以上のフレームの各々のランドマークに基づいて、合成動画を生成する(ステップS32)。合成部312は、ソース画像としての顔写真に基づいて、合成画像を生成する。 The detection unit 3121 detects landmarks from each of one or more frames included in the video to be judged (step S31). The synthesis unit 312 generates a composite video based on the source image and the landmarks from each of one or more frames included in the video to be judged (step S32). The synthesis unit 312 generates a composite image based on a facial photograph as the source image.

 抽出部3131は、判定対象動画が含むフレームと、当該フレームに対応する合成動画が含むフレームとの差分を抽出する(ステップS33)。強調部3132は、差分を強調する(ステップS34)。 The extraction unit 3131 extracts the difference between a frame included in the judgment target video and a frame included in the composite video corresponding to that frame (step S33). The emphasis unit 3132 emphasizes the difference (step S34).

 算出部314は、動画差分に基づいて、判定対象動画のフェイク動画らしさを表す指標を算出する(ステップS35)。判定部315は、指標に応じて、判定対象動画がフェイク動画か否かを判定する(ステップS36)。判定部315は、指標と所定の閾値との比較により、判定対象動画がフェイク動画か否かを判定してもよい。 The calculation unit 314 calculates an index representing the likelihood that the video to be judged is a fake video based on the video difference (step S35). The determination unit 315 determines whether the video to be judged is a fake video or not based on the index (step S36). The determination unit 315 may determine whether the video to be judged is a fake video or not by comparing the index with a predetermined threshold value.

 指標が所定の閾値を超過した場合(ステップS36:Yes)、判定部315は、判定対象動画がフェイク動画であると判定する(ステップS37)。指標が所定の閾値を超過しなかった場合(ステップS36:No)、判定部315は、判定対象動画がフェイク動画ではないと判定する(ステップS38)。 If the index exceeds the predetermined threshold (step S36: Yes), the determination unit 315 determines that the video to be determined is a fake video (step S37). If the index does not exceed the predetermined threshold (step S36: No), the determination unit 315 determines that the video to be determined is not a fake video (step S38).

 判定部315が判定対象動画がフェイク動画ではないと判定した場合、認証部419は、照合部417による照合結果、及び成りすまし判定部418による判定結果に基づき、人物を認証する(ステップS42)。また、認証部419は、判定部315が判定対象動画が所定の基準よりもフェイク画像らしくないと判定し、かつ、成りすまし判定部418が人物は指示に従った動作をしたと判定したことを条件に、人物を認証してもよい。認証部419による人物の認証が成功した場合とは、人物の本人確認ができた場合であってもよい。出力部416は、人物の認証結果を出力する(ステップS43)。
 [4-3:情報処理装置4の技術的効果]
When the determination unit 315 determines that the video to be determined is not a fake video, the authentication unit 419 authenticates the person based on the collation result by the collation unit 417 and the determination result by the masquerade determination unit 418 (step S42). The authentication unit 419 may also authenticate the person on the condition that the determination unit 315 determines that the video to be determined is less likely to be a fake image than a predetermined standard and the masquerade determination unit 418 determines that the person has acted in accordance with the instructions. The case where the authentication unit 419 has successfully authenticated the person may be the case where the person's identity has been confirmed. The output unit 416 outputs the authentication result of the person (step S43).
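The authentication condition in step S42 combines the three results; a minimal boolean sketch (names illustrative, not from the disclosure):

```python
def authenticate(is_fake_video: bool, face_matched: bool, liveness_ok: bool) -> bool:
    """Authenticate the person only when the video is not judged fake,
    the face matching (step S40) succeeded, and the instructed-action
    liveness check (step S41) passed."""
    return (not is_fake_video) and face_matched and liveness_ok

print(authenticate(False, True, True))  # True  -> identity confirmed
print(authenticate(True, True, True))   # False -> rejected as a fake video
```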
[4-3: Technical Effects of Information Processing Device 4]

 本開示の第4の情報処理装置4は、入力された判定対象動画がフェイク動画か否かを精度よく検知することができるので、精度よく本人確認をすることができる。
 [5:付記]
The fourth information processing device 4 of the present disclosure can accurately detect whether an input video to be judged is a fake video or not, and can therefore perform identity verification with high accuracy.
[5: Supplementary Note]

 以上説明した実施形態に関して、更に以下の付記を開示する。
 [付記1]
 第1の画像及び第2の画像の入力を受け付ける受付手段と、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成する合成手段と、
 前記第2の画像と前記第3の画像との差分を強調する差分強調手段と、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出する算出手段と、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する判定手段と
 を備える情報処理装置。
 [付記2]
 前記差分強調手段は、
  前記第2の画像と前記第3の画像との差分を抽出する抽出手段と、
  前記差分を強調する強調手段とを含む
 付記1に記載の情報処理装置。
 [付記3]
 前記第2の画像は、複数のフレームを含む動画像であり、
 前記合成手段は、1以上のフレームを含む前記第3の画像を合成し、
 前記差分強調手段は、前記第2の画像が含むフレームと、当該フレームに対応する前記第3の画像が含むフレームとの差分を強調する
 付記1又は2に記載の情報処理装置。
 [付記4]
 前記差分強調手段は、前記第2の画像が含むフレームと、当該フレームに対応する前記第3の画像が含むフレームとの差分を強調した差分フレームを含む差分強調動画を生成し、
 前記算出手段は、前記差分強調動画に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出する
 付記3に記載の情報処理装置。
 [付記5]
 前記合成手段は、前記第2の画像からランドマークを検出する検出手段を含み、
 前記第1の画像、及び前記ランドマークに基づいて、前記第3の画像を合成する
 付記3に記載の情報処理装置。
 [付記6]
 前記判定手段は、前記指標と所定の閾値との比較により、前記第2の画像が合成された画像か否かを判定する
 付記1又は2に記載の情報処理装置。
 [付記7]
 前記第1の画像に写る対象、及び前記第2の画像に写る対象の少なくとも一方を照合する照合手段と、
 前記判定手段による判定結果、及び前記照合手段による照合結果の少なくとも一方に基づいて、前記対象を認証する認証手段と
 を備える付記1又は2に記載の情報処理装置。
 [付記8]
 第1の画像及び第2の画像の入力を受け付け、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成し、
 前記第2の画像と前記第3の画像との差分を強調し、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出し、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する
 情報処理方法。
 [付記9]
 コンピュータに、
 第1の画像及び第2の画像の入力を受け付け、
 前記第1の画像及び前記第2の画像に基づいて、第3の画像を合成し、
 前記第2の画像と前記第3の画像との差分を強調し、
 前記強調された差分に基づいて、前記第2の画像の合成された画像らしさを表す指標を算出し、
 前記指標に応じて、前記第2の画像が合成された画像か否かを判定する
 情報処理方法を実行させるためのコンピュータプログラムが記録されている記録媒体。
The following supplementary notes are further disclosed regarding the above-described embodiment.
[Appendix 1]
A receiving means for receiving an input of a first image and a second image;
a synthesizing means for synthesizing a third image based on the first image and the second image;
a difference enhancing means for enhancing a difference between the second image and the third image;
a calculation means for calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference;
and determination means for determining, in accordance with the index, whether or not the second image is a synthesized image.
[Appendix 2]
The difference emphasis means is
an extraction means for extracting a difference between the second image and the third image;
and highlighting means for highlighting the difference.
The information processing device according to claim 1.
[Appendix 3]
the second image is a video including a plurality of frames;
The synthesizing means synthesizes the third image including one or more frames;
The information processing device according to claim 1 or 2, wherein the difference emphasis means emphasizes a difference between a frame included in the second image and a frame included in the third image corresponding to the frame.
[Appendix 4]
the difference emphasizing means generates a difference-emphasized video including a difference frame in which a difference between a frame included in the second image and a frame included in the third image corresponding to the frame is emphasized;
The information processing device according to claim 3, wherein the calculation means calculates an index representing a likelihood that the second image is a synthesized image based on the difference-emphasized moving image.
[Appendix 5]
The synthesis means includes a detection means for detecting a landmark from the second image,
and synthesizes the third image based on the first image and the landmarks.
The information processing device according to claim 3.
[Appendix 6]
The information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
[Appendix 7]
A matching means for matching at least one of an object appearing in the first image and an object appearing in the second image;
and an authentication unit that authenticates the target based on at least one of a determination result by the determination unit and a matching result by the matching unit.
[Appendix 8]
Accepting input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
highlighting a difference between the second image and the third image;
calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference;
determining whether the second image is a synthesized image or not according to the indicator.
[Appendix 9]
On the computer,
Accepting input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
highlighting a difference between the second image and the third image;
calculating an index representing a likelihood of the second image being a synthesized image based on the emphasized difference;
determining whether the second image is a synthesized image or not according to the index.

 以上、実施の形態を参照して本開示を説明したが、本開示は上述の実施の形態に限定されるものではない。本開示の構成や詳細には、本開示のスコープ内で当業者が理解し得る様々な変更をすることができる。そして、各実施の形態は、適宜他の実施の形態と組み合わせることができる。 The present disclosure has been described above with reference to the embodiments, but the present disclosure is not limited to the above-mentioned embodiments. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present disclosure within the scope of the present disclosure. Furthermore, each embodiment can be combined with other embodiments as appropriate.

1, 2, 3, 4 Information processing device
11, 211, 311 Reception unit
12, 212, 312 Synthesis unit
13, 213, 313 Difference emphasis unit
2131, 3131 Extraction unit
2132, 3132 Emphasis unit
214, 314 Calculation unit
215, 315 Determination unit
216, 316, 416 Output unit
3121 Detection unit
417 Collation unit
418 Impersonation determination unit
419 Authentication unit

Claims (9)

An information processing device comprising:
a receiving means for receiving input of a first image and a second image;
a synthesizing means for synthesizing a third image based on the first image and the second image;
a difference emphasizing means for emphasizing a difference between the second image and the third image;
a calculating means for calculating an index representing a likelihood that the second image is a synthesized image, based on the emphasized difference; and
a determining means for determining whether the second image is a synthesized image according to the index.
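Read as an algorithm, the claim describes a four-stage pipeline: synthesize a third image from the two inputs, emphasize the pixel-wise difference, reduce it to a scalar index, then decide. The following is a minimal illustrative sketch only, not the claimed implementation: every function name is hypothetical, the "images" are toy grayscale lists, and the trivial averaging "synthesis" stands in for whatever generative model an actual system would use.

```python
def synthesize(first, second):
    # Placeholder synthesis: blend the two inputs pixel by pixel
    # (a real system would use a face-swap / reenactment model).
    return [[(a + b) // 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(first, second)]

def emphasize_difference(second, third, gain=4):
    # Amplify per-pixel differences so subtle synthesis artifacts stand out,
    # clipping to the 8-bit range.
    return [[min(255, abs(a - b) * gain) for a, b in zip(r1, r2)]
            for r1, r2 in zip(second, third)]

def likelihood_index(diff):
    # Mean emphasized difference, normalized to [0, 1].
    flat = [p for row in diff for p in row]
    return sum(flat) / (255 * len(flat))

def is_synthesized(index, threshold=0.1):
    # Decision stage (threshold value is arbitrary here).
    return index > threshold

first = [[100, 100], [100, 100]]    # reference image
second = [[100, 100], [100, 180]]   # image under test
third = synthesize(first, second)
diff = emphasize_difference(second, third)
index = likelihood_index(diff)
print(is_synthesized(index))        # → True
```

The gain, the averaging reducer, and the threshold are all placeholders; the claim only requires that *some* emphasized difference yield *some* index that is then compared.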
The information processing device according to claim 1, wherein the difference emphasizing means includes:
an extracting means for extracting the difference between the second image and the third image; and
an emphasizing means for emphasizing the difference.
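This claim factors difference emphasis into two distinct steps, extraction followed by amplification. A toy illustration of that decomposition (all names hypothetical; gain and data are arbitrary):

```python
def extract_difference(second, third):
    # Step 1: raw per-pixel absolute difference.
    return [[abs(a - b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(second, third)]

def emphasize(diff, gain=8):
    # Step 2: amplify the extracted difference, clipped to 8-bit range.
    return [[min(255, d * gain) for d in row] for row in diff]

raw = extract_difference([[10, 12]], [[10, 20]])
emphasized = emphasize(raw)
print(emphasized)  # → [[0, 64]]
```

Keeping the two steps separate means the amplification rule (linear gain here, but it could equally be a nonlinear or learned mapping) can change without touching the extraction.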
The information processing device according to claim 1 or 2, wherein:
the second image is a moving image including a plurality of frames;
the synthesizing means synthesizes the third image including one or more frames; and
the difference emphasizing means emphasizes a difference between a frame included in the second image and a corresponding frame included in the third image.
The information processing device according to claim 3, wherein:
the difference emphasizing means generates a difference-emphasized moving image including difference frames in which a difference between a frame included in the second image and a corresponding frame included in the third image is emphasized; and
the calculating means calculates the index representing a likelihood that the second image is a synthesized image, based on the difference-emphasized moving image.
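For the moving-image case in claims 3 and 4, the same emphasis runs frame by frame, and the index is computed over the whole difference-emphasized sequence. A sketch under the same toy assumptions as above (hypothetical names; 1x2-pixel frames stand in for real video):

```python
def emphasize_frame(f2, f3, gain=4):
    # Emphasized per-pixel difference for one frame pair.
    return [[min(255, abs(a - b) * gain) for a, b in zip(r2, r3)]
            for r2, r3 in zip(f2, f3)]

def difference_video(video2, video3, gain=4):
    # Pair each frame of the video under test with its synthesized counterpart.
    return [emphasize_frame(f2, f3, gain) for f2, f3 in zip(video2, video3)]

def video_index(diff_video):
    # Scalar index over every pixel of every difference frame.
    pixels = [p for frame in diff_video for row in frame for p in row]
    return sum(pixels) / (255 * len(pixels))

video2 = [[[100, 100]], [[100, 150]]]  # two frames under test
video3 = [[[100, 100]], [[100, 100]]]  # synthesized counterpart
dv = difference_video(video2, video3)
print(round(video_index(dv), 3))
```

Aggregating over the whole sequence rather than a single frame is one way a temporal artifact (flicker confined to a few frames) could still move the index.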
The information processing device according to claim 3, wherein the synthesizing means includes a detecting means for detecting landmarks from the second image, and synthesizes the third image based on the first image and the landmarks.
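Here the synthesis is driven by landmarks detected in the second image (for a face, typically keypoints such as eye corners and mouth outline). The sketch below is purely schematic: the one-point "detector" and marker-copy "synthesis" are hypothetical stand-ins for a real landmark detector and a warp of the first image onto the detected pose.

```python
def detect_landmarks(image):
    # Toy "detector": return the coordinates of the brightest pixel.
    # A real detector would return dozens of keypoints.
    h = max(range(len(image)), key=lambda r: max(image[r]))
    w = max(range(len(image[h])), key=lambda c: image[h][c])
    return [(h, w)]

def synthesize_with_landmarks(first, landmarks):
    # Copy the first image and mark the landmark positions, standing in
    # for warping the first image onto the pose found in the second image.
    third = [row[:] for row in first]
    for r, c in landmarks:
        third[r][c] = 255
    return third

second = [[10, 10], [10, 200]]
lms = detect_landmarks(second)
print(synthesize_with_landmarks([[0, 0], [0, 0]], lms))
```

The point of the claim is the data flow: appearance comes from the first image, geometry from landmarks of the second.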
The information processing device according to claim 1 or 2, wherein the determining means determines whether the second image is a synthesized image by comparing the index with a predetermined threshold value.
The information processing device according to claim 1 or 2, further comprising:
a matching means for matching at least one of a target appearing in the first image and a target appearing in the second image; and
an authenticating means for authenticating the target based on at least one of a determination result by the determining means and a matching result by the matching means.
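This claim layers biometric matching on top of the synthesis check, so that a face which matches the enrolled target but is judged synthetic (an impersonation attempt) can still be rejected. One plausible combination rule, shown only as an assumption since the claim covers other combinations of the two results:

```python
def authenticate(match_ok, judged_synthesized):
    # Accept only when matching succeeded AND the image is not judged
    # synthetic; a deepfake of the right face therefore fails.
    return match_ok and not judged_synthesized

print(authenticate(True, False))  # genuine matching image → True
print(authenticate(True, True))   # matching but synthesized → False
```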
An information processing method comprising:
receiving input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
emphasizing a difference between the second image and the third image;
calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference; and
determining whether the second image is a synthesized image according to the index.
A recording medium having recorded thereon a computer program for causing a computer to execute an information processing method comprising:
receiving input of a first image and a second image;
synthesizing a third image based on the first image and the second image;
emphasizing a difference between the second image and the third image;
calculating an index representing a likelihood that the second image is a synthesized image based on the emphasized difference; and
determining whether the second image is a synthesized image according to the index.
PCT/JP2023/022755 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium Pending WO2024261856A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/022755 WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/022755 WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Publications (1)

Publication Number Publication Date
WO2024261856A1 true WO2024261856A1 (en) 2024-12-26

Family

ID=93935146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022755 Pending WO2024261856A1 (en) 2023-06-20 2023-06-20 Information processing device, information processing method, and recording medium

Country Status (1)

Country Link
WO (1) WO2024261856A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111861956A (en) * 2020-06-24 2020-10-30 北京金山云网络技术有限公司 Picture processing method and device, electronic equipment and medium
JP2021089219A (en) * 2019-12-05 2021-06-10 東洋製罐グループホールディングス株式会社 Image inspection system and image inspection method
US20220058375A1 (en) * 2020-02-21 2022-02-24 Samsung Electronics Co., Ltd. Server, electronic device, and control methods therefor

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021089219A (en) * 2019-12-05 2021-06-10 東洋製罐グループホールディングス株式会社 Image inspection system and image inspection method
US20220058375A1 (en) * 2020-02-21 2022-02-24 Samsung Electronics Co., Ltd. Server, electronic device, and control methods therefor
CN111861956A (en) * 2020-06-24 2020-10-30 北京金山云网络技术有限公司 Picture processing method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
JP7365445B2 (en) Computing apparatus and method
US11790494B2 (en) Facial verification method and apparatus based on three-dimensional (3D) image
CN108664782B (en) Facial verification methods and devices
JP7046625B2 (en) Face recognition method and equipment
KR102434562B1 (en) Method and apparatus for detecting fake fingerprint, method and apparatus for recognizing fingerprint
WO2017101267A1 (en) Method for identifying living face, terminal, server, and storage medium
CN108664879A (en) Face authentication method and apparatus
CN109766785B (en) Method and device for liveness detection of human face
CN114627543A (en) Method and apparatus for face recognition
US12236717B2 (en) Spoof detection based on challenge response analysis
WO2018234384A1 (en) DETECTION OF FACIAL ARTIFICIAL IMAGES USING FACIAL REFERENCES
KR101897072B1 (en) Method and apparatus for verifying facial liveness in mobile terminal
CN113992812A (en) Method and apparatus for activity detection
WO2024261856A1 (en) Information processing device, information processing method, and recording medium
KR20180108361A (en) Method and apparatus for verifying face
WO2024142399A1 (en) Information processing device, information processing system, information processing method, and recording medium
JP2008009617A (en) System, program, and method for individual biological information collation
EP4645214A1 (en) Information processing device, information processing system, information processing method, and recording medium
JP2022522251A (en) Handwritten signature authentication method and device based on multiple verification algorithms
US12112220B1 (en) Authenticating a physical card using sensor data
CN119312308B (en) Authentication method, device, computer equipment and storage medium
KR102579610B1 (en) Apparatus for Detecting ATM Abnormal Behavior and Driving Method Thereof
CN209690934U (en) A kind of face identification device
TWI576717B (en) Dimensional biometric identification system and method
WO2024105778A1 (en) Information processing device, information processing method, and recording medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23942293

Country of ref document: EP

Kind code of ref document: A1