CN118427755A - Training method, deep counterfeiting identification method and electronic equipment - Google Patents
Training method, deep counterfeiting identification method and electronic equipment
- Publication number
- CN118427755A (application number CN202410582096.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- sample data
- depth
- sample
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present application provides a training method, a deep fake identification method, and an electronic device. When the sample data set is obtained, picture sample data, audio sample data, and video sample data are all collected, and a target deep fake identification model is trained on the picture, audio, and video sample data. The resulting target deep fake identification model can therefore authenticate picture data, audio data, and video data alike, realizing a cross-modal comprehensive identification function with a single identification model.
Description
Technical Field
The present application relates to the field of digital content security technology, and in particular to a training method, a deep fake identification method and an electronic device.
Background
With the development of artificial intelligence technology, deep fake technology can generate realistic image, audio and video content, which to a certain extent threatens social order and personal privacy. Studying an effective deep fake identification method is therefore of great significance for ensuring the security of digital content. However, current deep fake identification techniques can usually only identify content of a single modality; for example, they can identify only image content, or only audio content.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a training method, a deep fake identification method and an electronic device to solve the above-mentioned technical problems.
In one aspect, a method for training a deep fake identification model is provided, the method comprising:
obtaining a sample data set, where the sample data set includes a plurality of sample data, each sample data is marked with label information indicating whether the sample data is deep fake data, and the sample data set includes picture sample data, audio sample data and video sample data;
preprocessing each of the sample data to obtain preprocessed sample data;
performing feature extraction on each of the preprocessed sample data to obtain sample data features corresponding to each of the sample data; and
training based on the sample data features and a preset initial deep fake identification model to obtain a target deep fake identification model.
In one embodiment, preprocessing each of the sample data to obtain preprocessed sample data includes:
performing image normalization on each of the picture sample data to obtain preprocessed picture sample data, performing audio noise reduction on each of the audio sample data to obtain preprocessed audio sample data, and performing key frame extraction on each of the video sample data to obtain preprocessed video sample data.
In one embodiment, performing feature extraction on each of the preprocessed sample data to obtain sample data features corresponding to each of the sample data includes:
performing feature extraction on the preprocessed picture sample data to obtain corresponding sample visual features, performing feature extraction on the preprocessed audio sample data to obtain corresponding sample auditory features, and performing feature extraction on the preprocessed video sample data to obtain corresponding features combining sample visual features and sample auditory features, where the sample visual features include at least one of color features, texture features and shape features, and the sample auditory features include at least one of pitch features, sound rhythm features and sound intensity features.
In one embodiment, the loss function of the initial deep fake identification model is the binary cross-entropy loss
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$
where $\hat{y}_i$ denotes the model prediction for the i-th sample data, $y_i$ denotes the true label value of the i-th sample data, $N$ denotes the total number of sample data, and $L$ denotes the loss value.
In another aspect, a deep fake identification method is provided, comprising:
obtaining data to be identified;
preprocessing the data to be identified to obtain preprocessed data to be identified;
performing feature extraction on the preprocessed data to be identified to obtain features of the data to be identified;
inputting the features of the data to be identified into any of the above target deep fake identification models to obtain a deep fake score of the data to be identified; and
comparing the deep fake score with a preset deep fake score threshold, and obtaining an identification result of the data to be identified based on the comparison result.
In one embodiment, the data to be identified includes video data to be identified, and preprocessing the data to be identified to obtain preprocessed data to be identified includes:
extracting key frames from the video data to be identified to obtain video key frame data; and
performing image normalization and audio noise reduction on each of the video key frame data to obtain the preprocessed data to be identified.
In one embodiment, comparing the deep fake score with a preset deep fake score threshold and obtaining an identification result of the data to be identified based on the comparison result includes:
when the deep fake score is less than a preset first deep fake score threshold, determining that the data to be identified is real data; and
when the deep fake score is greater than a preset second deep fake score threshold, determining that the data to be identified is deep fake data.
In one embodiment, the second deep fake score threshold is greater than the first deep fake score threshold, and comparing the deep fake score with a preset deep fake score threshold and obtaining an identification result of the data to be identified based on the comparison result includes:
when the deep fake score is greater than the first deep fake score threshold and less than the second deep fake score threshold, determining that the data to be identified is data whose authenticity is pending.
In one embodiment, before comparing the deep fake score with the preset deep fake score threshold, the method includes:
determining the source scenario of the data to be identified; and
determining, according to a preset correspondence between source scenarios and deep fake score thresholds, the deep fake score threshold corresponding to the source scenario of the data to be identified.
In another aspect, an electronic device is provided, including a processor and a memory, where the memory stores a computer program and the processor executes the computer program to implement any of the above methods.
Through the training method, deep fake identification method and electronic device provided by the present application, picture sample data, audio sample data and video sample data are all obtained when the sample data set is acquired, and the target deep fake identification model is trained on these three modalities. The resulting target deep fake identification model can therefore authenticate picture data, audio data and video data alike, realizing a cross-modal comprehensive identification function with a single identification model.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the training method for a deep fake identification model provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of the deep fake identification method provided in Embodiment 1 of the present application;
FIG. 3 is a schematic structural diagram of the electronic device provided in Embodiment 2 of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the present application is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
Embodiment 1:
An embodiment of the present application provides a method for training a deep fake identification model, as shown in FIG. 1, comprising the following steps:
S11: Obtain a sample data set. The sample data set includes multiple sample data, each sample data is marked with label information indicating whether the sample data is deep fake data, and the sample data set includes picture sample data, audio sample data and video sample data.
S12: Preprocess each sample data to obtain preprocessed sample data.
S13: Perform feature extraction on each preprocessed sample data to obtain sample data features corresponding to each sample data.
S14: Train based on the sample data features and a preset initial deep fake identification model to obtain a target deep fake identification model.
The contents of each of the above steps are described in detail below.
In step S11, a large amount of real sample data and deep fake sample data can be collected from a variety of different source scenarios. The sample data can cover different people, scenes, expressions and actions to ensure diversity and representativeness. In addition, the deep fake sample data should include fake content of different quality, from low resolution to high resolution and from simple editing to complex synthesis, to simulate the variety of situations found in the real world.
Illustratively, the sample data set in this embodiment may include N real sample data and M deep fake sample data. Specifically, sample data may be collected through the following channels:
Public datasets: use existing deep fake and real content datasets, such as CelebA and the DeepFakeDetection Challenge.
Social media: obtain publicly shared real pictures, real audio and real video, as well as content marked as deep fakes, from social media platforms.
Partners: cooperate with film production companies, news organizations and the like to obtain high-quality real and fake sample data.
To ensure the diversity of the collected sample data set, the collected sample data should cover different:
Scenes: indoor, outdoor, natural, urban, etc.
Expressions: joy, anger, sorrow, happiness, etc.
Lighting conditions: strong light, weak light, shadow, etc.
In step S12, different preprocessing strategies should be applied to different types of sample data, specifically including:
performing image normalization on each picture sample data to obtain preprocessed picture sample data, performing audio noise reduction on each audio sample data to obtain preprocessed audio sample data, and performing key frame extraction on each video sample data to obtain preprocessed video sample data.
It should be noted that video sample data usually contains both image content and audio content. Therefore, after key frames are extracted from the video sample data, image normalization and audio noise reduction can be performed on the image content and audio content corresponding to the key frames, respectively, to obtain preprocessed video sample data. Key frame extraction can be implemented with a key frame detection algorithm that selects key frames based on motion vectors V and color changes C.
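The text does not give the key frame detection algorithm in code; the following is a minimal sketch of one common realization, in which "motion" is approximated by the mean absolute frame difference (standing in for V) and "color change" by a histogram distance (standing in for C). The function name and the thresholds motion_thresh and color_thresh are hypothetical tuning parameters, not values from the text.

```python
# Minimal key-frame detection sketch: frame difference as a proxy for
# motion vectors V, color-histogram distance as a proxy for color change C.
import cv2
import numpy as np

def extract_key_frames(video_path, motion_thresh=30.0, color_thresh=0.5):
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray, prev_hist = [], None, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_gray is None:
            key_frames.append(frame)  # always keep the first frame
        else:
            motion = np.mean(cv2.absdiff(gray, prev_gray))        # ~V
            color_change = np.linalg.norm(hist - prev_hist)       # ~C
            if motion > motion_thresh or color_change > color_thresh:
                key_frames.append(frame)
        prev_gray, prev_hist = gray, hist
    cap.release()
    return key_frames
```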
In this embodiment, the same image normalization strategy is used for the picture sample data and for the image content of the video sample data, including but not limited to adjusting the brightness, contrast, saturation, size and color of the images, so that the image content of each picture sample data and video sample data meets the input standard of the initial deep fake identification model.
Illustratively, image brightness normalization can be performed based on the formula $I_{\text{norm}} = (I - \mu_I)/\sigma_I$, where $I$ denotes the original image, that is, the original brightness value at each pixel of the image content in the picture sample data or video sample data, $\mu_I$ denotes the average brightness of the original image, and $\sigma_I$ denotes the standard deviation of the brightness of the original image. The brightness of each image can be adjusted based on this formula; it will be understood that the contrast and saturation of each image can be adjusted by analogy.
When resizing, each image can be adjusted to a uniform resolution, such as 224x224 pixels. When adjusting color, a standardization method can be used so that the RGB values of the image are distributed with a mean of 0 and a standard deviation of 1.
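As a concrete illustration, the resizing and per-channel standardization described above could look like the sketch below. The 224x224 target size comes from the text; the function name and the use of OpenCV are assumptions of the sketch.

```python
# Image normalization sketch: resize to a uniform resolution, then
# standardize each channel, i.e. I_norm = (I - mu_I) / sigma_I per channel.
import cv2
import numpy as np

def normalize_image(image, size=(224, 224), eps=1e-8):
    image = cv2.resize(image, size).astype(np.float32)
    mean = image.mean(axis=(0, 1), keepdims=True)   # mu_I per channel
    std = image.std(axis=(0, 1), keepdims=True)     # sigma_I per channel
    return (image - mean) / (std + eps)             # zero mean, unit std
```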
When performing audio noise reduction on audio sample data, or on the audio content corresponding to video sample data, signal processing techniques can be used to remove background noise and improve the clarity of the audio. The noise reduction can use an adaptive filter whose output can be expressed as
$$y[n] = \sum_{m=0}^{M-1} h[m]\,x[n-m]$$
where $x[n]$ is the original audio signal, a sequence of samples in time with sample index $n$; $h[m]$ is the impulse response of the filter, a vector of weight coefficients indexed by $m$, which is combined with $x[n-m]$ to produce the filtered output; $y[n]$ is the filtered output signal; and $M$ is the length of the impulse response, i.e., the order or size of the filter.
It should be noted that audio noise reduction further includes: after removing the background noise, converting the time-domain audio signal into a frequency-domain representation, for example using the FFT (fast Fourier transform) or MFCCs (Mel-frequency cepstral coefficients).
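A minimal sketch of this pipeline, assuming a fixed (rather than adaptively updated) FIR smoothing filter for the denoising step and librosa for the MFCC conversion; the filter length and n_mfcc=13 are illustrative choices, not values from the text.

```python
# Audio preprocessing sketch: FIR filtering y[n] = sum_m h[m] x[n-m],
# followed by conversion of the time-domain signal to MFCC features.
import numpy as np
import librosa
from scipy.signal import lfilter

def preprocess_audio(path, filter_len=16, n_mfcc=13):
    x, sr = librosa.load(path, sr=None)        # time-domain signal x[n]
    h = np.ones(filter_len) / filter_len       # h[m]: simple moving average
    y = lfilter(h, 1.0, x)                     # filtered output y[n]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc                                # frequency-domain features
```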
Step S13 of this embodiment may include:
performing feature extraction on the preprocessed picture sample data to obtain corresponding sample visual features, performing feature extraction on the preprocessed audio sample data to obtain corresponding sample auditory features, and performing feature extraction on the preprocessed video sample data to obtain corresponding features combining sample visual features and sample auditory features, where the sample visual features include at least one of color features, texture features and shape features, and the sample auditory features include at least one of pitch features, sound rhythm features and sound intensity features.
It should be noted that in step S13, deep learning models can be used for feature extraction. Specifically, a CNN (convolutional neural network) can be used to extract features from the preprocessed picture sample data or preprocessed video sample data to obtain the corresponding visual features, and an RNN (recurrent neural network) or LSTM (long short-term memory network) can be used to extract features from the preprocessed audio sample data to obtain the corresponding auditory features.
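A compact PyTorch sketch of such extractors follows; the layer sizes and the choice to concatenate visual and auditory features for video are assumptions, since the text only specifies a CNN for images and an RNN/LSTM for audio.

```python
# Feature extraction sketch: CNN for visual features, LSTM for auditory
# features; video features are the concatenation of the two.
import torch
import torch.nn as nn

class VisualExtractor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, images):               # (B, 3, 224, 224)
        return self.fc(self.conv(images))    # (B, feat_dim)

class AuditoryExtractor(nn.Module):
    def __init__(self, n_mfcc=13, feat_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, feat_dim, batch_first=True)

    def forward(self, mfcc):                 # (B, T, n_mfcc)
        _, (h, _) = self.lstm(mfcc)
        return h[-1]                         # (B, feat_dim)

def video_features(frames, mfcc, vis_ext, aud_ext):
    # video = visual features of key frames + auditory features, concatenated
    return torch.cat([vis_ext(frames), aud_ext(mfcc)], dim=-1)
```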
The loss function of the initial deep fake identification model in this embodiment is the binary cross-entropy loss
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$
where $\hat{y}_i$ denotes the model prediction for the i-th sample data, $y_i$ denotes the true label value of the i-th sample data, $N$ denotes the total number of sample data, and $L$ denotes the loss value. The goal of model training is to minimize this loss function.
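A quick numerical sketch of this loss (the formula reconstructed above, matching the binary cross-entropy named in the training-strategy list further below):

```python
# Binary cross-entropy: L = -(1/N) * sum_i [y_i log y^_i + (1-y_i) log(1-y^_i)]
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

# e.g. two samples, one real (label 0) and one fake (label 1):
print(bce_loss(np.array([0, 1]), np.array([0.1, 0.8])))  # ~0.164
```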
It should be noted that the following deep learning architectures can be used in the model training stage:
A VAE (variational autoencoder) learns a low-dimensional representation of real content and uses it to reconstruct real content, which serves as a baseline for identifying forged content.
A GAN (generative adversarial network) generates samples of forged content to train the discriminator and improve its ability to recognize forged content.
A Transformer model handles long-range dependencies in video and audio data, for example the correlation between consecutive frames in a video or long-term spectral features in audio.
A VAE is a generative model that compresses the input data into a low-dimensional latent representation through an encoder and then reconstructs the input data through a decoder. Its key feature is the use of variational inference to optimize the distribution of the latent space so that it is close to a prior distribution. The VAE objective can be expressed as
$$\max_{q,p}\; \mathbb{E}_{q(z\mid x)}[\log p(x\mid z)] - \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)$$
where $q$ is the encoder distribution, $p$ is the decoder distribution, and KL is the Kullback-Leibler divergence. The VAE architecture can be described as follows (see the sketch after this list):
1) Encoder $q_\phi(z\mid x) = \mathcal{N}(\mu_\phi(x), \sigma_\phi(x)^2 I)$: the input data $x$ is mapped to a distribution over the latent space $z$, where $\mu_\phi$ and $\sigma_\phi$ are parameterized functions giving the mean and (log-)variance of the latent representation.
2) Decoder $p_\theta(x\mid z) = \mathcal{N}(\mu_\theta(z), \sigma_\theta(z)^2 I)$: the latent representation $z$ is mapped back to the data space, where $\mu_\theta$ and $\sigma_\theta$ are parameterized functions giving the mean and (log-)variance of the reconstructed data.
3) The loss function consists of two parts: the reconstruction loss, which measures the difference between the reconstructed data and the original data, and the KL divergence, which measures the difference between the distribution of the latent representation and the prior distribution.
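A minimal PyTorch sketch of this objective in its usual minimized form (negative ELBO = reconstruction loss + KL term), assuming a Gaussian encoder with a standard-normal prior; the MSE reconstruction term is one common concrete choice, not mandated by the text.

```python
# VAE loss sketch: negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I)).
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    recon = F.mse_loss(x_recon, x, reduction="sum")  # stands in for -E_q[log p(x|z)]
    # Closed-form KL between N(mu, sigma^2 I) and N(0, I):
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)
```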
A GAN consists of two parts, a generator and a discriminator, and training proceeds as an adversarial game between the generator G and the discriminator D, which compete with each other during training. The objective is the minimax game
$$\min_G \max_D\; \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1 - D(G(z)))]$$
whose components are as follows (a training-step sketch follows the list):
1) Generator G: the goal is to generate data G(z) that is as realistic as possible, where z is noise sampled from the prior distribution p_z.
2) Discriminator D: the goal is to distinguish real data x from the fake data G(z) produced by the generator.
3) Adversarial process: the generator and the discriminator are updated alternately; the generator learns to produce increasingly realistic data, while the discriminator learns to classify real and fake data more accurately.
4) Loss functions: the discriminator's loss reflects its classification accuracy on real and generated data, while the generator's loss reflects its ability to deceive the discriminator.
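A sketch of one alternating update under this objective, using the binary cross-entropy form of the minimax losses (a standard realization; the optimizers, label conventions, and the assumption that D outputs sigmoid probabilities are choices of the sketch):

```python
# One GAN training step sketch: update D on real/fake, then update G to fool D.
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z):
    # --- Discriminator: maximize log D(x) + log(1 - D(G(z))) ---
    pred_real = D(real)
    pred_fake = D(G(z).detach())
    d_loss = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
              F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator: maximize log D(G(z)) (non-saturating form) ---
    pred = D(G(z))
    g_loss = F.binary_cross_entropy(pred, torch.ones_like(pred))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```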
The Transformer model is based on the self-attention mechanism and can effectively handle long-range dependencies in sequence data; its building blocks are as follows (see the sketch after this list):
1) Self-attention mechanism: allows the model to compute representations directly between different positions in the sequence, without the step-by-step processing of a recurrent neural network.
2) Multi-head attention: executes multiple self-attention mechanisms in parallel, with each head attending to a different part of the sequence.
3) Positional encoding: since the Transformer lacks a recurrent or convolutional structure, it uses positional encodings to provide the position information of tokens in the sequence.
4) Output representation: the final output is produced through linear layers and non-linear activation functions.
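These four pieces fit together as in the following sketch, which uses PyTorch's built-in multi-head attention together with the standard sinusoidal positional encoding; the model dimensions are illustrative.

```python
# Transformer building-blocks sketch: sinusoidal positional encoding plus
# multi-head self-attention over a sequence of frame/audio features.
import math
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len, d_model):
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() *
                    (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # even dims (assumes even d_model)
    pe[:, 1::2] = torch.cos(pos * div)   # odd dims
    return pe

class SequenceEncoder(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())

    def forward(self, x):                          # (B, T, d_model)
        x = x + sinusoidal_positions(x.size(1), x.size(2)).to(x.device)
        attended, _ = self.attn(x, x, x)           # self-attention
        return self.out(attended)                  # output representation
```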
In this embodiment, the initial deep fake identification model is trained with the sample data set annotated with label information, so that it can learn the subtle differences between real content and deep fake content. The following training strategies can be used during training (a code sketch follows the list):
Data augmentation: increase the diversity of samples through rotation, scaling, cropping and similar methods;
Loss function: use the binary cross-entropy loss function to distinguish real from fake content;
Optimizer: use the Adam optimizer, whose adaptive learning rate helps the model converge quickly.
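Put together, one epoch of this training recipe could look like the sketch below; only the BCE loss and Adam optimizer are specified by the text, while the particular augmentations, learning rate, and model interface are placeholders.

```python
# Training-loop sketch: augmentation + BCE loss + Adam, as listed above.
import torch
import torch.nn as nn
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
])

def train_epoch(model, loader, lr=1e-4, device="cpu"):
    criterion = nn.BCELoss()                     # binary cross-entropy
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for images, labels in loader:                # labels: 0 = real, 1 = fake
        images = augment(images).to(device)
        scores = model(images).squeeze(1)        # deep fake score in [0, 1]
        loss = criterion(scores, labels.float().to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```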
In the model validation and optimization stage, the data set can be divided as follows:
Stratified sampling: ensure that the distribution of samples in the training, validation and test sets is consistent.
Based on the evaluation results, the parameters and structure of the model can be adjusted to optimize performance, including:
changing the depth and width of the network;
adjusting the learning rate and the optimizer;
using different initialization methods.
In the hyperparameter tuning stage, the hyperparameters can be optimized as follows (both strategies are sketched below):
Grid search: traverse different combinations of hyperparameters to find the optimal one.
Random search: select hyperparameter combinations at random to avoid getting stuck in a local optimum.
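A small sketch of both strategies over a toy search space; the specific hyperparameters, ranges, and the evaluate callback are illustrative assumptions.

```python
# Grid search and random search sketches over a hypothetical search space.
import itertools
import random

space = {"lr": [1e-3, 1e-4, 1e-5], "depth": [4, 8, 16], "width": [64, 128]}

def grid_search(evaluate):
    best, best_score = None, float("-inf")
    for combo in itertools.product(*space.values()):
        params = dict(zip(space.keys(), combo))
        score = evaluate(params)                 # e.g. validation accuracy
        if score > best_score:
            best, best_score = params, score
    return best

def random_search(evaluate, n_trials=10):
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: random.choice(v) for k, v in space.items()}
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best
```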
In this embodiment, after the target deep fake identification model is obtained, deep fake identification can be performed based on it. Therefore, an embodiment of the present application further provides a deep fake identification method, as shown in FIG. 2, comprising:
S21: Obtain the data to be identified.
S22: Preprocess the data to be identified to obtain preprocessed data to be identified.
S23: Perform feature extraction on the preprocessed data to be identified to obtain features of the data to be identified.
S24: Input the features of the data to be identified into the target deep fake identification model to obtain the deep fake score of the data to be identified.
S25: Compare the deep fake score with a preset deep fake score threshold, and obtain the identification result of the data to be identified based on the comparison result.
It will be understood that the data to be identified in step S21 can be any of video data, picture data or audio data to be identified.
When the data to be identified is video data, step S22 includes:
extracting key frames from the video data to be identified to obtain video key frame data; and
performing image normalization and audio noise reduction on each of the video key frame data to obtain the preprocessed data to be identified.
The image normalization and audio noise reduction of the video key frame data here are the same as the preprocessing of the video sample data described above and are not repeated. Likewise, the feature extraction of the preprocessed data to be identified is the same as the feature extraction of the preprocessed sample data described above, and is also not repeated.
The target deep fake identification model in step S24 is the target deep fake identification model obtained by training with the above training method.
Step S25 may specifically include the following:
When the deep fake score is less than a preset first deep fake score threshold, the data to be identified is determined to be real data; when the deep fake score is greater than a preset second deep fake score threshold, the data to be identified is determined to be deep fake data.
The deep fake score output by the target deep fake identification model represents the probability that the data to be identified is deep fake data. The first and second deep fake score thresholds can be set flexibly by the developer; in particular, the optimal thresholds can be selected by analyzing the ROC curve.
In one optional implementation, the first deep fake score threshold equals the second deep fake score threshold, for example 0.5. When the deep fake score output by the target deep fake identification model is less than 0.5, the data to be identified is determined to be real data; when the score is greater than 0.5, it is determined to be deep fake data; and when the score is exactly 0.5, it may be determined to be either real data or deep fake data.
In another optional implementation, the first deep fake score threshold is less than the second deep fake score threshold. When the deep fake score output by the target deep fake identification model is greater than or equal to the first threshold and less than or equal to the second threshold, the data to be identified is determined to be data whose authenticity is pending, and can then be further examined manually. Illustratively, the first threshold can be T-δ and the second threshold T+δ; when the observed deep fake score falls in [T-δ, T+δ], the data is determined to be of pending authenticity.
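The two-threshold decision rule reads directly as code; in the following sketch, score, low and high stand for the deep fake score and the first/second thresholds respectively.

```python
# Two-threshold decision sketch: real below `low`, fake above `high`,
# authenticity pending in between (routed to manual review).
def classify(score: float, low: float, high: float) -> str:
    assert low <= high, "first threshold must not exceed the second"
    if score < low:
        return "real"
    if score > high:
        return "deepfake"
    return "pending"   # e.g. score in [T - delta, T + delta]
```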
In this embodiment, after the identification result of the data to be identified is obtained, it can be displayed to the user, for example by using different colors as prompts: real data can be shown in green, deep fake data in red, and data of pending authenticity in yellow. It will be understood that in other embodiments the user can also be prompted with other forms of labels.
It should be noted that the identification requirements for the data to be identified may differ between application scenarios. For example, in some scenarios identification accuracy may be sacrificed so that as much data suspected of deep forgery as possible is flagged, while in other scenarios as much real data as possible should be recognized. Therefore, to accommodate the differing requirements of different application scenarios, in some embodiments the following steps may precede step S25:
determining the source scenario of the data to be identified; and
determining, according to a preset correspondence between source scenarios and deep fake score thresholds, the deep fake score threshold corresponding to the source scenario of the data to be identified.
It will be understood that the source scenario here is essentially the application scenario mentioned above. In this embodiment, the corresponding deep fake score thresholds need to be set in advance for the different source scenarios; the thresholds here include the first and second deep fake score thresholds.
When setting the deep fake score thresholds for a given source scenario, they can be chosen according to the identification requirements of that scenario. For example, when data suspected of deep forgery should be flagged as aggressively as possible, the second threshold can be set lower; when data that may be real should be recognized as far as possible, the first threshold can be set higher. However they are set, the first deep fake score threshold must be less than or equal to the second.
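A minimal sketch of this per-scenario lookup; the scenario names and threshold values are purely illustrative, not values from the text.

```python
# Per-scenario threshold lookup sketch (all values hypothetical).
SCENARIO_THRESHOLDS = {
    "security_monitoring": (0.3, 0.4),   # flag suspected fakes aggressively
    "social_media": (0.6, 0.8),          # favor recognizing real content
}

def thresholds_for(scenario: str, default=(0.5, 0.5)):
    low, high = SCENARIO_THRESHOLDS.get(scenario, default)
    assert low <= high                    # first threshold <= second
    return low, high
```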
During identification, when the deep fake score of the data to be identified is less than the first deep fake score threshold corresponding to its source scenario, the data is determined to be real data; when the score is greater than the second threshold corresponding to its source scenario, the data is determined to be deep fake data.
The source scenarios in this embodiment include, but are not limited to, security monitoring systems, social media platforms and the like.
It will be understood that a security monitoring system or a social media platform can invoke the identification service through an API to implement this method, or the electronic device executing this method can obtain the data to be identified from the security monitoring system or social media platform.
To make it easier for external systems or platforms to call the identification service, plug-ins can be developed so that they can use the identification function in plug-in form.
The training method for the deep fake identification model and the deep fake identification method provided in the embodiments of the present application enhance the generalization ability and robustness of the model through multimodal feature extraction and comprehensive analysis, and realize a cross-modal comprehensive identification function with a single identification model.
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, there is no strict ordering restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages that are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Embodiment 2:
Based on the same inventive concept, an embodiment of the present application provides an electronic device, as shown in FIG. 3, including a processor 301 and a memory 302. The memory 302 stores a computer program, the processor 301 and the memory 302 communicate via a communication bus, and the processor 301 executes the computer program to implement the steps of the methods described above, which are not repeated here.
It should be noted that the electronic device in this embodiment can be a terminal or a server. The terminal can be a smartphone, tablet computer, laptop, personal computer (PC), smart home device, wearable electronic device, VR/AR device, in-vehicle computer and the like. The server can be an interworking server or back-end server between multiple heterogeneous systems, an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
The processor 301 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 302 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
This embodiment further provides a computer-readable storage medium, such as a floppy disk, optical disc, hard disk, flash memory, USB flash drive, SD card or MMC card, which stores one or more programs implementing the above steps. The one or more programs can be executed by one or more processors to implement the steps of the method in Embodiment 1, which are not repeated here.
It should be noted that the drawings provided in this embodiment only illustrate the basic concept of the present invention schematically; the drawings show only the components related to the present invention rather than the number, shape and size of components in an actual implementation, in which the type, quantity and proportion of each component may vary and the component layout may be more complex. The structures, proportions and sizes shown in the drawings of this specification are only intended to accompany the disclosure for the understanding of those skilled in the art, and do not limit the conditions under which the present invention can be implemented; any structural modification, change of proportional relationship or adjustment of size that does not affect the effects and purposes of the present invention shall still fall within the scope of the technical content disclosed herein. Terms such as "upper", "lower", "left", "right", "middle" and "a" cited in this specification are likewise only for clarity of description and are not intended to limit the implementable scope of the present invention; changes or adjustments of their relative relationships, without substantive changes to the technical content, shall also be regarded as within the implementable scope of the present invention.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as such combinations are not contradictory, they shall be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they shall not be construed as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410582096.0A CN118427755A (en) | 2024-05-11 | 2024-05-11 | Training method, deep counterfeiting identification method and electronic equipment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410582096.0A CN118427755A (en) | 2024-05-11 | 2024-05-11 | Training method, deep counterfeiting identification method and electronic equipment |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118427755A true CN118427755A (en) | 2024-08-02 |
Family
ID=92310233
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410582096.0A Pending CN118427755A (en) | 2024-05-11 | 2024-05-11 | Training method, deep counterfeiting identification method and electronic equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118427755A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119723425A (en) * | 2025-02-25 | 2025-03-28 | 北京智慧易科技有限公司 | Training method of deep fake video identification model, identification method and electronic device |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114511892A (en) * | 2020-10-23 | 2022-05-17 | 国家广播电视总局广播电视科学研究院 | Media file identification method, device, device and storage medium |
| CN116385817A (en) * | 2023-01-17 | 2023-07-04 | 支付宝(杭州)信息技术有限公司 | Cross-domain deep forgery detection model training method, deep forgery detection method and system |
| CN116485756A (en) * | 2023-04-23 | 2023-07-25 | 网易(杭州)网络有限公司 | Method and device for detecting code-spraying image, electronic equipment and storage medium |
| CN116883900A (en) * | 2023-07-12 | 2023-10-13 | 沃民高新科技(北京)股份有限公司 | A video authenticity identification method and system based on multi-dimensional biometric features |
| CN116957036A (en) * | 2022-12-16 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Training method, training device and computing equipment for fake multimedia detection model |
| CN117972610A (en) * | 2024-02-26 | 2024-05-03 | 广州佰锐网络科技有限公司 | AI-based deep pseudo detection method and system |
- 2024-05-11 CN CN202410582096.0A patent/CN118427755A/en active Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |