CN118070107A - Deep learning-oriented network anomaly detection method, device, storage medium and equipment

Info

Publication number: CN118070107A
Application number: CN202410467852.5A
Authority: CN
Inventors: 尹春勇; 曹儒商
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2024-04-18
Filing date: 2024-04-18
Publication date: 2024-05-24
Anticipated expiration: 2044-04-18
Also published as: CN118070107B

Abstract

The invention relates to the technical field of anomaly detection and discloses a deep learning-oriented network anomaly detection method, device, storage medium and equipment. The method comprises: obtaining abnormal traffic data to be detected and inputting it into a pre-trained network anomaly detection model for classification detection to obtain a detection result. The training process of the network anomaly detection model comprises: preprocessing the acquired historical network traffic data set and then performing feature screening on the data to address data redundancy; balancing the data after feature screening; constructing a hybrid network of a deformable convolutional neural network (DCNN) and an improved Transformer model for network traffic classification; defining the network training loss function; and training the hybrid network on the training set on the basis of curriculum learning to obtain the network anomaly detection model. The beneficial effect of the invention is that the accuracy and robustness of network anomaly detection are significantly improved.

Description

Deep learning-oriented network anomaly detection method, device, storage medium and equipment

Technical Field

The present invention relates to a deep learning-oriented network anomaly detection method, device, storage medium and equipment, and belongs to the technical field of network anomaly detection.

Background

The data generated while a network operates directly reflects the state of the network. By extracting and analyzing features of network data, the network state can be monitored in real time, thereby avoiding losses caused by abnormal network behavior. However, as network technology evolves rapidly and new network products keep emerging, network traffic data is growing exponentially and its composition is becoming increasingly complex and diverse. How to extract effective features from massive data has attracted wide attention from researchers. Network Anomaly Detection (NAD) is a key technology for improving the security of network systems. Unlike earlier passive defense strategies, NAD adopts an active defense strategy: it monitors network behavior in real time, collects and analyzes the data generated by the network system, and mines the hidden correlations among network data features. Once data related to abnormal behavior is detected, it is reported to the network administrator and corresponding countermeasures are taken, minimizing the harm caused by the abnormal behavior.

At present, abnormal traffic detection technology has made great progress, but some problems remain:

1) Feature redundancy: a higher feature dimensionality not only increases model training time but also degrades detection performance and harms the generalization of anomaly detection. Although principal component analysis (PCA) reduces data dimensionality well, PCA is a linear dimensionality reduction method and has limited ability to handle nonlinear data.

2) Data imbalance: when a certain type of attack is underrepresented in the data set, that is, when the data is unbalanced across attack types, the resulting model performs poorly at detecting uncommon attack types, which affects the accuracy of neural network training and model detection. Existing oversampling methods, such as the commonly used Synthetic Minority Over-sampling Technique (SMOTE), can address data imbalance but may cause overfitting or underfitting in some cases.

3) Incomplete feature learning: a convolutional neural network (CNN) used alone can extract the spatial and local features of data but has difficulty capturing long-term dependencies.

Summary of the Invention

The purpose of the present invention is to provide a deep learning-oriented network anomaly detection method, device, storage medium and equipment that can simultaneously extract the temporal and spatial features of data, solving the problem of incomplete feature extraction and helping the model better understand and exploit the temporal and spatial relationships in the data, thereby improving the model's performance and generalization ability.

To solve the above technical problems, the present invention adopts the following technical solutions.

In a first aspect, the present invention provides a deep learning-oriented network anomaly detection method, comprising:

obtaining abnormal traffic data to be detected, and inputting it into a pre-trained network anomaly detection model for classification detection to obtain a detection result;

wherein the training process of the trained network anomaly detection model comprises:

dividing the acquired historical network traffic data set into a training set and a test set, setting aside part of the training set as a validation set, and performing feature screening on the data in the training set;

balancing the number of samples in each category of the data after feature screening;

constructing a hybrid network model of a deformable convolutional neural network (DCNN) and an improved Transformer model for network traffic classification; defining a loss function according to the balanced data and the hybrid network model; and training the hybrid network model on the training set on the basis of curriculum learning to obtain the network anomaly detection model.

In one embodiment, the network anomaly detection method of the present invention further comprises: after the network anomaly detection model is obtained, verifying it with the validation set and, once verification passes, using the model to classify and detect abnormal traffic data. After verification passes, the test set is also input into the network anomaly detection model to evaluate its performance.

In one embodiment, before the network traffic data set is divided into a training set and a validation set, it is preprocessed: one-hot encoding converts nominal features into binary vectors to satisfy the model's input format, and the data set is then standardized, which helps improve the model's convergence speed and accuracy.
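
The preprocessing described above can be illustrated with a minimal, dependency-free sketch. The function names `one_hot` and `standardize` and the sample values are hypothetical and are not taken from the patent; zero-mean, unit-variance scaling is assumed as the meaning of "standardized".

```python
from statistics import mean, pstdev


def one_hot(values):
    """Turn a nominal column into binary indicator vectors.

    Categories are sorted so the column order is deterministic.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [
        [1.0 if index[v] == j else 0.0 for j in range(len(categories))]
        for v in values
    ]
    return vectors, categories


def standardize(column):
    """Scale a numeric column to zero mean and unit variance (assumed z-score)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)  # constant column carries no information
    return [(x - mu) / sigma for x in column]


# Illustrative values only: a nominal and a numeric traffic feature.
protocol = ["tcp", "udp", "icmp", "tcp"]
duration = [0.0, 2.0, 4.0, 6.0]
encoded, categories = one_hot(protocol)
scaled = standardize(duration)
```

In practice the mean and standard deviation would normally be estimated on the training portion only and reused for validation and test data, so that no test statistics leak into training.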

In one embodiment, the network traffic data set includes, but is not limited to, the NSL-KDD data set. When the NSL-KDD data set is used, it is divided into a training set KDDTrain and a test set KDDTest, and part of the training set is set aside as a validation set.

In one embodiment, a denoising autoencoder (DAE) is used to perform feature screening on the training set data to address data redundancy. By adding noise to the input data and reconstructing the original data, a trained DAE learns the key feature representations of the data, effectively reducing feature dimensionality and data redundancy and improving the model's performance and computational efficiency.

In one embodiment, an adaptive synthetic sampling algorithm is used to balance the number of samples in each category of the data after feature screening, addressing data imbalance. The algorithm measures the degree of imbalance between categories and generates new synthetic sample points for the minority categories, thereby rebalancing the data set. The goal is to increase the number of minority-category samples so as to improve the model's performance and accuracy on imbalanced data.

In one embodiment, the hybrid network model is constructed as follows:

The deformable convolutional neural network DCNN comprises an input layer, a deformable convolution layer, a deformable ROI pooling layer and a Dropout layer;

The improved Transformer model comprises an encoder, and the encoder comprises an Embedding layer, a multi-head self-attention layer, a residual connection and layer normalization layer, and a feedforward network layer;

The balanced data is input into the DCNN and the improved Transformer model separately, and each of them extracts features from the input data and outputs them, as follows:

The balanced data is input into the input layer of the DCNN for encoding; the encoded data is input into the deformable convolution layer for the deformable convolution step; the result of the deformable convolution step is input into the deformable ROI pooling layer; after pooling, it is passed to the Dropout layer, which discards some neurons and then outputs the model output of the DCNN;

The balanced data is input into the Embedding layer of the improved Transformer model for encoding and is then fused with a preset position encoding. The fused encoding passes in turn through the multi-head self-attention layer, the residual connection and layer normalization layer, and the feedforward network layer; after the feedforward network layer, residual connection and layer normalization are applied once more to obtain the output of the improved Transformer model;
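
The encoder data flow just described can be sketched in plain Python. This is a minimal illustration under several assumptions that the patent does not state: the "preset position encoding" is taken to be the usual sinusoidal encoding, "fused" is taken to mean element-wise addition, the attention heads use the input slices directly as queries, keys and values (learned projections are omitted), and the feedforward weights `w1` and `w2` are hypothetical.

```python
import math
import random


def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding (assumed form of the preset encoding)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            pe[pos][i] = math.sin(angle) if i % 2 == 0 else math.cos(angle)
    return pe


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(x, eps=1e-5):
    """Normalize each position's feature vector to zero mean, unit variance."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def multi_head_self_attention(x, heads):
    """Scaled dot-product attention per feature slice, then concatenation."""
    d_head = len(x[0]) // heads
    out = [[] for _ in x]
    for h in range(heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        scores = matmul(part, [list(col) for col in zip(*part)])  # part @ part^T
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        for i, row in enumerate(matmul(weights, part)):
            out[i].extend(row)
    return out


def feed_forward(x, w1, w2):
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]  # ReLU
    return matmul(hidden, w2)


def encoder_layer(embedded, heads, w1, w2):
    """Embedding + position -> attention -> add & norm -> FFN -> add & norm."""
    x = add(embedded, positional_encoding(len(embedded), len(embedded[0])))
    x = layer_norm(add(x, multi_head_self_attention(x, heads)))
    return layer_norm(add(x, feed_forward(x, w1, w2)))


# Illustrative run: 4 positions, 8-dimensional embeddings, 2 heads.
rng = random.Random(0)
embedded = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(16)] for _ in range(8)]
w2 = [[rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(16)]
encoded = encoder_layer(embedded, heads=2, w1=w1, w2=w2)
```

The output keeps the input shape (one vector per position), which is what allows it to be fused with the DCNN features in the next step.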

The outputs of the DCNN and the improved Transformer model are fused and input into a self-attention module for further feature extraction; the output of the self-attention module is passed through a fully connected layer and then classified with the softmax function.
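
The final classification step can be sketched as follows. This is a simplified illustration, not the patented network: fusion is assumed to be concatenation, the intermediate self-attention module is omitted for brevity, and the weights, bias and feature values are hypothetical.

```python
import math


def fuse(dcnn_features, transformer_features):
    """Fuse the two branch outputs (concatenation is an assumption)."""
    return list(dcnn_features) + list(transformer_features)


def fully_connected(x, weights, bias):
    """One logit per class: weights has one row per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(dcnn_features, transformer_features, weights, bias):
    fused = fuse(dcnn_features, transformer_features)
    probabilities = softmax(fully_connected(fused, weights, bias))
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label, probabilities


# Illustrative values: 2 DCNN features, 1 Transformer feature, 3 classes.
weights = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
label, probabilities = classify([1.0, 0.0], [0.5], weights, bias)
```

The softmax output is a probability distribution over traffic classes, and the predicted class is the one with the highest probability.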

In the DCNN of the present invention, the deformable convolution layer learns, through deformable convolution operations, to adapt to features of different shapes and scales, and applies a nonlinear activation function to introduce nonlinearity. The deformable ROI pooling layer performs spatial pyramid pooling on the input data, extracts key features from receptive fields of different scales, and can pool adaptively according to the position and shape of the target, improving the invariance and robustness of the features. The data that has undergone feature screening and balancing is input into the DCNN, which extracts and outputs features. The present invention also improves on the conventional Transformer model by retaining only the encoder, which simplifies the model and reduces its computation and memory requirements.

In one embodiment, the process of training the hybrid network model to obtain the network anomaly detection model is as follows:

initializing the parameters of the hybrid network model;

inputting the balanced data into the hybrid network model;

computing the error between the output of the hybrid network model and the true value using the defined loss function;

according to the gradient of the loss function, using a gradient descent algorithm to back-propagate the error to each layer of the hybrid network model and adjusting the parameters of the hybrid network model layer by layer to reduce the error;

repeating the above steps with a preset threshold on the number of training epochs; when the number of epochs reaches the threshold, stopping training to obtain the anomaly detection model, and saving the trained anomaly detection model for later use.
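
The training procedure above can be illustrated with a minimal sketch. A single logistic unit stands in for the hybrid network, and binary cross-entropy is assumed as the loss, since this passage does not give the loss function; the names `train`, `max_epochs` and `lr` are hypothetical. Curriculum ordering of the samples is not shown.

```python
import json
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, max_epochs=200, lr=0.5, seed=0):
    """Gradient descent that stops when the epoch threshold is reached."""
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    # Step 1: initialize parameters.
    w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    b = 0.0
    history = []
    for _ in range(max_epochs):  # Step 5: repeat until the threshold
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for x, y in zip(samples, labels):
            # Step 2: forward pass.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Step 3: error between output and true value (cross-entropy).
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            # Step 4: gradient of the loss with respect to each parameter.
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(loss / n)
    return w, b, history


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Illustrative data: the label equals the first feature.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]
w, b, history = train(samples, labels)
saved_model = json.dumps({"w": w, "b": b})  # keep the parameters for later use
```

A fixed epoch threshold is the stopping rule stated in the text; monitoring the validation loss for early stopping would be a common alternative, but it is not described here.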

In a second aspect, the present invention provides a deep learning-oriented network anomaly detection device, comprising:

an abnormal traffic data acquisition module, configured to obtain abnormal traffic data to be detected and input it into a pre-trained network anomaly detection model for classification detection to obtain a detection result;

a feature screening module, configured to divide the acquired historical network traffic data set into a training set and a test set, set aside part of the training set as a validation set, and perform feature screening on the data in the training set;

a data imbalance processing module, configured to balance the number of samples in each category of the data after feature screening;

a network anomaly detection model construction module, configured to construct a hybrid network model of a deformable convolutional neural network (DCNN) and an improved Transformer model for network traffic classification, define a loss function according to the balanced data and the hybrid network model, and train the hybrid network model to obtain the network anomaly detection model.

In a third aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above network anomaly detection method.

In a fourth aspect, the present invention provides a computer device, comprising:

a memory, configured to store a computer program;

a processor, configured to execute the computer program to implement the steps of the above network anomaly detection method.

In a fifth aspect, the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above network anomaly detection method.

Compared with the prior art, the present invention achieves the following beneficial effects:

(1) Through the fusion model of the deformable convolutional neural network (DCNN) and the Transformer, the present invention can simultaneously extract the temporal and spatial features of data, solving the problem of incomplete feature extraction. The DCNN effectively captures local features and spatial relationships through convolution operations, while the Transformer captures global features and long-range dependencies through the self-attention mechanism. The fusion model jointly considers temporal and spatial features, improves feature representation, strengthens the detection of complex abnormal patterns, and is capable of adaptive feature learning, thereby improving the performance and robustness of the network anomaly detection model.

(2) The present invention uses a denoising autoencoder (DAE) to perform feature screening on the data to address data redundancy. The DAE automatically learns the important features in the data and maps the data to a more compact and more informative representation, thereby reducing redundant information. The DAE highlights abnormal features and suppresses noise and redundant features unrelated to anomalies, which makes abnormal features easier to identify and makes it easier for the anomaly detection model to capture and distinguish abnormal behavior, thereby improving detection accuracy. Reducing the feature dimensionality also lowers model complexity, reduces the risk of overfitting and improves generalization, significantly improving the accuracy and robustness of network anomaly detection.

(3) The present invention uses the adaptive synthetic sampling algorithm ADASYN to address data imbalance. By adaptively synthesizing new minority class samples, ADASYN balances the distribution of the data set so that the numbers of normal and abnormal samples are closer, which improves the model's ability to identify abnormal samples and reduces overfitting to normal samples. When synthesizing new samples, ADASYN takes the distances between samples and their distribution into account. It analyzes the majority class samples around each minority class sample and preserves the characteristics of the original samples when synthesizing new minority class samples. The synthesized samples therefore better represent the true distribution of abnormal samples, avoiding the information loss that simple replication or random synthesis may cause.

Brief Description of the Drawings

FIG. 1 is a flow chart of the network anomaly detection method in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the structure of the hybrid network model in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the principle of the DCNN convolution layer in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the principle of the DCNN ROI pooling layer in an embodiment of the present invention;

FIG. 5 is a schematic diagram of deformable convolution sampling in an embodiment of the present invention, in which FIG. 5a shows the standard sampling positions in the convolution step of a conventional CNN (the convolution range is a standard rectangle; light gray dots) and FIG. 5b shows the deformed sampling positions in the convolution step of the DCNN of the present invention (the convolution range is an adaptive shape of the same size extended outward from the standard rectangle; dark gray dots);

FIG. 6 is a schematic diagram of the module structure of the improved Transformer model in an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are detailed descriptions of the technical solution of the present invention rather than limitations on it. Where no conflict arises, the embodiments of the present invention and the technical features in the embodiments may be combined with one another.

The term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the character "/" generally indicates an "or" relationship between the objects before and after it.

Embodiment 1

FIG. 1 is a schematic diagram of one embodiment of the network anomaly detection method disclosed in the present invention. The embodiment of FIG. 1 may be executed by a computer, a network traffic monitoring device, a security information and event management (SIEM) system, and an Internet of Things (IoT) gateway and platform, and can be applied in the following scenarios: 1. In a cloud computing environment, network anomaly detection can monitor and protect resources such as cloud services, virtual machines and containers. It helps identify abnormal network traffic, abnormal user behavior or abnormal application activity in order to prevent unauthorized access or attacks. 2. In the Internet of Things, devices and sensors communicate and exchange data over the network. Network anomaly detection helps detect abnormal device behavior, abnormal sensor data or network attack activity, thereby protecting the secure and stable operation of the IoT system.

As shown in FIG. 1, step S1 of this embodiment is: divide the network traffic data set into a training set and a test set, set aside part of the training set as a validation set, and perform feature screening on the data in the training set.

In a specific implementation of this embodiment, the network traffic data set includes, but is not limited to, the NSL-KDD data set. When the NSL-KDD data set is used, it is divided into a training set KDDTrain and a test set KDDTest, and part of the training set is set aside as a validation set.

Step S11: Divide the network traffic data set into a training set and a test set, and set aside part of the training set as a validation set.

The NSL-KDD data set is divided into the training set KDDTrain and the test set KDDTest at a ratio of 5:1, and 20% of the training set is extracted as the validation set.
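
The 5:1 split with a 20% validation hold-out can be sketched as below. The function name `split_dataset` and the random shuffle are assumptions for illustration; the patent does not say whether records are shuffled before splitting.

```python
import random


def split_dataset(records, train_to_test=(5, 1), val_fraction=0.2, seed=0):
    """Split records into train/validation/test according to the stated ratios."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # assumption: random assignment of records
    n_train = len(shuffled) * train_to_test[0] // sum(train_to_test)
    train_full, test = shuffled[:n_train], shuffled[n_train:]
    n_val = round(len(train_full) * val_fraction)
    validation, train = train_full[:n_val], train_full[n_val:]
    return train, validation, test


# Illustrative run with 120 placeholder records.
train, validation, test = split_dataset(range(120))
```

With 120 records this yields 80 training, 20 validation and 20 test records: the 5:1 ratio gives 100 and 20, and 20% of the 100 training records are then held out for validation.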

Step S12: Perform feature screening on the data in the training set.

Feature screening is an important step in abnormal traffic detection. It removes redundant and irrelevant features, thereby reducing noise and interference and lowering the data dimensionality. Selecting the optimal feature subset from a large feature set reduces computational complexity and improves the accuracy and generalization ability of the model. A denoising autoencoder (DAE) with a weighted loss function is therefore used to perform feature screening on the training set; by introducing noise, the DAE can extract more meaningful features. Compared with principal component analysis, the DAE is a nonlinear feature extraction method and can better capture nonlinear relationships and complex patterns in the data. The DAE with a weighted loss function assigns different weights to normal traffic and abnormal traffic, which induces the DAE to pay more attention to reconstructing attack samples during training, so that the selection result benefits anomaly detection performance.

The L2 norm of each feature's row vector in the encoder weight matrix is computed as the weight of that feature. The loss function of the DAE is:

(1)

where m is the number of samples; the terms of the loss are the reconstruction error computed as the mean squared error (MSE), the loss contributed by the L2 regularization term, and the transpose of the weight matrix; the final loss function is obtained by incorporating the weight matrix into the MSE loss.
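
The two ideas in this step, a class-weighted reconstruction error and feature scoring by the L2 norm of encoder weight rows, can be sketched as follows. This is an illustration under assumptions, not the patent's loss: the exact form of equation (1) is not reproduced, the L2 regularization term is omitted, and the weights `attack_weight` and `normal_weight`, the Gaussian noise model and the top-k selection rule are hypothetical.

```python
import math
import random


def add_gaussian_noise(x, sigma, rng):
    """Corrupt an input vector before it is fed to the autoencoder."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def weighted_mse(originals, reconstructions, labels,
                 attack_weight=2.0, normal_weight=1.0):
    """Mean squared reconstruction error, weighting attack samples (label 1) more."""
    total = 0.0
    for x, x_hat, y in zip(originals, reconstructions, labels):
        weight = attack_weight if y == 1 else normal_weight
        total += weight * sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return total / len(originals)


def feature_scores(encoder_weights):
    """L2 norm of each input feature's row in the encoder weight matrix."""
    return [math.sqrt(sum(w * w for w in row)) for row in encoder_weights]


def select_top_k(encoder_weights, k):
    """Indices of the k features with the largest scores, in original order."""
    scores = feature_scores(encoder_weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Illustrative encoder weights: 3 input features, 2 hidden units.
encoder_weights = [[3.0, 4.0], [0.0, 0.1], [1.0, 0.0]]
kept = select_top_k(encoder_weights, k=2)
noisy = add_gaussian_noise([0.2, 0.4, 0.6], sigma=0.1, rng=random.Random(0))
```

A feature whose row has a small norm contributes little to every hidden unit, which is why the row norm can serve as a relevance score once the DAE has been trained.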

As shown in FIG. 1, step S2 of this embodiment is: balance the number of samples in each category of the data after feature screening.

The Adaptive Synthetic Sampling (ADASYN) algorithm is used to address data imbalance. The algorithm measures the degree of imbalance between categories and generates new synthetic sample points for the minority categories, thereby rebalancing the data set. The goal is to increase the number of minority-category samples so as to improve the model's performance and accuracy on imbalanced data. The ADASYN algorithm proceeds as follows:

Step S21: compute the degree of class imbalance $d$:

$$d = \frac{m_s}{m_l} \tag{2}$$

where $m_s$ and $m_l$ denote the number of minority class samples and the number of majority class samples, respectively, $m_s \le m_l$, $m_s + m_l = m$, and $m$ is the number of samples.

Step S22: if $d < d_{th}$, where $d_{th}$ is a preset threshold for the maximum tolerated degree of class imbalance, then:

Step S221: compute the total number $G$ of synthetic data examples that need to be generated for the minority class:

$$G = (m_l - m_s) \times \beta \tag{3}$$

where $\beta \in [0, 1]$ is a parameter that specifies the desired balance level after the synthetic data is generated. $\beta = 1$ means that a fully balanced data set is created after the generalization process.

Step S222: for each sample $x_i$ in the minority class, find its $K$ nearest neighbors according to the Euclidean distance in the $N$-dimensional space, and compute the ratio $r_i$:

$$r_i = \frac{\Delta_i}{K}, \quad i = 1, \ldots, m_s \tag{4}$$

where $\Delta_i$ is the number of examples among the $K$ nearest neighbors of $x_i$ that belong to the majority class, $r_i \in [0, 1]$, and $m_s$ denotes the number of minority class samples.

Step S223: normalize $r_i$ to obtain the proportion $\hat{r}_i$ for each sample:

$$\hat{r}_i = \frac{r_i}{\sum_{i=1}^{m_s} r_i} \tag{5}$$

$\hat{r}_i$ is a density distribution, $\sum_{i} \hat{r}_i = 1$.

Step S224: compute the number $g_i$ of synthetic data examples that need to be generated for each minority class sample $x_i$:

$$g_i = \hat{r}_i \times G \tag{6}$$

where $G$ is the total number of synthetic data examples that need to be generated for the minority class, as defined in equation (3).

Step S225: for each minority class sample $x_i$, generate $g_i$ synthetic data samples as follows.

1) Randomly select one minority data example $x_{zi}$ from the $K$ nearest neighbors of $x_i$.

2) Generate the synthetic data example $s_i$:

$$s_i = x_i + (x_{zi} - x_i) \times \lambda \tag{7}$$

where $(x_{zi} - x_i)$ is the difference vector in the $N$-dimensional space and $\lambda$ is a random number, $\lambda \in [0, 1]$. Loop from 1 to $g_i$ until the required number of synthetic samples has been generated.
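
Steps S21 to S225 can be sketched in plain Python as below. This is a minimal two-class illustration, not a reference implementation: the default values of `k` and `d_th` are hypothetical, the per-sample counts are rounded to integers (so the total may differ slightly from G), and the minority neighbor used for interpolation is drawn from the minority members of the same K-neighborhood, which is one possible reading of step 1).

```python
import math
import random


def adasyn(minority, majority, beta=1.0, k=5, d_th=0.75, seed=0):
    """Return synthetic minority samples generated in the ADASYN manner."""
    rng = random.Random(seed)
    m_s, m_l = len(minority), len(majority)
    d = m_s / m_l                                   # S21: imbalance degree
    if d >= d_th:                                   # S22: balanced enough
        return []
    total_needed = (m_l - m_s) * beta               # S221: G
    everything = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios, pools = [], []
    for x in minority:                              # S222: K nearest neighbors
        neighbors = sorted(
            (q for q in everything if q[0] is not x),
            key=lambda q: math.dist(x, q[0]),
        )[:k]
        ratios.append(sum(flag for _, flag in neighbors) / k)
        pools.append([p for p, flag in neighbors if flag == 0])
    ratio_sum = sum(ratios)
    if ratio_sum == 0:                              # no majority neighbors at all
        return []
    synthetic = []
    for x, r, pool in zip(minority, ratios, pools):
        g = round(r / ratio_sum * total_needed)     # S223 and S224
        for _ in range(g):                          # S225
            if not pool:
                break                               # no minority neighbor to use
            x_z = rng.choice(pool)
            lam = rng.random()
            synthetic.append([a + (b - a) * lam for a, b in zip(x, x_z)])
    return synthetic


# Illustrative data: 3 minority points near the origin, 7 majority points.
minority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
majority = [[0.05, 0.05]] + [[5.0 + i, 5.0] for i in range(6)]
new_samples = adasyn(minority, majority, k=3)
```

Because each new point is an interpolation between a minority sample and one of its minority neighbors, the synthetic samples stay inside the region already occupied by the minority class, and more of them are placed around samples that have many majority neighbors.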

A balanced data set reflects the relationships between categories more accurately and helps the hybrid network model learn and classify better. Data augmentation yields more sample data, which can improve the generalization ability and robustness of the network anomaly detection model. Finally, the features are normalized to facilitate convergence.

As shown in FIG. 1 and FIG. 2, step S3 of this embodiment is: construct a hybrid network model of a deformable convolutional neural network (DCNN) and an improved Transformer model for network traffic classification; define a loss function according to the balanced data and the hybrid network model; train the hybrid network model on the training set on the basis of curriculum learning to obtain the network anomaly detection model; and use the network anomaly detection model to classify and detect abnormal traffic data to obtain a detection result.

可变形卷积神经网络DCNN包括输入层、可变形卷积层、可变形ROI池化层以及Dropout层。其中可变形卷积层有两层，可变形池化层有一层，可变形卷积神经网络DCNN由两层堆叠，其模块结构如图2所示。DCNN在CNN的基础上改进了两个关键模块,原理如图3和图4所示。DCNN在传统的卷积步骤中添加了新的卷积层来提取卷积范围的几何变形偏移。改变形状后，范围不再是标准的矩形，而是向外扩展的相同大小的自适应形状，如图5所示。可变形ROI池化层在标准池化层的基础上增加了一个额外的池化层来提取并生成偏移矩阵。DCNN在传统的卷积层和池化层中分别引入偏移矩阵，使模型能够自适应地提取更有价值的特征信息，极大地提高了CNN的局部特征提取能力。The deformable convolutional neural network DCNN includes an input layer, a deformable convolutional layer, a deformable ROI pooling layer, and a Dropout layer. There are two deformable convolutional layers and one deformable pooling layer. The deformable convolutional neural network DCNN is stacked by two layers, and its module structure is shown in Figure 2. DCNN improves two key modules on the basis of CNN, and the principles are shown in Figures 3 and 4. DCNN adds a new convolutional layer to the traditional convolution step to extract the geometric deformation offset of the convolution range. After the shape is changed, the range is no longer a standard rectangle, but an adaptive shape of the same size that expands outward, as shown in Figure 5. The deformable ROI pooling layer adds an additional pooling layer on the basis of the standard pooling layer to extract and generate the offset matrix. DCNN introduces the offset matrix in the traditional convolutional layer and pooling layer respectively, so that the model can adaptively extract more valuable feature information, greatly improving the local feature extraction ability of CNN.

需要说明的是：图2中随机丢弃层为Dropout层，全连接层为FC，transformer表示一种transformer模型，是一种神经网络模型，softmax表示Softmax函数。图4中RoI表示感兴趣区域，是在卷积神经网络中使用的一种操作。It should be noted that in Figure 2, the random dropout layer is the Dropout layer, the fully connected layer is the FC, transformer represents a transformer model, which is a neural network model, and softmax represents the Softmax function. In Figure 4, RoI represents the region of interest, which is an operation used in convolutional neural networks.

The specific steps of step S3 are as follows:

Step S31: input the data processed in steps S1 and S2 into the deformable convolutional neural network for feature extraction.

In this example, the convolution kernel is a one-dimensional kernel with a window size of 3 and a stride of 1. The deformable convolutional layer changes the sampling grid of the convolution by introducing learnable offsets, so that the grid can adjust its shape and position dynamically according to the local features of the input. Specifically, the deformable convolution uses an offset prediction network at each position; this network learns to generate an offset field that adjusts where the convolution kernel samples the input. In this way, the deformable convolution adapts to deformations and pose changes of the target and models irregularly shaped targets more accurately. After the shape changes, the range is no longer a standard rectangle but an adaptive shape of the same size that expands outward (see Figure 5). The offset matrix defines the size and extent of the range covered by the convolution kernel. The closer the offset sampling points are to the key information points, the more valuable the extracted features. The deformable convolution can be expressed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \tag{8}$$

where $p_0$ denotes the position of a point on the input feature map, $p_n$ is the position of a point within the range $R$ of the convolution kernel, $\Delta p_n$ is the offset of $p_n$, and n indexes the different positions. $y(p_0)$ is the output feature obtained for each input $x(p_0)$ after the offset matrix is applied; $w(p_n)$ is the weight of the corresponding sampling position; $x(\cdot)$ is a discrete function. Because the offset $\Delta p_n$ is generally fractional, the shifted sampling position does not fall on an actual point. Bilinear interpolation is therefore used to compute the feature value at such positions in the one-dimensional feature map. $x(p)$ is defined as follows:

$$x(p) = \sum_{q} G(q, p) \cdot x(q) \tag{9}$$

$$G(q, p) = \max\bigl(0,\ 1 - |q - p|\bigr) \tag{10}$$

where q ranges over all integer positions on the input feature map and p denotes an arbitrary (fractional) position.
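A minimal sketch of a one-dimensional deformable convolution may help make the sampling rule concrete. It assumes the form y(p0) = sum over n of w(p_n) * x(p0 + p_n + offset), with fractional positions read by linear interpolation using the kernel max(0, 1 - |q - p|). In the method described here the offsets come from a learned offset-prediction layer; in this sketch they are simply passed in as a hypothetical `offsets` argument, and positions outside the signal read as zero.

```python
import math

def sample_linear(x, p):
    """Value of the 1-D signal x at fractional position p; out-of-range reads as 0."""
    lo = math.floor(p)
    total = 0.0
    for q in (lo, lo + 1):                       # only the two nearest integers contribute
        if 0 <= q < len(x):
            total += max(0.0, 1.0 - abs(q - p)) * x[q]
    return total

def deformable_conv1d(x, weights, offsets):
    """Stride-1 deformable convolution with the window centred on each position p0.

    offsets[p0][n] is the offset added to the n-th kernel tap at position p0.
    """
    half = len(weights) // 2
    out = []
    for p0 in range(len(x)):
        acc = 0.0
        for n, w in enumerate(weights):
            p_n = n - half                       # regular grid position: -1, 0, 1 for size 3
            acc += w * sample_linear(x, p0 + p_n + offsets[p0][n])
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0]
zero = [[0.0, 0.0, 0.0]] * 4
plain = deformable_conv1d(signal, [1.0, 1.0, 1.0], zero)                      # ordinary convolution
shifted = deformable_conv1d(signal, [0.0, 1.0, 0.0], [[0.0, 0.5, 0.0]] * 4)   # centre tap moved by 0.5
```

With all offsets set to zero the function reduces to an ordinary zero-padded convolution, which is a convenient sanity check.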

The output of the deformable convolutional layer is fed into the deformable ROI pooling layer, which has a window size of 2 and a stride of 1. The deformable ROI pooling layer adds an extra pooling layer to the standard pooling layer to extract and generate the offset matrix; the offset matrix and the standard pooling layer then act together to produce the pooled deformable ROI features. The deformable ROI pooling layer can be expressed as follows:

$$y(i, j) = \sum_{p \in \mathrm{bin}(i, j)} \frac{x(p_0 + p + \Delta p_{ij})}{n_{ij}} \tag{11}$$

where $n_{ij}$ denotes the number of points in the range. The offset $\Delta p_{ij}$ is obtained as follows. First, ROI pooling generates the pooled feature map. An FC layer then generates the normalized offset $\Delta \hat{p}_{ij}$ from this map. However, the size of $\Delta \hat{p}_{ij}$ does not match the size of each range, so the following formula is used to compute $\Delta p_{ij}$:

$$\Delta p_{ij} = \gamma \cdot \Delta \hat{p}_{ij} \circ (w, h) \tag{12}$$

where $\gamma$ is a gain scalar with a preset default value, $\circ$ denotes the element-wise product, $(w, h)$ is the width and height of the range, and $\Delta p_{ij}$ is the two-dimensional offset. Under the action of the generated offset matrix and the pooling layer, the input feature map is transformed into the output feature map, achieving feature extraction and dimensionality reduction.
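The pooling step can be sketched in one dimension as average pooling over a shifted window. This is an illustrative simplification, not the patented layer: the normalized offsets are passed in rather than produced by an FC layer, the rescaling multiplies by the window width only (the 1-D analogue of multiplying by width and height), and `gamma=0.1` is an assumed illustrative value.

```python
import math

def sample_linear(x, p):
    """Linear interpolation of the 1-D signal x at position p; out-of-range reads as 0."""
    lo = math.floor(p)
    return sum(max(0.0, 1.0 - abs(q - p)) * x[q]
               for q in (lo, lo + 1) if 0 <= q < len(x))

def deformable_avg_pool1d(x, norm_offsets, window=2, stride=1, gamma=0.1):
    """Average-pool each window after shifting it by a rescaled, per-window offset."""
    out = []
    for i, start in enumerate(range(0, len(x) - window + 1, stride)):
        shift = gamma * norm_offsets[i] * window          # rescale the normalized offset
        total = sum(sample_linear(x, start + k + shift) for k in range(window))
        out.append(total / window)                        # divide by the number of points
    return out

signal = [1.0, 2.0, 3.0, 4.0]
pooled_plain = deformable_avg_pool1d(signal, [0.0, 0.0, 0.0])
pooled_shifted = deformable_avg_pool1d(signal, [5.0, 5.0, 5.0])   # each window moves right by 1
```

With zero offsets the result is ordinary average pooling with window 2 and stride 1, which matches the window and stride stated for this layer.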

The data produced by the deformable ROI pooling layer is fed into the Dropout layer, which randomly discards a portion of the neurons. By doing so, Dropout reduces the correlation between neurons, suppresses overfitting and co-adaptation, and thereby improves the generalization ability, robustness, and training stability of the network.
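As a small illustration, the sketch below implements the commonly used "inverted" form of dropout, in which surviving activations are scaled by 1/(1 - p) during training so that no rescaling is needed at inference time. The drop probability `p` and the scaling convention are assumptions; the text does not state which variant or rate is used.

```python
import random

def dropout(values, p, rng, training=True):
    """Zero each value with probability p; scale the survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)            # dropout is disabled at inference time
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
dropped = dropout([1.0] * 1000, 0.5, rng)                        # training mode
unchanged = dropout([1.0, 2.0], 0.5, rng, training=False)        # inference mode
```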

Step S32: input the data processed in steps S1 and S2 into the improved Transformer model for feature extraction.

The structure of the improved Transformer model in this example is shown in Figure 6. A conventional Transformer model comprises an encoder and a decoder, whereas the improved Transformer model of the present invention comprises only the encoder, which simplifies the model and reduces its computation and memory requirements. The encoder module comprises an Embedding layer, a multi-head self-attention layer, a residual connection and layer normalization layer, and a feedforward network layer.

In the Transformer, the input embedding vector represents the semantic information of the input data, and a position encoding is additionally required to represent the position of the data in the sequence. The position encoding is denoted PE and has the same dimension as the input embedding vector. The Transformer uses a formula-based position encoding, computed as follows:

$$PE_{(pos,\,2t)} = \sin\!\left(\frac{pos}{10000^{2t/d_{model}}}\right) \tag{13}$$

$$PE_{(pos,\,2t+1)} = \cos\!\left(\frac{pos}{10000^{2t/d_{model}}}\right) \tag{14}$$

where pos is the position of the word in the sequence, $d_{model}$ is the dimension of PE, 2t denotes an even dimension, and 2t+1 denotes an odd dimension (i.e. $2t \le d_{model}$, $2t+1 \le d_{model}$). For any position offset k, $PE_{(pos+k)}$ can be expressed as a linear function of $PE_{(pos)}$:

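$$PE_{(pos+k,\,2t)} = \sin(\omega_t\, pos)\cos(\omega_t k) + \cos(\omega_t\, pos)\sin(\omega_t k) \tag{15}$$

$$PE_{(pos+k,\,2t+1)} = \cos(\omega_t\, pos)\cos(\omega_t k) - \sin(\omega_t\, pos)\sin(\omega_t k) \tag{16}$$

where pos is the index of the current position and $\omega_t = 1/10000^{2t/d_{model}}$, with $d_{model}$ the dimension of PE, is the position-encoding parameter that controls the frequency and phase of the encoding.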

In the Transformer model, the position encoding is added element-wise to the input embedding vector to obtain the final input, which carries position information. As a result, the self-attention computation can take into account both the semantic information of the data and its position in the sequence, and so captures the temporal relationships in the sequence better. Note that the position encoding is fixed and is not updated during training. Its purpose is to introduce position information without adding model parameters.
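The following sketch computes the fixed sinusoidal position encoding and adds it to a sequence of embedding vectors. It assumes the standard sine/cosine form with base 10000, where even dimensions use sine and odd dimensions use cosine; the function names and the toy embedding are illustrative.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for even in range(0, d_model, 2):                 # even = 2t in the text's notation
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe

def add_position(embeddings, pe):
    """Element-wise sum of each embedding vector and the encoding of its position."""
    return [[e + p for e, p in zip(row, pe[pos])] for pos, row in enumerate(embeddings)]

pe = positional_encoding(seq_len=10, d_model=4)
with_position = add_position([[1.0, 1.0, 1.0, 1.0]], pe)

# The encoding at pos + k is a linear combination of the encoding at pos,
# with coefficients that depend only on the offset k (angle-addition identity).
pos, k = 3, 4
rebuilt = pe[pos][0] * pe[k][1] + pe[pos][1] * pe[k][0]   # equals pe[pos + k][0]
```

Because the table depends only on the position and the dimension, it can be computed once and reused, which is consistent with the encoding not being a trainable parameter.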

The Transformer model is built around the self-attention mechanism, whose computation uses the matrices Q (query), K (key), and V (value). The self-attention mechanism receives either the input (the matrix X formed from the representation vectors x of the data) or the output of the previous encoder. Q, K, and V are obtained by applying linear transformations to this input: with the input represented by the matrix X, Q, K, and V are computed using the linear transformation matrices $W^Q$, $W^K$, and $W^V$. Once Q, K, and V are available, the output of the self-attention mechanism is computed as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_s}}\right)V \tag{17}$$

where $d_s$ is the number of columns of the matrices Q and K, i.e. the vector dimension.

The product of the Q (query) matrix and the K (key) matrix measures the correlation between the feature at the current position and the features at other positions: the higher the correlation, the larger the product. The product is divided by the square root of the vector dimension so that the scores remain in a stable range, neither too large nor too small. A softmax is then applied to the result, which converts the correlation scores into a probability distribution expressing the importance of each position relative to the current one. The softmax output is multiplied by the V (value) matrix, giving greater weight to highly correlated features and attenuating irrelevant information. This product is the output of the self-attention mechanism and is passed on to subsequent computation.
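A minimal sketch of scaled dot-product attention follows, assuming the usual form softmax(Q K^T / sqrt(d_s)) V with the softmax taken over each row. The matrices are plain nested lists and the helper names are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """Return (output, weights) for softmax(q k^T / sqrt(d_s)) v."""
    d_s = len(q[0])                                   # number of columns of Q and K
    k_t = [list(col) for col in zip(*k)]              # transpose of K
    scores = [[s / math.sqrt(d_s) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]        # each row is a probability distribution
    return matmul(weights, v), weights

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(q, k, v)
```

In this toy case each query matches its own key best, so each output row is pulled toward the corresponding row of V.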

The multi-head self-attention mechanism contains several self-attention layers. The input X is first passed to h different self-attention mechanisms, which produce h output matrices Z. The multi-head self-attention mechanism concatenates these matrices and passes the result through a Linear layer to obtain its final output Z. The output matrix Z has the same dimensions as the input matrix X. The whole computation can be expressed as:

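$$head_i = \mathrm{Attention}(XW_i^{Q},\ XW_i^{K},\ XW_i^{V}) \tag{18}$$

$$\mathrm{MultiHead}(X) = \mathrm{Concat}(head_1, \ldots, head_h)\,W^{O} \tag{19}$$

where $W_i^{Q}, W_i^{K} \in \mathbb{R}^{d \times d_k}$, $W_i^{V} \in \mathbb{R}^{d \times d_v}$, and $W^{O}$ are the additional weight matrices, Concat is the matrix concatenation operation, $d$ denotes the embedding dimension, and $d_k$ and $d_v$ denote the hidden dimensions of the projection subspaces.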

A residual connection adds the input of a layer directly to its output, so that the model can learn the residual information. In the Transformer, every sublayer includes a residual connection that adds the sublayer input to the sublayer output, so the final output combines the input with the transformation applied by the sublayer. This design helps mitigate vanishing and exploding gradients in deep networks. Layer normalization normalizes the input of each layer of a neural network and improves training stability and generalization. It is computed as follows:

$$\mathrm{LayerNorm}(X) = \alpha \odot \frac{X - \mu}{\sigma + \epsilon} + \beta \tag{20}$$

$$\mu = \frac{1}{c}\sum_{j=1}^{c} x_j \tag{21}$$

$$\sigma = \sqrt{\frac{1}{c}\sum_{j=1}^{c}(x_j - \mu)^2} \tag{22}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the elements of X along the last dimension, $\alpha$ and $\beta$ are learnable parameters, $\odot$ denotes element-wise multiplication, $\epsilon$ is a constant that provides numerical stability, c is the total number of elements, and j denotes the j-th element.
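The residual connection and layer normalization can be sketched for a single vector as follows. The sketch assumes the form alpha * (x - mean) / (std + eps) + beta, with the mean and standard deviation taken over the elements of the vector; the names `alpha`, `beta`, and the placement of `eps` outside the square root are assumptions, since implementations differ on this detail.

```python
import math

def layer_norm(x, alpha, beta, eps=1e-6):
    """Normalize a vector to zero mean and unit standard deviation, then scale and shift."""
    c = len(x)
    mu = sum(x) / c
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / c)
    return [a * (v - mu) / (sigma + eps) + b for v, a, b in zip(x, alpha, beta)]

def add_and_norm(x, sublayer_out, alpha, beta):
    """Residual connection followed by layer normalization."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)], alpha, beta)

normed = layer_norm([1.0, 2.0, 3.0, 4.0], alpha=[1.0] * 4, beta=[0.0] * 4)
flat = add_and_norm([1.0, 2.0], [1.0, 0.0], alpha=[1.0, 1.0], beta=[0.0, 0.0])
```

The second call shows why `eps` is needed: the summed vector is constant, so its standard deviation is zero and the division would otherwise fail.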

The feedforward network is a fully connected feedforward network, and the data at each position passes independently through the same network. It applies a nonlinear transformation to the hidden representation at each position, mapping the output of the multi-head attention layer to another dimensional space and thereby increasing the expressive power of the model. The feedforward network consists of two fully connected layers: the first uses the ReLU activation function and the second uses no activation function. It can be expressed as:

$$\mathrm{FFN}(X) = \mathrm{ReLU}(XW_1 + b_1)\,W_2 + b_2 \tag{23}$$

where ReLU is the activation function, $W_1$ and $W_2$ are weight parameters, and $b_1$ and $b_2$ are bias parameters.
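A minimal sketch of the position-wise feedforward network, applied to one position vector, is given below. It assumes two fully connected layers with ReLU after the first and no activation after the second, as described above; the weights in the example are arbitrary illustrative values.

```python
def linear(x, w, b):
    """Fully connected layer: x has n_in entries, w is n_in x n_out, b has n_out entries."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

def ffn(x, w1, b1, w2, b2):
    """Two-layer feedforward network: linear, ReLU, linear."""
    return linear(relu(linear(x, w1, b1)), w2, b2)

# Identity first layer so the effect of ReLU is easy to follow: [1, -2] becomes [1, 0].
y = ffn([1.0, -2.0],
        w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
        w2=[[2.0], [3.0]], b2=[0.5])
```

The same function would be applied to every position of the sequence with shared weights.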

Step S33: the output of the deformable convolutional neural network DCNN and the output of the improved Transformer model are fused and fed into the self-attention module, as shown in Figure 2. Through this mechanism, the hybrid network model adaptively adjusts the connection weights and the importance of the features, so that features critical to the task receive higher weights, which improves the accuracy and performance of the network anomaly detection model. The output of the self-attention module passes through one fully connected layer and is then classified by the softmax function to obtain the classification result.

Step S34: the loss function is defined on the basis of the dataset processed in steps S1 and S2 and the scale of the hybrid network model. The loss function is the cross-entropy function commonly used for multi-class classification, computed as follows:

$$L = -\frac{1}{z}\sum_{i=1}^{z} y_i \log \hat{y}_i \tag{24}$$

where $y_i$ is the true label, $\hat{y}_i$ is the predicted class distribution, i denotes the i-th sample, and z is the total number of samples.
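The loss can be sketched as follows, assuming one-hot true labels, predicted class probabilities that sum to one, and an average over the z samples. The averaging and the small clamp `eps` that guards against log(0) are assumptions for illustration.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted class distributions."""
    z = len(y_true)
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(y, y_hat))
    return total / z

perfect = cross_entropy([[1, 0], [0, 1]], [[1.0, 0.0], [0.0, 1.0]])   # confident and correct
uniform = cross_entropy([[1, 0]], [[0.5, 0.5]])                       # undecided between 2 classes
```

A confident correct prediction gives a loss of zero, while an undecided two-class prediction gives log 2, so lower values indicate better agreement with the labels.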

Step S35: the hybrid network model is trained as follows:

Step S351, initialize the model parameters: set the initial parameters of the deformable convolutional neural network and the improved Transformer model, together with the weights and biases of the fully connected layer.

Step S352, input the data: feed the training samples of the dataset into the hybrid network model and obtain its output.

Step S353, compute the loss: use the defined loss function to compute the error between the output of the hybrid network model and the true values.

Step S354, backpropagate and update the parameters: using gradient descent or another optimization algorithm, propagate the error layer by layer according to the gradient of the loss function and adjust the parameters of the hybrid network model to reduce the error between the true values and the model output.

Step S355, repeat the training steps: execute steps S352 to S354 in a loop, feeding samples into the hybrid network model, computing the error, backpropagating, and updating the parameters. Through this iterative optimization, the hybrid network model gradually learns and adjusts itself to fit the training data better.

Step S356, check the stopping condition: monitor the metrics during training and set a stopping condition. Once the stopping condition is met, that is, once the preset threshold on the number of training epochs is reached, training stops, yielding the network anomaly detection model, and the trained model is saved.
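The training procedure of steps S351 to S356 can be sketched on a toy problem. The sketch below replaces the hybrid network with a plain softmax classifier so that the gradients can be written out by hand; the model, the learning rate, the epoch threshold, and the data are all illustrative assumptions, and only the loop structure corresponds to the steps above.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def train(samples, labels, n_classes, epochs=50, lr=0.1):
    """Full-batch gradient descent on a softmax classifier; returns (w, b, loss history)."""
    n, n_features = len(samples), len(samples[0])
    w = [[0.0] * n_classes for _ in range(n_features)]      # S351: initialize parameters
    b = [0.0] * n_classes
    history = []
    for _ in range(epochs):                                 # S355/S356: loop until the epoch threshold
        grad_w = [[0.0] * n_classes for _ in range(n_features)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(samples, labels):
            logits = [sum(x[i] * w[i][c] for i in range(n_features)) + b[c]
                      for c in range(n_classes)]
            probs = softmax(logits)                         # S352: forward pass
            loss -= math.log(probs[y])                      # S353: cross-entropy loss
            for c in range(n_classes):                      # S354: gradient of the loss
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for i in range(n_features):
                    grad_w[i][c] += err * x[i]
        for c in range(n_classes):                          # S354: parameter update
            b[c] -= lr * grad_b[c] / n
            for i in range(n_features):
                w[i][c] -= lr * grad_w[i][c] / n
        history.append(loss / n)
    return w, b, history

# Hypothetical one-feature data: small values are class 0, large values are class 1.
w, b, history = train([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], n_classes=2)
```

The recorded loss history is the kind of metric that step S356 refers to when monitoring training; here the only stopping condition is the fixed number of epochs.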

As shown in Figure 1, step S4 of this embodiment is as follows: the network anomaly detection model is validated using the validation set, and after validation passes, the model is used to classify abnormal traffic data.

The specific steps are as follows:

Step S41: input the validation set split off in step S1 into the trained network anomaly detection model for validation.

Step S42: input the test set into the network anomaly detection model to evaluate its performance.

After validation passes, the test set KDDTest is fed into the saved network anomaly detection model for testing. The test results are recorded, as shown in Table 1, for further analysis, evaluation, and improvement of the model. This evaluation step helps verify the robustness and reliability of the model and provides guidance for anomaly detection problems in practical applications.

In summary, the network anomaly detection model of the present invention detects network anomalies effectively and improves the accuracy and robustness of detection. This provides technical support for the field of network security and helps protect network systems from malicious attacks and abnormal behavior.

Table 1: Test results

Embodiment 2

Based on the same inventive concept as Embodiment 1, this embodiment describes a deep-learning-oriented network anomaly detection device, including:

Further, the network anomaly detection device of the present invention also includes a validation module, which, after the network anomaly detection model is obtained, validates the model using the validation set and, once validation passes, inputs the test set into the model to evaluate its performance.

Embodiment 3

Based on the same inventive concept as the other embodiments, this embodiment describes a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the network anomaly detection method described above.

Embodiment 4

Based on the same inventive concept as the other embodiments, this embodiment describes a computer device comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of the network anomaly detection method described above.

Embodiment 5

Based on the same inventive concept as the other embodiments, this embodiment describes a computer program product comprising a computer program that, when executed by a processor, implements the steps of the network anomaly detection method described above.

To summarize the embodiments above: the fusion model of the deformable convolutional neural network DCNN and the Transformer extracts the temporal and spatial features of the data simultaneously, which addresses incomplete feature extraction; the denoising autoencoder (DAE) performs feature screening on the data, which addresses data redundancy; and the adaptive synthetic sampling algorithm ADASYN addresses data imbalance. As a result, the accuracy and robustness of network anomaly detection are significantly improved.

Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific implementations described, which are illustrative rather than restrictive. Guided by the present invention, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims

1. A deep-learning-oriented network anomaly detection method, characterized by comprising the following steps:

obtaining abnormal traffic data to be detected, and inputting the abnormal traffic data into a pre-trained network anomaly detection model for classification detection to obtain a detection result;

wherein the training process of the network anomaly detection model comprises:

dividing the acquired historical network traffic dataset into a training set and a test set, splitting off part of the training set as a validation set, and performing feature screening on the data in the training set;

balancing the number of samples of each class in the data after feature screening;

constructing a hybrid network model of a deformable convolutional neural network DCNN and an improved Transformer model for network traffic classification; and, based on the balanced data and the hybrid network model, defining a loss function and training the hybrid network model to obtain the network anomaly detection model;

wherein the hybrid network model is constructed as follows:

the deformable convolutional neural network DCNN comprises an input layer, a deformable convolutional layer, a deformable ROI pooling layer, and a Dropout layer;

the improved Transformer model comprises an encoder, the encoder comprising an Embedding layer, a multi-head self-attention layer, a residual connection and layer normalization layer, and a feedforward network layer;

the balanced data is input into the deformable convolutional neural network DCNN and into the improved Transformer model, each of which extracts and outputs features of the input data, specifically as follows:

the balanced data is input into the input layer of the deformable convolutional neural network DCNN for encoding; the encoded data is input into the deformable convolutional layer, where the deformable convolution step is performed; the result of the deformable convolution step is input into the deformable ROI pooling layer; after pooling, the result is passed to the Dropout layer; and after the Dropout layer discards part of the neurons, the output of the deformable convolutional neural network DCNN is produced;

the balanced data is input into the Embedding layer of the improved Transformer model for encoding; the encoded data is fused with a preset position encoding; the fused encoding is processed in sequence by the multi-head self-attention layer, the residual connection and layer normalization layer, and the feedforward network layer; and the output of the feedforward network layer is processed by a further residual connection and layer normalization layer to produce the output of the improved Transformer model;

the outputs of the deformable convolutional neural network DCNN and of the improved Transformer model are fused and input into a self-attention module for further feature extraction, and the output of the self-attention module passes through a fully connected layer and is then classified using a softmax function.

2. The deep-learning-oriented network anomaly detection method of claim 1, further comprising: after the network anomaly detection model is obtained, validating the network anomaly detection model using the validation set, and, after validation passes, using the network anomaly detection model to perform classification detection of abnormal traffic data.

3. The deep-learning-oriented network anomaly detection method of claim 1, wherein feature screening is performed on the data in the training set using a denoising autoencoder (DAE).

4. The deep-learning-oriented network anomaly detection method of claim 1, wherein the number of samples of each class in the data after feature screening is balanced using an adaptive synthetic sampling algorithm.

5. The deep-learning-oriented network anomaly detection method of claim 1, wherein training the hybrid network model to obtain the network anomaly detection model comprises:

initializing the parameters of the hybrid network model;

inputting the balanced data into the hybrid network model;

computing the error between the output of the hybrid network model and the true values using the defined loss function;

according to the gradient of the loss function, using a gradient descent algorithm to propagate the error back through each layer of the hybrid network model, and adjusting the parameters of the hybrid network model layer by layer;

repeating the above steps with a preset threshold on the number of training epochs, and stopping training when the number of training epochs reaches the threshold to obtain the network anomaly detection model.

6. A deep-learning-oriented network anomaly detection device, characterized by comprising:

an abnormal traffic data acquisition module, configured to obtain abnormal traffic data to be detected and input the abnormal traffic data into a pre-trained network anomaly detection model for classification detection to obtain a detection result;

a feature screening module, configured to divide the acquired historical network traffic dataset into a training set and a test set, split off part of the training set as a validation set, and perform feature screening on the data in the training set;

a data imbalance processing module, configured to balance the number of samples of each class in the data after feature screening;

a network anomaly detection model construction module, configured to construct a hybrid network model of a deformable convolutional neural network DCNN and an improved Transformer model for network traffic classification, and, based on the balanced data and the hybrid network model, to define a loss function and train the hybrid network model to obtain the network anomaly detection model.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the network anomaly detection method of any one of claims 1 to 5.

8. A computer device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the network anomaly detection method of any one of claims 1 to 5.

9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the network anomaly detection method of any one of claims 1 to 5.