
CN117011599A - Image detection method based on improved efficientnet model - Google Patents


Info

Publication number
CN117011599A
Authority
CN
China
Prior art keywords
model
image
improved
training
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310888336.5A
Other languages
Chinese (zh)
Inventor
周佳宇
孙景余
王岩
韩孟睿
郭天睿
陈童悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Polytechnic University
Original Assignee
Anhui Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Polytechnic University filed Critical Anhui Polytechnic University
Priority to CN202310888336.5A
Publication of CN117011599A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image detection method based on an improved efficientnet model, comprising a preprocessing operation, a built neural network model, and the parameters of the neural network model that need to be tuned. The method comprises the following steps: (1) preprocess the image to be detected; (2) build a neural network model based on the improved efficientnet model and train it; (3) use the neural network model trained in step (2) to detect the image to be detected and output the result. The method addresses the problems of current image classification neural network models, such as demanding hardware requirements, low classification accuracy, and slow running speed.

Description

An image detection method based on an improved efficientnet model

Technical Field

The present invention relates to the field of image detection, and in particular to an image detection method based on an improved efficientnet model.

Background Art

Image classification is a computer vision technique that is widely used in image recognition, object detection, and behavior analysis.

At present, image classification is usually performed with color-based methods, support vector machines, or neural networks. The first two suffer from limited accuracy and cannot handle complex scenes, among many other problems. The image classification neural network models used by the last approach, meanwhile, are only suited to large-scale classification tasks and involve a large amount of computation and a large number of parameters. For example, patent application No. 201810113878.4 discloses a hyperspectral image classification method based on a convolutional network and a recurrent neural network, which mainly addresses the low classification accuracy of hyperspectral images in the prior art. Its specific steps are: (1) construct a three-dimensional convolutional neural network; (2) construct a recurrent neural network; (3) preprocess the hyperspectral image matrix to be classified; (4) generate a training data set and a test data set; (5) train the networks with the training data set; (6) extract the spatial and spectral features of the test data set; (7) fuse the spatial and spectral features; (8) classify the test data set. That invention introduces a three-dimensional convolutional neural network and a recurrent neural network to extract the spatial and spectral features of hyperspectral images and fuses the two kinds of features for classification, achieving high accuracy on hyperspectral image classification problems.

Existing neural networks thus have their strengths, but when applied to the various image classification tasks of real life they are often abandoned because of expensive hardware requirements, low classification accuracy, and slow running speed. An image classification method that balances accuracy, running speed, and cost is therefore needed.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of the prior art by providing an image detection method based on an improved efficientnet model. It addresses the defects of existing image classification methods such as neural networks and innovatively proposes an improved efficientnet model algorithm, so that neural network technology can be applied to practical image classification tasks more efficiently and more accurately.

To achieve the above purpose, the present invention adopts the following technical solution: an image detection method based on an improved efficientnet model, comprising the following steps:

(1) Preprocess the image to be detected;

(2) Build a neural network model based on the improved efficientnet model and train it;

(3) Use the neural network model trained in step (2) to detect the image to be detected and output the result.

In step (2), the neural network model built on the improved efficientnet model comprises: a stem that extracts shallow features from the image, MBConv blocks that refine the shallow features into deep features, a branch that converts the deep features into an output, and a classifier that classifies the output.

In step (1), the preprocessing comprises all of the following operations: cropping the image, grayscale conversion, histogram equalization to enhance image features, median filtering to remove lighting effects, high-boost filtering to separate the image content from the background, binarization to extract the visible part of the image, image inpainting, tensorization into the form pytorch operates on, and normalization to speed up training.

The stem that extracts shallow features comprises a convolution module, a batch normalization module, and a swish activation: the input image passes through the convolution module, which raises its channel dimension, then through the batch normalization module, and finally through the swish activation function, which outputs the shallow features of the image.

The MBConv block that refines shallow features into deep features comprises a 1*1 expansion convolution, a k*k depthwise separable convolution, an SE module, a 1*1 projection convolution, and a Dropout layer. The input features first pass through the 1*1 convolution for dimension expansion, then through a BN layer and the Swish activation; they are then processed by the k*k depthwise separable convolution and again pass through a BN layer and the Swish activation before entering the SE module; the SE output passes through the 1*1 projection convolution and a BN layer into the Dropout layer, and the Dropout output is added element-wise to the MBConv input to give the output of the MBConv block.

The branch that converts deep features into an output comprises a 1*1 projection convolution and a batch normalization module.

The classifier comprises, connected in sequence, a global average pooling layer, a dropout layer, a fully connected layer, and a Softmax activation function.

Training the neural network model based on the improved efficientnet model comprises the following.

A training set, a validation set, and a test set are established.

The model is trained on the training set, adjusting its internal parameters until the whole training set has been used, giving the model for this epoch. The validation set is then used for verification: the validation data are imported and fed in turn into the model obtained in this epoch, the model outputs are compared with the true label values, and the accuracy of the current model on the validation set is computed as the accuracy associated with the internal parameters obtained in this epoch; the internal parameters are then saved to a best-parameter file.

A second epoch of training is then performed: the training set is used again to train a model with new internal parameters, and the accuracy of the new model on the validation set is computed as the accuracy associated with the internal parameters obtained in the second epoch.

The accuracy of the second epoch is then compared with that of the previous epoch. If it is higher, the internal parameters of the second epoch overwrite the best-parameter file; if it is the same or lower, the best-parameter file is not updated.

Training then continues in this way, epoch after epoch, for the specified number of epochs, finally yielding the best internal parameters with the highest accuracy and completing the training of the model.

Updating the internal parameters of the model on the training set comprises the following.

Preparation for parameter tuning: the images are imported from the paths where the training set is saved and preprocessed.

Internal parameter tuning: He initialization is applied to the built model to be trained, and an image from the training set is then fed into the improved neural network model to complete a forward pass.

The cross-entropy loss function is then selected, and the loss value is computed by feeding the model output and the true label value into it together; backpropagation from the loss value then computes the gradients of each layer, and finally the internal parameters of the neural network model are updated according to those gradients. The same operation is applied to every image in the training set, completing the internal parameter adjustment for the current epoch.

The hyperparameters of the model are set as follows: the batch size batch_number is 8; the initial learning rate lr is 1e-3; the exponential decay rates betas for the gradient and the squared gradient are 0.9 and 0.999; the denominator smoothing term eps is 1e-8; the weight decay weight_decay is 0; the amsgrad option (the improved variant of the Adam algorithm) is set to True; and the number of iterations over the training set num_epochs is set to 40.
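
Read as a sketch, these settings map directly onto the Adam optimizer of the torch library; the one-layer placeholder module below only stands in for the built improved model, whose actual definition is given in the code attachments:

```python
import torch.nn as nn
import torch.optim as optim

# Hyperparameters as specified above.
batch_number = 8                  # images read per batch
lr = 1e-3                         # initial learning rate
betas = (0.9, 0.999)              # decay rates for gradient and squared gradient
eps = 1e-8                        # denominator smoothing term
weight_decay = 0                  # weight decay
num_epochs = 40                   # iterations over the training set

model = nn.Conv2d(1, 8, 3)        # placeholder for the improved efficientnet model
optimizer = optim.Adam(model.parameters(), lr=lr, betas=betas, eps=eps,
                       weight_decay=weight_decay, amsgrad=True)
```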

Compared with the prior art, the advantages of the present invention are:

1) Preprocessing is performed before the image set is fed into the neural network, improving the quality of the image set while helping the network learn its key features, which lowers the performance requirements on the network and raises running speed and accuracy.

2) Measures such as an early-stopping strategy and an improved model structure reduce the number of run-time parameters of the efficientnet model code and the GPU memory it needs during computation, avoiding the need for high-performance graphics cards and other costly hardware, saving cost, and widening the scenarios in which the model can be used.

3) Compared with the source code, the improved efficientnet model code changes the branch, the fully connected layer, and other components, helping the model deliver on small-scale practical classification tasks the 100% classification accuracy it shows on large-scale ones.

4) The improved efficientnet model code improves the construction principle and consolidates scattered files, removing time wasted at run time and allowing the new model code to be applied to various practical classification tasks at a higher running speed.

Description of the Drawings

The content shown in each drawing of this specification and the reference marks in the drawings are briefly described below:

Figure 1 is the overall flow chart of the present invention.

Figure 2 is an example image of the image classification task of bearing assembly quality inspection according to the present invention.

Figure 3 shows the result of the cropping operation on the example image.

Figure 4 shows the result of the grayscale operation on the example image.

Figure 5 is the grayscale histogram after the grayscale operation on the example image.

Figure 6 shows the result of the equalization operation on the example image.

Figure 7 is the grayscale histogram after the equalization operation on the example image.

Figure 8 shows the result of the median filtering operation on the example image.

Figure 9 shows the result of the high-boost filtering operation on the example image.

Figure 10 shows the result of the binarization operation on the example image.

Figure 11 shows the result of the circle-completion operation on the example image.

Figure 12 shows the result of the normalization operation on the example image.

Figure 13 shows the loss curves on the training and validation sets for the efficientnet source code during training.

Figure 14 shows the loss curves on the training and validation sets for the improved model code during training.

Figure 15 shows the final accuracy of the efficientnet source code.

Figure 16 shows the final accuracy of the improved model code.

Figure 17 is a schematic diagram of the stem of the model.

Figure 18 is a schematic diagram of the structure of the MBConv blocks of the model.

Figure 19 is a schematic diagram of the branch structure of the model.

Figure 20 is a schematic diagram of the classifier.

Detailed Description of the Embodiments

The specific embodiments of the present invention are described in further detail below through a description of a preferred embodiment with reference to the accompanying drawings.

The present invention provides an image detection method based on an improved efficientnet model, comprising the following steps:

(1) Preprocess the image to be detected;

(2) Build a neural network model based on the improved efficientnet model and train it;

(3) Use the neural network model trained in step (2) to detect the image to be detected and output the result.

In step (1), the preprocessing comprises at least one of, a combination of, or all of the following operations: cropping the image, grayscale conversion, histogram equalization to enhance image features, median filtering to remove lighting effects, high-boost filtering to separate the image content from the background, binarization to extract the visible part of the image, image inpainting, tensorization into the form pytorch operates on, and normalization to speed up training.

In step (2), the neural network model built on the improved efficientnet model comprises: a stem that extracts shallow features from the image, MBConv blocks that refine the shallow features into deep features, a branch that converts the deep features into an output, and a classifier that classifies the output.

As shown in Figure 17, the stem that extracts shallow features comprises a convolution module, a batch normalization module, and a swish activation: the input image passes through the convolution module, which raises its channel dimension, then through the batch normalization module for batch normalization, and finally through the swish activation function, which outputs the shallow features of the image.

Figure 18 is a schematic diagram of the structure of the MBConv blocks of the model. The blocks of the model consist of: 2 MBConv1 blocks with 3*3 kernels and stride 1*1; 3 MBConv6 blocks with 3*3 kernels and stride 2*2; 3 MBConv6 blocks with 5*5 kernels and stride 2*2; 5 MBConv6 blocks with 3*3 kernels and stride 2*2; 5 MBConv6 blocks with 5*5 kernels and stride 1*1; 6 MBConv6 blocks with 5*5 kernels and stride 2*2; and 2 MBConv6 blocks with 3*3 kernels and stride 1*1. The MBConv block that refines shallow features into deep features comprises a 1*1 expansion convolution, a k*k depthwise separable convolution, an SE module, a 1*1 projection convolution, and a Dropout layer. The input features first pass through the 1*1 convolution for dimension expansion, then through a BN layer and the Swish activation; they are then processed by the k*k depthwise separable convolution and again pass through a BN layer and the Swish activation before entering the SE module; the SE output passes through the 1*1 projection convolution and a BN layer into the Dropout layer, and finally the Dropout output is added element-wise to the MBConv input to give the output of the MBConv block.

As shown in Figure 19, a schematic diagram of the branch structure of the model, the branch that converts deep features into an output comprises a 1*1 projection convolution and a batch normalization module.

As shown in Figure 20, a schematic diagram of the classifier, the classifier comprises, connected in sequence, a global average pooling layer, a dropout layer, a fully connected layer, and a Softmax activation function.

Training the neural network model based on the improved efficientnet model comprises the following.

A training set, a validation set, and a test set are established. (All three are taken from a data set on bearing assembly quality. The whole data set contains two kinds of samples: qualified bearings [bearings with uniformly distributed steel balls, i.e. the spacing between the balls is within the error tolerance] and unqualified bearings [bearings with non-uniformly distributed steel balls, i.e. the spacing between the balls is outside the tolerance; bearings missing one steel ball; and bearings missing two steel balls]. The data set is then split in a 6:2:2 ratio, balancing the goals of reducing training time and improving test reliability.)

The model is trained on the images in the training set, adjusting its internal parameters until the whole training set has been processed, giving the model with the internal parameters of this epoch.

The validation set is then used for verification: the validation data are imported and fed in turn into the model obtained in this epoch, the model outputs are compared with the true label values, and the accuracy of the current model on the validation set is computed as the accuracy associated with the internal parameters obtained in this epoch; the internal parameters are then saved to a best-parameter file.

A second epoch of training is then performed: the training set is used again to train a model with new internal parameters, and the accuracy of the new model on the validation set is computed as the accuracy associated with the internal parameters obtained in the second epoch.

The accuracy of the second epoch is then compared with that of the previous epoch. If it is higher, the internal parameters of the second epoch overwrite the best-parameter file; if it is the same or lower, the best-parameter file is not updated.

Training then continues in this way, epoch after epoch, for the specified number of epochs, finally yielding the best internal parameters with the highest accuracy and completing the training of the model. A sketch of this loop is given below.
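
The epoch loop just described can be summarized in a short PyTorch sketch. The loader, criterion, and optimizer objects and the checkpoint file name are assumptions for illustration, not the code of the attachments:

```python
import torch

def fit(model, train_loader, val_loader, optimizer, criterion,
        num_epochs=40, ckpt_path="best_params.pth", device="cpu"):
    """Train for num_epochs rounds, keeping the internal parameters with the
    best validation accuracy in a single best-parameter file."""
    best_acc = 0.0
    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:           # one pass over the training set
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)   # cross-entropy loss
            loss.backward()                           # gradients of every layer
            optimizer.step()                          # update the internal parameters

        model.eval()                                  # validation pass
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                _, predicted = torch.max(model(images).data, 1)
                correct += (predicted == labels).sum().item()
                total += labels.size(0)
        acc = correct / total
        if acc > best_acc:            # strictly higher: overwrite the best file;
            best_acc = acc            # equal or lower: keep the previous parameters
            torch.save(model.state_dict(), ckpt_path)
    return best_acc
```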

Updating the internal parameters of the model on the training set comprises the following.

Preparation for parameter tuning: the images are imported from the paths where the training set is saved and preprocessed.

Internal parameter tuning: He initialization is applied to the built model to be trained, and an image from the training set is then fed into the improved neural network model to complete a forward pass.

The cross-entropy loss function is then selected, and the loss value is computed by feeding the model output and the true label value into it together; backpropagation from the loss value then computes the gradients of each layer, and finally the internal parameters of the neural network model are updated according to those gradients. The same operation is applied to every image in the training set, completing the internal parameter adjustment for the current epoch.
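
The He initialization referred to here is described later in the embodiment as a custom he_init() built from init.kaiming_normal_, init.normal_, and init.constant_. A minimal sketch follows; the mode argument and the standard deviation of the normal distribution are assumptions, since the text treats them as manually tuned hyperparameters:

```python
import torch.nn as nn
import torch.nn.init as init

def he_init(model):
    """He initialization of the built model: kaiming_normal_ for convolution
    layers, constants for batch normalization layers, and a small normal
    distribution for fully connected layers."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            init.constant_(m.weight, 1)
            init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            init.normal_(m.weight, mean=0.0, std=0.01)
            init.constant_(m.bias, 0)
```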

The hyperparameters specified in the original paper are improved and, combined with the internal parameter tuning, the finally trained neural network model based on the improved efficientnet model can be applied to image detection and classification with high detection accuracy and low hardware cost. In application, the image to be detected only needs to be preprocessed and fed into the model, and the model outputs the classification result.

The image classification task of bearing assembly quality inspection is taken below as an example to explain in detail the specific principle of each module and its software implementation. (It should be understood that terms such as "steel ball 1" correspond to names such as "qiu1" assigned according to the order in which the steel balls move; these terms are merely simplified descriptions for convenience and should not be understood as limiting the invention.)

An image detection method based on the improved efficientnet model comprises the preprocessing operations, the built neural network model, and the parameters of the neural network model that need to be tuned.

The preprocessing operations comprise cropping the image containing the bearing, grayscale conversion to simplify the image, histogram equalization to enhance image features, median filtering to remove lighting effects, high-boost filtering to separate the steel balls from the background, binarization to extract the visible parts of the steel balls, circle completion to restore the complete steel-ball shapes, tensorization into the form pytorch operates on, and normalization to speed up training.

The built neural network model comprises the stem that extracts shallow features from the image, the MBConv blocks that refine the shallow features into deep features, the branch that converts the deep features into an output, and the classifier that classifies the output.

The parameters of the neural network model that need tuning comprise the preparation work for parameter tuning, the internal parameters from which the optimal weights of the model are computed, the hyperparameters that improve the model's performance, and the test-set parameters that provide the basis for the final judgment.

As shown in Figure 1, the image detection method based on the improved efficientnet model comprises image preprocessing, building the neural network model, and tuning the parameters of the neural network model. Specifically, it comprises the following steps:

Step 1: load the image sample set;

Step 2: preprocess the samples;

Step 3: build the improved efficientnet model;

Step 4: tune the internal parameters of the model;

Step 5: tune the hyperparameters;

Step 6: run the test set to obtain the accuracy of the model and judge whether it exceeds the accuracy of the prior art. If so, the model is finalized; otherwise the hyperparameters are adjusted, the model is obtained anew and tested again, looping until the answer is yes.

Figure 2 shows an example image of the image classification task of bearing assembly quality inspection.

In the present application, image preprocessing comprises the following steps, executed in order.

Cropping is performed first. Using the mathematical relationship that moving the lower-left corner of an image is equivalent to moving its center point, the custom function Center() is called through the Lambda function of the transforms class to modify the coordinates of the image center. This function moves the center point to a specified position: it first obtains the width and height of the input image, computes one corner of the new image as the new center coordinates minus half the width and height, computes the opposite corner as that corner plus the width and height, and passes both corners to the img.crop function (which moves the image according to the two corner coordinates), thereby moving the image and, indirectly, its center point. The CenterCrop function of the transforms class then crops the image from the new center, so that an image containing only the bearing is obtained. The result of the cropping is shown in Figure 3. A sketch of such a Center() function follows.
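
A minimal sketch of a Center() function of this kind is shown below, written against PIL coordinates (img.crop takes a (left, upper, right, lower) box); the target center used in the usage line is a made-up value, since the actual coordinates are only in the code attachments:

```python
from torchvision import transforms

def Center(img, new_center):
    """Move the image content so that new_center becomes the center point,
    by feeding a recomputed corner box to img.crop()."""
    w, h = img.size
    cx, cy = new_center
    left, top = cx - w // 2, cy - h // 2             # first corner: center minus half size
    return img.crop((left, top, left + w, top + h))  # opposite corner: plus the size

# Hypothetical usage inside the transforms chain:
recentre = transforms.Lambda(lambda img: Center(img, (620, 540)))
```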

Grayscale conversion follows: setting the parameter num_output_channels of the Grayscale function of the transforms class to 1 gives the grayscale image of the bearing; the result is shown in Figure 4 and its grayscale histogram in Figure 5. Then equalization is performed: the custom function Equ() is called through the Lambda function of the transforms class (this function equalizes the gray values, so that the image carries richer information; it uses the ImageOps.equalize function, which updates the gray-value range of each pixel based on information such as pixel proportions to achieve a uniform distribution of gray values), so that the gray values are spread evenly and the image features are enhanced; the result is shown in Figure 6 and its grayscale histogram in Figure 7. Then median filtering is performed: the MedianFilter() function of the ImageFilter class is called through the Lambda function of the transforms class with the parameter size set to 1, removing interference such as lighting and shadows from the image; the result is shown in Figure 8.

After median filtering, high-boost filtering is performed. The custom function High() is called through the Lambda function of the transforms class (this function convolves the image, mainly to separate the bearing from the background; it uses the torch.ones function to create a 5*5 convolution kernel and performs a zero-padded convolution with the kernel center set to 0.45). Together with the final tuned parameters this yields the high-boost filtered image, in which the steel balls are separated from the background; the result is shown in Figure 9. Binarization follows.

Binarization comprises calling the point() function through the Lambda function of the transforms class, so that pixels with value greater than 10 become white and all others black, extracting the visible parts of the steel balls; the result is shown in Figure 10. Circle completion follows: the custom function Circle() is called through the Lambda function of the transforms class (this function reconstructs the complete steel-ball image from the simplified visible parts; it mainly uses the cv2.findContours function to find the contours of each part, derives from each contour the radius and center of the corresponding full steel-ball pattern, and uses this information to draw circles on top of the original pattern, producing image information containing only complete steel balls). Through the cooperation of the contour-detection and circle-drawing functions of the cv library, the complete pattern of each steel ball is recovered from its visible part; the result is shown in Figure 11. Tensorization follows: the image pixel values are normalized to floating-point values in [0.0, 1.0], and the shape and format of the image are converted into a form the pytorch framework can process. Finally, normalization is performed: the Normalize function of the transforms class, together with the mean and standard deviation of each data set computed with the mean and std functions, transforms the data to a standard normal distribution; the result is shown in Figure 12.
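
Putting the steps above together, a hedged sketch of the whole preprocessing chain might look as follows. Center(), High(), and Circle() are stubbed as identity placeholders because their parameters live in the code attachments; the crop size and the mean/std values are assumptions, while the thresholds follow the text:

```python
from PIL import ImageFilter, ImageOps
from torchvision import transforms

identity = lambda img: img   # stand-in for the custom Center(), High(), Circle()

preprocess = transforms.Compose([
    transforms.Lambda(identity),                  # Center(): move the center point
    transforms.CenterCrop(880),                   # crop from the new center (size assumed)
    transforms.Grayscale(num_output_channels=1),  # grayscale
    transforms.Lambda(ImageOps.equalize),         # Equ(): histogram equalization
    transforms.Lambda(lambda im: im.filter(ImageFilter.MedianFilter(size=1))),
    transforms.Lambda(identity),                  # High(): 5*5 high-boost filter
    transforms.Lambda(lambda im: im.point(lambda p: 255 if p > 10 else 0)),  # binarize
    transforms.Lambda(identity),                  # Circle(): complete the steel-ball circles
    transforms.ToTensor(),                        # tensorize to [0.0, 1.0]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # dataset mean/std are placeholders
])
```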

This application builds a neural network model based on efficientnet, comprising: the stem that extracts shallow features from the image, the MBConv blocks that refine the shallow features into deep features, the branch that converts the deep features into an output, and the classifier that classifies the output. The structure of each part is shown in Figures 17-20.

To build the neural network model, the stem is built first. The custom Conv2d() function is used (this function puts the image into the form required for neural network computation and then performs the convolution: it obtains a Conv2d form of the same size as the input image and, based on that form, implements a convolution with the specified parameters according to the mathematical formula of convolution together with functions from the math library; here it corresponds to a 3*3 convolution with stride 1). With the relevant parameters, this 3*3 convolution raises the channel dimension of the image. Batch normalization of the data is then implemented with the BatchNorm2d tool of the nn library under torch. Finally, the custom MemoryEfficientSwish() function implements the swish activation (it avoids the Swish activation compatibility problems between old and new pycharm versions and implements Swish with low memory use and high efficiency: it checks for the presence of the SiLU function to determine the version and calls the corresponding swish implementation, and it saves memory mainly by storing the final result of propagation separately in the forward and backward passes, avoiding recomputation when the value is needed later and saving running time). This completes the stem of the model; see code attachment 1 for the specific parameters. (The stem extracts the preliminary features of the image, such as edges and textures.)
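
Under the assumptions stated in the comments (single-channel input after preprocessing, and an output width of 40 channels borrowed from the EfficientNet family rather than from attachment 1, with a plain x * sigmoid(x) standing in for MemoryEfficientSwish()), a sketch of the stem is:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Plain x * sigmoid(x), standing in for the custom MemoryEfficientSwish()."""
    def forward(self, x):
        return x * torch.sigmoid(x)

class Stem(nn.Module):
    """Stem of the model: 3*3 convolution with stride 1 to raise the channel
    dimension, batch normalization, then the swish activation."""
    def __init__(self, in_ch=1, out_ch=40):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1,
                              padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = Swish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```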

Next the MBConv blocks of the model are built. The SE module must be built in advance. First, the adaptive_avg_pool2d function from the functional library of the nn toolkit under torch implements global average pooling. The custom Conv2d() function with the relevant parameters then implements the 1*1 reduction convolution (the purpose of this 1*1 convolution is dimension reduction, which greatly reduces the amount of computation and better fits the complex relationships between channels). The custom MemoryEfficientSwish() function then implements the swish activation; the activation here introduces a nonlinear transformation that helps the SE module adaptively learn the importance of each position in the input feature map, so that important information receives a higher weight, improving the correctness of later results. The custom Conv2d() function with the relevant parameters then implements the 1*1 expansion convolution (the purpose of this 1*1 convolution is dimension expansion: the dimension was just reduced to save computation, so the image form has changed, but the weight result must later be multiplied with the original input image; to multiply successfully, the weights must be expanded back along the same dimensions that were reduced). Finally, the sigmoid function of the torch library implements the sigmoid activation, giving the final weights, which are multiplied with the original input to give the processing result of the SE module.

The whole MBConv block first uses the custom Conv2d() function with the relevant parameters to implement the k*k depthwise separable convolution that raises the image dimension (k refers to the kernel size in MBConv, corresponding to the parameters in Figure 18; raising the dimension increases the number of image channels, improves the expressive power of the model, and reduces information loss). Batch normalization of the data is then implemented with the BatchNorm2d function of the nn library under torch, the swish activation with the custom MemoryEfficientSwish() function, and after activation the previously built SE module is called. Next, the custom Conv2d() function with the relevant parameters implements a 1*1 pointwise convolution that reduces the image dimension, restoring the image, whose shape was altered by the k*k depthwise separable convolution, to the shape it had before the expansion, followed again by batch normalization with BatchNorm2d. Finally, the custom drop_connect() function (which randomly sets values of the input feature map to 0 according to a drop probability, in order to prevent overfitting [performing well on the training set but poorly on an unseen test set]; the drop probability of each MBConv block is its index divided by the number of MBConv blocks (26) and multiplied by 0.2 (a hyperparameter), so that later blocks are dropped with higher probability, again avoiding overfitting) implements the skip connection while avoiding overfitting, completing the whole MBConv. See code attachment 2 for the specific parameters. (This step extracts the high-level features of the image.)
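The following sketch assembles the SE module and one MBConv block as just described. It is an illustration, not the code of attachment 2: nn.SiLU stands in for MemoryEfficientSwish(), the squeeze ratio is assumed, and drop_connect zeroes whole samples (the usual drop-connect form) rather than individual values; the drop probability for block i would be 0.2 * i / 26 as the text specifies:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def drop_connect(x, p, training):
    """Randomly zero whole samples with probability p during training and
    rescale the survivors; the text's per-block probability is 0.2 * i / 26."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    mask = torch.floor(keep + torch.rand(x.size(0), 1, 1, 1, device=x.device))
    return x / keep * mask

class SE(nn.Module):
    """Squeeze-and-excitation: global average pool, 1*1 reduction, swish,
    1*1 expansion, sigmoid, then reweight the input feature map."""
    def __init__(self, ch, reduced):
        super().__init__()
        self.reduce = nn.Conv2d(ch, reduced, 1)
        self.expand = nn.Conv2d(reduced, ch, 1)

    def forward(self, x):
        s = F.adaptive_avg_pool2d(x, 1)           # squeeze
        s = self.expand(F.silu(self.reduce(s)))   # reduce, swish, expand
        return x * torch.sigmoid(s)               # reweight the original input

class MBConv(nn.Module):
    """One MBConv block: 1*1 expansion, k*k depthwise convolution, SE,
    1*1 projection, then drop_connect and the element-wise skip connection.
    expand_ratio is 1 for MBConv1 and 6 for MBConv6."""
    def __init__(self, in_ch, out_ch, k=3, stride=1, expand_ratio=6, drop_p=0.0):
        super().__init__()
        mid = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch
        self.drop_p = drop_p
        self.expand = (nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU())
            if expand_ratio != 1 else nn.Identity())
        self.depthwise = nn.Sequential(
            nn.Conv2d(mid, mid, k, stride, padding=k // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        self.se = SE(mid, max(1, in_ch // 4))
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch))

    def forward(self, x):
        y = self.project(self.se(self.depthwise(self.expand(x))))
        if self.use_skip:   # skip connection with drop_connect, as described
            y = drop_connect(y, self.drop_p, self.training) + x
        return y
```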

Next the branch of the model is built. As shown in Figure 19, the custom Conv2d() function with the relevant parameters first implements a 1*1 convolution that reduces the image dimension (the image is reduced here to 10*10*1536 (height, width, channels), which reduces the subsequent computation, controls the model complexity, and improves the feature expressiveness). Batch normalization of the data is then implemented with BatchNorm2d from the nn library under torch, completing the branch of the model; see code attachment 3 for the specific parameters. (The branch performs the final prediction and output on the high-level features extracted by the blocks.)

Finally the classifier of the model is built. As shown in Figure 20, global average pooling is first implemented with the AdaptiveAvgPool2d() function of the nn library under torch, random dropout is implemented with the Dropout() function of the same library, and the classification output of the final probability distribution is implemented with the Linear() and Softmax() functions; see code attachment 4 for the specific parameters. After the above operations, the improved efficientnet neural network model is complete and is named the variable model. (The classifier performs the final prediction and output on the high-level features extracted by the blocks.)
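
A sketch of the branch and classifier as just described; the input channel count, the number of classes (the bearing data set suggests a handful of categories), and the dropout rate are assumptions rather than the values of attachments 3 and 4:

```python
import torch.nn as nn

class BranchAndClassifier(nn.Module):
    """Branch (1*1 convolution to 1536 channels plus batch normalization)
    followed by the classifier (global average pooling, dropout, fully
    connected layer, Softmax)."""
    def __init__(self, in_ch=384, num_classes=4, drop=0.3):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, 1536, kernel_size=1, bias=False),
            nn.BatchNorm2d(1536),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),       # global average pooling
            nn.Flatten(),
            nn.Dropout(drop),              # random dropout
            nn.Linear(1536, num_classes),  # fully connected layer
            nn.Softmax(dim=1),             # final probability distribution
        )

    def forward(self, x):
        return self.classifier(self.branch(x))
```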

Tuning the parameters of the neural network model (before tuning, the training and validation sets must be prepared as required; the training, validation, and test sets are all taken from the bearing assembly quality data set, which contains qualified bearings [uniformly distributed steel balls, i.e. ball spacing within the error tolerance] and unqualified bearings [non-uniformly distributed steel balls, i.e. ball spacing outside the tolerance; one missing steel ball; two missing steel balls], split in a 6:2:2 ratio to balance the goals of reducing training time and improving test reliability) begins with the preparation work. The ImageFolder class of the datasets library imports the images from their saved paths and applies the preprocessing, and DataLoader processes the images batch by batch. At the same time, the device() function of the torch library sets the computing device to the GPU to speed up training, completing the preparation; internal parameter tuning follows. The training set is imported during training and the validation set during validation.
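
A sketch of this preparation under assumed directory names; ToTensor() stands in here for the full preprocessing chain sketched earlier:

```python
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# ImageFolder imports the images from their saved paths and applies the
# preprocessing; DataLoader feeds them batch by batch; the computing device
# is set to the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
preprocess = transforms.ToTensor()   # placeholder for the full chain

train_set = datasets.ImageFolder("data/train", transform=preprocess)
val_set = datasets.ImageFolder("data/val", transform=preprocess)
test_set = datasets.ImageFolder("data/test", transform=preprocess)

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)  # batch_number = 8
val_loader = DataLoader(val_set, batch_size=8)
test_loader = DataLoader(test_set, batch_size=8)
```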

The built improved model must be set to He initialization through the custom he_init() function (this custom function uses the init.kaiming_normal_, init.normal_ and init.constant_ functions to initialize the convolution layers, batch normalization layers and fully connected layers; the internal parameters at the start of training need to be drawn from a normal distribution to prevent vanishing and exploding gradients later in training, and the mean and standard deviation of that normal distribution are manually tuned hyperparameters), which helps mitigate the problems of exploding and vanishing gradients. A for loop then imports all batches of the training set and feeds them one by one into the improved neural network model, completing forward propagation. CrossEntropyLoss() from the nn toolkit of the torch library is selected as the cross-entropy loss function, and the loss value is computed by passing the model outputs together with the true label values into it. backward() is then called on the loss value to back-propagate and compute the gradients of each layer, and finally step() on the Adam optimizer from the optim module updates the parameters of the neural network model according to those gradients, completing one round of internal parameter adjustment; hyperparameter tuning follows. After the whole training set has been processed, one validation pass is run: a for loop imports all batches of the validation set and feeds them one by one into the improved model to complete forward propagation, and the outputs together with the true labels are passed into CrossEntropyLoss() to compute the validation loss, which completes the tuning pass. The set of internal parameters with the smallest loss on the validation set is finally saved as the model's final parameters. The model also has several hyperparameters. After repeated experimental comparison it was found that setting the per-batch read count batch_number to 8, the initial learning rate lr to 1e-3, the exponential decay rates betas for the gradient and the squared gradient to 0.9 and 0.999, the small denominator smoothing term eps to 1e-8, the weight decay weight_decay to 0, the option for amsgrad (the improved variant of the Adam algorithm) to True, and the number of training iterations num_epochs to 40 lets the model outperform the source model's hyperparameter settings; the accuracy test comes last. Here a for loop imports all batches of the test set and feeds them one by one into the improved model, and the statement _, predicted = torch.max(outputs.data, 1) extracts the predicted labels from the model's outputs (because ImageFolder stores each image together with its class label, and after training against the cross-entropy loss the model's output is a vector of per-class scores; torch.max along dimension 1 returns both the highest score and its index, and the index, assigned to predicted, identifies the class the model considers most likely). The predicted labels are compared with the true labels of the batch, and the variable correct is incremented by one for every match. Finally, dividing correct by the total number of test samples gives the final result. (See the code in Figures 14 and 16.)
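A sketch of the custom he_init() function described above is given here. The description names the three init functions and the three layer types without fixing the pairing, so the mapping below (kaiming_normal_ for convolutions, constant_ for batch-norm, normal_ for the fully connected layer) is one plausible reading, and the std of 0.01 is an illustrative value for the manually tuned mean/std hyperparameters.

```python
import torch.nn as nn
from torch.nn import init

def he_init(m):
    # Applied recursively to every submodule via model.apply(he_init).
    if isinstance(m, nn.Conv2d):
        # He (Kaiming) initialization for convolution layers.
        init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm2d):
        # Batch-norm layers start out as the identity transform.
        init.constant_(m.weight, 1)
        init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        # Fully connected layer drawn from a normal distribution;
        # mean and std are the manually tuned hyperparameters.
        init.normal_(m.weight, mean=0.0, std=0.01)
        init.constant_(m.bias, 0)

# model.apply(he_init)  # apply to the built improved model
```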
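The training, validation and accuracy-test procedure just described can be sketched as follows, with the Adam hyperparameters taken from the values stated above. The names model, train_loader, val_loader and device refer to the earlier sketches, a test_loader built like train_loader is assumed, and the file name best_params.pth is an illustrative detail.

```python
import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                       eps=1e-8, weight_decay=0, amsgrad=True)

best_val_loss = float("inf")
num_epochs = 40

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:           # loop over all training batches
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)                   # forward propagation
        loss = criterion(outputs, labels)         # cross-entropy loss
        loss.backward()                           # back-propagate layer gradients
        optimizer.step()                          # Adam update of internal parameters

    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:         # one validation pass per epoch
            images, labels = images.to(device), labels.to(device)
            val_loss += criterion(model(images), labels).item()
    if val_loss < best_val_loss:                  # keep the parameter set with the
        best_val_loss = val_loss                  # smallest validation loss
        torch.save(model.state_dict(), "best_params.pth")

# Accuracy test on the held-out test set.
model.load_state_dict(torch.load("best_params.pth"))
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # index of the highest score
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.4f}")
```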

After the above operations, the loss curves of the training and validation sets during training are shown in Figure 13 for the efficientnet source code and in Figure 14 for the improved model code; the final accuracy of the efficientnet source code is shown in Figure 15, and that of the improved model code in Figure 16. The model's hyperparameters were adjusted according to the corresponding debugging results so that the whole model delivers its best performance, the initialization was changed to He initialization, and the optimizer was changed to the Adam optimizer. With these improvements, the model's loss curve closely approaches the ideal loss curve, and the accuracy at test time is 100%.

Compared with the original model, this model adds image preprocessing before the images are fed into the network. These feature-simplifying operations reduce a picture of the bearing assembly to a picture containing only the bearing's steel balls, lowering the difficulty of the neural network's task. Overall, the image preprocessing raises the quality of the images before they enter the model, laying the foundation for the neural network to perform well. The model is built in Python; the improved model code first computes the EfficientNet-B3 parameters in advance from the scaling ratios and the EfficientNet-B0 baseline, and then builds the EfficientNet-B3 model directly from those parameters with container functions such as nn.Sequential(). This avoids the otherwise wasted time of building the B0 model, computing the B3 parameters at run time and calling multiple .py files during construction, lowers the hardware requirements, and makes the model better suited to the task of bearing assembly quality inspection.
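As a sketch of that precomputation step, the channel and repeat counts in the code attachments below can be reproduced from the B0 baseline with the standard EfficientNet rounding rules, assuming width coefficient 1.2 and depth coefficient 1.4 for B3; note that block2's 16 output channels in attachment 2 deviate from the 24 this rule yields, which appears to be a deliberate modification in the patent.

```python
import math

def round_filters(filters, width_coefficient, depth_divisor=8):
    """Scale a B0 channel count by the width coefficient, rounding
    to a multiple of depth_divisor (the standard EfficientNet rule)."""
    filters *= width_coefficient
    new_filters = max(depth_divisor,
                      int(filters + depth_divisor / 2) // depth_divisor * depth_divisor)
    if new_filters < 0.9 * filters:  # never round down by more than 10%
        new_filters += depth_divisor
    return int(new_filters)

def round_repeats(repeats, depth_coefficient):
    """Scale a B0 per-stage repeat count by the depth coefficient."""
    return int(math.ceil(depth_coefficient * repeats))

print(round_filters(32, 1.2))    # 40   -> stem_block1 out_channels
print(round_filters(1280, 1.2))  # 1536 -> head_conv9 out_channels
print([round_repeats(r, 1.4) for r in [1, 2, 2, 3, 3, 4, 1]])
# [2, 3, 3, 5, 5, 6, 2] -> MBConv counts of block2..block8 below
```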

Code attachment 1:

Conv2d = get_same_padding_conv2d(image_size=(300, 300))

self.stem_block1 = nn.Sequential(Conv2d(in_channels=1, out_channels=40, kernel_size=3, stride=2, padding=1, groups=1, bias=False), nn.BatchNorm2d(num_features=40, momentum=self._bn_mom, eps=self._bn_eps))

Code attachment 2:

self.block2 = nn.Sequential(*[MBConv(in_channels=40, out_channels=16, kernel_size=3, stride=1, expand_ratio=1, squeeze=0.25, id_skip=False, image_size=(150, 150))]
+ [MBConv(in_channels=16, out_channels=16, kernel_size=3, stride=1, expand_ratio=1, squeeze=0.25, id_skip=True, idx=2, image_size=(150, 150))])

self.block3 = nn.Sequential(*[MBConv(in_channels=16, out_channels=32, kernel_size=3, stride=2, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(150, 150))]
+ [MBConv(in_channels=32, out_channels=32, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=4, image_size=(75, 75))]
+ [MBConv(in_channels=32, out_channels=32, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=5, image_size=(75, 75))])

self.block4 = nn.Sequential(*[MBConv(in_channels=32, out_channels=48, kernel_size=5, stride=2, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(75, 75))]
+ [MBConv(in_channels=48, out_channels=48, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=7, image_size=(38, 38))]
+ [MBConv(in_channels=48, out_channels=48, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=8, image_size=(38, 38))])

self.block5 = nn.Sequential(*[MBConv(in_channels=48, out_channels=96, kernel_size=3, stride=2, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(38, 38))]
+ [MBConv(in_channels=96, out_channels=96, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=10, image_size=(19, 19))]
+ [MBConv(in_channels=96, out_channels=96, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=11, image_size=(19, 19))]
+ [MBConv(in_channels=96, out_channels=96, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=12, image_size=(19, 19))]
+ [MBConv(in_channels=96, out_channels=96, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=13, image_size=(19, 19))])

self.block6 = nn.Sequential(*[MBConv(in_channels=96, out_channels=136, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(19, 19))]
+ [MBConv(in_channels=136, out_channels=136, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=15, image_size=(19, 19))]
+ [MBConv(in_channels=136, out_channels=136, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=16, image_size=(19, 19))]
+ [MBConv(in_channels=136, out_channels=136, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=17, image_size=(19, 19))]
+ [MBConv(in_channels=136, out_channels=136, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=18, image_size=(19, 19))])

self.block7 = nn.Sequential(*[MBConv(in_channels=136, out_channels=232, kernel_size=5, stride=2, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(19, 19))]
+ [MBConv(in_channels=232, out_channels=232, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=20, image_size=(10, 10))]
+ [MBConv(in_channels=232, out_channels=232, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=21, image_size=(10, 10))]
+ [MBConv(in_channels=232, out_channels=232, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=22, image_size=(10, 10))]
+ [MBConv(in_channels=232, out_channels=232, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=23, image_size=(10, 10))]
+ [MBConv(in_channels=232, out_channels=232, kernel_size=5, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=24, image_size=(10, 10))])

self.block8 = nn.Sequential(*[MBConv(in_channels=232, out_channels=384, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=False, image_size=(10, 10))]
+ [MBConv(in_channels=384, out_channels=384, kernel_size=3, stride=1, expand_ratio=6, squeeze=0.25, id_skip=True, idx=26, image_size=(10, 10))])

Code attachment 3:

Conv2d = get_same_padding_conv2d(image_size=(10, 10))

self.head_conv9 = nn.Sequential(Conv2d(in_channels=384, out_channels=1536, kernel_size=1, stride=1, padding=1, groups=1, bias=False), nn.BatchNorm2d(num_features=1536, momentum=self._bn_mom, eps=self._bn_eps))

Code attachment 4:

self._avg_pooling = nn.AdaptiveAvgPool2d(1)

self.classifier = nn.Sequential(nn.Dropout(0.3),
    nn.Linear(1536, self.num_classes),
    nn.Softmax(dim=1))

Obviously, the specific implementation of the present invention is not limited to the manner described above; any non-substantive improvements made using the method concept and technical solution of the present invention fall within the scope of protection of the present invention.

Claims (10)

1. An image detection method based on an improved efficientnet model, characterized in that it comprises the following steps: (1) performing preprocessing operations on the image to be detected; (2) building a neural network model based on the improved efficientnet model and training it; (3) using the neural network model trained in step (2) to detect the image to be detected and produce the detection output.

2. The image detection method based on an improved efficientnet model according to claim 1, characterized in that: in step (2), the neural network model built on the improved efficientnet model comprises: a stem for extracting shallow features of the image, MBConv blocks for refining the shallow features into deep features, a head for converting the deep features into an output, and a classifier for classifying the output.

3. The image detection method based on an improved efficientnet model according to claim 1, characterized in that: in step (1), the preprocessing operations comprise all of the following: cropping of the image, grayscale conversion, equalization to enhance image features, median filtering to remove the influence of illumination, high-boost filtering to separate the image from the background, binarization to extract the visible part of the image, image inpainting, tensorization into pytorch's computational form, and standardization to speed up training.

4. The image detection method based on an improved efficientnet model according to claim 1 or 2, characterized in that: the stem for extracting shallow features of the image comprises a convolution module, a normalization module and a swish function; the input image has its dimension raised by the convolution module, is batch-normalized by the normalization module, and finally passes through the swish activation function to output the shallow features of the image.

5. The image detection method based on an improved efficientnet model according to claim 1 or 2, characterized in that: the MBConv block for refining shallow features into deep features comprises: a 1*1 convolution for raising the dimension, a k*k depthwise separable convolution, an SE module, a 1*1 dimension-reducing convolution and a Dropout layer; the input image features first pass through the 1*1 convolution to raise the dimension, then in turn through a BN layer and the Swish activation function before being fed into the k*k depthwise separable convolution; after that they pass again through a BN layer and the Swish activation function and are fed into the SE module; after the SE module they pass through the 1*1 dimension-reducing convolution and a BN layer into the Dropout layer, and the Dropout output is added element-wise to the initial input of the MBConv block to obtain the block's final output.

6. The image detection method based on an improved efficientnet model according to claim 1 or 2, characterized in that: the head for converting deep features into an output comprises a 1*1 dimension-reducing convolution and a batch normalization module.

7. The image detection method based on an improved efficientnet model according to claim 2, characterized in that: the classifier for classifying the output comprises, connected in sequence to the output, a global average pooling layer, a random-deactivation Dropout layer, a fully connected layer and a Softmax activation function.

8. The image detection method based on an improved efficientnet model according to claim 1 or 2, characterized in that: training the neural network model based on the improved efficientnet model comprises: establishing a training set, a validation set and a test set; training the model on the images in the training set to adjust the model's internal parameters until the entire training set has been processed, obtaining a model with this round's internal parameters; then validating with the validation set: importing the validation-set data, feeding it in turn into the model obtained in this round, and comparing the current model's outputs with the true label values so as to compute the current model's accuracy on the validation set, taken as the accuracy corresponding to this round's internal parameters, and finally saving the internal parameters to a best-parameter file; then carrying out a second round of training, using the training set again to train a model with new internal parameters, and computing the new model's accuracy on the validation set as the accuracy corresponding to the second round's internal parameters; then comparing the second round's accuracy with that of the previous round: if the accuracy is higher, overwriting the best-parameter file with the second round's internal parameters, and if the accuracy is the same or lower, leaving the best-parameter file unchanged; then, within the prescribed number of rounds, repeating the training-and-update cycle to finally obtain the best internal parameters with the highest accuracy and complete the training of the model.

9. The image detection method based on an improved efficientnet model according to claim 8, characterized in that: updating the model's internal parameters based on the training set comprises: preparatory work for parameter tuning: importing the images according to the paths where the training-set images are saved and preprocessing them; internal parameter tuning: applying He initialization to the built model to be improved, then feeding an image from the training set into the improved neural network model to complete forward propagation; then selecting the cross-entropy loss function and computing the loss value by feeding the model's output value and the true label value into it together; then back-propagating from the loss value to compute the gradients of each layer, and finally updating the internal parameters of the neural network model according to those gradients; the same operations are applied to all images of the entire training set, completing the internal parameter adjustment for the current round.

10. The image detection method based on an improved efficientnet model according to claim 8, characterized in that: the hyperparameters of the model are set as follows: the per-batch read count batch_number is 8, the initial learning rate lr is 1e-3, the exponential decay rates betas for the gradient and the squared gradient are 0.9 and 0.999, the small denominator smoothing term eps is 1e-8, the weight decay weight_decay is 0, the option for amsgrad, the improved variant of the Adam algorithm, is set to True, and the number of training iterations num_epochs is set to 40.
