Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an aggregate detection and classification method based on AI intelligent machine vision technology.
The invention provides the following technical scheme:
an aggregate detection and classification method based on an AI intelligent machine vision technology comprises the following steps:
S1, shooting aggregate transportation videos by using a high-definition camera, and transmitting the aggregate transportation videos in real time through a 5G network to obtain aggregate images;
S2, processing the aggregate image by an image edge segmentation processing method, and extracting edge characteristics;
S3, manually marking the aggregate category of the processed aggregate image;
S4, inputting the marked aggregate image into a machine vision algorithm, training, extracting features, and outputting the features as a weight file;
S5, deploying the trained weights in a detection system, and detecting and identifying the aggregate.
Preferably, in step S2, the image edge segmentation processing method specifically includes the following steps:
A, importing the aggregate image, converting it to grayscale, and then computing the gradient map;
B, processing the image by using a watershed algorithm on the basis of the gradient map to obtain edge lines of the segmented image;
C, using an image opening operation to remove small objects in the picture and burr interference affecting the target;
D, carrying out adhesion segmentation to separate objects that are adhered together.
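For illustration only, a minimal Python sketch of steps A–D is given below, using scikit-image and SciPy (the library choice, the gradient floor of 0.05, the mean-intensity foreground heuristic and the disk radius are assumptions, not part of the claimed method):

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import io, color, filters, morphology
from skimage.segmentation import watershed

def segment_aggregate_edges(path: str, grad_floor: float = 0.05):
    """Steps A-D: grayscale -> gradient -> watershed -> opening -> adhesion split."""
    gray = color.rgb2gray(io.imread(path))        # A: grayscale conversion
    gradient = filters.sobel(gray)                # A: gradient map (Sobel)
    gradient = np.maximum(gradient, grad_floor)   # clip tiny gradients to curb over-segmentation
    # B/D: watershed flooding on the gradient map separates adhered objects
    markers, _ = ndi.label(gradient == grad_floor)
    labels = watershed(gradient, markers)
    # C: opening removes small objects and burr interference from the foreground mask
    foreground = morphology.opening(gray < gray.mean(), morphology.disk(3))
    return labels, foreground
```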
Preferably, in step S4, the machine vision algorithm training process specifically includes the following steps:
a, selecting 5 detection areas for each marked aggregate image;
b, respectively carrying out image convolution on each detection area to extract image features;
c, carrying out up-sampling treatment on the image, and restoring the convolved feature map into the image;
d, integrating the image features by using a tensor stitching algorithm;
e, outputting the features to a detection head, which classifies the image, the classification result being judged through a probability function;
f, identifying the result and outputting the detection result of the image.
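Steps b–e can be pictured with a schematic PyTorch sketch (the layer sizes, the 5-class head and the bilinear upsampling are illustrative assumptions; the patent does not fix a concrete network architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregateNet(nn.Module):
    """Steps b-e: convolution -> upsampling -> tensor stitching -> detection head."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)   # b: feature extraction
        self.conv2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16 + 32, num_classes))

    def forward(self, x):
        f1 = F.relu(self.conv1(x))                  # shallow features
        f2 = F.relu(self.conv2(f1))                 # deep features
        # c: upsample the convolved feature map back to the shallow resolution
        f2_up = F.interpolate(f2, size=f1.shape[2:], mode="bilinear",
                              align_corners=False)
        fused = torch.cat([f1, f2_up], dim=1)       # d: tensor stitching (channel concat)
        logits = self.head(fused)                   # e: detection head
        return F.softmax(logits, dim=1)             # e: probability function

# usage on one detection region (a 224x224 crop)
probs = AggregateNet()(torch.randn(1, 3, 224, 224))
```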
Preferably, in the step a, the 5 detection areas are respectively the center point, the upper left, the upper right, the lower left and the lower right areas of the aggregate image.
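A possible way to take these five detection areas from an image, sketched in Python (the crop size of 224 pixels is an assumed parameter):

```python
import numpy as np

def five_regions(img: np.ndarray, size: int = 224) -> list:
    """Center, upper-left, upper-right, lower-left and lower-right crops."""
    h, w = img.shape[:2]
    cy, cx = h // 2, w // 2
    half = size // 2
    return [
        img[cy - half:cy + half, cx - half:cx + half],  # center point
        img[:size, :size],                              # upper left
        img[:size, w - size:],                          # upper right
        img[h - size:, :size],                          # lower left
        img[h - size:, w - size:],                      # lower right
    ]
```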
Preferably, in step b, the image features include edge features, color features and texture features of the aggregate.
Preferably, in step c, the image is up-sampled so that it conforms to the size of the display area, using an interpolation method, that is, new elements are inserted between pixel points by a suitable interpolation algorithm on the basis of the original image pixels.
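Bilinear interpolation is one such interpolation algorithm; a minimal OpenCV sketch (the target size is an assumption):

```python
import cv2

img = cv2.imread("aggregate.jpg")
# insert new pixels between existing ones by bilinear interpolation
resized = cv2.resize(img, (1280, 720), interpolation=cv2.INTER_LINEAR)
```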
Preferably, in step C, the gradient image is thresholded to reduce the over-segmentation caused by gray-scale variation, using a processing method that modifies the gradient function so that the catchment basins respond only to the target to be detected.
S6, testing new images and judging from the test results whether the sample size is sufficient, keeping the sample sizes of the categories as uniform as possible; if the accuracy of the weights for a certain category is lower than 90%, increasing the corresponding sample size and re-extracting the features until the recognition rate of every aggregate category reaches 90% or more, which meets the requirement.
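The retraining criterion of step S6 amounts to a simple per-category check; a sketch (function and variable names are illustrative):

```python
def classes_below_target(per_class_accuracy: dict, target: float = 0.90) -> list:
    """Return the aggregate categories whose recognition rate is below 90%."""
    return [cls for cls, acc in per_class_accuracy.items() if acc < target]

# e.g. {'stone': 0.95, 'small stone': 0.87} -> ['small stone'] needs more samples
```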
Preferably, in step B, the watershed calculation is divided into two steps: a sorting process and a flooding process. The gray level of each pixel is first sorted from low to high; then, during flooding from low to high, the influence domain of each local minimum at height h is judged and marked using a first-in first-out (FIFO) structure. The watershed transformation yields the catchment-basin image of the input image, and the boundary points between catchment basins are the watershed lines. Obviously, the watershed lines correspond to the maxima of the input image. Therefore, to obtain the edge information of an image, a gradient image is generally used as the input image, i.e.
g(x,y) = grad(f(x,y)) = {[f(x,y) − f(x−1,y)]² + [f(x,y) − f(x,y−1)]²}^(1/2)
where f(x,y) represents the original image and grad represents the gradient operation.
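The discrete gradient defined above can be written in a few lines of NumPy (a sketch; border pixels are left at zero for brevity):

```python
import numpy as np

def gradient_magnitude(f: np.ndarray) -> np.ndarray:
    """g(x,y) = sqrt((f(x,y)-f(x-1,y))^2 + (f(x,y)-f(x,y-1))^2)."""
    g = np.zeros_like(f, dtype=float)
    dx = f[1:, :] - f[:-1, :]   # f(x,y) - f(x-1,y)
    dy = f[:, 1:] - f[:, :-1]   # f(x,y) - f(x,y-1)
    g[1:, 1:] = np.sqrt(dx[:, 1:] ** 2 + dy[1:, :] ** 2)
    return g
```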
Preferably, in step C, to reduce the over-segmentation generated by the watershed algorithm, the gradient function is modified; a simple method is to threshold the gradient image so as to eliminate the over-segmentation caused by small changes in gray scale, i.e.
g(x,y) = max(grad(f(x,y)), gθ)
where gθ represents the threshold value.
The program limits the gradient image with this threshold to eliminate over-segmentation caused by tiny changes in gray value and obtain a suitable number of regions; the gray levels of the region edge points are then sorted from low to high, the flooding process is carried out from low to high, and the gradient image itself is computed with the Sobel operator.
The operator comprises two groups of 3×3 matrices, one transverse and one longitudinal; convolving each with the image in the plane yields the approximate brightness differences in the transverse and longitudinal directions, respectively. The kernels are:
Gx = [[−1, 0, +1], [−2, 0, +2], [−1, 0, +1]] * A
Gy = [[−1, −2, −1], [0, 0, 0], [+1, +2, +1]] * A
The transverse and longitudinal gradient approximations of each pixel are combined to calculate the magnitude of the gradient:
G = (Gx² + Gy²)^(1/2)
where A represents the original image, * denotes plane convolution, and Gx and Gy represent the transverse and longitudinal edge-detected images, respectively.
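A NumPy/SciPy sketch of the Sobel gradient, with the threshold g(x,y) = max(grad(f(x,y)), gθ) from step C applied (the default gθ of 0 and the symmetric boundary handling are assumptions):

```python
import numpy as np
from scipy.signal import convolve2d

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # transverse kernel
KY = KX.T                                              # longitudinal kernel

def sobel_gradient(a: np.ndarray, g_theta: float = 0.0) -> np.ndarray:
    """G = sqrt(Gx^2 + Gy^2), floored at g_theta to curb over-segmentation."""
    gx = convolve2d(a, KX, mode="same", boundary="symm")
    gy = convolve2d(a, KY, mode="same", boundary="symm")
    return np.maximum(np.sqrt(gx ** 2 + gy ** 2), g_theta)
```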
After preprocessing, the edge information is obtained and the minimum enclosing rectangular frame is selected.
Preferably, in step S4, the algorithm performs the following tasks:
Picture classification: a picture is input into a multi-layer convolutional neural network, which outputs a feature vector that is fed to a softmax unit to predict the picture category.
Positioning and classification: the algorithm judges whether a detected target exists in the picture, marks the position of the target with a bounding box, and thereby identifies and locates the target.
Object detection: a picture may contain multiple objects, possibly of several different classes within a single picture, and all of them are to be detected.
The Softmax function, or normalized exponential function, is in effect the gradient-log-normalizer of a finite discrete probability distribution. In multinomial logistic regression and linear discriminant analysis, the input of the function is the result of K different linear functions, and the probability that sample vector x belongs to the j-th class is:
P(y = j | x) = exp(x^T w_j) / Σ_{k=1..K} exp(x^T w_k)
This can be seen as the composite of K linear functions with the Softmax function.
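A minimal NumPy sketch of the Softmax function (subtracting the maximum is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Normalized exponential; subtracting max(z) keeps exp() from overflowing."""
    e = np.exp(z - z.max())
    return e / e.sum()

# scores of the K linear functions w_j^T x -> class probabilities
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```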
Targets are represented with anchor boxes. The input picture is divided into S×S small grids, and each small grid generates n anchor boxes. The IOU is computed between the real (ground-truth) frame of the image and the anchor boxes generated by the small grid containing its center point, and the regressed frame is the Bounding Box. The closer the anchor boxes are to the real widths and heights, the better the model performs; the k-means algorithm is therefore used to cluster the representative widths and heights in the ground-truth boxes of all samples in the training set (dimension clustering), producing several groups of anchor boxes of different numbers. These groups are applied to the model in turn, and the optimal anchor boxes are finally chosen as a balance between model complexity and high recall.
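A sketch of the dimension clustering described above, running k-means over ground-truth widths and heights with 1 − IOU as the distance (a common choice; the patent does not specify the distance metric):

```python
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IOU between (w, h) pairs and anchors, as if all boxes shared one center."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = boxes.prod(1)[:, None] + anchors.prod(1)[None, :] - inter
    return inter / union

def kmeans_anchors(boxes: np.ndarray, n: int = 5, iters: int = 100) -> np.ndarray:
    """Cluster ground-truth widths/heights into n anchors (distance = 1 - IOU)."""
    rng = np.random.default_rng(0)
    anchors = boxes[rng.choice(len(boxes), n, replace=False)]
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(1)   # nearest anchor = highest IOU
        anchors = np.array([boxes[assign == k].mean(0) if (assign == k).any()
                            else anchors[k] for k in range(n)])
    return anchors
```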
Bounding Box formula:
b_w = a_w · e^(t_w),  b_h = a_h · e^(t_h)
where a_w and a_h are the width and height of an anchor box, t_w and t_h are the width and height directly predicted by the bounding box regression, and b_w and b_h are the actual predicted width and height after conversion, i.e. the width and height output in the final prediction.
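A one-function sketch of the width/height conversion (assuming the exponential form given above):

```python
import numpy as np

def decode_wh(t_w: float, t_h: float, a_w: float, a_h: float):
    """b_w = a_w * exp(t_w), b_h = a_h * exp(t_h)."""
    return a_w * np.exp(t_w), a_h * np.exp(t_h)

# an anchor of 32x64 pixels with predicted offsets (0.2, -0.1)
print(decode_wh(0.2, -0.1, 32.0, 64.0))  # ~(39.1, 57.9)
```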
The loss function for object detection is as follows:
Loss = −Σ_{i=1..N} [ŷ_i · log(y_i) + (1 − ŷ_i) · log(1 − y_i)]
In the above formula, N is the total number of categories, y_i is the probability of the current category obtained after the excitation function, and ŷ_i indicates whether the prior frame is responsible for the target (1 if responsible, 0 otherwise).
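Under this reading, the loss is a standard cross-entropy; a sketch (the clipping constant eps is an assumption to avoid log(0)):

```python
import numpy as np

def detection_loss(y: np.ndarray, y_hat: np.ndarray, eps: float = 1e-7) -> float:
    """Cross-entropy over N categories; y_hat is the 0/1 responsibility flag."""
    y = np.clip(y, eps, 1 - eps)                     # avoid log(0)
    return float(-np.sum(y_hat * np.log(y) + (1 - y_hat) * np.log(1 - y)))

# 5 aggregate classes, ground truth = class 1
print(detection_loss(np.array([0.1, 0.7, 0.1, 0.05, 0.05]),
                     np.array([0, 1, 0, 0, 0])))
```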
Compared with the prior art, the invention has the following beneficial effects:
(1) According to the aggregate detection and classification method based on AI intelligent machine vision technology, aggregate images are obtained in real time over a 5G network using a camera, the video is processed with computer image processing technology, and images of different aggregate states are detected and identified with an AI intelligent algorithm. This solves the complex operation of existing aggregate monitoring methods and realizes real-time processing.
(2) According to the aggregate detection and classification method based on AI intelligent machine vision technology, aggregate edge information, texture features and particle-size features are extracted from the acquired aggregate images, which are preprocessed with image processing technology so that the aggregate features become more obvious.
(3) According to the aggregate detection and classification method based on AI intelligent machine vision technology, each collected image is divided into 5 detection areas, so that one image is judged 5 times and a single recognition pass yields multiple judgments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without inventive faculty, are intended to be within the scope of the invention.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
Example 1
Referring to fig. 1-6, the detection objects of the invention are five kinds of aggregates, namely stones, small stones, machine-made sand, small machine-made sand and face sand, wherein the particle size of the stones is in the range of 3-4 cm, the particle size of the small stones is in the range of 2-3 cm, the particle size of the machine-made sand is in the range of 1-2 cm, the particle size of the face sand is in the range of 0.1-0.5 cm, and the collected aggregate images comprise different light states, the dry and wet states of the aggregates and the states with different shooting distances.
Fig. 1 is a schematic view of a scene of the invention, a high-definition camera is positioned obliquely above a vehicle, and after a transport vehicle is in place, a video is shot and a frame sequence picture is extracted. The aggregate detection model system comprises two parts, wherein one part is used for positioning a transport vehicle and acquiring a high-definition camera image, and the other part is used for processing image data and displaying results.
An aggregate detection and classification method based on an AI intelligent machine vision technology comprises the following steps:
S1, shooting aggregate transportation videos by using a high-definition camera, and transmitting the aggregate transportation videos in real time through a 5G network to obtain aggregate images;
S2, processing the aggregate image by an image edge segmentation processing method, and extracting edge characteristics;
S3, manually marking the aggregate category of the processed aggregate image;
S4, inputting the marked aggregate image into a machine vision algorithm, training, extracting features, and outputting the features as a weight file;
S5, deploying the trained weights in a detection system, and detecting and identifying the aggregate.
In step S2, the image edge segmentation processing method specifically includes the following steps:
A, importing the aggregate image, converting it to grayscale, and then computing the gradient map;
B, processing the image by using a watershed algorithm on the basis of the gradient map to obtain edge lines of the segmented image;
C, using an image opening operation to remove small objects in the picture and burr interference affecting the target;
D, carrying out adhesion segmentation to separate objects that are adhered together.
FIG. 4 is a flow chart of the image edge segmentation technique of the present invention. First, the captured image is imported. Second, the image is converted to grayscale. Third, an image opening operation removes small objects in the picture and the burr interference affecting the target. Finally, adhesion segmentation is carried out to separate objects that are adhered together.
FIG. 5 is a schematic illustration of the adhesion segmentation of the present invention, which segments the edges of the aggregate.
In practical application, considering aggregate accumulation, edge information is mainly distinguished by light-dark color boundaries, and the similarity between adjacent pixels serves as an important reference in the segmentation process, so that pixel points that are close both in spatial position and in gray value (obtained by computing the gradient) are connected to form a closed contour.
The operation steps are: the color image is converted to grayscale, the gradient map is computed, and finally the watershed algorithm is applied on the basis of the gradient map to obtain the edge lines of the segmented image.
In real images, the watershed algorithm often over-segments due to noise points or other interfering factors, which introduce many insignificant local extrema. To solve the over-segmentation problem, a marker-image-based watershed algorithm can be used, i.e. one guided by a priori knowledge, in order to obtain a better image segmentation effect.
In step B, the watershed calculation is divided into two steps: a sorting process and a flooding process. The gray level of each pixel is first sorted from low to high; then, during flooding from low to high, the influence domain of each local minimum at height h is judged and marked using a first-in first-out (FIFO) structure. The watershed transformation yields the catchment-basin image of the input image, and the boundary points between catchment basins are the watershed lines. Obviously, the watershed lines correspond to the maxima of the input image. Therefore, to obtain the edge information of an image, a gradient image is generally used as the input image, i.e.
g(x,y) = grad(f(x,y)) = {[f(x,y) − f(x−1,y)]² + [f(x,y) − f(x,y−1)]²}^(1/2)
where f(x,y) represents the original image and grad represents the gradient operation.
The watershed algorithm has good response to weak edges, and noise in an image and fine gray level change of the surface of an object can generate the phenomenon of over-segmentation. But at the same time it should be seen that the watershed algorithm has a good response to weak edges, which is guaranteed by closed continuous edges. In addition, the closed catchment basin obtained by the watershed algorithm provides possibility for analyzing the regional characteristics of the image.
In step C, to reduce the over-segmentation generated by the watershed algorithm, the gradient function is modified; a simple method is to threshold the gradient image so as to eliminate the over-segmentation caused by small changes in gray scale, i.e.
g(x,y) = max(grad(f(x,y)), gθ)
where gθ represents the threshold value.
The program limits the gradient image with this threshold to eliminate over-segmentation caused by tiny changes in gray value and obtain a suitable number of regions; the gray levels of the region edge points are then sorted from low to high, the flooding process is carried out from low to high, and the gradient image itself is computed with the Sobel operator.
The operator comprises two groups of 3×3 matrices, one transverse and one longitudinal; convolving each with the image in the plane yields the approximate brightness differences in the transverse and longitudinal directions, respectively. The kernels are:
Gx = [[−1, 0, +1], [−2, 0, +2], [−1, 0, +1]] * A
Gy = [[−1, −2, −1], [0, 0, 0], [+1, +2, +1]] * A
The transverse and longitudinal gradient approximations of each pixel are combined to calculate the magnitude of the gradient:
G = (Gx² + Gy²)^(1/2)
where A represents the original image, * denotes plane convolution, and Gx and Gy represent the transverse and longitudinal edge-detected images, respectively.
After preprocessing, the edge information is obtained and the minimum enclosing rectangular frame is selected.
In step S4, the machine vision algorithm training process specifically includes the following steps:
a, selecting 5 detection areas for each marked aggregate image;
b, respectively carrying out image convolution on each detection area to extract image features;
c, carrying out up-sampling treatment on the image, and restoring the convolved feature map into the image;
d, integrating the image features by using a tensor stitching algorithm;
e, outputting the features to a detection head, which classifies the image, the classification result being judged through a probability function;
f, identifying the result and outputting the detection result of the image.
The final purpose of this patent is real-time detection and recognition of aggregate. The detection background is complex: recognition must work under different illumination, at different distances and with different aggregate moisture contents, so the features are diverse and simple image processing cannot satisfy the judgment of sample characteristics. A machine vision algorithm is therefore used to input and train samples in these different states.
The acquired images are processed by the above steps, manually marked with the aggregate categories, input into the machine vision algorithm, and the features are extracted.
The machine vision algorithm used in this patent has 6 working steps. First, 5 detection areas are selected, i.e. the image is divided into 5 working areas; an odd number of areas allows a final result to be decided by majority, the category recognized with probability above 50% across the 5 areas being taken as the final result. Second, image convolution: each area is convolved to extract image features, mainly the edge, color and texture features of the aggregate. Third, the image is up-sampled so that it conforms to the size of the display area, using interpolation, i.e. new elements are inserted between pixel points with a suitable interpolation algorithm on the basis of the original image pixels. Fourth, tensor stitching is used to integrate the image features, since the three channels (RGB) of the image are processed separately and must be recombined to generate a new image. Fifth, the result is output to the detection head, which classifies the image, the classification result being judged through a probability function. Sixth, the result is identified and the detection result of the image, i.e. the category the algorithm considers most probable, is output.
Fig. 2 is the main flow chart of the aggregate detection method and system based on AI intelligent vision processing technology; the flow comprises four parts: aggregate region image acquisition, image preprocessing, image feature extraction, and aggregate detection judgment. The system first judges whether a vehicle is in place; if so, it acquires real-time image information, performs region selection and image preprocessing, then detects the images with the machine vision algorithm, performs recognition and records the results. If more than 50% of the 5 selected areas agree, the final result is output; the system then judges whether the vehicle has started to pour and, if so, stops judging.
Fig. 3 shows the technical route of the patent: training sample images are preprocessed, aggregate edges are segmented, features are extracted and trained, an efficient intelligent aggregate detection model is built, and test sample images are then processed and compared to obtain the actual aggregate identification result.
FIG. 6 is a flow chart of the machine vision algorithm used in the present invention. First, 5 areas are extracted from each image, namely the image center point and the upper-left, upper-right, lower-left and lower-right areas, so that a single recognition pass yields multiple judgments. Each area is processed: image features such as light, angle and texture are extracted by image convolution, the image is up-sampled to restore the convolved feature maps, the tensor stitching algorithm is then used to expand the image dimensions, and finally the detection head predicts the result. The final recognition result is the category on which the 5 regions agree with the highest probability, i.e. more than 50% of the 5 regions give the same prediction. Detection is judged complete after all 5 areas have been identified.
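The majority decision over the 5 regions can be sketched as follows (the label strings are illustrative):

```python
from collections import Counter

def fuse_region_predictions(preds: list) -> str:
    """Final result if more than 50% of the 5 regions agree, else None."""
    label, votes = Counter(preds).most_common(1)[0]
    return label if votes > len(preds) / 2 else None

print(fuse_region_predictions(
    ["stone", "stone", "small stone", "stone", "face sand"]))  # -> stone
```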
In the step a, the 5 detection areas are respectively the center point, the upper left, the upper right, the lower left and the lower right areas of the aggregate image.
In step b, the image features include edge features, color features, and texture features of the aggregate.
In step c, the image is up-sampled to make the image conform to the size of the display area, and an interpolation method is adopted, namely, a proper interpolation algorithm is adopted to insert new elements between pixel points on the basis of original image pixels.
In step C, the gradient image is thresholded by a processing method that modifies the gradient function so that the basin responds only to the target to be detected, to reduce over-segmentation due to gray scale variation.
Example 2
S6, testing new images and judging from the test results whether the sample size is sufficient, keeping the sample sizes of the categories as uniform as possible; if the accuracy of the weights for a certain category is lower than 90%, increasing the corresponding sample size and re-extracting the features until the recognition rate of every aggregate category reaches 90% or more, which meets the requirement.
In step S4, the algorithm performs the following tasks:
Picture classification: a picture is input into a multi-layer convolutional neural network, which outputs a feature vector that is fed to a softmax unit to predict the picture category.
Positioning and classification: the algorithm judges whether a detected target exists in the picture, marks the position of the target with a bounding box, and thereby identifies and locates the target.
Object detection: a picture may contain multiple objects, possibly of several different classes within a single picture, and all of them are to be detected.
The Softmax function, or normalized exponential function, is in effect the gradient-log-normalizer of a finite discrete probability distribution. In multinomial logistic regression and linear discriminant analysis, the input of the function is the result of K different linear functions, and the probability that sample vector x belongs to the j-th class is:
P(y = j | x) = exp(x^T w_j) / Σ_{k=1..K} exp(x^T w_k)
This can be seen as the composite of K linear functions with the Softmax function.
Targets are represented with anchor boxes. The input picture is divided into S×S small grids, and each small grid generates n anchor boxes. The IOU is computed between the real (ground-truth) frame of the image and the anchor boxes generated by the small grid containing its center point, and the regressed frame is the Bounding Box. The closer the anchor boxes are to the real widths and heights, the better the model performs; the k-means algorithm is therefore used to cluster the representative widths and heights in the ground-truth boxes of all samples in the training set (dimension clustering), producing several groups of anchor boxes of different numbers. These groups are applied to the model in turn, and the optimal anchor boxes are finally chosen as a balance between model complexity and high recall.
Bounding Box formula:
b_w = a_w · e^(t_w),  b_h = a_h · e^(t_h)
where a_w and a_h are the width and height of an anchor box, t_w and t_h are the width and height directly predicted by the bounding box regression, and b_w and b_h are the actual predicted width and height after conversion, i.e. the width and height output in the final prediction.
The loss function for object detection is as follows:
Loss = −Σ_{i=1..N} [ŷ_i · log(y_i) + (1 − ŷ_i) · log(1 − y_i)]
In the above formula, N is the total number of categories, y_i is the probability of the current category obtained after the excitation function, and ŷ_i indicates whether the prior frame is responsible for the target (1 if responsible, 0 otherwise).
According to the invention, after the aggregate image is obtained, edge features are extracted by an image processing method, which enhances the aggregate characteristics. The picture is first converted to grayscale, and the image edges are then extracted with the watershed algorithm. After all the collected images are preprocessed, they are manually marked, input into the machine vision algorithm, the features are extracted, and the result is output as a weight file. The working process comprises: shooting aggregate transportation videos with a high-definition camera and transmitting them in real time over a 5G network; extracting the key areas in the image frame sequence of the monitored region; performing preliminary processing of the images, i.e. preprocessing with the image processing method and edge detection; manually marking the acquired and processed images and inputting them into the machine vision algorithm for training; and finally deploying the trained weights in the detection system to detect and identify the aggregate. Aggregate images are obtained in real time over the 5G network with the camera, the video is processed with computer image processing technology, and images of different aggregate states are detected and identified with the AI intelligent algorithm, which solves the complex operation of existing aggregate monitoring methods and realizes real-time processing.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made by those skilled in the art, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the scope of the present invention.