
TWI810946B - Method for identifying image, computer device and storage medium - Google Patents


Info

Publication number: TWI810946B
Application number: TW111119324A
Authority: TW (Taiwan)
Prior art keywords: initial, image, target, category, preset
Other languages: Chinese (zh)
Other versions: TW202347245A (en)
Inventors: 李潔, 郭錦斌
Original Assignee: 鴻海精密工業股份有限公司 (Hon Hai Precision Industry Co., Ltd.)
Application filed by 鴻海精密工業股份有限公司; priority to TW111119324A
Application granted; publication of TWI810946B and TW202347245A

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to image analysis technology and provides a method for identifying an image, a computer device, and a storage medium. The method includes: obtaining an image to be recognized, an initial annotated image, and an initial annotation result of the initial annotated image; comparing the initial annotation result with a preset annotation result; in response to the initial annotation result differing from the preset annotation result, obtaining a target image and an annotation result of the target image by inputting the initial annotated image into a pre-constructed first semantic segmentation network; and obtaining a second semantic segmentation network by training the first semantic segmentation network based on the target image and the annotation result of the target image. Finally, an image annotation result of the image to be recognized is obtained by inputting the image to be recognized into the second semantic segmentation network. The present application can improve the accuracy of image recognition.

Description

Image recognition method, computer device, and storage medium

The present application relates to the field of image processing, and in particular to an image recognition method, a computer device, and a storage medium.

In current image recognition schemes, training images are annotated along many annotation dimensions while only a small number of training images is available for each dimension, so the trained model generalizes poorly and image recognition accuracy is low.

In view of the above, it is necessary to provide an image recognition method, a computer device, and a storage medium that can improve the accuracy of image recognition.

An image recognition method includes: obtaining an image to be recognized, and obtaining initial annotated images and an initial annotation result of each initial annotated image; constructing a first semantic segmentation network; comparing the initial annotation result with a preset annotation result to obtain a comparison result; if the comparison result is that the initial annotation result differs from the preset annotation result, inputting the initial annotated image into the first semantic segmentation network to obtain a target image corresponding to the initial annotated image and a target annotation result of the target image; training the first semantic segmentation network based on a plurality of the target images and the target annotation result of each target image to obtain a second semantic segmentation network; and inputting the image to be recognized into the second semantic segmentation network to obtain an image annotation result of the image to be recognized.
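The overall flow can be sketched in Python as follows. This is a minimal, hypothetical outline of the relabel-then-retrain pipeline; the function and variable names are illustrative and are not taken from the patent itself.

```python
def recognition_pipeline(image_to_recognize, initial_images, initial_results,
                         preset_result, first_network, relabel_fn, train_fn):
    """Hypothetical sketch: relabel inconsistent data, retrain, then predict."""
    target_images, target_results = [], []
    for img, result in zip(initial_images, initial_results):
        if result != preset_result:                                # comparison step
            tgt_img, tgt_result = relabel_fn(first_network, img)   # re-annotation step
            target_images.append(tgt_img)
            target_results.append(tgt_result)
    second_network = train_fn(first_network, target_images, target_results)
    return second_network(image_to_recognize)                      # image annotation result
```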

According to an optional embodiment of the present application, before the initial annotation result is compared with the preset annotation result, the method further includes: obtaining a plurality of preset color labels, a plurality of preset values, and a plurality of preset categories; establishing a correspondence between each preset value and each color label to obtain a plurality of target labels; and establishing a correspondence between each preset category and each target label to obtain the preset annotation result.
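One way such a preset annotation result could be represented is sketched below with plain Python dictionaries. The concrete categories, values, and colors are only examples and are not the patent's official table.

```python
# Hypothetical preset annotation result: category -> (preset value, color label).
color_of_value = {1: "red", 2: "green", 3: "blue", 4: "gray"}   # value -> color (target labels)
preset_categories = ["lane", "lane line", "pedestrian", "car"]

preset_annotation_result = {
    category: (value, color_of_value[value])
    for value, category in enumerate(preset_categories, start=1)
}
# e.g. {"lane": (1, "red"), "lane line": (2, "green"), ...}
```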

According to an optional embodiment of the present application, the initial annotation result includes a plurality of initial categories and an initial label corresponding to each initial category, and comparing the initial annotation result with the preset annotation result to obtain the comparison result includes: obtaining each initial object in the initial annotated image and determining the initial category corresponding to each initial object; if any initial category has no corresponding preset category, determining that the comparison result is that the initial annotation result differs from the preset annotation result; or, if every initial category has a corresponding preset category but the initial label corresponding to an initial category differs from the target label of the corresponding preset category, determining that the comparison result is that the initial annotation result differs from the preset annotation result.
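A minimal sketch of this comparison, assuming both results are stored as dictionaries like the hypothetical one above:

```python
def differs_from_preset(initial_result, preset_result):
    """Return True when the initial annotation result does not match the preset one.

    initial_result: dict mapping initial category -> (value, color) initial label.
    preset_result:  dict mapping preset category  -> (value, color) target label.
    """
    for category, label in initial_result.items():
        if category not in preset_result:        # no corresponding preset category
            return True
        if label != preset_result[category]:     # labels disagree for the same category
            return True
    return False
```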

According to an optional embodiment of the present application, the first semantic segmentation network includes an autoencoder and a classifier, and inputting the initial annotated image into the first semantic segmentation network to obtain the target image corresponding to the initial annotated image and the target annotation result of the target image includes: processing the initial annotated image with the autoencoder to generate a target feature map; classifying each pixel in the target feature map with the classifier to obtain the annotation category corresponding to each pixel; and annotating the target feature map based on the annotation category corresponding to each pixel and the target label corresponding to that annotation category, thereby generating the target image and the target annotation result.

According to an optional embodiment of the present application, the autoencoder includes a plurality of cascade structures and a decoder, and processing the initial annotated image with the autoencoder to generate the target feature map includes: performing feature extraction on the initial annotated image with the plurality of hidden layers of any cascade structure to obtain the initial feature map output by the last hidden layer; performing a pooling operation on the initial feature map with the pooling layer of that cascade structure and outputting a first feature map; feeding the first feature map output by the current cascade structure into the next cascade structure until the first feature map output by the last cascade structure is obtained as a second feature map; obtaining, for each first feature map, the first pixel position in the corresponding initial feature map of the pixel value of each pixel, and obtaining the second pixel position in the corresponding initial feature map of the pixel value of each pixel in the second feature map; and decoding the second feature map based on the decoder, the first pixel positions, and the second pixel positions to obtain the target feature map.
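Recording where each pooled value came from and reusing those positions during decoding is essentially max pooling with index tracking. A minimal PyTorch sketch of that mechanism is shown below; it illustrates the general technique rather than the patent's exact network, and the tensor sizes are placeholders.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)                 # an "initial feature map" from a hidden layer

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

pooled, positions = pool(x)                    # "first feature map" plus recorded pixel positions
# ... further cascade structures would pool again here ...

restored = unpool(pooled, positions, output_size=x.size())
# `restored` has the original spatial size: the pooled values are written back to the
# positions they came from, and all other entries are zero.
```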

According to an optional embodiment of the present application, classifying each pixel in the target feature map with the classifier to obtain the annotation category corresponding to each pixel includes: computing a single score value for each pixel in the target feature map based on the pixel value of that pixel; computing, from the single score value and the plurality of preset categories, the category probability that the pixel belongs to each preset category; and determining the preset category corresponding to the largest category probability as the annotation category of the pixel.

According to an optional embodiment of the present application, the category probability is computed as

$S_i = \dfrac{e^{z_j}}{\sum_{j} e^{z_j}}$, i = 1, 2, ..., k,

where S_i denotes the category probability that a pixel belongs to the i-th preset category, e^{z_j} denotes the single score value of the j-th pixel in the target feature map, z_j denotes the pixel value of the j-th pixel in the target feature map, Σ_j e^{z_j} denotes the total score value of all pixels in the target feature map, i denotes the i-th preset category, and k denotes the number of preset categories.

According to an optional embodiment of the present application, annotating the target feature map based on the annotation category corresponding to each pixel and the target label corresponding to that annotation category to generate the target image and the target annotation result includes: determining the region formed by all pixels of the same annotation category in the target feature map as a feature region; adjusting the pixel values of all pixels in the feature region to the preset value corresponding to that annotation category; coloring each pixel in the feature region according to the color label corresponding to the preset value of the feature region to obtain a destination region; stitching the plurality of destination regions according to their region positions in the target feature map to obtain the target image; and determining the preset value, color label, and annotation category corresponding to each destination region in the target image as the target annotation result.
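One possible realization of this recoloring step is sketched with NumPy below, under the assumption that the per-pixel annotation categories are given as an integer map; the value-to-color table is again only an example.

```python
import numpy as np

def build_target_image(category_map, value_of_category, color_of_value):
    """Paint each feature region with its preset value's color to assemble the target image.

    category_map:      (H, W) integer array of per-pixel annotation category indices.
    value_of_category: dict category index -> preset value.
    color_of_value:    dict preset value -> (R, G, B) color label.
    """
    h, w = category_map.shape
    value_map = np.zeros((h, w), dtype=np.int32)         # pixel values set to the preset value
    target_image = np.zeros((h, w, 3), dtype=np.uint8)   # colored destination regions
    for category, value in value_of_category.items():
        region = (category_map == category)              # feature region of this category
        value_map[region] = value
        target_image[region] = color_of_value[value]
    return target_image, value_map
```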

The present application provides a computer device. The computer device includes a storage storing at least one instruction and a processor that executes the at least one instruction to implement the image recognition method.

The present application provides a computer-readable storage medium. The computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in a computer device to implement the image recognition method.

As can be seen from the above technical solutions, because the initial annotated images in the present application come from different data sets, the initial annotation results for the same target object differ between any two initial annotated images from different data sets (that is, the same target object is annotated differently in any two initial annotated images). The initial annotated images are re-annotated by the first semantic segmentation network, and because the first semantic segmentation network is trained on images annotated according to the preset annotation result, the target images it generates carry the same target annotation result for the same target object (that is, the annotation of the same target object is unified across the target images). The first semantic segmentation network is then trained based on the plurality of target images and the target annotation result of each target image to obtain the second semantic segmentation network. Because the training data of the first semantic segmentation network is enlarged and the annotation of the training images is unified, the resulting second semantic segmentation network generalizes better, which improves the accuracy of the image annotation result.

1: Computer device

12: Storage

13: Processor

S10~S15: Steps

FIG. 1 is a flowchart of an image recognition method according to a preferred embodiment of the present application.

FIG. 2 is a schematic diagram of an initial annotated image in the image recognition method of the present application.

FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the image recognition method of the present application.

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.

The image recognition method can be applied to one or more computer devices 1. The computer device 1 is a device capable of automatically performing parameter value calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device 1 can be any electronic product capable of human-computer interaction with a user, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol television (IPTV), or a smart wearable device.

The computer device 1 may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.

The network where the computer device 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and so on.

FIG. 1 is a flowchart of a preferred embodiment of an image recognition method of the present application. Depending on requirements, the order of the steps in the flowchart can be adjusted according to the actual detection requirements, and some steps can be omitted. The method is executed by a computer device, such as the computer device 1 shown in FIG. 3.

Step S10: obtain an image to be recognized, and obtain initial annotated images and the initial annotation result of each initial annotated image.

In at least one embodiment of the present application, the initial annotated images are images whose initial annotation results are already known. An initial annotated image contains a plurality of initial objects, and the same initial object in different data sets corresponds to different initial categories, initial colors, and initial numbers. For example, a first initial annotated image is obtained from data set A and a second initial annotated image is obtained from data set B, and both contain the initial object "vehicle". In the first initial annotated image, the initial categories of the vehicle are "car" and "bus", where the initial color of the car is blue and its initial number is 3, and the initial color of the bus is purple and its initial number is 5. In the second initial annotated image, the initial categories of the vehicle are "motorcycle" and "truck", where the initial color of the motorcycle is blue and its initial number is 1, and the initial color of the truck is black and its initial number is 8.

In at least one embodiment of the present application, the initial annotation result refers to the correspondence between the initial category of each initial object in the initial annotated image and each initial label, and each initial label refers to the correspondence between the initial color of each initial object in the initial annotated image and the initial number of that initial object.

The initial categories of the initial objects may be: car, tree, utility pole, pedestrian, double yellow line, white line, and so on. The initial colors may be gray, red, orange, yellow, yellow-green, brown, light blue, and so on. For example, the initial objects in an initial annotated image are: person, vehicle, tree, road facility, road, and lane marking, and the initial categories corresponding to these initial objects are: pedestrian, car, tree, utility pole, road, double yellow line, and white line. The initial annotation result of this initial annotated image is shown in Table 1.

[Table 1 is reproduced as an image in the original publication and is not available in this text.]

In at least one embodiment of the present application, the computer device obtains the initial annotated images and their initial annotation results from databases such as KITTI, Mapillary, CityScapes, and Daimler Urban.

FIG. 2 shows an initial annotated image obtained by the present application from the above databases. FIG. 2 contains initial categories such as building, bus, tree, car, pedestrian, lane, sidewalk, and utility pole, but only the initial colors and initial numbers of some of these initial categories are listed, and the initial color and initial number of each of those initial categories are different.

To illustrate the initial categories, initial colors, and initial numbers corresponding to different initial objects (for example, person, vehicle, road), several dashed boxes are used in FIG. 2; the text in the dashed boxes only exemplifies the initial objects in the figure, and in practice the initial annotated image does not contain the dashed boxes. In at least one embodiment of the present application, the image to be recognized contains a plurality of objects to be recognized, and it is an image in which information such as the category and color of each object to be recognized has not yet been annotated. In at least one embodiment of the present application, the computer device obtains the image captured by a driving recorder or an in-vehicle camera as the image to be recognized.

Step S11: construct a first semantic segmentation network.

In at least one embodiment of the present application, the first semantic segmentation network is the network used to re-annotate the initial annotated images.

In at least one embodiment of the present application, the first semantic segmentation network includes an autoencoder and a classifier. The autoencoder includes a plurality of cascade structures and a decoder; each cascade structure includes a plurality of hidden layers and a pooling layer, the decoder includes a plurality of serial structures, and each serial structure includes an unpooling layer and a plurality of operation layers. Constructing the first semantic segmentation network includes: the computer device builds cascade structures of hidden layers and pooling layers and uses the plurality of cascade structures as an encoder; the computer device builds serial structures of an unpooling layer and a plurality of operation layers, uses the plurality of serial structures as a decoder, and builds a classifier; further, the computer device generates a learner based on the encoder, the decoder, and the classifier; further, the computer device obtains training images, trains the learner based on the training images, and computes the loss value of the learner until the loss value is smaller than a preset value, thereby obtaining the first semantic segmentation network. Each hidden layer includes a plurality of convolution layers, a batch normalization layer, and an activation function layer; the pooling layer is a max pooling layer; each operation layer includes a plurality of deconvolution layers, the batch normalization layer, and the activation function layer; and the activation function layer may be a ReLU linear rectification function.
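As one way to picture this construction, the sketch below builds a small encoder-decoder learner of this general shape in PyTorch and trains it until the loss falls below a preset value. It is a simplified, hypothetical stand-in for the network described here (only one cascade structure and one serial structure are shown, and the training data are placeholders), not the patent's actual model.

```python
import torch
import torch.nn as nn

class TinyLearner(nn.Module):
    """One cascade structure (conv + BN + ReLU + max pool), one serial structure
    (unpool + deconv + BN + ReLU), and a per-pixel classifier over k preset categories."""
    def __init__(self, k):
        super().__init__()
        self.hidden = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                    nn.BatchNorm2d(16), nn.ReLU())
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.operation = nn.Sequential(nn.ConvTranspose2d(16, 16, 3, padding=1),
                                       nn.BatchNorm2d(16), nn.ReLU())
        self.classifier = nn.Conv2d(16, k, 1)            # per-pixel category scores

    def forward(self, x):
        feat = self.hidden(x)
        pooled, positions = self.pool(feat)               # record the pixel positions
        restored = self.unpool(pooled, positions, output_size=feat.size())
        return self.classifier(self.operation(restored))

model = TinyLearner(k=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
preset_loss = 0.1

images = torch.randn(4, 3, 64, 64)                        # placeholder training images
labels = torch.randint(0, 5, (4, 64, 64))                 # placeholder per-pixel categories

for step in range(1000):                                  # train until the loss drops below the preset value
    loss = criterion(model(images), labels)               # (capped at 1000 steps for this toy example)
    if loss.item() < preset_loss:
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```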

The encoder is the network that performs feature extraction on the initial annotated image, and the decoder is the network that restores the features extracted by the encoder. The encoder and the decoder have a symmetric structure, and the decoding process is the inverse of the encoding process. In other embodiments of the present application, the first semantic segmentation network may also be a network such as U-Net, DeepLab v1, DeepLab v2, or Mask R-CNN, which is not limited in the present application.

Step S12: compare the initial annotation result with the preset annotation result to obtain a comparison result.

In at least one embodiment of the present application, the preset annotation result refers to the correspondence between each preset category, each preset color label, and each preset value.

In at least one embodiment of the present application, the comparison result is either that the initial annotation result is the same as the preset annotation result, or that the initial annotation result differs from the preset annotation result.

In at least one embodiment of the present application, before the initial annotation result is compared with the preset annotation result, the method further includes: the computer device obtains a plurality of preset color labels, a plurality of preset values, and a plurality of preset categories; further, the computer device establishes a correspondence between each preset value and each color label to obtain a plurality of target labels; further, the computer device establishes a correspondence between each preset category and each target label to obtain the preset annotation result. The color labels include red, green, blue, gray, and so on, and the preset values may be 1, 2, 3, 4, and so on.

The preset categories may be: lane, lane line, intersection, zebra crossing, curb, tree, sidewalk, car, bicycle, motorcycle, pedestrian, stroller, large vehicle, truck, traffic light, traffic sign, road sign, building, street light, utility pole, static obstacle, dynamic obstacle, and so on. It can be understood that the preset categories cover, as far as possible, the category of every target object that appears on the road. For example, the target objects are: person, vehicle, tree, road facility, road, lane marking, and obstacle, and the corresponding categories are: pedestrian, car, truck, van, tree, utility pole, road, lane line, dynamic obstacle, static obstacle, and so on.

For example, the preset annotation result may be as shown in Table 2.

[Table 2 is reproduced as an image in the original publication and is not available in this text.]

In at least one embodiment of the present application, the initial annotation result includes a plurality of initial categories and an initial label corresponding to each initial category, and comparing the initial annotation result with the preset annotation result to obtain the comparison result includes: the computer device obtains each initial object in the initial annotated image and determines the initial category corresponding to each initial object; if any initial category has no corresponding preset category, the computer device determines that the comparison result is that the initial annotation result differs from the preset annotation result; or, if every initial category has a corresponding preset category but the initial label of an initial category differs from the target label of the corresponding preset category, the computer device determines that the comparison result is that the initial annotation result differs from the preset annotation result. In this embodiment, an initial category having no corresponding preset category covers the following cases: the initial object of that initial category is the same as a target object covered by the preset categories, or the initial object of that initial category differs from the target objects covered by the preset categories.

Further, when the initial object of an initial category is the same as a target object covered by the preset categories, several situations may arise: the naming of the initial category and the preset category is inconsistent, so that no preset category corresponds to the initial category; the preset category is too fine-grained, so that no preset category corresponds to the initial category; or the preset category is too coarse, so that no preset category corresponds to the initial category. The initial label of an initial category differing from the target label of the corresponding preset category means that the correspondence between the initial color label and the initial number is inconsistent with the correspondence between the preset value and the color label.

In this embodiment, if every initial category has a corresponding preset category and the initial label of each initial category is the same as the target label of the corresponding preset category, the computer device determines that the comparison result is that the initial annotation result is the same as the preset annotation result. With this implementation, it can be determined whether the initial annotation result of any initial annotated image is the same as the preset annotation result; when it is, the initial annotated image does not need to be re-annotated, which improves annotation efficiency.

Following the above example, comparing Table 1 with Table 2 shows that the categories roadway (馬路), double yellow line, and white line in the initial annotation result of Table 1 have no corresponding preset category in Table 2. After comparison, the preset category corresponding to the roadway is "road" (道路), and the preset category corresponding to the double yellow line and the white line is "lane line". Therefore, the initial annotation result of Table 1 differs from the preset annotation result of Table 2.

Step S13: if the comparison result is that the initial annotation result differs from the preset annotation result, input the initial annotated image into the first semantic segmentation network to obtain the target image corresponding to the initial annotated image and the target annotation result of the target image.

In at least one embodiment of the present application, the target image is the image generated after the initial annotated image is re-annotated by the first semantic segmentation network.

In at least one embodiment of the present application, inputting the initial annotated image into the first semantic segmentation network to obtain the target image corresponding to the initial annotated image and the target annotation result of the target image includes: the computer device processes the initial annotated image with the autoencoder to generate a target feature map; further, the computer device classifies each pixel in the target feature map with the classifier to obtain the annotation category corresponding to each pixel; further, the computer device annotates the target feature map based on the annotation category of each pixel and the target label corresponding to that annotation category to generate the target image and the target annotation result.

With this implementation, the initial annotated image is compressed and decompressed by the autoencoder to obtain the target feature map. Because image noise of the initial annotated image is filtered out during compression, the generated target feature map is clearer. By re-annotating, according to the preset annotation result, the initial annotated images whose initial annotation results differ from the preset annotation result, a plurality of target images is obtained, which ensures that the annotation of the same object is unified across the target images.

Specifically, processing the initial annotated image with the autoencoder to generate the target feature map includes: the computer device performs feature extraction on the initial annotated image with the plurality of hidden layers of any cascade structure to obtain the initial feature map output by the last hidden layer; further, the computer device performs a pooling operation on the initial feature map with the pooling layer of that cascade structure and outputs a first feature map; further, the computer device feeds the first feature map output by the current cascade structure into the next cascade structure until the first feature map output by the last cascade structure is obtained as the second feature map; the computer device obtains, for each first feature map, the first pixel position in the corresponding initial feature map of the pixel value of each pixel, and obtains the second pixel position in the corresponding initial feature map of the pixel value of each pixel in the second feature map; further, the computer device decodes the second feature map based on the decoder, the first pixel positions, and the second pixel positions to obtain the target feature map.

Specifically, decoding the second feature map based on the decoder, the first pixel positions, and the second pixel positions to obtain the target feature map includes: the computer device constructs, according to the size of the initial feature map output by each cascade structure, an all-zero feature map of the corresponding size, and fills each pixel value of the second feature map into the corresponding all-zero feature map according to the second pixel positions, thereby obtaining the third feature map output by the unpooling layer of the first serial structure; further, the computer device performs a deconvolution operation on the third feature map with the plurality of operation layers of the first serial structure to obtain the fourth feature map output by the first serial structure; further, the computer device feeds the fourth feature map into the next serial structure and generates the fifth feature map of the next serial structure based on the fourth feature map, the corresponding first pixel positions, and the plurality of operation layers of the next serial structure; and the computer device takes the fifth feature map output by the last serial structure as the target feature map.
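The step of filling an all-zero feature map according to the recorded pixel positions can be pictured as a scatter operation. Below is a small NumPy sketch of that idea for a single-channel map; the flat-index convention is an assumption made purely for illustration.

```python
import numpy as np

def unpool_by_positions(pooled_values, positions, full_shape):
    """Scatter pooled values back into an all-zero feature map of the original size.

    pooled_values: 1-D array of pixel values kept by the pooling layer.
    positions:     1-D array of flat indices recording where each value came from.
    full_shape:    (H, W) shape of the initial feature map output by the cascade structure.
    """
    zero_map = np.zeros(full_shape, dtype=pooled_values.dtype)   # all-zero feature map
    zero_map.flat[positions] = pooled_values                     # fill values at the recorded positions
    return zero_map

# Example: two values pooled from a 4x4 map are written back to where they were taken from.
restored = unpool_by_positions(np.array([0.9, 0.7]), np.array([5, 10]), (4, 4))
```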

In this embodiment, the second feature map is decoded based on the first pixel positions and the second pixel positions to obtain the target feature map. Because more positional information of the pixels is retained, the features contained in the target feature map are more complete.

Specifically, classifying each pixel in the target feature map with the classifier to obtain the annotation category corresponding to each pixel includes: the computer device computes a single score value for each pixel in the target feature map based on the pixel value of that pixel; further, the computer device computes, from the single score value and the plurality of preset categories, the category probability that the pixel belongs to each preset category; further, the computer device determines the preset category corresponding to the largest category probability as the annotation category of the pixel.

In this embodiment, determining the preset category with the largest category probability as the annotation category of the pixel improves the accuracy of classifying each pixel. Specifically, the category probability is computed as

$S_i = \dfrac{e^{z_j}}{\sum_{j} e^{z_j}}$, i = 1, 2, ..., k,

where S_i denotes the category probability that a pixel belongs to the i-th preset category, e^{z_j} denotes the single score value of the j-th pixel in the target feature map, z_j denotes the pixel value of the j-th pixel in the target feature map, Σ_j e^{z_j} denotes the total score value of all pixels in the target feature map, i denotes the i-th preset category, and k denotes the number of preset categories.

Specifically, the computer device takes the pixel value of each pixel as the exponent of the exponential function exp (the exponential function with base e) to obtain the single score value of that pixel, that is, e raised to the pixel value.

Specifically, annotating the target feature map based on the annotation category of each pixel and the target label corresponding to that annotation category to generate the target image and the target annotation result includes: the computer device determines the region formed by all pixels of the same annotation category in the target feature map as a feature region; further, the computer device adjusts the pixel values of all pixels in the feature region to the preset value corresponding to that annotation category; further, the computer device colors each pixel in the feature region according to the color label corresponding to the preset value of the feature region to obtain a destination region; further, the computer device stitches the plurality of destination regions according to their region positions in the target feature map to obtain the target image; further, the computer device determines the preset value, color label, and annotation category corresponding to each destination region in the target image as the target annotation result.

In this embodiment, the pixel value of each pixel in the feature region can be quickly adjusted according to the preset annotation result, which makes each region in the target image more distinct.

Following the above example, the target annotation result of the initial annotated image is shown in Table 3.

[Table 3 is reproduced as an image in the original publication and is not available in this text.]

Step S14: train the first semantic segmentation network based on a plurality of the target images and the target annotation result of each target image to obtain a second semantic segmentation network.

In at least one embodiment of the present application, the second semantic segmentation network is the network generated by training the first semantic segmentation network with the plurality of target images and the target annotation result of each target image. In at least one embodiment of the present application, the generation of the second semantic segmentation network is basically the same as the generation of the first semantic segmentation network and is not described again here.

With this implementation, the first semantic segmentation network is trained with the plurality of target images and their target annotation results to obtain the second semantic segmentation network. Increasing the number of training images gives the second semantic segmentation network higher recognition accuracy, and because the annotation of the same object is unified across the target images, the second semantic segmentation network can automatically annotate a plurality of images to be recognized in a unified manner.

Step S15: input the image to be recognized into the second semantic segmentation network to obtain the image annotation result of the image to be recognized.

In at least one embodiment of the present application, the image annotation result includes the category of each object to be recognized, the number of each object to be recognized, and the color label of each object to be recognized. In at least one embodiment of the present application, the generation of the image annotation result is basically the same as the generation of the target annotation result, which is not limited in the present application.
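As an illustration of this inference step, the short PyTorch sketch below runs an image through a segmentation model and turns the per-pixel scores into category indices. The 1x1 convolution is only a runnable stand-in for the trained second semantic segmentation network, and the shapes are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 5, kernel_size=1)                   # stand-in for the trained second network
model.eval()
image = torch.randn(1, 3, 64, 64)                        # placeholder image to be recognized
with torch.no_grad():
    scores = model(image)                                # (1, k, H, W) per-pixel category scores
category_map = scores.argmax(dim=1).squeeze(0)           # (H, W) annotation category of each pixel
# The image annotation result then pairs each pixel's category with its preset value and
# color label, for example via a mapping like the preset_annotation_result sketched earlier.
```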

In this embodiment, because the image to be recognized is captured by a driving recorder or an in-vehicle camera, each object to be recognized in the image can be accurately identified by the second semantic segmentation network to obtain the image annotation result. During driving, the driver can obtain accurate road condition information from a plurality of such image annotation results, which improves driving safety.

As can be seen from the above technical solutions, because the initial annotated images in the present application come from different data sets, the initial annotation results for the same target object differ between any two initial annotated images from different data sets (that is, the same target object is annotated differently in any two initial annotated images). The initial annotated images are re-annotated by the first semantic segmentation network, and because the first semantic segmentation network is trained on images annotated according to the preset annotation result, the target images it generates carry the same target annotation result for the same target object (that is, the annotation of the same target object is unified across the target images). The first semantic segmentation network is then trained based on the plurality of target images and the target annotation result of each target image to obtain the second semantic segmentation network. Because the training data of the first semantic segmentation network is enlarged and the annotation of the training images is unified, the resulting second semantic segmentation network generalizes better, which improves the accuracy of the image annotation result.

FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the image recognition method of the present application.

In one embodiment of the present application, the computer device 1 includes, but is not limited to, a storage 12, a processor 13, and a computer program stored in the storage 12 and executable on the processor 13, such as an image recognition program.

Those skilled in the art will understand that the schematic diagram is only an example of the computer device 1 and does not limit the computer device 1; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and so on. The processor 13 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. The processor 13 is the computing core and control center of the computer device 1; it connects the parts of the entire computer device 1 through various interfaces and lines and runs the operating system of the computer device 1 and the installed applications, program code, and so on.

The processor 13 runs the operating system of the computer device 1 and the installed applications. The processor 13 executes the applications to implement the steps in the above embodiments of the image recognition method, such as the steps shown in FIG. 1.

Exemplarily, the computer program may be divided into one or more modules/units that are stored in the storage 12 and executed by the processor 13 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments describe the execution of the computer program in the computer device 1.

The storage 12 may be used to store the computer programs and/or modules. The processor 13 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules stored in the storage 12 and by calling the data stored in the storage 12. The storage 12 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the computer device. In addition, the storage 12 may include a non-volatile storage, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The storage 12 may be an external storage and/or an internal storage of the computer device 1. Further, the storage 12 may be a storage in physical form, such as a memory stick or a TF card (Trans-flash Card).

If the modules/units integrated in the computer device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the above method embodiments.

The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer storage, or a read-only memory (ROM).

結合圖1,所述電腦設備1中的所述儲存器12儲存多個指令以實現一種圖像識別方法,所述處理器13可獲取所述多個指令從而實現:獲取待識別圖像,並獲取初始標註圖像及所述初始標註圖像的初始標註結果;構建第一語義分割網路;將所述初始標註結果與預設標註結果進行比較,得到比較結果;若所述比較結果為所述初始標註結果與所述預設標註結果不相同,則將所述初始標註圖像輸入到所述第一語義分割網路中,得到與所述初始標註圖像對應的目標圖像及所述目標圖像的目標標註結果;基於多張所述目標圖像及每張目標圖像的目標標註結果對所述第一語義分割網路進行訓練,得到第二語義分割網路;將所述待識別圖像輸入到所述第二語義分割網路中,得到所述待識別圖像的圖像標註結果。具體地,所述處理器13對上述指令的具體實現方法可參考圖2對應實施例中相關步驟的描述,在此不贅述。 Referring to FIG. 1, the memory 12 in the computer device 1 stores a plurality of instructions to implement an image recognition method, and the processor 13 can acquire the plurality of instructions to realize: acquire an image to be identified, and Obtaining an initial labeling image and an initial labeling result of the initial labeling image; constructing a first semantic segmentation network; comparing the initial labeling result with a preset labeling result to obtain a comparison result; if the comparison result is the If the initial labeling result is different from the preset labeling result, the initial labeling image is input into the first semantic segmentation network to obtain the target image corresponding to the initial labeling image and the The target labeling result of the target image; the first semantic segmentation network is trained based on the target labeling results of multiple target images and each target image to obtain the second semantic segmentation network; The recognized image is input into the second semantic segmentation network to obtain an image annotation result of the image to be recognized. Specifically, for the specific implementation method of the above instruction by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 2 , which will not be repeated here.

在本申請所提供的幾個實施例中,應所述理解到,所揭露的系統,裝置和方法,可以透過其它的方式實現。例如,以上所描述的裝置實施例僅僅是示意性的,例如,所述模組的劃分,僅僅為一種邏輯功能劃分,實際實現時可以有另外的劃分方式。 In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

所述作為分離部件說明的模組可以是或者也可以不是物理上分開的，作為模組顯示的部件可以是或者也可以不是物理單元，即可以位於一個地方，或者也可以分佈到多個網路單元上。可以根據實際的需要選擇其中的部分或者全部模組來實現本實施例方案的目的。 The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申請各個實施例中的各功能模組可以集成在一個處理單元中，也可以是各個單元單獨物理存在，也可以兩個或兩個以上單元集成在一個單元中。上述集成的單元既可以採用硬體的形式實現，也可以採用硬體加軟體功能模組的形式實現。 In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

因此，無論從哪一點來看，均應將實施例看作是示範性的，而且是非限制性的，本申請的範圍由所附請求項而不是上述說明限定，因此旨在將落在請求項的等同要件的含義和範圍內的所有變化涵括在本申請內。不應將請求項中的任何附關聯圖標記視為限制所涉及的請求項。 Therefore, the embodiments should be regarded in all respects as exemplary and non-restrictive, and the scope of the application is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and range of equivalents of the claims are embraced in this application. Any reference sign in a claim shall not be construed as limiting the claim to which it relates.

此外,顯然“包括”一詞不排除其他單元或步驟,單數不排除複數。本申請中陳述的多個單元或裝置也可以由一個單元或裝置透過軟體或者硬體來實現。第一、第二等詞語用來表示名稱,而並不表示任何特定的順序。 In addition, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices stated in this application may also be realized by one unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not imply any particular order.

最後應說明的是，以上實施例僅用以說明本申請的技術方案而非限制，儘管參照較佳實施例對本申請進行了詳細說明，本領域的普通技術人員應當理解，可以對本申請的技術方案進行修改或等同替換，而不脫離本申請技術方案的精神和範圍。 Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and are not limiting. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

S10~S15:步驟 S10~S15: Steps

Claims (9)

1. 一種圖像識別方法，應用於電腦設備，其中，所述圖像識別方法包括：獲取待識別圖像，並獲取初始標註圖像及所述初始標註圖像的初始標註結果；構建第一語義分割網路；獲取預設的多個顏色標籤、多個預設數值及多個預設類別，建立每個預設數值與每個顏色標籤的對應關係，得到多個目標標籤，建立每個預設類別與每個目標標籤的對應關係，得到所述預設標註結果；將所述初始標註結果與預設標註結果進行比較，得到比較結果；若所述比較結果為所述初始標註結果與所述預設標註結果不相同，則將所述初始標註圖像輸入到所述第一語義分割網路中，得到與所述初始標註圖像對應的目標圖像及所述目標圖像的目標標註結果；基於多張所述目標圖像及每張目標圖像的目標標註結果對所述第一語義分割網路進行訓練，得到第二語義分割網路；將所述待識別圖像輸入到所述第二語義分割網路中，得到所述待識別圖像的圖像標註結果。 An image recognition method applied to a computer device, wherein the image recognition method comprises: obtaining an image to be recognized, and obtaining an initial labeled image and an initial labeling result of the initial labeled image; constructing a first semantic segmentation network; obtaining a plurality of preset color labels, a plurality of preset values and a plurality of preset categories, establishing a correspondence between each preset value and each color label to obtain a plurality of target labels, and establishing a correspondence between each preset category and each target label to obtain the preset labeling result; comparing the initial labeling result with the preset labeling result to obtain a comparison result; if the comparison result is that the initial labeling result is different from the preset labeling result, inputting the initial labeled image into the first semantic segmentation network to obtain a target image corresponding to the initial labeled image and a target labeling result of the target image; training the first semantic segmentation network based on a plurality of the target images and the target labeling result of each target image to obtain a second semantic segmentation network; and inputting the image to be recognized into the second semantic segmentation network to obtain an image labeling result of the image to be recognized.

2. 如請求項1所述的圖像識別方法，其中，所述初始標註結果包含多個初始類別及每個初始類別對應的初始標籤，所述將所述初始標註結果與預設標註結果進行比較，得到比較結果包括：獲取所述初始標註圖像中的每個初始物件，並確定每個初始物件對應的初始類別；若任一初始類別不存在對應的預設類別，確定所述比較結果為所述初始標註結果與所述預設標註結果不相同；或者若每個初始類別都存在對應的預設類別而每個初始類別對應的初始標籤與所述對應的預設類別的目標標籤不同，則確定所述比較結果為所述初始標註結果與所述預設標註結果不相同。 The image recognition method according to claim 1, wherein the initial labeling result comprises a plurality of initial categories and an initial label corresponding to each initial category, and comparing the initial labeling result with the preset labeling result to obtain the comparison result comprises: obtaining each initial object in the initial labeled image, and determining the initial category corresponding to each initial object; if any initial category has no corresponding preset category, determining that the comparison result is that the initial labeling result is different from the preset labeling result; or, if each initial category has a corresponding preset category but the initial label corresponding to each initial category is different from the target label of the corresponding preset category, determining that the comparison result is that the initial labeling result is different from the preset labeling result.
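As a minimal illustration of the correspondence-building and comparison steps recited in claims 1 and 2, the following Python sketch builds a preset labeling result from preset color labels, preset values, and preset categories, and then compares an initial labeling result against it. The concrete category names, color tuples, and helper names are hypothetical examples, not values taken from the patent.

```python
# Hypothetical sketch of the preset-labeling construction and comparison in claims 1-2.
color_labels = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]      # preset color labels
preset_values = [1, 2, 3]                                    # preset values
preset_categories = ["vehicle", "pedestrian", "road"]        # preset categories (example names)

# Each preset value is paired with a color label to form a target label, and each
# preset category is mapped to one target label: together, the preset labeling result.
target_labels = dict(zip(preset_values, color_labels))
preset_labeling = {cat: (val, target_labels[val])
                   for cat, val in zip(preset_categories, preset_values)}

def compare_labeling(initial_labeling, preset_labeling):
    """Return True when the initial labeling result matches the preset labeling result."""
    for category, label in initial_labeling.items():
        expected = preset_labeling.get(category)
        if expected is None:      # an initial category has no corresponding preset category
            return False
        if label != expected:     # the initial label differs from the target label
            return False
    return True

initial_labeling = {"vehicle": (1, (255, 0, 0)), "tree": (4, (128, 128, 0))}
same = compare_labeling(initial_labeling, preset_labeling)   # False -> re-label with the network
```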
3. 如請求項1所述的圖像識別方法，其中，所述第一語義分割網路包括自編碼器及分類器，所述將所述初始標註圖像輸入到所述第一語義分割網路中，得到與所述初始標註圖像對應的目標圖像及所述目標圖像的目標標註結果包括：利用所述自編碼器處理所述初始標註圖像，生成目標特徵圖；基於所述分類器對所述目標特徵圖中的每個像素點進行分類，得到每個像素點對應的標註類別；基於每個像素點所對應的標註類別及所述標註類別對應的目標標籤對所述目標特徵圖進行標註，生成所述目標圖像及所述目標標註結果。 The image recognition method according to claim 1, wherein the first semantic segmentation network comprises an autoencoder and a classifier, and inputting the initial labeled image into the first semantic segmentation network to obtain the target image corresponding to the initial labeled image and the target labeling result of the target image comprises: processing the initial labeled image with the autoencoder to generate a target feature map; classifying each pixel in the target feature map based on the classifier to obtain a label category corresponding to each pixel; and labeling the target feature map based on the label category corresponding to each pixel and the target label corresponding to the label category, to generate the target image and the target labeling result.

4. 如請求項3所述的圖像識別方法，其中，所述自編碼器包括多個級聯結構及解碼器，所述利用所述自編碼器處理所述初始標註圖像，生成目標特徵圖包括：基於任一級聯結構中的多個隱藏層對所述初始標註圖像進行特徵提取，得到最後一個隱藏層輸出的初始特徵圖；基於所述任一級聯結構的池化層對所述初始特徵圖進行池化操作，輸出第一特徵圖；將當前級聯結構輸出的第一特徵圖輸入至下一個級聯結構中，直至獲取最後一個級聯結構輸出的第一特徵圖作為第二特徵圖；獲取多個所述第一特徵圖中每個像素點的像素值在對應的初始特徵圖中的第一像素位置，並獲取所述第二特徵圖中每個像素點的像素值在對應的初始特徵圖中的第二像素位置；基於所述解碼器、所述第一像素位置及所述第二像素位置對所述第二特徵圖進行解碼操作，得到所述目標特徵圖。 The image recognition method according to claim 3, wherein the autoencoder comprises a plurality of cascade structures and a decoder, and processing the initial labeled image with the autoencoder to generate the target feature map comprises: performing feature extraction on the initial labeled image based on a plurality of hidden layers in any cascade structure to obtain an initial feature map output by the last hidden layer; performing a pooling operation on the initial feature map based on a pooling layer of the cascade structure to output a first feature map; inputting the first feature map output by the current cascade structure into the next cascade structure, until the first feature map output by the last cascade structure is obtained as a second feature map; obtaining a first pixel position, in the corresponding initial feature map, of the pixel value of each pixel in the plurality of first feature maps, and obtaining a second pixel position, in the corresponding initial feature map, of the pixel value of each pixel in the second feature map; and performing a decoding operation on the second feature map based on the decoder, the first pixel positions and the second pixel positions to obtain the target feature map.

5. 如請求項3或4所述的圖像識別方法，其中，所述基於所述分類器對所述目標特徵圖中的每個像素點進行分類，得到每個像素點對應的標註類別包括：基於所述目標特徵圖中每個像素點的像素值計算所述目標特徵圖中每個像素點的單個評分值；基於所述單個評分值及所述多個預設類別，計算所述單個評分值對應的像素點屬於每個預設類別的類別概率；將取值最大的類別概率所對應的預設類別確定為所述像素點對應的標註類別。 The image recognition method according to claim 3 or 4, wherein classifying each pixel in the target feature map based on the classifier to obtain the label category corresponding to each pixel comprises: calculating a single score value of each pixel in the target feature map based on the pixel value of each pixel in the target feature map; calculating, based on the single score value and the plurality of preset categories, a category probability that the pixel corresponding to the single score value belongs to each preset category; and determining the preset category corresponding to the largest category probability as the label category corresponding to the pixel.

6. 如請求項5所述的圖像識別方法，其中，所述類別概率的計算公式為：S_i = e^(z_i) / Σ_{j=1}^{k} e^(z_j)，i=1,2,...,k；其中，S_i 表示每個像素點屬於第i個預設類別的類別概率，e^(z_j) 表示所述目標特徵圖中的第j個像素點的單個評分值，z_j 表示所述目標特徵圖中的第j個像素點的像素值，Σ_{j=1}^{k} e^(z_j) 表示所述目標特徵圖中所有像素點的總評分值，i表示所述第i個預設類別，k表示所述多個預設類別的數量。 The image recognition method according to claim 5, wherein the category probability is calculated as: S_i = e^(z_i) / Σ_{j=1}^{k} e^(z_j), i = 1, 2, ..., k; where S_i represents the category probability that each pixel belongs to the i-th preset category, e^(z_j) represents the single score value of the j-th pixel in the target feature map, z_j represents the pixel value of the j-th pixel in the target feature map, Σ_{j=1}^{k} e^(z_j) represents the total score value of all pixels in the target feature map, i represents the i-th preset category, and k represents the number of the plurality of preset categories.
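To make claims 3 to 6 more concrete, the following PyTorch sketch shows one possible arrangement, not the patent's disclosed implementation: an encoder built from cascaded hidden-layer/pooling stages that records the positions of the pooled pixel values, a decoder that uses those positions to restore resolution (here via max-unpooling, which is an assumption about how the claimed pixel positions are used), and a per-pixel softmax classifier corresponding to the category-probability formula of claim 6. Layer sizes, channel counts, and the number of categories are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeStage(nn.Module):
    """One cascade structure: hidden (conv) layers followed by a pooling layer
    that also returns the positions (indices) of the pooled pixel values."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

    def forward(self, x):
        initial_feature = self.hidden(x)                # initial feature map of this stage
        pooled, positions = self.pool(initial_feature)  # first feature map + pixel positions
        return pooled, positions

class ToySegmentationNet(nn.Module):
    """Two cascade stages, a decoder that unpools with the recorded positions,
    and a 1x1 classifier producing one score per preset category for every pixel."""
    def __init__(self, num_categories=3):
        super().__init__()
        self.stage1 = CascadeStage(3, 16)
        self.stage2 = CascadeStage(16, 32)
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.decode2 = nn.Conv2d(32, 16, 3, padding=1)
        self.decode1 = nn.Conv2d(16, 16, 3, padding=1)
        self.classifier = nn.Conv2d(16, num_categories, 1)

    def forward(self, x):
        f1, pos1 = self.stage1(x)       # first cascade structure
        f2, pos2 = self.stage2(f1)      # last cascade structure -> second feature map
        d = F.relu(self.decode2(self.unpool(f2, pos2)))  # decode using the second positions
        d = F.relu(self.decode1(self.unpool(d, pos1)))   # decode using the first positions
        scores = self.classifier(d)     # single score value per category for each pixel
        # Per-pixel category probability over the k preset categories,
        # analogous to S_i = e^(z_i) / sum_j e^(z_j) in claim 6.
        return F.softmax(scores, dim=1)

net = ToySegmentationNet(num_categories=3)
image = torch.randn(1, 3, 64, 64)       # a dummy labeled image
probs = net(image)                      # (1, 3, 64, 64) category probabilities
label_map = probs.argmax(dim=1)         # label category with the largest probability
```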
7. 如請求項4所述的圖像識別方法，其中，所述基於每個像素點所對應的標註類別及所述標註類別對應的目標標籤對所述目標特徵圖進行標註，生成所述目標圖像及所述目標標註結果包括：將所述目標特徵圖中同一標註類別所對應的所有像素點構成的區域確定為特徵區域；將所述特徵區域中所有像素點的像素值調整為所述同一標註類別所對應的預設數值；根據所述特徵區域的預設數值所對應的顏色標籤對所述特徵區域中的每個像素點進行著色處理，得到目的地區域；根據多個所述目的地區域在所述目標特徵圖的區域位置拼接所述多個目的地區域，得到所述目標圖像；將所述目標圖像中每個目的地區域所對應的預設數值、顏色標籤及標註類別確定為所述目標標註結果。 The image recognition method according to claim 4, wherein labeling the target feature map based on the label category corresponding to each pixel and the target label corresponding to the label category, to generate the target image and the target labeling result, comprises: determining an area formed by all pixels corresponding to the same label category in the target feature map as a feature area; adjusting the pixel values of all pixels in the feature area to the preset value corresponding to the same label category; coloring each pixel in the feature area according to the color label corresponding to the preset value of the feature area to obtain a destination area; splicing a plurality of the destination areas according to the area positions of the destination areas in the target feature map to obtain the target image; and determining the preset value, color label and label category corresponding to each destination area in the target image as the target labeling result.

8. 一種電腦設備，其中，所述電腦設備包括：儲存器，儲存至少一個指令；及處理器，獲取所述儲存器中儲存的指令以實現如請求項1至7中任意一項所述的圖像識別方法。 A computer device, wherein the computer device comprises: a storage storing at least one instruction; and a processor executing the instruction stored in the storage to implement the image recognition method according to any one of claims 1 to 7.

9. 一種電腦可讀儲存介質，其中：所述電腦可讀儲存介質中儲存有至少一個指令，所述至少一個指令被電腦設備中的處理器執行以實現如請求項1至7中任意一項所述的圖像識別方法。 A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in a computer device to implement the image recognition method according to any one of claims 1 to 7.
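As an illustration of the region-coloring and splicing step of claim 7, the following NumPy sketch turns a per-pixel label-category map into a colored target image and a target labeling result. The category-to-value and value-to-color mappings and the function name are assumed for the example; they are not data from the patent.

```python
import numpy as np

# Hypothetical mappings: label category -> preset value, preset value -> color label (RGB).
category_to_value = {"vehicle": 1, "pedestrian": 2, "road": 3}
value_to_color = {1: (255, 0, 0), 2: (0, 255, 0), 3: (0, 0, 255)}

def render_target_image(label_map, categories):
    """label_map: (H, W) array of category indices into `categories`."""
    h, w = label_map.shape
    value_map = np.zeros((h, w), dtype=np.uint8)        # pixel values set to preset values
    target_image = np.zeros((h, w, 3), dtype=np.uint8)  # colored destination areas, spliced in place
    target_labeling = []
    for idx, category in enumerate(categories):
        feature_area = (label_map == idx)               # all pixels of the same label category
        value = category_to_value[category]
        color = value_to_color[value]
        value_map[feature_area] = value                 # adjust pixel values to the preset value
        target_image[feature_area] = color              # color the feature area -> destination area
        if feature_area.any():
            target_labeling.append((value, color, category))
    return target_image, target_labeling

categories = ["vehicle", "pedestrian", "road"]
label_map = np.random.randint(0, 3, size=(64, 64))
target_image, target_labeling = render_target_image(label_map, categories)
```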
TW111119324A 2022-05-24 2022-05-24 Method for identifying image, computer device and storage medium TWI810946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111119324A TWI810946B (en) 2022-05-24 2022-05-24 Method for identifying image, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW111119324A TWI810946B (en) 2022-05-24 2022-05-24 Method for identifying image, computer device and storage medium

Publications (2)

Publication Number Publication Date
TWI810946B true TWI810946B (en) 2023-08-01
TW202347245A TW202347245A (en) 2023-12-01

Family

ID=88585574

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111119324A TWI810946B (en) 2022-05-24 2022-05-24 Method for identifying image, computer device and storage medium

Country Status (1)

Country Link
TW (1) TWI810946B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210374478A1 (en) * 2018-05-15 2021-12-02 Shenzhen University Methods for Image Segmentation, Computer Devices, and Storage Mediums
TW202042175A (en) * 2019-05-09 2020-11-16 大陸商深圳市商湯科技有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN112000725A (en) * 2020-08-28 2020-11-27 哈尔滨工业大学 Ontology fusion pretreatment method for multi-source heterogeneous resources

Also Published As

Publication number Publication date
TW202347245A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
WO2019223586A1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
Shojaiee et al. EFASPP U-Net for semantic segmentation of night traffic scenes using fusion of visible and thermal images
CN110197152A (en) A kind of road target recognition methods for automated driving system
CN110909699A (en) Video vehicle non-guide driving detection method and device and readable storage medium
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN111814593A (en) Traffic scene analysis method and device, and storage medium
CN116071747A (en) A Semantic Segmentation Method Based on Fusion Matching of 3D Point Cloud Data and 2D Image Data
CN115187886A (en) Vehicle violation detection method and device and electronic equipment
US20220207879A1 (en) Method for evaluating environment of a pedestrian passageway and electronic device using the same
CN114973202A (en) Traffic scene obstacle detection method based on semantic segmentation
CN113269088A (en) Scene description information determining method and device based on scene feature extraction
TWI764489B (en) Environmental assessment method, environmental assessment device for pedestrian path, and electronic device
TWI810946B (en) Method for identifying image, computer device and storage medium
US12260619B2 (en) Image recognition method, electronic device and readable storage medium
CN109635719B (en) Image recognition method, device and computer readable storage medium
CN114596548A (en) Target detection method, apparatus, computer equipment, and computer-readable storage medium
CN116883963B (en) Pedestrian crosswalk detection method based on improvement YOLOv5
Cuong et al. Automatic Traffic Red-Light Violation Detection Using AI.
CN119540956A (en) Label processing method, device, server and storage medium
CN113989774B (en) Traffic light detection method, device, vehicle and readable storage medium
TWI807904B (en) Method for training depth identification model, method for identifying depth of images and related devices
CN115457490A (en) Vehicle target understanding method and system based on single model
CN116524210A (en) Automatic driving data screening method, system, electronic device and storage medium
CN117496456A (en) Non-motor vehicle illegal occupation identification tracking method and device based on Deep-Sort
CN115909233A (en) A method of using a motor vehicle detection system based on deep learning