Character recognition method based on convolutional neural network
Technical Field
The invention belongs to the field of pattern recognition, and particularly relates to a character recognition method based on a convolutional neural network, which can be used for recognizing scanned documents of industrial drawings.
Background
Optical Character Recognition (OCR) technology refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text. OCR is an important research direction in the field of pattern recognition, is widely applied in automatic information processing, and has important practical significance in engineering fields such as construction and machinery.
In actual engineering, workers need to enter information from industrial drawings into a computer in order to record and manage engineering information. However, the number of engineering drawings is generally huge and their contents complicated; manual entry alone requires a large amount of labor and time, the work is tedious and repetitive, and mis-entry and mis-identification are likely to occur. If the contents of engineering drawings can be scanned and identified automatically, a large amount of labor and time can be saved, and the management of engineering drawing documents becomes simple and efficient.
Traditional image-text recognition methods fall mainly into two types: statistical decision recognition and character recognition based on machine learning. Statistical decision recognition is the most traditional image-text recognition approach; a typical statistical decision method is template matching, which selects the best-matching template for an input image by computing the minimum distance between the template images and the input image. Character recognition based on machine learning mainly comprises image preprocessing, feature extraction, and recognition, where features may be extracted manually or by a convolutional neural network. As an implementation model of deep learning, the convolutional neural network has the characteristics of local receptive fields and weight sharing, which greatly reduce computational complexity, and it has developed rapidly in image-text recognition. However, for poor-quality and complex drawings, traditional recognition algorithms based on convolutional neural networks cannot achieve high accuracy.
Disclosure of Invention
The invention aims to provide a character recognition method based on a convolutional neural network, which improves the traditional character-recognition convolutional neural network, simplifies the network structure, adopts suitable activation functions and pooling methods, and adds error-correction post-processing operations such as replacement and list comparison, greatly improving recognition speed and accuracy. In addition, the method corrects tilted characters using the least-squares method, realizing recognition of characters in multiple directions, adapting to the complexity of industrial drawings, and meeting accuracy requirements. Compared with traditional recognition methods, the convolutional neural network method adopted by the invention is therefore more accurate and more effective.
The technical scheme of the invention comprises the following steps:
step 1, binarization is carried out on a scanned document of an engineering drawing, and the pictures and text in the scanned document are separated from the background;
step 2, aiming at the inclined scanned document, finding the inclination angle of the binary image, and performing rotation correction on the image by adopting bilinear interpolation;
step 3, extracting a table frame line in the corrected image in the step 2 by adopting mathematical morphology processing, and dividing the image into a drawing area, a material table area and a title bar area;
step 4, searching all connected areas of the corrected image in the step 2, and then screening and combining the connected areas according to the shape and the adjacent position characteristics of the marked connected areas in the image to position a single character;
step 5, combining the single characters into a character string, calculating a fitting straight line of the central point of a circumscribed rectangle of each character in the character string, judging the trend of the character string according to the slope of the fitting straight line, and rotating the oblique character string by a corresponding angle to the horizontal direction;
step 6, searching for circles in the corrected image from step 2, and extracting the weld bead numbers in the engineering drawing;
step 7, making an engineering drawing character data set, constructing a convolutional neural network under the MatConvNet framework, and identifying the characters in the drawing by using the model obtained by network training;
step 8, setting different coding rules for the drawing area, the material table area and the title bar area, and automatically correcting simple character recognition errors in each area;
and 9, outputting and storing all the drawing data obtained by identification.
Further, step 1 adopts optimal thresholding based on the Otsu method to binarize the scanned engineering drawing document; the specific implementation is as follows,
The L different gray levels in the scanned image are denoted {0, 1, 2, ..., L−1}. The normalized histogram of the input image is computed and denoted pᵢ, i = 0, 1, ..., L−1. Suppose a threshold k is selected that divides the image into two groups, C₀ = {0, 1, ..., k} and C₁ = {k+1, k+2, ..., L−1}. The probabilities of a pixel falling into group C₀ and group C₁ are respectively:
ω₀ = Σ_{i=0}^{k} pᵢ,  ω₁ = Σ_{i=k+1}^{L−1} pᵢ = 1 − ω₀
The average gray levels of C₀ and C₁ are respectively:
μ₀ = Σ_{i=0}^{k} i·pᵢ / ω₀,  μ₁ = Σ_{i=k+1}^{L−1} i·pᵢ / ω₁
The average gray level of the whole image is:
μ = μ₀ω₀ + μ₁ω₁
The between-group variance of C₀ and C₁ is:
σ² = ω₀(μ₀ − μ)² + ω₁(μ₁ − μ)² = ω₀ω₁(μ₁ − μ₀)²
The optimal threshold k* is sought among the L different gray levels: the k that maximizes σ² is taken as k*, and the scanned document image is converted into a binary image using this optimal threshold.
Furthermore, in step 2 the tilt angle of the binary image is found using the Hough transform, implemented as follows,
in image space, all lines passing through the point (x, y) are represented by polar equations:
ρ=x cosθ+y sinθ,0≤θ≤π
where ρ is the distance from the origin to the line, θ is the angle between the line and the x-axis;
an accumulator H(ρ, θ) is established over the parameter space (ρ, θ) and its elements are initialized to 0; for each point (x, y) in the image, all corresponding (ρ, θ) values in the parameter space are found and the accumulator H(ρ, θ) is incremented; local peak detection is then performed on H(ρ, θ) to obtain the (ρ, θ) parameters of all straight lines in the image. To reduce the interference of redundant information, only straight lines near the upper boundary of the image are considered; the parameter θ of the longest straight line l in this region is the inclination angle of the scanned document.
Further, the specific implementation of step 3 includes the following sub-steps,
step 3.1, a vertical linear structuring element [1 1 1]ᵀ and a horizontal linear structuring element [1 1 1] are used to erode and then dilate the corrected image from step 2; the result is skeletonized, and all horizontal and vertical line segments are thinned to single lines one pixel wide, yielding a vertical frame-line image I₁ and a horizontal frame-line image I₂;
step 3.2, an AND operation is performed on the vertical frame-line image I₁ and the horizontal frame-line image I₂ to obtain the intersection points of the table frame lines;
step 3.3, for each intersection point, detect other intersection points near the same horizontal position and near the same vertical position, and match them in order of distance to find each correct table frame; if an intersection point A, an intersection point B near the same horizontal position, and an intersection point C near the same vertical position form a rectangle, and table frame lines exist along the four sides of the rectangle (that is, the four sides are detected to coincide with table frame lines), then the whole table contains a table frame with intersection point A as its upper-left corner, the distance AB as its length, and the distance AC as its width; processing all intersection points of the table frame lines in this way extracts each correct table frame;
and 3.4, of the extracted table frames, the one with the largest area is taken as the drawing area, the one with the second-largest area as the material table area, and the region containing several table frames at the bottom of the whole image forms the title bar area.
Further, the specific implementation of step 4 includes the following sub-steps,
step 4.1, search all connected regions of the corrected image from step 2 using a run-based connected-component labeling algorithm; the search steps are as follows:
1) scan the corrected image from step 2 line by line; the consecutive white pixels in each line form a sequence called a run, whose start point, end point, and line number are recorded;
2) traverse the runs in each line starting from the second line: if a run does not overlap any run in the previous line, give it a new label; if it overlaps exactly one run in the previous line, give it the label of that run; if it overlaps two or more runs in the previous line, give it the smallest label among the overlapping runs, and write the labels of those runs into equivalence pairs to indicate that they belong to one class;
3) convert the equivalence pairs into equivalence sequences; since the labels in a sequence are equivalent, each equivalence sequence is given a single label, numbered from 1;
4) go through the labels of all runs one by one, look up the corresponding equivalence sequence, and assign it the new label;
5) after the traversal, runs with the same label form one connected region, and the search yields all connected regions;
and 4.2, according to the character characteristics of the engineering drawing, measure the connected regions against a set aspect ratio, maximum length and width, and neighbor connection distance; finally, screen and merge the connected regions to obtain single-character localization results.
Further, the specific implementation of step 5 includes the following sub-steps,
step 5.1, from step 4, obtain the start-point coordinates, width, height, and center-point coordinates of the bounding rectangle of each single character; sort the bounding rectangles of all characters from left to right and top to bottom by their center-point coordinates and process each character region in this order, assigning label 1 to the bounding rectangle of the first character to indicate that it belongs to the first character string; for every character after the first, if the distances between its center point and the center points of all previously processed character regions exceed the set maximum distance for a single character string, give the region a new label, indicating that it belongs to a new character string; if the center-point distance to one (or more) previously processed character regions is within the set maximum distance, the characters belong to the same character string, and the current region is given the label of a region that meets the requirement; after sorting all labels, regions with the same label belong to the same character string, yielding the character-string grouping;
and 5.2, fit a straight line through the center points of the characters' bounding rectangles in each string using the least-squares method, judge the direction of the string from the slope of the fitted line, and rotate oblique strings to the horizontal by the corresponding angle.
Further, in step 6, a circle Hough transform algorithm is adopted to search for circles in the corrected image,
the equation of a circle with (a, b) as the center and r as the radius is
(x − a)² + (y − b)² = r²
Rewriting the equation into parametric form
a=x-r cosθ,b=y-r sinθ,θ∈[0,2π)
where a and b are the horizontal and vertical coordinates of the circle center, r is the radius of the circle, and θ is the angle between the x-axis and the line connecting a point on the circle to the center. Each point (x, y) on a circle maps to a quadric surface in the (a, b, r) parameter space. An accumulator H(a, b, r) is established in the parameter space, its elements are initialized to 0, and the radius range of the circles to be detected is set to r ∈ [r₁, r₂]. All points in the binary image are traversed, all corresponding values in the parameter space are found, and the accumulator H(a, b, r) is incremented; the larger an accumulator value, the more likely it is that a circle with center (a, b) and radius r exists in the binary image. Local peak detection on the accumulator yields the parameters of all possible circles in the image, thereby finding all possible circles in the drawing.
Further, in step 7, a convolutional neural network is constructed under the MatConvNet framework; the network structure comprises: convolutional layer C1, downsampling layer S1, convolutional layer C2, downsampling layer S2, convolutional layer C3, and a Softmax layer. The input image size is 20 × 20 and the output is a 1 × 1 × ClassNum matrix, each element of which is the probability that the input image belongs to a certain character class; the class with the maximum probability is the class of the input image.
Compared with the existing algorithm, the invention has the following advantages:
1) Characters in any direction can be recognized. The method accurately locates character regions, judges the direction of a character string by fitting a straight line through the center points of the characters' bounding rectangles, and rotation-corrects oblique strings, thereby solving the problem of multi-directional characters.
2) Real-time requirements can be met. The invention performs layout analysis on the engineering drawing using mathematical morphology, divides the whole image into corresponding regions, and then performs text recognition on the different local regions, improving processing speed and time efficiency.
3) The accuracy of character recognition is high. The convolutional neural network method achieves higher accuracy than the traditional template matching method.
Drawings
FIG. 1 is a flow chart of an implementation of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network structure for character recognition in an embodiment of the present invention;
FIG. 3 is a diagram of the table-frame segmentation results of the present invention, with the contents of the drawing, material table, and title bar areas omitted and only the table frames shown;
FIG. 4 is a diagram illustrating the effect of extracting circular weld bead numbers by the circle Hough transform according to an embodiment of the present invention;
FIG. 5 is a diagram of the character recognition results of the embodiment of the present invention, in which the numbers on a gray background beside the circular weld bead numbers are the recognition results of the numbers inside the weld bead numbers.
Detailed Description
The technical means and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of a character recognition method based on a convolutional neural network provided by the embodiment of the present invention are as follows:
Step 1: binarize the scanned engineering drawing document using optimal thresholding based on the Otsu method, separating the pictures and text in the scanned document from the background;
the method comprises the following specific steps:
The L different gray levels in the scanned image are denoted {0, 1, 2, ..., L−1}. The normalized histogram of the input image is computed and denoted pᵢ, i = 0, 1, ..., L−1. Suppose a threshold k is selected that divides the image into two groups, C₀ = {0, 1, ..., k} and C₁ = {k+1, k+2, ..., L−1}. The probabilities of a pixel falling into group C₀ and group C₁ are respectively:
ω₀ = Σ_{i=0}^{k} pᵢ,  ω₁ = Σ_{i=k+1}^{L−1} pᵢ = 1 − ω₀
The average gray levels of C₀ and C₁ are respectively:
μ₀ = Σ_{i=0}^{k} i·pᵢ / ω₀,  μ₁ = Σ_{i=k+1}^{L−1} i·pᵢ / ω₁
The average gray level of the whole image is:
μ = μ₀ω₀ + μ₁ω₁
The between-group variance of C₀ and C₁ is:
σ² = ω₀(μ₀ − μ)² + ω₁(μ₁ − μ)² = ω₀ω₁(μ₁ − μ₀)²
The optimal threshold k* is sought among the L different gray levels: the k that maximizes σ² is taken as k*, and the document image is converted into a binary image using this optimal threshold.
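For illustration, a minimal NumPy sketch of this optimal-threshold search follows, assuming an 8-bit grayscale input (so L = 256); the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Find the Otsu threshold k* that maximizes the between-group variance."""
    L = 256  # number of gray levels for an 8-bit image
    hist = np.bincount(gray.ravel(), minlength=L).astype(np.float64)
    p = hist / hist.sum()            # normalized histogram p_i
    i = np.arange(L)
    omega0 = np.cumsum(p)            # omega_0(k) for every candidate threshold k
    mu_k = np.cumsum(i * p)          # cumulative mean gray level up to k
    mu = mu_k[-1]                    # global mean gray level
    # equivalent closed form: sigma^2(k) = (mu*omega0 - mu_k)^2 / (omega0*(1 - omega0))
    denom = omega0 * (1.0 - omega0)
    denom[denom == 0] = np.nan       # avoid division by zero at the extremes
    sigma2 = (mu * omega0 - mu_k) ** 2 / denom
    return int(np.nanargmax(sigma2))  # k* maximizing sigma^2

# usage sketch: binary = (gray > otsu_threshold(gray)).astype(np.uint8) * 255
```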
Step 2: because oblique scanning inevitably occurs in the scanning process, the Hough transform is used to find the inclination angle of the binarized scanned document from step 1; in this embodiment, only inclination angles within plus or minus 5 degrees of the horizontal and vertical directions are detected, and the tilted image is then rotation-corrected using bilinear interpolation;
the method comprises the following specific steps:
in image space, all lines passing through point (x, y) can be represented by the polar equation:
ρ=x cosθ+y sinθ,0≤θ≤π
where ρ is the distance from the origin to the line and θ is the angle between the line and the x-axis. An accumulator H(ρ, θ) is established over the parameter space (ρ, θ) and its elements are initialized to 0; for each point (x, y) in the image, all corresponding (ρ, θ) values in the parameter space are found and the accumulator H(ρ, θ) is incremented; local peak detection is then performed on H(ρ, θ) to obtain the (ρ, θ) parameters of all straight lines in the image. To reduce the interference of redundant information, only straight lines near the upper boundary of the image are considered; the parameter θ of the longest straight line l in this region is the inclination angle of the scanned document. Finally, the image is rotated using bilinear interpolation to complete the tilt correction.
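A possible OpenCV-based sketch of this step is shown below. It uses the probabilistic Hough variant (cv2.HoughLinesP), which returns segment endpoints and thus lengths directly; the crop to the top quarter of the image and the detector parameters are assumptions, not values from the patent.

```python
import cv2
import numpy as np

def deskew(binary: np.ndarray) -> np.ndarray:
    """Estimate document skew from lines near the upper boundary, then rotate
    with bilinear interpolation. Thresholds and the crop size are assumptions."""
    h, w = binary.shape
    top = binary[: h // 4, :]                       # only the region near the upper boundary
    lines = cv2.HoughLinesP(top, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=w // 4, maxLineGap=5)
    if lines is None:
        return binary
    # take the longest detected segment; its angle approximates the document skew
    x1, y1, x2, y2 = max(lines[:, 0, :], key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]))
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
    angle = (angle + 90.0) % 180.0 - 90.0           # normalize to [-90, 90)
    if abs(angle) > 5.0:                            # embodiment only corrects +/- 5 degree skews
        return binary
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_LINEAR)  # bilinear interpolation
```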
Step 3: extract the table frame lines in the corrected image using mathematical morphology, and divide the image into three areas: drawing, material table, and title bar;
When recognizing a scanned engineering drawing document, the document image must be divided into regions, that is, the whole image is divided into corresponding areas, and after recognition is complete, error correction must be performed separately on the different areas. Since the layout of an engineering drawing consists of a complex table, region division can be converted into table detection.
The method comprises the following specific steps:
Step 3.1: detect the table frame lines, including vertical frame lines and horizontal frame lines. A vertical linear structuring element [1 1 1]ᵀ and a horizontal linear structuring element [1 1 1] are used to erode and then dilate the input scanned document image; the result is skeletonized, and all horizontal and vertical line segments are thinned to single lines one pixel wide, yielding a vertical frame-line image I₁ and a horizontal frame-line image I₂.
Step 3.2: compute the frame-line intersection points. An AND operation is performed on the vertical frame-line image I₁ and the horizontal frame-line image I₂ to obtain the intersection points of the table frame lines.
Step 3.3: extract the table frames. For each intersection point, detect other intersection points near the same horizontal position and near the same vertical position, and match them in order of distance to find each correct table frame. If an intersection point A, an intersection point B near the same horizontal position, and an intersection point C near the same vertical position form a rectangle, and table frame lines exist along the four sides of the rectangle (that is, the four sides are detected to coincide with table frame lines), then the whole table contains a table frame with intersection point A as its upper-left corner, the distance AB as its length, and the distance AC as its width. By processing all intersection points of the table frame lines in this way, each correct table frame can be extracted.
Step 3.4: of the extracted table frames, take the one with the largest area as the drawing area and the one with the second-largest area as the material table area; the region containing several table frames at the bottom of the whole image forms the title bar area. FIG. 3 shows the table-frame division result: the table frame with the largest area, on the right, is the drawing area; the table frame with the second-largest area, to its left, is the material table area; and all the table frames below the material table and the drawing constitute the title bar area.
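As an illustration of steps 3.1 and 3.2, a minimal OpenCV sketch follows; the kernel construction and iteration count are assumptions for a typical scan resolution, and the skeletonization step is omitted for brevity.

```python
import cv2
import numpy as np

def extract_frame_lines(binary: np.ndarray):
    """Extract vertical/horizontal table frame lines by erosion then dilation."""
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))   # vertical [1 1 1]^T
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 1))   # horizontal [1 1 1]
    # repeated erosion removes everything except long straight lines; dilation restores them
    I1 = cv2.dilate(cv2.erode(binary, v_kernel, iterations=20), v_kernel, iterations=20)
    I2 = cv2.dilate(cv2.erode(binary, h_kernel, iterations=20), h_kernel, iterations=20)
    crossings = cv2.bitwise_and(I1, I2)   # AND operation -> table frame-line intersections
    return I1, I2, crossings
```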
Step 4: search all connected regions in the corrected image from step 2, and screen and merge the connected regions according to features such as the shape and adjacent position of the labeled regions to locate the character regions;
the method comprises the following specific steps:
Step 4.1: search all connected regions in the corrected image from step 2 using a run-based connected-component labeling algorithm (a sketch of this procedure follows the numbered steps below); the search steps are as follows:
1) Scan the corrected image from step 2 line by line; the consecutive white pixels in each line form a sequence called a run, whose start point, end point, and line number are recorded.
2) Traverse the runs in each line starting from the second line: if a run does not overlap any run in the previous line, give it a new label; if it overlaps exactly one run in the previous line, give it the label of that run; if it overlaps two or more runs in the previous line, give it the smallest label among the overlapping runs, and write the labels of those runs into equivalence pairs to indicate that they belong to one class.
3) Convert the equivalence pairs into equivalence sequences; since the labels in a sequence are equivalent, each equivalence sequence is given a single label, numbered from 1.
4) Go through the labels of all runs one by one, look up the corresponding equivalence sequence, and assign it the new label.
5) After the traversal, runs with the same label form one connected region, and the search yields all connected regions.
Step 4.2: screen and merge the connected regions obtained in step 4.1 according to features such as shape and adjacent position to obtain the character regions. For characters in engineering drawings, a connected region should satisfy certain constraints on height, width, and their ratio. According to the character characteristics of the engineering drawing, the connected regions are measured against a set aspect ratio, maximum length and width, and neighbor connection distance; in this embodiment, the aspect-ratio range is 1.5 to 2, the maximum length is 12, the maximum width is 7, and the neighbor connection distance is 3. Finally, the regions are screened and merged to obtain single-character localization results.
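For illustration only, the embodiment's screening constraints might be written as the following predicate; it assumes that "length" means the region height and that the aspect ratio is height over width, neither of which the patent defines precisely.

```python
def is_character_region(width: int, height: int) -> bool:
    """Screen a connected region with the embodiment's shape constraints
    (aspect ratio 1.5-2, maximum length 12, maximum width 7)."""
    if width == 0:
        return False
    ratio = height / width      # assumption: aspect ratio = height / width
    return 1.5 <= ratio <= 2.0 and height <= 12 and width <= 7
```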
Step 5: combine single characters into character strings, compute the fitted straight line through the center points of the characters' bounding rectangles in each string, judge the direction of the string from the slope of the fitted line, and rotate oblique strings to the horizontal by the corresponding angle, reducing the difficulty of character recognition.
The method comprises the following specific steps:
Step 5.1: first, locate each single character through connected-component-based character localization, obtaining parameters such as the start-point coordinates, width, height, and center-point coordinates of the character's bounding rectangle. Sort the bounding rectangles of all characters from left to right and top to bottom by center-point coordinates and process each character region in this order, assigning label 1 to the first character's bounding rectangle to indicate that it belongs to the first character string. For every character after the first, if the distances between its center point and the center points of all previously processed character regions exceed the set maximum distance for a single character string, give the region a new label, indicating that it belongs to a new character string; if the center-point distance to one or more previously processed character regions is within the set maximum distance, the characters belong to the same character string, and the current region is given the label of a region that meets the requirement. After sorting all labels, regions with the same label belong to the same character string, yielding the character-string grouping.
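A minimal sketch of this grouping rule follows, assuming Euclidean center-point distance and joining each character to its nearest already-processed neighbor (function and variable names are illustrative):

```python
import numpy as np

def group_into_strings(centers: np.ndarray, max_dist: float) -> np.ndarray:
    """Group character center points into strings: a character joins the string of
    the nearest already-processed character within max_dist, else starts a new string."""
    order = np.lexsort((centers[:, 0], centers[:, 1]))   # sort by y, then x (reading order)
    labels = np.zeros(len(centers), dtype=int)
    next_label = 1
    for rank, idx in enumerate(order):
        prev = order[:rank]                              # characters processed so far
        if len(prev) > 0:
            d = np.linalg.norm(centers[prev] - centers[idx], axis=1)
            j = int(np.argmin(d))
            if d[j] <= max_dist:
                labels[idx] = labels[prev[j]]            # same string as nearest neighbor
                continue
        labels[idx] = next_label                         # start a new string
        next_label += 1
    return labels
```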
Step 5.2: fit a straight line through the center points of the characters' bounding rectangles in each string using the least-squares method, judge the direction of the string from the slope of the fitted line, and rotate oblique strings to the horizontal by the corresponding angle.
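The least-squares fit itself is a one-liner with NumPy; the sketch below assumes the string is not near-vertical (where a fit of y on x degenerates):

```python
import numpy as np

def string_angle(centers: np.ndarray) -> float:
    """Fit a least-squares line through character center points; return its angle in degrees."""
    x, y = centers[:, 0], centers[:, 1]
    slope, _intercept = np.polyfit(x, y, deg=1)   # least-squares fit of y = slope*x + b
    return float(np.degrees(np.arctan(slope)))    # string direction from the slope

# usage sketch: rotate the string's image patch by -string_angle(centers) to level it
```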
Step 6: search for circles in the corrected image using a circle Hough transform algorithm and extract the weld bead numbers in the engineering drawing; the extraction results for circular weld bead numbers are shown in FIG. 4;
the method comprises the following specific steps:
the equation of a circle with (a, b) as the center and r as the radius is
(x − a)² + (y − b)² = r²
Rewriting the equation into parametric form
a=x-r cosθ,b=y-r sinθ,θ∈[0,2π)
where a and b are the horizontal and vertical coordinates of the circle center, r is the radius of the circle, and θ is the angle between the x-axis and the line connecting a point on the circle to the center. Each point (x, y) on a circle maps to a quadric surface in the (a, b, r) parameter space. An accumulator H(a, b, r) is established in the parameter space, its elements are initialized to 0, and the radius range of the circles to be detected is set to r ∈ [r₁, r₂]. All points in the binary image are traversed, all corresponding values in the parameter space are found, and the accumulator H(a, b, r) is incremented; the larger an accumulator value, the more likely it is that a circle with center (a, b) and radius r exists in the binary image. Local peak detection on the accumulator yields the parameters of all possible circles in the image, thereby finding all possible circles in the drawing. The weld bead number is generally a circle with a radius within a certain range, so the weld bead numbers can be extracted once the circles in the drawing have been found.
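As an illustration, OpenCV exposes a gradient-based variant of the circle Hough transform, which avoids the full three-dimensional accumulator described above; a sketch with assumed parameter values:

```python
import cv2
import numpy as np

def find_bead_circles(gray: np.ndarray, r1: int, r2: int):
    """Detect candidate weld-bead circles with radius in [r1, r2]."""
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=2 * r1,
                               param1=100, param2=20, minRadius=r1, maxRadius=r2)
    if circles is None:
        return []
    return [(int(a), int(b), int(r)) for a, b, r in circles[0]]  # (center x, center y, radius)
```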
Step 7: make an engineering drawing character data set, construct a convolutional neural network under the MatConvNet framework, and recognize the characters in the drawing using the model obtained by network training; the character recognition results are shown in FIG. 5;
the method comprises the following specific steps:
Step 7.1: nearly 2000 characters are extracted from engineering drawings and labeled to form a data set, which is divided into a training set and a test set accounting for 80% and 20% of the data, respectively.
Step 7.2: build the CNN for character recognition under MatConvNet. As shown in FIG. 2, the structure of the character-recognition network comprises: convolutional layer C1, downsampling layer S1, convolutional layer C2, downsampling layer S2, convolutional layer C3, and a Softmax layer; the input image size is 20 × 20, and the output is a 1 × 1 × ClassNum matrix (ClassNum is the number of classes). The input of the CNN model is a 20 × 20 character image. It first passes through convolutional layer C1, which has 20 convolution kernels of size 3 × 3, producing 20 feature maps of size 18 × 18; these are passed to the first 2 × 2 max-pooling layer, which reduces the feature maps to 9 × 9 × 20, followed by a ReLU activation function, which does not change the spatial size. The feature maps are then input to the second convolutional layer, which contains 20 × 100 convolution kernels of size 3 × 3, and the resulting 7 × 7 × 100 feature maps are sent to the second pooling layer for feature compression, giving feature maps of size 3 × 3 × 100, again followed by a ReLU activation function. They then enter the third convolutional layer with 100 × ClassNum kernels of size 3 × 3; finally, after Softmax regression, each output dimension is the probability that the input image belongs to that class, the class with the maximum probability is the predicted class, and character recognition is complete.
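The patent builds this network in MatConvNet (a MATLAB toolbox). Purely for illustration, an equivalent sketch of the same layer stack in PyTorch follows; the framework choice and class name are assumptions, not the patent's implementation. For training, one would typically feed the pre-softmax outputs to a cross-entropy loss.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Sketch of the patent's layer stack: C1-S1-C2-S2-C3-Softmax."""
    def __init__(self, class_num: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=3),           # C1: 20x20 -> 18x18x20
            nn.MaxPool2d(2),                           # S1: -> 9x9x20
            nn.ReLU(),
            nn.Conv2d(20, 100, kernel_size=3),         # C2: -> 7x7x100
            nn.MaxPool2d(2),                           # S2: -> 3x3x100
            nn.ReLU(),
            nn.Conv2d(100, class_num, kernel_size=3),  # C3: -> 1x1xClassNum
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.features(x).flatten(1)   # (N, ClassNum)
        return torch.softmax(logits, dim=1)    # per-class probabilities

# usage sketch: pred = CharCNN(class_num=36)(torch.randn(1, 1, 20, 20)).argmax(1)
```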
Step 8: correct simple recognition errors automatically through the coding rules.
The detailed steps are as follows:
According to the table frames extracted in step 3, the whole drawing layout can be divided into the drawing, the material table, and the title bar, and different coding rules are set for the different areas. For example, the detection result in the drawing area should be a number: letters such as o, i, and l cannot appear in a string whose value type is numeric, so they are directly replaced by 0, 1, and 1. In addition, the drawing numbers, pipeline numbers, grades, Item Codes and the like in the material table and title bar have complete lists, and their recognition results must be values in these lists; each recognition result is therefore compared with the list values, the similarity between any result not found in the list and the list values is computed, and if the similarity exceeds 90%, the result is replaced by the list value.
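A minimal sketch of both correction rules follows; the patent does not specify its similarity measure, so difflib's SequenceMatcher ratio is used here as a stand-in, and all names are illustrative.

```python
from difflib import SequenceMatcher

# substitution table for numeric fields (from the patent's example: o->0, i->1, l->1)
NUMERIC_FIXES = str.maketrans({"o": "0", "i": "1", "l": "1"})

def correct_numeric(s: str) -> str:
    """Replace letters that cannot occur in a numeric string."""
    return s.translate(NUMERIC_FIXES)

def correct_by_list(s: str, valid: list[str], threshold: float = 0.9) -> str:
    """Snap a recognition result to the closest list value if similarity exceeds threshold."""
    if s in valid:
        return s
    best = max(valid, key=lambda v: SequenceMatcher(None, s, v).ratio())
    return best if SequenceMatcher(None, s, best).ratio() > threshold else s
```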
Step 9: output an XML file containing the engineering drawing information, and store the drawing data.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.