
CN108596066B - A Character Recognition Method Based on Convolutional Neural Network - Google Patents

A Character Recognition Method Based on Convolutional Neural Network

Info

Publication number
CN108596066B
CN108596066B CN201810332531.9A CN201810332531A
Authority
CN
China
Prior art keywords
character
image
frame
area
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810332531.9A
Other languages
Chinese (zh)
Other versions
CN108596066A (en)
Inventor
张海剑
成帅
杨天韵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201810332531.9A priority Critical patent/CN108596066B/en
Publication of CN108596066A publication Critical patent/CN108596066A/en
Application granted granted Critical
Publication of CN108596066B publication Critical patent/CN108596066B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06V30/422 Technical drawings; Geographical maps
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a character recognition method based on a convolutional neural network, which mainly addresses the time-consuming, labor-intensive, and error-prone nature of manual entry of engineering drawing data. Its main steps are: 1) convert the grayscale image to a binary image by optimal global thresholding based on Otsu's method; 2) rotation-correct skewed images; 3) extract the table frame with a mathematical morphology algorithm; 4) obtain the regions where characters are located; 5) rotate oblique character strings by the corresponding angle to the horizontal direction; 6) find the circles in the image with a circular Hough transform algorithm and extract the weld bead numbers in the engineering drawing; 7) recognize the text in the engineering drawing with a convolutional neural network; 8) correct simple errors through coding rules, automatically fixing recognition mistakes; 9) output and store the drawing data. The invention achieves high accuracy and real-time performance in text recognition on engineering drawings, and realizes efficient recording and management of engineering drawing data.

Description

Character recognition method based on convolutional neural network
Technical Field
The invention belongs to the field of pattern recognition, and particularly relates to a character recognition method based on a convolutional neural network, which can be used for document scanning recognition of industrial drawings.
Background
Optical character recognition (OCR) refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text. OCR is an important research direction in the field of pattern recognition, is widely applied in automatic information processing, and has important practical significance in engineering fields such as construction and machinery.
In actual engineering, workers need to enter the information on industrial drawings into a computer so that engineering information can be recorded and managed. In general, however, the number of engineering drawings is huge and their contents are complicated; purely manual entry requires a large amount of labor and time, the work is tedious and repetitive, and entry and recognition errors easily occur. If the contents of engineering drawings can be scanned and recognized automatically, a great deal of labor and time can be saved, and the management of engineering drawing documents becomes simple and efficient.
Traditional image-text recognition methods fall into two types: statistical decision recognition and machine-learning-based character recognition. Statistical decision recognition is the most traditional approach; its typical representative is template matching, which selects the best matching template for an input image by minimizing the distance between template and input. Machine-learning-based character recognition mainly comprises image preprocessing, feature extraction, and recognition, where features can be extracted either by hand or by a convolutional neural network. As an implementation model of deep learning, the convolutional neural network has the properties of local receptive fields and weight sharing, which greatly reduce computational complexity, and it has developed rapidly in image-text recognition. For poor-quality and complex drawings, however, the traditional recognition algorithm based on a convolutional neural network cannot achieve high accuracy.
Disclosure of Invention
The invention aims to provide a character recognition method based on a convolutional neural network that improves on the traditional character recognition network: it simplifies the network structure, adopts suitable activation functions and pooling methods, and adds error-correction post-processing such as character replacement and list comparison, greatly improving recognition speed and accuracy. In addition, the method corrects inclined characters with a least-squares fit, realizes recognition of characters in multiple directions, adapts to the complexity of industrial drawings, and meets the required recognition accuracy. Compared with traditional recognition methods, the convolutional neural network method adopted by the invention is therefore more accurate and more effective.
The technical scheme of the invention comprises the following steps:
step 1, binarization is carried out on a scanned document of an engineering drawing, and the pictures and text in the scanned document are separated from the background;
step 2, aiming at the inclined scanned document, finding the inclination angle of the binary image, and performing rotation correction on the image by adopting bilinear interpolation;
step 3, extracting a table frame line in the corrected image in the step 2 by adopting mathematical morphology processing, and dividing the image into a drawing area, a material table area and a title bar area;
step 4, searching all connected areas of the corrected image in the step 2, and then screening and combining the connected areas according to the shape and the adjacent position characteristics of the marked connected areas in the image to position a single character;
step 5, combining the single characters into a character string, calculating a fitting straight line of the central point of a circumscribed rectangle of each character in the character string, judging the trend of the character string according to the slope of the fitting straight line, and rotating the oblique character string by a corresponding angle to the horizontal direction;
step 6, searching the circles in the corrected image in the step 2, and extracting the weld bead numbers in the engineering drawing;
step 7, making an engineering drawing character data set, constructing a convolutional neural network under the MatConvNet framework, and recognizing the characters in the drawing with the model obtained by network training;
step 8, setting different coding rules for the picture area, the material list area and the title bar area, and automatically correcting simple character recognition errors in each area;
and 9, outputting and storing all the drawing data obtained by identification.
Further, in step 1 the scanned engineering drawing document is binarized by optimal thresholding based on Otsu's method, implemented as follows.

The L distinct gray levels in the scanned image are denoted {0, 1, 2, ..., L-1}. The normalized histogram of the input image is computed and denoted p_i, i = 0, 1, ..., L-1. Suppose a threshold k is selected that divides the pixels into two groups, C0 = {0, 1, ..., k} and C1 = {k+1, k+2, ..., L-1}. The probabilities of a pixel falling into C0 and C1 are:

ω0 = Σ_{i=0}^{k} p_i,  ω1 = Σ_{i=k+1}^{L-1} p_i = 1 − ω0

The average gray levels of C0 and C1 are, respectively:

μ0 = (1/ω0) Σ_{i=0}^{k} i·p_i,  μ1 = (1/ω1) Σ_{i=k+1}^{L-1} i·p_i

The average gray level of the whole image is:

μ = μ0·ω0 + μ1·ω1

The between-class variance of C0 and C1 is:

σ² = ω0(μ0 − μ)² + ω1(μ1 − μ)² = ω0·ω1·(μ1 − μ0)²

Among the L gray levels, the k* that maximizes σ² is taken as the optimal threshold, and the scanned document image is converted into a binary image using it.
Furthermore, in step 2 the inclination angle of the binary image is found using the Hough transform, implemented as follows.

In image space, every line passing through the point (x, y) is represented by the polar equation:

ρ = x·cosθ + y·sinθ,  0 ≤ θ ≤ π

where ρ is the distance from the origin to the line and θ is the angle between the line and the x-axis.

An accumulator H(ρ, θ) is established over the parameter space (ρ, θ) and initialized to 0. For every point (x, y) in the image, all corresponding (ρ, θ) values are found and H(ρ, θ) is incremented; local peak detection on H(ρ, θ) then yields the (ρ, θ) parameters of all straight lines in the image. To reduce interference from redundant information, only lines near the upper boundary of the image are considered, and the parameter θ of the longest such line l is the inclination angle of the scanned document.
Further, the specific implementation of step 3 includes the following sub-steps,
step 3.1, erode and then dilate the corrected image from step 2 with a vertical linear structuring element [1 1 1]^T and a horizontal linear structuring element [1 1 1], skeletonize the result, and thin all horizontal and vertical line segments into single lines one pixel wide, obtaining the vertical frame line image I1 and the horizontal frame line image I2;

step 3.2, perform an AND operation on the vertical frame line image I1 and the horizontal frame line image I2 to obtain the intersection points of the table frame lines;

step 3.3, for each intersection point, detect the other intersection points near the same horizontal position and near the same vertical position, and match them in order of distance to find each correct table frame; if there is an intersection point A that forms a rectangle with an intersection point B near the same horizontal position and an intersection point C near the same vertical position, and table frame lines are detected to coincide with all four sides of that rectangle, then the whole table contains a table frame with A as its upper-left corner, the distance AB as its length, and the distance AC as its width; applying this processing to all intersection points of the table frame lines in turn extracts each correct table frame;

and 3.4, among the extracted table frames, divide the one with the largest area into the drawing area and the one with the second-largest area into the material table area; the region containing the several table frames at the bottom of the whole image forms the title bar area.
Further, the specific implementation of step 4 includes the following sub-steps,
step 4.1, search all connected regions of the corrected image from step 2 using a run-based connected-region labeling algorithm, as follows:

1) scan the corrected image from step 2 line by line; the consecutive white pixels in each line form a sequence called a run, and the start point, end point, and line number of each run are recorded;

2) starting from the second line, traverse the runs in each line; if a run does not overlap any run in the previous line, give it a new label; if it overlaps exactly one run in the previous line, give it the label of that run; if it overlaps two or more runs in the previous line, give it the smallest of their labels, and write the labels of those runs into an equivalence pair to indicate that they belong to one class;

3) convert the equivalence pairs into equivalence sequences; since the labels within a sequence are equivalent, each sequence receives a single label, numbered from 1;

4) go through the labels of all runs one by one, look up the corresponding equivalence sequence, and assign its new label;

5) after the traversal, runs carrying the same label form one connected region, and the search yields all connected regions;

and 4.2, measure the connected regions against an aspect ratio range, a maximum height and width, and a neighbor-connection distance set according to the character characteristics of the engineering drawing, and finally screen and merge the connected regions to obtain the single-character locations.
Further, the specific implementation of step 5 includes the following sub-steps,
step 5.1, obtain from step 4 the start coordinates, width, height, and center coordinates of each single character's bounding rectangle; sort the bounding rectangles of all characters from left to right and top to bottom by center coordinates, and process each character region in that order, setting the bounding rectangle of the first character to label 1 to indicate that it belongs to the first character string; for every character after the first, if the distance from its center to the centers of all preceding character regions exceeds the set maximum same-string distance, give it a new label, indicating that it starts a new character string; if the center distance to one (or more) preceding character region is within the set maximum same-string distance, the characters belong to the same string, and the current character region receives the label of the qualifying region; after all labels are sorted, regions sharing a label belong to the same character string, yielding the character string grouping;

and 5.2, find the least-squares fitted line through the centers of the character bounding rectangles in each string, judge the string's direction from the slope of the fitted line, and rotate oblique strings by the corresponding angle to the horizontal direction.
Further, in step 6 a circular Hough transform algorithm is adopted to find the circles in the corrected image.

The equation of a circle with center (a, b) and radius r is

(x − a)² + (y − b)² = r²

Rewriting the equation in parametric form:

a = x − r·cosθ,  b = y − r·sinθ,  θ ∈ [0, 2π)

where a and b are the horizontal and vertical coordinates of the center, r is the radius, and θ is the angle between the x-axis and the line joining a point on the circle to the center. Mapping each circle point (x, y) into the (a, b, r) parameter space yields a quadric surface. An accumulator H(a, b, r) is established in the parameter space and initialized to 0, and the radius range of the circles to be detected is set to r ∈ [r1, r2]. All points of the binary image are traversed, all corresponding values in the parameter space are found, and the accumulator H(a, b, r) is incremented; the larger an accumulator value, the more likely the binary image contains a circle with center (a, b) and radius r. Local peak detection on the accumulator yields the parameters of all possible circles in the image, and thus all possible circles in the drawing.
Further, in step 7, a convolutional neural network is constructed under the MatConvNet framework, and the structure of the network includes: the convolutional layer C1, the downsampling layer S1, the convolutional layer C2, the downsampling layer S2, the convolutional layer C3 and the Softmax layer, the size of the input image is 20 × 20, the output is a matrix of 1 × 1 × ClassNum, each element of the matrix represents the probability that the input image belongs to a certain class character, and the class corresponding to the maximum probability represents the class to which the input image belongs.
Compared with the existing algorithm, the invention has the following advantages:
1) Characters in any direction can be recognized. The method accurately locates character regions, judges the direction of a character string from the least-squares line fitted through the centers of the characters' bounding rectangles, and rotation-corrects oblique strings, thereby solving the multi-directionality problem.
2) Real-time performance can be satisfied. The invention adopts a mathematical morphology method to carry out layout analysis on the engineering drawing, divides the whole image into corresponding areas, and then carries out text recognition on different local areas, thereby improving the processing speed and time efficiency.
3) The accuracy of character recognition is high. The method of the convolutional neural network is adopted, and the accuracy is higher than that of the traditional template matching method.
Drawings
FIG. 1 is a flow chart of an implementation of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network structure for character recognition in an embodiment of the present invention;
FIG. 3 is a diagram of the results of the present invention segmenting the form frame, with the details of the drawing, materials tables, and title bar areas omitted, and only the form frame shown;
FIG. 4 is a diagram illustrating the effect of extracting a circular channel number by a circular Hough transform according to an embodiment of the present invention;
FIG. 5 shows the character recognition result of the embodiment of the present invention, in which the numbers on a grey background beside the circular bead numbers are the recognition results for the digits inside the bead numbers.
Detailed Description
The technical means and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to FIG. 1, the implementation steps of the character recognition method based on a convolutional neural network provided by the embodiment of the present invention are as follows:
Step 1: binarize the scanned engineering drawing document with optimal thresholding based on Otsu's method, separating the pictures and text in the scanned document from the background;
the method comprises the following specific steps:
The L distinct gray levels in the scanned image are denoted {0, 1, 2, ..., L-1}. The normalized histogram of the input image is computed and denoted p_i, i = 0, 1, ..., L-1. Suppose a threshold k is selected that divides the pixels into two groups, C0 = {0, 1, ..., k} and C1 = {k+1, k+2, ..., L-1}. The probabilities of a pixel falling into C0 and C1 are:

ω0 = Σ_{i=0}^{k} p_i,  ω1 = Σ_{i=k+1}^{L-1} p_i = 1 − ω0

The average gray levels of C0 and C1 are, respectively:

μ0 = (1/ω0) Σ_{i=0}^{k} i·p_i,  μ1 = (1/ω1) Σ_{i=k+1}^{L-1} i·p_i

The average gray level of the whole image is:

μ = μ0·ω0 + μ1·ω1

The between-class variance of C0 and C1 is:

σ² = ω0(μ0 − μ)² + ω1(μ1 − μ)² = ω0·ω1·(μ1 − μ0)²

Among the L gray levels, the k* that maximizes σ² is taken as the optimal threshold, and the document image is converted into a binary image using it.
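For illustration only, the following NumPy sketch (an editorial addition, not the authors' code) performs this exhaustive search over all candidate thresholds; in practice OpenCV's cv2.threshold with the THRESH_OTSU flag computes the same k*:

    import numpy as np

    def otsu_threshold(gray):
        # gray: 2-D uint8 array, so L = 256 gray levels {0, ..., 255}
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        p = hist / hist.sum()                # normalized histogram p_i
        omega0 = np.cumsum(p)                # omega0(k) for every candidate k
        m = np.cumsum(np.arange(256) * p)    # cumulative first moment up to k
        mu = m[-1]                           # global mean gray level
        omega1 = 1.0 - omega0
        valid = (omega0 > 0) & (omega1 > 0)  # avoid 0/0 at the histogram ends
        sigma2 = np.zeros(256)
        # sigma^2 = (mu*omega0 - m)^2 / (omega0*omega1), algebraically equal to
        # the between-class variance omega0*omega1*(mu1 - mu0)^2 given above
        sigma2[valid] = (mu * omega0[valid] - m[valid]) ** 2 / (omega0[valid] * omega1[valid])
        k_star = int(np.argmax(sigma2))      # k* maximizing sigma^2
        return k_star, (gray > k_star).astype(np.uint8) * 255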
Step 2: because skew inevitably occurs during scanning, the Hough transform is used to find the inclination angle of the binarized scanned document image from step 1; in this embodiment only inclination angles within plus or minus 5 degrees of the horizontal and vertical directions are detected, and the skewed image is then rotation-corrected with bilinear interpolation;
the method comprises the following specific steps:
In image space, every line passing through the point (x, y) can be represented by the polar equation:

ρ = x·cosθ + y·sinθ,  0 ≤ θ ≤ π

where ρ is the distance from the origin to the line and θ is the angle between the line and the x-axis. An accumulator H(ρ, θ) is established over the parameter space (ρ, θ) and initialized to 0. For every point (x, y) in the image, all corresponding (ρ, θ) values are found and H(ρ, θ) is incremented; local peak detection on H(ρ, θ) then yields the (ρ, θ) parameters of all straight lines in the image. To reduce interference from redundant information, only lines near the upper boundary of the image are considered, and the parameter θ of the longest such line l is the inclination angle of the scanned document. Finally, the image is rotated using bilinear interpolation to complete the skew correction.
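As a sketch of how this step might look in Python with OpenCV (an editorial illustration, not the authors' implementation; the probabilistic variant cv2.HoughLinesP is used instead of the full (ρ, θ) accumulator, and the region fraction and vote threshold are assumed values):

    import cv2
    import numpy as np

    def deskew(binary):
        h, w = binary.shape
        top = binary[: h // 4]               # only the region near the upper boundary
        lines = cv2.HoughLinesP(top, 1, np.pi / 180, threshold=100,
                                minLineLength=w // 4, maxLineGap=5)
        if lines is None:
            return binary
        # take the longest detected line; its angle is the document skew
        x1, y1, x2, y2 = max(lines[:, 0],
                             key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) > 5:                   # the embodiment only corrects skews within +/- 5 degrees
            return binary
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        # INTER_LINEAR is bilinear interpolation, as the text specifies
        return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_LINEAR)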
Step 3: extract the table frame lines in the corrected image with mathematical morphology processing, dividing the image into three areas: drawing, material table, and title bar;
When recognizing a scanned engineering drawing document, the document image is first divided into regions, i.e., the whole image is partitioned into its corresponding areas; after recognition, error correction must be applied separately to the different regions. The layout of an engineering drawing consists of a complex table, so region division can be reduced to table detection.
The method comprises the following specific steps:
step 3.1: the table frame lines are detected, and include vertical frame lines and horizontal frame lines. Using a linear structural element [ 111 ] in the vertical direction]TAnd a linear structural element [ 111 ] in the horizontal direction]Corroding and expanding an input scanned document image, skeletonizing the result, and refining all horizontal and vertical line segments into a single line with the line segment width of one pixel to obtain a vertical frame line image I1And a horizontal frameLine image I2
Step 3.2: and (6) calculating the intersection point of the frame lines. For vertical frame line image I1And horizontal frame line image I2And operation is carried out to obtain the intersection point of the frame line of the table.
Step 3.3: and extracting the table frame. And for each intersection point, detecting other intersection points near the same horizontal position and other intersection points near the same vertical position, and sequentially matching according to the distance to find each correct form frame. If an intersection point A exists, the intersection point B near the same horizontal position and the intersection point C near the same vertical position form a rectangle, and table frame lines exist in the positions of the four sides of the rectangle, namely the four sides of the rectangle are detected to be superposed with the table frame lines, a table frame which takes the intersection point A as an upper left corner point, the distance AB as a length and the distance AC as a width exists in the whole table. By performing the above-described processing on the intersections of the table frame lines in sequence, each correct table frame can be extracted.
Step 3.4: and dividing the table frame with the largest area in the extracted table frames into a drawing area, dividing the table frame with the second area into a material table area, and forming a title bar area by an area comprising a plurality of table frames below the whole image. FIG. 3 shows the table frame division result, as shown in the figure, the table frame with the largest area on the right side is divided into a drawing area, the table frame with the second area on the left side of the drawing area is divided into a material table area, and all the table frames below the material table and the drawing constitute a title bar area.
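Steps 3.1 and 3.2 can be sketched as follows (an editorial illustration; the 25-pixel line elements are an assumption, longer than the [1 1 1] elements of the text, so that character strokes are suppressed in one opening, and the skeletonization step is omitted for brevity):

    import cv2
    import numpy as np

    def table_line_crossings(binary):
        v_se = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25))  # vertical line element
        h_se = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))  # horizontal line element
        I1 = cv2.dilate(cv2.erode(binary, v_se), v_se)  # erosion then dilation keeps vertical frame lines
        I2 = cv2.dilate(cv2.erode(binary, h_se), h_se)  # likewise for horizontal frame lines
        crossings = cv2.bitwise_and(I1, I2)             # AND of I1 and I2 marks the intersections
        ys, xs = np.nonzero(crossings)
        return I1, I2, list(zip(xs.tolist(), ys.tolist()))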
Step 4: search all connected regions in the corrected image from step 2, and screen and merge the connected regions according to features such as the shape and adjacent positions of the labeled connected regions to locate the character regions;
the method comprises the following specific steps:
step 4.1: searching all connected regions in the corrected image in the step 2 by adopting a connected region marking algorithm based on the travel; the searching steps are as follows:
1) the rectified image in step 2 is scanned line by line, the successive white pixels in each line are grouped into a sequence, called a blob, and its start, end and line number are recorded.
2) Traversing the cliques in each row starting from the second row, and giving a new label to the cliques in a certain row if the cliques in the row do not overlap with the cliques in the previous row; if there is only an overlap with one blob in the previous row, assigning it the label of the blob with which the previous row overlaps; if it has an overlapping portion with 2 and more blobs in the previous row, it is given the smallest index of the blob in the previous row that overlaps it. And the labels of the several blobs in the previous row are written into the equivalence pairs, indicating that they belong to one class.
3) Equivalent pairs are converted to equivalent sequences, each of which is given the same reference numeral since they are equivalent, and starting with 1, each equivalent sequence is given a reference numeral.
4) Going through the labels of all the clusters one by one, looking up the corresponding equivalent sequences and giving them new labels.
5) After traversing, dividing the clusters with the same mark into a communication area, and searching to obtain all the communication areas.
Step 4.2: screen and merge the connected regions found in step 4.1 according to features such as their shapes and adjacent positions to obtain the character regions. For characters in engineering drawings, a connected region should satisfy certain constraints on height, width, and their ratio. According to the character characteristics of the engineering drawing, the connected regions are measured against an aspect ratio range of 1.5-2, a maximum length of 12, a maximum width of 7, and a neighbor-connection distance of 3; finally the regions are screened and merged to obtain the single-character locations, as the sketch below illustrates.
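A compact sketch of step 4 (editorial illustration): cv2.connectedComponentsWithStats yields the same connected regions as the run-based labeling described above, and the screening thresholds mirror the values in the text (that they are pixel units at the authors' scan resolution is an assumption):

    import cv2

    def locate_single_characters(binary):
        n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
        boxes = []
        for i in range(1, n):          # label 0 is the background
            x, y, w, h, area = stats[i]
            if w == 0:
                continue
            ratio = h / w              # length-width ratio of the region
            if 1.5 <= ratio <= 2.0 and h <= 12 and w <= 7:
                boxes.append((x, y, w, h))
        return boxes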
Step 5: combine the single characters into character strings, compute the least-squares line fitted through the centers of each string's character bounding rectangles, judge the string's direction from the slope of the fitted line, and rotate oblique strings by the corresponding angle to the horizontal direction, reducing the difficulty of character recognition.
The method comprises the following specific steps:

Step 5.1: locate each single character through connected-domain-based character positioning, obtaining parameters such as the start coordinates, width, height, and center coordinates of the character's bounding rectangle. Sort the bounding rectangles of all characters from left to right and top to bottom by center coordinates and process each character region in that order, setting the first character's bounding rectangle to label 1 to indicate that it belongs to the first character string. For every character after the first, if the distance from its center to the centers of all preceding character regions exceeds the set maximum same-string distance, give it a new label, indicating that it starts a new character string; if the center distance to a preceding character region is within the set maximum same-string distance, the characters belong to the same string, and the current character region receives the label of the qualifying region. After all labels are sorted, regions sharing a label belong to the same character string, yielding the character string grouping.

Step 5.2: find the least-squares fitted line through the centers of the character bounding rectangles in each string, judge the string's direction from the slope of the fitted line, and rotate oblique strings by the corresponding angle to the horizontal direction.
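Step 5 might be sketched as follows (an editorial illustration; max_dist and the row quantization used for sorting are assumed values, not from the patent):

    import numpy as np

    def group_strings(boxes, max_dist=20.0):
        # greedy grouping of step 5.1: sort boxes by center, then give each box
        # the label of any earlier box whose center lies within max_dist
        centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
        order = sorted(range(len(boxes)),
                       key=lambda i: (round(centers[i][1] / 10), centers[i][0]))
        labels, next_label = {}, 1
        for idx in order:
            cx, cy = centers[idx]
            match = next((lab for j, lab in labels.items()
                          if np.hypot(cx - centers[j][0], cy - centers[j][1]) <= max_dist),
                         None)
            if match is None:
                labels[idx], next_label = next_label, next_label + 1
            else:
                labels[idx] = match
        return labels                  # boxes sharing a label form one string

    def string_rotation_angle(centers):
        # step 5.2: least-squares line through the box centers of one string;
        # rotating the string by -angle brings it to the horizontal
        pts = np.asarray(centers, dtype=float)
        slope, _ = np.polyfit(pts[:, 0], pts[:, 1], 1)
        return np.degrees(np.arctan(slope))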
Step 6: search the circles in the corrected image with a circular Hough transform algorithm and extract the weld bead numbers in the engineering drawing; the extraction result for the circular bead numbers is shown in FIG. 4;
the method comprises the following specific steps:
The equation of a circle with center (a, b) and radius r is

(x − a)² + (y − b)² = r²

Rewriting the equation in parametric form:

a = x − r·cosθ,  b = y − r·sinθ,  θ ∈ [0, 2π)

where a and b are the horizontal and vertical coordinates of the center, r is the radius, and θ is the angle between the x-axis and the line joining a point on the circle to the center. Mapping each circle point (x, y) into the (a, b, r) parameter space yields a quadric surface. An accumulator H(a, b, r) is established in the parameter space and initialized to 0, and the radius range of the circles to be detected is set to r ∈ [r1, r2]. All points of the binary image are traversed, all corresponding values in the parameter space are found, and the accumulator H(a, b, r) is incremented; the larger an accumulator value, the more likely the binary image contains a circle with center (a, b) and radius r. Local peak detection on the accumulator yields the parameters of all possible circles in the image, and thus all possible circles in the drawing. A weld bead number is generally a circle with a radius within a certain range, so the bead numbers can be extracted once the circles in the figure are found.
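An OpenCV sketch of this step (editorial illustration; cv2.HoughCircles implements the Hough-gradient variant rather than the full three-dimensional (a, b, r) accumulator described above, and the radius range [r1, r2] is an assumed placeholder for the bead-number size):

    import cv2

    def find_bead_circles(gray, r1=8, r2=20):
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=2 * r1,
                                   param1=100, param2=20, minRadius=r1, maxRadius=r2)
        if circles is None:
            return []
        # each detection is (a, b, r): center coordinates and radius
        return [(float(a), float(b), float(r)) for a, b, r in circles[0]]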
Step 7: make an engineering drawing character data set, construct a convolutional neural network under the MatConvNet framework, and recognize the characters in the drawing with the model obtained by network training; the character recognition result is shown in FIG. 5;
the method comprises the following specific steps:
step 7.1: nearly 2000 characters are extracted from engineering drawings and marked to form a data set, and the data set is divided into a training set and a testing set, wherein the training set and the testing set respectively account for 80% and 20% of the data set.
Step 7.2: and (5) building a CNN network for identifying characters under MatConvNet. Step 7.2: and (5) building a CNN network for identifying characters under MatConvNet. As shown in fig. 2, it can be seen that the structure of the convolutional layer for character recognition includes: convolutional layer C1, downsampling layer S1, convolutional layer C2, downsampling layer S2, convolutional layer C3, and Softmax layer, the input image size is 20 × 20, and the output is a matrix of 1 × 1 × ClassNum (ClassNum stands for number of classes). The input of the CNN model is a 20 × 20 character image, which passes through the first layer convolutional layer C1, the scale of which is 20 convolution kernels of 3 × 3, to obtain 20 feature maps of 18 × 18, and then the feature maps are transmitted to the first maximum pooling layer of 2 × 2, and after passing through this maximum pooling layer, the scale of the feature maps is reduced to 9 × 9 × 20, and a ReLU activation function is added thereto, and the space size of the feature maps is not changed after processing. Then, the feature map is input into a second layer convolution layer, wherein 20 × 100 convolution kernels with the size of 3 × 3 are included, and the feature map with the size of 7 × 7 × 100 is sent to a second layer pooling layer to be subjected to feature compression, so that the size of the feature map is 3 × 3 × 100, and then the feature map is processed by a ReLU activation function. And then entering a convolution layer with a convolution kernel of a third layer of 3 multiplied by 3, wherein the number of the convolution kernels is 100 multiplied by ClassNum, finally, after a Softmax regression, each output dimension is the probability that the input image belongs to the class, the class to which the maximum probability belongs is a prediction class, and the character recognition is completed.
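The authors built the network in MatConvNet; the following PyTorch sketch is a functionally equivalent rendering added for illustration (class_num is left as a parameter, and the training procedure is not specified beyond what the text states):

    import torch
    import torch.nn as nn

    class DrawingCharCNN(nn.Module):
        # 20x20 input -> C1 -> S1 -> C2 -> S2 -> C3 -> Softmax, with the
        # feature-map sizes from the text annotated on each layer
        def __init__(self, class_num):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 20, kernel_size=3),           # C1: 20 kernels of 3x3 -> 18x18x20
                nn.MaxPool2d(2),                           # S1: 2x2 max pooling -> 9x9x20
                nn.ReLU(),
                nn.Conv2d(20, 100, kernel_size=3),         # C2: -> 7x7x100
                nn.MaxPool2d(2),                           # S2: -> 3x3x100
                nn.ReLU(),
                nn.Conv2d(100, class_num, kernel_size=3),  # C3: -> 1x1xClassNum
            )

        def forward(self, x):                     # x: (batch, 1, 20, 20)
            logits = self.features(x).flatten(1)  # (batch, class_num)
            return logits.softmax(dim=1)          # probability of each character class

During training one would feed the pre-Softmax logits to a cross-entropy loss; the Softmax output above matches the inference behavior described in the text.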
Step 8: correct simple errors through coding rules and automatically fix recognition errors.
The detailed steps are as follows:

According to the table frames extracted in step 3, the whole drawing layout is divided into the drawing, the material table, and the title bar, and different coding rules are set for the different areas. For example, the detection results in the drawing area should be numbers, so letters such as o, i, and l cannot appear in a string whose value type is numeric; they are directly replaced by 0, 1, and 1. In addition, fields such as the drawing number, pipeline number, grade, and Item Code in the material table and title bar have complete lists, and their recognition results must lie in those lists; each recognition result is compared with the list values, the similarity between any result not in the list and the list values is calculated, and when the similarity exceeds 90% the result is replaced by the list value. A sketch of these rules follows.
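A small sketch of these two correction rules (editorial illustration; the confusable-letter map and the string-similarity measure are assumptions consistent with the text):

    from difflib import SequenceMatcher

    CONFUSABLES = str.maketrans({"o": "0", "i": "1", "l": "1"})

    def correct_numeric_field(s):
        # drawing area: values must be numeric, so o -> 0 and i/l -> 1 directly
        return s.translate(CONFUSABLES)

    def correct_listed_field(s, valid_values, threshold=0.9):
        # material table / title bar: fields such as drawing number, pipeline
        # number, grade, and Item Code come from closed lists; replace a result
        # not on the list by the closest list value when similarity exceeds 90%
        if s in valid_values:
            return s
        best = max(valid_values, key=lambda v: SequenceMatcher(None, s, v).ratio())
        return best if SequenceMatcher(None, s, best).ratio() > threshold else s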
Step 9: output an XML file containing the engineering drawing information and store the drawing data.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (7)

1. A character recognition method based on a convolutional neural network is characterized by comprising the following steps:
step 1, binarization is carried out on a scanned document of an engineering drawing, and the pictures and text in the scanned document are separated from the background;
step 2, aiming at the inclined scanning document, finding the inclination angle of the binary image, and performing rotation correction on the binary image by adopting bilinear interpolation;
step 3, adopting mathematical morphology to process and extract a table frame line in the corrected image in the step 2, and dividing the corrected image into a picture area, a material table area and a title bar area;
the specific implementation of step 3 comprises the following sub-steps,
step 3.1, eroding and then dilating the corrected image in the step 2 with a vertical linear structuring element [1 1 1]^T and a horizontal linear structuring element [1 1 1], skeletonizing the result, and thinning all horizontal and vertical line segments into single lines one pixel wide to obtain a vertical frame line image I1 and a horizontal frame line image I2;

step 3.2, performing an AND operation on the vertical frame line image I1 and the horizontal frame line image I2 to obtain intersection points of the table frame lines;

step 3.3, for each intersection point, detecting the other intersection points near the same horizontal position and near the same vertical position, and matching them in order of distance to find each correct table frame; if there is an intersection point A that forms a rectangle with an intersection point B near the same horizontal position and an intersection point C near the same vertical position, and table frame lines are detected to coincide with all four sides of the rectangle, then the whole table contains a table frame with A as its upper-left corner, the distance AB as its length, and the distance AC as its width; applying this processing to all intersection points of the table frame lines in turn extracts each correct table frame;

step 3.4, dividing the table frame with the largest area among the extracted table frames into the drawing area, the table frame with the second-largest area into the material table area, and the region comprising the several table frames at the bottom of the whole image into the title bar area;
step 4, searching all connected areas of the corrected image in the step 2, and then screening and combining the connected areas according to the shape and the adjacent position characteristics of the marked connected areas in the corrected image to position a single character;
step 5, combining the single characters into a character string, calculating a fitting straight line of the central point of a circumscribed rectangle of each character in the character string, judging the trend of the character string according to the slope of the fitting straight line, and rotating the oblique character string by a corresponding angle to the horizontal direction;
step 6, searching the circles in the corrected image in the step 2, and extracting the weld bead numbers in the engineering drawing;
step 7, making an engineering drawing character data set, constructing a convolutional neural network under a MatConvNet framework, and recognizing characters in the engineering drawing by using a model obtained by training the convolutional neural network;
step 8, setting different coding rules for the picture area, the material list area and the title bar area, and automatically correcting simple character recognition errors in each area;
and 9, outputting and storing all the drawing data obtained by identification.
2. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: in step 1, the scanned engineering drawing document is binarized by optimal thresholding based on Otsu's method, implemented as follows,

the L distinct gray levels in the scanned image are denoted {0, 1, 2, ..., L-1}; the normalized histogram of the input image is computed and denoted p_i, i = 0, 1, 2, ..., L-1; suppose a threshold k is selected that divides the pixels into two groups, C0 = {0, 1, ..., k} and C1 = {k+1, k+2, ..., L-1}; the probabilities of a pixel falling into C0 and C1 are:

ω0 = Σ_{i=0}^{k} p_i,  ω1 = Σ_{i=k+1}^{L-1} p_i = 1 − ω0

the average gray levels of C0 and C1 are, respectively:

μ0 = (1/ω0) Σ_{i=0}^{k} i·p_i,  μ1 = (1/ω1) Σ_{i=k+1}^{L-1} i·p_i

the average gray level of the whole image is:

μ = μ0·ω0 + μ1·ω1

the between-class variance of C0 and C1 is:

σ² = ω0(μ0 − μ)² + ω1(μ1 − μ)² = ω0·ω1·(μ1 − μ0)²

among the L gray levels, the k* that maximizes σ² is taken as the optimal threshold, and the scanned document image is converted into a binary image using the optimal threshold.
3. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: in step 2, the inclination angle of the binary image is found using the Hough transform, implemented as follows,

in image space, every line passing through the point (x, y) is represented by the polar equation:

ρ = x·cosθ + y·sinθ,  0 ≤ θ ≤ π

where ρ is the distance from the origin to the line and θ is the angle between the line and the x-axis;

an accumulator H(ρ, θ) is established over the parameter space (ρ, θ) and initialized to 0; for every point (x, y) in the image, all corresponding (ρ, θ) values are found and H(ρ, θ) is incremented; local peak detection on H(ρ, θ) then yields the (ρ, θ) parameters of all straight lines in the image; to reduce interference from redundant information, only lines near the upper boundary of the image are considered, and the parameter θ of the longest such line l is the inclination angle of the scanned document.
4. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: the specific implementation of step 4 comprises the following sub-steps,
step 4.1, searching all connected regions in the corrected image in the step 2 using a run-based connected-region labeling algorithm, as follows:

1) scanning the corrected image in the step 2 line by line, where the consecutive white pixels in each line form a sequence called a run, and recording the start point, end point, and line number of each run;

2) starting from the second line, traversing the runs in each line; if a run does not overlap any run in the previous line, giving it a new label; if it overlaps exactly one run in the previous line, giving it the label of that run; if it overlaps two or more runs in the previous line, giving it the smallest of their labels and writing the labels of those runs into an equivalence pair to indicate that they belong to one class;

3) converting the equivalence pairs into equivalence sequences, where, since the labels within a sequence are equivalent, each sequence is given a single label, numbered from 1;

4) going through the labels of all runs one by one, looking up the corresponding equivalence sequence, and assigning its new label;

5) after the traversal, runs carrying the same label form one connected region, and the search yields all connected regions;

and 4.2, measuring the connected regions against the aspect ratio, the maximum height and width, and the neighbor-connection distance set according to the character characteristics of the engineering drawing, and finally screening and merging the connected regions to obtain the single-character positioning result.
5. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: the specific implementation of step 5 comprises the following sub-steps,
step 5.1, obtaining from step 4 the start coordinates, width, height, and center coordinates of each single character's bounding rectangle; sorting the bounding rectangles of all characters from left to right and top to bottom by center coordinates, and processing each character region in that order, setting the bounding rectangle of the first character to label 1 to indicate that it belongs to the first character string; for every character bounding rectangle after the first, if the distance from its center to the centers of all preceding bounding rectangles exceeds the set maximum same-string distance, giving it a new label, indicating that it belongs to another, new character string; if the center distance to one or more preceding bounding rectangles is within the set maximum same-string distance, the characters belong to the same string, and the current bounding rectangle is given the label of the qualifying character region; sorting all labels, where rectangles sharing a label belong to the same character string, to obtain the character string grouping;

and 5.2, finding the least-squares fitted line through the centers of the character bounding rectangles in each string, judging the string's direction from the slope of the fitted line, and rotating oblique strings by the corresponding angle to the horizontal direction.
6. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: in step 6, a circular Hough transform algorithm is adopted to find the circles in the corrected image,

the equation of a circle with center (a, b) and radius r is (x − a)² + (y − b)² = r²

rewriting the equation in parametric form:

a = x − r·cosθ,  b = y − r·sinθ,  θ ∈ [0, 2π)

where a and b are the horizontal and vertical coordinates of the center, r is the radius, and θ is the angle between the x-axis and the line joining a point on the circle to the center; mapping each circle point (x, y) into the (a, b, r) parameter space yields a quadric surface; an accumulator H(a, b, r) is established in the parameter space and initialized to 0, and the radius range of the circles to be detected is set to r ∈ [r1, r2]; all points of the binary image are traversed, all corresponding values in the parameter space are found, and the accumulator H(a, b, r) is incremented, where the larger an accumulator value, the more likely the binary image contains a circle with center (a, b) and radius r; local peak detection on the accumulator yields the parameters of all circles in the image, thereby finding all circles in the drawing.
7. The method of character recognition based on convolutional neural network as claimed in claim 1, wherein: in step 7, a convolutional neural network is constructed under a MatConvNet framework, and the structure of the network comprises: convolutional layer C1, downsampling layer S1, convolutional layer C2, downsampling layer S2, convolutional layer C3, and a Softmax layer; the input image size is 20 × 20, and the output is a matrix of 1 × 1 × ClassNum, where ClassNum represents the number of categories, each element of the matrix represents the probability that the input image belongs to a certain character class, and the class corresponding to the maximum probability is the class to which the input image belongs.
CN201810332531.9A 2018-04-13 2018-04-13 A Character Recognition Method Based on Convolutional Neural Network Expired - Fee Related CN108596066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810332531.9A CN108596066B (en) 2018-04-13 2018-04-13 A Character Recognition Method Based on Convolutional Neural Network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810332531.9A CN108596066B (en) 2018-04-13 2018-04-13 A Character Recognition Method Based on Convolutional Neural Network

Publications (2)

Publication Number Publication Date
CN108596066A CN108596066A (en) 2018-09-28
CN108596066B true CN108596066B (en) 2020-05-26

Family

ID=63622628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810332531.9A Expired - Fee Related CN108596066B (en) 2018-04-13 2018-04-13 A Character Recognition Method Based on Convolutional Neural Network

Country Status (1)

Country Link
CN (1) CN108596066B (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657665B (en) * 2018-10-31 2023-01-20 广东工业大学 Invoice batch automatic identification system based on deep learning
CN109858325B (en) * 2018-12-11 2021-07-02 科大讯飞股份有限公司 Table detection method and device
CN112037224B (en) * 2018-12-27 2024-03-22 福建晨曦信息科技集团股份有限公司 Method for generating axis according to marking line, electronic equipment and storage medium
CN109815954A (en) * 2019-01-31 2019-05-28 科大讯飞股份有限公司 Correction for direction method, apparatus, equipment and the storage medium of VAT invoice image
CN109993160B (en) * 2019-02-18 2022-02-25 北京联合大学 Image correction and text and position identification method and system
CN110059600B (en) * 2019-04-09 2021-07-06 杭州视氪科技有限公司 A single-line text recognition method based on pointing gesture
CN111898623B (en) * 2019-05-05 2023-09-05 杭州海康威视数字技术股份有限公司 A character recognition method, device and electronic equipment
CN110175531B (en) * 2019-05-05 2021-02-09 沈阳图为科技有限公司 Attitude-based examinee position positioning method
CN110232438B (en) * 2019-06-06 2021-07-20 北京致远慧图科技有限公司 Image processing method and device of convolutional neural network under polar coordinate system
CN110377768B (en) * 2019-06-10 2022-03-08 万翼科技有限公司 Intelligent graph recognition system and method
US11487998B2 (en) * 2019-06-17 2022-11-01 Qualcomm Incorporated Depth-first convolution in deep neural networks
CN110443269A (en) * 2019-06-17 2019-11-12 平安信托有限责任公司 A kind of document comparison method and device
CN110276189B (en) * 2019-06-27 2022-02-11 电子科技大学 User identity authentication method based on gait information
CN110378337B (en) * 2019-07-05 2023-03-31 上海交通大学 Visual input method and system for drawing identification information of metal cutting tool
CN110490193B (en) * 2019-07-24 2022-11-08 西安网算数据科技有限公司 Single character area detection method and bill content identification method
CN112836554A (en) * 2019-11-25 2021-05-25 广东博智林机器人有限公司 Image verification model construction method, image verification method and device
CN111031222A (en) * 2019-12-27 2020-04-17 山东厚德测控技术股份有限公司 A real-time identification device and method for a camera-type gas meter character wheel
CN111460202A (en) * 2020-04-01 2020-07-28 上海建工四建集团有限公司 Label determining method and device for electronic engineering drawing, storage medium and terminal
CN112052723A (en) * 2020-07-23 2020-12-08 深圳市玩瞳科技有限公司 Literacy card, and desktop scene STR method and device based on image recognition
CN112200181B (en) * 2020-08-19 2023-10-10 西安理工大学 A text shape approximation method based on particle swarm optimization algorithm
CN112364863B (en) * 2020-10-20 2022-10-28 苏宁金融科技(南京)有限公司 Character positioning method and system for license document
CN112200117B (en) * 2020-10-22 2023-10-13 长城计算机软件与系统有限公司 Form identification method and device
CN112330659B (en) * 2020-11-24 2023-04-07 成都信息工程大学 Geometric tolerance symbol segmentation method combining LSD (least squares) linear detection and connected domain marking method
CN112487985B (en) * 2020-11-30 2025-01-17 江苏云控软件技术有限公司 A method for positioning a water gauge of a vessel
CN112434665A (en) * 2020-12-12 2021-03-02 广东电力信息科技有限公司 Method and device for intelligently identifying financial data in image based on machine learning
CN112508003B (en) * 2020-12-18 2023-10-13 北京百度网讯科技有限公司 Character recognition processing method and device
CN112883795B (en) * 2021-01-19 2023-01-31 贵州电网有限责任公司 Rapid and automatic table extraction method based on deep neural network
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN113191348B (en) * 2021-05-31 2023-02-03 山东新一代信息产业技术研究院有限公司 Template-based text structured extraction method and tool
CN113610043B (en) * 2021-08-19 2024-09-27 海默潘多拉数据科技(深圳)有限公司 Industrial drawing form structured recognition method and system
CN114445285B (en) * 2021-11-08 2025-11-04 北京全路通信信号研究设计院集团有限公司 A signal plane graph-based signal recognition method and system based on deep learning
CN114170616A (en) * 2021-11-15 2022-03-11 嵊州市光宇实业有限公司 Electric power engineering material information acquisition and analysis system and method based on graph paper set
CN116167703A (en) * 2021-11-24 2023-05-26 国网宁夏电力有限公司经济技术研究院 Automatic calculation method, device, equipment and storage medium for power transmission line engineering
CN116503870A (en) * 2022-01-19 2023-07-28 中国移动通信有限公司研究院 Character recognition method, character recognition device, electronic equipment and readable storage medium
CN114299502B (en) * 2022-03-07 2022-06-17 科大智能物联技术股份有限公司 Method for correcting and identifying inclination of code-spraying characters on end face of round casting blank and storage medium
CN114663898A (en) * 2022-04-02 2022-06-24 广州市勤思网络科技有限公司 Method and system for extracting key information from forms based on insurance claims scenarios
CN115272172B (en) * 2022-06-07 2025-08-19 国网湖北省电力有限公司超高压公司 Automatic recognition method for switch cabinet terminal diagram suitable for non-table type graphic primitives
CN115393856B (en) * 2022-06-14 2025-09-23 国网浙江省电力有限公司丽水供电公司 Optical character recognition system and method
CN116343209B (en) * 2023-03-14 2025-10-03 广东电网有限责任公司广州供电局 String recognition method, device and medium for robotic process automation
CN116386053B (en) * 2023-03-28 2025-11-14 国网北京市电力公司 A method, system, device and medium for calibrating text recognition results in electronic drawings.
CN116437063A (en) * 2023-06-15 2023-07-14 广州科伊斯数字技术有限公司 Three-dimensional image display system and method
CN117218675B (en) * 2023-09-19 2025-10-10 中广核工程有限公司 Drawing table area detection method, device, storage medium and electronic device
CN117710985B (en) * 2023-12-18 2024-08-06 珠海凌烟阁芯片科技有限公司 Optical character recognition method and device and intelligent terminal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446896A (en) * 2015-08-04 2017-02-22 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic equipment
CN107491730A (en) * 2017-07-14 2017-12-19 浙江大学 A kind of laboratory test report recognition methods based on image procossing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446896A (en) * 2015-08-04 2017-02-22 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic equipment
CN107491730A (en) * 2017-07-14 2017-12-19 浙江大学 A kind of laboratory test report recognition methods based on image procossing

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Application of an improved algorithm for measuring image horizontal tilt angle; 金晅宏 et al.; 《传感器与微系统》; 2013-12-31; Vol. 32, No. 9; pp. 146-152 *
Image analysis: connected-component labeling of binary images; *Ronny; https://www.cnblogs.com/ronny/p/img_aly_01.html; 2014-03-14; pp. 1-8 *
Research on table structure recognition based on image technology; 郝红永; China Masters' Theses Full-text Database, Information Science and Technology; 2013-05-15 (No. 05); p. I138-1906 *
Character recognition and detection of engineering drawing parts based on higher-order convolutional neural networks; 齐嘉锐; China Masters' Theses Full-text Database, Information Science and Technology; 2017-09-15 (No. 09); p. I138-314 *

Also Published As

Publication number Publication date
CN108596066A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108596066B (en) A Character Recognition Method Based on Convolutional Neural Network
CN109389121B (en) Nameplate identification method and system based on deep learning
CN105046252B (en) A kind of RMB prefix code recognition methods
CN106446750B (en) A kind of bar code read method and device
CN110298376B (en) An Image Classification Method of Bank Notes Based on Improved B-CNN
CN112085024A (en) A method for character recognition on the surface of a tank
CN107491730A (en) A kind of laboratory test report recognition methods based on image procossing
CN106875546A (en) A method for identifying value-added tax invoices
CN115063802B (en) A method, device, and medium for recognizing circular seals based on PSENet
CN111091124B (en) Spine character recognition method
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
CN106169080A (en) A kind of combustion gas index automatic identifying method based on image
CN112115948A (en) Chip surface character recognition method based on deep learning
CN113971792A (en) Character recognition method, device, device and storage medium for traffic sign
CN113792780B (en) Container number recognition method based on deep learning and image post-processing
CN111259893A (en) Intelligent tool management method based on deep learning
CN110738216A (en) Drug Recognition Method Based on Improved SURF Algorithm
CN108537217A (en) Identification based on character code mark and localization method
CN118447527A (en) House pattern recognition method based on region segmentation and target detection
CN115880566A (en) Intelligent marking system based on visual analysis
CN110378337B (en) Visual input method and system for drawing identification information of metal cutting tool
Ntzios et al. An old greek handwritten OCR system based on an efficient segmentation-free approach
CN109271882B (en) A color-distinguishing method for extracting handwritten Chinese characters
CN110263784A (en) The English paper achievement of intelligence identifies input method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200526