WO2010092952A1 - Pattern Recognition Device - Google Patents
Pattern Recognition Device (パターン認識装置)
- Publication number
- WO2010092952A1 (application PCT/JP2010/051889)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pattern
- feature
- character
- recognition
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5838—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/268—Lexical context
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the present invention relates to a pattern recognition apparatus, and mainly relates to an apparatus capable of real-time recognition of characters and pictograms captured by a camera.
- Pattern recognition using a camera has attracted attention in recent years because it can be used in various applications.
- One promising application is a “translation camera” which is a translation device combining a camera and a character recognition device (see Non-Patent Documents 1 and 2).
- it can also be applied to assisting visually impaired people, by recognizing characters captured by a camera and converting them to speech.
- this application, which can be called "machine vision", is very useful because some visually impaired people have difficulty finding letters.
- a practical camera-based character recognition technology must satisfy three requirements: (1) real-time processing, (2) robustness against geometric distortion, and (3) recognition independent of character arrangement.
- for dealing with geometric distortion when the target is limited to characters, known methods exist (for example, see Non-Patent Documents 3 and 4), and the technique of Non-Patent Document 4 is reported to operate in real time. In these methods, a line of text is first extracted from an image captured with a camera, then affine distortion, an approximation of projection distortion (the geometric distortion with the highest degree of freedom of deformation), is corrected, and finally the cut-out printed characters are recognized. However, because the technique of Non-Patent Document 4 corrects projection distortion in units of character lines, characters that do not form character lines cannot be recognized.
- the present invention therefore aims to provide a pattern recognition device based on a simple but effective technique that can recognize patterns such as characters and pictograms in real time, realizing a pattern recognition technology that satisfies the above requirements.
- the present invention provides a pattern recognition apparatus comprising: a cutout unit configured to cut out at least one pattern element from a query image that has undergone a geometric transformation and includes at least one pattern element; a feature amount acquisition unit that acquires, as a query feature amount, a feature of the pattern element represented by at least three feature points, including first, second, and third feature points specified based on a predetermined rule; a collation unit that collates the query feature amount against a plurality of reference feature amounts, each representing the features of one of a plurality of different reference patterns prepared as pattern recognition candidates; and a pattern determination unit that determines, as a recognition result, a reference pattern identified from the candidates based on the similarity of the collated feature amounts. Each reference feature amount is represented using feature points determined from the corresponding reference pattern based on the same rule. Under the predetermined rule, the position of the first feature point lies within the pattern element and is invariant to the geometric transformation; the position of the second feature point is determined using a property related to the shape of the pattern element that is invariant to the geometric transformation; and the position of the third feature point is specified from a predetermined amount that is invariant to the geometric transformation together with the determined positions of the first and second feature points.
- in other words, the position of the first feature point is specified as a point that lies within the pattern element and is invariant to the geometric transformation; the position of the second feature point is determined from a shape property of the pattern element that is invariant to the geometric transformation; and the position of the third feature point is specified from a predetermined amount that is invariant to the geometric transformation and from the positions of the first and second feature points. As a result, the number of invariant coordinate systems is limited to the number of ways of choosing, as the first feature point, one of the pixels satisfying a predetermined criterion, so the processing time can be greatly reduced compared with the known geometric hashing method.
- once the first feature point is determined, the corresponding second and third feature points are uniquely determined.
- in the known method, the number of bases is pPn, the number of permutations for selecting the predetermined number n of feature points from the p feature points constituting the pattern region.
- in the present invention, it is reduced to pP1 = p, the number of ways of selecting only the first point.
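- as a rough numerical illustration of this reduction (a sketch, not from the patent text; p = 100 and n = 3 are arbitrary example values):

```python
from math import perm

p, n = 100, 3        # e.g. 100 contour feature points; the affine case uses 3
print(perm(p, n))    # 970200 bases examined by plain Geometric Hashing
print(perm(p, 1))    # 100 bases when only the first point remains free
```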
- a pattern can be recognized in real time from an image acquired with geometric transformation.
- any of the first and second feature points may be determined first.
- the query image is an image including a pattern to be recognized.
- the pattern comprises one or more blocks of connected components.
- a block of connected components is a region in which the pixels of a pattern to be recognized are connected together as one piece.
- the characters "I" and "J" are examples of recognition targets consisting of a single connected component.
- there are also separation characters (separation patterns), in which one character is composed of a plurality of connected components, such as "i" and "j".
- the query image is acquired having undergone a geometric transformation. For example, when an image including characters to be recognized is read by an image scanner, it is read with geometric distortion such as enlargement/reduction and rotation.
- in this case the query image is distorted by a similarity transformation.
- projection distortion is caused by deviation of the camera from the directly facing position.
- if the amount of deviation is small, it can be approximated as an affine distortion without a change in magnification in the depth direction.
- the image features of each pattern are registered in the image database in association with vectors representing the features.
- An example of an image feature is a shape feature, a light / dark distribution feature, a color feature, or a combination thereof.
- Each vector in the image database is registered in advance in a systematic state using a hash table so that the vectors can be collated in a short time.
- the functions of the cutout unit, the feature amount acquisition unit, the collation unit, and the pattern determination unit may be realized by a computer executing a predetermined program. Alternatively, part or all of the processing may be realized by, for example, hardware mounted on a semiconductor chip. In the embodiment described later, the function of each unit is realized by hardware and software of a personal computer.
- pattern extraction in the present invention uses adaptive binarization and contour extraction.
- Geometric hashing is used for pattern recognition according to the present invention.
- the calculation amount of Geometric Hashing considering affine transformation is O(P^4), where P is the number of feature points, but in the present invention it can be reduced to O(P^2) using the invariant calculation principle. Further, combined with techniques using a voting mechanism, the present invention can operate in real time on a notebook computer connected to a web camera.
- FIG. 35 is a graph showing candidates for each character in the word region 2 in FIG. 34 in the estimated arrangement order in the demonstration system used in Experimental Example 4 according to the present invention.
- expressed differently, the present invention that solves the above-described problems comprises: a cutout processing unit (the cutout unit) that cuts out, as a pattern region, each pattern element from a query image in which one or more types of patterns have been acquired with a geometric transformation; a feature vector generation unit (the feature amount acquisition unit) that, for each combination of three feature points chosen from three or more feature points determined based on a predetermined rule, forms two vectors from two of the selected three points and uses the invariant coordinate system based on those vectors, which is invariant to the geometric transformation, to generate a vector representing an image feature of the pattern region as a query feature vector; a collation unit that calculates an index value by applying a predetermined hash function to the generated query feature vector, refers with that index to a hash table in which reference feature vectors representing the geometric features of the reference patterns are associated with those patterns and registered in advance classified into a plurality of bins, and collates the query feature vector with the one or more reference feature vectors registered in the referenced bin; and a pattern determination unit that determines the reference pattern serving as the recognition result based on the collation. Each reference feature vector is generated, through the same procedure as the query feature vector, for each combination selecting three of the determined feature points. As the rule applied to the determination of the feature points, the feature vector generation unit determines one of the pixels of the pattern region satisfying a predetermined criterion as the first feature point, determines as the second feature point one point derived from a property of the shape of the pattern region that is invariant to the geometric transformation, and determines as the third feature point one point derived from a predetermined value that is invariant to the geometric transformation and from the first and second feature points.
- since the number of invariant coordinate systems is thereby limited to the number of ways of choosing, as the first feature point, one of the pixels satisfying the predetermined criterion, the processing time can be greatly reduced compared with the known geometric hashing method.
- the position of the first feature point may be specified from the pixels on the contour of the pattern element. In this way, by extracting the contour of the pattern region, the first feature point can be reliably determined as one point on the contour.
- the property may be the fact that the center of gravity is invariant with respect to affine transformation (a kind of the geometric transformation), and the position of the second feature point may be specified as the center of gravity of the pattern element using this property.
- since the center of gravity is an invariant of affine transformation, the second feature point can be uniquely determined as the center of gravity of the target pattern region.
- the property may be the fact that area ratios are invariant with respect to affine transformation (a kind of the geometric transformation); using this property, the position of the third feature point may be specified from the contour of the pattern element so that the ratio between the area of the pattern element and the area of the triangle having the first, second, and third feature points as vertices takes a predetermined value.
- since area ratios are invariant under affine transformation, the third feature point can be uniquely determined from the predetermined invariant value and the first and second feature points.
- the feature quantity acquisition unit may obtain a feature quantity invariant to the geometric transformation using a coordinate system that is based on two linearly independent vectors, each connecting two of the first to third feature points, and is therefore invariant to the geometric transformation.
- the collation unit may collate the query feature amount with reference feature amounts each associated with a corresponding reference pattern and registered in a hash table; the hash table includes a plurality of bins, and each reference feature amount is classified and registered in advance in the bin determined by calculating a predetermined hash function on that reference feature amount; the collation unit performs the collation by referring to the appropriate bin using an index obtained by calculating the hash function on the query feature amount.
- in this way, since the reference feature vectors corresponding to a query feature vector can be looked up through the hash table, collation for each query feature vector can be performed in a short time.
- the pattern determination unit may estimate, for each pattern region of the query image, the posture of the pattern region based on the correspondence between the feature point coordinates determined by the feature vector generation unit and the feature point coordinates registered in the hash table,
- and may estimate the posture of the query image by applying a majority (voting) process to the individual estimates. In this way, by inferring the posture of the acquired query image from the correspondence of the feature point coordinates and correcting its geometric distortion, collation can be performed with higher accuracy than without correction.
- the pattern determination unit may include a separation pattern table in which at least one set of separation patterns is registered, each separation pattern corresponding to one of the reference patterns and each set of separation patterns yielding one recognition result; the pattern determination unit determines, with reference to the separation pattern table, whether a correspondence exists between the reference pattern identified from the candidates and one separation pattern of a set, and when the correspondence exists and correspondences for all the other separation patterns of the set already exist, the recognition result is provided by the set to which the separation pattern corresponding to the identified reference pattern belongs. In this way, recognition results can be determined for separation patterns.
- the separation pattern table may also register the relative position between a given separation pattern and another separation pattern,
- and the pattern determination unit may determine the recognition result when another identified reference pattern exists at the position, determined by the registered relative position, relative to the separation pattern corresponding to the identified reference pattern.
- in this way, the recognition result can be determined with higher accuracy by taking into account the positional relationship between a separation pattern and the other patterns of its set.
- the apparatus may further comprise: a word candidate determination unit that, when the query image includes a word pattern made up of a plurality of characters, finds the shortest route passing once through each character recognized by the pattern determination unit and uses the order and the reverse order of the obtained route as word candidates; a rotation angle determination unit that calculates the rotation angle of each character with respect to a predetermined direction in the query image; and a reading order determination unit that uses the rotation angle difference between two characters adjacent along the route (in order or in reverse order) as a first evaluation index, estimates, when one end of a candidate is taken as the first character, the rotation angle that the first character should take from the direction toward the adjacent second character and a predetermined reading direction rule, uses the difference between this estimated rotation angle and the rotation angle of the first character determined by the rotation angle determination unit as a second evaluation index, and determines the reading order of the characters composing the word by selecting the candidate minimizing the first and second evaluation indexes.
- this enables word recognition for languages in which words are separated from other words, for example by spaces as in English, and which follow a predetermined reading direction rule, for example writing from left to right.
- the various preferable aspects shown here can also be combined.
- the query image corresponds to a paper surface including a plurality of characters and marks as shown in FIG.
- "pattern element" is a term corresponding to a connected component.
- the query feature amount is a vector amount representing the feature of each pattern element of the query image.
- the reference pattern corresponds to a connected component representing each character in character recognition. For example, it corresponds to each character of FIG. 14A (to be described later), each pictogram of (b), and each pattern of the first column (not the second column) of the separated character table of FIG.
- the reference feature amount represents the feature of each reference pattern, and is compared (matched) with the query feature amount.
- the separation pattern table corresponds to the separation character table of FIG.
- the group constituting the character “j” and the group constituting “i” are included in the separation pattern table.
- the separation pattern corresponds to, for example, each pattern in the first column (not the second column) of the separated character table in FIG.
- the present invention will be described in more detail with reference to the drawings.
- the following description is illustrative in all respects and should not be interpreted as limiting the present invention.
- "black pixels" are only an example: a group of pixels having a predetermined color and/or density distinguishable from the background, or a group of pixels whose color and/or density falls within a predetermined range, constitutes a pattern region according to the present invention.
- hereinafter, the pixels constituting a pattern region are referred to as "black pixels".
- Registration process: reference images are registered as follows. First, it is assumed that the feature points extracted from a reference image to be registered are given. Next, three points are selected from all the feature points, and two basis vectors are created taking into account the order of the selected feature points, as shown in FIG. Then, using the two basis vectors, a new coordinate system is created as shown in FIG. 2B, and the feature points are mapped into it.
- this coordinate system is an affine invariant coordinate system because it can be created in the same way even if the image has undergone an affine transformation. When the affine invariant coordinate system is divided into a grid as shown in FIG. 2B, each cell corresponds to a bin of a two-dimensional hash table.
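- the registration-side mapping can be sketched as follows (a minimal illustration in Python, assuming 2-D points as NumPy arrays; the grid cell size 0.25 is an arbitrary choice):

```python
import numpy as np

def affine_invariant_coords(o, a, b, points):
    """Express `points` in the coordinate system spanned by the basis
    vectors (a - o) and (b - o); the resulting (u, v) coordinates are
    unchanged when the same affine map is applied to all of the inputs."""
    B = np.column_stack((a - o, b - o))                 # 2x2 basis matrix
    return np.linalg.solve(B, (np.asarray(points) - o).T).T

def grid_bin(uv, cell=0.25):
    """Quantize invariant coordinates into a grid cell (a hash-table bin)."""
    return tuple(int(np.floor(c / cell)) for c in uv)

o, a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
pts = np.array([[0.5, 0.5], [0.25, 0.75]])
uv = affine_invariant_coords(o, a, b, pts)
print(uv)                 # [[0.5  0.5 ] [0.25 0.75]]
print(grid_bin(uv[0]))    # (2, 2)
```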
- O(P^3) is required for creating the affine invariant coordinate systems and O(P) for projecting the feature points, so the calculation amount per reference image is O(P^4).
- O(P), O(P^3), and so on are notations for the approximate amount of calculation necessary to solve a problem: O(P^3), for example, denotes an amount on the order of the third power of P, i.e. at most aP^3 + bP^2 + cP + d, where a, b, c, d are constants. The same applies to O(P^4) and the others.
- the first half of the search process is the same as the registration process. It is assumed that the feature points extracted from the question image are given. Three points are selected from all the feature points, and two basis vectors are created taking into account the order of the selected feature points, as shown in FIG. An affine invariant coordinate system is then created using the two basis vectors. This affine invariant coordinate system was partitioned in a grid pattern at registration time, and each cell corresponds to a bin of the two-dimensional hash table. The sets of registered image number and basis serial number are extracted from the bins in which the feature points fall, and a vote is cast for each extracted set of image number and basis serial number (the voting table is two-dimensional).
- this process is executed for all possible bases, and the set of image number and basis serial number obtaining the maximum number of votes is determined; the image number of this set is output as the search result. However, when the result becomes clear before all bases are processed, the processing can be terminated early. Since O(P^3) is required for creating the affine invariant coordinate systems and O(P) for projecting the feature points, the total calculation amount is O(P^4).
- Geometric Hashing solves the problem of identifying an object only from the arrangement of given feature points; it does not consider how the feature points are extracted, or from what kind of object.
- in the present invention, by contrast, a given figure is identified using both the arrangement of the feature points obtained from the figure and the features of the figure itself; that is, the feature points are determined by applying a predetermined rule to the pattern area. For this reason, points that do not change under geometric transformation, such as corners or inflection points obtained from the figure, can be used as feature points. The following describes how this method differs.
- a problem is that the amount of calculation of Geometric Hashing is very large (M. Iwamura, T. Nakai and K. Kise, "Improvement of retrieval speed and required amount of memory for geometric hashing by combining local invariants," Proc. BMVC2007, vol. 2, pp. 1010-1019, Sept. 2007).
- the basis sets used in the registration process also need to be calculated in the search process.
- since all (or many) of the bases are eventually calculated, the amount of calculation can be reduced if the same bases can be computed in the registration process and in the search process. The proposed improvement of Geometric Hashing therefore reduces the calculation required to obtain the same bases by selecting the same feature points in the registration process and in the search process.
- the feature point selection method that reduces the amount of calculation in this invention will be described.
- a method of selecting three points in the case of affine transformation will be described as an example.
- since the centroid of a figure is preserved under affine transformation, the centroid is taken as the first feature point (there is no guarantee that the centroid lies on the contour, but this does not matter).
- the second point is selected appropriately, just like Geometric Hashing.
- the third point is determined automatically using the two points obtained so far and the properties of the invariants described below.
- the properties of invariants will be described using the simplest example of FIG. If three points A, B, and C are given on one straight line as shown in FIG. 4, AB / AC is an invariant that does not change even if it undergoes an affine transformation. In this way, invariant values are usually calculated from the coordinates of points.
- conversely, if the value of the invariant and the points A and B are given, the coordinates of the point C can be obtained.
- in this way two of the three points are uniquely determined, so the amount of calculation can be reduced from O(P^4) to O(P^2).
- Feature point selection method 1: assume that three feature points are given as shown in FIG. Consider the half line passing through the first and second points and the half line passing through the first and third points, and let S_1 be the area they cut from the figure, S_0 being the area of the whole figure. Then S_1/S_0 is an affine invariant by property 4 in Table 1. Therefore, the third point may be determined so that S_1/S_0 takes a specific value. When determining the third point, information such as clockwise or counterclockwise can be used to make the point unique.
- Feature point selection method 2: as in method 1, assume that three feature points are given as shown in FIG. When S_1 is the area of the triangle formed by the three points, S_1/S_0 is an affine invariant by property 4 in Table 1. Therefore, the third point may be determined so that S_1/S_0 takes a specific value; instead of a specific value, the maximum or minimum value of S_1/S_0 may also be used.
- information such as clockwise or counterclockwise can be used to make the point unique. Since the third point is determined so that S_1 is constant, the locus of possible third points is a straight line parallel to the straight line passing through the first and second points, as shown in FIG. It is therefore only necessary to find the intersection of this straight line with the figure as the third point, and the calculation is simple. If there are multiple intersections, a selection rule such as choosing the point closest to the second point can be used.
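- a sketch of feature point selection method 2 under stated assumptions (the target ratio 0.2 and the orientation-based selection are illustrative; the patent only requires a predetermined invariant value and a rule such as clockwise/counterclockwise):

```python
import numpy as np

def third_point(p1, p2, contour, s0, target=0.2):
    """Select the third feature point as the contour point c for which the
    triangle (p1, p2, c) has area ratio S1/S0 closest to `target`,
    restricted to counterclockwise triangles so the choice is unique."""
    d = p2 - p1
    best, best_err = None, float("inf")
    for c in contour:
        cross = d[0] * (c[1] - p1[1]) - d[1] * (c[0] - p1[0])
        if cross <= 0:                     # keep one orientation only
            continue
        err = abs(0.5 * cross / s0 - target)
        if err < best_err:
            best, best_err = c, err
    return best

square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)
print(third_point(square[0], square[2], square, s0=4.0))   # [0. 2.]
```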
- the first two points can also be determined by a method different from the above. That is, as in Geometric Hashing, the first point can be selected arbitrarily from the P points, and the area ratio can be used to determine the second point: if two feature points are given as shown in FIG. 8, the area ratio S_1/S_0 is an affine invariant, so the second point may be determined so that S_1/S_0 takes a specific value.
- Geometric Hashing registers pairs of image numbers and base numbers in a database.
- in the present invention, the feature vector calculated from the image and the coordinates of the feature points used to create the basis are registered instead of the basis number (see FIG. 9).
- the reason for using a feature vector calculated from the image is that it is more expressive of the image's features. In the problem setting of Geometric Hashing, only the coordinates of the feature points extracted from the search target are given, but in the problem considered in this embodiment the figure to be searched is itself given, so feature vectors extracted from the figure can be used. The coordinates of the feature points used to create the basis are registered because using them in the voting process described later improves the posture estimation accuracy and the recognition accuracy.
- hereinafter, "image number" will be referred to as "character number" in order to specialize the description to character recognition, a typical example of pattern recognition according to the present invention; likewise, "search" is called "recognition", and the "pattern recognition device" is referred to as the "character recognition device".
- FIG. 10 is an explanatory diagram showing a method for describing a separation character according to the present invention.
- FIG. 10A shows that the separation character is described by a vector representing the area of each connected component constituting the separated character and the relative position of those elements.
- FIG. 10B shows an example of a separated character table for describing separated characters.
- the number of connected components in the image is checked when a reference image is registered. For a reference image including two or more connected components, each connected component is treated like a separate character and registered individually, and an entry is made in the separated character table of FIG. 10B.
- the separated character table of FIG. 10B is composed of five elements, and each element is numbered from the first to the fifth in order from the left end.
- the first element indicates the shape of the connected component and / or the number of the connected component.
- the second element indicates a separation character including the connected component.
- the third element is a vector indicating the relative positions of the connected components.
- the fourth element indicates the area of the connected component.
- the fifth element indicates the area of the connected component to be paired.
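- the table can be pictured as the following data structure (a sketch; the field names are illustrative and simply follow the five elements of FIG. 10B):

```python
from dataclasses import dataclass

@dataclass
class SeparatedCharEntry:
    """One row of the separated-character table of FIG. 10B."""
    component_id: int     # 1st element: shape/number of the connected component
    character: str        # 2nd element: separation character it belongs to
    relative_pos: tuple   # 3rd element: vector to the paired component
    area: float           # 4th element: area of this component
    paired_area: float    # 5th element: area of the component to be paired

# the two components of "i": the dot sits above the stem
table = [
    SeparatedCharEntry(0, "i", (0.0, -1.2), 1.0, 4.0),   # dot
    SeparatedCharEntry(1, "i", (0.0, +1.2), 4.0, 1.0),   # stem
]
```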
- the lower connected component of "i" has the same shape as "I" (uppercase I) and "l" (lowercase L), so it cannot be distinguished by appearance alone. Therefore, in order to recognize "i" correctly, it is necessary to check whether each connected component with this shape, such as "I" or "l", is actually part of an "i": if the upper connected component of "i" is present at the predetermined position, it is recognized as "i", and otherwise it is recognized as "I" or "l".
- FIG. 11 shows an outline of a pattern recognition apparatus according to the present invention.
- the apparatus is roughly composed of an image registration unit 11 and an image recognition unit 13.
- the character recognition apparatus according to the present invention includes at least an image recognition unit 13 that can access the database 15. Each is described below.
- Image registration unit: the image registration unit 11 registers reference images in the database 15.
- the reference image is a binary image.
- W and H are the width and height of the image.
- the degree of blur is adjusted by the standard deviation σ of the normal distribution to be convolved.
- the generated deteriorated image is binarized again, and is thereafter handled in the same manner as the reference image.
- a feature vector is generated by selecting three feature points using the method described in Section 2 and creating an invariant coordinate system. A feature vector is calculated from the three obtained points as follows: since any two of the points determine one straight line passing through them, three straight lines in total can be drawn from the three points.
- using two of these lines, k uniform partial regions as shown in FIG. 12 can be set, where k = l × l.
- setting partial regions in this way is the same as setting a lattice in an invariant coordinate system defined by two bases in Geometric Hashing, as shown in FIG.
- a k-dimensional feature vector can be calculated by counting the number of feature points in each region and normalizing the counts so that their sum is 1.
- the value of each region may be calculated not only from the pixels on the contour but also from the total number of pixels of the figure it contains. Since there are three ordered combinations of selecting two of the three straight lines, three k-dimensional feature vectors can be calculated; simply concatenating them gives a 3k-dimensional feature vector.
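- a sketch of this feature vector computation (the clipping window for the invariant coordinates is an assumption, since the text does not specify how the partial regions are bounded):

```python
import numpy as np

def region_histogram(o, a, b, contour, l=4):
    """k = l*l dimensional histogram: map contour points into the invariant
    coordinate system of (o, a, b), count the points per grid cell inside
    an assumed window [-1, 2), and normalize the counts to sum to 1."""
    B = np.column_stack((a - o, b - o))
    uv = np.linalg.solve(B, (contour - o).T).T
    hist = np.zeros((l, l))
    for u, v in uv:
        i = int(np.floor((u + 1.0) / 3.0 * l))
        j = int(np.floor((v + 1.0) / 3.0 * l))
        if 0 <= i < l and 0 <= j < l:
            hist[i, j] += 1
    s = hist.sum()
    return (hist / s).ravel() if s else hist.ravel()

def feature_vector(p1, p2, p3, contour, l=4):
    """Concatenate the histograms of the three coordinate systems obtained
    from the three point orderings, giving a 3k-dimensional vector."""
    parts = [region_histogram(o, a, b, contour, l)
             for o, a, b in ((p1, p2, p3), (p2, p3, p1), (p3, p1, p2))]
    return np.concatenate(parts)
```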
- the database 15 is specifically composed of a hash table.
- the registration method to the database 15 will be described.
- the character number, feature vector, and coordinates of the three feature points are paired and stored in the hash table.
- the hash index H_index is calculated from the feature vector by a predetermined hash function, in which:
- H_size is the size of the hash table,
- r_i is the value of the i-th element of the feature vector, and
- D is the number of levels into which each element of the feature vector is quantized.
- if a collision occurs, the colliding entries are linked in a list structure as shown in FIG.
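- a minimal sketch of such a table; the index formula below is the form used in the related LLAH literature and is an assumption here, since the equation itself is not reproduced in this text:

```python
class FeatureHashTable:
    """Hash table whose bins chain colliding entries in a list.
    H_size, r_i and D are as defined above."""
    def __init__(self, h_size=2**20, d=8):
        self.h_size, self.d = h_size, d
        self.bins = [[] for _ in range(h_size)]

    def index(self, r):
        """r: feature vector with each element already quantized into D
        levels. Assumed form: H_index = (sum_i r_i * D**i) mod H_size."""
        h = 0
        for i, r_i in enumerate(r):
            h = (h + r_i * pow(self.d, i, self.h_size)) % self.h_size
        return h

    def register(self, r, char_no, points):
        """Store the character number, feature vector and the coordinates
        of the three feature points; collisions are chained."""
        self.bins[self.index(r)].append((char_no, r, points))

    def lookup(self, r):
        return self.bins[self.index(r)]
```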
- Image recognition unit
- 4.2.1. Acquisition of images: images are acquired as still images or video with a digital camera or web camera. When captured as a moving image, the video is decomposed into frames and handled as a plurality of still images. Each obtained image is called a question image and is used in the following processing.
- 4.2.2. Cutting out character images
- a character image is cut out from the obtained image.
- adaptive binarization is a method of setting a pixel to white (luminance 1) or black (luminance 0) according to whether the luminance of the pixel of interest is brighter or darker than the average luminance in its neighboring region.
- that is, the binarized luminance I_b(x, y) is 1 when I(x, y) is brighter than the average luminance of the neighborhood of (x, y), and 0 otherwise.
- the connected component is a group of black pixels adjacent to each other in the image.
- the obtained connected component is regarded as a pattern area candidate, cut out in a rectangular shape, and set as a recognition processing target described below. However, if the area of the obtained connected component is equal to or smaller than the threshold value, it is regarded as noise and excluded from the recognition target.
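- the cut-out step can be sketched as follows (using SciPy's labeling; the neighborhood size and the noise threshold are illustrative values):

```python
import numpy as np
from scipy import ndimage

def adaptive_binarize(gray, size=15):
    """White (1) where the pixel is brighter than the mean luminance of its
    size x size neighborhood, black (0) otherwise."""
    local_mean = ndimage.uniform_filter(gray.astype(float), size=size)
    return (gray > local_mean).astype(np.uint8)

def extract_components(binary, min_area=20):
    """Label groups of mutually adjacent black pixels and return the
    bounding box (a pair of slices) of each component, discarding
    components at or below min_area as noise."""
    labels, n = ndimage.label(binary == 0)     # black pixels form the patterns
    boxes = []
    for lab, s in enumerate(ndimage.find_objects(labels), start=1):
        if (labels[s] == lab).sum() > min_area:
            boxes.append(s)
    return boxes
```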
- the reason for using the weight is that the number of feature points (the contour length) P differs for each character, and characters with many feature points would otherwise receive an unfairly large number of votes.
- let M be the maximum number of votes obtained by weighted voting. Based on this value, two groups are defined from the weighted votes. The first is the group of characters with a vote count of 0.9M or more, called the "estimated group". The second is the group of characters with a vote count of 0.8M or more, called the "candidate group".
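- a sketch of the weighted voting and group formation (the 1/√N weight follows the description given later in this text for connected-component voting; how the matches are generated is outside this sketch):

```python
from collections import defaultdict
from math import sqrt

def weighted_vote(matches, n_points):
    """matches: iterable of reference character numbers, one per matched
    lookup (assumed non-empty); n_points[c]: number of registered feature
    points (contour length) of character c. Each vote is weighted by
    1/sqrt(N_c) so characters with long contours get no unfair advantage."""
    votes = defaultdict(float)
    for c in matches:
        votes[c] += 1.0 / sqrt(n_points[c])
    m = max(votes.values())
    estimated = [c for c, v in votes.items() if v >= 0.9 * m]
    candidates = [c for c, v in votes.items() if v >= 0.8 * m]
    return estimated, candidates
```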
- one affine transformation matrix is obtained from the correspondence between the coordinates of the three points obtained from the query image and the coordinates of the three points in the database (translation is not considered in the affine transformation matrix). Since there are S sets of three-point coordinates obtained from the query image, a total of S affine transformation matrices are obtained. Each affine transformation matrix T is decomposed into four parameters: the enlargement ratio, the rotation angle, the degree of shear deformation, and the independent-axis magnification.
- for each partial region, the number of points included in it and its 8 neighboring regions (9 regions in total) is counted, and this count is used as the score of the region. After calculating the scores of all partial regions, the region with the highest score is selected. If the number of points included in this region is greater than 30, the region is divided into 25 equal parts and the same processing is repeated until the number of points is 30 or less. When the number of feature points included in the highest-scoring region is 30 or less, the center value of that region is taken as the estimate of the shear deformation and independent-magnification parameters.
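- a sketch of this recursive density estimation (the initial bounding box and the 5 × 5 grid are assumptions; the 9-region scoring, the 25-way subdivision, and the 30-point stopping rule follow the text):

```python
import numpy as np

def densest_point(points, lo, hi, grid=5, max_pts=30):
    """Zoom into the densest cell: score each cell by the points in it and
    its 8 neighbours, pick the best cell, and subdivide it into grid*grid
    (25) parts until it holds <= max_pts (30) points; the centre of the
    final cell is returned as the estimate."""
    points = np.asarray(points, dtype=float)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    while True:
        cell = (hi - lo) / grid
        idx = np.clip(((points - lo) / cell).astype(int), 0, grid - 1)
        counts = np.zeros((grid, grid))
        for i, j in idx:
            counts[i, j] += 1
        padded = np.pad(counts, 1)
        scores = sum(padded[1 + di:1 + di + grid, 1 + dj:1 + dj + grid]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1))
        i, j = np.unravel_index(int(scores.argmax()), scores.shape)
        if counts[i, j] <= max_pts:
            return lo + (np.array([i, j]) + 0.5) * cell
        keep = (idx[:, 0] == i) & (idx[:, 1] == j)
        lo = lo + np.array([i, j]) * cell
        hi = lo + cell
        points = points[keep]

rng = np.random.default_rng(0)
pts = rng.normal([0.3, -0.1], 0.05, (200, 2))     # cluster near (0.3, -0.1)
print(densest_point(pts, lo=[-2.0, -2.0], hi=[2.0, 2.0]))
```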
- a recognition result is determined for each connected component.
- from the affine transformation matrices of the characters belonging to the "candidate group", the one closest to the estimated parameters is selected, and the character that produced that affine transformation matrix is taken as the recognition result (first candidate). If two recognition results are needed, the first candidate is removed, the next closest matrix is selected, and its character becomes the second candidate; thereafter the same process is repeated.
- the posture of each connected component in the question image is estimated as an affine transformation matrix from the correspondence between the feature points of the question image and the feature points of the reference images. Since these estimates include erroneous affine transformation matrices obtained from associations with other connected components, weighted voting is performed on the connected component numbers as shown in FIG. 13(a), and a reliable estimate is selected. The weight is used because connected components with many feature points would otherwise receive an unfairly large number of votes: when the number of feature points registered for the i-th connected component (the length of its outer contour) is N_i, each vote is weighted by 1/√N_i.
- two groups are defined on the basis of the maximum number of votes M obtained by weighted voting.
- the first is the group of reference-image connected components with a vote count of 0.9M or more, called the "estimated group".
- the second, with a vote count of 0.8M or more, is called the "candidate group". These groups are determined for each connected component of the question image.
- the posture of the paper is estimated.
- the shear deformation and independent scaling parameters should be common to all connected components. Therefore, similarly to the above-described document, a plausible set of parameters is estimated using density estimation in a two-dimensional space as shown in FIG.
- the affine transformation matrix of the connected component belonging to the “estimation group” is plotted in the aforementioned two-dimensional space.
- a point having the highest density in the vicinity is selected from the plotted points.
- the threshold T_area is set to 0.7.
- the recognition result is determined for each connected component.
- a pair of plausible rotation angle parameters and connected component numbers is estimated using density estimation in a two-dimensional space created by the connected component rotation angles and connected component numbers as shown in FIG.
- an affine transformation matrix of connected components belonging to the “candidate group” is used for the estimation.
- the rotation angle is a continuous quantity, but the connected component numbers are discrete, so one-dimensional density estimation is performed for each connected component number.
- the cumulative recognition rate and average processing time are shown in FIG.
- the cumulative recognition rate of the Arial, Century, and Gigi fonts increases with rank, but saturates at about rank 6.
- Impact has a very poor rank-1 recognition rate, but its cumulative recognition rate continues to rise up to rank 20.
- a test pattern shown in FIG. 15 was created as a recognition target. This includes nine types of conditions with three types of character size (72pt, 48pt, 32pt) and three types of character inclination (0 °, 30 °, 45 °). Since there are 12 characters per condition, a total of 108 characters are included. 62 sheets were created for 62 characters. The printed pattern was photographed so that the inclination of the digital camera with respect to the paper surface was three types: 0 degrees, 30 degrees, and 45 degrees.
- the average character size was 40.7 × 44.8 pixels when shooting 72 pt Arial "A" from the front (0 degrees), and 10.0 × 18.6 pixels when shooting 32 pt Arial "A" from a 45-degree angle.
- the parameter S described in section 4.2.3 is 20.
- Table 5 shows the average processing time per character. Since the processing time required for one character is approximately 4 ms, it is considered possible to recognize about 200 to 250 characters per second by simple calculation.
- FIG. 21 includes 236 characters (excluding commas). The breakdown is 168 characters for Arial, 27 characters for Arial Black Italic, and 41 characters around the MIRU logo (font unknown). Of these, the characters around the Arial and MIRU logos were registered and a recognition experiment was conducted.
- LLAH maintains the robustness of search while reducing the amount of computation and memory usage to a hundred millionth of the conventional method. This performance improvement is made possible by reducing the amount of calculation by limiting the selection method of feature points and improving the identification performance by increasing the dimension of feature vectors.
- since LLAH assumes that feature points are dispersed, it cannot be applied when the feature points are continuous, as in the present invention.
- the latter can also be applied to the present invention, and further performance improvement of the present invention can be expected.
- LLAH increases the dimension of feature vectors. Taking the case of affine transformation as an example, one invariant can be calculated from 4 points on the same plane; LLAH chooses every combination of 4 points from m > 4 points and calculates an invariant for each,
- so the feature vector is increased in dimension and the identification performance is improved.
- the same processing as LLAH is possible in the present invention: in the case of affine transformation, further feature points are obtained from the three points, a number of feature vectors are calculated, and all of them are concatenated to increase the dimension. This makes it possible to calculate a feature vector with higher discrimination performance. Since the additional feature points can be selected uniquely, the added amount of calculation is considered to be small.
- whether the affine transformation method of the present invention is used or the present invention is applied at the similarity transformation level, collation is possible with a calculation amount of O(P^2) in either case.
- the breakdown of the calculation amount in the similarity transformation case is O(P) for creating the similarity invariant coordinate systems and O(P) for projecting the feature points.
- the specific method in the similarity case obtains the first and second points in the same manner as in the affine case.
- for the third point, since angles and length ratios are preserved by similarity transformation, a predetermined angle and length are stored, and the point is determined in advance from the first and second points;
- for example, a method of setting the distance from the second point to a predetermined value is conceivable.
- Non-Patent Document 7 selects only the point with the smallest distance, which is slightly different from this embodiment.
- the threshold is made to depend on the aspect ratio of the connected component: when the ratio of the long side to the short side of the circumscribed rectangle of the connected component is r:1 and t is a constant, the threshold is set to tr. This takes into account the error in the feature vector caused by image processing.
- the difference between the threshold values used at registration time and at recognition time is partly due to the difference in the normalization sizes of the connected components described in the next section. This measure is very effective.
- a new query feature vector is created by bit inversion of the query feature vector.
- the recognition rate decreased by about 7% regardless of the imaging angle, and the processing time decreased by about 0.3% to 4.0% depending on the imaging angle.
- the third strategy is a measure against hash value collisions.
- in Experimental Example 2, a large number of collisions sometimes occurred in some bins of the hash table. Since the hash processing time is proportional to the number of collisions, a large number of collisions can make processing very slow. Therefore, in this modification, bins whose number of collisions is greater than c are thinned out to c; that is, the elements of bins where many collisions occurred are deleted, leaving only c elements. This operation greatly reduces memory usage and processing time.
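- a sketch of the thinning operation, applied to the hash-table sketch given earlier (the value of c is illustrative):

```python
def thin_bins(table, c=16):
    """Truncate every bin holding more than c entries down to its first c
    entries, so that lookup time stays bounded."""
    for i, bin_ in enumerate(table.bins):
        if len(bin_) > c:
            table.bins[i] = bin_[:c]
```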
- Non-Patent Document 7 deletes all the information registered in a bin when collisions exceeding a threshold occur, which is slightly different from the method of the present invention.
- a circular character type such as “O” could not be selectively recognized.
- the character types used in the experiment were 62 in total: upper-case and lower-case alphabets and numerals.
- a total of 55800 reference images with 100 fonts were registered in the database.
- images each including every character twice (124 characters per sheet), laid out so that the characters are arranged on curves, were prepared, printed on A4 paper, and photographed from 0, 30, and 45 degrees.
- FIG. 24 shows a Century recognition target image. The image sizes are 1577 ⁇ 2209, 1397 ⁇ 2185, and 1265 ⁇ 2201, respectively.
- the fonts used are 100 fonts selected from the fonts installed in Microsoft Windows 7 (registered trademark). When selecting, fonts with thin strokes were excluded because the connected components were easily decomposed into two or more components due to the effect of reduced resolution.
- FIG. 25 shows 10 fonts among the selected fonts.
- the number of registered fonts was gradually increased from 1 to 100, and changes in recognition rate and processing time were observed.
- the number of registered fonts was increased by 1 font from 1 to 10 fonts, and by 5 fonts thereafter. Since only 75 fonts could be prepared as recognition targets, the experiment method differed slightly between 1-75 registered fonts and 80 or more. Up to 75 fonts, recognition targets of the same fonts as the registered fonts were recognized: with one font, Arial was registered in the database and Arial character images were the recognition targets; with two fonts, Arial and Century were registered, and Arial and Century character images were recognized. From 80 fonts onward, all 75 fonts were recognized regardless of the number of registered fonts.
- the connected components were classified according to the following procedure during the registration process.
- the number of connected components constituting each character is checked at registration time. If there are two or more, the relationship between the relative positions and sizes of the connected components is described in the separated character table shown in FIG. At recognition time, the separated character table is referred to, and if there are connected components satisfying its conditions, they are integrated and recognized as one character.
- Table 7 shows the result of classifying the 62 character types of Arial as an example of the classification.
- Table 7 lists only the classes to which two or more character types belong among the 55 classes generated.
- a computer with an Opteron 2.8 GHz CPU and 32 GB of memory was used.
- normalization was performed so that the larger connected component width and height would be 100 pixels for the reference image and 50 pixels for the query image.
- the recognition rate for characters photographed from the front was 98.4% (a 20.0% increase over Experimental Example 2), and even when photographed from 45 degrees a recognition rate of 97.9% (up 15.7%) was achieved.
- the processing time is 7.2 ms (about 1/3 of that in Experimental Example 2), and about 140 characters can be recognized per second. Therefore, it was confirmed that the three measures introduced in this embodiment are very effective.
- FIG. 29 and FIG. 30 show the number of classes and the memory usage of the present invention.
- the number of classes was 55 for 1 registered font, 397 for 10 and 1672 for 100.
- the rate of increase gradually decreased. This is considered to be because some newly registered fonts belong to the same classes as already registered reference images.
- the memory usage increased almost in proportion to the number of registered fonts. A conceivable cause is that the amount of information registered in the hash table is hardly affected by the saturation in the number of classes.
- memory usage in the case of 100 fonts was about 4 GB, but we believe it can be greatly reduced by a more careful implementation.
- Word recognition: the embodiment described above performs recognition for each character. It is therefore possible to know where and which characters are written in the document, but not which meaningful words and sentences are written. Considering that keywords for information retrieval are often words, word recognition is indispensable. This embodiment performs word recognition for languages in which words are separated from one another, for example by spaces as in English, and which follow a predetermined reading direction rule, for example writing from left to right. As a precondition, it is assumed that independent scaling and shear distortion have been removed from the document image at the character recognition stage.
- an outline of the demonstration system is shown in FIG. 31.
- the demonstration system works with a commercially available laptop computer and a small camera, and can be carried and used.
- the target document is a white paper with black characters printed on it, and the layout is arbitrary.
- the following processing and output are performed in real time every frame.
- a character region is extracted from a captured image (circle A in FIG. 31, the same applies hereinafter), and a character is recognized.
- the extracted character area is displayed in green on the screen, and the recognition result is superimposed on the center of the area for each character (B in FIG. 31).
- the translation function sends recognized English words to an English-Japanese dictionary server and displays the translated words on the console (C in FIG. 31).
- national flags and images of tourist attractions are linked to country-name words; when such a word is captured with the camera, an image window (D in FIG. 31) opens.
- images of animals were similarly linked to animal words.
- the demonstration system has a mode in which any word can be selected so that only useful information is extracted. A cursor is displayed at the center of the capture screen (E in FIG. 31), and a word is selected by aligning the cursor with an arbitrary word region. The selected word region is highlighted in color, or its character string is displayed in another window (F in FIG. 31). The linked service can be accessed by clicking, or automatically when the word stays over the cursor.
- classification is automatically performed when characters are registered.
- when a character image is newly added to the database, it is matched against the character images already in the database, and if a similar character image is found, the newly registered character is classified into the class of that similar character.
- affine transformation parameters are estimated separately for independent scaling, shear, rotation, and scaling. Assuming that all characters are written on the same plane, independent scaling and shear are common to all connected components on the page, and this can be used to remove distortion on the page. Rotation is obtained for each character, and the rotation is used for word recognition processing. Scaling is also a parameter obtained for each character, but this demonstration system does not use it for word recognition.
- for characters composed of multiple connected components, such as 'i' and 'j', the classes of the connected components and their positional relationship are recorded at registration time; at recognition time, after the classes are identified, the components are combined and restored to one character.
- the characters extracted from word region 2 are assigned character numbers 1 to 5.
- “Character” at this stage is in a state where identification at the class level has been completed by the previous character recognition processing, and each character still has a plurality of character type candidates.
- the class of character number 1 includes the two character types 'M' and 'W', and the class of character number 3 includes the two character types 'd' and 'p';
- each of the other characters belongs to a class consisting of only one character type.
- the word in word region 2 of FIG. 34 would not be read in an order of character numbers such as "4, 5, 1, 2, 3"; the reading order must be estimated.
- finding the path that passes through each character included in the word once, that is, a shortest path problem, is solved by the Dijkstra method.
- the path is formed by connecting the characters on the document image with straight lines, and the cost is the Euclidean distance between the characters.
- "1, 2, 3, 4, 5" and the reverse order "5, 4, 3, 2, 1" are obtained as the shortest path.
- FIG. 35 is a graph showing the candidates for each character in word region 2 in the estimated order. The numerical values and the "assumed upward direction" in the figure are described later. Tracing the graph in order from character number 1, and in the reverse order, yields candidates such as "Media", "Wepia", and "aideM". If the word in word region 2 were read as "Wepia", 'W' and 'p' would be rotated by about 180 degrees compared with the other characters, so it cannot be called a plausible reading. And although the character directions are aligned in "aideM", it would be read from right to left, which is also implausible.
- penalties are calculated using the character rotation angles obtained in the character recognition stage.
- the rotation angle is 0 ° for the upper direction of the capture screen and positive for right rotation.
- the numerical value below each character type in FIG. 35 indicates the direction.
- the first penalty is added while tracing the character nodes, based on the assumption that the direction of the characters does not change abruptly. Since the character orientations can be said to be aligned when the difference in rotation angle from the previous character is small, this angle difference is used as the penalty.
- the angle difference is defined over the range 0° to 180°; a difference of 3°, for example, adds a penalty of 3.
- when large penalties accumulate along a route, the calculation is aborted partway and the route is excluded from the candidates, which suppresses the increase in processing time.
- the second penalty relates to the rule that words are read from left to right; taking the word in word region 2 as an example, the concept is shown in the figure. If the word is read in order from character number 1, as in "Media", the direction from the first character to the second can be assumed to be the rightward direction, and the upward direction can then be assumed to be perpendicular to it. Since the character is more plausible the smaller the difference between this assumed upward direction and the angle of the first-character candidate, that difference is used as the penalty. In FIG. 35, the penalty between "assumed upward direction (1)" and the next node is calculated; if the upward direction is −35°, the penalty when character number 1 is 'M' is 5.
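A sketch of the two penalties for one candidate reading order; the angle convention follows the bullets above (0° upward, positive clockwise), while the formula for the assumed upward direction is an assumption in image coordinates:

```python
import math

def angle_diff(a, b):
    """Absolute angle difference folded into the range 0 to 180 degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def word_penalty(angles, positions):
    """`angles`: per-character rotation angles in a tentative reading
    order; `positions`: the matching character centroids (x, y)."""
    # First penalty: the orientation should not change abruptly.
    penalty = sum(angle_diff(angles[i], angles[i - 1])
                  for i in range(1, len(angles)))
    # Second penalty: reading left to right, the assumed upward
    # direction is perpendicular to the first-to-second direction
    # (conversion into the capture-screen angle convention is assumed).
    (x1, y1), (x2, y2) = positions[0], positions[1]
    reading_dir = math.degrees(math.atan2(y2 - y1, x2 - x1))
    assumed_up = reading_dir - 90.0
    penalty += angle_diff(assumed_up, angles[0])
    return penalty
```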
- the plausible words in the document can be estimated by sorting the total penalties in ascending order.
- "Media” had the minimum penalty (17).
- character types in the same class, such as 'd' and 'p', that could not be distinguished by per-character recognition are distinguished at the character-type level at the word recognition stage.
- for character types that resemble each other and have similar orientations, such as '0' and 'O', the penalties are almost equal, and it is difficult to determine the plausible character type.
- our countermeasure is to use the several candidates with small penalties as keywords for information retrieval; if a link is attached to one of them, that candidate is assumed to be the plausible word, the character type is determined, and the link destination is accessed. In the future we will study a function that allows even words containing misrecognized characters to be reached by fuzzy search.
- in section 9.1, we introduced the function of providing information according to the font.
- for this purpose, each node holds font information in addition to the character type and rotation angle, and a vote is cast into a font histogram each time one character is traced. When the terminal character is reached and one word candidate is generated, the most frequent font is estimated to be the font of the word. In the future, the accuracy of font identification will need to be improved by introducing a penalty for fonts and estimating while considering whether adjacent characters share the same font.
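A minimal sketch of this per-word font vote, assuming each traced node is a mapping with a "font" key:

```python
from collections import Counter

def estimate_word_font(traced_nodes):
    """Vote each traced character's font into a histogram and return
    the most frequent font once the terminal character is reached."""
    histogram = Counter(node["font"] for node in traced_nodes)
    font, votes = histogram.most_common(1)[0]
    return font, votes
```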
- the recognition target is the document shown in FIG. 37, in which 144 letters forming 30 words are laid out along curves.
- the article 'a' contained in the document is not counted among the words because it does not require word recognition processing.
- this document was printed on A4 paper in a total of 10 copies, using the same fonts as the database. The sheets were photographed with a camera at 0, 30 and 45 degrees from the front.
- FIGS. 37 (a) to 37 (c) show captured images in the case of Arial. When an Arial document was photographed from 0 degrees, the image size was 1,633 x 2,333 pixels, and the average size per character was 58.2 x 48.0 pixels.
- the word candidates are listed and sorted in ascending order of the penalty described in section 4.2. For each word, the word recognition rate was then determined by examining whether the correct word appeared as the single candidate with the minimum penalty, or among the top 10 candidates. Note that, as described above, it is difficult in this embodiment to distinguish character types whose uppercase and lowercase forms are related by enlargement/reduction, so 'C', 'O', 'S', 'V', 'W', 'X' and 'Z' are counted as correct even when the case differs. Font estimation was excluded from the performance evaluation, and only the character types were compared. In a preliminary experiment in which the fonts were also compared, the word recognition rate for documents in 10 fonts photographed from 0° was 60.0% when only the minimum-penalty candidate was considered and 72.0% when the top 10 candidates were considered.
- Fig. 38 shows the processing time per word.
- the processing time is the time required for word recognition and does not include character recognition. The processing time was found to vary greatly with the number of fonts registered in the database, and also with the font type. With 10 fonts at 0 degrees, the processing time was 6.14 milliseconds.
- the increase in processing time is attributed to the growth in the number of character types per class as more images are registered, which increases the amount of penalty calculation.
- the character groups that became the same class when Arial IV was registered are as shown in Table 7 in Section 8.2.
- the result of word recognition is shown in FIG.
- for each shooting angle, the figure plots the correct-answer rate when only the first-ranked word (smallest penalty) is considered and when the words up to 10th place are considered.
- the recognition rate declined as the number of fonts increased, but examining the words up to 10th place raised it by an average of 9.7% over examining only the 1st place, and a recognition rate of 92.3% was obtained with 10 fonts at 0 degrees.
- the reason the correct word is not always covered by the first place alone is that an incorrect character of the same class with a similar orientation received a smaller penalty.
- examples of word recognition failure include failure to narrow down the class for individual characters and failure to acquire the word region.
- An example of word area acquisition failure is shown in FIG.
- the frame line surrounding the character string represents the estimated outline of the word region; the "e" of "estimate" is excessively separated from the other characters. It was confirmed that if the blurring of the image is strengthened enough for this word to be recognized, several words merge across spaces in other document images. Acquiring word regions by some means other than tuning the blur strength, or other than blurring the image at all, therefore remains a future problem.
- FIG. 41 shows the character-level recognition rate, measured for verification. Recognition was done at the class level, and recognition was considered successful if the correct class was included among the obtained classes. Examples of character recognition failure include misrecognition caused by the growth of the database, and cases where adjacent characters merged into a single connected component, such as 't' and 'u' in the figure, and could no longer be recognized. If recognition of even one character fails, the word cannot be recognized even when the other characters are recognized successfully, so character recognition accuracy is deeply involved in word recognition accuracy. To improve the word recognition rate, it is important both to improve the character recognition technology and to correct character recognition errors using a word dictionary so as to estimate the correct word.
- the time required for the word recognition processing is as shown in FIG. 38; the time required for the character recognition processing was 3.44 milliseconds per character with 10 fonts at 0°.
- one word recognition combined with five character recognitions therefore takes 6.14 + 5 × 3.44 = 23.34 milliseconds, so about 42 words (1000 / 23.34 ≈ 42.8) can be processed in one second.
- FIG. 42 shows the memory usage when the database is read. The amount of memory used to handle 10 fonts of alphanumeric characters was approximately 397 MB.
- a simple but effective technique is provided that can recognize patterns such as characters and pictograms in real time; that is, it simultaneously solves the three problems of (1) real-time processing, (2) handling geometric distortion, and (3) recognition of patterns in arbitrary layouts.
- a pattern recognition technique is thus realized. Although specific numerical values are not shown in this specification, a pattern recognition apparatus of the present invention based on the pattern recognition method can be realized on a notebook personal computer connected to a web camera and can operate in real time. Furthermore, Section 8 of the embodiment introduces a faster and more robust approximate nearest-neighbor search method.
Description
To realize applications such as those described above, a practical camera-based character recognition technique is required that satisfies three requirements: (1) real-time processing is possible, (2) the technique is robust to geometric distortion, and (3) recognition is possible regardless of character layout.
On the other hand, as methods that satisfy requirements (2) and (3), Kusachi et al. and Li et al. have proposed techniques that recognize characters one at a time (see, for example, Non-Patent Documents 5 and 6). Because the methods disclosed in Non-Patent Documents 5 and 6 recognize characters one by one, the character-line problem described above does not arise, but the processing takes time and cannot be called the real-time processing of (1). A method that simultaneously satisfies requirements (1) to (3) is therefore desired.
In this invention, the query image is assumed to be acquired under a geometric transformation. For example, when an image containing the characters to be recognized is read by an image scanner, it is read with geometric distortion such as enlargement/reduction and rotation; in this case the query image is distorted by a similarity transformation. When an image containing the characters to be recognized is captured by a camera, it suffers projective distortion due to deviation from the fronto-parallel position; if the deviation is small, however, the distortion can be approximated as affine distortion without a change of magnification in the depth direction.
The image features of each pattern are registered in the image database in association with vectors representing those features. Examples of image features are shape features, features of the gray-level distribution, color features, or combinations thereof. Each vector in the image database is registered in advance in a form organized with a hash table so that the vectors can be matched in a short time.
The functions of the extraction unit, the feature acquisition unit, the matching unit and the pattern determination unit may be realized by a computer executing a predetermined program, or part or all of the processing may be realized by hardware mounted on, for example, a semiconductor chip. In the embodiments described later, the functions of each unit are realized by the hardware and software of a personal computer.
The position of the first feature point may be specified from among the pixels on the contour of the pattern element. In this way, the first feature point can be determined reliably as one point on the extracted contour of the pattern region.
The various preferred aspects shown here may also be combined.
In this invention, the query image corresponds to a page containing a plurality of characters and marks, as in FIG. 1 described later. "Pattern element" is a term corresponding to a connected component. The query feature in this invention is a vector quantity representing the feature of each pattern element of the query image.
Further, in this invention a reference pattern corresponds, in character recognition, to a connected component representing each character: for example, to each character in FIG. 14(a), each pictogram in FIG. 14(b), and each pattern in the first column (not the second column) of the separated-character table of FIG. 10, all described later. A reference feature represents the feature of each reference pattern and is compared (matched) against the query feature. The separated-pattern table corresponds to the separated-character table of FIG. 10 described later; in the example of FIG. 10, the group constituting the character 'j' and the group constituting 'i' are included in the table. In this invention, a separated pattern corresponds, for example, to each pattern in the first column (not the second column) of the separated-character table of FIG. 10.
The invention is described below in further detail with reference to the drawings. The following description is illustrative in all respects and should not be construed as limiting the invention.
Before the detailed description of the invention, its premises are explained. Following conventional research in this technical field, it is assumed for simplicity that black characters are written on a white background. "Black" is only an example: a cluster of pixels of a predetermined color and/or density, or within a predetermined range of color and/or density, distinguishable from the background constitutes a pattern region according to this invention. Hereinafter, for convenience, the pixels constituting the pattern region are called "black pixels". Since the character images are captured with a camera, they are affected by projective distortion, blur and reduced resolution, but it is assumed that the connected components of the characters, i.e. clusters of adjacent black pixels in the image, can be extracted by simple processing. It is also assumed that all characters lie on the same plane.
The problems to be solved by this invention are three: (i) fast recognition of the extracted connected components, (ii) improvement of the robustness of recognition, and (iii) recognition of characters consisting of two or more connected components (separated characters), such as 'i' and 'j'.
For (i), fast recognition, Section 2 below shows how Geometric Hashing is adapted to the matching of connected components and accelerated using the computation principle of geometric invariants. For (ii), robustness, a recognition method that considers the pose of the connected components is described in Section 4. For (iii), separated characters, the method described in Section 3 provides the solution.
2. Improvements for adapting Geometric Hashing to this invention and accelerating it
2.1. Geometric Hashing
Geometric Hashing is a powerful method for describing and retrieving geometrically distorted images using invariant coordinate systems. To explain the improved Geometric Hashing method proposed in this invention, Geometric Hashing is first outlined. For details, see Y. Lamdan and H.J. Wolfson, "Geometric hashing: a general and efficient model-based recognition scheme," Proc. ICCV1988, pp.238-249, 1988.
A reference image is registered as follows. Assume that the feature points extracted from the reference image to be registered are given. Three of all the feature points are chosen, and two basis vectors are created taking the order of the chosen points into account, as shown in FIG. 2(a). A new coordinate system is then created from the two basis vectors as in FIG. 2(b), and the feature points are mapped into it. Since this coordinate system can be created in the same way even when the image undergoes an affine transformation, it is an affine-invariant coordinate system. When the affine-invariant coordinate system is partitioned into a grid as in FIG. 2(b), each cell corresponds to a bin of a two-dimensional hash table. For the bin in which each feature point falls, the pair of the number of the image being registered and the serial number of the basis is registered. This processing is executed for all possible bases, which completes the registration of one reference image; registering all the reference images completes the registration processing. Since creating the affine-invariant coordinate systems requires O(P³) operations and projecting the feature points requires O(P), the computational cost per reference image is O(P⁴).
Here, O(P) and O(P³) are notations for the approximate amount of computation required to solve a problem: O(P) means that, once P is fixed, the computation is of the order of the first power of P, i.e. bounded by aP+b, and O(P³) means that it is of the order of the third power of P, i.e. bounded by aP³+bP²+cP+d, where a, b, c and d are constants. The same applies to O(P⁴) and the others.
The first half of the retrieval processing is the same as the registration processing. Assume that the feature points extracted from the query image are given. Three of all the feature points are chosen, two basis vectors are created taking the order of the chosen points into account as in FIG. 2(a), and an affine-invariant coordinate system is created from the two basis vectors. This affine-invariant coordinate system was partitioned into a grid at registration time, each cell corresponding to a bin of the two-dimensional hash table. From the bin in which each feature point falls, the registered pairs of image number and basis serial number are retrieved, and a vote is cast for each retrieved pair (the voting table is two-dimensional). This processing is executed for all possible bases, the pair of image number and basis serial number with the largest number of votes is determined, and the image number of this pair is output as the retrieval result. If the result becomes clear before all bases have been processed, the processing can be terminated early. Since creating the affine-invariant coordinate systems requires O(P³) and projecting the feature points O(P), the total computational cost is O(P⁴).
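A compact sketch of the registration and voting retrieval just described, with 2-D feature points as NumPy arrays; the grid cell size is an assumed parameter:

```python
import itertools
from collections import defaultdict

import numpy as np

def invariant_coords(points, i, j, k):
    """Map all points into the affine-invariant frame whose origin is
    point i and whose basis vectors run from i to j and from i to k."""
    B = np.column_stack([points[j] - points[i], points[k] - points[i]])
    if abs(np.linalg.det(B)) < 1e-9:
        return None                       # degenerate (collinear) basis
    return np.linalg.solve(B, (points - points[i]).T).T

def register(table, image_id, points, cell=0.25):
    bases = itertools.permutations(range(len(points)), 3)
    for serial, (i, j, k) in enumerate(bases):
        coords = invariant_coords(points, i, j, k)
        if coords is None:
            continue
        for q in coords:                  # one entry per projected point
            key = tuple(np.floor(q / cell).astype(int))
            table[key].append((image_id, serial))

def retrieve(table, points, cell=0.25):
    votes = defaultdict(int)
    for i, j, k in itertools.permutations(range(len(points)), 3):
        coords = invariant_coords(points, i, j, k)
        if coords is None:
            continue
        for q in coords:
            key = tuple(np.floor(q / cell).astype(int))
            for image_id, serial in table.get(key, []):
                votes[(image_id, serial)] += 1
    image_id, _ = max(votes, key=votes.get)   # most-voted (image, basis) pair
    return image_id

# table = defaultdict(list); register(table, 0, ref_pts); retrieve(table, qry_pts)
```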
2.2.1. Difference in the underlying problem
This invention relates to an improved Geometric Hashing method. Before describing the invention, the difference between the problem Geometric Hashing normally solves and the problem underlying this invention should be noted. Geometric Hashing solves the problem of identifying an object from the arrangement of points alone, given the feature points; it does not consider from what kind of object, or how, the feature points were extracted. In contrast, this invention identifies a figure, given the figure, using both the arrangement of the feature points obtained from it and the features of the figure itself; that is, the feature points are determined by applying predetermined rules to the pattern region. Although locations that do not change under geometric transformation, such as corners and inflection points of the figure, could be used as feature points, this invention in principle uses pixels on the contour of the figure as feature points. How this changes the method is described below.
The drawback of Geometric Hashing is its enormous computational cost. In the case corresponding to affine transformations, the computation required for retrieval is O(P⁴) in the number of points P. With P = 100 points, on the order of 100,000,000 operations would be required, so use in a real-time application is unrealistic. With the proposed method, in the most favorable case, the cost can be reduced to O(P²) while still accommodating affine transformations.
Suppose three feature points are given as in FIG. 6. Consider the half-line through the first and second points and the half-line through the first and third points, and let S1 be the area they cut out of the figure. By Property 4 of Table 1, S1/S0 is then an affine invariant, so the third point may be chosen such that S1/S0 takes a specific value. Information such as clockwise or counterclockwise order can also be used to determine the third point uniquely.
As in method 1, suppose three feature points are given as in FIG. 7, and let S1 be the area of the triangle formed by the three points. By Property 4 of Table 1, S1/S0 is an affine invariant, so the third point may be chosen such that S1/S0 takes a specific value. The value of S1/S0 need not be a specific value; a maximum, a minimum, or the like may also be used. Information such as clockwise or counterclockwise order can also be used to determine the third point uniquely. If the third point is chosen so that S1 is constant, its possible locus is a straight line parallel to the line through the first and second points, as shown in FIG. 7; it therefore suffices to take the intersection of this line with the figure as the third point, which can be computed simply. If there are several intersections, a selection rule such as taking the one closer to the second point is also possible.
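A sketch of this third-point selection, assuming the contour is a sequence of (x, y) pixels, S0 the figure's area, and a target value for the invariant ratio:

```python
def triangle_area(p0, p1, p2):
    """Area of the triangle p0-p1-p2 (shoelace formula)."""
    return 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p1[1] - p0[1]) * (p2[0] - p0[0]))

def third_point(contour, p0, p1, s0, target_ratio):
    """Pick the contour pixel whose triangle with p0 and p1 makes the
    ratio S1/S0 closest to the target; since S1/S0 is an affine
    invariant (Property 4 of Table 1), the same pixel is selected
    after any affine distortion of the figure."""
    return min(contour,
               key=lambda c: abs(triangle_area(p0, p1, c) / s0 - target_ratio))
```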
Geometric Hashing registers pairs of an image number and a basis number in the database. In this invention, instead of the basis number, the feature vector computed from the image and the coordinates of the feature points used to create the basis are registered (see FIG. 9).
Since the following discussion specializes in character recognition as a representative example of pattern recognition according to this invention, the "image number" is hereafter called the "character number", "retrieval" is called "recognition", and the "pattern recognition apparatus" is called the "character recognition apparatus".
The previous section described a method for fast recognition of characters consisting of a single connected component. This section uses that result to describe a method for recognizing separated characters consisting of two or more connected components, such as 'i' and 'j'.
The table records the positional and size relationships of the connected components of each separated character; at recognition time, a separated character can be recognized by checking whether a prescribed connected component exists at a prescribed position. The separated-character table of FIG. 10(b) consists of five elements, numbered first to fifth from the left end. The first element indicates the shape and/or the number of the connected component. The second element indicates the separated character that contains that connected component. The third element is a vector indicating the relative position between the connected components. The fourth element indicates the area of the connected component. The fifth element indicates the area of the connected component with which it should be paired.
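One possible concrete representation of a row of this table, with a matching check at recognition time; the field types and tolerances are assumptions, since only the five elements and their meaning are fixed above:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SeparationEntry:
    """One row of the separated-character table of FIG. 10(b)."""
    component_id: int                       # 1st: connected-component shape/number
    character: str                          # 2nd: separated character, e.g. 'i'
    relative_position: Tuple[float, float]  # 3rd: vector to the partner component
    area: float                             # 4th: area of this component
    partner_area: float                     # 5th: area of the partner component

def matches(entry, offset, area, partner_area, tol=0.25):
    """Check whether two detected components are consistent with a row:
    the partner lies near the registered relative position and both
    areas roughly agree (the tolerance is an assumption)."""
    dx = offset[0] - entry.relative_position[0]
    dy = offset[1] - entry.relative_position[1]
    close = (dx * dx + dy * dy) ** 0.5 <= tol * max(entry.area, 1.0)
    areas_ok = (abs(area - entry.area) <= tol * entry.area
                and abs(partner_area - entry.partner_area) <= tol * entry.partner_area)
    return close and areas_ok
```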
FIG. 11 shows an overview of the pattern recognition apparatus according to this invention. The apparatus is broadly composed of an image registration unit 11 and an image recognition unit 13; the character recognition apparatus of this invention comprises at least the image recognition unit 13 with access to the database 15. Each is described below.
4.1. Image registration unit
The image registration unit 11 registers the reference images in the database 15. The reference images are binary images.
To cope with defocus blur at capture time, a generative learning method that superimposes Gaussian blur on the reference images was applied (see H. Ishida, S. Yanadume, T. Takahashi, I. Ide, Y. Mekada and H. Murase, "Recognition of low-resolution characters by a generative learning method," Proc. CBDAR2005, pp.45-51, 2005). Letting I0(x,y) be the luminance of the pixel at position (x,y) of the original image, the luminance Ig(x,y) at position (x,y) of the generated degraded image is given by the following equation.
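The equation itself did not survive extraction; a plausible reconstruction, assuming the standard discrete Gaussian convolution with standard deviation σ (the σ values actually used are listed in the experimental settings below), is:

$$I_g(x, y) = \sum_{u}\sum_{v}\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{u^{2}+v^{2}}{2\sigma^{2}}\right)I_0(x-u,\ y-v)$$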
Three feature points are selected by the method described in Section 2, and a feature vector is generated by creating the invariant coordinate system. The following explains how the feature vector is computed from the three obtained points. Since any two feature points determine one straight line passing through them, a total of three straight lines can be determined from the three points.
Specifically, the database 15 is constituted as a hash table.
4.2.1. Image acquisition
Images are acquired as still images or video with a digital camera or a web camera. Video is decomposed frame by frame and handled as a set of still images. Each obtained image is called a query image and is used in the following processing.
4.2.2. Extraction of character images
A feature vector is generated from each obtained connected component. This processing is basically the same as that described in Section 2; the only difference is that invariant coordinate systems are not generated for all possible combinations but are limited to a predetermined number S.
4.2.4. Parameter estimation and recognition using voting (Embodiment 1)
Parameter estimation and recognition are performed using voting; the case of affine transformation is described here.
First, S tuples of a character number, a feature vector and the coordinates of three feature points are obtained from the hash table. Then a weighted vote is cast for the character numbers.
The affine transformation is decomposed into four parameters, including the magnification α of the independent scaling.
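One standard way to obtain such a four-parameter decomposition of the (translation-free) 2×2 affine matrix into rotation, shear, independent scaling and overall scale; the concrete formulas are assumptions, since the passage above does not state them:

```python
import numpy as np

def decompose_affine(A):
    """Split a nonsingular 2x2 matrix A into rotation theta, overall
    scale s, independent-scaling ratio alpha and shear lam, so that
    A = R(theta) @ [[s*alpha, s*alpha*lam], [0, s/alpha]]."""
    Q, U = np.linalg.qr(A)
    D = np.diag(np.sign(np.diag(U)))    # force a positive diagonal on U
    Q, U = Q @ D, D @ U
    theta = np.arctan2(Q[1, 0], Q[0, 0])
    s = np.sqrt(U[0, 0] * U[1, 1])      # overall magnification
    alpha = np.sqrt(U[0, 0] / U[1, 1])  # independent-scaling ratio
    lam = U[0, 1] / U[0, 0]             # shear
    return theta, s, alpha, lam
```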
A pose estimation method different from that of the previous section is described next. Using the feature vector described in 4.2.3, the connected-component number and the coordinates of the three feature points can be obtained from the hash table of FIG. 9. The information obtained in this way is a provisional recognition result for the query image and contains many errors. In the following, the correct recognition result is reached by applying, in stages, a majority-decision principle similar to that of M. Iwamura, R. Niwa, A. Horimatsu, K. Kise, S. Uchida and S. Omachi, "Layout-free dewarping of planar document images," Proc. DRR XVI, 7247-36, Jan. 2009. That is, as shown in FIG. 13, the pose of the page is estimated first, and then recognition and pose estimation are performed for each connected component.
These are determined for each connected component of the query image.
Four degraded images were generated per reference image while varying σ; the values used were σ = 0, 2, 4, 6. The parameter of the adaptive binarization was n = 101, and the threshold for removing components regarded as noise was 32. The hash size was Hsize = 2¹⁹ − 1.
When computing the value of each region for the feature vector, not only the pixels on the contour but all the pixels inside the figure were used.
The following experiments used a computer with an Opteron 2.4 GHz CPU and 128 GB of memory.
To examine the effectiveness of this invention, characters in various fonts were recognized. Sixty character types, digits and letters, were used: 10 digits, 24 lowercase letters excluding 'i' and 'j', and 26 uppercase letters. Because some characters become similar in appearance under affine distortion, the characters placed in the same box in Table 2 were regarded as the same class in the recognition experiments; for example, mistaking 0 (zero) for O (oh) is not counted as misrecognition.
The 10 pictograms shown in FIG. 14(b) were captured by the same method as in 3.1 and a recognition experiment was performed, with S = 200.
The recognition rates and processing times are shown in FIG. 17. The recognition rate was highest with 16 bins. The processing time was comparable in most cases but became extremely large with 4 bins, at which point the recognition rate was also lowest; this is presumably because the discriminative power of the feature vectors was insufficient and many hash collisions occurred.
Finally, the document shown in FIG. 1 was recognized. A digital camera was tilted at 0, 30 and 45 degrees to the page, and only the page area was cropped so that the background was not included. The cropped images at 0, 30 and 45 degrees measure 2054×1464, 1714×1326 and 1516×1322 pixels, respectively; the obtained images are shown in FIG. 18. The characters in FIG. 1 yield 148 connected components, of which 18 are parts of 'i' and 'j'. Since 'i' and 'j' do not consist of a single connected component, they are not included in the reference images, and these two character types therefore cannot be recognized.
The recognition rate was therefore computed for the remaining 148 − 18 = 130 characters, with k = 25. The recognition rates and processing times are shown in Table 4. The recognition rate with S = 200 was higher than with S = 20. The processing time with S = 20 was about 1/7 of that with S = 200, while the difference in recognition rate was not very large. The results for S = 20 confirm that this invention enables fast and robust recognition.
6.1. Performance evaluation on various fonts
To examine the effectiveness of the method of Embodiment 2, which uses the separated-character recognition method, the digits and letters (62 character types each) of the four fonts Arial, Century, Gigi and Impact shown in FIG. 14 were recognized. As already noted in Section 3, some characters become similar in appearance under affine distortion and are difficult to distinguish, so the characters shown in Table 3 are regarded as the same class. In the recognition processing by the image recognition unit described in Section 4.2, a result was rejected when its vote count was 0.
Finally, to examine the possibility of recognizing figures other than characters, the 10 pictograms shown in FIG. 14(b) were recognized by the same method in addition to the above four fonts. As shown in FIG. 20 and Table 5, results similar to those for the fonts other than Impact were obtained.
The above confirms that this invention operates fast and, apart from some fonts, has a low misrecognition rate.
The document shown in FIG. 21 was recognized. A digital camera was tilted at 0, 30 and 45 degrees to the page, and only the page area was cropped so that the background was not included. The cropped images at 0, 30 and 45 degrees measure 2470×1746, 2278×1746 and 2038×1844 pixels, respectively; the obtained images are shown in FIG. 22. FIG. 21 contains 236 characters (excluding commas): 168 in Arial, 27 in Arial Black Italic, and 41 characters around the MIRU logo (font unknown). Of these, the Arial characters and the characters around the MIRU logo were registered and a recognition experiment was performed.
Related work on this invention other than Geometric Hashing is summarized here.
There is a method called LLAH (Locally Likely Arrangement Hashing) proposed by Nakai et al. (see Tomohiro Nakai, Koichi Kise and Masakazu Iwamura, "Fast document image retrieval with a digital camera based on local arrangements of feature points," IEICE Transactions D, vol. J89-D, no. 9, pp. 2045-2054, Sept. 2006; see also International Publication No. WO2006/092957). This method focuses on the local arrangement of feature points obtained by simple image processing and uses geometric invariants and hashing to retrieve the corresponding feature points from a database at high speed. Compared with Geometric Hashing, LLAH reduces the computation and memory usage to a few hundred-millionths while maintaining retrieval robustness. This performance gain is made possible by limiting the way the feature points are selected, which reduces computation, and by raising the dimensionality of the feature vectors, which improves discriminative power.
This section proposes an improved method obtained by introducing three measures from Non-Patent Document 7 into the embodiment described in the preceding sections.
8.1. Aspects of the modification
The first measure is the introduction of distance calculation. As described in Section 4.2.5 above, the information obtained from the hash table contains errors, from which the correct entries must be selected. In the embodiment shown in FIG. 13(a), highly reliable information was narrowed down by voting on character types; in this embodiment, instead, the Euclidean distance between the query feature vector and each feature vector obtained from the hash table is computed, and the entries whose distance is at most a threshold are selected.
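A sketch of this selection step, assuming each hash-table entry carries its registered feature vector under a "feature" key:

```python
import numpy as np

def filter_by_distance(query_vector, entries, threshold):
    """Keep only the hash-table entries whose feature vector lies
    within `threshold` Euclidean distance of the query feature vector."""
    q = np.asarray(query_vector, dtype=float)
    return [e for e in entries
            if np.linalg.norm(q - np.asarray(e["feature"], dtype=float)) <= threshold]
```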
8.2. Experimental Example 3
To confirm the effectiveness of the embodiment described above, a database containing up to 100 registered fonts was prepared, and character images of various fonts captured with a camera were recognized.
The character types used in the experiment were the uppercase and lowercase letters and the digits, 62 character types in total. Eight kinds of degraded images are generated per reference image (nine images per character including the original), so for 100 fonts a total of 62 × 9 × 100 = 55,800 reference images were registered in the database. As recognition targets, images as shown in FIG. 24 were prepared, each containing every character twice (124 characters per page) and laid out so that the characters line up along curves. These were printed on A4 paper, photographed with a digital camera from 0, 30 and 45 degrees, and cropped manually. FIG. 24 shows the recognition target images for Century; their sizes are 1577×2209, 1397×2185 and 1265×2201 pixels, respectively.
The experiments used a computer with an Opteron 2.8 GHz CPU and 32 GB of memory. To reduce the computation required for image registration and recognition, each connected component was normalized so that the larger of its width and height became 100 pixels for reference images and 50 pixels for query images. The parameters of this experimental example (Section 8) not mentioned above were l = 4 (i.e. k = 16), Hsize = 2¹⁹ − 1, D = 2 and S = 10. Since these parameters are also used in the experiments of Experimental Example 2 below, the results do not completely match those of the preceding sections even where the method is the same.
First, the recognition rate and the average processing time per character are shown in FIGS. 27 and 28; "Experimental Example 2" in the figures denotes the method used in Experimental Example 2. Regarding the recognition rate, the method of Experimental Example 2 degraded as more fonts were registered, whereas the proposed method remained nearly constant at a high level regardless of the number of registered fonts. Regarding the processing time, both the method of Experimental Example 3 (this embodiment) and that of Experimental Example 2 take longer as the number of registered fonts increases, but the slope is gentler for the proposed method.
The embodiments described so far perform recognition character by character. They therefore reveal which character is written where in a document, but not which words or sentences are written or what they mean. Considering that the keywords used for information retrieval are in most cases words, word recognition is indispensable.
According to this embodiment, word recognition is performed for languages in which, as in English, words are separated from one another by spaces and which follow a predetermined reading-direction rule, for example the rule of writing from left to right. As a precondition, the independent-scaling and shear distortions are assumed to have been removed from the document image at the character recognition stage.
This embodiment describes a pattern recognition apparatus with a word recognition function (hereinafter, the demonstration system) created by the inventors to demonstrate the feasibility and usefulness of an information acquisition application using real-time character and word recognition technology.
Characters exist everywhere in the environment, in magazine articles, street signboards and so on, and each conveys information with some meaning, such as the route to a destination or an advertisement for a product. If such characters in the environment are photographed with a camera and recognized by a computer in real time, various services can be expected. A pattern recognition apparatus that uses the characters in a scene has the advantage that it requires no special preparation and can be used casually. Since words in the real world are linked to various services, we call this function of the pattern recognition apparatus "turning character strings in the environment into link anchors".
The camera-based character recognition used in the demonstration system is based on the method described above in Sections 2 and 3 and further incorporates the modification of Section 8. Since recognition per connected component cannot distinguish characters that are related to one another by an affine transformation, such as 'N', 'Z' and 'z', such character groups are regarded as the same class as in FIG. 32, and at recognition time the plurality of characters belonging to the class are output as result candidates. Character-level recognition alone can identify only down to the class level; identification down to the single character type is performed at the word recognition stage described in the next section. However, since the combinations of characters related by an affine transformation differ from font to font, manual class grouping becomes difficult as the number of registered fonts increases. Our method therefore performs class grouping automatically at character registration time: when a character image is newly added to the database, it is matched against the character images already in the database, and if a similar character image is found, the newly registered character is classified into the class of that similar character.
The processing described so far yields the class and pose of each character. In the demonstration system, the processing that estimates which character pose is plausible and determines the single final result from a class containing a plurality of characters is performed at the word recognition stage of the next section.
9.2.1. Problem setting
The demonstration system performs word recognition for languages in which, as in English, words are separated by spaces and written from left to right. As a precondition, the independent-scaling and shear distortions are assumed to have been removed from the document image at the character recognition stage. We propose a method that links the characters in a document using their orientations and finds the plausible word; the "orientation of a character" refers to the rotation parameter obtained by the character recognition of Section 9.1. The method of Non-Patent Document 4, which estimates the character sequence using character lines, can recognize only documents whose character lines are parallel straight lines, whereas this invention can also handle documents that do not form character lines, such as those of FIGS. 1, 21 and 33.
First, it is estimated which region of the image constitutes one word. When the document image is blurred to some degree and binarized as in FIG. 34, adjacent characters merge, and only the parts separated by spaces remain divided; the word regions can therefore be estimated by extracting the connected components of the blurred image. Since the appropriate degree of blur depends on the spacing and thickness of the characters in the captured image, the demonstration system computes the distance between characters and the area of each character and determines a proportional degree of blur on the fly. The inter-character distance is the Euclidean distance from each character in the captured image to its nearest character, averaged over the characters; the area is the number of pixels of the connected component, likewise averaged. A Gaussian filter is used for the blurring: with average inter-character distance d and average area a, the standard deviation of the Gaussian is σ = 200 × d/a. OpenCV's adaptive binarization was used to binarize the blurred image.
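A sketch of this word-region estimation with OpenCV; the adaptive-binarization block size and offset are assumptions (the description fixes only σ = 200 × d/a):

```python
import cv2
import numpy as np

def estimate_word_regions(gray, centroids, areas):
    """`gray`: 8-bit grayscale capture; `centroids`, `areas`: the
    already-extracted character connected components."""
    c = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    d = dists.min(axis=1).mean()        # average nearest-neighbor distance
    a = float(np.mean(areas))           # average component area (pixels)
    sigma = 200.0 * d / a               # blur strength from the text above
    blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
    binary = cv2.adaptiveThreshold(blurred, 255,
                                   cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 101, 10)
    # Each connected component of the blurred, binarized image is
    # taken as one word region.
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    return [tuple(stats[i, :4]) for i in range(1, n)]   # (x, y, w, h) boxes
```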
To confirm the effectiveness of this embodiment, an experiment was performed in which the words of documents photographed with a camera were recognized. The computer used had an Opteron 2.8 GHz CPU and 16 GB of memory. The experiments examined how the recognition accuracy and the processing time change as the number of font types registered in the database increases.
At character registration time, the generative learning method described above is used to cope with defocus and resolution loss during camera capture. In this experiment nine levels of degradation were applied: three levels of Gaussian blur combined with three levels of resolution reduction (including the level with no blur and no resolution change), so for 10 fonts the data of 5,580 character images are registered.
Section 8 of the embodiment also introduced a faster and more robust approximate nearest-neighbor search method. This made it possible to register 100 fonts in the database (55,800 registered images in total) and to recognize about 140 characters per second under the condition that the character images to be recognized are degraded (projective distortion, reduced resolution and blur).
Furthermore, a demonstration system that turns character strings in the environment into link anchors was created and operated. A word recognition method designed for convenience was proposed for the demonstration system, and its effectiveness was shown by experiments.
13 image recognition unit
15 database
Claims (10)
- A pattern recognition apparatus comprising: an extraction unit that extracts at least one pattern element from a query image which is composed of the at least one pattern element and has undergone a geometric transformation; a feature acquisition unit that acquires, as a query feature, a feature of the pattern element which is represented by at least three feature points, including first, second and third feature points located within the pattern element and specified from the pattern element on the basis of predetermined rules, and which is invariant to the geometric transformation; a matching unit that matches the query feature against a plurality of reference features respectively representing the features of a plurality of different reference patterns prepared as candidates for pattern recognition; and a pattern determination unit that determines, as a recognition result, a reference pattern identified from among the candidates on the basis of the similarity of the matched features, wherein each reference feature is represented using feature points determined from each reference pattern on the basis of the rules, and wherein, on the basis of the rules, the position of the first feature point is specified as a point within the pattern element that is invariant to the geometric transformation, the position of the second feature point is specified using a property of the shape of the pattern element that is invariant to the geometric transformation, and the position of the third feature point is specified from a predetermined quantity invariant to the geometric transformation and the determined positions of the first and second feature points.
- The pattern recognition apparatus according to claim 1, wherein the position of the first feature point is specified from among the pixels on the contour of the pattern element.
- The pattern recognition apparatus according to claim 1 or 2, wherein the property is the invariance of the centroid under an affine transformation as one kind of the geometric transformation, and the position of the second feature point is specified, using the property, as the centroid of the pattern element.
- The pattern recognition apparatus according to any one of claims 1 to 3, wherein the property is the invariance of area ratios under an affine transformation as one kind of the geometric transformation, and the position of the third feature point is specified from the contour of the pattern element and, using the property, on the basis of a predetermined value of the ratio between the area of the pattern element and the area of the triangle whose vertices are the first, second and third feature points.
- The pattern recognition apparatus according to any one of claims 1 to 4, wherein the feature acquisition unit acquires the feature invariant to the geometric transformation using a coordinate system which has as its bases two linearly independent vectors each connecting two of the first to third feature points and which is invariant to the geometric transformation.
- The pattern recognition apparatus according to any one of claims 1 to 5, wherein the matching unit matches the query feature against reference features registered in a hash table in association with the corresponding reference patterns, the hash table comprises a plurality of bins, each reference feature is classified and registered in advance into one bin determined by computing a predetermined hash function for that reference feature, and the matching unit performs the matching by referring to the appropriate bin using an index obtained by computing the hash function for the query feature.
- The pattern recognition apparatus according to claim 6, wherein each reference feature is registered in its bin in association with the coordinate data of the first to third feature points and with an identifier of the reference pattern corresponding to that reference feature, and the pattern determination unit estimates the pose of the query image on the basis of the matching between the coordinate data associated with the query feature and the coordinate data associated with each reference feature registered in the referenced bin, and on the basis of majority-decision processing of those matches.
- The pattern recognition apparatus according to any one of claims 1 to 7, wherein the pattern determination unit has a separated-pattern table in which at least one set of separated patterns is registered, each separated pattern corresponding to one of the reference patterns and the set of separated patterns providing one recognition result; the pattern determination unit refers to the separated-pattern table to judge whether a correspondence exists between a reference pattern identified from among the candidates and one separated pattern of a set; and, when the correspondence exists and correspondences already exist for all the other separated patterns of the set, takes as the recognition result what is provided by the set to which the separated pattern corresponding to the identified reference pattern belongs.
- The pattern recognition apparatus according to claim 8, wherein the relative positions between one separated pattern of a set and the other separated patterns are registered in the separated-pattern table, and the pattern determination unit determines the recognition result when another identified reference pattern exists at the position determined by the relative position registered for the separated pattern corresponding to a certain identified reference pattern.
- The pattern recognition apparatus according to any one of claims 1 to 9, wherein the query image includes a pattern of a word composed of a plurality of characters, the apparatus further comprising: a word candidate determination unit that finds a shortest path passing once through each character recognized by the pattern determination unit and takes the forward and reverse orders of the found path as word candidates; a rotation angle determination unit that finds the rotation angle of each character with respect to a predetermined direction in the query image; and a reading order determination unit that takes, as a first evaluation index, the difference in rotation angle between two characters adjacent along the forward or reverse order of the path, estimates, with either end of each candidate taken as the first character, the rotation angle the first character should have on the basis of the direction toward the second character adjacent to the first character and a predetermined reading-direction rule, takes, as a second evaluation index, the difference between the estimated rotation angle and the rotation angle of the first character determined by the rotation angle determination unit, and determines the reading order of the characters constituting the word by selecting the candidate that minimizes the first and second evaluation indices.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010800161588A CN102388392B (zh) | 2009-02-10 | 2010-02-09 | Pattern recognition device |
| US13/148,850 US8422793B2 (en) | 2009-02-10 | 2010-02-09 | Pattern recognition apparatus |
| JP2010550519A JP5522408B2 (ja) | 2009-02-10 | 2010-02-09 | Pattern recognition apparatus |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2009029031 | 2009-02-10 | ||
| JP2009-029031 | 2009-02-10 | ||
| JP2009163924 | 2009-07-10 | ||
| JP2009-163924 | 2009-07-10 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2010092952A1 true WO2010092952A1 (ja) | 2010-08-19 |
Family
ID=42561794
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2010/051889 Ceased WO2010092952A1 (ja) | 2010-02-09 | 2010-08-19 | Pattern recognition apparatus |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8422793B2 (ja) |
| JP (1) | JP5522408B2 (ja) |
| CN (1) | CN102388392B (ja) |
| WO (1) | WO2010092952A1 (ja) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2013077291A (ja) * | 2011-09-13 | 2013-04-25 | Ricoh Co Ltd | Image inspection apparatus, image inspection method, and program |
| JP2015501017A (ja) * | 2011-09-30 | 2015-01-08 | Canon Inc. | Image retrieval method |
| US9639073B2 (en) | 2012-04-26 | 2017-05-02 | International Business Machines Corporation | Information processing apparatus for discriminating between combined results of plurality of elements, program product and method for same |
| WO2018099077A1 (zh) * | 2016-12-01 | 2018-06-07 | BOE Technology Group Co., Ltd. | Image matching method, apparatus, system and storage medium |
| CN113570682A (zh) * | 2021-08-02 | 2021-10-29 | Beijing Jingwei Hirain Technologies Co., Ltd. | Right-angle routing method and apparatus |
| CN115272689A (zh) * | 2022-07-22 | 2022-11-01 | Xiamen University of Technology | View-based spatial shape recognition method, apparatus, device and storage medium |
| CN117370591A (zh) * | 2023-12-07 | 2024-01-09 | Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (Futian) | Vector graphics recognition method, apparatus, terminal and storage medium based on point-set representation |
Families Citing this family (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120162246A1 (en) * | 2010-12-23 | 2012-06-28 | Sap Portals Israel Ltd. | Method and an apparatus for automatic capturing |
| US8379979B2 (en) * | 2011-02-25 | 2013-02-19 | Sony Corporation | System and method for effectively performing a scene rectification procedure |
| JP5768590B2 (ja) * | 2011-08-22 | 2015-08-26 | Fujitsu Limited | Image processing apparatus, image processing method and program |
| US9679221B2 (en) | 2012-08-23 | 2017-06-13 | Nec Corporation | Object identification apparatus, object identification method, and program |
| TWI443346B (zh) * | 2012-09-14 | 2014-07-01 | Chunghwa Telecom Co Ltd | Electrical appliance identification system and method |
| CN102945373A (zh) * | 2012-10-24 | 2013-02-27 | Institute of Automation, Chinese Academy of Sciences | Context-based local spatial information modeling method |
| US9076223B1 (en) * | 2012-12-11 | 2015-07-07 | The Mathworks, Inc. | Fast stopping criterion for active contour algorithms |
| US8888005B2 (en) | 2013-04-12 | 2014-11-18 | David Prokop | Uniquely identifiable drug dosage form units |
| US9280827B2 (en) * | 2013-07-03 | 2016-03-08 | Mitsubishi Electric Research Laboratories, Inc. | Method for determining object poses using weighted features |
| AU2013273778A1 (en) * | 2013-12-20 | 2015-07-09 | Canon Kabushiki Kaisha | Text line fragments for text line analysis |
| CN110268224A (zh) * | 2017-02-10 | 2019-09-20 | SZ DJI Technology Co., Ltd. | System and method for real-time position tracking of an unmanned aerial vehicle |
| US20180349110A1 (en) * | 2017-05-31 | 2018-12-06 | Wipro Limited | Method and layout identification system for facilitating identification of a layout of a user interface |
| CN110942064B (zh) * | 2019-11-25 | 2023-05-09 | Vivo Mobile Communication Co., Ltd. | Image processing method, apparatus and electronic device |
| CN111783770B (zh) * | 2020-01-16 | 2024-05-24 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Image correction method, apparatus and computer-readable storage medium |
| US12488004B2 (en) * | 2020-05-29 | 2025-12-02 | Soco, Inc. | Question answering retrieval via sparse transformer matching |
| US12094184B2 (en) | 2020-09-22 | 2024-09-17 | Apple Inc. | Contextual matching |
| KR102378659B1 (ko) | 2021-02-23 | 2022-03-25 | Poslogic Co., Ltd. | Pattern image detection method and pattern image detection apparatus |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0628476A (ja) * | 1992-07-09 | 1994-02-04 | Nec Corp | Image signal processing apparatus |
| JP2003242509A (ja) * | 2001-12-13 | 2003-08-29 | Toshiba Corp | Pattern recognition apparatus and method |
| JP2005084765A (ja) * | 2003-09-05 | 2005-03-31 | Univ Of Fukui | Character recognition apparatus and method, and program therefor |
| WO2006092957A1 (ja) * | 2005-03-01 | 2006-09-08 | Osaka Prefecture University Public Corporation | Document/image retrieval method and program therefor, and document/image registration apparatus and retrieval apparatus |
| JP2008252856A (ja) * | 2007-03-07 | 2008-10-16 | Osaka Prefecture Univ | Image correction method, correction program, and image distortion correction apparatus |
| JP2008269582A (ja) * | 2007-03-28 | 2008-11-06 | Sharp Corp | Image processing apparatus, image forming apparatus, image transmitting apparatus, image reading apparatus, image processing system, image processing method, image processing program, and recording medium therefor |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5054094A (en) * | 1990-05-07 | 1991-10-01 | Eastman Kodak Company | Rotationally impervious feature extraction for optical character recognition |
| US6539106B1 (en) * | 1999-01-08 | 2003-03-25 | Applied Materials, Inc. | Feature-based defect detection |
| US7200270B2 (en) * | 2001-12-13 | 2007-04-03 | Kabushiki Kaisha Toshiba | Pattern recognition apparatus and method using distributed model representation of partial images |
| US7734067B2 (en) * | 2004-12-07 | 2010-06-08 | Electronics And Telecommunications Research Institute | User recognition system and method thereof |
| US8036497B2 (en) | 2005-03-01 | 2011-10-11 | Osaka Prefecture University Public Corporation | Method, program and apparatus for storing document and/or image using invariant values calculated from feature points and method, program and apparatus for retrieving document based on stored document and/or image |
| US7991189B2 (en) | 2007-03-28 | 2011-08-02 | Sharp Kabushiki Kaisha | Image processing apparatus, image forming apparatus, image processing system, and image processing method |
| EP2284796A4 (en) | 2008-04-28 | 2012-10-31 | Univ Osaka Prefect Public Corp | METHOD FOR GENERATING AN IMAGE DATABASE FOR OBJECT DETECTION, PROCESSING DEVICE AND PROCESSING PROGRAM |
2010
- 2010-02-09 WO PCT/JP2010/051889 patent/WO2010092952A1/ja not_active Ceased
- 2010-02-09 JP JP2010550519A patent/JP5522408B2/ja not_active Expired - Fee Related
- 2010-02-09 CN CN2010800161588A patent/CN102388392B/zh not_active Expired - Fee Related
- 2010-02-09 US US13/148,850 patent/US8422793B2/en not_active Expired - Fee Related
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2013077291A (ja) * | 2011-09-13 | 2013-04-25 | Ricoh Co Ltd | Image inspection apparatus, image inspection method, and program |
| JP2015501017A (ja) * | 2011-09-30 | 2015-01-08 | Canon Inc. | Image retrieval method |
| US9639073B2 (en) | 2012-04-26 | 2017-05-02 | International Business Machines Corporation | Information processing apparatus for discriminating between combined results of plurality of elements, program product and method for same |
| WO2018099077A1 (zh) * | 2016-12-01 | 2018-06-07 | BOE Technology Group Co., Ltd. | Image matching method, apparatus, system and storage medium |
| US10679364B2 (en) | 2016-12-01 | 2020-06-09 | Boe Technology Group Co., Ltd. | Image matching method, image matching apparatus, image matching system, and storage medium |
| CN113570682A (zh) * | 2021-08-02 | 2021-10-29 | Beijing Jingwei Hirain Technologies Co., Ltd. | Right-angle routing method and apparatus |
| CN113570682B (zh) * | 2021-08-02 | 2024-05-07 | Beijing Jingwei Hirain Technologies Co., Ltd. | Right-angle routing method and apparatus |
| CN115272689A (zh) * | 2022-07-22 | 2022-11-01 | Xiamen University of Technology | View-based spatial shape recognition method, apparatus, device and storage medium |
| CN117370591A (zh) * | 2023-12-07 | 2024-01-09 | Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (Futian) | Vector graphics recognition method, apparatus, terminal and storage medium based on point-set representation |
| CN117370591B (zh) * | 2023-12-07 | 2024-04-12 | Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (Futian) | Vector graphics recognition method, apparatus, terminal and storage medium based on point-set representation |
Also Published As
| Publication number | Publication date |
|---|---|
| US20120230592A1 (en) | 2012-09-13 |
| JPWO2010092952A1 (ja) | 2012-08-16 |
| CN102388392B (zh) | 2013-09-11 |
| CN102388392A (zh) | 2012-03-21 |
| JP5522408B2 (ja) | 2014-06-18 |
| US8422793B2 (en) | 2013-04-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP5522408B2 (ja) | Pattern recognition apparatus | |
| CN109308476B (zh) | Bill information processing method and system, and computer-readable storage medium | |
| US8744196B2 (en) | Automatic recognition of images | |
| Tian et al. | Multilingual scene character recognition with co-occurrence of histogram of oriented gradients | |
| Ye et al. | Text detection and recognition in imagery: A survey | |
| CN101957919B (zh) | Character recognition method based on image local feature retrieval | |
| JP2018136926A (ja) | Method and system for container code recognition | |
| Park et al. | Automatic detection and recognition of Korean text in outdoor signboard images | |
| CN114937278B (zh) | Text content extraction and recognition method based on a line-text-box word segmentation algorithm | |
| CN101452532A (zh) | Text-independent handwriting identification method and apparatus | |
| Sahare et al. | Robust character segmentation and recognition schemes for multilingual Indian document images | |
| Zhu et al. | Deep residual text detection network for scene text | |
| CN111340020A (zh) | Formula recognition method, apparatus, device and storage medium | |
| CN105913057A (zh) | Method for detecting mathematical formulas in images by combining projection and structural features | |
| CN110569818A (zh) | Intelligent reading and learning method | |
| Liu et al. | Scene text recognition with high performance CNN classifier and efficient word inference | |
| JP3917349B2 (ja) | Retrieval apparatus and method for retrieving information using character recognition results | |
| CN104899551B (zh) | Form image classification method | |
| CN111213157A (zh) | Express delivery information entry method and system based on a smart terminal | |
| JP2009032109A (ja) | Document image retrieval method, document image registration method, and program and apparatus therefor | |
| Chandolikar et al. | Devanagari Characters Recognition: Extracting Best Match for Photographed Text. | |
| JP2012008979A (ja) | Character string search method, character string search apparatus, and recording medium | |
| AlKhateeb et al. | Interactive knowledge discovery for baseline estimation and word segmentation in handwritten Arabic text | |
| Kumar | Methods for text segmentation from scene images | |
| Wigati et al. | Combination of Salient Object Detection and Image Matching for Object Instance Recognition |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201080016158.8; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10741229; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2010550519; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 13148850; Country of ref document: US |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 10741229; Country of ref document: EP; Kind code of ref document: A1 |