AU2015201663A1 - Dewarping from multiple text columns - Google Patents
- Publication number: AU2015201663A1
- Authority: AU (Australia)
- Prior art keywords: region, image, text, dewarp, point
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion)
Abstract
- DEWARPING FROM MULTIPLE TEXT COLUMNS A method of modifying an image of a document forms a first region (610) of the image and a second region (620) of the image, each of the regions being defined by corresponding detected text lines, with the second region of the document being located outside the first region of the image. The method determines a (first) dewarp function for the first region and modifies the image of the document by applying the first dewarp function to the first region to remove warping of the detected text lines of the first region. The image of the document is then modified (640) by applying a (second) dewarp function of the second region (630) to a point in the second region of the image and applying an offset to the point in the second region by applying the first dewarp function to a reference point of the second region of the image of the document. (Fig. 1)
Description
DEWARPING FROM MULTIPLE TEXT COLUMNS
TECHNICAL FIELD
[001] The current invention relates to image processing and, in particular, to processing an input image to detect a warp in the input image and produce a modified image.
BACKGROUND
[002] The proliferation of imaging technology combined with ever increasing computational processing power has led to many advances in the area of document analysis. However, many images of documents are distorted by perspective and warp. This can occur for example where the image is scanned from a document formed by a book with a large spine, or the image of the document is captured by a camera. The presence of perspective and warp complicates the problem of detecting text lines, columns, paragraphs and other document structures in the document image. Also, Optical Character Recognition (OCR) may perform better when the perspective and warp are removed. Storing the document image without perspective and warp may be preferable.
[003] There are previous methods of correcting for warp and perspective that use the detected outline of the page in the image. However, the outline of the page may not always be completely visible in the image. Some previous methods use the detected text lines in the image to correct the warp and perspective, but have trouble correcting the warp and perspective if the detected text lines are not already close to horizontal. Other previous methods divide the image into a number of zones, and use dominant detected text lines in each zone to remove the perspective and warp from the image. The zone based method may not produce good results when the zones are selected suboptimally, or the dominant text lines in a zone are not of similar length. Alternatively, there are methods that rely on the texture patterns that text characters make in an image. The texture based methods are sometimes computationally expensive, may be confused by non-text structure in the image, and may produce suboptimal results when text effects such as italic are present.
SUMMARY
[004] According to one aspect of the present disclosure there is provided a method of modifying an image of a document, the method comprising: forming a first region of the image and a second region of the image, each of the regions being defined by corresponding detected text lines, with the second region of the document being located outside the first region of the image; determining a (first) dewarp function for the first region and modifying the image of the document by applying the first dewarp function to the first region to remove warping of the detected text lines of the first region; and modifying the image of the document by applying a (second) dewarp function of the second region to a point in the second region of the image and applying an offset to the point in the second region by applying the first dewarp function to a reference point of the second region of the image of the document.
[005] Desirably the method comprises: (a) forming a first region from text lines of the image; (b) forming at least one second region from text lines of the image; (c) determining an offset of the second region from the first region; and (d) modifying the image by applying a dewarp function of the second region and applying the offset to the dewarped second region.
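Steps (a) to (d) above can be sketched in Python. This is an illustrative sketch only: the names `dewarp_fn_1`, `dewarp_fn_2` and `reference_point` are hypothetical, and the two lambdas are toy stand-ins for the region dewarp functions the method determines.

```python
def dewarp_second_region(points, dewarp_fn_2, dewarp_fn_1, reference_point):
    """Apply the second region's dewarp to its points (step (d)), with an
    offset derived by passing a reference point of the second region
    through the first region's dewarp (step (c))."""
    ax, ay = dewarp_fn_1(reference_point)   # where region one's dewarp puts it
    bx, by = dewarp_fn_2(reference_point)   # where region two's dewarp puts it
    dx, dy = ax - bx, ay - by               # offset keeping the regions aligned
    out = []
    for p in points:
        x, y = dewarp_fn_2(p)               # per-region dewarp of each point
        out.append((x + dx, y + dy))        # shift alongside the first region
    return out

# Toy example: region one needs no correction, region two has a vertical shear.
f1 = lambda p: (p[0], p[1])
f2 = lambda p: (p[0], p[1] - 0.1 * p[0])
corrected = dewarp_second_region([(10, 21), (20, 22)], f2, f1, (10, 21))
```

The offset ties the dewarped second region to the coordinate frame established by the first region's dewarp, so the two regions remain correctly positioned relative to one another in the modified image.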
[006] In a further implementation the method comprises: (a) forming a first region from text lines of the image; (b) forming at least one second region from text lines of the image; (c) determining the offset based on a point in the second region relative to a point in the first region; (d) applying the first dewarp function to the corners of the second region to determine the second dewarp function; and (e) modifying the image by dewarping the second region with the second dewarp function and applying the offset to the second dewarped region.
[007] Preferably a dewarped region forms a fixed region and steps (c) and (d) are performed on a warped region adjacent to a fixed region forming part of a set of fixed regions of the image.
[008] Typically the offset applied to the point in the second region moves the second region to prevent the first region from overlapping the second region when the dewarp function for the first region is applied to the first region.
[009] In one example, an orientation of text in the first region is different to an orientation of text in the second region.
[0010] In another example, the first region is a dominant region and is selected using projection peaks of text line endpoints based on a projection angle to form a straight line fitted to the corresponding endpoints. Preferably, text lines associated with endpoints fitted to a straight line are removed from a set of text lines of the image such that further regions can be formed from text lines remaining in the set.
[0011] Desirably, the region of the document is defined by a first curve defined by a detected text line and a second curve defined by a detected text line. The region of the document may be defined by a first straight line fitted through a set of text line endpoints and a second straight line fitted through a set of text line endpoints.
[0012] In another implementation, the first region is selected based on a number of text line endpoints used for straight line fitting for region one being larger than a number of text line endpoints used for straight line fitting for region two.
[0013] In a further example, a rotation is defined for region two based on region one if region two is determined to be rotated relative to region one.
[0014] In a further implementation, the second region is determined to be rotated relative to the first region when a vanishing point vector of the second region defined at the reference point differs more in angle from a parallel vector of the first region at the reference point than a vanishing point vector of the first region at the reference point does.
[0015] Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] At least one embodiment of the invention will now be described with reference to the following drawings, in which:
[0017] Fig. 1 shows a context diagram of a system for image dewarping;
[0018] Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practised;
[0019] Fig. 3 is a schematic flow diagram of a data processing architecture for a method of image dewarping according to the present disclosure;
[0020] Fig. 4 is a schematic flow diagram illustrating a method of detecting text lines in images of documents with warp and perspective as used in the method of Fig. 3;
[0021] Fig. 5 is a schematic flow diagram illustrating a method of detecting text lines and column gaps in images of documents with warp and perspective as used in the method of Fig. 4;
[0022] Fig. 6 is a schematic flow diagram illustrating a method of modifying an image from two detected text line regions as used in the method of Fig. 3;
[0023] Fig. 7 is a schematic flow diagram illustrating a method of finding projection profile peaks to form the straight edges of text line regions as used in the method of Fig. 6;
[0024] Fig. 8 is a schematic flow diagram illustrating a method of forming regions from straight edges and detected text lines as used in the method of Fig. 7;
[0025] Fig. 9 is a schematic flow diagram illustrating a method of adjusting multiple detected regions to dewarp an image as used in the method of Fig. 3;
[0026] Fig. 10 is a schematic flow diagram illustrating a method of adjusting a pair of detected regions as used in the method of Fig. 9;
[0027] Fig. 11 is a schematic flow diagram illustrating a method of determining whether two detected regions have perpendicular text direction as used in the method of Fig. 9;
[0028] Fig. 12A shows an example document image input;
[0029] Fig. 12B shows a representation of connected components formed from the document image of Fig. 12A;
[0030] Fig. 13A shows a Delaunay triangulation of the centrepoints of the ellipses representing the connected components;
[0031] Fig. 13B shows the resulting text line fragments created using the Delaunay triangulation in Fig. 13A;
[0032] Fig. 14A shows the formation of anchor points and triangles from centres of CCs;
[0033] Fig. 14B shows the formation of anchor points and triangles based on an aspect ratio of the text line fragment;
[0034] Fig. 14C shows the calculation of the aspect ratio from a set of CCs;
[0035] Fig. 15 shows a Delaunay triangulation of the anchor points from the text line fragments shown in Fig. 13B;
[0036] Fig. 16A shows the text line fragments from Fig. 13B with the Crust;
[0037] Fig. 16B shows the text lines created using the text line fragments, the Crust and the triangulation from Fig. 15;
[0038] Fig. 17A shows a Delaunay triangulation of the anchor points from the text line fragments shown in Fig. 13B with the column gap triangles highlighted;
[0039] Fig. 17B shows the created text lines from Fig. 16B which have been split using the column gap triangles in Fig. 17A;
[0040] Fig. 18 shows an example of a projection profile of detected line endpoints for a given angle;
[0041] Fig. 19A shows the detected text lines from Fig. 17B where cubic splines have been fitted through the connected components comprising each detected text line;
[0042] Fig. 19B shows the detected regions from the cubic splines from Fig. 19A and the projection profile of the detected text line endpoints;
[0043] Fig. 20A illustrates a method to dewarp a point using a detected region;
[0044] Fig. 20B shows the dewarped region from Fig. 20A and example dewarped points;
[0045] Fig. 21A shows the dewarped text line fragments from Fig. 13B;
[0046] Fig. 21B shows the dewarped connected components from Fig. 12B;
[0047] Fig. 21C shows the dewarped pixels from Fig. 12A;
[0048] Fig. 22A shows an example image with text lines in different directions;
[0049] Fig. 22B shows the detected text line cubic splines from Fig. 22A;
[0050] Fig. 22C shows the detected regions from the cubic splines from Fig. 22B and the projection profile of the detected text line endpoints;
[0051] Fig. 23A shows the dewarped text line fragments for the image in Fig. 22A;
[0052] Fig. 23B shows the dewarped connected components for the image in Fig. 22A; and
[0053] Fig. 23C shows the dewarped pixels from Fig. 22A.
DETAILED DESCRIPTION INCLUDING BEST MODE
Context
[0054] Fig. 1 schematically depicts an image modification system 100 for detecting and correcting a warp in a document image. The image modification system 100 processes an image 111 of an input document to produce an electronic document 160. The electronic document 160 can be further processed, for example by detecting complete paragraphs, determining paragraph reading order and/or performing optical character recognition (OCR), and/or being edited in a word processing application.
[0055] The image 111 may be produced by any of a number of sources, such as by a scanner 120 scanning a hardcopy document 110a, by retrieval from a data storage system 130 such as a hard disk having a database of images stored on the hard disk, or by digital photography of a hardcopy document 110b using a camera 140. These are merely examples of how the warped image 111 might be provided. As another example, the image 111 could be created by a software application as an extension of printing functionality of the software application.
[0056] The input image 111 is processed 150 to modify the image to remove the warp. One way of modifying the image is to use a process where text lines made from Connected Components (CCs) are grouped into multiple regions. The individual regions can be used to dewarp points, which are then adjusted to fit the final modified image, and reproduced to a user as the electronic document 160 using a display device 162 associated with a computing apparatus 164. The electronic document 160 can then be further processed as described above.
[0057] The process 150 for modifying an image of a document involves detecting regions defined by detected text lines and can be carried out on a computer processing arrangement configured to examine the image components to extract or at least delineate the text lines from other image content. Figs. 2A and 2B depict a general-purpose computer system 200, upon which the various arrangements described can be practised.
[0058] As seen in Fig. 2A, the computer system 200 includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, and a microphone 280; and output devices including a printer 215, a display device 214 and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220.
[0059] The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 211 may comprise an Ethernet circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211.
[0060] The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 200.
[0061] The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
[0062] In the specifically detailed arrangement of Fig. 2A, the computer module 201 may represent the computer apparatus 164 upon which the process 150 may be performed, with the display 214 representing the display 162. The camera 227 and scanner 226 may be representative of the camera 140 and scanner 120 respectively, whereas the storage 130 of Fig. 1 may be functionally implemented by any one or more of the disk medium 225, the HDD 210, or the networks 220 and 222.
[0063] The methods of image modification to detect and correct a warp in an image may be implemented using the computer system 200 wherein the processes of Figs. 3-11 to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the steps of the method of text line detection are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the detection methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
[0064] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing methods to detect and correct for warp and perspective in an image.
[0065] The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 200 preferably effects an apparatus for detecting and correcting a warp in an image.
[0066] In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[0067] The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
[0068] Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
[0069] When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
[0070] The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 200 of Fig. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
[0071] As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244 - 246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
[0072] The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
[0073] In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
[0074] The disclosed column gap detection arrangements use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The arrangements produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
[0075] Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises: (i) a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230; (ii) a decode operation in which the control unit 239 determines which instruction has been fetched; and (iii) an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
[0076] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
[0077] Each step or sub-process in the processes of Figs. 3-11 is associated with one or more segments of the program 233 and is performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
[0078] The methods of Figs. 3-11 may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions to be described. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories. Such implementations may be particularly suited to implementations where the gap detection process 150 is integrally formed within the scanner 120, or other similar device, including a photocopier or multifunction device (not illustrated), each of which includes a scanning function. Alternatively the presently described functionality may be incorporated into the camera 140/227 subject to available processing capacity.
Overview
[0079] Fig. 3 illustrates an overview process 300 for modifying an image of a document in further detail than described at process 150 of Fig. 1. The process 300 is preferably implemented as a software application for example stored on the HDD 210 and executable by the processor 205 in concert with the memory 206, for example as a whole or part of the application program 233. An input image 310 is received from an image source such as the hand-held camera 227 or the scanner 226 as described at image input step 111 of Fig. 1. The image 310 may be stored to either or both of the memory 206 and HDD 210. Next the processor 205 applies colour quantisation 320 to the image 310, for example by threshold binarisation to form a quantised image 325 in the memory 206. Other colour quantisations may be used, including quantisation to a fixed number of colours, or a variable number of colours based on a colour distance metric. In a next step 330, the processor 205 processes the quantised image 325 to form Connected Components (CCs) 335 by merging together adjacent pixels with the same quantised colour. Connected Components 335 are then classified at step 340 by the processor 205 to produce, at least, a list or group of text (classified) connected components (text CCs) 345. An example classifier operates to classify all CCs in the image as text CCs. This may be appropriate where the input image 310 contains, for example, black text on a plain white background. However it is advantageous and sometimes necessary to classify some CCs as non-text, such as the document image background, line art and pictures. An example classifier can use CC size features to classify some CCs as non-text. Accordingly step 340 may be implemented to produce multiple classifications. Following step 340, all steps described operate only upon text CCs 345. Accordingly, any reference simply to CCs in such steps should be construed as a reference to text CCs, unless otherwise expressly noted.
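Steps 320 and 330 can be sketched as follows. This is a minimal illustration assuming the simplest case described above (a single binarisation threshold, ink pixels only, 4-connectivity); the function names and threshold value are illustrative, not from the disclosure.

```python
from collections import deque

def binarise(gray, threshold=128):
    """Quantise a greyscale image (list of rows) to two colours:
    1 = ink (dark), 0 = background (light)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def connected_components(binary):
    """Merge 4-adjacent ink pixels into connected components, each
    returned as a list of (row, col) positions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])   # breadth-first flood fill
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components
```

A quantisation to more than two colours would instead group adjacent pixels sharing the same quantised colour label, but the flood-fill structure is the same.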
[0080] A process 350 is then performed by the processor 205 to detect text lines (i.e. lines of text as might appear in a document as opposed to line portions of individual text characters) by merging together text CCs to form text line fragments and then merging the text line fragments to form text lines 355. A process 360 is then implemented by the processor 205 to firstly detect 362 regions 365 in the image by using the text lines 355 detected at step 350. The detected regions 365 are then further used in step 360 to modify 367 the image of the document by applying a corresponding dewarp function to each of the detected regions 365 to form a dewarped output 369.
[0081] At step 370 the image modified by process 360 is further processed by the processor 205, for example for display to the user on the display device 214 or for use in a paragraph forming step or OCR step as described above and operating upon the electronic document 160 of Fig. 1.
[0082] CCs in general and text CCs in particular are formed as groups of pixels possessing like properties, and therefore may take arbitrary shapes (e.g. font characters). In a processing environment, it is convenient however to represent each CC as an ellipse, having a height and width, forming an aspect ratio, and sometimes qualified by an angle of inclination to some reference axis, generally defined by axes of the ellipse. The ellipses are centred over the midpoint of a bounding box of the pixels of the CC. In this fashion, interrelationships between different CCs can be readily established and compared without need to consider the differences between individual complex CC shapes. In the processing to be described, particularly that following step 340, CCs are considered as represented by ellipses. One way to choose an ellipse to represent a CC is to calculate the width, length and principal axis using Image Moments. Image moments can be calculated by summing pixel values weighted by position.
The zero order image moment is the number of pixels in the set, which is the area of the CC. The one-zero and the zero-one order image moments are the sum of a set of pixels weighted by x-position and y-position respectively. Dividing the one-zero and zero-one image moments by the zero order image moment gives the x-position and y-position respectively for the centroid, or centre of mass, of a set of pixels. The image moments that are useful for calculating the aspect ratio of a set of pixels are the central moments, which are calculated by summing pixels weighted by position offset by the centroid. The covariance matrix of a set of pixels can be constructed by placing the two-zero central moment and the zero-two central moment on the diagonal, and the one-one central moment on both off-diagonal positions. All four elements of the matrix are divided by the zero order image moment to give the covariance matrix. The eigenvectors of the covariance matrix correspond to the major (long) and minor (short) axes of the set of pixels, and the eigenvalues of the covariance matrix are proportional to the squared lengths of the eigenvector axes. One way to calculate the principal axis of a set of pixels is to take the angle of the eigenvector associated with the larger of the eigenvalues. One way to calculate the width and height of a CC from the eigenvalues of the covariance matrix is to scale the square root of each eigenvalue by a chosen constant. For example in Fig. 12B, the ellipses used to represent the CCs from Fig. 12A have the square root of each eigenvalue scaled by 1.7 to produce the width and height for each CC. This value is chosen so that the ellipses look representative of the CCs for which they were calculated.
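The moment arithmetic above can be sketched in Python with NumPy (an illustrative implementation; the 1.7 scale is the example value from the text, and the function name is ours):

```python
import numpy as np

def ellipse_from_pixels(pixels, scale=1.7):
    """Represent a CC by an ellipse computed from image moments.

    pixels: (N, 2) array of (x, y) pixel coordinates of the CC.
    Returns (centroid, width, height, angle): width/height are the scaled
    square roots of the covariance eigenvalues and angle is the principal
    axis in radians.
    """
    pts = np.asarray(pixels, dtype=float)
    m00 = len(pts)                          # zero order moment: pixel count
    centroid = pts.mean(axis=0)             # (m10 / m00, m01 / m00)
    d = pts - centroid                      # offsets for the central moments
    mu20 = np.sum(d[:, 0] ** 2) / m00       # second order central moments
    mu02 = np.sum(d[:, 1] ** 2) / m00
    mu11 = np.sum(d[:, 0] * d[:, 1]) / m00
    cov = np.array([[mu20, mu11], [mu11, mu02]])
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)       # guard tiny negative round-off
    width = scale * np.sqrt(evals[1])       # major axis, scaled by 1.7
    height = scale * np.sqrt(evals[0])      # minor axis
    angle = np.arctan2(evecs[1, 1], evecs[0, 1])   # principal axis angle
    return centroid, width, height, angle
```

For a thin horizontal run of pixels the major axis dominates and the minor axis collapses towards zero, matching the elongated ellipses of Fig. 12B.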
First Implementation

[0083] Fig. 4 illustrates an exemplary method 400 which operates to detect text lines as shown by process 350. At an initial triangulation step 410, the centrepoints of ellipses for all text classified CCs received from CC classification step 340 are triangulated to form edges between the ellipse centrepoints. A suitable point set triangulation is the Delaunay triangulation, which has the property that the circumcircle for each triangle (made up of three edges) contains no other point in the triangulation. Other point set triangulations or nearest neighbour methods can alternatively be used to create edges between text classified CCs.
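As a sketch, the triangulation of step 410 might be performed with SciPy's Qhull-based Delaunay routine (the source does not prescribe a particular library; `triangulate_centrepoints` is a hypothetical helper):

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_centrepoints(centres):
    """Delaunay-triangulate CC ellipse centrepoints and return the edge set.

    centres: (N, 2) array of ellipse centre coordinates.
    Returns a set of (i, j) index pairs (i < j), one per triangulation edge.
    """
    tri = Delaunay(np.asarray(centres, dtype=float))
    edges = set()
    for simplex in tri.simplices:          # each simplex is a triangle (i, j, k)
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = sorted((int(simplex[a]), int(simplex[b])))
            edges.add((i, j))
    return edges
```

Each returned pair indexes the two text CCs joined by a triangulation edge; these edges carry the interconnection features used by the merging step that follows.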
[0084] An example input document image 1210 is shown in Fig. 12A having input pixels formed as characters. In the example of Fig. 12A, the image 1210 is of a two-column document where the image displays some amount of skew and warping, and a gap 1212 between two columns 1216 and 1218 of text.
[0085] The corresponding text CC representation 1220 of the image 1210 is shown in Fig. 12B where individual text CCs are represented by ellipses 1221. In this example it will be appreciated that characters 1214 of Fig. 12A are represented by ellipses 1224 in Fig. 12B. The CCs in Fig. 12B, and others in this patent specification, are represented by ellipses with the major and minor axes corresponding to the second order central moments of the CC and the angle of the ellipse corresponding to the principal axis for the CC. Fig. 13A shows the corresponding Delaunay triangulation 1310 of the centrepoints of the ellipses representing the CCs of 1220 in Fig. 12B.
[0086] At step 420, neighbouring text CCs are merged together based on interconnection features between the corresponding ellipses and the triangulation edges created in step 410. Many such features can be developed and used. Some simple examples of such interconnection features include:
- the distance between pairs of points, such as points joined by an edge in the triangulation,
- the relative angle between pairs of points joined by an edge in the triangulation and other edges, such as edges forming axes of CCs or triangulation edges,
- the relative distance between pairs of points joined by an edge in the triangulation and other edge connected points, and
- the relative sizes of the text CCs connected by an edge in the triangulation,
to mention but a few. The decision to merge together a pair of text CCs, or not, can be based on heuristics and the interconnection features, or a machine learned classifier such as a Support Vector Machine (SVM).
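A heuristic merge decision on one triangulation edge can be sketched as below; the features mirror those listed above, but the threshold values here are invented for illustration and are not taken from the source:

```python
import math

def should_merge(p, q, size_p, size_q,
                 max_dist_ratio=2.5, max_size_ratio=3.0):
    """Illustrative merge heuristic for one triangulation edge (p, q).

    p, q: centre points of two text CCs joined by an edge; size_p, size_q:
    their ellipse widths.  The two threshold ratios are invented example
    values.  Merge when the edge is short relative to the CC sizes and
    the CCs are of comparable size.
    """
    dist = math.hypot(q[0] - p[0], q[1] - p[1])
    mean_size = 0.5 * (size_p + size_q)
    close = dist <= max_dist_ratio * mean_size          # distance feature
    similar = max(size_p, size_q) <= max_size_ratio * min(size_p, size_q)
    return close and similar
```

A trained classifier such as an SVM would replace the hand-set thresholds with a decision boundary learned over the same feature vector.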
[0087] At step 430, neighbouring text CCs which were merged together at merging step 420 are formed into text line fragments 435. A text line fragment can contain one or more text CCs, and can represent any portion of a text line, from a sub-character (such as only some of the CCs that make up a Chinese character) to a whole text line. Text CCs which were not merged with any other text CC at merging step 420 can nevertheless form a single CC text line fragment.
[0088] At step 440 the aspect ratio 445 of the text line fragments 435 is calculated. One way to determine the aspect ratio of a text line fragment is to calculate the second order image moments using the pixels of all CCs contained in a text line fragment. Image moments can be calculated by summing pixel values weighted by position in the text line fragment. The zero order image moment is the number of pixels in the set, which is the area of a CC or group of CCs. The one-zero and the zero-one order image moments are the sum of a set of pixels weighted by x-position and y-position respectively. Dividing the one-zero and zero-one image moments by the zero order image moment gives the x-position and y-position respectively for the centroid, or centre of mass, of a set of pixels. The image moments that are useful for calculating the aspect ratio of a set of pixels are the central moments, which are calculated by summing pixels weighted by position offset by the centroid. The covariance matrix of a set of pixels can be constructed by placing the two-zero central moment and the zero-two central moment on the diagonal, and the one-one central moment on both off-diagonal positions. All four elements of the matrix are divided by the zero order image moment to give the covariance matrix. The eigenvectors of the covariance matrix correspond to the major (long) and minor (short) axes of the set of pixels, and the eigenvalues of the covariance matrix are proportional to the squared lengths of the eigenvector axes. One way to calculate the principal axis of a set of pixels is to take the angle of the eigenvector associated with the larger of the eigenvalues. The aspect ratio of a set of pixels can be calculated by taking the square root of the larger eigenvalue divided by the square root of the smaller eigenvalue. If the eigenvalues are equal then the aspect ratio is one.
One way to calculate the length and width of a text line fragment is to scale the square root of each eigenvalue by a chosen amount, for example 1.7.
[0089] Another way to calculate the aspect ratio 445 for a set of pixels is to calculate the smallest rotated rectangle that contains all the pixels, and use the ratio of the width and height of the rotated rectangle as the aspect ratio for the set of pixels. The angle of the smallest rotated rectangle that contains all the pixels can be used as the principal axis. Another way to approximate the aspect ratio of a text line fragment is to count the number of contained CCs, since for some languages many text CCs have similar size. Another way to calculate the principal axis is to fit a straight line through the centre points of the CCs that make up the text line fragment, and calculate the angle of the fitted line. For example, Fig. 14C shows four CCs represented as ellipses 1441, along with the length 1442 and thickness 1443 calculated from the image moments. The ellipse 1444 is defined by the length and thickness and the centre of mass of the four CCs. The aspect ratio can be found by dividing the length 1442 by the thickness 1443. In the example shown, the aspect ratio is approximately four.
[0090] The result of step 440 is an aspect ratio 445 for each text line fragment 435. These values may be stored in the memory 234 as part of the intermediate variables 258.
[0091] At step 450, one or more, and typically many, text line fragments 435 are split apart. This allows the text line fragments 435 to better represent a text line when there is a large amount of warp or curve in the text line in the image. Each text line fragment that has an aspect ratio over a given threshold (for example, an aspect ratio of 5) is split by creating two new text line fragments, each containing a subset of the connected components from the original text line fragment. Each of the two new text line fragments can then also be split apart if its aspect ratio is over the threshold. Sometimes a text line fragment cannot be split apart because there are insufficient connected components contained in the text line fragment to create two new text line fragments. In this case, the text line fragment is left unsplit.
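The recursive split might be sketched as follows. As a simplification, the aspect ratio here is computed from the CC centre points rather than from the pixels of all contained CCs as the text describes, and the threshold of 5 is the example value:

```python
import numpy as np

def aspect_ratio(points):
    """Aspect ratio of a point set: sqrt of the larger covariance
    eigenvalue over sqrt of the smaller.  A simplification: the source
    computes moments over CC pixels; here we use CC centre points."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return 1.0                          # degenerate sets: treat as round
    evals = np.linalg.eigvalsh(np.cov(pts.T))
    minor = max(evals[0], 1e-6)             # guard exactly collinear centres
    return float(np.sqrt(evals[1] / minor))

def split_fragment(centres, threshold=5.0):
    """Recursively split a fragment (CC centres in reading order) while
    its aspect ratio exceeds the threshold (5 is the example value) and
    it still has enough CCs to form two new fragments."""
    if len(centres) < 2 or aspect_ratio(centres) <= threshold:
        return [centres]                    # unsplittable: leave whole
    mid = len(centres) // 2
    return (split_fragment(centres[:mid], threshold)
            + split_fragment(centres[mid:], threshold))
```

A long, nearly straight line of CC centres is thus divided repeatedly until each piece is short enough to track local warp.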
[0092] Text line fragments 435 can also be split apart if they are determined to contain connected components from more than one text line. This can be achieved by comparing the height and angle of the text line fragment to the neighbouring text line fragments, and the height of the text line fragment to the size of the contained connected components. If a text line fragment is determined to contain connected components from more than one text line, then it can be split apart by individually removing connected components until it is determined not to contain connected components from more than one text line, or can be reformed from the contained connected components into one or more text line fragments using a more conservative method.
[0093] The end result of step 450 is a revised set of text line fragments 455 typically containing many more fragments than originally determined in step 440.
[0094] At process 460, the text line fragments 455 are used to form text lines, by merging together text line fragments to create detected text lines 465. A detected text line 465 can contain one or more text line fragments 455. This process is described in more detail with respect to Fig. 5.
[0095] Fig. 5 illustrates an exemplary method 500 of detecting text lines from text line fragments. The method 500 is useful in performing the detection process 460 of Fig. 4 and is desirably implemented as a software sub-module of the application 233.
[0096] At step 510, multiple anchor points for each text line fragment are created by the processor 205. One way to create multiple anchor points for each text line fragment is to use the centre points of the CCs which make up the text line fragment. An example of this is seen in Fig. 14A for a number of CCs 1402 having corresponding centres 1404 and which make up two text line fragments 1406 and 1408. The centres 1404 form anchor points 1404 for the corresponding fragment. Connections 1410 between the anchor points 1404, both within the fragments 1406 and 1408 and across the fragments 1406 and 1408, create and enclose regions (e.g. triangles) 1412 defined by three of the anchor points 1404. Note that each vertex of each triangle 1412 forms part of a text line fragment.
[0097] Where the text is a script other than Latin, using the centre points of the CCs which make up the text line fragments may sometimes produce a less useful triangulation for detecting text column gaps and detecting text lines. Another way to create multiple anchor points on each text line fragment useful in such instances is to calculate the aspect ratio and principal axis of each text line fragment. This can be calculated using the method described at step 440 of Fig. 4. Once an aspect ratio and a principal axis are calculated for a text line fragment, multiple anchor points can be created along a line of best fit through the text line fragment. The line of best fit through the text line fragment can be calculated using the principal axis and the centre of mass of the text line fragment, or by fitting a straight line through the centre points of the CCs that make up the text line fragment, or by using the long axis midline of a rotated rectangle that contains all of the pixels of the text line fragment. Multiple anchor points can then be created on the line of best fit between the start point and end point of the text line fragment. The number of anchor points created can be the floor function applied to the aspect ratio, so for example if the aspect ratio of the text line fragment is 3.1, then 3 anchor points can be created for the text line fragment evenly spaced along the line of best fit of the text line fragment. If the aspect ratio of the text line fragment is close to one, then only one anchor point may be created for the text line fragment. Alternatively, if the aspect ratio of a text line fragment is large, but the length of the text line fragment is within a threshold of the width of neighbouring text line fragments, then only one anchor point may be created for the text line fragment.
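A sketch of anchor point creation along the line of best fit (NumPy is an assumed choice; the floor-of-aspect-ratio count follows the example above):

```python
import numpy as np

def anchor_points(centres):
    """Create anchor points along a fragment's line of best fit (step 510).

    centres: CC centre points of one text line fragment.  The line of best
    fit comes from the principal axis and centre of mass; the number of
    anchor points is the floor of the fragment's aspect ratio (at least
    one), evenly spaced between the fragment's extremes along the line.
    """
    pts = np.asarray(centres, dtype=float)
    if len(pts) < 2:
        return [tuple(p) for p in pts]
    mean = pts.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(pts.T))
    axis = evecs[:, 1]                             # principal (major) axis
    minor = max(evals[0], 1e-6)                    # guard degenerate fragments
    n = max(1, int(np.sqrt(evals[1] / minor) + 1e-9))  # floor of aspect ratio
    t = (pts - mean) @ axis                        # positions along the line
    lo, hi = t.min(), t.max()
    offsets = np.linspace(lo, hi, n) if n > 1 else [0.5 * (lo + hi)]
    return [tuple(mean + s * axis) for s in offsets]
```

A fragment with aspect ratio just over 3, for instance, yields three evenly spaced anchors along its fitted line, while a near-round fragment yields a single anchor at its centre of mass.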
[0098] At triangulation step 520, the anchor points created in step 510 for the text line fragments are triangulated to form connections between the anchor points created for the text line fragments. Triangles may be formed in the manner discussed above with respect to
Fig. 14A and the formation may thus depend upon the manner of creation of the anchor points. A suitable point set triangulation is the Delaunay triangulation, which has the property that the circumcircle for each triangle (enclosed by three connected vertices) contains no other point in the triangulation. Other point set triangulations or nearest neighbour methods can alternatively be used to form connections between anchor points. In the case where the anchor points of the text line fragments are the centre points of the CCs that make up the text line fragments, the triangulation at step 520 may be identical to the triangulation of step 410 used to create the text line fragments and hence can be used directly instead of creating a new triangulation.
[0099] At step 530, triangles are located within the triangulation of anchor points from step 520 which have vertices from three different or distinct text line fragments. Triangles which have vertices from three different text line fragments are called column gap triangles.
An example of this is shown in Fig. 14B where four text line fragments 1420-1426 define anchor points (not illustrated in this figure) from which a triangulation 1430 is formed. The triangulation in this case results in two triangles 1432 and 1434, each of which has each vertex formed from a corresponding and distinct text line fragment. The triangle 1432 is associated in this fashion with the line fragments 1420, 1424 and 1426, whereas the triangle 1434 is associated with the line fragments 1420, 1422 and 1426. Accordingly, the triangles 1432 and 1434 may be candidates for column gap triangles.
[00100] A pair of adjacent triangles is any pair of triangles that share a common edge.
By joining together all pairs of adjacent column gap triangles, a set of column gap triangles can be created that represents the gap between text columns. An example of a common edge is the edge 1439 shown in Fig. 14B and which is common to the triangles 1432 and 1434. Records of the anchor points, triangles and common edges as determined by the processor 205 can be stored in appropriate registers of the intermediate variables 258 discussed above.
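Locating column gap triangles and joining adjacent ones can be sketched independently of any triangulation library; triangles are index triples into the anchor points and `frag_of` maps each anchor point to its fragment (both names are ours):

```python
def column_gap_groups(triangles, frag_of):
    """Find column gap triangles (step 530) and join adjacent ones.

    triangles: list of (i, j, k) anchor-point index triples.
    frag_of:   dict mapping anchor-point index -> text line fragment id.
    A column gap triangle has vertices from three distinct fragments;
    adjacent gap triangles (sharing a common edge) are merged into one
    group representing a gap between text columns.
    """
    gaps = [t for t in triangles if len({frag_of[v] for v in t}) == 3]
    parent = list(range(len(gaps)))         # union-find over gap triangles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    edge_owner = {}
    for idx, (i, j, k) in enumerate(gaps):
        for e in (frozenset((i, j)), frozenset((j, k)), frozenset((i, k))):
            if e in edge_owner:             # shared edge: join the pair
                parent[find(idx)] = find(edge_owner[e])
            else:
                edge_owner[e] = idx
    groups = {}
    for idx, t in enumerate(gaps):
        groups.setdefault(find(idx), []).append(t)
    return list(groups.values())
```

In the Fig. 14B example, triangles 1432 and 1434 share the common edge 1439 and would be joined into a single column gap group.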
[00101] At step 540, the triangulation created at step 520 is used to form a Crust. The Crust is a sub-graph of a Delaunay triangulation, and is found by adding the circumcentre points for all triangles in a Delaunay triangulation to the original points from the first Delaunay triangulation and creating a second Delaunay triangulation from the expanded set of points.
The Crust is then formed by the edges of the second Delaunay triangulation where both endpoints are not circumcentre points. For example, Fig. 15 shows a Delaunay triangulation 1510 which can be used to create the crust shown in Fig. 16A and parts of the crust labelled as 1610.
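The Crust construction can be sketched as follows (SciPy's Delaunay is an assumed library choice; the circumcentre formula is the standard perpendicular-bisector intersection):

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcentre(a, b, c):
    """Circumcentre of triangle abc (perpendicular bisector intersection)."""
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return ux, uy

def crust_edges(points):
    """Crust (step 540): triangulate, add every triangle's circumcentre,
    re-triangulate, and keep edges whose endpoints are both original."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    first = Delaunay(pts)
    centres = [circumcentre(pts[i], pts[j], pts[k])
               for i, j, k in first.simplices]
    second = Delaunay(np.vstack([pts, centres]))
    crust = set()
    for simplex in second.simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = sorted((int(simplex[a]), int(simplex[b])))
            if j < n:                        # both endpoints original points
                crust.add((i, j))
    return crust
```

For points sampled densely along a curve, the added circumcentres lie near the medial axis and block cross-curve edges, so the surviving edges trace the curve itself, which is what makes the Crust a useful text line skeleton.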
[00102] Once the Crust has been calculated at step 540, the Crust can be used as a basis to form text lines at step 550. Initial text lines can be constructed where the Crust forms a continuous non-forking line. A non-forking line is one where each point in a Crust has only two edges connected. At step 560 the initial text lines can be extended from the endpoints of the initial text line based on the distance threshold and angle threshold of edges in the Delaunay triangulation. For example in Fig. 16A there are no crust links across gaps 1611 and 1612 between relatively adjacent text lines. However, the gap 1611 is small enough for the text line to link across, as shown in Fig. 16B by detected text line 1631. The gap 1612 is too large to extend a text line across, as shown by a detected gap in text lines 1632 in Fig. 16B.
[00103] At step 570 the text lines can be refined by breaking the text lines where they intersect with the detected gaps between text columns. This can be achieved by removing an edge from a text line where the edge is an edge from a triangle that resides in a joined set of column gap triangles, as created by step 530. Once the edge has been removed from the text line, two text lines are created in place of the extended text line. For example Fig. 16B shows detected text lines 1630 before they are broken by the detected column gap. Fig. 17A shows the detected column gap 1710. Fig. 17B shows the text lines 1630 after being broken by the column gap 1710 to produce detected text lines 1720, 1730 and 1740.
[00104] Once text lines have been detected they can have a curved representation created for them based on the text CCs that make up the detected text line. One way to do this is to rotate the text line by an angle formed by the start and endpoint of the text line and to fit one or more cubic splines through the centres of the text CCs that make up the text line, weighted by the number of pixels in each text CC. The rotation before cubic spline fitting is useful as splines are functions (each x-value has only one y-value), whereas the detected text lines may not be represented accurately as functions.
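One possible sketch of the rotate-then-fit step, using SciPy's smoothing splines (fitting a single spline and the chosen smoothing factor are our simplifications; the source allows one or more splines):

```python
import numpy as np
from scipy.interpolate import splrep, splev

def fit_text_line_curve(centres, weights):
    """Fit a weighted cubic spline to a detected text line.

    centres: CC centre points in reading order; weights: pixel count of
    each CC.  The points are rotated so the start-to-end direction is
    horizontal (a spline is single-valued in x), one smoothing spline is
    fitted, and a sampler returning original-image coordinates is built.
    """
    pts = np.asarray(centres, dtype=float)
    w = np.asarray(weights, dtype=float)
    angle = np.arctan2(pts[-1, 1] - pts[0, 1], pts[-1, 0] - pts[0, 0])
    ca, sa = np.cos(angle), np.sin(angle)
    rot = np.array([[ca, sa], [-sa, ca]])        # rotate by -angle
    rp = pts @ rot.T                             # baseline now left-to-right
    tck = splrep(rp[:, 0], rp[:, 1], w=w, k=3, s=len(pts))

    def sample(t):
        """t: positions along the rotated baseline -> original coordinates."""
        back = np.array([[ca, -sa], [sa, ca]])   # rotate back by +angle
        return np.column_stack([t, splev(t, tck)]) @ back.T

    return sample
```

Weighting by pixel count lets large characters pull the curve more strongly than punctuation or diacritics.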
[00105] Once text lines have been formed at step 570, step 460 concludes and processing control returns to the region formation and dewarp modification step 360 of Fig. 3.
[00106] Fig. 6 illustrates process 600 which is used to modify an image of a document as shown by process 360. The process 600 also may be implemented as a sub-module of the application program 233. At step 610 a first region is formed by the processor 205 from the detected text lines in the image. An example of step 610 is described in further detail in process 700 from Fig. 7. One method of forming a region from detected text lines is to search for straight lines that fit through the endpoints of the detected text lines. The strongest of these lines (determined for example using the approach of Fig. 18) can then be used to determine a second straight line (the other side of the column) which shares many of the same detected text lines. Two representative text lines can then be selected that span between the two straight lines, forming a region. The detected text lines used to form the first region are then removed from the set of detected text lines in the image, and the remaining detected text lines are used to form a second region from detected text lines in step 620. The second region is located outside of the first region and the two regions are separated by a non-zero distance.
[00107] At step 630, an offset of the second region from the first region is calculated. The offset is determined from a reference point associated with the second region. One such reference point is the point of the second region closest to the first region out of the two straight line endpoints. However, to maintain a larger distance between the two regions in the final modified image, choosing the second closest point to the first region out of the two straight line endpoints may be a better choice for the reference point.
[00108] The offset is determined by applying a dewarp function of the first region to the reference point associated with the second region. The dewarp function of a region is a function that, when applied to points inside or close to the region, removes much of the perspective and warp from the original image. For example, when a dewarp function is applied to points making up a line of text, the line of text will be straightened in the modified image. When the dewarp function is applied to points making up a paragraph of text, the endpoints of the lines of text in the paragraph will be straighter or more aligned. A dewarp function may have or produce a fixed point, which is where the dewarped point is the same as the input point. This may be the centre point of the region or one of the endpoints of the two straight lines of the region.
[00109] One way to create a dewarp function for a region is to create a function that, for a given input point, will return a dewarped point. One way to return a dewarped point for an input point is to test whether the input point is between the two straight lines (extended to infinity) of the region and to find an intersecting straight line that goes through the input point and intersects the first and second curves of the region at equal ratios along the length of each curve. In many cases finding this intersecting straight line requires iteration: an initial guess is created by calculating the vanishing point from the first and second straight lines of the region and drawing a line through the input point and the vanishing point; the ratios of the intersections with the curves of the region are then determined and the guess adjusted until the ratios of intersection with the curves are within a threshold of each other. Once this intersecting straight line is found, the dewarped point can be calculated from the ratio of the distance along the curves as above, the ratio of the distance along the intersecting line from the input point to the intersecting points of the two curves of the region, and the region's width, height and fixed point. For input points that are not between the two straight lines of the region, an intersecting line is found that goes through the input point and intersects extended lines from the endpoints of the nearest straight line of the region, with the slope of the curves of the region at those endpoints. The intersecting line is chosen so that the distances from the endpoints to the intersections with the extended lines are equal.
Once this intersecting line is found, the dewarped point can be calculated from the distance of the endpoints to the intersections relative to the model width, the ratio of the distance along the intersecting line from the input point to the intersecting points of the extended lines, and the model height and the fixed point of the region. In this fashion, dewarp functions corresponding to each of the regions of the document image may be determined. Once the dewarp function of the first region is applied to the point associated with the second region, the resulting dewarped point is used to define the offset of the second region from the first region. The offset of the second region from the first region is the dewarp function of the second region applied to the point associated with the second region minus the dewarp function of the first region applied to the point associated with the second region. The offset applied to the point in the second region effectively moves the second region. Moving the second region prevents the first region from overlapping the second region when the dewarp function for the first region is applied to the first region.
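The offset arithmetic can be sketched as below. Treating the offset as a vector subtracted from second-region dewarp results, so that the reference point lands where the first region's function would place it, is our reading of the sign convention; the function names are ours:

```python
def region_offset(dewarp1, dewarp2, ref):
    """Step 630: the dewarp function of the second region applied to the
    reference point, minus the dewarp function of the first region
    applied to the same point."""
    d1, d2 = dewarp1(ref), dewarp2(ref)
    return (d2[0] - d1[0], d2[1] - d1[1])

def place_second_region_point(p, dewarp2, offset):
    """Step 640: dewarp a second-region point, then shift it by the
    offset so the two dewarped regions do not overlap."""
    d = dewarp2(p)
    return (d[0] - offset[0], d[1] - offset[1])
```

With this convention the second region's reference point is placed exactly where the first region's dewarp function would map it, keeping the two dewarped regions consistently positioned relative to each other.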
[00110] Once the offset of the second region from the first region is calculated in step 630, the second region can be used to modify the image. At step 640 the processor 205 executes the dewarp function of the second region to dewarp an input point in the second region. The dewarped input point is then offset by the vector calculated in step 630 to produce the final location for the dewarped input point.
[00111] An example of dewarping a point using a dewarp function for a region is shown in Fig. 20A and Fig. 20B. In Fig. 20A, a region 2000 is shown with corner points 2010, 2011, 2012 and 2013. The region 2000 is defined by two straight lines (i.e. the straight line 2021 between points 2011 and 2012, and the straight line 2022 between points 2010 and 2013), and two curves (i.e. the curve 2020 between points 2010 and 2011, and the curve 2023 between points 2013 and 2012). A point to be dewarped using the dewarp function of the region 2000 is shown as point 2030. A straight line can be found between points 2031 and 2032 through point 2030 which intersects both of the curves 2020 and 2023 at equal ratios along their length. That is, the distance along the curve 2020 between points 2010-2031 divided by the distance along the curve 2020 between points 2010-2011 equals the distance along the curve 2023 between points 2013-2032 divided by the distance along the curve 2023 between points 2013-2012.
[00112] Fig. 20B shows the region 2000 from Fig. 20A in dewarped space as a rectangle 2060. Point 2050 is taken to be equal to point 2010. Point 2030 is then dewarped to point 2070 by adding the ratio of the distances between points 2010-2031 to 2010-2011, multiplied by the width of rectangle 2060 to the x-coordinate of point 2050, and adding the ratio of the distances between points 2031-2030 to 2031-2032, times the height of rectangle 2060 to the y-coordinate of point 2050. The same method is used to dewarp a point if the point is above curve 2020, except the distance 2030-2031 is counted as negative such that the dewarped point will have a smaller y-value than point 2050 (with origin at top left of image and increasing y down the image).
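The arithmetic of Figs. 20A and 20B reduces to two signed ratios; a sketch (the computation of the ratios along the curves is assumed to be done elsewhere, as described above):

```python
def dewarp_from_ratios(fixed, width, height, r_curve, r_line):
    """Map a point into the dewarped rectangle of Fig. 20B.

    fixed:   dewarped location of the region's top-left corner (point 2050).
    r_curve: ratio along the top curve to the intersecting line; negative
             for points beyond the region's nearer edge (as for point 2040).
    r_line:  ratio along the intersecting line from the top curve to the
             point; negative for points above the top curve.
    """
    return (fixed[0] + r_curve * width, fixed[1] + r_line * height)
```

The two negative cases in the text (left of the region, above the top curve) fall out naturally from allowing either ratio to be signed.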
[00113] When dewarping a point where the point is not between the straight edges of the region, for example the point 2040 in Fig. 20A with respect to the region 2000, straight lines are extended from the nearest corner point on each curve at the angle of the tangent to the curve at those corner points. The straight lines are then used in a similar manner to the previous example. For example, the closest corner points to point 2040 are point 2010 and point 2013, and straight lines 2043 and 2044 have been extended at the angle of the tangent to the curves 2020 and 2023 at point 2010 and point 2013 respectively. A straight line 2045 between points 2041-2042 can then be created such that the distances between points 2010-2041 and points 2013-2042 are equal. Alternatively, the straight line 2045 between points 2041-2042 can be created such that the ratio of the distance between points 2010-2041 to the distance between points 2013-2042 is equal to the ratio of the distance between points 2010-2011 to the distance between points 2013-2012. Point 2040 of Fig. 20A is then dewarped to point 2080 of Fig. 20B by subtracting the ratio of distance 2010-2041 to distance 2010-2011, times the width of rectangle 2060, from the x-coordinate of point 2050, and adding the ratio of distance 2041-2040 to distance 2041-2042, times the height of rectangle 2060, to the y-coordinate of point 2050.
The same method is used to dewarp a point if the point is above curve 2020, except the distance 2040-2041 is counted as negative such that the dewarped point will have a smaller y-value than point 2050 (with origin at top left of image and increasing y down the image).
[00114] Fig. 7 illustrates a process of finding projection peaks for detected text line endpoints and forming regions from detected text lines, used as a method 700 to form regions from text lines according to steps 610 and 620 of Fig. 6. At step 710, a list of angles is determined by the processor 205 to perform steps 720, 730 and 740. For example, the list of angles can contain each angle between positive 90 degrees and negative 90 degrees at increments of 1 degree. In this way, projection peaks of any angle can be detected to a resolution of approximately 1 degree. For a given rotation angle, the endpoints of all detected text lines in a set of detected text lines are rotated by the given angle at step 720. The rotation is performed relative to an arbitrary origin, which for example can be the top left corner of the image. At step 730 the x-coordinates of the rotated text line endpoints are used by the processor 205 to find the peaks of a projection profile of the text lines. One way of finding the projection peaks is to place a Gaussian function, having a width related to an average height of the text lines detected in the image (for example 3 times the average height of the text lines detected in the image), over each x-coordinate of the rotated text line endpoints. This forms a smoothed projection profile that can then be used to find peaks by looking for a point where the projection profile has smaller values on points either side of the given point. Another way of finding the projection peaks is to use the mean shift algorithm to estimate a mean, and adjust the mean based on surrounding x-coordinates until the adjustment is smaller than a threshold, at which point a peak is detected. This is discussed below with reference to Fig. 18.
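The Gaussian-profile variant of step 730 can be sketched as follows; taking sigma as 3 times the average text line height is one reading of "a width related to an average height", and the sampling step of the profile is an assumption:

```python
import numpy as np

def projection_peaks(endpoints, angle, avg_height, step=1.0):
    """Smoothed projection profile peak finding (step 730).

    endpoints:  (N, 2) text line endpoint coordinates.
    angle:      projection angle in radians.
    avg_height: average height of the detected text lines.
    A Gaussian is summed over each rotated x-coordinate and the local
    maxima of the sampled profile are returned, in ascending x order.
    """
    pts = np.asarray(endpoints, dtype=float)
    # Rotate the endpoints about the origin and keep the x-coordinates.
    xs = pts[:, 0] * np.cos(angle) - pts[:, 1] * np.sin(angle)
    sigma = 3.0 * avg_height       # "width related to the average height"
    grid = np.arange(xs.min() - 3 * sigma, xs.max() + 3 * sigma, step)
    profile = np.exp(-(grid[:, None] - xs[None, :]) ** 2
                     / (2.0 * sigma ** 2)).sum(axis=1)
    return [grid[i] for i in range(1, len(grid) - 1)
            if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]
```

Two well-separated clusters of endpoints, such as the two sides of a text column, produce two distinct maxima in the summed profile, as illustrated by peaks 1840 and 1850 in Fig. 18.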
[00115] At step 740 detected text lines can be collected into each detected peak based on a distance from the peak and a threshold based on the average height of the text lines detected in the image. Projection peaks which are close to each other (relative to the average height of the text lines detected in the image), or contain many of the same projected x-coordinates of text line endpoints can be merged into a single projection peak. At step 750, if there are additional angles in the angle list to process, then control returns to step 710 to process a new angle. Otherwise control proceeds to process 760. At process 760, the projection peaks and text lines are used to form a region, as further described by process 800 from Fig. 8.
[00116] An example of some projection peaks is shown in Fig. 18, where numerous detected text lines (for example 1810) are shown with endpoints (for example 1820). The projection angle is shown as angle 1830 and two Gaussian peaks for this projection angle are shown as 1840 and 1850. Each of the peaks 1840, 1850 has numerous text line endpoints contributing to the peak, which will have an approximately Gaussian shape when a Gaussian function is centred on each of the text line endpoint projections and all the Gaussian functions are summed together. In this fashion the peaks 1840 and 1850 represent the peaks of the summed Gaussian functions for the corresponding sets of text line endpoints.
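The mean shift alternative mentioned above might be sketched as follows (hypothetical helper; the bandwidth, tolerance and iteration cap are assumptions not stated in the text):

```python
import numpy as np

def mean_shift_peak(xs, start, bandwidth, tol=1e-3, max_iter=100):
    """Estimate a projection peak by mean shift: repeatedly average
    the projected x-coordinates within one bandwidth of the current
    mean, stopping once the adjustment is smaller than tol, at which
    point a peak is detected."""
    xs = np.asarray(xs, dtype=float)
    mean = float(start)
    for _ in range(max_iter):
        window = xs[np.abs(xs - mean) < bandwidth]
        if window.size == 0:
            break
        new_mean = float(window.mean())
        if abs(new_mean - mean) < tol:  # adjustment below threshold
            return new_mean
        mean = new_mean
    return mean
```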
[00117] Fig. 8 illustrates a preferred process 800 used in step 760 to form regions based on the detected text lines in an image. The projection peaks detected by process 700 from Fig. 7 are sorted by the strength of the projection peak at step 810. The strength of a projection peak is akin to magnitude and can be calculated by the processor 205 from the number of text line endpoints the peak contains, or alternatively from the value of the peak where each x-coordinate of the rotated text line endpoints is overlaid with a Gaussian function. A first peak is then selected by the processor 205 at step 820 as the strongest peak. A second peak is selected at step 830. The second peak is preferably selected by taking the strongest peak that is not the first peak and has many text lines in common with the first peak, but does not have many of the same endpoints in common with the first peak.
[00118] Once two projection peaks are selected, then a first text line can be found at step 840. The first text line can be found from a combined set of detected text lines associated with either the first projection peak or the second projection peak. This is done by the processor 205 fitting a first straight line through the endpoints of the first projection peak and fitting a second straight line through the endpoints of the second projection peak. The straight line fitting can be achieved using linear regression. From the combined set of detected text lines, a first text line is found that has endpoints close to or extending beyond the first and second straight lines from the projection peaks and has a point furthest from the origin when the nearest point to the first straight line is projected onto the first straight line (using a perpendicular projection). At step 850, a second text line is found from the combined set of detected text lines where the second text line has endpoints close to or extending beyond the first and second straight lines from the projection peaks and has a point furthest from the first text line when the nearest point to the first straight line is projected onto the first straight line (using a perpendicular projection).
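The "point furthest along the fitted straight line" test used in steps 840 and 850 can be sketched as follows (hypothetical helper; assumes the straight line is given by a point and a direction, for example from a linear regression fit as described):

```python
import numpy as np

def farthest_along_line(points, line_point, line_dir):
    """Perpendicularly project each point onto the line through
    line_point with direction line_dir, and return the point whose
    projection parameter t is largest, together with that t."""
    d = np.asarray(line_dir, dtype=float)
    d = d / np.linalg.norm(d)
    pts = np.asarray(points, dtype=float)
    t = (pts - np.asarray(line_point, dtype=float)) @ d
    i = int(np.argmax(t))
    return pts[i], float(t[i])
```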
[00119] At step 860, a region in the image is formed from the first and second text lines, and the curves of the first and second straight lines. If any of the text line endpoints are beyond the first or second straight line, the curve used to represent the text line is copied and a truncated version is used to represent the region, where the truncated version stops at the straight lines. If any of the text line endpoints do not extend to at least the first or second straight line, then the curve used to represent the text line is copied and extended to meet the straight line by extending the curve linearly from the endpoint and at the slope of the endpoint.
[00120] An example of the above process is shown in Fig. 19A and Fig. 19B. Fig. 19A shows the curves used to represent the detected text lines from the example of Fig. 17B. Some example curves are labelled 1910, 1915, 1920, and 1925. Note that some of the detected text lines are fragmented, and some of the detected text lines are incorrectly extended. By applying projection peak detection process 700 to the detected text line endpoints, various projection peaks can be detected. The strongest of the peaks correspond to the straight lines 1960 and 1965 from Fig. 19B, as more endpoints contribute to these projection peaks.
Curves 1950 and 1955 complete the corresponding region 1959 formed by the lines 1960 and 1965 and correspond to curves 1920 and 1925. Similarly, additional peaks can be found corresponding to straight lines 1940 and 1945, with curves 1930 and 1935 that complete the corresponding region 1939 corresponding to curves 1910 and 1915.
[00121] It will be appreciated from Fig. 19B that the various processes described above provide for the various text lines detected on a page to be each resolved into a corresponding region based upon a projection of the end of the text line to a line defining an edge or boundary of the region. The regions are representative of the columns of text on a page and thus the processes provide for dewarping an image of a page in the presence of multiple columns of page content. Expressed conversely, the methods described provide for resolving multiple columns of a page in the presence of warp of an original page image.
Second Implementation

[00122] Fig. 9 illustrates a process 900 which modifies an input image as shown by process 360 from Fig. 3, and which represents an alternative to the approach of Fig. 6 described above. At step 910 a first region is formed from the text lines in the image detected using the process 350. This can be performed using the process 700 from Fig. 7. In addition to the process described with reference to the method 800 from Fig. 8 (process 760), regions may be found with only one peak by creating a second straight line from the first straight line associated with the found first peak. A second straight line is created by offsetting the first straight line by a distance based on the lengths of the text lines that correspond to the found peak, and offset in such a way that the second straight line is parallel to the first straight line. The two curves for the region are then found as described previously using steps 840 and 850, and the region is formed using step 860 from Fig. 8.
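The parallel offset construction for the single-peak case might look like this (hypothetical helper; the offset distance would in practice be derived from the text line lengths as described):

```python
import numpy as np

def parallel_offset_line(p0, p1, distance):
    """Offset the straight line through p0 and p1 perpendicularly by
    `distance`, producing a parallel second straight line (returned
    as its two offset endpoints)."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    d = p1 - p0
    normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal
    return p0 + distance * normal, p1 + distance * normal
```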
[00123] Additionally, all regions are tested for whether either of the angles of their straight lines is within a threshold of vertical or horizontal and within a distance threshold of any edge of the image. If a straight line of a region meets the criteria for distance and angle, the straight line is considered an artefact of the edge of the image and is called an edge straight line. A replacement straight line is constructed for the region by offsetting the non-edge straight line of the region so that the replacement straight line touches the other end of the longer of the two curves that define the region and is parallel to the non-edge straight line. The shorter of the two curves that define the region is then extended in the direction of the tangent at its endpoint so that it intersects the replacement straight line.
[00124] At step 920 the text lines from the projection peaks used to form the first region are then removed from the set of detected text lines in the image, and process 700 is applied to the remaining text lines to form a second region from the detected text lines. The removal of text lines used to form a region and process 700 can be repeatedly used to form multiple regions, until there are not enough text lines remaining to construct a region. This is equivalent to detecting second, third, and so on, columns on the page image. If there are leftover text lines after this process, then each leftover text line can have an individual region constructed for it. Individual regions may be created only for text lines which have a length over a threshold. The two straight lines of the region can be constructed either by using the lines orthogonal to the endpoints of the text line or by using the straight lines from a nearby region. The two curves of the region are constructed by copying the curve fitted to the text line, offsetting one copy by a positive vector and the other by a negative vector, each with a magnitude of half the text line thickness and a direction that averages the directions of the two straight lines for the region.
[00125] Once all regions have been formed, the first region is designated a fixed region, and all other regions are designated unfixed. At step 930, all unfixed regions are processed in turn by steps 940, 950 and 960 to convert them into fixed regions. At step 940 a closest region to any fixed region is found. That is, for all fixed regions, the distance is tested to all unfixed regions, and the fixed region and unfixed region with the smallest distance are selected. The selected fixed region and closest unfixed region are used at step 950 to calculate the offset, scale and rotation for the closest region, using process 1000 from Fig. 10. The closest region can then be considered fixed and is then added to the set of fixed regions at step 960 and removed from the set of unfixed regions. At step 970 processing control returns to step 930 to find the closest region to any fixed region if there are any remaining regions in the set of unfixed regions. Otherwise processing control proceeds to step 980. At step 980, the set of fixed regions is used to modify the image. This step is similar to step 640 from Fig. 6, with the exceptions that there may be more than two regions and there may need to be adjustments made based on the scale and rotation parameters of the region. A point is dewarped to produce a modified image using the closest region. In the case where the closest region is formed from a single text line only, the closest non-single text line region can be used to dewarp the point if the point is further away from the single text line region than a threshold based on the thickness of the text line forming the single text line region.
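The fixed/unfixed bookkeeping of steps 930 to 970 amounts to a greedy ordering, which might be sketched as follows (hypothetical helper; `distance` stands in for whatever region-to-region distance is used, and the per-region offset, scale and rotation computation of step 950 is elided):

```python
def fix_regions(regions, distance):
    """Designate the first region fixed, then repeatedly find the
    unfixed region closest to any fixed region, (notionally) adjust
    it, and move it to the fixed set.  Returns the fixing order."""
    fixed = [regions[0]]
    unfixed = list(regions[1:])
    while unfixed:
        f, u = min(((f, u) for f in fixed for u in unfixed),
                   key=lambda pair: distance(pair[0], pair[1]))
        # step 950 would compute offset, scale and rotation of u from f here
        fixed.append(u)
        unfixed.remove(u)
    return fixed
```

With simple scalar "regions", the ordering favours whichever unfixed region lies nearest any already-fixed one.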
[00126] Fig. 10 illustrates process 1000 which can be implemented by software executable by the processor 205 to adjust the offset, scale and rotation of a second region, given a first region. At step 1010 a closest corner point of the second region to the first region is found. The corner points of a region are the set of endpoints of the two straight lines of the region. The corner points can be used as reference points. At step 1020 the second closest corner point to the first region is found. At step 1030 the distance from the first region to the midpoint of the first straight line of the second region is calculated. At step 1040 the distance from the first region to the midpoint of the second straight line of the second region is calculated. At decision step 1050, if the second closest reference point as calculated at step 1020 is on the straight line that is further from the first region as defined by the midpoint distances calculated in steps 1030 and 1040, then process 1060 operates to adjust the offset for the second region based on the closest reference point. Otherwise, the second closest reference point is on the straight line that is closest to the first region as defined by the midpoint distances, and process 1070 adjusts the offset for the second region based on the second closest reference point. Steps 1060 and 1070 each operate in the same fashion but with different input reference points, as determined at step 1050. Both steps 1060 and 1070 can be performed the same way as for step 630 from Fig. 6. The scale for the second region is then adjusted based on the first region at step 1080. This can be achieved by using the first region’s dewarp function to dewarp the four corner points of the second region in the manner described above with the first implementation. The distances between the four dewarped corner points from the second region can then be used to calculate the height and width of the second region in the dewarped space.
The dewarped space height can be found by averaging the distances between the two sets of top and bottom points, or by taking the distance between the top and bottom points closest to the first region. Similarly, the dewarped space width can be found by averaging the distances between the two sets of left and right points, or by taking the distance between the left and right points closest to the first region. The dewarped space width and height are then used to establish the dewarp function for the second region, which can then be applied to all points in the second region.
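The corner-based scale estimate above might be sketched as follows (hypothetical helper; assumes the four corner points are ordered top-left, top-right, bottom-right, bottom-left, and uses the averaging variant rather than the closest-points variant):

```python
import numpy as np

def dewarped_size(dewarp, corners):
    """Dewarp the four corner points of a region with the first
    region's dewarp function, then estimate the region's width and
    height in dewarped space by averaging opposite-side distances."""
    tl, tr, br, bl = (np.asarray(dewarp(c), dtype=float) for c in corners)
    height = (np.linalg.norm(tl - bl) + np.linalg.norm(tr - br)) / 2.0
    width = (np.linalg.norm(tl - tr) + np.linalg.norm(bl - br)) / 2.0
    return width, height
```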
[00127] At process 1090, the rotation of the second region with respect to the first region is calculated. This is explained further in process 1100 from Fig. 11.
[00128] Fig. 11 illustrates a preferred process 1100 executable by the processor 205 to determine whether two regions are formed from detected text lines which should have an orientation that is perpendicular to each other when the image is modified to remove warp and perspective. At step 1110, a region vector of the first region is found at a given point. This point is usually a point on the second region, such as the closest or second closest corner point of the second region to the first region. If the given point is outside the two straight lines of the region, then the region vector at that point is a vector that is along the closest straight line of the region. If the given point is inside the two straight lines of the region, then the region vector is the distance weighted average of the vectors along the two straight lines. At step 1120, a perpendicular region vector of the first region is found at a given point. This point is usually a point on the second region, such as the closest or second closest corner point of the second region to the first region. If the given point is outside the two straight lines of the region, the perpendicular region vector at that point is a vector that is a distance weighted average of the vectors tangent to the top and bottom curves of the region. If the given point is inside the two straight lines of the region, then the perpendicular region vector is the distance weighted average of the tangents to the curve where the region vector for the given point intersects the top and bottom curves of the region. At step 1130, a region vector of the second region is found at a given point. This point is usually a point on the second region, such as the closest or second closest corner point to the first region. The region vector is calculated in the same manner as in step 1110.
At decision step 1140, if the difference in angles between the region vector for the first region and the region vector for the second region is smaller than the difference in angles between the perpendicular region vector for the first region and the region vector for the second region, then no adjustment to the rotation angle is needed and process 1100 ends. Otherwise, the two regions are likely to be perpendicular to one another, and the second region should be rotated with respect to the first region when dewarped. In this case, at step 1150, the rotation angle of the second region is set to be 90 degrees with respect to the first region. This rotation angle is applied to all points dewarped using the dewarp function of the second region. At step 1160 the offset of the second region is again adjusted with respect to the first region, as due to the rotation, it will be different. The offset can be calculated in the same manner as step 630 from Fig. 6, with the exception that the rotation is applied to the reference point before the offset is calculated from the first region.
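The angle comparison of step 1140 might be sketched as follows (hypothetical helper; the region vectors and perpendicular region vector are assumed to have been computed as in steps 1110 to 1130):

```python
import numpy as np

def needs_rotation(region_vec1, perp_vec1, region_vec2):
    """Return True when the second region's vector is closer in angle
    to the first region's perpendicular vector than to its region
    vector, i.e. the regions are likely perpendicular and a 90 degree
    rotation should be applied when dewarping the second region."""
    def angle_between(a, b):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return float(np.arccos(np.clip(c, -1.0, 1.0)))
    return angle_between(region_vec1, region_vec2) > \
           angle_between(perp_vec1, region_vec2)
```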
Example(s)/Use Case(s)

[00129] Fig. 12A shows an example input image 1210 with two columns 1216 and 1218 of text separated by a text column gap 1212. The connected components generated for the image can be represented as ellipses as shown in Fig. 12B. The centre points of the connected components can be triangulated with a Delaunay triangulation as shown in Fig. 13A, and the connected components merged into text line fragments as represented by ellipses in Fig. 13B, with some example text line fragments labelled 1320. Multiple anchor points can then be created for the text line fragments, and these can be triangulated, as shown in Fig. 15. Column gap triangles can be determined from the triangulation 1510, shown shaded in Fig. 17A as 1710. The triangulation 1510 and the column gap triangles 1710 can be used to detect text lines, as shown in Fig. 17B, with some example detected text lines shown as 1720, 1730 and 1740. Curves, such as rotated cubic splines, can be fitted through the connected components that make up each text line, and are shown in Fig. 19A. From the curves in Fig. 19A and the text line endpoints, multiple regions can be found, as shown in Fig. 19B. The regions from Fig. 19B can be used to dewarp points in the original image, creating a modified image. The more dominant region, defined by lines 1950, 1955, 1960, 1965, is used to define the offset for the region defined by lines 1930, 1935, 1940, 1945. When dewarping a point, the nearest region is used. For example, Fig. 21A shows the dewarped endpoints of the text line fragments from Fig. 13B. Fig. 21B shows the dewarped points of the convex hulls of the connected components for image 1210 from Fig. 12A. Fig. 21C shows the dewarped pixels of image 1210 from Fig. 12A.
[00130] Fig. 22A shows an example input image where there are different text line directions. Fig. 22B shows the curves created for the detected text lines from Fig. 22A.
Fig. 22C shows the detected regions 2210 and 2220 from the detected text lines and curves from Fig. 22B. The more dominant region 2210 is used to determine the offset for region 2220. Also, region 2220 is determined to be perpendicular to region 2210, and so a 90 degree rotation is applied when region 2220 is used to dewarp a point. Fig. 23A shows the dewarped text line fragments from the image in Fig. 22A. Note that the dominant region 2210 has been dewarped so that the text lines are horizontal, and that the text lines from region 2220 have been dewarped to still be perpendicular to the text lines from 2210. Fig. 23B shows the dewarped convex hulls for the connected components from the image in Fig. 22A. Fig. 23C shows the dewarped pixels for the image in Fig. 22A.
[00131] The first and second implementations describe respective ways of achieving the same result. By way of brief summary:
In the first implementation, the image is modified by: (i) determining the first and second dewarp functions; (ii) determining the offset; (iii) applying the first dewarp function to the first region; and (iv) applying the second dewarp function and then the offset to the second region;
In the second implementation, the image is modified by: (i) determining the first dewarp function; (ii) applying the first dewarp function to the corners of the second region to determine the second dewarp function; (iii) determining the offset; and (iv) dewarping the first region with the first dewarp function, and dewarping the second region with the second dewarp function and applying the offset to the second dewarped region.
[00132] From this, it will be appreciated that the various steps can be performed in mixed orders to achieve the same result. For example, the offset may be applied to the second region before or after dewarping of that region.
INDUSTRIAL APPLICABILITY
[00133] The arrangements described are applicable to the computer and data processing industries, and particularly to the imaging of text, especially when formed as columns or in regions, and in the presence of warp.
[00134] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
[00135] (Australia Only) In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises” have correspondingly varied meanings.
Claims (19)
- CLAIMS: The claims defining the invention are as follows: 1. A method of modifying an image of a document, the method comprising: forming a first region of the image and a second region of the image, each of the regions being defined by corresponding detected text lines, with the second region of the document being located outside the first region of the image; determining a (first) dewarp function for the first region and modifying the image of the document by applying the first dewarp function to the first region to remove warping of the detected text lines of the first region; and modifying the image of the document by applying a (second) dewarp function of the second region to a point in the second region of the image and applying an offset to the point in the second region by applying the first dewarp function to a reference point of the second region of the image of the document.
- 2. A method according to claim 1, comprising: (a) forming a first region from text lines of the image; (b) forming at least one second region from text lines of the image; (c) determining an offset of the second region from the first region; and (d) modifying the image by applying a dewarp function of the second region and applying the offset to the dewarped second region.
- 3. A method according to claim 1, comprising: (a) forming a first region from text lines of the image; (b) forming at least one second region from text lines of the image; (c) determining the offset based on a point in the second region relative to a point in the first region; (d) applying the first dewarp function to the corners of the second region to determine the second dewarp function; and (e) modifying the image by dewarping the second region with the second dewarp function and applying the offset to the second dewarped region.
- 4. A method according to claim 3, wherein a dewarped region forms a fixed region and steps (c) to (d) are performed on a warped region adjacent a fixed region forming part of a set of fixed regions of the image.
- 5. A method according to claim 1 wherein the offset applied to the point in the second region moves the second region to prevent the first region from overlapping the second region when the dewarp function for the first region is applied to the first region.
- 6. A method according to claim 1 wherein an orientation of text in the first region is different to an orientation of text in the second region.
- 7. A method according to claim 1 wherein the first region is a dominant region and is selected using projection peaks of text line endpoints based on a projection angle to form a straight line fitted to the corresponding endpoints.
- 8. A method according to claim 7 wherein text lines associated with endpoints fitted to a straight line are removed from a set of text lines of the image such that further regions can be formed from text lines remaining in the set.
- 9. A method according to claim 1, wherein the region of the document is defined by a first curve defined by a detected text line and a second curve defined by a detected text line.
- 10. A method according to claim 9 wherein the region of the document is defined by a first straight line fitted through a set of text line endpoints and a second straight line fitted through a set of text line endpoints.
- 11. A method according to claim 9, wherein the first region is selected based on a number of text line endpoints used for straight line fitting for region one being larger than a number of text line endpoints used for straight line fitting for region two.
- 12. A method according to claim 9, wherein a rotation is defined for region two based on region one if region two is determined to be rotated relative to region one.
- 13. A method according to claim 9 wherein the second region is determined to be rotated relative to the first region when a vanishing point vector of the second region defined at the reference point is closer in angle to a parallel vector of the first region at the reference point than to a vanishing point vector of the first region at the reference point.
- 14. A non-transitory computer readable storage medium having a program recorded thereon, the program being executable by a processor to modify an image of a document, the program comprising: code for forming a first region of the image and a second region of the image, each of the regions being defined by corresponding detected text lines, with the second region of the document being located outside the first region of the image; code for determining a (first) dewarp function for the first region and modifying the image of the document by applying the first dewarp function to the first region to remove warping of the detected text lines of the first region; and code for modifying the image of the document by applying a (second) dewarp function of the second region to a point in the second region of the image and applying an offset to the point in the second region by applying the first dewarp function to a reference point of the second region of the image of the document.
- 15. The computer readable storage medium according to claim 14, comprising: (a) code for forming a first region from text lines of the image; (b) code for forming at least one second region from text lines of the image; (c) code for determining the offset based on a point in the second region relative to a point in the first region; (d) code for applying the first dewarp function to the corners of the second region to determine the second dewarp function; and (e) code for modifying the image by dewarping the second region with the second dewarp function and applying the offset to the second dewarped region.
- 16. The computer readable storage medium according to claim 15, wherein a dewarped region forms a fixed region and code (c) to (d) are executable on a warped region adjacent a fixed region forming part of a set of fixed regions of the image.
- 17. The computer readable storage medium according to claim 14 wherein at least one of (1) the offset applied to the point in the second region moves the second region to prevent the first region from overlapping the second region when the dewarp function for the first region is applied to the first region; (2) an orientation of text in the first region is different to an orientation of text in the second region; and (3) the first region is a dominant region and is selected using projection peaks of text line endpoints based on a projection angle to form a straight line fitted to the corresponding endpoints, wherein text lines associated with endpoints fitted to a straight line are removed from a set of text lines of the image such that further regions can be formed from text lines remaining in the set.
- 18. The computer readable storage medium according to claim 14, wherein the region of the document is defined by a first curve defined by a detected text line and a second curve defined by a detected text line, and at least one of (1) the region of the document is defined by a first straight line fitted through a set of text line endpoints and a second straight line fitted through a set of text line endpoints; (2) the first region is selected based on a number of text line endpoints used for straight line fitting for region one being larger than a number of text line endpoints used for straight line fitting for region two; (3) a rotation is defined for region two based on region one if region two is determined to be rotated relative to region one; and (4) the second region is determined to be rotated relative to the first region when a vanishing point vector of the second region defined at the reference point is closer in angle to a parallel vector of the first region at the reference point than to a vanishing point vector of the first region at the reference point.
- 19. Computerised apparatus comprising at least a processor and a memory, the memory storing a program executable by the processor to implement a method of modifying an image of a document, the method comprising: forming a first region of the image and a second region of the image, each of the regions being defined by corresponding detected text lines, with the second region of the document being located outside the first region of the image; determining a (first) dewarp function for the first region and modifying the image of the document by applying the first dewarp function to the first region to remove warping of the detected text lines of the first region; and modifying the image of the document by applying a (second) dewarp function of the second region to a point in the second region of the image and applying an offset to the point in the second region by applying the first dewarp function to a reference point of the second region of the image of the document.
- 20. Computerised apparatus according to claim 19 wherein the method comprises: (a) forming a first region from text lines of the image; (b) forming at least one second region from text lines of the image; (c) determining the offset based on a point in the second region relative to a point in the first region; (d) applying the first dewarp function to the corners of the second region to determine the second dewarp function; and (e) modifying the image by dewarping the second region with the second dewarp function and applying the offset to the second dewarped region.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2015201663A AU2015201663A1 (en) | 2015-03-31 | 2015-03-31 | Dewarping from multiple text columns |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2015201663A AU2015201663A1 (en) | 2015-03-31 | 2015-03-31 | Dewarping from multiple text columns |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| AU2015201663A1 true AU2015201663A1 (en) | 2016-10-20 |
Family
ID=57127790
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| AU2015201663A Abandoned AU2015201663A1 (en) | 2015-03-31 | 2015-03-31 | Dewarping from multiple text columns |
Country Status (1)
| Country | Link |
|---|---|
| AU (1) | AU2015201663A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112801088A (en) * | 2020-12-31 | 2021-05-14 | 科大讯飞股份有限公司 | Method and related device for correcting distorted text line image |
| CN113610068A (en) * | 2021-10-11 | 2021-11-05 | 江西风向标教育科技有限公司 | Test question disassembling method, system, storage medium and equipment based on test paper image |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MK4 | Application lapsed section 142(2)(d) - no continuation fee paid for the application |