
WO2024181963A1 - Universal handwriting recognition - Google Patents

Universal handwriting recognition

Info

Publication number
WO2024181963A1
Authority
WO
WIPO (PCT)
Prior art keywords
handwriting
strokes
data representing
model
recognition models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2023/013939
Other languages
French (fr)
Inventor
Thomas WALLAND
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to PCT/US2023/013939
Publication of WO2024181963A1
Current legal status: Ceased

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/24Character recognition characterised by the processing or recognition method
    • G06V30/242Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font

Definitions

  • the present disclosure is directed to universal handwriting recognition, or recognizing handwriting input by any medium (as an image of an actual handwritten note or as digital handwriting) and then allowing the user of a computing device to continue to write digitally using the recognized handwriting (e.g., maintaining the style of the received handwriting or changing the style of the received handwriting to a preferred style).
  • the present invention can enable users to edit recognized handwriting as if the handwriting had been performed in real-time by the user immediately prior.
  • the present invention can enable users to quickly change styles of handwriting input to capture certain idiosyncrasies of different handwriting styles, such as a particular way of writing certain characters and the like. Furthermore, the present invention enables the user to change an existing style of the received handwriting to beautify or otherwise alter the handwriting, which can enable the handwriting to be more recognizable (e.g., a scribbled doctor’s note can be recognized and then converted into a handwriting style that is more readable).
  • a user may wish to import an old recipe written down on a piece of paper from their grandmother and make further digital modifications to the recipe without ruining the piece of paper that has the original recipe written down on it by adding new marks.
  • the user may also wish to preserve sentimental value of the handwriting (e.g., preserving a grandmother’s handwriting and not writing over or around it in a different script or handwriting style).
  • the present invention enables the user to import the handwriting into a digital format as-is and then make further marks in digital handwriting in the same style and handwriting as the imported handwriting (e.g., as if the grandmother of the user had made any modifications or additions herself).
  • the present invention enables users of digital handwriting software applications to quickly and efficiently import both traditional handwriting and digital handwriting from a variety of sources into a digital handwriting software application while preserving idiosyncratic style associated with the handwriting without the need to copy individual pixels. Instead, the present invention enables machine learning models to learn “strokes” of the handwriting (traditional and digital) by identifying x-y input coordinates and tracking them over time.
  • the machine learning models can learn how the input writing was written and then be used to, for example, allow changes to be made to the handwriting in a digital handwriting software application while maintaining similar style characteristics and idiosyncrasies (e.g., how a writer writes particular characters) for the changes being made by the user of the software application.
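As an illustration of what these tracked coordinates look like in practice, the following minimal Python sketch models a stroke as a time-ordered list of sampled points; the class and field names are invented for this example and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StrokePoint:
    x: float
    y: float
    t: float  # time of the sample, e.g., seconds since the stroke began

@dataclass
class Stroke:
    """One pen-down-to-pen-up segment, as a time-ordered list of points."""
    points: List[StrokePoint] = field(default_factory=list)

    def add_point(self, x: float, y: float, t: float) -> None:
        self.points.append(StrokePoint(x, y, t))

# A short vertical stroke (e.g., the body of a lowercase "l"):
stroke = Stroke()
stroke.add_point(10.0, 40.0, 0.00)
stroke.add_point(10.2, 25.0, 0.05)
stroke.add_point(10.5, 10.0, 0.11)
```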
  • example computing systems of the present disclosure are able to convert handwriting to a text-based representation such as Unicode.
  • the computing system can then convert the text-based representation (e.g., Unicode) to a digital handwriting format (e.g., a “digital ink” format associated with a particular note taking application, which may, for example, be represented as a set of strokes in x-y space).
  • a generative model can be trained to convert from text-based representations to digital handwriting formats.
  • the digital handwriting format can be expressed or represented using the stylization of original handwriting or a different stylization.
  • the resulting data file can include both the text-based representation (e.g., Unicode) and the digital handwriting representation. This enables the user to continue interacting with the digital handwriting in the note-taking application while preserving the text-based representation for other uses.
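A minimal sketch of such a combined data file, assuming a simple JSON layout with invented field names (the disclosure does not specify a format):

```python
import json

# Invented layout: the recognized text and the ink that produced it, side by side.
document = {
    "text": "2 cups flour",  # text-based (Unicode) representation
    "ink": {
        "strokes": [  # digital handwriting representation: x, y, t triples
            {"points": [[10.0, 40.0, 0.00], [10.2, 25.0, 0.05], [10.5, 10.0, 0.11]]},
            {"points": [[14.0, 38.0, 0.40], [18.5, 37.0, 0.47]]},
        ],
        "style_id": "writer-a",  # handle for the captured style characteristics
    },
}

with open("note.ink.json", "w") as f:
    json.dump(document, f, indent=2)
```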
  • Figure 1A depicts a block diagram of an example computing system 100 that performs handwriting recognition according to example embodiments of the present disclosure.
  • the system 100 includes a user computing device 102, a server computing system 130, and a training computing system 150 that are communicatively coupled over a network 180.
  • the user computing device 102 can be any type of computing device, such as, for example, a personal computing device (e.g., laptop or desktop), a mobile computing device (e.g., smartphone or tablet), a gaming console or controller, a wearable computing device, an embedded computing device, or any other type of computing device.
  • the user computing device 102 includes one or more processors 112 and a memory 114.
  • the one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
  • the memory 114 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.
  • the user computing device 102 can store or include one or more handwriting recognition models 120.
  • the handwriting recognition models 120 can be or can otherwise include various machine-learned models such as neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models.
  • Neural networks can include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks.
  • Some example machine-learned models can leverage an attention mechanism such as self-attention.
  • some example machine-learned models can include multi-headed self-attention models (e.g., transformer models).
  • Example handwriting recognition models 120 are discussed with reference to Figures 2 and 3.
  • the one or more handwriting recognition models 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112.
  • the user computing device 102 can implement multiple parallel instances of a single handwriting recognition model 120 (e.g., to perform parallel handwriting recognition across multiple instances of received handwriting).
  • the one or more handwriting recognition models 120 can generally recognize handwriting received from any medium (e.g., receiving traditional handwriting as an image and recognizing the handwriting, and/or receiving digital handwriting and recognizing the handwriting based on received x-y coordinates and the sequence in which those coordinates are received over time as a user writes digital handwriting, and/or other means).
  • the handwriting can be recognized as a series of “strokes,” which are defined as the x-y coordinates being input over time to show how the user physically inputs the handwriting, thus capturing the actual act of a user writing something.
  • a synthetic generative model can be used to generate further handwriting that shares a style with the received handwriting, including various idiosyncrasies of the handwriting (particular style choices, ways of writing particular characters, and the like).
  • the synthetic generative model (or synthetic handwriting model) can sit on top of the one or more handwriting recognition models 120 and receive from the one or more handwriting recognition models 120 the style of the recognized handwriting, and then output additional handwriting as a user continues to add to or modify the received handwriting in the same style as the received handwriting.
  • These idiosyncrasies may also be referred to herein as “style characteristics” of the handwriting.
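The layering described above can be sketched as two cooperating components; the class and method names below are illustrative placeholders, not an API from the disclosure:

```python
class HandwritingRecognizer:
    """Stand-in for the handwriting recognition model(s)."""

    def recognize(self, handwriting_input):
        strokes = []  # recognized strokes: x-y coordinates tracked over time
        style = {}    # extracted style characteristics ("crossed 7", slant, ...)
        # ... model inference would populate both ...
        return strokes, style


class SyntheticHandwritingModel:
    """Sits on top of the recognizer; writes new text in a captured style."""

    def generate(self, text, style):
        new_strokes = []
        # ... generative inference: render `text` with `style` idiosyncrasies ...
        return new_strokes


recognizer = HandwritingRecognizer()
synthesizer = SyntheticHandwritingModel()
# strokes, style = recognizer.recognize(image)
# additions = synthesizer.generate("add 1 tsp vanilla", style)
```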
  • the one or more handwriting recognition models 120 can receive various inputs that represent handwriting.
  • the handwriting recognition models 120 can receive an image of handwriting on a sheet of paper or other writing medium and can perform image recognition to recognize characters and/or words in the image.
  • the one or more handwriting recognition models 120 can also determine stroke information that would be associated with the handwriting, despite not having stroke information that is associated with digital handwriting.
  • the one or more handwriting recognition models 120 can predict one or more strokes (e.g., predict when a writer of the handwriting wrote each portion of the handwriting) associated with the handwriting in the image, even though information related to the actual input of the handwriting is not available.
  • the handwriting recognition models 120 can then output the recognized handwriting as Unicode for other models to process.
  • the handwriting recognition models 120 can recognize characters as pixels and output strokes in a space for recreation of the handwriting in a digital handwriting software application, such as a note-taking application.
  • the one or more handwriting recognition models 120 can also include models that receive Unicode (either from other models that recognize traditional handwriting from images or from digital handwriting software applications) and output stroke information, or x-y coordinates tracked over time (which can define the style of the handwriting in addition to identifying the characters and words in the received handwriting).
  • the recognized style can then be provided to the synthetic generative model to create new handwriting (e.g., a set of strokes) for the user in the style of the received handwriting.
  • the output strokes are provided to software applications where the user is modifying digital handwriting or adding new digital handwriting, rather than being output as pixel information. In this way, the handwriting can be accurately recreated and the recognized handwriting style can be preserved for the modifications or additions being made to the handwriting in the software application.
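Putting the image path together, a hedged end-to-end sketch might look like the following; every object and method name is a stand-in chosen for this example:

```python
def import_handwritten_image(image, recognizer, synthesizer, app):
    """Hypothetical flow for a scanned note: strokes, not pixels, reach the app."""
    text = recognizer.image_to_unicode(image)       # recognize characters/words
    style = recognizer.extract_style(image)         # capture style characteristics
    strokes = synthesizer.unicode_to_strokes(text, style)  # predicted strokes
    app.render_strokes(strokes)                     # recreate the handwriting
    app.set_input_style(style)                      # future edits keep the style
    return text, strokes, style
```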
  • one or more handwriting recognition models 140 can be included in or otherwise stored and implemented by the server computing system 130 that communicates with the user computing device 102 according to a client-server relationship.
  • the handwriting recognition models 140 can be implemented by the server computing system 130 as a portion of a web service (e.g., a handwriting recognition service).
  • one or more models 120 can be stored and implemented at the user computing device 102 and/or one or more models 140 can be stored and implemented at the server computing system 130.
  • the user computing device 102 can also include one or more user input components 122 that receive user input.
  • the user input component 122 can be a touch-sensitive component (e.g., a touch-sensitive display screen or a touch pad) that is sensitive to the touch of a user input object (e.g., a finger or a stylus).
  • the touch-sensitive component can serve to implement a virtual keyboard.
  • Other example user input components include a microphone, a traditional keyboard, or other means by which a user can provide user input.
  • the server computing system 130 includes one or more processors 132 and a memory 134.
  • the one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
  • the memory 134 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 134 can store data 136 and instructions 138 which are executed by the processor 132 to cause the server computing system 130 to perform operations.
  • the server computing system 130 includes or is otherwise implemented by one or more server computing devices. In instances in which the server computing system 130 includes plural server computing devices, such server computing devices can operate according to sequential computing architectures, parallel computing architectures, or some combination thereof.
  • the server computing system 130 can store or otherwise include one or more handwriting recognition models 140.
  • the models 140 can be or can otherwise include various machine-learned models.
  • Example machine-learned models include neural networks or other multi-layer non-linear models.
  • Example neural networks include feed forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks.
  • Some example machine-learned models can leverage an attention mechanism such as self-attention.
  • some example machine-learned models can include multi-headed self-attention models (e.g., transformer models).
  • Example models 140 are discussed with reference to Figures 2 and 3.
  • the user computing device 102 and/or the server computing system 130 can train the models 120 and/or 140 via interaction with the training computing system 150 that is communicatively coupled over the network 180.
  • the training computing system 150 can be separate from the server computing system 130 or can be a portion of the server computing system 130.
  • the training computing system 150 includes one or more processors 152 and a memory 154.
  • the one or more processors 152 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
  • the memory 154 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 154 can store data 156 and instructions 158 which are executed by the processor 152 to cause the training computing system 150 to perform operations.
  • the training computing system 150 includes or is otherwise implemented by one or more server computing devices.
  • the training computing system 150 can include a model trainer 160 that trains the machine-learned models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors.
  • a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function).
  • Various loss functions can be used such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions.
  • Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations.
  • performing backwards propagation of errors can include performing truncated backpropagation through time.
  • the model trainer 160 can perform a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
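A minimal supervised training step of the kind described, sketched with PyTorch (one of many possible frameworks; the disclosure does not name one):

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()  # one of several loss functions mentioned

def train_step(model, batch, optimizer):
    """One supervised update: forward pass, loss, backpropagation, parameter step."""
    inputs, targets = batch          # e.g., stroke sequences and character labels
    optimizer.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, targets)
    loss.backward()                  # backpropagate the loss through the model
    optimizer.step()                 # gradient-descent update of the parameters
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
# weight_decay above, and any dropout layers inside `model`, correspond to the
# generalization techniques mentioned in the text.
```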
  • the model trainer 160 can train the handwriting recognition models 120 and/or 140 based on a set of training data 162.
  • the training data 162 can include, for example, various examples of both traditional handwriting (e.g., images of handwriting on a sheet of paper) and digital handwriting.
  • the model trainer 160 can perform supervised training for the handwriting recognition models 120 and/or 140 to enable the handwriting recognition models 120 and/or 140 to identify various characters and words in different handwriting styles and mediums based on identified stroke information.
  • the model trainer 160 can also train a synthetic handwriting model that sits on top of the handwriting recognition models 120 and/or 140.
  • the synthetic handwriting model can be trained to recognize various idiosyncrasies within particular handwriting styles.
  • Training data for the synthetic handwriting model can include traditional handwriting and digital handwriting samples that include particular mannerisms for writing characters (e.g., putting a line through the number 7, dotting an “i” character with a particular symbol, underlining words that end in “y” by creating the underline from the tail of the “y” character, and other particular mannerisms) or other idiosyncrasies (mixed use of cursive and printed characters, the direction handwriting slants, and the like).
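Such a training sample might be annotated along these lines; the schema and label names are invented for illustration:

```python
# Invented annotation schema for one synthetic-handwriting training sample.
training_example = {
    "strokes": [[[10.0, 40.0, 0.00], [10.2, 25.0, 0.05]]],  # raw ink
    "text": "My 7 favorite recipes",
    "style_characteristics": [
        "crossed_seven",        # a line through the number 7
        "cursive_print_mix",    # mixed use of cursive and printed characters
        "rightward_slant",      # the direction the handwriting slants
    ],
}
```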
  • the training examples can be provided by the user computing device 102.
  • the model 120 provided to the user computing device 102 can be trained by the training computing system 150 on user-specific data received from the user computing device 102. In some instances, this process can be referred to as personalizing the model.
  • the model trainer 160 includes computer logic utilized to provide desired functionality.
  • the model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor.
  • the model trainer 160 includes program files stored on a storage device, loaded into a memory and executed by one or more processors.
  • the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, hard disk, or optical or magnetic media.
  • the network 180 can be any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), or some combination thereof and can include any number of wired or wireless links.
  • communication over the network 180 can be carried via any type of wired and/or wireless connection, using a wide variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, secure HTTP, SSL).
  • the input to the machine-learned model(s) of the present disclosure can be image data.
  • the machine-learned model(s) can process the image data to generate an output.
  • the machine-learned model(s) can process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.).
  • the machine-learned model(s) can process the image data to generate an image segmentation output.
  • the machine-learned model(s) can process the image data to generate an image classification output.
  • the machine-learned model(s) can process the image data to generate an image data modification output (e.g., an alteration of the image data, etc.).
  • the machine-learned model(s) can process the image data to generate an encoded image data output (e.g., an encoded and/or compressed representation of the image data, etc.).
  • the machine-learned model(s) can process the image data to generate an upscaled image data output.
  • the machine-learned model(s) can process the image data to generate a prediction output.
  • the input to the machine-learned model(s) of the present disclosure can be text or natural language data.
  • the machine-learned model(s) can process the text or natural language data to generate an output.
  • the machine-learned model(s) can process the natural language data to generate a language encoding output.
  • the machine-learned model(s) can process the text or natural language data to generate a latent text embedding output.
  • the machine-learned model(s) can process the text or natural language data to generate a translation output.
  • the machine-learned model(s) can process the text or natural language data to generate a classification output.
  • the machine-learned model(s) can process the text or natural language data to generate a textual segmentation output.
  • the machine-learned model(s) can process the text or natural language data to generate a semantic intent output.
  • the machine-learned model(s) can process the text or natural language data to generate an upscaled text or natural language output (e.g., text or natural language data that is higher quality than the input text or natural language, etc.).
  • the machine-learned model(s) can process the text or natural language data to generate a prediction output.
  • the input includes visual data and the task is a computer vision task.
  • the input includes pixel data for one or more images and the task is an image processing task.
  • the image processing task can be image classification, where the output is a set of scores, each score corresponding to a different object class and representing the likelihood that the one or more images depict an object belonging to the object class.
  • the image processing task may be object detection, where the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that region depicts an object of interest.
  • the image processing task can be image segmentation, where the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories.
  • the set of categories can be foreground and background.
  • the set of categories can be object classes.
  • the image processing task can be depth estimation, where the image processing output defines, for each pixel in the one or more images, a respective depth value.
  • the image processing task can be motion estimation, where the network input includes multiple images, and the image processing output defines, for each pixel of one of the input images, a motion of the scene depicted at the pixel between the images in the network input.
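The different task outputs can be summarized by their shapes; the sketch below uses NumPy arrays and example dimensions chosen purely for illustration:

```python
import numpy as np

num_classes, height, width = 10, 64, 64

# Image classification: one score per object class for the whole image.
class_scores = np.zeros(num_classes)

# Object detection: regions plus a likelihood that each depicts an object.
detections = [{"region": (4, 4, 20, 12), "likelihood": 0.9}]

# Image segmentation: a likelihood per pixel for each category (e.g., fg/bg).
segmentation = np.zeros((num_classes, height, width))

# Depth estimation: one depth value per pixel.
depth = np.zeros((height, width))
```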
  • Figure 1 A illustrates one example computing system that can be used to implement the present disclosure.
  • the user computing device 102 can include the model trainer 160 and the training dataset 162.
  • the models 120 can be both trained and used locally at the user computing device 102.
  • the user computing device 102 can implement the model trainer 160 to personalize the models 120 based on user-specific data.
  • Figure 1B depicts a block diagram of an example computing device 10 that performs according to example embodiments of the present disclosure.
  • the computing device 10 can be a user computing device or a server computing device.
  • the computing device 10 includes a number of applications (e.g., applications 1 through N). Each application contains its own machine learning library and machine-learned model(s). For example, each application can include a machine-learned model.
  • Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
  • each application can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
  • each application can communicate with each device component using an API (e.g., a public API).
  • the API used by each application is specific to that application.
  • FIG. 1C depicts a block diagram of an example computing device 50 that performs according to example embodiments of the present disclosure.
  • the computing device 50 can be a user computing device or a server computing device.
  • the computing device 50 includes a number of applications (e.g., applications 1 through N). Each application is in communication with a central intelligence layer.
  • Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
  • each application can communicate with the central intelligence layer (and model(s) stored therein) using an API (e.g., a common API across all applications).
  • the central intelligence layer includes a number of machine-learned models. For example, as illustrated in Figure 1C, a respective machine-learned model can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of the computing device 50. The central intelligence layer can communicate with a central device data layer. The central device data layer can be a centralized repository of data for the computing device 50.
  • the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
  • the central device data layer can communicate with each device component using an API (e.g., a private API).
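A hedged sketch of the central-intelligence-layer pattern, with invented names, might look like this:

```python
class CentralIntelligenceLayer:
    """Invented sketch: applications share centrally managed models via one API."""

    def __init__(self):
        self._models = {}  # model name -> machine-learned model

    def register_model(self, name, model):
        self._models[name] = model

    def predict(self, name, inputs):
        return self._models[name](inputs)

# Every application calls the same layer rather than bundling its own model:
# layer = CentralIntelligenceLayer()
# layer.register_model("handwriting", handwriting_model)
# strokes = layer.predict("handwriting", unicode_text)
```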
  • Figure 2 depicts a block diagram of an example computing system 200 according to example embodiments of the present disclosure.
  • the computing system 200 receives an input image 205 of traditional handwriting, such as an image of handwriting on a piece of paper, a dry erase board, a black board, or another writing medium.
  • the input image 205 is provided to handwriting recognition models 210, which are stored in a memory of the computing system 200.
  • the handwriting recognition models 210 can perform image recognition tasks to recognize characters and/or words in the input image 205. This can be accomplished in one or more ways.
  • the handwriting recognition models 210 can generate a Unicode representation of the one or more characters that are recognized. This Unicode representation can be provided to a first model of the handwriting recognition models 210, such as a synthetic handwriting model 215, which can then output strokes (data representing the x-y coordinates of a writing utensil, traditional or digital, over time) that represent the handwriting in a space, where the strokes are determined based on the Unicode representation of the handwriting.
  • the synthetic handwriting model 215 can also generate a style for the handwriting based on style characteristics identified by the handwriting recognition models 210 and/or the synthetic handwriting model 215. Any visual representation of the handwriting can be generated using the identified style characteristics and future digital writings that are added into the handwriting (or modifications made to the handwriting) can also take on these style characteristics. For example, if a user is copying over a recipe from a notebook that is in the handwriting and style of a grandmother or other particular person, the user may take additional notes on a software application that can process digital handwriting to augment or make additional notes for the recipe.
  • the output additional writing can therefore be made in the handwriting and style of the grandmother or other particular person, which allows the additional handwriting to appear as if it was made by the grandmother or particular person.
  • the original handwriting is preserved and any additional handwriting or changes can be made to appear as if the original writer made them, which can preserve sentimental value associated with the original handwriting even as it is moved into digital handwriting.
  • the generated strokes can then be provided to a note-taking application 220, which can generate a visual representation of the handwriting 225 for display on the computing system 200 to the user based on the received x-y coordinates and any style characteristics received from the synthetic handwriting model 215.
  • the note-taking application 220 is representative of any other number of software applications that can process digital handwriting, such as word processing applications, Portable Document Format applications, digital art applications, and the like.
  • the handwriting recognition models 210 can perform image recognition on the image to identify pixels representing one or more characters in the image. The identified pixels can then be provided to a first model of the handwriting recognition models, such as the synthetic handwriting model 215.
  • the synthetic handwriting model 215 can then output strokes, which are determined based on the identified pixels in the image. The strokes can then be provided to the note-taking application 220, which in turn can generate the visual representation of the handwriting 225 using the strokes as described above.
  • the synthetic handwriting model 215 can be used to generate the visual representation 225 of the handwriting, and can provide the visual representation 225 of the handwriting to the note-taking application 220 for display.
  • the output of the note-taking application 220 can include both a stylized representation of digital handwriting (e.g., a stylized version of the handwritten content expressed in a format associated with the note-taking application 220) and also the underlying Unicode (or other digital character format) for the handwritten content.
  • the proposed system can provide an output that includes both stylized digital handwriting and non-stylized digital character representations.
  • FIG. 3 depicts a block diagram of an example computing system 300 according to example embodiments of the present disclosure.
  • the computing system 300 receives a Unicode representation 305 of digital handwriting, such as handwriting produced by a user interacting with a touch screen via a finger or stylus through a note-taking application or other software application that allows a user to create digital handwriting.
  • the Unicode representation 305 can be provided to handwriting recognition models 310, which in turn can output the strokes associated with the Unicode representation 305 of the handwriting.
  • synthetic handwriting model 315 can output the strokes that are associated with the Unicode representation 305 of the handwriting. These strokes can then be provided to a note-taking application 320, which can in turn display a visual representation 325 of the handwriting for further editing and additions.
  • the synthetic handwriting model 315 can be used to generate the visual representation 325 of the handwriting, and can provide the visual representation 325 of the handwriting to the note-taking application 320 for display.
  • the output of the note-taking application 320 can include both a stylized representation of digital handwriting (e.g., a stylized version of the handwritten content expressed in a format associated with the note-taking application 320) and also the underlying Unicode (or other digital character format) for the handwritten content.
  • the proposed system can provide an output that includes both stylized digital handwriting and non-stylized digital character representations.
  • Figure 4 depicts a flow chart diagram of an example method 400 to perform handwriting recognition according to example embodiments of the present disclosure.
  • Although Figure 4 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement.
  • the various steps of the method 400 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
  • a computing system receives data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting.
  • the data representing the handwriting can be an input image of traditional handwriting on a handwriting medium (e.g., paper, dry erase board, and the like) or a Unicode representation of digital handwriting, such as handwriting produced using a finger or stylus in a note-taking application or other application that supports a user creating digital handwriting.
  • the style characteristic can be any number of various idiosyncrasies within particular handwriting styles. Style characteristics can include particular mannerisms for writing characters (e.g., putting a line through the number 7, dotting an “i” character with a particular symbol, underlining words that end in “y” by creating the underline from the tail of the “y” character, and other particular mannerisms) or other idiosyncrasies (mixed use of cursive and printed characters, the direction handwriting slants, and the like).
  • At 410, the computing system processes the data representing the handwriting with one or more handwriting recognition models.
  • the one or more handwriting recognition models can take the data representing the handwriting and perform various processing on the data, such as the processing described above with regard to Figures 2 and 3, to convert the data into strokes, or x-y coordinates tracked over time in a space.
  • This processing can include processing an input image to obtain a Unicode representation of one or more characters in the image, processing the input image to identify pixels that represent one or more characters in the image, and processing a Unicode representation of digital handwriting.
  • the computing system can receive the strokes associated with the handwriting from the one or more handwriting recognition models.
  • the computing system can provide the strokes to a software application that can process digital handwriting, such as a note-taking application.
  • the computing system can generate a visual representation of the handwriting in the note-taking application based on the strokes.
  • the strokes are placed in a space in the software application for digital handwriting based on the coordinate values and the time data associated with each portion of each of the one or more strokes, and a visual representation of the handwriting is generated and displayed within the space based on the strokes.
  • the visual representation of the handwriting is also generated based on the style characteristic(s) associated with the received handwriting.
  • the style characteristic(s) can be identified by, for example, a synthetic handwriting model that tracks individual stylistic idiosyncrasies for the received handwriting.
  • the style characteristic(s) can be applied to the visual representation of the handwriting when the visual representation is displayed in the note-taking application.
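Replaying strokes in input order, as described, might be sketched as follows; the point layout and the drawing primitive are assumptions made for this example:

```python
def replay_strokes(strokes, draw_line):
    """Place strokes in the application's space in input order.

    `strokes` is assumed to be a list of point lists, each point an
    (x, y, t) triple; `draw_line` stands in for the host application's
    drawing primitive.
    """
    for points in sorted(strokes, key=lambda pts: pts[0][2]):  # order by start time
        for (x0, y0, _t0), (x1, y1, _t1) in zip(points, points[1:]):
            draw_line((x0, y0), (x1, y1))
```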
  • the computing system can generate a style for future digital handwriting in the note-taking application.
  • the style can be generated based on the strokes and the at least one style characteristic of the handwriting. For example, if a user of the note-taking application adds more digital handwriting to the visual representation of the handwriting, stylistic characteristics that are consistent with the style of the visual representation can be applied to the newly added digital handwriting, such as particular ways to write certain characters, slant characteristics of the handwriting, and the like.
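Tying the steps of method 400 together, a minimal driver could look like the sketch below; `models` and `app` are hypothetical stand-ins for the handwriting recognition model(s) and the note-taking application, and every method name is invented:

```python
def perform_handwriting_recognition(data, models, app):
    """Hedged sketch of method 400; all attribute names here are invented."""
    # Receive data representing handwriting (with at least one style
    # characteristic) and process it with the handwriting recognition model(s).
    strokes, style = models.process(data)
    # Receive the strokes from the model(s) and provide them to the application.
    app.load_strokes(strokes)
    # Generate a visual representation based on the strokes and the style.
    app.render(style=style)
    # Generate a style for future digital handwriting in the application.
    app.set_input_style(style)
    return strokes, style
```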

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

Systems and methods for universal handwriting recognition are disclosed herein. The method can include receiving data representing handwriting, including at least one style characteristic of the handwriting, and processing the data representing the handwriting with one or more handwriting recognition models. The method can also include receiving one or more strokes associated with the handwriting from the one or more handwriting recognition models and providing the strokes to a software application. The method can further include generating a visual representation of the handwriting in the software application based on the strokes and generating a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.

Description

UNIVERSAL HANDWRITING RECOGNITION
FIELD
[1] The present disclosure relates generally to handwriting recognition. More particularly, the present disclosure relates to recognizing both traditional handwriting on a sheet of paper or other writing medium and digital handwriting and preserving the characters and the style characteristics of the handwriting for future use in additional handwriting being performed in a software application that can process digital handwriting, such as a note-taking application.
BACKGROUND
[2] Many different methods of recognizing handwriting exist today. For example, optical character recognition attempts to recognize handwriting from images of a handwritten note, and digital handwriting understanding attempts to recognize handwriting coming from digital devices (e.g., writing with a finger or a stylus on a touchscreen). However, these approaches do not work synchronously with one another and a method of recognizing handwriting regardless of input method (image of handwriting or digital handwriting) does not currently exist.
SUMMARY
[3] Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.
[4] One example aspect of the present disclosure is directed to a computer-implemented method for performing handwriting recognition. The method can include receiving, by an electronic processor, data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting and processing, by the electronic processor, the data representing the handwriting with one or more handwriting recognition models. The method can also include receiving, by the electronic processor, one or more strokes associated with the handwriting from the one or more handwriting recognition models and providing, by the electronic processor, the strokes to a software application that supports digital handwriting. The method can further include generating, by the electronic processor, a visual representation of the handwriting in the software application based on the strokes and generating, by the electronic processor, a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
[5] Another example aspect of the present disclosure is directed to a computing system for performing handwriting recognition. The computing system includes one or more electronic processors and a non-transitory, computer readable medium comprising instructions that, when executed by the one or more electronic processors, cause the one or more electronic processors to perform operations. The operations can include receiving data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting and processing the data representing the handwriting with one or more handwriting recognition models. The operations can also include receiving one or more strokes associated with the handwriting from the one or more handwriting recognition models and providing the strokes to a software application that supports digital handwriting. The operations can further include generating a visual representation of the handwriting in the software application based on the strokes and generating a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
[6] Another example aspect of the present disclosure is directed to a non-transitory, computer readable medium comprising instructions that, when executed by one or more electronic processors, cause the one or more electronic processors to perform operations. The operations can include receiving data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting and processing the data representing the handwriting with one or more handwriting recognition models. The operations can also include receiving one or more strokes associated with the handwriting from the one or more handwriting recognition models and providing the strokes to a software application that supports digital handwriting. The operations can further include generating a visual representation of the handwriting in the software application based on the strokes and generating a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
[7] Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.
[8] These and other features, aspects, and advantages of various embodiments of the present disclosure will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate example embodiments of the present disclosure and, together with the description, serve to explain the related principles.
BRIEF DESCRIPTION OF THE DRAWINGS
[9] Detailed discussion of embodiments directed to one of ordinary skill in the art is set forth in the specification, which makes reference to the appended figures, in which:
[10] Figure 1 A depicts a block diagram of an example computing system that performs handwriting recognition according to example embodiments of the present disclosure.
[11] Figure 1B depicts a block diagram of an example computing device that performs handwriting recognition according to example embodiments of the present disclosure.
[12] Figure 1C depicts a block diagram of an example computing device that performs handwriting recognition according to example embodiments of the present disclosure.
[13] Figure 2 depicts a block diagram of an example computing system according to example embodiments of the present disclosure.
[14] Figure 3 depicts a block diagram of an example computing system according to example embodiments of the present disclosure.
[15] Figure 4 depicts a flow chart diagram of an example method to perform handwriting recognition according to example embodiments of the present disclosure.
[16] Reference numerals that are repeated across plural figures are intended to identify the same features in various implementations.
DETAILED DESCRIPTION
Overview
[17] Generally, the present disclosure is directed to universal handwriting recognition, or recognizing handwriting input by any medium (as an image of an actual handwritten note or as digital handwriting) and then allowing the user of a computing device to continue to write digitally using the recognized handwriting (e.g., maintaining the style of the received handwriting or changing the style of the received handwriting to a preferred style). In this way, the present invention can enable users to edit recognized handwriting as if the handwriting had been performed in real-time by the user immediately prior.
[18] Additionally, the present invention can enable users to quickly change styles of handwriting input to capture certain idiosyncrasies of different handwriting styles, such as a particular way of writing certain characters and the like. Furthermore, the present invention enables the user to change an existing style of the received handwriting to beautify or otherwise alter the handwriting, which can enable the handwriting to be more recognizable (e.g., a scribbled doctor’s note can be recognized and then converted into a handwriting style that is more readable).
[19] In another example, a user may wish to import an old recipe written down on a piece of paper from their grandmother and make further digital modifications to the recipe without ruining the piece of paper that has the original recipe written down on it by adding new marks. The user may also wish to preserve sentimental value of the handwnting (e.g., preserving a grandmother’s handwriting and not writing over or around it in a different script or handwriting style). The present invention enables the user to import the handwriting into a digital format as-is and then make further marks in digital handwriting in the same style and handwriting as the imported handwriting (e.g., as if the grandmother of the user had made any modifications or additions herself).
[20] The present invention enables users of digital handwriting software applications to quickly and efficiently import both traditional handwriting and digital handwriting from a variety of sources into a digital handwriting software application while preserving idiosyncratic style associated with the handwriting without the need to copyindividual pixels. Instead, the present invention enables machine learning models to leam “strokes” of the handwriting (traditional and digital) by identifying x-y input coordinates and tracking them over time. In this way, instead of simply copying over pixels and not being able to maintain stylistic idiosyncrasies associated with the handwriting, the machine learning models can leam how the input writing was written and then be used to, for example, allow changes to be made to the handwriting in a digital handwriting software application while maintaining similar style characteristics and idiosyncrasies (e.g., how a writer writes particular characters) for the changes being made by the user of the software application.
[21] Thus, example computing systems of the present disclosure are able to convert handwriting to a text-based representation such as Unicode. The computing system can then convert the text-based representation (e.g., Unicode) to a digital handwriting format (e.g., a “digital ink” format associated with a particular note taking application, which may, for example, be represented as a set of strokes in x-y space). For example, a generative model can be trained to convert from text-based representations to digital handwriting formats. The digital handwriting format can be expressed or represented using the stylization of original handwriting or a different stylization. The resulting data fde can include both the text-based representation (e.g., Unicode) and the digital handwriting representation. This enables the user to continue interacting with the digital handwriting in the note-taking application while preserving the text-based representation for other uses.
[22] With reference now to the Figures, example embodiments of the present disclosure will be discussed in further detail.
Example Devices and Systems
[23] Figure 1A depicts a block diagram of an example computing system 100 that performs handwriting recognition according to example embodiments of the present disclosure. The system 100 includes a user computing device 102, a server computing system 130, and a training computing system 150 that are communicatively coupled over a network 180.
[24] The user computing device 102 can be any type of computing device, such as, for example, a personal computing device (e.g., laptop or desktop), a mobile computing device (e.g., smartphone or tablet), a gaming console or controller, a wearable computing device, an embedded computing device, or any other type of computing device.
[25] The user computing device 102 includes one or more processors 112 and a memory 114. The one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 114 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations.
[26] In some implementations, the user computing device 102 can store or include one or more handwriting recognition models 120. For example, the handwriting recognition models 120 can be or can otherwise include various machine-learned models such as neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models. Neural networks can include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks or other forms of neural networks. Some example machine- learned models can leverage an attention mechanism such as self-attention. For example, some example machine-learned models can include multi-headed self-attention models (e.g., transformer models). Example handwriting recognition models 120 are discussed with reference to Figures 2 and 3. [27] In some implementations, the one or more handwriting recognition models 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112. In some implementations, the user computing device 102 can implement multiple parallel instances of a single handwriting recognition model 120 (e.g., to perform parallel handwriting recognition across multiple instances of received handwriting).
[28] More particularly, the one or more handwriting recognition models 120 can generally recognize handwriting received from any medium (e.g., traditional handwriting received as an image, and/or digital handwriting recognized from the x-y coordinates received as a user writes and the sequence in which those coordinates are received over time, and/or other means). The handwriting can be recognized as a series of “strokes,” defined as the x-y coordinates input over time, which show how the user physically input the handwriting and thus capture the actual act of a user writing something.
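As a concrete illustration of this stroke representation, the following sketch models a stroke as a time-ordered sequence of x-y samples. It is purely illustrative; the class and method names are hypothetical and not part of any disclosed implementation:

from dataclasses import dataclass, field
from typing import List

@dataclass
class TimedPoint:
    # One sampled pen position: where the pen was, and when.
    x: float  # horizontal coordinate in the writing space
    y: float  # vertical coordinate in the writing space
    t: float  # timestamp (e.g., seconds since the stroke began)

@dataclass
class Stroke:
    # One pen-down-to-pen-up segment, with points ordered by time.
    points: List[TimedPoint] = field(default_factory=list)

    def add_sample(self, x: float, y: float, t: float) -> None:
        self.points.append(TimedPoint(x, y, t))

# A piece of digital handwriting is then simply the strokes in input order.
handwriting: List[Stroke] = []

Capturing a timestamp alongside each coordinate is what distinguishes this representation from a static image: it preserves the order and pacing in which the handwriting was produced.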
[29] Based on the recognized strokes, a synthetic generative model can be used to generate further handwriting that shares a style with the received handwriting, including various idiosyncrasies of the handwriting (particular style choices, ways of writing particular characters, and the like). The synthetic generative model (or synthetic handwriting model) can sit on top of the one or more handwriting recognition models 120 and receive from the one or more handwriting recognition models 120 the style of the recognized handwriting, and then output additional handwriting as a user continues to add to or modify the received handwriting in the same style as the received handwriting. These idiosyncrasies may also be referred to herein as “style characteristics” of the handwriting.
[30] To recognize the strokes of the handwriting and any particular stylistic choices, the one or more handwriting recognition models 120 can receive various inputs that represent handwriting. For example, the handwriting recognition models 120 can receive an image of handwriting on a sheet of paper or other writing medium and can perform image recognition to recognize characters and/or words in the image. For received traditional handwriting (e.g., handwriting on a sheet of paper), the one or more handwriting recognition models 120 can also determine stroke information that would be associated with the handwriting, despite not having stroke information that is associated with digital handwriting. For example, based on learned stroke information from digital handwriting or from other sources, the one or more handwriting recognition models 120 can predict one or more strokes (e.g., predict when a writer of the handwriting wrote each portion of the handwriting) associated with the handwriting in the image, even though information related to the actual input of the handwriting is not available.
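One way such stroke prediction for static images could be wrapped is sketched below, reusing the Stroke class from the earlier sketch. The model interface (model.infer returning (x, y, time-delta, pen-up) tuples) is an assumption for illustration, not a disclosed API:

from typing import List
import numpy as np

def predict_strokes_from_image(image: np.ndarray, model) -> List[Stroke]:
    # Ask a trained offline-to-online model to infer the stroke order and
    # timing that plausibly produced a static handwriting image.
    strokes: List[Stroke] = []
    current = Stroke()
    t = 0.0
    for x, y, dt, pen_up in model.infer(image):  # assumed model API
        t += dt
        current.add_sample(float(x), float(y), t)
        if pen_up:  # the writer lifted the pen: close out this stroke
            strokes.append(current)
            current = Stroke()
    if current.points:
        strokes.append(current)
    return strokes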
[31] The handwriting recognition models 120 can then output the recognized handwriting as Unicode for other models to process. In another embodiment, the handwriting recognition models 120 can recognize characters as pixels and output strokes in a space for recreation of the handwriting in a digital handwriting software application, such as a note-taking application.
[32] The one or more handwriting recognition models 120 can also include models that receive Unicode (either from other models that recognize traditional handwriting from images or from digital handwriting software applications) and output stroke information, or x-y coordinates tracked over time (which can define the style of the handwriting in addition to identifying the characters and words in the received handwriting). The recognized style can then be provided to the synthetic generative model to create new handwriting (e.g., a set of strokes) for the user in the style of the received handwriting. The strokes are output to software applications where the user is modifying digital handwriting or adding new digital handwriting, rather than being output as pixel information. In this way, the handwriting can be accurately recreated and the recognized handwriting style can be preserved for the modifications or additions being made to the handwriting in the software application.
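A minimal interface for this text-to-strokes step might look as follows. The generator object and its generate method are assumptions standing in for a trained synthetic handwriting model, and the style vector is an assumed summary of the writer's style characteristics:

from typing import List, Sequence

def text_to_strokes(text: str, style: Sequence[float], generator) -> List[Stroke]:
    # A trained generative model turns a Unicode string into timed strokes
    # rendered in the given style. generator.generate is assumed to yield
    # pen-down segments, each a list of (x, y, t) samples.
    strokes: List[Stroke] = []
    for segment in generator.generate(text, style):  # assumed model API
        stroke = Stroke()
        for x, y, t in segment:
            stroke.add_sample(x, y, t)
        strokes.append(stroke)
    return strokes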
[33] Additionally or alternatively, one or more handwriting recognition models 140 can be included in or otherwise stored and implemented by the server computing system 130 that communicates with the user computing device 102 according to a client-server relationship. For example, the handwriting recognition models 140 can be implemented by the server computing system 130 as a portion of a web service (e.g., a handwriting recognition service). Thus, one or more models 120 can be stored and implemented at the user computing device 102 and/or one or more models 140 can be stored and implemented at the server computing system 130.
[34] The user computing device 102 can also include one or more user input components 122 that receive user input. For example, the user input component 122 can be a touch-sensitive component (e.g., a touch-sensitive display screen or a touch pad) that is sensitive to the touch of a user input object (e.g., a finger or a stylus). The touch-sensitive component can serve to implement a virtual keyboard. Other example user input components include a microphone, a traditional keyboard, or other means by which a user can provide user input.
[35] The server computing system 130 includes one or more processors 132 and a memory 134. The one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 134 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 134 can store data 136 and instructions 138 which are executed by the processor 132 to cause the server computing system 130 to perform operations.
[36] In some implementations, the server computing system 130 includes or is otherwise implemented by one or more server computing devices. In instances in which the server computing system 130 includes plural server computing devices, such server computing devices can operate according to sequential computing architectures, parallel computing architectures, or some combination thereof.
[37] As described above, the server computing system 130 can store or otherwise include one or more handwriting recognition models 140. For example, the models 140 can be or can otherwise include various machine-learned models. Example machine-learned models include neural networks or other multi-layer non-linear models. Example neural networks include feed-forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks. Some example machine-learned models can leverage an attention mechanism such as self-attention. For example, some example machine-learned models can include multi-headed self-attention models (e.g., transformer models). Example models 140 are discussed with reference to Figures 2 and 3.
[38] The user computing device 102 and/or the server computing system 130 can train the models 120 and/or 140 via interaction with the training computing system 150 that is communicatively coupled over the network 180. The training computing system 150 can be separate from the server computing system 130 or can be a portion of the server computing system 130.
[39] The training computing system 150 includes one or more processors 152 and a memory 154. The one or more processors 152 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, an FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 154 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 154 can store data 156 and instructions 158 which are executed by the processor 152 to cause the training computing system 150 to perform operations. In some implementations, the training computing system 150 includes or is otherwise implemented by one or more server computing devices.
[40] The training computing system 150 can include a model trainer 160 that trains the machine-learned models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors. For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function). Various loss functions can be used such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions. Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations.
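For illustration, the training procedure described here corresponds to a standard supervised loop. The sketch below uses PyTorch as one possible framework (the model and data loader are assumed to exist), with cross-entropy loss, backpropagation of errors, gradient-descent updates, and weight decay as one of the generalization techniques mentioned below:

import torch

def train(model: torch.nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> None:
    # Standard supervised loop: forward pass, loss computation,
    # backpropagation of errors, and a gradient-descent parameter update.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()   # backpropagate the loss through the model
            optimizer.step()  # update parameters along the gradient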
[41] In some implementations, performing backwards propagation of errors can include performing truncated backpropagation through time. The model trainer 160 can perform a number of generalization techniques (e.g., weight decays, dropouts, etc.) to improve the generalization capability of the models being trained.
[42] In particular, the model trainer 160 can train the handwriting recognition models 120 and/or 140 based on a set of training data 162. The training data 162 can include, for example, various examples of both traditional handwriting (e.g., images of handwriting on a sheet of paper) and digital handwriting. The model trainer 160 can perform supervised training for the handwriting recognition models 120 and/or 140 to enable the handwriting recognition models 120 and/or 140 to identify various characters and words in different handwriting styles and mediums based on identified stroke information.
[43] The model trainer 160 can also train a synthetic handwriting model that sits on top of the handwriting recognition models 120 and/or 140. The synthetic handwriting model can be trained to recognize various idiosyncrasies within particular handwriting styles. Training data for the synthetic handwriting model can include traditional handwriting and digital handwriting samples that include particular mannerisms for writing characters (e.g., putting a line through the number 7, dotting an “i” character with a particular symbol, underlining words that end in “y” by creating the underline from the tail of the “y” character, and other particular mannerisms) or other idiosyncrasies (mixed use of cursive and printed characters, the direction handwriting slants, and the like).
[44] In some implementations, if the user has provided consent, the training examples can be provided by the user computing device 102. Thus, in such implementations, the model 120 provided to the user computing device 102 can be trained by the training computing system 150 on user-specific data received from the user computing device 102. In some instances, this process can be referred to as personalizing the model.
[45] The model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, hard disk, or optical or magnetic media.
[46] The network 180 can be any type of communications network, such as a local area network (e.g., intranet), wide area network (e.g., Internet), or some combination thereof and can include any number of wired or wireless links. In general, communication over the network 180 can be carried via any type of wired and/or wireless connection, using a wide variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, secure HTTP, SSL).
[47] The machine-learned models described in this specification may be used in a variety of tasks, applications, and/or use cases.
[48] In some implementations, the input to the machine-learned model(s) of the present disclosure can be image data. The machine-learned model(s) can process the image data to generate an output. As an example, the machine-learned model(s) can process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.). As another example, the machine-learned model(s) can process the image data to generate an image segmentation output. As another example, the machine-learned model(s) can process the image data to generate an image classification output. As another example, the machine-learned model(s) can process the image data to generate an image data modification output (e.g., an alteration of the image data, etc.). As another example, the machine-learned model(s) can process the image data to generate an encoded image data output (e.g., an encoded and/or compressed representation of the image data, etc.). As another example, the machine-learned model(s) can process the image data to generate an upscaled image data output. As another example, the machine-learned model(s) can process the image data to generate a prediction output.
[49] In some implementations, the input to the machine-learned model(s) of the present disclosure can be text or natural language data. The machine-learned model(s) can process the text or natural language data to generate an output. As an example, the machine-learned model(s) can process the natural language data to generate a language encoding output. As another example, the machine-learned model(s) can process the text or natural language data to generate a latent text embedding output. As another example, the machine-learned model(s) can process the text or natural language data to generate a translation output. As another example, the machine-learned model(s) can process the text or natural language data to generate a classification output. As another example, the machine-learned model(s) can process the text or natural language data to generate a textual segmentation output. As another example, the machine-learned model(s) can process the text or natural language data to generate a semantic intent output. As another example, the machine-learned model(s) can process the text or natural language data to generate an upscaled text or natural language output (e.g., text or natural language data that is higher quality than the input text or natural language, etc.). As another example, the machine-learned model(s) can process the text or natural language data to generate a prediction output.
[50] In some cases, the input includes visual data and the task is a computer vision task. In some cases, the input includes pixel data for one or more images and the task is an image processing task. For example, the image processing task can be image classification, where the output is a set of scores, each score corresponding to a different object class and representing the likelihood that the one or more images depict an object belonging to the object class. The image processing task may be object detection, where the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that the region depicts an object of interest. As another example, the image processing task can be image segmentation, where the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories. For example, the set of categories can be foreground and background. As another example, the set of categories can be object classes. As another example, the image processing task can be depth estimation, where the image processing output defines, for each pixel in the one or more images, a respective depth value. As another example, the image processing task can be motion estimation, where the network input includes multiple images, and the image processing output defines, for each pixel of one of the input images, a motion of the scene depicted at the pixel between the images in the network input.
[51] Figure 1A illustrates one example computing system that can be used to implement the present disclosure. Other computing systems can be used as well. For example, in some implementations, the user computing device 102 can include the model trainer 160 and the training dataset 162. In such implementations, the models 120 can be both trained and used locally at the user computing device 102. In some of such implementations, the user computing device 102 can implement the model trainer 160 to personalize the models 120 based on user-specific data.
[52] Figure 1B depicts a block diagram of an example computing device 10 that performs according to example embodiments of the present disclosure. The computing device 10 can be a user computing device or a server computing device.
[53] The computing device 10 includes a number of applications (e.g., applications 1 through N). Each application contains its own machine learning library and machine-learned model(s). For example, each application can include a machine-learned model. Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc.
[54] As illustrated in Figure 1B, each application can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some implementations, each application can communicate with each device component using an API (e.g., a public API). In some implementations, the API used by each application is specific to that application.
[55] Figure 1C depicts a block diagram of an example computing device 50 that performs according to example embodiments of the present disclosure. The computing device 50 can be a user computing device or a server computing device.
[56] The computing device 50 includes a number of applications (e.g., applications 1 through N). Each application is in communication with a central intelligence layer. Example applications include a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc. In some implementations, each application can communicate with the central intelligence layer (and model(s) stored therein) using an API (e.g., a common API across all applications).
[57] The central intelligence layer includes a number of machine-learned models. For example, as illustrated in Figure 1C, a respective machine-learned model can be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications can share a single machine-learned model. For example, in some implementations, the central intelligence layer can provide a single model for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by an operating system of the computing device 50.

[58] The central intelligence layer can communicate with a central device data layer. The central device data layer can be a centralized repository of data for the computing device 50. As illustrated in Figure 1C, the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some implementations, the central device data layer can communicate with each device component using an API (e.g., a private API).
Example Model Arrangements
[59] Figure 2 depicts a block diagram of an example computing system 200 according to example embodiments of the present disclosure. In some implementations, the computing system 200 receives an input image 205 of traditional handwriting, such as an image of handwriting on a piece of paper, a dry erase board, a black board, or another writing medium.
[60] The input image 205 is provided to handwriting recognition models 210, which are stored in a memory of the computing system 200. The handwriting recognition models 210 can perform image recognition tasks to recognize characters and/or words in the input image 205. This can be accomplished in one or more ways. In one embodiment, the handwriting recognition models 210 can generate a Unicode representation of the one or more characters that are recognized. This Unicode representation can be provided to a first model, such as a synthetic handwriting model 215, of the handwriting recognition models 210, which can then output strokes (data representing the x-y coordinates of a writing utensil, traditional or digital, over time) that represent the handwriting in a space, where the strokes are determined based on the Unicode representation of the handwriting. The synthetic handwriting model 215 can also generate a style for the handwriting based on style characteristics identified by the handwriting recognition models 210 and/or the synthetic handwriting model 215. Any visual representation of the handwriting can be generated using the identified style characteristics, and future digital writings that are added into the handwriting (or modifications made to the handwriting) can also take on these style characteristics. For example, if a user is copying over a recipe from a notebook that is in the handwriting and style of a grandmother or other particular person, the user may take additional notes in a software application that can process digital handwriting to augment or make additional notes for the recipe. The output additional writing (augmentations or additional notes) can therefore be made in the handwriting and style of the grandmother or other particular person, which allows the additional handwriting to appear as if it were made by the grandmother or particular person. Thus, the original handwriting is preserved and any additional handwriting or changes can be made to appear as if the original writer made them, which can preserve the sentimental value associated with the original handwriting even as it is moved into digital handwriting.
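Tying these steps together, the flow just described might be orchestrated roughly as follows. Every component name here (recognizer, synthesizer, note_app and their methods) is a hypothetical stand-in rather than a disclosed API, and text_to_strokes is the sketch given earlier:

import numpy as np

def image_to_note(image: np.ndarray, recognizer, synthesizer, note_app) -> None:
    # Recognize the characters, extract the writer's style, synthesize
    # strokes in that style, and hand the strokes to the note-taking app.
    text = recognizer.to_unicode(image)        # assumed recognition API
    style = recognizer.style_embedding(image)  # assumed style extraction
    strokes = text_to_strokes(text, style, synthesizer)
    note_app.display(strokes)                  # assumed application API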
[61] The generated strokes can then be provided to a note-taking application 220, which can generate a visual representation of the handwriting 225 for display on the computing system 200 to the user based on the received x-y coordinates and any style characteristics received from the synthetic handwriting model 215. The note-taking application 220 is representative of any other number of software applications that can process digital handwriting, such as word processing applications, Portable Document Format applications, digital art applications, and the like.
[62] In another embodiment, the handwriting recognition models 210 can perform image recognition on the image to identify pixels representing one or more characters in the image. The identified pixels can then be provided to a first model of the handwriting recognition models, such as the synthetic handwriting model 215. The synthetic handwriting model 215 can then output strokes, which are determined based on the identified pixels in the image. The strokes can then be provided to the note-taking application 220, which in turn can generate the visual representation of the handwriting 225 using the strokes as described above.
[63] In some embodiments, the synthetic handwriting model 215 can be used to generate the visual representation 225 of the handwriting, and can provide the visual representation 225 of the handwriting to the note-taking application 220 for display.
[64] In some embodiments, the output of the note-taking application 220 can include both a stylized representation of digital handwriting (e.g., a stylized version of the handwritten content expressed in a format associated with the note-taking application 220) and also the underlying Unicode (or other digital character format) for the handwritten content. Thus, the proposed system can provide an output that includes both stylized digital handwriting and non-stylized digital character representations.
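One illustrative way to persist such a dual output is a single file carrying both representations. The JSON layout below is an invented example, not a real note-taking format, and Stroke is the class from the earlier sketch:

import json
from typing import List

def export_note(text: str, strokes: List[Stroke], path: str) -> None:
    # Store the non-stylized Unicode text alongside the stylized
    # stroke data so either representation can be used later.
    payload = {
        "text": text,
        "strokes": [[(p.x, p.y, p.t) for p in s.points] for s in strokes],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)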
[65] Figure 3 depicts a block diagram of an example computing system 300 according to example embodiments of the present disclosure. In some implementations, the computing system 300 receives a Unicode representation 305 of digital handwriting, such as handwriting produced by a user interacting with a touch screen via a finger or stylus through a note-taking application or other software application that allows a user to create digital handwriting. The Unicode representation 305 can be provided to handwriting recognition models 310, which in turn can output the strokes associated with the Unicode representation 305 of the handwriting. In some embodiments, synthetic handwriting model 315 can output the strokes that are associated with the Unicode representation 305 of the handwriting. These strokes can then be provided to a note-taking application 320, which can in turn display a visual representation 325 of the handwriting for further editing and additions.
[66] In some embodiments, the synthetic handwriting model 315 can be used to generate the visual representation 325 of the handwriting, and can provide the visual representation 325 of the handwriting to the note-taking application 320 for display.
[67] In some embodiments, the output of the note-taking application 320 can include both a stylized representation of digital handwriting (e.g., a stylized version of the handwritten content expressed in a format associated with the note-taking application 320) and also the underlying Unicode (or other digital character format) for the handwritten content. Thus, the proposed system can provide an output that includes both stylized digital handwriting and non-stylized digital character representations.
Example Methods
[68] Figure 4 depicts a flow chart diagram of an example method 400 to perform handwriting recognition according to example embodiments of the present disclosure. Although Figure 4 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particularly illustrated order or arrangement. The various steps of the method 400 can be omitted, rearranged, combined, and/or adapted in various ways without deviating from the scope of the present disclosure.
[69] At 405, a computing system receives data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting.
The data representing the handwriting can be an input image of traditional handwriting on a handwriting medium (e.g., paper, dry erase board, and the like) or a Unicode representation of digital handwriting, such as handwriting produced using a finger or stylus in a note-taking application or other application that supports a user creating digital handwriting.
[70] The style characteristic can be any number of various idiosyncrasies within particular handwriting styles. Style characteristics can include particular mannerisms for writing characters (e.g., putting a line through the number 7, dotting an “i” character with a particular symbol, underlining words that end in “y” by creating the underline from the tail of the “y” character, and other particular mannerisms) or other idiosyncrasies (mixed use of cursive and printed characters, the direction handwriting slants, and the like).

[71] At 410, the computing system processes the data representing the handwriting with one or more handwriting recognition models. The one or more handwriting recognition models can take the data representing the handwriting and perform various processing on the data, such as the processing described above with regards to Figures 2 and 3, to convert the data into strokes, or x-y coordinates tracked over time in a space. This processing can include processing an input image to obtain a Unicode representation of one or more characters in the image, processing the input image to identify pixels that represent one or more characters in the image, and processing a Unicode representation of digital handwriting.
[72] At 415, the computing system can receive the strokes associated with the handwriting from the one or more handwriting recognition models.
[73] At 420, the computing system can provide the strokes to a software application that can process digital handwriting, such as a note-taking application.
[74] At 425, the computing system can generate a visual representation of the handwriting in the note-taking application based on the strokes. The strokes are placed in a space in the software application for digital handwriting based on the coordinate values and the time data associated with each portion of each of the one or more strokes, and a visual representation of the handwriting is generated and displayed within the space based on the strokes. In some embodiments, the visual representation of the handwriting is also generated based on the style characteristic(s) associated with the received handwriting. The style characteristic(s) can be identified by, for example, a synthetic handwriting model that tracks individual stylistic idiosyncrasies for the received handwriting. The style characteristic(s) can be applied to the visual representation of the handwriting when the visual representation is displayed in the note-taking application.
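As one hedged illustration of this placement step, the helper below replays each stroke's points in time order and emits an SVG path per stroke, which a digital-handwriting application could draw in its canvas space. The SVG encoding is an arbitrary choice for the example, and Stroke is the class from the earlier sketch:

from typing import List

def strokes_to_svg_paths(strokes: List[Stroke]) -> List[str]:
    # Replay each stroke by timestamp and encode it as an SVG path string.
    paths: List[str] = []
    for stroke in strokes:
        pts = sorted(stroke.points, key=lambda p: p.t)  # order by time
        if not pts:
            continue
        d = f"M {pts[0].x} {pts[0].y} " + " ".join(
            f"L {p.x} {p.y}" for p in pts[1:]
        )
        paths.append(d)
    return paths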
[75] At 430, the computing system can generate a style for future digital handwriting in the note-taking application. The style can be generated based on the strokes and the at least one style characteristic of the handwriting. For example, if a user of the note-taking application adds more digital handwriting to the visual representation of the handwriting, stylistic characteristics that are consistent with the style of the visual representation can be applied to the newly added digital handwriting, such as particular ways to write certain characters, slant characteristics of the handwriting, and the like.

Additional Disclosure
[76] The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination. Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.
[77] While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation, not limitation of the disclosure. Those skilled in the art, upon attaining an understanding of the foregoing, can readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the present disclosure cover such alterations, variations, and equivalents.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method for performing handwriting recognition, the method comprising:
receiving, by an electronic processor, data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting;
processing, by the electronic processor, the data representing the handwriting with one or more handwriting recognition models;
receiving, by the electronic processor, one or more strokes associated with the handwriting from the one or more handwriting recognition models;
providing, by the electronic processor, the strokes to a software application that supports digital handwriting;
generating, by the electronic processor, a visual representation of the handwriting in the software application based on the strokes; and
generating, by the electronic processor, a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
2. The method of claim 1, wherein the data representing the handwriting is an image of handwriting written on a handwriting medium.
3. The method of claim 2, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify one or more characters in the image;
providing the identified characters to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified characters.
4. The method of claim 2, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify pixels representing one or more characters in the image;
providing the identified pixels to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified pixels.
5. The method of claim 1, wherein the data representing the handwriting is a Unicode representation of digital handwriting.
6. The method of claim 5, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
providing the Unicode representation to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the Unicode representation.
7. The method of claim 1, wherein generating the visual representation of the handwriting in the software application includes: generating, with a synthetic generator model, the visual representation of the handwriting based on the one or more strokes and the at least one style characteristic.
8. A computing system for performing handwriting recognition, the computing system comprising:
one or more electronic processors; and
a non-transitory, computer-readable medium comprising instructions that, when executed by the one or more electronic processors, cause the one or more electronic processors to perform operations, the operations comprising:
receiving data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting;
processing the data representing the handwriting with one or more handwriting recognition models;
receiving one or more strokes associated with the handwriting from the one or more handwriting recognition models;
providing the strokes to a software application that supports digital handwriting;
generating a visual representation of the handwriting in the software application based on the strokes; and
generating a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
9. The computing system of claim 8, wherein the data representing the handwriting is an image of handwriting written on a handwriting medium.
10. The computing system of claim 9, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify one or more characters in the image;
providing the identified characters to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified characters.
11. The computing system of claim 9, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify pixels representing one or more characters in the image;
providing the identified pixels to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified pixels.
12. The computing system of claim 8, wherein the data representing the handwriting is a Unicode representation of digital handwriting.
13. The computing system of claim 12, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
providing the Unicode representation to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the Unicode representation.
14. The computing system of claim 8, wherein generating the visual representation of the handwriting in the software application includes: generating, with a synthetic generator model, the visual representation of the handwriting based on the one or more strokes and the at least one style characteristic.
15. A non-transitory, computer-readable medium comprising instructions that, when executed by one or more electronic processors, cause the one or more electronic processors to perform operations, the operations comprising:
receiving data representing handwriting, the data representing the handwriting including at least one style characteristic of the handwriting;
processing the data representing the handwriting with one or more handwriting recognition models;
receiving one or more strokes associated with the handwriting from the one or more handwriting recognition models;
providing the strokes to a software application that supports digital handwriting;
generating a visual representation of the handwriting in the software application based on the strokes; and
generating a style for future digital handwriting in the software application, the style being generated based on the strokes and the at least one style characteristic of the handwriting.
16. The non-transitory, computer-readable medium of claim 15, wherein the data representing the handwriting is an image of handwriting written on a handwriting medium.
17. The non-transitory, computer-readable medium of claim 16, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify one or more characters in the image;
providing the identified characters to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified characters.
18. The non-transitory, computer-readable medium of claim 16, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
performing image recognition on the image to identify pixels representing one or more characters in the image;
providing the identified pixels to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the identified pixels.
19. The non-transitory, computer-readable medium of claim 15, wherein the data representing the handwriting is a Unicode representation of digital handwriting.
20. The non-transitory, computer-readable medium of claim 19, wherein processing the data representing the handwriting with the one or more handwriting recognition models includes:
providing the Unicode representation to a first model of the one or more handwriting recognition models; and
outputting, from the first model, the one or more strokes, the one or more strokes being determined based on the Unicode representation.