
WO2019073267A1 - Automated image manipulation using artificial intelligence - Google Patents


Info

Publication number
WO2019073267A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
manipulation
images
processed
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2018/052951
Other languages
French (fr)
Inventor
Marco Marchesi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Happy Finish Ltd
Original Assignee
Happy Finish Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB1716750.3A (GB201716750D0)
Priority claimed from GBGB1814774.4A (GB201814774D0)
Application filed by Happy Finish Ltd
Publication of WO2019073267A1


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the invention disclosed herein relates to an automated image manipulation system that uses artificial intelligence.
  • a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising: receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks have been processed for the input image.
  • the step of processing each task of the set of user-selected image-manipulation tasks may further comprise: determining if all of the tasks in the set of user-selected image-manipulation tasks have been completed, and if not then: assigning the output image as a new input image; selecting any further user-selected image-manipulation task of the set of user-selected image-manipulation tasks which has not been completed; and repeating the above steps of retrieving, loading and processing until all of the user-selected image-manipulation tasks have been processed for the input image.
  • the image-manipulation method may further comprise determining if all of the plurality of unprocessed images have been processed and if not then: assigning another one of the unprocessed images as the input image; and repeating the steps of: processing each task of the set of user-selected image-manipulation tasks on the input image until all of the user-selected image-manipulation tasks have been processed for the input image and then storing the output image.
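  • For illustration, the control flow set out in the preceding items can be summarised as a nested loop. The sketch below is a minimal, hedged illustration only; the model store and the callable models are assumptions for the example and not part of the disclosed implementation:

```python
from typing import Callable, Dict, Iterable, List

# A "model" is abstracted here as any callable mapping an image to an image;
# in the disclosure it is a trained machine-learning model loaded into a processor.
Image = list                       # placeholder for an image type (e.g. a numpy array)
Model = Callable[[Image], Image]

def retouch_batch(unprocessed_images: Iterable[Image],
                  selected_tasks: List[str],
                  model_store: Dict[str, Model]) -> List[Image]:
    processed = []
    for image in unprocessed_images:        # for each received unprocessed image
        current = image                     # assign it as the input image to be processed
        for task in selected_tasks:         # process the user-selected tasks sequentially
            model = model_store[task]       # retrieve the model associated with the task
            current = model(current)        # derive the output image for this task
        processed.append(current)           # store the fully processed output image
    return processed
```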
  • the at least one of the user-defined tasks may in some embodiments comprise a corresponding plurality of image-manipulation sub tasks and the retrieving step may comprise retrieving an image-manipulation model for executing the plurality of corresponding image-manipulation sub tasks.
  • the step of processing the input image may comprise processing a plurality of different input images in parallel with each image having a different image-manipulation task being applied to it. This advantageously allows a technique called pipelining which effectively increases image processing throughput.
  • the image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task may comprise applying a plurality of training data sets in a machine-learning process to a model-creation processor having a machine-learning computer architecture.
  • the training data sets may comprise a set of unprocessed training images which have not been processed to enhance a specific image manipulation task and a set of processed training images which have been processed to enhance the specific image manipulation task; the set of processed images being unpaired with the set of unprocessed images, and the applying step may further comprise the machine-learning process determining characteristics of the set of unprocessed training images and the characteristics of the set of processed training images and using these characteristics to create the image-manipulation model.
  • the set of unprocessed training images and the set of processed training images may be provided in a mixed set of training images and are not classified as processed or unprocessed training images prior to the applying step. This advantageously minimises the burden on the user to configure the method prior to its execution.
  • the creating step may comprise: initialising operational parameters of the model creation processor; applying an error image to the model creation processor and generating a training output image; comparing the training output image with a correct image corresponding to the error image; calculating an error between the training output image and its corresponding correct image; and adjusting the operational parameters of the model-creation processor to minimise the error for the next application of an error image.
  • the creating step may comprise creating and storing an intermediate version of the image-manipulation model, the intermediate version being created whilst the error is greater than a predetermined threshold.
  • the intermediate versions of the image-manipulation model can advantageously be provided to a user to show an intermediate stage of a large batch processing process.
  • the creating step may comprise repeating the applying, comparing, calculating steps until the error is less than a predetermined threshold.
  • the storing step may comprise storing the operational parameters of the model creation processor as the image-manipulation model or the intermediate version of the image-manipulation model.
  • the processing step may further comprise using the intermediate versions of the image-manipulation models to process an unprocessed input image and to generate and output an intermediate version of a partially processed image.
  • the image-manipulation method may further comprise standardising each of the plurality of training data sets of images to comply with a predetermined set of image criteria.
  • the standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
  • the image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task includes use of a Generative Adversarial Network to create a probability distribution of a non-linear mapping function, the distribution representing the probability that an input image comes from a set of correct images.
  • the image-manipulation method may further comprise standardising each of the plurality of unprocessed images to comply with a predetermined set of image criteria.
  • the standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
  • the image-manipulation method may further comprise creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user-selected images and a set of user controls for setting the value of each one of a plurality of different image-manipulation tasks.
  • the creating step may comprise in some embodiments manipulating at least one of the user controls and viewing an approximated result of the application of the image-manipulation task on the image displayed within the window and subsequently sending the value of the at least one image-manipulation tasks to the processor.
  • the image-manipulation method may further comprise creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user-selected images and a field for expressing image-manipulation tasks to be applied to the image in a natural language describing the image-manipulation tasks.
  • each input image may be a two-dimensional (2D) image and each image-manipulation model may be a 2D image-manipulation model.
  • the method may further comprise: generating a three-dimensional (3D) image-manipulation model representation of each 2D image-manipulation model from a 3D data set and storing each 3D model representation; wherein the processing step comprises using the 3D model representation to create a corresponding prediction of the 2D processed image in the 3D domain.
  • the using step may comprise: using the user-selected image-processing task to select the 3D image-manipulation model representation to be applied to the unprocessed 2D input image to generate the prediction of the 2D processed image in the 3D domain.
  • the method may further comprise receiving one or more user-selected 3D image-manipulation tasks and selecting the 3D image-manipulation model representation corresponding to the 3D image-manipulation task and applying the selected 3D model representation to the 2D input image to generate the prediction of the 2D processed image in the 3D domain.
  • the prediction of the 2D processed image in the 3D domain may comprise a 3D processed output image.
  • the method may further comprise providing each prediction of the 2D processed image in the 3D domain to a 3D image processing tool and further processing the prediction using 3D image-processing instructions of the 3D image processing tool to generate a 3D processed output image.
  • the step of generating a 3D model representation of each 2D image-manipulation model may comprise reading a 3D data set comprising a sequence of 2D images of a scene taken from different angles and generating, from the sequence of 2D images, a spatial parameter in each 3D model representation.
  • the processing step in some embodiments comprises removing a representation of a 3D object from the 2D image by identifying the object within the 2D input image using one or more 3D model representations and using 3D image-manipulation techniques to remove the object from the 3D image representation.
  • the method may further comprise converting the processed 3D image representation into a 2D processed image and storing the 2D processed image.
  • the method may further comprise determining an inverse function to a user-determined image-manipulation task, storing the same, and using the inverse function to undo any image processing task if required.
  • a computer-implemented automatic method of image-manipulation for modifying an unprocessed image having a variable set of image parameters to a processed image having a target set of image parameters comprising: receiving the unprocessed image and a set of user-selected image-manipulation tasks to manipulate the unprocessed image to create the processed image having the target set of image parameters; assigning the received unprocessed image to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks for the input image have been processed.
  • a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed two-dimensional (2D) images having different sets of image parameters to a plurality of processed images having a target set of image parameters comprising: receiving the plurality of unprocessed 2D images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to a 2D input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving a three-dimensional (3D) representation of an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved 3D representation of the image-manipulation model and applying the 2D input image to the processor; and processing the 2D input image to generate a prediction of the 2D processed image in the 3D domain.
  • an automated image-manipulation system for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the system comprising: a receiver for receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; a processed image generator for processing each one of the received plurality of unprocessed images and the set of user-selected image-manipulation tasks, the processed image generator comprising: an instruction processor for sequentially processing each task of the set of user-selected image-manipulation tasks on a selected one of the plurality of unprocessed images; a model retrieval engine for retrieving image-manipulation models associated with each user-selected image-manipulation task; a task model processor, having a machine-learning computer architecture, arranged to be loaded with each retrieved image-manipulation model and to process the selected unprocessed image to derive an output image processed according to the current user-selected image-manipulation task.
  • the automated image-manipulation system is a retouching system configured to correct both lighting conditions and irregularities in input images to output images having a common set of lighting conditions and the irregularities removed.
  • Some of the present embodiments can carry out sophisticated image manipulation such as object removal to reveal previously hidden portions of the scene, altering camera angles and adding lighting from lighting sources within the scene that generates appropriate shadows. This is carried out by use of extrapolation of the 2-dimensional image into the 3-dimensional domain before image manipulation is carried out. This involves learning the properties of 3-dimensional objects and rendering a 2-dimensional image into a 3-dimensional scene. Examples of such 3-dimensional image manipulations include removing objects, altering the camera angle to view the scene in the image at a different angle and adding lighting and appropriate shadows.

BRIEF DESCRIPTION OF THE DRAWINGS
  • Figure 1 is a schematic block diagram of an automated image manipulation system according to an embodiment of the present invention;
  • Figure 2 is a flow chart showing steps involved in an automated image manipulation process using the system of Figure 1;
  • Figure 3 is a schematic representation of an input user interface provided on the user system shown in Figure 1;
  • Figure 4 is a schematic representation of an output user interface provided on the user system shown in Figure 1, which can be used to specify further changes to be made to the output image;
  • Figure 5 is a schematic block diagram of the Artificial Intelligence based Retouching System (AIRS) of Figure 1;
  • Figure 6 is a flow chart detailing the steps involved in the operation of the AIRS of Figure 5;
  • Figure 7 is a schematic block diagram of the design module of the AIRS of Figure 5;
  • Figure 8 is a flow chart detailing the steps involved in training the AIRS using the design module of Figure 7;
  • Figure 9 is a schematic diagram of exemplary training images used to train the AIRS system of Figure 5;
  • Figure 10 is a schematic block diagram of the generation module of Figure 5;
  • Figure 11 is a flow chart detailing steps involved in the operation of the generation module of Figure 5;
  • Figure 12 is a schematic block diagram illustrating how different tasks may be executed by the AIRS in a different embodiment of the present invention;
  • Figure 13 is a schematic block diagram illustrating a further embodiment of the present invention which utilises a 3D representation learned from the training dataset and a 3D user interface with the user input interface of Figure 1;
  • Figure 14 is a flow chart illustrating the further embodiment of Figure 13 operating to use the 3D representation;
  • Figure 15 is a schematic block diagram illustrating a yet further embodiment of the present invention which utilises a 3D representation learned from the 3D dataset and a 3D tool for manipulating the generated 3D scene before output;
  • Figure 16 is a flow chart illustrating the yet further embodiment of Figure 15 operating to use the 3D tool;
  • Figure 17 is a series of images illustrating an example of how a 3D scene may be created by AIRS by either of the embodiments of Figures 13 or 15 and manipulated in order to generate a modified version of the original image;
  • Figure 18 is a series of images illustrating an example of how a plurality of sequential input images are processed to generate a 3D scene by either of the embodiments of Figures 13 or 15 that may be rendered from any camera point of view;
  • Figure 19 is a schematic representation of a natural language user interface of a final embodiment of the present invention to be used with the AIRS system of Figure 5.
  • One non-limiting example application of the image retouching system is in the area of fashion image retouching.
  • An original image, in which the colours, lighting and shape of models and assets (apparel) contained in the image have not been modified (unprocessed), may be processed by the system in such a way that the retouching carried out is indistinguishable from retouching carried out by a human artist.
  • the image may contain manipulations that are difficult for human artists to achieve but are still consistent with the rest of the image.
  • the claimed image retouching system may be applicable to the fields of both photographic image retouching and video retouching.
  • the system includes at its heart an Artificial Intelligence based Retouching System (AIRS) 2, alternatively called a "Deep Retouching" system that is designed as a general solution for making the image retouching process automatic or at the very least partially human-assisted.
  • current image retouching methods require corrections to be carried out mainly manually, typically one image at a time for one image processing task.
  • Since image manipulation using the AIRS 2 is automatic (or at least semi-automatic), the AIRS 2 is able to modify multiple digital images at a significantly faster rate than the existing methods.
  • the time taken to manipulate an image decreases from hours/minutes when carried out by a human retoucher (depending on the task) to milliseconds when carried out by the automated system of the present embodiment. Any retouching process may be performed in real-time (namely milliseconds) on specific image processing computer hardware. This increase in speed is due to the fact that a new technique (described below) has been used to enable a sometimes complicated retouching process to be applied automatically using computer systems rather than to the speed of the computer system itself.
  • One set of parameters can be specified (only once) as a target set of tasks to be applied to all images in a group or batch of input images. Receipt of a batch of input images and one set of image processing tasks enables all images in the batch to be standardised to target parameters specified by tasks in a single set of instructions.
  • Examples of possible image enhancements and manipulations carried out by the AIRS 2 include modifications to the colour, light, shape, details and/or texture of an image or aspects of an image, removing and/or re-positioning assets within an image, masking, enhancing details, enhancing sharpness and/or removing imperfection, such that the final processed image is different from the original unprocessed image. This ability to handle so many different complicated aspects of image processing in a single system has not been possible previously.
  • when carried out by the same system on images of the same size, the image manipulation process may take the same amount of time for each image, independent of the image content.
  • the time taken to retouch images automatically is therefore more predictable than existing manual methods. This enables the time taken for processing thousands of images to be predictable and can help in determining accurately when any resultant action can take place, for example when a website can go live with a catalogue of processed images.
  • the image manipulation process may be applied to the same image at different sizes, achieving the same modifications scaled accordingly.
  • Existing methods may require each image to be modified individually, resulting in inconsistencies.
  • any modification that restores or enhances the original image or modifies the aesthetics and composition of the image may be defined as a task, image manipulation, image modification or retouch. These terms may be used interchangeably.
  • A schematic diagram of the image retouching system according to an embodiment of the present invention is shown in Figure 1.
  • the system comprises three components: an Artificial Intelligence based Retouching System (AIRS) 2 which includes within it a neural network 4; an administrator system 6 for configuring the AIRS 2 and a user interface with the system - in this embodiment in the form of a user system 8.
  • the AIRS 2 is operatively coupled to the user system 8 via a wide-area communications network 10 such as the Internet.
  • the primary function of the administrator system 6 is to assist in the creation of models which can be used to configure the AIRS 2 to process images.
  • the administrator system 6 includes training datasets 12 comprising sets of training images, A and B stored in a data store 13.
  • the training datasets 12 are stored in a data store as part of the administrator system 6. The training process is discussed in further detail below.
  • the user system 8 comprises an input graphical user interface (GUI) 14, which in use receives one or more input images C 16 from a data store 17.
  • the input images C 16 may for example be digital images in any colour mode including, but not limited to, Red Green Blue (RGB), Red Green Blue Alpha (RGBA) or Grayscale, before image retouching has taken place.
  • the input GUI 14 may for example be part of a website and be rendered in a browser of the user system 8 or in a window of a mobile or desktop application provided at the user system 8.
  • the manipulation process itself is automatic but the manipulation tasks to be included in the manipulation process (such as light enhancement, colour retouch etc.) may be optionally manually chosen by the user via the input GUI 14.
  • the manipulation tasks to be included in the manipulation process are provided to the AIRS 2, in the form of instructions 18, along with the unprocessed input images C 16, via the communications network 10.
  • the AIRS 2 receives the instructions 18 and input images C 16 and carries out the requested image retouching on the input images C 16.
  • the resultant images are processed output images D 20, which may be stored in a data store 22 of the AIRS 2 and/or output to the user system 8 via the communication network 10 for display in an output GUI 24.
  • the output images 20 are typically digital images.
  • Figure 2 is a flow chart illustrating a method of operation of the image retouching system of Figure 1.
  • the method commences, at Step 202, with the user selecting one or more images to modify (input images C 16), which are then displayed, at Step 204, in the input GUI 14.
  • the input images C 16 may optionally be normalised (not shown) to meet pre-determined requirements before they are subject to subsequent computational steps.
  • This normalisation may for example include scaling, cropping, image format conversion and/or any other data manipulation. This advantageously can standardise the input images for processing.
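  • As an illustration of such a normalisation step, the following is a hedged sketch assuming the Pillow library and an illustrative 512-pixel target size, neither of which is specified in the disclosure:

```python
from PIL import Image

def normalise(path: str, size: int = 512) -> Image.Image:
    """Scale, centre-crop and format-convert an image before processing (illustrative values)."""
    img = Image.open(path).convert("RGB")            # image format conversion (e.g. RGBA -> RGB)
    scale = size / min(img.size)                     # scale so the shorter side equals `size`
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    left = (img.width - size) // 2                   # centre crop to a `size` x `size` square
    top = (img.height - size) // 2
    return img.crop((left, top, left + size, top + size))
```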
  • the user specifies, at Step 206, the image manipulation tasks he/she would like to be applied to the input image C 16 and selects, at Step 208, the option 'Run' 30 provided on the input GUI 14 (described in detail later with reference to Figure 3).
  • the input images C 16 and instructions 18 detailing which image manipulation tasks are to be carried out for a given input image are sent, at Step 210, to the AIRS 2 via the communications network 10.
  • the AIRS 2 receives these instructions 18 and processes them, at Step 212, before the resultant output images D 20 are sent, at Step 214, to the user system 8 and displayed, at Step 216, in the GUI acting now as the output GUI 24, via the communications network 10.
  • the output images D 20 may be downloaded and/or saved, at Step 220. If the output images D 20 are not considered acceptable, at Step 218, the user may specify, at Step 206, further image manipulations to be carried out on output images D 20. In this case images D 20 are re-input to the AIRS 2 together with a new set of image-manipulation instructions 18 relating to the image to be reprocessed and the process is carried out until the user considers the resultant output images 20 to be acceptable.
  • the image manipulation tasks to be included in the manipulation process may be determined automatically.
  • the system can optionally require no human interaction except for providing the input images to be modified.
  • the system may be provided with a target image as the desired output. Tasks could then be automatically selected that, when applied to an input image, result in the target image.
  • These tasks may be stored in a data store and applied to a subsequent batch of input images.
  • task parameters may be pre-set and subsequently applied to a series of images. This may for example be useful if the user would like all the images to be re-touched according to the same image specifications.
  • the system does not require human intervention once the task parameters have been set.
  • Figure 3 shows an example of an input GUI 14 of the user system 8, which is used to carry out Steps 202 to 210 of Figure 2.
  • the input GUI 14 has four main interaction areas. The first is a browse button 32 which enables the user to select the image to be pulled into the input GUI 14 for consideration as an image to be processed. The second is a viewing window 34 for displaying the image to be processed which has been selected. The third is a set of slider controls 36, each slider control relating to a different image manipulation task and labelled with a task descriptor 37. Even though the present embodiment shows eight specific image manipulation tasks, it is possible in other embodiments for different types of task to be provided and also a different number of tasks. Finally, an execute control ('Run' button) 30 is provided to trigger the sending of the image 16 and instructions 18 to the AIRS 2.
  • the user selects an input image C 16 to be manipulated via the 'browse' icon 32.
  • the selected image is then shown, at Step 204, in the viewing window 34.
  • the user specifies, at Step 206, the image manipulation tasks he or she would like to be applied to the input image C 16 by selecting one or more tasks and choosing the amount of retouch for each task, in this example using two slider controls 36 on the input GUI 14.
  • the user typically selects the amount of retouch task by setting the slider 36 to a percentage of each task to be carried out.
  • the 'correct colour' interactive slider is set to just over 50% and the 'enhance light' interactive slider is set to 100%.
  • the image displayed in the viewing window 34 changes to show approximately what effect the selected manipulation will produce on the image. This is useful to provide feedback to the user to see what effect a given image manipulation parameter will have on the original unprocessed image 16 before the image manipulations are finally selected.
  • the tasks refer to the type of image manipulation to be carried out, including but not limited to colour correction, masking, light enhancement, detail enhancement, sharpness enhancement, imperfection removal, texture additions and background removal.
  • the number of slider controls 36 corresponds to the number of possible image manipulation tasks available for the image being displayed. Typically these do not vary, but in another embodiment some aspects of the image may not be variable and so the corresponding image manipulation option may not be shown as available for the user to modify. This can be specified in the metadata of the image if required and determined by considering the metadata.
  • the user input in this embodiment takes the form of actuating, at Step 208, the interactive 'Run' button 30, as illustrated in Figure 3.
  • the input image C 16 is sent to the AIRS 2 as an input image 16 at Step 210, along with the instructions 18 detailing the image modifications the user requested in Step 206.
  • the settings can apply to a single image or a batch of images. There may be a single retouch step or a series of retouch steps specified in the instructions 18. In one embodiment, all the modifications of the image may be processed sequentially but indivisibly as one block of work and one final result may be generated.
  • intermediate images 38 can be generated which still represent photorealistic modifications of the original image 16.
  • One of the intermediate images 38 may be the output image following at least one image manipulation task that will undergo at least one further image manipulation task for the process to be complete.
  • Intermediate images 38 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
  • Figure 4 shows an example of the output GUI 24, which concerns Steps 216 to 220 and Steps 206 to 210 (if the user would like further tasks to be carried out in a second iteration) of Figure 2.
  • the processed output image D 20 is displayed in the output GUI 24 and may be downloadable (if being viewed in a browser window).
  • the image shown in Figure 4 is a preview of the result of the manipulation process that can be downloaded or further manipulated.
  • the user may choose to download the output image D 20 by for example activating, at Step 220 the interactive 'download' button 40 or choosing, at Step 218, for the output image D 20 to undergo further image manipulation (refining the retouching process).
  • the output image D 20 may be subject to a further selection of image manipulation tasks via the slider controls 36 with the amount of manipulation of each task being determined by the positions of the slider controls 36 and the subsequent selection of the run button 30.
  • the image is re-input into the AIRS 2 as re-input image 41, along with the newly specified instructions 42 (additional image manipulation tasks) detailing the secondary image modifications requested by the user.
  • the settings can also apply to a single image or a batch of images.
  • If the output image 20 is being viewed in a browser window but has not actually been downloaded to the user system 8, it can be referenced by an identification number (typically generated by the AIRS 2) which is provided to the user system 8. In this case there is no need to re-input the output image to the AIRS 2; rather, all that is required is to provide the identification number together with the new image manipulation instructions to the AIRS 2.
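  • The difference between re-uploading the image and referencing it by identifier could, for example, look like the following hypothetical request payloads (all field names are illustrative assumptions and not part of the disclosed system):

```python
# Hypothetical request payloads for a second retouching pass.
reprocess_by_upload = {
    "image": b"<binary image data>",                        # output image D 20 re-uploaded in full
    "tasks": {"correct_colour": 0.5, "enhance_light": 1.0},
}
reprocess_by_reference = {
    "image_id": "a1b2c3",                                   # identifier issued by the AIRS 2
    "tasks": {"remove_imperfections": 0.75},                # new image-manipulation instructions
}
```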
  • Figure 5 shows the Artificial Intelligence based Retouching System (AIRS) 2 of Figure 1 in greater detail.
  • the AIRS 2 comprises two major components: a design module (or design engine) 50 and generation module (or generation engine) 52.
  • the design module 50 functions to carry out a training process and allows the system to learn a non-linear model 54 that, in one embodiment, maps input data to output data for a given image manipulation task. In another embodiment, the system may map input data to output data and learn a non-linear model for a plurality of image manipulation tasks. This may be considered to be a model for carrying out a plurality of image manipulation sub tasks.
  • the training dataset 12 is provided to the design module 50 via the administrator system 6.
  • Models 54 created by the design module 50 are stored in a data store 56 and used by the generation module 52.
  • the data store 56 may be placed together with the design module 50 in one embodiment or be located separately in portable devices or machine systems in other embodiments.
  • the generation module 52 functions to implement the image manipulation of a received input image 16 using the specified instructions 18 and the trained models 54.
  • the AIRS 2 further comprises a communications module 58 for communicating with the user system 8 via the communications network 10.
  • the generation module 52 in use receives the input image C 16 and instructions 18 via the communications module 58 and generates the output image 20, which is then output via the communications module 58.
  • Figure 6 is a flow chart illustrating a method of operation of the AIRS 2 of Figure 5 according to an embodiment of the present invention.
  • the method commences with the administrator system 6 receiving, at Step 302, the training data set 12, which may be made up of sets of unpaired or paired digital training images, A and B.
  • each pair contains one image, A, with the original non-modified content and one image, B, with the same content after an image retouch process.
  • Each pair may contain multiple image retouch steps or a single image retouch step.
  • the pairs of images A and B may contain the same composition of items that differ by visible characteristics. For example, images of landscapes before and after a period of rain.
  • the manipulation process can be defined as A to B.
  • Training A to B means that A (the training image before image retouching) is considered the input and B (the training image after image retouching) the output, while training B to A means that B is considered the input and A the output.
  • training B to A means learning the function that, given any image after retouching, the image before the retouching process may be predicted.
  • one image may be a representation before a colour correction and the corresponding image pair may show a representation with the same content after a colour correction has been applied.
  • the system can be trained to perform a manipulation in both directions, so the after colour correction images may be inputted and the before colour correction images may be outputted. In this way, the system can learn to predict what colour corrected images would look like without a colour correction.
  • the training data set 12 is still organised in two subsets, before and after retouching, but the two subsets may differ from each other in number and/or content, in a way that they may appear "unpaired"; that is, for each image A in the dataset it is not guaranteed that there exists an image B in the dataset that shows the same content after an image manipulation process. Accordingly, the training system learns the characteristics of each group and subsequently the transfer function between the groups, rather than specifically the transfer function from one input training image to its paired output training image.
  • the training data set is not separated into pairs of images before and after retouching. Instead, the system is able to learn the distinction between retouched and not retouched images from a data set of mixed images (retouched and not retouched) in a completely unsupervised manner. In a similar way to how 3D properties can be learned from a 3D data set without receiving any information about 3D structures (discussed in further detail below), the system can learn the concept of retouched images autonomously without a clear distinction between images.
  • advantageously processed and unprocessed images can be provided as the training set (mixed images) without pairing or grouping (any form of classification) prior to training.
  • a machine-learning architecture (see below) can then create groupings of training images as well as characteristics of each group and subsequently transfer functions between groups all within the weightings and configuration of the machine-learning architecture.
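  • One published approach to exactly this kind of unpaired training (named here only as an illustration; the disclosure does not commit to it) combines the adversarial loss with a cycle-consistency loss between the group X of unretouched images and the group Y of retouched images, using a forward mapping G : X → Y and an inverse mapping F : Y → X:

    $$\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$

    so that an image mapped into the other group and back should return close to itself.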
  • the training data set 12 may include any subject or composition in each sample and the system can be trained to apply a retouching process to any new image containing that subject or composition.
  • the training images of the training data set 12 are pre-processed, at Step 304, to meet pre-determined requirements.
  • the pre-processing may for example include, but is not limited to, scaling, cropping, image format conversion and/or any other data manipulation. This advantageously reduces the training time and/or increases the accuracy of the neural network 4 implementing the model 54 as all of the input training data images are made consistent in terms of size and scale such that the neural network model 54 does not have to accommodate these differences.
  • a batch of training images is created, at Step 306.
  • a batch refers to the set of training images processed at one time for a specific image manipulation task.
  • a batch may be a plurality of pairs of training images relating to a specific task such as imperfection removal, colour correction or texture addition.
  • Models 54 are then created, at Step 308, for each task at the design module 50.
  • the models 54 provide the numerical function to be applied to a neural network 4 such that the neural network 4 can process images to implement each task and will be explained in more detail below.
  • the models 54 are stored at Step 310, in the data store 56.
  • the generation module 52 receives, at Step 312, the input image C 16 and instructions 18 pertaining to that image and processes them at Step 314.
  • the output image D 20 is generated, at Step 316, by applying one or more models 54 to the input image C 16 corresponding to the tasks selected by the user, specified by the instructions 18.
  • the output image D 20 may be sent, at Step 318, to the output GUI 24 of the user system 8.
  • Figure 7 schematically illustrates the design module 50 of Figure 5 in greater detail.
  • the design module 50 may be a convolutional neural network (CNN) 4 or an alternative machine-learning computer architecture, which is used to create runtime models 54 for image manipulation tasks.
  • Each model 54 represents an image manipulation task or plurality of image manipulation sub tasks and may be created via several iterations of training the CNN 4.
  • Intermediate image manipulation task models represent models that are complete themselves but will be integrated with another model or plurality of models such that the final model runs a plurality of image manipulation tasks.
  • intermediate models may be models that are deprecated by a subsequent model.
  • intermediate models of each task may be stored in the data store 56, such that, if imported in the generation module, the image manipulations learned so far can be presented to the user. That is, the intermediate models can give a preview (work in progress) of how much the system has learned so far and these previews can be provided back to the user if required.
  • the design module 50 of this embodiment comprises a batch retrieval engine 62 to retrieve a training data set 12 (images A and B for example) for a particular task from the administration system 6 together with an associated confidence threshold.
  • the batch retrieval engine 62 is in communication with a task manager engine 63 which in turn provides one of the training data set images (image A) to the CNN 4 whilst the other (image B) is provided to a comparator 64.
  • the task manager engine 63 is also responsible for initial configuration of the CNN 4 which typically has randomised weights before the first training image is input.
  • the output of the CNN 4 is also provided to the comparator 64.
  • the output of the comparator 64 is provided to an error calculator 66 which specifies an error between images A and B.
  • the error calculator 66 provides its output to a threshold comparator 68 which compares the error to the confidence threshold received from the task manager engine 63.
  • the output of the threshold comparator 68 is fed to a weight adjuster 69 which feeds back into the CNN 4 to adjust its weights to reduce the error (when the error is above the threshold).
  • the output of the threshold comparator 68 goes to a trained model storage engine 70 which can retrieve the current CNN 4 settings from the CNN 4 and store these in the data store 56 as a final trained model of the current image manipulation task.
  • the trained model storage engine 70 is also coupled to the task manager engine 63 to know which image processing task, the subject of the current CNN model 54, is currently being considered.
  • image A from a pair of training images, A and B relating to a specific task, is processed by the CNN 4 and undergoes a manipulation determined by the weights and biases of the neurons in the CNN 4.
  • the manipulated image A, training output A' is then sent to the comparator 64, where it is compared to image B from the pair of training images, A and B.
  • the difference (error) between image A' and B is calculated using the error calculator 66.
  • the design module 50 determines whether the error is below or above a predetermined threshold using the threshold comparator 68. If the error is above a threshold, the weights and biases of the neurons in the CNNs 4 are updated at the weight adjuster 69 in dependence on the error value.
  • Image A of the second pair of training images is then processed by the CNN 4 and undergoes a manipulation determined by the updated weights and biases of the neurons in the CNN 4. The process continues until the difference (error) between images A' and B is less than the predetermined threshold.
  • the trained model 54 (which corresponds to the weights and biases of the CNN neurons which led to the error between images A' and B being less than the predetermined threshold) is sent to the trained model storage engine 70.
  • This trained model 54 provides the numerical function to apply to a neural network to process input images 16 to implement a specific task.
  • Information relating to which image manipulation task the model 54 applies to may be retrieved from the task manager engine 63 and the values of the weights and biases of the neurons in the CNN 4 which led to the error between images A' and B being less than the predetermined threshold may be retrieved from the CNN 4.
  • the trained model storage engine 70 stores the trained model 54 to the data store 56.
  • the data store 56 may also comprise intermediate models 72 for different tasks and a final trained model 74, which may incorporate all the models 54 involved to carry out every task instructed.
  • the training process in the design module 50 may be made by Machine Learning computational blocks (CNNs) 4 organised as a conditional Generative Adversarial Network.
  • Generative Adversarial Networks represent a way to train a generative model in an unsupervised manner [2]. They consist of two models that play an adversarial game.
  • a generative model G captures the data distribution while a discriminative model D estimates the probability that a sample (in this case an image) came from the training set rather than from the generator G.
  • G and D solve a non-linear mapping function.
  • both G and D are conditioned on some extra data.
  • the extra data may for example comprise an input image.
  • the system is forced to learn a probability distribution that generates images similar to the input image.
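  • In the standard conditional GAN formulation (stated here for context; the disclosure does not give the exact objective), the generator G and the discriminator D play the minimax game

    $$\min_G \max_D \; \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big],$$

    where x is the conditioning input image, y is a real example drawn from the training set and z is a noise vector.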
  • the function of the CNNs 4 may be carried out using any network, software, hardware and/or algorithm that is able to learn a non-linear mapping function between the input and output images.
  • Figure 8 is a flow chart illustrating an example of how the training process, referred to when describing Figure 7 above, may be carried out.
  • the process starts by initialising, at Step 402, the neurons of the CNNs 4. This may for example involve using random initialisation and setting the weights and biases of the CNN 4 neurons with random values.
  • a new batch of training images is retrieved, at Step 404, relating to a specific task.
  • a pair of training images from the batch, images A and B, are obtained, at Step 406, before image A is subject to a manipulation, at Step 408, determined by the current configuration of the CNN 4 (i.e. the current weights and biases of the CNN neurons).
  • the manipulated image A is generated, at Step 410, and referred to as training output A'.
  • the training output A' is then compared, at Step 412, to training image B at the comparator.
  • the error (difference) between A' and B is then calculated, at Step 414, at the error calculator 66.
  • the error calculated at Step 414 is then used to adjust/update, at Step 416, the weights and biases of the neurons in the CNN 4, in order to reduce the error between A' and B the next time image A is passed through the CNN 4 in the next iteration. If the error is less than a predetermined threshold, the training is complete and, at Step 420, the updated neuron weights and biases are stored in the data store of the AIRS 2 as a model 54 for this particular batch and associated task.
  • if the error is not below the threshold, the method continues by returning, at Step 406, to another round of training, namely a second pair of training images is taken from the batch.
  • the current function of the CNN 4 with the updated weights and biases of the neurons is applied to image A of the second pair of training images, at Step 408, and the process continues until the error between A' and B, as determined at Step 416, is less than the predetermined threshold.
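  • The Figure 8 loop can be sketched as follows. This is a simplified, non-adversarial illustration using PyTorch, with a plain L1 error standing in for the full conditional-GAN training; the architecture, loss, threshold value and synthetic tensors are assumptions made for the example:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the CNN 4; its weights are randomly initialised (Step 402).
cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))
optimiser = torch.optim.Adam(cnn.parameters(), lr=1e-4)
threshold = 0.01                                      # confidence threshold for this task

# Random tensors stand in for a batch of paired training images A and B (Steps 404-406).
training_pairs = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)) for _ in range(100)]

for image_a, image_b in training_pairs:
    output_a = cnn(image_a)                           # Steps 408-410: manipulate A into A'
    error = nn.functional.l1_loss(output_a, image_b)  # Steps 412-414: compare A' with B
    if error.item() < threshold:                      # training for this task is complete
        torch.save(cnn.state_dict(), "model_task.pt") # Step 420: store the model 54
        break
    optimiser.zero_grad()
    error.backward()                                  # Step 416: adjust weights to reduce the error
    optimiser.step()
```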
  • An example batch of training images is schematically shown in Figure 9.
  • the batch of training images relates to imperfection removal on t-shirts.
  • Images A and B in the first pair of training images respectively show a crease 76 on the bottom left of a round-neck style t-shirt 78 and the same t-shirt 78 without the crease 76.
  • Images A and B in the second pair of training images respectively show another t-shirt 78 with a larger crease 76 on the top right of the t-shirt 78 and the same t-shirt 78 without the crease 76.
  • the third pair of training images, A and B, respectively shows a crease 76 on a V-neck t-shirt 79 and the same V-neck t-shirt 79 without the crease 76.
  • Figure 10 schematically illustrates the generation module 52 of Figure 5 in greater detail.
  • the generation module 52 has an instruction processor 80 for receiving input images to be processed and outputting processed images.
  • Input image C 16 and the user's instructions 18 (relating to which image processing tasks the user would like to be carried out on input image C 16) are input to the instruction processor 80 from the input GUI 14 of the user system 8 and processed images are sent back to the output GUI 24 of the user system 8.
  • the instruction processor 80 separates the instructions 18 from the input image 16 to be processed and sends the image processing tasks to a task selector 82 to which it is connected.
  • the task selector 82 itself is connected to a model retrieval engine 84 and can specify which task model 54 is to be retrieved from the data store 56.
  • the model retrieval engine 84 functions to retrieve the appropriate model 54 from the data store 56 and load it into a task model 86 to which it is connected.
  • the task model 86 is a CNN 4 which is configured by the model 54 and is also connected to the instruction processor 80 such that images to be manipulated (processed) can be sent from the instruction processor 80 to the task model 86 and the output manipulated (processed) images generated can be sent from the task model 86 to the instruction processor 80.
  • the output image 20 may be sent from the instruction processor 80 to the output GUI 24 of the user system 8.
  • the generation module 52 contains the CNNs responsible for the generation of new images but its functionality can be executed also in the design module 50.
  • Figure 11 is a flow chart that illustrates an example of how the image manipulation process may be carried out using the generation module 52 of Figure 10.
  • the process starts by the instruction processor 80 of Figure 10 receiving input image C 16 and its associated instructions 18 at Step 502.
  • the image processing tasks, specified in the instructions 18 are then determined, at Step 504, at the instruction processor 80 of Figure 10 and a first task is selected, at Step 506, as the current task by the task selector 82.
  • the model 54 relating to the current task is then retrieved, at Step 508, from the data store 56 by the model retrieval engine 84 and sent to the task model 86.
  • Image C 16 is also then sent to the task model 86 from the instruction processor 80 for processing.
  • the task model 86 is configured with the model 54 relating to the first task and the configured model is applied, at Step 510, to image C 16 and thereby generates, at Step 512, an intermediate output image D' 88, which is sent to the instruction processor 80.
  • the amount of retouching of each input image 16 as detailed in the instructions 18 may be implemented by applying a normalised weighting to the model 54. For example, if the 'correct colour' interactive slider of Figure 3 is set to 50%, the colour correction model can be set at a weighting of 50%. Similarly, if the enhance light slider control of Figure 3 is set at 100%, the light enhancement model can be set at a full weighting (in the normalised scale of 1). The settings can apply to a single or a batch of images.
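  • One plausible way to realise such a normalised weighting (a hypothetical sketch; the disclosure does not state how the weighting is applied internally) is to blend the model output with the unmodified input in proportion to the slider value:

```python
import numpy as np
from typing import Callable

def apply_with_weight(model: Callable[[np.ndarray], np.ndarray],
                      image: np.ndarray, amount: float) -> np.ndarray:
    """Apply a retouching model at a fractional strength.

    amount is the slider value normalised to [0, 1]; 0.5 corresponds to the 50%
    'correct colour' setting and 1.0 to the fully weighted 'enhance light' setting.
    """
    full_effect = model(image)                            # fully retouched prediction
    return (1.0 - amount) * image + amount * full_effect  # linear blend towards the target
```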
  • the system determines, at Step 514, whether any tasks requested in the instructions 18 have not been carried out. If there are any tasks that have not been carried out, the next task is selected from the instructions 18 as the new current task at the task selector 82 and the method of implementing the image processing returns to retrieving, at Step 508, the model 54 for the current task.
  • the image processing model 54 relating to this new task is then retrieved from the model retrieval engine 84, at Step 508, and the model 54 is applied, at Step 510, to intermediate output image D' 88. This process continues until all tasks requested in the instructions 18 have been carried out.
  • the intermediate output image D' 88 becomes, at Step 516, a final output image D 20 and is sent, at Step 518, to the user, for example to the output GUI 24 of the user system 8.
  • the intermediate output images 88 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
  • the same generation module 52 can be reused for processing all the tasks. For example, for each task, the generation module 52 may be loaded with a model 54 related to the current image processing task and the values of the previous model provided to the task model CNN may be overwritten by new ones.
  • the process schematically illustrated in Figure 11 may alternatively be carried out using a plurality of different generation modules 52, each generation module 52 operating on a different task.
  • This enables image processing tasks for different images to be processed in parallel for improved efficiency, with multiple generation modules 52 running at the same time and processing different input images 16 concurrently. All of the features of this embodiment are the same as previously described embodiments except for the differences which are set out below.
  • Figure 12 is a schematic diagram illustrating an example of how this further embodiment is implemented.
  • two tasks are carried out sequentially on an input image C 16, using two generation modules 52, 90.
  • Task 1 is a colour correction image manipulation task and Task 2 is a light enhancement image manipulation task.
  • Input image C 16 and the instructions 18 associated with Task 1 are supplied to the first generation module 52, for example via the input GUI 14.
  • the first generation module 52 retrieves the model 54 related to Task 1 from the data store 56.
  • the first generation module 52 then generates an intermediate output image D' 88 and provides this to the second generation module 90.
  • the second generation module 90 receives instructions 18 related to Task 2 from the input GUI 14 and retrieves the model 54 related to Task 2 from the data store 56.
  • the second generation module 90 carries out Task 2 on the intermediate output image D' 88 to generate output image D 20.
  • the multiple generation modules 52, 90 may have the same structure but be different software instances (running on a single computer) that contain different parameters depending on the task to execute.
  • the generation modules 52, 90 can be pre-stored and available to be implemented as required. Their implementation can be triggered to operate on the current manipulated image based on the instructions being processed.
  • all of the instructions 18 may be sent to each of the generation modules 52, 90, and the differentiation of which task is carried out at which generation module 52, 90 may be determined at the specific generation module 52, 90.
  • the embodiment illustrated in Figure 12 may also advantageously enable instruction pipelining to take place to improve throughput and efficiency when multiple images are being processed.
  • as an example, the system comprises three generation modules and the user would like to carry out three tasks on three input images 16: Image 1C, Image 2C and Image 3C.
  • An input Image 1C may be input to the first generation module to implement Task 1 on Image 1C.
  • Image 1C (processed by the first generation module) will then be passed to the second generation module to implement Task 2 on Image 1C.
  • Image 2C may simultaneously be input to the first generation module to implement Task 1 on Image 2C.
  • once Image 1C has been processed by the second generation module, it is passed on to the third generation module to implement Task 3 on Image 1C; simultaneously, Image 2C is passed on to the second generation module to implement Task 2 on Image 2C.
  • Image 3C is input to the first generation module to implement Task 1 on Image 3C.
  • once Image 2C has been processed by the second generation module, it is sent to the third generation module to implement Task 3 on Image 2C; simultaneously, Image 3C is passed to the second generation module to implement Task 2 on Image 3C.
  • once Image 3C has been processed by the second generation module, it is passed to the third generation module to implement Task 3 on Image 3C. This process is illustrated in the table below:

    Time frame    Module 1 (Task 1)    Module 2 (Task 2)    Module 3 (Task 3)
    1             Image 1C             -                    -
    2             Image 2C             Image 1C             -
    3             Image 3C             Image 2C             Image 1C
    4             -                    Image 3C             Image 2C
    5             -                    -                    Image 3C
  • the further embodiment therefore enables tasks to be carried out on a plurality of images faster than if only a single task could be carried out on a single image at any one time (sequentially).
  • three images with three tasks can be processed in five equal time frames rather than nine (if carried out sequentially).
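  • the pipelined schedule described above can be sketched as follows. This is an illustrative outline only: it assumes the generation modules are independent workers, with module k always performing Task k, and simply reports which image each module handles in each time frame.

```python
def pipeline_schedule(num_images: int, num_tasks: int):
    """Yield, for each time frame, which image each generation module handles.

    Image i enters module 1 at time frame i, so three images and three
    tasks complete in five time frames instead of nine.
    """
    total_frames = num_images + num_tasks - 1
    for frame in range(total_frames):
        assignments = {}
        for module in range(num_tasks):
            image = frame - module
            if 0 <= image < num_images:
                assignments[f"module {module + 1}"] = f"Image {image + 1}C"
        yield frame + 1, assignments

for frame, work in pipeline_schedule(num_images=3, num_tasks=3):
    print(frame, work)   # reproduces the schedule in the table above
```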
  • Figures 13 and 14 illustrate a first multi-dimensional embodiment of the present invention, which enables a greater range of image processing capabilities to be realised by the system.
  • the first multi-dimensional embodiment of Figures 13 and 14 enables more sophisticated image processing to be carried out by applying 3D image processing techniques, which are far more capable than 2D image processing techniques, to a 2D input image.
  • the design module 50 of the AIRS 2 is trained to learn in terms of spatial properties and receives a 3D dataset 100, as well as the training data set 12 described in relation to previous embodiments, from the administrator system 6.
  • the 3D dataset 100 comprises sequences of images showing a scene where each image is taken from a different angle to teach the system the geometrical properties of objects, their material characterisation and how light illuminates the scene, enabling the AIRS 2 to learn the fundamental properties of the 3D world and render 3D scenes (or 3D representations) 101 from 2D images.
  • the design module 50 in Figure 13 is also connected to the data store 102 which is configured to store the 2D models 54 relating to specific tasks (described previously) and 3D representations of the models 104 which are generated from the 3D data set 100.
  • This data store 102 is connected to the generation module 52 to provide both of the 2D models 54 and the 3D representations of the 2D models 104 to the generation module 52.
  • a 2D input image C 16 may be retrieved from a local data store 17 of the user system 8 and manipulated at the input GUI 14 to specify 2D image modifications.
  • the user system 8 also has a 3D user interface 105, where the user can specify 3D image modifications, namely to generate 3D image manipulation instructions 106.
  • the input GUI 14, incorporating a 3D user interface 105 as implemented on the user system 8, is connected to the generation module 52 and sends not only the input image 16 but also the 2D and 3D instructions 18, 106 to the generation module 52.
  • the output image 108 generated by the generation module 52 is in the form of a 3D scene 101 or a 2D image.
  • the AIRS 2 in this embodiment generalises the process of scene understanding.
  • the output image 108 may differ not only from a camera point of view (an image manipulation only possible in the 3D domain), but also in terms of lighting, position of the assets or other characteristics, exponentially increasing the degrees of freedom of the image retouching process.
  • any input 2D image can be converted into a 3D representation to allow further 3D image manipulation tasks to be carried out on it as well as the 2D image manipulation tasks.
  • the type of image manipulation task to be carried out determines which 3D representation is selected in order to predict what the resultant processed 3D representation (output image) will be.
  • the "3D representation" 104 (3D representation of a 2D model) can be considered to be a collection of data representing what the system has learned from the training process, in terms of 3D properties.
  • the 3D representation can be applied or not for processing (retouching in an embodiment) the input images.
  • an input image can be retouched just by using pixel-related properties namely in the 2D domain. If the 3D representation is applied to the 2D input image, it generates a "prediction" of what the 3D scene could be. For example, taking an input image which is a simple image of a cube over a plane; in the 2D domain the input image may just show two objects (the cube and the plane) with possibly some shadows and lighting effects.
  • the system is trained with many images containing those objects. Once the system is trained, it applies its knowledge (the 3D representations) to generate a 3D scene comprising a cube and a plane. How accurately this 3D scene relates to the 2D input image depends on the training process (how well the system has learned). The 3D scene may or may not be editable, and may or may not be visible to the user, but it is always calculated as long as the system has been enabled to 'predict' the 3D representation of that 2D input image.
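  • one way to picture this 'prediction' of a 3D scene from a single 2D image is as a learned encoder that maps the image to a latent scene representation, paired with a renderer that can be queried from any camera pose, in the spirit of the neural scene representation and rendering work of Reference [3]. The class below is a structural sketch only; the encoder and renderer are assumed to be networks trained on the 3D data set 100, and none of the names correspond to an actual API of the system.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class ScenePredictorSketch:
    # Assumed trained components: the encoder distils a 2D image into a
    # latent vector capturing geometry, materials and lighting; the
    # renderer produces an image of that latent scene from a camera pose.
    encoder: Callable[[np.ndarray], np.ndarray]
    renderer: Callable[[np.ndarray, np.ndarray], np.ndarray]

    def predict_scene(self, image_2d: np.ndarray) -> np.ndarray:
        """Infer the latent 3D scene (the 'prediction') from one 2D image."""
        return self.encoder(image_2d)

    def render(self, latent_scene: np.ndarray, camera_pose: np.ndarray) -> np.ndarray:
        """Render the inferred scene from an arbitrary viewpoint."""
        return self.renderer(latent_scene, camera_pose)
```

  Whether or not the inferred scene is exposed to the user, the accuracy of the cube-and-plane example above depends entirely on how well the encoder has been trained on sequences of views of such objects.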
  • Figure 14 is a flow chart illustrating an example method of how the first multi-dimensional embodiment, referred to when describing Figure 13, may be implemented.
  • the method commences with the 3D data set 100 being sent, at Step 602, to the design module 50 by the administrator module 6.
  • the design module 50 is able to gain its own understanding of the fundamental properties of the 3D world by means of multiple sequences of synthetic or real images, provided by the 3D data set 100, from which the design module 50 learns how to represent spatial concepts.
  • the sequences of images in the 3D data set 100 may show camera movements in different contexts, namely views of a scene from different angles and/or views of a scene with different lighting.
  • the design module may learn the 3D properties contained in the 3D data set 100 in a way analogous to how photogrammetry-based techniques generate a 3D mesh of an object from a sequence of images taken around the object.
  • the learning process is based on neural networks that learn the probability distribution of the 3D properties, given the 3D dataset 100.
  • the AIRS 2 may therefore learn in a completely unsupervised way.
  • the design module 50 may acquire a representation of the 3D world in terms of parameters such as perspective, geometry, lighting and/or occlusion.
  • This representation of the 3D world may be optimised to fit in its own representational space that is not manageable by traditional 3D tools.
  • Such a representation is the latent space of vectors (a mathematical representation) that contains the knowledge acquired by the system in a completely unassisted manner. Having learned the concept of spatiality and other 3D properties, the system is able to render 3D images from 2D images and generate 3D representations of the 2D models 104. Neural scene representation and rendering is discussed further in Reference [3].
  • the training data set 12 is also sent, at Step 602, to the design module 50, which learns the pattern between unretouched and retouched images and calculates 2D models 54 associated with each task, as discussed in relation to previous embodiments. If the training data set 12 comprises a mix of retouched and unretouched images in accordance with the embodiment described above, the training data set 12 and 3D data set 100 may be combined to form a single data set which the system uses to learn the concept of retouching and 3D properties.
  • 3D representations of the 2D models 104 associated with each task are determined at the design module 50, using the 3D data set 100, and stored, at Step 604, in the data store 102. For example, if a 2D image manipulation task model adds rain to an image, a 3D representation of the 2D model will add the rain with the concept of depth applied.
  • the AIRS 2 learns from the training data set 12 and 3D data set 100 to retouch images in such a way that every automated retouching process is consistent with the training set, but this process is executed with much more accuracy and with higher degrees of freedom than in the prior art. This is because the automated retouching process happens not only at pixel level (within the 2D domain), but inside the 3D world (3D domain) artificially generated by the design module through use of the 3D data set 100.
  • Input image C 16 may be selected, at Step 606, by the user and pulled into the input GUI 14 where it may be viewed, at Step 608, as a 2D image.
  • the user may request 2D tasks to be carried out, at Step 610, in the form of 2D instructions 18.
  • the input image may also be pulled, at Step 608, into the 3D user interface 105, where a user may specify, at Step 610, 3D image manipulation tasks be carried out, in the form of 3D instructions 106.
  • the user may actuate, at Step 612, the 'Run' button on the input GUI 14 and/or the 3D user interface 105 to trigger sending, at Step 614, of the input image 16 and the 2D and 3D instructions 18, 106 to the AIRS 2, and for the AIRS 2 to start the automatic image manipulation process.
  • 3D tasks include rotating objects, adding lighting that takes into account where shadows should be, moving objects and moving the position of the camera.
  • the generation module 52 processes, at Step 616, the instructions and generates, at Step 618, a 3D scene 100 with the image processing tasks requested in the 2D and 3D instructions 18, 106 applied.
  • the tasks are applied using the process outlined in Figure 11, with the additional step of using the 3D model representations 104 in place of the 2D models 54 for any 3D tasks to be implemented.
  • the image generated may be conditioned by the training data set 12 to be consistent with the retouched images contained in the training set 12.
  • the output image D' 100 generated by this process is a 'projection' or 'prediction' of the input image in the 3D domain and is also referred to herein as a 3D scene.
  • the 3D scene 100 generated may be the final output of the retouching process, providing a new augmented representation of the subject contained in the original input image.
  • the 3D scene 100 may be optionally converted to a 2D output image.
  • the 3D scene 100 may be sent, at Step 620, to and displayed in the output GUI 24 and the user may decide whether this image is acceptable, at Step 622, or whether further image modifications should be applied. If further image modifications are required, the method returns to the user specifying, at Step 610, image modifications at the input/output GUI 14, 24 and the process is continued until the image is deemed acceptable, at Step 622. Once the image is acceptable, it is possible to download the image, at Step 624.
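  • the Figure 14 flow can be summarised as a small orchestration loop: the 2D models 54 are applied for the 2D instructions, the 3D model representations 104 for the 3D instructions, and the result is returned to the user until it is accepted. The outline below is an assumption-laden sketch; every callable it receives is a placeholder for the corresponding AIRS component rather than a real interface.

```python
def retouch_with_3d(image_2d, instructions_2d, instructions_3d,
                    apply_2d_model, apply_3d_representation,
                    predict_3d_scene, render_to_2d,
                    user_accepts, revise_instructions,
                    want_2d_output=True):
    """Sketch of the first multi-dimensional embodiment (Figure 14)."""
    while True:
        working = image_2d
        for task, weight in instructions_2d.items():            # 2D domain
            working = apply_2d_model(task, weight, working)
        scene = predict_3d_scene(working)                       # inferred 3D scene
        for task, params in instructions_3d.items():            # 3D domain
            scene = apply_3d_representation(task, params, scene)
        output = render_to_2d(scene) if want_2d_output else scene
        if user_accepts(output):                                # Step 622
            return output                                       # Step 624
        # Step 610: the user re-specifies the modifications to be applied
        instructions_2d, instructions_3d = revise_instructions(output)
```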
  • Figures 15 and 16 illustrate a second multi-dimensional embodiment of the present invention, which also enables a greater range of image processing capabilities to be realised by the system.
  • Various aspects of the scene may be adjusted, such as lighting, the position of objects identified in the scene and the position of the camera.
  • Figure 15 includes the 3D data set 100 and training set 12, which are fed to the design module 50.
  • the 2D models 54 associated with each task and 3D representations of the 2D models 104 are also stored in the data store 102.
  • the input image 16 in Figure 15 is provided directly from the input GUI 14, rather than from the input GUI 14 and the 3D user interface 105 as is the case in the first multi-dimensional embodiment.
  • the user selects 2D tasks to be carried out using the input GUI 14 and the corresponding 2D instructions 18 are sent to the generation module 52, along with the 2D image 16.
  • Figure 15 further comprises a 3D tool 112 (which may or may not be part of the AIRS 2), which receives a 3D scene 101 created by the generation module 52 using the 3D data set 100, with the 2D tasks requested by the user applied.
  • the 3D scene 101 may be manipulated further at the 3D tool 112 using 3D modelling tools, following the user's instructions retrieved from a data store 114 or submitted in real time to the 3D tool 112.
  • the fully manipulated 3D scene 101 may be output as a 3D image or converted to a 2D image 110 and output.
  • Figure 16 is a flow chart illustrating an example of how the second multi-dimensional embodiment, referred to when describing Figure 15, may be implemented.
  • the 3D data set 100 and training data set 12 are sent, at Step 702, to the design module 50 by the administrator system 6.
  • 2D models 54 associated with each task are created in the design module 50 using the same process described in relation to previous embodiments.
  • the 2D models 54 associated with each task and 3D representations of the 2D models 104 are stored, at Step 704, in the data store 102.
  • a user selects, at Step 706, an input image C 16 to modify which is uploaded and displayed, at Step 708, at the input GUI 14.
  • the user then specifies, at Step 710, 2D image modifications to be carried out and when indicated, at Step 712, the associated instructions 18 are sent, at Step 714, to the generation module 52, along with the 2D image 16.
  • the generation module 52 carries out, at Step 716, the 2D image processing tasks requested in the user's instructions 18 by applying the 2D models 54 from the data store 102 to the input image C 16 using the process described in relation to Figures 10 and 11.
  • the manipulated 2D image is then converted into a 3D image using the 3D data set 100, resulting in a 3D scene 101.
  • This 3D scene 101 is delivered, at Step 718, to the 3D tool 112.
  • the user may for example request, at Step 720, modifications to the 3D characteristics of the image, geometrical properties of the scene and lighting in the form of 3D instructions 106.
  • These 3D instructions 106 may be predetermined and stored in a 3D instruction data store 114 or requested by the user in real time, for example using a 3D user interface 105.
  • the 3D characteristics of the image to be modified may be selected by manipulating the inferred 3D scene 101.
  • the instructions 106 are then processed, at Step 722, by the 3D tool 112, which outputs, at Step 724, either the 3D output image 108 or a 2D representation of the output image 110.
  • if the output image 108, 110 is determined, at Step 726, to be acceptable, then the image is saved and made available for download as a processed image. Alternatively, if the output image is not acceptable at Step 726, the method returns the image to the 3D tool 112 and the user can re-specify, at Step 720, the 3D modifications to be carried out. This process continues until an acceptable output of the 3D tool 112 is achieved.
  • Figure 17 is an example of how the generation module 52 may predict a 3D volumetric scene 101 of a 2D input image C 16, where the input image 16 shows a living room in the daytime.
  • a generated 3D scene 101 is produced using the 3D data set 100.
  • the user may manipulate the volumetric 3D scene 101 to generate a new image or this may be automated.
  • the input image shown in Figure 17 has daylight illumination and a set of items of furniture placed in specific locations.
  • the 3D scene 101 may be manipulated to modify some fundamental properties of the 3D scene 101.
  • the lamp 116 on the table to the left of the room and the light on the ceiling are both turned on.
  • the table 118 in the centre of the room and the waste paper bin 120 to the right of the left chair 122 have been removed.
  • the lighting outside has also changed, representing a night-time scene.
  • the 3D scene 101 in this example is rendered again to generate a 2D output image 110 with the modifications made to the 3D scene 101 applied.
  • the representation of a scene from an image allows more sophisticated manipulation. For example, since the AIRS 2 learns how to represent spatial concepts, if the user requests an object to be removed, such as the waste paper bin 120 in Figure 17, the AIRS 2 is able to predict what the chair 122 is likely to look like behind the bin and hence generate a realistic image of the living room without the bin.
  • the user may not separately request 3D image manipulations in the form of 3D instructions via the 3D user interface 105 or by manipulating the 3D scene at the 3D tool 112. Instead, the user may specify which tasks they would like to be carried out using the control sliders 36 at the input GUI 14, and the system applies the 3D representations of the 2D models 104 corresponding to the tasks requested (created using the 3D data set and stored in the data store 102) to the 2D input image, to create the 3D scene 101.
  • the 3D scene 101 with the tasks carried out may be rendered into a 2D image and output to the output GUI 24.
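  • the Figure 17 example can be pictured as a set of edits on an inferred scene description followed by a re-render. The plain dictionary below is purely illustrative: the real system holds the scene in its learned latent space rather than as an explicit scene graph, and the item names are taken from the figure only for readability.

```python
# Hypothetical, simplified scene inferred from the living-room image 16.
scene = {
    "objects": {"sofa", "left chair 122", "table 118", "waste paper bin 120", "lamp 116"},
    "lights": {"ceiling light": "off", "lamp 116": "off", "window": "daylight"},
}

def edit_scene(scene):
    """Apply the Figure 17 manipulations to the inferred scene."""
    scene["objects"] -= {"table 118", "waste paper bin 120"}   # remove items
    scene["lights"]["lamp 116"] = "on"                         # turn the lamp on
    scene["lights"]["ceiling light"] = "on"
    scene["lights"]["window"] = "night"                        # night-time exterior
    return scene

# A learned renderer (not shown) would then produce the modified
# 2D output image 110 from the edited scene.
```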
  • a stream of input images 130 may be processed in order to generate an animated 3D scene 132 that may be rendered from any camera point of view on a display 134 of any aspect, from flat displays to head-mounted displays.
  • the coherence between frames is learned from the design module 50 during the training process.
  • the selection of characteristics of the image to be modified can be requested through natural language in a natural language input GUI 140 where the query describes how the image may be retouched.
  • the natural language input GUI 140 has within it areas for a manipulation description 142 which takes the place of the task descriptors 37 and slider controls 36 in the input GUI 14 shown in Figure 3.
  • the user may select an input image to be manipulated.
  • the user selects an input image 16 from the 'browse' icon 144. The selected image is then shown in the viewing window 146.
  • the criteria of manipulation are stipulated through a description made in natural language that may be requested by means of a text input field 148 and previewed in the manipulation description 142.
  • the image 16 may then be sent to the AIRS, along with the instructions 150 detailing which image manipulations are to be carried out, specified by the manipulation description in natural language.
  • Natural language can be part of the training process, so the system is able to learn from previous descriptions how to generate new images that contain the description inserted by the user.
  • the system may be able to collect enough knowledge about a concept through natural language in such a way that it can create images containing the concept, even if the training data 12 does not contain any images of that concept visually represented.
  • the concept of a "river" can be visually generated by the system by learning how water is represented in nature and how a river is verbally described, without any previous knowledge of how a river is visually represented.
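  • how a free-text manipulation description might be converted into the task weights used elsewhere in the system can be sketched as below. A trained language model would learn this mapping as part of the training process; the keyword table here is only a hypothetical stand-in used to show the general shape of the instructions 150 sent to the AIRS.

```python
import re

# Hypothetical phrase-to-task table; a trained language model would
# replace this lookup in a real embodiment.
PHRASE_TO_TASK = {
    r"brighten|more light|enhance light": ("enhance_light", 1.0),
    r"(fix|correct).*(colour|color)": ("correct_colour", 1.0),
    r"remove.*background": ("remove_background", 1.0),
    r"sharpen": ("enhance_sharpness", 0.5),
}

def description_to_instructions(description: str) -> dict:
    """Convert a natural-language retouch request into task weights."""
    instructions = {}
    for pattern, (task, weight) in PHRASE_TO_TASK.items():
        if re.search(pattern, description.lower()):
            instructions[task] = weight
    return instructions

print(description_to_instructions("Please correct the colours and brighten the room"))
# e.g. {'enhance_light': 1.0, 'correct_colour': 1.0}
```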
  • the images generated by the system may be photorealistic and may pass a Turing Test that discerns between real and Al-generated images.
  • a Turing Test can be qualitatively run by human testers, such as the perceptual studies on Amazon Mechanical Turk (AMT) or processed by an automatic system.
  • two automatic approaches are taken into account, without excluding any other method that may be considered equivalent.
  • the generated image is processed by a classifier able to recognise the items contained in the original scene. Under the assumption that the retouch process is not heavily modifying the content, it is expected that the classifier may be able to recognise the same items in the same positions in the retouched scene as well.
  • the Inception Score of the retouched image (which helps to identify similar types of items) should be equal, or very close, to that of the original.
  • the image may be the input for a discriminator network of a Generative Adversarial Network (GAN) trained (described previously) with the dataset used for training the system.
  • the discriminator should not be able to distinguish between real images and generated ones. In terms of metrics, the Fréchet Inception Distance (equivalent to an error measure) should be minimised.
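  • a minimal version of the first automatic check described above, assuming a pre-trained object detector is available as a callable, might look like the following. The Inception Score and Fréchet Inception Distance computations themselves are omitted; only the consistency check on the recognised items is sketched, and the 0.9 tolerance is an arbitrary illustrative choice.

```python
def retouch_preserves_content(original, retouched, detect_items) -> bool:
    """Check that the same items are recognised, in the same places.

    `detect_items(image)` is assumed to return a set of
    (label, rounded_bounding_box) pairs from a pre-trained detector.
    A retouch that only adjusts colour, light and similar properties
    should leave this set unchanged, or nearly so.
    """
    before = detect_items(original)
    after = detect_items(retouched)
    agreement = len(before & after) / max(len(before), 1)
    return agreement >= 0.9   # tolerance for minor localisation jitter
```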

Abstract

A computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters is described. The method comprises: receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the input image have been completed. The present method can also be used to translate two-dimensional images into the three-dimensional domain for enhanced image processing.

Description

AUTOMATED IMAGE MANIPULATION USING ARTIFICIAL INTELLIGENCE
FIELD OF THE INVENTION
The invention disclosed herein relates to an automated image manipulation system that uses artificial intelligence.
BACKGROUND OF THE INVENTION
In the worlds of internet commerce, fashion and advertisement in particular, there is a huge desire for digital images to be modified or retouched. This may include removing blemishes, making creased clothes appear ironed or changing the colour of items that are available in multiple colours, for example. Additionally, images may be taken with the wrong lighting or with assets missing or placed in the wrong positions. This is particularly the case where multiple images of items, such as goods, are to be displayed on a screen (such as a webpage) and there is a requirement for the illumination of each item to be consistent and uniform. Accordingly, there is a desire to adjust the lighting in the image after it has been taken and possibly reposition assets within the image to ensure consistency between images. Furthermore, there is a high demand for large volumes of digital images to be modified quickly, consistently and at a low cost.
Currently, digital images are manipulated manually or with a very limited set of automatic tools, with individuals using photo editing software to amend images one by one to remove artefacts and to ensure consistency. Every task related to the modification of the scene representation currently requires the work of highly-skilled artists in order to guarantee photorealism and consistency of edits with the rest of the image. This is a slow process and requires employing workers to process the images, which can be costly. There is often a need to process large quantities of similar images. If these images are modified manually, it can lead to noticeable inconsistencies. The time spent to manually modify the image can also exponentially increase depending on the complexity of the task.
Existing processes may involve carrying out a single image manipulation task or synthesising an output image that is supposedly indistinguishable from reality. For example, the document 'Image-to-Image Translation with Conditional Adversarial Networks', published by the Berkeley AI Research Laboratory, UC Berkeley [1] (Ref 1), investigates using conditional adversarial networks as a general-purpose solution to image-to-image translation problems. The conditional adversarial networks supposedly learn the mapping from input image to output image and a loss function to train this mapping. Such a process is used to synthesise photos from label maps, reconstruct objects from edge maps and colourise images.
However, the existing processes, such as the document referred to in Ref 1, tend to focus on one or two aspects of digital image manipulation and hence can only partially correct an image. For example, existing artificial intelligence systems may modify specific elements of the images, such as colouring, transferring or styling, but they remain bespoke systems because they are unable to generalise this learning to enable a wider range of image processing tasks or multiple iterative image processing tasks to be carried out.
Accordingly, these types of processes do not have the ability to provide a complete image retouching system that is able to automatically enhance multiple aspects of images without human control.
Existing retouching systems are unable to provide sophisticated image manipulations such as removing objects from scenes, altering camera angles and adding lighting from lighting sources within the scene.
It is desired to address at least one of the above-mentioned problems.
SUMMARY
According to one aspect of the present invention there is provided a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising: receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image- manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image as one of the plurality of processed images once all of the user- selected image-manipulation tasks for the input image have been completed.
Other embodiments enable a pre-determined set of image manipulation tasks to be carried out on a batch of images automatically. Large volumes of images can be processed quickly, consistently and at low cost.
The step of processing each task of the set of user-selected image-manipulation tasks may further comprise: determining if all of the tasks in the set of user-selected image-manipulation tasks have been completed, and if not then: assigning the output image as a new input image; selecting any further user- selected image-manipulation task of the set of user-selected image-manipulation tasks which has not been completed; and repeating the above steps of retrieving, loading and processing until all of the user- selected image-manipulation tasks have been processed for the input image.
The image-manipulation method may further comprising determining if all of the plurality of unprocessed images have been processed and if not then: assigning another one of the unprocessed images as the input image; and repeating the steps of: processing each task of the set of user-selected image- manipulation tasks on the input image until all of the user-selected image-manipulation tasks have been processed for the input image and then storing the output image.
At least one of the user-defined tasks may in some embodiments comprise a corresponding plurality of image-manipulation sub-tasks and the retrieving step may comprise retrieving an image-manipulation model for executing the plurality of corresponding image-manipulation sub-tasks.
The step of processing the input image may comprise processing a plurality of different input images in parallel with each image having a different image-manipulation task being applied to it. This advantageously allows a technique called pipelining which effectively increases image processing throughput.
The image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image- manipulation model once created, wherein the creating step for each image-manipulation task may comprise applying a plurality of training data sets in a machine-learning process to a model-creation processor having a machine-learning computer architecture.
The training data sets may comprise a set of unprocessed training images which have not been processed to enhance a specific image manipulation task and a set of processed training images which have been processed to enhance the specific image manipulation task; the set of processed images being unpaired with the set of unprocessed images, and the applying step may further comprise the machine-learning process determining characteristics of the set of unprocessed training images and the characteristics of the set of processed training images and using these characteristics to create the image-manipulation model.
The set of unprocessed training images and the set of processed training images may be provided in a mixed set of training images and are not classified as processed or unprocessed training images prior to the applying step. This advantageously minimises the burden on the user to configure the method prior to its execution.
The creating step may comprise: initialising operational parameters of the model creation processor; applying an error image to the model creation processor and generating a training output image; comparing the training output image with a correct image corresponding to the error image; calculating an error between the training output image and its corresponding correct image; and adjusting the operational parameters of the model-creation processor to minimise the error for the next application of an error image.
The creating step may comprise creating and storing an intermediate version of the image-manipulation model, the intermediate version being created whilst the error is greater than a predetermined threshold. The intermediate versions of the image-manipulation model can advantageously be provided to a user to show an intermediate stage of a large batch processing process.
The creating step may comprise repeating the applying, comparing, calculating steps until the error is less than a predetermined threshold.
The storing step may comprise storing the operational parameters of the model creation processor as the image-manipulation model or the intermediate version of the image-manipulation model.
The processing step may further comprise using the intermediate versions of the image-manipulation models to process an unprocessed input image and to generate and output an intermediate version of a partially processed image.
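The creating steps above amount to a standard iterative training loop. As a deliberately tiny stand-in for the CNN- or GAN-based models actually contemplated, the sketch below learns a per-channel gain and bias that maps 'error' (unretouched) images onto their 'correct' (retouched) counterparts by gradient descent on a mean squared error, saving intermediate versions of the parameters while the error remains above the threshold. It illustrates the shape of the loop only, under those stated assumptions.

```python
import numpy as np

def create_colour_model(error_images, correct_images,
                        lr=0.1, threshold=1e-3, max_epochs=500):
    """Toy model-creation loop: initialise parameters, generate a training
    output image, compare it with the correct image, compute the error and
    adjust the parameters until the error falls below the threshold."""
    gain = np.ones(3)                 # initialise operational parameters
    bias = np.zeros(3)
    intermediates = []                # intermediate versions of the model
    for _ in range(max_epochs):
        error_total, count = 0.0, 0
        for x, y in zip(error_images, correct_images):
            pred = x * gain + bias                    # training output image
            diff = pred - y                           # compare with correct image
            error_total += float(np.mean(diff ** 2))
            count += 1
            # adjust parameters to reduce the error (gradient of the MSE)
            gain -= lr * np.mean(2 * diff * x, axis=(0, 1))
            bias -= lr * np.mean(2 * diff, axis=(0, 1))
        mean_error = error_total / count
        if mean_error > threshold:
            intermediates.append({"gain": gain.copy(), "bias": bias.copy()})
        else:
            break                                     # error below threshold
    return {"gain": gain, "bias": bias}, intermediates
```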
The image-manipulation method may further comprise standardising each of the plurality of training data sets of images to comply with a predetermined set of image criteria. The standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
The image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task includes use of a Generative Adversarial Network to create a probability distribution of a non-linear mapping function, the distribution representing the probability that an input image comes from a set of correct images.
The image-manipulation method may further comprise standardising each of the plurality of unprocessed images to comply with a predetermined set of image criteria. The standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
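An illustrative standardising pass, assuming the Pillow imaging library is available, is shown below. The particular criteria (RGB, centre-cropped square, 1024x1024, PNG) are arbitrary example choices and not requirements of the method.

```python
from PIL import Image

def standardise(path_in: str, path_out: str, size=(1024, 1024)) -> None:
    """Convert format, centre-crop to a square and scale to a fixed size
    so that every image entering the system meets the same criteria."""
    img = Image.open(path_in).convert("RGB")               # format conversion
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))   # cropping
    img = img.resize(size)                                  # scaling
    img.save(path_out, format="PNG")
```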
The image-manipulation method may further comprise creating the plurality of user-selected image- manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user- selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user selected images and a set of user controls for setting the value of each one of a plurality of different image-manipulation tasks.
The creating step may comprise in some embodiments manipulating at least one of the user controls and viewing an approximated result of the application of the image-manipulation task on the image displayed within the window and subsequently sending the value of the at least one image-manipulation task to the processor.

The image-manipulation method may further comprise creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user-selected images and a field for expressing image-manipulation tasks to be applied to the image in a natural language describing the image-manipulation tasks.
In some exemplary embodiments of the present invention each input image may be a two-dimensional (2D) image and each image-manipulation model may be a 2D image-manipulation model, and the method may further comprise: generating a three-dimensional (3D) image-manipulation model representation of each 2D image-manipulation model from a 3D data set and storing each 3D model representation; wherein the processing step comprises using the 3D model representation to create a corresponding prediction of the 2D processed image in the 3D domain.
The using step may comprise: using the user-selected image-processing task to select the 3D image-manipulation model representation to be applied to the unprocessed 2D input image to generate the prediction of the 2D processed image in the 3D domain.
The method may further comprise receiving one or more user-selected 3D image-manipulation tasks and selecting the 3D image-manipulation model representation corresponding to the 3D image-manipulation task and applying the selected 3D model representation to the 2D input image to generate the prediction of the 2D processed image in the 3D domain.
In some embodiments the prediction of the 2D processed image in the 3D domain may comprise a 3D processed output image.
The method may further comprise providing each prediction of the 2D processed image in the 3D domain to a 3D image processing tool and further processing the prediction using 3D image-processing instructions of the 3D image processing tool to generate a 3D processed output image.
The step of generating a 3D model representation of each 2D image-manipulation model may comprise reading a 3D data set comprising a sequence of 2D images of a scene taken from different angles and generating, in each 3D model representation, a spatial parameter derived from the sequence of 2D images.
The processing step in some embodiments comprises removing a representation of a 3D object from the 2D image by identifying the object within the 2D input image using one or more 3D model representations and using 3D image-manipulation techniques to remove the object from the 3D image representation.
The method may further comprise converting the processed 3D image representation into a 2D processed image and storing the 2D processed image.
The method may further comprise determining an inverse function to a user-determined image-manipulation task, storing the same, and using the inverse function to undo any image processing task if required.
According to a second aspect of the present invention there is provided a computer-implemented automatic method of image-manipulation for modifying an unprocessed image having variable set of image parameters to a processed image having a target set of image parameters, the method comprising: receiving the unprocessed image and a set of user-selected image-manipulation tasks to manipulate the unprocessed image to create the processed image having the target set of image parameters; assigning the received unprocessed image to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks for the input image have been completed.
According to a third aspect of the present invention there is provided a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed two-dimensional (2D) images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising: receiving the plurality of unprocessed 2D images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to a 2D input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving a three-dimensional (3D) representation of an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved 3D representation of the model and applying the input image to the processor; processing the 2D input image to derive a 3D output image which has been processed within a 3D domain according to the current user-determined image-manipulation task; and storing the 3D output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the input image have been completed.
According to a fourth aspect of the present invention there is provided an automated image-manipulation system for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the system comprising: a receiver for receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; a processed image generator for processing each one of the received plurality of unprocessed images and the set of user-selected image-manipulation tasks, the processed image generator comprising: an instruction processor for sequentially processing each task of the set of user-selected image-manipulation tasks on a selected one of the plurality of unprocessed images; a model retrieval engine for retrieving image-manipulation models associated with each user-selected image-manipulation task; a task model processor, having a machine-learning computer architecture, arranged to be loaded sequentially with each of the retrieved image-manipulation models and the selected image and to process the selected image to derive an output image which has been processed sequentially according to the current user-determined image-manipulation tasks; and a data store for storing the output image as one of the plurality of processed images once all of the user- selected image-manipulation tasks for the selected image have been completed.
In an embodiment the automated image-manipulation system is a retouching system configured to correct both lighting conditions and irregularities in input images to output images having a common set of lighting conditions and the irregularities removed.
Some of the present embodiments can carry out sophisticated image manipulation such as object removal to reveal previously hidden portions of the scene, altering camera angles and adding lighting from lighting sources within the scene that generates appropriate shadows. This is carried out by use of extrapolation of the 2-dimensional image into the 3-dimensional domain before image manipulation is carried out. This involves learning the properties of 3-dimensional objects and rendering a 2-dimensional image into a 3-dimensional scene. Examples of such 3-dimensional image manipulations include removing objects, altering the camera angle to view the scene in the image at a different angle and adding lighting and appropriate shadows.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will now be described in detail with references to the accompanying drawings, in which:
Figure 1 is a schematic block diagram of an automated image manipulation system according to an embodiment of the present invention;
Figure 2 is a flow chart showing steps involved in an automated image manipulation process using the system of Figure 1 ;
Figure 3 is a schematic representation of an input user interface provided on the user system shown in Figure 1 ;
Figure 4 is a schematic representation of an output user interface provided on the user system shown in Figure 1 which can be used to specify further changes to be made to the output image;
Figure 5 is a schematic block diagram of the Artificial Intelligence based Retouching System (AIRS) of Figure 1 ;
Figure 6 is a flow chart detailing the steps involved in the operation of the AIRS of Figure 5;
Figure 7 is a schematic block diagram of the design module of the AIRS of Figure 5;
Figure 8 is a flow chart detailing the steps involved in the training the AIRS using the design module of Figure 7;
Figure 9 is a schematic diagram of exemplary training images used to train the AIRS system of Figure 5;
Figure 10 is a schematic block diagram of the generation module of Figure 7;
Figure 11 is a flow chart detailing steps involved in the operation of the generation module of Figure 7;
Figure 12 is a schematic block diagram illustrating how different tasks may be executed by the AIRS in a different embodiment of the present invention;
Figure 13 is a schematic block diagram illustrating a further embodiment of the present invention which utilises a 3D representation learned from the training dataset and a 3D user interface with the user input interface of Figure 1 ;
Figure 14 is a flow chart illustrating the further embodiment of Figure 13 operating to use the 3D representation;
Figure 15 is a schematic block diagram illustrating a yet further embodiment of the present invention which utilises a 3D representation learned from the 3D dataset and a 3D tool for manipulating the generated 3D scene before output;
Figure 16 is a flow chart illustrating the yet further embodiment of Figure 15 operating to use the 3D tool;
Figure 17 is a series of images illustrating an example of how a 3D scene may be created by AIRS by either of the embodiments of Figures 13 or 15 and manipulated in order to generate a modified version of the original image;
Figure 18 is a series of images illustrating an example of how a plurality of sequential input images are processed to generate a 3D scene by either of embodiments of Figures 13 or 15 that may be rendered from any camera point of view; and
Figure 19 is a schematic representation of a natural language user interface of a final embodiment of the present invention to be used with the AIRS system of Figure 5.
DETAILED DESCRIPTION
There is presented an image retouching system that uses artificial intelligence to automatically modify an image in a manner such that the output generated is a result of the enhancements and manipulations applied by manipulation criteria.
One non-limiting example application of the image retouching system is in the area of fashion image retouching. An original image, where the colours, lighting, shape of models and assets (apparel) contained in the image have not been modified (unprocessed), may be processed by the system in such a way that the retouching carried out is indistinguishable from retouching carried out by a human artist. The image may contain manipulations that are difficult to be achieved by human artists but are still consistent with the rest of the image. The claimed image retouching system may be applicable to the fields of both photographic image retouching and video retouching.
The system, as will be described in greater detail below, includes at its heart an Artificial Intelligence based Retouching System (AIRS) 2, alternatively called a "Deep Retouching" system, that is designed as a general solution for making the image retouching process automatic or at the very least partially human-assisted. In comparison, current image retouching methods require corrections to be carried out mainly manually, typically one image at a time for one image processing task.
Since image manipulation using the AIRS 2 is automatic (or at least semi-automatic), the AIRS 2 is able to modify multiple digital images at a significantly faster rate than the existing methods. The time taken to manipulate an image decreases from hours/minutes when carried out by a human retoucher (depending on the task) to milliseconds when carried out by the automated system of the present embodiment. Any retouching process may be performed in real-time (namely milliseconds) on specific image processing computer hardware. This increase in speed is due to the fact that a new technique (described below) has been used to enable a sometimes complicated retouching process to be applied automatically using computer systems rather than to the speed of the computer system itself.
One set of parameters can be specified (only once) as a target set of tasks to be applied to all images in a group or batch of input images. Receipt of a batch of input images and one set of image processing tasks enables all images in the batch to be standardised to target parameters specified by tasks in a single set of instructions.
Examples of possible image enhancements and manipulations carried out by the AIRS 2 include modifications to the colour, light, shape, details and/or texture of an image or aspects of an image, removing and/or re-positioning assets within an image, masking, enhancing details, enhancing sharpness and/or removing imperfection, such that the final processed image is different from the original unprocessed image. This ability to handle so many different complicated aspects of image processing in a single system has not been possible previously.
The image manipulation process being carried out by the same system and having the same size image may take the same amount of time for each image, independent of the image content. The time taken to retouch images automatically is therefore more predictable than existing manual methods. This enables the time taken for processing thousands of images to be predictable and can help in determining accurately when any resultant action can take place, for example when a website can go live with a catalogue of processed images.
The image manipulation process may be applied to the same image that has different sizes, achieving the same modifications scaled accordingly. Existing methods may require each image to be modified individually, resulting in inconsistencies.
For the purpose of this disclosure, any modification that restores or enhances the original image or modifies the aesthetics and composition of the image may be defined as a task, image manipulation, image modification or retouch. These terms may be used interchangeably.
A schematic diagram of the image retouching system according to an embodiment of the present invention is shown in Figure 1. The system comprises three components: an Artificial Intelligence based Retouching System (AIRS) 2 which includes within it a neural network 4; an administrator system 6 for configuring the AIRS 2; and a user interface with the system - in this embodiment in the form of a user system 8. The AIRS 2 is operatively coupled to the user system 8 via a wide-area communications network 10 such as the Internet.
The primary function of the administrator system 6 is to assist in the creation of models which can be used to configure the AIRS 2 to process images. In this regard, the administrator system 6 includes training datasets 12 comprising sets of training images, A and B stored in a data store 13. The training datasets 12 are stored in a data store as part of the administrator system 6. The training process is discussed in further detail below.
The user system 8 comprises an input graphical user interface (GUI) 14, which in use receives one or more input images C 16 from a data store 17. In the following specification multiple images will be described but it is to be appreciated that these can also be single images. The input images C 16 may for example be digital images in any colour mode including, but not limited to, Red Green Blue (RGB), Red Green Blue Alpha (RGBA) or Grayscale, before image retouching has taken place.
The input GUI 14 may for example be part of a website and be rendered in a browser of the user system 8 or in a window of a mobile or desktop application provided at the user system 8. The manipulation process itself is automatic but the manipulation tasks to be included in the manipulation process (such as light enhancement, colour retouch etc.) may be optionally manually chosen by the user via the input GUI 14.
The manipulation tasks to be included in the manipulation process are provided to the AIRS 2, in the form of instructions 18, along with the unprocessed input images C 16, via the communications network 10. The AIRS 2 receives the instructions 18 and input images C 16 and carries out the requested image retouching on the input images C 16. The resultant images are processed output images D 20, which may be stored in a data store 22 of the AIRS 2 and/or output to the user system 8 via the communication network 10 for display in an output GUI 24. The output images 20 are typically digital images.
Figure 2 is a flow chart illustrating a method of operation of the image retouching system of Figure 1. The method commences, at Step 202, with the user selecting one or more images to modify (input images C 16), which are then displayed, at Step 204, in the input GUI 14.
The input images C 16 may optionally be normalised (not shown) to meet pre-determined requirements before they are subject to subsequent computational steps. This normalisation may for example include scaling, cropping, image format conversion and/or any other data manipulation. This advantageously can standardise the input images for processing.
The user (optionally) specifies, at Step 206, the image manipulation tasks he/she would like to be applied to the input image C 16 and selects, at Step 208, the option 'Run' 30 provided on the input GUI 14 (described in detail later with reference to Figure 3). The input images C 16 and instructions 18 detailing which image manipulation tasks are to be carried out for a given input image are sent, at Step 210, to the AIRS 2 via the communications network 10. The AIRS 2 receives these instructions 18 and processes them, at Step 212, before the resultant output images D 20 are sent, at Step 214, to the user system 8 and displayed, at Step 216, in the GUI acting now as the output GUI 24, via the communications network 10. If the user considers, at Step 218, the output images D 20 to be acceptable, the output images D 20 may be downloaded and/or saved, at Step 220. If the output images D 20 are not considered acceptable, at Step 218, the user may specify, at Step 206, further image manipulations be carried out on output images D 20. In this case images D 20 are re-input to the AIRS 2 together with a new set of image- manipulation instructions 18 relating to the image to be reprocessed and the process is carried out until the user considers the resultant output images 20 to be acceptable.
Alternatively, the image manipulation tasks to be included in the manipulation process may be determined automatically. The system can optionally require no human interaction except for providing the input images to be modified. For example, in one embodiment, the system may be provided with a target image as the desired output. Tasks could then be automatically selected that, when applied to an input image, result in the target image. These tasks may be stored in a data store and applied to a subsequent batch of input images. In another embodiment, task parameters may be pre-set and subsequently applied to a series of images. This may for example be useful if the user would like all the images to be re-touched according to the same image specifications. In this embodiment, the system does not require human intervention once the task parameters have been set.
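One way the task parameters could be derived automatically from a target image is a brute-force search over slider settings that minimises the difference between the processed input and the target. The sketch below is a naive grid search intended only to illustrate the idea; the callables in `task_models` stand in for the trained task models and are not part of any real interface.

```python
import itertools
import numpy as np

def infer_task_settings(input_image, target_image, task_models, steps=5):
    """Search for the per-task weights that best reproduce the target
    image when applied to the input image.

    `task_models` maps a task name to a callable that fully applies that
    task to an image; weights are blended in as on the GUI sliders."""
    levels = np.linspace(0.0, 1.0, steps)
    best_score, best_settings = float("inf"), None
    for combo in itertools.product(levels, repeat=len(task_models)):
        image = input_image.astype(np.float32)
        for weight, (task, model) in zip(combo, task_models.items()):
            image = (1 - weight) * image + weight * model(image)
        score = float(np.mean((image - target_image) ** 2))
        if score < best_score:
            best_score, best_settings = score, dict(zip(task_models, combo))
    return best_settings   # reusable on a subsequent batch of images
```

The same settings, once found, can be stored and applied to a subsequent batch of input images as described above.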
Figure 3 shows an example of an input GUI 14 of the user system 8, which is used to carry out Steps 202 to 210 of Figure 2. The input GUI 14 has four main interaction areas. The first is a browse button 32 which enables the user to select the image to be pulled into the input GUI 14 for consideration as an image to be processed. The second is a viewing window 34 for displaying the image to be processed which has been selected. The third is a set of slider controls 36, each slider control relating to a different image manipulation task and labelled with a task descriptor 37. Even though the present embodiment shows eight specific image manipulation tasks, it is possible in other embodiments for different types of task to be provided and also a different number of tasks. Finally, an execute control ('Run' button) 30 is provided to trigger the sending of the image 16 and instructions 18 to the AIRS 2.
In operation, through use of an image picker, the user selects, at Step 202 of Figure 2, an input image C 16 to be manipulated. In the example shown in Figure 3, the user selects an input image 16 via the 'browse' icon 32. The selected image is then shown, at Step 204, in the viewing window 34. The user then specifies, at Step 206, the image manipulation tasks he or she would like to be applied to the input image C 16 by selecting one or more tasks and choosing the amount of retouch for each task, in this example using two slider controls 36 on the input GUI 14. The user typically selects the amount of retouch for each task by setting the corresponding slider 36 to the percentage of that task to be carried out. In the example shown in Figure 3, the 'correct colour' interactive slider is set to just over 50% and the 'enhance light' interactive slider is set to 100%.
In the present embodiment, the image displayed in the viewing window 34 changes to show approximately what effect the selected manipulation will produce on the image. This is useful to provide feedback to the user to see what effect a given image manipulation parameter will have on the original unprocessed image 16 before the image manipulations are finally selected.
The tasks refer to the type of image manipulation to be carried out, including but not limited to colour correction, masking, light enhancement, detail enhancement, sharpness enhancement, imperfection removal, texture additions and background removal. The number of slider controls 36 corresponds to the number of possible image manipulation tasks available for the image being displayed. Typically these do not vary, but in another embodiment some aspects of the image may not be variable and so the corresponding image manipulation option may not be shown as available for the user to modify. This can be specified in the metadata of the image if required and determined by considering the metadata. Once the user has specified the levels of image manipulation to be carried out, the user input triggers the AIRS 2 to start the automated image manipulation process. The user input in this embodiment takes the form of actuating, at Step 208, the interactive 'Run' button 30, as illustrated in Figure 3. Once the user actuates the Run button 30, the input image C 16 is sent to the AIRS 2 as an input image 16 at Step 210, along with the instructions 18 detailing the image modifications the user requested in Step 206. The settings can apply to a single image or a batch of images. There may be a single retouch step or a series of retouch steps specified in the instructions 18. In one embodiment, all the modifications of the image may be processed sequentially but indivisibly as one block of work and one final result may be generated. In another embodiment, a hierarchy of arbitrary tasks can be applied in such a way that intermediate images 38 can be generated which still represent photorealistic modifications of the original image 16. One of the intermediate images 38 may be the output image following at least one image manipulation task that will undergo at least one further image manipulation task for the process to be complete. Intermediate images 38 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
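For illustration only, the instructions 18 might take the form of a simple structure that pairs each selected task with its slider value; the task identifiers and field names below are assumptions rather than part of the disclosure.

```python
# Hypothetical form of the instructions 18 built from the slider positions of Figure 3.
# Amounts are the normalised slider values; task and field names are illustrative only.
instructions = {
    "image_id": "IMG_0001",  # assumed identifier for the input image C 16
    "tasks": [
        {"task": "correct_colour", "amount": 0.55},  # slider set to just over 50%
        {"task": "enhance_light",  "amount": 1.0},   # slider set to 100%
    ],
}
```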
Figure 4 shows an example of the output GUI 24, which concerns Steps 216 to 220 and Steps 206 to 210 (if the user would like further tasks to be carried out in a second iteration) of Figure 2. The processed output image D 20 is displayed in the output GUI 24 and may be downloadable (if being viewed in a browser window). The image shown in Figure 4 is a preview of the result of the manipulation process that can be downloaded or further manipulated. The user may choose to download the output image D 20 by, for example, activating, at Step 220, the interactive 'download' button 40, or choosing, at Step 218, for the output image D 20 to undergo further image manipulation (refining the retouching process). If the latter option is chosen, the output image D 20 may be subject to a further selection of image manipulation tasks via the slider controls 36, with the amount of manipulation of each task being determined by the positions of the slider controls 36 and the subsequent selection of the run button 30. The image is re-input into the AIRS 2 as re-input image 41, along with the newly specified instructions 42 (additional image manipulation tasks) detailing the secondary image modifications requested by the user. The settings can also apply to a single image or a batch of images. It is to be appreciated that the output image 20, if being viewed in a browser window but not actually downloaded to the user system 8, can be referenced by an identification number (typically generated by the AIRS 2) and provided to the user system 8. In this case there is no need to re-input the output image again to the AIRS 2; rather, all that is required is to provide the identification number together with the new image manipulation instructions to the AIRS 2.
Figure 5 shows the Artificial Intelligence based Retouching System (AIRS) 2 of Figure 1, in greater detail. The AIRS 2 comprises two major components: a design module (or design engine) 50 and a generation module (or generation engine) 52. The design module 50 functions to carry out a training process and allows the system to learn a non-linear model 54 that, in one embodiment, maps input data to output data for a given image manipulation task. In another embodiment, the system may map input data to output data and learn a non-linear model for a plurality of image manipulation tasks. This may be considered to be a model for carrying out a plurality of image manipulation sub tasks. The training dataset 12 is provided to the design module 50 via the administrator system 6. Models 54 created by the design module 50 are stored in a data store 56 and used by the generation module 52. The data store 56 may be placed together with the design module 50 in one embodiment or be located separately in portable devices or machine systems in other embodiments. The generation module 52 functions to implement the image manipulation of a received input image 16 using the specified instructions 18 and the trained models 54. The AIRS 2 further comprises a communications module 58 for communicating with the user system 8 via the communications network 10. The generation module 52 in use receives the input image C 16 and instructions 18 via the communications module 58 and generates the output image 20, which is then output via the communications module 58.
Figure 6 is a flow chart illustrating a method of operation of the AIRS 2 of Figure 5 according to an embodiment of the present invention. The method commences with the administrator system 6 receiving, at Step 302, the training data set 12, which may be made up of sets of unpaired or paired digital training images, A and B. In one embodiment, each pair contains one image, A, with the original non-modified content and one image, B, with the same content after an image retouch process. Each pair may contain multiple image retouch steps or a single image retouch step.
In another embodiment, the pairs of images A and B may contain the same composition of items that differ by visible characteristics. For example, images of landscapes before and after a period of rain.
The manipulation process can be defined as A to B. In the same way the process can also apply in the opposite direction, B to A. That is, training A to B means that A (the training image before image retouching) is considered the input and B (the training image after image retouching) the output, while training B to A means that B is considered the input and A the output. For example, if images A contain content before retouching and images B contain the same content after retouching, training B to A means learning the function by which, given any image after retouching, the image before the retouching process may be predicted.
For example, one image may be a representation before a colour correction and the corresponding image pair may show a representation with the same content after a colour correction has been applied. The system can be trained to perform a manipulation in both directions, so the after colour correction images may be inputted and the before colour correction images may be outputted. In this way, the system can learn to predict what colour corrected images would look like without a colour correction.
In another embodiment, the training data set 12 is still organised in two subsets, before and after retouching, but the two subsets may differ from each other in number and/or content, in a way that they may appear "unpaired", that is, for each image A in the dataset it is not guaranteed that there exists an image B in the dataset that shows the same content after an image manipulation process. Accordingly, the training system learns the characteristics of each group and subsequently the transfer function between the groups, rather than specifically the transfer function from one input training image to its paired output training image.
In a further embodiment, the training data set is not separated into pairs of images before and after retouching. Instead, the system is able to learn the distinction between retouched and not retouched images from a data set of mixed images (retouched and not retouched) in a completely unsupervised manner. In a similar way to how 3D properties can be learned from a 3D data set without receiving any information about 3D structures (discussed in further detail below), the system can learn the concept of retouched images autonomously without a clear distinction between images. In this embodiment, advantageously processed and unprocessed images (raw and retouched images) can be provided as the training set (mixed images) without pairing or grouping (any form of classification) prior to training. A machine-learning architecture (see below) can then create groupings of training images as well as characteristics of each group and subsequently transfer functions between groups all within the weightings and configuration of the machine-learning architecture.
The training data set 12 may include any subject or composition in each sample and the system can be trained to apply a retouching process to any new image containing that subject or composition.
The training images of the training data set 12 are pre-processed, at Step 304, to meet pre-determined requirements. The pre-processing may for example include, but is not limited to, scaling, cropping, image format conversion and/or any other data manipulation. This advantageously reduces the training time and/or increases the accuracy of the neural network 4 implementing the model 54 as all of the input training data images are made consistent in terms of size and scale such that the neural network model 54 does not have to accommodate these differences.
A batch of training images is created, at Step 306. A batch refers to the set of training images processed at one time for a specific image manipulation task. For example, a batch may be a plurality of pairs of training images relating to a specific task such as imperfection removal, colour correction or texture addition. Models 54 are then created, at Step 308, for each task at the design module 50. The models 54 provide the numerical function to be applied to a neural network 4 such that the neural network 4 can process images to implement each task, as will be explained in more detail below.
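A batch of paired training images for a single task could, for example, be represented as a dataset object; the sketch below assumes a PyTorch-style framework and a directory layout in which corresponding A and B images share a file name, neither of which is required by the embodiments described.

```python
# Illustrative sketch: a paired training set (A = before retouch, B = after retouch)
# for one image-manipulation task. Framework (PyTorch) and folder layout are assumed.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedRetouchDataset(Dataset):
    def __init__(self, dir_a, dir_b, size=256):
        self.dir_a, self.dir_b = dir_a, dir_b
        self.files = sorted(os.listdir(dir_a))  # same file name assumed for each A/B pair
        self.tf = transforms.Compose([transforms.Resize((size, size)),
                                      transforms.ToTensor()])

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        name = self.files[i]
        a = self.tf(Image.open(os.path.join(self.dir_a, name)).convert("RGB"))
        b = self.tf(Image.open(os.path.join(self.dir_b, name)).convert("RGB"))
        return a, b  # (unretouched, retouched) pair for the current task
```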
The models 54 are stored at Step 310, in the data store 56. In use, the generation module 52 receives, at Step 312, the input image C 16 and instructions 18 pertaining to that image and processes them at Step 314. The output image D 20 is generated, at Step 316, by applying one or more models 54 to the input image C 16 corresponding to the tasks selected by the user, specified by the instructions 18. The output image D 20 may be sent, at Step 318, to the output GUI 24 of the user system 8.
Figure 7 schematically illustrates the design module 50 of Figure 5 in greater detail. At the heart of the design module 50 may be a convolutional neural network (CNN) 4 or an alternative machine-learning computer architecture, which is used to create runtime models 54 for image manipulation tasks. Each model 54 represents an image manipulation task or plurality of image manipulation sub tasks and may be created via several iterations of training the CNN 4.
Intermediate image manipulation task models represent models that are complete themselves but will be integrated with another model or plurality of models such that the final model runs a plurality of image manipulation tasks. Alternatively, in another embodiment intermediate models may be models that are deprecated by a subsequent model. In this regard, intermediate models of each task may be stored in the data store 56, such that, if imported in the generation module, the image manipulations learned so far can be presented to the user. That is, the intermediate models can give a preview (work in progress) of how much the system has learned so far and these previews can be provided back to the user if required.
In order to configure the CNN 4, the design module 50 of this embodiment comprises a batch retrieval engine 62 to retrieve a training data set 12 (images A and B for example) for a particular task from the administration system 6 together with an associated confidence threshold. The batch retrieval engine 62 is in communication with a task manager engine 63 which in turn provides one of the training data set images (image A) to the CNN 4 whilst the other (image B) is provided to a comparator 64. The task manager engine 63 is also responsible for initial configuration of the CNN 4, which typically has randomised weights before the first training image is input. The output of the CNN 4 is also provided to the comparator 64. The output of the comparator 64 is provided to an error calculator 66 which specifies an error between the output of the CNN 4 and image B. The error calculator 66 provides its output to a threshold comparator 68 which compares the error to the confidence threshold received from the task manager engine 63. The output of the threshold comparator 68 is fed to a weight adjuster 69 which feeds back into the CNN 4 to adjust its weights to reduce the error (when the error is above the threshold). Also the output of the threshold comparator 68 goes to a trained model storage engine 70 which can retrieve the current CNN 4 settings from the CNN 4 and store these in the data store 56 as a final trained model of the current image manipulation task. The trained model storage engine 70 is also coupled to the task manager engine 63 so as to know which image processing task, the subject of the current CNN model 54, is currently being considered.
During the training process, image A from a pair of training images, A and B, relating to a specific task, is processed by the CNN 4 and undergoes a manipulation determined by the weights and biases of the neurons in the CNN 4. The manipulated image A, training output A', is then sent to the comparator 64, where it is compared to image B from the pair of training images, A and B. The difference (error) between image A' and B is calculated using the error calculator 66. The design module 50 determines whether the error is below or above a predetermined threshold using the threshold comparator 68. If the error is above a threshold, the weights and biases of the neurons in the CNNs 4 are updated at the weight adjuster 69 in dependence on the error value. A hyperparameter such as a Learning Rate (LR), which may also be stored with the training images, may determine how much the weights and biases should be updated based on the error value. Image A of the second pair of training images is then processed by the CNN 4 and undergoes a manipulation determined by the updated weights and biases of the neurons in the CNN 4. The process continues until the difference (error) between images A' and B is less than the predetermined threshold. When the error is less than the predetermined threshold, the trained model 54 (which corresponds to the weights and biases of the CNN neurons which led to the error between images A' and B being less than the predetermined threshold) is sent to the trained model storage engine 70. This trained model 54 provides the numerical function to apply to a neural network to process input images 16 to implement a specific task. Information relating to which image manipulation task the model 54 applies to may be retrieved from the task manager engine 63 and the values of the weights and biases of the neurons in the CNN 4 which led to the error between images A' and B being less than the predetermined threshold may be retrieved from the CNN 4. The trained model storage engine 70 stores the trained model 54 to the data store 56. As mentioned above, the data store 56 may also comprise intermediate models 72 for different tasks and a final trained model 74, which may incorporate all the models 54 involved to carry out every task instructed.
In an alternative embodiment, the training process in the design module 50 may be made by Machine Learning computational blocks (CNNs) 4 organised as a conditional Generative Adversarial Network. Generative Adversarial Networks (GANs) represent a way to train a generative model in an unsupervised manner [2]. They consist of two models that play an adversarial game. A generative model G captures the data distribution while a discriminative model D estimates the probability that a sample (in this case an image) came from the training set rather than from the generator G. Both G and D solve a non-linear mapping function. Unlike an unconditional GAN, both G and D are conditioned on some extra data. The extra data may for example comprise an input image. The system is forced to learn a probability distribution that generates images similar to the input image.
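For reference, the conditional GAN objective of [1] and [2], in which both G and D are conditioned on the input image x (with z a noise vector and y the target image), is commonly written as follows; this is the standard formulation from the cited literature and the precise loss used in any given embodiment may differ:

$$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\big[\log D(x,y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x,z))\big)\big], \qquad G^{*} = \arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G,D).$$

In the image-to-image setting of [1] this adversarial term is typically combined with an L1 reconstruction loss; whether any particular embodiment described here does so is left open.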
Alternatively the function of the CNNs 4 may be carried out using any network, software, hardware and/or algorithm that is able to learn a non-linear mapping function between the input and output images.
Figure 8 is a flow chart illustrating an example of how the training process, referred to when describing Figure 7 above, may be carried out. The process starts by initialising, at Step 402, the neurons of the CNNs 4. This may for example involve using random initialisation and setting the weights and biases of the CNN 4 neurons with random values. Then a new batch of training images is retrieved, at Step 404, relating to a specific task. A pair of training images from the batch, images A and B, are obtained, at Step 406, before image A is subject to a manipulation, at Step 408, determined by the current configuration of the CNN 4 (i.e. the current weights and biases of the CNN neurons). The manipulated image A is generated, at Step 410, and referred to as training output A'. The training output A' is then compared, at Step 412, to training image B at the comparator. The error (difference) between A' and B is then calculated, at Step 414, at the error calculator 66. The error calculated at Step 414 is then used to adjust/update, at Step 416, the weights and biases of the neurons in the CNN 4, in order to reduce the error between A' and B the next time image A is passed through the CNN 4 in the next iteration. If the error is less than a predetermined threshold, the training is complete and, at Step 420, the updated neuron weights and biases are stored in the data store of the AIRS 2 as a model 54 for this particular batch and associated task. If the error is greater than the predetermined threshold, the method continues by returning, at Step 406, to another round of training, namely a second pair of training images is taken from the batch. The current function of the CNN 4 with the updated weights and biases of the neurons is applied to image A of the second pair of training images, at Step 408, and the process continues until the error between A' and B, as determined at Step 416, is less than the predetermined threshold.
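The following sketch illustrates, purely by way of example and under assumed choices (a tiny stand-in network, an L1 error, an Adam optimiser and an arbitrary threshold), the loop of Figure 8: random initialisation, forward pass of image A, comparison against image B, error calculation, weight update and a stopping condition. Any architecture able to learn the mapping could be substituted.

```python
# Illustrative sketch of the training loop of Figure 8 (assumed framework: PyTorch).
# The network, L1 error, learning rate and threshold are example choices only.
import torch
import torch.nn as nn

cnn = nn.Sequential(                      # stand-in for CNN 4; randomly initialised (Step 402)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
error_calculator = nn.L1Loss()            # error between A' and B (Step 414)
weight_adjuster = torch.optim.Adam(cnn.parameters(), lr=2e-4)  # learning-rate hyperparameter
threshold = 0.01                          # illustrative confidence threshold

# Placeholder batch; in practice pairs (A, B) for the current task are loaded here (Steps 404-406).
batch = [(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))]

for image_a, image_b in batch:
    a_prime = cnn(image_a)                           # manipulation by current weights (Steps 408-410)
    error = error_calculator(a_prime, image_b)       # compare A' with B (Steps 412-414)
    if error.item() < threshold:                     # training complete for this task
        torch.save(cnn.state_dict(), "model_54.pt")  # store as model 54 (Step 420)
        break
    weight_adjuster.zero_grad()
    error.backward()                                 # adjust weights in dependence on the error (Step 416)
    weight_adjuster.step()
```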
An example batch of training images is schematically shown in Figure 9. In this example, the batch of training images relates to imperfection removal on t-shirts. Images A and B in the first pair of training images respectively show a crease 76 on the bottom left of a round-neck style t-shirt 78 and the same t-shirt 78 without the crease 76. Images A and B in the second pair of training images respectively show another t-shirt 78 with a larger crease 76 on the top right of the t-shirt 78 and the same t-shirt 78 without the crease 76. The third pair of training images, A and B, respectively shows a crease 76 on a V-neck t-shirt 79 and the same V-neck t-shirt 79 without the crease 76. By passing numerous pairs of training images through the design module 50 and updating the weights and biases of the CNN neurons with each iteration, a more accurate model 54 is created that is able to for example remove creases 76 from images of t-shirts 78, 79, no matter the size, location or style of t-shirt 78, 79.
Figure 10 schematically illustrates the generation module 52 of Figure 5 in greater detail. The generation module 52 has an instruction processor 80 for receiving input images to be processed and outputting processed images. For example, input image C 16 and the user's instructions 18 (relating to which image processing tasks the user would like to be carried out on input image C 16) are input to the instruction processor 80 from the input GUI 14 of the user system 8 and processed images are sent back to the output GUI 24 of the user system 8. The instruction processor 80 separates the instructions 18 from the input image 16 to be processed and sends the image processing tasks to a task selector 82 to which it is connected. The task selector 82 itself is connected to a model retrieval engine 84 and can specify which task model 54 is to be retrieved from the data store 56. The model retrieval engine 84 functions to retrieve the appropriate model 54 from the data store 56 and load it into a task model 86 to which it is connected. The task model 86 is a CNN 4 which is configured by the model 54 and is also connected to the instruction processor 80 such that images to be manipulated (processed) can be sent from the instruction processor 80 to the task model 86 and the output manipulated (processed) images generated can be sent from the task model 86 to the instruction processor 80. The output image 20 may be sent from the instruction processor 80 to the output GUI 24 of the user system 8.
The generation module 52 contains the CNNs responsible for the generation of new images but its functionality can be executed also in the design module 50.
Figure 11 is a flow chart that illustrates an example of how the image manipulation process may be carried out using the generation module 52 of Figure 10. The process starts by the instruction processor 80 of Figure 10 receiving input image C 16 and its associated instructions 18 at Step 502. The image processing tasks, specified in the instructions 18, are then determined, at Step 504, at the instruction processor 80 of Figure 10 and a first task is selected, at Step 506, as the current task by the task selector 82. The model 54 relating to the current task is then retrieved, at Step 508, from the data store 56 by the model retrieval engine 84 and sent to the task model 86. Image C 16 is also then sent to the task model 86 from the instruction processor 80 for processing. The task model 86 is configured with the model 54 relating to the first task and the configured model is applied, at Step 510, to image C 16 and thereby generates, at Step 512, an intermediate output image D' 88, which is sent to the instruction processor 80.
The amount of retouching of each input image 16, as detailed in the instructions 18, may be implemented by applying a normalised weighting to the model 54. For example, if the 'correct colour' interactive slider of Figure 3 is set to 50%, the colour correction model can be set at a weighting of 50%. Similarly, if the 'enhance light' slider control of Figure 3 is set at 100%, the light enhancement model can be set at a full weighting (in the normalised scale of 1). The settings can apply to a single image or a batch of images.
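One possible, purely illustrative, realisation of such a normalised weighting is to blend the model's fully retouched output back towards the unmodified input; other realisations (for example conditioning the model itself on the requested amount) are equally possible.

```python
# Illustrative only: applying a normalised slider weighting w in [0, 1] to a trained model 54.
# Images are assumed to be numeric arrays/tensors. w = 0.0 leaves the input unchanged;
# w = 1.0 applies the retouch at full strength.
def apply_weighted_retouch(input_image, model, w):
    fully_retouched = model(input_image)  # model 54 applied at full strength
    return (1.0 - w) * input_image + w * fully_retouched
```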
The system then determines, at Step 514, whether any tasks requested in the instructions 18 have not been carried out. If there are any tasks that have not been carried out, the next task is selected from the instructions 18 as the new current task at the task selector 82 and the method of implementing the image processing returns to retrieving, at Step 508, the model 54 for the current task. The image processing model 54 relating to this new task is then retrieved from the model retrieval engine 84, at Step 508, and the model 54 is applied, at Step 510, to intermediate output image D' 88. This process continues until all tasks requested in the instructions 18 have been carried out. At this point, the intermediate output image D' 88 becomes, at Step 516, a final output image D 20 and is sent, at Step 518, to the user, for example to the output GUI 24 of the user system 8. The intermediate output images 88 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
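The sequential behaviour described above might be summarised, in simplified form, as the following loop; load_model() is a hypothetical helper standing in for the model retrieval engine 84, and the slider weighting is applied as sketched earlier.

```python
# Simplified sketch of the generation module's sequential processing (Steps 506-516 of Figure 11).
# load_model() is a hypothetical helper that returns the trained model 54 for a task name.
def process_instructions(input_image, instructions, load_model):
    image = input_image                               # input image C 16
    for task in instructions["tasks"]:
        model = load_model(task["task"])              # retrieve and load model 54 (Step 508)
        w = task.get("amount", 1.0)                   # normalised slider weighting
        image = (1.0 - w) * image + w * model(image)  # intermediate output image D' 88 (Steps 510-512)
    return image                                      # final output image D 20 (Step 516)
```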
If the image manipulation tasks are processed in series, the same generation module 52 can be reused for processing all the tasks. For example, for each task, the generation module 52 may be loaded with a model 54 related to the current image processing task and the values of the previous model provided to the task model CNN may be overwritten by new ones.
In a further embodiment of the present invention, the process schematically illustrated in Figure 11 may alternatively be carried out using a plurality of different generation modules 52, each generation module 52 operating on a different task. This enables image processing tasks for different images to be processed in parallel for improved efficiency, with multiple generation modules 52 running at the same time and processing different input images 16 concurrently. All of the features of this embodiment are the same as previously described embodiments except for the differences which are set out below.
Figure 12 is a schematic diagram illustrating an example of how this further embodiment is implemented. In this example, two tasks are carried out sequentially on an input image C 16, using two generation modules 52, 90. Task 1 is a colour correction image manipulation task and Task 2 is a light enhancement image manipulation task. Input image C 16 and the instructions 18 associated with Task 1 are supplied to the first generation module 52, for example via the input GUI 14. The first generation module 52 then retrieves the model 54 related to Task 1 from the data store 56. The first generation module 52 then generates an intermediate output image D' 88 and provides this to the second generation module 90. The second generation module 90 receives instructions 18 related to Task 2 from the input GUI 14 and retrieves the model 54 related to Task 2 from the data store 56. The second generation module 90 carries out Task 2 on the intermediate output image D' 88 to generate output image D 20.
The above-described functionality of the plurality of different generation modules 52, 90 may be implemented in several different ways. For example, the multiple generation modules may have the same structure but be different software instances (running on a single computer) that contain different parameters depending on the task to execute. The generation modules 52, 90 can be pre-stored and available to be implemented as required. Their implementation can be triggered to operate on the current manipulated image based on the instructions being processed. Also, whilst the above embodiment has been described with different tasks being sent to different generation modules, all of the instructions 18 may be sent to each of the generation modules 52, 90 and the differentiation of which task is carried out at which generation module 52, 90 may be determined at the specific generation module 52, 90.
The embodiment illustrated in Figure 12 may also advantageously enable instruction pipelining to take place to improve throughput and efficiency when multiple images are being processed. For example, if the system comprises three generation modules and the user would like to carry out three tasks on three input images 16, Image 1C, Image 2C and Image 3C. An input Image 1C may be input to the first generation module to implement Task 1 on Image 1C. Following the process described in relation to Figure 12, Image 1C (processed by the first generation module) will then be passed to the second generation module to implement Task 2 on Image 1C. At this point, Image 2C may simultaneously be input to the first generation module to implement Task 1 on Image 2C. Image 1C, once it has been processed by the second generation module, is then passed on to the third generation module to implement Task 3 on Image 1C, and Image 2C is simultaneously passed on to the second generation module to implement Task 2 on Image 2C. At this point, Image 3C is input to the first generation module to implement Task 1 on Image 3C. Image 2C, once it has been processed by the second generation module, is then sent to the third generation module to implement Task 3 on Image 2C, and Image 3C is simultaneously passed to the second generation module to implement Task 2 on Image 3C. Finally, Image 3C, once it has been processed by the second generation module, is passed to the third generation module to implement Task 3 on Image 3C. This process is illustrated in the table below:

                              Time Frame 1        Time Frame 2        Time Frame 3        Time Frame 4        Time Frame 5
  First Generation Module     Task 1 on Image 1C  Task 1 on Image 2C  Task 1 on Image 3C
  Second Generation Module                        Task 2 on Image 1C  Task 2 on Image 2C  Task 2 on Image 3C
  Third Generation Module                                             Task 3 on Image 1C  Task 3 on Image 2C  Task 3 on Image 3C
The further embodiment therefore enables tasks to be carried out on a plurality of images faster than if only a single task could be carried out on a single image at any one time (sequentially). In this example it can be seen that three images with three tasks can be processed in 5 equal time frames rather than 9 (if carried out sequentially).
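More generally (a standard pipelining observation rather than a limitation of the embodiment), processing N images through M task stages, with one generation module per stage, takes

$$T_{\text{pipelined}} = N + M - 1 \qquad\text{versus}\qquad T_{\text{sequential}} = N \times M$$

equal time frames; for N = M = 3 this gives 5 versus 9, in agreement with the table above.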
Figures 13 and 14 illustrate a first multi-dimensional embodiment of the present invention, which enables a greater range of image processing capabilities to be realised by the system. The first multi-dimensional embodiment of Figures 13 and 14 enables more sophisticated image processing tasks to be carried out using 3D image processing techniques; in particular, it specifies how 3D image processing tasks, which are far more sophisticated than 2D image processing techniques, are carried out on a 2D input image.
Referring now to Figure 13, there is shown a schematic diagram of such an embodiment. All of the features of this embodiment are the same as previously described embodiments except for the differences which are set out below. The design module 50 of the AIRS 2 is trained to learn in terms of spatial properties and receives a 3D dataset 100, as well as the training data set 12 described in relation to previous embodiments, from the administrator system 6. The 3D dataset 100 comprises sequences of images showing a scene where each image is taken from a different angle to teach the system the geometrical properties of objects, their material characterisation and how light illuminates the scene, enabling the AIRS 2 to learn the fundamental properties of the 3D world and render 3D scenes (or 3D representations) 101 from 2D images.
The design module 50 in Figure 13 is also connected to the data store 102, which is configured to store the 2D models 54 relating to specific tasks (described previously) and 3D representations of the models 104 which are generated from the 3D data set 100. This data store 102 is connected to the generation module 52 to provide both the 2D models 54 and the 3D representations of the 2D models 104 to the generation module 52. A 2D input image C 16 may be retrieved from a local data store 17 of the user system 8 and manipulated at the input GUI 14 to specify 2D image modifications. However, the user system 8 also has a 3D user interface 105, where the user can specify 3D image modifications, namely to generate 3D image manipulation instructions 106. The input GUI 14, incorporating the 3D user interface 105 as implemented on the user system 8, is connected to the generation module 52 and sends not only the input image 16 but also the 2D and 3D instructions 18, 106 to the generation module 52. The output image 108 generated by the generation module 52 is in the form of a 3D scene 101 or a 2D image.
The problem of generating a representation of a scene, given one or more images as an input, in order to render the scene from previously unknown camera angles, has previously been explored (see Reference [3]). In a manner which is different from the existing approaches, the AIRS 2 in this embodiment generalises the process of scene understanding. Given an input image 16 and a manipulation process, the output image 108 may differ not only from a camera point of view (an image manipulation only possible in the 3D domain), but also in terms of lighting, position of the assets or other characteristics, exponentially increasing the degrees of freedom of the image retouching process. This is all possible because any input 2D image can be converted into a 3D representation to allow further 3D image manipulation tasks to be carried out on it as well as the 2D image manipulation tasks. However, the type of image manipulation task to be carried out determines which 3D representation is selected in order to predict what the resultant processed 3D representation (output image) will be.
The "3D representation" 104 (3D representation of a 2D model) can be considered to be a collection of data representing what the system has learned from the training process, in terms of 3D properties. The 3D representation can be applied or not for processing (retouching in an embodiment) the input images. In the case the 3D representation is not used, an input image can be retouched just by using pixel-related properties namely in the 2D domain. If the 3D representation is applied to the 2D input image, it generates a "prediction" of what the 3D scene could be. For example, taking an input image which is a simple image of a cube over a plane; in the 2D domain the input image may just show two objects (the cube and the plane) with possibly some shadows and lighting effects. However, in order for the system to learn what a "cube" and a "plane" are as objects in a 3D world, the system is trained with many images containing those objects. So once the system is trained, it applies its knowledge (the 3D representations) to generate a 3D scene that comprises of a cube and plane. The accuracy of how this 3D scene is related to the 2D input image depends of the training process (how well the system learned). Then this 3D scene can be editable or not, it can be visible to the user or not but it is always calculated as long as the system was enabled to "predict" the 3D representation of that 2D input image.
Figure 14 is a flow chart illustrating an example method of how the first multi-dimensional embodiment, referred to when describing Figure 13, may be implemented.
The method commences with the 3D data set 100 being sent, at Step 602, to the design module 50 by the administrator module 6. As touched on previously, the design module 50 is able to gain its own understanding of the fundamental properties of the 3D world by means of multiple sequences of synthetic or real images, provided by the 3D data set 100, from which the design module 50 learns how to represent spatial concepts. The sequences of images in the 3D data set 100 may show camera movements in different contexts, namely views of a scene from different angles and/or views of a scene with different lighting. For example, the design module may learn the 3D properties contained in the 3D data set 100 in an analogous way to that in which photogrammetry-based techniques generate a 3D mesh of an object from a sequence of images taken around the object. The learning process is based on neural networks that learn the probability distribution of the 3D properties, given the 3D dataset 100. The AIRS 2 may therefore learn in a completely unsupervised way. The design module 50 may acquire a representation of the 3D world in terms of parameters such as perspective, geometry, lighting and/or occlusion. This representation of the 3D world may be optimised to fit in its own representational space that is not manageable by traditional 3D tools. Such a representation is the latent space of vectors (a mathematical representation) that contains the knowledge acquired by the system in a completely non-assisted manner. Having learned the concept of spatiality and other 3D properties, the system is able to render 3D images from 2D images and generate 3D representations of the 2D models 104. Neural scene representation and rendering is discussed further in Reference [3].
The training data set 12 is also sent, at Step 602, to the design module 50, which learns the pattern between unretouched and retouched images and calculates 2D models 54 associated with each task, as discussed in relation to previous embodiments. If the training data set 12 comprises a mix of retouched and unretouched images in accordance with the embodiment described above, the training data set 12 and 3D data set 100 may be combined to form a single data set which the system uses to learn the concept of retouching and 3D properties.
3D representations of the 2D models 104 associated with each task are determined at the design module 50, using the 3D data set 100, and stored, at Step 604, in the data store 102. For example, if a 2D image manipulation task model adds rain to an image, a 3D representation of the 2D model will add the rain with the concept of depth applied.
The AIRS 2 learns from the training data set 12 and 3D data set 100 to retouch images in such a way that every automated retouching process is consistent with the training set, but this process is executed with much more accuracy and with higher degrees of freedom than in the prior art. This is because the automated retouching process happens not only at pixel level (within the 2D domain), but inside the 3D world (3D domain) artificially generated by the design module through use of the 3D data set 100.
Input image C 16 may be selected, at Step 606, by the user and pulled into the input GUI 14 where it may be viewed, at Step 608, as a 2D image. The user may request 2D tasks to be carried out, at Step 610, in the form of 2D instructions 18. The input image may also be pulled, at Step 608, into the 3D user interface 105, where a user may specify, at Step 610, 3D image manipulation tasks to be carried out, in the form of 3D instructions 106. The user may actuate, at Step 612, the 'Run' button on the input GUI 14 and/or 3D user interface 105 to trigger sending, at Step 614, of the input image 16 and the 2D and 3D instructions 18, 106 to the AIRS 2, and for the AIRS 2 to start the automatic image manipulation process. Examples of 3D tasks that may be carried out include rotating objects, adding lighting that takes into account where shadows should be, moving objects and moving the position of the camera.
Once the generation module 52 has received the input image 16, the 2D and 3D instructions 18, 106, the 2D models 54 and the 3D representations of the 2D models 104, it processes, at Step 616, the instructions and generates, at Step 618, a 3D scene 101 with the image processing tasks requested in the 2D and 3D instructions 18, 106 applied. The tasks are applied using the process outlined in Figure 11, with the additional step of using the 3D representations of the 2D models 104 in place of the 2D models 54 for any 3D tasks to be implemented. The image generated may be conditioned by the training data set 12 to be consistent with the retouched images contained in the training set 12. The output image D' generated by this process is a 'projection' or 'prediction' of the input image in the 3D domain and is also referred to herein as a 3D scene 101. The 3D scene 101 generated may be the final output of the retouching process, providing a new augmented representation of the subject contained in the original input image. The 3D scene 101 may optionally be converted to a 2D output image.
The 3D scene 101 may be sent, at Step 620, to and displayed in the output GUI 24 and the user may decide whether this image is acceptable, at Step 622, or whether further image modifications should be applied. If further image modifications are required, the method returns to the user specifying, at Step 610, image modifications at the input/output GUI 14, 24 and the process is continued until the image is deemed acceptable, at Step 622. Once the image is acceptable, it is possible to download the image, at Step 624.
Figures 15 and 16 illustrate a second multi-dimensional embodiment of the present invention, which also enables a greater range of image processing capabilities to be realised by the system. The second multi-dimensional embodiment of Figures 15 and 16, like the first multi-dimensional embodiment described above, enables more sophisticated image processing tasks to be carried out using 3D image processing techniques on a 2D input image. More specifically, the second multi-dimensional embodiment determines how a volumetric 3D scene 101 of the 2D input image 16 may be predicted, the 3D scene manipulated using 3D modelling tools, and the manipulated image output as a 3D scene 101 or converted back to a 2D image 110 that contains all the manipulations executed on the 3D representation. Various aspects of the scene may be adjusted, such as lighting, the position of objects identified in the scene and the position of the camera.
As with Figure 13, Figure 15 includes the 3D data set 100 and training set 12, which are fed to the design module 50. The 2D models 54 associated with each task and 3D representations of the 2D models 104 are also stored in the data store 102. However, the input image 16 in Figure 15 is directly provided from the input GUI 14, rather than from the input GUI 14 and the 3D user interface 105 as is the case in the first multi-dimensional embodiment. In this embodiment, the user selects 2D tasks to be carried out using the input GUI 14 and the corresponding 2D instructions 18 are sent to the generation module 52, along with the 2D image 16. Figure 15 further comprises a 3D tool 112 (which may or may not be part of the AIRS 2), which receives a 3D scene 101 created by the generation module 52 using the 3D data set 100, with the 2D tasks requested by the user applied. The 3D scene 101 may be manipulated further at the 3D tool 112 using 3D modelling tools, following the user's instructions retrieved from a data store 114 or submitted in real time to the 3D tool 112. The fully manipulated 3D scene 101 may be output as a 3D image or converted to a 2D image 110 and output.
Figure 16 is a flow chart illustrating an example of how the second multi-dimensional embodiment, referred to when describing Figure 15, may be implemented. As with Figure 14, the 3D data set 100 and training data set 12 are sent, at Step 702, to the design module 50 by the administrator system 6. 2D models 54 associated with each task are created in the design module 50 using the same process described in relation to previous embodiments. The 2D models 54 associated with each task and 3D representations of the 2D models 104 are stored, at Step 704, in the data store 102. A user selects, at Step 706, an input image C 16 to modify, which is uploaded and displayed, at Step 708, at the input GUI 14. The user then specifies, at Step 710, 2D image modifications to be carried out and, when indicated, at Step 712, the associated instructions 18 are sent, at Step 714, to the generation module 52, along with the 2D image 16. The generation module 52 carries out, at Step 716, the 2D image processing tasks requested in the user's instructions 18 by applying the 2D models 54 from the data store 102 to the input image C 16 using the process described in relation to Figures 10 and 11. The manipulated 2D image is then converted into a 3D image using the 3D data set 100, resulting in a 3D scene 101. This 3D scene 101 is delivered, at Step 718, to the 3D tool 112. The user may for example request, at Step 720, modifications to the 3D characteristics of the image, geometrical properties of the scene and lighting, in the form of 3D instructions 106. These 3D instructions 106 may be predetermined and stored in a 3D instruction data store 114 or requested by the user in real time, for example using a 3D user interface 105. The 3D characteristics of the image to be modified may be selected by manipulating the inferred 3D scene 101. The instructions 106 are then processed, at Step 722, by the 3D tool 112, which outputs, at Step 724, either the 3D output image 108 or a 2D representation of the output image 110. If the output image 108, 110 is determined, at Step 726, to be acceptable, then the image is saved and made available for download as a processed image. Alternatively, if the output image is not acceptable, at Step 726, then the method returns the image to the 3D tool 112 and the user can re-specify, at Step 720, the 3D modifications to be carried out. This process continues until an acceptable output of the 3D tool 112 is achieved.
Figure 17 is an example of how the generation module 52 may predict a 3D volumetric scene 101 of a 2D input image C 16, where the input image 16 shows a living room in the daytime. Such a generated 3D scene 101 is produced using the 3D data set 100. The user may manipulate the volumetric 3D scene 101 to generate a new image or this may be automated. The input image shown in Figure 17 has daylight illumination and a set of items of furniture placed in specific locations. The 3D scene 101 may be manipulated to modify some fundamental properties of the 3D scene 101. In the example shown in Figure 17, the lamp 116 on the table to the left of the room and the light on the ceiling are both turned on. The table 118 in the centre of the room and the waste paper bin 120 to the right of the left chair 122 have been removed. The lighting outside has also changed, representing a night-time scene. The 3D scene 101 in this example is rendered again to generate a 2D output image 110 with the modifications made to the 3D scene 101 applied.
Inferring the representation of a scene from an image allows more sophisticated manipulation. For example, since the AIRS 2 learns how to represent spatial concepts, if the user requests an object to be removed, such as the waste paper bin 120 in Figure 17, the AIRS 2 is able to predict what the chair 122 is likely to look like behind the bin and hence generate a realistic image of the living room without the bin.
In a third multi-dimensional embodiment of the present invention, the user may not separately request 3D image manipulations in the form of 3D instructions via the 3D user interface 105 or by manipulating the 3D scene at the 3D tool 112. Instead, the user may specify which tasks they would like to be carried out using the control sliders 36 at the input GUI 14 and the system applies the 3D representations of the 2D models 104 corresponding to the tasks requested (created using the 3D data set and stored in the data store 102) to the 2D input image, to create the 3D scene 101. The 3D scene 101 with the tasks carried out may be rendered into a 2D image and output to the output GUI 24.
The generation of a 3D scene 101 from a 2D image may be the final result of the process [4]. In one embodiment illustrated in Figure 18, a stream of input images 130 may be processed in order to generate an animated 3D scene 132 that may be rendered from any camera point of view on a display 134 of any aspect, from flat displays to head-mounted displays. The coherence between frames is learned from the design module 50 during the training process.
Some processes of generating images from a text description using deep learning have been described previously [5]. In one embodiment shown in Figure 19, the selection of characteristics of the image to be modified can be requested through natural language in a natural language input GUI 140, where the query describes how the image may be retouched. In this regard the natural language input GUI 140 has within it an area for a manipulation description 142, which takes the place of the task descriptors 37 and slider controls 36 in the input GUI 14 shown in Figure 3. On the natural language user interface 140, through an image picker, the user may select an input image to be manipulated. In the example shown in Figure 19, the user selects an input image 16 via the 'browse' icon 144. The selected image is then shown in the viewing window 146. The criteria of manipulation are stipulated through a description made in natural language that may be entered by means of a text input field 148 and previewed in the manipulation description 142. The image 16 may then be sent to the AIRS, along with the instructions 150 detailing which image manipulations are to be carried out, specified by the manipulation description in natural language. Natural language can be part of the training process, so the system is able to learn from previous descriptions how to generate new images that contain the description inserted by the user. The system may be able to collect enough knowledge about a concept through natural language in such a way that it can create images containing the concept, even if the training data 12 does not contain any images of that concept visually represented. For example, the concept of a "river" can be visually generated by the system by learning how water is represented in nature and how a river is verbally described, but with no previous knowledge of how a river is visually represented.
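In the described embodiment the mapping from a natural-language description to a manipulation is learned as part of the training process; the fragment below is therefore only a deliberately simplified stand-in, using an invented keyword table, to illustrate the general shape of instructions 150 derived from free text.

```python
# Deliberately simplified stand-in (not the learned text-to-image approach of [5]):
# a keyword lookup that turns a free-text description into instructions 150.
# The keyword table and task names are invented for this example.
KEYWORDS = {
    "brighter": ("enhance_light", 1.0),
    "warmer": ("correct_colour", 0.5),
    "remove the creases": ("imperfection_removal", 1.0),
}

def parse_description(text):
    text = text.lower()
    tasks = [{"task": t, "amount": a} for phrase, (t, a) in KEYWORDS.items() if phrase in text]
    return {"tasks": tasks}
```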
The images generated by the system may be photorealistic and may pass a Turing Test that discerns between real and AI-generated images. Such a test can be qualitatively run by human testers, such as the perceptual studies on Amazon Mechanical Turk (AMT), or processed by an automatic system. Herein two automatic approaches are taken into account, without excluding any other method that may be considered equivalent. In the first approach, the generated image is processed by a classifier able to recognise the items contained in the original scene. Under the assumption that the retouch process is not heavily modifying the content, it is expected that the classifier may be able to recognise the same items in the same positions in the retouched scene as well. In other terms, its Inception Score (which helps to identify the similar types of items) should be the same or very close to equal. Alternatively, the image may be the input for the discriminator network of a Generative Adversarial Network (GAN) (described previously) trained with the dataset used for training the system. The discriminator should not be able to distinguish between real images and generated ones. In metrics, the Fréchet Inception Distance (equivalent to an error) should be minimised.
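For completeness, the Fréchet Inception Distance compares the Gaussian statistics (mean μ and covariance Σ) of Inception-network activations for real (r) and generated (g) images:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

with a lower value indicating that the generated images are statistically closer to the real ones.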
Any computer hardware compatible with the artificial intelligence system may be used in the above described embodiments.
Furthermore, it will be understood that features, advantages and functionality of the different embodiments described herein may be combined where context allows. In addition, a skilled person will appreciate that the functionality described above may be implemented using various combinations of suitably programmed computer software and computer hardware.
Having described several exemplary embodiments of the present invention and the implementation of different functions of the device in detail, it is to be appreciated that the skilled addressee will readily be able to adapt the basic configuration of the system to carry out the described functionality without requiring detailed explanation of how this would be achieved. Therefore, in the present specification several functions of the system have been described in different places without an explanation of the required detailed implementation, as this is not necessary given the abilities of the skilled addressee. Also it is to be appreciated that the above described embodiments are exemplary only and that modifications will occur to those skilled in the art without departure from the spirit and scope of the present invention.
REFERENCES
The above described embodiments and description have referenced several publications. Each of the following disclosures is hereby incorporated into the present disclosure by way of reference.
[1] Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint arXiv:1611.07004 (2016).
[2] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
[3] Eslami, S. M. Ali, et al. "Neural scene representation and rendering." Science 360.6394 (2018).
[4] Rematas, Konstantinos, et al. "Soccer on Your Tabletop." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[5] Xu, Tao, et al. "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks." arXiv preprint (2017).

Claims:
1. A computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising:
receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters;
for each one of the received plurality of unprocessed images:
assigning one of the plurality of unprocessed images to an input image to be processed;
processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by:
retrieving an image-manipulation model associated with a current user-selected image-manipulation task;
loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor;
processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and
storing the output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the input image have been completed.
2. The image-manipulation method of Claim 1, wherein the step of processing each task of the set of user-selected image-manipulation tasks further comprises:
determining if all of the tasks in the set of user-selected image-manipulation tasks have been completed, and if not then:
assigning the output image as a new input image;
selecting any further user-selected image-manipulation task of the set of user-selected image-manipulation tasks which has not been completed; and
repeating the above steps of retrieving, loading and processing until all of the user-selected image-manipulation tasks have been processed for the input image.
3. The image-manipulation method of Claim 1, further comprising determining if all of the plurality of unprocessed images have been processed and if not then:
assigning another one of the unprocessed images as the input image; and repeating the steps of: processing each task of the set of user-selected image-manipulation tasks on the input image until all of the user-selected image-manipulation tasks have been processed for the input image and then storing the output image.
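By way of illustration, the per-image, per-task loop recited in Claims 1 to 3 may be sketched in Python as follows; the helper names retrieve_model and load_processor are assumptions standing in for the model store and the machine-learning processor of the claims, and the sketch is not a definitive implementation of the claimed method.

def process_batch(unprocessed_images, user_selected_tasks,
                  retrieve_model, load_processor):
    """Apply every user-selected task, in order, to every unprocessed image."""
    processed_images = []
    for image in unprocessed_images:             # outer loop over images (Claim 3)
        input_image = image                      # assign to the input image
        for task in user_selected_tasks:         # inner loop over tasks (Claims 1 and 2)
            model = retrieve_model(task)         # retrieve the task's image-manipulation model
            processor = load_processor(model)    # load the machine-learning architecture
            input_image = processor(input_image) # output becomes the next input
        processed_images.append(input_image)     # store once all tasks are completed
    return processed_images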
4. The image-manipulation method of any of Claims 1 to 3, wherein at least one of the user-defined tasks comprises a corresponding plurality of image-manipulation sub-tasks and the retrieving step comprises retrieving an image-manipulation model for executing the plurality of corresponding image-manipulation sub-tasks.
5. The image-manipulation method of any of Claims 1 to 4, wherein the step of processing the input image comprises processing a plurality of different input images in parallel with each image having a different image-manipulation task being applied to it.
6. The image-manipulation method of any preceding claim, further comprising creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task comprises applying a plurality of training data sets in a machine-learning process to a model-creation processor having a machine-learning computer architecture.
7. The image-manipulation method of Claim 6, wherein the training data sets comprise a set of unprocessed training images which have not been processed to enhance a specific image manipulation task and a set of processed training images which have been processed to enhance the specific image manipulation task; the set of processed images being unpaired with the set of unprocessed images, and the applying step further comprises the machine-learning process determining characteristics of the set of unprocessed training images and the characteristics of the set of processed training images and using these characteristics to create the image-manipulation model.
8. The image-manipulation method of Claim 7, wherein the set of unprocessed training images and the set of processed training images are provided in a mixed set of training images and are not classified as processed or unprocessed training images prior to the applying step.
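Claims 7 and 8 permit the processed and unprocessed training images to be unpaired and supplied as a single mixed, unlabelled set. A minimal sketch of assembling such a set is given below; the directory layout, file extension and function name are purely illustrative assumptions.

import random
from pathlib import Path

def build_mixed_training_set(unprocessed_dir, processed_dir, seed=0):
    """Gather unpaired training images and shuffle them into one mixed set."""
    unprocessed = sorted(Path(unprocessed_dir).glob("*.png"))
    processed = sorted(Path(processed_dir).glob("*.png"))
    mixed = unprocessed + processed         # no pairing between the two sets
    random.Random(seed).shuffle(mixed)      # no processed/unprocessed labels retained
    return mixed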
9. The image-manipulation method of any of Claims 6 to 8, wherein the creating step comprises:
Initialising operational parameters of the model creation processor;
Applying an error image to the model creation processor and generating a training output image;
Comparing the training output image with a correct image corresponding to the error image;
Calculating an error between the training output image and its corresponding correct image; and
Adjusting the operational parameters of the model-creation processor to minimise the error for the next application of an error image.
10. The image-manipulation method of Claim 9, wherein the creating step comprises creating and storing an intermediate version of the image-manipulation model, the intermediate version being created whilst the error is greater than a predetermined threshold.
11. The image-manipulation method of Claim 9 or 10, wherein the creating step comprises repeating the applying, comparing and calculating steps until the error is less than a predetermined threshold.
12. The image-manipulation method of Claim 10 or 11, wherein the storing step comprises storing the operational parameters of the model creation processor as the image-manipulation model or the intermediate version of the image-manipulation model.
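The training loop of Claims 9 to 12 may be illustrated by the following sketch. The model object with initialise_parameters, forward, compute_error and adjust_parameters methods, the save_model callback and the threshold handling are assumptions introduced for clarity rather than a prescribed implementation.

def train_image_manipulation_model(model, training_pairs, error_threshold, save_model):
    model.initialise_parameters()                    # initialise operational parameters (Claim 9)
    error = float("inf")
    step = 0
    while error > error_threshold:                   # repeat until below the threshold (Claim 11)
        error_image, correct_image = training_pairs[step % len(training_pairs)]
        training_output = model.forward(error_image) # apply an error image
        error = model.compute_error(training_output, correct_image)  # compare and calculate the error
        model.adjust_parameters(error)               # minimise the error for the next application
        if error > error_threshold:
            save_model(model.parameters(), "intermediate_%d" % step)  # intermediate version (Claims 10 and 12)
        step += 1
    save_model(model.parameters(), "final")          # store the operational parameters as the model (Claim 12)
    return model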
13. The image-manipulation method of Claim 12, wherein the processing step further comprises using the intermediate versions of the image-manipulation models to process an unprocessed input image and to generate and output an intermediate version of a partially processed image.
14. The image-manipulation method of any of Claims 6 to 13, further comprising standardising each of the plurality of training data sets of images to comply with a predetermined set of image criteria.
15. The image-manipulation method of Claim 14, wherein the standardising step comprises one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
16. The image-manipulation method of any of Claims 1 to 5, further comprising creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task includes use of a Generative Adversarial Network to create a probability distribution of a non-linear mapping function, the distribution representing the probability that an input image comes from a set of correct images.
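A generative adversarial network of the kind referred to in Claim 16 includes a discriminator that outputs the probability that an input image belongs to the set of correct images. The PyTorch sketch below is an assumed, minimal discriminator; the layer widths, kernel sizes and image resolution are arbitrary choices made only for illustration.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Maps an input image to a single probability that it is a 'correct' image."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(width, width * 2, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),    # collapse the spatial dimensions
            nn.Flatten(),
            nn.Linear(width * 2, 1),
            nn.Sigmoid(),               # probability in [0, 1]
        )

    def forward(self, image):
        return self.net(image)

# Example: probability that a 256x256 RGB image comes from the set of correct images.
# p = Discriminator()(torch.randn(1, 3, 256, 256))   # tensor of shape (1, 1)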
17. The image-manipulation method of any of Claims 1 to 16, further comprising standardising each of the plurality of unprocessed images to comply with a predetermined set of image criteria.
18. The image-manipulation method of Claim 17, wherein the standardising step comprises one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
19. The image-manipulation method of any of Claims 1 to 16, further comprising creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user selected images and a set of user controls for setting the value of each one of a plurality of different image-manipulation tasks.
20. The image-manipulation method of Claim 19, wherein the creating step comprises manipulating at least one of the user controls and viewing an approximated result of the application of the image-manipulation task on the image displayed within the window and subsequently sending the value of the at least one image-manipulation task to the processor.
21. The image-manipulation method of any of Claims 1 to 16, further comprising creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user selected images and a field for expressing image-manipulation tasks to be applied to the image in a natural language describing the image-manipulation tasks.
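A natural-language field such as that of Claim 21 could, in its simplest form, be backed by a keyword lookup that turns a free-text instruction into a list of task values; the phrase-to-task table below is a toy assumption, and a practical system would more likely use a trained language model.

PHRASE_TO_TASK = {
    "brighter": ("exposure", +1),
    "darker": ("exposure", -1),
    "remove blemishes": ("retouch_skin", 1),
    "warmer": ("white_balance", +1),
}

def parse_instruction(text):
    """Return the image-manipulation tasks mentioned in a free-text instruction."""
    text = text.lower()
    return [task for phrase, task in PHRASE_TO_TASK.items() if phrase in text]

# parse_instruction("Make the image brighter and remove blemishes")
# -> [("exposure", 1), ("retouch_skin", 1)]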
22. A computer-implemented automatic method of any preceding claim, wherein each input image is a two-dimensional (2D) image and each image-manipulation model is a 2D image-manipulation model, the method further comprising: generating a three-dimensional (3D) image-manipulation model representation of each 2D image-manipulation model from a 3D data set and storing each 3D model representation; wherein the processing step comprises using the 3D model representation to create a corresponding prediction of the 2D processed image in the 3D domain.
23. A computer-implemented automatic method of Claim 22, wherein the using step comprises: using the user-selected image-processing task to select the 3D image-manipulation model representation to be applied to the unprocessed 2D input image to generate the prediction of the 2D processed image in the 3D domain.
24. A computer-implemented automatic method of Claim 22, further comprising receiving one or more user-selected 3D image-manipulation tasks and selecting the 3D image-manipulation model representation corresponding to the 3D image-manipulation task and applying the selected 3D model representation to the 2D input image to generate the prediction of the 2D processed image in the 3D domain.
25. A computer-implemented automatic method of any of Claims 22 to 24, wherein the prediction of the 2D processed image in the 3D domain comprises a 3D processed output image.
26. A computer-implemented automatic method of any of Claims 22 to 24, further comprising providing each prediction of the 2D processed image in the 3D domain to a 3D image processing tool and further processing the prediction using 3D image-processing instructions of the 3D image processing tool to generate a 3D processed output image.
27. A computer-implemented automatic method of any of Claims 22 to 26, wherein the step of generating a 3D model representation of each 2D image-manipulation model comprises reading a 3D data set comprising a sequence of 2D images of a scene taken from different angles and generating a spatial parameter in each 3D model representation derived from the sequence of 2D images.
28. A computer-implemented automatic method of any of Claims 22 to 27, wherein the processing step comprises removing a representation of a 3D object from the 2D image by identifying the object within the 2D input image using one or more 3D model representations and using 3D image-manipulation techniques to remove the object from the 3D image representation.
29. A computer-implemented automatic method of Claim 25, further comprising converting the processed 3D image representation into a 2D processed image and storing the 2D processed image.
30. A computer-implemented automatic method of any of Claims 1 to 29, further comprising determining an inverse function to a user-determined image-manipulation task, storing the same, and using the inverse function to undo any image processing task if required.
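Claim 30 stores an inverse function so that any applied image-processing task can be undone. Exact inverses may not exist for every learned model, so the sketch below records the pre-task image alongside an optional inverse callable; the class and method names are assumptions for illustration only.

class UndoableEditor:
    """Keeps an undo stack so that any applied image-manipulation task can be reversed."""
    def __init__(self):
        self._undo_stack = []

    def apply(self, image, processor, inverse=None):
        self._undo_stack.append((image, inverse))   # remember how to undo this task
        return processor(image)

    def undo(self, current_image):
        if not self._undo_stack:
            return current_image
        previous_image, inverse = self._undo_stack.pop()
        # Use the stored inverse function where one exists, otherwise fall
        # back to the saved pre-task image.
        return inverse(current_image) if inverse else previous_image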
31. A computer-implemented automatic method of image-manipulation for modifying an unprocessed image having a variable set of image parameters to a processed image having a target set of image parameters, the method comprising:
Receiving the unprocessed image and a set of user-selected image-manipulation tasks to manipulate the unprocessed image to create the processed image having the target set of image parameters;
Assigning the received unprocessed image to an input image to be processed;
Processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by:
Retrieving an image-manipulation model associated with a current user-selected image-manipulation task;
Loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor;
Processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and
Storing the output image once all of the user-selected image-manipulation tasks for the input image have been completed.
32. A computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed two-dimensional (2D) images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising:
Receiving the plurality of unprocessed 2D images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters;
For each one of the received plurality of unprocessed images:
Assigning one of the plurality of unprocessed images to a 2D input image to be processed;
Processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by:
Retrieving an associated three-dimensional (3D) representation of an image-manipulation model associated with a current user-selected image-manipulation task;
Loading a processor having a machine-learning computer architecture with the retrieved 3D representation of the model and applying the input image to the processor;
Processing the 2D input image to derive a 3D output image which has been processed within a 3D domain according to the current user-determined image-manipulation task; and
Storing the 3D output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the input image have been completed.
33. An automated image-manipulation system for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the system comprising:
A receiver for receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters;
A processed image generator for processing each one of the received plurality of unprocessed images and the set of user-selected image-manipulation tasks, the processed image generator comprising:
An instruction processor for sequentially processing each task of the set of user-selected image-manipulation tasks on a selected one of the plurality of unprocessed images;
A model retrieval engine for retrieving image-manipulation models associated with each user-selected image-manipulation task;
A task model processor, having a machine-learning computer architecture, arranged to be loaded sequentially with each of the retrieved image-manipulation models and the selected image and to process the selected image to derive an output image which has been processed sequentially according to the current user-determined image-manipulation tasks; and
A data store for storing the output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the selected image have been completed.
34. An automated image-manipulation system of Claim 33, wherein the automated image-manipulation system is a retouching system configured to correct both lighting conditions and irregularities in input images to output images having a common set of lighting conditions and the irregularities removed.
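The cooperating components of the system of Claim 33 can be outlined structurally as below. The class and method names are assumptions chosen to mirror the claim wording; they illustrate how the receiver, model retrieval engine, task model processor and data store might be wired together rather than defining the claimed system.

class ModelRetrievalEngine:
    """Retrieves the image-manipulation model associated with a task."""
    def __init__(self, model_store):
        self.model_store = model_store

    def retrieve(self, task):
        return self.model_store[task]

class TaskModelProcessor:
    """Machine-learning processor loaded sequentially with each retrieved model."""
    def run(self, model, image):
        return model(image)              # derive the output image for this task

class ProcessedImageGenerator:
    """Applies every user-selected task to every received image and stores the results."""
    def __init__(self, retrieval_engine, task_processor, data_store):
        self.retrieval_engine = retrieval_engine
        self.task_processor = task_processor
        self.data_store = data_store

    def generate(self, unprocessed_images, tasks):
        for image in unprocessed_images:
            for task in tasks:           # instruction processor: sequential task processing
                model = self.retrieval_engine.retrieve(task)
                image = self.task_processor.run(model, image)
            self.data_store.append(image)  # store the fully processed image
        return self.data_store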