
WO2019073267A1 - Automated image manipulation using artificial intelligence - Google Patents

Automated image manipulation using artificial intelligence

Info

Publication number
WO2019073267A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
manipulation
images
processed
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2018/052951
Other languages
English (en)
Inventor
Marco Marchesi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Happy Finish Ltd
Original Assignee
Happy Finish Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB1716750.3A external-priority patent/GB201716750D0/en
Priority claimed from GBGB1814774.4A external-priority patent/GB201814774D0/en
Application filed by Happy Finish Ltd filed Critical Happy Finish Ltd
Publication of WO2019073267A1


Classifications

    • G Physics > G06 Computing or calculating; counting > G06T Image data processing or generation, in general
    • G06T 5/00 Image enhancement or restoration > G06T 5/77 Retouching; inpainting; scratch removal
    • G06T 5/00 Image enhancement or restoration > G06T 5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 5/00 Image enhancement or restoration > G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/10 Image acquisition modality > G06T 2207/10004 Still image; photographic image
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details > G06T 2207/20081 Training; learning
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details > G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • the invention disclosed herein relates to an automated image manipulation system that uses artificial intelligence.
  • a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the method comprising: receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks for the input image have been processed.
  • the step of processing each task of the set of user-selected image-manipulation tasks may further comprise: determining if all of the tasks in the set of user-selected image-manipulation tasks have been completed, and if not then: assigning the output image as a new input image; selecting any further user-selected image-manipulation task of the set of user-selected image-manipulation tasks which has not been completed; and repeating the above steps of retrieving, loading and processing until all of the user-selected image-manipulation tasks have been processed for the input image.
  • the image-manipulation method may further comprise determining if all of the plurality of unprocessed images have been processed and if not then: assigning another one of the unprocessed images as the input image; and repeating the steps of: processing each task of the set of user-selected image-manipulation tasks on the input image until all of the user-selected image-manipulation tasks have been processed for the input image and then storing the output image.
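  • By way of a hedged illustration only, the per-image, per-task loop described in the preceding paragraphs might be sketched as follows; the processor stub, model store and function names are hypothetical and do not appear in the application:

```python
# Hypothetical sketch of the claimed loop: every unprocessed image is run
# through the user-selected tasks in sequence, loading the corresponding
# image-manipulation model for each task before processing.

class StubProcessor:
    """Placeholder for the machine-learning processor (e.g. a CNN runtime)."""

    def load(self, model):
        self.model = model          # load the retrieved image-manipulation model

    def run(self, image):
        return self.model(image)    # derive the (intermediate) output image


def process_batch(unprocessed_images, selected_tasks, model_store, processor):
    processed = []
    for image in unprocessed_images:             # assign the next unprocessed image
        current = image
        for task in selected_tasks:              # process each task sequentially
            processor.load(model_store[task])    # retrieve and load the task model
            current = processor.run(current)
        processed.append(current)                # store the fully processed image
    return processed


# Toy usage with trivial stand-in "models" operating on flat pixel lists.
images = [[0.2, 0.4], [0.6, 0.8]]
models = {
    "correct colour": lambda img: [v ** 0.9 for v in img],
    "enhance light": lambda img: [min(1.0, v * 1.2) for v in img],
}
print(process_batch(images, ["correct colour", "enhance light"], models, StubProcessor()))
```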
  • the at least one of the user-defined tasks may in some embodiments comprise a corresponding plurality of image-manipulation sub tasks and the retrieving step may comprise retrieving an image-manipulation model for executing the plurality of corresponding image-manipulation sub tasks.
  • the step of processing the input image may comprise processing a plurality of different input images in parallel with each image having a different image-manipulation task being applied to it. This advantageously allows a technique called pipelining which effectively increases image processing throughput.
  • the image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task may comprise applying a plurality of training data sets in a machine-learning process to a model-creation processor having a machine-learning computer architecture.
  • the training data sets may comprise a set of unprocessed training images which have not been processed to enhance a specific image manipulation task and a set of processed training images which have been processed to enhance the specific image manipulation task; the set of processed images being unpaired with the set of unprocessed images, and the applying step may further comprise the machine-learning process determining characteristics of the set of unprocessed training images and the characteristics of the set of processed training images and using these characteristics to create the image-manipulation model.
  • the set of unprocessed training images and the set of processed training images may be provided in a mixed set of training images and are not classified as processed or unprocessed training images prior to the applying step. This advantageously minimises the burden on the user to configure the method prior to its execution.
  • the creating step may comprise: initialising operational parameters of the model creation processor; applying an error image to the model creation processor and generating a training output image; comparing the training output image with a correct image corresponding to the error image; calculating an error between the training output image and its corresponding correct image; and adjusting the operational parameters of the model-creation processor to minimise the error for the next application of an error image.
  • the creating step may comprise creating and storing an intermediate version of the image-manipulation model, the intermediate version being created whilst the error is greater than a predetermined threshold.
  • the intermediate versions of the image-manipulation model can advantageously be provided to a user to show an intermediate stage of a large batch processing process.
  • the creating step may comprise repeating the applying, comparing, calculating steps until the error is less than a predetermined threshold.
  • the storing step may comprise storing the operational parameters of the model creation processor as the image-manipulation model or the intermediate version of the image-manipulation model.
  • the processing step may further comprise using the intermediate versions of the image-manipulation models to process an unprocessed input image and to generate and output an intermediate version of a partially processed image.
  • the image-manipulation method may further comprise standardising each of the plurality of training data sets of images to comply with a predetermined set of image criteria.
  • the standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
  • the image-manipulation method may further comprise creating a plurality of image-manipulation models, each model being associated with a respective image-manipulation task and storing the image-manipulation model once created, wherein the creating step for each image-manipulation task includes use of a Generative Adversarial Network to create a probability distribution of a non-linear mapping function, the distribution representing the probability that an input image comes from a set of correct images.
  • the image-manipulation method may further comprise standardising each of the plurality of unprocessed images to comply with a predetermined set of image criteria.
  • the standardising step may comprise one or more of the following image-manipulation tasks: scaling, cropping and image format conversion.
  • the image-manipulation method may further comprise creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user selected images and a set of user controls for setting the value of each one of a plurality of different image-manipulation tasks.
  • the creating step may, in some embodiments, comprise manipulating at least one of the user controls and viewing an approximated result of the application of the image-manipulation task on the image displayed within the window and subsequently sending the value of the at least one image-manipulation task to the processor.
  • the image-manipulation method may further comprise creating the plurality of user-selected image-manipulation tasks for the plurality of unprocessed images, the step of creating the plurality of user-selected image-manipulation tasks being carried out using a Graphical User Interface having a window for displaying one of the unprocessed user selected images and a field for expressing image-manipulation tasks to be applied to the image in a natural language describing the image-manipulation tasks.
  • each input image may be a two-dimensional (2D) image and each image-manipulation model may be a 2D image-manipulation model.
  • the method may further comprise: generating a three-dimensional (3D) image-manipulation model representation of each 2D image-manipulation model from a 3D data set and storing each 3D model representation; wherein the processing step comprises using the 3D model representation to create a corresponding prediction of the 2D processed image in the 3D domain.
  • the using step may comprise: using the user-selected image-processing task to select the 3D image-manipulation model representation to be applied to the unprocessed 2D input image to generate the prediction of the 2D processed image in the 3D domain.
  • the method may further comprise receiving one or more user-selected 3D image-manipulation tasks and selecting the 3D image-manipulation model representation corresponding to the 3D image-manipulation task and applying the selected 3D model representation to the 2D input image to generate the prediction of the 2D processed image in the 3D domain.
  • the prediction of the 2D processed image in the 3D domain may comprise a 3D processed output image.
  • the method may further comprise providing each prediction of the 2D processed image in the 3D domain to a 3D image processing tool and further processing the prediction using 3D image-processing instructions of the 3D image processing tool to generate a 3D processed output image.
  • the step of generating a 3D model representation of each 2D image-manipulation model may comprise reading a 3D data set comprising a sequence of 2D images of a scene taken from different angles and generating, in each 3D model representation, a spatial parameter derived from the sequence of 2D images.
  • the processing step in some embodiments comprises removing a representation of a 3D object from the 2D image by identifying the object within the 2D input image using one or more 3D model representations and using 3D image-manipulation techniques to remove the object from the 3D image representation.
  • the method may further comprise converting the processed 3D image representation into a 2D processed image and storing the 2D processed image.
  • the method may further comprise determining an inverse function to a user-determined image-manipulation task, storing the same, and using the inverse function to undo any image processing task if required.
  • a computer-implemented automatic method of image-manipulation for modifying an unprocessed image having a variable set of image parameters to a processed image having a target set of image parameters comprising: receiving the unprocessed image and a set of user-selected image-manipulation tasks to manipulate the unprocessed image to create the processed image having the target set of image parameters; assigning the received unprocessed image to an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks for the input image have been processed.
  • a computer-implemented automatic method of image-manipulation for modifying a plurality of unprocessed two-dimensional (2D) images having different sets of image parameters to a plurality of processed images having a target set of image parameters comprising: receiving the plurality of unprocessed 2D images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images to a 2D input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an associated three-dimensional (3D) representation of an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved 3D representation of the image-manipulation model and applying the 2D input image to the processor; processing the input image to derive an output image which has been processed according to the current user-selected image-manipulation task; and storing the output image once all of the user-selected image-manipulation tasks for the input image have been processed.
  • an automated image-manipulation system for modifying a plurality of unprocessed images having different sets of image parameters to a plurality of processed images having a target set of image parameters, the system comprising: a receiver for receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks to manipulate the unprocessed images to create the plurality of processed images with each processed image having the target set of image parameters; a processed image generator for processing each one of the received plurality of unprocessed images and the set of user-selected image-manipulation tasks, the processed image generator comprising: an instruction processor for sequentially processing each task of the set of user-selected image-manipulation tasks on a selected one of the plurality of unprocessed images; a model retrieval engine for retrieving image-manipulation models associated with each user-selected image-manipulation task; a task model processor, having a machine-learning computer architecture, arranged to be loaded with each retrieved image-manipulation model and to process the selected unprocessed image to generate the corresponding processed image having the target set of image parameters.
  • the automated image-manipulation system is a retouching system configured to correct both lighting conditions and irregularities in input images to output images having a common set of lighting conditions and the irregularities removed.
  • Some of the present embodiments can carry out sophisticated image manipulation such as object removal to reveal previously hidden portions of the scene, altering camera angles and adding lighting from lighting sources within the scene that generate appropriate shadows. This is carried out by use of extrapolation of the 2-dimensional image into the 3-dimensional domain before image manipulation is carried out. This involves learning the properties of 3-dimensional objects and rendering a 2-dimensional image into a 3-dimensional scene. Examples of such 3-dimensional image manipulations include removing objects, altering the camera angle to view the scene in the image at a different angle and adding lighting and appropriate shadows.

BRIEF DESCRIPTION OF THE DRAWINGS
  • Figure 1 is a schematic block diagram of an automated image manipulation system according to an embodiment of the present invention;
  • Figure 2 is a flow chart showing steps involved in an automated image manipulation process using the system of Figure 1;
  • Figure 3 is a schematic representation of an input user interface provided on the user system shown in Figure 1;
  • Figure 4 is a schematic representation of an output user interface provided on the user system shown in Figure 1 which can be used to specify further changes to be made to the output image;
  • Figure 5 is a schematic block diagram of the Artificial Intelligence based Retouching System (AIRS) of Figure 1;
  • Figure 6 is a flow chart detailing the steps involved in the operation of the AIRS of Figure 5;
  • Figure 7 is a schematic block diagram of the design module of the AIRS of Figure 5;
  • Figure 8 is a flow chart detailing the steps involved in training the AIRS using the design module of Figure 7;
  • Figure 9 is a schematic diagram of exemplary training images used to train the AIRS system of Figure 5;
  • Figure 10 is a schematic block diagram of the generation module of Figure 7;
  • Figure 11 is a flow chart detailing steps involved in the operation of the generation module of Figure 7;
  • Figure 12 is a schematic block diagram illustrating how different tasks may be executed by the AIRS in a different embodiment of the present invention;
  • Figure 13 is a schematic block diagram illustrating a further embodiment of the present invention which utilises a 3D representation learned from the training dataset and a 3D user interface with the user input interface of Figure 1;
  • Figure 14 is a flow chart illustrating the further embodiment of Figure 13 operating to use the 3D representation;
  • Figure 15 is a schematic block diagram illustrating a yet further embodiment of the present invention which utilises a 3D representation learned from the 3D dataset and a 3D tool for manipulating the generated 3D scene before output;
  • Figure 16 is a flow chart illustrating the yet further embodiment of Figure 15 operating to use the 3D tool;
  • Figure 17 is a series of images illustrating an example of how a 3D scene may be created by AIRS by either of the embodiments of Figures 13 or 15 and manipulated in order to generate a modified version of the original image;
  • Figure 18 is a series of images illustrating an example of how a plurality of sequential input images are processed to generate a 3D scene by either of the embodiments of Figures 13 or 15 that may be rendered from any camera point of view;
  • Figure 19 is a schematic representation of a natural language user interface of a final embodiment of the present invention to be used with the AIRS system of Figure 5.
  • One non-limiting example application of the image retouching system is in the area of fashion image retouching.
  • An original image where the colours, lighting, shape of models and assets (apparel) contained in the image have not been modified (unprocessed), may be processed by the system in such a way that the retouching carried out is indistinguishable from retouching carried out by a human artist.
  • the image may contain manipulations that would be difficult for human artists to achieve but that are still consistent with the rest of the image.
  • the claimed image retouching system may be applicable to the fields of both photographic image retouching and video retouching.
  • the system includes at its heart an Artificial Intelligence based Retouching System (AIRS) 2, alternatively called a "Deep Retouching" system, that is designed as a general solution for making the image retouching process automatic or at the very least partially human-assisted.
  • current image retouching methods require corrections to be carried out mainly manually, typically one image at a time for one image processing task.
  • Since image manipulation using the AIRS 2 is automatic (or at least semi-automatic), the AIRS 2 is able to modify multiple digital images at a significantly faster rate than the existing methods.
  • the time taken to manipulate an image decreases from hours/minutes when carried out by a human retoucher (depending on the task) to milliseconds when carried out by the automated system of the present embodiment. Any retouching process may be performed in real-time (namely milliseconds) on specific image processing computer hardware. This increase in speed is due to a new technique (described below) that enables a sometimes complicated retouching process to be applied automatically by computer systems, rather than to the speed of the computer system itself.
  • One set of parameters can be specified (only once) as a target set of tasks to be applied to all images in a group or batch of input images. Receipt of a batch of input images and one set of image processing tasks enables all images in the batch to be standardised to target parameters specified by tasks in a single set of instructions.
  • Examples of possible image enhancements and manipulations carried out by the AIRS 2 include modifications to the colour, light, shape, details and/or texture of an image or aspects of an image, removing and/or re-positioning assets within an image, masking, enhancing details, enhancing sharpness and/or removing imperfection, such that the final processed image is different from the original unprocessed image. This ability to handle so many different complicated aspects of image processing in a single system has not been possible previously.
  • when carried out by the same system on images of the same size, the image manipulation process may take the same amount of time for each image, independent of the image content.
  • the time taken to retouch images automatically is therefore more predictable than existing manual methods. This enables the time taken for processing thousands of images to be predictable and can help in determining accurately when any resultant action can take place, for example when a website can go live with a catalogue of processed images.
  • the image manipulation process may be applied to the same image that has different sizes, achieving the same modifications scaled accordingly.
  • Existing methods may require each image to be modified individually, resulting in inconsistencies.
  • any modification that restores or enhances the original image or modifies the aesthetics and composition of the image may be defined as a task, image manipulation, image modification or retouch. These terms may be used interchangeably.
  • A schematic diagram of the image retouching system according to an embodiment of the present invention is shown in Figure 1.
  • the system comprises three components: an Artificial Intelligence based Retouching System (AIRS) 2 which includes within it a neural network 4; an administrator system 6 for configuring the AIRS 2 and a user interface with the system - in this embodiment in the form of a user system 8.
  • the AIRS 2 is operatively coupled to the user system 8 via a wide-area communications network 10 such as the Internet.
  • the primary function of the administrator system 6 is to assist in the creation of models which can be used to configure the AIRS 2 to process images.
  • the administrator system 6 includes training datasets 12 comprising sets of training images, A and B stored in a data store 13.
  • the training datasets 12 are stored in a data store as part of the administrator system 6. The training process is discussed in further detail below.
  • the user system 8 comprises an input graphical user interface (GUI) 14, which in use receives one or more input images C 16 from a data store 17.
  • the input images C 16 may for example be digital images in any colour mode including, but not limited to, Red Green Blue (RGB), Red Green Blue Alpha (RGBA) or Grayscale, before image retouching has taken place.
  • the input GUI 14 may for example be part of a website and be rendered in a browser of the user system 8 or in a window of a mobile or desktop application provided at the user system 8.
  • the manipulation process itself is automatic but the manipulation tasks to be included in the manipulation process (such as light enhancement, colour retouch etc.) may be optionally manually chosen by the user via the input GUI 14.
  • the manipulation tasks to be included in the manipulation process are provided to the AIRS 2, in the form of instructions 18, along with the unprocessed input images C 16, via the communications network 10.
  • the AIRS 2 receives the instructions 18 and input images C 16 and carries out the requested image retouching on the input images C 16.
  • the resultant images are processed output images D 20, which may be stored in a data store 22 of the AIRS 2 and/or output to the user system 8 via the communication network 10 for display in an output GUI 24.
  • the output images 20 are typically digital images.
  • Figure 2 is a flow chart illustrating a method of operation of the image retouching system of Figure 1.
  • the method commences, at Step 202, with the user selecting one or more images to modify (input images C 16), which are then displayed, at Step 204, in the input GUI 14.
  • the input images C 16 may optionally be normalised (not shown) to meet pre-determined requirements before they are subject to subsequent computational steps.
  • This normalisation may for example include scaling, cropping, image format conversion and/or any other data manipulation. This advantageously can standardise the input images for processing.
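  • A minimal sketch of what such a normalisation step could look like is given below; the use of the Pillow library and the 512-pixel target size are assumptions for illustration only, not details taken from the application:

```python
from PIL import Image

def normalise(path, size=512):
    """Scale, centre-crop and convert an input image to a fixed size and format."""
    img = Image.open(path).convert("RGB")                  # image format conversion
    scale = size / min(img.size)                           # scale the short side to `size`
    img = img.resize((round(img.width * scale), round(img.height * scale)))  # scaling
    left = (img.width - size) // 2
    top = (img.height - size) // 2
    return img.crop((left, top, left + size, top + size))  # centre crop to size x size
```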
  • the user specifies, at Step 206, the image manipulation tasks he/she would like to be applied to the input image C 16 and selects, at Step 208, the option 'Run' 30 provided on the input GUI 14 (described in detail later with reference to Figure 3).
  • the input images C 16 and instructions 18 detailing which image manipulation tasks are to be carried out for a given input image are sent, at Step 210, to the AIRS 2 via the communications network 10.
  • the AIRS 2 receives these instructions 18 and processes them, at Step 212, before the resultant output images D 20 are sent, at Step 214, to the user system 8 and displayed, at Step 216, in the GUI acting now as the output GUI 24, via the communications network 10.
  • the output images D 20 may be downloaded and/or saved, at Step 220. If the output images D 20 are not considered acceptable, at Step 218, the user may specify, at Step 206, further image manipulations to be carried out on output images D 20. In this case images D 20 are re-input to the AIRS 2 together with a new set of image-manipulation instructions 18 relating to the image to be reprocessed and the process is carried out until the user considers the resultant output images 20 to be acceptable.
  • the image manipulation tasks to be included in the manipulation process may be determined automatically.
  • the system can optionally require no human interaction except for providing the input images to be modified.
  • the system may be provided with a target image as the desired output. Tasks could then be automatically selected that, when applied to an input image, result in the target image.
  • These tasks may be stored in a data store and applied to a subsequent batch of input images.
  • task parameters may be pre-set and subsequently applied to a series of images. This may for example be useful if the user would like all the images to be re-touched according to the same image specifications.
  • the system does not require human intervention once the task parameters have been set.
  • Figure 3 shows an example of an input GUI 14 of the user system 8, which is used to carry out Steps 202 to 210 of Figure 2.
  • the input GUI 14 has four main interaction areas. The first is a browse button 32 which enables the user to select the image to be pulled into the input GUI 14 for consideration as an image to be processed. The second is a viewing window 34 for displaying the image to be processed which has been selected. The third is a set of slider controls 36, each slider control relating to a different image manipulation task and labelled with a task descriptor 37. Even though the present embodiment shows eight specific image manipulation tasks, it is possible in other embodiments for different types of task to be provided and also a different number of tasks. Finally, an execute control ('Run' button) 30 is provided to trigger the sending of the image 16 and instructions 18 to the AIRS 2.
  • an input image C 16 to be manipulated.
  • the user selects an input image 16 from the 'browse' icon 32.
  • the selected image is then shown, at Step 204, in the viewing window 34.
  • the user specifies, at Step 206, the image manipulation tasks he or she would like to be applied to the input image C 16 by selecting one or more tasks and choosing the amount of retouch for each task, in this example using two slider controls 36 on the input GUI 14.
  • the user typically selects the amount of retouch task by setting the slider 36 to a percentage of each task to be carried out.
  • the 'correct colour' interactive slider is set to just over 50% and the 'enhance light' interactive slider is set to 100%.
  • the image displayed in the viewing window 34 changes to show approximately what effect the selected manipulation will produce on the image. This is useful to provide feedback to the user to see what effect a given image manipulation parameter will have on the original unprocessed image 16 before the image manipulations are finally selected.
  • the tasks refer to the type of image manipulation to be carried out, including but not limiting to colour correction, masking, light enhancement, detail enhancement, sharpness enhancement, imperfection removal, texture additions and background removal.
  • the number of slider controls 36 corresponds to the number of possible image manipulation tasks available for the image being displayed. Typically these do not vary, but in another embodiment some aspects of the image may not be variable and so the corresponding image manipulation option may not be shown as available for the user to modify. This can be specified in the metadata of the image if required and determined by considering the metadata.
  • the user input in this embodiment takes the form of actuating, at Step 208, the interactive 'Run' button 30, as illustrated in Figure 3.
  • the input image C 16 is sent to the AIRS 2 as an input image 16 at Step 210, along with the instructions 18 detailing the image modifications the user requested in Step 206.
  • the settings can apply to a single image or a batch of images. There may be a single retouch step or a series of retouch steps specified in the instructions 18. In one embodiment, all the modifications of the image may be processed sequentially but indivisibly as one block of work and one final result may be generated.
  • intermediate images 38 can be generated which still represent photorealistic modifications of the original image 16.
  • One of the intermediate images 38 may be the output image following at least one image manipulation task that will undergo at least one further image manipulation task for the process to be complete.
  • Intermediate images 38 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
  • Figure 4 shows an example of the output GUI 24, which concerns Steps 216 to 220 and Steps 206 to 210 (if the user would like further tasks to be carried out in a second iteration) of Figure 2.
  • the processed output image D 20 is displayed in the output GUI 24 and may be downloadable (if being viewed in a browser window).
  • the image shown in Figure 4 is a preview of the result of the manipulation process that can be downloaded or further manipulated.
  • the user may choose to download the output image D 20 by for example activating, at Step 220 the interactive 'download' button 40 or choosing, at Step 218, for the output image D 20 to undergo further image manipulation (refining the retouching process).
  • the output image D 20 may be subject to a further selection of image manipulation tasks via the slider controls 36 with the amount of manipulation of each task being determined by the positions of the slider controls 36 and the subsequent selection of the run button 30.
  • the image is re-input into the AIRS 2 as re-input image 41, along with the newly specified instructions 42 (additional image manipulation tasks) detailing the secondary image modifications requested by the user.
  • the settings can also apply to a single image or a batch of images.
  • the output image 20 if being viewed in a browser window, but not actually downloaded to the user system 8 can be referenced by an identification number (typically generated by the AIRS 2) and provided to the user system 8. In this case there is no need to re-input the output image again to the AIRS 2 but rather all that is required is to just provide the identification number together with the new image manipulation instructions to the AIRS 2.
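  • Purely as an illustration, a follow-up request that references the previously generated output by its identification number rather than re-uploading the image might look like the following; the field names and identifier format are hypothetical and not taken from the application:

```python
# Hypothetical follow-up instructions 42 sent to the AIRS 2: the output image
# is referenced by the identification number issued earlier, so only the new
# image-manipulation tasks need to be transmitted.
followup_instructions = {
    "image_id": "airs-000123",                      # identifier generated by the AIRS
    "tasks": [
        {"name": "enhance sharpness", "amount": 0.75},
        {"name": "remove imperfection", "amount": 1.0},
    ],
}
```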
  • Figure 5 shows the Artificial Intelligence based Retouching System (AIRS) 2 of Figure 1 in greater detail.
  • the AIRS 2 comprises two major components: a design module (or design engine) 50 and generation module (or generation engine) 52.
  • the design module 50 functions to carry out a training process and allows the system to learn a non-linear model 54 that, in one embodiment, maps input data to output data for a given image manipulation task. In another embodiment, the system may map input data to output data and learn a non-linear model for a plurality of image manipulation tasks. This may be considered to be a model for carrying out a plurality of image manipulation sub tasks.
  • the training dataset 12 is provided to the design module 50 via the administrator system 6.
  • Models 54 created by the design module 50 are stored in a data store 56 and used by the generation module 52.
  • the data store 56 may be placed together with the design module 50 in one embodiment or be located separately in portable devices or machine systems in other embodiments.
  • the generation module 52 functions to implement the image manipulation of a received input image 16 using the specified instructions 18 and the trained models 54.
  • the AIRS 2 further comprises a communications module 58 for communicating with the user system 8 via the communications network 10.
  • the generation module 52 in use receives the input image C 16 and instructions 18 via the communications module 58 and generates the output image 20, which is then output via the communications module 58.
  • Figure 6 is a flow chart illustrating a method of operation of the AIRS 2 of Figure 5 according to an embodiment of the present invention.
  • the method commences with the administrator system 6 receiving, at Step 302, the training data set 12, which may be made up of sets of unpaired or paired digital training images, A and B.
  • each pair contains one image, A, with the original non-modified content and one image, B, with the same content after an image retouch process.
  • Each pair may contain multiple image retouch steps or a single image retouch step.
  • the pairs of images A and B may contain the same composition of items that differ by visible characteristics. For example, images of landscapes before and after a period of rain.
  • the manipulation process can be defined as A to B.
  • Training A to B means that A (the training image before image retouching) is considered the input and B (the training image after image retouching) the output, while training B to A means that B is considered the input and A the output.
  • training B to A means learning the function that, given any image after retouching, the image before the retouching process may be predicted.
  • one image may be a representation before a colour correction and the corresponding image pair may show a representation with the same content after a colour correction has been applied.
  • the system can be trained to perform a manipulation in both directions, so the after colour correction images may be inputted and the before colour correction images may be outputted. In this way, the system can learn to predict what colour corrected images would look like without a colour correction.
  • the training data set 12 is still organised in two subsets, before and after retouching, but the two subsets may differ from each other in number and/or content, in a way that they may appear "unpaired", that is for each image A in the dataset, it is not guaranteed that there exists an image B in the dataset that shows the same content after an image manipulation process. Accordingly, the training system learns the characteristics of each group and subsequently the transfer function between the groups rather than specifically the transfer function from one input training image to its paired output training image.
  • the training data set is not separated into pairs of images before and after retouching. Instead, the system is able to learn the distinction between retouched and not retouched images from a data set of mixed images (retouched and not retouched) in a completely unsupervised manner. In a similar way to how 3D properties can be learned from a 3D data set without receiving any information about 3D structures (discussed in further detail below), the system can learn the concept of retouched images autonomously without a clear distinction between images.
  • advantageously processed and unprocessed images can be provided as the training set (mixed images) without pairing or grouping (any form of classification) prior to training.
  • a machine-learning architecture (see below) can then create groupings of training images as well as characteristics of each group and subsequently transfer functions between groups all within the weightings and configuration of the machine-learning architecture.
  • the training data set 12 may include any subject or composition in each sample and the system can be trained to apply a retouching process to any new image containing that subject or composition.
  • the training images of the training data set 12 are pre-processed, at Step 304, to meet pre-determined requirements.
  • the pre-processing may for example include, but is not limited to, scaling, cropping, image format conversion and/or any other data manipulation. This advantageously reduces the training time and/or increases the accuracy of the neural network 4 implementing the model 54 as all of the input training data images are made consistent in terms of size and scale such that the neural network model 54 does not have to accommodate these differences.
  • a batch of training images is created, at Step 306.
  • a batch refers to the set of training images processed at one time for a specific image manipulation task.
  • a batch may be a plurality of pairs of training images relating to a specific task such as imperfection removal, colour correction or texture addition.
  • Models 54 are then created, at Step 308, for each task at the design module 50.
  • the models 54 provide the numerical function to be applied to a neural network 4 such that the neural network 4 can process images to implement each task and will be explained in more detail below.
  • the models 54 are stored at Step 310, in the data store 56.
  • the generation module 52 receives, at Step 312, the input image C 16 and instructions 18 pertaining to that image and processes them at Step 314.
  • the output image D 20 is generated, at Step 316, by applying one or more models 54 to the input image C 16 corresponding to the tasks selected by the user, specified by the instructions 18.
  • the output image D 20 may be sent, at Step 318, to the output GUI 24 of the user system 8.
  • Figure 7 schematically illustrates the design module 50 of Figure 5 in greater detail.
  • the design module 50 may be a convolutional neural network (CNN) 4 or an alternative machine-learning computer architecture, which is used to create runtime models 54 for image manipulation tasks.
  • Each model 54 represents an image manipulation task or plurality of image manipulation sub tasks and may be created via several iterations of training the CNN 4.
  • Intermediate image manipulation task models represent models that are complete themselves but will be integrated with another model or plurality of models such that the final model runs a plurality of image manipulation tasks.
  • intermediate models may be models that are deprecated by a subsequent model.
  • intermediate models of each task may be stored in the data store 56, such that, if imported in the generation module, the image manipulations learned so far can be presented to the user. That is, the intermediate models can give a preview (work in progress) of how much the system has learned so far and these previews can be provided back to the user if required.
  • the design module 50 of this embodiment comprises a batch retrieval engine 62 to retrieve a training data set 12 (images A and B for example) for a particular task from the administration system 6 together with an associated confidence threshold.
  • the batch retrieval engine 62 is in communication with a task manager engine 63 which in turn provides one of the training data set images (image A) to the CNN 4 whilst the other (image B) is provided to a comparator 64.
  • the task manager engine 63 is also responsible for initial configuration of the CNN 4 which typically has randomised weights before the first training image is input.
  • the output of the CNN 4 is also provided to the comparator 64.
  • the output of the comparator 64 is provided to an error calculator 66 which specifies an error between images A and B.
  • the error calculator 66 provides its output to a threshold comparator 68 which compares the error to the confidence threshold received from the task manager engine 63.
  • the output of the threshold comparator 68 is fed to a weight adjuster 69 which feeds back into the CNN 4 to adjust its weights to reduce the error (when the error is above the threshold).
  • the output of the threshold comparator 68 goes to a trained model storage engine 70 which can retrieve the current CNN 4 settings from the CNN 4 and store these in the data store 56 as a final trained model of the current image manipulation task.
  • the trained model storage engine 70 is also coupled to the task manager engine 63 to know which image processing task, the subject of the current CNN model 54, is currently being considered.
  • image A from a pair of training images, A and B relating to a specific task, is processed by the CNN 4 and undergoes a manipulation determined by the weights and biases of the neurons in the CNN 4.
  • the manipulated image A, training output A' is then sent to the comparator 64, where it is compared to image B from the pair of training images, A and B.
  • the difference (error) between image A' and B is calculated using the error calculator 66.
  • the design module 50 determines whether the error is below or above a predetermined threshold using the threshold comparator 68. If the error is above a threshold, the weights and biases of the neurons in the CNNs 4 are updated at the weight adjuster 69 in dependence on the error value.
  • Image A of the second batch of training images is then processed by the CNN 4 and undergoes a manipulation determined by the updated weights and biases of the neurons in the CNN 4. The process continues until the difference (error) between images A' and B is less than the predetermined threshold.
  • the trained model 54 (which corresponds to the weights and biases of the CNN neurons which led to the error between images A' and B being less than the predetermined threshold) is sent to the trained model storage engine 70.
  • This trained model 54 provides the numerical function to apply to a neural network to process input images 16 to implement a specific task.
  • Information relating to which image manipulation task the model 54 applies to may be retrieved from the task manager engine 63 and the values of the weights and biases of the neurons in the CNN 4 which led to the error between images A' and B being less than the predetermined threshold may be retrieved from the CNN 4.
  • the trained model storage engine 70 stores the trained model 54 to the data store 56.
  • the data store 56 may also comprise intermediate models 72 for different tasks and a final trained model 74, which may incorporate all the models 54 involved to carry out every task instructed.
  • the training process in the design module 50 may be carried out by Machine Learning computational blocks (CNNs) 4 organised as a conditional Generative Adversarial Network.
  • Generative Adversarial Networks represent a way to train a generative model in an unsupervised manner [2]. They consist of two models that play an adversarial game.
  • a generative model G captures the data distribution while a discriminative model D estimates the probability that a sample (in this case an image) came from the training set rather than from the generator G.
  • G and D solve a non-linear mapping function.
  • both G and D are conditioned on some extra data.
  • the extra data may for example comprise an input image.
  • the system is forced to learn a probability distribution that generates images similar to the input image.
  • the function of the CNNs 4 may be carried out using any network, software, hardware and/or algorithm that is able to learn a non-linear mapping function between the input and output images.
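  • For reference, the standard conditional-GAN objective from the literature (not quoted from the application itself) formalises this adversarial game, where x is the conditioning input image, y a sample from the set of correct images and z a noise vector:

$$
G^{*} = \arg\min_{G}\max_{D}\; \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
$$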
  • Figure 8 is a flow chart illustrating an example of how the training process, referred to when describing Figure 7 above, may be carried out.
  • the process starts by initialising, at Step 402, the neurons of the CNNs 4. This may for example involve using random initialisation and setting the weights and biases of the CNN 4 neurons with random values.
  • a new batch of training images is retrieved, at Step 404, relating to a specific task.
  • a pair of training images from the batch, images A and B, are obtained, at Step 406, before image A is subject to a manipulation, at Step 408, determined by the current configuration of the CNN 4 (i.e. the current weights and biases of the CNN neurons).
  • the manipulated image A is generated, at Step 410, and referred to as training output A'.
  • the training output A' is then compared, at Step 412, to training image B at the comparator.
  • the error (difference) between A' and B is then calculated, at Step 414, at the error calculator 66.
  • the error calculated at Step 414 is then used to adjust/update, at Step 416, the weights and biases of the neurons in the CNN 4, in order to reduce the error between A' and B the next time Image A is passed through the CNN 4 in the next iteration. If the error is less than a predetermined threshold, the training is complete and, at Step 420, the updated neuron weights and biases are stored in the data store of the AIRS 2 as a model 54 for this particular batch and associated task.
  • the method continues by returning, at Step 406, to another round of training namely a second pair of training images is taken from the batch.
  • the current function of the CNN 4 with the updated weights and biases of the neurons is applied to the image A of the second pair of training images, at Step 408, and the process continues until the error between A' and B as determined at Step 416, is less than the predetermined threshold.
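  • The training loop of Figure 8 can be sketched as below; the per-channel linear "model", the mean-squared error and the gradient update are simplified stand-ins for the CNN 4 and its weight adjustment, not the architecture described in the application:

```python
import numpy as np

def train_task_model(pairs, threshold=1e-3, lr=0.1, max_iters=20_000):
    """Toy stand-in for the Figure 8 loop on paired images A (input) and B (target)."""
    # Step 402: initialise the "network" parameters randomly.
    gain = np.random.uniform(0.5, 1.5)
    bias = np.random.uniform(-0.1, 0.1)
    for _ in range(max_iters):
        for img_a, img_b in pairs:                          # Step 406: next training pair A, B
            pred = np.clip(gain * img_a + bias, 0.0, 1.0)   # Steps 408-410: manipulate A into A'
            err = np.mean((pred - img_b) ** 2)              # Steps 412-414: compare A' with B
            if err < threshold:
                return {"gain": gain, "bias": bias}         # Step 420: store the trained model
            # Step 416: adjust the parameters to reduce the error on the next pass.
            residual = pred - img_b
            gain -= lr * 2.0 * np.mean(residual * img_a)
            bias -= lr * 2.0 * np.mean(residual)
    return {"gain": gain, "bias": bias}

# Toy batch of one pair: the "retouch" from A to B is a simple brightness lift.
a = np.random.rand(8, 8, 3)
print(train_task_model([(a, np.clip(1.2 * a + 0.05, 0.0, 1.0))]))
```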
  • An example batch of training images is shown schematically in Figure 9.
  • the batch of training images relates to imperfection removal on t-shirts.
  • Images A and B in the first pair of training images respectively show a crease 76 on the bottom left of a round-neck style t-shirt 78 and the same t-shirt 78 without the crease 76.
  • Images A and B in the second pair of training images respectively show another t-shirt 78 with a larger crease 76 on the top right of the t-shirt 78 and the same t-shirt 78 without the crease 76.
  • the third pair of training images, A and B respectively show a crease 76 on a V-neck t-shirt 79 and the same V-neck t-shirt 79 without the crease 76.
  • Figure 10 schematically illustrates the generation module 52 of Figure 5 in greater detail.
  • the generation module 52 has an instruction processor 80 for receiving input images to be processed and outputting processed images.
  • Input image C 16 and the user's instructions 18 (relating to which image processing tasks the user would like to be carried out on input image C 16) are input to the instruction processor 80 from the input GUI 14 of the user system 8 and processed images are sent back to the output GUI 24 of the user system 8.
  • the instruction processor 80 separates the instructions 18 from the input image 16 to be processed and sends the image processing tasks to a task selector 82 to which it is connected.
  • the task selector 82 itself is connected to a model retrieval engine 84 and can specify which task model 54 is to be retrieved from the data store 56.
  • the model retrieval engine 84 functions to retrieve the appropriate model 54 from the data store 56 and load it into a task model 86 to which it is connected.
  • the task model 86 is a CNN 4 which is configured by the model 54 and is also connected to the instruction processor 80 such that images to be manipulated (processed) can be sent from the instruction processor 80 to the task model 86 and the output manipulated (processed) images generated can be sent from the task model 86 to the instruction processor 80.
  • the output image 20 may be sent from the instruction processor 80 to the output GUI 24 of the user system 8.
  • the generation module 52 contains the CNNs responsible for the generation of new images but its functionality can be executed also in the design module 50.
  • Figure 11 is a flow chart that illustrates an example of how the image manipulation process may be carried out using the generation module 52 of Figure 10.
  • the process starts by the instruction processor 80 of Figure 10 receiving input image C 16 and its associated instructions 18 at Step 502.
  • the image processing tasks, specified in the instructions 18 are then determined, at Step 504, at the instruction processor 80 of Figure 10 and a first task is selected, at Step 506, as the current task by the task selector 82.
  • the model 54 relating to the current task is then retrieved, at Step 508, from the data store 56 by the model retrieval engine 84 and sent to the task model 86.
  • Image C 16 is also then sent to the task model 86 from the instruction processor 80 for processing.
  • the task model 86 is configured with the model 54 relating to the first task and the configured model is applied, at Step 510, to image C 16 and thereby generates, at Step 512, an intermediate output image D' 88, which is sent to the instruction processor 80.
  • the amount of retouching of each input image 16 as detailed in the instructions 18 may be implemented by applying a normalised weighting to the model 54. For example, if the 'correct colour' interactive slider of Figure 3 is set to 50%, the colour correction model can be set at a weighting of 50%. Similarly, if the enhance light slider control of Figure 3 is set at 100%, the light enhancement model can be set at a full weighting (in the normalised scale of 1). The settings can apply to a single or a batch of images.
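  • The application does not spell out how the normalised weighting is applied to a model; one simple way to realise it (an assumption for illustration only) is to blend the model output with its input in proportion to the slider value:

```python
import numpy as np

def apply_task_with_strength(image, task_model, strength):
    """Run a task model and scale its effect by the normalised slider value in [0, 1]."""
    processed = task_model(image)
    return np.clip(strength * processed + (1.0 - strength) * image, 0.0, 1.0)

# Toy usage mirroring Figure 3: a stand-in "correct colour" model at 50% and a
# stand-in "enhance light" model at 100%, applied sequentially.
correct_colour = lambda img: img ** 0.9                    # placeholder, not the real model
enhance_light = lambda img: np.clip(img * 1.15, 0.0, 1.0)  # placeholder, not the real model

img = np.random.rand(4, 4, 3)
out = apply_task_with_strength(img, correct_colour, 0.5)
out = apply_task_with_strength(out, enhance_light, 1.0)
```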
  • the system determines, at Step 514, whether any tasks requested in the instructions 18 have not been carried out. If there are any tasks that have not been carried out, the next task is selected from the instructions 18 as the new current task at the task selector 82 and the method of implementing the image processing returns to retrieving, at Step 508, the model 54 for the current task.
  • the image processing model 54 relating to this new task is then retrieved from the model retrieval engine 84, at Step 508, and the model 54 is applied, at Step 510, to intermediate output image D' 88. This process continues until all tasks requested in the instructions 18 have been carried out.
  • the intermediate output image D' 88 becomes, at Step 516, a final output image D 20 and is sent, at Step 518, to the user, for example to the output GUI 24 of the user system 8.
  • the intermediate output images 88 are not necessarily visual images. They may for example represent a data file representing the image in an intermediate state of processing.
  • the same generation module 52 can be reused for processing all the tasks. For example, for each task, the generation module 52 may be loaded with a model 54 related to the current image processing task and the values of the previous model provided to the task model CNN may be overwritten by new ones.
  • the process schematically illustrated in Figure 11 may alternatively be carried out using a plurality of different generation modules 52, each generation module 52 operating on a different task.
  • This enables image processing tasks for different images to be processed in parallel for improved efficiency, with multiple generation modules 52 running at the same time and processing different input images 16 concurrently. All of the features of this embodiment are the same as previously described embodiments except for the differences which are set out below.
  • Figure 12 is a schematic diagram illustrating an example of how this further embodiment is implemented.
  • two tasks are carried out sequentially on an input image C 16, using two generation modules 52, 90.
  • Task 1 is a colour correction image manipulation task and Task 2 is a light enhancement image manipulation task.
  • Input image C 16 and the instructions 18 associated with Task 1 are supplied to the first generation module 52, for example via the input GUI 14.
  • the first generation module 52 retrieves the model 54 related to Task 1 from the data store 56.
  • the first generation module 52 then generates an intermediate output image D' 88 and provides this to the second generation module 90.
  • the second generation module 90 receives instructions 18 related to Task 2 from the input GUI 14 and retrieves the model 54 related to Task 2 from the data store 56.
  • the second generation module 90 carries out Task 2 on the intermediate output image D' 88 to generate output image D 20.
  • the multiple generation modules 52, 90 may have the same structure but be different software instances (running on a single computer) that contain different parameters depending on the task to execute.
  • the generation modules 52, 90 can be pre-stored and available to be implemented as required. Their implementation can be triggered to operate on the current manipulated image based on the instructions being processed.
  • all of the instructions 18 may be sent to each of the generation modules 52, 90 and the differentiation of which task is carried out at which generation module 52, 90 may be determined at the specific generation module 52, 90.
  • the embodiment illustrated in Figure 12 may also advantageously enable instruction pipelining to take place to improve throughput and efficiency when multiple images are being processed.
  • in an example, the system comprises three generation modules and the user would like to carry out three tasks on three input images 16: Image 1C, Image 2C and Image 3C.
  • An input Image 1C may be input to the first generation module to implement Task 1 on Image 1C.
  • Image 1C (processed by the first generation module) will then be passed to the second generation module to implement Task 2 on Image 1C.
  • Image 2C may simultaneously be input to the first generation module to implement Task 1 on Image 2C.
  • once Image 1C has been processed by the second generation module, it is passed on to the third generation module to implement Task 3 on Image 1C; simultaneously, Image 2C is passed on to the second generation module to implement Task 2 on Image 2C.
  • Image 3C is input to the first generation module to implement Task 1 on Image 3C.
  • once Image 2C has been processed by the second generation module, it is sent to the third generation module to implement Task 3 on Image 2C; simultaneously, Image 3C is passed to the second generation module to implement Task 2 on Image 3C.
  • once Image 3C has been processed by the second generation module, it is passed to the third generation module to implement Task 3 on Image 3C. This process is illustrated in the table below:

    Time frame | Generation module 1 | Generation module 2 | Generation module 3
    1          | Task 1 on Image 1C  |                     |
    2          | Task 1 on Image 2C  | Task 2 on Image 1C  |
    3          | Task 1 on Image 3C  | Task 2 on Image 2C  | Task 3 on Image 1C
    4          |                     | Task 2 on Image 3C  | Task 3 on Image 2C
    5          |                     |                     | Task 3 on Image 3C
  • the further embodiment therefore enables tasks to be carried out on a plurality of images faster than if only a single task could be carried out on a single image at any one time (sequentially).
  • three images with three tasks can be processed in 5 equal time frames rather than 9 (if carried out sequentially).
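  • the scheduling behind this 5-versus-9 comparison can be reproduced with a short sketch; the helper below is purely illustrative and simply assumes that module m always runs task m and that image i enters module m at time frame i + m:

```python
def pipeline_schedule(num_images, num_tasks):
    """Return, per time frame, the (image, task) pair handled by each module."""
    frames = []
    total_frames = num_images + num_tasks - 1
    for t in range(total_frames):
        frame = {}
        for m in range(num_tasks):
            i = t - m
            if 0 <= i < num_images:
                frame[f"module {m + 1}"] = f"Task {m + 1} on Image {i + 1}C"
        frames.append(frame)
    return frames

for n, frame in enumerate(pipeline_schedule(3, 3), start=1):
    print(f"Time frame {n}: {frame}")
# 3 images x 3 tasks finish in 3 + 3 - 1 = 5 time frames instead of 9.
```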
  • Figures 13 and 14 illustrate a first multi-dimensional embodiment of the present invention, which enables a greater range of image processing capabilities to be realised by the system.
  • the first multi-dimensional embodiment of Figures 13 and 14 enables more sophisticated image processing by specifying how 3D image processing tasks, which go well beyond what is possible with 2D image processing techniques, are carried out on a 2D input image.
  • the design module 50 of the AIRS 2 is trained to learn in terms of spatial properties and receives a 3D dataset 100, as well as the training data set 12 described in relation to previous embodiments, from the administrator system 6.
  • the 3D dataset 100 comprises sequences of images showing a scene where each image is taken from a different angle to teach the system the geometrical properties of objects, their material characterisation and how light illuminates the scene, enabling the AIRS 2 to learn the fundamental properties of the 3D world and render 3D scenes (or 3D representations) 101 from 2D images.
  • the design module 50 in Figure 13 is also connected to the data store 102 which is configured to store the 2D models 54 relating to specific tasks (described previously) and 3D representations of the models 104 which are generated from the 3D data set 100.
  • This data store 102 is connected to the generation module 52 to provide both of the 2D models 54 and the 3D representations of the 2D models 104 to the generation module 52.
  • a 2D input image C 16 may be retrieved from a local data store 17 of the user system 8 and manipulated at the input GUI 14 to specify 2D image modifications.
  • the user system 8 also has a 3D user interface 105, where the user can specify 3D image modifications, namely to generate 3D image manipulation instructions 106.
  • the input GUI 14, incorporating a 3D user interface 105 as implemented on the user system 8, is connected to the generation module 52 and sends not only the input image 16 but also the 2D and 3D instructions 18, 106 to the generation module 52.
  • the output image 108 generated by the generation module 52 is in the form of a 3D scene 101 or a 2D image.
  • the AIRS 2 in this embodiment generalises the process of scene understanding.
  • the output image 108 may differ not only from a camera point of view (an image manipulation only possible in the 3D domain), but also in terms of lighting, position of the assets or other characteristics, exponentially increasing the degrees of freedom of the image retouching process.
  • any input 2D image can be converted into a 3D representation to allow further 3D image manipulation tasks to be carried out on it as well as the 2D image manipulation tasks.
  • the type of image manipulation task to be carried out determines which 3D representation is selected in order to predict what the resultant processed 3D representation (output image) will be.
  • the "3D representation" 104 (3D representation of a 2D model) can be considered to be a collection of data representing what the system has learned from the training process, in terms of 3D properties.
  • the 3D representation can be applied or not for processing (retouching in an embodiment) the input images.
  • an input image can be retouched just by using pixel-related properties, namely in the 2D domain. If the 3D representation is applied to the 2D input image, it generates a "prediction" of what the 3D scene could be. For example, taking an input image which is a simple image of a cube over a plane: in the 2D domain the input image may just show two objects (the cube and the plane) with possibly some shadows and lighting effects.
  • the system is trained with many images containing those objects, so once the system is trained, it applies its knowledge (the 3D representations) to generate a 3D scene that comprises a cube and a plane. How accurately this 3D scene relates to the 2D input image depends on the training process (how well the system learned). The 3D scene may or may not be editable and may or may not be visible to the user, but it is always calculated as long as the system was enabled to "predict" the 3D representation of that 2D input image.
  • Figure 14 is a flow chart illustrating an example method of how the first multi-dimensional embodiment, referred to when describing Figure 13, may be implemented.
  • the method commences with the 3D data set 100 being sent, at Step 602, to the design module 50 by the administrator module 6.
  • the design module 50 is able to gain its own understanding of the fundamental properties of the 3D world by means of multiple sequences of synthetic or real images, provided by the 3D data set 100, from which the design module 50 learns how to represent spatial concepts.
  • the sequences of images in the 3D data set 100 may show camera movements in different contexts, namely views of a scene from different angles and/or views of a scene with different lighting.
  • the design module may learn the 3D properties contained in the 3D data set 100 in a way analogous to how photogrammetry-based techniques generate a 3D mesh of an object from a sequence of images taken around the object.
  • the learning process is based on neural networks that learn the probability distribution of the 3D properties, given the 3D dataset 100.
  • the AIRS 2 may therefore learn in a completely unsupervised way.
  • the design module 50 may acquire a representation of the 3D world in terms of parameters such as perspective, geometry, lighting and/or occlusion.
  • This representation of the 3D world may be optimised to fit in its own representational space that is not manageable by traditional 3D tools.
  • Such a representation is a latent space of vectors (a mathematical representation) that contains the knowledge acquired by the system in a completely non-assisted manner. Having learned the concept of spatiality and other 3D properties, the system is able to render 3D images from 2D images and generate 3D representations of the 2D models 104. Neural scene representation and rendering is discussed further in Reference [3].
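  • a toy sketch of how such a latent scene representation might be learned, loosely in the spirit of the neural scene representation work cited as Reference [3], is given below; the architecture, tensor sizes and pose encoding are assumptions for illustration, not the patent's actual network:

```python
import torch
import torch.nn as nn

class SceneEncoder(nn.Module):
    """Toy representation network: each (image, camera pose) pair is embedded
    and the embeddings are summed into a single latent scene vector."""

    def __init__(self, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
        )
        # 7 extra inputs = assumed camera pose encoding (position + quaternion)
        self.fc = nn.Linear(64 * 16 * 16 + 7, latent_dim)

    def forward(self, images, poses):
        # images: (views, 3, 64, 64); poses: (views, 7)
        feats = self.conv(images).flatten(start_dim=1)
        per_view = self.fc(torch.cat([feats, poses], dim=1))
        return per_view.sum(dim=0)  # order-invariant scene representation

encoder = SceneEncoder()
views = torch.rand(5, 3, 64, 64)   # five images of the same scene from different angles
poses = torch.rand(5, 7)
scene_vector = encoder(views, poses)  # latent "3D representation" of the scene
```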
  • the training data set 12 is also sent, at Step 602, to the design module 50, which learns the pattern between unretouched and retouched images and calculates 2D models 54 associated with each task, as discussed in relation to previous embodiments. If the training data set 12 comprises a mix of retouched and unretouched images in accordance with the embodiment described above, the training data set 12 and 3D data set 100 may be combined to form a single data set which the system uses to learn the concept of retouching and 3D properties.
  • 3D representations of the 2D models 104 associated with each task are determined at the design module 50, using the 3D data set 100, and stored, at Step 604, in the data store 102. For example, if a 2D image manipulation task model adds rain to an image, a 3D representation of the 2D model will add the rain with the concept of depth applied.
  • the AIRS 2 learns from the training data set 12 and 3D data set 100 to retouch images in such a way that every automated retouching process is consistent with the training set, but this process is executed with much more accuracy and with higher degrees of freedom than in the prior art. This is because the automated retouching process happens not only at pixel level (within the 2D domain), but inside the 3D world (3D domain) artificially generated by the design module through use of the 3D data set 100.
  • Input image C 16 may be selected, at Step 606, by the user and pulled into the input GUI 14 where it may be viewed, at Step 608, as a 2D image.
  • the user may request 2D tasks to be carried out, at Step 610, in the form of 2D instructions 18.
  • the input image may also be pulled, at Step 608, into the 3D user interface 105, where a user may specify, at Step 610, 3D image manipulation tasks to be carried out, in the form of 3D instructions 106.
  • the user may actuate, at Step 612, the 'Run' button on the input GUI 14 and/or 3D user interface 105 to trigger sending, at Step 614, of the input image 16 and the 2D and 3D instructions 18, 106 to the AIRS 2, and for the AIRS 2 to start the automatic image manipulation process.
  • 3D tasks include rotating objects, adding lighting that takes into account where shadows should be, moving objects and moving the position of the camera.
  • the generation module 52 processes, at Step 616, the instructions and generates, at Step 618, a 3D scene 101 with the image processing tasks requested in the 2D and 3D instructions 18, 106 applied.
  • the tasks are applied using the process outlined in Figure 11, with the additional step of using the 3D representations of the 2D models 104 in place of the 2D models 54 for any 3D tasks to be implemented.
  • the image generated may be conditioned by the training data set 12 to be consistent with the retouched images contained in the training set 12.
  • the output image D' generated by this process is a 'projection' or 'prediction' of the input image in the 3D domain and is also referred to herein as a 3D scene 101.
  • the 3D scene 101 generated may be the final output of the retouching process, providing a new augmented representation of the subject contained in the original input image.
  • the 3D scene 101 may optionally be converted to a 2D output image.
  • the 3D scene 101 may be sent, at Step 620, to and displayed in the output GUI 24, and the user may decide whether this image is acceptable, at Step 622, or whether further image modifications should be applied. If further image modifications are required, the method returns to the user specifying, at Step 610, image modifications at the input/output GUI 14, 24 and the process continues until the image is deemed acceptable, at Step 622. Once the image is acceptable, it is possible to download the image, at Step 624.
  • Figures 15 and 16 illustrate a second multi-dimensional embodiment of the present invention, which also enables a greater range of image processing capabilities to be realised by the system.
  • Various aspects of the scene may be adjusted, such as lighting, the position of objects identified in the scene and the position of the camera.
  • Figure 15 includes the 3D data set 100 and training set 12, which are fed to the design module 50.
  • the 2D models 54 associated with each task and 3D representations of the 2D models 104 are also stored in the data store 102.
  • the input image 16 in Figure 15 is directly provided from the input GUI 14, rather than the input GUI 14 and the 3D user interface 105 as is the case in the first multi-dimensional embodiment.
  • the user selects 2D tasks to be carried out using the input GUI 14 and the corresponding 2D instructions 18 are sent to the generation module 52, along with the 2D image 16.
  • Figure 15 further comprises a 3D tool 112 (which may or may not be part of the AIRS 2), which receives a 3D scene 101 created by the generation module 52 using the 3D data set 100, with the 2D tasks requested by the user applied.
  • the 3D scene 101 may be manipulated further at the 3D tool 112 using 3D modelling tools, following the user's instructions retrieved from a data store 114 or submitted in real time to the 3D tool 112.
  • the fully manipulated 3D scene 101 may be output as a 3D image or converted to a 2D image 110 and output.
  • Figure 16 is a flow chart illustrating an example of how the second multi-dimensional embodiment, referred to when describing Figure 15, may be implemented.
  • the 3D data set 100 and training data set 12 are sent, at Step 702, to the design module 50 by the administrator system 6.
  • 2D models 54 associated with each task are created in the design module 50 using the same process described in relation to previous embodiments.
  • the 2D models 54 associated with each task and 3D representations of the 2D models 104 are stored, at Step 704, in the data store 102.
  • a user selects, at Step 706, an input image C 16 to modify which is uploaded and displayed, at Step 708, at the input GUI 14.
  • the user then specifies, at Step 710, 2D image modifications to be carried out and when indicated, at Step 712, the associated instructions 18 are sent, at Step 714, to the generation module 52, along with the 2D image 16.
  • the generation module 52 carries out, at Step 716, the 2D image processing tasks requested in the user's instructions 18 by applying the 2D models 54 from the data store 102 to the input image C 16 using the process described in relation to Figures 10 and 11.
  • the manipulated 2D image is then converted into a 3D image using the 3D data set 100, resulting in a 3D scene 101.
  • This 3D scene 101 is delivered, at Step 718, to the 3D tool 112.
  • the user may for example request, at Step 720, modifications to the 3D characteristics of the image, geometrical properties of the scene and lighting in the form of 3D instructions 106.
  • These 3D instructions 106 may be predetermined and stored in a 3D instruction data store 114 or requested by the user in real time, for example using a 3D user interface 105.
  • the 3D characteristics of the image to be modified may be selected by manipulating the inferred 3D scene 101.
  • the instructions 106 are then processed, at Step 722, by the 3D tool 112, which outputs, at Step 724, either the 3D output image 108 or a 2D representation of the output image 110.
  • if the output image 108, 110 is determined, at Step 726, to be acceptable, then the image is saved and made available for download as a processed image. Alternatively, if the output image is not acceptable at Step 726, the method returns the image to the 3D tool 112 and the user can re-specify, at Step 720, the 3D modifications to be carried out. This process continues until an acceptable output of the 3D tool 112 is achieved.
  • Figure 17 is an example of how the generation module 52 may predict a 3D volumetric scene 101 of a 2D input image C 16, where the input image 16 shows a living room in the daytime.
  • a generated 3D scene 101 is produced using the 3D data set 100.
  • the user may manipulate the volumetric 3D scene 101 to generate a new image or this may be automated.
  • the input image shown in Figure 17 has daylight illumination and a set of items of furniture placed in specific locations.
  • the 3D scene 101 may be manipulated to modify some fundamental properties of the 3D scene 101.
  • the lamp 116 on the table to the left of the room and the light on the ceiling are both turned on.
  • the table 118 in the centre of the room and the waste paper bin 120 to the right of the left chair 122 have been removed.
  • the lighting outside has also changed, representing a night-time scene.
  • the 3D scene 101 in this example is rendered again to generate a 2D output image 110 with the modifications made to the 3D scene 101 applied.
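  • purely as an illustration of the kind of 3D-domain edits described for Figure 17, the toy sketch below treats the inferred scene as an explicit object-and-lighting dictionary; in the actual system the scene lives in a learned latent representation, so the structure and names here are hypothetical:

```python
# Hypothetical scene description inferred from the 2D input image
scene = {
    "lighting": {"outdoor": "day", "ceiling_light": "off", "table_lamp": "off"},
    "objects": ["sofa", "left_chair", "centre_table", "waste_paper_bin", "table_lamp"],
}

def edit_scene(scene, remove=(), lighting_changes=None):
    """Apply the kind of 3D-domain edits described for Figure 17."""
    return {
        "lighting": {**scene["lighting"], **(lighting_changes or {})},
        "objects": [o for o in scene["objects"] if o not in set(remove)],
    }

night_scene = edit_scene(
    scene,
    remove=("centre_table", "waste_paper_bin"),
    lighting_changes={"outdoor": "night", "ceiling_light": "on", "table_lamp": "on"},
)
# night_scene would then be re-rendered to a 2D output image by the generation module.
```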
  • the representation of a scene from an image allows more sophisticated manipulation. For example, since the AIRS 2 learns how to represent spatial concepts, if the user requests an object to be removed, such as the waste paper bin 120 in Figure 17, the AIRS 2 is able to predict what the chair 122 is likely to look like behind the bin and hence generate a realistic image of the living room without the bin.
  • the user may not separately request 3D image manipulations in the form of 3D instructions via the 3D user interface 105 or by manipulating the 3D scene at the 3D tool 112. Instead, the user may specify which tasks they would like to be carried out using the control sliders 36 at the input GUI 14 and the system applies the 3D representations of the 2D models 104 corresponding to the tasks requested (created using the 3D data set and stored in the data store 102) to the 2D input image, to create the 3D scene 101.
  • the 3D scene 101 with the tasks carried out may be rendered into a 2D image and output to the output GUI 24.
  • a stream of input images 130 may be processed in order to generate an animated 3D scene 132 that may be rendered from any camera point of view on a display 134 of any aspect, from flat displays to head-mounted displays.
  • the coherence between frames is learned from the design module 50 during the training process.
  • the selection of characteristics of the image to be modified can be requested through natural language in a natural language input GUI 140 where the query describes how the image may be retouched.
  • the natural language input GUI 140 has within it areas for a manipulation description 142 which takes the place of the task descriptors 37 and slider controls 36 in the input GUI 14 shown in Figure 3.
  • the user may select an input image to be manipulated.
  • the user selects an input image 16 from the 'browse' icon 144. The selected image is then shown in the viewing window 146.
  • the criteria of manipulation are stipulated through a description made in natural language, which may be entered via a text input field 148 and previewed in the manipulation description 142.
  • the image 16 may then be sent to the AIRS, along with the instructions 150 detailing which image manipulations are to be carried out, specified by the manipulation description in natural language.
  • Natural language can be part of the training process, so the system is able to learn from previous descriptions how to generate new images that contain the description inserted by the user.
  • the system may be able to collect enough knowledge about a concept through natural language in such a way that it can create images containing the concept, even if the training data 12 does not contain any images of that concept visually represented.
  • the concept of a "river" can be visually generated by the system by learning how water is represented in nature and how a river is verbally described, without any previous knowledge of how a river is visually represented.
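  • a minimal sketch of how a natural-language manipulation description might condition the retouching network is shown below; the bag-of-words text encoder and the small convolutional network are illustrative assumptions, not the architecture disclosed in the patent or in the cited AttnGAN work:

```python
import torch
import torch.nn as nn

class TextConditionedRetoucher(nn.Module):
    """Toy sketch: condition an image-to-image network on an embedded
    natural-language manipulation description (hypothetical architecture)."""

    def __init__(self, vocab_size=1000, text_dim=64):
        super().__init__()
        self.text_embed = nn.EmbeddingBag(vocab_size, text_dim)  # bag-of-words text encoder
        self.net = nn.Sequential(
            nn.Conv2d(3 + text_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, image, token_ids):
        # image: (B, 3, H, W); token_ids: (B, T) integer word indices
        text = self.text_embed(token_ids)                        # (B, text_dim)
        text_map = text[:, :, None, None].expand(-1, -1, image.shape[2], image.shape[3])
        return self.net(torch.cat([image, text_map], dim=1))

model = TextConditionedRetoucher()
img = torch.rand(1, 3, 64, 64)
tokens = torch.tensor([[12, 57, 230, 4]])  # e.g. a tokenised manipulation description
out = model(img, tokens)                   # retouched image conditioned on the text
```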
  • the images generated by the system may be photorealistic and may pass a Turing Test that discerns between real and AI-generated images.
  • a Turing Test can be qualitatively run by human testers, such as the perceptual studies on Amazon Mechanical Turk (AMT), or processed by an automatic system.
  • two automatic approaches are taken into account, without excluding any other method that may be considered equivalent.
  • the generated image is processed by a classifier able to recognise the items contained in the original scene. Under the assumption that the retouching process does not heavily modify the content, it is expected that the classifier will be able to recognise the same items in the same positions in the retouched scene as well.
  • its Inception Score (which helps to identify the types of items present) should be the same as, or very close to, that of the original image.
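  • as a hedged illustration of this first automatic check, the sketch below uses an off-the-shelf torchvision classifier as a stand-in (the patent does not name a specific classifier) and treats the retouch as content-preserving when most of the top-k predicted classes survive it:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Stand-in classifier; any ImageNet-style classifier could play this role.
classifier = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def top_labels(image, k=5):
    """Return the top-k predicted class indices for a PIL image."""
    with torch.no_grad():
        logits = classifier(preprocess(image).unsqueeze(0))
    return set(logits.topk(k, dim=1).indices.flatten().tolist())

def retouch_is_content_preserving(original, retouched, k=5, min_overlap=0.8):
    """Consider the retouch acceptable if most top-k classes survive it."""
    a, b = top_labels(original, k), top_labels(retouched, k)
    return len(a & b) / k >= min_overlap
```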
  • the image may be the input for a discriminator network of a Generative Adversarial Network (GAN), described previously, trained with the dataset used for training the system.
  • the discriminator should not be able to distinguish between real images and generated ones. In terms of metrics, the Fréchet Inception Distance (which acts as an error measure) should be minimised.
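  • a standard way to compute the Fréchet Inception Distance from two sets of Inception activations is sketched below; the feature extraction step is assumed to have been done elsewhere, and the random arrays in the example are stand-ins for real activations:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(real_feats, gen_feats):
    """FID between two sets of Inception activations (rows = images).

    A lower score means the generated images are statistically closer to the
    real ones, which is what the retouching system aims for.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):   # numerical noise can produce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Example with random stand-in activations (real use would take Inception-v3 pool features)
rng = np.random.default_rng(0)
print(frechet_inception_distance(rng.normal(size=(200, 64)),
                                 rng.normal(size=(200, 64))))
```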

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a computer-implemented automatic image-manipulation method for modifying a plurality of unprocessed images having different sets of image parameters into a plurality of processed images having a target set of image parameters. The method comprises: receiving the plurality of unprocessed images and a set of user-selected image-manipulation tasks for manipulating the unprocessed images to create the plurality of processed images, each processed image having the target set of image parameters; for each one of the received plurality of unprocessed images: assigning one of the plurality of unprocessed images as an input image to be processed; processing each task of the set of user-selected image-manipulation tasks on the input image sequentially by: retrieving an image-manipulation model associated with a current user-selected image-manipulation task; loading a processor having a machine-learning computer architecture with the retrieved model and applying the input image to the processor; and processing the input image to derive an output image which has been processed according to the current user-determined image-manipulation task; and storing the output image as one of the plurality of processed images once all of the user-selected image-manipulation tasks for the input image have been completed. The present method can also be used to translate two-dimensional images into the three-dimensional domain for improved image processing.
PCT/GB2018/052951 2017-10-12 2018-10-12 Manipulation d'image automatisée à l'aide de l'intelligence artificielle Ceased WO2019073267A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GBGB1716750.3A GB201716750D0 (en) 2017-10-12 2017-10-12 Automated image manipulation using artificial intelligence
GB1716750.3 2017-10-12
GB1814774.4 2018-09-11
GBGB1814774.4A GB201814774D0 (en) 2018-09-11 2018-09-11 Automated image manipulation using artificial intelligence and scene representation

Publications (1)

Publication Number Publication Date
WO2019073267A1 true WO2019073267A1 (fr) 2019-04-18

Family

ID=64332339

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2018/052951 Ceased WO2019073267A1 (fr) 2017-10-12 2018-10-12 Manipulation d'image automatisée à l'aide de l'intelligence artificielle

Country Status (1)

Country Link
WO (1) WO2019073267A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163991A (zh) * 2020-09-11 2021-01-01 杭州缦图摄影有限公司 基于修图切片的ai修图训练网络模型
US11436437B2 (en) * 2019-01-31 2022-09-06 Huawei Technologies Co., Ltd. Three-dimension (3D) assisted personalized home object detection
US11531796B2 (en) 2020-08-17 2022-12-20 International Business Machines Corporation Guided design generation
US20230042221A1 (en) * 2021-07-23 2023-02-09 Adobe Inc. Modifying digital images utilizing a language guided image editing model
CN116843583A (zh) * 2023-09-01 2023-10-03 荣耀终端有限公司 图像处理方法、装置、电子设备及存储介质
WO2024151411A1 (fr) * 2023-01-10 2024-07-18 Sony Interactive Entertainment Inc. Utilisation de données d'état de jeu pour une compréhension sémantique par un modèle de génération d'image par ia

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Internet Archive WayBack Machine - XnConvert", 15 September 2017 (2017-09-15), XP055540493, Retrieved from the Internet <URL:https://web.archive.org/web/20170915084653/https://www.xnview.com/en/xnconvert/> [retrieved on 20190110] *
ANONYMOUS: "XnConvert 1.75 - XnView Software", 7 September 2017 (2017-09-07), XP055540797, Retrieved from the Internet <URL:https://newsgroup.xnview.com/viewtopic.php?f=79&t=36221&sid=0877effdd9b47a8cfa194afd296b1b67> [retrieved on 20190111] *
ESLAMI, SM ALI ET AL.: "Neural scene representation and rendering", SCIENCE, vol. 360.6394, 2018
GOODFELLOW, IAN ET AL.: "Generative adversarial nets", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2014
ISOLA, PHILLIP ET AL.: "Image-to-image translation with conditional adversarial networks", ARXIV PREPRINT ARXIV:1611.07004, 2016
JUN-YAN ZHU ET AL: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 30 March 2017 (2017-03-30), XP080753281, DOI: 10.1109/ICCV.2017.244 *
REMATAS, KONSTANTINOS ET AL.: "Soccer on Your Tabletop", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2018
WU JIAJUN ET AL: "Neural Scene De-rendering", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. PROCEEDINGS, IEEE COMPUTER SOCIETY, US, 21 July 2017 (2017-07-21), pages 7035 - 7043, XP033250070, ISSN: 1063-6919, [retrieved on 20171106], DOI: 10.1109/CVPR.2017.744 *
XU, TAO ET AL.: "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks", ARXIV PREPRINT, 2017

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11436437B2 (en) * 2019-01-31 2022-09-06 Huawei Technologies Co., Ltd. Three-dimension (3D) assisted personalized home object detection
US11531796B2 (en) 2020-08-17 2022-12-20 International Business Machines Corporation Guided design generation
CN112163991A (zh) * 2020-09-11 2021-01-01 杭州缦图摄影有限公司 基于修图切片的ai修图训练网络模型
US20230042221A1 (en) * 2021-07-23 2023-02-09 Adobe Inc. Modifying digital images utilizing a language guided image editing model
US12248796B2 (en) * 2021-07-23 2025-03-11 Adobe Inc. Modifying digital images utilizing a language guided image editing model
WO2024151411A1 (fr) * 2023-01-10 2024-07-18 Sony Interactive Entertainment Inc. Utilisation de données d'état de jeu pour une compréhension sémantique par un modèle de génération d'image par ia
CN116843583A (zh) * 2023-09-01 2023-10-03 荣耀终端有限公司 图像处理方法、装置、电子设备及存储介质
CN116843583B (zh) * 2023-09-01 2024-05-14 荣耀终端有限公司 图像处理方法、装置、电子设备及存储介质

Similar Documents

Publication Publication Date Title
US12394166B2 (en) Modifying poses of two-dimensional humans in two-dimensional images by reposing three-dimensional human models representing the two-dimensional humans
WO2019073267A1 (fr) Manipulation d&#39;image automatisée à l&#39;aide de l&#39;intelligence artificielle
US12175619B2 (en) Generating and visualizing planar surfaces within a three-dimensional space for modifying objects in a two-dimensional editing interface
US20240144520A1 (en) Generating three-dimensional human models representing two-dimensional humans in two-dimensional images
US12469194B2 (en) Generating shadows for placed objects in depth estimated scenes of two-dimensional images
US12482172B2 (en) Generating shadows for objects in two-dimensional images utilizing a plurality of shadow maps
US12210800B2 (en) Modifying digital images using combinations of direct interactions with the digital images and context-informing speech input
GB2623620A (en) Generating shadows for placed objects in depth estimated scenes of two-dimensional images
CN118710781A (zh) 利用端到端机器学习模型的面部表情和姿势转移
US20240169630A1 (en) Synthesizing shadows in digital images utilizing diffusion models
US20240135561A1 (en) Modifying digital images via depth-aware object move
JP2023540434A (ja) 操作順序変形エンジンを用いたコンピュータアニメーションのためのシステム及び方法
US20240127509A1 (en) Generating scale fields indicating pixel-to-metric distances relationships in digital images via neural networks
CN118710782A (zh) 利用端到端机器学习模型的动画面部表情和姿势转移
GB2629245A (en) Generating and implementing semantic histories for editing digital images
CN113129347B (zh) 一种自监督单视图三维发丝模型重建方法及系统
US12423855B2 (en) Generating modified two-dimensional images by customizing focal points via three-dimensional representations of the two-dimensional images
CN119399487A (zh) 图像处理方法、模型训练方法以及电子设备
AU2024201173A1 (en) Generating three-dimensional human models representing two-dimensional humans in two-dimensional images
GB2628694A (en) Human inpainting utilizing a segmentation branch for generating an infill segmentation map
CN117853612A (zh) 利用人类修复模型生成经修改的数字图像
Balusa et al. Bridging deep learning & 3D models from 2D images
US12488523B2 (en) Moving objects casting a shadow and generating proxy shadows within a digital image
CN120411294B (zh) 一种用于服饰纹理替换的智能重绘方法及装置
CN117576280B (zh) 一种基于3d数字人的智能端云一体化生成方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18804379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18804379

Country of ref document: EP

Kind code of ref document: A1